
# LoRA Target Modules Reference

This document provides the correct target module names for different model architectures when using LoRA (Low-Rank Adaptation).

## Model Architecture Detection

Use the inspection script to find the correct target modules:

```bash
# In the nix development environment
python /home/centra/dev/pnn/inspect_conv1d_model.py [model_name]
```
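
If that script is not available in your checkout, a minimal equivalent can be built from `named_modules()`. The sketch below is an assumption about what the script does, not its actual contents; it lists every LoRA-targetable layer by type:

```python
# inspect_modules.py - hypothetical stand-in for inspect_conv1d_model.py
import sys

import torch.nn as nn
from transformers import AutoModelForCausalLM
from transformers.pytorch_utils import Conv1D

model_name = sys.argv[1] if len(sys.argv) > 1 else "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA can wrap both nn.Linear (LLaMA, Mistral) and HF's Conv1D (GPT-2 family)
for name, module in model.named_modules():
    if isinstance(module, (nn.Linear, Conv1D)):
        print(f"{type(module).__name__:8s} {name}")
```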

## Common Model Architectures

### GPT-2 / DialoGPT Models

- Model Type: GPT2LMHeadModel
- Layer Type: Conv1D (not Linear!)
- Base Models: microsoft/DialoGPT-small, gpt2, gpt2-medium, gpt2-large, gpt2-xl

#### Attention Modules

- c_attn - Combined query, key, value projection (nf=3*hidden_size; see the shape check below)
- c_proj - Output projection

#### MLP Modules

- mlp.c_fc - Feed-forward up projection
- mlp.c_proj - Feed-forward down projection

```yaml
# Basic stage (attention only)
target_modules: ["c_attn", "c_proj"]

# Advanced stage (attention + MLP)
target_modules: ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"]
```

### LLaMA Models

- Model Type: LlamaForCausalLM
- Layer Type: Linear
- Base Models: meta-llama/Llama-2-7b-hf, meta-llama/Llama-3.1-8B

#### Attention Modules

- q_proj - Query projection
- k_proj - Key projection
- v_proj - Value projection
- o_proj - Output projection

#### MLP Modules

- gate_proj - Gate projection
- up_proj - Up projection
- down_proj - Down projection

```yaml
# Basic stage (attention only)
target_modules: ["q_proj", "v_proj"]

# Advanced stage (attention + MLP)
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```

### Mistral Models

- Model Type: MistralForCausalLM
- Layer Type: Linear
- Base Model: mistralai/Mistral-7B-v0.1

#### Target Modules (same as LLaMA)

```yaml
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```

### Qwen Models

- Model Type: QWenLMHeadModel (custom modeling code; see the loading note below)
- Layer Type: Linear
- Base Model: Qwen/Qwen-7B

#### Target Modules

```yaml
target_modules: ["c_attn", "c_proj", "w1", "w2"]
```

## Important Notes

1. Conv1D vs Linear: GPT-2-based models use Conv1D layers, not Linear layers
2. Module Patterns: Use simple patterns like "c_attn" rather than full paths like "transformer.h.0.attn.c_attn" (see the sketch after this list)
3. Testing: Always test your configuration before training by creating a PEFT model
4. Architecture Variations: Different model families use different naming conventions
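
To illustrate notes 1 and 2: PEFT matches each target_modules entry against the end of the full module path, so a bare "c_attn" reaches the Conv1D layer in every transformer block. A minimal sketch using the gpt2 checkpoint:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
peft_model = get_peft_model(model, LoraConfig(target_modules=["c_attn"]))

# Every per-block c_attn (transformer.h.0 ... h.11) is now LoRA-wrapped
wrapped = [name for name, _ in peft_model.named_modules() if name.endswith("c_attn")]
print(len(wrapped))  # 12 - one per transformer block in gpt2
```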

## Troubleshooting

### Error: "Target module not found"

- Run the inspection script to find actual module names
- Check if the model uses Conv1D or Linear layers
- Verify the module naming pattern for your specific model

### Error: "No trainable parameters"

- Ensure target modules exist in the model
- Check that the module names match exactly
- Verify the model architecture is supported by PEFT

## Testing Your Configuration

```python
from peft import get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM

# Load the base model you plan to adapt (DialoGPT-small shown here)
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Test configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj"],  # Your target modules
    bias="none",
)

# Try to create the PEFT model
try:
    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()
    print("✓ Configuration works!")
except Exception as e:
    print(f"✗ Configuration failed: {e}")
```