# LoRA Target Modules Reference
This document provides the correct target module names for different model architectures when using LoRA (Low-Rank Adaptation).
## Model Architecture Detection
Use the inspection script to find correct target modules:
```bash
# In the nix development environment
python /home/centra/dev/pnn/inspect_conv1d_model.py [model_name]
```
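If the script is not available in your environment, the same information can be gathered with a few lines of Python (a minimal sketch, assuming `transformers` and `torch` are installed; `microsoft/DialoGPT-small` is only an example checkpoint):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM
from transformers.pytorch_utils import Conv1D  # GPT-2 style projection layer

model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Every Conv1D/Linear module is a candidate for LoRA target_modules;
# print its full path and layer class.
for name, module in model.named_modules():
    if isinstance(module, (nn.Linear, Conv1D)):
        print(f"{name}: {type(module).__name__}")
```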
## Common Model Architectures
### GPT-2 / DialoGPT Models
- **Model Type**: `GPT2LMHeadModel`
- **Layer Type**: `Conv1D` (not `Linear`!)
- **Base Models**: `microsoft/DialoGPT-small`, `gpt2`, `gpt2-medium`, `gpt2-large`, `gpt2-xl`
#### Attention Modules
- `c_attn` - Combined query, key, value projection (nf = 3 * hidden_size)
- `c_proj` - Output projection
#### MLP Modules
- `mlp.c_fc` - Feed-forward up projection
- `mlp.c_proj` - Feed-forward down projection
#### Recommended Configurations
```yaml
# Basic stage (attention only)
target_modules: ["c_attn", "c_proj"]

# Advanced stage (attention + MLP)
target_modules: ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"]
```
### LLaMA Models
- **Model Type**: `LlamaForCausalLM`
- **Layer Type**: `Linear`
- **Base Models**: `meta-llama/Llama-2-7b-hf`, `meta-llama/Llama-3.2-8B`
#### Attention Modules
- `q_proj` - Query projection
- `k_proj` - Key projection
- `v_proj` - Value projection
- `o_proj` - Output projection
#### MLP Modules
- `gate_proj` - Gate projection
- `up_proj` - Up projection
- `down_proj` - Down projection
#### Recommended Configurations
```yaml
# Basic stage (attention only)
target_modules: ["q_proj", "v_proj"]

# Advanced stage (attention + MLP)
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```
### Mistral Models
- **Model Type**: `MistralForCausalLM`
- **Layer Type**: `Linear`
- **Base Model**: `mistralai/Mistral-7B-v0.1`
#### Target Modules (same as LLaMA)
```yaml
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```
### Qwen Models
- **Model Type**: `QWenLMHeadModel`
- **Layer Type**: `Linear`
- **Base Model**: `Qwen/Qwen-7B`
#### Target Modules
```yaml
target_modules: ["c_attn", "c_proj", "w1", "w2"]
```
## Important Notes
- **Conv1D vs Linear**: GPT-2 based models use `Conv1D` layers, not `Linear` layers (see the type-check sketch after this list)
- **Module Patterns**: Use simple patterns like `"c_attn"` rather than full paths like `"transformer.h.0.attn.c_attn"`
- **Testing**: Always test your configuration before training by creating a PEFT model
- **Architecture Variations**: Different model families use different naming conventions
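To confirm which layer class a particular module uses, an `isinstance` check against both classes is enough (a minimal sketch; `gpt2` is only an example checkpoint):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM
from transformers.pytorch_utils import Conv1D

model = AutoModelForCausalLM.from_pretrained("gpt2")

# GPT-2's combined QKV projection is a Conv1D, not an nn.Linear.
attn_proj = model.transformer.h[0].attn.c_attn
print(isinstance(attn_proj, Conv1D))     # True
print(isinstance(attn_proj, nn.Linear))  # False
```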
## Troubleshooting
Error: "Target module not found"
- Run the inspection script to find actual module names
- Check if the model uses Conv1D or Linear layers
- Verify the module naming pattern for your specific model
Error: "No trainable parameters"
- Ensure target modules exist in the model
- Check that the module names match exactly
- Verify the model architecture is supported by PEFT
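Listing every module path that contains a candidate substring makes it easy to see whether (and where) a pattern actually matches (a sketch; `find_modules` is a hypothetical helper and `microsoft/DialoGPT-small` is only an example checkpoint):

```python
from transformers import AutoModelForCausalLM

def find_modules(model, pattern):
    """Hypothetical helper: return full module paths containing `pattern`."""
    return [name for name, _ in model.named_modules() if pattern in name]

model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
print(find_modules(model, "c_attn"))
# e.g. ['transformer.h.0.attn.c_attn', 'transformer.h.1.attn.c_attn', ...]
```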
## Testing Your Configuration
```python
from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, TaskType

# Load the base model you plan to adapt (example checkpoint).
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Test configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj"],  # Your target modules
    bias="none",
)

# Try to create the PEFT model
try:
    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()
    print("✓ Configuration works!")
except Exception as e:
    print(f"✗ Configuration failed: {e}")
```