# LoRA Target Modules Reference
This document provides the correct target module names for different model architectures when using LoRA (Low-Rank Adaptation).
## Model Architecture Detection
Use the inspection script to find correct target modules:
```bash
# In the nix development environment
python /home/centra/dev/pnn/inspect_conv1d_model.py [model_name]
```
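If the inspection script is not at hand, the check it performs can be approximated with a few lines of `transformers` code. This is only a sketch of that kind of inspection, not the script itself, and the default model name is an assumption:
```python
# Approximate what an inspection script does: list the leaf modules LoRA can target.
import sys
import torch.nn as nn
from transformers import AutoModelForCausalLM
from transformers.pytorch_utils import Conv1D

model_name = sys.argv[1] if len(sys.argv) > 1 else "microsoft/DialoGPT-small"
model = AutoModelForCausalLM.from_pretrained(model_name)

for name, module in model.named_modules():
    # LoRA can wrap both Conv1D (GPT-2 family) and Linear layers
    if isinstance(module, (Conv1D, nn.Linear)):
        # The trailing path component (e.g. "c_attn", "q_proj") is what goes in target_modules
        print(f"{type(module).__name__:8s} {name}")
```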
## Common Model Architectures
### GPT-2 / DialoGPT Models
- **Model Type**: GPT2LMHeadModel
- **Layer Type**: Conv1D (not Linear!)
- **Base Model**: microsoft/DialoGPT-small, gpt2, gpt2-medium, gpt2-large, gpt2-xl
#### Attention Modules
- `c_attn` - Combined query, key, value projection (nf=3*hidden_size)
- `c_proj` - Output projection
#### MLP Modules
- `mlp.c_fc` - Feed-forward up projection
- `mlp.c_proj` - Feed-forward down projection
#### Recommended Configurations
```yaml
# Basic stage (attention only)
target_modules: ["c_attn", "c_proj"]
# Advanced stage (attention + MLP)
target_modules: ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"]
```
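To see how the two stages differ in practice, the sketch below builds a PEFT model for each configuration and prints the trainable-parameter counts. It assumes `microsoft/DialoGPT-small` and the `peft`/`transformers` APIs used later in this document; the rank and alpha values are illustrative:
```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

STAGES = {
    "basic (attention only)": ["c_attn", "c_proj"],
    "advanced (attention + MLP)": ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"],
}

for label, modules in STAGES.items():
    # Reload the base model so each stage starts from unmodified weights
    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
    config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16,
                        target_modules=modules)
    print(label)
    get_peft_model(model, config).print_trainable_parameters()
```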
### LLaMA Models
- **Model Type**: LlamaForCausalLM
- **Layer Type**: Linear
- **Base Model**: meta-llama/Llama-2-7b-hf, meta-llama/Llama-3.1-8B
#### Attention Modules
- `q_proj` - Query projection
- `k_proj` - Key projection
- `v_proj` - Value projection
- `o_proj` - Output projection
#### MLP Modules
- `gate_proj` - Gate projection
- `up_proj` - Up projection
- `down_proj` - Down projection
#### Recommended Configurations
```yaml
# Basic stage (attention only)
target_modules: ["q_proj", "v_proj"]
# Advanced stage (attention + MLP)
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```
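If stage configurations like the ones above are stored in YAML files, the module list can be fed straight into a `LoraConfig`. The file path and key layout below are assumptions for illustration only:
```python
# Sketch: build a LoraConfig from a YAML stage file (path and keys are hypothetical).
import yaml
from peft import LoraConfig, TaskType

with open("configs/llama_advanced_stage.yaml") as f:
    stage = yaml.safe_load(f)  # e.g. {"target_modules": ["q_proj", "v_proj", ...]}

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    target_modules=stage["target_modules"],
)
```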
### Mistral Models
- **Model Type**: MistralForCausalLM
- **Layer Type**: Linear
- **Base Model**: mistralai/Mistral-7B-v0.1
#### Target Modules (same as LLaMA)
```yaml
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```
### Qwen Models
- **Model Type**: QWenLMHeadModel
- **Layer Type**: Linear
- **Base Model**: Qwen/Qwen-7B
#### Target Modules
```yaml
target_modules: ["c_attn", "c_proj", "w1", "w2"]
```
## Important Notes
1. **Conv1D vs Linear**: GPT-2 based models use the `transformers` `Conv1D` layer (weights stored transposed relative to `nn.Linear`), not `Linear` layers
2. **Module Patterns**: Use simple patterns like `"c_attn"` rather than full paths like `"transformer.h.0.attn.c_attn"`
3. **Testing**: Always test your configuration before training by creating a PEFT model
4. **Architecture Variations**: Different model families use different naming conventions (see the lookup sketch after this list)
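One way to keep the per-family presets manageable is a small lookup keyed on the Hugging Face `model.config.model_type`. The values below are taken from the tables in this document; the helper name is hypothetical:
```python
# Presets from this document, keyed on model_type (e.g. "gpt2" for DialoGPT)
DEFAULT_TARGET_MODULES = {
    "gpt2":    ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"],
    "llama":   ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    "mistral": ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    "qwen":    ["c_attn", "c_proj", "w1", "w2"],
}

def target_modules_for(model):
    """Return the preset target modules for a loaded model, or fail loudly."""
    model_type = model.config.model_type
    if model_type not in DEFAULT_TARGET_MODULES:
        raise ValueError(f"No target_modules preset for architecture: {model_type}")
    return DEFAULT_TARGET_MODULES[model_type]
```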
## Troubleshooting
### Error: "Target module not found"
- Run the inspection script to find actual module names
- Check if the model uses Conv1D or Linear layers
- Verify the module naming pattern for your specific model
### Error: "No trainable parameters"
- Ensure target modules exist in the model
- Check that the module names match exactly (a quick check is sketched below)
- Verify the model architecture is supported by PEFT
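Both errors usually come down to target names that never match a real module path. The sketch below counts how many modules each target name matches; the model name is an assumption:
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
target_modules = ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"]

module_names = [name for name, _ in model.named_modules()]
for target in target_modules:
    # PEFT matches list entries against the end of the full module path
    hits = [n for n in module_names if n == target or n.endswith("." + target)]
    print(f"{target}: {len(hits)} matching module(s)")
```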
## Testing Your Configuration
```python
from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, TaskType

# Load the base model you plan to adapt (a GPT-2 based model in this example)
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Test configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj"],  # Your target modules
    bias="none",
)

# Try to create the PEFT model
try:
    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()
    print("✓ Configuration works!")
except Exception as e:
    print(f"✗ Configuration failed: {e}")
```