# LoRA Target Modules Reference
This document provides the correct target module names for different model architectures when using LoRA (Low-Rank Adaptation).
## Model Architecture Detection
Use the inspection script to find the correct target modules for your model:
```bash
# In the nix development environment
python /home/centra/dev/pnn/inspect_conv1d_model.py [model_name]
```
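
If that script is not available, a minimal equivalent using `transformers` directly is sketched below; the DialoGPT checkpoint is only an example, so substitute the model you actually plan to adapt.

```python
from transformers import AutoModelForCausalLM

# Example checkpoint; replace with the model you plan to fine-tune
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Print every leaf module with its class name,
# e.g. "transformer.h.0.attn.c_attn -> Conv1D"
for name, module in model.named_modules():
    if not list(module.children()):  # leaf modules only
        print(f"{name} -> {type(module).__name__}")
```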
## Common Model Architectures
### GPT-2 / DialoGPT Models
- **Model Type**: GPT2LMHeadModel
- **Layer Type**: Conv1D (not Linear!)
- **Base Models**: microsoft/DialoGPT-small, gpt2, gpt2-medium, gpt2-large, gpt2-xl
#### Attention Modules
- `c_attn` - Combined query, key, and value projection (nf = 3 * hidden_size; see the shape check after these lists)
- `c_proj` - Output projection
#### MLP Modules
- `mlp.c_fc` - Feed-forward up projection
- `mlp.c_proj` - Feed-forward down projection
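
Because these are `Conv1D` layers, the weight is stored as `(in_features, nf)` rather than the `(out_features, in_features)` layout of `nn.Linear`. A quick shape check (a sketch only; DialoGPT-small, with `hidden_size=768`, is used as the example):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
c_attn = model.transformer.h[0].attn.c_attn

print(type(c_attn).__name__)  # Conv1D
print(c_attn.weight.shape)    # torch.Size([768, 2304]), i.e. nf = 3 * hidden_size
```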
#### Recommended Configurations
```yaml
# Basic stage (attention only)
target_modules: ["c_attn", "c_proj"]

# Advanced stage (attention + MLP)
target_modules: ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"]
```
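
To see what each stage actually trains, the sketch below wraps DialoGPT-small with both configurations and prints the trainable parameter counts. The checkpoint, `r`, and `lora_alpha` values are illustrative assumptions, not settings taken from this project's training config.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

stages = {
    "basic": ["c_attn", "c_proj"],
    "advanced": ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"],
}

for stage, modules in stages.items():
    # Reload the base model each time so adapters do not stack
    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
    config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, target_modules=modules)
    peft_model = get_peft_model(model, config)
    print(f"{stage}:")
    peft_model.print_trainable_parameters()
```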
### LLaMA Models
- **Model Type**: LlamaForCausalLM
- **Layer Type**: Linear
- **Base Models**: meta-llama/Llama-2-7b-hf, meta-llama/Llama-3.1-8B
#### Attention Modules
- `q_proj` - Query projection
- `k_proj` - Key projection
- `v_proj` - Value projection
- `o_proj` - Output projection
#### MLP Modules
- `gate_proj` - Gate projection
- `up_proj` - Up projection
- `down_proj` - Down projection
#### Recommended Configurations
```yaml
# Basic stage (attention only)
target_modules: ["q_proj", "v_proj"]

# Advanced stage (attention + MLP)
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```
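
Unlike the GPT-2 family, these projections are ordinary `torch.nn.Linear` modules. A quick way to confirm (a sketch; the tiny randomly initialized checkpoint below is only a lightweight stand-in for a full LLaMA model):

```python
import torch
from transformers import AutoModelForCausalLM

# Tiny random LLaMA-architecture model, used here only to inspect layer types
model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")
q_proj = model.model.layers[0].self_attn.q_proj

print(type(q_proj).__name__)                # Linear
print(isinstance(q_proj, torch.nn.Linear))  # True
```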
### Mistral Models
- **Model Type**: MistralForCausalLM
- **Layer Type**: Linear
- **Base Model**: mistralai/Mistral-7B-v0.1
#### Target Modules (same as LLaMA)
```yaml
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```
### Qwen Models
- **Model Type**: QWenLMHeadModel
- **Layer Type**: Linear
- **Base Model**: Qwen/Qwen-7B
#### Target Modules
```yaml
target_modules: ["c_attn", "c_proj", "w1", "w2"]
```
## Important Notes
1. **Conv1D vs Linear**: GPT-2 based models use `Conv1D` layers, not `Linear` layers
2. **Module Patterns**: Use short suffix patterns like `"c_attn"` rather than full per-layer paths like `"transformer.h.0.attn.c_attn"`; list entries are matched against the end of each module path (see the sketch after this list)
3. **Testing**: Always test your configuration before training by creating a PEFT model
4. **Architecture Variations**: Different model families use different naming conventions
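
A small sketch of how recent `peft` releases interpret `target_modules` (the regex below is purely illustrative):

```python
from peft import LoraConfig, TaskType

# List form: each entry is matched against the end of the full module path,
# so "c_attn" covers transformer.h.0.attn.c_attn, transformer.h.1.attn.c_attn, ...
# Note that a bare "c_proj" therefore matches both attn.c_proj and mlp.c_proj in GPT-2.
list_config = LoraConfig(task_type=TaskType.CAUSAL_LM, target_modules=["c_attn", "c_proj"])

# String form: treated as a regular expression over the full module path,
# useful for adapting only specific layers
regex_config = LoraConfig(task_type=TaskType.CAUSAL_LM, target_modules=r".*\.h\.[0-5]\.attn\.c_attn")
```

Running `print_trainable_parameters()` on the resulting PEFT model, as in the testing section below, is the easiest way to confirm which modules were actually matched.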
## Troubleshooting
### Error: "Target module not found"
- Run the inspection script to find actual module names
- Check if the model uses Conv1D or Linear layers
- Verify the module naming pattern for your specific model
### Error: "No trainable parameters"
- Ensure target modules exist in the model
- Check that the module names match exactly
- Verify the model architecture is supported by PEFT
## Testing Your Configuration
```python
from peft import get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM

# Load the base model you plan to adapt (example checkpoint)
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Test configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj"],  # Your target modules
    bias="none",
)

# Try to create a PEFT model
try:
    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()
    print("✓ Configuration works!")
except Exception as e:
    print(f"✗ Configuration failed: {e}")
```