# LoRA Target Modules Reference
This document provides the correct target module names for different model architectures when using LoRA (Low-Rank Adaptation).
## Model Architecture Detection
Use the inspection script to find the correct target modules for your model:
```bash
# In the nix development environment
python /home/centra/dev/pnn/inspect_conv1d_model.py [model_name]
```
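
If that script is not available, a minimal equivalent using `transformers` directly is sketched below; the DialoGPT checkpoint is only an example, so substitute the model you actually plan to adapt.

```python
from transformers import AutoModelForCausalLM

# Example checkpoint; replace with the model you plan to fine-tune
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Print every leaf module with its class name,
# e.g. "transformer.h.0.attn.c_attn -> Conv1D"
for name, module in model.named_modules():
    if not list(module.children()):  # leaf modules only
        print(f"{name} -> {type(module).__name__}")
```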
## Common Model Architectures
### GPT-2 / DialoGPT Models
- **Model Type**: GPT2LMHeadModel
- **Layer Type**: Conv1D (not Linear!)
- **Base Models**: microsoft/DialoGPT-small, gpt2, gpt2-medium, gpt2-large, gpt2-xl
#### Attention Modules
- `c_attn` - Combined query, key, and value projection (nf = 3 * hidden_size; see the shape check after these lists)
- `c_proj` - Output projection
#### MLP Modules
- `mlp.c_fc` - Feed-forward up projection
- `mlp.c_proj` - Feed-forward down projection
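
Because these are `Conv1D` layers, the weight is stored as `(in_features, nf)` rather than the `(out_features, in_features)` layout of `nn.Linear`. A quick shape check (a sketch only; DialoGPT-small, with `hidden_size=768`, is used as the example):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
c_attn = model.transformer.h[0].attn.c_attn

print(type(c_attn).__name__)  # Conv1D
print(c_attn.weight.shape)    # torch.Size([768, 2304]), i.e. nf = 3 * hidden_size
```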
#### Recommended Configurations
```yaml
# Basic stage (attention only)
target_modules: ["c_attn", "c_proj"]

# Advanced stage (attention + MLP)
target_modules: ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"]
```
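
To see what each stage actually trains, the sketch below wraps DialoGPT-small with both configurations and prints the trainable parameter counts. The checkpoint, `r`, and `lora_alpha` values are illustrative assumptions, not settings taken from this project's training config.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

stages = {
    "basic": ["c_attn", "c_proj"],
    "advanced": ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"],
}

for stage, modules in stages.items():
    # Reload the base model each time so adapters do not stack
    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
    config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, target_modules=modules)
    peft_model = get_peft_model(model, config)
    print(f"{stage}:")
    peft_model.print_trainable_parameters()
```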
### LLaMA Models
- **Model Type**: LlamaForCausalLM
- **Layer Type**: Linear
- **Base Models**: meta-llama/Llama-2-7b-hf, meta-llama/Llama-3.1-8B
#### Attention Modules
- `q_proj` - Query projection
- `k_proj` - Key projection
- `v_proj` - Value projection
- `o_proj` - Output projection
#### MLP Modules
- `gate_proj` - Gate projection
- `up_proj` - Up projection
- `down_proj` - Down projection
#### Recommended Configurations
```yaml
# Basic stage (attention only)
target_modules: ["q_proj", "v_proj"]

# Advanced stage (attention + MLP)
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```
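
Unlike the GPT-2 family, these projections are ordinary `torch.nn.Linear` modules. A quick way to confirm (a sketch; the tiny randomly initialized checkpoint below is only a lightweight stand-in for a full LLaMA model):

```python
import torch
from transformers import AutoModelForCausalLM

# Tiny random LLaMA-architecture model, used here only to inspect layer types
model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")
q_proj = model.model.layers[0].self_attn.q_proj

print(type(q_proj).__name__)                # Linear
print(isinstance(q_proj, torch.nn.Linear))  # True
```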
### Mistral Models
- **Model Type**: MistralForCausalLM
- **Layer Type**: Linear
- **Base Model**: mistralai/Mistral-7B-v0.1
#### Target Modules (same as LLaMA)
```yaml
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```
### Qwen Models
- **Model Type**: QWenLMHeadModel
- **Layer Type**: Linear
- **Base Model**: Qwen/Qwen-7B
#### Target Modules
```yaml
target_modules: ["c_attn", "c_proj", "w1", "w2"]
```
## Important Notes
1. **Conv1D vs Linear**: GPT-2 based models use `Conv1D` layers, not `Linear` layers
2. **Module Patterns**: Use short suffix patterns like `"c_attn"` rather than full per-layer paths like `"transformer.h.0.attn.c_attn"`; list entries are matched against the end of each module path (see the sketch after this list)
3. **Testing**: Always test your configuration before training by creating a PEFT model
4. **Architecture Variations**: Different model families use different naming conventions
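
A small sketch of how recent `peft` releases interpret `target_modules` (the regex below is purely illustrative):

```python
from peft import LoraConfig, TaskType

# List form: each entry is matched against the end of the full module path,
# so "c_attn" covers transformer.h.0.attn.c_attn, transformer.h.1.attn.c_attn, ...
# Note that a bare "c_proj" therefore matches both attn.c_proj and mlp.c_proj in GPT-2.
list_config = LoraConfig(task_type=TaskType.CAUSAL_LM, target_modules=["c_attn", "c_proj"])

# String form: treated as a regular expression over the full module path,
# useful for adapting only specific layers
regex_config = LoraConfig(task_type=TaskType.CAUSAL_LM, target_modules=r".*\.h\.[0-5]\.attn\.c_attn")
```

Running `print_trainable_parameters()` on the resulting PEFT model, as in the testing section below, is the easiest way to confirm which modules were actually matched.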
## Troubleshooting
### Error: "Target module not found"
- Run the inspection script to find actual module names
- Check if the model uses Conv1D or Linear layers
- Verify the module naming pattern for your specific model
### Error: "No trainable parameters"
- Ensure target modules exist in the model
- Check that the module names match exactly
- Verify the model architecture is supported by PEFT
## Testing Your Configuration
```python
from peft import get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM

# Load the base model you plan to adapt (example checkpoint)
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Test configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj"],  # Your target modules
    bias="none",
)

# Try to create a PEFT model
try:
    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()
    print("✓ Configuration works!")
except Exception as e:
    print(f"✗ Configuration failed: {e}")
```