# LoRA Target Modules Reference

This document provides the correct target module names for different model architectures when using LoRA (Low-Rank Adaptation).

## Model Architecture Detection

Use the inspection script to find the correct target modules for a given model:

```bash
# In the nix development environment
python /home/centra/dev/pnn/inspect_conv1d_model.py [model_name]
```

If the script is not available, a fallback inspection sketch is included at the end of this document.

## Common Model Architectures

### GPT-2 / DialoGPT Models

- **Model Type**: GPT2LMHeadModel
- **Layer Type**: Conv1D (not Linear!)
- **Base Models**: microsoft/DialoGPT-small, gpt2, gpt2-medium, gpt2-large, gpt2-xl

#### Attention Modules

- `c_attn` - Combined query, key, and value projection (nf = 3 * hidden_size)
- `c_proj` - Output projection

#### MLP Modules

- `mlp.c_fc` - Feed-forward up projection
- `mlp.c_proj` - Feed-forward down projection

#### Recommended Configurations

```yaml
# Basic stage (attention only)
target_modules: ["c_attn", "c_proj"]

# Advanced stage (attention + MLP)
target_modules: ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"]
```

### LLaMA Models

- **Model Type**: LlamaForCausalLM
- **Layer Type**: Linear
- **Base Models**: meta-llama/Llama-2-7b-hf, meta-llama/Llama-3.1-8B

#### Attention Modules

- `q_proj` - Query projection
- `k_proj` - Key projection
- `v_proj` - Value projection
- `o_proj` - Output projection

#### MLP Modules

- `gate_proj` - Gate projection
- `up_proj` - Up projection
- `down_proj` - Down projection

#### Recommended Configurations

```yaml
# Basic stage (attention only)
target_modules: ["q_proj", "v_proj"]

# Advanced stage (attention + MLP)
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```

### Mistral Models

- **Model Type**: MistralForCausalLM
- **Layer Type**: Linear
- **Base Model**: mistralai/Mistral-7B-v0.1

#### Target Modules (same as LLaMA)

```yaml
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```

### Qwen Models

- **Model Type**: QWenLMHeadModel
- **Layer Type**: Linear
- **Base Model**: Qwen/Qwen-7B

#### Target Modules

```yaml
target_modules: ["c_attn", "c_proj", "w1", "w2"]
```

## Important Notes

1. **Conv1D vs Linear**: GPT-2 based models use `Conv1D` layers, not `Linear` layers
2. **Module Patterns**: Use simple patterns like `"c_attn"` rather than full paths like `"transformer.h.0.attn.c_attn"`
3. **Testing**: Always test your configuration before training by creating a PEFT model
4. **Architecture Variations**: Different model families use different naming conventions

## Troubleshooting

### Error: "Target module not found"

- Run the inspection script (or the fallback sketch at the end of this document) to find the actual module names
- Check whether the model uses Conv1D or Linear layers
- Verify the module naming pattern for your specific model

### Error: "No trainable parameters"

- Ensure the target modules exist in the model
- Check that the module names match exactly
- Verify that the model architecture is supported by PEFT

## Testing Your Configuration

```python
from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, TaskType

# Load the base model you plan to adapt (example model)
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Test configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj"],  # Your target modules
    bias="none",
)

# Try to create the PEFT model
try:
    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()
    print("✓ Configuration works!")
except Exception as e:
    print(f"✗ Configuration failed: {e}")
```
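If the PEFT model is created successfully, it can also be worth confirming exactly which modules received adapters. PEFT matches `target_modules` entries as name suffixes, so in a GPT-2 style model `"c_proj"` matches both `attn.c_proj` and `mlp.c_proj`. The snippet below is a minimal sketch that assumes `peft_model` was built as in the example above and relies on the `lora_A` attribute that PEFT attaches to adapted layers.

```python
# List every module that actually received a LoRA adapter.
# Assumes `peft_model` was created as in the example above.
adapted = [
    name
    for name, module in peft_model.named_modules()
    if hasattr(module, "lora_A")  # present on PEFT LoRA layers
]

print(f"{len(adapted)} modules received LoRA adapters:")
for name in adapted:
    print(f"  {name}")
```

If the list is empty or shorter than expected, the target module names likely did not match; re-check them against the inspection output.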
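## Using This Reference in Code

The recommended configurations above can also be kept in code so a training script can pick target modules by architecture. The helper below is a hypothetical example (neither `RECOMMENDED_TARGET_MODULES` nor `recommended_target_modules` is part of PEFT or transformers); it simply reproduces the full attention + MLP lists from this document.

```python
# Hypothetical helper: the full recommended target-module lists from this
# document, keyed by architecture class name.
RECOMMENDED_TARGET_MODULES = {
    # GPT-2 / DialoGPT (Conv1D layers)
    "GPT2LMHeadModel": ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"],
    # LLaMA and Mistral (Linear layers)
    "LlamaForCausalLM": ["q_proj", "v_proj", "k_proj", "o_proj",
                         "gate_proj", "up_proj", "down_proj"],
    "MistralForCausalLM": ["q_proj", "v_proj", "k_proj", "o_proj",
                           "gate_proj", "up_proj", "down_proj"],
    # Qwen (first generation)
    "QWenLMHeadModel": ["c_attn", "c_proj", "w1", "w2"],
}


def recommended_target_modules(model) -> list[str]:
    """Return the full (attention + MLP) target-module list for a loaded model."""
    arch = model.__class__.__name__
    if arch not in RECOMMENDED_TARGET_MODULES:
        raise ValueError(
            f"No preset for {arch!r}; run the inspection script instead."
        )
    return RECOMMENDED_TARGET_MODULES[arch]
```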
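## Fallback: Inspecting Modules Without the Script

If the inspection script is not available, the sketch below covers the same ground: it loads a model and prints the unique name suffixes of its `Linear` and `Conv1D` submodules, which are the patterns you would pass to `target_modules`. It assumes `torch` and a recent `transformers` (where `Conv1D` lives in `transformers.pytorch_utils`) are installed; the model name is just an example.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM
from transformers.pytorch_utils import Conv1D  # GPT-2 style Conv1D layer

model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Collect the unique trailing names of all Linear/Conv1D modules; these
# suffixes are what you pass to LoraConfig(target_modules=...).
suffixes = {}
for name, module in model.named_modules():
    if isinstance(module, (nn.Linear, Conv1D)):
        suffix = name.split(".")[-1]
        suffixes.setdefault(suffix, type(module).__name__)

# Note: output heads such as `lm_head` will also show up here; they are
# usually not chosen as LoRA targets.
for suffix, layer_type in sorted(suffixes.items()):
    print(f"{suffix:12s} ({layer_type})")
```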