
# LoRA Target Modules Reference

This document provides the correct target module names for different model architectures when using LoRA (Low-Rank Adaptation).

## Model Architecture Detection

Use the inspection script to find the correct target modules:

```bash
# In the nix development environment
python /home/centra/dev/pnn/inspect_conv1d_model.py [model_name]
```
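
If that script is not available in your checkout, a minimal equivalent can be built from `named_modules()`. The sketch below is an assumption about what the script does, not its actual contents; it lists every LoRA-targetable layer by type:

```python
# inspect_modules.py - hypothetical stand-in for inspect_conv1d_model.py
import sys

import torch.nn as nn
from transformers import AutoModelForCausalLM
from transformers.pytorch_utils import Conv1D

model_name = sys.argv[1] if len(sys.argv) > 1 else "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA can wrap both nn.Linear (LLaMA, Mistral) and HF's Conv1D (GPT-2 family)
for name, module in model.named_modules():
    if isinstance(module, (nn.Linear, Conv1D)):
        print(f"{type(module).__name__:8s} {name}")
```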

## Common Model Architectures

### GPT-2 / DialoGPT Models

- Model Type: GPT2LMHeadModel
- Layer Type: Conv1D (not Linear!)
- Base Models: microsoft/DialoGPT-small, gpt2, gpt2-medium, gpt2-large, gpt2-xl

#### Attention Modules

- c_attn - Combined query, key, value projection (nf=3*hidden_size; see the shape check below)
- c_proj - Output projection

#### MLP Modules

- mlp.c_fc - Feed-forward up projection
- mlp.c_proj - Feed-forward down projection

```yaml
# Basic stage (attention only)
target_modules: ["c_attn", "c_proj"]

# Advanced stage (attention + MLP)
target_modules: ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"]
```

### LLaMA Models

- Model Type: LlamaForCausalLM
- Layer Type: Linear
- Base Models: meta-llama/Llama-2-7b-hf, meta-llama/Llama-3.1-8B

#### Attention Modules

- q_proj - Query projection
- k_proj - Key projection
- v_proj - Value projection
- o_proj - Output projection

#### MLP Modules

- gate_proj - Gate projection
- up_proj - Up projection
- down_proj - Down projection

```yaml
# Basic stage (attention only)
target_modules: ["q_proj", "v_proj"]

# Advanced stage (attention + MLP)
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```

### Mistral Models

- Model Type: MistralForCausalLM
- Layer Type: Linear
- Base Model: mistralai/Mistral-7B-v0.1

#### Target Modules (same as LLaMA)

```yaml
target_modules: ["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```

### Qwen Models

- Model Type: QWenLMHeadModel (custom modeling code; see the loading note below)
- Layer Type: Linear
- Base Model: Qwen/Qwen-7B

#### Target Modules

```yaml
target_modules: ["c_attn", "c_proj", "w1", "w2"]
```

## Important Notes

1. Conv1D vs Linear: GPT-2-based models use Conv1D layers, not Linear layers
2. Module Patterns: Use simple patterns like "c_attn" rather than full paths like "transformer.h.0.attn.c_attn" (see the sketch after this list)
3. Testing: Always test your configuration before training by creating a PEFT model
4. Architecture Variations: Different model families use different naming conventions
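
To illustrate notes 1 and 2: PEFT matches each target_modules entry against the end of the full module path, so a bare "c_attn" reaches the Conv1D layer in every transformer block. A minimal sketch using the gpt2 checkpoint:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
peft_model = get_peft_model(model, LoraConfig(target_modules=["c_attn"]))

# Every per-block c_attn (transformer.h.0 ... h.11) is now LoRA-wrapped
wrapped = [name for name, _ in peft_model.named_modules() if name.endswith("c_attn")]
print(len(wrapped))  # 12 - one per transformer block in gpt2
```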

## Troubleshooting

### Error: "Target module not found"

- Run the inspection script to find actual module names
- Check if the model uses Conv1D or Linear layers
- Verify the module naming pattern for your specific model

### Error: "No trainable parameters"

- Ensure target modules exist in the model
- Check that the module names match exactly
- Verify the model architecture is supported by PEFT

## Testing Your Configuration

```python
from peft import get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM

# Load the base model you plan to adapt (DialoGPT-small shown here)
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Test configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj"],  # Your target modules
    bias="none",
)

# Try to create the PEFT model
try:
    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()
    print("✓ Configuration works!")
except Exception as e:
    print(f"✗ Configuration failed: {e}")
```