
Training Configuration Files

This directory contains configuration files for different model sizes and use cases.

Available Configurations

Small Models (Testing)

  • training_config.yaml - Default configuration for small models (DialoGPT-small)
    • Memory: ~1GB VRAM
    • Batch size: 8
    • No quantization

Medium Models (8B)

  • training_config_large.yaml - Configuration for 8B models (Llama-3.2-8B)
    • Memory: ~12GB VRAM with 4-bit quantization
    • Batch size: 1, gradient accumulation: 16-64
    • 4-bit quantization enabled

Large Models (13B)

  • training_config_13b.yaml - Configuration for 13B models
    • Memory: ~16GB VRAM with 4-bit quantization
    • Batch size: 1, gradient accumulation: 32-128
    • Higher LoRA ranks (32-128)

Extra Large Models (70B)

  • training_config_70b.yaml - Configuration for 70B models
  • Memory: 40GB+ VRAM with 4-bit quantization
    • Batch size: 1, gradient accumulation: 64-256
    • Maximum LoRA ranks (64-256)
    • Multi-GPU support with FSDP

Configuration Parameters

Model Settings

  • load_in_4bit: Enable 4-bit quantization (recommended for large models)
  • gradient_checkpointing: Recompute activations during the backward pass, trading extra compute for lower memory
  • use_flash_attention_2: Faster attention computation if available
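
Taken together, a model section using these keys might look like the sketch below. The top-level model: grouping and the concrete values are assumptions for illustration; check the actual YAML files for the exact nesting and defaults.

# Illustrative sketch only - nesting and values are assumptions, not copied from a shipped config
model:
  load_in_4bit: true            # 4-bit quantization, recommended for large models
  gradient_checkpointing: true  # trade extra compute for lower memory
  use_flash_attention_2: true   # faster attention if flash-attn is installed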

Adapter Settings

  • r: LoRA rank (higher ranks add trainable parameters and capacity at the cost of memory)
  • lora_alpha: LoRA scaling factor (typically 2x the rank)
  • init_lora_weights: Set to true for identity initialization
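
As a rough sketch (the adapter: grouping and the numbers are illustrative, not taken from a shipped config), the adapter settings could look like:

# Illustrative sketch only - section name and values are assumptions
adapter:
  r: 64                    # LoRA rank, e.g. within the 32-128 range used for 13B configs
  lora_alpha: 128          # scaling factor, here 2x the rank
  init_lora_weights: true  # identity initialization (adapter starts as a no-op)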

Training Settings

  • per_device_batch_size: Usually 1 for large models
  • gradient_accumulation_steps: Multiplies the effective batch size (per_device_batch_size × gradient_accumulation_steps × number of GPUs)
  • learning_rate: Lower for larger models
  • bf16: Use bfloat16 for better numerical stability
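
A corresponding training section might look like the sketch below; again, the training: grouping and the concrete numbers are illustrative, not copied from the shipped configs.

# Illustrative sketch only - section name and values are assumptions
training:
  per_device_batch_size: 1         # usually 1 for large models
  gradient_accumulation_steps: 32  # effective batch size of 32 per GPU
  learning_rate: 1.0e-4            # lower learning rates for larger models
  bf16: true                       # bfloat16 for numerical stability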

Usage

# For 8B models
python scripts/train_progressive.py --config config/training_config_large.yaml

# For 13B models
python scripts/train_progressive.py --config config/training_config_13b.yaml

# For 70B models (requires multiple GPUs)
python scripts/train_progressive.py --config config/training_config_70b.yaml

Memory Requirements

Model Size | VRAM (4-bit) | VRAM (16-bit) | Recommended GPUs
8B         | 12-16GB      | 32GB          | 1x RTX 4090
13B        | 16-20GB      | 52GB          | 1x A100
70B        | 40-60GB      | 140GB         | 2x A100

Tips for Large Models

  1. Start with smaller models to validate your approach
  2. Use gradient checkpointing to reduce memory usage
  3. Monitor GPU memory during training
  4. Use lower learning rates for stability
  5. Consider multi-GPU setup for 70B+ models
  6. Enable flash attention if available for speed

Troubleshooting

  • OOM errors: Reduce batch size or enable gradient checkpointing
  • Slow training: Enable flash attention, use bf16
  • Poor convergence: Adjust learning rate or warmup steps
  • Multi-GPU issues: Check FSDP configuration