# Training Configuration Files
This directory contains configuration files for different model sizes and use cases.
## Available Configurations
### Small Models (Testing)
- `training_config.yaml` - Default configuration for small models (DialoGPT-small)
  - Memory: ~1GB VRAM
  - Batch size: 8
  - No quantization
### Medium Models (8B)
- `training_config_large.yaml` - Configuration for 8B models (Llama-3.2-8B)
  - Memory: ~12GB VRAM with 4-bit quantization
  - Batch size: 1, gradient accumulation: 16-64
  - 4-bit quantization enabled
### Large Models (13B)
- `training_config_13b.yaml` - Configuration for 13B models
  - Memory: ~16GB VRAM with 4-bit quantization
  - Batch size: 1, gradient accumulation: 32-128
  - Higher LoRA ranks (32-128)
### Extra Large Models (70B)
- `training_config_70b.yaml` - Configuration for 70B models
  - Memory: 40GB+ VRAM with 4-bit quantization
  - Batch size: 1, gradient accumulation: 64-256
  - Maximum LoRA ranks (64-256)
  - Multi-GPU support with FSDP
## Configuration Parameters
### Model Settings
- `load_in_4bit`: Enable 4-bit quantization (recommended for large models)
- `gradient_checkpointing`: Trade compute for memory
- `use_flash_attention_2`: Faster attention computation if available
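A minimal sketch of how these keys might look in one of the YAML files (the `model:` grouping is an assumption; only the key names come from the list above):
```yaml
model:
  load_in_4bit: true            # 4-bit quantization; recommended for 8B+ models
  gradient_checkpointing: true  # recompute activations to save memory
  use_flash_attention_2: true   # faster attention kernels, if installed
```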
### Adapter Settings
- `r`: LoRA rank (higher ranks give more capacity but add trainable parameters)
- `lora_alpha`: LoRA scaling factor (typically 2x the rank)
- `init_lora_weights`: Set to `true` for identity initialization
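A hedged example of the adapter block, assuming the same YAML layout (the `adapter:` grouping and the specific values are illustrative only):
```yaml
adapter:
  r: 64                    # LoRA rank
  lora_alpha: 128          # scaling factor, typically 2x the rank
  init_lora_weights: true  # start from an identity (no-op) adapter
```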
### Training Settings
- `per_device_batch_size`: Usually 1 for large models
- `gradient_accumulation_steps`: Multiplies the effective batch size (effective batch = `per_device_batch_size` × accumulation steps × number of GPUs)
- `learning_rate`: Lower for larger models
- `bf16`: Use bfloat16 for better numerical stability
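A sketch of the training block with illustrative values (the `training:` grouping is again an assumption):
```yaml
training:
  per_device_batch_size: 1         # usually 1 for large models
  gradient_accumulation_steps: 32  # effective batch = 1 x 32 x num_gpus
  learning_rate: 1.0e-4            # lower for larger models
  bf16: true                       # bfloat16 for numerical stability
```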
## Usage
```bash
# For 8B models
python scripts/train_progressive.py --config config/training_config_large.yaml
# For 13B models
python scripts/train_progressive.py --config config/training_config_13b.yaml
# For 70B models (requires multiple GPUs)
python scripts/train_progressive.py --config config/training_config_70b.yaml
```
## Memory Requirements
| Model Size | VRAM (4-bit) | VRAM (16-bit) | GPUs Recommended |
|------------|--------------|---------------|------------------|
| 8B | 12-16GB | 32GB | 1x RTX 4090 |
| 13B | 16-20GB | 52GB | 1x A100 |
| 70B | 40-60GB | 140GB | 2x A100 |
## Tips for Large Models
1. **Start with smaller models** to validate your approach
2. **Use gradient checkpointing** to reduce memory usage
3. **Monitor GPU memory** during training
4. **Use lower learning rates** for stability
5. **Consider multi-GPU setup** for 70B+ models
6. **Enable flash attention** if available for speed
## Troubleshooting
- **OOM errors**: Reduce the batch size or enable gradient checkpointing (see the sketch below)
- **Slow training**: Enable flash attention, use bf16
- **Poor convergence**: Adjust learning rate or warmup steps
- **Multi-GPU issues**: Check FSDP configuration
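For OOM errors specifically, the usual first adjustments look like this sketch (hypothetical values; the key names match the parameters described above):
```yaml
model:
  gradient_checkpointing: true      # trade compute for memory
training:
  per_device_batch_size: 1
  gradient_accumulation_steps: 64   # raise accumulation instead of the per-device batch
```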