# Progressive LLM Training Documentation

## Setup

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
```

## Training

### Single GPU

```bash
uv run scripts/train_progressive.py --config config/training_config_gemma3_1b.yaml
```

### 8 GPUs

```bash
./scripts/train_gemma3_1b_8gpu.sh --strategy deepspeed
```

## Configuration

- `config/training_config_gemma3_1b.yaml` - Single GPU
- `config/training_config_gemma3_1b_8gpu_deepspeed.yaml` - 8 GPUs

## Environment

Copy `.env.example` to `.env` and set:

- `HF_TOKEN` - HuggingFace token
- `WANDB_API_KEY` - W&B API key

A sample `.env` is sketched at the end of this README.

## Troubleshooting

- Reduce `per_device_batch_size` if you run out of GPU memory
- `export NCCL_DEBUG=INFO` to get verbose logs for NCCL errors (see the example below)
- `nvidia-smi` to check that GPUs are visible
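If a multi-GPU run fails with NCCL errors, you can apply the debug flag to a single launch rather than exporting it globally. This sketch simply combines the `NCCL_DEBUG` setting with the 8-GPU launch command shown in the Training section:

```bash
# Enable verbose NCCL logging for one run of the 8-GPU launch script
NCCL_DEBUG=INFO ./scripts/train_gemma3_1b_8gpu.sh --strategy deepspeed
```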
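Beyond `nvidia-smi`, it can help to confirm that the uv-managed environment itself sees the GPUs. This is a minimal sketch that assumes PyTorch is among the project's dependencies (plausible given the DeepSpeed configs, but not confirmed by this README):

```bash
# Check GPU visibility from inside the project environment (assumes torch is a dependency)
uv run python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```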
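For the Environment section above, here is a minimal `.env` sketch. The variable names are the ones this README lists; the values are placeholders to replace with your own credentials:

```bash
# .env (placeholder values only; substitute your real tokens)
HF_TOKEN=your_huggingface_token_here
WANDB_API_KEY=your_wandb_api_key_here
```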