# Progressive LLM Training Documentation
## Setup
Install uv and sync the project dependencies:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
```
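As a quick sanity check before training (this assumes PyTorch is among the synced dependencies, which may differ in your lockfile):
```bash
uv --version
# Should print True on a machine with a working CUDA setup
uv run python -c "import torch; print(torch.cuda.is_available())"
```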
## Training
### Single GPU
```bash
uv run scripts/train_progressive.py --config config/training_config_gemma3_1b.yaml
```
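On a multi-GPU machine you can pin the run to a single device with the standard `CUDA_VISIBLE_DEVICES` variable (device index 0 here is only an example):
```bash
CUDA_VISIBLE_DEVICES=0 uv run scripts/train_progressive.py --config config/training_config_gemma3_1b.yaml
```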
### 8 GPUs
```bash
./scripts/train_gemma3_1b_8gpu.sh --strategy deepspeed
```
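The shell script is a convenience wrapper; a rough sketch of an equivalent direct launch, assuming it ultimately invokes the DeepSpeed launcher with the 8-GPU config (the script's actual arguments may differ):
```bash
uv run deepspeed --num_gpus 8 scripts/train_progressive.py \
  --config config/training_config_gemma3_1b_8gpu_deepspeed.yaml
```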
## Configuration
- `config/training_config_gemma3_1b.yaml` - single-GPU training
- `config/training_config_gemma3_1b_8gpu_deepspeed.yaml` - 8-GPU DeepSpeed training (to customize a copy, see the sketch below)
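To experiment with different hyperparameters, copy one of the provided files and point the trainer at the copy; `my_config.yaml` below is a hypothetical name:
```bash
cp config/training_config_gemma3_1b.yaml config/my_config.yaml
# edit config/my_config.yaml, then:
uv run scripts/train_progressive.py --config config/my_config.yaml
```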
## Environment
Copy `.env.example` to `.env` and set the following (see the example after this list):
- `HF_TOKEN` - Hugging Face access token
- `WANDB_API_KEY` - Weights & Biases API key
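A minimal sketch of that setup (the token values are placeholders):
```bash
cp .env.example .env
# Then edit .env:
# HF_TOKEN=hf_xxxxxxxxxxxxxxxx
# WANDB_API_KEY=xxxxxxxxxxxxxxxx
```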
## Troubleshooting
- Out-of-memory errors: lower `per_device_batch_size` in the training config
- NCCL errors: rerun with `export NCCL_DEBUG=INFO` for verbose logs (see the snippet below)
- Use `nvidia-smi` to confirm GPUs are visible and to check utilization
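For example, to capture verbose NCCL logs from the multi-GPU run while monitoring GPU memory (assuming the 8-GPU launcher from above):
```bash
NCCL_DEBUG=INFO ./scripts/train_gemma3_1b_8gpu.sh --strategy deepspeed
# In another terminal:
watch -n 1 nvidia-smi
```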