# Progressive LLM Training Documentation
## Setup
```bash
# Install the uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install project dependencies into the local virtual environment
uv sync
```
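
Before starting a run, it can be worth confirming that the environment actually sees your GPUs. A minimal check, assuming PyTorch is pulled in by `uv sync`:

```bash
# Sanity check: should print True and the number of visible GPUs
uv run python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```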
## Training
### Single GPU
```bash
uv run scripts/train_progressive.py --config config/training_config_gemma3_1b.yaml
```
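
For longer runs it can help to keep a copy of the console output. The same command with stdout and stderr mirrored to a log file (the file name here is just an example):

```bash
# Single-GPU run with output captured to a log file of your choice
uv run scripts/train_progressive.py \
  --config config/training_config_gemma3_1b.yaml 2>&1 | tee train_gemma3_1b.log
```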
### 8 GPUs
```bash
./scripts/train_gemma3_1b_8gpu.sh --strategy deepspeed
```
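
The wrapper script takes care of spawning one process per GPU. If you ever need to launch manually, a rough equivalent is sketched below; this assumes a `torchrun`-style launcher that simply forwards `--strategy` to `train_progressive.py`, which may not match what the script actually does:

```bash
# Hypothetical manual 8-GPU launch; prefer the provided wrapper script
uv run torchrun --nproc_per_node=8 scripts/train_progressive.py \
  --config config/training_config_gemma3_1b_8gpu_deepspeed.yaml \
  --strategy deepspeed
```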
## Configuration
- `config/training_config_gemma3_1b.yaml` - Single GPU
- `config/training_config_gemma3_1b_8gpu_deepspeed.yaml` - 8 GPUs
## Environment
Copy `.env.example` to `.env` and set the following variables (a placeholder example follows the list):
- `HF_TOKEN` - HuggingFace token
- `WANDB_API_KEY` - W&B API key
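
A minimal `.env` might look like this (the values are placeholders, not real credentials):

```bash
# .env -- keep this file out of version control
HF_TOKEN=hf_your_token_here
WANDB_API_KEY=your_wandb_key_here
```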
## Troubleshooting
- Reduce `per_device_batch_size` in the training config if you run out of GPU memory
- `export NCCL_DEBUG=INFO` to get verbose logs when debugging NCCL errors
- `nvidia-smi` to check GPU visibility and memory usage (see the snippet below)
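
The GPU and NCCL checks above map to standard commands, nothing project-specific:

```bash
# Watch GPU memory and utilization while a run is in progress
watch -n 1 nvidia-smi

# Re-run a multi-GPU job with verbose NCCL logging to diagnose communication errors
NCCL_DEBUG=INFO ./scripts/train_gemma3_1b_8gpu.sh --strategy deepspeed
```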