- Updated training config for Gemma3 1B with CPU offload support
- Improved error handling in progressive_model.py
- Added support for Mixture-of-Thoughts dataset
- Improved compatibility across server environments

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>