- Supports scaling across 1 to over 1000 GPUs/TPUs with multi-GPU optimization.
- Includes memory-saving features such as Flash Attention, FSDP, LoRA, and QLoRA.
- Provides YAML configuration recipes for over 20 large language models and custom datasets.
- Codebase is readable and beginner-friendly with no abstractions.