DeepSpeed
Microsoft's optimization library for large-scale model training, featuring ZeRO memory optimizations (which partition optimizer states, gradients, and parameters across devices) and mixed-precision support.
Fairseq
Facebook AI’s sequence modeling toolkit with strong support for transformer models and distributed training.
T5X
Google’s JAX-based framework for scalable training of T5 and other large language models, with TPU support.
Colossal-AI
An open-source system designed for efficient large model training with flexible parallelism strategies.
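To make the ZeRO and mixed-precision options mentioned for DeepSpeed more concrete, here is a minimal sketch of a DeepSpeed-style configuration built as a Python dictionary and written out as JSON. The batch size and accumulation values are illustrative assumptions, not recommendations, and the `write_config` helper and file name are hypothetical.

```python
import json

# Illustrative values only (assumptions); tune them for your model and hardware.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 4,
    # Enable fp16 mixed-precision training.
    "fp16": {"enabled": True},
    # ZeRO stage 2 partitions optimizer states and gradients across devices.
    "zero_optimization": {"stage": 2},
}


def write_config(path="ds_config.json"):
    """Write the config to a JSON file (hypothetical helper) and return the path."""
    with open(path, "w") as f:
        json.dump(ds_config, f, indent=2)
    return path
```

A dictionary or file like this is typically handed to DeepSpeed through the `config` argument of `deepspeed.initialize`; check the DeepSpeed documentation for the options supported by your version.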