Key strength: Integrates FSDP2, which uses approximately 7% less memory per GPU and delivers about a 1.5% performance improvement over FSDP1.
Top feature: Composable Parallelism
Best for: Training Large Language Models
Pricing: Free (open source)
Quick start: Install TorchTitan (clone the pytorch/torchtitan repository and install its requirements)