TorchTitan
TorchTitan is an open-source platform built natively on PyTorch for distributed training of large language models (LLMs). It supports composable parallelism techniques, including data, tensor, pipeline, and expert parallelism, enabling scalable pre-training from experimentation to production. The platform integrates features such as elastic scaling, checkpointing, logging, and debugging tools to support efficient training workflows, and incorporates optimizations such as Float8 training and SymmetricMemory to improve hardware utilization.

Designed as a minimal clean-room implementation, TorchTitan lets developers apply these scaling techniques with minimal changes to model code. It supports training models in the Llama 3.1 family, ranging from 8 billion to 405 billion parameters. Core components include FSDP2 for 1D parallelism, Hybrid Sharded Data Parallel (HSDP) for 2D scaling, DTensor-based checkpointing, and a checkpointable data loader with support for the C4 dataset and Hugging Face tokenizers.
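TorchTitan training jobs are driven by TOML configuration files, which is where the composable parallelism degrees are declared. The fragment below is an illustrative sketch only; the key names approximate the schema used in the repo's train configs and may differ between versions, so consult the torchtitan repository for the authoritative options.

```toml
# Illustrative TorchTitan-style job config (key names are approximate).

[model]
name = "llama3"
flavor = "8B"

[parallelism]
data_parallel_shard_degree = 8       # FSDP2 sharding (1D parallelism)
data_parallel_replicate_degree = 1   # > 1 combines with sharding for HSDP (2D scaling)
tensor_parallel_degree = 1
pipeline_parallel_degree = 1
```

Because the parallelism dimensions compose, scaling up is typically a matter of raising these degrees in the config rather than editing model code.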
Training Large Language Models
Researchers and developers can train LLMs such as the Llama 3.1 family at scale using PyTorch-native distributed training with composable parallelism.
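As a rough illustration of how composable parallelism dimensions interact at scale: the data, tensor, and pipeline parallel degrees multiply into the total GPU count, and together they divide up the model's parameters across devices. The helper functions and numbers below are a hypothetical back-of-the-envelope sketch, not TorchTitan APIs.

```python
# Back-of-the-envelope sketch (not TorchTitan code): how parallelism
# degrees compose into a world size, and roughly how many parameters
# each GPU ends up holding once the model is sharded across them.

def world_size(dp: int, tp: int, pp: int) -> int:
    """Total GPUs required = product of the parallelism degrees."""
    return dp * tp * pp

def params_per_gpu(total_params: int, dp_shard: int, tp: int, pp: int) -> int:
    """Approximate parameters held per GPU when the model is sharded
    dp_shard ways (FSDP), split tp ways (tensor parallel), and divided
    into pp pipeline stages. Ignores replication and uneven splits."""
    return total_params // (dp_shard * tp * pp)

# Llama 3.1 405B with 8-way FSDP sharding, 8-way TP, and 8-way PP:
print(world_size(8, 8, 8))                        # 512 GPUs
print(params_per_gpu(405_000_000_000, 8, 8, 8))   # 791015625 (~791M params/GPU)
```

This is why a 405B-parameter model, far too large for any single accelerator, becomes tractable once the three dimensions are composed.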
Experimentation and Production Deployment
TorchTitan enables rapid experimentation with custom training recipes and seamless scaling from single-node setups to multi-GPU production clusters.