Key Features - torchtitan

✨

Supports 4D parallelism (data, tensor, pipeline, and expert parallelism) in a modular, composable manner.
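
Composable parallelism boils down to arranging ranks in a multi-dimensional grid and communicating along one axis at a time. The sketch below is a minimal, stdlib-only illustration assuming a hypothetical 2×2×2×2 layout with dimension names `pp`, `dp`, `ep`, `tp` chosen for illustration. It uses a row-major ordering similar in spirit to PyTorch's `init_device_mesh`, but it is not torchtitan's code. `mesh_coords` and `groups_along` are hypothetical helpers.

```python
from math import prod

# Hypothetical 4D layout: names and sizes are illustrative only.
MESH_DIMS = ("pp", "dp", "ep", "tp")
MESH_SHAPE = (2, 2, 2, 2)  # 16 ranks in total


def mesh_coords(rank, shape):
    """Row-major coordinates of a flat rank (last dimension varies fastest)."""
    coords = []
    for size in reversed(shape):
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))


def groups_along(dim, shape):
    """Partition all ranks into groups that differ only in mesh dimension `dim`."""
    groups = {}
    for rank in range(prod(shape)):
        key = list(mesh_coords(rank, shape))
        key[dim] = None  # ranks sharing every other coordinate form one group
        groups.setdefault(tuple(key), []).append(rank)
    return list(groups.values())


for i, name in enumerate(MESH_DIMS):
    # Show the first two communication groups for each parallelism dimension.
    print(name, groups_along(i, MESH_SHAPE)[:2])
```

Because the innermost dimension varies fastest, tensor-parallel groups land on adjacent ranks (typically the same node), while outer dimensions span larger strides.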

✨

Supports elastic scaling to adapt to changing compute resources, and includes mechanisms for handling rank failures through a web API.
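
One common concern when the number of available GPUs changes is keeping the training recipe consistent. The sketch below is a hypothetical illustration, not torchtitan's elastic mechanism. It assumes the tensor- and pipeline-parallel degrees stay fixed, recomputes the data-parallel degree from the new world size, and adjusts gradient-accumulation steps so the global batch size is unchanged. `replan` and its parameters are illustrative names.

```python
def replan(world_size, tp, pp, global_batch, local_batch):
    """Hypothetical helper: derive (dp_degree, grad_accum_steps) for a new world size.

    Assumes TP and PP degrees are fixed, so only data parallelism absorbs the change.
    """
    model_parallel = tp * pp
    if world_size % model_parallel:
        raise ValueError("world size must be a multiple of tp * pp")
    dp = world_size // model_parallel
    samples_per_step = dp * local_batch  # samples processed per micro-step
    if global_batch % samples_per_step:
        raise ValueError("global batch must be divisible by dp * local_batch")
    return dp, global_batch // samples_per_step


print(replan(64, tp=4, pp=2, global_batch=256, local_batch=4))
print(replan(32, tp=4, pp=2, global_batch=256, local_batch=4))
```

Holding the global batch constant while trading data-parallel width for accumulation steps keeps optimizer behavior comparable across resizes, at the cost of lower throughput on the smaller cluster.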

✨

Provides selective and full activation checkpointing, plus efficient save/load of model and optimizer state through PyTorch Distributed Checkpointing (DCP), which works directly with DTensor-sharded state.
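
Activation checkpointing trades compute for memory: a checkpointed block discards its intermediate activations after the forward pass and recomputes them during the backward pass. The stdlib-only sketch below shows how "full" and a layer-frequency form of "selective" checkpointing differ in which transformer blocks are affected. The function name, mode strings, and `every_n` parameter are assumptions for illustration, not torchtitan's configuration options. In real PyTorch code, the selected blocks would be wrapped with `torch.utils.checkpoint`.

```python
def layers_to_checkpoint(num_layers, mode, every_n=2):
    """Return indices of transformer blocks whose activations are recomputed.

    mode="full": every block is checkpointed (lowest memory, most recompute).
    mode="selective": only every `every_n`-th block is checkpointed.
    mode="none": nothing is checkpointed (highest memory, no recompute).
    """
    if mode == "full":
        return list(range(num_layers))
    if mode == "selective":
        if every_n < 1:
            raise ValueError("every_n must be >= 1")
        return [i for i in range(num_layers) if i % every_n == 0]
    if mode == "none":
        return []
    raise ValueError(f"unknown mode: {mode!r}")


print(layers_to_checkpoint(8, "selective", every_n=2))
```

Selective checkpointing sits between the two extremes, letting users tune the memory/recompute trade-off instead of paying the full recompute cost.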

✨

Logs metrics such as loss, GPU memory usage, throughput, TFLOPs, and model FLOPS utilization (MFU) to TensorBoard or Weights & Biases, and provides CPU/GPU and memory profiling tools.
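
MFU compares the FLOPS a training run actually achieves with the hardware's peak. The sketch below uses the widely cited estimate of about 6·N FLOPs per token for a dense model with N parameters (roughly 2N forward plus 4N backward). It ignores attention-score FLOPs, so it is an approximation rather than torchtitan's exact formula. The peak-FLOPS value is a caller-supplied assumption that depends on the GPU and precision.

```python
def throughput_metrics(num_params, tokens_per_sec, peak_flops_per_sec):
    """Return (achieved TFLOPS, MFU) using the 6 * N FLOPs-per-token estimate."""
    flops_per_token = 6 * num_params  # forward + backward, dense model
    achieved_flops = flops_per_token * tokens_per_sec
    tflops = achieved_flops / 1e12
    mfu = achieved_flops / peak_flops_per_sec
    return tflops, mfu


# Hypothetical numbers: 1B-parameter model, 10k tokens/s, 120 TFLOPS peak.
print(throughput_metrics(1e9, 10_000, 120e12))
```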

✨

Uses TOML files to configure training runs, including batch size and learning-rate schedules, and provides helper scripts for downloading Hugging Face tokenizers.

✨

Includes a checkpointable data loader with built-in support for the C4 dataset and Hugging Face tokenizers, simplifying data preparation.
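
A checkpointable data loader records its position so that a resumed job continues with the next unseen batch instead of replaying or skipping data. The minimal stdlib sketch below follows the `state_dict` / `load_state_dict` naming convention of PyTorch stateful objects. The class itself is hypothetical and stands in for a real tokenized dataset with a plain list of samples.

```python
class CheckpointableLoader:
    """Hypothetical loader that yields fixed-size batches and can resume mid-epoch."""

    def __init__(self, samples, batch_size):
        self.samples = samples
        self.batch_size = batch_size
        self.position = 0  # index of the next sample to yield

    def __iter__(self):
        while self.position < len(self.samples):
            batch = self.samples[self.position : self.position + self.batch_size]
            self.position += len(batch)  # advance before yielding the batch
            yield batch

    def state_dict(self):
        return {"position": self.position}

    def load_state_dict(self, state):
        self.position = state["position"]


loader = CheckpointableLoader(list(range(10)), batch_size=4)
first = next(iter(loader))   # consume one batch
saved = loader.state_dict()  # saved alongside the model checkpoint

resumed = CheckpointableLoader(list(range(10)), batch_size=4)
resumed.load_state_dict(saved)
print(first, list(resumed))
```

Saving the loader state together with the model and optimizer state is what allows a restarted run to reproduce the same data order.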