Key Features

What you can do

Composable Parallelism

Supports 4D parallelism (data, tensor, pipeline, and expert parallelism). Each form is implemented as a modular component that can be combined with the others.
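
Composing parallelism dimensions comes down to arranging ranks in a multi-dimensional mesh, where each rank communicates only with the ranks that differ from it along one dimension. In PyTorch such a mesh is normally built with `torch.distributed.device_mesh.init_device_mesh`. The sketch below reproduces only the rank arithmetic in plain Python, so it runs without GPUs. It assumes a row-major layout and treats each of the four degrees as its own mesh dimension. This is a simplification: real layouts may order the dimensions differently or fold expert parallelism into another dimension. The mesh sizes are illustrative.

```python
from math import prod
from typing import Dict, List

# Mesh shape: dimension name -> degree, outermost dimension first (illustrative).
MeshShape = Dict[str, int]


def rank_to_coords(rank: int, mesh: MeshShape) -> Dict[str, int]:
    """Row-major mapping from a global rank to its coordinate along each dimension."""
    world = prod(mesh.values())
    if not 0 <= rank < world:
        raise ValueError(f"rank {rank} outside mesh of size {world}")
    coords: Dict[str, int] = {}
    # Peel off the innermost (fastest-varying) dimension first.
    for name, size in reversed(list(mesh.items())):
        coords[name] = rank % size
        rank //= size
    # Return coordinates in the mesh's declared order.
    return {name: coords[name] for name in mesh}


def coords_to_rank(coords: Dict[str, int], mesh: MeshShape) -> int:
    """Inverse of rank_to_coords."""
    rank = 0
    for name, size in mesh.items():
        rank = rank * size + coords[name]
    return rank


def group_for(rank: int, mesh: MeshShape, dim: str) -> List[int]:
    """Ranks differing from `rank` only along `dim`: its communication group for that dimension."""
    coords = rank_to_coords(rank, mesh)
    group = []
    for i in range(mesh[dim]):
        coords[dim] = i
        group.append(coords_to_rank(coords, mesh))
    return group


if __name__ == "__main__":
    mesh = {"pp": 2, "dp": 2, "ep": 2, "tp": 2}  # 16 ranks in total
    print(rank_to_coords(5, mesh))    # {'pp': 0, 'dp': 1, 'ep': 0, 'tp': 1}
    print(group_for(5, mesh, "tp"))   # [4, 5]
    print(group_for(5, mesh, "dp"))   # [1, 5]
    print(group_for(5, mesh, "pp"))   # [5, 13]
```

Because every rank belongs to exactly one group per dimension, each parallelism style can issue its own collectives independently of the others. This independence is what makes the styles composable.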

Elastic Scaling and Fault Tolerance

Supports elastic scaling to adapt to changing compute resources, and includes mechanisms for handling rank failures through a web API.

Advanced Checkpointing

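Provides selective and full activation checkpointing to trade recomputation for memory. Model and optimizer state are saved and loaded efficiently with PyTorch Distributed Checkpoint (DCP), which works with DTensor-sharded state.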
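
Full activation checkpointing recomputes every transformer block's activations during the backward pass, while selective checkpointing recomputes only some of them. The plain-Python sketch below shows one possible layer-frequency policy and a toy memory estimate. It is not the project's implementation. The "checkpoint every k-th block" rule, the mode names, and the unit costs are hypothetical. In PyTorch, the wrapping of a chosen block is typically done with `torch.utils.checkpoint.checkpoint`.

```python
from typing import List


def layers_to_checkpoint(num_layers: int, mode: str, every: int = 2) -> List[int]:
    """Indices of blocks whose activations are recomputed in backward (hypothetical policy)."""
    if mode == "none":
        return []
    if mode == "full":
        return list(range(num_layers))
    if mode == "selective":
        # Hypothetical rule: checkpoint every `every`-th block, starting at block 0.
        return [i for i in range(num_layers) if i % every == 0]
    raise ValueError(f"unknown mode: {mode!r}")


def activation_memory(num_layers: int, checkpointed: List[int],
                      full_cost: int, input_cost: int) -> int:
    """Toy estimate: a checkpointed block keeps only its input, others keep all intermediates."""
    ckpt = set(checkpointed)
    return sum(input_cost if i in ckpt else full_cost for i in range(num_layers))


if __name__ == "__main__":
    for mode in ("none", "selective", "full"):
        layers = layers_to_checkpoint(8, mode)
        print(mode, activation_memory(8, layers, full_cost=10, input_cost=1))
    # none 80 / selective 44 / full 8
```

Each checkpointed block re-runs its forward pass during backward, so the memory saved in the "full" case is paid for with the most extra compute. Selective checkpointing sits between the two extremes.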

Comprehensive Logging and Debugging

Logs metrics such as loss, GPU memory usage, throughput, TFLOPS, and model FLOPS utilization (MFU) to TensorBoard or Weights & Biases, and provides CPU/GPU and memory profiling tools.
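
MFU compares the FLOPS spent on the model's own math with the hardware's peak. The sketch below uses the common dense-transformer approximation of about 6N training FLOPs per token for an N-parameter model: roughly 2N for the forward pass and 4N for the backward pass. This formula is an assumption. The project's own calculation may differ, for example by adding an attention term. The 8B-parameter model, the 5,000 tokens/s per GPU, and the 1,000 TFLOPS peak are hypothetical round numbers, not measurements or hardware specifications.

```python
def train_flops_per_token(num_params: int) -> int:
    # Approximation for dense transformers: ~2N forward + ~4N backward FLOPs per token.
    return 6 * num_params


def achieved_tflops(tokens_per_sec: float, num_params: int) -> float:
    """Model FLOPS actually delivered per device, in TFLOPS."""
    return tokens_per_sec * train_flops_per_token(num_params) / 1e12


def mfu(tokens_per_sec: float, num_params: int, peak_flops_per_sec: float) -> float:
    """Fraction of the device's peak FLOPS spent on model math."""
    return tokens_per_sec * train_flops_per_token(num_params) / peak_flops_per_sec


if __name__ == "__main__":
    # Hypothetical: 8B-parameter model, 5,000 tokens/s per GPU, 1,000 TFLOPS peak.
    print(achieved_tflops(5_000, 8_000_000_000))     # 240.0
    print(mfu(5_000, 8_000_000_000, 1_000 * 1e12))   # ~0.24
```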

Flexible Configuration

Uses TOML files to configure training settings such as batch size and the learning rate schedule, and includes helper scripts for downloading Hugging Face tokenizers.

Integrated Dataset Support

Includes a checkpointable data loader with built-in support for the C4 dataset and Hugging Face tokenizers, simplifying data preparation.
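
A checkpointable data loader records where it is in the data stream, so a resumed run continues from the next unseen batch instead of replaying data from the start. The sketch below is a minimal in-memory version over pre-tokenized ids. The class name is hypothetical. Its `state_dict` / `load_state_dict` methods follow the convention PyTorch uses for stateful objects, so the loader's position could be saved alongside model and optimizer state. A real loader would stream from the dataset and track more state, such as shard and buffer positions.

```python
from typing import Dict, List


class CheckpointableLoader:
    """Hypothetical loader: yields fixed-size batches of token ids and can resume mid-stream."""

    def __init__(self, tokens: List[int], batch_size: int) -> None:
        self.tokens = tokens
        self.batch_size = batch_size
        self.position = 0  # index of the next token to emit

    def __iter__(self) -> "CheckpointableLoader":
        return self

    def __next__(self) -> List[int]:
        # Drop a trailing partial batch, as many training loaders do.
        if self.position + self.batch_size > len(self.tokens):
            raise StopIteration
        batch = self.tokens[self.position:self.position + self.batch_size]
        self.position += self.batch_size
        return batch

    def state_dict(self) -> Dict[str, int]:
        return {"position": self.position}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        self.position = state["position"]


if __name__ == "__main__":
    loader = CheckpointableLoader(list(range(10)), batch_size=3)
    next(loader), next(loader)            # consume [0, 1, 2] and [3, 4, 5]
    saved = loader.state_dict()           # {'position': 6}

    resumed = CheckpointableLoader(list(range(10)), batch_size=3)
    resumed.load_state_dict(saved)
    print(next(resumed))                  # [6, 7, 8]
```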