Key Features

What you can do

FP8 Precision Support

Enables FP8 precision on NVIDIA Hopper, Ada, and Blackwell GPUs to reduce memory usage during training and inference.
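To give a feel for what FP8 means numerically: FP8 formats such as E4M3 keep only a 3-bit mantissa and saturate at a small maximum value, so values are scaled into range before casting. The sketch below is a pure-Python illustration of E4M3-style rounding with a per-tensor scale, not the library's actual implementation; `quantize_e4m3` and `dequantize` are hypothetical names used only for this example.

```python
import math

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_e4m3(x, scale):
    """Simulate FP8 E4M3 rounding of x/scale (simplified sketch)."""
    v = x / scale
    v = max(-E4M3_MAX, min(E4M3_MAX, v))  # saturate to the representable range
    if v == 0.0:
        return 0.0
    m, e = math.frexp(v)       # v = m * 2**e with 0.5 <= |m| < 1
    m = round(m * 16) / 16     # keep 4 significand bits (1 implicit + 3 mantissa);
                               # Python's round-half-to-even matches FP8 rounding
    return m * 2.0 ** e

def dequantize(q, scale):
    """Map the low-precision value back to the original range."""
    return q * scale
```

The key point the sketch shows is that each stored value needs only 8 bits plus one shared scale factor per tensor, which is where the memory savings come from.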

Fused Kernels for Transformer Models

Includes optimized fused kernels that improve performance across FP8, FP16, and BF16 precisions on supported GPUs.
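Kernel fusion helps because an unfused sequence of element-wise ops writes an intermediate tensor to memory and reads it back, while a fused kernel computes the whole expression in one pass. The real fused kernels are CUDA; the following pure-Python analogy (with the common tanh approximation of GELU) only illustrates the fused-vs-unfused distinction and uses made-up function names.

```python
import math

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def bias_gelu_unfused(xs, b):
    tmp = [x + b for x in xs]         # first pass materializes an intermediate buffer
    return [gelu(t) for t in tmp]     # second pass re-reads that buffer

def bias_gelu_fused(xs, b):
    return [gelu(x + b) for x in xs]  # one pass: no intermediate round-trip to memory
```

Both versions produce identical results; the fused form simply avoids the extra memory traffic, which is what the optimized kernels exploit on the GPU.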

Framework Integration

Provides automatic mixed precision APIs that integrate with PyTorch and JAX; the installed framework is detected at install time, so the library works out of the box.
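Mixed precision integration in both frameworks typically takes the form of a context manager that sets an ambient precision flag which layers consult at call time. This is a framework-neutral sketch of that pattern; `fp8_autocast`, `linear`, and the module-level flag are illustrative stand-ins, not the library's actual API.

```python
from contextlib import contextmanager

_fp8_enabled = False  # hypothetical global, standing in for framework autocast state

@contextmanager
def fp8_autocast(enabled=True):
    """Enable FP8 compute for ops run inside the block, restoring state on exit."""
    global _fp8_enabled
    prev = _fp8_enabled
    _fp8_enabled = enabled
    try:
        yield
    finally:
        _fp8_enabled = prev

def linear(x):
    # A layer picks its compute precision from the ambient flag.
    return ("fp8" if _fp8_enabled else "fp32", x)
```

The context-manager design means existing model code needs no per-layer changes: wrapping the forward pass is enough to switch precision, and nesting with `enabled=False` locally opts back out.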

Framework-Agnostic C++ API

Offers a C++ API with FP8 kernel support for integration with custom deep learning libraries beyond Python frameworks.