FP8 Precision Support
Enables FP8 precision on NVIDIA Hopper, Ada, and Blackwell GPUs, lowering memory utilization and accelerating both training and inference.
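To make the memory-savings claim concrete, here is a minimal sketch (plain Python, no GPU required, hypothetical helper names) of the per-tensor scaling idea behind FP8 training. FP8 E4M3 has a narrow dynamic range (largest finite value: 448), so tensors are multiplied by a scale factor before casting; with delayed scaling the factor is derived from previously observed abs-max values, so a fresh outlier saturates at the representable maximum. Mantissa rounding is omitted for brevity.

```python
E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format

def compute_scale(amax: float) -> float:
    """Map an observed abs-max onto the edge of the FP8 range."""
    return E4M3_MAX / amax if amax > 0 else 1.0

def fake_quantize(values, scale):
    """Scale into the FP8 range, clamp to the representable maximum,
    then undo the scale (real FP8 casts also round the mantissa)."""
    out = []
    for v in values:
        scaled = max(-E4M3_MAX, min(E4M3_MAX, v * scale))
        out.append(scaled / scale)
    return out

# Scale computed from an earlier step's abs-max of 100.0 (delayed
# scaling); the new outlier 1200.0 therefore clamps back to ~100.0,
# while in-range values survive the round trip.
scale = compute_scale(100.0)
print(fake_quantize([0.002, -1.5, 3.75, 1200.0], scale))
```

The clamp is the cost of the narrow format; the scale factor is what keeps typical activations away from it.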
Fused Kernels for Transformer Models
Includes optimized fused kernels that improve performance across FP8, FP16, and BF16 precisions on supported GPUs.
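The benefit of fusion can be illustrated with a conceptual sketch in plain Python (hypothetical function names): the unfused path makes two passes over the data and materializes an intermediate list, the analogue of an extra round trip to GPU memory, while the fused path computes the same result element-wise in a single pass.

```python
import math

def gelu(x: float) -> float:
    # tanh approximation of GELU, commonly used inside fused kernels
    return 0.5 * x * (1.0 + math.tanh(0.7978845608028654 * (x + 0.044715 * x ** 3)))

def bias_gelu_unfused(xs, bias):
    tmp = [x + bias for x in xs]         # pass 1: bias add, intermediate stored
    return [gelu(t) for t in tmp]        # pass 2: activation

def bias_gelu_fused(xs, bias):
    return [gelu(x + bias) for x in xs]  # one pass, no intermediate

xs = [-1.0, 0.0, 2.0]
print(bias_gelu_fused(xs, 0.5) == bias_gelu_unfused(xs, 0.5))  # same result either way
```

On a GPU the win comes from memory bandwidth, not arithmetic: a fused kernel reads and writes each element once instead of twice.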
Framework Integration
Integrates with PyTorch and JAX through an automatic-mixed-precision-style API; the installed framework is detected at build time, so no extra configuration is needed.
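Assuming the library described here is NVIDIA Transformer Engine (whose feature set this matches), the PyTorch integration is typically used as in the sketch below. It requires the `transformer_engine` package and a supported GPU (Hopper, Ada, or Blackwell), so it cannot run on CPU-only machines; the layer sizes are arbitrary.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

recipe = DelayedScaling(fp8_format=Format.HYBRID)  # E4M3 forward, E5M2 backward
layer = te.Linear(768, 3072).cuda()                # drop-in nn.Linear replacement
x = torch.randn(16, 768, device="cuda")

# Matmuls inside the context run in FP8 on supported hardware.
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = layer(x)
```

Outside the `fp8_autocast` context the same modules fall back to the regular FP16/BF16/FP32 paths, which is what makes the integration "automatic-mixed-precision-style".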
Framework-Agnostic C++ API
Offers a C++ API with FP8 kernel support for integration with custom deep learning libraries beyond Python frameworks.