FP8 Precision Support
Enables FP8 precision on NVIDIA Hopper, Ada, and Blackwell GPUs, lowering memory utilization and accelerating both training and inference.
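To make the memory-savings claim concrete, here is a minimal sketch (plain Python, no GPU required, hypothetical helper names) of the per-tensor scaling idea behind FP8 training. FP8 E4M3 has a narrow dynamic range (largest finite value: 448), so tensors are multiplied by a scale factor before casting; with delayed scaling the factor is derived from previously observed abs-max values, so a fresh outlier saturates at the representable maximum. Mantissa rounding is omitted for brevity.

```python
E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format

def compute_scale(amax: float) -> float:
    """Map an observed abs-max onto the edge of the FP8 range."""
    return E4M3_MAX / amax if amax > 0 else 1.0

def fake_quantize(values, scale):
    """Scale into the FP8 range, clamp to the representable maximum,
    then undo the scale (real FP8 casts also round the mantissa)."""
    out = []
    for v in values:
        scaled = max(-E4M3_MAX, min(E4M3_MAX, v * scale))
        out.append(scaled / scale)
    return out

# Scale computed from an earlier step's abs-max of 100.0 (delayed
# scaling); the new outlier 1200.0 therefore clamps back to ~100.0,
# while in-range values survive the round trip.
scale = compute_scale(100.0)
print(fake_quantize([0.002, -1.5, 3.75, 1200.0], scale))
```

The clamp is the cost of the narrow format; the scale factor is what keeps typical activations away from it.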
Fused Kernels for Transformer Models
Includes optimized fused kernels that improve performance across FP8, FP16, and BF16 precisions on supported GPUs.
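The benefit of fusion can be illustrated with a conceptual sketch in plain Python (hypothetical function names): the unfused path makes two passes over the data and materializes an intermediate list, the analogue of an extra round trip to GPU memory, while the fused path computes the same result element-wise in a single pass.

```python
import math

def gelu(x: float) -> float:
    # tanh approximation of GELU, commonly used inside fused kernels
    return 0.5 * x * (1.0 + math.tanh(0.7978845608028654 * (x + 0.044715 * x ** 3)))

def bias_gelu_unfused(xs, bias):
    tmp = [x + bias for x in xs]         # pass 1: bias add, intermediate stored
    return [gelu(t) for t in tmp]        # pass 2: activation

def bias_gelu_fused(xs, bias):
    return [gelu(x + bias) for x in xs]  # one pass, no intermediate

xs = [-1.0, 0.0, 2.0]
print(bias_gelu_fused(xs, 0.5) == bias_gelu_unfused(xs, 0.5))  # same result either way
```

On a GPU the win comes from memory bandwidth, not arithmetic: a fused kernel reads and writes each element once instead of twice.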
Framework Integration
Integrates with PyTorch and JAX through an automatic-mixed-precision-style API; the installed framework is detected at build time, so no extra configuration is needed.
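Assuming the library described here is NVIDIA Transformer Engine (whose feature set this matches), the PyTorch integration is typically used as in the sketch below. It requires the `transformer_engine` package and a supported GPU (Hopper, Ada, or Blackwell), so it cannot run on CPU-only machines; the layer sizes are arbitrary.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

recipe = DelayedScaling(fp8_format=Format.HYBRID)  # E4M3 forward, E5M2 backward
layer = te.Linear(768, 3072).cuda()                # drop-in nn.Linear replacement
x = torch.randn(16, 768, device="cuda")

# Matmuls inside the context run in FP8 on supported hardware.
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = layer(x)
```

Outside the `fp8_autocast` context the same modules fall back to the regular FP16/BF16/FP32 paths, which is what makes the integration "automatic-mixed-precision-style".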
Framework-Agnostic C++ API
Offers a C++ API with FP8 kernel support for integration with custom deep learning libraries beyond Python frameworks.