Transformer Engine
Transformer Engine is an open-source library from NVIDIA that accelerates Transformer model training and inference on NVIDIA GPUs. It supports FP8 precision on the Hopper, Ada, and Blackwell GPU architectures, which reduces memory usage and improves throughput while preserving model accuracy. The library provides optimized building blocks and fused kernels for Transformer layers, and it integrates with popular deep learning frameworks such as PyTorch and JAX through an automatic-mixed-precision-style API. A framework-agnostic C++ API is also available for broader integration needs. The library targets developers working with Transformer-based models on NVIDIA hardware, particularly the newer GPU architectures that support FP8. Installation requires Linux, CUDA 12.1 or later, and a compatible NVIDIA GPU. Transformer Engine is distributed under the Apache 2.0 license and is free to use.
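To make the PyTorch integration concrete, here is a minimal sketch of running a Transformer Engine layer under FP8 autocast. It assumes a CUDA-capable, FP8-supporting GPU (e.g. Hopper-class) and an installed `transformer_engine` package; the layer sizes and recipe settings are illustrative choices, not recommendations.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Illustrative FP8 scaling recipe; the specific settings are an
# assumption for this sketch, not tuned values.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# A drop-in replacement for torch.nn.Linear with FP8 support.
model = te.Linear(768, 768, bias=True).cuda()
inp = torch.randn(128, 768, device="cuda")

# Forward pass runs eligible GEMMs in FP8 inside this context;
# the backward pass outside it still uses the FP8 metadata recorded here.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()
```

Modules such as `te.Linear`, `te.LayerNorm`, and `te.TransformerLayer` follow the same pattern: construct them like their framework counterparts, then wrap the forward pass in `fp8_autocast` to enable FP8 execution.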
Transformer Engine is an NVIDIA open-source library that accelerates Transformer models on supported GPUs using FP8 precision.
Training Large Transformer Models
Developers training large-scale Transformer models on NVIDIA GPUs can leverage FP8 precision to reduce memory usage and accelerate training.
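The memory saving from FP8 follows directly from element width: FP8 stores one byte per value versus two for FP16/BF16 and four for FP32. A back-of-envelope sketch (the matrix size is an illustrative assumption, not a measurement from Transformer Engine):

```python
# Raw storage cost per element at different precisions, in bytes.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

def tensor_bytes(num_elements: int, dtype: str) -> int:
    """Raw storage cost of a tensor in the given precision."""
    return num_elements * BYTES_PER_ELEMENT[dtype]

# A single 4096x4096 weight matrix, a typical size for a large
# Transformer layer (hypothetical example):
n = 4096 * 4096
print(tensor_bytes(n, "fp16") // 2**20)  # → 32 (MiB in FP16)
print(tensor_bytes(n, "fp8") // 2**20)   # → 16 (MiB in FP8, half of FP16)
```

Halving the bytes per value also halves the memory traffic for those tensors, which is one source of the training speedup on FP8-capable hardware.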
Inference Optimization
Transformer models deployed for inference on supported NVIDIA GPUs benefit from the library's fused kernels and the lower memory footprint of FP8 weights and activations.