COR Brief

Transformer Engine

Transformer Engine is an open-source library from NVIDIA that accelerates Transformer model training and inference on NVIDIA GPUs. It supports FP8 precision on the Hopper, Ada, and Blackwell GPU architectures, reducing memory usage and improving throughput while maintaining model accuracy. The library provides optimized building blocks and fused kernels for Transformer layers and integrates with popular deep learning frameworks such as PyTorch and JAX through an automatic mixed precision API. It also offers a framework-agnostic C++ API for broader integration needs. The library targets developers working with Transformer-based models on NVIDIA hardware, particularly those using newer GPU architectures with FP8 support. Installation requires Linux, CUDA 12.1 or higher, and a compatible NVIDIA GPU. Transformer Engine is distributed under the Apache 2.0 license and is free to use.
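As a quick sanity check before installing, you can verify whether the local GPU meets the FP8 hardware requirement (Compute Capability 8.9 or above, per the prerequisites later in this page). This sketch uses only standard PyTorch calls; the helper function name is illustrative:

```python
import torch

def supports_fp8(capability):
    """FP8 needs Compute Capability 8.9 (Ada) or higher (Hopper/Blackwell)."""
    return capability >= (8, 9)

if torch.cuda.is_available():
    # get_device_capability() returns a (major, minor) tuple for the current GPU.
    print("FP8-capable GPU:", supports_fp8(torch.cuda.get_device_capability()))
else:
    print("No CUDA device detected")
```

Note that this only checks the FP8 floor; GPUs as old as Ampere can still use Transformer Engine's FP16/BF16 paths.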

Updated Dec 16, 2025

Transformer Engine is an NVIDIA open-source library that accelerates Transformer models on supported GPUs using FP8 precision.

Pricing
open-source
Category
Code & Development
Company
NVIDIA
01
Enables FP8 precision on NVIDIA Hopper, Ada, and Blackwell GPUs to reduce memory utilization during training and inference.
02
Includes optimized fused kernels that improve performance across FP8, FP16, and BF16 precisions on supported GPUs.
03
Provides automatic mixed precision API integration with PyTorch and JAX, detecting the framework during installation for seamless use.
04
Offers a C++ API with FP8 kernel support for integration with custom deep learning libraries beyond Python frameworks.

Training Large Transformer Models

Developers training large-scale Transformer models on NVIDIA GPUs can leverage FP8 precision to reduce memory usage and accelerate training.

Inference Optimization

Deploying Transformer models for inference on supported NVIDIA GPUs benefits from optimized kernels and lower memory footprint.

1
Verify System Requirements
Ensure your system runs Linux with CUDA 12.1 or higher (12.8+ for Blackwell GPUs), cuDNN 9.3+, Python 3.12 recommended, and an NVIDIA GPU with Compute Capability 8.9 or above for FP8 support.
2
Install Transformer Engine
Install via pip using the command: pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@stable. The installer sets the NVTE_FRAMEWORK environment variable automatically if needed.
3
Import and Use in PyTorch
Import the library in your PyTorch code with: import transformer_engine.pytorch as te, then use modules such as te.Linear to build Transformer layers.
4
Run on CUDA Device
Execute your model on a CUDA device, for example by creating input tensors on the GPU: inp = torch.randn(..., device='cuda').
5
Refer to Documentation
Consult the Quickstart Notebook and official documentation for detailed examples and advanced usage.
Pricing
Model: open-source

Transformer Engine is free and open-source under the Apache 2.0 license.

Assessment
Strengths
  • Supports FP8 precision with automatic scaling factor management for mixed precision training.
  • Includes fused kernels optimized for Transformer operations across multiple precisions.
  • Integrates with PyTorch and JAX frameworks via automatic detection during installation.
  • Provides a framework-agnostic C++ API for custom integration.
  • Supports both training and inference with reduced memory usage on supported NVIDIA GPUs.
Limitations
  • Requires specific NVIDIA hardware: Ampere or newer GPUs for base support, and Hopper/Ada/Blackwell GPUs for FP8 precision.
  • Open issues include build failures in Docker with certain PyTorch/CUDA versions and on L40S GPUs.
  • Development builds are unsupported and not recommended for general use.