Verify System Requirements
Ensure your system runs Linux with CUDA 12.1 or higher (12.8+ for Blackwell GPUs) and cuDNN 9.3+. Python 3.12 is recommended. FP8 support additionally requires an NVIDIA GPU with Compute Capability 8.9 or above.
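The version thresholds above can be checked programmatically. The following is a minimal sketch, not an official tool: the helper names (parse_version, cuda_ok, fp8_ok) are hypothetical, and it assumes PyTorch is already installed. Note that torch.version.cuda reports the CUDA version PyTorch was built against, which may differ from the toolkit installed on the system.

```python
# Minimum versions taken from the requirements listed above.
MIN_CUDA = (12, 1)
MIN_FP8_CAPABILITY = (8, 9)


def parse_version(text):
    """Turn a version string such as '12.4' into a tuple of ints: (12, 4)."""
    return tuple(int(part) for part in text.split(".")[:2])


def cuda_ok(cuda_version):
    """True if the CUDA version string meets the 12.1 minimum."""
    return parse_version(cuda_version) >= MIN_CUDA


def fp8_ok(capability):
    """True if a (major, minor) compute capability meets the FP8 minimum."""
    return tuple(capability) >= MIN_FP8_CAPABILITY


try:
    import torch
except ImportError:
    torch = None  # the helpers above still work without PyTorch

# Query the local machine only when a CUDA build of PyTorch and a GPU are present.
if torch is not None and torch.version.cuda is not None and torch.cuda.is_available():
    print("CUDA version PyTorch was built with:", torch.version.cuda)
    print("Meets CUDA minimum:", cuda_ok(torch.version.cuda))
    print("GPU supports FP8:", fp8_ok(torch.cuda.get_device_capability()))
```

On a machine without a GPU the script prints nothing; the helpers can still be called directly with known values, for example fp8_ok((9, 0)).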
Install Transformer Engine
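Install via pip using the command: pip3 install --no-build-isolation git+https://github.com/NVIDIA/TransformerEngine.git@stable. The build detects installed deep learning frameworks automatically; to select frameworks explicitly, set the NVTE_FRAMEWORK environment variable to a comma-separated list (for example, NVTE_FRAMEWORK=pytorch) before installing.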
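After the install finishes, it can help to confirm that the package is visible to the Python interpreter you plan to use. This is a small sketch using only the standard library; the helper names are hypothetical, and the distribution name "transformer_engine" is an assumption that may differ depending on how the package was built.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name):
    """True if a top-level module with this name is found on the import path."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name):
    """Return the installed distribution's version, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed names: adjust if your installation uses a different distribution name.
print("transformer_engine importable:", is_importable("transformer_engine"))
print("transformer_engine version:", installed_version("transformer_engine"))
```

If the first line prints False, the install most likely targeted a different Python environment than the one running the check.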
Import and Use in PyTorch
Import the library in your PyTorch code with import transformer_engine.pytorch as te, then use its modules, such as te.Linear, to build Transformer layers.
Run on CUDA Device
Execute your model on a CUDA device, for example by creating input tensors on the GPU: inp = torch.randn(..., device='cuda').
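The import and device steps above can be combined into one short script. This is a minimal sketch, assuming Transformer Engine and a CUDA build of PyTorch are installed; the function name run_linear and the layer sizes are illustrative choices, not values prescribed by the library. The imports are guarded so the script exits quietly on machines without the library or a GPU.

```python
try:
    import torch
except ImportError:
    torch = None  # PyTorch is not installed in this environment

try:
    import transformer_engine.pytorch as te
except ImportError:
    te = None  # Transformer Engine is not installed in this environment


def run_linear(rows=2048, in_features=768, out_features=3072):
    """Run one forward pass through te.Linear on the GPU; return the output shape."""
    # te.Linear is used here as a stand-in for torch.nn.Linear.
    layer = te.Linear(in_features, out_features, bias=True)
    # Create the input directly on the GPU, as described above.
    inp = torch.randn(rows, in_features, device="cuda")
    out = layer(inp)
    return tuple(out.shape)


# Only attempt the forward pass when both libraries and a GPU are available.
if te is not None and torch is not None and torch.cuda.is_available():
    print("Output shape:", run_linear())
```

The same pattern extends to other Transformer Engine modules; FP8 execution additionally requires a GPU that meets the compute capability requirement stated earlier.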
Refer to Documentation
Consult the Quickstart Notebook and official documentation for detailed examples and advanced usage.