Key Features

What you can do

Linear Scaling with Sequence Length

Mamba-2's compute grows linearly with sequence length: doubling the sequence roughly doubles the work. Transformer self-attention, by contrast, scales quadratically, so doubling the sequence quadruples the attention cost.
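As a rough illustration (not the library's API), a back-of-the-envelope operation count shows the difference. The formulas below are standard asymptotic estimates, with `L` the sequence length, `d` the model width, and `n` a hypothetical SSM state size:

```python
def attention_flops(L: int, d: int) -> int:
    # Self-attention forms an L x L score matrix: cost grows as L^2 * d.
    return L * L * d

def ssm_flops(L: int, d: int, n: int) -> int:
    # A recurrent state space scan touches each token once: cost grows as L * d * n.
    return L * d * n

# Doubling L quadruples attention cost but only doubles the SSM cost.
print(attention_flops(2048, 64) // attention_flops(1024, 64))  # 4
print(ssm_flops(2048, 64, 16) // ssm_flops(1024, 64, 16))      # 2
```

The constants are ignored here; the point is only the growth rate in `L`.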

Selective Token Transformation

A selective mechanism makes the state space model's parameters input-dependent, so each token is transformed according to its content rather than by fixed, token-independent dynamics.
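To make "input-dependent parameters" concrete, here is a toy one-dimensional sketch of a selective recurrence. The specific gating functions (`a`, `b`, `c` below) are hypothetical choices for illustration, not Mamba-2's actual parameterization:

```python
import math

def selective_ssm(xs):
    """Toy 1-D selective SSM: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t,
    where a_t, b_t, c_t are computed from the current input x_t (the "selection").
    Illustrative only -- not the Mamba-2 kernel or parameterization.
    """
    h = 0.0
    ys = []
    for x in xs:
        a = math.exp(-abs(x))            # hypothetical decay gate in (0, 1]
        b = 1.0 / (1.0 + math.exp(-x))   # hypothetical sigmoid input gate
        c = 1.0                          # fixed readout for simplicity
        h = a * h + b * x                # state update depends on the token itself
        ys.append(c * h)
    return ys
```

Because `a` and `b` are functions of `x`, two different tokens update the hidden state differently, which is the core idea behind selectivity.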

Hardware-Optimized Implementation

Includes hardware-aware optimizations such as kernel fusion and a parallel scan, improving runtime performance on supported GPUs.
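The parallel scan works because a linear recurrence h_t = a_t * h_{t-1} + b_t can be expressed with an associative combine operator, so prefixes can be merged in any grouping (and hence in parallel on a GPU). A minimal sequential sketch of that operator, shown here with Python's `itertools.accumulate` rather than the fused CUDA kernel:

```python
import math
from itertools import accumulate

def combine(p, q):
    # Associative operator: applying (a1, b1) then (a2, b2) to h gives
    # a2 * (a1 * h + b1) + b2 = (a1 * a2) * h + (a2 * b1 + b2).
    a1, b1 = p
    a2, b2 = q
    return (a1 * a2, a2 * b1 + b2)

def scan_recurrence(pairs, h0=0.0):
    # Each accumulated prefix (A, B) yields h_t = A * h0 + B.
    return [a * h0 + b for a, b in accumulate(pairs, combine)]

# Matches the step-by-step sequential recurrence:
pairs = [(0.5, 1.0), (0.9, 2.0), (0.3, -1.0)]
h, ref = 0.0, []
for a, b in pairs:
    h = a * h + b
    ref.append(h)
assert all(math.isclose(x, y) for x, y in zip(scan_recurrence(pairs), ref))
```

Because `combine` is associative, a real implementation can evaluate it as a balanced tree in O(log L) parallel steps instead of this O(L) sequential loop.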

Compatibility

Requires PyTorch 1.12 or newer and CUDA 11.6 or newer.
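A stdlib-only helper for checking these minimums is sketched below; `meets_minimum` is a hypothetical name, and in practice you would pass it strings like `torch.__version__` and `torch.version.cuda`:

```python
def meets_minimum(version: str, minimum: str) -> bool:
    """Return True if a dotted version string meets a minimum,
    e.g. '1.13.1' satisfies '1.12'. Drops local suffixes like '+cu118'.
    Pre-release tags (e.g. 'rc1') are ignored in this simple sketch.
    """
    def parse(v):
        return tuple(int(p) for p in v.split("+")[0].split(".") if p.isdigit())

    a, b = parse(version), parse(minimum)
    n = max(len(a), len(b))
    # Pad with zeros so '1.12' compares as (1, 12, 0) against (1, 13, 1).
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))

# Hypothetical usage against the stated minimums:
# assert meets_minimum(torch.__version__, "1.12")
# assert meets_minimum(torch.version.cuda, "11.6")
```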