Linear Scaling with Sequence Length
Mamba-2's compute grows linearly with sequence length: doubling the sequence roughly doubles the processing time, whereas transformer self-attention scales quadratically because it compares every pair of tokens.
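A minimal sketch of why the scaling is linear (illustrative only, not Mamba-2's actual implementation): a state-space layer folds each token into a fixed-size hidden state in one pass, so total work is O(L), while attention's pairwise comparisons are O(L^2).

```python
def ssm_scan(a, b, x):
    """Scalar linear recurrence h_t = a_t * h_{t-1} + b_t * x_t.

    One pass over the sequence: constant work per token, O(L) total.
    """
    h, out = 0.0, []
    for a_t, b_t, x_t in zip(a, b, x):
        h = a_t * h + b_t * x_t
        out.append(h)
    return out

# Example: constant decay 0.5, unit input weight, inputs 1..4.
ys = ssm_scan([0.5] * 4, [1.0] * 4, [1.0, 2.0, 3.0, 4.0])
# ys == [1.0, 2.5, 4.25, 6.125]
```

Each output depends on all earlier tokens through the compressed state `h`, which is what lets the model avoid attention's quadratic pairwise interactions.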
Selective Token Transformation
The selection mechanism makes the state-space parameters input-dependent, so each token's transformation is conditioned on that token's content rather than being fixed across the sequence.
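A toy sketch of the selective idea, with hypothetical scalar weights `w_a` and `w_b` (not Mamba-2's real parameterization): the decay and input gate for each step are computed from the current token, so two different tokens are transformed differently.

```python
import math

def selective_step(h, x, w_a, w_b):
    """One selective state-space step.

    Unlike a fixed recurrence, the decay a_t and input weight b_t are
    functions of the current token x, so the transformation is
    per-token ("selective"). w_a and w_b are illustrative scalars.
    """
    a_t = 1.0 / (1.0 + math.exp(-w_a * x))  # input-dependent decay in (0, 1)
    b_t = w_b * x                            # input-dependent input weight
    return a_t * h + b_t * x

# With w_a = 0 the decay is always sigmoid(0) = 0.5:
h = selective_step(0.0, 1.0, 0.0, 1.0)  # 0.5 * 0 + 1 * 1 = 1.0
```

Intuitively, the input-dependent decay lets the model choose, token by token, whether to retain or overwrite its state.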
Hardware-Optimized Implementation
Includes hardware-aware optimizations such as kernel fusion and a parallel scan, improving throughput on supported GPUs.
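The parallel scan works because the linear recurrence h_t = a_t * h_{t-1} + b_t is associative under the combine rule below, so a GPU can evaluate it tree-style in O(log L) parallel steps. This sketch only verifies the combine rule sequentially in pure Python; the real speedup comes from fused GPU kernels.

```python
def combine(left, right):
    """Compose two recurrence segments: apply (a1, b1), then (a2, b2)."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)

def scan(pairs):
    """Inclusive prefix scan under `combine` (sequential here; a parallel
    implementation groups the same combines into a balanced tree)."""
    out, acc = [], None
    for p in pairs:
        acc = p if acc is None else combine(acc, p)
        out.append(acc)
    return out

def sequential(pairs):
    """Reference: the plain recurrence h_t = a_t * h_{t-1} + b_t, h_0 = 0."""
    h, out = 0.0, []
    for a, b in pairs:
        h = a * h + b
        out.append(h)
    return out

pairs = [(0.5, 1.0), (0.5, 2.0), (2.0, 1.0)]
# The b-component of each scanned prefix equals the sequential state:
# [b for _, b in scan(pairs)] == sequential(pairs)
```

Because `combine` is associative, the order of grouping does not change the result, which is exactly the property a parallel scan exploits.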
Compatibility
Works with PyTorch version 1.12 or higher and CUDA 11.6 or newer.