Key Features - latent-moe

✨

Projects full-dimensional token activations into a compact latent space before routing to experts, reducing memory and communication overhead.

✨

Allows more experts and higher routing capacity at roughly the same computational cost, since each expert operates on the smaller latent representation rather than the full-dimensional activation.

✨

Implemented in NVIDIA's Nemotron-3 Super and Ultra language models, demonstrating practical adoption in advanced AI systems.
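
The trade-off behind the first two features can be illustrated with back-of-the-envelope arithmetic. The sketch below is not taken from any NVIDIA implementation. It assumes each expert is a plain two-matrix feed-forward block, and every size in it (`D_MODEL`, `SHRINK`, `d_ff`, the expert counts, and `top_k`) is a hypothetical value chosen for illustration. It compares a conventional MoE layer with one whose experts see activations shrunk by a factor of 4, while using 4 times as many experts and routing each token to 4 times as many of them.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    expert_dim: int   # width of the activations each expert receives
    d_ff: int         # hidden width inside each expert (assumed 2-matrix FFN)
    num_experts: int  # total experts in the layer
    top_k: int        # experts activated per token


def routed_costs(cfg: MoEConfig) -> dict:
    """Rough per-layer costs of the routed experts (illustrative model only)."""
    # One up-projection and one down-projection matrix per expert.
    per_expert_params = 2 * cfg.expert_dim * cfg.d_ff
    return {
        "expert_params": cfg.num_experts * per_expert_params,
        # Multiply-accumulates spent in experts for a single token.
        "macs_per_token": cfg.top_k * per_expert_params,
        # Elements dispatched to experts plus elements returned, per token.
        "all_to_all_per_token": 2 * cfg.top_k * cfg.expert_dim,
    }


# Hypothetical sizes, not taken from any published model configuration.
D_MODEL = 4096
SHRINK = 4

baseline = MoEConfig(expert_dim=D_MODEL, d_ff=2048, num_experts=64, top_k=4)
latent = MoEConfig(
    expert_dim=D_MODEL // SHRINK,   # experts work in the compact latent space
    d_ff=2048,
    num_experts=64 * SHRINK,        # savings spent on more experts...
    top_k=4 * SHRINK,               # ...and more active experts per token
)

baseline_costs = routed_costs(baseline)
latent_costs = routed_costs(latent)

# Extra work the latent variant adds: one projection down, one back up.
projection_macs_per_token = 2 * D_MODEL * latent.expert_dim

print("baseline:", baseline_costs)
print("latent:  ", latent_costs)
print("projection overhead (MACs/token):", projection_macs_per_token)
```

Under these assumptions the two configurations have identical routed parameter counts, per-token expert compute, and all-to-all traffic, even though the latent variant has 256 experts instead of 64. The only added cost is the pair of projections into and out of the latent space.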