Latent Space Projection
Projects full-dimensional token activations into a compact latent space before routing them to experts, reducing the memory and communication overhead of dispatching tokens.
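To make the projection step concrete, here is a minimal pure-Python sketch of one token passing through a latent MoE layer. It is an illustration under assumptions, not NVIDIA's implementation. The names (`latent_moe_forward`, `w_down`, `w_up`, `w_router`), the single-matrix ReLU experts, the top-k softmax gating, and the choice to let the router score the full-width token are all hypothetical. The point it shows is that each selected expert receives an `l`-wide vector instead of a `d`-wide one.

```python
import math
import random

def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def latent_moe_forward(x, w_down, w_up, w_router, experts, top_k):
    """One token through a hypothetical latent MoE layer.

    x:        full-dimensional activation (length d)
    w_down:   l x d projection into the latent space
    w_up:     d x l projection back to the model dimension
    w_router: num_experts x d router weights (scores the full token; an assumption)
    experts:  list of l x l weight matrices, one per expert
    """
    z = matvec(w_down, x)  # compress d -> l before dispatch
    logits = matvec(w_router, x)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the selected experts' logits only.
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    y_latent = [0.0] * len(z)
    for i in top:
        h = [max(0.0, v) for v in matvec(experts[i], z)]  # expert works in latent dim
        gate = exps[i] / total
        y_latent = [a + gate * b for a, b in zip(y_latent, h)]
    y = matvec(w_up, y_latent)  # expand l -> d after combining expert outputs
    dispatched = top_k * len(z)  # values sent to experts for this token
    return y, dispatched

# Hypothetical toy sizes: d = 16, l = 4, 8 experts, top-2 routing.
rng = random.Random(0)
d, l, n_experts, k = 16, 4, 8, 2
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y, sent = latent_moe_forward(
    x,
    rand_matrix(l, d, rng),
    rand_matrix(d, l, rng),
    rand_matrix(n_experts, d, rng),
    [rand_matrix(l, l, rng) for _ in range(n_experts)],
    k,
)
print(len(y), sent, k * d)  # 16 8 32
```

The output keeps the model width `d`, while the values dispatched per token drop from `k * d` (32) to `k * l` (8), a factor of `d / l`.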
Increased Expert Capacity
Uses the savings from the smaller latent representation to support more experts and a higher routing capacity within the model without increasing computational cost.
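The trade-off can be checked with simple arithmetic. The sketch below assumes each expert is a two-layer FFN (`in_dim -> ffn_dim -> in_dim`) whose input and output width drops from the model dimension `d` to the latent dimension `l` while its intermediate width stays fixed. The sizes and the scaling rule (multiply both the expert count and the number of active experts by `d / l`) are hypothetical and are not taken from any released Nemotron-3 configuration.

```python
def expert_ffn_flops(in_dim, ffn_dim):
    """Approximate matmul FLOPs for one token through one expert FFN
    (in_dim -> ffn_dim -> in_dim), counting 2 FLOPs per multiply-add."""
    return 2 * (in_dim * ffn_dim + ffn_dim * in_dim)

def expert_params(in_dim, ffn_dim):
    """Weights in one two-matrix expert FFN (biases ignored)."""
    return 2 * in_dim * ffn_dim

# Hypothetical sizes, not taken from any released model.
d, l, ffn_dim = 4096, 1024, 2048
alpha = d // l                # compression factor d / l = 4
num_experts, top_k = 64, 4    # baseline MoE configuration

# Baseline: experts consume full-width d activations.
base_compute = top_k * expert_ffn_flops(d, ffn_dim)
base_params = num_experts * expert_params(d, ffn_dim)

# Latent variant: experts consume width-l activations, so alpha times
# as many experts (and active experts per token) fit in the same budget.
lat_compute = (alpha * top_k) * expert_ffn_flops(l, ffn_dim)
lat_params = (alpha * num_experts) * expert_params(l, ffn_dim)

print(base_compute == lat_compute, base_params == lat_params)  # True True
print(alpha * num_experts, alpha * top_k)                       # 256 16
```

This count covers the expert matmuls only; the shared `d -> l` and `l -> d` projections add a fixed per-token cost outside it.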
Integration with NVIDIA Nemotron-3
Implemented in NVIDIA's Nemotron-3 Super and Ultra language models, demonstrating practical adoption in large-scale language models.