- Improves benchmark accuracy over standard MoE models at equivalent parameter counts.
- Reduces memory-bandwidth requirements and communication overhead in MoE architectures.
- Enables higher routing capacity without increasing runtime or computational cost.