Sparse Mixture of Experts Layers
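• Neural network layer built from several expert subnetworks plus a learned router
• Router activates only a small subset of the experts for each token
• Adds parameters without a proportional increase in per-token compute; experts can specialize on different inputs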
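A minimal sketch of per-token routing may make the bullets above concrete. It assumes top-k gating with a softmax over the selected experts' scores, and it models each expert as a single linear map; neither choice comes from the slide, and real experts are usually small feed-forward networks. All names (`sparse_moe_forward`, `router_w`, `experts`, `k`) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sparse_moe_forward(token, router_w, experts, k=2):
    """Route one token to its top-k experts and mix their outputs."""
    logits = matvec(router_w, token)  # one routing score per expert
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])  # normalize over selected experts only
    out = [0.0] * len(token)
    for g, i in zip(gates, top):
        y = matvec(experts[i], token)  # only the selected experts are evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top, gates


# Toy setup (assumed sizes): 8 experts, model width 4, 2 experts active per token.
random.seed(0)
d_model, n_experts = 4, 8
router_w = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_experts)]
experts = [
    [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_model)]
    for _ in range(n_experts)
]
token = [random.gauss(0, 1) for _ in range(d_model)]

out, chosen, gates = sparse_moe_forward(token, router_w, experts, k=2)
print("experts used:", chosen, "gate weights:", gates)
```

With 8 experts and k=2, each token pays for two expert evaluations while the layer stores eight experts' worth of parameters, which is the efficiency argument in the last bullet.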