Multiple Future Token Prediction
Predicts k additional future tokens at each position through extra prediction modules that share the output head with the main model; softmax over the shared head's logits yields each future-token distribution.
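A minimal NumPy sketch of the idea: one shared unembedding matrix serves all k prediction depths, and softmax over the shared head's logits produces each future-token distribution. The per-depth projections, dimensions, and random weights here are illustrative assumptions, not the actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_model, vocab, k = 8, 16, 4

# Shared output head (unembedding matrix), reused at every prediction depth.
W_out = rng.normal(size=(d_model, vocab))

# Hypothetical per-depth projections that map the backbone's hidden state
# to a hidden state for the token at offset t+1+i (illustrative only).
depth_proj = [rng.normal(size=(d_model, d_model)) for _ in range(k)]

h = rng.normal(size=(d_model,))  # backbone hidden state at one position

# Each depth reuses W_out; softmax turns the shared head's logits into a
# distribution over the vocabulary for that future token.
probs = [softmax(h @ P @ W_out) for P in depth_proj]
```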
Densified Training Signals
Extends the prediction scope beyond the immediate next token, so each position contributes k training signals instead of one, improving data efficiency during pre-training.
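A toy illustration of the densified loss: instead of a single next-token cross-entropy per position, the losses at all k prediction depths are averaged. The distributions and targets below are made up for demonstration.

```python
import numpy as np

def cross_entropy(probs, target):
    # Negative log-likelihood of the correct token.
    return -np.log(probs[target])

# Toy distributions over a 4-token vocabulary for k=2 prediction depths
# at one position (all numbers are illustrative assumptions).
probs_per_depth = [
    np.array([0.7, 0.1, 0.1, 0.1]),  # prediction for token t+1
    np.array([0.2, 0.6, 0.1, 0.1]),  # prediction for token t+2
]
targets = [0, 1]  # ground-truth tokens at t+1 and t+2

# Next-token-only training uses depth 0; the densified objective averages
# over all depths, yielding k supervision signals per position.
loss_next = cross_entropy(probs_per_depth[0], targets[0])
loss_mtp = float(np.mean([cross_entropy(p, t)
                          for p, t in zip(probs_per_depth, targets)]))
```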
Speculative Decoding Support
Enables faster inference via speculative decoding, where the extra prediction heads draft future tokens that the main model verifies in a single pass, as implemented in GLM-4.5.
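A stripped-down sketch of the draft-then-verify loop, assuming greedy decoding so a drafted token is accepted exactly when the target model agrees. Both "models" here are tiny lookup tables standing in for real networks.

```python
# Deterministic stand-ins for the models: each maps a context tuple to its
# greedy next token (purely illustrative).
TARGET = {(): 1, (1,): 2, (1, 2): 3, (1, 2, 3): 4, (1, 2, 3, 4): 5}
DRAFT  = {(): 1, (1,): 2, (1, 2): 9, (1, 2, 3): 4}  # diverges at depth 2

def speculative_step(ctx, k=4):
    # 1) Draft proposes up to k tokens cheaply.
    proposed, c = [], tuple(ctx)
    for _ in range(k):
        t = DRAFT.get(c)
        if t is None:
            break
        proposed.append(t)
        c = c + (t,)
    # 2) Target verifies all proposals at once; keep the longest agreeing
    #    prefix, then emit the target's own token as a correction.
    accepted, c = [], tuple(ctx)
    for t in proposed:
        if TARGET.get(c) == t:
            accepted.append(t)
            c = c + (t,)
        else:
            break
    accepted.append(TARGET[c])  # guaranteed progress of >= 1 token
    return accepted

out = speculative_step([])
```

Because the draft diverges at the third token, one step accepts the two agreeing tokens plus the target's correction, producing three tokens from a single verification pass.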
Improved Code Generation Accuracy
Achieves higher accuracy on code generation tasks, e.g., 95% accuracy with 4-token prediction versus 80% for single-token baselines.
Integration with Mixture-of-Experts Architectures
Works with MoE models such as DeepSeek-V3, which activate only a subset of expert parameters per token for efficient training and inference.
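A minimal NumPy sketch of the per-token expert activation that MoE models rely on: a router scores all experts, only the top-k run, and their outputs are combined with renormalized gates. The expert count, dimensions, and linear experts are simplifying assumptions, not DeepSeek-V3's actual configuration.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, n_experts, top_k = 4, 8, 2

# Each "expert" is a tiny linear layer; only top_k of them run per token.
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router = rng.normal(size=(d, n_experts))

def moe_forward(x):
    scores = softmax(x @ router)                   # router score per expert
    chosen = np.argsort(scores)[-top_k:]           # activate a subset only
    gates = scores[chosen] / scores[chosen].sum()  # renormalize the gates
    y = sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))
    return y, chosen

x = rng.normal(size=(d,))
y, chosen = moe_forward(x)
```

Only `top_k` of the `n_experts` weight matrices are touched for this token, which is what keeps the activated parameter count (and hence compute) far below the total parameter count.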