Multiple Future Token Prediction
Predicts k additional future tokens at each position through extra prediction modules that share the output head with the main model; softmax over the shared head's logits yields each future-token distribution.
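A minimal NumPy sketch of the idea: one shared unembedding matrix serves all k prediction depths, and softmax over the shared head's logits produces each future-token distribution. The per-depth projections, dimensions, and random weights here are illustrative assumptions, not the actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_model, vocab, k = 8, 16, 4

# Shared output head (unembedding matrix), reused at every prediction depth.
W_out = rng.normal(size=(d_model, vocab))

# Hypothetical per-depth projections that map the backbone's hidden state
# to a hidden state for the token at offset t+1+i (illustrative only).
depth_proj = [rng.normal(size=(d_model, d_model)) for _ in range(k)]

h = rng.normal(size=(d_model,))  # backbone hidden state at one position

# Each depth reuses W_out; softmax turns the shared head's logits into a
# distribution over the vocabulary for that future token.
probs = [softmax(h @ P @ W_out) for P in depth_proj]
```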
Densified Training Signals
Extends the prediction scope beyond the immediate next token, so each position contributes k training signals instead of one, improving data efficiency during pre-training.
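A toy illustration of the densified loss: instead of a single next-token cross-entropy per position, the losses at all k prediction depths are averaged. The distributions and targets below are made up for demonstration.

```python
import numpy as np

def cross_entropy(probs, target):
    # Negative log-likelihood of the correct token.
    return -np.log(probs[target])

# Toy distributions over a 4-token vocabulary for k=2 prediction depths
# at one position (all numbers are illustrative assumptions).
probs_per_depth = [
    np.array([0.7, 0.1, 0.1, 0.1]),  # prediction for token t+1
    np.array([0.2, 0.6, 0.1, 0.1]),  # prediction for token t+2
]
targets = [0, 1]  # ground-truth tokens at t+1 and t+2

# Next-token-only training uses depth 0; the densified objective averages
# over all depths, yielding k supervision signals per position.
loss_next = cross_entropy(probs_per_depth[0], targets[0])
loss_mtp = float(np.mean([cross_entropy(p, t)
                          for p, t in zip(probs_per_depth, targets)]))
```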
Speculative Decoding Support
Enables faster inference via speculative decoding, where the extra prediction heads draft future tokens that the main model verifies in a single pass, as implemented in GLM-4.5.
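A stripped-down sketch of the draft-then-verify loop, assuming greedy decoding so a drafted token is accepted exactly when the target model agrees. Both "models" here are tiny lookup tables standing in for real networks.

```python
# Deterministic stand-ins for the models: each maps a context tuple to its
# greedy next token (purely illustrative).
TARGET = {(): 1, (1,): 2, (1, 2): 3, (1, 2, 3): 4, (1, 2, 3, 4): 5}
DRAFT  = {(): 1, (1,): 2, (1, 2): 9, (1, 2, 3): 4}  # diverges at depth 2

def speculative_step(ctx, k=4):
    # 1) Draft proposes up to k tokens cheaply.
    proposed, c = [], tuple(ctx)
    for _ in range(k):
        t = DRAFT.get(c)
        if t is None:
            break
        proposed.append(t)
        c = c + (t,)
    # 2) Target verifies all proposals at once; keep the longest agreeing
    #    prefix, then emit the target's own token as a correction.
    accepted, c = [], tuple(ctx)
    for t in proposed:
        if TARGET.get(c) == t:
            accepted.append(t)
            c = c + (t,)
        else:
            break
    accepted.append(TARGET[c])  # guaranteed progress of >= 1 token
    return accepted

out = speculative_step([])
```

Because the draft diverges at the third token, one step accepts the two agreeing tokens plus the target's correction, producing three tokens from a single verification pass.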
Improved Code Generation Accuracy
Achieves higher accuracy on code generation tasks, e.g., 95% accuracy with 4-token prediction versus 80% for single-token baselines.
Integration with Mixture-of-Experts Architectures
Works with MoE models such as DeepSeek-V3, which activate only a subset of expert parameters per token for efficient training and inference.
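A minimal NumPy sketch of the per-token expert activation that MoE models rely on: a router scores all experts, only the top-k run, and their outputs are combined with renormalized gates. The expert count, dimensions, and linear experts are simplifying assumptions, not DeepSeek-V3's actual configuration.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, n_experts, top_k = 4, 8, 2

# Each "expert" is a tiny linear layer; only top_k of them run per token.
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router = rng.normal(size=(d, n_experts))

def moe_forward(x):
    scores = softmax(x @ router)                   # router score per expert
    chosen = np.argsort(scores)[-top_k:]           # activate a subset only
    gates = scores[chosen] / scores[chosen].sum()  # renormalize the gates
    y = sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))
    return y, chosen

x = rng.normal(size=(d,))
y, chosen = moe_forward(x)
```

Only `top_k` of the `n_experts` weight matrices are touched for this token, which is what keeps the activated parameter count (and hence compute) far below the total parameter count.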