Key Features

What you can do

Multiple Future Token Prediction

Predicts k additional future tokens per position. The prediction heads share the output projection with the main model, and a softmax over the resulting logits yields each future token's probability distribution.
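The shared-head idea can be sketched as follows. This is a minimal illustration, not the actual architecture: the dimensions, the `predict` helper, and the random hidden states are assumptions made up for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d_model, vocab, k = 8, 16, 2

# One output projection, reused by the main model and all k prediction heads.
W_out = rng.normal(size=(d_model, vocab))

def predict(hidden):
    """Project a hidden state through the shared head; softmax gives probabilities."""
    return softmax(hidden @ W_out)

# Hidden states from the main trunk and from two hypothetical future-token heads.
h_main = rng.normal(size=d_model)
h_heads = [rng.normal(size=d_model) for _ in range(k)]

p_next = predict(h_main)                 # distribution over token t+1
p_future = [predict(h) for h in h_heads] # distributions over tokens t+2, t+3
```

Because `W_out` is shared, the extra heads add almost no output-layer parameters; only the per-head hidden states differ.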

Densified Training Signals

Extends the prediction scope beyond the immediate next token to improve data efficiency during pre-training.
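One way to see the densification: with k extra targets, each position contributes up to 1 + k loss terms instead of 1. The `num_signals` helper below is a hypothetical counting function for illustration only.

```python
def num_signals(T, k):
    """Count supervision signals for a length-T sequence with k extra targets.

    Position i supervises targets i+1 .. i+1+k, clipped at the sequence end,
    so it contributes min(1 + k, T - 1 - i) cross-entropy terms.
    """
    return sum(min(1 + k, T - 1 - i) for i in range(T - 1))

plain = num_signals(10, 0)   # next-token only: T - 1 = 9 signals
dense = num_signals(10, 3)   # 4 targets per position, clipped at the end: 30
```

The same ten tokens yield roughly (1 + k) times as many training signals, which is the data-efficiency gain during pre-training.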

Speculative Decoding Support

Speeds up inference by using the extra predicted tokens as drafts for speculative decoding, as implemented in GLM-4.5.
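A minimal sketch of the draft-and-verify loop behind speculative decoding, assuming greedy verification. `target_next` is a toy stand-in for the main model, not GLM-4.5's implementation.

```python
def speculative_step(draft_tokens, target_next):
    """Accept the longest prefix of the draft that the target model agrees with.

    `target_next(prefix)` returns the target model's greedy token after
    `prefix`. On a mismatch, the target's own token replaces the draft token
    and verification stops; every accepted token costs one verification pass
    instead of one full sequential decode step.
    """
    accepted = []
    for tok in draft_tokens:
        t = target_next(accepted)
        if t == tok:
            accepted.append(tok)
        else:
            accepted.append(t)  # target's correction
            break
    return accepted

# Toy target model: always continues the sequence 1, 2, 3, ...
target = lambda prefix: len(prefix) + 1

full_accept = speculative_step([1, 2, 3], target)  # drafts all match
corrected = speculative_step([1, 9, 3], target)    # mismatch at position 2
```

In the worst case the step still yields one valid token (the target's correction), so output quality is unchanged; only latency improves.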

Improved Code Generation Accuracy

Achieves higher accuracy on code generation tasks, for example 95% accuracy with 4-token prediction versus 80% for a single-token baseline.

Integration with Mixture-of-Experts Architectures

Works with MoE models like DeepSeek-V3, which activates a subset of parameters per token for efficient training and inference.
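A rough sketch of the per-token expert routing that MoE models use. The gate, the expert matrices, and the `top_k` value here are illustrative assumptions, not DeepSeek-V3's actual configuration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
d, n_experts, top_k = 4, 8, 2

W_gate = rng.normal(size=(d, n_experts))            # router
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(x):
    """Route one token through its top-k experts only."""
    scores = softmax(x @ W_gate)
    chosen = np.argsort(scores)[-top_k:]             # top-k expert indices
    weights = scores[chosen] / scores[chosen].sum()  # renormalized gate weights
    y = sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))
    return y, chosen

x = rng.normal(size=d)
y, chosen = moe_forward(x)
```

Only `top_k` of the `n_experts` expert matrices are evaluated per token, which is why compute per token stays small even as total parameter count grows.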