Multi-Token Prediction
Multi-Token Prediction (MTP) is a training objective and architectural technique used in large language models to predict multiple future tokens at each position, rather than only the immediate next token. By extending the prediction scope beyond the next token, MTP densifies the training signal, which can improve data efficiency and overall performance on evaluation benchmarks.

Models such as DeepSeek-V3 and GLM-4.5 implement MTP to enhance training and inference. DeepSeek-V3, for example, is a 671-billion-parameter mixture-of-experts model that activates 37 billion parameters per token and combines MTP with Multi-head Latent Attention for efficient training and inference. GLM-4.5 incorporates an MTP layer, after pre-training on large corpora of general and code/reasoning tokens, to support speculative decoding during inference.
Multi-Token Prediction enables simultaneous prediction of multiple future tokens per position to improve training efficiency and model performance.
Code Generation
Applying MTP when training on large code corpora to improve code generation accuracy and decoding speed.
Efficient Large Language Model Training
Incorporating MTP into training pipelines to densify training signals and improve data efficiency.
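The densified training signal can be made concrete with a minimal NumPy sketch. This is an illustrative assumption, not the implementation used by DeepSeek-V3 or GLM-4.5: the hypothetical `mtp_targets` builds, for every position, the next `k` ground-truth tokens, and `mtp_cross_entropy` averages the loss over all positions and all `k` prediction depths, so each position contributes `k` supervision signals instead of one.

```python
import numpy as np

def mtp_targets(tokens, k):
    """For each position t, collect the k future tokens
    tokens[t+1 .. t+k]. Positions lacking k future tokens are
    dropped. Returns shape (T - k, k)."""
    T = len(tokens)
    return np.stack([tokens[t + 1 : t + 1 + k] for t in range(T - k)])

def mtp_cross_entropy(logits, targets):
    """Average cross-entropy over all positions and all k depths.
    logits: (P, k, V) unnormalized scores; targets: (P, k) token ids."""
    # Numerically stable log-softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    P, k = targets.shape
    # Pick the log-probability of each gold token at each depth.
    picked = log_probs[np.arange(P)[:, None], np.arange(k)[None, :], targets]
    return -picked.mean()

tokens = np.array([3, 1, 4, 1, 5, 9])
targets = mtp_targets(tokens, k=2)      # 4 positions x 2 depths
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 2, 10))    # toy model outputs, vocab of 10
loss = mtp_cross_entropy(logits, targets)
```

Setting `k=1` recovers ordinary next-token prediction, which is one way to see MTP as a strict generalization of the standard objective.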
Faster Inference via Speculative Decoding
Using MTP-enabled models such as GLM-4.5 to draft several tokens per step, which the main model then verifies, accelerating inference via speculative decoding.
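The verify-and-accept loop at the heart of speculative decoding can be sketched in pure Python. This is a simplified greedy variant under stated assumptions: `draft_next` and `target_next` are hypothetical stand-ins for the MTP draft head and the main model, each mapping a token sequence to its greedy next token, and verification is simulated token by token (a real implementation scores all proposed tokens in a single target forward pass).

```python
from typing import Callable, List

def speculative_decode_greedy(
    target_next: Callable[[List[int]], int],
    draft_next: Callable[[List[int]], int],
    prompt: List[int],
    k: int,
    max_new: int,
) -> List[int]:
    """Greedy speculative decoding sketch: the cheap draft model
    proposes k tokens; the target model verifies them, accepting the
    agreeing prefix and emitting its own token at the first mismatch.
    Output is identical to pure greedy decoding with the target alone."""
    seq = list(prompt)
    new = 0
    while new < max_new:
        # Draft proposes k tokens autoregressively.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies the proposal.
        for t in proposal:
            if new >= max_new:
                break
            want = target_next(seq)
            seq.append(want)
            new += 1
            if want != t:
                break  # divergence: discard the rest of the proposal
        else:
            # All k accepted; the verification pass yields one bonus token.
            if new < max_new:
                seq.append(target_next(seq))
                new += 1
    return seq
```

Because the accepted output always matches what the target model would have produced greedily, the speedup comes purely from verifying several drafted tokens per target forward pass rather than from changing the output distribution.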