Getting Started

How to get started with Multi-Token Prediction

1

Review Technical Reports

Read the implementation details of multi-token prediction (MTP) in papers such as the DeepSeek-V3 technical report on arXiv to understand how the output heads and softmax are used to predict several future tokens.

2

Access Pre-trained Models

Obtain pre-trained MTP models from platforms such as Dataloop.ai, for example a 7B-parameter model trained on 1T tokens of code.

3

Integrate MTP Modules

Add MTP modules after the transformer layers in your training pipeline so that the model is trained to predict several future tokens at each position rather than only the next one.
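To make the training objective concrete, here is a minimal pure-Python sketch of how multi-token targets and a combined loss could be built. It is an illustration under simplifying assumptions, not the method of any particular report: the names `mtp_targets`, `mtp_loss`, and `weight` are hypothetical, each depth is assumed to have its own output head producing logits per position, and depth 1 is treated as ordinary next-token prediction.

```python
import math

def mtp_targets(tokens, num_future):
    """For each depth d in 1..num_future, pair position i with the token at i + d."""
    targets = {}
    for d in range(1, num_future + 1):
        # Only positions whose d-th future token exists get a target.
        targets[d] = [(i, tokens[i + d]) for i in range(len(tokens) - d)]
    return targets

def cross_entropy(logits, target):
    """Softmax cross-entropy for one position, using a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def mtp_loss(per_depth_logits, tokens, weight=1.0):
    """Average the per-depth cross-entropy losses and scale by `weight`.

    per_depth_logits[d][i] holds the logits that the (hypothetical) head for
    depth d emits at position i.
    """
    num_future = len(per_depth_logits)
    total = 0.0
    for d, pairs in mtp_targets(tokens, num_future).items():
        if not pairs:
            continue  # sequence too short for this depth
        losses = [cross_entropy(per_depth_logits[d][i], t) for i, t in pairs]
        total += sum(losses) / len(losses)
    return weight * total / num_future
```

In a real pipeline the logits would come from the MTP modules stacked on the transformer, and this term would typically be added to the main next-token loss with a small weight.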

4

Benchmark Performance

Evaluate the model on code and math benchmarks, and compare the results against a single-token (next-token) prediction baseline.
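A comparison against the baseline can be as simple as computing per-benchmark accuracy for both models and reporting the difference. The sketch below assumes each benchmark yields a list of pass/fail outcomes; the function names and the benchmark keys `"code"` and `"math"` are hypothetical, and the sample outcomes are made-up placeholders, not real results.

```python
def accuracy(results):
    """Fraction of passed items; `results` is a non-empty list of booleans."""
    return sum(results) / len(results)

def compare_to_baseline(baseline, mtp):
    """Return {benchmark: (baseline_acc, mtp_acc, delta)} for shared benchmarks."""
    report = {}
    for name in sorted(set(baseline) & set(mtp)):
        b, m = accuracy(baseline[name]), accuracy(mtp[name])
        report[name] = (b, m, m - b)
    return report

# Placeholder outcomes for illustration only (not measured data).
baseline_runs = {"code": [True, False, False, True], "math": [True, True, False, False]}
mtp_runs = {"code": [True, True, False, True], "math": [True, True, False, False]}

for name, (b, m, delta) in compare_to_baseline(baseline_runs, mtp_runs).items():
    print(f"{name}: baseline={b:.2f} mtp={m:.2f} delta={delta:+.2f}")
```

Running both models on the same prompts with the same decoding settings keeps the delta attributable to the training objective rather than to the evaluation setup.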

5

Enable Speculative Decoding

At inference time, enable speculative decoding if the model supports it (GLM-4.5, for example).
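The idea behind speculative decoding can be sketched in a few lines: a cheap drafter (for instance, MTP heads) proposes several tokens, and the main model verifies them, keeping the longest agreeing prefix. The code below is a greedy-verification sketch with hypothetical names (`speculative_step`, `draft_fn`, `target_fn`) and toy stand-in models; it does not reflect the API of GLM-4.5 or any serving framework, and a real system would verify all drafts in one batched forward pass rather than in a loop.

```python
def speculative_step(prefix, draft_fn, target_fn, num_draft):
    """One greedy speculative-decoding step.

    draft_fn(prefix, n) -> list of n proposed tokens (the cheap drafter).
    target_fn(prefix)   -> the main model's greedy next token for `prefix`.
    Returns the tokens to append; they match plain greedy decoding.
    """
    accepted = []
    for tok in draft_fn(prefix, num_draft):
        expected = target_fn(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the main model's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Every draft was accepted; verification also yields one extra token.
    accepted.append(target_fn(prefix + accepted))
    return accepted

# Toy stand-ins: the "main model" counts upward modulo 10.
def toy_target(prefix):
    return (prefix[-1] + 1) % 10

# The toy drafter is right for the first two tokens, then guesses 0.
def toy_draft(prefix, n):
    last = prefix[-1]
    return [(last + 1) % 10, (last + 2) % 10, 0][:n]

new_tokens = speculative_step([3], toy_draft, toy_target, num_draft=3)
```

Each step emits at least one token, so the output never falls behind ordinary decoding; the speedup depends on how often the drafts are accepted.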