- Without Model2vec: Distillation introduces a small performance drop compared to full transformer models.
- Without Model2vec: Static token averaging approach is less flexible than dynamic transformer models for some tasks.
Challenges that Model2vec addresses