Strengths
- Reduces model size by up to 50x, enabling deployment on resource-constrained devices.
- Inference is up to 500x faster on CPU than with the original Sentence Transformer models.
- Minimal performance loss relative to original transformer models.
- Runs locally without requiring external API calls.
- Easy integration with Hugging Face Hub and popular vector databases.
Limitations
- Distillation incurs a small but nonzero accuracy drop compared to the full transformer models it is derived from.
- The static token-averaging approach assigns each token a fixed vector, so it cannot produce context-dependent representations; tasks that hinge on word sense or word order suit dynamic transformer models better.
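To make the last limitation concrete, here is a minimal sketch of static token averaging using a toy, hypothetical embedding table (the vectors and vocabulary are illustrative, not taken from any real model): a sentence embedding is simply the mean of its tokens' precomputed vectors, so a token like "bank" contributes the exact same vector regardless of context.

```python
import numpy as np

# Toy static embedding table (hypothetical values). In a distilled static
# model, each token's vector is precomputed once and never changes.
EMBEDDINGS = {
    "bank":  np.array([0.2, 0.8, 0.1]),
    "river": np.array([0.9, 0.1, 0.3]),
    "money": np.array([0.1, 0.7, 0.9]),
}

def embed(sentence: str) -> np.ndarray:
    """Sentence embedding = mean of the static vectors of its known tokens."""
    tokens = [t for t in sentence.lower().split() if t in EMBEDDINGS]
    return np.mean([EMBEDDINGS[t] for t in tokens], axis=0)

# "bank" contributes an identical vector in both sentences, even though its
# meaning differs; a transformer would produce context-dependent vectors.
river_bank = embed("river bank")
money_bank = embed("money bank")
```

This fixed lookup is exactly what makes static models so fast on CPU (no attention layers to evaluate), and also why they cannot disambiguate word senses.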