- Reduces model size by up to 50x, enabling deployment on resource-constrained devices.
- Runs inference up to 500x faster on CPU than the original Sentence Transformers models.
- Incurs minimal performance loss relative to the original transformer models.
- Runs locally without requiring external API calls.