Key Features

What you can do

Model Size Reduction

Shrinks Sentence Transformer models by up to 50x, producing models of roughly 8 MB to 30 MB.

Accelerated CPU Inference

Speeds up inference by up to 500x on CPU by replacing the full transformer forward pass with a lookup of fixed token vectors followed by simple averaging.
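
The lookup-and-average idea can be sketched in a few lines. The vocabulary and vectors below are toy stand-ins (a real model ships a trained token-vector table), but the inference path is the same: no transformer is involved at query time.

```python
import numpy as np

# Hypothetical vocabulary and token-vector table; a real static model
# would load these from a trained checkpoint.
rng = np.random.default_rng(0)
vocab = {"static": 0, "embeddings": 1, "are": 2, "fast": 3}
token_vectors = rng.standard_normal((len(vocab), 4)).astype(np.float32)

def embed(text: str) -> np.ndarray:
    # Look up each known token's fixed vector and average them --
    # a handful of array ops instead of a transformer forward pass.
    ids = [vocab[t] for t in text.lower().split() if t in vocab]
    if not ids:
        return np.zeros(token_vectors.shape[1], dtype=np.float32)
    return token_vectors[ids].mean(axis=0)

print(embed("static embeddings are fast").shape)  # -> (4,)
```

Because embedding is just indexing and a mean, it vectorizes trivially and runs at memory-bandwidth speed on a CPU.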

Integration with Popular Tools

Integrates directly with Milvus, Weaviate, Spice.ai, Sentence Transformers, and LangChain for embedding generation and vector search.
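
As a sketch of how such an integration plugs in: LangChain expects an embedding backend to expose embed_documents and embed_query. The class below is a minimal, self-contained stand-in (the vocabulary, vectors, and whitespace tokenizer are assumptions for illustration, not the library's actual implementation).

```python
import numpy as np

class StaticEmbeddings:
    """Toy embedding backend following the embed_documents /
    embed_query interface that LangChain expects."""

    def __init__(self, vocab: dict, vectors: np.ndarray):
        self.vocab = vocab
        self.vectors = vectors

    def _embed(self, text: str) -> list:
        # Average the fixed vectors of the known tokens.
        ids = [self.vocab[t] for t in text.lower().split() if t in self.vocab]
        if not ids:
            return [0.0] * self.vectors.shape[1]
        return self.vectors[ids].mean(axis=0).tolist()

    def embed_documents(self, texts: list) -> list:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list:
        return self._embed(text)

# Tiny demo with an identity vector table.
vocab = {"vector": 0, "search": 1}
emb = StaticEmbeddings(vocab, np.eye(2, dtype=np.float32))
print(emb.embed_query("vector search"))  # -> [0.5, 0.5]
```

The resulting vectors can then be written to a vector store such as Milvus or Weaviate like any other embeddings.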

Fine-tuning Support

Supports fine-tuning classifiers on static embeddings using PyTorch, Lightning, or scikit-learn for various classification tasks.
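
With scikit-learn this amounts to fitting an ordinary classifier on top of frozen embedding vectors. The sketch below uses synthetic vectors as stand-ins for the output of a static embedding model; only the classifier head is trained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for static embeddings of two classes of documents
# (a real pipeline would embed labeled texts with the static model).
rng = np.random.default_rng(42)
X_pos = rng.normal(loc=1.0, size=(50, 16))
X_neg = rng.normal(loc=-1.0, size=(50, 16))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [0] * 50)

# Train a lightweight classifier head on the frozen embeddings.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(f"train accuracy: {clf.score(X, y):.2f}")
```

A PyTorch or Lightning variant would simply swap the logistic regression for a small linear layer trained with cross-entropy; the embeddings themselves stay fixed either way.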

Open-source and Hugging Face Compatibility

Models can be loaded from Hugging Face Hub or local paths, facilitating easy access and deployment.