Model Size Reduction
Shrinks Sentence Transformer models by up to 50x, yielding static models of roughly 8 MB to 30 MB on disk.
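To see where sizes in that range come from, note that a static model is essentially a vocab_size x dim lookup table of token vectors, so its size is easy to estimate. This is a back-of-the-envelope sketch; the vocabulary size and dimensionality below are illustrative assumptions, not the exact configuration of any published model.

```python
def static_model_size_mb(vocab_size: int, dim: int, bytes_per_value: int = 4) -> float:
    """Approximate on-disk size of a token-embedding table in megabytes."""
    return vocab_size * dim * bytes_per_value / (1024 ** 2)

# e.g. a 30k-token vocabulary with 256-dimensional float32 vectors:
size = static_model_size_mb(30_000, 256)
print(f"{size:.1f} MB")  # ~29.3 MB, in the stated 8-30 MB range
```

Quantizing the table (e.g. float16 or int8 instead of float32) shrinks it proportionally, which is how the smaller end of the range is reached.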
Accelerated CPU Inference
Speeds up CPU inference by up to 500x by replacing full transformer forward passes with precomputed static token vectors and simple mean pooling.
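The lookup-and-average inference step can be sketched in a few lines of NumPy. The vocabulary and vectors below are toy assumptions; a real model ships a trained embedding table and its own tokenizer.

```python
import numpy as np

# Toy vocabulary and a random stand-in for the trained token-vector table.
vocab = {"the": 0, "cat": 1, "sat": 2}
token_vectors = np.random.default_rng(0).normal(size=(len(vocab), 4)).astype(np.float32)

def encode(tokens: list[str]) -> np.ndarray:
    """Embed a token sequence by mean-pooling its static token vectors."""
    ids = [vocab[t] for t in tokens if t in vocab]
    return token_vectors[ids].mean(axis=0)

emb = encode(["the", "cat", "sat"])
print(emb.shape)  # (4,)
```

Because encoding is just a table lookup plus a mean, there is no attention or matrix multiplication per layer, which is where the CPU speedup comes from.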
Integration with Popular Tools
Integrates directly with Milvus, Weaviate, Spice.ai, Sentence Transformers, and LangChain for embedding generation and vector search.
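Independent of which store is used, the vector-search step those integrations perform boils down to ranking documents by cosine similarity to a query embedding. This is a generic NumPy sketch of that step, with random vectors standing in for static-model embeddings; it is not the API of any of the listed tools.

```python
import numpy as np

# Toy document embeddings standing in for static-model output.
rng = np.random.default_rng(42)
doc_vectors = rng.normal(size=(5, 8)).astype(np.float32)

def cosine_search(query: np.ndarray, docs: np.ndarray, top_k: int = 3) -> list[int]:
    """Return indices of the top_k documents most similar to the query."""
    docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    q_n = query / np.linalg.norm(query)
    scores = docs_n @ q_n
    return np.argsort(-scores)[:top_k].tolist()

hits = cosine_search(doc_vectors[0], doc_vectors)
print(hits[0])  # the query document ranks itself first -> 0
```

Vector databases like Milvus and Weaviate implement this ranking at scale with approximate-nearest-neighbor indexes rather than a brute-force matrix product.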
Fine-tuning Support
Supports fine-tuning classifiers on static embeddings using PyTorch, Lightning, or scikit-learn for various classification tasks.
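A minimal version of the scikit-learn route looks like the following: freeze the embeddings and fit a lightweight classifier head on top. The embeddings here are synthetic stand-ins; in practice they would come from encoding your texts with the static model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for static embeddings of two classes of texts,
# drawn from two well-separated Gaussians so the task is learnable.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 16)),
               rng.normal(3.0, 1.0, size=(50, 16))])
y = np.array([0] * 50 + [1] * 50)

# "Fine-tuning" here means fitting only this head; the embedding
# table itself stays frozen.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))
```

The same pattern works with a small PyTorch or Lightning module in place of `LogisticRegression` when you want a nonlinear head or batched GPU training.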
Open-source and Hugging Face Compatibility
Models can be loaded from the Hugging Face Hub or from local paths, making them easy to access and deploy.
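Conceptually, loading a static model from a local path amounts to reading a vocabulary plus a matrix of token vectors from a directory. The file names and layout below are illustrative assumptions, not the actual on-disk format; for real models, Model2Vec's `StaticModel.from_pretrained` is the documented entry point and accepts either a Hub id or a local directory.

```python
import json
from pathlib import Path

import numpy as np

def save_static_model(path: Path, vocab: dict[str, int], vectors: np.ndarray) -> None:
    """Write a toy static model (vocab + vector table) to a directory."""
    path.mkdir(parents=True, exist_ok=True)
    (path / "vocab.json").write_text(json.dumps(vocab))
    np.save(path / "vectors.npy", vectors)

def load_static_model(path: Path) -> tuple[dict[str, int], np.ndarray]:
    """Read the toy static model back from a local directory."""
    vocab = json.loads((path / "vocab.json").read_text())
    vectors = np.load(path / "vectors.npy")
    return vocab, vectors

model_dir = Path("toy_static_model")
save_static_model(model_dir, {"hello": 0, "world": 1}, np.eye(2, dtype=np.float32))
vocab, vectors = load_static_model(model_dir)
print(vectors.shape)  # (2, 2)
```

Hub loading follows the same shape: the files are fetched into a local cache directory first, then read exactly as above.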