Model2Vec
Model2Vec is an open-source Python library that converts Sentence Transformer models into compact static embedding models. It computes a fixed vector for each token in the vocabulary and averages those vectors to produce sentence embeddings, which enables high-throughput CPU inference without running a transformer at inference time. This approach shrinks models by up to 50 times, with the best models around 30 MB and the smallest approximately 8 MB, while accelerating inference by up to 500 times with minimal quality loss compared to the original Sentence Transformers.

The library loads models from the Hugging Face Hub or from local paths and integrates with several vector database and AI frameworks, including Milvus, Weaviate, Spice.ai, Sentence Transformers, and LangChain. It also supports fine-tuning classifiers on top of the static embeddings using PyTorch, Lightning, or scikit-learn, for both single-label and multi-label classification tasks. On benchmark tests it outperforms other static embedding methods such as GloVe and BPEmb.

Model2Vec is open-source and free to use, with models hosted on the Hugging Face Hub and optional authentication tokens for private models. Its target users are developers who need efficient embedding-based applications on resource-constrained devices or who prefer local CPU inference without relying on external APIs.
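The token-averaging idea behind static embeddings can be sketched in a few lines of numpy. The tokenizer, vocabulary, and vectors below are hypothetical stand-ins for illustration, not Model2Vec's actual data or API:

```python
import numpy as np

# Hypothetical static embedding table: one fixed vector per token id.
rng = np.random.default_rng(0)
vocab = {"fast": 0, "cpu": 1, "inference": 2, "static": 3, "embeddings": 4}
token_vectors = rng.normal(size=(len(vocab), 4)).astype(np.float32)

def encode(sentence: str) -> np.ndarray:
    """Embed a sentence by averaging the fixed vectors of its known tokens."""
    ids = [vocab[t] for t in sentence.lower().split() if t in vocab]
    if not ids:
        return np.zeros(token_vectors.shape[1], dtype=np.float32)
    # A single table lookup plus a mean: no transformer pass at runtime.
    return token_vectors[ids].mean(axis=0)

emb = encode("fast cpu inference")
```

Because encoding is just a lookup and an average, throughput on CPU scales with tokenization speed rather than with transformer depth.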
Model2Vec distills Sentence Transformer models into compact static embeddings that enable fast CPU inference with significantly reduced model size.
Embedding-based Applications on Resource-Constrained Devices
Developers needing to deploy embedding models on devices with limited memory and CPU resources.
Local CPU Inference without External APIs
Applications requiring fast embedding generation locally without relying on external API calls.
Integration with Vector Databases
Use with vector search platforms like Milvus and Weaviate for efficient similarity search and retrieval.
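In production, engines like Milvus or Weaviate handle indexing and retrieval at scale; the brute-force cosine search below is only a sketch of the underlying similarity-search idea, using toy vectors in place of real Model2Vec embeddings:

```python
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> np.ndarray:
    """Return indices of the k corpus vectors most cosine-similar to query."""
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = corpus_n @ query_n        # cosine similarity against every row
    return np.argsort(-scores)[:k]     # highest-scoring indices first

# Toy 2-d vectors standing in for sentence embeddings.
corpus = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.05])
nearest = top_k(query, corpus)
```

A vector database replaces this linear scan with an approximate index, but the embedding-to-retrieval contract is the same.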
Installation and usage:
- pip install model2vec installs the library; add the [training] extra to enable fine-tuning features.
- StaticModel.from_pretrained('minishlab/potion-base-8M') loads a model from the Hugging Face Hub.
- model.encode(["sentence1", "sentence2"]) returns sentence embeddings.
- For Milvus, pip install "pymilvus[model]" and create an embedding function with model.dense.Model2VecEmbeddingFunction(model_source='minishlab/potion-base-8M').
- Pass a local path as model_source when loading models from disk.