Leann
Leann is an open-source semantic search backend optimized for Retrieval-Augmented Generation (RAG) applications. It is highly storage-efficient, saving roughly 97% of the space required by traditional vector databases. Because it runs entirely locally, with no dependence on cloud services, users can query private data sources such as Slack messages or Twitter posts securely on their own machines. Developed by Berkeley SkyLab, Leann uses an adaptive search pipeline that combines coarse-grained filtering with accurate retrieval, alongside optimizations such as GPU batching, ZMQ communication that transmits distances rather than full embeddings, CPU/GPU overlapping, and selective caching of high-degree graph nodes, keeping search fast while storage overhead stays minimal. Leann works with multiple large language model (LLM) providers through OpenAI-compatible APIs, including HuggingFace and Ollama. It is distributed primarily through its GitHub repository and can be installed quickly from PyPI. The tool targets developers and researchers building local AI agents and semantic search applications that prioritize privacy and low storage requirements.
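The storage saving described above comes from not keeping a full embedding for every item: embeddings can be recomputed on demand during graph traversal, with only high-degree "hub" nodes cached. The sketch below illustrates that idea in miniature. It is not Leann's actual code; every name here (embed, dist, Index, cache_degree) is invented for this example, and the toy hash-based embedding stands in for a real model.

```python
import heapq
import math
import random

def embed(text):
    """Stand-in embedding: a deterministic pseudo-random 8-dim vector."""
    rng = random.Random(hash(text) % (2 ** 32))
    return [rng.random() for _ in range(8)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Index:
    def __init__(self, docs, neighbors, cache_degree=3):
        self.docs = docs
        self.neighbors = neighbors  # adjacency lists of the proximity graph
        # Selective caching: store embeddings only for high-degree nodes;
        # all other embeddings are recomputed at query time, saving storage.
        self.cache = {i: embed(docs[i]) for i in range(len(docs))
                      if len(neighbors[i]) >= cache_degree}

    def embedding(self, i):
        cached = self.cache.get(i)
        return cached if cached is not None else embed(self.docs[i])

    def search(self, query, k=2, entry=0):
        """Greedy best-first traversal of the graph from an entry node."""
        q = embed(query)
        visited = {entry}
        d0 = dist(q, self.embedding(entry))
        candidates = [(d0, entry)]   # min-heap of nodes still to expand
        results = [(d0, entry)]      # every node whose distance we computed
        while candidates:
            d, node = heapq.heappop(candidates)
            # Stop once the closest unexpanded node is worse than our top-k.
            if len(results) >= k and d > sorted(results)[k - 1][0]:
                break
            for nb in self.neighbors[node]:
                if nb not in visited:
                    visited.add(nb)
                    nd = dist(q, self.embedding(nb))
                    heapq.heappush(candidates, (nd, nb))
                    results.append((nd, nb))
        results.sort()
        return [self.docs[i] for _, i in results[:k]]
```

A query whose text matches a stored document has distance zero to it and is returned first; the trade-off is extra compute at search time in exchange for not persisting most vectors, which is the balance the optimizations above (batching, caching, overlap) are meant to tip back toward speed.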
Leann is an open-source, lightweight semantic search backend designed for efficient, privacy-focused RAG applications with substantial storage savings.
Local Semantic Search over Private Data
Query private Slack messages or Twitter posts without sending data to the cloud.
Building Local AI Agents with Long-Term Memory
Connect personal data sources to create AI agents that run locally while preserving privacy.
Installation and configuration:
Install from PyPI with uv pip install leann, or clone the repository and install its dependencies using the provided commands.
To use an OpenAI-compatible provider, set your API key in the environment: export OPENAI_API_KEY="your-api-key-here".
Pass --llm openai --llm-model <model> to select the generation model, or --embedding-mode openai --embedding-model <model> to select the embedding model.