TruLens
TruLens is an open-source Python library for evaluating and tracing AI agents, retrieval-augmented generation (RAG) systems, and other large language model (LLM) applications. Its feedback functions score an app's inputs, outputs, and intermediate results programmatically, so quality assessment can scale beyond manual human review. Supported evaluation metrics include groundedness, context relevance, and answer relevance. TruLens pairs these metrics with OpenTelemetry-based tracing of the app's execution flow, including retrieved context, tool calls, and plans, so developers can compare app versions on a metrics leaderboard. It integrates with LLM providers such as OpenAI and Google Gemini; each provider requires its own additional package (for example, trulens-providers-openai). Instrumentation tools such as decorators and wrappers trace LLM applications with little or no change to existing code. A dashboard visualizes experiments, compares app versions, and shows evaluation metrics. The library is free, distributed via PyPI, and aimed at developers building and iterating on LLM-based applications in Python.
TruLens is an open-source Python library for evaluating and tracing AI agents and LLM applications using feedback functions and OpenTelemetry tracing.
LLM Application Evaluation
Evaluating and tracing AI agents, RAG systems, and summarization pipelines to measure quality metrics and compare app versions.
Run pip install trulens trulens-providers-openai to install the core library and the OpenAI provider package.
Use the @instrument() decorator or the TruApp wrapper to trace your LLM app, and define feedback functions such as groundedness and answer relevance.
Set your OpenAI API key, for example os.environ["OPENAI_API_KEY"] = "your_key_here".
Run from trulens.dashboard import run_dashboard; run_dashboard(session) to visualize results.
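The ideas behind this workflow (tracing calls with a decorator, scoring them with a feedback function, and ranking app versions on a leaderboard) can be illustrated with a small standard-library Python sketch. It deliberately does not use the TruLens API: traced, answer_relevance, leaderboard, app_v1, and app_v2 are hypothetical names invented for this illustration, and the token-overlap score is only a toy stand-in for a real relevance metric.

```python
import functools
import re
from statistics import mean

# Hypothetical stand-ins for illustration only; none of these are TruLens APIs.
TRACE = []  # every traced call appends one span (a plain dict) here


def traced(fn):
    """Decorator that records the name, arguments, and output of each call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        TRACE.append(
            {"name": fn.__name__, "args": args, "kwargs": kwargs, "output": result}
        )
        return result
    return wrapper


def _tokens(text):
    """Lowercased set of alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_relevance(question, answer):
    """Toy feedback function: fraction of question tokens that reappear in the answer."""
    question_tokens = _tokens(question)
    if not question_tokens:
        return 0.0
    return len(question_tokens & _tokens(answer)) / len(question_tokens)


@traced
def app_v1(question):
    # A deliberately unhelpful "app version".
    return "I am not sure."


@traced
def app_v2(question):
    # A version that at least echoes the question back in its answer.
    return f"Regarding '{question}': TruLens evaluates and traces LLM apps."


def leaderboard(apps, questions, feedback):
    """Average the feedback score per app version, best version first."""
    scores = {
        version: mean(feedback(q, app(q)) for q in questions)
        for version, app in apps.items()
    }
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


questions = ["What does TruLens evaluate?", "How does TruLens trace apps?"]
board = leaderboard({"v1": app_v1, "v2": app_v2}, questions, answer_relevance)
print(board)
print(len(TRACE), "spans recorded")
```

In this sketch the feedback function is an ordinary callable that maps a question and an answer to a score between 0 and 1, which is what lets the same scoring logic be applied to every recorded call and averaged per app version.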