Infrastructure & MLOps

TruLens

TruLens is an open-source Python library for evaluating and tracing AI agents, retrieval-augmented generation (RAG) systems, and other large language model (LLM) applications. Its feedback functions score inputs, outputs, and intermediate results programmatically, which helps scale human review of quality. Built-in metrics include groundedness, context relevance, and answer relevance. TruLens combines these with OpenTelemetry-based tracing that records an app's execution flow, including retrieved context, tool calls, and plans, so developers can compare app versions on a metrics leaderboard. It integrates with LLM providers such as OpenAI and Google Gemini; each integration requires an additional provider package. Decorators and wrappers instrument an LLM application with minimal changes to existing code, and a dashboard visualizes experiments, compares app versions, and displays evaluation metrics. The library is free, distributed via PyPI, and aimed at developers building and iterating on LLM-based applications in Python.

Updated Dec 16, 2025 · open-source

Overview

TruLens is an open-source Python library for evaluating and tracing AI agents and LLM applications using feedback functions and OpenTelemetry tracing.

Pricing

Free and open-source

LLM Application Evaluation

Evaluating and tracing AI agents, RAG systems, and summarization pipelines to measure quality metrics and compare app versions.
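
To make "compare app versions" concrete, here is a minimal sketch of a metrics leaderboard in plain Python. It is not the TruLens implementation: the records, version names, scores, and the build_leaderboard helper are all hypothetical, and in practice the scores would come from feedback functions rather than being typed in by hand.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-record scores in [0, 1], one row per evaluated app call.
records = [
    {"app_version": "v1", "groundedness": 0.60, "answer_relevance": 0.80},
    {"app_version": "v1", "groundedness": 0.70, "answer_relevance": 0.90},
    {"app_version": "v2", "groundedness": 0.90, "answer_relevance": 0.85},
    {"app_version": "v2", "groundedness": 0.80, "answer_relevance": 0.95},
]


def build_leaderboard(records, metrics):
    """Average each metric per app version, best first by the first metric."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["app_version"]].append(record)

    rows = []
    for version, version_records in grouped.items():
        row = {"app_version": version}
        for metric in metrics:
            row[metric] = round(mean(r[metric] for r in version_records), 3)
        rows.append(row)

    # Rank versions by the first metric, highest average first.
    rows.sort(key=lambda row: row[metrics[0]], reverse=True)
    return rows


leaderboard = build_leaderboard(records, ["groundedness", "answer_relevance"])
for row in leaderboard:
    print(row)
```

With this sample data, v2 ranks above v1 because its average groundedness is higher.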

Quick Start

Install TruLens and Provider Packages

Run pip install trulens trulens-providers-openai to install the core library and OpenAI provider package.
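
After installing, it can help to confirm that both distributions are visible to the interpreter you plan to run. The sketch below uses only the standard library; the installed_versions helper is a hypothetical name, and the package names are the ones from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


status = installed_versions(["trulens", "trulens-providers-openai"])
for name, installed in status.items():
    print(f"{name}: {installed or 'not installed'}")
```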

Instrument Your Application

Use the @instrument() decorator or the TruApp wrapper to trace your LLM app, then define feedback functions such as groundedness and answer relevance.
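
The following sketch illustrates the two ideas in this step, tracing via a decorator and scoring via a feedback function, without depending on TruLens. Everything in it is an assumption made for illustration: traced, TRACE, retrieve, answer, and toy_answer_relevance are hypothetical names, the tracer only appends to an in-memory list, and the word-overlap score is a toy stand-in for an LLM-as-judge metric, not how TruLens computes answer relevance.

```python
import functools
import time

TRACE = []  # in-memory span log; a real tracer would export spans instead


def traced(func):
    """Record each call's name, arguments, result, and duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACE.append({
            "name": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper


@traced
def retrieve(query):
    # Stand-in for a vector-store lookup.
    return ["TruLens evaluates and traces LLM applications."]


@traced
def answer(query):
    context = retrieve(query)
    # Stand-in for an LLM call that uses the retrieved context.
    return f"Based on the docs: {context[0]}"


def toy_answer_relevance(question, response):
    """Toy score in [0, 1]: share of question words that appear in the response."""
    question_words = set(question.lower().split())
    response_words = set(response.lower().split())
    if not question_words:
        return 0.0
    return len(question_words & response_words) / len(question_words)


question = "What does TruLens do"
response = answer(question)
score = toy_answer_relevance(question, response)
print([span["name"] for span in TRACE], score)
```

The inner retrieve call finishes first, so it is logged before answer. Only one of the four question words ("trulens") appears in the response, so the toy score here is 0.25, which also shows why simple overlap is a poor substitute for a real relevance metric.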

Set API Keys

Configure environment variables with your LLM provider API keys, for example, os.environ["OPENAI_API_KEY"] = "your_key_here".
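
Hardcoding a key in source is easy to leak, so one common alternative is to export it in the shell and have the script fail early when it is absent. A minimal sketch, assuming the OpenAI provider; missing_env is a hypothetical helper name:

```python
import os


def missing_env(names):
    """Return the names of environment variables that are unset or blank."""
    return [name for name in names if not os.environ.get(name, "").strip()]


missing = missing_env(["OPENAI_API_KEY"])
if missing:
    print("Set these variables before running the app:", ", ".join(missing))
else:
    print("All required API keys are set.")
```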

Run Your Instrumented Application

Execute your app to generate traces and evaluations automatically.

Launch the Dashboard

Import and run the dashboard with from trulens.dashboard import run_dashboard; run_dashboard(session), where session is your TruSession instance, to visualize results.
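
Spelled out as a function, the dashboard step might look like the sketch below. It assumes the TruLens 1.x layout, in which TruSession is importable from trulens.core; check the documentation for your installed version. The imports sit inside the function so that the module still loads when TruLens is not installed, and launch_dashboard is a hypothetical wrapper name.

```python
def launch_dashboard():
    """Start the TruLens dashboard for the default session."""
    # Imported lazily so this file can be loaded without TruLens installed.
    # Assumption: TruSession lives in trulens.core (TruLens 1.x layout).
    from trulens.core import TruSession
    from trulens.dashboard import run_dashboard

    session = TruSession()
    run_dashboard(session)
    return session
```

Call launch_dashboard() after running the instrumented app so there are traces and evaluations to display.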

Assessment

Strengths

Integrates evaluation into application development with minimal code changes.
Supports both ground truth and reference-free (LLM-as-judge) evaluation methods.
OpenTelemetry traces enable compatibility with existing observability tools.
Includes out-of-the-box feedback functions for common quality metrics.
Community-driven with extensibility for custom feedback functions.

Limitations

Requires additional provider packages for specific LLM integrations.
Depends on external API keys and credentials for LLM providers.
Ground truth evaluations require prepared datasets for initial experiments.
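
To show what a prepared dataset for ground truth evaluation involves, here is a minimal sketch in plain Python. The golden set, the canned app function, and the exact_match scorer are all hypothetical; a real setup would call the instrumented app and would likely use a more forgiving comparison than exact match.

```python
from statistics import mean

# Hypothetical golden set: each row pairs a query with its expected answer.
golden_set = [
    {"query": "capital of France", "expected": "Paris"},
    {"query": "largest planet", "expected": "Jupiter"},
    {"query": "author of Hamlet", "expected": "Shakespeare"},
]


def app(query):
    # Stand-in for the LLM application under test, with one wrong answer.
    canned = {
        "capital of France": "paris ",
        "largest planet": "Saturn",
        "author of Hamlet": "Shakespeare",
    }
    return canned[query]


def exact_match(expected, actual):
    """Return 1.0 when the answers match after trimming and lowercasing."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


scores = [exact_match(row["expected"], app(row["query"])) for row in golden_set]
accuracy = mean(scores)
print(scores, round(accuracy, 3))
```

Two of the three canned answers match their expected values, so the accuracy on this sample is two thirds.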