TruLens is now OpenTelemetry Compatible 🎉 Learn how to leverage it for agents!

TruLens: Evals and Tracing for Agents

Evaluate and Trace AI Agents

Evaluate, iterate faster, and select your best AI agent with TruLens.

TruLens: Move from Vibes to Metrics

Ship agentic workflows to production, faster. TruLens helps you objectively measure the quality and effectiveness of your AI agent. Evaluate critical components of your app's execution flow (retrieved context, tool calls, plans, and more) so you can evaluate experiments quickly and at scale. Use it for agents, RAG, summarization, and beyond.

Evaluate

Evaluate how your choices are performing across multiple metrics, such as:

  • Groundedness
  • Context Relevance
  • Coherence

Iterate

Leverage and add to an extensible library of built-in metrics. Observe where apps have weaknesses to inform iteration on prompts, hyperparameters, and more.

Test

Compare different LLM apps on a metrics leaderboard to pick the best-performing one.

How it works

[Figure: TruLens diagram]

Why Use TruLens to Validate Your AI Agent?

The fastest, easiest way to validate your AI Agent.

Interoperable tracing.

TruLens emits and evaluates OpenTelemetry traces, making it easy to integrate with your existing observability stack.

Scalable, trusted evals.

TruLens provides trusted, benchmarked evals to evaluate your agent's performance. Read more about our benchmarks and optimization process.

Get the breadth of metrics you need to evaluate app performance.

TruLens evaluates AI agents with metrics to measure their performance and minimize risk:

  • Context Relevance
  • Groundedness
  • Answer Relevance
  • Comprehensiveness
  • Harmful or toxic language
  • User sentiment
  • Language mismatch
  • Fairness and bias
  • Or other custom metrics you provide
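As a rough illustration of the last item, the sketch below assumes a custom metric can be written as a plain Python function that returns a score between 0 and 1. The function name `token_overlap_score` and its scoring rule are hypothetical, chosen only for this example. It is a crude stand-in for a groundedness-style check, not how TruLens computes any of its built-in metrics.

```python
import re


def token_overlap_score(context: str, answer: str) -> float:
    """Toy metric: fraction of distinct answer tokens that also occur in the context.

    Hypothetical illustration only; not the TruLens groundedness implementation.
    """
    def tokenize(text: str) -> set:
        # Lowercase and keep alphanumeric runs as tokens.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0  # An empty answer has nothing to score.
    context_tokens = tokenize(context)
    return len(answer_tokens & context_tokens) / len(answer_tokens)


score = token_overlap_score(
    context="The colossal squid has the largest eye of any living creature.",
    answer="The colossal squid has the largest eye.",
)
```

Here every answer token appears in the context, so the score is at the top of the range. A real metric would usually rely on a model-based judgment rather than token overlap.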

TruLens can work with any AI Agent

Use TruLens for any AI Agent via the Python SDK or by ingesting OpenTelemetry traces.

TruLens is loved by thousands of users for applications such as:

  • Agents
  • Retrieval Augmented Generation (RAG)
  • Summarization
  • Co-pilots
Use TruLens to identify the best-performing version of your agent:

  • Quickly compare metrics across versions and identify trace-level regressions.
  • Make informed trade-offs between accuracy, reliability, cost, and latency.
  • See how the execution flow of your agent changes across versions.
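The comparison described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the TruLens API. The `scores` data, the `leaderboard` and `regressions` helpers, and the 0.2 threshold are all hypothetical. It assumes each version has per-trace metric scores in the range 0 to 1.

```python
from statistics import mean

# Hypothetical per-trace scores: version -> trace id -> metric -> score in [0, 1].
scores = {
    "v1": {
        "t1": {"groundedness": 0.9, "answer_relevance": 0.8},
        "t2": {"groundedness": 0.7, "answer_relevance": 0.9},
    },
    "v2": {
        "t1": {"groundedness": 0.95, "answer_relevance": 0.85},
        "t2": {"groundedness": 0.4, "answer_relevance": 0.9},
    },
}


def leaderboard(scores):
    """Return (version, {metric: mean score}) pairs, best overall mean first."""
    rows = []
    for version, traces in scores.items():
        metrics = sorted({m for t in traces.values() for m in t})
        averages = {m: mean(t[m] for t in traces.values() if m in t) for m in metrics}
        rows.append((version, averages))
    return sorted(rows, key=lambda row: mean(row[1].values()), reverse=True)


def regressions(scores, old, new, threshold=0.2):
    """List (trace id, metric, old score, new score) where the score dropped by more than threshold."""
    found = []
    for trace_id, old_metrics in scores[old].items():
        new_metrics = scores[new].get(trace_id, {})
        for metric, old_score in old_metrics.items():
            new_score = new_metrics.get(metric)
            if new_score is not None and old_score - new_score > threshold:
                found.append((trace_id, metric, old_score, new_score))
    return found


board = leaderboard(scores)
drops = regressions(scores, "v1", "v2")
```

In this made-up data, `v2` improves on trace `t1` but drops sharply in groundedness on trace `t2`. An aggregate score alone would not show which trace regressed.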

Get started using TruLens today

You are critical to the ongoing success of TruLens. We encourage you to get started and provide ample feedback, so that TruLens improves over time.

Download

Get started with pip install trulens.

Documentation

Read the library documentation.

Community

Come join the TruLens community on the AI R&D Discourse Forum.

TruLens is shepherded by Snowflake

Originally created by TruEra, TruLens is a community-driven open source project used by thousands of developers to make credible LLM apps faster. Since acquiring TruEra, Snowflake has actively overseen and supported the development of TruLens in open source. Read more about Snowflake's commitment to growing TruLens in open source.

Why a colossal squid?

The colossal squid's eyeball is about the size of a soccer ball, making it the largest eyeball of any living creature. In addition, did you know that its eyeball contains light organs? That means that colossal squids have automatic headlights when looking around. We're hoping to bring similar guidance to model developers when creating, introspecting, and debugging neural networks. Read more about the amazing eyes of the colossal squid.