Evaluate and Track LLM Applications

Evaluate, iterate faster, and select your best LLM app with TruLens.

TruLens: Don't just vibe check your LLM app!

Create credible and powerful LLM apps, faster. TruLens is a software tool that helps you to objectively measure the quality and effectiveness of your LLM-based applications using feedback functions. Feedback functions help to programmatically evaluate the quality of inputs, outputs, and intermediate results, so that you can expedite and scale up experiment evaluation. Use it for a wide variety of use cases including question answering, summarization, retrieval-augmented generation, and agent-based applications.

Evaluate

Evaluate how your choices are performing across multiple feedback functions, such as:

  • Groundedness
  • Context Relevance
  • Safety

Iterate

Leverage and add to an extensible library of built-in feedback functions. Observe where apps have weaknesses to inform iteration on prompts, hyperparameters, and more.

Test

Compare different LLM apps on a metrics leaderboard to pick the best-performing one.

How it works

[Figure: TruLens diagram]

Why Use TruLens for LLM applications?

The fastest, easiest way to validate your LLM app.

Start with a few lines of code.

TruLens fits easily into your LLM app dev process. Simply pip install from PyPI, and add a couple of lines to your LLM app. Track any application, and evaluate with the model of your choice.

Drive rapid iteration with scalable, programmatic feedback.

Human feedback is the most common way of evaluating LLM apps today. It is important, but slow and limited. TruLens provides the higher-volume, programmatic feedback that helps you identify trouble spots and iterate rapidly.

TruLens Feedback Functions

Get the breadth of feedback you need to evaluate app performance.

TruLens can evaluate your LLM app with the following kinds of feedback functions to increase performance and minimize risk:

  • Context Relevance
  • Groundedness
  • Answer Relevance
  • Comprehensiveness
  • Harmful or toxic language
  • User sentiment
  • Language mismatch
  • Fairness and bias
  • Or other custom feedback functions you provide

TruLens can work with any LLM-based app

Use TruLens for any LLM-based app that you’re building with Python.

TruLens is loved by thousands of users for applications such as:

  • Retrieval Augmented Generation (RAG)
  • Summarization
  • Co-pilots
  • Agents

TruLens can also help you identify which version of your LLM app performs best:

  • Understand which version of your LLM apps is producing the best results across a variety of metrics
  • Make informed trade-offs between cost, latency and response quality.
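As a rough illustration of comparing app versions across metrics, the sketch below averages per-record scores into a small leaderboard. It does not use the TruLens API. The `records` data, the `build_leaderboard` helper, and the column names are all hypothetical, and the numbers are made up for the example.

```python
from statistics import mean

# Hypothetical per-record results for two app versions (illustrative numbers only).
records = [
    {"app": "v1", "groundedness": 0.9, "answer_relevance": 0.7, "latency_s": 1.2, "cost_usd": 0.004},
    {"app": "v1", "groundedness": 0.7, "answer_relevance": 0.5, "latency_s": 1.0, "cost_usd": 0.004},
    {"app": "v2", "groundedness": 0.9, "answer_relevance": 0.9, "latency_s": 2.0, "cost_usd": 0.010},
    {"app": "v2", "groundedness": 0.7, "answer_relevance": 0.9, "latency_s": 2.4, "cost_usd": 0.010},
]


def build_leaderboard(records, metrics):
    """Average every column per app version and sort by mean quality, best first."""
    by_app = {}
    for record in records:
        by_app.setdefault(record["app"], []).append(record)

    rows = []
    for app, app_records in by_app.items():
        row = {"app": app}
        for column in metrics + ["latency_s", "cost_usd"]:
            row[column] = mean(r[column] for r in app_records)
        # "quality" is an assumed summary: the unweighted mean of the quality metrics.
        row["quality"] = mean(row[m] for m in metrics)
        rows.append(row)
    return sorted(rows, key=lambda row: row["quality"], reverse=True)


leaderboard = build_leaderboard(records, ["groundedness", "answer_relevance"])
for row in leaderboard:
    print(row)
```

In this made-up data, the version with the higher quality score is also slower and more expensive, which is the kind of trade-off a leaderboard makes visible.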

Get started using TruLens today

You are critical to the ongoing success of TruLens. We encourage you to get started and provide ample feedback, so that TruLens improves over time.

Download

Get started with pip install trulens.

Documentation

Read about the library here.

Community

Come join the TruLens community on the AI Quality Forum Slack.

What’s a Feedback Function?

A feedback function scores the output of an LLM application by analyzing generated text from an LLM (or a downstream model or application built on it) and metadata.

This is similar to labeling functions. A human-in-the-loop can be used to discover a relationship between the feedback and input text. By modeling this relationship, we can then programmatically apply it to scale up model evaluation. You can read more in this blog: “What’s Missing to Evaluate Foundation Models at Scale”
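To make the idea concrete, here is a minimal sketch of a feedback function written as plain Python. It is not the TruLens API. The name `toy_answer_relevance` and its token-overlap heuristic are assumptions chosen for illustration; a real feedback function would typically rely on a model rather than word overlap.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_answer_relevance(question: str, answer: str) -> float:
    """Hypothetical feedback function: share of question tokens that reappear in the answer.

    Returns a score in [0.0, 1.0]; higher means more lexical overlap.
    """
    question_tokens = tokenize(question)
    if not question_tokens:
        return 0.0
    answer_tokens = tokenize(answer)
    return len(question_tokens & answer_tokens) / len(question_tokens)


on_topic = toy_answer_relevance(
    "What is the capital of France?",
    "The capital of France is Paris.",
)
off_topic = toy_answer_relevance(
    "What is the capital of France?",
    "Bananas grow well in warm climates.",
)
print(f"on-topic score: {on_topic:.2f}, off-topic score: {off_topic:.2f}")
```

Because the function takes text in and returns a number, it can be run over every record an app produces, which is what lets programmatic feedback scale beyond manual review.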

TruLens Unlimited Feedback Functions

TruLens is shepherded by TruEra

TruEra is an AI Quality software company that helps organizations better test, debug, and monitor machine learning models and applications. Although TruEra both actively oversees the distribution of TruLens and helps organize the community around it, TruLens remains an open-source community project, not a TruEra product.

About the TruEra Research Team

TruLens originally emerged from the work of the TruEra Research Team. They are passionate about the importance of testing and quality in machine learning. They continue to be involved in the development of the TruLens community.

You can learn more about TruEra Research here.

Why a colossal squid?

The colossal squid’s eyeball is about the size of a soccer ball, making it the largest eyeball of any living creature. In addition, did you know that its eyeball contains light organs? That means that colossal squids have automatic headlights when looking around. We're hoping to bring similar guidance to model developers when creating, introspecting, and debugging neural networks. Read more about the amazing eyes of the colossal squid.