Evaluate and Track LLM Applications

Evaluate, iterate faster, and select your best LLM app with TruLens.

TruLens:
scale up and accelerate LLM app evaluation

Create credible and powerful LLM apps, faster. TruLens is a software tool that helps you to objectively measure the quality and effectiveness of your LLM-based applications using feedback functions. Feedback functions help to programmatically evaluate the quality of inputs, outputs, and intermediate results, so that you can expedite and scale up experiment evaluation. Use it for a wide variety of use cases including question answering, retrieval-augmented generation, and agent-based applications.

Evaluate

Evaluate how your choices are performing across multiple feedback functions, such as:

  • Groundedness
  • Relevance
  • Toxicity

Iterate

Leverage and add to an extensible library of built-in feedback functions. Observe where apps have weaknesses to inform iteration on prompts, hyperparameters, and more.

Test

Compare different LLM chains on a metrics leaderboard to pick the best-performing one.
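To make the leaderboard idea concrete, here is a minimal sketch in plain Python. It does not use the TruLens API: the record layout, the version names (`chain_v1`, `chain_v2`), and the `build_leaderboard` helper are all hypothetical, and averaging the per-feedback means into one overall score is just one possible ranking rule.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation records: (app_version, feedback_name, score in [0, 1]).
records = [
    ("chain_v1", "groundedness", 0.5),
    ("chain_v1", "groundedness", 1.0),
    ("chain_v1", "relevance", 0.75),
    ("chain_v2", "groundedness", 0.75),
    ("chain_v2", "relevance", 1.0),
    ("chain_v2", "relevance", 1.0),
]

def build_leaderboard(records):
    """Average each feedback score per app version, then rank by the overall mean."""
    scores = defaultdict(lambda: defaultdict(list))
    for version, feedback, score in records:
        scores[version][feedback].append(score)

    rows = []
    for version, by_feedback in scores.items():
        averages = {name: mean(vals) for name, vals in by_feedback.items()}
        overall = mean(list(averages.values()))
        rows.append((version, averages, overall))

    # Highest overall score first.
    return sorted(rows, key=lambda row: row[2], reverse=True)

leaderboard = build_leaderboard(records)
for version, averages, overall in leaderboard:
    print(version, averages, round(overall, 3))
```

In practice you would likely keep the per-feedback columns visible rather than collapsing to a single number, since one version may win on groundedness and lose on relevance.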

How it works

TruLens diagram

Why Use TruLens for LLM applications?

The fastest, easiest way to test and iterate on your LLM app.

Start with a few lines of code.

TruLens fits easily into your LLM app dev process. Simply pip install from PyPI, and add a couple of lines to your LLM app.

Drive rapid iteration with scalable, programmatic feedback.

Human feedback is the most common way of evaluating LLM apps today. It is important, but slow and limited. TruLens provides the higher-volume, programmatic feedback that helps you to identify trouble spots and iterate rapidly.

TruLens Feedback Functions

Get the breadth of feedback you need to evaluate app performance.

TruLens can evaluate your LLM app with the following kinds of feedback functions to increase performance and minimize risk:

  • Truthfulness
  • Question answering relevance
  • Harmful or toxic language
  • User sentiment
  • Language mismatch
  • Response verbosity
  • Fairness and bias
  • Or other custom feedback functions you provide
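As a rough illustration of what a custom feedback function could look like, the sketch below defines two toy scorers in plain Python and runs them over a single response. Everything here is an assumption for illustration: the function names, the `FEEDBACKS` registry, the word limit, and the blocklist are hypothetical and are not part of the TruLens API. A real harmful-language check would typically use a model rather than a word list.

```python
from typing import Callable, Dict

def verbosity_score(response: str, max_words: int = 50) -> float:
    """Return 1.0 for responses at or under max_words, shrinking toward 0.0 as they grow."""
    n_words = len(response.split())
    if n_words <= max_words:
        return 1.0
    return max_words / n_words

def blocklist_score(response: str, blocked=("idiot", "stupid")) -> float:
    """Return 0.0 if any blocked word appears, else 1.0 (a crude stand-in for toxicity)."""
    words = {w.strip(".,!?").lower() for w in response.split()}
    return 0.0 if words & set(blocked) else 1.0

# Hypothetical registry mapping a feedback name to its scoring function.
FEEDBACKS: Dict[str, Callable[[str], float]] = {
    "verbosity": verbosity_score,
    "blocklist": blocklist_score,
}

def evaluate(response: str) -> Dict[str, float]:
    """Apply every registered feedback function to one response."""
    return {name: fn(response) for name, fn in FEEDBACKS.items()}

print(evaluate("You are stupid."))
```

Each scorer returns a value between 0 and 1, so results from different feedback functions can be shown side by side.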

TruLens can work with any LLM-based app

Use TruLens for any LLM-based app that you’re building with Python.

TruLens can be used to ensure AI Quality in a wide variety of use cases, such as:

  • Customer service chatbots for retail, manufacturing, insurance, banking, and more.
  • Informational chatbots for consumer research, corporate research, weather, healthcare, and more.

TruLens can also help you to identify which of your LLM app versions is the best performing:

  • Understand which version of your LLM apps is producing the best results across a variety of metrics
  • Understand which model version has the lowest dollar cost (via API call volume) or risk
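The cost comparison can be sketched as simple arithmetic over API call volume. The figures below are made up, and the flat per-call price, the `usage` table, and both helper functions are assumptions for illustration only. Real pricing usually depends on tokens rather than calls.

```python
# Hypothetical per-version usage from an evaluation run.
# price_per_call is an assumed flat dollar price; mean_score is an assumed quality score.
usage = {
    "app_v1": {"api_calls": 1200, "price_per_call": 0.002, "mean_score": 0.81},
    "app_v2": {"api_calls": 800, "price_per_call": 0.004, "mean_score": 0.88},
    "app_v3": {"api_calls": 500, "price_per_call": 0.003, "mean_score": 0.74},
}

def estimated_cost(stats: dict) -> float:
    """Dollar cost estimated as call volume times an assumed flat price per call."""
    return stats["api_calls"] * stats["price_per_call"]

def best_within_budget(usage: dict, budget: float):
    """Return the highest-scoring version whose estimated cost fits the budget, or None."""
    affordable = {v: s for v, s in usage.items() if estimated_cost(s) <= budget}
    if not affordable:
        return None
    return max(affordable, key=lambda v: affordable[v]["mean_score"])

cheapest = min(usage, key=lambda v: estimated_cost(usage[v]))
print("cheapest:", cheapest)
print("best under $2.50:", best_within_budget(usage, budget=2.5))
```

Looking at cost and quality together makes the trade-off explicit: the cheapest version is not necessarily the one you would ship.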

Get started using TruLens today

You are critical to the ongoing success of TruLens. We encourage you to get started and provide ample feedback, so that TruLens improves over time.

Download

Get started with pip install trulens.

Documentation

Read about the library here.

Community

Come join the TruLens community on the AI Quality Forum Slack.

What’s a Feedback Function?

A feedback function scores the output of an LLM application by analyzing generated text from an LLM (or a downstream model or application built on it) and metadata.

This is similar to labeling functions. A human-in-the-loop can be used to discover a relationship between the feedback and input text. By modeling this relationship, we can then programmatically apply it to scale up model evaluation. You can read more in this blog post: “What’s Missing to Evaluate Foundation Models at Scale”
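The labeling-function analogy can be sketched in a few lines: start from a handful of human judgments, fit a simple rule that agrees with them, then apply that rule programmatically. This is a toy under stated assumptions, not how TruLens implements feedback functions. The word-overlap feature, the labeled examples, and the names `overlap`, `fit_threshold`, and `relevance_feedback` are all hypothetical.

```python
def overlap(question: str, answer: str) -> float:
    """Fraction of distinct question words that also appear in the answer."""
    q = set(question.lower().split())
    a = set(answer.lower().split())
    return len(q & a) / len(q) if q else 0.0

# Hypothetical human-in-the-loop labels: 1 = relevant answer, 0 = not relevant.
labeled = [
    ("what is the capital of france", "the capital of france is paris", 1),
    ("what is the capital of france", "i like turtles", 0),
    ("how tall is mount everest", "mount everest is 8849 metres tall", 1),
    ("how tall is mount everest", "the weather is nice today", 0),
]

def fit_threshold(examples) -> float:
    """Pick the overlap threshold that agrees with the most human labels."""
    candidates = sorted({overlap(q, a) for q, a, _ in examples})

    def agreement(threshold: float) -> int:
        return sum(
            (overlap(q, a) >= threshold) == bool(label)
            for q, a, label in examples
        )

    return max(candidates, key=agreement)

threshold = fit_threshold(labeled)

def relevance_feedback(question: str, answer: str) -> float:
    """Apply the fitted rule to new, unlabeled question-answer pairs."""
    return 1.0 if overlap(question, answer) >= threshold else 0.0

print(relevance_feedback("who wrote hamlet", "hamlet was written by shakespeare"))
```

Once fitted, `relevance_feedback` can score any number of new records without further human labeling, which is the scaling step the paragraph above describes.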

TruLens Unlimited Feedback Functions

TruLens is shepherded by TruEra

TruEra is an AI Quality software company that helps organizations better test, debug, and monitor machine learning models and applications. Although TruEra both actively oversees the distribution of TruLens and helps organize the community around it, TruLens remains an open-source community project, not a TruEra product.

About the TruEra Research Team

TruLens originally emerged from the work of the TruEra Research Team. They are passionate about the importance of testing and quality in machine learning. They continue to be involved in the development of the TruLens community.

You can learn more about TruEra Research here.

Why a colossal squid?

The colossal squid’s eyeball is about the size of a soccer ball, making it the largest eyeball of any living creature. In addition, did you know that its eyeball contains light organs? That means that colossal squids have automatic headlights when looking around. We're hoping to bring similar guidance to model developers when creating, introspecting, and debugging neural networks. Read more about the amazing eyes of the colossal squid.