TruLens 2.7: Unified Metric API, MLflow Integration, and More

TruLens 2.7 brings a cleaner evaluation API, first-class MLflow integration, improved Snowflake support, and a growing library of examples, making it easier than ever to evaluate, iterate on, and trust your AI applications.


Unified Metric API

The headline feature of TruLens 2.7 is the Unified Metric API: a single Metric class that replaces both Feedback and MetricConfig. If you've been using either API, they continue to work with deprecation warnings, but the new Metric class is the recommended path forward.

Why We Unified Them

Feedback was the original TruLens evaluation primitive, while MetricConfig emerged as a cleaner configuration-first alternative. Running both in parallel created confusion: which should I use? Do they behave the same? The new Metric class answers both questions with a single, consistent interface.

What Changed

The new Metric class uses an explicit selectors dictionary instead of chained .on() calls, making the mapping from your metric implementation's arguments to span data clear at a glance:

Before (Feedback API)

import numpy

from trulens.core import Feedback, Select
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on_input()
    .on_output()
)

f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(Select.RecordCalls.retrieve.rets)
    .aggregate(numpy.mean)
)

After (Metric API)

import numpy as np
from trulens.core import Metric, Selector
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_answer_relevance = Metric(
    implementation=provider.relevance_with_cot_reasons,
    name="Answer Relevance",
    selectors={
        "prompt": Selector.select_record_input(),
        "response": Selector.select_record_output(),
    },
)

f_context_relevance = Metric(
    implementation=provider.context_relevance_with_cot_reasons,
    name="Context Relevance",
    selectors={
        "question": Selector.select_record_input(),
        "context": Selector.select_context(collect_list=False),
    },
    agg=np.mean,
)

Both the old Feedback and MetricConfig classes continue to work but emit deprecation warnings guiding you to migrate. The behavior is identical; this is a pure API unification with no functional changes.

Learn more: Metric API Documentation


MLflow Integration

TruLens 2.7 adds first-class support for using TruLens feedback functions as MLflow scorers via mlflow.genai.evaluate (requires MLflow 3.10+). This means you can run the RAG Triad, agent evaluations, and any custom TruLens metric directly inside your MLflow evaluation pipelines, with no adapter code required.

What You Can Evaluate

  • RAG scorers: Groundedness, Context Relevance, Answer Relevance
  • Output scorers: Coherence, Helpfulness, Sentiment
  • Agent trace scorers: ToolSelection, ToolCalling, ToolQuality

TruLens Scorers in MLflow

import mlflow
from trulens.providers.openai import OpenAI
from trulens.feedback.v2.feedback import Groundedness, ContextRelevance

provider = OpenAI()

# Define scorers using TruLens feedback functions
groundedness_scorer = Groundedness(provider=provider)
context_relevance_scorer = ContextRelevance(provider=provider)

# `my_rag_app` and `eval_dataset` are assumed to be defined elsewhere:
# the app under test and the evaluation dataset, respectively
with mlflow.start_run():
    results = mlflow.genai.evaluate(
        model=my_rag_app,
        data=eval_dataset,
        scorers=[groundedness_scorer, context_relevance_scorer],
    )
    print(results.tables["eval_results_table"])

This integration lets teams who already use MLflow for experiment tracking add TruLens's LLM-as-a-judge evaluations to their existing workflows without switching tools.
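
Agent trace scorers plug in the same way. Here's a minimal sketch, assuming ToolSelection and ToolCalling are importable alongside the RAG scorers above, and using a hypothetical agent app and trace dataset:

Agent Trace Scorers in MLflow

from trulens.feedback.v2.feedback import ToolSelection, ToolCalling

tool_selection_scorer = ToolSelection(provider=provider)
tool_calling_scorer = ToolCalling(provider=provider)

with mlflow.start_run():
    agent_results = mlflow.genai.evaluate(
        model=my_agent_app,        # hypothetical agent under test
        data=agent_eval_dataset,   # hypothetical dataset of agent traces
        scorers=[tool_selection_scorer, tool_calling_scorer],
    )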

See the example: MLflow + TruLens Scorers Notebook


LiteLLM Custom Endpoints

TruLens's LiteLLM provider now correctly forwards api_base, api_key, and other routing parameters to completion calls. Previously, these parameters were silently dropped, making it impossible to use self-hosted models (Ollama, vLLM, etc.) or custom OpenAI-compatible endpoints as feedback providers.

Using Ollama as a Feedback Provider

from trulens.providers.litellm import LiteLLM

# Via direct kwarg
provider = LiteLLM(
    model_engine="ollama/llama3.1",
    api_base="http://localhost:11434",
)

# Via environment variable (LiteLLM reads OLLAMA_API_BASE automatically)
import os
os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"
provider = LiteLLM(model_engine="ollama/llama3.1")

# Via completion_kwargs
provider = LiteLLM(
    model_engine="ollama/llama3.1",
    completion_kwargs={"api_base": "http://localhost:11434"},
)

This fix (reported in #1804) unlocks fully local evaluation pipelines: instrument your app with TruLens, run feedback functions against a local Ollama instance, and keep everything on-prem.
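
End to end, that can look like the new Metric API from earlier in this post pointed at a local provider (a sketch; adjust the model and selectors to your app):

Fully Local Answer Relevance

from trulens.core import Metric, Selector
from trulens.providers.litellm import LiteLLM

# Local Ollama instance acts as the judge model
local_provider = LiteLLM(
    model_engine="ollama/llama3.1",
    api_base="http://localhost:11434",
)

f_answer_relevance = Metric(
    implementation=local_provider.relevance_with_cot_reasons,
    name="Answer Relevance (local)",
    selectors={
        "prompt": Selector.select_record_input(),
        "response": Selector.select_record_output(),
    },
)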

Learn more: LiteLLM Provider Documentation


Feedback Templates Reorganization

The trulens.feedback package now has a cleaner, domain-based layout for feedback template classes. What was a 1,500-line monolith is now organized into focused modules:

  • trulens.feedback.templates.rag: Groundedness, ContextRelevance, PromptResponseRelevance, Answerability, Comprehensiveness
  • trulens.feedback.templates.safety: Harmfulness, Toxicity, Maliciousness, Hate, Misogyny, Stereotypes
  • trulens.feedback.templates.quality: Coherence, Correctness, Conciseness, Sentiment, Helpfulness
  • trulens.feedback.templates.agent: ToolSelection, ToolCalling, ToolQuality, PlanAdherence, PlanQuality, LogicalConsistency

All existing imports continue to work; prompts.py and v2/feedback.py are backward-compatible shims. This reorganization makes it easier to find, extend, and contribute feedback templates.
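
A quick way to see the shim in action (assuming, as is typical for shims, that the old module re-exports the same class objects):

# New, domain-based location
from trulens.feedback.templates.rag import Groundedness as NewGroundedness

# Old import path, served by the backward-compatible shim
from trulens.feedback.v2.feedback import Groundedness as OldGroundedness

assert NewGroundedness is OldGroundedness  # same class, two paths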


Snowflake Improvements

Password-Free Authentication

SnowflakeConnector now supports password-free authentication methods directly, with no need to pre-build a Snowpark session. The externalbrowser SSO flow is the new recommended approach:

SSO Authentication

from trulens.connectors.snowflake import SnowflakeConnector
from trulens.core import TruSession

connector = SnowflakeConnector(
    account="myorg-myaccount",
    user="my.name@company.com",
    authenticator="externalbrowser",
    database="TRULENS_DB",
    schema="PUBLIC",
    warehouse="COMPUTE_WH",
)
session = TruSession(connector=connector)

Key-pair and OAuth token authentication are also supported via the private_key_file and token parameters respectively.
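
For non-interactive environments such as CI, key-pair authentication is a drop-in swap (a sketch; the user and key path are placeholders):

Key-Pair Authentication

connector = SnowflakeConnector(
    account="myorg-myaccount",
    user="SVC_TRULENS",                      # placeholder service user
    private_key_file="/path/to/rsa_key.p8",  # your PKCS#8 private key
    database="TRULENS_DB",
    schema="PUBLIC",
    warehouse="COMPUTE_WH",
)
session = TruSession(connector=connector)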

Snowsight Evaluations

Snowflake users should use the AI Observability Evaluations page in Snowsight rather than launching a local Streamlit dashboard. The run_dashboard_sis entrypoints are now deprecated with migration guidance. The Snowsight UI provides a fully managed, scalable view of your TruLens traces and evaluations without any local infrastructure.

Accurate Cortex Cost Tracking

Cortex model cost tracking now uses input/output split pricing for all supported models, giving you accurate cost breakdowns that match Snowflake's billing. Previously, a single blended rate was used.
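
To see why this matters, here's the arithmetic with made-up numbers (illustrative rates only, not Snowflake's actual pricing):

# Hypothetical rates for illustration only -- not real Cortex prices
input_rate, output_rate = 0.50, 1.50   # credits per 1M tokens
blended_rate = 1.00                    # old single blended rate

input_tokens, output_tokens = 12_000, 3_000

old_cost = (input_tokens + output_tokens) / 1e6 * blended_rate
new_cost = (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate
# Split pricing charges heavy-input, light-output workloads less,
# matching how the tokens are actually billed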


New Example: Hybrid Search RAG with Qdrant

A new example notebook shows how to build a Hybrid Search RAG pipeline using LangChain, Qdrant, and OpenAI, then evaluate it end-to-end with TruLens.

The pipeline combines dense embeddings and sparse (BM25) retrieval for higher-quality context selection, and the notebook walks through applying the full RAG Triad (Groundedness, Context Relevance, Answer Relevance) to measure quality.
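
To apply the triad to your own pipeline, the groundedness leg looks like this under the new Metric API (a sketch; the other two legs are shown in the Unified Metric API section above):

Groundedness with the Metric API

from trulens.core import Metric, Selector
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_groundedness = Metric(
    implementation=provider.groundedness_measure_with_cot_reasons,
    name="Groundedness",
    selectors={
        # Gather all retrieved contexts into a single source text
        "source": Selector.select_context(collect_list=True),
        "statement": Selector.select_record_output(),
    },
)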

See the example: Hybrid Search RAG with LangChain and Qdrant


Get Started

Ready to try TruLens 2.7?

Install TruLens

pip install trulens --upgrade
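
Then create a session; by default, TruSession logs traces and evaluations to a local SQLite database:

from trulens.core import TruSession

session = TruSession()  # local SQLite database by default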

Have feedback or feature requests? Open an issue or discussion on GitHub.