TruLens 2.7: Unified Metric API, MLflow Integration, and More
TruLens 2.7 brings a cleaner evaluation API, first-class MLflow integration, improved Snowflake support, and a growing library of examples, making it easier than ever to evaluate, iterate, and trust your AI applications.
Unified Metric API
The headline feature of TruLens 2.7 is the Unified Metric API: a single Metric class that replaces both Feedback and MetricConfig. If you've been using either API, it continues to work with deprecation warnings, but the new Metric class is the recommended path forward.
Why We Unified Them
Feedback was the original TruLens evaluation primitive, while MetricConfig emerged as a cleaner configuration-first alternative. Running both in parallel created confusion: which should I use? Do they behave the same? The new Metric class answers both questions with a single, consistent interface.
What Changed
The new Metric class uses an explicit selectors dictionary instead of chained .on() calls, making the mapping from LLM arguments to span data clear at a glance:
Before (Feedback API)
import numpy as np
from trulens.core import Feedback, Select
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Answer relevance: compares the app's input with its final output.
f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on_input()
    .on_output()
)

# Context relevance: scores the retrieved context, then averages the scores.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(Select.RecordCalls.retrieve.rets)
    .aggregate(np.mean)
)
After (Metric API)
import numpy as np
from trulens.core import Metric, Selector
from trulens.providers.openai import OpenAI

provider = OpenAI()

f_answer_relevance = Metric(
    implementation=provider.relevance_with_cot_reasons,
    name="Answer Relevance",
    selectors={
        "prompt": Selector.select_record_input(),
        "response": Selector.select_record_output(),
    },
)

f_context_relevance = Metric(
    implementation=provider.context_relevance_with_cot_reasons,
    name="Context Relevance",
    selectors={
        "question": Selector.select_record_input(),
        "context": Selector.select_context(collect_list=False),
    },
    agg=np.mean,
)
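In both metrics, the keys of `selectors` line up with the parameter names of the `implementation` callable (`prompt`/`response` and `question`/`context`). Assuming that correspondence is required, a small pre-flight check can catch a mistyped key before any evaluation runs. This is a minimal sketch: `check_selector_keys` and the stand-in `relevance` function are hypothetical and not part of TruLens.

```python
import inspect


def check_selector_keys(implementation, selectors):
    """Return the selector keys that are not parameters of `implementation`.

    Hypothetical helper, not part of TruLens.
    """
    params = inspect.signature(implementation).parameters
    return sorted(key for key in selectors if key not in params)


# Stand-in for a provider method such as relevance_with_cot_reasons.
def relevance(prompt: str, response: str) -> float:
    return 1.0


# Placeholder strings stand in for real Selector objects.
good = {"prompt": "<input selector>", "response": "<output selector>"}
bad = {"question": "<input selector>", "response": "<output selector>"}

print(check_selector_keys(relevance, good))  # []
print(check_selector_keys(relevance, bad))   # ['question']
```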
Both the old Feedback and MetricConfig classes continue to work but emit deprecation warnings guiding you to migrate. The behavior is identical: this is a pure API unification with no functional changes.
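To find remaining uses of the old classes, Python's standard `warnings` module can promote deprecation warnings to errors during a test run. The sketch below assumes the warnings are issued with the `DeprecationWarning` category, which the release notes do not state explicitly; `legacy_feedback` is a hypothetical stand-in for a deprecated constructor.

```python
import warnings


def legacy_feedback():
    """Hypothetical stand-in for a deprecated API call."""
    warnings.warn(
        "Feedback is deprecated; use Metric instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "feedback"


def uses_deprecated_api(fn):
    """Return True if calling `fn` triggers a DeprecationWarning."""
    with warnings.catch_warnings():
        # Turn DeprecationWarning into an exception inside this block only.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            fn()
        except DeprecationWarning:
            return True
    return False


print(uses_deprecated_api(legacy_feedback))   # True
print(uses_deprecated_api(lambda: "metric"))  # False
```

With pytest, the same effect is available from the command line via `-W error::DeprecationWarning`.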
Learn more: Metric API Documentation
MLflow Integration
TruLens 2.7 adds first-class support for using TruLens feedback functions as MLflow scorers via mlflow.genai.evaluate (requires MLflow 3.10+). This means you can run the RAG Triad, agent evaluations, and any custom TruLens metric directly inside your MLflow evaluation pipelines, with no adapter code required.
What You Can Evaluate
- RAG scorers: Groundedness, Context Relevance, Answer Relevance
- Output scorers: Coherence, Helpfulness, Sentiment
- Agent trace scorers: ToolSelection, ToolCalling, ToolQuality
TruLens Scorers in MLflow
import mlflow
from trulens.feedback.v2.feedback import ContextRelevance, Groundedness
from trulens.providers.openai import OpenAI

provider = OpenAI()

# Define scorers using TruLens feedback functions
groundedness_scorer = Groundedness(provider=provider)
context_relevance_scorer = ContextRelevance(provider=provider)

# my_rag_app and eval_dataset are placeholders defined elsewhere in your project.
with mlflow.start_run():
    results = mlflow.genai.evaluate(
        model=my_rag_app,
        data=eval_dataset,
        scorers=[groundedness_scorer, context_relevance_scorer],
    )

print(results.tables["eval_results_table"])
This integration lets teams who already use MLflow for experiment tracking add TruLens's LLM-as-a-judge evaluations to their existing workflows without switching tools.
See the example: MLflow + TruLens Scorers Notebook
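The MLflow snippet in this section leaves `eval_dataset` undefined. The sketch below shows one shape it might take, assuming the list-of-records layout that `mlflow.genai.evaluate` documents (an `inputs` dict per record, with optional `expectations`). The questions, the `question` key, and the `validate_records` helper are illustrative only; check the linked notebook for the fields the TruLens scorers actually read.

```python
# Illustrative records; the field contents are made up for this sketch.
eval_dataset = [
    {
        "inputs": {"question": "What does TruLens evaluate?"},
        "expectations": {"expected_response": "LLM applications and agents."},
    },
    {
        "inputs": {"question": "Which scorers make up the RAG Triad?"},
        "expectations": {
            "expected_response": "Groundedness, Context Relevance, Answer Relevance."
        },
    },
]


def validate_records(records):
    """Check that every record carries a dict of inputs (hypothetical helper)."""
    return all(isinstance(r.get("inputs"), dict) for r in records)


print(validate_records(eval_dataset))  # True
```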
LiteLLM Custom Endpoints
TruLens's LiteLLM provider now correctly forwards api_base, api_key, and other routing parameters to completion calls. Previously, these params were silently dropped, making it impossible to use self-hosted models (Ollama, vLLM, etc.) or custom OpenAI-compatible endpoints as feedback providers.
Using Ollama as a Feedback Provider
import os

from trulens.providers.litellm import LiteLLM

# Via direct kwarg
provider = LiteLLM(
    model_engine="ollama/llama3.1",
    api_base="http://localhost:11434",
)

# Via environment variable (LiteLLM reads OLLAMA_API_BASE automatically)
os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"
provider = LiteLLM(model_engine="ollama/llama3.1")

# Via completion_kwargs
provider = LiteLLM(
    model_engine="ollama/llama3.1",
    completion_kwargs={"api_base": "http://localhost:11434"},
)
This fix (reported in #1804) unlocks fully local evaluation pipelines: instrument your app with TruLens, run feedback functions against a local Ollama instance, and keep everything on-prem.
Learn more: LiteLLM Provider Documentation
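When the same code runs against both local and remote endpoints, it can help to resolve the endpoint once before building the provider. A small sketch follows; `resolve_api_base` and its precedence order (explicit argument, then `OLLAMA_API_BASE`, then a default) are assumptions made for illustration and do not describe how TruLens or LiteLLM resolve the value internally.

```python
import os

DEFAULT_OLLAMA_BASE = "http://localhost:11434"


def resolve_api_base(explicit=None, env=None):
    """Pick an endpoint: explicit argument, then env var, then default.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    return explicit or env.get("OLLAMA_API_BASE") or DEFAULT_OLLAMA_BASE


print(resolve_api_base(env={}))
# http://localhost:11434
print(resolve_api_base(env={"OLLAMA_API_BASE": "http://gpu-box:11434"}))
# http://gpu-box:11434
print(resolve_api_base("http://10.0.0.5:11434", env={"OLLAMA_API_BASE": "x"}))
# http://10.0.0.5:11434
```

The resolved value can then be passed as `api_base=` when constructing the `LiteLLM` provider.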
Feedback Templates Reorganization
The trulens.feedback package now has a cleaner, domain-based layout for feedback template classes. What was a 1,500-line monolith is now organized into focused modules:
| Module | Contents |
|---|---|
| trulens.feedback.templates.rag | Groundedness, ContextRelevance, PromptResponseRelevance, Answerability, Comprehensiveness |
| trulens.feedback.templates.safety | Harmfulness, Toxicity, Maliciousness, Hate, Misogyny, Stereotypes |
| trulens.feedback.templates.quality | Coherence, Correctness, Conciseness, Sentiment, Helpfulness |
| trulens.feedback.templates.agent | ToolSelection, ToolCalling, ToolQuality, PlanAdherence, PlanQuality, LogicalConsistency |
All existing imports continue to work: prompts.py and v2/feedback.py are backward-compatible shims. This reorganization makes it easier to find, extend, and contribute feedback templates.
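A backward-compatible shim is usually just a module that re-exports names from their new location, so both import paths resolve to the same object. The sketch below illustrates that general pattern with in-memory modules. The module names `templates_rag` and `legacy_feedback` are placeholders, and nothing here is taken from the actual TruLens shim files.

```python
import sys
import types

# New home of the class (stand-in for a module such as templates.rag).
new_module = types.ModuleType("templates_rag")


class Groundedness:
    """Placeholder template class for this sketch."""


new_module.Groundedness = Groundedness
sys.modules["templates_rag"] = new_module

# Legacy module that only re-exports the name from its new location.
legacy_module = types.ModuleType("legacy_feedback")
legacy_module.Groundedness = new_module.Groundedness
sys.modules["legacy_feedback"] = legacy_module

from legacy_feedback import Groundedness as OldPath
from templates_rag import Groundedness as NewPath

# Both import paths yield the very same class object.
print(OldPath is NewPath)  # True
```

Because the two paths share one class object, `isinstance` checks and subclassing behave the same regardless of which import an application uses.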
Snowflake Improvements
Password-Free Authentication
SnowflakeConnector now supports password-free authentication methods directly, with no need to pre-build a Snowpark session. The externalbrowser SSO flow is the new recommended approach:
SSO Authentication
from trulens.connectors.snowflake import SnowflakeConnector
from trulens.core import TruSession

connector = SnowflakeConnector(
    account="myorg-myaccount",
    user="my.name@company.com",
    authenticator="externalbrowser",
    database="TRULENS_DB",
    schema="PUBLIC",
    warehouse="COMPUTE_WH",
)
session = TruSession(connector=connector)
Key-pair and OAuth token authentication are also supported via the private_key_file and token parameters, respectively.
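One way to keep the three authentication options side by side is to build only the auth-specific keyword arguments and merge them with the shared connection settings. This is a sketch, not a documented pattern: `auth_kwargs` is a hypothetical helper, the key file path is a placeholder, and whether key-pair or OAuth also needs an `authenticator` value is not stated here, so none is set.

```python
def auth_kwargs(method, *, private_key_file=None, token=None):
    """Build the authentication-specific keyword arguments (hypothetical helper)."""
    if method == "sso":
        return {"authenticator": "externalbrowser"}
    if method == "keypair":
        if not private_key_file:
            raise ValueError("keypair auth needs private_key_file")
        return {"private_key_file": private_key_file}
    if method == "oauth":
        if not token:
            raise ValueError("oauth auth needs token")
        return {"token": token}
    raise ValueError(f"unknown auth method: {method!r}")


# Shared settings, reusing the placeholder values from the SSO example.
common = {"account": "myorg-myaccount", "user": "my.name@company.com"}
kwargs = {**common, **auth_kwargs("keypair", private_key_file="/path/to/rsa_key.p8")}

print(sorted(kwargs))  # ['account', 'private_key_file', 'user']
```

The merged dictionary could then be passed as `SnowflakeConnector(**kwargs)` together with the database, schema, and warehouse settings.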
Snowsight Evaluations
Snowflake users should use the AI Observability Evaluations page in Snowsight rather than launching a local Streamlit dashboard. The run_dashboard_sis entrypoints are now deprecated with migration guidance. The Snowsight UI provides a fully managed, scalable view of your TruLens traces and evaluations without any local infrastructure.
Accurate Cortex Cost Tracking
Cortex model cost tracking now uses input/output split pricing for all supported models, giving you accurate cost breakdowns that match Snowflake's billing. Previously, a single blended rate was used.
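The difference between split and blended pricing is easiest to see with numbers. The sketch below uses made-up rates and token counts, not Snowflake's actual prices, and the two function names are hypothetical. For an input-heavy request, a blended rate noticeably overstates the cost when output tokens are priced higher than input tokens.

```python
def split_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Cost with separate per-million-token rates for input and output."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def blended_cost(input_tokens, output_tokens, blended_rate):
    """Cost with a single per-million-token rate for all tokens."""
    return (input_tokens + output_tokens) * blended_rate / 1_000_000


# Made-up rates and token counts, for illustration only.
inp, out = 9_000, 1_000
print(split_cost(inp, out, input_rate=1.0, output_rate=5.0))  # 0.014
print(blended_cost(inp, out, blended_rate=3.0))               # 0.03
```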
New Example: Hybrid Search RAG with Qdrant
A new example notebook shows how to build a Hybrid Search RAG pipeline using LangChain, Qdrant, and OpenAI, then evaluate it end-to-end with TruLens.
The pipeline combines dense embeddings and sparse (BM25) retrieval for higher-quality context selection, and the notebook walks through applying the full RAG Triad (Groundedness, Context Relevance, Answer Relevance) to measure quality.
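Hybrid search needs some rule for merging the dense and sparse result lists. The notebook's exact fusion method is not described here, so the sketch below uses Reciprocal Rank Fusion (RRF), one common choice, purely as an illustration. The document ids and the conventional constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; higher fused score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by id for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["doc_a", "doc_b", "doc_c"]   # ranking from embedding similarity
sparse = ["doc_b", "doc_d", "doc_a"]  # ranking from BM25

print(reciprocal_rank_fusion([dense, sparse]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear near the top of both lists (`doc_b`, `doc_a`) end up ahead of those found by only one retriever, which is the behavior hybrid search is after.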
See the example: Hybrid Search RAG with LangChain and Qdrant
Get Started
Ready to try TruLens 2.7?
Install TruLens
pip install trulens --upgrade
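After upgrading, the installed version can be confirmed from Python using the standard library. This is a minimal sketch; `installed_version` is a hypothetical helper, and it assumes the distribution name is `trulens`, matching the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if TruLens is installed, otherwise None.
print(installed_version("trulens"))
```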
Quick Links
- TruLens Documentation
- GitHub Repository
- Metric API Guide
- MLflow Integration Example
- Snowflake Auth Documentation