Running on existing data
In many cases, developers have already logged runs of an LLM app they wish to evaluate or wish to log their app using another system. Feedback functions can also be run on existing data, independent of the recorder
.
At the most basic level, feedback implementations are simple callables that can be run on any arguments matching their signatures like so:
feedback_result = provider.relevance("<some prompt>", "<some response>")
Note
Running the feedback implementation in isolation will not log the evaluation results in TruLens.
In the case that you have already logged a run of your application with TruLens and have the record available, the process for running an (additional) evaluation on that record is by using tru.run_feedback_functions
:
tru_rag = TruCustomApp(rag, app_name="RAG", app_version="v1")
result, record = tru_rag.with_record(rag.query, "How many professors are at UW in Seattle?")
feedback_results = tru.run_feedback_functions(record, feedbacks=[f_lang_match, f_qa_relevance, f_context_relevance])
tru.add_feedbacks(feedback_results)
TruVirtual¶
If your application was run (and logged) outside of TruLens, TruVirtual
can be used to ingest and evaluate the logs.
The first step to loading your app logs into TruLens is creating a virtual app. This virtual app can be a plain dictionary or use our VirtualApp
class to store any information you would like. You can refer to these values for evaluating feedback.
virtual_app = dict(
llm=dict(
modelname="some llm component model name"
),
template="information about the template I used in my app",
debug="all of these fields are completely optional"
)
from trulens.core import Select, VirtualApp
virtual_app = VirtualApp(virtual_app) # can start with the prior dictionary
virtual_app[Select.RecordCalls.llm.maxtokens] = 1024
When setting up the virtual app, you should also include any components that you would like to evaluate in the virtual app. This can be done using the Select
class. Using selectors here lets use reuse the setup you use to define feedback functions. Below you can see how to set up a virtual app with a retriever component, which will be used later in the example for feedback evaluation.
from trulens.core import Select
retriever_component = Select.RecordCalls.retriever
virtual_app[retriever_component] = "this is the retriever component"
Now that you've set up your virtual app, you can use it to store your logged data.
To incorporate your data into TruLens, you have two options. You can either create a Record
directly, or you can use the VirtualRecord
class, which is designed to help you build records so they can be ingested to TruLens.
The parameters you'll use with VirtualRecord
are the same as those for Record
, with one key difference: calls are specified using selectors.
In the example below, we add two records. Each record includes the inputs and outputs for a context retrieval component. Remember, you only need to provide the information that you want to track or evaluate. The selectors are references to methods that can be selected for feedback, as we'll demonstrate below.
from trulens.apps.virtual import VirtualRecord
# The selector for a presumed context retrieval component's call to
# `get_context`. The names are arbitrary but may be useful for readability on
# your end.
context_call = retriever_component.get_context
rec1 = VirtualRecord(
main_input="Where is Germany?",
main_output="Germany is in Europe",
calls=
{
context_call: dict(
args=["Where is Germany?"],
rets=["Germany is a country located in Europe."]
)
}
)
rec2 = VirtualRecord(
main_input="Where is Germany?",
main_output="Poland is in Europe",
calls=
{
context_call: dict(
args=["Where is Germany?"],
rets=["Poland is a country located in Europe."]
)
}
)
data = [rec1, rec2]
Alternatively, suppose we have an existing dataframe of prompts, contexts and responses we wish to ingest.
import pandas as pd
data = {
'prompt': ['Where is Germany?', 'What is the capital of France?'],
'response': ['Germany is in Europe', 'The capital of France is Paris'],
'context': ['Germany is a country located in Europe.', 'France is a country in Europe and its capital is Paris.']
}
df = pd.DataFrame(data)
df.head()
To ingest the data in this form, we can iterate through the dataframe to ingest each prompt, context and response into virtual records.
data_dict = df.to_dict('records')
data = []
for record in data_dict:
rec = VirtualRecord(
main_input=record['prompt'],
main_output=record['response'],
calls=
{
context_call: dict(
args=[record['prompt']],
rets=[record['context']]
)
}
)
data.append(rec)
Now that we've ingested constructed the virtual records, we can build our feedback functions. This is done just the same as normal, except the context selector will instead refer to the new context_call
we added to the virtual record.
from trulens.providers.openai import OpenAI
from trulens.core import Feedback
# Initialize provider class
openai = OpenAI()
# Select context to be used in feedback. We select the return values of the
# virtual `get_context` call in the virtual `retriever` component. Names are
# arbitrary except for `rets`.
context = context_call.rets[:]
# Question/statement relevance between question and each context chunk.
f_context_relevance = (
Feedback(openai.context_relevance)
.on_input()
.on(context)
)
Then, the feedback functions can be passed to TruVirtual
to construct the recorder
. Most of the fields that other non-virtual apps take can also be specified here.
from trulens.apps.virtual import TruVirtual
virtual_recorder = TruVirtual(
app_name="a virtual app",
app=virtual_app,
feedbacks=[f_context_relevance]
)
To finally ingest the record and run feedbacks, we can use add_record
.
for record in data:
virtual_recorder.add_record(rec)
To optionally store metadata about your application, you can also pass an arbitrary dict
to VirtualApp
. This information can also be used in evaluation.
virtual_app = dict(
llm=dict(
modelname="some llm component model name"
),
template="information about the template I used in my app",
debug="all of these fields are completely optional"
)
from trulens.core.schema import Select
from trulens.apps.virtual import VirtualApp
virtual_app = VirtualApp(virtual_app)
The VirtualApp
metadata can also be appended.
virtual_app[Select.RecordCalls.llm.maxtokens] = 1024
This can be particularly useful for storing the components of an LLM app to be later used for evaluation.
retriever_component = Select.RecordCalls.retriever
virtual_app[retriever_component] = "this is the retriever component"