trulens.core.metric¶
Classes¶
InvalidSelector¶
Metric¶
Bases: FeedbackDefinition
Metric function container.
A metric evaluates some aspect of an AI application's behavior, such as relevance, groundedness, or custom quality measures.
Example
    from trulens.core import Metric, Selector

    # Direct construction with selectors
    metric = Metric(
        name="relevance_v1",
        implementation=my_relevance_fn,
        selectors={
            "query": Selector.select_record_input(),
            "response": Selector.select_record_output(),
        },
    )

    # Or with a provider and the fluent API
    from trulens.providers.openai import OpenAI

    provider = OpenAI()
    metric = Metric(
        implementation=provider.relevance
    ).on_input_output()
Note
The enable_trace_compression parameter is only applicable to metrics
that take 'trace' as an input parameter. It has no effect on other metrics.
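As an illustration, here is a minimal sketch; the trace-based implementation below is hypothetical and only stands in for a metric whose function signature declares a trace parameter.

```python
from trulens.core import Metric

def my_trace_quality(trace) -> float:
    # Hypothetical implementation: a metric whose signature includes
    # `trace` receives the (optionally compressed) execution trace.
    return 1.0 if trace is not None else 0.0

metric = Metric(
    implementation=my_trace_quality,
    enable_trace_compression=True,  # compress the trace before passing it in
)
```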
Attributes¶
tru_class_info instance-attribute¶
tru_class_info: Class
Class information of this pydantic object for use in deserialization.
Using this odd key to not pollute attribute names in whatever class we mix this into. Should be the same as CLASS_INFO.
implementation class-attribute instance-attribute¶
Implementation serialization.
aggregator class-attribute instance-attribute¶
Aggregator method serialization.
combinations class-attribute instance-attribute¶
combinations: Optional[FeedbackCombinations] = PRODUCT
Mode of combining selected values to produce arguments to each feedback function call.
feedback_definition_id instance-attribute¶
feedback_definition_id: FeedbackDefinitionID = feedback_definition_id
Id, if not given, uniquely determined from content.
if_exists class-attribute instance-attribute¶
Only execute the feedback function if the following selector names something that exists in a record/app.
This can be used to evaluate conditionally on the presence of some calls, for example. Feedbacks skipped this way will have a status of FeedbackResultStatus.SKIPPED.
if_missing class-attribute instance-attribute¶
if_missing: FeedbackOnMissingParameters = ERROR
How to handle missing parameters in feedback function calls.
run_location instance-attribute¶
run_location: Optional[FeedbackRunLocation]
Where the feedback evaluation takes place (e.g. locally, at a Snowflake server, etc).
selectors instance-attribute¶
Selectors: pointers into Records indicating where to get arguments for imp. In OTEL mode, these are Selector objects; in legacy mode, they are Lens objects.
supplied_name class-attribute instance-attribute¶
An optional name. Only affects displayed tables.
higher_is_better class-attribute instance-attribute¶
Feedback result magnitude interpretation.
metric_type class-attribute instance-attribute¶
Implementation identifier for this metric.
E.g., "relevance", "groundedness", "text2sql". If not provided, defaults to the function name. This allows the same metric implementation to be used multiple times with different configurations and names.
description class-attribute instance-attribute¶
Human-readable description of what this metric measures.
imp class-attribute instance-attribute¶
imp: Optional[ImpCallable] = implementation
Implementation callable.
A serialized version is stored at FeedbackDefinition.implementation.
agg class-attribute instance-attribute¶
agg: Optional[AggCallable] = agg
Aggregator method for metrics that produce more than one result.
A serialized version is stored at FeedbackDefinition.aggregator.
examples class-attribute instance-attribute¶
Examples to use when evaluating the metric.
criteria class-attribute instance-attribute¶
Criteria for the metric.
additional_instructions class-attribute instance-attribute¶
Additional instructions for the metric.
min_score_val class-attribute instance-attribute¶
Minimum score value for the metric.
max_score_val class-attribute instance-attribute¶
Maximum score value for the metric.
temperature class-attribute instance-attribute¶
Temperature parameter for the metric.
groundedness_configs class-attribute instance-attribute¶
groundedness_configs: Optional[GroundednessConfigs] = groundedness_configs
Optional groundedness configuration parameters.
enable_trace_compression class-attribute instance-attribute¶
Whether to compress trace data to reduce token usage when sending traces to metrics.
When True, traces are compressed to preserve essential information while removing redundant data. When False, full uncompressed traces are used. When None (default), the metric's default behavior is used. This flag is only applicable to metrics that take 'trace' as an input parameter.
name property¶
name: str
Name of the metric.
Derived from the name of the implementing function if no supplied name is provided.
Functions¶
load staticmethod¶
load(obj, *args, **kwargs)
Deserialize/load this object using the class information in tru_class_info to look up the actual class that will perform the deserialization.
model_validate classmethod¶
model_validate(*args, **kwargs) -> Any
Deserializes a jsonized version of the app into an instance of the class it was serialized from.
Note
This process uses extra information stored in the jsonized object and handled by WithClassInfo.
__init__¶
    __init__(
        implementation: Optional[Callable] = None,
        agg: Optional[Callable] = None,
        examples: Optional[List[Tuple]] = None,
        criteria: Optional[str] = None,
        additional_instructions: Optional[str] = None,
        min_score_val: Optional[int] = 0,
        max_score_val: Optional[int] = 3,
        temperature: Optional[float] = 0.0,
        groundedness_configs: Optional[GroundednessConfigs] = None,
        enable_trace_compression: Optional[bool] = None,
        metric_type: Optional[str] = None,
        description: Optional[str] = None,
        imp: Optional[Callable] = None,
        **kwargs
    )
Initialize a metric.
| PARAMETER | DESCRIPTION |
|---|---|
| implementation | The metric function to execute. Can be a plain callable or a method from a Provider class. |
| agg | Aggregator function for combining multiple metric results. Defaults to np.mean. |
| examples | User-supplied examples for this metric. |
| criteria | Criteria for the metric evaluation. |
| additional_instructions | Custom instructions for the metric. |
| min_score_val | Minimum score value (default: 0). |
| max_score_val | Maximum score value (default: 3). |
| temperature | Temperature parameter for LLM-based metrics (default: 0.0). |
| groundedness_configs | Optional groundedness configuration. |
| enable_trace_compression | Whether to compress trace data. |
| metric_type | Implementation identifier (e.g., "relevance", "groundedness"). If not provided, defaults to the function name. |
| description | Human-readable description of what this metric measures. |
| imp | DEPRECATED. Use implementation instead. |
| **kwargs | Additional arguments passed to the parent class. |
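For example, direct construction using the scoring-range parameters might look like the following sketch; my_conciseness_fn is a placeholder implementation, not part of trulens.

```python
from trulens.core import Metric

def my_conciseness_fn(response: str) -> float:
    # Placeholder scoring function: shorter responses score higher.
    return min(1.0, 100.0 / max(len(response), 1))

metric = Metric(
    implementation=my_conciseness_fn,
    description="Rewards short responses.",
    min_score_val=0,
    max_score_val=3,
).on_response()
```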
on_input_output¶
on_input_output() -> Metric
Specifies that the metric implementation arguments are to be the main app input and output in that order.
Returns a new Metric object with the specification.
on_default¶
on_default() -> Metric
Specifies that one-argument metrics should be evaluated on the main app output, and two-argument metrics on the main app input and main output, in that order.
Returns a new Metric object with this specification.
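A sketch using the provider from the earlier example: provider.relevance takes two arguments, so on_default() binds the main input and main output to them in that order.

```python
from trulens.core import Metric
from trulens.providers.openai import OpenAI

provider = OpenAI()
# relevance takes two arguments, so on_default() is equivalent here to
# on_input_output().
metric = Metric(implementation=provider.relevance).on_default()
```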
evaluate_deferred staticmethod¶
    evaluate_deferred(
        session: TruSession,
        limit: Optional[int] = None,
        shuffle: bool = False,
        run_location: Optional[FeedbackRunLocation] = None,
    ) -> List[Tuple[Series, Future[FeedbackResult]]]
Evaluates metrics that were specified to be deferred.
Returns a list of tuples with the DB row containing the Metric and initial FeedbackResult as well as the Future which will contain the actual result.
| PARAMETER | DESCRIPTION |
|---|---|
| limit | The maximum number of evals to start. |
| shuffle | Shuffle the order of the metrics to evaluate. |
| run_location | Only run metrics with this run_location. |
Constants that govern behavior:

- TruSession.RETRY_RUNNING_SECONDS: How long to wait before restarting a metric that was started but never finished (or failed without recording that fact).
- TruSession.RETRY_FAILED_SECONDS: How long to wait before retrying a failed metric.
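A sketch of draining deferred evaluations; the row field accessed below is an assumption about the DB row's schema.

```python
from trulens.core import Metric, TruSession

session = TruSession()
for row, future in Metric.evaluate_deferred(session, limit=10, shuffle=True):
    result = future.result()  # block until this evaluation finishes
    print(row["feedback_definition_id"], result.result)  # assumed field names
```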
aggregate¶
    aggregate(
        func: Optional[AggCallable] = None,
        combinations: Optional[FeedbackCombinations] = None,
    ) -> Metric
Specify the aggregation function in case the selectors for this metric generate more than one value for implementation argument(s). Can also specify the method of producing combinations of values in such cases.
Returns a new Metric object with the given aggregation function and/or the given combination mode.
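For instance, a sketch under the assumption that on_context accepts the same collect_list flag as Selector.select_context: with collect_list=False the selector yields one value per retrieved context, and the per-context scores are then averaged.

```python
import numpy as np

from trulens.core import Metric
from trulens.providers.openai import OpenAI

provider = OpenAI()
metric = (
    Metric(implementation=provider.groundedness_measure_with_cot_reasons)
    .on_context(collect_list=False)  # one implementation call per context
    .on_response()
    .aggregate(np.mean)  # average the per-context scores
)
```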
on_prompt¶
Create a variant of self that will take in the main app input or "prompt" as input, sending it to the implementation under the argument name arg.
on_response¶
Create a variant of self that will take in the main app output or "response" as input, sending it to the implementation under the argument name arg.
on_context¶
Create a variant of self that will attempt to take in the context from a context retrieval as input, sending it to the implementation under the argument name arg.
on¶
on(*args, **kwargs) -> Metric
Create a variant of self with the same implementation but the given selectors. Selectors provided positionally have their implementation argument names guessed; those provided as kwargs take their names from the kwargs keys. See the sketch below.
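A sketch reusing my_relevance_fn from the example above, fixing the implementation argument names explicitly via kwargs:

```python
from trulens.core import Metric, Selector

metric = Metric(implementation=my_relevance_fn).on(
    query=Selector.select_record_input(),
    response=Selector.select_record_output(),
)
```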
check_selectors¶
    check_selectors(
        app: Union[AppDefinition, JSON],
        record: Record,
        source_data: Optional[Dict[str, Any]] = None,
        warning: bool = False,
    ) -> bool
Check that the selectors are valid for the given app and record.
| PARAMETER | DESCRIPTION |
|---|---|
| app | The app that produced the record. |
| record | The record that the metric will run on. This can be a mostly empty record for checking ahead of producing one. The utility method App.dummy_record is built for this purpose. |
| source_data | Additional data to select from when extracting metric function arguments. |
| warning | Issue a warning instead of raising an error if a selector is invalid. As some parts of a Record cannot be known ahead of producing it, it may be necessary to not raise an exception here and only issue a warning. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the selectors are valid. False if not (if warning is set). |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If a selector is invalid and warning is not set. |
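A sketch of validating selectors ahead of time with a dummy record, as the record description above suggests; app is assumed to be an instrumented app instance.

```python
# Validate before any real record exists; warning=True downgrades
# selector errors to warnings.
dummy = app.dummy_record()
ok = metric.check_selectors(app=app, record=dummy, warning=True)
```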
run¶
    run(
        app: Optional[Union[AppDefinition, JSON]] = None,
        record: Optional[Record] = None,
        source_data: Optional[Dict] = None,
        **kwargs: Dict[str, Any]
    ) -> FeedbackResult
Run the metric on the given record. The app that produced the record is also required to determine input/output argument names.
| PARAMETER | DESCRIPTION |
|---|---|
| app | The app that produced the record. This can be an AppDefinition or a jsonized AppDefinition. It will be jsonized if it is not already. |
| record | The record to evaluate the metric on. |
| source_data | Additional data to select from when extracting metric function arguments. |
| **kwargs | Any additional keyword arguments are used to set or override selected metric function inputs. |

| RETURNS | DESCRIPTION |
|---|---|
| FeedbackResult | A FeedbackResult object with the result of the metric. |
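A sketch of a one-off evaluation; app and record are assumed to come from an instrumented run of the application.

```python
result = metric.run(app=app, record=record)
print(result.result)  # the (aggregated) score in the FeedbackResult
```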
extract_selection¶
    extract_selection(
        app: Optional[Union[AppDefinition, JSON]] = None,
        record: Optional[Record] = None,
        source_data: Optional[Dict] = None,
    ) -> Iterable[Dict[str, Any]]
Given the app that produced the given record, extract from the record the values that will be sent as arguments to the implementation, as specified by self.selectors. Additional data to select from can be provided in source_data. All args are optional. If a Record is specified, its calls are laid out as an app (see layout_calls_as_app).
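A sketch for inspecting what the selectors would feed the implementation, without actually running it:

```python
for args in metric.extract_selection(app=app, record=record):
    # Each item maps implementation argument names to selected values,
    # e.g. {"query": ..., "response": ...}.
    print(args)
```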
SkipEval¶
Bases: Exception
Raised while evaluating a metric function implementation to skip it, so that it is not aggregated with other, non-skipped results.
| PARAMETER | DESCRIPTION |
|---|---|
| reason | Optional reason for why this evaluation was skipped. |
| metric | The Metric instance this run corresponds to. |
| ins | The arguments to this run. |
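A sketch of skipping from inside an implementation; the import path follows this module and the reason keyword follows the parameter table above.

```python
from trulens.core.metric import SkipEval

def my_metric(query: str, response: str) -> float:
    if not response:
        # Skipped evaluations get no score and are excluded from aggregation.
        raise SkipEval(reason="empty response")
    return 1.0
```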
Selector dataclass¶
Functions¶
select_record_input staticmethod¶
Returns a Selector that picks out the record's main input.
select_record_output staticmethod¶
Returns a Selector that picks out the record's main output.
select_context staticmethod¶
Returns a Selector that tries to retrieve contexts.
| PARAMETER | DESCRIPTION |
|---|---|
| collect_list | Whether to collect the retrieved contexts into a single list argument (True) or to evaluate the metric once per context (False). |
| ignore_none_values | If True, skip evaluation when contexts are None. Defaults to True to prevent errors on missing data. |

| RETURNS | DESCRIPTION |
|---|---|
| Selector | A Selector that retrieves contexts from the record. |
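A sketch wiring retrieved contexts into a groundedness-style metric; the argument names source and statement are assumptions about the implementation's signature.

```python
from trulens.core import Metric, Selector
from trulens.providers.openai import OpenAI

provider = OpenAI()
metric = Metric(
    implementation=provider.groundedness_measure_with_cot_reasons,
    selectors={
        "source": Selector.select_context(collect_list=True),
        "statement": Selector.select_record_output(),
    },
)
```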