trulens.feedback.v2.feedback
PROVIDER IMPLEMENTATION TEMPLATES: Class-based feedback definitions with prompts and criteria. Used by feedback providers to generate system/user prompts for LLM evaluation calls.
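As a rough illustration of what a class-based feedback definition with prompts and criteria might look like, here is a minimal sketch. It is not the module's actual code: `PromptTemplateSketch`, its fields, the `render` method, and the 0 to 3 scale are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PromptTemplateSketch:
    """Hypothetical stand-in for a class-based feedback definition."""

    system_prompt: str  # fixed instructions for the judge LLM
    user_prompt: str    # template filled in for each evaluation call
    criteria: str       # scoring criteria injected into the system prompt

    def render(self, **feedback_args: str) -> Tuple[str, str]:
        """Return the (system, user) prompt pair for one evaluation call."""
        system = self.system_prompt.format(criteria=self.criteria)
        user = self.user_prompt.format(**feedback_args)
        return system, user


# Illustrative values only; the real prompts and output range may differ.
relevance_sketch = PromptTemplateSketch(
    system_prompt="You are a relevance grader. {criteria}",
    user_prompt="TEXT: {text}\nRESPONSE: {response}",
    criteria="Score from 0 (irrelevant) to 3 (fully relevant).",
)

system, user = relevance_sketch.render(
    text="What is TruLens?",
    response="An evaluation library.",
)
```

A provider would then send `system` and `user` to its LLM and parse the score from the reply; that step is omitted here because it depends on the provider.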
Classes
FewShotExamples
Bases: BaseModel
Functions
from_examples_list (classmethod)
Create a FewShotExamples instance from a list of examples.
| PARAMETER | DESCRIPTION |
|---|---|
| examples_list | A list of tuples where the first element is the feedback_args, and the second element is the score. |

| RETURNS | DESCRIPTION |
|---|---|
| FewShotExamples | An instance of FewShotExamples with the provided examples. |
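The tuple-to-instance conversion described above can be sketched with plain dataclasses. This is a stand-in, not the library's implementation: `ExampleSketch` and `FewShotExamplesSketch` are hypothetical names, and the assumptions that feedback_args is a dict of argument name to value and that the score is an integer are not confirmed by this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ExampleSketch:
    """One few-shot example (hypothetical shape)."""

    feedback_args: Dict[str, str]  # assumed: argument name -> value
    score: int                     # assumed: integer score


@dataclass
class FewShotExamplesSketch:
    """Hypothetical stand-in mirroring the from_examples_list pattern."""

    examples: List[ExampleSketch]

    @classmethod
    def from_examples_list(
        cls, examples_list: List[Tuple[Dict[str, str], int]]
    ) -> "FewShotExamplesSketch":
        # Unpack each (feedback_args, score) tuple into a structured example.
        return cls(
            examples=[
                ExampleSketch(feedback_args=args, score=score)
                for args, score in examples_list
            ]
        )


few_shot = FewShotExamplesSketch.from_examples_list([
    ({"query": "Capital of France?", "response": "Paris."}, 3),
    ({"query": "Capital of France?", "response": "I like cheese."}, 0),
])
```

Keeping the input as a list of tuples makes examples quick to write by hand, while the classmethod gives callers validated, named fields afterwards.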
Relevance
Bases: Semantics
Evaluates the relevance of the LLM response to the given text by prompting an LLM.
Relevance is available for any LLM provider.
Sentiment (dataclass)
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates how positive the sentiment of either the prompt or the response is.
Harmfulness
Bases: Moderation, WithPrompt
Examples of Harmfulness:
Insensitivity
Bases: Semantics, WithPrompt
Examples and categorization of racial insensitivity: https://sph.umn.edu/site/docs/hewg/microaggressions.pdf
Maliciousness
Bases: Moderation, WithPrompt
Examples of maliciousness:
Hate
Bases: Moderation
Examples of (not) Hate metrics:
openai package: openai.moderation category hate.
HateThreatening
Bases: Hate
Examples of (not) Threatening Hate metrics:
openai package: openai.moderation category hate/threatening.
SelfHarm
Bases: Moderation
Examples of (not) Self Harm metrics:
openai package: openai.moderation category self-harm.
Sexual
Bases: Moderation
Examples of (not) Sexual metrics:
openai package: openai.moderation category sexual.
SexualMinors
Bases: Sexual
Examples of (not) Sexual Minors metrics:
openai package: openai.moderation category sexual/minors.
Violence
Bases: Moderation
Examples of (not) Violence metrics:
openai package: openai.moderation category violence.
GraphicViolence
Bases: Violence
Examples of (not) Graphic Violence:
openai package: openai.moderation category violence/graphic.
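The class-to-category pairs listed above can be collected into a lookup table. The sketch below is illustrative only: `MODERATION_CATEGORY`, `PARENT`, and `category_score` are hypothetical names, and the score values are made up. The category strings and the parent classes are taken from the entries above.

```python
from typing import Dict

# Feedback class name -> moderation category, as listed in this section.
MODERATION_CATEGORY: Dict[str, str] = {
    "Hate": "hate",
    "HateThreatening": "hate/threatening",
    "SelfHarm": "self-harm",
    "Sexual": "sexual",
    "SexualMinors": "sexual/minors",
    "Violence": "violence",
    "GraphicViolence": "violence/graphic",
}

# Subclass -> base class, from the "Bases:" lines above.
PARENT: Dict[str, str] = {
    "HateThreatening": "Hate",
    "SexualMinors": "Sexual",
    "GraphicViolence": "Violence",
}


def category_score(class_name: str, category_scores: Dict[str, float]) -> float:
    """Look up a class's score in a category -> score mapping."""
    return category_scores[MODERATION_CATEGORY[class_name]]


# Made-up scores shaped like a category -> score mapping.
fake_scores = {"hate": 0.01, "hate/threatening": 0.0, "violence/graphic": 0.2}
graphic = category_score("GraphicViolence", fake_scores)

# Each subclass's category extends its base class's category with "/<suffix>".
nested = all(
    MODERATION_CATEGORY[child].startswith(MODERATION_CATEGORY[parent] + "/")
    for child, parent in PARENT.items()
)
```

Note how the class hierarchy mirrors the category naming: each subclass narrows its base class's category.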
ClassificationModel
LogicalConsistency (dataclass)
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the logical consistency of the agentic system's plan and execution.
ExecutionEfficiency (dataclass)
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the efficiency of the agentic system's execution.
PlanAdherence (dataclass)
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the adherence of the agentic system's execution to the agentic system's plan.
PlanQuality (dataclass)
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the quality of the agentic system's plan to address the user's query.
ToolSelection (dataclass)
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the agent's choice of tools for its tasks/subtasks given tool descriptions. Mapped to PLAN (lower-level complement to Plan Quality). Excludes execution efficiency and adherence; focuses on suitability of selection.
ToolCalling (dataclass)
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the agent's tool invocation quality that is within the agent's control: argument validity/completeness, semantic appropriateness, preconditions/postconditions, and output interpretation. Mapped to ACT (specialized complement to Plan Adherence). Excludes selection and efficiency.
ToolQuality (dataclass)
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates tool/system-side quality and reliability observed in the trace (external errors, availability, stability, and domain-specific output quality such as search relevance). Independent of agent behavior; complements GPA by isolating tool-side failures.
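To keep the three tool-related metrics apart, their stated scopes can be written down as data. This is only a reading aid built from the descriptions above: `ToolMetricScope`, `TOOL_METRICS`, and `metrics_for` are hypothetical names and do not exist in the module.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ToolMetricScope:
    """Scope of one tool metric, as worded in its description."""

    name: str
    mapped_to: Optional[str]   # phase label, or None if no phase is stated
    complements: str           # what the description says it complements
    excludes: Tuple[str, ...]  # aspects the description says are out of scope


TOOL_METRICS = (
    ToolMetricScope("ToolSelection", "PLAN", "PlanQuality",
                    ("execution efficiency", "adherence")),
    ToolMetricScope("ToolCalling", "ACT", "PlanAdherence",
                    ("selection", "efficiency")),
    # ToolQuality is described as independent of agent behavior,
    # so no agent phase is recorded for it here.
    ToolMetricScope("ToolQuality", None, "GPA", ()),
)


def metrics_for(phase: str) -> List[str]:
    """Return the names of tool metrics mapped to the given phase."""
    return [m.name for m in TOOL_METRICS if m.mapped_to == phase]


plan_metrics = metrics_for("PLAN")
act_metrics = metrics_for("ACT")
```

Read this way, ToolSelection asks whether the right tool was chosen, ToolCalling asks whether it was invoked correctly, and ToolQuality asks whether the tool itself behaved.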