trulens.feedback.v2.feedback¶
trulens.feedback.v2.feedback
¶
PROVIDER IMPLEMENTATION TEMPLATES: Class-based feedback definitions with prompts and criteria. Used by feedback providers to generate system/user prompts for LLM evaluation calls.
Classes¶
FewShotExamples
¶
Bases: BaseModel
Functions¶
from_examples_list
classmethod
¶
Create a FewShotExamples instance from a list of examples.
PARAMETER | DESCRIPTION |
---|---|
examples_list
|
A list of tuples where the first element is the feedback_args, and the second element is the score. |
RETURNS | DESCRIPTION |
---|---|
FewShotExamples
|
An instance of FewShotExamples with the provided examples.
TYPE:
|
Relevance
¶
Bases: Semantics
This evaluates the relevance of the LLM response to the given text by LLM prompting.
Relevance is available for any LLM provider.
Sentiment
dataclass
¶
Bases: Semantics
, WithPrompt
, CriteriaOutputSpaceMixin
This evaluates the positive sentiment of either the prompt or response.
Harmfulness
¶
Bases: Moderation
, WithPrompt
Examples of Harmfulness:
Insensitivity
¶
Bases: Semantics
, WithPrompt
Examples and categorization of racial insensitivity: https://sph.umn.edu/site/docs/hewg/microaggressions.pdf .
Maliciousness
¶
Bases: Moderation
, WithPrompt
Examples of maliciousness:
Hate
¶
Bases: Moderation
Examples of (not) Hate metrics:
openai
package:openai.moderation
categoryhate
.
HateThreatening
¶
Bases: Hate
Examples of (not) Threatening Hate metrics:
openai
package:openai.moderation
categoryhate/threatening
.
SelfHarm
¶
Bases: Moderation
Examples of (not) Self Harm metrics:
openai
package:openai.moderation
categoryself-harm
.
Sexual
¶
Bases: Moderation
Examples of (not) Sexual metrics:
openai
package:openai.moderation
categorysexual
.
SexualMinors
¶
Bases: Sexual
Examples of (not) Sexual Minors metrics:
openai
package:openai.moderation
categorysexual/minors
.
Violence
¶
Bases: Moderation
Examples of (not) Violence metrics:
openai
package:openai.moderation
categoryviolence
.
GraphicViolence
¶
Bases: Violence
Examples of (not) Graphic Violence:
openai
package:openai.moderation
categoryviolence/graphic
.
ClassificationModel
¶
LogicalConsistency
dataclass
¶
Bases: Semantics
, WithPrompt
, CriteriaOutputSpaceMixin
Evaluates the logical consistency of the agentic system's plan and execution.
ExecutionEfficiency
dataclass
¶
Bases: Semantics
, WithPrompt
, CriteriaOutputSpaceMixin
Evaluates the efficiency of the agentic system's execution.
PlanAdherence
dataclass
¶
Bases: Semantics
, WithPrompt
, CriteriaOutputSpaceMixin
Evaluates the adherence of the agentic system's execution to the agentic system's plan.
PlanQuality
dataclass
¶
Bases: Semantics
, WithPrompt
, CriteriaOutputSpaceMixin
Evaluates the quality of the agentic system's plan to address the user's query.
ToolSelection
dataclass
¶
Bases: Semantics
, WithPrompt
, CriteriaOutputSpaceMixin
Evaluates the agent's choice of tools for its tasks/subtasks given tool descriptions. Mapped to PLAN (lower-level complement to Plan Quality). Excludes execution efficiency and adherence; focuses on suitability of selection.
ToolCalling
dataclass
¶
Bases: Semantics
, WithPrompt
, CriteriaOutputSpaceMixin
Evaluates the agent's tool invocation quality that is within the agent's control: argument validity/completeness, semantic appropriateness, preconditions/postconditions, and output interpretation. Mapped to ACT (specialized complement to Plan Adherence). Excludes selection and efficiency.
ToolQuality
dataclass
¶
Bases: Semantics
, WithPrompt
, CriteriaOutputSpaceMixin
Evaluates the tool/system side quality and reliability observed in the trace (external errors, availability, stability, domain-specific output quality like search relevance). Independent of agent behavior; complements GPA by isolating tool-side failures.