trulens.feedback.v2.feedback¶

PROVIDER IMPLEMENTATION TEMPLATES: class-based feedback definitions with prompts and criteria, used by feedback providers to generate the system and user prompts for LLM evaluation calls.

Classes¶

Feedback ¶

Bases: BaseModel

Base class for feedback functions.

Criteria ¶

Bases: str, Enum

A criterion to evaluate.
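Because Criteria mixes in `str`, each member is also a plain string, so it can be compared with string literals and placed directly into prompt text. The sketch below is a minimal stand-in: the member names and criterion texts are hypothetical and are not the members TruLens actually defines.

```python
from enum import Enum


class Criteria(str, Enum):
    """Hypothetical stand-in for a str-mixin criteria enum."""

    # Member names and texts are illustrative only.
    RELEVANCE = "Score how relevant the RESPONSE is to the PROMPT."
    SENTIMENT = "Score how positive the sentiment of the TEXT is."


# A str-mixin member compares equal to its plain-string value.
print(Criteria.RELEVANCE == "Score how relevant the RESPONSE is to the PROMPT.")

# Use .value in f-strings: on newer Python versions, formatting a str-mixin
# member directly may render "Criteria.SENTIMENT" instead of the text.
system_prompt = f"Evaluation criteria: {Criteria.SENTIMENT.value}"
print(system_prompt)
```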

OutputSpace ¶

Bases: Enum

Enum for valid output spaces of scores.
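An output space fixes the range a score may take. One way to model that is an enum whose values are inclusive `(min, max)` ranges, together with a helper that rescales raw scores to [0, 1]. This is a sketch under assumptions: the member names, the tuple encoding, and the `normalize` helper are hypothetical and may not match TruLens's own OutputSpace.

```python
from enum import Enum


class OutputSpace(Enum):
    """Hypothetical output spaces, each stored as an inclusive (min, max) range."""

    LIKERT_0_3 = (0, 3)
    LIKERT_0_10 = (0, 10)
    BINARY = (0, 1)

    @property
    def min_score(self) -> int:
        return self.value[0]

    @property
    def max_score(self) -> int:
        return self.value[1]

    def normalize(self, raw: float) -> float:
        """Hypothetical helper: clamp a raw score to the range, then rescale to [0, 1]."""
        clamped = min(max(raw, self.min_score), self.max_score)
        return (clamped - self.min_score) / (self.max_score - self.min_score)


print(OutputSpace.LIKERT_0_3.normalize(2))   # 0.6666666666666666
print(OutputSpace.LIKERT_0_10.normalize(7))  # 0.7
print(OutputSpace.BINARY.normalize(1))       # 1.0
```

Clamping before rescaling keeps an out-of-range model reply (for example, a 5 on a 0 to 3 scale) from producing a normalized score above 1.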

FewShotExamples ¶

Bases: BaseModel

Functions¶

from_examples_list `classmethod` ¶

from_examples_list(
    examples_list: List[Tuple[Dict[str, str], int]]
) -> FewShotExamples

Create a FewShotExamples instance from a list of examples.

PARAMETER	DESCRIPTION
`examples_list`	A list of tuples where the first element is the feedback_args, and the second element is the score. TYPE: `List[Tuple[Dict[str, str], int]]`

RETURNS	DESCRIPTION
`FewShotExamples`	An instance of FewShotExamples with the provided examples. TYPE: `FewShotExamples`
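The documented input is a list of `(feedback_args, score)` tuples. The stdlib sketch below mirrors that shape so it runs without TruLens installed. `FewShotExamplesSketch`, `FewShotExample`, the `format_examples` helper, and the `prompt`/`response` argument keys are all hypothetical; only the tuple layout comes from the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FewShotExample:
    """Hypothetical: one labeled example (feedback arguments plus a score)."""

    feedback_args: Dict[str, str]
    score: int


@dataclass
class FewShotExamplesSketch:
    """Stdlib stand-in for FewShotExamples; not the TruLens class."""

    examples: List[FewShotExample] = field(default_factory=list)

    @classmethod
    def from_examples_list(
        cls, examples_list: List[Tuple[Dict[str, str], int]]
    ) -> "FewShotExamplesSketch":
        # Each tuple is (feedback_args, score), matching the documented parameter.
        return cls(
            examples=[FewShotExample(args, score) for args, score in examples_list]
        )

    def format_examples(self) -> str:
        """Hypothetical helper: render the examples as a few-shot prompt block."""
        blocks = []
        for ex in self.examples:
            args = "\n".join(f"{key}: {value}" for key, value in ex.feedback_args.items())
            blocks.append(f"{args}\nScore: {ex.score}")
        return "\n\n".join(blocks)


examples = FewShotExamplesSketch.from_examples_list(
    [
        ({"prompt": "What is 2 + 2?", "response": "4"}, 3),
        ({"prompt": "What is 2 + 2?", "response": "Paris"}, 0),
    ]
)
print(examples.format_examples())
```

With TruLens installed, the documented `FewShotExamples.from_examples_list` accepts a list of the same shape; how the library renders the examples into a prompt is not specified here.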

Relevance ¶

Bases: Semantics

Evaluates the relevance of an LLM response to the given text by prompting an LLM.

Relevance is available for any LLM provider.

Sentiment `dataclass` ¶

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the positive sentiment of either the prompt or the response.
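Sentiment and the agentic evaluators further down all list CriteriaOutputSpaceMixin among their bases. A plausible reading is that the mixin combines a criterion with a score range to build the evaluation prompt. The sketch below illustrates that composition under this assumption. `CriteriaOutputSpaceMixinSketch`, `SentimentSketch`, `system_prompt`, and the prompt wording are hypothetical and do not reproduce TruLens's implementation.

```python
from dataclasses import dataclass
from typing import ClassVar, Tuple


class CriteriaOutputSpaceMixinSketch:
    """Hypothetical mixin: builds a system prompt from criteria and a score range."""

    criteria: ClassVar[str]
    output_space: ClassVar[Tuple[int, int]]

    @classmethod
    def system_prompt(cls) -> str:
        low, high = cls.output_space
        return (
            f"{cls.criteria}\n"
            f"Respond only with an integer score from {low} to {high}."
        )


@dataclass
class SentimentSketch(CriteriaOutputSpaceMixinSketch):
    """Stand-in for Sentiment; the criteria text and range are illustrative only."""

    criteria: ClassVar[str] = "Rate how positive the sentiment of the text is."
    output_space: ClassVar[Tuple[int, int]] = (0, 3)


print(SentimentSketch.system_prompt())
```

Keeping the criterion and range as class-level data means each feedback class only declares what differs, while the prompt-building logic lives once in the mixin.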

Harmfulness ¶

Bases: Moderation, WithPrompt

Examples of Harmfulness:

Insensitivity ¶

Bases: Semantics, WithPrompt

Examples and categorization of racial insensitivity: https://sph.umn.edu/site/docs/hewg/microaggressions.pdf .

Maliciousness ¶

Bases: Moderation, WithPrompt

Examples of maliciousness:

Hate ¶

Bases: Moderation

Examples of (not) Hate metrics:

openai package: openai.moderation category hate.

HateThreatening ¶

Bases: Hate

Examples of (not) Threatening Hate metrics:

openai package: openai.moderation category hate/threatening.

SelfHarm ¶

Bases: Moderation

Examples of (not) Self Harm metrics:

openai package: openai.moderation category self-harm.

Sexual ¶

Bases: Moderation

Examples of (not) Sexual metrics:

openai package: openai.moderation category sexual.

SexualMinors ¶

Bases: Sexual

Examples of (not) Sexual Minors metrics:

openai package: openai.moderation category sexual/minors.

Violence ¶

Bases: Moderation

Examples of (not) Violence metrics:

openai package: openai.moderation category violence.

GraphicViolence ¶

Bases: Violence

Examples of (not) Graphic Violence:

openai package: openai.moderation category violence/graphic.
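The moderation classes above follow OpenAI's moderation categories, and each slash-separated subcategory (for example `hate/threatening`) subclasses its parent (`Hate`). The sketch below reproduces that hierarchy with plain stand-in classes. The category strings come from the entries above. The `category` attribute and the `category_tree` helper are hypothetical and are not TruLens's API.

```python
from typing import List


class Moderation:
    """Stand-in base class; `category` holds the OpenAI moderation category name."""

    category: str = ""


class Hate(Moderation):
    category = "hate"


class HateThreatening(Hate):
    category = "hate/threatening"


class SelfHarm(Moderation):
    category = "self-harm"


class Sexual(Moderation):
    category = "sexual"


class SexualMinors(Sexual):
    category = "sexual/minors"


class Violence(Moderation):
    category = "violence"


class GraphicViolence(Violence):
    category = "violence/graphic"


def category_tree(cls: type, depth: int = 0) -> List[str]:
    """Hypothetical helper: list subclasses depth-first with their categories."""
    lines: List[str] = []
    for sub in cls.__subclasses__():
        lines.append("  " * depth + f"{sub.__name__}: {sub.category}")
        lines.extend(category_tree(sub, depth + 1))
    return lines


print("\n".join(category_tree(Moderation)))
```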

FeedbackOutput ¶

Bases: BaseModel

Feedback functions produce at least a floating-point score.
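The phrase "at least a floating-point score" means the score is the one required field and anything else is optional. A minimal stdlib sketch of that contract follows. `FeedbackOutputSketch` and its `metadata` field are hypothetical; the real FeedbackOutput is a pydantic model whose other fields are not listed here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FeedbackOutputSketch:
    """Hypothetical stand-in: a required float score plus optional extras."""

    score: float
    # Illustrative optional field (e.g. an explanation); not TruLens's schema.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Coerce ints such as 1 to 1.0 so callers always receive a float.
        self.score = float(self.score)


out = FeedbackOutputSketch(score=1, metadata={"reason": "Answers the question."})
print(out.score)  # 1.0
```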

ClassificationModel ¶

Bases: Model

Functions¶

of_prompt `staticmethod` ¶

of_prompt(model: CompletionModel, prompt: str) -> None

Define a classification model from a completion model, a prompt, and optional examples.
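The idea behind `of_prompt` is to turn a general text-completion model into a classifier by fixing the instruction prompt and parsing a label from the reply. The documented signature shows only `model` and `prompt` and returns `None`, so the following is a standalone sketch of the described idea, not of TruLens's method. `classifier_from_prompt`, the `examples` argument, the integer-label parsing, and `fake_complete` (a stub that stands in for a real completion model so the code runs offline) are all hypothetical.

```python
import re
from typing import Callable, Optional


def classifier_from_prompt(
    complete: Callable[[str], str],
    prompt: str,
    examples: Optional[str] = None,
) -> Callable[[str], int]:
    """Hypothetical: wrap a text-completion function as an integer classifier."""

    def classify(text: str) -> int:
        # Assemble instruction, optional few-shot examples, then the input.
        parts = [prompt]
        if examples:
            parts.append(examples)
        parts.append(f"TEXT: {text}\nLABEL:")
        reply = complete("\n\n".join(parts))
        # Take the first integer in the reply as the class label.
        match = re.search(r"\d+", reply)
        if match is None:
            raise ValueError(f"no integer label in completion: {reply!r}")
        return int(match.group())

    return classify


def fake_complete(full_prompt: str) -> str:
    """Stub completion model: answers 1 whenever the prompt mentions a refund."""
    return " 1" if "refund" in full_prompt else " 0"


is_billing = classifier_from_prompt(
    fake_complete, "Label 1 if the text is about billing, else 0."
)
print(is_billing("I want a refund for my order."))  # 1
print(is_billing("The app crashes on startup."))    # 0
```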

LogicalConsistency `dataclass` ¶

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the logical consistency of the agentic system's plan and execution.

ExecutionEfficiency `dataclass` ¶

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the efficiency of the agentic system's execution.

PlanAdherence `dataclass` ¶

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the adherence of the agentic system's execution to the agentic system's plan.

PlanQuality `dataclass` ¶

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the quality of the agentic system's plan to address the user's query.

ToolSelection `dataclass` ¶

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the agent's choice of tools for its tasks/subtasks given tool descriptions. Mapped to PLAN (lower-level complement to Plan Quality). Excludes execution efficiency and adherence; focuses on the suitability of the selection.

ToolCalling `dataclass` ¶

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the agent's tool invocation quality that is within the agent's control: argument validity/completeness, semantic appropriateness, preconditions/postconditions, and output interpretation. Mapped to ACT (specialized complement to Plan Adherence). Excludes selection and efficiency.

ToolQuality `dataclass` ¶

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the quality and reliability of the tool/system side as observed in the trace (external errors, availability, stability, and domain-specific output quality such as search relevance). Independent of agent behavior; complements GPA by isolating tool-side failures.
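The agentic evaluators split responsibility. ToolSelection covers which tool was chosen, ToolCalling covers agent-controlled invocation problems, and ToolQuality covers tool-side failures. The plan and execution evaluators cover the rest. The triage table below is a hypothetical illustration of that split: the symptom labels and the `evaluators_for` helper are invented for this sketch, and each mapping is based only on the descriptions above.

```python
from typing import Dict, List

# Hypothetical triage table: each illustrative symptom is paired with the
# evaluator whose description covers it. Labels are not TruLens identifiers.
EVALUATOR_FOR_SYMPTOM: Dict[str, str] = {
    "plan does not address the user's query": "PlanQuality",
    "execution deviates from the plan": "PlanAdherence",
    "plan and execution are logically inconsistent": "LogicalConsistency",
    "execution takes redundant or wasted steps": "ExecutionEfficiency",
    "unsuitable tool chosen for a subtask": "ToolSelection",
    "invalid or incomplete tool arguments": "ToolCalling",
    "tool output misinterpreted by the agent": "ToolCalling",
    "tool returned an external error or was unavailable": "ToolQuality",
}


def evaluators_for(symptoms: List[str]) -> List[str]:
    """Return distinct evaluators for the observed symptoms, in first-seen order."""
    selected: List[str] = []
    for symptom in symptoms:
        name = EVALUATOR_FOR_SYMPTOM.get(symptom)
        if name is not None and name not in selected:
            selected.append(name)
    return selected


print(evaluators_for([
    "invalid or incomplete tool arguments",
    "tool output misinterpreted by the agent",
    "tool returned an external error or was unavailable",
]))
# ['ToolCalling', 'ToolQuality']
```

The example trace shows why ToolCalling and ToolQuality are kept separate. Two of the failures are within the agent's control, and the third is a tool outage that no change in agent behavior would fix.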
