trulens.feedback.templates¶
Feedback evaluation templates: prompt/criteria template classes used by feedback providers to generate system/user prompts for LLM evaluation calls.
Domain files:

- base.py: FeedbackTemplate and shared scaffolding
- rag.py: RAG evaluation templates (groundedness, relevance, …)
- safety.py: Moderation / safety templates
- quality.py: Text quality templates (coherence, sentiment, …)
- agent.py: Agentic evaluation templates
Only symbols explicitly listed in each domain module's `__all__` are re-exported from this package.
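The `__all__` re-export convention above can be illustrated with a small runnable sketch. The module and symbol names here (`rag`, `Groundedness`, `_helper`) are hypothetical stand-ins, not the package's actual contents:

```python
# Illustration of the __all__ re-export convention: a star-import (simulated
# here with a dict comprehension) copies only the names listed in __all__.
import types

rag = types.ModuleType("rag")
rag.Groundedness = type("Groundedness", (), {})  # public: listed in __all__
rag._helper = lambda: None                       # private: not listed
rag.__all__ = ["Groundedness"]

# "from rag import *" in the package __init__ copies exactly these names:
package_namespace = {name: getattr(rag, name) for name in rag.__all__}
print(sorted(package_namespace))  # ['Groundedness']
```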
Classes¶
LogicalConsistency
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the logical consistency of the agentic system's plan and execution.
ExecutionEfficiency
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the efficiency of the agentic system's execution.
PlanAdherence
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates how closely the agentic system's execution adheres to its plan.
PlanQuality
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the quality of the agentic system's plan to address the user's query.
ToolSelection
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the agent's choice of tools for its tasks/subtasks given tool descriptions. Mapped to PLAN (lower-level complement to Plan Quality). Excludes execution efficiency and adherence; focuses on suitability of selection.
ToolCalling
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the agent's tool invocation quality that is within the agent's control: argument validity/completeness, semantic appropriateness, preconditions/postconditions, and output interpretation. Mapped to ACT (specialized complement to Plan Adherence). Excludes selection and efficiency.
ToolQuality
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the tool/system side quality and reliability observed in the trace (external errors, availability, stability, domain-specific output quality like search relevance). Independent of agent behavior; complements GPA by isolating tool-side failures.
FeedbackTemplate
¶
Bases: BaseModel
Base class for feedback template definitions.
Subclasses define `system_prompt`, `user_prompt`, `criteria`, and `output_space` as ClassVars. Providers use these to build LLM evaluation prompts.
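A minimal sketch of the subclassing pattern described above. A plain class stands in for the real pydantic `BaseModel` base, and the `ToyRelevance` subclass with its criteria text and 0-10 output space is invented for illustration; only the four ClassVar names come from the source:

```python
# Sketch: subclasses declare the prompt pieces as ClassVars, which a
# provider then formats into an LLM evaluation call.
from typing import ClassVar

class FeedbackTemplate:  # stand-in for the real pydantic-based base class
    system_prompt: ClassVar[str]
    user_prompt: ClassVar[str]
    criteria: ClassVar[str]
    output_space: ClassVar[list]

class ToyRelevance(FeedbackTemplate):  # hypothetical template
    criteria: ClassVar[str] = "Rate how relevant the RESPONSE is to the QUERY."
    output_space: ClassVar[list] = list(range(0, 11))  # e.g. a 0-10 scale
    system_prompt: ClassVar[str] = "You are an evaluator. " + criteria
    user_prompt: ClassVar[str] = "QUERY: {query}\nRESPONSE: {response}"

# A provider would fill the user prompt with the call's arguments:
msg = ToyRelevance.user_prompt.format(query="What is RAG?", response="...")
```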
FewShotExamples
¶
Bases: BaseModel
Functions¶
from_examples_list
classmethod
¶
Create a FewShotExamples instance from a list of examples.
| PARAMETER | DESCRIPTION |
|---|---|
| examples_list | A list of tuples where the first element is the feedback_args and the second element is the score. |

| RETURNS | DESCRIPTION |
|---|---|
| FewShotExamples | An instance of FewShotExamples with the provided examples. |
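The `(feedback_args, score)` tuple shape can be sketched with a minimal stand-in. The real class is a pydantic model whose internals are not shown in this reference, so the dataclass below is an assumption used only to illustrate the documented signature:

```python
# Stand-in sketch of from_examples_list: each tuple pairs the feedback
# arguments (first element) with the expected score (second element).
from dataclasses import dataclass

@dataclass
class FewShotExamples:  # simplified stand-in for the real pydantic model
    examples: list  # list of (feedback_args, score) tuples

    @classmethod
    def from_examples_list(cls, examples_list):
        return cls(examples=list(examples_list))

fse = FewShotExamples.from_examples_list([
    ({"prompt": "2+2?", "response": "4"}, 1.0),
    ({"prompt": "2+2?", "response": "5"}, 0.0),
])
```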
ClassificationModel
¶
Sentiment
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the positive sentiment of either the prompt or response.
Harmfulness
¶
Bases: Moderation, WithPrompt
Examples of Harmfulness:
Insensitivity
¶
Bases: Semantics, WithPrompt
Examples and categorization of racial insensitivity: https://sph.umn.edu/site/docs/hewg/microaggressions.pdf.
Maliciousness
¶
Bases: Moderation, WithPrompt
Examples of maliciousness:
HateThreatening
¶
Bases: Hate
Examples of (not) Threatening Hate metrics:
openai package: `openai.moderation`, category `hate/threatening`.
SelfHarm
¶
Bases: Moderation
Examples of (not) Self Harm metrics:
openai package: `openai.moderation`, category `self-harm`.
Sexual
¶
Bases: Moderation
Examples of (not) Sexual metrics:
openai package: `openai.moderation`, category `sexual`.
SexualMinors
¶
Bases: Sexual
Examples of (not) Sexual Minors metrics:
openai package: `openai.moderation`, category `sexual/minors`.
Violence
¶
Bases: Moderation
Examples of (not) Violence metrics:
openai package: `openai.moderation`, category `violence`.