
trulens.feedback.templates


Feedback evaluation templates: prompt/criteria template classes used by feedback providers to generate system/user prompts for LLM evaluation calls.

Domain files

  • base.py – FeedbackTemplate and shared scaffolding
  • rag.py – RAG evaluation templates (groundedness, relevance, …)
  • safety.py – Moderation / safety templates
  • quality.py – Text quality templates (coherence, sentiment, …)
  • agent.py – Agentic evaluation templates

Only symbols explicitly listed in each domain module's __all__ are re-exported from this package.
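The re-export mechanism can be illustrated with a small stand-in module (the module and symbol names below are hypothetical; only the `__all__` filtering behavior reflects the description above):

```python
import types

# Hypothetical stand-in for a domain module; only names listed in
# __all__ are picked up by the package's re-export step.
base = types.ModuleType("base")
base.FeedbackTemplate = type("FeedbackTemplate", (), {})
base._Helper = type("_Helper", (), {})  # internal, deliberately not exported
base.__all__ = ["FeedbackTemplate"]

# The package-level re-export keeps only what __all__ lists:
exported = {name: getattr(base, name) for name in base.__all__}
```

Symbols omitted from a domain module's `__all__` (like `_Helper` above) stay private to that module.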

Classes

LogicalConsistency dataclass

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the logical consistency of the agentic system's plan and execution.

ExecutionEfficiency dataclass

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the efficiency of the agentic system's execution.

PlanAdherence dataclass

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates how closely the agentic system's execution adheres to its plan.

PlanQuality dataclass

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the quality of the agentic system's plan to address the user's query.

ToolSelection dataclass

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the agent's choice of tools for its tasks/subtasks given tool descriptions. Mapped to PLAN (lower-level complement to Plan Quality). Excludes execution efficiency and adherence; focuses on suitability of selection.

ToolCalling dataclass

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the aspects of tool invocation quality that are within the agent's control: argument validity/completeness, semantic appropriateness, preconditions/postconditions, and output interpretation. Mapped to ACT (specialized complement to Plan Adherence). Excludes selection and efficiency.

ToolQuality dataclass

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates tool- and system-side quality and reliability as observed in the trace (external errors, availability, stability, and domain-specific output quality such as search relevance). Independent of agent behavior; complements GPA by isolating tool-side failures.

FeedbackTemplate

Bases: BaseModel

Base class for feedback template definitions.

Subclasses define system_prompt, user_prompt, criteria, and output_space as ClassVars. Providers use these to build LLM evaluation prompts.
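A subclass might look like the following minimal sketch. The `ConcisenessTemplate` name, criteria text, and prompt wording are invented for illustration; the real base class is pydantic's `BaseModel`, replaced here with a plain stand-in so the sketch is self-contained:

```python
from typing import ClassVar, Tuple

class BaseModel:
    """Stand-in for pydantic.BaseModel so the sketch is self-contained."""

class ConcisenessTemplate(BaseModel):
    """Hypothetical feedback template: scores how concise a response is."""
    system_prompt: ClassVar[str] = (
        "You are a grader. Rate the CONCISENESS of the response from "
        "0 (rambling) to 10 (maximally concise). Reply with the score only."
    )
    user_prompt: ClassVar[str] = "RESPONSE: {response}\n\nSCORE:"
    criteria: ClassVar[str] = "Conciseness of the response."
    output_space: ClassVar[Tuple[int, int]] = (0, 10)

# A provider fills the user prompt with the text under evaluation:
filled = ConcisenessTemplate.user_prompt.format(
    response="Paris is the capital of France."
)
```

Keeping the prompts and criteria as ClassVars means a provider can read them straight off the class without instantiating the template.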

Criteria

Bases: str, Enum

A Criteria to evaluate.

OutputSpace

Bases: Enum

Enum for valid output spaces of scores.

FewShotExamples

Bases: BaseModel

Functions
from_examples_list classmethod
from_examples_list(
    examples_list: List[Tuple[Dict[str, str], int]]
) -> FewShotExamples

Create a FewShotExamples instance from a list of examples.

PARAMETER DESCRIPTION
examples_list

A list of tuples where the first element is the feedback_args and the second element is the score.

TYPE: List[Tuple[Dict[str, str], int]]

RETURNS DESCRIPTION
FewShotExamples

An instance of FewShotExamples with the provided examples.
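Usage presumably follows the shape below. The class here is a stand-in mirroring only the documented classmethod, and the `"query"`/`"response"` argument names are illustrative:

```python
from typing import Dict, List, Tuple

class FewShotExamples:
    """Stand-in mirroring the documented classmethod."""
    def __init__(self, examples: List[Tuple[Dict[str, str], int]]):
        self.examples = examples

    @classmethod
    def from_examples_list(
        cls, examples_list: List[Tuple[Dict[str, str], int]]
    ) -> "FewShotExamples":
        return cls(examples_list)

# Each tuple pairs the feedback arguments with a ground-truth score:
shots = FewShotExamples.from_examples_list([
    ({"query": "What is 2 + 2?", "response": "4"}, 10),  # correct -> high score
    ({"query": "What is 2 + 2?", "response": "5"}, 0),   # wrong -> low score
])
```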

FeedbackOutput

Bases: BaseModel

Feedback functions produce at least a floating-point score.

ClassificationModel

Bases: Model

Functions
of_prompt staticmethod
of_prompt(model: CompletionModel, prompt: str) -> None

Define a classification model from a completion model, a prompt, and optional examples.
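The idea can be sketched with stand-in types (both classes below are hypothetical mocks, not the library's implementations; the real method's signature is shown above):

```python
class CompletionModel:
    """Stand-in for any text-completion model (hypothetical)."""
    def complete(self, text: str) -> str:
        # Toy behavior: always returns the same label.
        return "positive"

class ClassificationModel:
    """Pairs a completion model with a fixed instruction prompt."""
    def __init__(self, model: CompletionModel, prompt: str):
        self.model = model
        self.prompt = prompt

    def classify(self, text: str) -> str:
        # Prepend the instruction prompt, then run the completion model.
        return self.model.complete(f"{self.prompt}\n\n{text}")

clf = ClassificationModel(CompletionModel(), "Label the sentiment:")
label = clf.classify("I love this!")
```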

Sentiment dataclass

Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin

Evaluates the positive sentiment of either the prompt or the response.

Harmfulness

Bases: Moderation, WithPrompt

Examples of Harmfulness:

Insensitivity

Bases: Semantics, WithPrompt

Examples and categorization of racial insensitivity: https://sph.umn.edu/site/docs/hewg/microaggressions.pdf.

Maliciousness

Bases: Moderation, WithPrompt

Examples of maliciousness:

HateThreatening

Bases: Hate

Examples of (not) Threatening Hate metrics:

  • openai package: openai.moderation category hate/threatening.

SelfHarm

Bases: Moderation

Examples of (not) Self Harm metrics:

  • openai package: openai.moderation category self-harm.

Sexual

Bases: Moderation

Examples of (not) Sexual metrics:

  • openai package: openai.moderation category sexual.

SexualMinors

Bases: Sexual

Examples of (not) Sexual Minors metrics:

  • openai package: openai.moderation category sexual/minors.

Violence

Bases: Moderation

Examples of (not) Violence metrics:

  • openai package: openai.moderation category violence.

GraphicViolence

Bases: Violence

Examples of (not) Graphic Violence:

  • openai package: openai.moderation category violence/graphic.