trulens.feedback.templates¶
Feedback evaluation templates: prompt/criteria template classes used by feedback providers to generate system/user prompts for LLM evaluation calls.
Domain files:

- base.py: FeedbackTemplate and shared scaffolding
- rag.py: RAG evaluation templates (groundedness, relevance, …)
- safety.py: Moderation / safety templates
- quality.py: Text quality templates (coherence, sentiment, …)
- agent.py: Agentic evaluation templates
Only symbols explicitly listed in each domain module's `__all__` are re-exported from this package.
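The `__all__` re-export convention above can be illustrated with a small runnable sketch. The module and symbol names here (`rag`, `Groundedness`, `_helper`) are hypothetical stand-ins, not the package's actual contents:

```python
# Illustration of the __all__ re-export convention: a star-import (simulated
# here with a dict comprehension) copies only the names listed in __all__.
import types

rag = types.ModuleType("rag")
rag.Groundedness = type("Groundedness", (), {})  # public: listed in __all__
rag._helper = lambda: None                       # private: not listed
rag.__all__ = ["Groundedness"]

# "from rag import *" in the package __init__ copies exactly these names:
package_namespace = {name: getattr(rag, name) for name in rag.__all__}
print(sorted(package_namespace))  # ['Groundedness']
```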
Classes¶
LogicalConsistency
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the logical consistency of the agentic system's plan and execution.
ExecutionEfficiency
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the efficiency of the agentic system's execution.
PlanAdherence
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates how closely the agentic system's execution adheres to its plan.
PlanQuality
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the quality of the agentic system's plan to address the user's query.
ToolSelection
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the agent's choice of tools for its tasks/subtasks given tool descriptions. Mapped to PLAN (lower-level complement to Plan Quality). Excludes execution efficiency and adherence; focuses on suitability of selection.
ToolCalling
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the agent's tool invocation quality that is within the agent's control: argument validity/completeness, semantic appropriateness, preconditions/postconditions, and output interpretation. Mapped to ACT (specialized complement to Plan Adherence). Excludes selection and efficiency.
ToolQuality
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the tool/system side quality and reliability observed in the trace (external errors, availability, stability, domain-specific output quality like search relevance). Independent of agent behavior; complements GPA by isolating tool-side failures.
FeedbackTemplate
¶
Bases: BaseModel
Base class for feedback template definitions.
Subclasses define `system_prompt`, `user_prompt`, `criteria`, and `output_space` as ClassVars. Providers use these to build LLM evaluation prompts.
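A minimal sketch of the subclassing pattern described above. A plain class stands in for the real pydantic `BaseModel` base, and the `ToyRelevance` subclass with its criteria text and 0-10 output space is invented for illustration; only the four ClassVar names come from the source:

```python
# Sketch: subclasses declare the prompt pieces as ClassVars, which a
# provider then formats into an LLM evaluation call.
from typing import ClassVar

class FeedbackTemplate:  # stand-in for the real pydantic-based base class
    system_prompt: ClassVar[str]
    user_prompt: ClassVar[str]
    criteria: ClassVar[str]
    output_space: ClassVar[list]

class ToyRelevance(FeedbackTemplate):  # hypothetical template
    criteria: ClassVar[str] = "Rate how relevant the RESPONSE is to the QUERY."
    output_space: ClassVar[list] = list(range(0, 11))  # e.g. a 0-10 scale
    system_prompt: ClassVar[str] = "You are an evaluator. " + criteria
    user_prompt: ClassVar[str] = "QUERY: {query}\nRESPONSE: {response}"

# A provider would fill the user prompt with the call's arguments:
msg = ToyRelevance.user_prompt.format(query="What is RAG?", response="...")
```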
FewShotExamples
¶
Bases: BaseModel
Functions¶
from_examples_list
classmethod
¶
Create a FewShotExamples instance from a list of examples.
| PARAMETER | DESCRIPTION |
|---|---|
| examples_list | A list of tuples where the first element is the feedback_args and the second element is the score. |

| RETURNS | DESCRIPTION |
|---|---|
| FewShotExamples | An instance of FewShotExamples with the provided examples. |
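The `(feedback_args, score)` tuple shape can be sketched with a minimal stand-in. The real class is a pydantic model whose internals are not shown in this reference, so the dataclass below is an assumption used only to illustrate the documented signature:

```python
# Stand-in sketch of from_examples_list: each tuple pairs the feedback
# arguments (first element) with the expected score (second element).
from dataclasses import dataclass

@dataclass
class FewShotExamples:  # simplified stand-in for the real pydantic model
    examples: list  # list of (feedback_args, score) tuples

    @classmethod
    def from_examples_list(cls, examples_list):
        return cls(examples=list(examples_list))

fse = FewShotExamples.from_examples_list([
    ({"prompt": "2+2?", "response": "4"}, 1.0),
    ({"prompt": "2+2?", "response": "5"}, 0.0),
])
```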
ClassificationModel
¶
Sentiment
dataclass
¶
Bases: Semantics, WithPrompt, CriteriaOutputSpaceMixin
Evaluates the positive sentiment of either the prompt or response.
Harmfulness
¶
Bases: Moderation, WithPrompt
Examples of Harmfulness:
Insensitivity
¶
Bases: Semantics, WithPrompt
Examples and categorization of racial insensitivity: https://sph.umn.edu/site/docs/hewg/microaggressions.pdf.
Maliciousness
¶
Bases: Moderation, WithPrompt
Examples of maliciousness:
HateThreatening
¶
Bases: Hate
Examples of (not) Threatening Hate metrics:
openai package: `openai.moderation`, category `hate/threatening`.
SelfHarm
¶
Bases: Moderation
Examples of (not) Self Harm metrics:
openai package: `openai.moderation`, category `self-harm`.
Sexual
¶
Bases: Moderation
Examples of (not) Sexual metrics:
openai package: `openai.moderation`, category `sexual`.
SexualMinors
¶
Bases: Sexual
Examples of (not) Sexual Minors metrics:
openai package: `openai.moderation`, category `sexual/minors`.
Violence
¶
Bases: Moderation
Examples of (not) Violence metrics:
openai package: `openai.moderation`, category `violence`.