📖 Stock Feedback Functions

trulens_eval.feedback.provider.hugs.Huggingface

Bases: Provider

Out of the box feedback functions calling Huggingface APIs.

Functions

language_match

language_match(text1: str, text2: str) -> Tuple[float, Dict]

Uses Huggingface's papluca/xlm-roberta-base-language-detection model. A function that runs language detection on text1 and text2 and calculates the probit difference for the language detected on text1. The function is: 1.0 - |probit_language_text1(text1) - probit_language_text1(text2)|
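
For example, if English is detected on text1 with probability 0.9 and English scores probability 0.7 on text2, the result is 1.0 - |0.9 - 0.7| = 0.8.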

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.language_match).on_input_output() 

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text1

Text to evaluate.

TYPE: str

text2

Comparative text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1, with 0 being "different languages" and 1 being "same language".

TYPE: Tuple[float, Dict]

context_relevance

context_relevance(prompt: str, context: str) -> float

Uses Huggingface's truera/context_relevance model, a model that computes the relevance of a given context to the prompt. The model can be found at https://huggingface.co/truera/context_relevance. Usage:

from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.context_relevance).on_input_output() 
The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
prompt

The given prompt.

TYPE: str

context

Comparative contextual information.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1, with 0 being irrelevant and 1 being a relevant context for addressing the prompt.

TYPE: float

positive_sentiment

positive_sentiment(text: str) -> float

Uses Huggingface's cardiffnlp/twitter-roberta-base-sentiment model. A function that uses a sentiment classifier on text.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.positive_sentiment).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1, with 0 being "negative sentiment" and 1 being "positive sentiment".

TYPE: float

toxic

toxic(text: str) -> float

Uses Huggingface's martin-ha/toxic-comment-model model. A function that uses a toxic comment classifier on text.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.toxic, higher_is_better=False).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1, with 1 being "toxic" and 0 being "not toxic".

TYPE: float

pii_detection

pii_detection(text: str) -> float

NER model to detect PII.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface
hugs = Huggingface()

# Define a pii_detection feedback function using HuggingFace.
f_pii_detection = Feedback(hugs.pii_detection).on_input()

The on(...) selector can be changed. See Feedback Function Guide: Selectors

PARAMETER DESCRIPTION
text

A text prompt that may contain a name.

TYPE: str

RETURNS DESCRIPTION
float

The likelihood that a name is contained in the input text.

pii_detection_with_cot_reasons

pii_detection_with_cot_reasons(text: str)

NER model to detect PII, with reasons.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface
hugs = Huggingface()

# Define a pii_detection_with_cot_reasons feedback function using HuggingFace.
f_pii_detection = Feedback(hugs.pii_detection_with_cot_reasons).on_input()

The on(...) selector can be changed. See Feedback Function Guide : Selectors

hallucination_evaluator

hallucination_evaluator(model_output: str, retrieved_text_chunks: str) -> float

Evaluates the hallucination score for a combined input of two statements as a float between 0 and 1 representing a true/false boolean. If the return is greater than 0.5, the statement is evaluated as true; if it is less than 0.5, the statement is evaluated as a hallucination.

Example

from trulens_eval.feedback.provider.hugs import Huggingface
huggingface_provider = Huggingface()
score = huggingface_provider.hallucination_evaluator(
    "The sky is blue.",
    "Apples are red. The grass is green."
)

PARAMETER DESCRIPTION
model_output

What an LLM returns based on the text chunks retrieved during RAG.

TYPE: str

retrieved_text_chunks

The text chunks retrieved during RAG.

TYPE: str

RETURNS DESCRIPTION
float

Hallucination score.

TYPE: float

trulens_eval.feedback.provider.openai.OpenAI

Bases: LLMProvider

Out of the box feedback functions calling OpenAI APIs.

Create an OpenAI Provider with out of the box feedback functions.

Example

from trulens_eval.feedback.provider.openai import OpenAI 
openai_provider = OpenAI()
PARAMETER DESCRIPTION
model_engine

The OpenAI completion model. Defaults to gpt-3.5-turbo

TYPE: Optional[str] DEFAULT: None

**kwargs

Additional arguments to pass to the OpenAIEndpoint which are then passed to OpenAIClient and finally to the OpenAI client.

TYPE: dict DEFAULT: {}
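
A minimal sketch of overriding the default completion model; "gpt-4" here is only an illustration of the model_engine parameter:

from trulens_eval.feedback.provider.openai import OpenAI

# Any supported OpenAI chat model name can be passed as model_engine.
openai_provider = OpenAI(model_engine="gpt-4")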

Functions

moderation_hate

moderation_hate(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is hate speech.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hate, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not hate) and 1.0 (hate).

TYPE: float

moderation_hatethreatening

moderation_hatethreatening(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is threatening speech.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hatethreatening, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not threatening) and 1.0 (threatening).

TYPE: float

moderation_selfharm

moderation_selfharm(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about self harm.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_selfharm, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not self harm) and 1.0 (self harm).

TYPE: float

moderation_sexual

moderation_sexual(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is sexual speech.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexual, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not sexual) and 1.0 (sexual).

TYPE: float

moderation_sexualminors

moderation_sexualminors(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about sexual minors.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexualminors, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not sexual minors) and 1.0 (sexual minors).

TYPE: float

moderation_violence

moderation_violence(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about violence.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violence, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not violence) and 1.0 (violence).

TYPE: float

moderation_violencegraphic

moderation_violencegraphic(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about graphic violence.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violencegraphic, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not graphic violence) and 1.0 (graphic violence).

TYPE: float

moderation_harassment

moderation_harassment(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is harassment.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harassment) and 1.0 (harassment).

TYPE: float

moderation_harassment_threatening

moderation_harassment_threatening(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is threatening harassment.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment_threatening, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harassment/threatening) and 1.0 (harassment/threatening).

TYPE: float

trulens_eval.feedback.provider.base.LLMProvider

Bases: Provider

An LLM-based provider.

This is an abstract class; to use it, initialize one of its concrete subclasses, such as OpenAI.

Functions

generate_score

generate_score(system_prompt: str, user_prompt: Optional[str] = None, normalize: float = 10.0, temperature: float = 0.0) -> float

Base method to generate a score only, used for evaluation.

PARAMETER DESCRIPTION
system_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt.

TYPE: Optional[str] DEFAULT: None

normalize

The normalization factor for the score.

TYPE: float DEFAULT: 10.0

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

The score on a 0-1 scale.
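
Example

A minimal sketch of calling generate_score directly. The rubric prompt here is hypothetical; any system prompt that instructs the model to answer with a number on the 0-10 scale (matching the default normalize of 10.0) should work:

from trulens_eval.feedback.provider.openai import OpenAI
provider = OpenAI()

# The provider divides the model's raw 0-10 answer by `normalize`
# to produce a score on a 0-1 scale.
score = provider.generate_score(
    system_prompt="Rate the clarity of the user's text from 0 (unclear) to 10 (perfectly clear). Respond only with the number.",
    user_prompt="The quick brown fox jumps over the lazy dog."
)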

generate_score_and_reasons

generate_score_and_reasons(system_prompt: str, user_prompt: Optional[str] = None, normalize: float = 10.0, temperature: float = 0.0) -> Tuple[float, Dict]

Base method to generate a score and reason, used for evaluation.

PARAMETER DESCRIPTION
system_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt. Defaults to None.

TYPE: Optional[str] DEFAULT: None

normalize

The normalization factor for the score.

TYPE: float DEFAULT: 10.0

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

The score on a 0-1 scale.

Dict

Reason metadata if returned by the LLM.
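
Example

A similar sketch for the score-and-reasons variant, reusing the provider from the sketch above; whether the reasons dictionary is populated depends on the model actually emitting reasons in the expected format:

score, reasons = provider.generate_score_and_reasons(
    system_prompt="Rate the clarity of the user's text from 0 to 10 and explain your rating.",
    user_prompt="The quick brown fox jumps over the lazy dog."
)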

context_relevance

context_relevance(question: str, context: str, temperature: float = 0.0) -> float

Uses chat completion model. A function that completes a template to check the relevance of the context to the question.

Example

from trulens_eval.app import App
context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not relevant) and 1.0 (relevant).

TYPE: float

qs_relevance

qs_relevance(question: str, context: str) -> float

Question statement relevance is deprecated and will be removed in future versions. Please use context relevance in its place.

context_relevance_with_cot_reasons

context_relevance_with_cot_reasons(question: str, context: str, temperature: float = 0.0) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.

Example

from trulens_eval.app import App
context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "not relevant" and 1 being "relevant".

TYPE: Tuple[float, Dict]

qs_relevance_with_cot_reasons

qs_relevance_with_cot_reasons(question: str, context: str) -> Tuple[float, Dict]

Question statement relevance is deprecated and will be removed in future versions. Please use context relevance in its place.

relevance

relevance(prompt: str, response: str) -> float

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt.

Example

feedback = Feedback(provider.relevance).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

Usage on RAG Contexts
feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text # See note below
).aggregate(np.mean) 

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "not relevant" and 1 being "relevant".

TYPE: float

relevance_with_cot_reasons

relevance_with_cot_reasons(prompt: str, response: str) -> Tuple[float, Dict]

Uses chat completion Model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.relevance_with_cot_reasons).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

Usage on RAG Contexts
feedback = Feedback(provider.relevance_with_cot_reasons).on_input().on(
    TruLlama.select_source_nodes().node.text # See note below
).aggregate(np.mean) 

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "not relevant" and 1 being "relevant".

TYPE: Tuple[float, Dict]

sentiment

sentiment(text: str) -> float

Uses chat completion model. A function that completes a template to check the sentiment of some text.

Example

feedback = Feedback(provider.sentiment).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate sentiment of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "negative sentiment" and 1 being "positive sentiment".

sentiment_with_cot_reasons

sentiment_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.sentiment_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (negative sentiment) and 1.0 (positive sentiment).

TYPE: Tuple[float, Dict]

model_agreement

model_agreement(prompt: str, response: str) -> float

Uses chat completion model. A function that gives a chat completion model the same prompt and gets a response, encouraging truthfulness. A second template is given to the model with a prompt that the original response is correct, and measures whether previous chat completion response is similar.

Example

feedback = Feedback(provider.model_agreement).on_input_output() 

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not in agreement) and 1.0 (in agreement).

TYPE: float

conciseness

conciseness(text: str) -> float

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.conciseness).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate the conciseness of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not concise) and 1.0 (concise).

conciseness_with_cot_reasons

conciseness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate the conciseness of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not concise) and 1.0 (concise).

Dict

A dictionary containing the reasons for the evaluation.

correctness

correctness(text: str) -> float

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.correctness).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

A prompt to an agent.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not correct) and 1.0 (correct).

correctness_with_cot_reasons

correctness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.correctness_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not correct) and 1.0 (correct).

TYPE: Tuple[float, Dict]

coherence

coherence(text: str) -> float

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.coherence).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not coherent) and 1.0 (coherent).

TYPE: float

coherence_with_cot_reasons

coherence_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.coherence_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not coherent) and 1.0 (coherent).

TYPE: Tuple[float, Dict]

harmfulness

harmfulness(text: str) -> float

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.harmfulness).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harmful) and 1.0 (harmful).

TYPE: float

harmfulness_with_cot_reasons

harmfulness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harmful) and 1.0 (harmful).

TYPE: Tuple[float, Dict]

maliciousness

maliciousness(text: str) -> float

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.maliciousness).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not malicious) and 1.0 (malicious).

TYPE: float

maliciousness_with_cot_reasons

maliciousness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not malicious) and 1.0 (malicious).

TYPE: Tuple[float, Dict]

helpfulness

helpfulness(text: str) -> float

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.helpfulness).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not helpful) and 1.0 (helpful).

TYPE: float

helpfulness_with_cot_reasons

helpfulness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not helpful) and 1.0 (helpful).

TYPE: Tuple[float, Dict]

controversiality

controversiality(text: str) -> float

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.controversiality).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not controversial) and 1.0 (controversial).

TYPE: float

controversiality_with_cot_reasons

controversiality_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.controversiality_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not controversial) and 1.0 (controversial).

TYPE: Tuple[float, Dict]

misogyny

misogyny(text: str) -> float

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.misogyny).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not misogynistic) and 1.0 (misogynistic).

TYPE: float

misogyny_with_cot_reasons

misogyny_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.misogyny_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not misogynistic) and 1.0 (misogynistic).

TYPE: Tuple[float, Dict]

criminality

criminality(text: str) -> float

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.criminality).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not criminal) and 1.0 (criminal).

TYPE: float

criminality_with_cot_reasons

criminality_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.criminality_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not criminal) and 1.0 (criminal).

TYPE: Tuple[float, Dict]

insensitivity

insensitivity(text: str) -> float

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.insensitivity).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not insensitive) and 1.0 (insensitive).

TYPE: float

insensitivity_with_cot_reasons

insensitivity_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not insensitive) and 1.0 (insensitive).

TYPE: Tuple[float, Dict]

comprehensiveness_with_cot_reasons

comprehensiveness_with_cot_reasons(source: str, summary: str) -> Tuple[float, Dict]

Uses chat completion model. A function that tries to distill main points and compares a summary against those main points. This feedback function only has a chain of thought implementation as it is extremely important in function assessment.

Example

feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()
PARAMETER DESCRIPTION
source

Text corresponding to source material.

TYPE: str

summary

Text corresponding to a summary.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (main points missed) and 1.0 (no main points missed).

summarization_with_cot_reasons

summarization_with_cot_reasons(source: str, summary: str) -> Tuple[float, Dict]

Summarization is deprecated in favor of comprehensiveness. This method defaults to comprehensiveness_with_cot_reasons.

stereotypes

stereotypes(prompt: str, response: str) -> float

Uses chat completion model. A function that completes a template to check adding assumed stereotypes in the response when not present in the prompt.

Example

feedback = Feedback(provider.stereotypes).on_input_output()
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).

stereotypes_with_cot_reasons

stereotypes_with_cot_reasons(prompt: str, response: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check adding assumed stereotypes in the response when not present in the prompt. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).

trulens_eval.feedback.groundedness

Classes

Groundedness

Bases: WithClassInfo, SerialModel

Measures Groundedness.

Currently the groundedness functions work well with a summarizer. This class will use an LLM to find the relevant strings in a text. The groundedness_provider can either be an LLM provider (such as OpenAI) or NLI with Huggingface.

Example

from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()
groundedness_imp = Groundedness(groundedness_provider=openai_provider)

Example

from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.hugs import Huggingface
huggingface_provider = Huggingface()
groundedness_imp = Groundedness(groundedness_provider=huggingface_provider)
PARAMETER DESCRIPTION
groundedness_provider

Provider to use for evaluating groundedness. This should be OpenAI LLM or HuggingFace NLI. Defaults to OpenAI.

TYPE: Optional[Provider] DEFAULT: None

Functions
groundedness_measure_with_cot_reasons
groundedness_measure_with_cot_reasons(source: str, statement: str) -> Tuple[float, dict]

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The LLM will process the entire statement at once, using chain of thought methodology to emit the reasons.

Usage on RAG Contexts
from trulens_eval import Feedback, Select
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.openai import OpenAI
grounded = Groundedness(groundedness_provider=OpenAI())

f_groundedness = Feedback(grounded.groundedness_measure_with_cot_reasons).on(
    Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content # See note below
).on_output().aggregate(grounded.grounded_statements_aggregator)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
source

The source that should support the statement.

TYPE: str

statement

The statement to check groundedness.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, dict]

A measure between 0 and 1, where 1 means each sentence is grounded in the source.

groundedness_measure_with_nli
groundedness_measure_with_nli(source: str, statement: str) -> Tuple[float, dict]

A measure to track if the source material supports each sentence in the statement using an NLI model.

First the response will be split into statements using a sentence tokenizer. Each statement is then checked against the entire source using a natural language inference model.

Usage on RAG Contexts:

from trulens_eval import Feedback, Select
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.hugs import Huggingface
grounded = Groundedness(groundedness_provider=Huggingface())

f_groundedness = Feedback(grounded.groundedness_measure_with_nli).on(
    Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content # See note below
).on_output().aggregate(grounded.grounded_statements_aggregator)
The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
source

The source that should support the statement

TYPE: str

statement

The statement to check groundedness

TYPE: str

RETURNS DESCRIPTION
float

A measure between 0 and 1, where 1 means each sentence is grounded in the source.

TYPE: float

dict

Reason metadata for each statement.

TYPE: dict

groundedness_measure
groundedness_measure(source: str, statement: str) -> Tuple[float, dict]

Groundedness measure is deprecated in favor of the chain-of-thought version. This function will raise a NotImplementedError.

groundedness_measure_with_summarize_step
groundedness_measure_with_summarize_step(source: str, statement: str) -> float

DEPRECATED: This method is deprecated and will be removed in a future release. Please use alternative groundedness measure methods.

A measure to track if the source material supports each sentence in the statement. This groundedness measure is more accurate but slower, using a two-step process: first find supporting evidence with an LLM, then check groundedness for each sentence of the statement.

Usage on RAG Contexts:

from trulens_eval import Feedback, Select
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.openai import OpenAI
grounded = Groundedness(groundedness_provider=OpenAI())

f_groundedness = Feedback(grounded.groundedness_measure_with_summarize_step).on(
    Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content # See note below
).on_output().aggregate(grounded.grounded_statements_aggregator)
The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
source

The source that should support the statement

TYPE: str

statement

The statement to check groundedness

TYPE: str

RETURNS DESCRIPTION
float

A measure between 0 and 1, where 1 means each sentence is grounded in the source.

TYPE: float

grounded_statements_aggregator
grounded_statements_aggregator(source_statements_multi_output: List[Dict]) -> float

Compute the mean groundedness based on the best evidence available for each statement.

PARAMETER DESCRIPTION
source_statements_multi_output

A list of scores, one dict per context, where each dict maps statements to their scores.

TYPE: List[Dict]

RETURNS DESCRIPTION
float

For each statement, takes the maximum score across contexts, then averages those maxima over statements.

TYPE: float
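
Example

A worked sketch of the aggregation; the integer keys in the per-statement dicts are only illustrative:

from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.hugs import Huggingface
grounded = Groundedness(groundedness_provider=Huggingface())

# Two contexts, each scoring the same two statements.
multi_output = [
    {0: 0.2, 1: 0.9},  # scores against the first context
    {0: 0.7, 1: 0.4},  # scores against the second context
]
# Best evidence per statement: max(0.2, 0.7) = 0.7 and max(0.9, 0.4) = 0.9,
# so the aggregate is mean(0.7, 0.9) = 0.8.
score = grounded.grounded_statements_aggregator(multi_output)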

Functions

trulens_eval.feedback.groundtruth

Classes

GroundTruthAgreement

Bases: WithClassInfo, SerialModel

Measures Agreement against a Ground Truth.

Functions
__init__
__init__(ground_truth: Union[List, Callable, FunctionOrMethod], provider: Optional[Provider] = None, bert_scorer: Optional[BERTScorer] = None, **kwargs)

Measures Agreement against a Ground Truth.

Usage 1:

from trulens_eval.feedback import GroundTruthAgreement
golden_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
    {"query": "ΒΏquien invento la bombilla?", "response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set)

Usage 2:

from trulens_eval.feedback import GroundTruthAgreement
ground_truth_imp = llm_app
response = llm_app(prompt)
ground_truth_collection = GroundTruthAgreement(ground_truth_imp)
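
Here llm_app is assumed to be any callable that returns a ground truth response string for a given prompt; the name is only illustrative.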

PARAMETER DESCRIPTION
ground_truth

A list of query/response pairs or a function or callable that returns a ground truth string given a prompt string.

TYPE: Union[List, Callable, FunctionOrMethod]

bert_scorer

Internal Usage for DB serialization.

TYPE: Optional[BERTScorer] DEFAULT: None

provider

Internal Usage for DB serialization.

TYPE: Provider DEFAULT: None

agreement_measure
agreement_measure(prompt: str, response: str) -> Union[float, Tuple[float, Dict[str, str]]]

Uses OpenAI's chat completion model. A function that measures similarity to ground truth. A second template is given to the model with a prompt stating that the original response is correct, and it measures whether the previous response is similar.

Example

from trulens_eval import Feedback
from trulens_eval.feedback import GroundTruthAgreement
golden_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
    {"query": "ΒΏquien invento la bombilla?", "response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set)

feedback = Feedback(ground_truth_collection.agreement_measure).on_input_output() 
The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
Union[float, Tuple[float, Dict[str, str]]]
  • float: A value between 0 and 1. 0 being "not in agreement" and 1 being "in agreement".
Union[float, Tuple[float, Dict[str, str]]]
  • dict: with key 'ground_truth_response'
mae
mae(prompt: str, response: str, score: float) -> float

Method to look up the numeric expected score from a golden set and take the difference.

Primarily used for evaluation of model generated feedback against human feedback

Example

from trulens_eval import Feedback
from trulens_eval.feedback import GroundTruthAgreement

golden_set = [
    {"query": "How many stomachs does a cow have?", "response": "Cows' diet relies primarily on grazing.", "expected_score": 0.4},
    {"query": "Name some top dental floss brands", "response": "I don't know", "expected_score": 0.8}
]
ground_truth_collection = GroundTruthAgreement(golden_set)

f_groundtruth = Feedback(ground_truth_collection.mae).on(Select.Record.calls[0].args.args[0]).on(Select.Record.calls[0].args.args[1]).on_output()
bert_score
bert_score(prompt: str, response: str) -> Union[float, Tuple[float, Dict[str, str]]]

Uses BERT Score. A function that measures similarity to ground truth using BERT embeddings.

Example

from trulens_eval import Feedback
from trulens_eval.feedback import GroundTruthAgreement
golden_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
    {"query": "ΒΏquien invento la bombilla?", "response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set)

feedback = Feedback(ground_truth_collection.bert_score).on_input_output() 
The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
Union[float, Tuple[float, Dict[str, str]]]
  • float: A value between 0 and 1. 0 being "not in agreement" and 1 being "in agreement".
Union[float, Tuple[float, Dict[str, str]]]
  • dict: with key 'ground_truth_response'
bleu
bleu(prompt: str, response: str) -> Union[float, Tuple[float, Dict[str, str]]]

Uses BLEU Score. A function that measures similarity to ground truth using token overlap.

Example

from trulens_eval import Feedback
from trulens_eval.feedback import GroundTruthAgreement
golden_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
    {"query": "ΒΏquien invento la bombilla?", "response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set)

feedback = Feedback(ground_truth_collection.bleu).on_input_output() 
The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
Union[float, Tuple[float, Dict[str, str]]]
  • float: A value between 0 and 1. 0 being "not in agreement" and 1 being "in agreement".
Union[float, Tuple[float, Dict[str, str]]]
  • dict: with key 'ground_truth_response'
rouge
rouge(prompt: str, response: str) -> Union[float, Tuple[float, Dict[str, str]]]

Uses ROUGE Score. A function that measures similarity to ground truth using token overlap.
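
Example

A usage sketch mirroring the other agreement measures above:

from trulens_eval import Feedback
from trulens_eval.feedback import GroundTruthAgreement
golden_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
    {"query": "¿quien invento la bombilla?", "response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set)

feedback = Feedback(ground_truth_collection.rouge).on_input_output()
The on_input_output() selector can be changed. See Feedback Function Guide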

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
Union[float, Tuple[float, Dict[str, str]]]
  • float: A value between 0 and 1. 0 being "not in agreement" and 1 being "in agreement".
Union[float, Tuple[float, Dict[str, str]]]
  • dict: with key 'ground_truth_response'

Functions

trulens_eval.feedback.embeddings

Classes

Embeddings

Bases: WithClassInfo, SerialModel

Embedding related feedback function implementations.

Functions
__init__
__init__(embed_model: Embedder = None)

Instantiates embeddings for feedback functions.

f_embed = feedback.Embeddings(embed_model=embed_model)

PARAMETER DESCRIPTION
embed_model

The embedding model used to embed queries and documents.

TYPE: Embedder DEFAULT: None

cosine_distance
cosine_distance(query: str, document: str) -> Union[float, Tuple[float, Dict[str, str]]]

Runs cosine distance on the query and document embeddings

Example

Below is just one example. See supported embedders: https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/embeddings/root.html

from langchain.embeddings.openai import OpenAIEmbeddings

model_name = 'text-embedding-ada-002'

embed_model = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)

# Create the feedback function
f_embed = feedback.Embeddings(embed_model=embed_model)
f_embed_dist = feedback.Feedback(f_embed.cosine_distance).on_input().on(
    Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content
)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
query

A text prompt to a vector DB.

TYPE: str

document

The document returned from the vector DB.

TYPE: str

RETURNS DESCRIPTION
Union[float, Tuple[float, Dict[str, str]]]
  • float: the embedding vector distance
manhattan_distance
manhattan_distance(query: str, document: str) -> Union[float, Tuple[float, Dict[str, str]]]

Runs L1 distance on the query and document embeddings

Example

Below is just one example. See supported embedders: https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/embeddings/root.html

from langchain.embeddings.openai import OpenAIEmbeddings

model_name = 'text-embedding-ada-002'

embed_model = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)

# Create the feedback function
f_embed = feedback.Embeddings(embed_model=embed_model)
f_embed_dist = feedback.Feedback(f_embed.manhattan_distance).on_input().on(
    Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content
)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
query

A text prompt to a vector DB.

TYPE: str

document

The document returned from the vector DB.

TYPE: str

RETURNS DESCRIPTION
Union[float, Tuple[float, Dict[str, str]]]
  • float: the embedding vector distance
euclidean_distance
euclidean_distance(query: str, document: str) -> Union[float, Tuple[float, Dict[str, str]]]

Runs L2 distance on the query and document embeddings

Example

Below is just one example. See supported embedders: https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/embeddings/root.html

from langchain.embeddings.openai import OpenAIEmbeddings

model_name = 'text-embedding-ada-002'

embed_model = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)

# Create the feedback function
f_embed = feedback.Embeddings(embed_model=embed_model)
f_embed_dist = feedback.Feedback(f_embed.euclidean_distance).on_input().on(
    Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content
)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
query

A text prompt to a vector DB.

TYPE: str

document

The document returned from the vector DB.

TYPE: str

RETURNS DESCRIPTION
Union[float, Tuple[float, Dict[str, str]]]
  • float: the embedding vector distance