📖 Stock Feedback Functions¶

trulens_eval.feedback.provider.hugs.Huggingface ¶

Bases: Provider

Out of the box feedback functions calling Huggingface APIs.

Functions¶

language_match ¶

language_match(text1: str, text2: str) -> Tuple[float, Dict]

Uses Huggingface's papluca/xlm-roberta-base-language-detection model. A function that uses language detection on text1 and text2 and calculates the probit difference on the language detected on text1. The function is: 1.0 - (|probit_language_text1(text1) - probit_language_text1(text2))

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.language_match).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text1`	Text to evaluate. TYPE: `str`
`text2`	Comparative text to evaluate. TYPE: `str`

Returns:

float: A value between 0 and 1. 0 being "different languages" and 1
being "same languages".

context_relevance ¶

context_relevance(prompt: str, context: str) -> float

Uses Huggingface's truera/context_relevance model, a model that uses computes the relevance of a given context to the prompt. The model can be found at https://huggingface.co/truera/context_relevance. Usage:

from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.context_relevance).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`prompt`	The given prompt. TYPE: `str`
`context`	Comparative contextual information. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0 and 1. 0 being irrelevant and 1 TYPE: `float`
`float`	being a relevant context for addressing the prompt.

positive_sentiment ¶

positive_sentiment(text: str) -> float

Uses Huggingface's cardiffnlp/twitter-roberta-base-sentiment model. A function that uses a sentiment classifier on text.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.positive_sentiment).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0 and 1. 0 being "negative sentiment" and 1 TYPE: `float`
`float`	being "positive sentiment".

toxic ¶

toxic(text: str) -> float

Uses Huggingface's martin-ha/toxic-comment-model model. A function that uses a toxic comment classifier on text.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface
huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.not_toxic).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0 and 1. 1 being "toxic" and 0 being "not TYPE: `float`
`float`	toxic".

pii_detection ¶

pii_detection(text: str) -> float

NER model to detect PII.

Example

hugs = Huggingface()

# Define a pii_detection feedback function using HuggingFace.
f_pii_detection = Feedback(hugs.pii_detection).on_input()

The on(...) selector can be changed. See Feedback Function Guide: Selectors

PARAMETER	DESCRIPTION
`text`	A text prompt that may contain a name. TYPE: `str`

RETURNS	DESCRIPTION
`float`	The likelihood that a name is contained in the input text.

pii_detection_with_cot_reasons ¶

pii_detection_with_cot_reasons(text: str)

NER model to detect PII, with reasons.

Example

hugs = Huggingface()

# Define a pii_detection feedback function using HuggingFace.
f_pii_detection = Feedback(hugs.pii_detection).on_input()

The on(...) selector can be changed. See Feedback Function Guide : Selectors

hallucination_evaluator ¶

hallucination_evaluator(model_output: str, retrieved_text_chunks: str) -> float

Evaluates the hallucination score for a combined input of two statements as a float 0<x<1 representing a 
true/false boolean. if the return is greater than 0.5 the statement is evaluated as true. if the return is
less than 0.5 the statement is evaluated as a hallucination.

**!!! example

** python from trulens_eval.feedback.provider.hugs import Huggingface huggingface_provider = Huggingface() score = huggingface_provider.hallucination_evaluator("The sky is blue. [SEP] Apples are red , the grass is green.")

Args:
    model_output (str): This is what an LLM returns based on the text chunks retrieved during RAG
    retrieved_text_chunk (str): These are the text chunks you have retrieved during RAG

Returns:
    float: Hallucination score

trulens_eval.feedback.provider.openai.OpenAI ¶

Bases: LLMProvider

Out of the box feedback functions calling OpenAI APIs.

Create an OpenAI Provider with out of the box feedback functions.

Example

from trulens_eval.feedback.provider.openai import OpenAI 
openai_provider = OpenAI()

PARAMETER	DESCRIPTION
`model_engine`	The OpenAI completion model. Defaults to `gpt-3.5-turbo` TYPE: `Optional[str]` DEFAULT: `None`
`**kwargs`	Additional arguments to pass to the OpenAIEndpoint which are then passed to OpenAIClient and finally to the OpenAI client. TYPE: `dict` DEFAULT: `{}`

Functions¶

moderation_hate ¶

moderation_hate(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is hate speech.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hate, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not hate) and 1.0 (hate). TYPE: `float`

moderation_hatethreatening ¶

moderation_hatethreatening(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is threatening speech.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_hatethreatening, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not threatening) and 1.0 (threatening). TYPE: `float`

moderation_selfharm ¶

moderation_selfharm(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about self harm.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_selfharm, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not self harm) and 1.0 (self harm). TYPE: `float`

moderation_sexual ¶

moderation_sexual(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is sexual speech.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexual, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not sexual) and 1.0 (sexual). TYPE: `float`

moderation_sexualminors ¶

moderation_sexualminors(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about sexual minors.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_sexualminors, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not sexual minors) and 1.0 (sexual TYPE: `float`
`float`	minors).

moderation_violence ¶

moderation_violence(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about violence.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violence, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not violence) and 1.0 (violence). TYPE: `float`

moderation_violencegraphic ¶

moderation_violencegraphic(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about graphic violence.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_violencegraphic, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not graphic violence) and 1.0 (graphic TYPE: `float`
`float`	violence).

moderation_harassment ¶

moderation_harassment(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about graphic violence.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not harrassment) and 1.0 (harrassment). TYPE: `float`

moderation_harassment_threatening ¶

moderation_harassment_threatening(text: str) -> float

Uses OpenAI's Moderation API. A function that checks if text is about graphic violence.

Example

from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()

feedback = Feedback(
    openai_provider.moderation_harassment_threatening, higher_is_better=False
).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not harrassment/threatening) and 1.0 (harrassment/threatening). TYPE: `float`

trulens_eval.feedback.provider.base.LLMProvider ¶

Bases: Provider

An LLM-based provider.

This is an abstract class and needs to be initialized as one of these:

OpenAI and subclass AzureOpenAI.
Bedrock.
LiteLLM. LiteLLM provides an interface to a wide range of models.
Langchain.

Functions¶

generate_score ¶

generate_score(system_prompt: str, user_prompt: Optional[str] = None, normalize: float = 10.0, temperature: float = 0.0) -> float

Base method to generate a score only, used for evaluation.

PARAMETER	DESCRIPTION
`system_prompt`	A pre-formatted system prompt. TYPE: `str`
`user_prompt`	An optional user prompt. TYPE: `Optional[str]` DEFAULT: `None`
`normalize`	The normalization factor for the score. TYPE: `float` DEFAULT: `10.0`
`temperature`	The temperature for the LLM response. TYPE: `float` DEFAULT: `0.0`

RETURNS	DESCRIPTION
`float`	The score on a 0-1 scale.

generate_score_and_reasons ¶

generate_score_and_reasons(system_prompt: str, user_prompt: Optional[str] = None, normalize: float = 10.0, temperature: float = 0.0) -> Tuple[float, Dict]

Base method to generate a score and reason, used for evaluation.

PARAMETER	DESCRIPTION
`system_prompt`	A pre-formatted system prompt. TYPE: `str`
`user_prompt`	An optional user prompt. Defaults to None. TYPE: `Optional[str]` DEFAULT: `None`
`normalize`	The normalization factor for the score. TYPE: `float` DEFAULT: `10.0`
`temperature`	The temperature for the LLM response. TYPE: `float` DEFAULT: `0.0`

RETURNS	DESCRIPTION
`float`	The score on a 0-1 scale.
`Dict`	Reason metadata if returned by the LLM.

context_relevance ¶

context_relevance(question: str, context: str, temperature: float = 0.0) -> float

Uses chat completion model. A function that completes a template to check the relevance of the context to the question.

Example

from trulens_eval.app import App
context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER	DESCRIPTION
`question`	A question being asked. TYPE: `str`
`context`	Context related to the question. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not relevant) and 1.0 (relevant). TYPE: `float`

qs_relevance ¶

qs_relevance(question: str, context: str) -> float

Question statement relevance is deprecated and will be removed in future versions. Please use context relevance in its place.

context_relevance_with_cot_reasons ¶

context_relevance_with_cot_reasons(question: str, context: str, temperature: float = 0.0) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.

Example

from trulens_eval.app import App
context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER	DESCRIPTION
`question`	A question being asked. TYPE: `str`
`context`	Context related to the question. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0 and 1. 0 being "not relevant" and 1 being "relevant". TYPE: `Tuple[float, Dict]`

qs_relevance_with_cot_reasons ¶

qs_relevance_with_cot_reasons(question: str, context: str) -> Tuple[float, Dict]

Question statement relevance is deprecated and will be removed in future versions. Please use context relevance in its place.

relevance ¶

relevance(prompt: str, response: str) -> float

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt.

Example

feedback = Feedback(provider.relevance).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

Usage on RAG Contexts

feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text # See note below
).aggregate(np.mean)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER	DESCRIPTION
`prompt`	A text prompt to an agent. TYPE: `str`
`response`	The agent's response to the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0 and 1. 0 being "not relevant" and 1 being "relevant". TYPE: `float`

relevance_with_cot_reasons ¶

relevance_with_cot_reasons(prompt: str, response: str) -> Tuple[float, Dict]

Uses chat completion Model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.relevance_with_cot_reasons).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

Usage on RAG Contexts

feedback = Feedback(provider.relevance_with_cot_reasons).on_input().on(
    TruLlama.select_source_nodes().node.text # See note below
).aggregate(np.mean)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER	DESCRIPTION
`prompt`	A text prompt to an agent. TYPE: `str`
`response`	The agent's response to the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0 and 1. 0 being "not relevant" and 1 being "relevant". TYPE: `Tuple[float, Dict]`

sentiment ¶

sentiment(text: str) -> float

Uses chat completion model. A function that completes a template to check the sentiment of some text.

Example

feedback = Feedback(provider.sentiment).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate sentiment of. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0 and 1. 0 being "negative sentiment" and 1 being "positive sentiment".

sentiment_with_cot_reasons ¶

sentiment_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.sentiment_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (negative sentiment) and 1.0 (positive sentiment). TYPE: `Tuple[float, Dict]`

model_agreement ¶

model_agreement(prompt: str, response: str) -> float

Uses chat completion model. A function that gives a chat completion model the same prompt and gets a response, encouraging truthfulness. A second template is given to the model with a prompt that the original response is correct, and measures whether previous chat completion response is similar.

Example

feedback = Feedback(provider.model_agreement).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`prompt`	A text prompt to an agent. TYPE: `str`
`response`	The agent's response to the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not in agreement) and 1.0 (in agreement). TYPE: `float`

conciseness ¶

conciseness(text: str) -> float

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.conciseness).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate the conciseness of. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not concise) and 1.0 (concise).

conciseness_with_cot_reasons ¶

conciseness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.conciseness).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate the conciseness of. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not concise) and 1.0 (concise)
`Dict`	A dictionary containing the reasons for the evaluation.

correctness ¶

correctness(text: str) -> float

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.correctness).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	A prompt to an agent. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not correct) and 1.0 (correct).

correctness_with_cot_reasons ¶

correctness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.correctness_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	Text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not correct) and 1.0 (correct). TYPE: `Tuple[float, Dict]`

coherence ¶

coherence(text: str) -> float

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.coherence).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not coherent) and 1.0 (coherent). TYPE: `float`

coherence_with_cot_reasons ¶

coherence_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.coherence_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not coherent) and 1.0 (coherent). TYPE: `Tuple[float, Dict]`

harmfulness ¶

harmfulness(text: str) -> float

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.harmfulness).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not harmful) and 1.0 (harmful)". TYPE: `float`

harmfulness_with_cot_reasons ¶

harmfulness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not harmful) and 1.0 (harmful). TYPE: `Tuple[float, Dict]`

maliciousness ¶

maliciousness(text: str) -> float

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.maliciousness).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not malicious) and 1.0 (malicious). TYPE: `float`

maliciousness_with_cot_reasons ¶

maliciousness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat compoletion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not malicious) and 1.0 (malicious). TYPE: `Tuple[float, Dict]`

helpfulness ¶

helpfulness(text: str) -> float

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.helpfulness).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not helpful) and 1.0 (helpful). TYPE: `float`

helpfulness_with_cot_reasons ¶

helpfulness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not helpful) and 1.0 (helpful). TYPE: `Tuple[float, Dict]`

controversiality ¶

controversiality(text: str) -> float

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to Langchain Eval.

Example

feedback = Feedback(provider.controversiality).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not controversial) and 1.0 (controversial). TYPE: `float`

controversiality_with_cot_reasons ¶

controversiality_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to Langchain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.controversiality_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not controversial) and 1.0 (controversial). TYPE: `Tuple[float, Dict]`

misogyny ¶

misogyny(text: str) -> float

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.misogyny).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not misogynistic) and 1.0 (misogynistic). TYPE: `float`

misogyny_with_cot_reasons ¶

misogyny_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.misogyny_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not misogynistic) and 1.0 (misogynistic). TYPE: `Tuple[float, Dict]`

criminality ¶

criminality(text: str) -> float

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.criminality).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not criminal) and 1.0 (criminal). TYPE: `float`

criminality_with_cot_reasons ¶

criminality_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.criminality_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not criminal) and 1.0 (criminal). TYPE: `Tuple[float, Dict]`

insensitivity ¶

insensitivity(text: str) -> float

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.insensitivity).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not insensitive) and 1.0 (insensitive). TYPE: `float`

insensitivity_with_cot_reasons ¶

insensitivity_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`text`	The text to evaluate. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (not insensitive) and 1.0 (insensitive). TYPE: `Tuple[float, Dict]`

comprehensiveness_with_cot_reasons ¶

comprehensiveness_with_cot_reasons(source: str, summary: str) -> Tuple[float, Dict]

Uses chat completion model. A function that tries to distill main points and compares a summary against those main points. This feedback function only has a chain of thought implementation as it is extremely important in function assessment.

Example

feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()

PARAMETER	DESCRIPTION
`source`	Text corresponding to source material. TYPE: `str`
`summary`	Text corresponding to a summary. TYPE: `str`

RETURNS	DESCRIPTION
`Tuple[float, Dict]`	A value between 0.0 (main points missed) and 1.0 (no main points missed).

summarization_with_cot_reasons ¶

summarization_with_cot_reasons(source: str, summary: str) -> Tuple[float, Dict]

Summarization is deprecated in place of comprehensiveness. Defaulting to comprehensiveness_with_cot_reasons.

stereotypes ¶

stereotypes(prompt: str, response: str) -> float

Uses chat completion model. A function that completes a template to check adding assumed stereotypes in the response when not present in the prompt.

Example

feedback = Feedback(provider.stereotypes).on_input_output()

PARAMETER	DESCRIPTION
`prompt`	A text prompt to an agent. TYPE: `str`
`response`	The agent's response to the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`float`	A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).

stereotypes_with_cot_reasons ¶

stereotypes_with_cot_reasons(prompt: str, response: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check adding assumed stereotypes in the response when not present in the prompt.

Example

feedback = Feedback(provider.stereotypes).on_input_output()

PARAMETER	DESCRIPTION
`prompt`	A text prompt to an agent. TYPE: `str`
`response`	The agent's response to the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`Tuple[float, Dict]`	A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).

trulens_eval.feedback.groundedness ¶

Classes¶

Groundedness ¶

Bases: WithClassInfo, SerialModel

Measures Groundedness.

Currently the groundedness functions work well with a summarizer. This class will use an LLM to find the relevant strings in a text. The groundedness_provider can either be an LLM provider (such as OpenAI) or NLI with huggingface.

Example

from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.openai import OpenAI
openai_provider = OpenAI()
groundedness_imp = Groundedness(groundedness_provider=openai_provider)

Example

from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.hugs import Huggingface
huggingface_provider = Huggingface()
groundedness_imp = Groundedness(groundedness_provider=huggingface_provider)

PARAMETER	DESCRIPTION
`groundedness_provider`	Provider to use for evaluating groundedness. This should be OpenAI LLM or HuggingFace NLI. Defaults to `OpenAI`. TYPE: `Optional[Provider]` DEFAULT: `None`

Functions¶

groundedness_measure_with_cot_reasons ¶

groundedness_measure_with_cot_reasons(source: str, statement: str) -> Tuple[float, dict]

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The LLM will process the entire statement at once, using chain of thought methodology to emit the reasons.

Usage on RAG Contexts

from trulens_eval import Feedback
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.openai import OpenAI
grounded = feedback.Groundedness(groundedness_provider=OpenAI())

f_groundedness = feedback.Feedback(grounded.groundedness_measure_with_cot_reasons).on(
    Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content # See note below
).on_output().aggregate(grounded.grounded_statements_aggregator)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER	DESCRIPTION
`source`	The source that should support the statement. TYPE: `str`
`statement`	The statement to check groundedness. TYPE: `str`

RETURNS	DESCRIPTION
`Tuple[float, dict]`	A measure between 0 and 1, where 1 means each sentence is grounded in the source.

groundedness_measure_with_nli ¶

groundedness_measure_with_nli(source: str, statement: str) -> Tuple[float, dict]

A measure to track if the source material supports each sentence in the statement using an NLI model.

First the response will be split into statements using a sentence tokenizer.The NLI model will process each statement using a natural language inference model, and will use the entire source.

Usage on RAG Contexts:

from trulens_eval import Feedback
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.hugs = Huggingface
grounded = feedback.Groundedness(groundedness_provider=Huggingface())


f_groundedness = feedback.Feedback(grounded.groundedness_measure_with_nli).on(
    Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content # See note below
).on_output().aggregate(grounded.grounded_statements_aggregator)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER	DESCRIPTION
`source`	The source that should support the statement TYPE: `str`
`statement`	The statement to check groundedness TYPE: `str`

RETURNS	DESCRIPTION
`float`	A measure between 0 and 1, where 1 means each sentence is grounded in the source. TYPE: `float`
`str`	TYPE: `dict`

groundedness_measure ¶

groundedness_measure(source: str, statement: str) -> Tuple[float, dict]

Groundedness measure is deprecated in place of the chain-of-thought version. This function will raise a NotImplementedError.

groundedness_measure_with_summarize_step ¶

groundedness_measure_with_summarize_step(source: str, statement: str) -> float

DEPRECATED: This method is deprecated and will be removed in a future release. Please use alternative groundedness measure methods.

A measure to track if the source material supports each sentence in the statement. This groundedness measure is more accurate; but slower using a two step process. - First find supporting evidence with an LLM - Then for each statement sentence, check groundedness

Usage on RAG Contexts:

from trulens_eval import Feedback
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.openai import OpenAI
grounded = feedback.Groundedness(groundedness_provider=OpenAI())


f_groundedness = feedback.Feedback(grounded.groundedness_measure_with_summarize_step).on(
    Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content # See note below
).on_output().aggregate(grounded.grounded_statements_aggregator)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER	DESCRIPTION
`source`	The source that should support the statement TYPE: `str`
`statement`	The statement to check groundedness TYPE: `str`

RETURNS	DESCRIPTION
`float`	A measure between 0 and 1, where 1 means each sentence is grounded in the source. TYPE: `float`

grounded_statements_aggregator ¶

grounded_statements_aggregator(source_statements_multi_output: List[Dict]) -> float

Compute the mean groundedness based on the best evidence available for each statement.

PARAMETER	DESCRIPTION
`source_statements_multi_output`	A list of scores. Each list index is a context. The Dict is a per statement score. TYPE: `List[Dict]`

RETURNS	DESCRIPTION
`float`	for each statement, gets the max score, then averages over that. TYPE: `float`

Functions¶

trulens_eval.feedback.groundtruth ¶

Classes¶

GroundTruthAgreement ¶

Bases: WithClassInfo, SerialModel

Measures Agreement against a Ground Truth.

Functions¶

init ¶

__init__(ground_truth: Union[List, Callable, FunctionOrMethod], provider: Optional[Provider] = None, bert_scorer: Optional[BERTScorer] = None, **kwargs)

Measures Agreement against a Ground Truth.

Usage 1:

from trulens_eval.feedback import GroundTruthAgreement
golden_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
    {"query": "¿quien invento la bombilla?", "response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set)

Usage 2:

from trulens_eval.feedback import GroundTruthAgreement
ground_truth_imp = llm_app
response = llm_app(prompt)
ground_truth_collection = GroundTruthAgreement(ground_truth_imp)

PARAMETER	DESCRIPTION
`ground_truth`	A list of query/response pairs or a function or callable that returns a ground truth string given a prompt string. TYPE: `Union[Callable, FunctionOrMethod]`
`bert_scorer`	Internal Usage for DB serialization. TYPE: `Optional["BERTScorer"]` DEFAULT: `None`
`provider`	Internal Usage for DB serialization. TYPE: `Provider` DEFAULT: `None`

agreement_measure ¶

agreement_measure(prompt: str, response: str) -> Union[float, Tuple[float, Dict[str, str]]]

Uses OpenAI's Chat GPT Model. A function that that measures similarity to ground truth. A second template is given to Chat GPT with a prompt that the original response is correct, and measures whether previous Chat GPT's response is similar.

Example

from trulens_eval import Feedback
from trulens_eval.feedback import GroundTruthAgreement
golden_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
    {"query": "¿quien invento la bombilla?", "response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set)

feedback = Feedback(ground_truth_collection.agreement_measure).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`prompt`	A text prompt to an agent. TYPE: `str`
`response`	The agent's response to the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`Union[float, Tuple[float, Dict[str, str]]]`	float: A value between 0 and 1. 0 being "not in agreement" and 1 being "in agreement".
`Union[float, Tuple[float, Dict[str, str]]]`	dict: with key 'ground_truth_response'

mae ¶

mae(prompt: str, response: str, score: float) -> float

Method to look up the numeric expected score from a golden set and take the differnce.

Primarily used for evaluation of model generated feedback against human feedback

Example

from trulens_eval import Feedback
from trulens_eval.feedback import GroundTruthAgreement

golden_set =
{"query": "How many stomachs does a cow have?", "response": "Cows' diet relies primarily on grazing.", "expected_score": 0.4},
{"query": "Name some top dental floss brands", "response": "I don't know", "expected_score": 0.8}
]
ground_truth_collection = GroundTruthAgreement(golden_set)

f_groundtruth = Feedback(ground_truth.mae).on(Select.Record.calls[0].args.args[0]).on(Select.Record.calls[0].args.args[1]).on_output()

bert_score ¶

bert_score(prompt: str, response: str) -> Union[float, Tuple[float, Dict[str, str]]]

Uses BERT Score. A function that that measures similarity to ground truth using bert embeddings.

Example

from trulens_eval import Feedback
from trulens_eval.feedback import GroundTruthAgreement
golden_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
    {"query": "¿quien invento la bombilla?", "response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set)

feedback = Feedback(ground_truth_collection.bert_score).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`prompt`	A text prompt to an agent. TYPE: `str`
`response`	The agent's response to the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`Union[float, Tuple[float, Dict[str, str]]]`	float: A value between 0 and 1. 0 being "not in agreement" and 1 being "in agreement".
`Union[float, Tuple[float, Dict[str, str]]]`	dict: with key 'ground_truth_response'

bleu ¶

bleu(prompt: str, response: str) -> Union[float, Tuple[float, Dict[str, str]]]

Uses BLEU Score. A function that that measures similarity to ground truth using token overlap.

Example

from trulens_eval import Feedback
from trulens_eval.feedback import GroundTruthAgreement
golden_set = [
    {"query": "who invented the lightbulb?", "response": "Thomas Edison"},
    {"query": "¿quien invento la bombilla?", "response": "Thomas Edison"}
]
ground_truth_collection = GroundTruthAgreement(golden_set)

feedback = Feedback(ground_truth_collection.bleu).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER	DESCRIPTION
`prompt`	A text prompt to an agent. TYPE: `str`
`response`	The agent's response to the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`Union[float, Tuple[float, Dict[str, str]]]`	float: A value between 0 and 1. 0 being "not in agreement" and 1 being "in agreement".
`Union[float, Tuple[float, Dict[str, str]]]`	dict: with key 'ground_truth_response'

rouge ¶

rouge(prompt: str, response: str) -> Union[float, Tuple[float, Dict[str, str]]]

Uses BLEU Score. A function that that measures similarity to ground truth using token overlap.

PARAMETER	DESCRIPTION
`prompt`	A text prompt to an agent. TYPE: `str`
`response`	The agent's response to the prompt. TYPE: `str`

RETURNS	DESCRIPTION
`Union[float, Tuple[float, Dict[str, str]]]`	float: A value between 0 and 1. 0 being "not in agreement" and 1 being "in agreement".
`Union[float, Tuple[float, Dict[str, str]]]`	dict: with key 'ground_truth_response'

Functions¶

trulens_eval.feedback.embeddings ¶

Classes¶

Embeddings ¶

Bases: WithClassInfo, SerialModel

Embedding related feedback function implementations.

Functions¶

init ¶

__init__(embed_model: Embedder = None)

Instantiates embeddings for feedback functions.

f_embed = feedback.Embeddings(embed_model=embed_model)

PARAMETER	DESCRIPTION
`embed_model`	Supported embedders taken from llama-index: https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/embeddings/root.html TYPE: `Embedder` DEFAULT: `None`

cosine_distance ¶

cosine_distance(query: str, document: str) -> Union[float, Tuple[float, Dict[str, str]]]

Runs cosine distance on the query and document embeddings

Example

Below is just one example. See supported embedders: https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/embeddings/root.html from langchain.embeddings.openai import OpenAIEmbeddings

model_name = 'text-embedding-ada-002'

embed_model = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)

# Create the feedback function
f_embed = feedback.Embeddings(embed_model=embed_model)
f_embed_dist = feedback.Feedback(f_embed.cosine_distance)                .on_input()                .on(Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER	DESCRIPTION
`query`	A text prompt to a vector DB. TYPE: `str`
`document`	The document returned from the vector DB. TYPE: `str`

RETURNS	DESCRIPTION
`Union[float, Tuple[float, Dict[str, str]]]`	float: the embedding vector distance

manhattan_distance ¶

manhattan_distance(query: str, document: str) -> Union[float, Tuple[float, Dict[str, str]]]

Runs L1 distance on the query and document embeddings

Example

Below is just one example. See supported embedders: https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/embeddings/root.html from langchain.embeddings.openai import OpenAIEmbeddings

model_name = 'text-embedding-ada-002'

embed_model = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)

# Create the feedback function
f_embed = feedback.Embeddings(embed_model=embed_model)
f_embed_dist = feedback.Feedback(f_embed.manhattan_distance)                .on_input()                .on(Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER	DESCRIPTION
`query`	A text prompt to a vector DB. TYPE: `str`
`document`	The document returned from the vector DB. TYPE: `str`

RETURNS	DESCRIPTION
`Union[float, Tuple[float, Dict[str, str]]]`	float: the embedding vector distance

euclidean_distance ¶

euclidean_distance(query: str, document: str) -> Union[float, Tuple[float, Dict[str, str]]]

Runs L2 distance on the query and document embeddings

Example

Below is just one example. See supported embedders: https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/embeddings/root.html from langchain.embeddings.openai import OpenAIEmbeddings

model_name = 'text-embedding-ada-002'

embed_model = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)

# Create the feedback function
f_embed = feedback.Embeddings(embed_model=embed_model)
f_embed_dist = feedback.Feedback(f_embed.euclidean_distance)                .on_input()                .on(Select.Record.app.combine_documents_chain._call.args.inputs.input_documents[:].page_content)

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER	DESCRIPTION
`query`	A text prompt to a vector DB. TYPE: `str`
`document`	The document returned from the vector DB. TYPE: `str`

RETURNS	DESCRIPTION
`Union[float, Tuple[float, Dict[str, str]]]`	float: the embedding vector distance

📖 Stock Feedback Functions¶

trulens_eval.feedback.provider.hugs.Huggingface ¶

Functions¶

language_match ¶

context_relevance ¶

positive_sentiment ¶

toxic ¶

pii_detection ¶

pii_detection_with_cot_reasons ¶

hallucination_evaluator ¶

trulens_eval.feedback.provider.openai.OpenAI ¶

Functions¶

moderation_hate ¶

moderation_hatethreatening ¶

moderation_selfharm ¶

moderation_sexual ¶

moderation_sexualminors ¶

moderation_violence ¶

moderation_violencegraphic ¶

moderation_harassment ¶

moderation_harassment_threatening ¶

trulens_eval.feedback.provider.base.LLMProvider ¶

Functions¶

generate_score ¶

generate_score_and_reasons ¶

context_relevance ¶

qs_relevance ¶

context_relevance_with_cot_reasons ¶

qs_relevance_with_cot_reasons ¶

relevance ¶

relevance_with_cot_reasons ¶

sentiment ¶

sentiment_with_cot_reasons ¶

model_agreement ¶

conciseness ¶

conciseness_with_cot_reasons ¶

correctness ¶

correctness_with_cot_reasons ¶

coherence ¶

coherence_with_cot_reasons ¶

harmfulness ¶

harmfulness_with_cot_reasons ¶

maliciousness ¶

maliciousness_with_cot_reasons ¶

helpfulness ¶

helpfulness_with_cot_reasons ¶

controversiality ¶

controversiality_with_cot_reasons ¶

misogyny ¶

misogyny_with_cot_reasons ¶

criminality ¶

criminality_with_cot_reasons ¶

insensitivity ¶

insensitivity_with_cot_reasons ¶

comprehensiveness_with_cot_reasons ¶

summarization_with_cot_reasons ¶

stereotypes ¶

stereotypes_with_cot_reasons ¶

trulens_eval.feedback.groundedness ¶

Classes¶

Groundedness ¶

Functions¶

groundedness_measure_with_cot_reasons ¶

groundedness_measure_with_nli ¶

groundedness_measure ¶

groundedness_measure_with_summarize_step ¶

grounded_statements_aggregator ¶

Functions¶

trulens_eval.feedback.groundtruth ¶

Classes¶

GroundTruthAgreement ¶

Functions¶

__init__ ¶

agreement_measure ¶

mae ¶

bert_score ¶

bleu ¶

rouge ¶

Functions¶

trulens_eval.feedback.embeddings ¶

init ¶

init ¶