
LLM Provider

trulens_eval.feedback.provider.base.LLMProvider

Bases: Provider

An LLM-based provider.

This is an abstract class; instantiate one of its concrete provider subclasses instead.

Functions

generate_score

generate_score(system_prompt: str, user_prompt: Optional[str] = None, normalize: float = 10.0, temperature: float = 0.0) -> float

Base method to generate a score only, used for evaluation.

PARAMETER DESCRIPTION
system_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt. Defaults to None.

TYPE: Optional[str] DEFAULT: None

normalize

The normalization factor for the score. Defaults to 10.0.

TYPE: float DEFAULT: 10.0

temperature

The temperature for the LLM response. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

The score on a 0-1 scale.

TYPE: float
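The normalization step can be illustrated with a short sketch. This mirrors the documented contract only, not the provider's internal code: the LLM is prompted for a score between 0 and `normalize`, and the result is scaled onto the 0-1 range.

```python
# Sketch of the documented scoring contract (not the library's internals):
# the LLM is asked for a score in [0, normalize], which is then scaled to [0, 1].
def normalize_score(raw_score: float, normalize: float = 10.0) -> float:
    """Map a raw LLM score onto the 0-1 scale, clamping out-of-range values."""
    return min(1.0, max(0.0, raw_score / normalize))

print(normalize_score(7.0))       # 0.7
print(normalize_score(3.0, 5.0))  # 0.6
```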

generate_score_and_reasons

generate_score_and_reasons(system_prompt: str, user_prompt: Optional[str] = None, normalize: float = 10.0, temperature: float = 0.0) -> Tuple[float, Dict]

Base method to generate a score and reason, used for evaluation.

PARAMETER DESCRIPTION
system_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt. Defaults to None.

TYPE: Optional[str] DEFAULT: None

normalize

The normalization factor for the score. Defaults to 10.0.

TYPE: float DEFAULT: 10.0

temperature

The temperature for the LLM response. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
Tuple[float, Dict]

The score on a 0-1 scale, and reason metadata (dict) if returned by the LLM.
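For illustration, a hypothetical parser below shows the shape of the returned pair. The real provider uses its own prompt templates and parsing, so treat this only as a sketch of the return contract, with a made-up completion string.

```python
import re

def parse_score_and_reasons(completion: str, normalize: float = 10.0):
    """Hypothetical example: extract a 'Score: N' line from an LLM completion
    and return the normalized score together with reason metadata."""
    match = re.search(r"Score:\s*(\d+)", completion)
    score = float(match.group(1)) / normalize if match else 0.0
    reasons = {"reason": completion}  # raw completion doubles as the reason text
    return score, reasons

score, reasons = parse_score_and_reasons("Supporting Evidence: ...\nScore: 8")
print(score)  # 0.8
```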

context_relevance

context_relevance(question: str, context: str, temperature: float = 0.0) -> float

Uses chat completion model. A function that completes a template to check the relevance of the context to the question.

Example

from trulens_eval.app import App
context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

The on(...) selector can be changed. See Feedback Function Guide : Selectors
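Because each retrieved context chunk is scored independently on the 0-1 scale, the aggregate(np.mean) step reduces the per-chunk scores to a single feedback value. A minimal illustration with made-up scores:

```python
import numpy as np

# Hypothetical per-chunk relevance scores, as context_relevance would emit
# one 0-1 value per retrieved chunk; np.mean reduces them to one number.
chunk_scores = [0.2, 0.9, 1.0]
aggregate = np.mean(chunk_scores)
print(round(float(aggregate), 2))  # 0.7
```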

PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

temperature

The temperature for the LLM response. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

A value between 0.0 (not relevant) and 1.0 (relevant).

TYPE: float

qs_relevance

qs_relevance(question: str, context: str) -> float

Question statement relevance is deprecated and will be removed in future versions. Please use context relevance in its place.

context_relevance_with_cot_reasons

context_relevance_with_cot_reasons(question: str, context: str, temperature: float = 0.0) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.

Example

from trulens_eval.app import App
context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
    )
The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

temperature

The temperature for the LLM response. Defaults to 0.0.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not relevant) and 1.0 (relevant), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

qs_relevance_with_cot_reasons

qs_relevance_with_cot_reasons(question: str, context: str) -> Tuple[float, Dict]

Question statement relevance is deprecated and will be removed in future versions. Please use context relevance in its place.

relevance

relevance(prompt: str, response: str) -> float

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt.

Example

feedback = Feedback(provider.relevance).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

Usage on RAG Contexts
feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text
).aggregate(np.mean) 

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "not relevant" and 1 being "relevant".

TYPE: float

relevance_with_cot_reasons

relevance_with_cot_reasons(prompt: str, response: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.relevance_with_cot_reasons).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

Usage on RAG Contexts
feedback = Feedback(provider.relevance_with_cot_reasons).on_input().on(
    TruLlama.select_source_nodes().node.text
).aggregate(np.mean) 

The on(...) selector can be changed. See Feedback Function Guide : Selectors

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not relevant) and 1.0 (relevant), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

sentiment

sentiment(text: str) -> float

Uses chat completion model. A function that completes a template to check the sentiment of some text.

Example

feedback = Feedback(provider.sentiment).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate sentiment of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0 and 1. 0 being "negative sentiment" and 1 being "positive sentiment".

sentiment_with_cot_reasons

sentiment_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.sentiment_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (negative sentiment) and 1.0 (positive sentiment), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

model_agreement

model_agreement(prompt: str, response: str) -> float

Uses chat completion model. A function that gives the model the same prompt and obtains its own response, encouraging truthfulness. A second template then asks the model whether the original response is correct, measuring how similar it is to the model's own answer.

Example

feedback = Feedback(provider.model_agreement).on_input_output() 

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not in agreement) and 1.0 (in agreement).

TYPE: float

conciseness

conciseness(text: str) -> float

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.conciseness).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate the conciseness of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not concise) and 1.0 (concise).

conciseness_with_cot_reasons

conciseness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate the conciseness of.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not concise) and 1.0 (concise), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

correctness

correctness(text: str) -> float

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.correctness).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not correct) and 1.0 (correct).

correctness_with_cot_reasons

correctness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.correctness_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not correct) and 1.0 (correct), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

coherence

coherence(text: str) -> float

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.coherence).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not coherent) and 1.0 (coherent).

TYPE: float

coherence_with_cot_reasons

coherence_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.coherence_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not coherent) and 1.0 (coherent), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

harmfulness

harmfulness(text: str) -> float

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.harmfulness).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harmful) and 1.0 (harmful).

TYPE: float

harmfulness_with_cot_reasons

harmfulness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not harmful) and 1.0 (harmful), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

maliciousness

maliciousness(text: str) -> float

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.maliciousness).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not malicious) and 1.0 (malicious).

TYPE: float

maliciousness_with_cot_reasons

maliciousness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not malicious) and 1.0 (malicious), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

helpfulness

helpfulness(text: str) -> float

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.helpfulness).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not helpful) and 1.0 (helpful).

TYPE: float

helpfulness_with_cot_reasons

helpfulness_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not helpful) and 1.0 (helpful), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

controversiality

controversiality(text: str) -> float

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.controversiality).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not controversial) and 1.0 (controversial).

TYPE: float

controversiality_with_cot_reasons

controversiality_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.controversiality_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not controversial) and 1.0 (controversial), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

misogyny

misogyny(text: str) -> float

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.misogyny).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not misogynistic) and 1.0 (misogynistic).

TYPE: float

misogyny_with_cot_reasons

misogyny_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.misogyny_with_cot_reasons).on_output() 

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not misogynistic) and 1.0 (misogynistic), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

criminality

criminality(text: str) -> float

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.criminality).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not criminal) and 1.0 (criminal).

TYPE: float

criminality_with_cot_reasons

criminality_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.criminality_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not criminal) and 1.0 (criminal), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

insensitivity

insensitivity(text: str) -> float

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.insensitivity).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not insensitive) and 1.0 (insensitive).

TYPE: float

insensitivity_with_cot_reasons

insensitivity_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (not insensitive) and 1.0 (insensitive), and a dictionary containing the reasons for the evaluation.

TYPE: Tuple[float, Dict]

comprehensiveness_with_cot_reasons

comprehensiveness_with_cot_reasons(source: str, summary: str) -> Tuple[float, Dict]

Uses chat completion model. A function that tries to distill the main points of a source text and compares a summary against them. This feedback function only has a chain-of-thought implementation, as the reasons are essential to assessing comprehensiveness.

Example

feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
source

Text corresponding to source material.

TYPE: str

summary

Text corresponding to a summary.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (main points missed) and 1.0 (no main points missed).

summarization_with_cot_reasons

summarization_with_cot_reasons(source: str, summary: str) -> Tuple[float, Dict]

Summarization is deprecated in favor of comprehensiveness. Defaults to comprehensiveness_with_cot_reasons.

stereotypes

stereotypes(prompt: str, response: str) -> float

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt.

Example

feedback = Feedback(provider.stereotypes).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).

stereotypes_with_cot_reasons

stereotypes_with_cot_reasons(prompt: str, response: str) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check whether the response adds assumed stereotypes that are not present in the prompt. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide

PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).