LLM Provider

trulens_eval.feedback.provider.base.LLMProvider

Bases: Provider

An LLM-based provider.

This is an abstract class and needs to be initialized as one of its concrete subclasses.
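
For example, a minimal sketch constructing the OpenAI provider used in the examples below (assuming an OpenAI API key is configured in the environment):

from trulens_eval.feedback.provider.openai import OpenAI

# Any concrete LLMProvider subclass works here; OpenAI is the one
# used throughout the examples in this reference.
provider = OpenAI()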

Functions

generate_score

generate_score(
    system_prompt: str,
    user_prompt: Optional[str] = None,
    normalize: float = 10.0,
    temperature: float = 0.0,
) -> float

Base method to generate a score only, used for evaluation.
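
Example

A minimal sketch, assuming a concrete provider such as OpenAI; the prompt text and the 0-10 scoring instruction are illustrative. The model's raw score is divided by the normalize factor to produce the 0-1 result.

from trulens_eval.feedback.provider.openai import OpenAI

provider = OpenAI()

score = provider.generate_score(
    system_prompt="Rate the politeness of the user's message from 0 to 10.",
    user_prompt="Thanks so much for your help!",
    normalize=10.0,   # the model's raw 0-10 score is divided by this factor
    temperature=0.0,  # deterministic scoring
)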

PARAMETER DESCRIPTION
system_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt.

TYPE: Optional[str] DEFAULT: None

normalize

The normalization factor for the score.

TYPE: float DEFAULT: 10.0

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

The score on a 0-1 scale.

generate_score_and_reasons

generate_score_and_reasons(
    system_prompt: str,
    user_prompt: Optional[str] = None,
    normalize: float = 10.0,
    temperature: float = 0.0,
) -> Tuple[float, Dict]

Base method to generate a score and reason, used for evaluation.
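
Example

A minimal sketch, assuming a concrete provider such as OpenAI; the prompt text is illustrative. The returned dictionary carries whatever reason metadata the LLM emits.

from trulens_eval.feedback.provider.openai import OpenAI

provider = OpenAI()

score, reasons = provider.generate_score_and_reasons(
    system_prompt="Rate the politeness of the user's message from 0 to 10, and explain your reasoning.",
    user_prompt="Thanks so much for your help!",
)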

PARAMETER DESCRIPTION
system_prompt

A pre-formatted system prompt.

TYPE: str

user_prompt

An optional user prompt. Defaults to None.

TYPE: Optional[str] DEFAULT: None

normalize

The normalization factor for the score.

TYPE: float DEFAULT: 10.0

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

The score on a 0-1 scale.

Dict

Reason metadata if returned by the LLM.

context_relevance

context_relevance(
    question: str, context: str, temperature: float = 0.0
) -> float

Uses a chat completion model. A function that completes a template to check the relevance of the context to the question.

Example

import numpy as np

from trulens_eval import Feedback
from trulens_eval.app import App

context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
float

A value between 0.0 (not relevant) and 1.0 (relevant).

TYPE: float

qs_relevance

qs_relevance(question: str, context: str) -> float

Question statement relevance is deprecated and will be removed in future versions. Please use context relevance in its place.
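
A minimal migration sketch; context stands for a selected app context as in the context_relevance example above:

# Deprecated:
# feedback = Feedback(provider.qs_relevance).on_input().on(context)
# Use instead:
feedback = Feedback(provider.context_relevance).on_input().on(context)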

context_relevance_with_cot_reasons

context_relevance_with_cot_reasons(
    question: str, context: str, temperature: float = 0.0
) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.

Example

import numpy as np

from trulens_eval import Feedback
from trulens_eval.app import App

context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
PARAMETER DESCRIPTION
question

A question being asked.

TYPE: str

context

Context related to the question.

TYPE: str

temperature

The temperature for the LLM response.

TYPE: float DEFAULT: 0.0

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not relevant) and 1.0 (relevant) and a dictionary containing the reasons for the evaluation.

qs_relevance_with_cot_reasons

qs_relevance_with_cot_reasons(
    question: str, context: str
) -> Tuple[float, Dict]

Question statement relevance is deprecated and will be removed in future versions. Please use context relevance in its place.

relevance

relevance(prompt: str, response: str) -> float

Uses a chat completion model. A function that completes a template to check the relevance of the response to a prompt.

Example

feedback = Feedback(provider.relevance).on_input_output()
Usage on RAG Contexts
feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text # See note below
).aggregate(np.mean) 
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not relevant) and 1.0 (relevant).

TYPE: float

relevance_with_cot_reasons

relevance_with_cot_reasons(
    prompt: str, response: str
) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.

Example

feedback = (
    Feedback(provider.relevance_with_cot_reasons)
    .on_input()
    .on_output()
)
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not relevant) and 1.0 (relevant) and a dictionary containing the reasons for the evaluation.

sentiment

sentiment(text: str) -> float

Uses a chat completion model. A function that completes a template to check the sentiment of some text.

Example

feedback = Feedback(provider.sentiment).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate sentiment of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (negative sentiment) and 1.0 (positive sentiment).

sentiment_with_cot_reasons

sentiment_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.sentiment_with_cot_reasons).on_output() 
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (negative sentiment) and 1.0 (positive sentiment) and a dictionary containing the reasons for the evaluation.

model_agreement

model_agreement(prompt: str, response: str) -> float

Uses a chat completion model. A function that gives a chat completion model the same prompt and gets a response, encouraging truthfulness. A second template asserting that the original response is correct is then given to the model, and the function measures how closely the new response agrees with the original.

Example

feedback = Feedback(provider.model_agreement).on_input_output() 
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not in agreement) and 1.0 (in agreement).

TYPE: float

conciseness

conciseness(text: str) -> float

Uses a chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.conciseness).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate the conciseness of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not concise) and 1.0 (concise).

conciseness_with_cot_reasons

conciseness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()

PARAMETER DESCRIPTION
text

The text to evaluate the conciseness of.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not concise) and 1.0 (concise) and a dictionary containing the reasons for the evaluation.

correctness

correctness(text: str) -> float

Uses a chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.correctness).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate the correctness of.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not correct) and 1.0 (correct).

correctness_with_cot_reasons

correctness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.correctness_with_cot_reasons).on_output() 
PARAMETER DESCRIPTION
text

Text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not correct) and 1.0 (correct) and a dictionary containing the reasons for the evaluation.

coherence

coherence(text: str) -> float

Uses a chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.coherence).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not coherent) and 1.0 (coherent).

TYPE: float

coherence_with_cot_reasons

coherence_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.coherence_with_cot_reasons).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not coherent) and 1.0 (coherent) and a dictionary containing the reasons for the evaluation.

harmfulness

harmfulness(text: str) -> float

Uses a chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.harmfulness).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not harmful) and 1.0 (harmful).

TYPE: float

harmfulness_with_cot_reasons

harmfulness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not harmful) and 1.0 (harmful) and a dictionary containing the reasons for the evaluation.

maliciousness

maliciousness(text: str) -> float

Uses a chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.maliciousness).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not malicious) and 1.0 (malicious).

TYPE: float

maliciousness_with_cot_reasons

maliciousness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not malicious) and 1.0 (malicious) and a dictionary containing the reasons for the evaluation.

helpfulness

helpfulness(text: str) -> float

Uses a chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.helpfulness).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not helpful) and 1.0 (helpful).

TYPE: float

helpfulness_with_cot_reasons

helpfulness_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not helpful) and 1.0 (helpful) and a dictionary containing the reasons for the evaluation.

controversiality

controversiality(text: str) -> float

Uses a chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.controversiality).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not controversial) and 1.0 (controversial).

TYPE: float

controversiality_with_cot_reasons

controversiality_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.controversiality_with_cot_reasons).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not controversial) and 1.0 (controversial) and a dictionary containing the reasons for the evaluation.

misogyny

misogyny(text: str) -> float

Uses a chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.misogyny).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not misogynistic) and 1.0 (misogynistic).

TYPE: float

misogyny_with_cot_reasons

misogyny_with_cot_reasons(text: str) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.misogyny_with_cot_reasons).on_output() 
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not misogynistic) and 1.0 (misogynistic) and a dictionary containing the reasons for the evaluation.

criminality

criminality(text: str) -> float

Uses a chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.criminality).on_output()
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not criminal) and 1.0 (criminal).

TYPE: float

criminality_with_cot_reasons

criminality_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.criminality_with_cot_reasons).on_output()
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not criminal) and 1.0 (criminal) and a dictionary containing the reasons for the evaluation.

insensitivity

insensitivity(text: str) -> float

Uses a chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.

Example

feedback = Feedback(provider.insensitivity).on_output()
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (not insensitive) and 1.0 (insensitive).

TYPE: float

insensitivity_with_cot_reasons

insensitivity_with_cot_reasons(
    text: str,
) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()
PARAMETER DESCRIPTION
text

The text to evaluate.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not insensitive) and 1.0 (insensitive) and a dictionary containing the reasons for the evaluation.

comprehensiveness_with_cot_reasons

comprehensiveness_with_cot_reasons(
    source: str, summary: str
) -> Tuple[float, Dict]

Uses a chat completion model. A function that tries to distill the main points of the source and compares a summary against those main points. This feedback function only has a chain of thought implementation, as the reasons are essential to assessing comprehensiveness.

Example

feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()
PARAMETER DESCRIPTION
source

Text corresponding to source material.

TYPE: str

summary

Text corresponding to a summary.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (not comprehensive) and 1.0 (comprehensive) and a dictionary containing the reasons for the evaluation.

summarization_with_cot_reasons

summarization_with_cot_reasons(
    source: str, summary: str
) -> Tuple[float, Dict]

Summarization is deprecated in favor of comprehensiveness. Defaults to comprehensiveness_with_cot_reasons.
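
A minimal migration sketch:

# Deprecated:
# feedback = Feedback(provider.summarization_with_cot_reasons).on_input_output()
# Use instead:
feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()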

stereotypes

stereotypes(prompt: str, response: str) -> float

Uses a chat completion model. A function that completes a template to check whether the response introduces assumed stereotypes that are not present in the prompt.

Example

feedback = Feedback(provider.stereotypes).on_input_output()
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
float

A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).

stereotypes_with_cot_reasons

stereotypes_with_cot_reasons(
    prompt: str, response: str
) -> Tuple[float, Dict]

Uses a chat completion model. A function that completes a template to check whether the response introduces assumed stereotypes that are not present in the prompt. Also uses chain of thought methodology and emits the reasons.

Example

feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()
PARAMETER DESCRIPTION
prompt

A text prompt to an agent.

TYPE: str

response

The agent's response to the prompt.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, Dict]

A tuple containing a value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed) and a dictionary containing the reasons for the evaluation.

groundedness_measure_with_cot_reasons

groundedness_measure_with_cot_reasons(
    source: str, statement: str
) -> Tuple[float, dict]

A measure to track if the source material supports each sentence in the statement using an LLM provider.

The LLM will process the entire statement at once, using chain of thought methodology to emit the reasons.

Example

from trulens_eval import Feedback
from trulens_eval.app import App
from trulens_eval.feedback.provider.openai import OpenAI

provider = OpenAI()
context = App.select_context(rag_app)

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons)
    .on(context.collect())
    .on_output()
)

PARAMETER DESCRIPTION
source

The source that should support the statement.

TYPE: str

statement

The statement to check groundedness.

TYPE: str

RETURNS DESCRIPTION
Tuple[float, dict]

A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.