LLM Provider¶
trulens_eval.feedback.provider.base.LLMProvider¶
Bases: Provider

An LLM-based provider.

This is an abstract class and needs to be initialized as one of these:

- OpenAI and subclass AzureOpenAI.
- LiteLLM. LiteLLM provides an interface to a wide range of models.
Functions¶
generate_score¶
generate_score(system_prompt: str, user_prompt: Optional[str] = None, normalize: float = 10.0, temperature: float = 0.0) -> float

Base method to generate a score only, used for evaluation.

PARAMETER | DESCRIPTION
---|---
system_prompt (str) | A pre-formatted system prompt.
user_prompt (Optional[str]) | An optional user prompt. Defaults to None.
normalize (float) | The normalization factor for the score.
temperature (float) | The temperature for the LLM response.

RETURNS | DESCRIPTION
---|---
float | The score on a 0-1 scale.
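The `normalize` parameter scales the raw score the LLM emits (out of `normalize`, 10 by default) down to the documented 0-1 range. A minimal sketch of that step, assuming a simple divide-and-clamp behavior (an illustration, not the library's actual implementation):

```python
# Hypothetical sketch of the normalization implied by `normalize`: the LLM is
# asked for a raw score out of `normalize` (0-10 by default), and the value is
# divided down to the documented 0-1 scale. Not the library's actual code.

def normalize_score(raw_score: float, normalize: float = 10.0) -> float:
    """Map a raw LLM score onto the 0-1 scale, clamping out-of-range values."""
    score = raw_score / normalize
    return min(max(score, 0.0), 1.0)

print(normalize_score(8))    # a raw 8/10 becomes 0.8
print(normalize_score(12))   # out-of-range raw scores are clamped to 1.0
```

Clamping guards against the model returning a value outside the requested range, so downstream aggregation (e.g. `np.mean`) always sees scores in [0, 1].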
generate_score_and_reasons¶
generate_score_and_reasons(system_prompt: str, user_prompt: Optional[str] = None, normalize: float = 10.0, temperature: float = 0.0) -> Tuple[float, Dict]

Base method to generate a score and reason, used for evaluation.

PARAMETER | DESCRIPTION
---|---
system_prompt (str) | A pre-formatted system prompt.
user_prompt (Optional[str]) | An optional user prompt. Defaults to None.
normalize (float) | The normalization factor for the score.
temperature (float) | The temperature for the LLM response.

RETURNS | DESCRIPTION
---|---
float | The score on a 0-1 scale.
Dict | Reason metadata if returned by the LLM.
context_relevance¶
Uses chat completion model. A function that completes a template to check the relevance of the context to the question.

Example
from trulens_eval.app import App
context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

The on(...) selector can be changed. See Feedback Function Guide: Selectors.

PARAMETER | DESCRIPTION
---|---
question (str) | A question being asked.
context (str) | Context related to the question.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not relevant) and 1.0 (relevant).
qs_relevance¶
Question statement relevance is deprecated and will be removed in future versions. Please use context_relevance in its place.
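The deprecation pattern described here, keeping the old name as a thin wrapper that warns and forwards to the new method, can be sketched roughly like this (names are illustrative, not the library's actual code):

```python
# Hypothetical sketch of the deprecation shim described above. The old name
# is kept as a wrapper that emits a DeprecationWarning and forwards to the
# replacement method. Illustration only, not the library's actual code.
import warnings


def context_relevance(question: str, context: str) -> float:
    # Stand-in for the real LLM-backed feedback function.
    return 1.0 if question and context else 0.0


def qs_relevance(question: str, context: str) -> float:
    warnings.warn(
        "qs_relevance is deprecated; use context_relevance instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return context_relevance(question, context)
```

Callers keep working unchanged while the warning nudges them toward the new name before the old one is removed.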
context_relevance_with_cot_reasons¶
context_relevance_with_cot_reasons(question: str, context: str, temperature: float = 0.0) -> Tuple[float, Dict]

Uses chat completion model. A function that completes a template to check the relevance of the context to the question. Also uses chain of thought methodology and emits the reasons.

Example
from trulens_eval.app import App
context = App.select_context(rag_app)
feedback = (
    Feedback(provider.context_relevance_with_cot_reasons)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

The on(...) selector can be changed. See Feedback Function Guide: Selectors.

PARAMETER | DESCRIPTION
---|---
question (str) | A question being asked.
context (str) | Context related to the question.
temperature (float) | The temperature for the LLM response.

RETURNS | DESCRIPTION
---|---
float | A value between 0 and 1, with 0 being "not relevant" and 1 being "relevant".
qs_relevance_with_cot_reasons¶
Question statement relevance is deprecated and will be removed in future versions. Please use context_relevance_with_cot_reasons in its place.
relevance¶
Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt.

Example
feedback = Feedback(provider.relevance).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide.

Usage on RAG Contexts
feedback = Feedback(provider.relevance).on_input().on(
    TruLlama.select_source_nodes().node.text  # See note below
).aggregate(np.mean)

The on(...) selector can be changed. See Feedback Function Guide: Selectors.

PARAMETER | DESCRIPTION
---|---
prompt (str) | A text prompt to an agent.
response (str) | The agent's response to the prompt.

RETURNS | DESCRIPTION
---|---
float | A value between 0 and 1, with 0 being "not relevant" and 1 being "relevant".
relevance_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the relevance of the response to a prompt. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.relevance_with_cot_reasons).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide.

Usage on RAG Contexts
feedback = Feedback(provider.relevance_with_cot_reasons).on_input().on(
    TruLlama.select_source_nodes().node.text  # See note below
).aggregate(np.mean)

The on(...) selector can be changed. See Feedback Function Guide: Selectors.

PARAMETER | DESCRIPTION
---|---
prompt (str) | A text prompt to an agent.
response (str) | The agent's response to the prompt.

RETURNS | DESCRIPTION
---|---
float | A value between 0 and 1, with 0 being "not relevant" and 1 being "relevant".
sentiment¶
Uses chat completion model. A function that completes a template to check the sentiment of some text.

Example
feedback = Feedback(provider.sentiment).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate sentiment of.

RETURNS | DESCRIPTION
---|---
float | A value between 0 and 1, with 0 being "negative sentiment" and 1 being "positive sentiment".
sentiment_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the sentiment of some text. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.sentiment_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | Text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (negative sentiment) and 1.0 (positive sentiment).
model_agreement¶
Uses chat completion model. A function that gives a chat completion model the same prompt and gets a response, encouraging truthfulness. A second template is then given to the model asserting that the original response is correct, and the function measures whether the previous chat completion response is similar.

Example
feedback = Feedback(provider.model_agreement).on_input_output()

The on_input_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
prompt (str) | A text prompt to an agent.
response (str) | The agent's response to the prompt.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not in agreement) and 1.0 (in agreement).
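The core idea above, comparing two responses to the same prompt and mapping their similarity onto a 0-1 score, can be illustrated with a toy stand-in. The real `model_agreement` makes this judgment with a second LLM prompt; the `difflib` comparison here is only a simplified illustration of the scoring shape:

```python
# Toy illustration of the agreement idea: score the similarity of two
# responses to the same prompt on a 0-1 scale. The real model_agreement uses
# a second LLM prompt for this judgment; difflib here is only a stand-in.
from difflib import SequenceMatcher


def toy_agreement(response_a: str, response_b: str) -> float:
    """Similarity of two responses, 0.0 (no agreement) to 1.0 (identical)."""
    return SequenceMatcher(None, response_a.lower(), response_b.lower()).ratio()


print(toy_agreement(
    "Paris is the capital of France.",
    "The capital of France is Paris.",
))  # high similarity, close to 1.0
```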
conciseness¶
Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval.

Example
feedback = Feedback(provider.conciseness).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate the conciseness of.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not concise) and 1.0 (concise).
conciseness_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the conciseness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.conciseness_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate the conciseness of.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not concise) and 1.0 (concise).
Dict | A dictionary containing the reasons for the evaluation.
correctness¶
Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval.

Example
feedback = Feedback(provider.correctness).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | A prompt to an agent.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not correct) and 1.0 (correct).
correctness_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the correctness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.correctness_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | Text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not correct) and 1.0 (correct).
coherence¶
Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval.

Example
feedback = Feedback(provider.coherence).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not coherent) and 1.0 (coherent).
coherence_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the coherence of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.coherence_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not coherent) and 1.0 (coherent).
harmfulness¶
Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval.

Example
feedback = Feedback(provider.harmfulness).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not harmful) and 1.0 (harmful).
harmfulness_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the harmfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.harmfulness_with_cot_reasons).on_output()

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not harmful) and 1.0 (harmful).
maliciousness¶
Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval.

Example
feedback = Feedback(provider.maliciousness).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not malicious) and 1.0 (malicious).
maliciousness_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the maliciousness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.maliciousness_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not malicious) and 1.0 (malicious).
helpfulness¶
Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval.

Example
feedback = Feedback(provider.helpfulness).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not helpful) and 1.0 (helpful).
helpfulness_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the helpfulness of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.helpfulness_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not helpful) and 1.0 (helpful).
controversiality¶
Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval.

Example
feedback = Feedback(provider.controversiality).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not controversial) and 1.0 (controversial).
controversiality_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the controversiality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.controversiality_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not controversial) and 1.0 (controversial).
misogyny¶
Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval.

Example
feedback = Feedback(provider.misogyny).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not misogynistic) and 1.0 (misogynistic).
misogyny_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the misogyny of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.misogyny_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not misogynistic) and 1.0 (misogynistic).
criminality¶
Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval.

Example
feedback = Feedback(provider.criminality).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not criminal) and 1.0 (criminal).
criminality_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the criminality of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.criminality_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not criminal) and 1.0 (criminal).
insensitivity¶
Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval.

Example
feedback = Feedback(provider.insensitivity).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not insensitive) and 1.0 (insensitive).
insensitivity_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check the insensitivity of some text. Prompt credit to LangChain Eval. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.insensitivity_with_cot_reasons).on_output()

The on_output() selector can be changed. See Feedback Function Guide.

PARAMETER | DESCRIPTION
---|---
text (str) | The text to evaluate.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (not insensitive) and 1.0 (insensitive).
comprehensiveness_with_cot_reasons¶
Uses chat completion model. A function that tries to distill main points and compares a summary against those main points. This feedback function only has a chain of thought implementation as it is extremely important in function assessment.

Example
feedback = Feedback(provider.comprehensiveness_with_cot_reasons).on_input_output()

PARAMETER | DESCRIPTION
---|---
source (str) | Text corresponding to source material.
summary (str) | Text corresponding to a summary.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (main points missed) and 1.0 (no main points missed).
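The distill-then-compare idea above can be illustrated with a deliberately simplified stand-in. The real feedback function distills main points and judges coverage via LLM prompts; this sketch uses plain substring matching purely to show the 0-1 scoring shape:

```python
# Toy stand-in for the comprehensiveness idea: given a list of main points
# distilled from the source, score a summary by the fraction of points it
# covers. The real feedback function does this with LLM prompts; substring
# matching here is only an illustration of the 0-1 scoring shape.

def coverage_score(main_points: list[str], summary: str) -> float:
    """Fraction of main points mentioned in the summary, from 0.0 to 1.0."""
    if not main_points:
        return 1.0  # nothing to cover
    summary_lower = summary.lower()
    covered = sum(1 for point in main_points if point.lower() in summary_lower)
    return covered / len(main_points)


points = ["revenue grew", "costs fell", "guidance raised"]
print(coverage_score(points, "Revenue grew while costs fell."))  # 2 of 3 covered
```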
summarization_with_cot_reasons¶
Summarization is deprecated in place of comprehensiveness. Defaulting to comprehensiveness_with_cot_reasons.
stereotypes¶
Uses chat completion model. A function that completes a template to check for assumed stereotypes added in the response that are not present in the prompt.

Example
feedback = Feedback(provider.stereotypes).on_input_output()

PARAMETER | DESCRIPTION
---|---
prompt (str) | A text prompt to an agent.
response (str) | The agent's response to the prompt.

RETURNS | DESCRIPTION
---|---
float | A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).
stereotypes_with_cot_reasons¶
Uses chat completion model. A function that completes a template to check for assumed stereotypes added in the response that are not present in the prompt. Also uses chain of thought methodology and emits the reasons.

Example
feedback = Feedback(provider.stereotypes_with_cot_reasons).on_input_output()

PARAMETER | DESCRIPTION
---|---
prompt (str) | A text prompt to an agent.
response (str) | The agent's response to the prompt.

RETURNS | DESCRIPTION
---|---
Tuple[float, Dict] | A value between 0.0 (no stereotypes assumed) and 1.0 (stereotypes assumed).