🤗 Huggingface Provider

trulens_eval.feedback.provider.hugs.Huggingface

Bases: `Provider`

Out-of-the-box feedback functions calling Huggingface APIs.
Functions

__init__

Create a Huggingface Provider with out-of-the-box feedback functions.

Example

```python
from trulens_eval.feedback.provider.hugs import Huggingface

huggingface_provider = Huggingface()
```
language_match

Uses Huggingface's papluca/xlm-roberta-base-language-detection model. A function that runs language detection on `text1` and `text2` and calculates the difference in the detection probability of the language detected on `text1`. The score is: `1.0 - |probit_language_text1(text1) - probit_language_text1(text2)|`

Example

```python
from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface

huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.language_match).on_input_output()
```

The `on_input_output()` selector can be changed. See the Feedback Function Guide.
PARAMETER | DESCRIPTION
---|---
`text1` | Text to evaluate. TYPE: `str`
`text2` | Comparative text to evaluate. TYPE: `str`

RETURNS | DESCRIPTION
---|---
`float` | A value between 0 and 1, 0 being "different languages" and 1 being "same language".
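The formula above can be sketched in a few lines (an illustrative sketch, not the provider's implementation; the `probs` dictionaries stand in for the language-detection model's per-language probabilities):

```python
def language_match_score(probs1: dict, probs2: dict) -> float:
    """Score two language-probability distributions per the formula above."""
    # Language detected on text1: the highest-probability language.
    lang = max(probs1, key=probs1.get)
    # 1.0 - |probit_language_text1(text1) - probit_language_text1(text2)|
    return 1.0 - abs(probs1[lang] - probs2.get(lang, 0.0))
```

Identical distributions score 1.0; fully disjoint detections score 0.0.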
groundedness_measure_with_nli

A measure of whether the source material supports each sentence in the statement, using an NLI model.

First the statement is split into sentences using a sentence tokenizer. The NLI model then evaluates each sentence against the entire source.

Example

```python
from trulens_eval.feedback import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface

huggingface_provider = Huggingface()

f_groundedness = (
    Feedback(huggingface_provider.groundedness_measure_with_nli)
    .on(context)
    .on_output()
)
```
PARAMETER | DESCRIPTION
---|---
`source` | The source that should support the statement. TYPE: `str`
`statement` | The statement to check groundedness. TYPE: `str`

RETURNS | DESCRIPTION
---|---
`Tuple[float, dict]` | A tuple containing a value between 0.0 (not grounded) and 1.0 (grounded) and a dictionary containing the reasons for the evaluation.
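The splitting-and-scoring flow described above can be sketched as follows (a simplified illustration of the stated behavior; `nli_entailment_score` is a hypothetical stand-in for the NLI model call, and the naive period split stands in for a real sentence tokenizer):

```python
from typing import Callable, Dict, Tuple

def groundedness_sketch(
    source: str,
    statement: str,
    nli_entailment_score: Callable[..., float],
) -> Tuple[float, Dict[str, float]]:
    # Split the statement into sentences (naive split for illustration).
    sentences = [s.strip() for s in statement.split(".") if s.strip()]
    reasons: Dict[str, float] = {}
    for sent in sentences:
        # Each sentence is checked against the entire source.
        reasons[sent] = nli_entailment_score(premise=source, hypothesis=sent)
    # Aggregate per-sentence scores into one groundedness value.
    overall = sum(reasons.values()) / len(reasons) if reasons else 0.0
    return overall, reasons
```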
context_relevance

Uses Huggingface's truera/context_relevance model, which computes the relevance of a given context to the prompt. The model can be found at https://huggingface.co/truera/context_relevance.

Example

```python
import numpy as np

from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface

huggingface_provider = Huggingface()

feedback = (
    Feedback(huggingface_provider.context_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
```
PARAMETER | DESCRIPTION
---|---
`prompt` | The given prompt. TYPE: `str`
`context` | Comparative contextual information. TYPE: `str`

RETURNS | DESCRIPTION
---|---
`float` | A value between 0 and 1, 0 being irrelevant and 1 being a relevant context for addressing the prompt.
positive_sentiment

Uses Huggingface's cardiffnlp/twitter-roberta-base-sentiment model. A function that runs a sentiment classifier on `text`.

Example

```python
from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface

huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.positive_sentiment).on_output()
```
PARAMETER | DESCRIPTION
---|---
`text` | Text to evaluate. TYPE: `str`

RETURNS | DESCRIPTION
---|---
`float` | A value between 0 (negative sentiment) and 1 (positive sentiment).
toxic

Uses Huggingface's martin-ha/toxic-comment-model model. A function that runs a toxic-comment classifier on `text`.

Example

```python
from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface

huggingface_provider = Huggingface()

feedback = Feedback(huggingface_provider.toxic).on_output()
```
PARAMETER | DESCRIPTION
---|---
`text` | Text to evaluate. TYPE: `str`

RETURNS | DESCRIPTION
---|---
`float` | A value between 0 (not toxic) and 1 (toxic).
pii_detection

NER model to detect PII.

Example

```python
from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface

hugs = Huggingface()

# Define a pii_detection feedback function using Huggingface.
f_pii_detection = Feedback(hugs.pii_detection).on_input()
```

The `on(...)` selector can be changed. See Feedback Function Guide: Selectors.
PARAMETER | DESCRIPTION
---|---
`text` | A text prompt that may contain PII. TYPE: `str`

RETURNS | DESCRIPTION
---|---
`float` | The likelihood that PII is contained in the input text.
pii_detection_with_cot_reasons

pii_detection_with_cot_reasons(text: str)

NER model to detect PII, with reasons.

Example

```python
from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface

hugs = Huggingface()

# Define a pii_detection feedback function using Huggingface.
f_pii_detection = Feedback(hugs.pii_detection_with_cot_reasons).on_input()
```

The `on(...)` selector can be changed. See Feedback Function Guide: Selectors.

PARAMETER | DESCRIPTION
---|---
`text` | A text prompt that may contain a name. TYPE: `str`

RETURNS | DESCRIPTION
---|---
`Tuple[float, str]` | A tuple containing the likelihood that PII is contained in the input text and a string describing what PII was detected (if any).
hallucination_evaluator

Evaluates the hallucination score for a combined input of two statements, returned as a float between 0 and 1 and interpreted as a boolean. If the score is greater than 0.5, the statement is evaluated as true; if it is less than 0.5, the statement is evaluated as a hallucination.

Example

```python
from trulens_eval.feedback.provider.hugs import Huggingface

huggingface_provider = Huggingface()

score = huggingface_provider.hallucination_evaluator(
    "The sky is blue. [SEP] Apples are red, the grass is green."
)
```
PARAMETER | DESCRIPTION
---|---
`model_output` | What the LLM returns based on the text chunks retrieved during RAG. TYPE: `str`
`retrieved_text_chunk` | The text chunks retrieved during RAG. TYPE: `str`

RETURNS | DESCRIPTION
---|---
`float` | Hallucination score.
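The thresholding rule above can be expressed as small helpers (an illustrative sketch, not part of the trulens_eval API; the argument order in the `[SEP]`-joined input is an assumption based on the example):

```python
def combine_for_evaluator(model_output: str, retrieved_text_chunk: str) -> str:
    # The two statements are joined with a [SEP] token, as in the example
    # above (argument order assumed).
    return f"{model_output} [SEP] {retrieved_text_chunk}"

def is_hallucination(score: float) -> bool:
    # Scores above 0.5 mean the statement is evaluated as true;
    # scores below 0.5 mean it is evaluated as a hallucination.
    return score < 0.5
```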