Custom Functions
Feedback functions are an extensible framework for evaluating LLMs. You can add your own feedback functions to evaluate the qualities required by your application, either by updating trulens_eval/feedback.py or simply by creating a new provider class and feedback function in your notebook. If your contributions would be useful for others, we encourage you to contribute to TruLens!
Feedback functions are organized by model provider into Provider classes.
The process for adding new feedback functions is:
- Create a new Provider class, or locate an existing one that applies to your feedback function. If your feedback function does not rely on a model provider, you can create a standalone class.
- Add the new feedback function method to your selected class. Your new method can take either a single text argument (str) or both a prompt (str) and a response (str). It should return a float between 0 (worst) and 1 (best).
```python
from trulens_eval import Feedback, Provider, Select, Tru

class StandAlone(Provider):
    def custom_feedback(self, my_text_field: str) -> float:
        """
        A dummy function mapping text inputs to float outputs.

        Parameters:
            my_text_field (str): Text to evaluate.

        Returns:
            float: the inverse of one plus the squared length of the text
        """
        return 1.0 / (1.0 + len(my_text_field) * len(my_text_field))
```
- Instantiate your provider and feedback functions. The feedback function is wrapped by the trulens-eval Feedback class, which specifies what gets sent to your function's parameters (for example, Select.RecordInput or Select.RecordOutput).
```python
standalone = StandAlone()
f_custom_function = Feedback(standalone.custom_feedback).on(
    my_text_field=Select.RecordOutput
)
```
- Your feedback function is now ready to use, just like the out-of-the-box feedback functions. Below is an example of it being used.
```python
tru = Tru()
feedback_results = tru.run_feedback_functions(
    record=record,
    feedback_functions=[f_custom_function]
)
tru.add_feedbacks(feedback_results)
```
Multi-Output Feedback Functions
TruLens also supports multi-output feedback functions. While a typical feedback function outputs a single float between 0 and 1, a multi-output feedback function outputs a dictionary mapping each output_key to a float between 0 and 1. The feedbacks table will display each output in a column named feedback_name:::output_key.
```python
multi_output_feedback = Feedback(
    lambda input_param: {'output_key1': 0.1, 'output_key2': 0.9}, name="multi"
).on(
    input_param=Select.RecordOutput
)
feedback_results = tru.run_feedback_functions(
    record=record,
    feedback_functions=[multi_output_feedback]
)
tru.add_feedbacks(feedback_results)
```
```python
# Aggregators will run on the same dict keys.
import numpy as np

multi_output_feedback = Feedback(
    lambda input_param: {'output_key1': 0.1, 'output_key2': 0.9}, name="multi-agg"
).on(
    input_param=Select.RecordOutput
).aggregate(np.mean)
feedback_results = tru.run_feedback_functions(
    record=record,
    feedback_functions=[multi_output_feedback]
)
tru.add_feedbacks(feedback_results)
```
```python
# For multi-context chunking, an aggregator can operate on a list of
# multi-output dictionaries.
def dict_aggregator(list_dict_input):
    agg = 0
    for dict_input in list_dict_input:
        agg += dict_input['output_key1']
    return agg

multi_output_feedback = Feedback(
    lambda input_param: {'output_key1': 0.1, 'output_key2': 0.9}, name="multi-agg-dict"
).on(
    input_param=Select.RecordOutput
).aggregate(dict_aggregator)
feedback_results = tru.run_feedback_functions(
    record=record,
    feedback_functions=[multi_output_feedback]
)
tru.add_feedbacks(feedback_results)
```