trulens.core¶
TruLens Core LLM Evaluation Library¶
The trulens-core library includes everything needed to get started.
Classes¶
Feedback
¶
Bases: FeedbackDefinition
Feedback function container.
Typical usage is to specify a feedback implementation function from a Provider and the mapping of selectors describing how to construct the arguments to the implementation:
Example
```python
from trulens.core import Feedback
from trulens.providers.huggingface import Huggingface

hugs = Huggingface()

# Create a feedback function from a provider:
feedback = Feedback(
    hugs.language_match  # the implementation
).on_input_output()  # selectors shorthand
```
Attributes¶
tru_class_info
instance-attribute
¶
tru_class_info: Class
Class information of this pydantic object for use in deserialization.
This unusual key is used to avoid polluting attribute names in whatever class this is mixed into. It should be the same as CLASS_INFO.
implementation
class-attribute
instance-attribute
¶
Implementation serialization.
aggregator
class-attribute
instance-attribute
¶
Aggregator method serialization.
combinations
class-attribute
instance-attribute
¶
combinations: Optional[FeedbackCombinations] = PRODUCT
Mode of combining selected values to produce arguments to each feedback function call.
feedback_definition_id
instance-attribute
¶
feedback_definition_id: FeedbackDefinitionID = (
feedback_definition_id
)
ID. If not given, it is uniquely determined from the content.
if_exists
class-attribute
instance-attribute
¶
Only execute the feedback function if the following selector names something that exists in a record/app.
This can be used, for example, to evaluate conditionally on the presence of some calls. Feedbacks skipped this way will have a status of FeedbackResultStatus.SKIPPED.
if_missing
class-attribute
instance-attribute
¶
if_missing: FeedbackOnMissingParameters = ERROR
How to handle missing parameters in feedback function calls.
run_location
instance-attribute
¶
run_location: Optional[FeedbackRunLocation]
Where the feedback evaluation takes place (e.g. locally or on a Snowflake server).
supplied_name
class-attribute
instance-attribute
¶
An optional name. It only affects displayed tables.
higher_is_better
class-attribute
instance-attribute
¶
Feedback result magnitude interpretation: whether a higher result indicates a better outcome.
imp
class-attribute
instance-attribute
¶
imp: Optional[ImpCallable] = imp
Implementation callable.
A serialized version is stored at FeedbackDefinition.implementation.
agg
class-attribute
instance-attribute
¶
agg: Optional[AggCallable] = agg
Aggregator method for feedback functions that produce more than one result.
A serialized version is stored at FeedbackDefinition.aggregator.
name
property
¶
name: str
Name of the feedback function.
Derived from the name of the implementing function if no name is supplied.
Functions¶
load
staticmethod
¶
load(obj, *args, **kwargs)
Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.
model_validate
classmethod
¶
model_validate(*args, **kwargs) -> Any
Deserialize a JSON-ized version of the object into an instance of the class it was serialized from.
Note
This process uses extra information stored in the jsonized object and handled by WithClassInfo.
on_input_output
¶
on_input_output() -> Feedback
Specifies that the feedback implementation arguments are to be the main app input and output in that order.
Returns a new Feedback object with the specification.
on_default
¶
on_default() -> Feedback
Specifies that one-argument feedbacks should be evaluated on the main app output and two-argument feedbacks should be evaluated on the main input and main output, in that order.
Returns a new Feedback object with this specification.
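The following stdlib-only sketch illustrates this rule. It maps an implementation's parameters to app values based on how many parameters there are. The helper default_selectors and the placeholder strings "main_input" and "main_output" are hypothetical. Real TruLens selectors are Lens queries, and this is not the library's implementation.

```python
import inspect
from typing import Callable, Dict


def default_selectors(fn: Callable) -> Dict[str, str]:
    """Hypothetical helper mirroring the on_default rule (not TruLens code)."""
    params = list(inspect.signature(fn).parameters)
    if len(params) == 1:
        # One-argument feedback: evaluate on the main app output.
        return {params[0]: "main_output"}
    if len(params) == 2:
        # Two-argument feedback: main input first, then main output.
        return {params[0]: "main_input", params[1]: "main_output"}
    raise ValueError(f"expected 1 or 2 parameters, got {len(params)}")


def conciseness(response: str) -> float:
    # Toy one-argument implementation.
    return 1.0 if len(response.split()) <= 20 else 0.0


def relevance(prompt: str, response: str) -> float:
    # Toy two-argument implementation.
    return float(prompt.lower() in response.lower())


print(default_selectors(conciseness))  # {'response': 'main_output'}
print(default_selectors(relevance))    # {'prompt': 'main_input', 'response': 'main_output'}
```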
evaluate_deferred
staticmethod
¶
evaluate_deferred(
session: TruSession,
limit: Optional[int] = None,
shuffle: bool = False,
run_location: Optional[FeedbackRunLocation] = None,
) -> List[Tuple[Series, Future[FeedbackResult]]]
Evaluates feedback functions that were specified to be deferred.
Returns a list of tuples with the DB row containing the Feedback and initial FeedbackResult as well as the Future which will contain the actual result.
PARAMETER | DESCRIPTION |
---|---|
limit | The maximum number of evals to start. TYPE: Optional[int] |
shuffle | Shuffle the order of the feedbacks to evaluate. TYPE: bool |
run_location | Only run feedback functions with this run_location. TYPE: Optional[FeedbackRunLocation] |
Constants that govern behavior:
- TruSession.RETRY_RUNNING_SECONDS: How long to wait before restarting a feedback that was started but never completed (for example, because it stalled or failed without recording that fact).
- TruSession.RETRY_FAILED_SECONDS: How long to wait before retrying a failed feedback.
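The two constants work together, and their interplay can be sketched as a small decision function. The Status enum, should_start, and the strict "greater than" comparisons are assumptions made for illustration. Only the default values (60 and 300 seconds) come from the TruSession attributes documented on this page.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical subset of feedback result statuses, for illustration only.
    NONE = "none"
    RUNNING = "running"
    FAILED = "failed"
    DONE = "done"


# Default values documented for TruSession.
RETRY_RUNNING_SECONDS = 60.0
RETRY_FAILED_SECONDS = 5 * 60.0


def should_start(status: Status, seconds_since_update: float) -> bool:
    """Sketch of a (re)start policy driven by the two constants; not TruLens code."""
    if status is Status.NONE:
        return True  # never started
    if status is Status.RUNNING:
        # Possibly stalled, or failed without recording the failure.
        return seconds_since_update > RETRY_RUNNING_SECONDS
    if status is Status.FAILED:
        return seconds_since_update > RETRY_FAILED_SECONDS
    return False  # completed results are not rerun


print(should_start(Status.RUNNING, 90.0))  # True
print(should_start(Status.FAILED, 90.0))   # False
```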
aggregate
¶
aggregate(
func: Optional[AggCallable] = None,
combinations: Optional[FeedbackCombinations] = None,
) -> Feedback
Specify the aggregation function in case the selectors for this feedback generate more than one value for implementation argument(s). Can also specify the method of producing combinations of values in such cases.
Returns a new Feedback object with the given aggregation function and/or the given combination mode.
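Here is a hypothetical sketch of how the PRODUCT combination mode and the aggregator fit together. It assumes each argument's selector yields a list of values. The implementation is called once per element of their Cartesian product (itertools.product), and the resulting scores are reduced by an aggregator. This sketch uses statistics.mean as the default aggregator. The function evaluate_product and the toy implementation mentions are illustrative only.

```python
import itertools
import statistics
from typing import Callable, Dict, List


def evaluate_product(
    impl: Callable[..., float],
    selected: Dict[str, List[str]],
    agg: Callable[[List[float]], float] = statistics.mean,
) -> float:
    """Call impl on every combination of selected values, then aggregate."""
    names = list(selected)
    scores = [
        impl(**dict(zip(names, combo)))
        for combo in itertools.product(*(selected[n] for n in names))
    ]
    return agg(scores)


def mentions(question: str, context: str) -> float:
    # Toy implementation: 1.0 if the context mentions the question's last word.
    return float(question.split()[-1].strip("?").lower() in context.lower())


selected = {
    "question": ["What is the capital of France?"],
    "context": ["Paris is in France.", "Berlin is in Germany."],
}
print(evaluate_product(mentions, selected))           # 0.5
print(evaluate_product(mentions, selected, agg=max))  # 1.0
```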
on_prompt
¶
Create a variant of self that will take in the main app input or "prompt" as input, sending it as an argument arg to implementation.
on_response
¶
Create a variant of self that will take in the main app output or "response" as input, sending it as an argument arg to implementation.
on
¶
on(*args, **kwargs) -> Feedback
Create a variant of self with the same implementation but the given selectors. Those provided positionally get their implementation argument name guessed and those provided as kwargs get their name from the kwargs key.
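One plausible way to guess argument names for positional selectors is to assign them, in order, to the implementation parameters that are not already named by a keyword. The sketch below assumes that rule. bind_selectors is a hypothetical helper, and the selector values are placeholder strings rather than TruLens Lens objects.

```python
import inspect
from typing import Any, Callable, Dict


def bind_selectors(impl: Callable, *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical sketch of mapping selectors to implementation argument names.

    Keyword selectors keep their key; positional selectors are assigned, in
    order, to the parameters not already named by a keyword.
    """
    params = list(inspect.signature(impl).parameters)
    unknown = set(kwargs) - set(params)
    if unknown:
        raise ValueError(f"unknown argument name(s): {sorted(unknown)}")
    free = [p for p in params if p not in kwargs]
    if len(args) > len(free):
        raise ValueError("more positional selectors than free arguments")
    bound = dict(zip(free, args))
    bound.update(kwargs)
    return bound


def groundedness(source: str, statement: str) -> float:
    # Toy implementation.
    return float(statement in source)


print(bind_selectors(groundedness, "main_input", statement="main_output"))
# {'source': 'main_input', 'statement': 'main_output'}
```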
check_selectors
¶
check_selectors(
app: Union[AppDefinition, JSON],
record: Record,
source_data: Optional[Dict[str, Any]] = None,
warning: bool = False,
) -> bool
Check that the selectors are valid for the given app and record.
PARAMETER | DESCRIPTION |
---|---|
app | The app that produced the record. TYPE: Union[AppDefinition, JSON] |
record | The record that the feedback will run on. This can be a mostly empty record for checking ahead of producing one. The utility method App.dummy_record is built for this purpose. TYPE: Record |
source_data | Additional data to select from when extracting feedback function arguments. TYPE: Optional[Dict[str, Any]] |
warning | Issue a warning instead of raising an error if a selector is invalid. As some parts of a Record cannot be known ahead of producing it, it may be necessary to only issue a warning here rather than raise an exception. TYPE: bool |
RETURNS | DESCRIPTION |
---|---|
bool | True if the selectors are valid; False if not (only when warning is set). |
RAISES | DESCRIPTION |
---|---|
ValueError | If a selector is invalid and warning is not set. |
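The warning-versus-error contract can be sketched on a plain nested dict that stands in for a record. Here check_paths and its dot-separated path syntax are hypothetical. The sketch only mirrors the documented behavior: it returns True, raises ValueError, or warns and returns False.

```python
import warnings
from typing import Any, Dict, Iterable


def check_paths(data: Dict[str, Any], paths: Iterable[str], warning: bool = False) -> bool:
    """Sketch of the check_selectors contract; paths are dot-separated keys."""
    ok = True
    for path in paths:
        node: Any = data
        for key in path.split("."):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                msg = f"Selector {path!r} does not exist (missing {key!r})."
                if not warning:
                    raise ValueError(msg)
                warnings.warn(msg)  # parts of a record may not exist yet
                ok = False
                break
    return ok


record = {"main_input": "hi", "main_output": "hello", "calls": {}}
print(check_paths(record, ["main_input", "main_output"]))  # True
```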
run
¶
run(
app: Optional[Union[AppDefinition, JSON]] = None,
record: Optional[Record] = None,
source_data: Optional[Dict] = None,
**kwargs: Dict[str, Any]
) -> FeedbackResult
Run the feedback function on the given record. The app that produced the record is also required to determine input/output argument names.
PARAMETER | DESCRIPTION |
---|---|
app | The app that produced the record. This can be an AppDefinition or a JSON-ized AppDefinition. It will be JSON-ized if it is not already. TYPE: Optional[Union[AppDefinition, JSON]] |
record | The record to evaluate the feedback on. TYPE: Optional[Record] |
source_data | Additional data to select from when extracting feedback function arguments. TYPE: Optional[Dict] |
**kwargs | Any additional keyword arguments are used to set or override selected feedback function inputs. TYPE: Dict[str, Any] |
RETURNS | DESCRIPTION |
---|---|
FeedbackResult | A FeedbackResult object with the result of the feedback function. |
extract_selection
¶
extract_selection(
app: Optional[Union[AppDefinition, JSON]] = None,
record: Optional[Record] = None,
source_data: Optional[Dict] = None,
) -> Iterable[Dict[str, Any]]
Given the app that produced the given record, extract from record the values that will be sent as arguments to the implementation as specified by self.selectors. Additional data to select from can be provided in source_data. All args are optional. If a Record is specified, its calls are laid out as app (see layout_calls_as_app).
Provider
¶
Bases: WithClassInfo, SerialModel
Base Provider class.
TruLens makes use of Feedback Providers to generate evaluations of large language model applications. These providers act as an access point to different models, most commonly classification models and large language models.
These models are then used to generate feedback on application outputs or intermediate results.
Provider is the base class for all feedback providers. It is an abstract class and should not be instantiated directly. Rather, it should be subclassed and the subclass should implement the methods defined in this class.
There are many feedback providers available in TruLens that grant access to a wide range of proprietary and open-source models.
Providers for classification and other non-LLM models should directly subclass Provider. The feedback functions available for these providers are tied to specific providers, as they rely on provider-specific endpoints to models that are tuned to a particular task.
For example, the Huggingface feedback provider provides access to a number of classification models for specific tasks, such as language detection. These models are then used by a feedback function to generate an evaluation score.
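Example:
```python
from trulens.providers.huggingface import Huggingface

huggingface_provider = Huggingface()

prompt = "Where is the Eiffel Tower?"
response = "The Eiffel Tower is in Paris."
# Compare the language of the response with that of the prompt.
huggingface_provider.language_match(prompt, response)
```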
Providers for LLM models should subclass trulens.feedback.LLMProvider, which itself subclasses Provider.
Providers for LLM-generated feedback are more of a plug-and-play variety. This means that the base model of your choice can be combined with feedback-specific prompting to generate feedback.
For example, relevance can be run with any base LLM feedback provider. Once the feedback provider is instantiated with a base model, the relevance function can be called with a prompt and response. The selected base model is then combined with relevance-specific prompting to generate feedback.
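Example:
```python
from trulens.providers.openai import OpenAI

provider = OpenAI(model_engine="gpt-3.5-turbo")

prompt = "What is the capital of France?"
response = "The capital of France is Paris."
# Score how relevant the response is to the prompt.
provider.relevance(prompt, response)
```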
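The following self-contained sketch gives some intuition for what a non-LLM provider method computes. It defines a toy provider whose feedback method returns a score in [0, 1]. OverlapProvider and keyword_overlap are hypothetical. They do not subclass the real Provider and they call no endpoint, so they only illustrate the shape of a feedback implementation: text in, score out.

```python
import re
from typing import Set


class ToyProvider:
    """Hypothetical stand-in for a provider base class (not trulens Provider)."""

    @staticmethod
    def _tokens(text: str) -> Set[str]:
        # Lowercased alphanumeric tokens.
        return set(re.findall(r"[a-z0-9]+", text.lower()))


class OverlapProvider(ToyProvider):
    def keyword_overlap(self, prompt: str, response: str) -> float:
        """Fraction of prompt tokens that also appear in the response, in [0, 1]."""
        prompt_tokens = self._tokens(prompt)
        if not prompt_tokens:
            return 0.0
        return len(prompt_tokens & self._tokens(response)) / len(prompt_tokens)


provider = OverlapProvider()
print(provider.keyword_overlap("capital of France", "Paris is the capital of France."))  # 1.0
```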
Attributes¶
tru_class_info
instance-attribute
¶
tru_class_info: Class
Class information of this pydantic object for use in deserialization.
This unusual key is used to avoid polluting attribute names in whatever class this is mixed into. It should be the same as CLASS_INFO.
endpoint
class-attribute
instance-attribute
¶
Endpoint supporting this provider.
Remote API invocations are handled by the endpoint.
Functions¶
load
staticmethod
¶
load(obj, *args, **kwargs)
Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.
model_validate
classmethod
¶
model_validate(*args, **kwargs) -> Any
Deserialize a JSON-ized version of the object into an instance of the class it was serialized from.
Note
This process uses extra information stored in the jsonized object and handled by WithClassInfo.
SnowflakeFeedback
¶
Bases: Feedback
Similar to the parent class Feedback except this ensures the feedback is run only on the Snowflake server.
Attributes¶
tru_class_info
instance-attribute
¶
tru_class_info: Class
Class information of this pydantic object for use in deserialization.
This unusual key is used to avoid polluting attribute names in whatever class this is mixed into. It should be the same as CLASS_INFO.
implementation
class-attribute
instance-attribute
¶
Implementation serialization.
aggregator
class-attribute
instance-attribute
¶
Aggregator method serialization.
combinations
class-attribute
instance-attribute
¶
combinations: Optional[FeedbackCombinations] = PRODUCT
Mode of combining selected values to produce arguments to each feedback function call.
feedback_definition_id
instance-attribute
¶
feedback_definition_id: FeedbackDefinitionID = (
feedback_definition_id
)
ID. If not given, it is uniquely determined from the content.
if_exists
class-attribute
instance-attribute
¶
Only execute the feedback function if the following selector names something that exists in a record/app.
This can be used, for example, to evaluate conditionally on the presence of some calls. Feedbacks skipped this way will have a status of FeedbackResultStatus.SKIPPED.
if_missing
class-attribute
instance-attribute
¶
if_missing: FeedbackOnMissingParameters = ERROR
How to handle missing parameters in feedback function calls.
supplied_name
class-attribute
instance-attribute
¶
An optional name. It only affects displayed tables.
higher_is_better
class-attribute
instance-attribute
¶
Feedback result magnitude interpretation: whether a higher result indicates a better outcome.
name
property
¶
name: str
Name of the feedback function.
Derived from the name of the implementing function if no name is supplied.
imp
class-attribute
instance-attribute
¶
imp: Optional[ImpCallable] = imp
Implementation callable.
A serialized version is stored at FeedbackDefinition.implementation.
agg
class-attribute
instance-attribute
¶
agg: Optional[AggCallable] = agg
Aggregator method for feedback functions that produce more than one result.
A serialized version is stored at FeedbackDefinition.aggregator.
Functions¶
load
staticmethod
¶
load(obj, *args, **kwargs)
Deserialize/load this object using the class information in tru_class_info to look up the actual class that will do the deserialization.
model_validate
classmethod
¶
model_validate(*args, **kwargs) -> Any
Deserialize a JSON-ized version of the object into an instance of the class it was serialized from.
Note
This process uses extra information stored in the jsonized object and handled by WithClassInfo.
on_input_output
¶
on_input_output() -> Feedback
Specifies that the feedback implementation arguments are to be the main app input and output in that order.
Returns a new Feedback object with the specification.
on_default
¶
on_default() -> Feedback
Specifies that one-argument feedbacks should be evaluated on the main app output and two-argument feedbacks should be evaluated on the main input and main output, in that order.
Returns a new Feedback object with this specification.
evaluate_deferred
staticmethod
¶
evaluate_deferred(
session: TruSession,
limit: Optional[int] = None,
shuffle: bool = False,
run_location: Optional[FeedbackRunLocation] = None,
) -> List[Tuple[Series, Future[FeedbackResult]]]
Evaluates feedback functions that were specified to be deferred.
Returns a list of tuples with the DB row containing the Feedback and initial FeedbackResult as well as the Future which will contain the actual result.
PARAMETER | DESCRIPTION |
---|---|
limit | The maximum number of evals to start. TYPE: Optional[int] |
shuffle | Shuffle the order of the feedbacks to evaluate. TYPE: bool |
run_location | Only run feedback functions with this run_location. TYPE: Optional[FeedbackRunLocation] |
Constants that govern behavior:
- TruSession.RETRY_RUNNING_SECONDS: How long to wait before restarting a feedback that was started but never completed (for example, because it stalled or failed without recording that fact).
- TruSession.RETRY_FAILED_SECONDS: How long to wait before retrying a failed feedback.
aggregate
¶
aggregate(
func: Optional[AggCallable] = None,
combinations: Optional[FeedbackCombinations] = None,
) -> Feedback
Specify the aggregation function in case the selectors for this feedback generate more than one value for implementation argument(s). Can also specify the method of producing combinations of values in such cases.
Returns a new Feedback object with the given aggregation function and/or the given combination mode.
on_prompt
¶
Create a variant of self that will take in the main app input or "prompt" as input, sending it as an argument arg to implementation.
on_response
¶
Create a variant of self that will take in the main app output or "response" as input, sending it as an argument arg to implementation.
on
¶
on(*args, **kwargs) -> Feedback
Create a variant of self with the same implementation but the given selectors. Those provided positionally get their implementation argument name guessed and those provided as kwargs get their name from the kwargs key.
check_selectors
¶
check_selectors(
app: Union[AppDefinition, JSON],
record: Record,
source_data: Optional[Dict[str, Any]] = None,
warning: bool = False,
) -> bool
Check that the selectors are valid for the given app and record.
PARAMETER | DESCRIPTION |
---|---|
app | The app that produced the record. TYPE: Union[AppDefinition, JSON] |
record | The record that the feedback will run on. This can be a mostly empty record for checking ahead of producing one. The utility method App.dummy_record is built for this purpose. TYPE: Record |
source_data | Additional data to select from when extracting feedback function arguments. TYPE: Optional[Dict[str, Any]] |
warning | Issue a warning instead of raising an error if a selector is invalid. As some parts of a Record cannot be known ahead of producing it, it may be necessary to only issue a warning here rather than raise an exception. TYPE: bool |
RETURNS | DESCRIPTION |
---|---|
bool | True if the selectors are valid; False if not (only when warning is set). |
RAISES | DESCRIPTION |
---|---|
ValueError | If a selector is invalid and warning is not set. |
run
¶
run(
app: Optional[Union[AppDefinition, JSON]] = None,
record: Optional[Record] = None,
source_data: Optional[Dict] = None,
**kwargs: Dict[str, Any]
) -> FeedbackResult
Run the feedback function on the given record. The app that produced the record is also required to determine input/output argument names.
PARAMETER | DESCRIPTION |
---|---|
app | The app that produced the record. This can be an AppDefinition or a JSON-ized AppDefinition. It will be JSON-ized if it is not already. TYPE: Optional[Union[AppDefinition, JSON]] |
record | The record to evaluate the feedback on. TYPE: Optional[Record] |
source_data | Additional data to select from when extracting feedback function arguments. TYPE: Optional[Dict] |
**kwargs | Any additional keyword arguments are used to set or override selected feedback function inputs. TYPE: Dict[str, Any] |
RETURNS | DESCRIPTION |
---|---|
FeedbackResult | A FeedbackResult object with the result of the feedback function. |
extract_selection
¶
extract_selection(
app: Optional[Union[AppDefinition, JSON]] = None,
record: Optional[Record] = None,
source_data: Optional[Dict] = None,
) -> Iterable[Dict[str, Any]]
Given the app that produced the given record, extract from record the values that will be sent as arguments to the implementation as specified by self.selectors. Additional data to select from can be provided in source_data. All args are optional. If a Record is specified, its calls are laid out as app (see layout_calls_as_app).
FeedbackMode
¶
Mode of feedback evaluation.
Specify this using the feedback_mode argument to App constructors.
Note
This class extends str to allow users to compare its values with their string representations (e.g. mode == "none"). Internal uses should use the enum instances.
Attributes¶
NONE
class-attribute
instance-attribute
¶
NONE = 'none'
No evaluation will happen even if feedback functions are specified.
WITH_APP
class-attribute
instance-attribute
¶
WITH_APP = 'with_app'
Try to run feedback functions immediately and before app returns a record.
WITH_APP_THREAD
class-attribute
instance-attribute
¶
WITH_APP_THREAD = 'with_app_thread'
Try to run feedback functions in the same process as the app but after it produces a record.
DEFERRED
class-attribute
instance-attribute
¶
DEFERRED = 'deferred'
Evaluate later via the process started by TruSession.start_evaluator.
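The Note about extending str can be checked with a stand-in enum that copies the four documented values. FeedbackModeSketch is illustrative only and is not the library class.

```python
from enum import Enum


class FeedbackModeSketch(str, Enum):
    """Illustrative copy of the documented FeedbackMode values (not the library class)."""

    NONE = "none"
    WITH_APP = "with_app"
    WITH_APP_THREAD = "with_app_thread"
    DEFERRED = "deferred"


mode = FeedbackModeSketch.DEFERRED
print(mode == "deferred")  # True: the str mixin compares by value
print(FeedbackModeSketch("none") is FeedbackModeSketch.NONE)  # True: lookup by value
```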
Select
¶
Utilities for creating selectors using Lens and aliases/shortcuts.
Attributes¶
Tru
class-attribute
instance-attribute
¶
Selector for the tru wrapper (TruLlama, TruChain, etc.).
RecordInput
class-attribute
instance-attribute
¶
RecordInput: Query = main_input
Selector for the main app input.
RecordOutput
class-attribute
instance-attribute
¶
RecordOutput: Query = main_output
Selector for the main app output.
RecordCalls
class-attribute
instance-attribute
¶
RecordCalls: Query = app
Selector for the calls made by the wrapped app.
Laid out by path into components.
RecordCall
class-attribute
instance-attribute
¶
RecordCall: Query = calls[-1]
Selector for the first called method (last to return).
RecordArgs
class-attribute
instance-attribute
¶
RecordArgs: Query = args
Selector for the whole set of inputs/arguments to the first called / last returned method call.
RecordRets
class-attribute
instance-attribute
¶
RecordRets: Query = rets
Selector for the whole output of the first called / last returned method call.
Functions¶
path_and_method
staticmethod
¶
If select names a method as the last attribute, extract the method name and the selector without the final method name.
dequalify
staticmethod
¶
If the given selector qualifies record or app, remove that qualification.
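The selector attributes above are paths into a record, for example calls[-1] followed by args. The hypothetical mini-lens below records attribute and index steps, then follows them through nested dicts and lists. It is only a sketch of the idea behind Select and Lens, not TruLens's Lens implementation. It assumes that calls are stored in order of return, so the first-called (outermost) method comes last.

```python
from typing import Any, Tuple, Union

Step = Union[str, int]


class Path:
    """Hypothetical mini-lens: records attribute/index steps, then follows them."""

    def __init__(self, steps: Tuple[Step, ...] = ()) -> None:
        self.steps = steps

    def __getattr__(self, name: str) -> "Path":
        if name.startswith("__"):
            raise AttributeError(name)  # keep Python's dunder protocol intact
        return Path(self.steps + (name,))

    def __getitem__(self, index: int) -> "Path":
        return Path(self.steps + (index,))

    def get(self, obj: Any) -> Any:
        for step in self.steps:
            obj = obj[step]  # dict key for str steps, list index for int steps
        return obj


record = {
    "main_input": "What is TruLens?",
    "main_output": "An evaluation library.",
    # Calls in order of return: the outermost (first called) call is last.
    "calls": [
        {"args": {"query": "inner"}, "rets": "inner result"},
        {"args": {"question": "What is TruLens?"}, "rets": "An evaluation library."},
    ],
}
R = Path()
record_input = R.main_input
record_args = R.calls[-1].args
print(record_input.get(record))  # What is TruLens?
print(record_args.get(record))   # {'question': 'What is TruLens?'}
```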
TruSession
¶
Bases: BaseModel, SingletonPerName
TruSession is the main class that provides an entry point to TruLens.
TruSession lets you:
- Log app prompts and outputs
- Log app Metadata
- Run and log feedback functions
- Run streamlit dashboard to view experiment results
By default, all data is logged to "default.sqlite" in the current working directory. Data can instead be logged to a SQLAlchemy-compatible URL given by database_url.
Supported App Types
- TruChain: LangChain apps.
- TruLlama: LlamaIndex apps.
- TruRails: NeMo Guardrails apps.
- TruBasicApp: Basic apps defined solely using a function from str to str.
- TruCustomApp: Custom apps containing custom structures and methods. Requires annotation of methods to instrument.
- TruVirtual: Virtual apps that do not have a real app to instrument but have a virtual structure and can log existing captured data as if they were trulens records.
PARAMETER | DESCRIPTION |
---|---|
connector | Database Connector to use. If not provided, a default DefaultDBConnector is created. TYPE: Optional[DBConnector] |
**kwargs | All other arguments are used to initialize DefaultDBConnector. Mutually exclusive with |
Attributes¶
RETRY_RUNNING_SECONDS
class-attribute
instance-attribute
¶
RETRY_RUNNING_SECONDS: float = 60.0
How long to wait (in seconds) before restarting a feedback function that has already started.
A feedback function execution that has started may have stalled or failed in a bad way that did not record the failure.
RETRY_FAILED_SECONDS
class-attribute
instance-attribute
¶
RETRY_FAILED_SECONDS: float = 5 * 60.0
How long to wait (in seconds) to retry a failed feedback function run.
DEFERRED_NUM_RUNS
class-attribute
instance-attribute
¶
DEFERRED_NUM_RUNS: int = 32
Number of futures to wait for when evaluating deferred feedback functions.
RECORDS_BATCH_TIMEOUT_IN_SEC
class-attribute
instance-attribute
¶
RECORDS_BATCH_TIMEOUT_IN_SEC: int = 10
Time to wait before inserting a batch of records into the database.
GROUND_TRUTHS_BATCH_SIZE
class-attribute
instance-attribute
¶
GROUND_TRUTHS_BATCH_SIZE: int = 100
Number of ground truths to insert into the database per batch.
connector
class-attribute
instance-attribute
¶
connector: Optional[DBConnector] = Field(None, exclude=True)
Database Connector to use. If not provided, a default is created and used.
Functions¶
delete_singleton_by_name
staticmethod
¶
delete_singleton_by_name(
name: str, cls: Optional[Type[SingletonPerName]] = None
)
Delete the singleton instance with the given name.
This can be used for testing to create another singleton.
PARAMETER | DESCRIPTION |
---|---|
name | The name of the singleton instance to delete. TYPE: str |
cls | The class of the singleton instance to delete. If not given, all instances with the given name are deleted. TYPE: Optional[Type[SingletonPerName]] |
delete_singleton
¶
delete_singleton()
Delete the singleton instance. Can be used for testing to create another singleton.
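The singleton-per-name behavior these two methods rely on can be sketched with an overridden __new__. SingletonPerNameSketch is a hypothetical stand-in, not TruLens's SingletonPerName. It keys instances by (class, name), so deleting an entry lets the next construction create a fresh instance.

```python
from typing import Any, Dict, Tuple


class SingletonPerNameSketch:
    """Hypothetical stand-in: at most one live instance per (class, name)."""

    _instances: Dict[Tuple[type, str], "SingletonPerNameSketch"] = {}

    def __new__(cls, name: str = "default", *args: Any, **kwargs: Any):
        key = (cls, name)
        if key not in cls._instances:
            cls._instances[key] = super().__new__(cls)
        return cls._instances[key]

    @classmethod
    def delete_singleton_by_name(cls, name: str) -> None:
        # Forget the instance so the next construction creates a new one.
        cls._instances.pop((cls, name), None)

    def delete_singleton(self) -> None:
        # Forget whichever registry entry points at this instance.
        for key, inst in list(self._instances.items()):
            if inst is self:
                del self._instances[key]


a = SingletonPerNameSketch("test")
b = SingletonPerNameSketch("test")
print(a is b)  # True
SingletonPerNameSketch.delete_singleton_by_name("test")
print(SingletonPerNameSketch("test") is a)  # False
```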
App
¶
Create an App from the given App constructor arguments by guessing which app type they refer to.
This method intentionally prints out the type of app being created to let the user know in case the guess is wrong.
Virtual
¶
Virtual(*args, **kwargs) -> App
Deprecated
Use trulens.core.session.TruSession.App instead.
find_unused_port
¶
find_unused_port(*args, **kwargs)
Deprecated
Use trulens.dashboard.run.find_unused_port instead.
run_dashboard
¶
run_dashboard(*args, **kwargs)
Deprecated
Use trulens.dashboard.run.run_dashboard instead.
start_dashboard
¶
start_dashboard(*args, **kwargs)
Deprecated
Use trulens.dashboard.run.run_dashboard instead.
stop_dashboard
¶
stop_dashboard(*args, **kwargs)
Deprecated
Use trulens.dashboard.run.stop_dashboard instead.
update_record
¶
update_record(*args, **kwargs)
Deprecated
Use trulens.core.session.TruSession.connector.db.insert_record instead.
migrate_database
¶
Migrates the database.
This should be run whenever there are breaking changes in a database created with an older version of trulens.
PARAMETER | DESCRIPTION |
---|---|
**kwargs | Keyword arguments to pass to migrate_database of the current database. See DB.migrate_database. |
add_record
¶
add_record_nowait
¶
add_record_nowait(record: Record) -> None
Add a record to the queue to be inserted in the next batch.
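Below is a minimal sketch of the enqueue-then-batch pattern described for add_record_nowait and RECORDS_BATCH_TIMEOUT_IN_SEC. RecordBatcher, its max_batch limit, and the injectable clock are assumptions made for illustration. TruSession's actual batching, including any background thread, may differ.

```python
import time
from typing import Callable, List


class RecordBatcher:
    """Hypothetical sketch of queue-then-batch-insert (not TruSession's code).

    add_nowait() only enqueues; maybe_flush() writes the whole pending batch
    when it is large enough or when timeout_s has passed since the last write.
    """

    def __init__(
        self,
        insert_batch: Callable[[List[dict]], None],
        timeout_s: float = 10.0,  # mirrors RECORDS_BATCH_TIMEOUT_IN_SEC's default
        max_batch: int = 100,  # assumed size limit, for illustration only
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._insert_batch = insert_batch
        self._timeout_s = timeout_s
        self._max_batch = max_batch
        self._clock = clock
        self._pending: List[dict] = []
        self._last_flush = clock()

    def add_nowait(self, record: dict) -> None:
        self._pending.append(record)  # returns immediately; no DB write here

    def maybe_flush(self) -> bool:
        due = self._clock() - self._last_flush >= self._timeout_s
        if self._pending and (due or len(self._pending) >= self._max_batch):
            self._insert_batch(self._pending)
            self._pending = []
            self._last_flush = self._clock()
            return True
        return False


db: List[List[dict]] = []
now = [0.0]  # fake clock so the example is deterministic
batcher = RecordBatcher(db.append, clock=lambda: now[0])
batcher.add_nowait({"record_id": "r1"})
batcher.add_nowait({"record_id": "r2"})
print(batcher.maybe_flush())  # False: neither full nor timed out
now[0] = 10.0
print(batcher.maybe_flush())  # True: timeout reached, one batch of 2 inserted
```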
run_feedback_functions
¶
run_feedback_functions(
record: Record,
feedback_functions: Sequence[Feedback],
app: Optional[AppDefinition] = None,
wait: bool = True,
) -> Union[
Iterable[FeedbackResult],
Iterable[Future[FeedbackResult]],
]
Run a collection of feedback functions and report their result.
PARAMETER | DESCRIPTION |
---|---|
record | The record on which to evaluate the feedback functions. TYPE: Record |
app | The app that produced the given record. If not provided, it is looked up from the database. TYPE: Optional[AppDefinition] |
feedback_functions | A collection of feedback functions to evaluate. TYPE: Sequence[Feedback] |
wait | If set (default), will wait for results before returning. TYPE: bool |
YIELDS | DESCRIPTION |
---|---|
Union[Iterable[FeedbackResult], Iterable[Future[FeedbackResult]]] | One result for each element of |
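The effect of the wait flag can be sketched with concurrent.futures. One job is submitted per feedback function, and the caller receives either the results (blocking) or the futures themselves. run_all, the module-level thread pool, and the toy feedback lambdas are hypothetical and are not the TruSession implementation.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Union

# Shared worker pool for the sketch (an assumption; TruSession manages its own).
_pool = ThreadPoolExecutor(max_workers=4)


def run_all(
    record: Dict[str, str],
    feedback_functions: Sequence[Callable[[Dict[str, str]], float]],
    wait: bool = True,
) -> Union[List[float], List["Future[float]"]]:
    """Submit one job per feedback function, in order.

    With wait=True, block and return the results; otherwise return the
    futures so the caller can collect results later.
    """
    futures = [_pool.submit(fn, record) for fn in feedback_functions]
    if wait:
        return [f.result() for f in futures]
    return futures


record = {"main_input": "What is 2 + 2?", "main_output": "4"}
fns = [
    lambda r: float(r["main_output"] == "4"),  # toy correctness check
    lambda r: float(len(r["main_output"]) < 10),  # toy brevity check
]
print(run_all(record, fns))  # [1.0, 1.0]
pending = run_all(record, fns, wait=False)
print([f.result() for f in pending])  # [1.0, 1.0]
```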
add_app
¶
add_app(app: AppDefinition) -> AppID
Add an app to the database and return its unique id.
PARAMETER | DESCRIPTION |
---|---|
app | The app to add to the database. TYPE: AppDefinition |
RETURNS | DESCRIPTION |
---|---|
AppID | A unique app identifier str. |
delete_app
¶
delete_app(app_id: AppID) -> None
Deletes an app from the database based on its app_id.
PARAMETER | DESCRIPTION |
---|---|
app_id | The unique identifier of the app to be deleted. TYPE: AppID |
add_feedback
¶
add_feedback(
feedback_result_or_future: Optional[
Union[FeedbackResult, Future[FeedbackResult]]
] = None,
**kwargs: dict
) -> FeedbackResultID
Add a single feedback result or future to the database and return its unique id.
PARAMETER | DESCRIPTION |
---|---|
feedback_result_or_future | If a Future is given, call will wait for the result before adding it to the database. If |
**kwargs | Fields to add to the given feedback result or to create a new FeedbackResult with. TYPE: dict |
RETURNS | DESCRIPTION |
---|---|
FeedbackResultID | A unique result identifier str. |
add_feedbacks
¶
add_feedbacks(
feedback_results: Iterable[
Union[FeedbackResult, Future[FeedbackResult]]
]
) -> List[FeedbackResultID]
Add multiple feedback results to the database and return their unique ids.
PARAMETER | DESCRIPTION |
---|---|
feedback_results | An iterable with each iteration being a FeedbackResult or Future of the same. Each given future will be waited on. TYPE: Iterable[Union[FeedbackResult, Future[FeedbackResult]]] |
RETURNS | DESCRIPTION |
---|---|
List[FeedbackResultID] | List of unique result identifiers str in the same order as input. |
get_app
¶
get_app(app_id: AppID) -> Optional[JSONized[AppDefinition]]
Look up an app from the database.
This method produces the JSON-ized version of the app. It can be deserialized back into an AppDefinition with model_validate:
Example
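```python
from trulens.core import TruSession
from trulens.core.schema import app as app_schema

session = TruSession()
app_json = session.get_app(app_id="app_hash_85ebbf172d02e733c8183ac035d0cbb2")
app_definition = app_schema.AppDefinition.model_validate(app_json)
```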
Warning
Do not rely on deserializing into App as its implementations feature attributes not meant to be deserialized.
PARAMETER | DESCRIPTION |
---|---|
app_id | The unique identifier str of the app to look up. TYPE: AppID |
RETURNS | DESCRIPTION |
---|---|
Optional[JSONized[AppDefinition]] | JSON-ized version of the app. |
get_apps
¶
get_apps() -> List[JSONized[AppDefinition]]
Look up all apps from the database.
RETURNS | DESCRIPTION |
---|---|
List[JSONized[AppDefinition]] | A list of JSON-ized versions of all apps in the database. |
Warning
The same deserialization caveats as get_app apply.
get_records_and_feedback
¶
get_records_and_feedback(
app_ids: Optional[List[AppID]] = None,
offset: Optional[int] = None,
limit: Optional[int] = None,
) -> Tuple[DataFrame, List[str]]
Get records, their feedback results, and feedback names.
PARAMETER | DESCRIPTION |
---|---|
app_ids | A list of app ids to filter records by. If empty or not given, all apps' records will be returned. TYPE: Optional[List[AppID]] |
offset | Record row offset. TYPE: Optional[int] |
limit | Limit on the number of records to return. TYPE: Optional[int] |
RETURNS | DESCRIPTION |
---|---|
DataFrame | DataFrame of records with their feedback results. |
List[str] | List of feedback names that are columns in the DataFrame. |
get_leaderboard
¶
get_leaderboard(
app_ids: Optional[List[AppID]] = None,
group_by_metadata_key: Optional[str] = None,
) -> DataFrame
Get a leaderboard for the given apps.
PARAMETER | DESCRIPTION |
---|---|
app_ids | A list of app ids to filter records by. If empty or not given, all apps will be included in the leaderboard. TYPE: Optional[List[AppID]] |
group_by_metadata_key | A key included in record metadata that you want to group results by. TYPE: Optional[str] |
RETURNS | DESCRIPTION |
---|---|
DataFrame | DataFrame of apps with their feedback results aggregated. If group_by_metadata_key is provided, the DataFrame will be grouped by the specified key. |
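Conceptually, a leaderboard averages each feedback column per app and can optionally split the results by a metadata value. The stdlib sketch below assumes the mean as the aggregate and returns a dict of dicts instead of a DataFrame. leaderboard and the record layout (app_id, meta, and one key per feedback) are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple


def leaderboard(
    records: List[dict],
    feedback_names: List[str],
    group_by_metadata_key: Optional[str] = None,
) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """Sketch: average each feedback per app (and optional metadata value)."""
    groups: Dict[Tuple[str, ...], List[dict]] = defaultdict(list)
    for rec in records:
        key: Tuple[str, ...] = (rec["app_id"],)
        if group_by_metadata_key is not None:
            key += (str(rec.get("meta", {}).get(group_by_metadata_key)),)
        groups[key].append(rec)
    return {
        key: {name: mean(r[name] for r in rows) for name in feedback_names}
        for key, rows in groups.items()
    }


records = [
    {"app_id": "app_v1", "meta": {"prompt": "A"}, "relevance": 1.0},
    {"app_id": "app_v1", "meta": {"prompt": "B"}, "relevance": 0.5},
    {"app_id": "app_v2", "meta": {"prompt": "A"}, "relevance": 0.25},
]
print(leaderboard(records, ["relevance"]))
# {('app_v1',): {'relevance': 0.75}, ('app_v2',): {'relevance': 0.25}}
```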
add_ground_truth_to_dataset
¶
add_ground_truth_to_dataset(
dataset_name: str,
ground_truth_df: DataFrame,
dataset_metadata: Optional[Dict[str, Any]] = None,
)
Create a new dataset if it does not already exist, and add ground truth data to it. If a dataset with the same name already exists, the ground truth data is added to it.
PARAMETER | DESCRIPTION |
---|---|
dataset_name | Name of the dataset. TYPE: str |
ground_truth_df | DataFrame containing the ground truth data. TYPE: DataFrame |
dataset_metadata | Additional metadata to add to the dataset. TYPE: Optional[Dict[str, Any]] |
get_ground_truth
¶
Get ground truth data from the dataset.
PARAMETER | DESCRIPTION |
---|---|
dataset_name | Name of the dataset. |
start_evaluator
¶
start_evaluator(
restart: bool = False,
fork: bool = False,
disable_tqdm: bool = False,
run_location: Optional[FeedbackRunLocation] = None,
return_when_done: bool = False,
) -> Optional[Union[Process, Thread]]
Start a deferred feedback function evaluation thread or process.
PARAMETER | DESCRIPTION |
---|---|
restart | If set, will stop the existing evaluator before starting a new one. TYPE: bool |
fork | If set, will start the evaluator in a new process instead of a thread. NOT CURRENTLY SUPPORTED. TYPE: bool |
disable_tqdm | If set, will disable progress bar logging from the evaluator. TYPE: bool |
run_location | Run only the evaluations corresponding to run_location. TYPE: Optional[FeedbackRunLocation] |
return_when_done | Instead of running asynchronously, will block until no feedbacks remain. TYPE: bool |
RETURNS | DESCRIPTION |
---|---|
Optional[Union[Process, Thread]] | If return_when_done is True, then returns None. Otherwise, the started process or thread that is executing the deferred feedback evaluator. |