🦑 Tru

trulens_eval.tru.Tru

Bases: SingletonPerName

Tru is the main class that provides an entry point to trulens-eval.

Tru lets you:

  • Log app prompts and outputs
  • Log app metadata
  • Run and log feedback functions
  • Run the streamlit dashboard to view experiment results

By default, all data is logged to "default.sqlite" in the current working directory. Data can instead be logged to any SQLAlchemy-compatible database referred to by database_url.
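
The URL form matters: SQLAlchemy SQLite URLs use three slashes before a relative file path. The sketch below uses a hypothetical helper, `sqlite_url`, which is not part of trulens_eval, just to illustrate the shape of the default URL:

```python
def sqlite_url(path: str) -> str:
    # Hypothetical helper: SQLAlchemy SQLite URLs take the form
    # "sqlite:///<relative path>" (three slashes before a relative path).
    return f"sqlite:///{path}"

# The default corresponds to a "default.sqlite" file in the working directory.
default_url = sqlite_url("default.sqlite")
```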

Supported App Types

TruChain: Langchain apps.

TruLlama: Llama Index apps.

TruRails: NeMo Guardrails apps.

TruBasicApp: Basic apps defined solely using a function from str to str.

TruCustomApp: Custom apps containing custom structures and methods. Requires annotation of methods to instrument.

TruVirtual: Virtual apps that do not have a real app to instrument but have a virtual structure and can log existing captured data as if they were trulens records.

PARAMETER DESCRIPTION
database

Database to use. If not provided, an SQLAlchemyDB database will be initialized based on the other arguments.

TYPE: Optional[DB] DEFAULT: None

database_url

Database URL. See the SQLAlchemy documentation on database URLs. Defaults to a local SQLite database file, sqlite://DEFAULT_DATABASE_FILE ("default.sqlite" in the current working directory).

TYPE: Optional[str] DEFAULT: None

database_file

Path to a local SQLite database file.

Deprecated: Use database_url instead.

TYPE: Optional[str] DEFAULT: None

database_prefix

Prefix for table names for trulens_eval to use. May be useful when the database also hosts tables for other apps.

TYPE: Optional[str] DEFAULT: None

database_redact_keys

Whether to redact secret keys in data written to the database (defaults to False).

TYPE: Optional[bool] DEFAULT: None

database_args

Additional arguments to pass to the database constructor.

TYPE: Optional[Dict[str, Any]] DEFAULT: None
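
Putting the parameters together, a typical construction might look like the sketch below. The keyword values here are illustrative assumptions, not recommended settings, and the actual Tru call is shown commented since it requires trulens_eval at runtime:

```python
# Illustrative only: these keyword arguments mirror the parameters above.
tru_kwargs = dict(
    database_url="sqlite:///default.sqlite",  # SQLAlchemy-compatible URL
    database_prefix="trulens_",               # hypothetical table-name prefix
    database_redact_keys=True,                # redact secret keys before writing
)

# With trulens_eval installed:
#   from trulens_eval import Tru
#   tru = Tru(**tru_kwargs)
```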

Attributes

RETRY_RUNNING_SECONDS class-attribute instance-attribute

RETRY_RUNNING_SECONDS: float = 60.0

How long to wait (in seconds) before restarting a feedback function that has already started.

A feedback function execution that has started may have stalled or failed in a bad way that did not record the failure.

See also

start_evaluator

FeedbackMode.DEFERRED

RETRY_FAILED_SECONDS class-attribute instance-attribute

RETRY_FAILED_SECONDS: float = 5 * 60.0

How long to wait (in seconds) to retry a failed feedback function run.

DEFERRED_NUM_RUNS class-attribute instance-attribute

DEFERRED_NUM_RUNS: int = 32

Number of futures to wait for when evaluating deferred feedback functions.

db instance-attribute

Database supporting this workspace.

Will be an opaque wrapper if it is not ready to use due to migration requirements.

Functions

Chain

Chain(chain: Chain, **kwargs: dict) -> TruChain

Create a langchain app recorder with database managed by self.

PARAMETER DESCRIPTION
chain

The langchain chain defining the app to be instrumented.

TYPE: Chain

**kwargs

Additional keyword arguments to pass to the TruChain.

TYPE: dict DEFAULT: {}

Llama

Llama(engine: Union[BaseQueryEngine, BaseChatEngine], **kwargs: dict) -> TruLlama

Create a llama-index app recorder with database managed by self.

PARAMETER DESCRIPTION
engine

The llama-index engine defining the app to be instrumented.

TYPE: Union[BaseQueryEngine, BaseChatEngine]

**kwargs

Additional keyword arguments to pass to TruLlama.

TYPE: dict DEFAULT: {}

Basic

Basic(text_to_text: Callable[[str], str], **kwargs: dict) -> TruBasicApp

Create a basic app recorder with database managed by self.

PARAMETER DESCRIPTION
text_to_text

A function that takes a string and returns a string. The wrapped app's functionality is expected to be entirely in this function.

TYPE: Callable[[str], str]

**kwargs

Additional keyword arguments to pass to TruBasicApp.

TYPE: dict DEFAULT: {}
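
For example, a trivial str-to-str app could look like the sketch below. The wrapping call is shown commented since it requires trulens_eval at runtime, and the app_id value is a hypothetical choice:

```python
def llm_standalone(prompt: str) -> str:
    # Stand-in for a real model call: the wrapped app's entire
    # functionality must live inside this one str -> str function.
    return "echo: " + prompt

# With trulens_eval installed (sketch, not verified usage):
#   recorder = tru.Basic(llm_standalone, app_id="basic_v1")
#   with recorder:
#       llm_standalone("hello")
```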

Custom

Custom(app: Any, **kwargs: dict) -> TruCustomApp

Create a custom app recorder with database managed by self.

PARAMETER DESCRIPTION
app

The app to be instrumented. This can be any python object.

TYPE: Any

**kwargs

Additional keyword arguments to pass to TruCustomApp.

TYPE: dict DEFAULT: {}

Virtual

Virtual(app: Union[VirtualApp, Dict], **kwargs: dict) -> TruVirtual

Create a virtual app recorder with database managed by self.

PARAMETER DESCRIPTION
app

The app to be instrumented. If not a VirtualApp, it is passed to VirtualApp constructor to create it.

TYPE: Union[VirtualApp, Dict]

**kwargs

Additional keyword arguments to pass to TruVirtual.

TYPE: dict DEFAULT: {}

reset_database

reset_database()

Reset the database. Clears all tables.

See DB.reset_database.

migrate_database

migrate_database(**kwargs: Dict[str, Any])

Migrates the database.

This should be run whenever there are breaking changes in a database created with an older version of trulens_eval.

PARAMETER DESCRIPTION
**kwargs

Keyword arguments to pass to migrate_database of the current database.

TYPE: Dict[str, Any] DEFAULT: {}

See DB.migrate_database.

add_record

add_record(record: Optional[Record] = None, **kwargs: dict) -> RecordID

Add a record to the database.

PARAMETER DESCRIPTION
record

The record to add.

TYPE: Optional[Record] DEFAULT: None

**kwargs

Record fields to add to the given record, or to a new record if no record is provided.

TYPE: dict DEFAULT: {}

RETURNS DESCRIPTION
RecordID

Unique record identifier str.

run_feedback_functions

run_feedback_functions(record: Record, feedback_functions: Sequence[Feedback], app: Optional[AppDefinition] = None, wait: bool = True) -> Union[Iterable[FeedbackResult], Iterable[Future[FeedbackResult]]]

Run a collection of feedback functions and report their result.

PARAMETER DESCRIPTION
record

The record on which to evaluate the feedback functions.

TYPE: Record

app

The app that produced the given record. If not provided, it is looked up from the given database db.

TYPE: Optional[AppDefinition] DEFAULT: None

feedback_functions

A collection of feedback functions to evaluate.

TYPE: Sequence[Feedback]

wait

If set (default), will wait for results before returning.

TYPE: bool DEFAULT: True

YIELDS DESCRIPTION
Union[Iterable[FeedbackResult], Iterable[Future[FeedbackResult]]]

One FeedbackResult for each element of feedback_functions if wait is enabled (the default), or one Future of FeedbackResult each if wait is disabled.
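
The wait semantics can be sketched independently of trulens_eval with concurrent.futures: with wait=True the caller gets resolved results; with wait=False it gets futures to resolve later. The helper name run_all is hypothetical:

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable, List, Union

def run_all(fns: Iterable[Callable[[], float]], wait: bool = True
            ) -> Union[List[float], List["Future[float]"]]:
    # Sketch of the wait flag: submit every function, then either
    # resolve the futures here (wait=True) or hand them back (wait=False).
    pool = ThreadPoolExecutor(max_workers=4)
    futures = [pool.submit(fn) for fn in fns]
    if wait:
        return [f.result() for f in futures]
    return futures

scores = run_all([lambda: 0.9, lambda: 0.4])   # resolved results
pending = run_all([lambda: 0.9], wait=False)   # unresolved futures
```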

add_app

add_app(app: AppDefinition) -> AppID

Add an app to the database and return its unique id.

PARAMETER DESCRIPTION
app

The app to add to the database.

TYPE: AppDefinition

RETURNS DESCRIPTION
AppID

A unique app identifier str.

delete_app

delete_app(app_id: AppID) -> None

Deletes an app from the database based on its app_id.

PARAMETER DESCRIPTION
app_id

The unique identifier of the app to be deleted.

TYPE: AppID

add_feedback

add_feedback(feedback_result_or_future: Optional[Union[FeedbackResult, Future[FeedbackResult]]] = None, **kwargs: dict) -> FeedbackResultID

Add a single feedback result or future to the database and return its unique id.

PARAMETER DESCRIPTION
feedback_result_or_future

If a Future is given, the call will wait for the result before adding it to the database. If kwargs are given along with a FeedbackResult, the kwargs are used to update the FeedbackResult; otherwise a new FeedbackResult is created with kwargs as constructor arguments.

TYPE: Optional[Union[FeedbackResult, Future[FeedbackResult]]] DEFAULT: None

**kwargs

Fields to add to the given feedback result or to create a new FeedbackResult with.

TYPE: dict DEFAULT: {}

RETURNS DESCRIPTION
FeedbackResultID

A unique result identifier str.

add_feedbacks

add_feedbacks(feedback_results: Iterable[Union[FeedbackResult, Future[FeedbackResult]]]) -> List[FeedbackResultID]

Add multiple feedback results to the database and return their unique ids.

PARAMETER DESCRIPTION
feedback_results

An iterable of FeedbackResult or Future of the same. Each given future will be waited on.

TYPE: Iterable[Union[FeedbackResult, Future[FeedbackResult]]]

RETURNS DESCRIPTION
List[FeedbackResultID]

List of unique result identifiers str in the same order as input feedback_results.

get_app

get_app(app_id: AppID) -> JSONized[AppDefinition]

Look up an app from the database.

This method produces the JSON-ized version of the app. It can be deserialized back into an AppDefinition with model_validate:

Example
from trulens_eval.schema import app
app_json = tru.get_app(app_id="Custom Application v1")
app_definition = app.AppDefinition.model_validate(app_json)
Warning

Do not rely on deserializing into App as its implementations feature attributes not meant to be deserialized.

PARAMETER DESCRIPTION
app_id

The unique identifier str of the app to look up.

TYPE: AppID

RETURNS DESCRIPTION
JSONized[AppDefinition]

JSON-ized version of the app.

get_apps

get_apps() -> List[JSONized[AppDefinition]]

Look up all apps from the database.

RETURNS DESCRIPTION
List[JSONized[AppDefinition]]

A list of the JSON-ized versions of all apps in the database.

Warning

Same deserialization caveats as get_app.

get_records_and_feedback

get_records_and_feedback(app_ids: Optional[List[AppID]] = None) -> Tuple[DataFrame, List[str]]

Get records, their feedback results, and feedback names.

PARAMETER DESCRIPTION
app_ids

A list of app ids to filter records by. If empty or not given, all apps' records will be returned.

TYPE: Optional[List[AppID]] DEFAULT: None

RETURNS DESCRIPTION
DataFrame

Dataframe of records with their feedback results.

List[str]

List of feedback names that are columns in the dataframe.

get_leaderboard

get_leaderboard(app_ids: Optional[List[AppID]] = None) -> DataFrame

Get a leaderboard for the given apps.

PARAMETER DESCRIPTION
app_ids

A list of app ids to filter records by. If empty or not given, all apps will be included in leaderboard.

TYPE: Optional[List[AppID]] DEFAULT: None

RETURNS DESCRIPTION
DataFrame

Dataframe of apps with their feedback results aggregated.
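
The aggregation can be approximated with pandas: roughly, the per-app mean of each feedback column in the records dataframe. The app ids and feedback column name below are hypothetical:

```python
import pandas as pd

# Hypothetical records frame: one row per record, one column per feedback.
records = pd.DataFrame({
    "app_id": ["app_v1", "app_v1", "app_v2"],
    "relevance": [0.8, 0.6, 0.9],
})

# A leaderboard is roughly the per-app mean of each feedback column.
leaderboard = records.groupby("app_id").mean(numeric_only=True)
```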

start_evaluator

start_evaluator(restart: bool = False, fork: bool = False) -> Union[Process, Thread]

Start a deferred feedback function evaluation thread or process.

PARAMETER DESCRIPTION
restart

If set, will stop the existing evaluator before starting a new one.

TYPE: bool DEFAULT: False

fork

If set, will start the evaluator in a new process instead of a thread. NOT CURRENTLY SUPPORTED.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
Union[Process, Thread]

The started process or thread that is executing the deferred feedback evaluator.

Relevant constants

RETRY_RUNNING_SECONDS

RETRY_FAILED_SECONDS

DEFERRED_NUM_RUNS

MAX_THREADS
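
The retry constants above drive a simple staleness check. The sketch below shows the kind of decision the evaluator loop makes; should_retry and the status strings are hypothetical, not trulens_eval's actual internals:

```python
RETRY_RUNNING_SECONDS = 60.0     # restart runs stuck in "running" this long
RETRY_FAILED_SECONDS = 5 * 60.0  # retry failed runs after this long

def should_retry(status: str, seconds_since_update: float) -> bool:
    # Hypothetical evaluator-loop check: a run that has been "running"
    # too long may have stalled without recording its failure.
    if status == "running":
        return seconds_since_update >= RETRY_RUNNING_SECONDS
    if status == "failed":
        return seconds_since_update >= RETRY_FAILED_SECONDS
    return False
```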

stop_evaluator

stop_evaluator()

Stop the deferred feedback evaluation thread.

run_dashboard

run_dashboard(port: Optional[int] = 8501, address: Optional[str] = None, force: bool = False, _dev: Optional[Path] = None) -> Process

Run a streamlit dashboard to view logged results and apps.

PARAMETER DESCRIPTION
port

Port number to pass to streamlit through server.port.

TYPE: Optional[int] DEFAULT: 8501

address

Address to pass to streamlit through server.address.

Address cannot be set if running from a colab notebook.

TYPE: Optional[str] DEFAULT: None

force

Stop existing dashboard(s) first. Defaults to False.

TYPE: bool DEFAULT: False

_dev

If given, run dashboard with the given PYTHONPATH. This can be used to run the dashboard from outside of its pip package installation folder.

TYPE: Optional[Path] DEFAULT: None

RETURNS DESCRIPTION
Process

The Process executing the streamlit dashboard.

RAISES DESCRIPTION
RuntimeError

Dashboard is already running. Can be avoided if force is set.
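
The port and address parameters map onto streamlit's server options. The flag assembly can be sketched as below (dashboard_args is a hypothetical helper; actually spawning the process requires streamlit):

```python
from typing import List, Optional

def dashboard_args(port: Optional[int] = 8501,
                   address: Optional[str] = None) -> List[str]:
    # Sketch: how port/address might map to streamlit server flags.
    args = ["streamlit", "run", "--server.port", str(port)]
    if address is not None:
        args += ["--server.address", address]
    return args
```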

stop_dashboard

stop_dashboard(force: bool = False) -> None

Stop existing dashboard(s) if running.

PARAMETER DESCRIPTION
force

Also try to find any other dashboard processes not started in this notebook and shut them down too.

This option is not supported on Windows.

TYPE: bool DEFAULT: False

RAISES DESCRIPTION
RuntimeError

Dashboard is not running in the current process. Can be avoided with force.