
Endpoint

trulens_eval.feedback.provider.endpoint.base

Attributes

DEFAULT_RPM module-attribute

DEFAULT_RPM = 60

Default requests per minute for endpoints.

Classes

EndpointCallback

Bases: SerialModel

Callbacks to be invoked after various API requests and track various metrics like token usage.

Attributes
endpoint class-attribute instance-attribute
endpoint: Endpoint = Field(exclude=True)

The endpoint owning this callback.

cost class-attribute instance-attribute
cost: Cost = Field(default_factory=Cost)

Costs tracked by this callback.

Functions
handle
handle(response: Any) -> None

Called after each request.

handle_chunk
handle_chunk(response: Any) -> None

Called after receiving a chunk from a request.

handle_generation
handle_generation(response: Any) -> None

Called after each completion request.

handle_generation_chunk
handle_generation_chunk(response: Any) -> None

Called after receiving a chunk from a completion request.

handle_classification
handle_classification(response: Any) -> None

Called after each classification response.

Endpoint

Bases: WithClassInfo, SerialModel, SingletonPerName

API usage, pacing, and utilities for API endpoints.

Attributes
instrumented_methods class-attribute
instrumented_methods: Dict[Any, List[Tuple[Callable, Callable, Type[Endpoint]]]] = defaultdict(list)

Mapping of class/module methods that have been instrumented for cost tracking, along with the wrapper methods and the class that instrumented them.

Key is the class or module owning the instrumented method. Tuple value has:

  • original function,

  • wrapped version,

  • endpoint that did the wrapping.
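The shape of this mapping can be sketched as follows (the functions and class names here are hypothetical, for illustration only):

```python
from collections import defaultdict

# Illustrative sketch of the instrumented_methods mapping shape.
instrumented_methods = defaultdict(list)

def original_completion(prompt): ...

def wrapped_completion(prompt): ...

class SomeEndpoint: ...

# Key: the class (or module) owning the instrumented method.
# Value: (original function, wrapped version, endpoint class that wrapped it).
instrumented_methods[SomeEndpoint].append(
    (original_completion, wrapped_completion, SomeEndpoint)
)

entry = instrumented_methods[SomeEndpoint][0]
```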

name instance-attribute
name: str

API/endpoint name.

rpm class-attribute instance-attribute
rpm: float = DEFAULT_RPM

Requests per minute.

retries class-attribute instance-attribute
retries: int = 3

Retries (if performing requests using this class).

post_headers class-attribute instance-attribute
post_headers: Dict[str, str] = Field(default_factory=dict, exclude=True)

Optional post headers for post requests if done by this class.

pace class-attribute instance-attribute
pace: Pace = Field(default_factory=lambda: Pace(marks_per_second=DEFAULT_RPM / 60.0, seconds_per_period=60.0), exclude=True)

Pacing instance to maintain a desired rpm.

global_callback class-attribute instance-attribute
global_callback: EndpointCallback = Field(exclude=True)

Callback that tracks costs of requests not made within a track_cost invocation.

Also note that Endpoints are singletons (one for each unique name argument); hence this global callback will track all requests for the named API even if you try to create multiple endpoints with the same name.
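A minimal sketch of the singleton-per-name behavior described above (not the actual SingletonPerName implementation):

```python
class SingletonPerName:
    """Return the same instance for the same (class, name) pair."""

    _instances: dict = {}

    def __new__(cls, name: str, *args, **kwargs):
        key = (cls, name)
        if key not in SingletonPerName._instances:
            SingletonPerName._instances[key] = super().__new__(cls)
        return SingletonPerName._instances[key]

class FakeEndpoint(SingletonPerName):
    """Stand-in endpoint class for demonstration."""
    def __init__(self, name: str):
        self.name = name

a = FakeEndpoint("openai")
b = FakeEndpoint("openai")  # same name -> same instance
c = FakeEndpoint("other")   # different name -> different instance
```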

callback_class class-attribute instance-attribute
callback_class: Type[EndpointCallback] = Field(exclude=True)

Callback class to use for usage tracking.

callback_name class-attribute instance-attribute
callback_name: str = Field(exclude=True)

Name of variable that stores the callback noted above.

Classes
EndpointSetup dataclass

Class for storing supported endpoint information.

See track_all_costs for usage.

Functions
pace_me
pace_me() -> float

Block until we can make a request to this endpoint to keep pace with maximum rpm. Returns time in seconds since last call to this method returned.
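The default pace settings above imply 60 requests over a 60-second window. A simplified sliding-window sketch of this kind of pacing (not the actual Pace implementation) might look like:

```python
import time
from collections import deque

class SimplePace:
    """Sliding-window pacer: allow at most marks_per_second * seconds_per_period
    marks within any window of seconds_per_period seconds."""

    def __init__(self, marks_per_second: float, seconds_per_period: float):
        self.max_marks = int(marks_per_second * seconds_per_period)
        self.period = seconds_per_period
        self.marks: deque = deque()

    def mark(self) -> float:
        """Block until a mark is allowed; return seconds since the last mark."""
        now = time.time()
        # Drop marks that have fallen outside the window.
        while self.marks and now - self.marks[0] > self.period:
            self.marks.popleft()
        if len(self.marks) >= self.max_marks:
            wait = self.period - (now - self.marks[0])
            if wait > 0:
                time.sleep(wait)
            now = time.time()
        since_last = now - self.marks[-1] if self.marks else 0.0
        self.marks.append(now)
        return since_last

# 600 rpm -> 10 marks per second over a 60-second window.
pace = SimplePace(marks_per_second=10.0, seconds_per_period=60.0)
intervals = [pace.mark() for _ in range(3)]
```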

run_in_pace
run_in_pace(func: Callable[[A], B], *args, **kwargs) -> B

Run the given func on the given args and kwargs at pace with the endpoint-specified rpm. Failures will be retried self.retries times.
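The retry behavior can be sketched as a simple loop (simplified; the real method also paces each attempt via pace_me):

```python
def run_with_retries(func, *args, retries: int = 3, **kwargs):
    """Try func up to retries + 1 times, re-raising the last error."""
    last_err = None
    for _attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            last_err = e
    raise last_err

calls = []

def flaky():
    """Fails twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient")
    return "ok"

result = run_with_retries(flaky)
```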

run_me
run_me(thunk: Thunk[T]) -> T

Run the given thunk, returning its output, on pace with the api. Retries the request multiple times if self.retries > 0.

DEPRECATED: Use run_in_pace instead.

print_instrumented classmethod
print_instrumented()

Print out all of the methods that have been instrumented for cost tracking. This is organized by the classes/modules containing them.

track_all_costs staticmethod
track_all_costs(__func: CallableMaybeAwaitable[A, T], *args, with_openai: bool = True, with_hugs: bool = True, with_litellm: bool = True, with_bedrock: bool = True, **kwargs) -> Tuple[T, Sequence[EndpointCallback]]

Track costs of all of the APIs we can currently track, over the execution of the given function.

track_all_costs_tally staticmethod
track_all_costs_tally(__func: CallableMaybeAwaitable[A, T], *args, with_openai: bool = True, with_hugs: bool = True, with_litellm: bool = True, with_bedrock: bool = True, **kwargs) -> Tuple[T, Cost]

Track costs of all of the APIs we can currently track, over the execution of the given function. Returns the result alongside the tallied total Cost.

track_cost
track_cost(__func: CallableMaybeAwaitable[T], *args, **kwargs) -> Tuple[T, EndpointCallback]

Tally only the usage performed within the execution of the given thunk. Returns the thunk's result alongside the EndpointCallback object that includes the usage information.
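The return-a-result-with-its-callback pattern can be illustrated with a self-contained sketch (the Cost and EndpointCallback stand-ins here are simplified, not the real classes):

```python
from typing import Any, Callable, Tuple

class Cost:
    """Simplified stand-in for the real Cost model."""
    def __init__(self):
        self.n_requests = 0

class EndpointCallback:
    """Simplified stand-in: counts handled responses."""
    def __init__(self):
        self.cost = Cost()

    def handle(self, response: Any) -> None:
        self.cost.n_requests += 1

def track_cost(func: Callable[..., Any], *args, **kwargs) -> Tuple[Any, EndpointCallback]:
    """Run func and return its result alongside a callback holding usage info.
    In the real class, instrumented methods invoke the callback internally."""
    cb = EndpointCallback()
    result = func(*args, **kwargs)
    cb.handle(result)
    return result, cb

result, cb = track_cost(lambda x: x * 2, 21)
```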

handle_wrapped_call
handle_wrapped_call(func: Callable, bindings: BoundArguments, response: Any, callback: Optional[EndpointCallback]) -> None

This gets called with the results of every instrumented method. This should be implemented by each subclass.

PARAMETER DESCRIPTION
func

the wrapped method.

TYPE: Callable

bindings

the inputs to the wrapped method.

TYPE: BoundArguments

response

whatever the wrapped function returned.

TYPE: Any

callback

the callback set up by track_cost if the wrapped method was called and returned within an invocation of track_cost.

TYPE: Optional[EndpointCallback]

wrap_function
wrap_function(func)

Create a wrapper of the given function to perform cost tracking.
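A hedged sketch of the wrapping idea (not the actual implementation): every return value of the wrapped function is routed to a handler, analogous to how responses reach handle_wrapped_call for cost tracking.

```python
import functools

def wrap_function(func, on_response):
    """Wrap func so each return value is passed to on_response.
    A real endpoint would update its Cost in the handler."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        response = func(*args, **kwargs)
        on_response(response)  # route the response to the tracking handler
        return response

    return wrapper

responses = []
add = wrap_function(lambda a, b: a + b, responses.append)
result = add(1, 2)
```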

DummyEndpoint

Bases: Endpoint

Endpoint for testing purposes.

Does not make any network calls and just pretends to.

Attributes
loading_prob instance-attribute
loading_prob: float

How often to produce the "model loading" response that huggingface api sometimes produces.

loading_time class-attribute instance-attribute
loading_time: Callable[[], float] = Field(exclude=True, default_factory=lambda: lambda: uniform(0.73, 3.7))

How much time to indicate as needed to load the model in the above response.

error_prob instance-attribute
error_prob: float

How often to produce an error response.

freeze_prob instance-attribute
freeze_prob: float

How often to freeze instead of producing a response.

overloaded_prob instance-attribute
overloaded_prob: float

How often to produce the overloaded message that huggingface sometimes produces.

alloc instance-attribute
alloc: int

How much data in bytes to allocate when making requests.

delay class-attribute instance-attribute
delay: float = 0.0

How long to delay each request.

Functions
handle_wrapped_call
handle_wrapped_call(func: Callable, bindings: BoundArguments, response: Any, callback: Optional[EndpointCallback]) -> None

Dummy handler does nothing.

post
post(url: str, payload: JSON, timeout: Optional[float] = None) -> Any

Pretend to make a classification request similar to huggingface API.

Simulates overloaded, model loading, frozen, error as configured:

requests.post(
    url, json=payload, timeout=timeout, headers=self.post_headers
)
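The simulated outcome selection can be sketched as follows (probabilities and payload shapes here are illustrative, not the actual DummyEndpoint logic):

```python
import random

def dummy_post(loading_prob=0.1, error_prob=0.1, overloaded_prob=0.1):
    """Pick one simulated outcome per call, the way a dummy endpoint might."""
    r = random.random()
    if r < loading_prob:
        # Mimic the huggingface "model loading" response.
        return {"error": "Model is currently loading",
                "estimated_time": random.uniform(0.73, 3.7)}
    if r < loading_prob + error_prob:
        return {"error": "simulated error"}
    if r < loading_prob + error_prob + overloaded_prob:
        return {"error": "overloaded"}
    # Normal classification-style response.
    return [{"label": "dummy", "score": 1.0}]

random.seed(0)
resp = dummy_post()
```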

Functions