Llama-Index Integration¶
TruLens provides TruLlama, a deep integration with Llama-Index that lets you inspect and evaluate the internals of applications built with Llama-Index.
TruLlama captures all of the metrics and metadata listed in the instrumentation overview. In addition, TruLlama provides the select_source_nodes method to capture the source nodes of your query.
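For example, select_source_nodes can be used to point a feedback function at the retrieved context. Below is a minimal sketch, assuming the OpenAI feedback provider and the qs_relevance feedback function available in trulens_eval (check your version's provider API):
import numpy as np
from trulens_eval import Feedback, TruLlama
from trulens_eval.feedback.provider import OpenAI

provider = OpenAI()

# Select the text of each source node retrieved for the query.
context = TruLlama.select_source_nodes().node.text

# Score each source node's relevance to the input question, then
# aggregate across nodes with the mean.
f_context_relevance = (
    Feedback(provider.qs_relevance)
    .on_input()
    .on(context)
    .aggregate(np.mean)
)
The resulting feedback function can then be passed to TruLlama via its feedbacks argument.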
Supported methods¶
TruLlama supports both sync and async modes using the following Llama-Index query engine and chat engine methods:
query
aquery
chat
achat
stream_chat
astream_chat
Example usage¶
Below is a quick example of usage. First, we'll create a standard Llama-Index query engine from Paul Graham's essay, What I Worked On.
from llama_index import VectorStoreIndex, SimpleWebPageReader
from trulens_eval import TruLlama
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
To instrument a Llama-Index query engine, all that's required is to wrap it using TruLlama.
tru_query_engine_recorder = TruLlama(query_engine)
with tru_query_engine_recorder as recording:
    llm_response = query_engine.query("What did the author do growing up?")
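Once the with block exits, the recording holds the captured record; you can retrieve it for inspection or browse it in the dashboard. A short sketch, assuming a default Tru instance:
from trulens_eval import Tru

tru = Tru()

# Retrieve the record captured inside the recording context.
rec = recording.get()

# Launch the dashboard to browse records and feedback results.
tru.run_dashboard()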
You can find the full quickstart here: Llama-Index Quickstart
Async Support¶
TruLlama also provides async support for Llama-Index through the aquery, achat, and astream_chat methods. This allows you to track and evaluate async applications.
As an example, below is a Llama-Index async chat engine (achat).
# Imports main tools:
from trulens_eval import TruLlama, Tru

tru = Tru()

from llama_index import VectorStoreIndex, SimpleWebPageReader

documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)
chat_engine = index.as_chat_engine()
To instrument a Llama-Index async chat engine, all that's required is to wrap it using TruLlama, just like with the query engine.
tru_chat_recorder = TruLlama(chat_engine)
with tru_chat_recorder as recording:
    llm_response_async = await chat_engine.achat("What did the author do growing up?")

print(llm_response_async)
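Note that await requires an async context; notebooks provide one, but in a standalone script you would need to drive the call yourself, for example with asyncio.run. A minimal sketch reusing the engine and recorder from above:
import asyncio

async def main():
    with tru_chat_recorder as recording:
        llm_response_async = await chat_engine.achat(
            "What did the author do growing up?"
        )
    print(llm_response_async)

asyncio.run(main())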
Streaming Support¶
TruLlama also provides streaming support for Llama-Index. This allows you to track and evaluate streaming applications.
As an example, below is a Llama-Index query engine with streaming.
from llama_index import VectorStoreIndex, SimpleWebPageReader
from trulens_eval import TruLlama
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(streaming=True)
Just like with the other methods, wrap your streaming query engine with TruLlama and use it as before. You can print the response tokens as they are generated using the response_gen attribute.
tru_query_engine_recorder = TruLlama(query_engine)
with tru_query_engine_recorder as recording:
    response = query_engine.query("What did the author do growing up?")
    # Consume the stream inside the recording context so the full
    # completion is captured in the record.
    for c in response.response_gen:
        print(c)
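The async streaming method, astream_chat, can be tracked the same way. Below is a hedged sketch, assuming a chat engine built with streaming=True and Llama-Index's async_response_gen token generator (verify the streaming response API for your Llama-Index version):
import asyncio

async def main():
    chat_engine = index.as_chat_engine(streaming=True)
    tru_chat_recorder = TruLlama(chat_engine)
    with tru_chat_recorder as recording:
        stream = await chat_engine.astream_chat("What did the author do growing up?")
        # Print tokens as they arrive from the async generator.
        async for token in stream.async_response_gen():
            print(token, end="")

asyncio.run(main())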
For more usage examples, check out the Llama-Index examples directory.