📓 🦙 LlamaIndex Integration¶
TruLens provides TruLlama, a deep integration with LlamaIndex to allow you to inspect and evaluate the internals of your application built using LlamaIndex. This is done through the instrumentation of key LlamaIndex classes and methods. To see all classes and methods instrumented, see Appendix: LlamaIndex Instrumented Classes and Methods.
In addition to the default instrumentation, TruLlama exposes the select_context and select_source_nodes methods for evaluations that require access to retrieved context or source nodes. These methods remove the need to know the JSON structure of your app ahead of time, and make your evaluations reusable across different apps.
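Conceptually, a selector resolves to a JSON-path-like lens into the recorded app trace. The sketch below is illustrative only (the record shape and `resolve` helper are made up, not TruLens internals); it shows the kind of path that select_context spares you from writing by hand.

```python
# Illustrative sketch: a miniature "lens" that resolves a dotted path
# (with [:] meaning "every element of a list") against a recorded trace.
# TruLens' real Select/Lens machinery is richer; this only shows the idea.

def resolve(record, path):
    """Resolve a path like 'query.rets.source_nodes[:].node.text'."""
    parts = path.replace("[:]", ".[:]").split(".")
    results = [record]
    for part in parts:
        if part == "[:]":
            # Fan out over every element of each list seen so far.
            results = [item for lst in results for item in lst]
        else:
            results = [r[part] for r in results]
    return results

# A toy record shaped loosely like a retrieval trace.
record = {
    "query": {
        "rets": {
            "source_nodes": [
                {"node": {"text": "chunk one"}},
                {"node": {"text": "chunk two"}},
            ]
        }
    }
}

print(resolve(record, "query.rets.source_nodes[:].node.text"))
```

Writing such paths by hand is brittle and framework-specific; select_context computes the right one for your app.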
Example usage¶
Below is a quick example of usage. First, we'll create a standard LlamaIndex query engine from Paul Graham's essay, What I Worked On.
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
documents = SimpleWebPageReader(html_to_text=True).load_data(
["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
To instrument a LlamaIndex query engine, all that's required is to wrap it using TruLlama.
from trulens_eval import TruLlama
tru_query_engine_recorder = TruLlama(query_engine)
with tru_query_engine_recorder as recording:
print(query_engine.query("What did the author do growing up?"))
🦑 Tru initialized with db url sqlite:///default.sqlite . 🛑 Secret keys may be written to the database. See the `database_redact_keys` option of Tru` to prevent this. The author, growing up, worked on writing short stories and programming.
To properly evaluate LLM apps we often need to point our evaluation at an internal step of our application, such as the retrieved context. Doing so allows us to evaluate for metrics including context relevance and groundedness.
For LlamaIndex applications where the source nodes are used, select_context can be used to access the retrieved text for evaluation.
from trulens_eval.feedback.provider import OpenAI
from trulens_eval.feedback import Feedback
import numpy as np
provider = OpenAI()
context = TruLlama.select_context(query_engine)
f_context_relevance = (
Feedback(provider.context_relevance)
.on_input()
.on(context)
.aggregate(np.mean)
)
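The .on(context) selector typically yields one score per retrieved chunk, and aggregate(np.mean) collapses those into a single record-level score. A toy illustration in plain Python, with made-up scores:

```python
# Toy illustration: a feedback like context_relevance is scored once per
# retrieved chunk, then aggregated into one number per record.
from statistics import fmean

per_chunk_scores = [0.9, 0.4, 0.8]  # hypothetical relevance of each chunk
record_score = fmean(per_chunk_scores)  # what aggregate(np.mean) produces
print(record_score)
```

In a full app you would attach the feedback to the recorder, e.g. by passing `feedbacks=[f_context_relevance]` when constructing TruLlama.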
For added flexibility, the select_context method is also available through trulens_eval.app.App. This allows you to switch between frameworks without changing your context selector:
from trulens_eval.app import App
context = App.select_context(query_engine)
You can find the full quickstart here: LlamaIndex Quickstart
Async Support¶
TruLlama also provides async support for LlamaIndex through the aquery, achat, and astream_chat methods. This allows you to track and evaluate async applications.
As an example, below is a LlamaIndex async chat engine (achat).
# Imports main tools:
from trulens_eval import TruLlama, Tru
tru = Tru()
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
documents = SimpleWebPageReader(html_to_text=True).load_data(
["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)
chat_engine = index.as_chat_engine()
To instrument a LlamaIndex achat engine, all that's required is to wrap it using TruLlama - just like with the query engine.
tru_chat_recorder = TruLlama(chat_engine)
with tru_chat_recorder as recording:
llm_response_async = await chat_engine.achat("What did the author do growing up?")
print(llm_response_async)
A new object of type ChatMemoryBuffer at 0x2bf581210 is calling an instrumented method put. The path of this call may be incorrect. Guessing path of new object is app.memory based on other object (0x2bf5e5050) using this function. Could not determine main output from None. Could not determine main output from None. Could not determine main output from None. Could not determine main output from None.
The author worked on writing short stories and programming while growing up.
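The key pattern above is an async call awaited inside the recorder's with block. It can be mimicked without any LlamaIndex dependency; the recorder and chat engine below are illustrative stand-ins, not TruLens code:

```python
import asyncio
from contextlib import contextmanager

calls = []  # what our pretend instrumentation captures

@contextmanager
def recorder():
    """Stand-in for TruLlama-as-context-manager: exposes captured calls."""
    yield calls

async def achat(message):
    """Stand-in for an async chat engine method."""
    await asyncio.sleep(0)   # simulate async I/O
    calls.append(message)    # "instrumented" side effect
    return f"response to: {message}"

async def main():
    # Same shape as the real usage: enter the recorder, await the call.
    with recorder() as recording:
        reply = await achat("What did the author do growing up?")
    return reply, recording

reply, recording = asyncio.run(main())
print(reply)
print(recording)
```

The recorder captures the call because instrumentation hooks fire during the awaited method, even though the with statement itself is synchronous.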
Streaming Support¶
TruLlama also provides streaming support for LlamaIndex. This allows you to track and evaluate streaming applications.
As an example, below is an LlamaIndex query engine with streaming.
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
from trulens_eval import TruLlama
documents = SimpleWebPageReader(html_to_text=True).load_data(
["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)
chat_engine = index.as_chat_engine(streaming=True)
Just like with the other methods, wrap your streaming chat engine with TruLlama and operate like before.
You can also print the response tokens as they are generated using the response_gen attribute.
tru_chat_engine_recorder = TruLlama(chat_engine)
with tru_chat_engine_recorder as recording:
response = chat_engine.stream_chat("What did the author do growing up?")
for c in response.response_gen:
print(c)
A new object of type ChatMemoryBuffer at 0x2c1df9950 is calling an instrumented method put. The path of this call may be incorrect. Guessing path of new object is app.memory based on other object (0x2c08b04f0) using this function. Could not find usage information in openai response: <openai.Stream object at 0x2bf5f3ed0> Could not find usage information in openai response: <openai.Stream object at 0x2bf5f3ed0>
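The response_gen loop can be sketched in isolation: it is an ordinary generator of tokens, which you can print incrementally while also assembling the full response. The token stream below is made up:

```python
def token_stream():
    # Hypothetical token generator standing in for response.response_gen.
    for tok in ["The ", "author ", "wrote ", "short ", "stories."]:
        yield tok

pieces = []
for c in token_stream():
    print(c, end="")      # print tokens as they arrive
    pieces.append(c)
full_response = "".join(pieces)
```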
For more usage examples, check out the LlamaIndex examples directory.
Appendix: LlamaIndex Instrumented Classes and Methods¶
The modules, classes, and methods that trulens instruments can be retrieved from the appropriate Instrument subclass.
from trulens_eval.tru_llama import LlamaInstrument
LlamaInstrument().print_instrumentation()
Module langchain* Class langchain.agents.agent.BaseMultiActionAgent Method plan: (self, intermediate_steps: 'List[Tuple[AgentAction, str]]', callbacks: 'Callbacks' = None, **kwargs: 'Any') -> 'Union[List[AgentAction], AgentFinish]' Method aplan: (self, intermediate_steps: 'List[Tuple[AgentAction, str]]', callbacks: 'Callbacks' = None, **kwargs: 'Any') -> 'Union[List[AgentAction], AgentFinish]' Class langchain.agents.agent.BaseSingleActionAgent Method plan: (self, intermediate_steps: 'List[Tuple[AgentAction, str]]', callbacks: 'Callbacks' = None, **kwargs: 'Any') -> 'Union[AgentAction, AgentFinish]' Method aplan: (self, intermediate_steps: 'List[Tuple[AgentAction, str]]', callbacks: 'Callbacks' = None, **kwargs: 'Any') -> 'Union[AgentAction, AgentFinish]' Class langchain.chains.base.Chain Method invoke: (self, input: Dict[str, Any], config: Optional[langchain_core.runnables.config.RunnableConfig] = None, **kwargs: Any) -> Dict[str, Any] Method ainvoke: (self, input: Dict[str, Any], config: Optional[langchain_core.runnables.config.RunnableConfig] = None, **kwargs: Any) -> Dict[str, Any] Method run: (self, *args: Any, callbacks: Union[List[langchain_core.callbacks.base.BaseCallbackHandler], langchain_core.callbacks.base.BaseCallbackManager, NoneType] = None, tags: Optional[List[str]] = None, metadata: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Any Method arun: (self, *args: Any, callbacks: Union[List[langchain_core.callbacks.base.BaseCallbackHandler], langchain_core.callbacks.base.BaseCallbackManager, NoneType] = None, tags: Optional[List[str]] = None, metadata: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Any Method _call: (self, inputs: Dict[str, Any], run_manager: Optional[langchain_core.callbacks.manager.CallbackManagerForChainRun] = None) -> Dict[str, Any] Method _acall: (self, inputs: Dict[str, Any], run_manager: Optional[langchain_core.callbacks.manager.AsyncCallbackManagerForChainRun] = None) -> Dict[str, Any] Class 
langchain.memory.chat_memory.BaseChatMemory Method save_context: (self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None Method clear: (self) -> None Class langchain_core.chat_history.BaseChatMessageHistory Class langchain_core.documents.base.Document Class langchain_core.language_models.base.BaseLanguageModel Class langchain_core.language_models.llms.BaseLLM Class langchain_core.load.serializable.Serializable Class langchain_core.memory.BaseMemory Method save_context: (self, inputs: 'Dict[str, Any]', outputs: 'Dict[str, str]') -> 'None' Method clear: (self) -> 'None' Class langchain_core.prompts.base.BasePromptTemplate Class langchain_core.retrievers.BaseRetriever Method _get_relevant_documents: (self, query: 'str', *, run_manager: 'CallbackManagerForRetrieverRun') -> 'List[Document]' Method get_relevant_documents: (self, query: 'str', *, callbacks: 'Callbacks' = None, tags: 'Optional[List[str]]' = None, metadata: 'Optional[Dict[str, Any]]' = None, run_name: 'Optional[str]' = None, **kwargs: 'Any') -> 'List[Document]' Method aget_relevant_documents: (self, query: 'str', *, callbacks: 'Callbacks' = None, tags: 'Optional[List[str]]' = None, metadata: 'Optional[Dict[str, Any]]' = None, run_name: 'Optional[str]' = None, **kwargs: 'Any') -> 'List[Document]' Method _aget_relevant_documents: (self, query: 'str', *, run_manager: 'AsyncCallbackManagerForRetrieverRun') -> 'List[Document]' Class langchain_core.runnables.base.RunnableSerializable Class langchain_core.tools.BaseTool Method _arun: (self, *args: 'Any', **kwargs: 'Any') -> 'Any' Method _run: (self, *args: 'Any', **kwargs: 'Any') -> 'Any' Module llama_hub.* Module llama_index.* Class llama_index.core.base.base_query_engine.BaseQueryEngine Method query: (self, str_or_query_bundle: Union[str, llama_index.core.schema.QueryBundle]) -> Union[llama_index.core.base.response.schema.Response, llama_index.core.base.response.schema.StreamingResponse, llama_index.core.base.response.schema.PydanticResponse] Method 
aquery: (self, str_or_query_bundle: Union[str, llama_index.core.schema.QueryBundle]) -> Union[llama_index.core.base.response.schema.Response, llama_index.core.base.response.schema.StreamingResponse, llama_index.core.base.response.schema.PydanticResponse] Method retrieve: (self, query_bundle: llama_index.core.schema.QueryBundle) -> List[llama_index.core.schema.NodeWithScore] Method synthesize: (self, query_bundle: llama_index.core.schema.QueryBundle, nodes: List[llama_index.core.schema.NodeWithScore], additional_source_nodes: Optional[Sequence[llama_index.core.schema.NodeWithScore]] = None) -> Union[llama_index.core.base.response.schema.Response, llama_index.core.base.response.schema.StreamingResponse, llama_index.core.base.response.schema.PydanticResponse] Class llama_index.core.base.base_query_engine.QueryEngineComponent Method _run_component: (self, **kwargs: Any) -> Any Class llama_index.core.base.base_retriever.BaseRetriever Method retrieve: (self, str_or_query_bundle: Union[str, llama_index.core.schema.QueryBundle]) -> List[llama_index.core.schema.NodeWithScore] Method _retrieve: (self, query_bundle: llama_index.core.schema.QueryBundle) -> List[llama_index.core.schema.NodeWithScore] Method _aretrieve: (self, query_bundle: llama_index.core.schema.QueryBundle) -> List[llama_index.core.schema.NodeWithScore] Class llama_index.core.base.embeddings.base.BaseEmbedding Class llama_index.core.base.llms.types.LLMMetadata Class llama_index.core.chat_engine.types.BaseChatEngine Method chat: (self, message: str, chat_history: Optional[List[llama_index.core.base.llms.types.ChatMessage]] = None) -> Union[llama_index.core.chat_engine.types.AgentChatResponse, llama_index.core.chat_engine.types.StreamingAgentChatResponse] Method achat: (self, message: str, chat_history: Optional[List[llama_index.core.base.llms.types.ChatMessage]] = None) -> Union[llama_index.core.chat_engine.types.AgentChatResponse, llama_index.core.chat_engine.types.StreamingAgentChatResponse] Method 
stream_chat: (self, message: str, chat_history: Optional[List[llama_index.core.base.llms.types.ChatMessage]] = None) -> llama_index.core.chat_engine.types.StreamingAgentChatResponse Class llama_index.core.indices.base.BaseIndex Class llama_index.core.indices.prompt_helper.PromptHelper Class llama_index.core.memory.types.BaseMemory Method put: (self, message: llama_index.core.base.llms.types.ChatMessage) -> None Class llama_index.core.node_parser.interface.NodeParser Class llama_index.core.postprocessor.types.BaseNodePostprocessor Method _postprocess_nodes: (self, nodes: List[llama_index.core.schema.NodeWithScore], query_bundle: Optional[llama_index.core.schema.QueryBundle] = None) -> List[llama_index.core.schema.NodeWithScore] Class llama_index.core.question_gen.types.BaseQuestionGenerator Class llama_index.core.response_synthesizers.base.BaseSynthesizer Class llama_index.core.response_synthesizers.refine.Refine Method get_response: (self, query_str: str, text_chunks: Sequence[str], prev_response: Union[pydantic.v1.main.BaseModel, str, Generator[str, NoneType, NoneType], NoneType] = None, **response_kwargs: Any) -> Union[pydantic.v1.main.BaseModel, str, Generator[str, NoneType, NoneType]] Class llama_index.core.schema.BaseComponent Class llama_index.core.tools.types.BaseTool Method __call__: (self, input: Any) -> llama_index.core.tools.types.ToolOutput Class llama_index.core.tools.types.ToolMetadata Class llama_index.core.vector_stores.types.VectorStore Class llama_index.legacy.llm_predictor.base.BaseLLMPredictor Method predict: (self, prompt: llama_index.legacy.prompts.base.BasePromptTemplate, **prompt_args: Any) -> str Class llama_index.legacy.llm_predictor.base.LLMPredictor Method predict: (self, prompt: llama_index.legacy.prompts.base.BasePromptTemplate, output_cls: Optional[pydantic.v1.main.BaseModel] = None, **prompt_args: Any) -> str Module trulens_eval.* Class trulens_eval.feedback.feedback.Feedback Method __call__: (self, *args, **kwargs) -> 'Any' Class 
trulens_eval.utils.imports.llama_index.core.llms.base.BaseLLM WARNING: this class could not be imported. It may have been (re)moved. Error: > No module named 'llama_index.core.llms.base' Class trulens_eval.utils.langchain.WithFeedbackFilterDocuments Method _get_relevant_documents: (self, query: str, *, run_manager) -> List[langchain_core.documents.base.Document] Method get_relevant_documents: (self, query: 'str', *, callbacks: 'Callbacks' = None, tags: 'Optional[List[str]]' = None, metadata: 'Optional[Dict[str, Any]]' = None, run_name: 'Optional[str]' = None, **kwargs: 'Any') -> 'List[Document]' Method aget_relevant_documents: (self, query: 'str', *, callbacks: 'Callbacks' = None, tags: 'Optional[List[str]]' = None, metadata: 'Optional[Dict[str, Any]]' = None, run_name: 'Optional[str]' = None, **kwargs: 'Any') -> 'List[Document]' Method _aget_relevant_documents: (self, query: 'str', *, run_manager: 'AsyncCallbackManagerForRetrieverRun') -> 'List[Document]' Class trulens_eval.utils.llama.WithFeedbackFilterNodes WARNING: this class could not be imported. It may have been (re)moved. Error: > No module named 'llama_index.indices.vector_store' Class trulens_eval.utils.python.EmptyType
Instrumenting other classes/methods¶
Additional classes and methods can be instrumented by use of the trulens_eval.instruments.Instrument methods and decorators. Examples of such usage can be found in the custom app used in the custom_example.ipynb notebook, which can be found in trulens_eval/examples/expositional/end2end_apps/custom_app/custom_app.py. More information about these decorators can be found in the docs/trulens_eval/tracking/instrumentation/index.ipynb notebook.
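The decorator approach can be sketched with a plain-Python stand-in: a wrapper that records each call to a method, which is roughly what instrumentation does. The `instrument` below is a toy (the real trulens_eval decorators also capture arguments, timing, and nest calls into a record tree):

```python
import functools

call_log = []  # toy stand-in for a record of instrumented calls

def instrument(fn):
    """Toy instrumentation decorator: log each call and its result."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        call_log.append((fn.__qualname__, result))
        return result
    return wrapper

class CustomRetriever:
    @instrument
    def retrieve(self, query: str) -> list:
        # A trivial custom retrieval step.
        return [f"chunk for {query!r}"]

CustomRetriever().retrieve("growing up")
print(call_log)
```

Decorating at the class-method level like this is why instrumenting a new framework mostly amounts to listing which classes and methods matter.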
Inspecting instrumentation¶
The specific objects (of the above classes) and methods instrumented for a
particular app can be inspected using the App.print_instrumented
as
exemplified in the next cell. Unlike Instrument.print_instrumentation
, this
function only shows what in an app was actually instrumented.
tru_chat_engine_recorder.print_instrumented()
Components: TruLlama (Other) at 0x2bf5d5d10 with path __app__ OpenAIAgent (Other) at 0x2bf535a10 with path __app__.app ChatMemoryBuffer (Other) at 0x2bf537210 with path __app__.app.memory SimpleChatStore (Other) at 0x2be6ef710 with path __app__.app.memory.chat_store Methods: Object at 0x2bf537210: <function ChatMemoryBuffer.put at 0x2b14c19e0> with path __app__.app.memory <function BaseMemory.put at 0x2b1448f40> with path __app__.app.memory Object at 0x2bf535a10: <function BaseQueryEngine.query at 0x2b137dc60> with path __app__.app <function BaseQueryEngine.aquery at 0x2b137e2a0> with path __app__.app <function AgentRunner.chat at 0x2bf5aa160> with path __app__.app <function AgentRunner.achat at 0x2bf5aa2a0> with path __app__.app <function AgentRunner.stream_chat at 0x2bf5aa340> with path __app__.app <function BaseQueryEngine.retrieve at 0x2b137e340> with path __app__.app <function BaseQueryEngine.synthesize at 0x2b137e3e0> with path __app__.app <function BaseChatEngine.chat at 0x2b1529f80> with path __app__.app <function BaseChatEngine.achat at 0x2b152a0c0> with path __app__.app <function BaseAgent.stream_chat at 0x2beb437e0> with path __app__.app <function BaseChatEngine.stream_chat at 0x2b152a020> with path __app__.app Object at 0x2c1df9950: <function ChatMemoryBuffer.put at 0x2b14c19e0> with path __app__.app.memory