LangGraph Functional API Quickstart with TruLens¶
This notebook demonstrates how to:
- Build a calculator agent using LangGraph's Functional API
- Instrument it with TruLens using TruGraph
- Evaluate the agent with Agent GPA metrics
The Functional API provides a more intuitive way to define agents using the `@entrypoint` and `@task` decorators instead of explicit graph construction.
Based on: https://docs.langchain.com/oss/python/langgraph/quickstart
Install Dependencies¶
# !pip install trulens trulens-apps-langgraph trulens-providers-openai langgraph langchain langchain-openai -q
Set Up API Keys¶
import os
# Set your API key (or use environment variables)
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
Step 1: Define Tools and Model¶
We'll create a simple calculator agent with add, multiply, and divide tools.
from langchain.tools import tool
from langchain.chat_models import init_chat_model
# Use OpenAI as the model provider (you can also use Claude as shown in the LangGraph docs)
model = init_chat_model("openai:gpt-4o", temperature=0)
# Define calculator tools
@tool
def multiply(a: int, b: int) -> int:
"""Multiply `a` and `b`.
Args:
a: First int
b: Second int
"""
return a * b
@tool
def add(a: int, b: int) -> int:
"""Adds `a` and `b`.
Args:
a: First int
b: Second int
"""
return a + b
@tool
def divide(a: int, b: int) -> float:
"""Divide `a` by `b`.
Args:
a: First int
b: Second int
"""
return a / b
# Augment the LLM with tools
tools = [add, multiply, divide]
model_with_tools = model.bind_tools(tools)
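Before building the agent loop, it can help to sanity-check that tool binding works; a quick sketch (the printed shape in the comment is typical for OpenAI models, not guaranteed verbatim):
# Optional sanity check: the tool-bound model should emit a tool call
# instead of answering directly.
resp = model_with_tools.invoke("What is 3 times 4?")
print(resp.tool_calls)
# e.g. [{'name': 'multiply', 'args': {'a': 3, 'b': 4}, 'id': '...', 'type': 'tool_call'}]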
Step 2: Define the Agent using Functional API¶
The Functional API uses the `@entrypoint` and `@task` decorators to define the agent workflow.
- `@task`: Marks a function as a resumable unit of work
- `@entrypoint`: Defines the main entry point that orchestrates tasks
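Before the full agent, here is a toy example of the pattern (the `shout` and `workflow` names are purely illustrative):
# A minimal sketch of the two decorators working together
from langgraph.func import entrypoint, task

@task
def shout(text: str) -> str:
    """A resumable unit of work."""
    return text.upper()

@entrypoint()
def workflow(text: str) -> str:
    """Plain Python control flow orchestrates tasks."""
    return shout(text).result()  # tasks return futures; .result() blocks

workflow.invoke("hello")  # "HELLO"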
from langchain.messages import SystemMessage, ToolMessage
from langchain_core.messages import BaseMessage
from langgraph.func import entrypoint, task
from langgraph.graph import add_messages
from trulens.core.otel.instrument import instrument_tools
tools_by_name = {tool.name: tool for tool in tools}
instrument_tools(tools_by_name)
@task
def call_llm(messages: list[BaseMessage]):
"""LLM decides whether to call a tool or not."""
return model_with_tools.invoke(
[
SystemMessage(
content="You are a helpful assistant tasked with performing arithmetic on a set of inputs."
)
]
+ messages
)
@task
def call_tool(tool_call: dict):
"""Execute a single tool call."""
tool = tools_by_name[tool_call["name"]]
observation = tool.invoke(tool_call["args"])
return ToolMessage(content=str(observation), tool_call_id=tool_call["id"])
@entrypoint()
def calculator_agent(messages: list[BaseMessage]):
"""Calculator agent that processes messages and calls tools as needed."""
# Use add_messages to handle message accumulation
messages = add_messages([], messages)
# Agent loop: keep calling LLM until no more tool calls
while True:
# Call the LLM
llm_response = call_llm(messages).result()
messages = add_messages(messages, [llm_response])
# Check if there are tool calls
if not llm_response.tool_calls:
# No tool calls, return the final response
break
# Execute tool calls
for tool_call in llm_response.tool_calls:
tool_result = call_tool(tool_call).result()
messages = add_messages(messages, [tool_result])
return messages
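A quick smoke test of the bare agent, before any instrumentation is attached, might look like this:
# Smoke test: run the agent once without any evaluation attached
from langchain.messages import HumanMessage

out = calculator_agent.invoke([HumanMessage(content="Add 3 and 4.")])
print(out[-1].content)  # expect an answer along the lines of "3 + 4 = 7"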
Step 3: Set Up TruLens Session and Agent GPA Metrics¶
Agent GPA (Goal, Plan, Action) metrics evaluate:
- Answer Relevance: Is the response relevant to the user's question?
- Tool Selection: Did the agent choose appropriate tools for the task?
- Tool Calling: Were the tool calls executed correctly with proper arguments?
- Execution Efficiency: Did the agent complete the task without unnecessary steps?
from trulens.core import Feedback, TruSession
from trulens.core.feedback.selector import Selector
from trulens.providers.openai import OpenAI as TruOpenAI
# Initialize TruLens session
session = TruSession()
session.reset_database()
# Initialize OpenAI provider for evaluations
provider = TruOpenAI(model_engine="gpt-4o-mini")
# Agent GPA metrics use trace-level selection
trace_selector = {"trace": Selector(trace_level=True)}
# Answer Relevance: Is the answer relevant to the question?
f_answer_relevance = (
Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
.on_input()
.on_output()
)
# Tool Selection: Did the agent choose appropriate tools?
f_tool_selection = Feedback(
provider.tool_selection_with_cot_reasons, name="Tool Selection"
).on(trace_selector)
# Tool Calling: Were tool calls executed correctly?
f_tool_calling = Feedback(
provider.tool_calling_with_cot_reasons, name="Tool Calling"
).on(trace_selector)
# Execution Efficiency: Did the agent complete efficiently?
f_execution_efficiency = Feedback(
provider.execution_efficiency_with_cot_reasons, name="Execution Efficiency"
).on(trace_selector)
# Combine all Agent GPA feedbacks
agent_gpa_feedbacks = [
f_answer_relevance,
f_tool_selection,
f_tool_calling,
f_execution_efficiency,
]
Step 4: Wrap with TruGraph¶
The Functional API still produces a LangGraph application under the hood, so we can instrument it with TruGraph just as we would a graph-built agent.
from trulens.apps.langgraph import TruGraph
# Wrap the calculator agent with TruGraph
tru_agent = TruGraph(
calculator_agent,
app_name="Calculator Agent (Functional API)",
app_version="v1",
feedbacks=agent_gpa_feedbacks,
)
Step 5: Run the Agent with Evaluation¶
Let's test our calculator agent with various arithmetic queries.
from langchain.messages import HumanMessage
# Test queries for the calculator agent
test_queries = [
"Add 3 and 4.",
"What is 15 multiplied by 7?",
"Divide 100 by 4, then add 10 to the result.",
]
for query in test_queries:
print(f"Query: {query}")
print("-" * 50)
with tru_agent as recording:
result = calculator_agent.invoke([HumanMessage(content=query)])
# Get the final response
final_response = result[-1].content
print(f"Response: {final_response}\n")
Step 6: View Evaluation Results¶
Use retrieve_feedback_results() to block until the feedback evaluations for the most recent recording have completed.
# Wait for and retrieve feedback results
feedback_results = recording.retrieve_feedback_results(timeout=300)
feedback_results
# Get the leaderboard showing evaluation scores across all records
session.get_leaderboard()
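For per-record detail beyond the leaderboard, the session can also return records and scores as a DataFrame; the column names below follow recent TruLens releases, so treat them as an assumption:
# Inspect individual records and their feedback scores
records_df, feedback_cols = session.get_records_and_feedback()
records_df[["input", "output"] + feedback_cols]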
Step 7: Launch the Dashboard¶
from trulens.dashboard import run_dashboard
run_dashboard(session)
Understanding the Results¶
Agent GPA Metrics Explained¶
| Metric | What it Measures | Good Score Means |
|---|---|---|
| Answer Relevance | Does the response address the user's question? | Agent provided a relevant answer |
| Tool Selection | Did the agent pick the right tools? | Agent chose `add` for addition, `multiply` for multiplication, etc. |
| Tool Calling | Were tool calls correct? | Tool arguments were valid (correct numbers passed) |
| Execution Efficiency | Was the task completed efficiently? | No unnecessary tool calls or loops |
Functional API vs Graph API¶
The Functional API offers a more Pythonic way to define agents:
- Uses familiar decorators (`@task`, `@entrypoint`)
- Control flow is explicit in Python code (loops, conditionals)
- Easier to read for developers familiar with Python
The Graph API (shown in langgraph_quickstart.ipynb) offers:
- Visual graph representation
- Explicit nodes and edges
- Better for complex multi-agent workflows
Both APIs produce LangGraph applications that can be instrumented with TruGraph.
Next Steps¶
- Add more tools to expand the agent's capabilities (see the sketch below)
- Add Plan Quality and Plan Adherence metrics for agents that do explicit planning
- Compare different model versions using the leaderboard
- Explore the trace in the dashboard to see tool calls and LLM reasoning
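For the first item, extending the agent mostly means registering another tool alongside the existing ones. A hypothetical `power` tool as a sketch; re-running `instrument_tools` on just the new tool mirrors the earlier setup and is an assumption about the API:
# Hypothetical extra tool, registered the same way as the existing ones
@tool
def power(a: int, b: int) -> int:
    """Raise `a` to the power of `b`.

    Args:
        a: First int
        b: Second int
    """
    return a**b

tools.append(power)
model_with_tools = model.bind_tools(tools)  # rebind so the LLM sees the new tool
tools_by_name["power"] = power  # so call_tool can execute it
instrument_tools({"power": power})  # assumed to work per-tool, as above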