LangGraph Functional API Quickstart with TruLens¶
This notebook demonstrates how to:
- Build a calculator agent using LangGraph's Functional API
- Instrument it with TruLens using TruGraph
- Evaluate the agent with Agent GPA metrics
The Functional API provides a more intuitive way to define agents using the `@entrypoint` and `@task` decorators instead of explicit graph construction.
Based on: https://docs.langchain.com/oss/python/langgraph/quickstart
Install Dependencies¶
# !pip install trulens trulens-apps-langgraph trulens-providers-openai langgraph langchain langchain-openai -q
Set Up API Keys¶
import os
# Set your API key (or use environment variables)
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
Step 1: Define Tools and Model¶
We'll create a simple calculator agent with add, multiply, and divide tools.
from langchain.tools import tool
from langchain.chat_models import init_chat_model
# Use OpenAI as the model provider (you can also use Claude as shown in the LangGraph docs)
model = init_chat_model("openai:gpt-4o", temperature=0)
# Define calculator tools
@tool
def multiply(a: int, b: int) -> int:
"""Multiply `a` and `b`.
Args:
a: First int
b: Second int
"""
return a * b
@tool
def add(a: int, b: int) -> int:
"""Adds `a` and `b`.
Args:
a: First int
b: Second int
"""
return a + b
@tool
def divide(a: int, b: int) -> float:
"""Divide `a` by `b`.
Args:
a: First int
b: Second int
"""
return a / b
# Augment the LLM with tools
tools = [add, multiply, divide]
model_with_tools = model.bind_tools(tools)
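Before building the agent loop, it can help to sanity-check that tool binding works; a quick sketch (the printed shape in the comment is typical for OpenAI models, not guaranteed verbatim):
# Optional sanity check: the tool-bound model should emit a tool call
# instead of answering directly.
resp = model_with_tools.invoke("What is 3 times 4?")
print(resp.tool_calls)
# e.g. [{'name': 'multiply', 'args': {'a': 3, 'b': 4}, 'id': '...', 'type': 'tool_call'}]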
Step 2: Define the Agent using Functional API¶
The Functional API uses the `@entrypoint` and `@task` decorators to define the agent workflow.
- `@task`: Marks a function as a resumable unit of work
- `@entrypoint`: Defines the main entry point that orchestrates tasks
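Before the full agent, here is a toy example of the pattern (the `shout` and `workflow` names are purely illustrative):
# A minimal sketch of the two decorators working together
from langgraph.func import entrypoint, task

@task
def shout(text: str) -> str:
    """A resumable unit of work."""
    return text.upper()

@entrypoint()
def workflow(text: str) -> str:
    """Plain Python control flow orchestrates tasks."""
    return shout(text).result()  # tasks return futures; .result() blocks

workflow.invoke("hello")  # "HELLO"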
from langchain.messages import SystemMessage, ToolMessage
from langchain_core.messages import BaseMessage
from langgraph.func import entrypoint, task
from langgraph.graph import add_messages
from trulens.core.otel.instrument import instrument_tools
tools_by_name = {tool.name: tool for tool in tools}
instrument_tools(tools_by_name)
@task
def call_llm(messages: list[BaseMessage]):
"""LLM decides whether to call a tool or not."""
return model_with_tools.invoke(
[
SystemMessage(
content="You are a helpful assistant tasked with performing arithmetic on a set of inputs."
)
]
+ messages
)
@task
def call_tool(tool_call: dict):
"""Execute a single tool call."""
tool = tools_by_name[tool_call["name"]]
observation = tool.invoke(tool_call["args"])
return ToolMessage(content=str(observation), tool_call_id=tool_call["id"])
@entrypoint()
def calculator_agent(messages: list[BaseMessage]):
"""Calculator agent that processes messages and calls tools as needed."""
# Use add_messages to handle message accumulation
messages = add_messages([], messages)
# Agent loop: keep calling LLM until no more tool calls
while True:
# Call the LLM
llm_response = call_llm(messages).result()
messages = add_messages(messages, [llm_response])
# Check if there are tool calls
if not llm_response.tool_calls:
# No tool calls, return the final response
break
# Execute tool calls
for tool_call in llm_response.tool_calls:
tool_result = call_tool(tool_call).result()
messages = add_messages(messages, [tool_result])
return messages
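A quick smoke test of the bare agent, before any instrumentation is attached, might look like this:
# Smoke test: run the agent once without any evaluation attached
from langchain.messages import HumanMessage

out = calculator_agent.invoke([HumanMessage(content="Add 3 and 4.")])
print(out[-1].content)  # expect an answer along the lines of "3 + 4 = 7"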
Step 3: Set Up TruLens Session and Agent GPA Metrics¶
Agent GPA (Goal, Plan, Action) metrics evaluate:
- Answer Relevance: Is the response relevant to the user's question?
- Tool Selection: Did the agent choose appropriate tools for the task?
- Tool Calling: Were the tool calls executed correctly with proper arguments?
- Execution Efficiency: Did the agent complete the task without unnecessary steps?
from trulens.core import Feedback, TruSession
from trulens.core.feedback.selector import Selector
from trulens.providers.openai import OpenAI as TruOpenAI
# Initialize TruLens session
session = TruSession()
session.reset_database()
# Initialize OpenAI provider for evaluations
provider = TruOpenAI(model_engine="gpt-4o-mini")
# Agent GPA metrics use trace-level selection
trace_selector = {"trace": Selector(trace_level=True)}
# Answer Relevance: Is the answer relevant to the question?
f_answer_relevance = (
Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
.on_input()
.on_output()
)
# Tool Selection: Did the agent choose appropriate tools?
f_tool_selection = Feedback(
provider.tool_selection_with_cot_reasons, name="Tool Selection"
).on(trace_selector)
# Tool Calling: Were tool calls executed correctly?
f_tool_calling = Feedback(
provider.tool_calling_with_cot_reasons, name="Tool Calling"
).on(trace_selector)
# Execution Efficiency: Did the agent complete efficiently?
f_execution_efficiency = Feedback(
provider.execution_efficiency_with_cot_reasons, name="Execution Efficiency"
).on(trace_selector)
# Combine all Agent GPA feedbacks
agent_gpa_feedbacks = [
f_answer_relevance,
f_tool_selection,
f_tool_calling,
f_execution_efficiency,
]
Step 4: Wrap with TruGraph¶
The Functional API still produces a LangGraph application under the hood, so we can instrument it with TruGraph just as we would a graph-built agent.
from trulens.apps.langgraph import TruGraph
# Wrap the calculator agent with TruGraph
tru_agent = TruGraph(
calculator_agent,
app_name="Calculator Agent (Functional API)",
app_version="v1",
feedbacks=agent_gpa_feedbacks,
)
Step 5: Run the Agent with Evaluation¶
Let's test our calculator agent with various arithmetic queries.
from langchain.messages import HumanMessage
# Test queries for the calculator agent
test_queries = [
"Add 3 and 4.",
"What is 15 multiplied by 7?",
"Divide 100 by 4, then add 10 to the result.",
]
for query in test_queries:
print(f"Query: {query}")
print("-" * 50)
with tru_agent as recording:
result = calculator_agent.invoke([HumanMessage(content=query)])
# Get the final response
final_response = result[-1].content
print(f"Response: {final_response}\n")
Step 6: View Evaluation Results¶
Use retrieve_feedback_results() to block until the feedback evaluations for the most recent recording have completed.
# Wait for and retrieve feedback results
feedback_results = recording.retrieve_feedback_results(timeout=300)
feedback_results
# Get the leaderboard showing evaluation scores across all records
session.get_leaderboard()
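For per-record detail beyond the leaderboard, the session can also return records and scores as a DataFrame; the column names below follow recent TruLens releases, so treat them as an assumption:
# Inspect individual records and their feedback scores
records_df, feedback_cols = session.get_records_and_feedback()
records_df[["input", "output"] + feedback_cols]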
Step 7: Launch the Dashboard¶
from trulens.dashboard import run_dashboard
run_dashboard(session)
Understanding the Results¶
Agent GPA Metrics Explained¶
| Metric | What it Measures | Good Score Means |
|---|---|---|
| Answer Relevance | Does the response address the user's question? | Agent provided a relevant answer |
| Tool Selection | Did the agent pick the right tools? | Agent chose `add` for addition, `multiply` for multiplication, etc. |
| Tool Calling | Were tool calls correct? | Tool arguments were valid (correct numbers passed) |
| Execution Efficiency | Was the task completed efficiently? | No unnecessary tool calls or loops |
Functional API vs Graph API¶
The Functional API offers a more Pythonic way to define agents:
- Uses familiar decorators (`@task`, `@entrypoint`)
- Control flow is explicit in Python code (loops, conditionals)
- Easier to read for developers familiar with Python
The Graph API (shown in langgraph_quickstart.ipynb) offers:
- Visual graph representation
- Explicit nodes and edges
- Better for complex multi-agent workflows
Both APIs produce LangGraph applications that can be instrumented with TruGraph.
Next Steps¶
- Add more tools to expand the agent's capabilities (see the sketch below)
- Add Plan Quality and Plan Adherence metrics for agents that do explicit planning
- Compare different model versions using the leaderboard
- Explore the trace in the dashboard to see tool calls and LLM reasoning
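For the first item, extending the agent mostly means registering another tool alongside the existing ones. A hypothetical `power` tool as a sketch; re-running `instrument_tools` on just the new tool mirrors the earlier setup and is an assumption about the API:
# Hypothetical extra tool, registered the same way as the existing ones
@tool
def power(a: int, b: int) -> int:
    """Raise `a` to the power of `b`.

    Args:
        a: First int
        b: Second int
    """
    return a**b

tools.append(power)
model_with_tools = model.bind_tools(tools)  # rebind so the LLM sees the new tool
tools_by_name["power"] = power  # so call_tool can execute it
instrument_tools({"power": power})  # assumed to work per-tool, as above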