GEPA + TruLens: Evolving Prompts with Feedback-Driven Fitness¶
GEPA (Genetic/Evolutionary Prompt Adaptation) optimizes prompts using evolutionary algorithms. Instead of manually tuning instructions, you define a fitness function that scores prompt variants and let the algorithm search for improvements.
TruLens feedback functions are a natural fit as fitness functions: they already score dimensions like context relevance, groundedness, and toxicity on a [0, 1] scale.
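To make the search concrete, here is a minimal sketch of a generic mutate-and-select loop in plain Python. It illustrates the idea only; it is not GEPA's actual algorithm, and the helper names are ours:

# Minimal sketch of a mutate-and-select loop; not GEPA's actual implementation.
def evolve(base, fitness, mutate, generations=5, pop_size=4):
    best, best_score = base, fitness(base)
    for _ in range(generations):
        # Mutate the current best prompt to form a candidate population.
        candidates = [mutate(best) for _ in range(pop_size)]
        for cand in candidates:
            score = fitness(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score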
This notebook shows how to:
- Wrap a TruLens feedback function as a GEPA fitness function using TruGEPA.
- Run evolutionary prompt optimization with run_evolution.
- Automatically log every evaluation as a TruLens virtual record for audit and dashboard visualization.
- Plot the improvement trajectory.
# !pip install trulens trulens-apps-gepa trulens-providers-openai
Setup¶
import os
# Set your OpenAI API key.
os.environ["OPENAI_API_KEY"] = "sk-..." # replace with your key
1. Start a TruLens session¶
TruGEPA logs every evaluation automatically. A TruSession must be active
before the first evaluation so records have somewhere to go.
from trulens.core import TruSession
session = TruSession()
session.reset_database()
2. Define a feedback function¶
We use context_relevance from the OpenAI provider. It scores the relevance between a question and a context on a [0, 1] scale. Because the context here is held fixed, the score measures how well each prompt variant targets it.
from trulens.providers.openai import OpenAI
provider = OpenAI()
# context_relevance expects (question, context) -> float
feedback_fn = provider.context_relevance
# Fixed reference context used to score every prompt variant.
REFERENCE_CONTEXT = (
    "TruLens is an open-source library for evaluating and tracking "
    "LLM-based applications. It supports feedback functions for quality, "
    "safety, and relevance metrics."
)
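Before wrapping it, you can call the feedback function directly to confirm the raw (question, context) -> float interface:

# Direct call; returns a relevance score in [0, 1].
raw_score = feedback_fn(question="What does TruLens do?", context=REFERENCE_CONTEXT)
print(f"Raw feedback score: {raw_score:.3f}")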
3. Wrap the feedback function as a GEPA fitness function¶
Pass both app_name and app_version to enable logging — TruGEPA creates
a TruVirtual recorder automatically and every evaluation is stored as a
TruLens record. Omit both to run without logging. Supplying only one raises
an error immediately.
from trulens.apps.gepa import TruGEPA
fitness = TruGEPA(
    feedback_fn,
    # optimize_key names the feedback arg that receives the evolving prompt.
    optimize_key="question",
    # feedback_args holds all other fixed args forwarded on every call.
    feedback_args={"context": REFERENCE_CONTEXT},
    # Supply both to enable logging; omit both to run without logging.
    app_name="gepa_prompt_optimizer",
    app_version="v1",
)
# Quick sanity check.
score = fitness("What does TruLens do?")
print(f"Test score: {score:.3f}")
4. Define a mutation function¶
A mutation function takes a prompt string and returns a modified variant. Real-world setups often use an LLM to rephrase (see the sketch after this cell); here we use simple template mutations for illustration.
import random
MUTATIONS = [
    lambda p: f"Please explain: {p}",
    lambda p: f"{p} Provide a detailed answer.",
    lambda p: f"In simple terms, {p.lower()}",
    lambda p: f"{p} Focus on key benefits.",
    lambda p: p.replace("?", ". Explain this."),
]
def mutate(prompt: str) -> str:
    return random.choice(MUTATIONS)(prompt)
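As a rough sketch of the LLM-based alternative, the templates could be replaced with a rewriter built on the openai client. The model name and rewriting instruction below are illustrative assumptions, not part of GEPA or TruLens:

from openai import OpenAI as OpenAIClient  # alias to avoid clashing with the TruLens provider

client = OpenAIClient()

def llm_mutate(prompt: str) -> str:
    # Ask a chat model for a single rephrased variant of the prompt.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model works
        messages=[
            {
                "role": "system",
                "content": "Rephrase the user's prompt. Reply with the rephrased prompt only.",
            },
            {"role": "user", "content": prompt},
        ],
        temperature=1.0,
    )
    return response.choices[0].message.content.strip()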
5. Run evolutionary optimization¶
from trulens.apps.gepa import run_evolution
BASE_PROMPT = "What is TruLens?"
best_prompt, best_score, history = run_evolution(
    base_prompt=BASE_PROMPT,
    fitness_fn=fitness,
    mutate_fn=mutate,
    n_generations=8,    # number of evolution rounds
    population_size=5,  # candidate prompts per generation
    top_k=2,            # top-scoring candidates retained each generation
    seed=42,            # fix the RNG for reproducible runs
)
print(f"\nBest prompt : {best_prompt}")
print(f"Best score : {best_score:.3f}")
6. Visualize the improvement trajectory¶
import matplotlib.pyplot as plt
generations = list(range(1, len(history) + 1))
scores = [s for _, s in history]
plt.figure(figsize=(8, 4))
plt.plot(generations, scores, marker="o", linewidth=2)
plt.xlabel("Generation")
plt.ylabel("Best fitness score")
plt.title("GEPA Prompt Optimization — context_relevance trajectory")
plt.ylim(0, 1.05)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("\nGeneration-by-generation history:")
for gen, (prompt, score) in enumerate(history, 1):
    print(f" Gen {gen:2d} | score={score:.3f} | prompt='{prompt}'")
7. View results in the TruLens dashboard¶
All evaluations were logged automatically as virtual records. Launch the dashboard to explore them interactively.
records_df, _ = session.get_records_and_feedback()
print(f"Total records logged: {len(records_df)}")
records_df[["input", "output"]].head(10)
from trulens.dashboard import run_dashboard
run_dashboard(session)