Logging to PostgreSQL¶
This notebook demonstrates how to configure TruLens to log traces and feedback results to a PostgreSQL database instead of the default SQLite.
Prerequisites¶
1. Start PostgreSQL with Docker¶
# From the trulens root directory
docker compose -f docker/test-database.yaml up -d pg-test
This starts a PostgreSQL container with:
- Database: pg-test-db
- User: pg-test-user
- Password: pg-test-pswd
- Port: 5432
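The container can take a few seconds to start accepting connections. A minimal sketch of a readiness check, using only the standard library, is shown here. The helper name `wait_for_port` and its parameters are hypothetical and not part of TruLens or Docker; an open port only shows that something is listening, not that PostgreSQL has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not part of TruLens.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves the port is open,
            # not that the database is ready to serve queries.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call against the test container started above:
# wait_for_port("localhost", 5432, timeout=30.0)
```

Calling this before creating the session avoids a connection error when the notebook runs immediately after `docker compose up`.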
2. Install PostgreSQL Driver¶
pip install psycopg2-binary
Note: Use psycopg2-binary or psycopg2, NOT asyncpg. TruLens uses synchronous SQLAlchemy operations, which are incompatible with async drivers.
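As a rough pre-flight check for the note above, the following sketch tests whether a driver module is importable and whether a connection URL names the asyncpg driver. Both helper names are hypothetical; the only assumptions are that psycopg2-binary and psycopg2 both install a module named `psycopg2`, and that an async driver would appear in the URL scheme as `postgresql+asyncpg`.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module `name` can be found in this environment."""
    return importlib.util.find_spec(name) is not None


def uses_async_driver(database_url: str) -> bool:
    """Return True if the URL scheme names asyncpg, e.g. postgresql+asyncpg://..."""
    scheme = database_url.split("://", 1)[0]
    return scheme.endswith("+asyncpg")


# Example usage (hypothetical values):
# has_module("psycopg2")  # True after `pip install psycopg2-binary`
# uses_async_driver("postgresql+asyncpg://user:pswd@localhost:5432/db")  # True
```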
Connect TruSession to PostgreSQL¶
from trulens.core import TruSession
# PostgreSQL connection URL format:
# postgresql://username:password@host:port/database
POSTGRES_URL = "postgresql://pg-test-user:pg-test-pswd@localhost:5432/pg-test-db"
# Create a TruSession connected to PostgreSQL
session = TruSession(database_url=POSTGRES_URL)
print(f"Connected to: {session.connector.db.engine.url.database}")
print(f"Database dialect: {session.connector.db.engine.dialect.name}")
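To make the URL format in the comment concrete, here is a small standard-library sketch that splits a connection URL into its parts. The function `describe_url` is a hypothetical helper, not a TruLens API; it deliberately leaves the password out of the result so the output is safe to print.

```python
from urllib.parse import urlsplit


def describe_url(database_url: str) -> dict:
    """Split a connection URL into its parts, omitting the password.

    Hypothetical helper for illustration; not part of TruLens.
    """
    parts = urlsplit(database_url)
    return {
        "dialect": parts.scheme,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = describe_url("postgresql://pg-test-user:pg-test-pswd@localhost:5432/pg-test-db")
print(info)
```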
Create an Instrumented App¶
Let's create a simple RAG-style application with instrumentation.
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes


class SimpleRAGApp:
    """A simple RAG-style application for demonstration."""

    @instrument(span_type=SpanAttributes.SpanType.RETRIEVAL)
    def retrieve(self, query: str) -> list:
        """Retrieve relevant contexts for a query."""
        # Simulated retrieval - in practice, this would query a vector store
        return [
            "TruLens is an open-source library for evaluating LLM applications.",
            "TruLens provides feedback functions to measure quality metrics.",
        ]

    @instrument(span_type=SpanAttributes.SpanType.GENERATION)
    def generate(self, query: str, contexts: list) -> str:
        """Generate an answer based on retrieved contexts."""
        # Simulated generation - in practice, this would call an LLM
        context_text = " ".join(contexts)
        return f"Based on the context: {context_text[:100]}..."

    @instrument()
    def query(self, question: str) -> str:
        """Main entry point: retrieve contexts and generate an answer."""
        contexts = self.retrieve(question)
        return self.generate(question, contexts)


app = SimpleRAGApp()
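For intuition about what an instrumentation decorator does, the following is a simplified stand-in written in plain Python. It is not how TruLens implements `instrument`; the names `traced`, `SPANS`, and `TinyApp` are hypothetical, and spans are kept in an in-memory list rather than written to a database.

```python
import functools
import time

# Hypothetical in-memory store; a real tracer would export spans elsewhere.
SPANS = []


def traced(span_type: str = "unknown"):
    """Decorator factory that records one span-like dict per completed call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            # The span is appended after the call returns, so inner calls
            # appear in the list before the outer call that made them.
            SPANS.append({
                "name": func.__qualname__,
                "span_type": span_type,
                "duration_s": time.perf_counter() - start,
                "output": result,
            })
            return result

        return wrapper

    return decorator


class TinyApp:
    @traced(span_type="retrieval")
    def retrieve(self, query: str) -> list:
        return ["context A", "context B"]

    @traced(span_type="generation")
    def generate(self, query: str, contexts: list) -> str:
        return f"Answer from {len(contexts)} contexts"

    @traced()
    def query(self, question: str) -> str:
        return self.generate(question, self.retrieve(question))


answer = TinyApp().query("What is TruLens?")
print([span["name"] for span in SPANS])
```

The same shape applies to the app above: each decorated method produces one span, and the entry-point method's span covers the nested retrieval and generation calls.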
Wrap with TruApp and Record¶
from trulens.apps.app import TruApp

# Wrap the app with TruApp for recording
tru_app = TruApp(
    app,
    app_name="PostgresExampleApp",
    app_version="v1",
)

# Run the app and record traces to PostgreSQL
with tru_app as recording:
    result = app.query("What is TruLens?")

print(f"Answer: {result}")
Query Records from PostgreSQL¶
# Retrieve records from the database
records_df, feedback_cols = session.get_records_and_feedback(
    app_name="PostgresExampleApp"
)

print(f"Found {len(records_df)} record(s)")
if len(records_df) > 0:
    print(f"\nRecord columns: {list(records_df.columns)[:8]}...")
    print(f"\nInput: {records_df['input'].iloc[0]}")
    print(f"Output: {records_df['output'].iloc[0][:100]}...")
Launch the Dashboard¶
from trulens.dashboard import run_dashboard
# Launch the TruLens dashboard - it will connect to PostgreSQL automatically
run_dashboard(session)
Connection URL Formats¶
TruLens accepts SQLAlchemy-style PostgreSQL connection URLs in several forms:
# Basic format
"postgresql://username:password@host:port/database"
# With psycopg2 driver explicitly specified
"postgresql+psycopg2://username:password@host:port/database"
# With SSL (for cloud-hosted PostgreSQL)
"postgresql://username:password@host:port/database?sslmode=require"
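Passwords that contain characters such as `@` or `/` need to be percent-encoded before they are placed in a URL. The sketch below assembles the three forms above from their parts using `urllib.parse.quote_plus`. The function `build_postgres_url` and its parameters are hypothetical and not part of TruLens.

```python
from typing import Optional
from urllib.parse import quote_plus, urlencode


def build_postgres_url(
    user: str,
    password: str,
    host: str,
    port: int,
    database: str,
    driver: Optional[str] = None,
    sslmode: Optional[str] = None,
) -> str:
    """Assemble a PostgreSQL connection URL, escaping the user and password.

    Hypothetical helper for illustration; not part of TruLens.
    """
    scheme = "postgresql" if driver is None else f"postgresql+{driver}"
    url = f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"
    if sslmode is not None:
        url += "?" + urlencode({"sslmode": sslmode})
    return url


basic = build_postgres_url("pg-test-user", "pg-test-pswd", "localhost", 5432, "pg-test-db")
escaped = build_postgres_url(
    "u", "p@ss/word", "h", 5432, "db", driver="psycopg2", sslmode="require"
)
print(basic)
print(escaped)
```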
Production Considerations¶
For production deployments:
- Use environment variables for credentials:
import os
database_url = os.environ.get("DATABASE_URL")
session = TruSession(database_url=database_url)
- Enable key redaction to avoid storing sensitive data:
session = TruSession(
    database_url=database_url,
    database_redact_keys=True,
)
- Use connection pooling for high-throughput applications (handled automatically by SQLAlchemy).
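One way to handle credentials from the environment is to fail fast when the variable is missing and to mask the password before logging the URL. The sketch below uses only the standard library; `load_database_url` and `mask_password` are hypothetical helper names, and `DATABASE_URL` is the variable name used in the list above.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def load_database_url(env_var: str = "DATABASE_URL") -> str:
    """Read the connection URL from the environment, failing fast if it is unset."""
    url = os.environ.get(env_var)
    if not url:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return url


def mask_password(database_url: str) -> str:
    """Return the URL with the password replaced by ***, for safe logging."""
    parts = urlsplit(database_url)
    if parts.password is None:
        return database_url
    # Keep the host:port portion untouched and rewrite only the user info.
    userinfo, _, hostport = parts.netloc.rpartition("@")
    user = userinfo.split(":", 1)[0]
    netloc = f"{user}:***@{hostport}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Example usage (the resulting URL would then be passed to TruSession):
# database_url = load_database_url()
# print(f"Connecting to {mask_password(database_url)}")
```

Raising an error on a missing variable avoids passing `None` as the database URL, which could otherwise hide a misconfigured deployment.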