Evaluate your Hybrid Search RAG pipeline built on Langchain and Qdrant
Overview
This notebook walks you through building and evaluating a Hybrid Search RAG (Retrieval-Augmented Generation) pipeline using Langchain, Qdrant, OpenAI and TruLens. You will load a real financial document, index it into a vector store using both dense and sparse embeddings, and assess response quality using automated LLM-based evaluation metrics.
Follow along!
Installation
!pip install langchain-community langchain-openai langchain-qdrant
!pip install fastembed pypdfium2 trulens trulens-providers-openai
Initial Setup
Install the required libraries and configure API keys for OpenAI and Qdrant. A TruLens session is initialized here to track and store all evaluation data throughout the notebook.
- Get OpenAI API Key: https://platform.openai.com/
- Get Qdrant API Key and Endpoint URL: https://cloud.qdrant.io/
from langchain_community.document_loaders import PyPDFium2Loader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import FastEmbedEmbeddings
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
from qdrant_client import QdrantClient, models
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
QDRANT_URL = userdata.get("QDRANT_URL")
QDRANT_API_KEY = userdata.get("QDRANT_API_KEY")
from trulens.core import TruSession
session = TruSession()
session.reset_database()
Data loading and chunking
The IMF 2025 Financial Access Survey annual report is downloaded and parsed using PyPDFium2. It is then split into overlapping chunks using Langchain's RecursiveCharacterTextSplitter to prepare the text for embedding and retrieval.
PyPDFium2 was chosen because it performs better on financial documents than comparable PDF parsers (see arXiv:2410.09871).
!wget https://data.imf.org/-/media/iData/External-Storage/Documents/7FC05452C6C743D2BFB6188D2E248A38/en/2025-FAS-Annual-Report.pdf -O annual_report.pdf
path = "annual_report.pdf"
loader = PyPDFium2Loader(path)
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
chunk_size = 1536,
chunk_overlap = 0,
)
chunks = text_splitter.split_documents(docs)
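To build intuition for what the splitter does, here is a minimal sketch of fixed-size chunking in plain Python. `naive_chunk` is a hypothetical helper for illustration only; the real `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences rather than cutting mid-word.

```python
# Illustrative sketch of fixed-size chunking with optional overlap.
# chunk_size / chunk_overlap mirror the splitter parameters above.
def naive_chunk(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "A" * 3000
chunks_demo = naive_chunk(sample, chunk_size=1536, chunk_overlap=0)
print(len(chunks_demo))     # 2 chunks
print(len(chunks_demo[0]))  # 1536 characters
```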
Setup your Qdrant Vector Database for Hybrid Search RAG
A Qdrant collection is created with separate configurations for dense vectors (using Jina embeddings) and sparse vectors (using BM25). This dual setup enables hybrid retrieval, combining semantic similarity with keyword-based matching for more accurate results.
dense_embeddings = FastEmbedEmbeddings(model_name="jinaai/jina-embeddings-v2-small-en")
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")
collection_name = "annual-report-survey"
dense_vector_name = "dense"
sparse_vector_name = "sparse"
client = QdrantClient(
url=QDRANT_URL,
api_key=QDRANT_API_KEY
)
client.create_collection(
collection_name = collection_name,
vectors_config = {
dense_vector_name : models.VectorParams(size = 512,
distance = models.Distance.COSINE
)},
sparse_vectors_config={
sparse_vector_name: models.SparseVectorParams(
index = models.SparseIndexParams(on_disk=False),
modifier = models.Modifier.IDF,
)
})
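Hybrid retrieval needs a way to merge the dense and sparse result lists into one ranking. One common fusion method is Reciprocal Rank Fusion (RRF); Qdrant performs fusion server-side, so the sketch below is purely illustrative of the idea, and the constant `k=60` is the value commonly used in RRF implementations.

```python
# Illustrative sketch of Reciprocal Rank Fusion (RRF): each document earns
# 1 / (k + rank) from every ranking it appears in, and the merged list is
# sorted by the summed score. Documents ranked well by BOTH the dense and
# the sparse retriever rise to the top.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_b", "doc_c"]   # semantic ranking
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # BM25 ranking
print(rrf([dense_hits, sparse_hits]))      # doc_b wins: high in both lists
```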
Index your data into the Vector knowledge store
Document chunks are embedded and stored in Qdrant using the QdrantVectorStore interface with hybrid retrieval mode enabled. A quick similarity search is run to verify the retriever is working correctly before moving to generation.
db = QdrantVectorStore(
client = client,
collection_name = collection_name,
embedding = dense_embeddings,
sparse_embedding = sparse_embeddings,
retrieval_mode = RetrievalMode.HYBRID,
    vector_name = dense_vector_name,
sparse_vector_name = sparse_vector_name,
)
db.add_documents(documents=chunks)  # run this once; after the data is indexed, you do not need to call add_documents again
Test the retriever on the user query
query = "projected reach of the digital remittance market by 2034"
relevant_docs = db.similarity_search(query)
Generator - Generate RAG response using ChatOpenAI
A RAG class is defined with three instrumented methods: retrieve, generate_completion, and query. TruLens instrumentation decorators are applied to each method so that inputs, outputs, and intermediate context can be captured automatically for evaluation.
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-5-mini")
SYSTEM_PROMPT = """
You are an expert financial analyst specializing in corporate finance.
You must be respectful and truthful when answering user questions; if not, you will face serious consequences.
The only source of information you have is the context provided. If the user query cannot be answered from the context,
just say `I don't know, not enough information provided.`
"""
USER_PROMPT = """
Answer the USER QUERY based on the CONTEXT below.
If the question cannot be answered using the information provided, answer with `I don't know, not enough information provided.`
<context>
CONTEXT: {context}
</context>
<query>
USER QUERY: {query}
</query>
"""
class RAG:
def __init__(self, model_name: str = "gpt-5-mini"):
self.model_name = model_name
@instrument(
span_type=SpanAttributes.SpanType.RETRIEVAL,
attributes={
SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
},
)
def retrieve(self, query: str) -> list:
"""retrieve relevant documents from the vector search engine using Qdrant"""
results = db.max_marginal_relevance_search(
query=query, k=4
)
return [result.page_content for result in results]
@instrument(span_type=SpanAttributes.SpanType.GENERATION)
def generate_completion(self, query: str, context_list: list) -> str:
"""Generate answer from context with improved prompting."""
context = "\n-".join(context_list)
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": USER_PROMPT.format(context=context,
query=query)},
]
response = llm.invoke(messages)
return response.content
@instrument(span_type=SpanAttributes.SpanType.RECORD_ROOT,
attributes={
SpanAttributes.RECORD_ROOT.INPUT: "query",
SpanAttributes.RECORD_ROOT.OUTPUT: "return",
},
)
def query(self, query: str) -> str:
context_list = self.retrieve(query=query)
completion = self.generate_completion(
query=query, context_list=context_list
)
return completion
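The `retrieve` method uses `max_marginal_relevance_search`, which balances relevance to the query against diversity among the selected chunks. Below is a toy sketch of the MMR idea over small hand-made vectors; `mmr` and `cosine` are hypothetical helpers for illustration, and the real computation happens inside LangChain/Qdrant.

```python
import math

# Toy sketch of Maximal Marginal Relevance (MMR): iteratively pick the
# document most similar to the query while least similar to documents
# already selected. lambda_mult trades off relevance vs. diversity.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query_vec = [1.0, 0.3]
doc_vecs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]  # docs 0 and 1 are near-duplicates
print(mmr(query_vec, doc_vecs, k=2))  # picks doc 1, then the diverse doc 2
```

Plain similarity search would return the two near-duplicate documents; MMR skips the redundant one in favor of the diverse one, which is why it is a good fit for feeding varied context to the generator.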
Custom LLM as a Judge - RAG Triad Evals
TruLens is used to evaluate the pipeline across three key metrics: Groundedness, Answer Relevance, and Context Relevance. A custom scoring rubric is defined for Context Relevance to reflect the financial domain, and the pipeline is tested on three queries ranging from relevant to out-of-scope. Results are summarized in a leaderboard view.
import numpy as np
from trulens.core import Metric
from trulens.core import Selector
from trulens.providers.openai import OpenAI
provider = OpenAI(model_engine="gpt-5-nano")
f_groundedness = Metric(
implementation=provider.groundedness_measure_with_cot_reasons_consider_answerability,
name="Groundedness",
selectors={
"source": Selector.select_context(collect_list=True),
"statement": Selector.select_record_output(),
"question": Selector.select_record_input(),
},
)
f_answer_relevance = Metric(
implementation=provider.relevance_with_cot_reasons,
name="Answer Relevance",
selectors={
"prompt": Selector.select_record_input(),
"response": Selector.select_record_output(),
},
)
custom_content_relevance_criteria = (
"Score the retrieved contexts for relevance to the financial question from 0 to 4. "
"0 - No context contains relevant financial information. "
"1 - At least one context contains loosely related financial terms or background but lacks specifics. "
"2 - At least two contexts contain relevant financial data or figures partially addressing the query. "
"3 - At least three contexts are directly relevant with specific financial metrics, ratios, or facts. "
"4 - All four contexts are highly relevant, collectively providing sufficient information to answer the query completely."
)
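TruLens reports all feedback on a 0-1 scale, so a raw rubric score on the 0-4 range above gets rescaled; the `min_score_val`/`max_score_val` arguments passed to the Context Relevance metric tell TruLens the judge's raw range. A minimal sketch of that rescaling (the `normalize` helper is hypothetical; the actual normalization happens inside the provider):

```python
# Sketch of mapping a 0-4 rubric score onto TruLens' 0-1 feedback scale,
# mirroring what min_score_val / max_score_val configure below.
def normalize(raw_score: int, min_val: int = 0, max_val: int = 4) -> float:
    return (raw_score - min_val) / (max_val - min_val)

print([normalize(s) for s in range(5)])  # [0.0, 0.25, 0.5, 0.75, 1.0]
```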
f_context_relevance = Metric(
implementation=provider.context_relevance_with_cot_reasons,
name="Context Relevance",
selectors={
"question": Selector.select_record_input(),
"context": Selector.select_context(collect_list=False),
},
criteria=custom_content_relevance_criteria,
min_score_val=0,
max_score_val=4,
)
from trulens.apps.app import TruApp
app = RAG()
evals_trace = TruApp(app,
app_name="Annural Report",
app_version="v1",
feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)
test_queries = [
"projected reach of the digital remittance market by 2034",
"what is blackhole",
"what was the gain of BNPL model via crowdfunding in 2022"
]
with evals_trace as recording:
for query in test_queries:
app.query(query)
session.get_leaderboard()
from trulens.dashboard import run_dashboard
run_dashboard(session)
