Evaluate your Hybrid Search RAG pipeline built on Langchain and Qdrant
Overview
This notebook walks you through building and evaluating a Hybrid Search RAG (Retrieval-Augmented Generation) pipeline using Langchain, Qdrant, OpenAI and TruLens. You will load a real financial document, index it into a vector store using both dense and sparse embeddings, and assess response quality using automated LLM-based evaluation metrics.
Follow along!
Installation
!pip install langchain-community langchain-openai langchain-qdrant
!pip install fastembed pypdfium2 trulens trulens-providers-openai
Initial Setup
Install the required libraries and configure API keys for OpenAI and Qdrant. A TruLens session is initialized here to track and store all evaluation data throughout the notebook.
- Get OpenAI API Key: https://platform.openai.com/
- Get Qdrant API Key and Endpoint URL: https://cloud.qdrant.io/
from langchain_community.document_loaders import PyPDFium2Loader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import FastEmbedEmbeddings
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
from qdrant_client import QdrantClient, models
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
QDRANT_URL = userdata.get("QDRANT_URL")
QDRANT_API_KEY = userdata.get("QDRANT_API_KEY")
from trulens.core import TruSession
session = TruSession()
session.reset_database()
Data loading and chunking
The IMF 2025 Financial Access Survey annual report is downloaded and parsed using PyPDFium2. It is then split into overlapping chunks using Langchain's RecursiveCharacterTextSplitter to prepare the text for embedding and retrieval.
PyPDFium2 was chosen because it performs better on financial documents than comparable PDF parsers (see arXiv:2410.09871).
!wget https://data.imf.org/-/media/iData/External-Storage/Documents/7FC05452C6C743D2BFB6188D2E248A38/en/2025-FAS-Annual-Report.pdf -O annual_report.pdf
path = "annual_report.pdf"
loader = PyPDFium2Loader(path)
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
chunk_size = 1536,
chunk_overlap = 0,
)
chunks = text_splitter.split_documents(docs)
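To build intuition for what the splitter does, here is a minimal sketch of fixed-size chunking in plain Python. `naive_chunk` is a hypothetical helper for illustration only; the real `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences rather than cutting mid-word.

```python
# Illustrative sketch of fixed-size chunking with optional overlap.
# chunk_size / chunk_overlap mirror the splitter parameters above.
def naive_chunk(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "A" * 3000
chunks_demo = naive_chunk(sample, chunk_size=1536, chunk_overlap=0)
print(len(chunks_demo))     # 2 chunks
print(len(chunks_demo[0]))  # 1536 characters
```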
Setup your Qdrant Vector Database for Hybrid Search RAG
A Qdrant collection is created with separate configurations for dense vectors (using Jina embeddings) and sparse vectors (using BM25). This dual setup enables hybrid retrieval, combining semantic similarity with keyword-based matching for more accurate results.
dense_embeddings = FastEmbedEmbeddings(model_name="jinaai/jina-embeddings-v2-small-en")
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")
collection_name = "annual-report-survey"
dense_vector_name = "dense"
sparse_vector_name = "sparse"
client = QdrantClient(
url=QDRANT_URL,
api_key=QDRANT_API_KEY
)
client.create_collection(
collection_name = collection_name,
vectors_config = {
dense_vector_name : models.VectorParams(size = 512,
distance = models.Distance.COSINE
)},
sparse_vectors_config={
sparse_vector_name: models.SparseVectorParams(
index = models.SparseIndexParams(on_disk=False),
modifier = models.Modifier.IDF,
)
})
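Hybrid retrieval needs a way to merge the dense and sparse result lists into one ranking. One common fusion method is Reciprocal Rank Fusion (RRF); Qdrant performs fusion server-side, so the sketch below is purely illustrative of the idea, and the constant `k=60` is the value commonly used in RRF implementations.

```python
# Illustrative sketch of Reciprocal Rank Fusion (RRF): each document earns
# 1 / (k + rank) from every ranking it appears in, and the merged list is
# sorted by the summed score. Documents ranked well by BOTH the dense and
# the sparse retriever rise to the top.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_b", "doc_c"]   # semantic ranking
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # BM25 ranking
print(rrf([dense_hits, sparse_hits]))      # doc_b wins: high in both lists
```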
Index your data into the Vector knowledge store
Document chunks are embedded and stored in Qdrant using the QdrantVectorStore interface with hybrid retrieval mode enabled. A quick similarity search is run to verify the retriever is working correctly before moving to generation.
db = QdrantVectorStore(
client = client,
collection_name = collection_name,
embedding = dense_embeddings,
sparse_embedding = sparse_embeddings,
retrieval_mode = RetrievalMode.HYBRID,
    vector_name = dense_vector_name,
sparse_vector_name = sparse_vector_name,
)
db.add_documents(documents=chunks)  # run this once; after the data is indexed, you do not need to call add_documents again
Test the retriever on the user query
query = "projected reach of the digital remittance market by 2034"
relevant_docs = db.similarity_search(query)
Generator - Generate RAG response using ChatOpenAI
A RAG class is defined with three instrumented methods: retrieve, generate_completion, and query. TruLens instrumentation decorators are applied to each method so that inputs, outputs, and intermediate context can be captured automatically for evaluation.
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-5-mini")
SYSTEM_PROMPT = """
You are an expert financial analyst specializing in corporate finance.
You must be respectful and truthful when answering user questions; if not, you will face serious consequences.
The only source of information you have is the context provided. If the user query cannot be answered from the context,
just say `I don't know, not enough information provided.`
"""
USER_PROMPT = """
Answer the USER QUERY based on the CONTEXT below.
If the question cannot be answered using the information provided, answer with `I don't know, not enough information provided.`
<context>
CONTEXT: {context}
</context>
<query>
USER QUERY: {query}
</query>
"""
class RAG:
def __init__(self, model_name: str = "gpt-5-mini"):
self.model_name = model_name
@instrument(
span_type=SpanAttributes.SpanType.RETRIEVAL,
attributes={
SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
},
)
def retrieve(self, query: str) -> list:
"""retrieve relevant documents from the vector search engine using Qdrant"""
results = db.max_marginal_relevance_search(
query=query, k=4
)
return [result.page_content for result in results]
@instrument(span_type=SpanAttributes.SpanType.GENERATION)
def generate_completion(self, query: str, context_list: list) -> str:
"""Generate answer from context with improved prompting."""
context = "\n-".join(context_list)
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": USER_PROMPT.format(context=context,
query=query)},
]
response = llm.invoke(messages)
return response.content
@instrument(span_type=SpanAttributes.SpanType.RECORD_ROOT,
attributes={
SpanAttributes.RECORD_ROOT.INPUT: "query",
SpanAttributes.RECORD_ROOT.OUTPUT: "return",
},
)
def query(self, query: str) -> str:
context_list = self.retrieve(query=query)
completion = self.generate_completion(
query=query, context_list=context_list
)
return completion
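The `retrieve` method uses `max_marginal_relevance_search`, which balances relevance to the query against diversity among the selected chunks. Below is a toy sketch of the MMR idea over small hand-made vectors; `mmr` and `cosine` are hypothetical helpers for illustration, and the real computation happens inside LangChain/Qdrant.

```python
import math

# Toy sketch of Maximal Marginal Relevance (MMR): iteratively pick the
# document most similar to the query while least similar to documents
# already selected. lambda_mult trades off relevance vs. diversity.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query_vec = [1.0, 0.3]
doc_vecs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]  # docs 0 and 1 are near-duplicates
print(mmr(query_vec, doc_vecs, k=2))  # picks doc 1, then the diverse doc 2
```

Plain similarity search would return the two near-duplicate documents; MMR skips the redundant one in favor of the diverse one, which is why it is a good fit for feeding varied context to the generator.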
Custom LLM as a Judge - RAG Triad Evals
TruLens is used to evaluate the pipeline across three key metrics: Groundedness, Answer Relevance, and Context Relevance. A custom scoring rubric is defined for Context Relevance to reflect the financial domain, and the pipeline is tested on three queries ranging from relevant to out-of-scope. Results are summarized in a leaderboard view.
import numpy as np
from trulens.core import Metric
from trulens.core import Selector
from trulens.providers.openai import OpenAI
provider = OpenAI(model_engine="gpt-5-nano")
f_groundedness = Metric(
implementation=provider.groundedness_measure_with_cot_reasons_consider_answerability,
name="Groundedness",
selectors={
"source": Selector.select_context(collect_list=True),
"statement": Selector.select_record_output(),
"question": Selector.select_record_input(),
},
)
f_answer_relevance = Metric(
implementation=provider.relevance_with_cot_reasons,
name="Answer Relevance",
selectors={
"prompt": Selector.select_record_input(),
"response": Selector.select_record_output(),
},
)
custom_content_relevance_criteria = (
"Score the retrieved contexts for relevance to the financial question from 0 to 4. "
"0 - No context contains relevant financial information. "
"1 - At least one context contains loosely related financial terms or background but lacks specifics. "
"2 - At least two contexts contain relevant financial data or figures partially addressing the query. "
"3 - At least three contexts are directly relevant with specific financial metrics, ratios, or facts. "
"4 - All four contexts are highly relevant, collectively providing sufficient information to answer the query completely."
)
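TruLens reports all feedback on a 0-1 scale, so a raw rubric score on the 0-4 range above gets rescaled; the `min_score_val`/`max_score_val` arguments passed to the Context Relevance metric tell TruLens the judge's raw range. A minimal sketch of that rescaling (the `normalize` helper is hypothetical; the actual normalization happens inside the provider):

```python
# Sketch of mapping a 0-4 rubric score onto TruLens' 0-1 feedback scale,
# mirroring what min_score_val / max_score_val configure below.
def normalize(raw_score: int, min_val: int = 0, max_val: int = 4) -> float:
    return (raw_score - min_val) / (max_val - min_val)

print([normalize(s) for s in range(5)])  # [0.0, 0.25, 0.5, 0.75, 1.0]
```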
f_context_relevance = Metric(
implementation=provider.context_relevance_with_cot_reasons,
name="Context Relevance",
selectors={
"question": Selector.select_record_input(),
"context": Selector.select_context(collect_list=False),
},
criteria=custom_content_relevance_criteria,
min_score_val=0,
max_score_val=4,
)
from trulens.apps.app import TruApp
app = RAG()
evals_trace = TruApp(app,
app_name="Annural Report",
app_version="v1",
feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
)
test_queries = [
"projected reach of the digital remittance market by 2034",
"what is blackhole",
"what was the gain of BNPL model via crowdfunding in 2022"
]
with evals_trace as recording:
for query in test_queries:
app.query(query)
session.get_leaderboard()
from trulens.dashboard import run_dashboard
run_dashboard(session)
