Finding Answers in Complex Standardization Documents Using Qdrant 📃

Building RAG to query standardization documents

A recent survey by cnvrg.io found that 44% of up-and-coming firms and startups consider generative AI indispensable to their growth. At present, owing to their constraints, many of them opt for LLMaaS (LLM as a Service), purchasing AI technology from top-tier companies for their own use. Although many lack the infrastructure and expertise to construct their own LLMs, the report forecasts that 48% of these firms will adopt a build approach in the near future, using open-source resources and customizing them into models for specific applications. In a remarkably short span of time, large language models have become an essential part of enterprise solutions.

Retrieval Augmented Generation (RAG)

Despite the huge potential and value of LLMs, using them straight out of the box is not always advisable. When we repeatedly query an LLM about a specific topic with plain prompts, it may return irrelevant or inaccurate responses, because the vast, general-purpose data it was trained on lacks the precision that topic demands. This drawback is called “hallucination”, and it can usually be mitigated by providing the LLM with context at query time, i.e. content from which it can draw its conclusions. The model can then generate responses that are grounded in that context and match what the user expects. This is what we call Retrieval Augmented Generation (RAG).

Standardization documents such as the ISO or IEEE standards are a typical case. They are long and complex, and reading through them to find exactly the information we are looking for is nearly impossible. That is where RAG can help us. RAG’s capabilities are fundamentally rooted in the retrieval step, which lets the model broaden its understanding beyond its pre-trained data and access a large pool of information that is current or specific to the context.

Qdrant

Qdrant is an open-source vector database and search engine written in Rust. It serves as a comprehensive solution for developers who want to add vector similarity search, matching, and recommendations to their applications. Its extensive API and ease of use for search and recommendation systems set it apart from other vector databases, and it ships pre-built client libraries for Python and several other programming languages. Qdrant is highly scalable, cloud-compatible, and accommodates a broad spectrum of data types.

. . .

In this tutorial, we will explore a question-answering RAG system for such documents using the Qdrant vector database.

The System Workflow

The workflow of the Qdrant-based RAG system will be as follows (a minimal sketch in code follows the list):

  1. Obtain the user’s question.

  2. Transform the user’s question into a semantically equivalent vector representation using an embedding model.

  3. Use Qdrant’s built-in functions to calculate the similarity between the query vector and the content vectors in the database, and fetch the top-k related content.

  4. Use the retrieved context and the user’s question as input for an LLM.

  5. The LLM generates the appropriate response.
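
To make the flow concrete, here is a minimal outline of those five steps in Python. The function names (embed_query, search_qdrant, answer_with_llm) are hypothetical placeholders; the real implementations are built step by step in the rest of the tutorial.

# Hypothetical outline of the workflow; the real implementations follow later
def answer(question: str, top_k: int = 3) -> str:
    query_vector = embed_query(question)             # step 2: embedding model
    contexts = search_qdrant(query_vector, top_k)    # step 3: Qdrant similarity search
    return answer_with_llm(question, contexts)       # steps 4-5: LLM generates the response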

Prerequisites

The directory structure for the project is as shown.

.
├── app.py
├── docs
│   ├── ASM-standards.pdf
│   └── ASM.pdf
├── .env
├── ingest.py
└── requirements.txt

The tutorial was tested with the following Python library versions; make sure yours match while working through it. List them in requirements.txt as shown and install them from there.

transformers==4.36.1
qdrant-client==1.7.0
langchain==0.0.350
sentence_transformers==2.2.2
huggingface_hub==0.19.4
PyPDF2
protobuf==4.25.1
torch==2.1.2
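
With these pins saved to requirements.txt, everything can be installed in one step (shown here in the same notebook-style form as the Docker commands below; drop the leading ! in a regular shell):

!pip install -r requirements.txt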

Set Up Qdrant

Make sure you have Docker installed and keep the Docker engine running if you are working in a local environment. Qdrant can be installed by pulling its Docker image.

!docker pull qdrant/qdrant

Run the Qdrant Docker container using the following command.

!docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

Alternatively, you can start the container from the Docker Desktop console.

Only once the container is running will you be able to connect the Qdrant client from the Python files.
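
As an optional sanity check (a small sketch, assuming the container is listening on the default port 6333), you can verify the connection from Python before moving on:

from qdrant_client import QdrantClient

# Connect to the local Qdrant instance started above
qdrant_client = QdrantClient(host="localhost", port=6333)

# If the container is running, this returns the (possibly empty) list of collections
print(qdrant_client.get_collections())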

Creating the Knowledge Base

Retrieval in a RAG system works on encoded vector representations of content, called embeddings. So, before we store the data from the document, we must convert it into embeddings. The vector database will be the retrieval knowledge base of the RAG system, storing all the content we have in the form of embeddings. We will use the ASM standardization document for the demo.

Import required libraries.

ingest.py

import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders import PyPDFLoader
from qdrant_client import QdrantClient
from qdrant_client.http import models
import torch
from sentence_transformers import SentenceTransformer

We’ll begin by creating a collection. In Qdrant terms, a collection is a named set of points, where each point holds a vector and an optional payload. The dimensionality of the vectors in the collection is defined by us. For this scenario, each vector has 384 dimensions, corresponding to the 384-dimensional output of the embedding model we’re using.

ingest.py

# Connect to the local Qdrant instance and (re)create the collection
qdrant_client = QdrantClient(host='localhost', port=6333)
my_collection = "ASM"
qdrant_client.recreate_collection(
    collection_name=my_collection,
    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE)
)

Extract the text from the document using LangChain’s PyPDFLoader. To enhance the search process, the document is divided into several sections; this helps retrieve the most pertinent information effectively. We use LangChain’s RecursiveCharacterTextSplitter to break the document into 700-character segments, with each segment overlapping the next by 50 characters.

ingest.py

loader = PyPDFLoader("docs/ASM-standards.pdf")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)
texts = text_splitter.split_documents(documents)

Define the embedding model. Here we are using all-MiniLM-L6-v2, a sentence-transformers model designed for efficient text encoding, which makes it well suited for generating dense vector representations of documents or text segments.

ingest.py

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
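
As a quick optional check, you can confirm that the model’s output dimensionality matches the 384 used when creating the collection:

# Should print 384, the same size passed to VectorParams above
print(model.get_sentence_embedding_dimension())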

For each document object in texts, create the embedding vector. Qdrant stores vectors as points, which have the following structure:

  • id: The ID of the document object.

  • vector: The embedding vector of the document object.

  • payload: The original content of the document object.

ingest.py

points_list = []
for i, text in enumerate(texts):
    # Encode each chunk and convert the embedding to a plain list of floats
    embedding = model.encode(text.page_content).tolist()
    points_list.append(
        {
            "id": i + 1,                              # point ID (1-based)
            "vector": embedding,                      # embedding vector
            "payload": {"text": text.page_content},   # original chunk text
        }
    )

Store the prepared points in the vector database.

ingest.py

qdrant_client.upsert(collection_name=my_collection, points=points_list)

Now run the file ingest.py.
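
As an optional check (assuming ingestion finished without errors), you can confirm that the points actually landed in the collection:

# The count should match len(texts)
print(qdrant_client.count(collection_name=my_collection, exact=True).count)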

Retrieving Context

As discussed earlier, the document is long and contains a lot of standardization rules. If we want to know the standards for only one industry or commodity, we need to extract only the related content from the database and pass it, together with the question, to the model to get the desired response. The reader model used for question answering here is a BERT variant fine-tuned on SQuAD for extractive question answering.

Import libraries.

app.py

import torch
from transformers import pipeline
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.http import models
from typing import List

Connect to the database and check that the collection we created earlier is available.

app.py

collection_name = "ASM"
qdrant_client = QdrantClient(host='localhost', port=6333)
collections = qdrant_client.get_collections()

Define the models we are going to use for question answering and for the embeddings.

app.py

model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

Now create the task pipeline. It is configured for question answering: it receives a question and a context as input and returns an answer, with the task parameter set to question-answering.

app.py

reader = pipeline("question-answering", model=model_name, tokenizer=model_name)
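
To see what the reader returns, here is a small standalone example; the question and context are invented purely for illustration. The transformers question-answering pipeline returns a dict containing the extracted answer span and a confidence score:

# Illustrative only; the text below is made up for demonstration
example = reader(
    question="What temperature is specified for the test?",
    context="The hydrostatic test shall be performed at a temperature of 25 degrees Celsius.",
)
print(example)  # a dict with 'score', 'start', 'end' and 'answer' keys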

Now, define the function which fetches relevant context from the database.

app.py

def get_context(query: str, top_k: int) -> List[str]:
    """
    Get the most relevant context from the database for a given query.

    Args:
        query (str): What do we want to know?
        top_k (int): Number of top results to return

    Returns:
        context (List[str]): Text payloads of the top_k most similar points
    """
    try:
        # Encode the query with the same embedding model used at ingestion time
        encoded_query = embedding_model.encode(query).tolist()

        # Similarity search against the collection; returns the top_k closest points
        result = qdrant_client.search(
            collection_name=collection_name,
            query_vector=encoded_query,
            limit=top_k,
        )

        # Keep only the original text stored in each point's payload
        context = [x.payload["text"] for x in result]
        return context

    except Exception as e:
        print(e)
        return []

The function encodes the query into an embedding and runs a similarity search over the collection using Qdrant’s search function, which computes vector similarities and fetches the most similar content very efficiently. The matching text is retrieved from the database and returned as the context.

Generating Responses

Define the function to generate responses based on context and the query.

app.py

def get_response(query: str, context: List[str]):
    """
    Extract the answer from the retrieved context for a given query.

    Args:
        query (str): The user's question
        context (List[str]): Context passages retrieved from Qdrant
    """
    results = []
    for c in context:
        # Run the extractive QA model on each context passage
        answer = reader(question=query, context=c)
        results.append(answer)

    # Rank the candidate answers by the reader's confidence score
    results = sorted(results, key=lambda x: x["score"], reverse=True)
    for i, result in enumerate(results):
        print(f"{i + 1}", end=" ")
        print(
            "Answer: ",
            result["answer"],
            "\n  score: ",
            result["score"],
        )

In this process, the pipeline is supplied with the query and context. The LLM utilizes these inputs to derive and structure the responses. The outcomes are then arranged according to the scores obtained from the reader model, with a higher score indicating a more pertinent response.

Here are some sample queries and responses.

app.py

query = "What is a mandated mechanical test for Welded Ferritic-Martensitic Stainless-Steel Pipe?"
context = get_context(query, top_k=1)
print("Context: {}\n".format(context))
get_response(query, context)

Output:

query = "What is the minimum size of calibration hole in the reference standard?"

Output:

You can also use LangChain prompt templates to craft better prompts that yield better responses. The template can be structured like this:


"""Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.


Context: {context}
Question: {question}


Only return the helpful answer below and nothing else.
Helpful answer:
"""

I’ll leave the exploration up to you. Don’t forget to delete the collection after use as it consumes a lot of resources when run locally.

qdrant_client.delete_collection(collection_name=collection_name)

Find the complete code here.

Wrapping Up

Congratulations! You have learned how to convert your documents to vector embeddings and store them in the Qdrant vector database. You have also seen how we can query the document easily using the RAG approach without having to go through the whole complex content in the standardization document. Try scaling this using advanced chains from LangChain and frameworks like LlamaIndex based on the Qdrant vector database. I hope you enjoyed this tutorial and found it useful. Thank you for reading and happy coding!

References

Qdrant Documentation: qdrant.tech

sentence-transformers (Sentence Transformers): huggingface.co

AIAnytime: github.com