Building a Semantic Search System with Qdrant and FastAPI: A Practical Guide to an AI-Powered Customer Support Service

Ibrahim Halil Koyuncu
8 min read · Feb 13, 2025


In this article, we explore the fundamentals of semantic search and how vector databases revolutionize information retrieval. We start by understanding what semantic search is, how vectors represent meaning, and why vector databases like Qdrant are essential for fast and efficient similarity search. We also highlight the advantages of vector databases and break down the key components of vector search in Qdrant. Finally, we dive into a real-world implementation, building a customer support and ticket resolution system, where we store past support tickets as vector embeddings and create a semantic search engine to retrieve relevant solutions instantly.

What is Semantic Search?

Semantic search is a search technique that improves accuracy by understanding the intent and contextual meaning of a query rather than relying only on exact keyword matching.

What is a Vector?

A vector is a mathematical representation of an object or data point in a multi-dimensional space. Each element of a vector corresponds to a specific feature or attribute of the object it represents.

For example:

  • In image recognition, a vector could represent an image, with each element corresponding to pixel values or extracted visual features.
  • In natural language processing (NLP), a vector might represent a sentence, with elements capturing semantic meaning, word embeddings, or sentiment.
  • In music recommendation systems, a vector could represent a song, with elements describing tempo, genre, and lyrics.

Vectors allow complex data types like text, images, audio, and video to be represented numerically, enabling machines to process and compare them efficiently.
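As a toy illustration of the idea (not how real embedding models work), a sentence can be turned into a vector by counting occurrences of words from a small hand-picked vocabulary. Real models, like the Sentence Transformer used later in this article, instead produce dense learned vectors that capture meaning:

```python
# Toy "embedding": count vocabulary words in a sentence.
# The vocabulary below is made up purely for illustration.
VOCAB = ["refund", "login", "password", "invoice"]

def toy_vector(text):
    # One dimension per vocabulary word -- a crude stand-in for a learned embedding
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]

print(toy_vector("I forgot my login password"))  # -> [0, 1, 1, 0]
```

Two sentences about the same topic produce similar vectors under this scheme, which is exactly the property real embedding models provide, only with far richer notions of similarity.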

What is the Role of a Vector Database?

A Vector Database is a specialized type of database designed to store, manage, and query high-dimensional vectors efficiently. Unlike traditional relational databases that organize data in rows and columns, vector databases store data as collections of high-dimensional vectors along with metadata.

Difference between a Vector Database and an RDBMS

In short, a traditional RDBMS answers exact-match and range queries over rows and columns, while a vector database answers similarity queries ("what is closest in meaning?") over high-dimensional embeddings and their metadata.

Key Distance Metrics in VectorDBs:

Vector databases use mathematical distance functions to measure similarity:

  1. Cosine Similarity — Measures the angle between two vectors; useful for text similarity.
  2. Dot Product — Considers both the magnitude and direction; useful in recommendation systems.
  3. Euclidean Distance — Measures straight-line distance; useful for spatial or numerical data.
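The three metrics above can be written out in a few lines of plain Python, which is a useful way to build intuition before letting Qdrant compute them at scale (Qdrant's Distance enum corresponds to these same formulas; the vectors below are made-up examples):

```python
import math

def cosine_similarity(a, b):
    # Angle-based similarity: 1.0 means identical direction, regardless of magnitude
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dot_product(a, b):
    # Takes both direction and magnitude into account
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Straight-line distance between the two points
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(cosine_similarity(a, b))   # ~1.0 -- b points in the same direction as a
print(euclidean_distance(a, b))  # ~3.74 -- yet they are not the same point
```

Note how the two metrics disagree here: b is a scaled copy of a, so cosine similarity calls them identical while Euclidean distance does not. That is why cosine is the usual choice for text embeddings, where direction carries the meaning.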

Advantages of Vector Databases:

  • Efficient storage and retrieval of high-dimensional data.
  • Fast real-time similarity searches across billions of vectors.
  • Support for unstructured data like images, videos, and natural language.
  • Reduced need for custom-built similarity search solutions.

Essential Qdrant Concepts for Vector Search

Before we begin implementing our project, it’s important to understand some fundamental concepts related to Qdrant’s vector storage model. Qdrant organizes and manages vector data using key components like collections, points, payloads, filtering, and indexing. Let’s go through each of these:

1. Collection

A Collection in Qdrant is similar to a table in a relational database — it acts as a container for storing vectors. Each collection has:

  • A defined vector size (dimensionality), which must match the embeddings we store.
  • A distance metric (e.g., Cosine Similarity, Euclidean Distance, or Dot Product) to measure similarity.

2. Points

Points are the central entity that Qdrant operates on. A point is a record consisting of a vector and an optional payload.

# Example point
{
    "id": 129,
    "vector": [0.1, 0.2, 0.3, 0.47],
    "payload": {"color": "black"}
}

You can search among the points grouped in one collection based on vector similarity.

3. Payload

A Payload is additional structured metadata associated with a vector. Payloads allow us to filter search results based on attributes like category, priority, or resolution status.

{
    "payload": {
        "category": "billing",
        "priority": "high",
        "resolved": true
    }
}

4. Filtering

Filtering allows us to refine search queries based on payload values. This is extremely useful for narrowing down results to only relevant vectors.
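To illustrate the shape of a filter, here it is expressed as plain Python dictionaries, mirroring the JSON structure Qdrant's REST API accepts (the Python client builds the equivalent structure via models.Filter, as the search code later in this article shows). The field names reuse the payload example above:

```python
# Filter in the JSON shape Qdrant's REST API accepts:
# match only points whose payload has category == "billing" AND resolved == True
search_filter = {
    "must": [
        {"key": "category", "match": {"value": "billing"}},
        {"key": "resolved", "match": {"value": True}},
    ]
}

print(search_filter["must"][0]["key"])  # -> category
```

Conditions under "must" are combined with AND; Qdrant also supports "should" (OR) and "must_not" clauses for more elaborate conditions.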

5. Indexing

A key feature of Qdrant is its effective combination of vector and traditional payload indexes. This matters because a vector index alone is not enough for vector search to work efficiently with filters. In simple terms, the vector index speeds up similarity search, while payload indexes speed up filtering.

Project Implementation: Building a Customer Support and Ticket Resolution System

Now that we understand the concepts of semantic search, vectors, and vector databases, let’s move on to the practical implementation. In this section, we will build a project based on “Customer Support and Ticket Resolution System” that leverages vector search to enhance query responses.

Overview of the Implementation

The goal of this project is to build an intelligent search system that helps resolve customer issues quickly and efficiently by using past customer support tickets, FAQs, and troubleshooting guides. By storing these past interactions in a vector database like Qdrant and creating a semantic search engine, the system can find the most relevant past responses to a customer’s question. This way, we can provide instant answers based on previously solved issues, making customer support faster and more effective. To make it easy to use, we’re exposing the search engine through a FastAPI web service, so it can be easily integrated into existing support systems.

Key Steps

Workflow for this service
  • Run Qdrant container to store high-dimensional vectors.
  • Convert sample tickets, FAQs, and troubleshooting guides into vector representations using a Sentence Transformer model.
  • Load the encoded vectors into the vector database for efficient querying.
  • Develop a function that queries Qdrant to find relevant support responses based on similarity.
  • Build a FastAPI-based API to allow external applications (chatbots, support dashboards) to access the search functionality.

So let's start by running the Qdrant vector database container. You can use the command below; if necessary, change the data persistence path.

docker run -d -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant

The next step is loading data into the vector database.

import json
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

# Load dataset from JSON file
with open("customer_support_data.json", "r") as file:
    data = json.load(file)

# Initialize Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Connect to Qdrant instance
client = QdrantClient("localhost", port=6333)

# Define Qdrant collection name
collection_name = "customer_support"

# Recreate the collection from scratch
if client.collection_exists(collection_name):
    client.delete_collection(collection_name)

client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
)

# Convert text data to embeddings and prepare points for insertion
points = []
for entry in data:
    embedding = model.encode(entry["customer_issue"]).tolist()
    point = models.PointStruct(
        id=int(entry["ticket_id"].split("-")[1]),  # Convert "TKT-1000" to integer ID
        vector=embedding,
        payload={
            "category": entry["category"],
            "customer_issue": entry["customer_issue"],
            "resolution_response": entry["resolution_response"],
        },
    )
    points.append(point)

# Insert points into Qdrant
client.upsert(collection_name=collection_name, points=points)

print("✅ Data successfully uploaded to Qdrant!")

In this script, we are loading customer support ticket data into Qdrant, a vector database, to enable semantic search.

Step — 1) Loading Data from JSON

  • We start by reading the dataset (customer_support_data.json) that contains past customer issues and their resolutions.

Step — 2) Initializing Sentence Transformer Model

  • We use “all-MiniLM-L6-v2”, a small, fast, and efficient sentence embedding model designed for semantic similarity and retrieval.
  • This model converts text (customer issues) into numerical vector embeddings, which allow us to search for similar cases in the future.

Step — 3) Connecting to Qdrant Instance

  • We create a Qdrant client to connect to the vector database running at localhost:6333.

Step — 4) Creating the Qdrant Collection

  • Before inserting data, we check if the collection already exists and delete it if necessary.
  • We then create a new collection named "customer_support", specifying a vector size of 384 (which matches our model's embedding size) and Cosine distance as the similarity metric.

Step — 5) Encoding Customer Issues into Vector Embeddings

  • We loop through each customer issue in the dataset.
  • The issue text is passed through “all-MiniLM-L6-v2” to generate a vector representation.
  • This allows the system to retrieve similar past issues later.

Step — 6) Adding Metadata (Payload) for Filtering

Alongside the vector, we store metadata (payload) that includes:

  • Category (e.g., “Billing Issue”, “Technical Support”)
  • Customer Issue (Original user query)
  • Resolution Response (How it was solved)

The payload helps filter and refine searches, such as retrieving only technical support issues.

Step — 7) Inserting Data into Qdrant

  • After encoding the text and adding metadata, we upload the vectors into Qdrant.
  • This prepares the database for semantic search, allowing us to retrieve past tickets that are most relevant to a new query.
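One small detail from the loading script worth calling out: Qdrant accepts only unsigned integers or UUIDs as point IDs, which is why the script derives a numeric ID from ticket identifiers such as "TKT-1000":

```python
# Qdrant point IDs must be unsigned integers or UUIDs,
# so "TKT-1000" is reduced to its numeric suffix
ticket_id = "TKT-1000"
point_id = int(ticket_id.split("-")[1])
print(point_id)  # -> 1000
```

This works because the sample ticket IDs all follow the "TKT-<number>" pattern; with less regular IDs, generating a UUID per point would be the safer choice.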

Now we are ready to create the semantic search engine that retrieves the most relevant solutions based on past tickets.

# Method from the search-engine class (the full class is in the GitHub repo)
def search(self, query, category, top_k):
    # Encode the user query into a vector embedding
    query_vector = self.model.encode(query).tolist()
    results = self.client.query_points(
        collection_name=self.collection_name,
        query=query_vector,
        limit=top_k,
        query_filter=models.Filter(
            must=[
                models.FieldCondition(
                    key="category",
                    match=models.MatchValue(value=category)
                )
            ]
        )
    ).points
    return results

In this code block we perform a filtered semantic search in Qdrant. The query (user input) is encoded into a vector embedding using "all-MiniLM-L6-v2". "query_points" searches for the most similar vectors in the specified collection, and "limit=top_k" ensures only the top-k most relevant results are returned. The "query_filter" parameter ensures that only results matching the specified category are considered. That's all :) You are ready to search! You can find the whole project in my GitHub repo. Let's ask the search engine about "I can't log in" and observe the result :)

As you can see, the web service returns two possible solutions for this problem.

{
    "query": "I can't log in",
    "results": [
        {
            "Customer Issue": "My account was locked after multiple login attempts.",
            "Category": "Account Access",
            "Resolution Response": "For security reasons, your account is temporarily locked. Please try again in 30 minutes or reset your password."
        },
        {
            "Customer Issue": "I forgot my password and can't reset it.",
            "Category": "Account Access",
            "Resolution Response": "You can reset your password using the 'Forgot Password' link. If you need further assistance, contact support."
        }
    ]
}

In the next article, I'll explore text-to-image and image-to-text search using LlamaIndex (llama-index-embeddings-huggingface) and how it enables multimodal retrieval. Then, I'll dive into building a retrieval-augmented generation (RAG) system with LLMs, integrating vector search for smarter AI-driven responses. To scale and deploy these systems, I'll cover running LLM-powered applications on Kubernetes with GPU acceleration, leveraging cloud platforms like AWS, GCP, and Azure. Finally, I'll take things a step further by creating AI agents capable of autonomous task execution and interaction, powered by LLMs and multi-agent frameworks. Get ready — exciting innovations ahead! 🚀
