Qdrant
Qdrant (read: quadrant) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage vectors with additional payload and extended filtering support. This makes it useful for all sorts of neural network or semantic-based matching, faceted search, and other applications.
This documentation demonstrates how to use Qdrant with LangChain for dense, sparse, and hybrid retrieval.
This page documents the QdrantVectorStore class, which supports multiple retrieval modes via Qdrant's new Query API. It requires you to run Qdrant v1.10.0 or above.
Setup
There are various modes in which you can run Qdrant, and depending on the chosen one, there will be some subtle differences. The options include:
- Local mode, no server required
- Docker deployments
- Qdrant Cloud
See the installation instructions.
%pip install -qU langchain-qdrant 'qdrant-client[fastembed]'
Credentials
There are no credentials needed to run the code in this notebook.
Initialization
For these examples we will use embeddings from the langchain_ollama package since it is free to use.
Local mode
The Python client allows you to run the same code in local mode without running the Qdrant server. That's great for testing things out, debugging, or storing just a small number of vectors. The embeddings can be kept fully in memory or persisted on disk.
In-memory
For some testing scenarios and quick experiments, you may prefer to keep all the data in memory only, so it gets lost when the client is destroyed - usually at the end of your script/notebook.
from langchain_qdrant import QdrantVectorStore
from langchain_ollama import OllamaEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams
embedding_function = OllamaEmbeddings(model="llama3")
client = QdrantClient(":memory:")
client.create_collection(
    collection_name="demo_collection",
    vectors_config=VectorParams(size=4096, distance=Distance.COSINE),
)
vector_store = QdrantVectorStore(
    client=client,
    collection_name="demo_collection",
    embedding=embedding_function,
)
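The size passed to VectorParams has to match the dimensionality of the vectors your embedding model produces. Rather than hardcoding it, one option is to embed a short probe text and measure the result. The sketch below is only an illustration: detect_vector_size is a hypothetical helper, and FakeEmbeddings is a stand-in that mimics the embed_query method of LangChain embeddings so the example runs without a model.

```python
class FakeEmbeddings:
    """Stand-in exposing the same embed_query method as LangChain embeddings."""

    def embed_query(self, text):
        # Pretend the model produces 8-dimensional vectors.
        return [0.0] * 8


def detect_vector_size(embeddings, probe_text="dimension probe"):
    """Embed a short probe text and return the length of the resulting vector.

    Hypothetical helper, not part of langchain-qdrant.
    """
    return len(embeddings.embed_query(probe_text))


vector_size = detect_vector_size(FakeEmbeddings())
print(vector_size)
```

With a real embeddings object, the returned value could be passed as size when creating the collection.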
On-disk storage
Local mode, without using the Qdrant server, may also store your vectors on disk so they persist between runs.
client = QdrantClient(path="/tmp/local_qdrant")
client.create_collection(
    collection_name="demo_collection",
    vectors_config=VectorParams(size=4096, distance=Distance.COSINE),
)
vector_store = QdrantVectorStore(
    client=client,
    collection_name="demo_collection",
    embedding=embedding_function,
)
On-premise server deployment
No matter if you choose to launch Qdrant locally with a Docker container, or select a Kubernetes deployment with the official Helm chart, the way you're going to connect to such an instance will be identical. You'll need to provide a URL pointing to the service.
# `docs` is a list of Documents and `embeddings` an embeddings instance, both defined elsewhere
url = "<---qdrant url here --->"
qdrant = QdrantVectorStore.from_documents(
    docs,
    embeddings,
    url=url,
    prefer_grpc=True,
    collection_name="my_documents",
)
Qdrant Cloud
If you prefer not to manage the infrastructure yourself, you can set up a fully managed Qdrant cluster on Qdrant Cloud. A free-forever 1GB cluster is included for trying it out. The main difference when using a managed version of Qdrant is that you'll need to provide an API key to secure your deployment from being accessed publicly. The value can also be set in a QDRANT_API_KEY environment variable.
url = "<---qdrant cloud cluster url here --->"
api_key = "<---api key here--->"
qdrant = QdrantVectorStore.from_documents(
    docs,
    embeddings,
    url=url,
    prefer_grpc=True,
    api_key=api_key,
    collection_name="my_documents",
)
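To avoid hardcoding the key, the connection arguments can be assembled from the environment. The following is a minimal sketch under stated assumptions: qdrant_connection_kwargs is a hypothetical helper (not part of langchain-qdrant), and QDRANT_URL is an assumed variable name; only QDRANT_API_KEY is mentioned above.

```python
import os


def qdrant_connection_kwargs(default_url="http://localhost:6333"):
    """Build keyword arguments for connecting to Qdrant (hypothetical helper).

    QDRANT_URL is an assumed variable name chosen for this sketch.
    """
    kwargs = {"url": os.environ.get("QDRANT_URL", default_url)}
    api_key = os.environ.get("QDRANT_API_KEY")
    if api_key:
        # Only pass the key when one is set, so unsecured local servers keep working.
        kwargs["api_key"] = api_key
    return kwargs


print(sorted(qdrant_connection_kwargs()))
```

The resulting dictionary could then be unpacked into the call, for example QdrantVectorStore.from_documents(docs, embeddings, collection_name="my_documents", **qdrant_connection_kwargs()).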
Using an existing collection
To get an instance of langchain_qdrant.QdrantVectorStore without loading any new documents or texts, you can use the QdrantVectorStore.from_existing_collection() method.
qdrant = QdrantVectorStore.from_existing_collection(
    embedding=embeddings,
    collection_name="my_documents",
    url="http://localhost:6333",
)
Manage vector store
Add items to vector store
from langchain_core.documents import Document
document_1 = Document(
    page_content="foo",
    metadata={"source": "https://example.com"},
)
document_2 = Document(
    page_content="bar",
    metadata={"source": "https://example.com"},
)
document_3 = Document(
    page_content="baz",
    metadata={"source": "https://example.com"},
)
documents = [document_1, document_2, document_3]
vector_store.add_documents(documents=documents)
['5dc00d153425475a82b39dd4cfcf17a0',
'880fe06607934d64982bf8a3fb4124ce',
'e00da4acfdcb48f9968e70279b42dd2d']
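The returned values are the IDs of the stored points. If you would rather choose them yourself, for example to delete or overwrite specific documents later, you can generate them up front. This is a minimal sketch, assuming add_documents accepts an ids argument as in the standard LangChain vector store interface; make_ids is a hypothetical helper. Qdrant point IDs must be unsigned integers or UUIDs, so random UUIDs are a natural choice.

```python
from uuid import uuid4


def make_ids(count):
    """Return `count` random UUIDs as 32-character hex strings (hypothetical helper)."""
    return [uuid4().hex for _ in range(count)]


ids = make_ids(3)
print(ids)

# With the vector store and documents from above, the call would look like:
# vector_store.add_documents(documents=documents, ids=ids)
```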
Delete items from vector store
vector_store.delete(ids=["e00da4acfdcb48f9968e70279b42dd2d"])
True
Query vector store
Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent.
Query directly
The simplest scenario for using the Qdrant vector store is to perform a similarity search. Under the hood, the query is encoded into a vector embedding and used to find similar documents in the Qdrant collection.
results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* bar [{'source': 'https://example.com', '_id': '880fe06607934d64982bf8a3fb4124ce', '_collection_name': 'demo_collection'}]
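To make the ranking step more concrete, here is a brute-force sketch of cosine-similarity search, the distance configured for demo_collection. It is an illustration only: the vectors are made up, top_k is a hypothetical helper, and a real Qdrant server uses an approximate index rather than scanning every point.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=1):
    """Rank (text, vector) pairs by cosine similarity to the query, best first."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in stored]
    scored.sort(reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors, made up for illustration only.
stored = [("foo", [1.0, 0.0, 0.0]), ("bar", [0.6, 0.8, 0.0])]
for score, text in top_k([0.5, 0.9, 0.0], stored, k=1):
    print(f"* [SIM={score:.3f}] {text}")
```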
QdrantVectorStore supports 3 modes for similarity searches. They can be configured using the retrieval_mode parameter when setting up the class.
- Dense Vector Search (default)
- Sparse Vector Search
- Hybrid Search
Dense Vector Search
To search with only dense vectors:
- The retrieval_mode parameter should be set to RetrievalMode.DENSE (default).
- A dense embeddings value should be provided to the embedding parameter.
Sparse Vector Search
To search with only sparse vectors:
- The retrieval_mode parameter should be set to RetrievalMode.SPARSE.
- An implementation of the SparseEmbeddings interface using any sparse embeddings provider has to be provided as the value of the sparse_embedding parameter.
The langchain-qdrant package provides a FastEmbed-based implementation out of the box.
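As background, a sparse vector stores only its non-zero entries as index/value pairs, and two sparse vectors are typically compared by a dot product over the indices they share. The sketch below illustrates that idea in plain Python; sparse_dot is a hypothetical helper and the indices and weights are made up, so it does not reflect the output of any particular sparse embeddings provider.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors given as {index: value} dicts."""
    if len(a) > len(b):
        # Iterate over the smaller vector for efficiency.
        a, b = b, a
    return sum(value * b[index] for index, value in a.items() if index in b)


# Made-up token indices and weights, for illustration only.
query = {17: 1.2, 404: 0.7}
doc_a = {17: 0.9, 88: 0.4}  # shares index 17 with the query
doc_b = {5: 1.0, 88: 0.3}   # shares no index with the query

print(sparse_dot(query, doc_a))
print(sparse_dot(query, doc_b))
```

A document that shares no index with the query scores zero, which is why sparse search behaves much like keyword matching.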
Hybrid Vector Search
To perform a hybrid search using dense and sparse vectors with score fusion:
- The retrieval_mode parameter should be set to RetrievalMode.HYBRID.
- A dense embeddings value should be provided to the embedding parameter.
- An implementation of the SparseEmbeddings interface using any sparse embeddings provider has to be provided as the value of the sparse_embedding parameter.
Note that if you've added documents with the HYBRID mode, you can switch to any retrieval mode when searching, since both the dense and sparse vectors are available in the collection.
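To give an intuition for score fusion, the sketch below implements Reciprocal Rank Fusion (RRF), one widely used way to merge a dense and a sparse ranking. It is an illustration under assumptions: the function name, the constant k=60, and the example rankings are chosen for this sketch, and the exact fusion applied by your Qdrant version should be checked in the Qdrant documentation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each document receives 1 / (k + rank) from every list it appears in,
    so documents ranked highly by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_hits = ["a", "b", "c"]   # hypothetical dense ranking
sparse_hits = ["b", "d", "a"]  # hypothetical sparse ranking
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

Documents that appear in both rankings ("a" and "b" here) end up ahead of those found by only one retriever.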
To inspect the points currently stored in the collection, you can scroll through it with the client:
client.scroll(collection_name="demo_collection")
([Record(id='5dc00d153425475a82b39dd4cfcf17a0', payload={'page_content': 'foo', 'metadata': {'source': 'https://example.com'}}, vector=None, shard_key=None, order_value=None),
Record(id='880fe06607934d64982bf8a3fb4124ce', payload={'page_content': 'bar', 'metadata': {'source': 'https://example.com'}}, vector=None, shard_key=None, order_value=None)],
None)
If you want to execute a similarity search and receive the corresponding scores, you can run:
results = vector_store.similarity_search_with_score(query="thud", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.521379] bar [{'source': 'https://example.com', '_id': '880fe06607934d64982bf8a3fb4124ce', '_collection_name': 'demo_collection'}]
For a full list of all the search functions available for a QdrantVectorStore, read the API reference.
Metadata filtering
Qdrant has an extensive filtering system with rich type support. You can also use the filters in LangChain by passing an additional filter parameter to both the similarity_search_with_score and similarity_search methods.
from qdrant_client.http import models
results = vector_store.similarity_search(
    query="thud",
    k=1,
    filter=models.Filter(
        should=[
            models.FieldCondition(
                key="page_content",
                match=models.MatchValue(value="foo"),
            ),
        ]
    ),
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* foo [{'source': 'https://example.com', '_id': '5dc00d153425475a82b39dd4cfcf17a0', '_collection_name': 'demo_collection'}]
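The filter above uses a should clause, which matches when at least one condition holds. The sketch below is a simplified plain-Python model of how must, should, and must_not clauses combine, and of how a dotted key such as metadata.source reaches into the nested payload. It is an assumption-laden illustration: get_by_path and matches are hypothetical helpers that support only exact-value matches, not the full Qdrant filter language.

```python
def get_by_path(payload, dotted_key):
    """Follow a dotted key such as "metadata.source" into nested dicts."""
    value = payload
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(payload, must=(), should=(), must_not=()):
    """Evaluate (key, expected_value) conditions against one payload.

    must: all conditions hold; should: at least one holds (if any are given);
    must_not: none hold.
    """
    def hit(condition):
        key, expected = condition
        return get_by_path(payload, key) == expected

    return (
        all(hit(c) for c in must)
        and (not should or any(hit(c) for c in should))
        and not any(hit(c) for c in must_not)
    )


payload = {"page_content": "foo", "metadata": {"source": "https://example.com"}}
print(matches(payload, should=[("page_content", "foo")]))
print(matches(payload, must=[("metadata.source", "https://example.com")],
              must_not=[("page_content", "foo")]))
```

Because document metadata is stored under the metadata payload key by default, conditions on metadata fields need keys of the form metadata.<field>.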
Query by turning into retriever
You can also transform the vector store into a retriever for easier usage in your chains.
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1},
)
retriever.invoke("thud")
[Document(metadata={'source': 'https://example.com', '_id': '880fe06607934d64982bf8a3fb4124ce', '_collection_name': 'demo_collection'}, page_content='bar')]
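The mmr search type stands for maximal marginal relevance, which picks results that are relevant to the query but not redundant with each other. The following plain-Python sketch shows the general idea; it is not the library's implementation, the vectors are made up, and the function names and the lambda_mult default of 0.5 are assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, candidates, k=2, lambda_mult=0.5):
    """Return indices of k candidates, balancing relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))

    def mmr_score(i):
        relevance = cosine(query_vec, candidates[i])
        # Similarity to the closest already-selected candidate (0 if none yet).
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but different.
candidates = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]
print(mmr([1.0, 0.0], candidates, k=2))
```

In this toy example the second pick is the less relevant but more distinct candidate, because the near-duplicate is penalized for its similarity to the first pick.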
Using retriever in a simple RAG chain:
from langchain_openai import ChatOpenAI
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")
prompt = hub.pull("rlm/rag-prompt")
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
rag_chain.invoke("thud")
"I don't know."
Customizing Qdrant
You can also use an existing Qdrant collection within your LangChain application. In such cases, you may need to define how to map a Qdrant point into the LangChain Document.
Named vectors
Qdrant supports multiple vectors per point through named vectors. If you work with a collection created externally, or want to use a vector with a different name, you can configure it by providing that name.
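from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
# Sparse embeddings from the FastEmbed-based implementation shipped with langchain-qdrant
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")
QdrantVectorStore.from_documents(
    docs,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    location=":memory:",
    collection_name="my_documents_2",
    retrieval_mode=RetrievalMode.HYBRID,
    vector_name="custom_vector",
    sparse_vector_name="custom_sparse_vector",
)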
Metadata
Qdrant stores your vector embeddings along with the optional JSON-like payload. Payloads are optional, but since LangChain assumes the embeddings are generated from the documents, we keep the context data, so you can extract the original texts as well.
By default, your document is going to be stored in the following payload structure:
{
    "page_content": "Lorem ipsum dolor sit amet",
    "metadata": {
        "foo": "bar"
    }
}
You can, however, decide to use different keys for the page content and metadata. That's useful if you already have a collection that you'd like to reuse.
QdrantVectorStore.from_documents(
    docs,
    embeddings,
    location=":memory:",
    collection_name="my_documents_2",
    content_payload_key="my_page_content_key",
    metadata_payload_key="my_meta",
)
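Conceptually, these two keys tell the vector store where to look inside each payload when it rebuilds a document. The sketch below models that mapping in plain Python. It is an illustration only: SimpleDocument is a stand-in for LangChain's Document, and payload_to_document is a hypothetical helper, not the library's internal code.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Stand-in for LangChain's Document, to keep the sketch dependency-free."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def payload_to_document(payload, content_key="page_content", metadata_key="metadata"):
    """Map a Qdrant payload dict to a document using configurable keys."""
    return SimpleDocument(
        page_content=payload.get(content_key, ""),
        metadata=payload.get(metadata_key) or {},
    )


# A payload written with the custom keys used above.
payload = {"my_page_content_key": "Lorem ipsum", "my_meta": {"foo": "bar"}}
doc = payload_to_document(payload, "my_page_content_key", "my_meta")
print(doc)
```

Reading the same payload with the default keys would yield an empty document, which is why the keys have to match the ones used when the collection was populated.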
API reference
For detailed documentation of all QdrantVectorStore features and configurations, head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_qdrant.vectorstores.Qdrant.html