Couchbase

Couchbase is an award-winning distributed NoSQL cloud database that delivers unmatched versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications. Couchbase embraces AI with coding assistance for developers and vector search for their applications.

Vector Search is a part of the Full Text Search Service (Search Service) in Couchbase.

This tutorial explains how to use Vector Search in Couchbase. You can work with both Couchbase Capella and your self-managed Couchbase Server.

Setup

To access the CouchbaseVectorStore you first need to install the langchain-couchbase partner package:

%pip install --upgrade --quiet langchain-couchbase

Credentials

Head over to cloud.couchbase.com and create a new connection, making sure to save your database username and password:

import getpass

COUCHBASE_CONNECTION_STRING = getpass.getpass("Enter the connection string for the Couchbase cluster: ")
DB_USERNAME = getpass.getpass("Enter the username for the Couchbase cluster: ")
DB_PASSWORD = getpass.getpass("Enter the password for the Couchbase cluster: ")
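
If you run this tutorial outside an interactive notebook, prompting with getpass may not be practical. The sketch below shows one possible pattern: read a setting from an environment variable first and fall back to a prompt only when it is missing. The helper read_setting and the variable name COUCHBASE_DEMO_USERNAME are hypothetical names used for illustration; they are not part of langchain-couchbase.

```python
import getpass
import os


def read_setting(env_name: str, prompt: str) -> str:
    """Return the environment variable if it is set, otherwise prompt for it."""
    value = os.environ.get(env_name)
    if value:
        return value
    return getpass.getpass(prompt)


# Hypothetical variable name, set here only so the example runs unattended.
os.environ["COUCHBASE_DEMO_USERNAME"] = "demo-user"

demo_username = read_setting("COUCHBASE_DEMO_USERNAME", "Enter the username: ")
print(demo_username)
```

The same helper could be used for the connection string and the password, so that secrets never need to be typed into the notebook or committed to source control.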

It's also helpful (but not required) to set up LangSmith for best-in-class observability:

# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()

Instantiation

Before instantiating we need to create a connection.

Create Couchbase Connection Object

We create a connection to the Couchbase cluster initially and then pass the cluster object to the Vector Store.

Here, we are connecting using the username and password. You can also connect to your cluster using any other supported authentication method.

For more information on connecting to the Couchbase cluster, please check the Python SDK documentation.

from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

auth = PasswordAuthenticator(DB_USERNAME, DB_PASSWORD)
options = ClusterOptions(auth)
cluster = Cluster(COUCHBASE_CONNECTION_STRING, options)

# Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=5))

We will now set the bucket, scope, and collection names in the Couchbase cluster that we want to use for Vector Search.

For this example, we are using the default scope and collection.

BUCKET_NAME = "example-bucket"
SCOPE_NAME = "_default"
COLLECTION_NAME = "_default"
SEARCH_INDEX_NAME = "vector-index"

For details on how to create a Search index with support for Vector fields, please refer to the documentation.

Simple Instantiation

We create the vector store object with the cluster information and the search index name. We will use the embeddings from langchain_ollama for this example since they are free to use.

from langchain_couchbase.vectorstores import CouchbaseVectorStore
from langchain_ollama import OllamaEmbeddings

embedding_function = OllamaEmbeddings(model="llama3")

vector_store = CouchbaseVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embedding_function,
    index_name=SEARCH_INDEX_NAME,
)

Specify the Text & Embeddings Field

You can optionally specify the text & embeddings field for the document using the text_key and embedding_key fields.

vector_store = CouchbaseVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embedding_function,
    index_name=SEARCH_INDEX_NAME,
    text_key="text",
    embedding_key="embedding",
)

Manage vector store

Add items to vector store

from langchain_core.documents import Document


document_1 = Document(
    page_content="foo",
)

documents = [document_1]

vector_store.add_documents(documents=documents)
['156c920b12de42f8aef22f2887e8d6b5']

Delete items from vector store

vector_store.delete(ids=["156c920b12de42f8aef22f2887e8d6b5"])
True

Query vector store

Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. In this example, our vector store contains the documents from the file ../../how_to/state_of_the_union.txt.

Query directly

Performing a simple similarity search can be done as follows:

results = vector_store.similarity_search(query="name", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* We are cutting off Russia’s largest banks from the international financial system.  

Preventing Russia’s central bank from defending the Russian Ruble making Putin’s $630 Billion “war fund” worthless.

We are choking off Russia’s access to technology that will sap its economic strength and weaken its military for years to come.

Tonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. [{'source': '../../how_to/state_of_the_union.txt'}]

Query with Score

You can also fetch the scores for the results by calling the similarity_search_with_score method.

query = "What did president say about Ketanji Brown Jackson"
results = vector_store.similarity_search_with_score(query)
document, score = results[0]
print(document)
print(f"Score: {score}")
page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.' metadata={'source': '../../how_to/state_of_the_union.txt'}
Score: 7864.61279296875

Specifying Fields to Return

You can specify the fields to return from the document using the fields parameter in the searches. These fields are returned as part of the metadata object in the returned Document. You can fetch any field that is stored in the Search index. The text_key of the document is returned as part of the document's page_content.

If you do not specify any fields to be fetched, all the fields stored in the index are returned.

If you want to fetch one of the fields in the metadata, you need to specify it using dot notation (.).

For example, to fetch the source field in the metadata, you need to specify metadata.source.

query = "What did president say about Ketanji Brown Jackson"
results = vector_store.similarity_search(query, fields=["metadata.source"])
print(results[0])
page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.' metadata={'source': '../../how_to/state_of_the_union.txt'}
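
As a small convenience, the dot-notation names described above can be generated instead of typed by hand. The following is a minimal sketch; metadata_fields is a hypothetical helper, not part of langchain-couchbase, and it only builds the list of strings that would be passed as the fields argument.

```python
from typing import List


def metadata_fields(*names: str, prefix: str = "metadata") -> List[str]:
    """Build dot-notation field names such as 'metadata.source'."""
    return [f"{prefix}.{name}" for name in names]


fields = metadata_fields("source", "author")
print(fields)  # ['metadata.source', 'metadata.author']
```

The resulting list could then be passed as fields=fields to similarity_search, assuming those fields are stored in the Search index.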

Hybrid Queries

Couchbase allows you to do hybrid searches by combining Vector Search results with searches on non-vector fields of the document like the metadata object.

The results will be based on the combination of the results from both Vector Search and the searches supported by Search Service. The scores of each of the component searches are added up to get the total score of the result.
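
To illustrate the additive scoring described above, here is a toy model in plain Python. It is not Couchbase's implementation: the document IDs and score values are made up, and treating a document that is missing from one component as contributing a score of 0 is an assumption made for this sketch.

```python
from typing import Dict


def combine_scores(
    vector_scores: Dict[str, float], query_scores: Dict[str, float]
) -> Dict[str, float]:
    """Add the per-document scores of two component searches."""
    doc_ids = set(vector_scores) | set(query_scores)
    return {
        doc_id: vector_scores.get(doc_id, 0.0) + query_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Made-up scores for three hypothetical documents.
vector_scores = {"doc-a": 0.75, "doc-b": 0.5}
query_scores = {"doc-b": 0.5, "doc-c": 0.25}

combined = combine_scores(vector_scores, query_scores)
ranked = sorted(combined, key=combined.get, reverse=True)
print(ranked)  # ['doc-b', 'doc-a', 'doc-c']
```

In this toy model, doc-b ranks first because it scores in both component searches, even though doc-a has the higher vector score.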

To perform hybrid searches, there is an optional parameter, search_options that can be passed to all the similarity searches.
The different search/query possibilities for the search_options can be found here.

To simulate hybrid search, let us add some synthetic metadata to the existing documents. We add three fields to the metadata of every document, cycling through the values in order: date (January 1 of a year from 2010 to 2019), rating (1 to 5), and author (either John Doe or Jane Doe).

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Adding metadata to documents
for i, doc in enumerate(docs):
    doc.metadata["date"] = f"{range(2010, 2020)[i % 10]}-01-01"
    doc.metadata["rating"] = range(1, 6)[i % 5]
    doc.metadata["author"] = ["John Doe", "Jane Doe"][i % 2]

vector_store.add_documents(docs)

query = "What did the president say about Ketanji Brown Jackson"
results = vector_store.similarity_search(query)
print(results[0].metadata)
{'source': '../../how_to/state_of_the_union.txt'}
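
To see which values the loop above assigns, the same index arithmetic can be run on its own, without a cluster. The function name synthetic_metadata is hypothetical; the expressions are copied from the loop.

```python
def synthetic_metadata(i: int) -> dict:
    """Return the metadata the loop assigns to the document at position i."""
    return {
        "date": f"{range(2010, 2020)[i % 10]}-01-01",
        "rating": range(1, 6)[i % 5],
        "author": ["John Doe", "Jane Doe"][i % 2],
    }


for i in (0, 1, 11):
    print(i, synthetic_metadata(i))
# 0 {'date': '2010-01-01', 'rating': 1, 'author': 'John Doe'}
# 1 {'date': '2011-01-01', 'rating': 2, 'author': 'Jane Doe'}
# 11 {'date': '2011-01-01', 'rating': 2, 'author': 'Jane Doe'}
```

Because the values are assigned by position, they are deterministic rather than random: the dates repeat every 10 documents, the ratings every 5, and the authors every 2.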

Query by Exact Value

We can search for exact matches on a textual field like the author in the metadata object.

query = "What did the president say about Ketanji Brown Jackson"
results = vector_store.similarity_search(
    query,
    search_options={"query": {"field": "metadata.author", "match": "John Doe"}},
    fields=["metadata.author"],
)
print(results[0])
page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.' metadata={'author': 'John Doe'}

Query by Partial Match

We can search for partial matches by specifying a fuzziness for the search. This is useful when you want to search for slight variations or misspellings of a search query.

Here, "Jae" is close (fuzziness of 1) to "Jane".

query = "What did the president say about Ketanji Brown Jackson"
results = vector_store.similarity_search(
    query,
    search_options={
        "query": {"field": "metadata.author", "match": "Jae", "fuzziness": 1}
    },
    fields=["metadata.author"],
)
print(results[0])
page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.' metadata={'author': 'John Doe'}
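
To make the notion of fuzziness concrete, the sketch below computes the Levenshtein edit distance between two strings in plain Python. It assumes that fuzziness corresponds to this edit distance; the Search Service also analyzes (for example, lowercases and tokenizes) text before matching, which this sketch does not model.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("Jae", "Jane"))  # 1
print(edit_distance("Jae", "John"))  # 3
```

Under that assumption, "Jae" is within a fuzziness of 1 of "Jane" (one inserted letter) but not of "John".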

Query by Date Range Query

We can search for documents that are within a date range query on a date field like metadata.date.

query = "Any mention about independence?"
results = vector_store.similarity_search(
    query,
    search_options={
        "query": {
            "start": "2016-12-31",
            "end": "2017-01-02",
            "inclusive_start": True,
            "inclusive_end": False,
            "field": "metadata.date",
        }
    },
)
print(results[0])
page_content='I know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19. 

We will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard.

Here are four common sense steps as we move forward safely.' metadata={'source': '../../how_to/state_of_the_union.txt'}
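
If you build date range queries in several places, a small helper keeps the dictionary shape consistent. This is a minimal sketch: date_range_options is a hypothetical helper, and it only assembles the same keys used in the example above without contacting the cluster.

```python
from typing import Any, Dict


def date_range_options(
    field: str,
    start: str,
    end: str,
    inclusive_start: bool = True,
    inclusive_end: bool = False,
) -> Dict[str, Any]:
    """Assemble a search_options dict for a date range query on one field."""
    return {
        "query": {
            "start": start,
            "end": end,
            "inclusive_start": inclusive_start,
            "inclusive_end": inclusive_end,
            "field": field,
        }
    }


search_options = date_range_options("metadata.date", "2016-12-31", "2017-01-02")
print(search_options)
```

The result could be passed as search_options=search_options to any of the similarity search methods.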

Query by Numeric Range Query

We can search for documents that are within a range for a numeric field like metadata.rating.

query = "Any mention about independence?"
results = vector_store.similarity_search_with_score(
    query,
    search_options={
        "query": {
            "min": 3,
            "max": 5,
            "inclusive_min": True,
            "inclusive_max": True,
            "field": "metadata.rating",
        }
    },
)
print(results[0])
(Document(metadata={'author': 'Jane Doe', 'date': '2011-01-01', 'rating': 2, 'source': '../../how_to/state_of_the_union.txt'}, page_content='I know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19. \n\nWe will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard. \n\nHere are four common sense steps as we move forward safely.'), 11459.0087890625)
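
The inclusive_min and inclusive_max flags control whether the boundary values themselves count as part of the range. The sketch below mimics that boundary behavior locally in plain Python; in_numeric_range is a hypothetical helper for illustration only and does not call Couchbase.

```python
def in_numeric_range(
    value: float,
    minimum: float,
    maximum: float,
    inclusive_min: bool,
    inclusive_max: bool,
) -> bool:
    """Check a value against a range with configurable boundary inclusion."""
    above = value >= minimum if inclusive_min else value > minimum
    below = value <= maximum if inclusive_max else value < maximum
    return above and below


ratings = [1, 2, 3, 4, 5]
closed = [r for r in ratings if in_numeric_range(r, 3, 5, True, True)]
half_open = [r for r in ratings if in_numeric_range(r, 3, 5, True, False)]
print(closed)  # [3, 4, 5]
print(half_open)  # [3, 4]
```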

Combining Multiple Search Queries

Different search queries can be combined using AND (conjuncts) or OR (disjuncts) operators.

In this example, we are checking for documents with a rating between 3 and 4 and a date between 2016-12-31 and 2017-01-02.

query = "Any mention about independence?"
results = vector_store.similarity_search_with_score(
    query,
    search_options={
        "query": {
            "conjuncts": [
                {"min": 3, "max": 4, "inclusive_max": True, "field": "metadata.rating"},
                {"start": "2016-12-31", "end": "2017-01-02", "field": "metadata.date"},
            ]
        }
    },
)
print(results[0])
(Document(metadata={'source': '../../how_to/state_of_the_union.txt'}, page_content='I know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19. \n\nWe will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard. \n\nHere are four common sense steps as we move forward safely.'), 11459.0087890625)
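
The conjuncts and disjuncts keys can be wrapped in two small helpers so that compound queries read more like boolean expressions. This is a sketch under the assumption that both keys take a list of query clauses, as in the example above; all_of and any_of are hypothetical names.

```python
from typing import Any, Dict


def all_of(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    """AND: every clause is combined under 'conjuncts'."""
    return {"conjuncts": list(clauses)}


def any_of(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    """OR: every clause is combined under 'disjuncts'."""
    return {"disjuncts": list(clauses)}


rating_clause = {"min": 3, "max": 4, "inclusive_max": True, "field": "metadata.rating"}
date_clause = {"start": "2016-12-31", "end": "2017-01-02", "field": "metadata.date"}

and_options = {"query": all_of(rating_clause, date_clause)}
or_options = {"query": any_of(rating_clause, date_clause)}
print(and_options)
print(or_options)
```

Here and_options reproduces the search_options used in the example above, while or_options is the OR variant of the same two clauses.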

Other Queries

Similarly, you can use any of the supported Query methods like Geo Distance, Polygon Search, Wildcard, Regular Expressions, etc in the search_options parameter. Please refer to the documentation for more details on the available query methods and their syntax.
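
As a rough illustration, wildcard and regular expression queries could be expressed in the same dictionary form as the earlier examples. The key names wildcard and regexp, and the patterns shown, are assumptions based on the Search Service query syntax; verify them against the Couchbase documentation before use. Matching also depends on how the field is analyzed in your index, which is why the patterns here are lowercase.

```python
# Assumed query shapes; check the Couchbase Search documentation before use.
wildcard_options = {
    "query": {"field": "metadata.author", "wildcard": "jo*"},
}
regexp_options = {
    "query": {"field": "metadata.author", "regexp": "jo.+"},
}

for name, options in (("wildcard", wildcard_options), ("regexp", regexp_options)):
    print(name, options)
```

Either dictionary would be passed as the search_options argument, in the same way as the range queries above.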

Frequently Asked Questions

Question: Should I create the Search index before creating the CouchbaseVectorStore object?

Yes, currently you need to create the Search index before creating the CouchbaseVectorStore object.

Question: I am not seeing all the fields that I specified in my search results.

In Couchbase, we can only return the fields stored in the Search index. Please ensure that the field that you are trying to access in the search results is part of the Search index.

One way to handle this is to index and store a document's fields dynamically in the index.

  • In Capella, you need to go to "Advanced Mode" then under the chevron "General Settings" you can check "[X] Store Dynamic Fields" or "[X] Index Dynamic Fields"
  • In Couchbase Server, in the Index Editor (not Quick Editor) under the chevron "Advanced" you can check "[X] Store Dynamic Fields" or "[X] Index Dynamic Fields"

Note that these options will increase the size of the index.

For more details on dynamic mappings, please refer to the documentation.

Question: I am unable to see the metadata object in my search results.

This is most likely due to the metadata field in the document not being indexed and/or stored by the Couchbase Search index. In order to index the metadata field in the document, you need to add it to the index as a child mapping.

If you choose to map all the fields in the mapping, you will be able to search by all metadata fields. Alternatively, to optimize the index, you can select the specific fields inside the metadata object to be indexed. You can refer to the docs to learn more about indexing child mappings.

Creating Child Mappings

API reference

For detailed documentation of all CouchbaseVectorStore features and configurations head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_couchbase.vectorstores.CouchbaseVectorStore.html

