Skip to main content

PGVector

An implementation of LangChain vectorstore abstraction using postgres as the backend and utilizing the pgvector extension.

The code lives in an integration package called: langchain_postgres.

Status

This code has been ported over from langchain_community into a dedicated package called langchain-postgres. The following changes have been made:

  • langchain_postgres works only with psycopg3. Please update your connnecion strings from postgresql+psycopg2://... to postgresql+psycopg://langchain:langchain@... (yes, it's the driver name is psycopg not psycopg3, but it'll use psycopg3.
  • The schema of the embedding store and collection have been changed to make add_documents work correctly with user specified ids.
  • One has to pass an explicit connection object now.

Currently, there is no mechanism that supports easy data migration on schema changes. So any schema changes in the vectorstore will require the user to recreate the tables and re-add the documents. If this is a concern, please use a different vectorstore. If not, this implementation should be fine for your use case.

Setup

First donwload the partner package:

pip install -qU langchain_postgres

[notice] A new release of pip is available: 24.0 -> 24.1.2
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.

You can run the following command to spin up a a postgres container with the pgvector extension:

docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16

Credentials

There are no credentials needed to run this notebook, just make sure you downloaded the langchain_postgres package and correctly started the postgres container.

Instantiation

We will use embeddings from the langchain_ollama package since it is free to use.

from langchain_core.documents import Document
from langchain_postgres import PGVector
from langchain_postgres.vectorstores import PGVector
from langchain_ollama import OllamaEmbeddings

# See docker command above to launch a postgres instance with pgvector enabled.
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain" # Uses psycopg3!
collection_name = "my_docs"
embedding_function = OllamaEmbeddings(model="llama3")

vector_store = PGVector(
embeddings=embedding_function,
collection_name=collection_name,
connection=connection,
use_jsonb=True,
)
API Reference:Document | OllamaEmbeddings

Manage vector store

Add items to vector store

Note that adding documents by ID will over-write any existing documents that match that ID.

docs = [
Document(
page_content="there are cats in the pond",
metadata={"id": 1, "location": "pond", "topic": "animals"},
),
Document(
page_content="ducks are also found in the pond",
metadata={"id": 2, "location": "pond", "topic": "animals"},
),
Document(
page_content="fresh apples are available at the market",
metadata={"id": 3, "location": "market", "topic": "food"},
),
Document(
page_content="the market also sells fresh oranges",
metadata={"id": 4, "location": "market", "topic": "food"},
),
Document(
page_content="the new art exhibit is fascinating",
metadata={"id": 5, "location": "museum", "topic": "art"},
),
Document(
page_content="a sculpture exhibit is also at the museum",
metadata={"id": 6, "location": "museum", "topic": "art"},
),
Document(
page_content="a new coffee shop opened on Main Street",
metadata={"id": 7, "location": "Main Street", "topic": "food"},
),
Document(
page_content="the book club meets at the library",
metadata={"id": 8, "location": "library", "topic": "reading"},
),
Document(
page_content="the library hosts a weekly story time for kids",
metadata={"id": 9, "location": "library", "topic": "reading"},
),
Document(
page_content="a cooking class for beginners is offered at the community center",
metadata={"id": 10, "location": "community center", "topic": "classes"},
),
]

vector_store.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Delete items from vector store

vector_store.delete(ids=["3"])

Query vector store

Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent.

Filtering Support

The vectorstore supports a set of filters that can be applied against the metadata fields of the documents.

OperatorMeaning/Category
\$eqEquality (==)
\$neInequality (!=)
\$ltLess than (<)
\$lteLess than or equal (<=)
\$gtGreater than (>)
\$gteGreater than or equal (>=)
\$inSpecial Cased (in)
\$ninSpecial Cased (not in)
\$betweenSpecial Cased (between)
\$likeText (like)
\$ilikeText (case-insensitive like)
\$andLogical (and)
\$orLogical (or)

Query directly

Performing a simple similarity search can be done as follows:

results = vector_store.similarity_search("kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}})
for doc in results:
print(f"* {doc.page_content} [{doc.metadata}]")
* there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* the library hosts a weekly story time for kids [{'id': 9, 'topic': 'reading', 'location': 'library'}]
* ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* the new art exhibit is fascinating [{'id': 5, 'topic': 'art', 'location': 'museum'}]

If you provide a dict with multiple fields, but no operators, the top level will be interpreted as a logical AND filter

vector_store.similarity_search(
"ducks",
k=10,
filter={"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}},
)
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]
vector_store.similarity_search(
"ducks",
k=10,
filter={
"$and": [
{"id": {"$in": [1, 5, 2, 9]}},
{"location": {"$in": ["pond", "market"]}},
]
},
)
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]

If you want to execute a similarity search and receive the corresponding scores you can run:

results = vector_store.similarity_search_with_score(query="thud",k=1)
for doc, score in results:
print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.763449] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]

For a full list of the different searches you can execute on a PGVector vector store, please refer to the API reference.

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

retriever = vector_store.as_retriever(
search_type="mmr",
search_kwargs={"k": 1}
)
retriever.invoke("thud")
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond')]

Using retriever in a simple RAG chain:

from langchain_openai import ChatOpenAI
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)

rag_chain.invoke("thud")
"I don't know."

API reference

For detailed documentation of all ModuleNameVectorStore features and configurations head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_postgres.vectorstores.PGVector.html


Was this page helpful?


You can also leave detailed feedback on GitHub.