PGVector
An implementation of LangChain vectorstore abstraction using
postgres
as the backend and utilizing thepgvector
extension.
The code lives in an integration package called: langchain_postgres.
Status
This code has been ported over from langchain_community
into a dedicated package called langchain-postgres
. The following changes have been made:
- langchain_postgres works only with psycopg3. Please update your connnecion strings from
postgresql+psycopg2://...
topostgresql+psycopg://langchain:langchain@...
(yes, it's the driver name ispsycopg
notpsycopg3
, but it'll usepsycopg3
. - The schema of the embedding store and collection have been changed to make add_documents work correctly with user specified ids.
- One has to pass an explicit connection object now.
Currently, there is no mechanism that supports easy data migration on schema changes. So any schema changes in the vectorstore will require the user to recreate the tables and re-add the documents. If this is a concern, please use a different vectorstore. If not, this implementation should be fine for your use case.
Setup
First donwload the partner package:
pip install -qU langchain_postgres
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.
You can run the following command to spin up a a postgres container with the pgvector
extension:
docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16
Credentials
There are no credentials needed to run this notebook, just make sure you downloaded the langchain_postgres
package and correctly started the postgres container.
Instantiation
We will use embeddings from the langchain_ollama
package since it is free to use.
from langchain_core.documents import Document
from langchain_postgres import PGVector
from langchain_postgres.vectorstores import PGVector
from langchain_ollama import OllamaEmbeddings
# See docker command above to launch a postgres instance with pgvector enabled.
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain" # Uses psycopg3!
collection_name = "my_docs"
embedding_function = OllamaEmbeddings(model="llama3")
vector_store = PGVector(
embeddings=embedding_function,
collection_name=collection_name,
connection=connection,
use_jsonb=True,
)
Manage vector store
Add items to vector store
Note that adding documents by ID will over-write any existing documents that match that ID.
docs = [
Document(
page_content="there are cats in the pond",
metadata={"id": 1, "location": "pond", "topic": "animals"},
),
Document(
page_content="ducks are also found in the pond",
metadata={"id": 2, "location": "pond", "topic": "animals"},
),
Document(
page_content="fresh apples are available at the market",
metadata={"id": 3, "location": "market", "topic": "food"},
),
Document(
page_content="the market also sells fresh oranges",
metadata={"id": 4, "location": "market", "topic": "food"},
),
Document(
page_content="the new art exhibit is fascinating",
metadata={"id": 5, "location": "museum", "topic": "art"},
),
Document(
page_content="a sculpture exhibit is also at the museum",
metadata={"id": 6, "location": "museum", "topic": "art"},
),
Document(
page_content="a new coffee shop opened on Main Street",
metadata={"id": 7, "location": "Main Street", "topic": "food"},
),
Document(
page_content="the book club meets at the library",
metadata={"id": 8, "location": "library", "topic": "reading"},
),
Document(
page_content="the library hosts a weekly story time for kids",
metadata={"id": 9, "location": "library", "topic": "reading"},
),
Document(
page_content="a cooking class for beginners is offered at the community center",
metadata={"id": 10, "location": "community center", "topic": "classes"},
),
]
vector_store.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Delete items from vector store
vector_store.delete(ids=["3"])
Query vector store
Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent.
Filtering Support
The vectorstore supports a set of filters that can be applied against the metadata fields of the documents.
Operator | Meaning/Category |
---|---|
\$eq | Equality (==) |
\$ne | Inequality (!=) |
\$lt | Less than (<) |
\$lte | Less than or equal (<=) |
\$gt | Greater than (>) |
\$gte | Greater than or equal (>=) |
\$in | Special Cased (in) |
\$nin | Special Cased (not in) |
\$between | Special Cased (between) |
\$like | Text (like) |
\$ilike | Text (case-insensitive like) |
\$and | Logical (and) |
\$or | Logical (or) |
Query directly
Performing a simple similarity search can be done as follows:
results = vector_store.similarity_search("kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}})
for doc in results:
print(f"* {doc.page_content} [{doc.metadata}]")
* there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* the library hosts a weekly story time for kids [{'id': 9, 'topic': 'reading', 'location': 'library'}]
* ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* the new art exhibit is fascinating [{'id': 5, 'topic': 'art', 'location': 'museum'}]
If you provide a dict with multiple fields, but no operators, the top level will be interpreted as a logical AND filter
vector_store.similarity_search(
"ducks",
k=10,
filter={"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}},
)
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]
vector_store.similarity_search(
"ducks",
k=10,
filter={
"$and": [
{"id": {"$in": [1, 5, 2, 9]}},
{"location": {"$in": ["pond", "market"]}},
]
},
)
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]
If you want to execute a similarity search and receive the corresponding scores you can run:
results = vector_store.similarity_search_with_score(query="thud",k=1)
for doc, score in results:
print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.763449] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
For a full list of the different searches you can execute on a PGVector
vector store, please refer to the API reference.
Query by turning into retriever
You can also transform the vector store into a retriever for easier usage in your chains.
retriever = vector_store.as_retriever(
search_type="mmr",
search_kwargs={"k": 1}
)
retriever.invoke("thud")
[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond')]
Using retriever in a simple RAG chain:
from langchain_openai import ChatOpenAI
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")
prompt = hub.pull("rlm/rag-prompt")
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
rag_chain.invoke("thud")
"I don't know."
API reference
For detailed documentation of all ModuleNameVectorStore features and configurations head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_postgres.vectorstores.PGVector.html