
Faiss

Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.

You can find the FAISS documentation at https://faiss.ai.
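
To make the core idea concrete, here is a minimal sketch of FAISS used directly, without LangChain: index a batch of random vectors with an exact L2 index and retrieve nearest neighbors. The array shapes and the choice of k are arbitrary, for illustration only.

import faiss
import numpy as np

d = 64  # vector dimensionality
xb = np.random.random((1000, d)).astype("float32")  # vectors to index
xq = np.random.random((5, d)).astype("float32")  # query vectors

index = faiss.IndexFlatL2(d)  # exact search using L2 distance
index.add(xb)  # add the database vectors to the index
distances, ids = index.search(xq, 4)  # 4 nearest neighbors per query
print(ids[0], distances[0])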


You'll need to install langchain-community with pip install -qU langchain-community to use this integration.

This notebook shows how to use functionality related to the FAISS vector database, focusing on features specific to this integration. After going through it, you may find it useful to explore the relevant use-case pages to learn how to use this vector store as part of a larger chain.

Setup

The integration lives in the langchain-community package. We also need to install the faiss package itself. We can install these with:

Note that you can also install faiss-gpu (pip install -qU faiss-gpu) if you want to use the GPU-enabled version.

pip install -qU langchain-community faiss-cpu

It's also helpful (but not needed) to set up LangSmith for best-in-class observability:

# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()

Instantiation

We are going to use the langchain_ollama package for embeddings, since it runs models locally and is free to use.

import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings

embedding_function = OllamaEmbeddings(model="llama3")
# Build an exact L2 index sized to the embedding model's dimensionality
index = faiss.IndexFlatL2(len(embedding_function.embed_query("hello world")))

vector_store = FAISS(
    embedding_function=embedding_function,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)
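
If you don't have an Ollama server running locally, any LangChain embeddings integration can be swapped in before building the index. For example, a sketch using OpenAI embeddings (this assumes the langchain-openai package is installed and OPENAI_API_KEY is set):

from langchain_openai import OpenAIEmbeddings

# Any embeddings class works here; the index dimensionality above is
# derived from embed_query, so rebuild the index after swapping models.
embedding_function = OpenAIEmbeddings(model="text-embedding-3-small")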

Manage vector store

Add items to vector store

from langchain_core.documents import Document

document_1 = Document(
    page_content="foo",
    metadata={"source": "https://example.com"},
)

document_2 = Document(
    page_content="bar",
    metadata={"source": "https://another-example.com"},
)

document_3 = Document(
    page_content="baz",
    metadata={"source": "https://example.com"},
)

documents = [document_1, document_2, document_3]

vector_store.add_documents(documents=documents, ids=["1", "2", "3"])
API Reference: Document
['1', '2', '3']
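
If you are starting from plain strings rather than Document objects, add_texts embeds and stores them in one call; a small sketch (metadatas and ids are optional):

# add_texts returns the ids of the newly added entries
vector_store.add_texts(
    texts=["qux"],
    metadatas=[{"source": "https://example.com"}],
    ids=["4"],
)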

Delete items from vector store

vector_store.delete(ids=["3"])
True

Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely wish to query it while your chain or agent is running.

Query directly

Performing a simple similarity search can be done as follows:

results = vector_store.similarity_search(
    query="thud", k=1, filter={"source": "https://example.com"}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* foo [{'source': 'https://example.com'}]
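
In recent langchain-community versions, the filter argument can also be a callable that receives each candidate document's metadata dict and returns a bool, which covers conditions a flat dict can't express. A sketch:

# Filter with a callable over the metadata dict instead of a flat dict
results = vector_store.similarity_search(
    query="thud",
    k=1,
    filter=lambda metadata: metadata["source"].endswith("example.com"),
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")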

If you want to execute a similarity search and receive the corresponding scores, you can run the following. Because the index above uses L2 distance, a lower score means a closer match:

results = vector_store.similarity_search_with_score(
    query="thud", k=1, filter={"source": "https://another-example.com"}
)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=21192.644531] bar [{'source': 'https://another-example.com'}]
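
If you already have a query embedding, you can search with it directly via similarity_search_by_vector and skip the embedding step:

# Reuse a pre-computed embedding for the search
embedding = embedding_function.embed_query("thud")
results = vector_store.similarity_search_by_vector(embedding, k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")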

There are a variety of other ways to search a FAISS vector store. For a complete list of those methods, please refer to the API Reference.

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1},
)
retriever.invoke("thud")
[Document(metadata={'source': 'https://another-example.com'}, page_content='bar')]
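
Other search types are supported as well. For example, here is a sketch of a score-threshold retriever that only returns documents above a relevance cutoff (the name threshold_retriever is just for illustration); relevance scores are normalized to the 0-1 range, so an appropriate cutoff depends on your embedding model:

# Only return documents whose normalized relevance score clears the cutoff
threshold_retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5, "k": 2},
)
threshold_retriever.invoke("thud")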

Using the retriever in a simple RAG chain:

from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI


llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("thud")
"I don't know, can you provide more context or clarify the question?"

Saving and loading

You can also save and load a FAISS index. This is useful so you don't have to recreate it every time you use it. Note that load_local deserializes a pickle file under the hood, so you should only pass allow_dangerous_deserialization=True for files you created yourself and trust.

vector_store.save_local("faiss_index")

new_vector_store = FAISS.load_local(
    "faiss_index", embedding_function, allow_dangerous_deserialization=True
)

docs = new_vector_store.similarity_search("qux")
docs[0]
Document(metadata={'source': 'https://another-example.com'}, page_content='bar')
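
Putting it together, a sketch of reloading the persisted index in a fresh session and querying it through a retriever (assuming the same Ollama model is available):

from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings

# Recreate the embedding function, then load the index saved above
embedding_function = OllamaEmbeddings(model="llama3")
store = FAISS.load_local(
    "faiss_index", embedding_function, allow_dangerous_deserialization=True
)
retriever = store.as_retriever(search_kwargs={"k": 1})
print(retriever.invoke("foo"))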

Merging

You can also merge two FAISS vector stores:

db1 = FAISS.from_texts(["foo"], embedding_function)
db2 = FAISS.from_texts(["bar"], embedding_function)

db1.docstore._dict
{'10323c11-85c4-47fb-abcb-cbe0325bda02': Document(page_content='foo')}
db2.docstore._dict
{'bd993360-1468-454c-8fba-c70455b4e3db': Document(page_content='bar')}
db1.merge_from(db2)
db1.docstore._dict
{'10323c11-85c4-47fb-abcb-cbe0325bda02': Document(page_content='foo'),
'bd993360-1468-454c-8fba-c70455b4e3db': Document(page_content='bar')}

API reference

For detailed documentation of all FAISS vector store features and configurations head to the API reference: https://api.python.langchain.com/en/latest/vectorstores/langchain_community.vectorstores.faiss.FAISS.html

