VectorStore/QA, MMR support¶
NOTE: this uses Cassandra's "Vector Search" capability. Make sure you are connecting to a vector-enabled database for this demo.
Cassandra's VectorStore
allows for Vector Search with the Maximal Marginal Relevance (MMR) algorithm.
This is a search criterion that, instead of just selecting the k stored documents most relevant to the provided query, first identifies a larger pool of relevant results and then singles out k of them, chosen so that the information they carry is as diverse as possible.
In this way, when the stored text fragments are likely to be redundant, you can optimize token usage and help the models give more comprehensive answers.
This is very useful, for instance, if you are building a QA chatbot on a corpus of past recorded support-chat interactions.
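To make the idea concrete, here is a minimal, self-contained sketch of greedy MMR selection over pre-computed embedding vectors. This is for illustration only: it is not the code LangChain runs internally, and the lambda_mult weighting parameter is named here just for this example:
import numpy as np

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    # Greedy MMR: at each step, pick the candidate that best balances
    # relevance to the query against redundancy with what was already picked.
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cos(query_vec, doc_vecs[i])
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected  # indices of the chosen documents, in selection order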
First prepare a connection to a vector-search-capable Cassandra and initialize the required LLM and embeddings:
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.vectorstores import Cassandra
A database connection is needed. (If on a Colab, the only supported option is the cloud service Astra DB.)
# Ensure loading of database credentials into environment variables:
import os
from dotenv import load_dotenv
load_dotenv("../../../.env")
import cassio
Select your choice of database by editing this cell, if needed:
database_mode = "cassandra" # "cassandra" / "astra_db"
if database_mode == "astra_db":
cassio.init(
database_id=os.environ["ASTRA_DB_ID"],
token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
keyspace=os.environ.get("ASTRA_DB_KEYSPACE"), # this is optional
)
if database_mode == "cassandra":
from cqlsession import getCassandraCQLSession, getCassandraCQLKeyspace
cassio.init(
session=getCassandraCQLSession(),
keyspace=getCassandraCQLKeyspace(),
)
Below is the logic to instantiate the LLM and embeddings of choice. We chose to leave it in the notebooks for clarity.
import os
from llm_choice import suggestLLMProvider
llmProvider = suggestLLMProvider()
# (Alternatively set llmProvider to 'GCP_VertexAI', 'OpenAI', 'Azure_OpenAI' ... manually if you have credentials)
if llmProvider == 'GCP_VertexAI':
from langchain.llms import VertexAI
from langchain.embeddings import VertexAIEmbeddings
llm = VertexAI()
myEmbedding = VertexAIEmbeddings()
print('LLM+embeddings from Vertex AI')
elif llmProvider == 'OpenAI':
os.environ['OPENAI_API_TYPE'] = 'open_ai'
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
llm = OpenAI(temperature=0)
myEmbedding = OpenAIEmbeddings()
print('LLM+embeddings from OpenAI')
elif llmProvider == 'Azure_OpenAI':
os.environ['OPENAI_API_TYPE'] = 'azure'
os.environ['OPENAI_API_VERSION'] = os.environ['AZURE_OPENAI_API_VERSION']
os.environ['OPENAI_API_BASE'] = os.environ['AZURE_OPENAI_API_BASE']
os.environ['OPENAI_API_KEY'] = os.environ['AZURE_OPENAI_API_KEY']
from langchain.llms import AzureOpenAI
from langchain.embeddings import OpenAIEmbeddings
llm = AzureOpenAI(temperature=0, model_name=os.environ['AZURE_OPENAI_LLM_MODEL'],
engine=os.environ['AZURE_OPENAI_LLM_DEPLOYMENT'])
myEmbedding = OpenAIEmbeddings(model=os.environ['AZURE_OPENAI_EMBEDDINGS_MODEL'],
deployment=os.environ['AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT'])
print('LLM+embeddings from Azure OpenAI')
else:
raise ValueError('Unknown LLM provider.')
LLM+embeddings from OpenAI
Create the store¶
Create a (Cassandra-backed) VectorStore
and the corresponding LangChain VectorStoreIndexWrapper
myCassandraVStore = Cassandra(
embedding=myEmbedding,
session=None,  # None means: use the session configured via cassio.init()
keyspace=None,  # None means: use the keyspace configured via cassio.init()
table_name='vs_test2_' + llmProvider,
)
index = VectorStoreIndexWrapper(vectorstore=myCassandraVStore)
This command simply resets the store in case you want to run this demo repeatedly:
myCassandraVStore.clear()
Populate the index¶
Notice that the first four sentences express the same concept, while the fifth adds a new detail:
BASE_SENTENCE_0 = ('The frogs and the toads were meeting in the night '
'for a party under the moon.')
BASE_SENTENCE_1 = ('There was a party under the moon, that all toads, '
'with the frogs, decided to throw that night.')
BASE_SENTENCE_2 = ('And the frogs and the toads said: "Let us have a party '
'tonight, as the moon is shining".')
BASE_SENTENCE_3 = ('I remember that night... toads, along with frogs, '
'were all busy planning a moonlit celebration.')
DIFFERENT_SENTENCE = ('For the party, frogs and toads set a rule: '
'everyone was to wear a purple hat.')
Insert all five into the index, specifying their "sources" while you're at it (these will be useful later):
texts = [
BASE_SENTENCE_0,
BASE_SENTENCE_1,
BASE_SENTENCE_2,
BASE_SENTENCE_3,
DIFFERENT_SENTENCE,
]
metadatas = [
{'source': 'Barney\'s story at the pub'},
{'source': 'Barney\'s story at the pub'},
{'source': 'Barney\'s story at the pub'},
{'source': 'Barney\'s story at the pub'},
{'source': 'The chronicles at the village library'},
]
if llmProvider != 'Azure_OpenAI':
ids = myCassandraVStore.add_texts(
texts,
metadatas=metadatas,
)
print('\n'.join(ids))
else:
# Note: this is a temporary workaround for an Azure OpenAI limitation when
# asking for multiple embeddings in a single request, which would fail with:
#     "InvalidRequestError: Too many inputs. The max number of inputs is 1"
for text, metadata in zip(texts, metadatas):
thisId = myCassandraVStore.add_texts(
[text],
metadatas=[metadata],
)[0]
print(thisId)
46c33fe2a3634ad79856006fc54176d5
12a2f838099642fe8bf365e228fb369c
2ed56f27a33e41fa94748769c8bc05c3
04b17c6b685a4e3c9bd2758ee7d40f9b
b825da62352b4e93867ace8d87b90db8
Query the store¶
Here is the question you'll use to query the index:
QUESTION = 'Tell me about the party that night.'
Query with "similarity" search type¶
If you ask for two matches, you will get the two documents most relevant to the question. In this case, however, that is something of a waste of tokens, since the two results carry essentially the same information:
matchesSim = myCassandraVStore.search(QUESTION, search_type='similarity', k=2)
for i, doc in enumerate(matchesSim):
print(f'[{i:2}]: "{doc.page_content}"')
[ 0]: "There was a party under the moon, that all toads, with the frogs, decided to throw that night." [ 1]: "I remember that night... toads, along with frogs, were all busy planning a moonlit celebration."
Query with MMR¶
Now, here's what happens with the MMR search type.
(Not shown here: you can tune the size of the results pool for the first step of the algorithm.)
matchesMMR = myCassandraVStore.search(QUESTION, search_type='mmr', k=2)
for i, doc in enumerate(matchesMMR):
print(f'[{i:2}]: "{doc.page_content}"')
[ 0]: "There was a party under the moon, that all toads, with the frogs, decided to throw that night." [ 1]: "For the party, frogs and toads set a rule: everyone was to wear a purple hat."
Query the index¶
Currently, LangChain's higher-level "index" abstraction does not let you specify the search type, nor the number of matches subsequently used in creating the answer. So, running this command does give you an answer, but it relies on the default similarity-based retrieval:
# (implicitly) by similarity
print(index.query(QUESTION, llm=llm))
The frogs and toads were having a party under the moon that night. They were busy planning and celebrating together.
You can request the question-answering process to provide references (as long as you annotated all input documents with a source
metadata field):
responseSrc = index.query_with_sources(QUESTION, llm=llm)
print('Automatic chain (implicitly by similarity):')
print(f' ANSWER : {responseSrc["answer"].strip()}')
print(f' SOURCES: {responseSrc["sources"].strip()}')
Automatic chain (implicitly by similarity):
 ANSWER : The frogs and toads were planning a party under the moon that night.
 SOURCES: Barney's story at the pub
Here the default is to fetch four documents ... so that the only other text actually carrying additional information is left out!
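To see this concretely, you can reproduce what the index does behind the scenes, i.e. a plain similarity search for four documents (four is just the usual LangChain default and may differ across versions):
# The four most similar documents all restate the same concept, while
# DIFFERENT_SENTENCE (the purple-hat rule) does not make the cut:
matchesDefault = myCassandraVStore.search(QUESTION, search_type='similarity', k=4)
for i, doc in enumerate(matchesDefault):
    print(f'[{i:2}]: "{doc.page_content}"')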
The QA Process behind the scenes¶
In order to exploit the MMR search in end-to-end question-answering pipelines, you need to recreate and manually tweak the steps behind the query
or query_with_sources
methods. This takes just a few lines.
First you need a few additional modules:
from langchain.chains.retrieval_qa.base import RetrievalQA
from langchain.chains.qa_with_sources.retrieval import RetrievalQAWithSourcesChain
You are ready to run two QA chains, identical in all respects (especially in the number of results to fetch, two), except the search_type:
Similarity-based QA¶
# manual creation of the "retriever" with the 'similarity' search type
retrieverSim = myCassandraVStore.as_retriever(
search_type='similarity',
search_kwargs={
'k': 2,
# ...
},
)
# Create a "RetrievalQA" chain
chainSim = RetrievalQA.from_chain_type(
llm=llm,
retriever=retrieverSim,
)
# Run it and print results
responseSim = chainSim.run(QUESTION)
print(responseSim)
The party was held under the moon and was planned by both toads and frogs.
MMR-based QA¶
# manual creation of the "retriever" with the 'MMR' search type
retrieverMMR = myCassandraVStore.as_retriever(
search_type='mmr',
search_kwargs={
'k': 2,
# ...
},
)
# Create a "RetrievalQA" chain
chainMMR = RetrievalQA.from_chain_type(
llm=llm,
retriever=retrieverMMR
)
# Run it and print results
responseMMR = chainMMR.run(QUESTION)
print(responseMMR)
The party was held under the moon and was attended by both frogs and toads. Everyone was required to wear a purple hat.
Answers with sources¶
You can run the variant of these chains that also returns the sources of the documents used in preparing the answer, which makes the difference between the two search types even more evident:
chainSimSrc = RetrievalQAWithSourcesChain.from_chain_type(
llm,
retriever=retrieverSim,
)
#
responseSimSrc = chainSimSrc({chainSimSrc.question_key: QUESTION})
print('Similarity-based chain:')
print(f' ANSWER : {responseSimSrc["answer"].strip()}')
print(f' SOURCES: {responseSimSrc["sources"].strip()}')
Similarity-based chain:
 ANSWER : The toads and frogs were planning a moonlit celebration.
 SOURCES: Barney's story at the pub
chainMMRSrc = RetrievalQAWithSourcesChain.from_chain_type(
llm,
retriever=retrieverMMR,
)
#
responseMMRSrc = chainMMRSrc({chainMMRSrc.question_key: QUESTION})
print('MMR-based chain:')
print(f' ANSWER : {responseMMRSrc["answer"].strip()}')
print(f' SOURCES: {responseMMRSrc["sources"].strip()}')
MMR-based chain:
 ANSWER : The party that night was thrown by frogs and toads, and everyone was required to wear a purple hat.
 SOURCES: Barney's story at the pub, The chronicles at the village library