How we made a smarter inquiry bot using LangChain, ChatGPT, and VectorDB

I am Isoda, a new graduate working as a front-end developer.

I have modified the system of an internal inquiry Slack bot, originally created by a senior colleague, to make it smarter and more human-like in its responses. This was done by using a tool called LangChain to connect it with a vector database and ChatGPT. In this post, I would like to briefly introduce this system.

What is LangChain

Introduction | LangChain

LangChain is a framework for developing applications that utilize language models such as ChatGPT. It can be used in various ways, such as chatbots, inquiry AI that answers questions based on knowledge from databases and documents, and document summarization.
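
As a minimal, hedged illustration of what working with LangChain looks like (the model name and prompt here are just examples, not part of the bot described below):

from langchain_openai import ChatOpenAI

# Minimal sketch: call a chat model through LangChain's wrapper class.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
print(llm.invoke("Explain in one sentence what LangChain is.").content)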

This time, we will use LangChain to create an inquiry Slack bot for internal company policies.

System Structure

The structure of the created system is as follows.

First, the question from Slack is received by LangChain. This question is then sent as a query from LangChain to the vector database. The vector database extracts and returns documents related to the question.

Next, LangChain instructs ChatGPT to refer to the document's content and answer the question. ChatGPT then generates a response. The returned response is sent directly back to Slack.

The LangChain application runs on Google Cloud Functions, and the vector database files are stored in Google Cloud Storage.

From here, the explanation will cover how to create the vector database and how LangChain, the vector database, and ChatGPT interact.

Constructing the Vector Database

A vector database stores various types of data as vectors and performs similarity searches on queries to return relevant data.

In this case, a vector database was created using information published on an internal company website. The website data was obtained through web crawling and saved as text files. (This article does not cover the web crawling process.)

Below is the Python code for creating the vector database.

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import CharacterTextSplitter
import dotenv

# Load OPENAI_API_KEY from the .env file
dotenv.load_dotenv()

# Read the crawled website data (only .txt files)
loader = DirectoryLoader("directory containing the website .txt files", glob="**/*.txt")
documents = loader.load()

# Split the documents into shorter segments
all_splits = CharacterTextSplitter().split_documents(documents)

# Embed the segments with OpenAI's embedding model and build the FAISS index
db = FAISS.from_documents(all_splits, OpenAIEmbeddings())
db.save_local("./faiss_index")

First, the text data of a locally saved website is read using DirectoryLoader and load(). The glob argument specifies that only files with the .txt extension should be read.

Next, the text data to be stored in the vector database is split into segments of a certain length. The reason for this step is to ensure that the documents returned when querying the vector database are short and closely related to the query. Documents such as websites often contain multiple topics on a single page. By splitting them into shorter segments, each document is focused on a single topic, preventing unrelated topics from being included in the query results.

An instance of CharacterTextSplitter() is used for splitting, and the previously loaded website data, documents, is passed to the split_documents() function to perform the split.
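
For illustration, the splitter can also be configured explicitly; the bot above uses the defaults, so the settings below are just an example of how chunk length and overlap could be tuned:

from langchain.text_splitter import CharacterTextSplitter

# Hypothetical settings for illustration; the code above uses the defaults.
splitter = CharacterTextSplitter(
    separator="\n\n",   # split on blank lines first
    chunk_size=500,     # target length of each segment, in characters
    chunk_overlap=50,   # overlap between segments so topics are not cut off abruptly
)
all_splits = splitter.split_documents(documents)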

Finally, the vector database is created. In this case, FAISS is used as the vector store, and OpenAI's embedding model is used to convert the text data into vectors. The database is generated with FAISS.from_documents(all_splits, OpenAIEmbeddings()), and the result is saved locally to the faiss_index directory with db.save_local("./faiss_index"). Inside the faiss_index directory, two files are generated: index.faiss and index.pkl.
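
To sanity-check the saved index, it can be reloaded and queried locally, continuing from the script above; this is only a quick verification sketch, and the test question is made up:

# Reload the index that was just saved and run a test query (illustrative only).
# Depending on the LangChain version, load_local may also require
# allow_dangerous_deserialization=True.
db = FAISS.load_local("./faiss_index", OpenAIEmbeddings())
for doc in db.similarity_search("How do I apply for paid leave?", k=2):
    print(doc.page_content[:100])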

Building the Query System

The following code is a Python implementation of the query system. Modules from Google Cloud and LangChain are imported. The details will be explained in the following sections.

import os
import functions_framework
from google.cloud import storage
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_core.messages import HumanMessage

@functions_framework.http
def ai_question_vectordb_api(request):
    try:
        question = request.get_data(as_text=True)

        # Download the vector database from GCS
        storage_client = storage.Client()
        bucket = storage_client.bucket("test_vecdb_for_langchain")
        blob = bucket.blob("index.faiss")
        blob.download_to_filename("/tmp/index.faiss")
        blob = bucket.blob("index.pkl")
        blob.download_to_filename("/tmp/index.pkl")

        # Query the vector database
        vectorstore = FAISS.load_local("/tmp/", embeddings=OpenAIEmbeddings())
        docs = vectorstore.similarity_search(question, k=3)
        input_documents = str()
        for d in docs:
            input_documents += "input_documents: {}\n".format(d.page_content.replace('\n', ''))

        # Generate an answer for the query
        llm = ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0, request_timeout=60)
        query = input_documents
        query += "question: {} ".format(question).replace('\n', ' ')
        query += "Please answer the question."
        answer = llm([HumanMessage(content=query)])
        return answer.content
    except Exception as e:
        print(e)
        return "Sorry... It looks like something went wrong..."

Question Reception

@functions_framework.http
def ai_question_vectordb_api(request):
    question = request.get_data(as_text=True)

The first line, @functions_framework.http, is the decorator that exposes the function as an HTTP endpoint on Google Cloud Functions. The question sent from Slack arrives in request. Since the request body is assumed to be a plain string, it is read as text and assigned to the question variable.
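
For local testing, the function can be served with the functions-framework CLI and called with a plain POST whose body is the question text. This is just a sketch under that assumption; the port and question are examples:

import requests

# Assumes `functions-framework --target=ai_question_vectordb_api` is running locally on port 8080.
response = requests.post("http://localhost:8080/", data="What is the paid leave policy?".encode("utf-8"))
print(response.text)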

Query to Vector Database

# Download the vector database from GCS
storage_client = storage.Client()
bucket = storage_client.bucket("test_vecdb_for_langchain")
blob = bucket.blob("index.faiss")
blob.download_to_filename("/tmp/index.faiss")
blob = bucket.blob("index.pkl")
blob.download_to_filename("/tmp/index.pkl")

# Query the vector database
vectorstore = FAISS.load_local("/tmp/", embeddings=OpenAIEmbeddings())
docs = vectorstore.similarity_search(question, k=3)
input_documents = str()
for d in docs:
    input_documents += "input_documents: {}\n".format(d.page_content.replace('\n', ''))

First, the FAISS index files are downloaded from Google Cloud Storage and saved locally. Here they are stored in the /tmp directory, which Google Cloud Functions provides as temporary file storage. There are two files to download: one with the .faiss extension and one with the .pkl extension.
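
The article does not show how the index files were uploaded to the bucket in the first place; as a hedged sketch, one way to do it with the google-cloud-storage client (reusing the bucket name from the code above) would be:

from google.cloud import storage

# Upload the locally built FAISS index files to the bucket (sketch only).
client = storage.Client()
bucket = client.bucket("test_vecdb_for_langchain")
bucket.blob("index.faiss").upload_from_filename("./faiss_index/index.faiss")
bucket.blob("index.pkl").upload_from_filename("./faiss_index/index.pkl")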

Next, load the locally saved vector database into memory. This is done using FAISS.load_local("directory where FAISS is located", embeddings=OpenAIEmbeddings()). Here, the same embedding model used when creating the vector database should be specified.

To query the loaded vector database, use vectorstore.similarity_search(question text). This query returns multiple documents that have a high similarity to the question text. In this case, k=3 is specified to return three documents. The retrieved documents are then formatted and stored in input_documents.
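
If you want to see how relevant each hit is, the FAISS vector store also offers similarity_search_with_score; this variation is not used in the bot above and is shown only for reference:

# Inspect similarity scores; for FAISS the score is an L2 distance, so lower means more similar.
results = vectorstore.similarity_search_with_score(question, k=3)
for doc, score in results:
    print("score={:.3f}  {}".format(score, doc.page_content[:80]))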

Query to ChatGPT

llm = ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0, request_timeout=60)
query = input_documents
query += "question: {} ".format(question).replace('\n', ' ')
query += "Please answer the question."
answer = llm([HumanMessage(content=query)])
return answer.content

First, an instance of ChatOpenAI is created and assigned to the variable llm. This instance uses OpenAI's gpt-3.5-turbo-16k model. The temperature parameter controls how random the responses are: higher values make ChatGPT's output more varied, while 0 makes it essentially deterministic. Here it is set to 0 so that the same question always receives the same response, which makes testing easier.

Next, the variable query is assigned the text that will be sent to ChatGPT. Here, it consists of the text returned by the vector database and the question sent from Slack, formatted appropriately.

Then, the actual conversation with ChatGPT takes place using answer = llm([HumanMessage(content=query)]). ChatGPT's response is stored in answer.content, which is then directly returned to Slack.
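
One possible refinement, not used in the bot above, is to separate the instructions from the question by putting them in a SystemMessage; this sketch reuses the llm, input_documents, and question variables from the code above:

from langchain_core.messages import HumanMessage, SystemMessage

# Illustrative variation: keep the instructions in a system message.
messages = [
    SystemMessage(content="Answer the question using only the provided documents. "
                          "If the documents do not contain the answer, say so."),
    HumanMessage(content=input_documents + "question: " + question),
]
answer = llm(messages)
print(answer.content)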

Summary

By using LangChain, I was able to easily build an AI application that combines a vector database and ChatGPT. Although I did not use it this time, LangChain's "Chain" feature apparently makes this kind of processing even more concise and easier to debug, so I plan to study it next.
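
As a rough, unverified sketch of what that could look like, LangChain's RetrievalQA chain bundles the retrieval and answering steps into one object (the question below is just an example, and vectorstore is the loaded FAISS index from the code above):

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# Sketch only: wire the FAISS retriever and the chat model into a single chain.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)
print(qa.invoke({"query": "What is the paid leave policy?"})["result"])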

This article was originally published in Japanese on Sqripts and has been translated with minor edits for clarity.
