Hi there! Olga Tatarinova here, co-founder of imaga AI. Gone are the days when calling customer support meant endless waiting on hold with music instead of dial tones. Artificial intelligence never gets tired, never takes breaks, and, thankfully, never plays annoying tunes.
In this article, I'll walk you through how we developed an AI assistant for one of our clients. The assistant is designed to respond instantly to frequently asked questions using the company's knowledge base. It goes beyond keyword search: it retrieves relevant documents even when they contain no exact match for the words in the question.
We'll cover the following topics:
- How to organize your product or company’s knowledge base
- How to teach a virtual assistant to extract information from the knowledge base and form responses to user questions
- How to manage costs: use an LLM without running up hefty bills with OpenAI or other providers, especially if you have a large customer base
The tools we'll use
Technologies we'll be using for the intelligent automated customer support bot:
- Chatwoot: An open-source operator interface and knowledge base.
- Rasa: An open-source framework for creating chatbots.
- Botfront: A virtual assistant interface for building chatbots on Rasa.
- Qdrant: A vector database storing vector representations of articles from the knowledge base.
- Datapipe: An ETL tool that extracts articles from Chatwoot, processes them, and loads them into Qdrant.
Recipe
1. Content: preparing your knowledge base
We love incorporating Chatwoot into our projects. Typically, we use it as the operator interface when the chatbot hands a conversation over to a human. Beyond the operator interface, however, Chatwoot also offers a convenient knowledge base feature.
We've added a feature to Chatwoot’s knowledge base: for each FAQ article, we store several real-life examples of questions users ask when they are looking for the answer given in that article.
It's best to keep each article short and focused on one topic.
2. Programming: converting all knowledge base articles into vector form
We'll search for articles that address a user's query based on the semantic similarity between texts. To do this, we'll first convert the texts into vector form and then compute the distance between the vectors. The smaller the distance, the closer the two texts are in meaning.
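To make the "smaller distance means closer content" idea concrete, here is a minimal sketch using cosine distance. Cosine distance is one common choice and an assumption on our part here, and the three-dimensional vectors are hand-made purely for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hand-made toy vectors (illustrative only, not real embeddings).
query = [0.9, 0.1, 0.0]
articles = {
    "refund": [0.8, 0.2, 0.1],
    "shipping": [0.1, 0.2, 0.9],
}

# The article whose vector is nearest to the query vector wins.
closest = min(articles, key=lambda name: cosine_distance(query, articles[name]))
print(closest)
```

In this toy setup the query vector points in nearly the same direction as the "refund" vector, so that article is selected.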
We use the Qdrant vector database to store article vector representations. Qdrant is optimized for vector operations, enabling fast retrieval of similar vectors.
To convert the text from an article into a vector and store it in Qdrant, we need to tackle two tasks:
- Documents should be segmented so that each vector corresponds to a single logical theme. This is crucial because the more text you encode into one vector, the more averaged and fuzzy that vector becomes, and the harder it is to pick out any specific theme in it. Initial segmentation of documents into parts is therefore essential, and there is no one-size-fits-all solution. Typically, segmentation starts with structural heuristics (chapters or paragraphs), is then refined with Next Sentence Prediction (NSP) models, and is finally verified by a human. This step wasn't necessary for our FAQs, since we only had short answers. However, to enrich the search space, we generated human-like questions for each answer, along with a synthetic "answer representation." These are converted into vectors and added as examples for the target article.
- We need to choose an effective method for vector generation. We've employed the encoder from OpenAI or the multilingual-e5 model. Both are effective because they were trained on parallel corpora in multiple languages.
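The two tasks above can be sketched as follows: every article yields one vector for its own text plus one vector per example question, and each vector keeps a pointer back to the target article. This is only a sketch under our own assumptions. The `Article` and `IndexRecord` classes and the `build_records` function are hypothetical names, and `embed` stands in for whichever encoder you choose (the placeholder below is not a real embedding model).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Article:
    id: int
    title: str
    text: str
    example_questions: Sequence[str]


@dataclass
class IndexRecord:
    article_id: int      # the article this vector points back to
    source: str          # "article" or "example_question"
    text: str
    vector: List[float]


def build_records(article: Article,
                  embed: Callable[[str], List[float]]) -> List[IndexRecord]:
    """Create one record for the article text and one per example question."""
    records = [IndexRecord(article.id, "article", article.text, embed(article.text))]
    for question in article.example_questions:
        records.append(
            IndexRecord(article.id, "example_question", question, embed(question))
        )
    return records


# Placeholder encoder (assumption): swap in a real embedding model here.
def placeholder_embed(text: str) -> List[float]:
    return [float(len(text)), float(text.count(" "))]


sample = Article(
    id=7,
    title="Refunds",
    text="Refunds are processed by the support team.",
    example_questions=["How do I get my money back?", "Can I return my order?"],
)
records = build_records(sample, placeholder_embed)
print(len(records))  # one article vector plus two question vectors
```

Because every record carries the article ID, a match on an example question can be resolved back to the article it belongs to at search time.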
3. Programming: Configuring the FAQ Service
The FAQ service implements a simple API. This API receives a user query, converts it into a vector, and performs a vector search in Qdrant. It then returns the closest matches, along with the titles and texts of the corresponding articles.
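The flow of such a service might look like the sketch below. To keep it self-contained, an in-memory list stands in for Qdrant and a keyword-counting `toy_embed` stands in for the real encoder. Both are assumptions made for illustration, as are the function names and the sample articles.

```python
import math
from typing import Callable, Dict, List

KEYWORDS = ("refund", "delivery", "password")


def toy_embed(text: str) -> List[float]:
    """Stand-in encoder (assumption): counts a few keywords."""
    lowered = text.lower()
    return [float(lowered.count(word)) for word in KEYWORDS]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def faq_search(query: str, index: List[Dict],
               embed: Callable[[str], List[float]], top_k: int = 3) -> List[Dict]:
    """Embed the query, score every stored vector, return the best articles."""
    query_vector = embed(query)
    best: Dict[int, Dict] = {}
    for record in index:
        score = cosine_similarity(query_vector, record["vector"])
        current = best.get(record["article_id"])
        # Several vectors may point to one article: keep its best score only.
        if current is None or score > current["score"]:
            best[record["article_id"]] = {
                "article_id": record["article_id"],
                "title": record["title"],
                "text": record["text"],
                "score": score,
            }
    return sorted(best.values(), key=lambda hit: hit["score"], reverse=True)[:top_k]


# Made-up sample data: two vectors for article 1, one for article 2.
refund_text = "You can request a refund from your account page."
delivery_text = "Delivery is handled by our courier partners."
index = [
    {"article_id": 1, "title": "Refund policy", "text": refund_text,
     "vector": toy_embed(refund_text)},
    {"article_id": 1, "title": "Refund policy", "text": refund_text,
     "vector": toy_embed("How do I get a refund?")},
    {"article_id": 2, "title": "Delivery times", "text": delivery_text,
     "vector": toy_embed(delivery_text)},
]

hits = faq_search("Where is my delivery?", index, toy_embed)
print([hit["title"] for hit in hits])
```

With a vector database in place, the scoring loop would be replaced by a nearest-neighbor query, while the shape of the response (titles, texts, scores) could stay the same.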
4. Programming: Configuring the Chatbot Assistant
We need a chatbot to receive questions from users and deliver responses.
To create the basic chatbot, we used Rasa, an open-source framework, and Botfront, a visual interface.
When a user messages the chatbot, Rasa attempts to determine the intent of the user's query. If the user intends to ask an FAQ question, Rasa redirects the query to the FAQ service.
The FAQ service then returns a list of related articles.
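The routing decision can be illustrated with a small, framework-agnostic sketch. It is not Rasa's own API: in a real Rasa project this logic would typically live in rules and a custom action. The input dict is shaped like an NLU parse result (text plus an intent with a name and a confidence), and the intent name `ask_faq`, the confidence threshold, and the `faq_search` callable are all hypothetical.

```python
from typing import Callable, Dict, List

FAQ_INTENT = "ask_faq"   # hypothetical intent name
MIN_CONFIDENCE = 0.6     # hypothetical threshold, tune for your data


def route_message(parsed: Dict, faq_search: Callable[[str], List[Dict]]) -> Dict:
    """Send FAQ-like messages to the FAQ service, everything else to the dialogue."""
    intent = parsed.get("intent", {})
    is_faq = (
        intent.get("name") == FAQ_INTENT
        and intent.get("confidence", 0.0) >= MIN_CONFIDENCE
    )
    if is_faq:
        return {"route": "faq", "articles": faq_search(parsed["text"])}
    return {"route": "dialogue", "articles": []}


# Usage with a stubbed FAQ service.
def stub_faq_search(text: str) -> List[Dict]:
    return [{"title": "Delivery times"}]


result = route_message(
    {"text": "Where is my order?", "intent": {"name": "ask_faq", "confidence": 0.92}},
    stub_faq_search,
)
print(result["route"])
```

Messages with another intent, or with an FAQ intent recognized at low confidence, stay in the regular dialogue flow in this sketch.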
5. Optional: Free-form Responses with LLM (Using RAG, Retrieval-augmented Generation)
Once we've extracted the most relevant articles from the knowledge base, we can ask the LLM to read them and generate an accurate response to the user's query.
However, this approach has a significant drawback: top-tier LLMs like GPT-4 are expensive, and if you have a large volume of support requests, using LLMs can cost you an arm and a leg.
Our client faced precisely this scenario, so we disabled response generation, leaving only article-based responses from the knowledge base. This approach avoids using expensive LLMs for every query.
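One way to keep generation optional is to make the LLM call a pluggable step, as in the sketch below. When no generator is supplied, the bot replies with the top article itself and makes no LLM call. The function names, the prompt wording, and the `generate` callable are assumptions for illustration. `generate` would wrap whichever LLM client you use.

```python
from typing import Callable, Dict, List, Optional


def build_prompt(question: str, articles: List[Dict]) -> str:
    """Pack the retrieved articles and the question into a single prompt."""
    context = "\n\n".join(
        f"[{number}] {article['title']}\n{article['text']}"
        for number, article in enumerate(articles, start=1)
    )
    return (
        "Answer the customer's question using only the articles below. "
        "If they do not contain the answer, say so.\n\n"
        f"Articles:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def respond(question: str, articles: List[Dict],
            generate: Optional[Callable[[str], str]] = None) -> str:
    if not articles:
        return "Sorry, I could not find a matching article."
    if generate is None:
        # Generation disabled: answer with the best article, no LLM cost.
        top = articles[0]
        return f"{top['title']}\n\n{top['text']}"
    return generate(build_prompt(question, articles))


found = [{"title": "Delivery times", "text": "Delivery is handled by our courier partners."}]
print(respond("Where is my order?", found))
```

Passing a `generate` function switches the same code path to free-form answers, so the costly mode can be enabled per client or per request.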
6. Programming: keeping the data up to date
We have regular tasks that we must perform to keep all data up to date.
We need to update the vector representations of articles in Qdrant if new articles appear or old ones change. For this, we use the Datapipe ETL framework, which automatically tracks updates, deletions, and additions to content. We run the ETL process every 15 minutes. If any content in the knowledge base changes, Datapipe captures the changes and recalculates the vectors in Qdrant. This way, new information becomes available to the chatbot within 15 minutes of being added to the system's knowledge base.
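The idea behind incremental updates can be sketched with content hashes: on each run, compare the hash of every article with the hash stored at the previous run, and recompute vectors only for what changed. This is not Datapipe's API, which handles the tracking for you. The function names and data shapes below are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(article: Dict) -> str:
    """Fingerprint of the fields that affect the article's vectors."""
    payload = article["title"] + "\n" + article["text"]
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_sync(articles: Dict[int, Dict],
              indexed_hashes: Dict[int, str]) -> Tuple[List[int], List[int], Dict[int, str]]:
    """Return (ids to re-embed, ids to delete, new hash state)."""
    current = {article_id: content_hash(a) for article_id, a in articles.items()}
    to_upsert = sorted(
        article_id for article_id, digest in current.items()
        if indexed_hashes.get(article_id) != digest   # new or changed
    )
    to_delete = sorted(
        article_id for article_id in indexed_hashes
        if article_id not in current                  # removed from the knowledge base
    )
    return to_upsert, to_delete, current


# First run: everything is new.
version_1 = {
    1: {"title": "Refunds", "text": "How refunds work."},
    2: {"title": "Delivery", "text": "How delivery works."},
    3: {"title": "Passwords", "text": "How to reset a password."},
}
_, _, state = plan_sync(version_1, {})

# Next run: article 2 edited, article 3 removed, article 4 added.
version_2 = {
    1: {"title": "Refunds", "text": "How refunds work."},
    2: {"title": "Delivery", "text": "How delivery works now."},
    4: {"title": "Invoices", "text": "Where to find invoices."},
}
to_upsert, to_delete, state = plan_sync(version_2, state)
print(to_upsert, to_delete)
```

Run on a schedule, a plan like this keeps the recomputation cost proportional to the number of changed articles rather than to the size of the whole knowledge base.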
Project Conclusions
As a result of the project, we now have a Chatwoot fork that supports the operation of an AI assistant based on the Chatwoot knowledge base "out of the box" without additional development.
If you're using Chatwoot for your projects, especially without chatbot automation, it may be worth switching to our Chatwoot fork to enable AI assistant solutions.
Statistics
As a result of the implementation, the share of support requests handled by the chatbot increased from 30% to 70%. The content team continues to add articles so that the chatbot can handle an increasing number of requests.
This concludes our article on developing an AI assistant. If you're interested in exploring our experience with implementing component-based development, check out our article to learn about the method's pros and cons.