While Large Language Models (LLMs) are increasingly capable of general reasoning and everyday tasks, they typically lack the context to address specialized use cases, like answering questions about your proprietary data. Without that context, they often default to saying they don't know, or worse, hallucinate incorrect information. A better solution is to ground your LLM in the data it needs to generate accurate responses, rather than forcing it to guess about topics it has no context on.
While this process sounds simple, when it comes to chat applications, there are many pitfalls in ensuring that you retrieve the right information from your knowledge base. At Arcus, we’ve built and run information retrieval systems at planet scale to discover and incorporate the most relevant context from your data into your LLMs, grounding them with real data to prevent hallucinations. Our information retrieval capabilities are customizable to your data and domain, enabling users to power personalized LLM applications, such as domain-specific, chat-based copilots.
A simple solution to building an LLM chat application that's grounded on your data works as follows:

1. Index your data in a knowledge base that supports retrieval.
2. For each user prompt, retrieve the most relevant context from your knowledge base.
3. Feed the retrieved context, along with the user's prompt, to the LLM to generate a grounded response.
While steps 1 and 3 above are required for building any LLM application grounded on additional data and are challenges in their own right, step 2 can be especially tricky when dealing with chat applications.
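As a rough sketch, the three-step flow might look like the following. All function names and data here are illustrative stand-ins, not a real API; the keyword index is a toy placeholder for a proper retrieval system:

```python
# Illustrative sketch of the three steps (all names and data are hypothetical).

def index_documents(docs):
    """Step 1: index your data; here, a trivial in-memory keyword index."""
    return [(set(doc.lower().split()), doc) for doc in docs]

def retrieve(index, query):
    """Step 2: retrieve the indexed document most relevant to the query."""
    query_terms = set(query.lower().split())
    return max(index, key=lambda entry: len(query_terms & entry[0]))[1]

def build_prompt(query, context):
    """Step 3: combine retrieved context with the user's prompt for the LLM."""
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Arcus builds a data platform for LLMs.",
    "Acme Corp manufactures anvils.",
]
index = index_documents(docs)
context = retrieve(index, "What does Arcus build?")
prompt = build_prompt("What does Arcus build?", context)
# `prompt` would then be sent to the LLM to generate a grounded answer.
```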
Typically, information retrieval systems built for LLM applications take in a snippet of text as a query and retrieve indexed data that’s highly similar to the text provided. These information retrieval algorithms usually rely on vector semantic search, keyword-based searches, or more intelligent approaches. However, in the context of chat applications, deciding what text to use for your query is surprisingly difficult.
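To make the vector-search idea concrete, here is a minimal sketch of similarity-based retrieval. Real systems use learned embedding models; a bag-of-words vector stands in for an embedding here, and the corpus is invented for illustration:

```python
# Toy vector search: embed texts as vectors and rank by cosine similarity.
# A bag-of-words Counter stands in for a real learned embedding model.
import math
from collections import Counter

def embed(text):
    """Hypothetical embedding: term counts as a sparse vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

corpus = [
    "Arcus is a startup building a data platform for LLMs.",
    "The weather in New York is sunny today.",
]
vectors = [(embed(doc), doc) for doc in corpus]

query = embed("Which startup builds a platform for LLMs?")
best = max(vectors, key=lambda v: cosine(query, v[0]))[1]
# The first document is the closest match to the query.
```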
Consider the scenario where we build a chatbot to answer questions about various tech companies, dynamically retrieving context from a database of company summaries to enhance the knowledge of our LLM. Below is an example interaction that a user might have with our chatbot:

User: Tell me about Arcus.

Chatbot: Arcus is a seed-stage startup based in New York City that focuses on building a data platform for LLMs.

User: When was it founded?
When the user asks their final prompt, our goal is to retrieve the company summary relating to Arcus and use it to answer the user's question. However, the final prompt doesn't mention Arcus directly, and the "it" in the prompt is ambiguous on its own, since it refers back to earlier messages in the chat. This means simply attempting to retrieve data based on the user's final prompt won't give us the data we need to answer their question. Here are some simple strategies for how we can generally use chat history for retrieval, and why they might not work as expected:

- Use only the final prompt as the query. This fails whenever the prompt depends on earlier context, as in the example above: nothing in "When was it founded?" points to Arcus.
- Use the entire chat history as the query. Long conversations dilute the query with irrelevant or off-topic text, so retrieval surfaces noisy or conflicting results.
- Use a fixed window of recent messages. The right window size varies from conversation to conversation, so any fixed cutoff will sometimes drop the needed context or include distracting text.
Since simple solutions to forming the data retrieval query from chat applications often fall short for a wide variety of user prompts, we need to find a more robust solution to this problem to build performant, production-worthy chat-based copilots over our data.
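The failure mode is easy to demonstrate. In this hypothetical sketch, a crude keyword-overlap scorer stands in for a real retrieval system, and the company summaries are invented: the final prompt alone shares no terms with the relevant Arcus summary, while a different summary spuriously matches on "founded":

```python
# Hypothetical demo: retrieving with only the user's final prompt fails
# when the prompt leans on earlier chat context ("it" = Arcus).
import re

def tokens(text):
    """Lowercase alphanumeric tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query, doc):
    """Crude keyword-overlap relevance score (stand-in for real retrieval)."""
    return len(tokens(query) & tokens(doc))

summaries = {
    "Arcus": "Arcus is a seed-stage startup in New York building a data platform for LLMs.",
    "Acme": "Acme was founded in 1947 and manufactures anvils.",
}

final_prompt = "When was it founded?"
scores = {name: overlap_score(final_prompt, doc) for name, doc in summaries.items()}
# The Arcus summary scores zero, because nothing in "When was it founded?"
# mentions Arcus, while the unrelated Acme summary matches on "was founded".
```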
At Arcus, we’ve architected a solution that decouples the chat history of the application from the query used for the retrieval system to solve the challenges above. Our solution relies on intelligent and automatic query transformations, which transform the user’s chat history into queries that get to the heart of the user’s prompt and can be used as single units of retrieval against our data. Using these transformed queries results in better retrieval performance and more accurate LLM responses.
By using query transformations to decouple the user's chat history from the specific queries used to retrieve information, we can ensure that the retrieval process focuses on the most pertinent data, minimizing the risk of irrelevant or conflicting information being retrieved. This approach is similar in spirit to our approach for indexing data, which decouples the raw data we intend to retrieve from the information we use to index the data.
In the context of chat applications, determining the right query for information retrieval is a nuanced task. A straightforward approach to using query transformations for chat applications is to ask an LLM to rewrite the chat history into a single query, which can then be used to retrieve relevant information from our index. For example, we can ask ChatGPT the following question and use its response as the query for our retrieval system:
“Incorporate the above context to rephrase the user's final prompt into a single self-contained question.”
For the example we gave previously, the generated query is a re-stated question that incorporates all the necessary information to retrieve the right context from our index:
“When was Arcus Inc., the seed-stage startup based in New York City that focuses on building a data platform for LLMs, founded?”
This is now a self-contained prompt that can be used for the retrieval system and ensures that we retrieve the right context for our LLM to provide the correct response to the user’s prompt.
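One way this rewrite step might be implemented is sketched below, with the chat history packaged alongside the rephrasing instruction from the post. The helper name and message layout are assumptions, and the actual chat-completion call is stubbed out since it would hit a hosted model:

```python
# Sketch: turn the chat history into a self-contained retrieval query by
# asking an LLM to rewrite it. The model call itself is stubbed; in practice
# `messages` would be sent to a chat-completion API.

REWRITE_INSTRUCTION = (
    "Incorporate the above context to rephrase the user's final prompt "
    "into a single self-contained question."
)

def build_rewrite_messages(chat_history):
    """Package the chat transcript plus the rewrite instruction for the LLM."""
    transcript = "\n".join(f"{role}: {text}" for role, text in chat_history)
    return [{"role": "user", "content": f"{transcript}\n\n{REWRITE_INSTRUCTION}"}]

chat_history = [
    ("user", "Tell me about Arcus."),
    ("assistant", "Arcus is a seed-stage startup based in New York City "
                  "building a data platform for LLMs."),
    ("user", "When was it founded?"),
]

messages = build_rewrite_messages(chat_history)
# The model's response to `messages` becomes the retrieval query,
# e.g. a self-contained question that names Arcus explicitly.
```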
Simply using an out-of-the-box LLM for query transformations is often not a production-ready solution due to poor reliability and performance. Here are some factors to consider when deciding how to use query transformations for chat applications:

- Latency and cost: every transformation adds an extra LLM call before retrieval even begins, increasing end-to-end response time and per-query cost.
- Reliability: a general-purpose LLM can rewrite the prompt incorrectly, drop the detail that matters most for retrieval, or hallucinate details that never appeared in the chat history.
- When to transform: not every prompt needs rewriting, and applying transformations indiscriminately wastes compute and can degrade queries that were already well-formed.
At Arcus, we’ve built a Query Transformation Engine (QTE) specifically designed to address the key requirements and tradeoffs of using query transformations for chat applications. Our engine is built around the following core steps to solve the main challenges above:
Building a chat application that provides valuable insights using your data presents many unique challenges. At Arcus, we’ve designed an approach that gets to the heart of users’ questions and retrieves the most relevant data to answer them. As we continue to improve and iterate on the core challenges of chat-based copilots, we’re pushing the frontier of what’s possible for using your data intelligently to build LLM applications. Request a demo to see how Arcus can help you ground LLMs on your data to build domain-specific copilots and AI applications!
Arcus is also hiring! We’re actively working on grounding LLM applications on complex data, building advanced indexing and retrieval algorithms to answer complex user queries, scaling systems that process heterogeneous data, and understanding how LLMs perform in the context of your data. Check out our careers page or reach out to us at recruiting@arcus.co!