Retrieval Augmented Generation

What is Retrieval Augmented Generation?

Retrieval Augmented Generation (RAG) is a technique that improves the effectiveness of large language model (LLM) applications by grounding their responses in custom data.

By retrieving relevant documents and supplying them as context to the LLM, RAG has proven effective in support chatbots and Q&A systems that need access to domain-specific knowledge. It addresses a core limitation of LLMs: they have no access to an organization's custom data, yet AI applications must draw on that data to be useful.

RAG is now an industry standard: an organization can deploy any LLM and augment it with a modest amount of its own data to return organization-relevant results, without the cost and time of fine-tuning or pre-training the model. Its benefits include up-to-date and accurate responses, fewer inaccurate responses or hallucinations, domain-specific and relevant answers, and efficiency and cost-effectiveness.

RAG has many use cases, including question-and-answer chatbots, search augmentation, and knowledge engines. When deciding between RAG and fine-tuning, RAG is the right place to start: it is straightforward to implement and may be entirely sufficient for some use cases. Fine-tuning is most appropriate when you need the LLM's behavior to change or need it to learn a different "language," such as a specialized vocabulary or style.