7 Search and retrieval-augmented generation


This chapter covers

  • Semantic embeddings
  • Semantic search
  • Integrating language models with custom knowledge
  • Retrieval-augmented generation
  • Advanced retrieval-augmented generation optimization

In most companies, years of accumulated expertise—strategic insights, collaborative learnings, and industry know-how—are scattered across wikis, knowledge bases, and internal documents. When a critical need arises, people struggle to find the relevant information. With retrieval-augmented generation (RAG), you can directly integrate this wealth of knowledge into your language model (LM) application. RAG lets you dynamically retrieve relevant knowledge and weave it into LM-generated responses, making interactions more relevant and context aware.

Alex experiences the need for custom data integration firsthand. He has spent a lot of time tweaking the prompts in his app, but users still feel the outputs are disconnected from their domain of knowledge: often, the LM's responses are generic, outdated, and undifferentiated. RAG allows him to combine LM capabilities with the specific, up-to-date information his clients need, making the AI-generated content relevant and reliable.
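The retrieve-then-generate loop can be sketched in a few lines. This is a minimal illustration, not a production recipe: the toy bag-of-words "embedding", the stopword list, and the sample documents are all stand-ins for the semantic embeddings and real knowledge base covered later in this chapter.

```python
import math
from collections import Counter

# Small stopword list for the toy matcher; a real system would use a
# semantic embedding model instead of word overlap.
STOPWORDS = {"the", "is", "a", "an", "of", "on", "by", "what"}

def embed(text):
    # Toy bag-of-words "embedding" standing in for a semantic embedding.
    tokens = (t.strip(".,?!").lower() for t in text.split())
    return Counter(t for t in tokens if t and t not in STOPWORDS)

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=2):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    # Weave the retrieved documents into the prompt sent to the LM.
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The quarterly sales report is stored on the internal wiki.",
    "Support tickets are triaged every morning by the on-call team.",
]
print(build_prompt("What is the refund policy?", docs))
```

The key design point is the division of labor: retrieval narrows the knowledge base down to the few passages relevant to the question, and the LM only ever sees those passages inside its prompt, which is how RAG keeps responses grounded in up-to-date custom data.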

7.1 Specializing your language model with custom data

7.1.1 How prompt engineering falls short over time

7.1.2 Summarizing the interview

7.2 Retrieving relevant documents with semantic search

7.2.1 The role of search in the B2B context

7.2.2 Searching with semantic embeddings

7.2.3 Evaluating search

7.2.4 Optimizing your search system

7.3 Building an end-to-end RAG system

7.3.1 A basic RAG setup

7.3.2 Evaluating your RAG system

7.3.3 Optimizing your RAG system

Summary