RAG Chunking and Overlap Explained (In Plain English)

Chunking and overlap are two key concepts in vector search and retrieval-augmented generation (RAG) systems. They affect retrieval quality, meaning how relevant the returned results are, so they are worth understanding.

It’s helpful to start with a book analogy. Say you have an introductory microeconomics textbook. It’s large and heavy: maybe 1,000 pages divided into 30 chapters.

Each chapter contains a separate block of knowledge, which is good for organisation. Related topics are grouped together, which makes the material easier to navigate and understand. But students also need to understand how the last chapter connects to the current one. So the authors start each chapter with a few pages that highlight the key points from the previous chapter and link them to what comes next. Linking ideas this way helps ensure that useful information is not lost between chapters.

Chunking is a bit like adding chapters. Before data is stored, it is divided into smaller sections called chunks, and each chunk gets its own vector in the database. Chunks solve the problem of trying to compress too many ideas into a single vector. If chunks are too big, each vector blends several topics, so results are likely to be less specific or to miss the meaning of the query. If chunks are too small, they can cut apart ideas that belong together in the original article, which also leads to poor results. Imagine, for instance, a table of data that gets cut in the middle.
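
To make the idea concrete, here is a minimal sketch of fixed-size chunking in Python. It is an illustration, not how any particular database does it: the function name chunk_words is made up for this example, and it counts whitespace-separated words as a rough stand-in for tokens (real systems usually count tokens with a model-specific tokenizer).

```python
def chunk_words(text: str, chunk_size: int = 8) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words.

    Words are a simplifying assumption here; production pipelines
    typically measure chunk size in tokens, not words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    # Step through the word list one full chunk at a time (no overlap).
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


text = (
    "Demand falls when price rises. Supply rises when price rises. "
    "Equilibrium is where the two curves meet."
)
for number, chunk in enumerate(chunk_words(text, chunk_size=8), start=1):
    print(number, chunk)
```

Notice that the cut points fall wherever the word count says they should, not where a sentence or idea ends. That is exactly how a small chunk size can separate ideas that belong together.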

Overlap helps address the problem of ideas being cut at chunk boundaries. Just as each chapter opens with a recap of the previous one, overlapping chunks repeat a small portion of text at the boundary, so no idea gets lost between sections. When chunks are created, a small portion of text (for example, 50 tokens) from the end of one chunk is repeated at the start of the next. This preserves important context near the boundaries in more than one chunk, improving the chances that relevant information is retrieved.
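
Here is a small Python sketch of how overlap can work, assuming the text has already been split into tokens. The function name chunk_with_overlap and its parameters are hypothetical, chosen for this example, and the tiny sizes (4 tokens per chunk, 1 token of overlap) are only there to keep the output readable.

```python
def chunk_with_overlap(
    tokens: list[str], chunk_size: int, overlap: int
) -> list[list[str]]:
    """Split tokens into chunks where each chunk repeats the last
    `overlap` tokens of the chunk before it."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk reaches the end, so the tail is not repeated
        # again as a tiny extra chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Words stand in for tokens here (an assumption made for readability).
tokens = "price rises so demand falls and supply then rises too".split()
for chunk in chunk_with_overlap(tokens, chunk_size=4, overlap=1):
    print(" ".join(chunk))
# price rises so demand
# demand falls and supply
# supply then rises too
```

Each chunk starts with the last word of the one before it ("demand", then "supply"). With realistic settings, such as a few hundred tokens per chunk and 50 tokens of overlap, the same mechanism carries a sentence or two across each boundary.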