Technical·February 20, 2026

RAG pipelines I'd never ship to prod (and the ones I would)

Three years of vector DB war stories. Chunking strategies, embedding drift, and the lie of cosine similarity.

Sheraz Ahmed

Solutions Architect · Lahore

Three years of vector DB war stories. What I'd ship today, and what I'd burn down on sight.

Cosine similarity is a lie agreed upon. It works, mostly, until your corpus drifts - and then it fails silently, with confidence. That alone disqualifies it from any system where a wrong answer has a cost.

Chunking is policy, not optimisation

Decide what a chunk MEANS to your retrieval target. A legal clause? A code function? A line of dialogue? Pick the semantic unit first, the byte size second. Every team I've seen get RAG wrong picked byte size first.

When I'd still ship RAG

Domains where the corpus updates daily and the cost of a wrong answer is bounded. Customer support, internal docs, recommendation. Domains where the corpus is sacred - medical, legal, financial - demand a different architecture. I'll write that up next.

End of essay

Get the next one by email