RAG pipelines I'd never ship to prod (and the ones I would)
Three years of vector DB war stories. Chunking strategies, embedding drift, and the lie of cosine similarity.
Three years of vector DB war stories. What I'd ship today, and what I'd burn down on sight.
\nCosine similarity is a lie agreed upon. It works, mostly, until your corpus drifts - and then it fails silently, with confidence. That alone disqualifies it from any system where a wrong answer has a cost.
\nChunking is policy, not optimisation
\nDecide what a chunk MEANS to your retrieval target. A legal clause? A code function? A line of dialogue? Pick the semantic unit first, the byte size second. Every team I've seen get RAG wrong picked byte size first.
\nWhen I'd still ship RAG
\nDomains where the corpus updates daily and the cost of a wrong answer is bounded. Customer support, internal docs, recommendation. Domains where the corpus is sacred - medical, legal, financial - demand a different architecture. I'll write that up next.
