A production RAG system serving a legal research platform has a recall@5 of 34%, meaning that only 34% of relevant documents appear in the top 5 retrieved results. The system uses:
- OpenAI text-embedding-3-small for both indexing and queries
- Pinecone with cosine similarity
- Top-5 retrieval
- No reranking
Users frequently report that the system misses relevant case law even when they know it exists in the database. Queries tend to be in plain language ("what are the rules around non-compete agreements in California") while documents use formal legal terminology ("enforceability of restrictive covenants under California Business and Professions Code").
- Identify the most likely cause of the low recall. Be specific about what is failing in the retrieval pipeline.
- Propose a two-stage retrieval pipeline that would improve recall. Describe each stage, what model or algorithm is used, and why it addresses the specific failure mode.
- What is the recall/latency tradeoff of your proposed solution, and how would you measure whether it actually improved recall@5?
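
For reference, the recall@5 figure quoted in the scenario can be computed along the following lines. This is a minimal sketch, assuming a labeled evaluation set in which each query has a known set of relevant document IDs. The function names `recall_at_k` and `mean_recall_at_k`, the document IDs, and the sample data are all hypothetical and are not taken from the system described above.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int = 5) -> float:
    """Fraction of a query's relevant documents found in the top-k retrieved results."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("each query needs at least one relevant document")
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)


def mean_recall_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Average recall@k over (retrieved_ids, relevant_ids) pairs, one pair per query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores)


# Hypothetical evaluation data: ranked retrieval output and labeled relevant IDs.
runs = [
    # Two of the three relevant documents appear in the top 5 ("d2" is ranked 6th).
    (["d1", "d7", "d3", "d9", "d4", "d2"], {"d1", "d2", "d3"}),
    # The only relevant document is never retrieved.
    (["d5", "d6", "d8", "d1", "d2"], {"d9"}),
]

print(f"mean recall@5 = {mean_recall_at_k(runs, k=5):.3f}")
```

Running the same function over the same labeled queries before and after a pipeline change gives directly comparable numbers. Note that for a query with more than five relevant documents, recall@5 cannot reach 1.0 under this definition.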