You're deployed at a customer whose pilot worked on 10,000 documents with a single in-memory index. They now want the agent over their full 2-million-document corpus, and the current design won't hold on memory, latency, index build time, or updates. Design the scaled retrieval architecture: indexing approach, sharding/partitioning, update strategy, latency targets, and cost tradeoffs. AI tools are allowed. Then defend the architecture change to a skeptical CTO who is wary of over-engineering.