Design a batching and caching strategy for a high-traffic LLM application with a p99 latency requirement of under 2 seconds. Cover: continuous batching vs. static batching and when each applies; KV cache management and how context length affects it; semantic caching for near-duplicate queries (not just exact matches); prefix caching for shared system prompts; how to implement request queuing without blowing your latency budget; and how these strategies interact, e.g. how caching and batching can conflict. What does your architecture look like end-to-end?