Your LLM application makes 5M API calls per day. Token costs are your largest infrastructure expense. Identify and quantify the main token reduction levers. Cover: prompt compression techniques (removing redundant context, dynamic system prompts), RAG optimization (passing fewer chunks, smaller chunks, better reranking), response length control, caching identical or near-identical requests, using smaller models for simpler subtasks, and prompt caching features offered by model providers. For each technique, what is the typical reduction, the quality tradeoff, and the implementation complexity?
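To quantify any of these levers, it helps to start from a baseline cost model. The following is a minimal sketch, not a definitive implementation. Only the 5M calls per day comes from the scenario above; every other number (tokens per call, prices) is a hypothetical placeholder, and the names `Workload` and `apply_lever` are invented for illustration. Each lever is modeled as a fractional cut to input tokens, output tokens, or the number of calls that reach the model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Workload:
    """Daily traffic profile. All prices are in USD per 1M tokens."""
    calls_per_day: int
    input_tokens_per_call: float
    output_tokens_per_call: float
    input_price_per_mtok: float
    output_price_per_mtok: float

    def daily_cost(self) -> float:
        """Total daily spend: input tokens plus output tokens, priced separately."""
        input_tokens = self.calls_per_day * self.input_tokens_per_call
        output_tokens = self.calls_per_day * self.output_tokens_per_call
        return (input_tokens * self.input_price_per_mtok
                + output_tokens * self.output_price_per_mtok) / 1e6


def apply_lever(w: Workload, *, input_cut: float = 0.0,
                output_cut: float = 0.0, calls_avoided: float = 0.0) -> Workload:
    """Return a new workload after one lever is applied.

    input_cut     -- fraction of input tokens removed (e.g. fewer RAG chunks)
    output_cut    -- fraction of output tokens removed (e.g. length limits)
    calls_avoided -- fraction of calls never sent (e.g. response-cache hits)
    """
    return replace(
        w,
        calls_per_day=round(w.calls_per_day * (1 - calls_avoided)),
        input_tokens_per_call=w.input_tokens_per_call * (1 - input_cut),
        output_tokens_per_call=w.output_tokens_per_call * (1 - output_cut),
    )


# Hypothetical baseline: only calls_per_day comes from the scenario;
# token counts and prices are placeholders to be replaced with measured values.
baseline = Workload(
    calls_per_day=5_000_000,
    input_tokens_per_call=2_000,
    output_tokens_per_call=300,
    input_price_per_mtok=1.00,
    output_price_per_mtok=4.00,
)

# Example: a lever assumed to trim 25% of input tokens.
trimmed = apply_lever(baseline, input_cut=0.25)
savings = baseline.daily_cost() - trimmed.daily_cost()
print(f"baseline ${baseline.daily_cost():,.0f}/day, savings ${savings:,.0f}/day")
```

Replacing the placeholder values with measured per-call token counts and actual provider prices lets the levers be compared on equal footing. The model treats each lever independently; in practice the levers interact (for example, a response cache reduces the calls that prompt compression would otherwise shrink), so combined savings should be computed by chaining `apply_lever` calls rather than summing individual estimates.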