You are making the build-vs-buy decision for LLM infrastructure. Walk through your framework for deciding between hosted APIs (OpenAI, Anthropic, Google) vs self-hosted open-source models (Llama, Mistral, Qwen). Consider: total cost at scale (token pricing vs GPU costs), data privacy and compliance requirements, latency requirements, customization needs (fine-tuning, RAG), operational burden, capability gaps, and hybrid approaches. Give a concrete example: for a healthcare company processing 500k patient queries per day with HIPAA requirements, what do you recommend and why?