Explain model quantization for LLM deployment. Cover: what quantization does (INT8, INT4, GPTQ, GGUF, AWQ), why it reduces memory and latency, what quality loss to expect at each precision level, how to measure quality loss (perplexity, task-specific evals), which tasks tolerate quantization well vs poorly, and when quantization is the right call vs just using a smaller model or hosted API. For a 13B parameter model that needs to run on a single 24GB GPU, what is your recommended quantization approach and what do you sacrifice?