Compare LoRA, QLoRA, and full fine-tuning for adapting large language models. For each approach explain: the underlying mechanism, memory and compute requirements, quality tradeoffs, training speed, what kinds of tasks it works best for, and when it fails. Specifically: how does LoRA inject low-rank updates without touching original weights? What does QLoRA add with 4-bit quantization? When is full fine-tuning worth the cost? For a 70B model on a single A100, which approach is even feasible and why?
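For the single-A100 feasibility question, a back-of-the-envelope estimate helps frame the comparison. The sketch below is only rough arithmetic under stated assumptions, not a measurement: it assumes the 80 GB A100 variant (the question does not say which), mixed-precision Adam at about 16 bytes per parameter for full fine-tuning, and an illustrative 8192 x 8192 projection matrix for the LoRA parameter count. It ignores activations, KV cache, and quantization metadata, so real usage will be higher. All function names are hypothetical.

```python
# Rough memory arithmetic for adapting a 70B-parameter model.
# Every constant here is an assumption for illustration, not a measured value.

GB = 1e9  # decimal gigabytes

def weight_gb(n_params: int, bits_per_param: int) -> float:
    """Memory for the model weights alone at a given precision."""
    return n_params * bits_per_param / 8 / GB

def full_finetune_gb(n_params: int) -> float:
    """Weights + gradients + optimizer state for mixed-precision Adam.

    Assumed breakdown per parameter: 2 bytes (16-bit weights) + 2 bytes
    (16-bit gradients) + 4 bytes (fp32 master copy) + 8 bytes (Adam m and v).
    """
    return n_params * 16 / GB

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The frozen weight W is left untouched; the update is the product B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in).
    """
    return rank * (d_in + d_out)

def fits(required_gb: float, budget_gb: float = 80.0) -> bool:
    """True if the estimate is below the assumed GPU memory budget."""
    return required_gb < budget_gb

N = 70_000_000_000  # 70B parameters

print("full fine-tuning:", full_finetune_gb(N), "GB")   # weights + grads + Adam state
print("16-bit frozen base (LoRA):", weight_gb(N, 16), "GB")
print("4-bit frozen base (QLoRA):", weight_gb(N, 4), "GB")
print("LoRA params per 8192x8192 matrix at rank 16:", lora_params(8192, 8192, 16))
```

Under these assumptions, only the 4-bit base leaves headroom on one 80 GB card; the 16-bit base alone already exceeds it before any gradients or activations are counted. On a 40 GB A100 even the 4-bit weights would leave very little room for anything else.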