Your transformer-based model reaches 96.2% accuracy, has 6.7B parameters stored in FP32 (26.8 GB of weights), and serves requests at 180 ms p99 latency on a single A100. The product team needs p99 latency under 50 ms. (1) Calculate the weight memory footprint after quantising to INT8. Would it still fit on a single A100 80GB? (2) Propose a compression strategy that reaches the 50 ms target while minimising accuracy loss. (3) You apply INT8 quantisation and accuracy drops from 96.2% to 94.7%. The PM says this is unacceptable. What are your options?
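
The arithmetic behind part (1), and the speedup implied by the latency target, can be checked with a minimal sketch. It assumes decimal gigabytes (1 GB = 1e9 bytes) and counts weights only: activations, any KV cache, and quantisation scales or zero-points are ignored, so real usage will be somewhat higher. The helper name `weight_footprint_gb` is hypothetical, introduced only for illustration.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Weights-only footprint in decimal gigabytes (assumption: 1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 6.7e9        # from the problem statement
A100_MEMORY_GB = 80       # A100 80GB variant
CURRENT_P99_MS = 180
TARGET_P99_MS = 50

fp32_gb = weight_footprint_gb(NUM_PARAMS, "fp32")
int8_gb = weight_footprint_gb(NUM_PARAMS, "int8")
fits_on_a100 = int8_gb < A100_MEMORY_GB

# Speedup factor the compression strategy must deliver to hit the target.
required_speedup = CURRENT_P99_MS / TARGET_P99_MS

print(f"FP32 weights: {fp32_gb:.1f} GB")
print(f"INT8 weights: {int8_gb:.1f} GB (fits on A100 80GB: {fits_on_a100})")
print(f"Required speedup: {required_speedup:.1f}x")
```

Under these assumptions the INT8 weights occupy a quarter of the FP32 footprint, so memory capacity is not the constraint here. The harder part is the roughly 3.6x latency reduction, which a 4x cut in weight size does not guarantee by itself.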