A healthcare company trains a model to predict patient readmission risk using 50,000 patient records. The model is deployed as an API that partner hospitals can query.
(1) Describe a concrete membership inference attack against this model. What would an attacker submit, what would they observe, and what would they infer about a specific person's medical history?
(2) The company claims it is GDPR-compliant because it "anonymized" the training data by removing names and SSNs. Why is this insufficient protection against the attack you described in (1)? What does the attack reveal that anonymization does not address?
(3) Describe how DP-SGD would be applied to this training process. Walk through the four steps (per-example gradients, clipping, noise, update). What privacy guarantee does it provide, expressed in terms of ε? What accuracy tradeoff would the company need to accept?
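
As a reference point for (3), the four steps named there can be sketched as a single training update. This is a minimal illustration and not the company's training code: it assumes a logistic-regression readmission model, plain Python lists, and hypothetical names (`dp_sgd_step`, `clip_norm`, `noise_multiplier`). It does not compute ε; turning the noise multiplier, sampling rate, and number of steps into an (ε, δ) guarantee would require a separate privacy accountant.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dp_sgd_step(weights, batch, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update for logistic regression (illustrative assumption).

    batch is a list of (features, label) pairs, label in {0, 1}.
    """
    dim = len(weights)
    clipped_sum = [0.0] * dim
    for x, y in batch:
        # Step 1: per-example gradient of the log loss.
        pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        grad = [(pred - y) * xi for xi in x]
        # Step 2: clip each example's gradient to L2 norm <= clip_norm,
        # so no single patient record can move the model arbitrarily far.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(dim):
            clipped_sum[j] += grad[j] * scale
    # Step 3: add Gaussian noise calibrated to the clipping bound.
    sigma = noise_multiplier * clip_norm
    noisy_sum = [s + rng.gauss(0.0, sigma) for s in clipped_sum]
    # Step 4: average over the batch and take an ordinary gradient step.
    return [w - lr * g / len(batch) for w, g in zip(weights, noisy_sum)]


# Toy usage with made-up feature vectors (not real patient data).
rng = random.Random(0)
toy_batch = [([1.0, 0.5], 1), ([1.0, -1.2], 0), ([1.0, 2.0], 1)]
new_weights = dp_sgd_step([0.0, 0.0], toy_batch, lr=0.1,
                          clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In this sketch, a larger `noise_multiplier` corresponds to a smaller ε (stronger privacy) but noisier updates, which is the source of the accuracy tradeoff the question asks about.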