You have been asked to red-team an AI assistant used internally by a law firm. The assistant has read-only access to all client case files, access to the firm's internal knowledge base, and the ability to draft emails (with human approval before sending). It is used by lawyers, paralegals, and administrative staff.
(1) Write a threat model: who are the adversaries (insider, external, unintentional), what are they trying to achieve, and what are the highest-value targets?
(2) Describe three specific attack scenarios you would test. For each, include:
- The exact input or scenario
- What failure you are testing for
- How you would classify the severity (Critical / High / Medium / Low) and why
(3) One of your attacks successfully extracts a confidential client email from the case files. What steps do you take over the next 48 hours, covering internal disclosure, mitigation, retesting, and ongoing monitoring? Be specific.