Medium | AI Security | Theory

Red-Teaming an AI Assistant for a Law Firm

You have been asked to red-team an AI assistant used internally by a law firm. The assistant has access to all client case files (read-only), the firm's internal knowledge base, and the ability to draft emails (with human approval before sending). It is accessed by lawyers, paralegals, and administrative staff.

(1) Write a threat model: who are the adversaries (insider, external, unintentional), what are they trying to achieve, and what are the highest-value targets?

(2) Describe three specific attack scenarios you would test. For each, include:

(a) The exact input or scenario
(b) What failure you are testing for
(c) How you would classify the severity (Critical / High / Medium / Low) and why

(3) One of your attacks successfully extracts a confidential client email from the case files. Describe your next steps over the following 48 hours: internal disclosure, mitigation, retest, and ongoing monitoring. Be specific.
