Easy | ML Engineering | Theory

Model Evaluation for Imbalanced Loan Fraud Detection

You are building a model to detect fraudulent loan applications. The dataset is 98% legitimate and 2% fraudulent. Your manager tells you to optimize for accuracy.

(1) Why is accuracy the wrong metric here? Compute the accuracy of a trivial "predict not-fraud" baseline.

(2) Which metric would you actually optimize for, and why? Discuss precision, recall, F1, and AUC-ROC and which fits this problem best.

(3) How would you set the decision threshold, given that missed fraud costs $5,000 but a false positive costs approximately $200 in lost revenue? Show your math.

(4) Your offline AUC is 0.96 but after deploying, the team finds the model is performing worse than expected. Name two reasons offline evaluation can disagree with online performance.
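
The arithmetic behind parts (1) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution: the function names are hypothetical, the 98%/2% class mix and the $5,000 / $200 costs are taken from the problem statement, and the threshold formula assumes the model outputs calibrated fraud probabilities and that correct decisions cost nothing.

```python
# Values taken from the problem statement.
FRAUD_RATE = 0.02               # 2% of applications are fraudulent
COST_FALSE_NEGATIVE = 5_000.0   # dollars lost per missed fraud
COST_FALSE_POSITIVE = 200.0     # dollars lost per legitimate application flagged


def baseline_accuracy(fraud_rate: float) -> float:
    """Accuracy of a trivial classifier that always predicts not-fraud.

    It is right on every legitimate application and wrong on every fraud,
    so its accuracy equals the legitimate share and its recall on fraud is 0.
    """
    return 1.0 - fraud_rate


def expected_cost(p_fraud: float, flag: bool) -> float:
    """Expected dollar cost of one decision, given the fraud probability."""
    if flag:
        # Flagging only costs money when the application was legitimate.
        return (1.0 - p_fraud) * COST_FALSE_POSITIVE
    # Approving only costs money when the application was fraudulent.
    return p_fraud * COST_FALSE_NEGATIVE


def break_even_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability at which flagging and approving cost the same.

    Flag when p * cost_fn > (1 - p) * cost_fp,
    which rearranges to p > cost_fp / (cost_fp + cost_fn).
    Assumes calibrated probabilities (an assumption, not given in the problem).
    """
    return cost_fp / (cost_fp + cost_fn)


if __name__ == "__main__":
    threshold = break_even_threshold(COST_FALSE_POSITIVE, COST_FALSE_NEGATIVE)
    print(f"Baseline accuracy: {baseline_accuracy(FRAUD_RATE):.2%}")
    print(f"Break-even threshold: {threshold:.4f}")
```

Under these assumptions the break-even point is 200 / 5,200, so flagging becomes the cheaper action once the predicted fraud probability exceeds roughly 4%, far below the default 0.5 cutoff. In practice the threshold would likely be tuned on a validation set, since real model scores are often not well calibrated.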
