A model that is 99% accurate can be useless. This lesson is the difference between looking competent and being competent in ML interviews.
The trap: imbalanced data
Predict a rare disease present in 1% of people. A model that always says "healthy" is 99% accurate — and catches zero sick patients. Accuracy hid total failure. This is why we need better metrics.
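To make the trap concrete, here is a minimal sketch in plain Python. The population of 1,000 people with 10 sick patients is hypothetical toy data, and the "model" simply predicts healthy for everyone. It shows how accuracy and recall on the sick class disagree.

```python
# Hypothetical toy data: 1,000 people, 1% (10) actually sick (label 1).
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts healthy (label 0).
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the sick class: sick patients caught / all sick patients.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / sum(y_true)

print(f"accuracy = {accuracy:.2%}")  # accuracy = 99.00%
print(f"recall   = {recall:.2%}")    # recall   = 0.00%
```

Accuracy looks excellent, and recall shows that not a single sick patient was caught.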
The confusion matrix — the source of truth
                Predicted +                         Predicted -
Actual +        True Positive (TP)                  False Negative (FN), missed!
Actual -        False Positive (FP), false alarm    True Negative (TN)
The two that matter
- Precision = of everything flagged positive, how much was right? TP / (TP + FP). "When I raise an alarm, am I usually correct?"
- Recall = of all real positives, how many did I catch? TP / (TP + FN). "Am I missing real cases?"
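The two formulas can be sketched in plain Python, starting from the four confusion-matrix cells. The helper names (`confusion_counts`, `precision`, `recall`) and the ten labels are hypothetical, chosen for illustration. Returning 0.0 when a denominator is zero (for example, when nothing is flagged positive) is a convention this sketch assumes.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # flagged and actually positive
        elif p == positive:
            fp += 1          # flagged but actually negative: false alarm
        elif t == positive:
            fn += 1          # not flagged but actually positive: missed
        else:
            tn += 1          # not flagged and actually negative
    return tp, fp, fn, tn

def precision(tp, fp):
    # Of everything flagged positive, how much was right?
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    # Of all real positives, how many did we catch?
    return tp / (tp + fn) if tp + fn else 0.0

# Hypothetical labels: 1 = positive (e.g. sick), 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)       # 3 2 1 4
print(precision(tp, fp))    # 0.6
print(recall(tp, fn))       # 0.75
```

Here the model raised 5 alarms and 3 were right (precision 0.6). It caught 3 of the 4 real cases (recall 0.75).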
The trade-off (this is the interview question)
- Disease screening / fraud → maximise recall. Missing a real case is catastrophic; a false alarm just means a follow-up check.
- Spam filter / recommending content → favour precision. A false positive (real email → spam) is worse than letting one spam message through.
- F1 score = harmonic mean of the two — one number when you need balance.
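F1 is defined as 2 · P · R / (P + R). A small plain-Python sketch shows why the harmonic mean is used rather than a simple average: one weak metric drags F1 down. The `f1` helper and the precision/recall values are hypothetical, and returning 0.0 when both inputs are 0 is an assumed convention.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical models: one reasonably balanced, one lopsided.
print(round(f1(0.6, 0.75), 3))  # 0.667
print(round(f1(1.0, 0.01), 3))  # 0.02
print((1.0 + 0.01) / 2)         # 0.505: the plain average hides the weak recall
```

A model that is always right when it fires but catches only 1% of cases would score about 0.5 on a plain average. Its F1 stays near 0.02.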
```python
from sklearn.metrics import classification_report

# y_test = true labels, preds = your model's predictions on the test set
print(classification_report(y_test, preds))  # precision, recall, F1 per class
```
Say "it depends on the cost of a false negative vs false positive" in an interview and you instantly sound senior.