What Is Recall in Machine Learning?

In machine learning, evaluation metrics are more than just numbers — they’re the lenses through which we judge how well a model is doing. Among these metrics, recall holds a special place. Sometimes called sensitivity or true positive rate, recall measures a model’s ability to identify all the relevant instances in a dataset. In contexts like medical diagnoses, fraud detection, or safety systems, missing positive cases can be far more costly than flagging extras.

Defining Recall

Recall is formally defined as:

Recall = TP / (TP + FN)

where TP = true positives (correctly predicted positives) and FN = false negatives (positive instances the model missed). In other words, recall answers: Of all the actual positive cases, how many did we correctly detect?
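As a minimal sketch, the formula translates directly into code (the counts below are a made-up toy example):

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): the share of actual positives the model detected."""
    return tp / (tp + fn)

# Toy example: 8 positives correctly found, 2 positives missed.
print(recall(tp=8, fn=2))  # 0.8
```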

Google’s Machine Learning Crash Course describes this as the model’s true positive rate, i.e., the proportion of all real positives classified correctly. Similarly, platforms like Iguazio define recall as the percentage of real positive cases a model identifies.

Because of this definition, recall captures “completeness” rather than “precision.” A model with high recall tries to minimize false negatives — it errs on the side of catching as many positives as possible, even if it catches some extra negatives along the way.

Why Recall Matters More Than Accuracy in Many Cases

In many real-world problems, classes are imbalanced. For example, fraudulent transactions might be 1 in 1,000 cases; disease cases in medical screening might be similarly rare. If you build a model that always predicts “no” (negative), you might still get 99.9% accuracy, yet the model fails to catch a single real positive case. This is the “accuracy paradox.”
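The paradox is easy to demonstrate with made-up numbers: a classifier that always answers “no” on a 1-in-1,000 imbalanced dataset scores 99.9% accuracy while detecting nothing.

```python
# Hypothetical imbalanced dataset: 1 fraud case among 1,000 transactions.
labels = [1] * 1 + [0] * 999   # ground truth
preds = [0] * 1000             # a "model" that always predicts "no fraud"

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))  # positives caught
fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))  # positives missed
recall = tp / (tp + fn)

print(accuracy)  # 0.999 -- looks excellent
print(recall)    # 0.0   -- catches no fraud at all
```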

Recall helps address this issue. When positive cases are rare, the cost of false negatives is often much higher than false positives. In medical diagnosis, a missed disease is more dangerous than a false alarm. In such domains, recall becomes more meaningful than overall accuracy.

That’s why in domains like healthcare, recall is often prioritized even at the expense of precision — the logic being that catching as many true positives as possible is critical, even if there are some extra false positives.

Precision vs. Recall: The Trade-Off

Recall rarely works alone. It is often paired with precision, which measures how many of the positive predictions were correct (i.e., TP / (TP + FP)).

As recall goes up (catching more positives), precision tends to fall (you bring in more false positives). Adjusting the model’s decision threshold, for example lowering the cutoff for a positive prediction from probability > 0.5 to > 0.3, can push recall higher, but precision may suffer.
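A small sketch of this threshold effect, using made-up scores and labels: lowering the cutoff flags more examples as positive, which raises recall but admits more false positives.

```python
# Hypothetical model scores and ground-truth labels for 7 examples.
scores = [0.9, 0.8, 0.6, 0.4, 0.35, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,    1,   0]

def precision_recall(threshold):
    """Compute (precision, recall) when classifying score > threshold as positive."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall(0.5))  # stricter threshold: fewer positives flagged
print(precision_recall(0.3))  # looser threshold: higher recall, lower precision
```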

Because of this tension, many evaluations use combined metrics like F1 score, which is the harmonic mean of precision and recall, or precision–recall curves that show how one metric shifts with the other.
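A quick illustration of why the harmonic mean is used (values below are hypothetical): F1 stays low unless both precision and recall are reasonably high, so neither metric can be sacrificed entirely.

```python
def f1(precision: float, recall: float) -> float:
    """F1 = harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f1(0.5, 0.5))  # 0.5    (balanced)
print(f1(0.9, 0.1))  # ~0.18  (one weak side drags the score down)
```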

Use Cases Where Recall Dominates

Recall becomes the defining metric in situations where false negatives are costly: medical screening, where a missed disease delays treatment; fraud detection, where a missed fraudulent transaction is a direct loss; and safety or security systems, where a missed threat can cause real harm.

In such domains, models are often optimized to maximize recall, subject to an acceptable level of precision.

Challenges and Limitations

High recall is not a silver bullet. Excessively pushing recall can degrade precision so much that the system becomes useless — too many false positives burden users or downstream systems.

Also, recall does not consider true negatives. A model can have perfect recall by labeling everything as positive — but that would collapse precision to a low value, making the model impractical.
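That degenerate case is easy to demonstrate with made-up labels: predicting “positive” for everything yields perfect recall but poor precision.

```python
# Hypothetical ground truth: 2 positives among 8 examples.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
preds = [1] * len(labels)  # a "model" that labels everything positive

tp = sum(p and y for p, y in zip(preds, labels))        # 2: every positive caught
fp = sum(p and not y for p, y in zip(preds, labels))    # 6: every negative misflagged
fn = sum((not p) and y for p, y in zip(preds, labels))  # 0: nothing missed

print(tp / (tp + fn))  # recall = 1.0
print(tp / (tp + fp))  # precision = 0.25
```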

Finally, for multi-class classification (more than two classes), recall must be calculated per class and then combined, typically via macro-averaging (the unweighted mean of per-class recalls) or micro-averaging (recall computed from pooled counts across all classes).
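A pure-Python sketch of macro- versus micro-averaged recall for a hypothetical three-class problem:

```python
from collections import Counter

# Hypothetical three-class ground truth and predictions.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]

tp, fn = Counter(), Counter()
for t, p in zip(y_true, y_pred):
    if t == p:
        tp[t] += 1  # true positive for class t
    else:
        fn[t] += 1  # class-t instance missed

classes = sorted(set(y_true))
per_class = {c: tp[c] / (tp[c] + fn[c]) for c in classes}
macro = sum(per_class.values()) / len(classes)                # unweighted class mean
micro = sum(tp.values()) / (sum(tp.values()) + sum(fn.values()))  # pooled counts

print(per_class)  # per-class recall
print(macro)      # macro-averaged recall
print(micro)      # micro-averaged recall
```

Macro-averaging treats every class equally, so rare classes count as much as common ones; micro-averaging weights each instance equally, so frequent classes dominate.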

The Bottom Line

Recall in machine learning is about sensitivity — the model’s ability to catch all true positives. It’s measured as TP / (TP + FN), often called true positive rate. Because of class imbalance and the high cost of missing relevant cases, recall often becomes more important than accuracy in domains like healthcare, fraud detection, or security systems.

Yet recall comes with trade-offs: boosting it usually lowers precision. That’s why effective models balance recall with precision (or F1). For practitioners, it’s not enough to aim for high recall — it must be tempered with domain understanding, error costs, and clarity on what kinds of mistakes are more tolerable.
