Classification Metrics

Essential metrics for evaluating classification models, with an interactive confusion matrix calculator

Interactive Confusion Matrix Calculator

How to use: Adjust the confusion matrix values below to see how each metric is calculated in real time.

Confusion Matrix

                    Predicted Positive      Predicted Negative
Actual Positive     True Positive (TP)      False Negative (FN)
Actual Negative     False Positive (FP)     True Negative (TN)

Calculated Metrics

From these four cells the calculator reports Accuracy, Precision, Recall (Sensitivity), F1 Score, and Specificity, updating as the values change.

Confusion Matrix Explained

True Positive (TP): Model correctly predicts the positive class
True Negative (TN): Model correctly predicts the negative class
False Positive (FP): Model predicts positive, but the example is actually negative (Type I error)
False Negative (FN): Model predicts negative, but the example is actually positive (Type II error)
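
To make these definitions concrete, here is a minimal Python sketch (the variables `y_true` and `y_pred` are illustrative, not from the calculator) that counts the four cells from paired labels and predictions, treating 1 as the positive class.

```python
# Count confusion-matrix cells for a binary problem (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # actual labels (made-up example data)
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # model predictions (made-up example data)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correct positive
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # correct negative
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # Type I error
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Type II error

print(tp, fp, fn, tn)  # 3 1 1 3
```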

Mathematical Formulas

Accuracy
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
What it measures: Overall correctness of the model
Range: \([0, 1]\) or 0% to 100%
When to use: Balanced datasets
Limitation: Misleading with imbalanced classes
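
The limitation is easiest to see with numbers. A minimal sketch, using made-up counts for a 95/5 imbalanced dataset and a degenerate model that predicts negative for everything:

```python
# 95 actual negatives, 5 actual positives, model always predicts "negative".
tp, fn = 0, 5     # every real positive is missed
tn, fp = 95, 0    # every real negative is trivially "correct"

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)   # 0.95 -- looks excellent, yet the model never finds a positive
```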
Precision (Positive Predictive Value)
$$\text{Precision} = \frac{TP}{TP + FP}$$
What it measures: Of all positive predictions, how many are actually positive?
Range: \([0, 1]\)
When to use: When False Positives are costly
Interpretation: High precision = few false alarms
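
A minimal sketch of the precision formula. The guard for TP + FP = 0 (no positive predictions at all) returns 0.0, which is one common convention, not the only one.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are actually positive."""
    if tp + fp == 0:          # the model made no positive predictions
        return 0.0            # convention chosen for this sketch
    return tp / (tp + fp)

print(precision(tp=8, fp=2))  # 0.8 -> 80% of flagged items were real positives
```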
Recall (Sensitivity, True Positive Rate)
$$\text{Recall} = \frac{TP}{TP + FN}$$
What it measures: Of all actual positives, how many did we catch?
Range: \([0, 1]\)
When to use: When False Negatives are costly
Interpretation: High recall = we don't miss many positives
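
The mirror-image sketch for recall, with the same zero-division guard for the case of no actual positives (TP + FN = 0).

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model caught."""
    if tp + fn == 0:          # the data contains no actual positives
        return 0.0            # convention chosen for this sketch
    return tp / (tp + fn)

print(recall(tp=8, fn=4))     # 0.666... -> one third of the positives were missed
```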
F1 Score (Harmonic Mean)
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \times TP}{2 \times TP + FP + FN}$$
What it measures: Balance between Precision and Recall
Range: \([0, 1]\)
When to use: Need balance or with imbalanced classes
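
The two forms of the F1 formula above are algebraically identical; this sketch computes both from the same made-up counts to show they agree.

```python
tp, fp, fn = 8, 2, 4           # made-up counts

precision = tp / (tp + fp)     # 0.8
recall = tp / (tp + fn)        # 0.666...

f1_from_pr = 2 * precision * recall / (precision + recall)
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(round(f1_from_pr, 4), round(f1_from_counts, 4))  # 0.7273 0.7273
```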
Specificity (True Negative Rate)
$$\text{Specificity} = \frac{TN}{TN + FP}$$
What it measures: Of all actual negatives, how many did we correctly identify?
Relation: \(\text{Specificity} = 1 - \text{FPR}\), where \(\text{FPR} = \frac{FP}{FP + TN}\) is the False Positive Rate
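
A matching sketch for specificity, also confirming the 1 − FPR relation numerically (counts are made up).

```python
tn, fp = 90, 10               # made-up counts

specificity = tn / (tn + fp)  # 0.9
fpr = fp / (fp + tn)          # False Positive Rate, 0.1

print(specificity, 1 - fpr)   # 0.9 0.9 -- the same quantity
```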
F-Beta Score
$$F_\beta = (1 + \beta^2) \times \frac{\text{Precision} \times \text{Recall}}{\beta^2 \times \text{Precision} + \text{Recall}}$$
• \(\beta < 1\): More weight on Precision
• \(\beta = 1\): Equal weight (F1)
• \(\beta > 1\): More weight on Recall
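
A short sketch of the F-beta formula showing how β shifts the balance; the precision/recall values are made up, with precision deliberately much higher than recall.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0                     # avoid division by zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.9, 0.5                        # high precision, low recall
print(round(f_beta(p, r, 0.5), 3))     # 0.776 -- rewards the high precision
print(round(f_beta(p, r, 1.0), 3))     # 0.643 -- plain F1
print(round(f_beta(p, r, 2.0), 3))     # 0.549 -- penalizes the low recall
```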

When to Use Each Metric

Accuracy: Use for balanced datasets. Avoid with imbalanced classes.

Precision: Use when False Positives are costly (e.g., spam filtering or fraud flagging, where acting on a false alarm hurts legitimate users).

Recall: Use when False Negatives are costly (e.g., disease screening or fraud detection, where missing a true case is expensive).

F1 Score: Use when you need balance or have imbalanced classes.
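
In practice these metrics are rarely computed by hand. Assuming scikit-learn is available, the sketch below derives all of them from made-up labels and predictions; for binary labels, `confusion_matrix(...).ravel()` returns the cells in the order (TN, FP, FN, TP).

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             fbeta_score, precision_score, recall_score)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # made-up labels
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # made-up predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)                        # 3 1 1 3

print(accuracy_score(y_true, y_pred))        # 0.75
print(precision_score(y_true, y_pred))       # 0.75
print(recall_score(y_true, y_pred))          # 0.75
print(f1_score(y_true, y_pred))              # 0.75
print(fbeta_score(y_true, y_pred, beta=2))   # 0.75
print(tn / (tn + fp))                        # 0.75 (specificity, computed manually)
```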