Classification Metrics

Essential metrics for evaluating classification models, with an interactive confusion matrix calculator

Interactive Confusion Matrix Calculator

How to use: Adjust the confusion matrix values below to see how each metric is calculated in real time.

Confusion Matrix

                    Predicted Positive      Predicted Negative
Actual Positive     True Positive (TP)      False Negative (FN)
Actual Negative     False Positive (FP)     True Negative (TN)

Calculated Metrics

From these four cells the calculator reports Accuracy, Precision, Recall (Sensitivity), F1 Score, and Specificity, updating as the values change.

Confusion Matrix Explained

True Positive (TP): Model correctly predicts the positive class
True Negative (TN): Model correctly predicts the negative class
False Positive (FP): Model predicts positive, but the example is actually negative (Type I error)
False Negative (FN): Model predicts negative, but the example is actually positive (Type II error)
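
To make these definitions concrete, here is a minimal Python sketch (the variables `y_true` and `y_pred` are illustrative, not from the calculator) that counts the four cells from paired labels and predictions, treating 1 as the positive class.

```python
# Count confusion-matrix cells for a binary problem (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # actual labels (made-up example data)
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # model predictions (made-up example data)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correct positive
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # correct negative
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # Type I error
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Type II error

print(tp, fp, fn, tn)  # 3 1 1 3
```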

Mathematical Formulas

Accuracy
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
What it measures: Overall correctness of the model
Range: \([0, 1]\) or 0% to 100%
When to use: Balanced datasets
Limitation: Misleading with imbalanced classes
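
The limitation is easiest to see with numbers. A minimal sketch, using made-up counts for a 95/5 imbalanced dataset and a degenerate model that predicts negative for everything:

```python
# 95 actual negatives, 5 actual positives, model always predicts "negative".
tp, fn = 0, 5     # every real positive is missed
tn, fp = 95, 0    # every real negative is trivially "correct"

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)   # 0.95 -- looks excellent, yet the model never finds a positive
```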
Precision (Positive Predictive Value)
$$\text{Precision} = \frac{TP}{TP + FP}$$
What it measures: Of all positive predictions, how many are actually positive?
Range: \([0, 1]\)
When to use: When False Positives are costly
Interpretation: High precision = few false alarms
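
A minimal sketch of the precision formula. The guard for TP + FP = 0 (no positive predictions at all) returns 0.0, which is one common convention, not the only one.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are actually positive."""
    if tp + fp == 0:          # the model made no positive predictions
        return 0.0            # convention chosen for this sketch
    return tp / (tp + fp)

print(precision(tp=8, fp=2))  # 0.8 -> 80% of flagged items were real positives
```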
Recall (Sensitivity, True Positive Rate)
$$\text{Recall} = \frac{TP}{TP + FN}$$
What it measures: Of all actual positives, how many did we catch?
Range: \([0, 1]\)
When to use: When False Negatives are costly
Interpretation: High recall = we don't miss many positives
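
The mirror-image sketch for recall, with the same zero-division guard for the case of no actual positives (TP + FN = 0).

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model caught."""
    if tp + fn == 0:          # the data contains no actual positives
        return 0.0            # convention chosen for this sketch
    return tp / (tp + fn)

print(recall(tp=8, fn=4))     # 0.666... -> one third of the positives were missed
```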
F1 Score (Harmonic Mean)
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \times TP}{2 \times TP + FP + FN}$$
What it measures: Balance between Precision and Recall
Range: \([0, 1]\)
When to use: Need balance or with imbalanced classes
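
The two forms of the F1 formula above are algebraically identical; this sketch computes both from the same made-up counts to show they agree.

```python
tp, fp, fn = 8, 2, 4           # made-up counts

precision = tp / (tp + fp)     # 0.8
recall = tp / (tp + fn)        # 0.666...

f1_from_pr = 2 * precision * recall / (precision + recall)
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(round(f1_from_pr, 4), round(f1_from_counts, 4))  # 0.7273 0.7273
```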
Specificity (True Negative Rate)
$$\text{Specificity} = \frac{TN}{TN + FP}$$
What it measures: Of all actual negatives, how many did we correctly identify?
Relation: \(\text{Specificity} = 1 - \text{FPR}\), where \(\text{FPR} = \frac{FP}{FP + TN}\) is the False Positive Rate
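
A matching sketch for specificity, also confirming the 1 − FPR relation numerically (counts are made up).

```python
tn, fp = 90, 10               # made-up counts

specificity = tn / (tn + fp)  # 0.9
fpr = fp / (fp + tn)          # False Positive Rate, 0.1

print(specificity, 1 - fpr)   # 0.9 0.9 -- the same quantity
```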
F-Beta Score
$$F_\beta = (1 + \beta^2) \times \frac{\text{Precision} \times \text{Recall}}{\beta^2 \times \text{Precision} + \text{Recall}}$$
• \(\beta < 1\): More weight on Precision
• \(\beta = 1\): Equal weight (F1)
• \(\beta > 1\): More weight on Recall
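
A short sketch of the F-beta formula showing how β shifts the balance; the precision/recall values are made up, with precision deliberately much higher than recall.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0                     # avoid division by zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.9, 0.5                        # high precision, low recall
print(round(f_beta(p, r, 0.5), 3))     # 0.776 -- rewards the high precision
print(round(f_beta(p, r, 1.0), 3))     # 0.643 -- plain F1
print(round(f_beta(p, r, 2.0), 3))     # 0.549 -- penalizes the low recall
```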

When to Use Each Metric

Accuracy: Use for balanced datasets. Avoid with imbalanced classes.

Precision: Use when False Positives are costly (e.g., spam filtering or fraud flagging, where acting on a false alarm hurts legitimate users).

Recall: Use when False Negatives are costly (e.g., disease screening or fraud detection, where missing a true case is expensive).

F1 Score: Use when you need balance or have imbalanced classes.
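
In practice these metrics are rarely computed by hand. Assuming scikit-learn is available, the sketch below derives all of them from made-up labels and predictions; for binary labels, `confusion_matrix(...).ravel()` returns the cells in the order (TN, FP, FN, TP).

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             fbeta_score, precision_score, recall_score)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # made-up labels
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # made-up predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)                        # 3 1 1 3

print(accuracy_score(y_true, y_pred))        # 0.75
print(precision_score(y_true, y_pred))       # 0.75
print(recall_score(y_true, y_pred))          # 0.75
print(f1_score(y_true, y_pred))              # 0.75
print(fbeta_score(y_true, y_pred, beta=2))   # 0.75
print(tn / (tn + fp))                        # 0.75 (specificity, computed manually)
```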