Descriptive Statistics
Foundational statistical measures: Mean, Expectation, Variance, and Central Tendency
Interactive Statistics Visualization
[Interactive plot not reproduced here. The original widget shows a data distribution with live readouts for mean (μ), median, mode, variance (σ²), standard deviation (σ), range, Q1 (25%), Q3 (75%), IQR, and count (n).]
Arithmetic Mean
What is Arithmetic Mean?
The arithmetic mean (average) is the sum of all values divided by the count. It's the most common measure of central tendency.
Sample Mean Formula
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
where:
• \(\bar{x}\) - sample mean (x-bar)
• n - number of observations
• x_i - individual data point
Properties:
• Sensitive to outliers (extreme values pull the mean)
• Best for symmetric distributions
• Minimizes sum of squared deviations
• Can be affected by skewness
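The formula and the outlier sensitivity noted above can be illustrated with a minimal Python sketch. The function name `sample_mean` and the data values are made up for illustration:

```python
def sample_mean(values):
    """Return the arithmetic mean: sum of the values divided by their count."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)

data = [2, 4, 4, 5, 5]          # hypothetical sample
with_outlier = data + [100]     # same sample plus one extreme value

mean_plain = sample_mean(data)              # 20 / 5  = 4.0
mean_outlier = sample_mean(with_outlier)    # 120 / 6 = 20.0
print(mean_plain, mean_outlier)
```

A single extreme value moves the mean from 4.0 to 20.0, although five of the six points are at most 5.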
Mathematical Expectation (Expected Value)
What is Expected Value?
Expected value is the theoretical mean of a probability distribution - the long-run average if you repeated an experiment infinitely many times.
Discrete Random Variable
$$E[X] = \mu = \sum_{i=1}^{n} x_i \cdot P(X = x_i)$$
Example: Rolling a fair die
$$E[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 3.5$$
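The die calculation can be reproduced exactly with a short Python sketch. The helper `expected_value` and the dictionary representation of the distribution are illustrative choices, not part of the original page:

```python
from fractions import Fraction

def expected_value(pmf):
    """E[X] for a discrete random variable given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

e_die = expected_value(fair_die)
print(e_die, float(e_die))  # 7/2 3.5
```

Using `Fraction` keeps the arithmetic exact, so the result is 7/2 rather than a rounded float.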
Continuous Random Variable
$$E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx$$
where:
• f(x) - probability density function (PDF)
• Integration over all possible values weighted by probability
Properties of Expectation
Linearity:
$$E[aX + bY] = aE[X] + bE[Y]$$
Constant:
$$E[c] = c$$
Function of random variable:
$$E[g(X)] = \sum g(x_i) \cdot P(X = x_i)$$
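These properties can be checked on the fair die with a small sketch. The helper `expect` is hypothetical, and the check uses the special case \(E[aX + b] = aE[X] + b\), which combines linearity with the constant rule:

```python
from fractions import Fraction

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

def expect(g, pmf):
    """E[g(X)] = sum of g(x) * P(X = x) over the support of X."""
    return sum(g(x) * p for x, p in pmf.items())

e_x = expect(lambda x: x, fair_die)             # E[X]      = 7/2
e_x2 = expect(lambda x: x * x, fair_die)        # E[X^2]    = 91/6
e_lin = expect(lambda x: 3 * x + 2, fair_die)   # E[3X + 2] = 25/2

# Linearity plus the constant rule: E[3X + 2] = 3 E[X] + 2
assert e_lin == 3 * e_x + 2
print(e_x, e_x2, e_lin)
```

Note that \(E[X^2] = 91/6\) differs from \((E[X])^2 = 49/4\): the expectation of a function is generally not the function of the expectation.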
Variance and Standard Deviation
What is Variance?
Variance measures how spread out the data is from the mean. It's the average of squared deviations from the mean. Standard deviation is the square root of variance (same units as data).
Population Variance
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
where:
• σ² - population variance (sigma squared)
• N - population size
• μ - population mean
• Used when you have the entire population
Sample Variance
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
where:
• s² - sample variance
• n - sample size
• \(\bar{x}\) - sample mean
• Divided by (n-1) for unbiased estimator (Bessel's correction)
• Used when you have a sample from population
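The two variance formulas differ only in the divisor. The following sketch is one way to show that; the function `variance`, its `ddof` parameter, and the dataset are assumptions for illustration, cross-checked against Python's standard `statistics` module:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical dataset with mean 5

def variance(values, ddof=0):
    """Sum of squared deviations from the mean, divided by (n - ddof).

    ddof=0 gives the population formula (divide by N);
    ddof=1 applies Bessel's correction (divide by n - 1).
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("not enough data points")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)

pop_var = variance(data, ddof=0)    # 32 / 8 = 4.0
samp_var = variance(data, ddof=1)   # 32 / 7, about 4.571

# Cross-check against the standard library.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))
print(pop_var, samp_var)
```

The sample variance is always the larger of the two, and the gap shrinks as n grows.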
Variance using Expectation
$$\text{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$$
Computational formula: Often easier to calculate
"Expected value of square minus square of expected value"
Standard Deviation
$$\sigma = \sqrt{\sigma^2} \quad \text{or} \quad s = \sqrt{s^2}$$
Advantage: Same units as original data (easier to interpret)
Example: If data is in dollars, std is in dollars (not dollars²)
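As a worked check, the definition \(E[(X - \mu)^2]\), the computational formula \(E[X^2] - (E[X])^2\), and the square-root step can all be evaluated for a fair die. This is a sketch with illustrative variable names:

```python
import math
from fractions import Fraction

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

mu = sum(x * p for x, p in fair_die.items())                           # E[X] = 7/2
var_definition = sum((x - mu) ** 2 * p for x, p in fair_die.items())   # E[(X - mu)^2]
var_shortcut = sum(x * x * p for x, p in fair_die.items()) - mu ** 2   # E[X^2] - (E[X])^2
std_dev = math.sqrt(var_shortcut)   # back in the original units (die faces)

print(var_definition, var_shortcut, round(std_dev, 4))
```

Both routes give 35/12, so the standard deviation is about 1.708 faces.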
Properties of Variance
Constant:
$$\text{Var}(c) = 0$$
Scaling:
$$\text{Var}(aX) = a^2 \text{Var}(X)$$
Shifting (adding constant):
$$\text{Var}(X + b) = \text{Var}(X)$$
Independent variables:
$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$$
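The four properties can be verified exactly on fair dice. In this sketch the helpers `var` and `transform` are hypothetical, and independence is modelled by multiplying the marginal probabilities of two dice:

```python
from fractions import Fraction
from itertools import product

def var(pmf):
    """Variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def transform(pmf, g):
    """Distribution of g(X), merging outcomes that map to the same value."""
    out = {}
    for x, p in pmf.items():
        out[g(x)] = out.get(g(x), 0) + p
    return out

die = {face: Fraction(1, 6) for face in range(1, 7)}

# Sum of two independent dice: joint probability = product of the marginals.
two_dice = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    two_dice[x + y] = two_dice.get(x + y, 0) + px * py

assert var(transform(die, lambda x: 7)) == 0                 # Var(c) = 0
assert var(transform(die, lambda x: 3 * x)) == 9 * var(die)  # Var(aX) = a^2 Var(X)
assert var(transform(die, lambda x: x + 10)) == var(die)     # Var(X + b) = Var(X)
assert var(two_dice) == 2 * var(die)                         # independent sum
print(var(die), var(two_dice))
```

The additivity check relies on independence. For dependent variables a covariance term appears, which this sketch does not cover.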
Median and Mode
Median
The middle value when the data is sorted in ascending order; it splits the data into two equal halves. In the formulas below, \(x_k\) denotes the k-th smallest value.
If n is odd:
$$\text{Median} = x_{(n+1)/2}$$
If n is even:
$$\text{Median} = \frac{x_{n/2} + x_{(n/2)+1}}{2}$$
Properties:
• Robust to outliers (not affected by extreme values)
• Better than mean for skewed distributions
• 50th percentile (Q2)
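The odd and even cases translate directly into code. This is a minimal sketch; the function name `median` and the test values are illustrative, and Python's 0-based indexing shifts the positions from the formulas by one:

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # 1-based position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2   # 1-based positions n/2 and n/2 + 1

odd_case = median([7, 1, 3])                # sorted: 1, 3, 7
even_case = median([4, 1, 3, 2])            # sorted: 1, 2, 3, 4
robust_case = median([1, 2, 3, 4, 1000])    # the extreme value does not move the middle
print(odd_case, even_case, robust_case)
```

Replacing 1000 with any other value above 3 leaves `robust_case` unchanged, which is the robustness property listed above.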
Mode
The most frequently occurring value in the dataset.
Properties:
• Can have multiple modes (bimodal, multimodal)
• Can have no mode (all values unique)
• Most useful for categorical data
• Not affected by outliers
Example: [1, 2, 2, 3, 4, 4, 4, 5] → Mode = 4
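A frequency count is enough to find the mode or modes. The sketch below uses `collections.Counter`; the function name `modes` and the choice to return an empty list when every value is distinct are assumptions that follow the "no mode" convention above:

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] when every value occurs exactly once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []   # all values unique: treated here as "no mode"
    return [value for value, count in counts.items() if count == top]

single = modes([1, 2, 2, 3, 4, 4, 4, 5])        # one mode
bimodal = modes([1, 1, 2, 2, 3])                # two modes
categorical = modes(["red", "blue", "red"])     # works for non-numeric data
no_mode = modes([1, 2, 3])                      # all unique
print(single, bimodal, categorical, no_mode)
```

Other conventions exist. For instance, some libraries report every value as a mode when all counts are equal.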
Quartiles and Percentiles
Quartiles
Quartiles divide the data into four equal parts.
• Q1 (25th percentile): 25% of data below this value
• Q2 (50th percentile): Median
• Q3 (75th percentile): 75% of data below this value
Interquartile Range (IQR):
$$\text{IQR} = Q3 - Q1$$
Measures spread of middle 50% of data (robust to outliers)
Outlier Detection using IQR
Lower fence: \(Q1 - 1.5 \times \text{IQR}\)
Upper fence: \(Q3 + 1.5 \times \text{IQR}\)
Values outside these fences are considered outliers
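The fence rule can be sketched with the standard library's `statistics.quantiles`. Quartile conventions differ between tools, so the `method="inclusive"` choice here is an assumption, as are the function name `iqr_outliers` and the data:

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return ((lower fence, upper fence), values outside the fences)."""
    # n=4 yields the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return (lower, upper), outliers

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]   # hypothetical data with one extreme value
fences, outliers = iqr_outliers(data)
print(fences, outliers)
```

For this data Q1 = 4 and Q3 = 8, so IQR = 4 and the fences are -2 and 14; only 30 falls outside them. A different quartile method may give slightly different fences on the same data.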
Mean vs Median vs Mode
Use Mean When:
✅ Data is symmetric
✅ No outliers
✅ Interval/ratio data
✅ Need to use all data points
❌ Avoid when outliers are present (the mean is sensitive to them)
❌ Avoid for strongly skewed distributions
Use Median When:
✅ Data is skewed
✅ Outliers present
✅ Ordinal data
✅ Want robust measure
✅ Income, house prices
✅ Better for rankings
Use Mode When:
✅ Categorical data
✅ Finding most common
✅ Nominal data
✅ Discrete distributions
✅ Most popular product
✅ Most frequent rating
Key Insights
Relationship in symmetric distributions:
Mean = Median = Mode (symmetric, unimodal distributions such as the Normal distribution)
Right-skewed (positive skew):
Typically Mode < Median < Mean (e.g., income distribution)
Left-skewed (negative skew):
Typically Mean < Median < Mode (e.g., age at retirement)
Why n-1 in sample variance?
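Bessel's correction: Deviations are measured from the sample mean, which is itself computed from the sample, so on average they are smaller than deviations from the true population mean. Dividing by n therefore underestimates the population variance; dividing by (n-1) instead gives an unbiased estimator.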
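The bias can be seen exactly, without simulation, by enumerating every possible sample from a tiny population. The sketch below assumes a made-up population [1, 2, 3] and samples of size 2 drawn with replacement; the helper `spread` is hypothetical:

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance = 2/3

def spread(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))   # all 9 equally likely ordered draws

avg_biased = sum(spread(s, n) for s in samples) / len(samples)        # divide by n
avg_unbiased = sum(spread(s, n - 1) for s in samples) / len(samples)  # divide by n - 1
print(sigma2, avg_biased, avg_unbiased)
```

Averaged over all samples, dividing by n gives 1/3, half the true variance of 2/3, while dividing by n - 1 recovers 2/3 exactly.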
Variance units problem:
Variance is in squared units (hard to interpret). Standard deviation fixes this by taking square root.
Coefficient of Variation (CV):
$$CV = \frac{\sigma}{\mu} \times 100\%$$
Relative variability - useful for comparing spread across different scales. Meaningful only for ratio-scale data with a nonzero mean.
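A short sketch shows why CV is scale-free. The function name `coefficient_of_variation` and the height data are illustrative, and the population standard deviation (`statistics.pstdev`) is used to match the σ in the formula:

```python
import math
import statistics

def coefficient_of_variation(values):
    """CV in percent: population standard deviation divided by the mean, times 100."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean * 100

heights_cm = [160, 170, 180]   # hypothetical heights in centimetres
heights_m = [1.6, 1.7, 1.8]    # the same heights in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

# Changing units rescales the standard deviation and the mean equally, so CV is unchanged.
assert math.isclose(cv_cm, cv_m)
print(round(cv_cm, 2), round(cv_m, 2))
```

Both datasets give a CV of about 4.8%, although their standard deviations differ by a factor of 100.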