Random Variables
Understanding discrete and continuous random variables, PMF, PDF, and CDF
What is a Random Variable?
🎲 Random Variable - Mapping Outcomes to Numbers
Definition: A function that assigns a numerical value to each outcome in a sample space
Notation: Usually denoted by capital letters: $X$, $Y$, $Z$
Purpose: Allows us to perform mathematical operations on random events
Example: $X$ = number of heads in 3 coin flips (maps outcomes like HHT → 2)
$$X: \Omega \to \mathbb{R}$$
where:
$$\Omega = \text{sample space (set of all possible outcomes)}$$
$$\mathbb{R} = \text{real numbers}$$
$$X(\omega) = \text{numerical value assigned to outcome } \omega$$
Concrete Example - Coin Flip:
$$\text{Sample space: } \Omega = \{H, T\}$$
$$\text{Random variable: } X(H) = 1, \quad X(T) = 0$$
$$\text{Now we can do math: } P(X = 1) = 0.5, \quad E[X] = 0.5, \text{ etc.}$$
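The coin-flip example can be sketched in a few lines of Python. This is a minimal illustration, assuming a fair coin; the names `prob`, `X`, `p_x_equals_1`, and `expected_x` are made up for this sketch.

```python
from fractions import Fraction

# Sample space for one coin flip with its probabilities.
# A fair coin is assumed, matching P(X = 1) = 0.5 in the text.
prob = {"H": Fraction(1, 2), "T": Fraction(1, 2)}


def X(outcome):
    """Random variable: map heads to 1 and tails to 0."""
    return 1 if outcome == "H" else 0


# P(X = 1): add up the probability of every outcome that X maps to 1.
p_x_equals_1 = sum(p for w, p in prob.items() if X(w) == 1)

# E[X]: probability-weighted average of the values X takes.
expected_x = sum(X(w) * p for w, p in prob.items())

print(p_x_equals_1, expected_x)
```

The point of the sketch is that `X` is an ordinary function on outcomes; probabilities of statements about `X` come from the probabilities of the underlying outcomes.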
Intuitive Examples
🎲 Rolling a Die
Sample Space: $$\{1, 2, 3, 4, 5, 6\}$$
Random Variable $X$: The number shown
Values: $$X \in \{1, 2, 3, 4, 5, 6\}$$
Type: Discrete
🌡️ Temperature
Sample Space: All possible outcomes
Random Variable $X$: Temperature in °C
Values: $$X \in \mathbb{R} \quad \text{(any real number)}$$
Type: Continuous
📞 Phone Calls
Sample Space: All possible call patterns
Random Variable $X$: Number of calls per hour
Values: $$X \in \{0, 1, 2, 3, ...\}$$
Type: Discrete
⏱️ Waiting Time
Sample Space: All possible wait times
Random Variable $X$: Time until bus arrives (minutes)
Values: $$X \in [0, \infty)$$
Type: Continuous
Discrete Random Variables
📊 Discrete Random Variables - Countable Values
Definition: Takes on a countable number of distinct values
Examples: Number of heads in coin flips, count of defects, number of customers
Key property: Can list all possible values (finite or countably infinite)
Probability Mass Function (PMF): $$p_X(x) = P(X = x)$$
Properties:
1. Non-negative: $$p_X(x) \geq 0 \text{ for all } x$$
2. Sums to 1: $$\sum_{\text{all } x} p_X(x) = 1$$
3. Nonzero only at the discrete values $X$ can take
Notation: $$\text{Also written as } P(X = x) \text{ or } f_X(x)$$ $$\text{"Mass" because probability is concentrated at discrete points}$$
Interpretation: $$\text{PMF gives exact probabilities for discrete values}$$
Example: Fair Die Roll
Random Variable $X$: Outcome of rolling a fair six-sided die
$$P(X = 3) = \frac{1}{6} \approx 0.167$$
$$P(X \leq 2) = P(X=1) + P(X=2) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$$
$$P(X > 4) = P(X=5) + P(X=6) = \frac{2}{6} = \frac{1}{3}$$
$$P(X \text{ is even}) = P(X=2) + P(X=4) + P(X=6) = \frac{1}{2}$$
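These four probabilities can be reproduced by summing the PMF over the values that satisfy each condition. The sketch below uses exact fractions; the helper name `prob` is an assumption made for illustration.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def prob(event):
    """P(event): sum the PMF over the values x where event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))


p_eq_3 = prob(lambda x: x == 3)      # P(X = 3)
p_le_2 = prob(lambda x: x <= 2)      # P(X <= 2)
p_gt_4 = prob(lambda x: x > 4)       # P(X > 4)
p_even = prob(lambda x: x % 2 == 0)  # P(X is even)

print(p_eq_3, p_le_2, p_gt_4, p_even)
```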
Example: Number of Heads in 3 Coin Flips
Random Variable $X$: Number of heads when flipping a fair coin 3 times
$x$ | 0 | 1 | 2 | 3 |
---|---|---|---|---|
$p_X(x)$ | $\frac{1}{8}$ | $\frac{3}{8}$ | $\frac{3}{8}$ | $\frac{1}{8}$ |
Decimal | 0.125 | 0.375 | 0.375 | 0.125 |
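The table can be derived by listing all 8 equally likely outcomes and counting heads in each. A minimal sketch, assuming a fair coin and independent flips:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All sequences of 3 flips, e.g. ('H', 'H', 'T'); each is equally likely.
outcomes = list(product("HT", repeat=3))

# The random variable X maps each outcome to its number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# PMF: fraction of outcomes that give each value of X.
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p, float(p))
```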
Continuous Random Variables
📈 Continuous Random Variables - Uncountable Values
Definition: Takes on any value in an interval (uncountably infinite values)
Examples: Height, weight, time, temperature, voltage
Key property: Cannot list all values - must use intervals
Important: $P(X = x) = 0$ for any specific value! (Probability is over intervals)
Probability Density Function (PDF): $$P(a \leq X \leq b) = \int_a^b f_X(x) \, dx$$
Properties:
1. Non-negative: $$f_X(x) \geq 0 \text{ for all } x$$
2. Integrates to 1: $$\int_{-\infty}^{\infty} f_X(x) \, dx = 1$$
3. Zero at any point: $$P(X = x) = 0 \text{ for any specific } x \text{ (zero area at a point)}$$
4. Endpoints don't matter: $$P(a \leq X \leq b) = P(a < X < b)$$
Example: Uniform Distribution on [0, 1]
Random Variable $X$: Uniformly distributed on interval $[0, 1]$
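For Uniform$[0, 1]$ the standard density is $f_X(x) = 1$ on $[0, 1]$ and $0$ elsewhere, so $P(a \leq X \leq b) = b - a$ for $0 \leq a \leq b \leq 1$. The sketch below checks this numerically with a simple midpoint Riemann sum; the helper `integrate` and the interval $[0.2, 0.5]$ are assumptions chosen for illustration.

```python
def uniform_pdf(x):
    """Density of Uniform[0, 1]: 1 inside the interval, 0 outside."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with a midpoint Riemann sum."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# P(0.2 <= X <= 0.5) is the area under the density between 0.2 and 0.5.
p = integrate(uniform_pdf, 0.2, 0.5)

# The total area should be close to 1 (integrating a bit beyond the support).
total = integrate(uniform_pdf, -1.0, 2.0)

print(p, total)
```

The total is only approximately 1 because the Riemann sum does not line up exactly with the jumps of the density at 0 and 1.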
Example: Exponential Distribution
Use case: Modeling waiting times, time until failure (memoryless process)
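The exponential distribution with rate $\lambda > 0$ has density $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ and CDF $F_X(x) = 1 - e^{-\lambda x}$. The sketch below checks the memoryless property, $P(X > s + t \mid X > s) = P(X > t)$, numerically; the rate `0.5` and the times `s`, `t` are arbitrary values picked for illustration.

```python
import math


def exp_pdf(x, rate):
    """Density of the exponential distribution with the given rate."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def exp_cdf(x, rate):
    """CDF of the exponential distribution: P(X <= x)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def survival(x, rate):
    """P(X > x), the probability of waiting longer than x."""
    return 1.0 - exp_cdf(x, rate)


rate = 0.5        # hypothetical rate (events per minute)
s, t = 2.0, 3.0   # hypothetical waiting times

# Memoryless property: having already waited s does not change the
# distribution of the remaining wait.
conditional = survival(s + t, rate) / survival(s, rate)
unconditional = survival(t, rate)

print(conditional, unconditional)
```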
Cumulative Distribution Function (CDF)
📊 CDF - The Universal Description
Works for both: Discrete and continuous random variables
Key property: Always non-decreasing, ranges from 0 to 1
Advantage: Unified way to describe any random variable
Definition: $$F_X(x) = P(X \leq x)$$
Properties:
1. Non-decreasing: $$\text{If } x_1 < x_2, \text{ then } F_X(x_1) \leq F_X(x_2)$$
2. Right-continuous: $$\lim_{h \to 0^+} F_X(x+h) = F_X(x)$$
3. Limits: $$\lim_{x \to -\infty} F_X(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F_X(x) = 1$$
4. Range: $$0 \leq F_X(x) \leq 1 \text{ for all } x$$
Key Insight: $$\text{CDF is defined for ALL random variables (discrete, continuous, or mixed)}$$
For Discrete Random Variables:
From PMF to CDF: $$F_X(x) = \sum_{k \leq x} p_X(k)$$
From CDF to PMF: $$p_X(x) = F_X(x) - F_X(x^-)$$ $$\text{where } F_X(x^-) \text{ is the left limit}$$
For Continuous Random Variables:
From PDF to CDF: $$F_X(x) = \int_{-\infty}^x f_X(t) \, dt$$
From CDF to PDF: $$f_X(x) = \frac{dF_X(x)}{dx}$$ $$\text{(derivative of CDF is PDF)}$$
Example: CDF for Fair Die
Random Variable $X$: Outcome of fair six-sided die
$x$ | $F_X(x) = P(X \leq x)$ | Interpretation |
---|---|---|
$x = 2.5$ | $\frac{2}{6} \approx 0.333$ | Prob. of rolling ≤ 2 |
$x = 4$ | $\frac{4}{6} \approx 0.667$ | Prob. of rolling ≤ 4 |
$x = 10$ | $1$ | Certain (all outcomes ≤ 10) |
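The table rows follow from summing the PMF over all faces $k \leq x$, and the PMF can be recovered from the size of each jump in the CDF. A minimal sketch with exact fractions (the function names `cdf` and `jump` are illustrative):

```python
from fractions import Fraction

# PMF of a fair six-sided die.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}


def cdf(x):
    """F_X(x) = P(X <= x): sum the PMF over all faces k <= x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))


def jump(x):
    """p_X(x) = F_X(x) - F_X(x^-), the size of the CDF's jump at x."""
    left_limit = sum((p for k, p in pmf.items() if k < x), Fraction(0))
    return cdf(x) - left_limit


for x in (2.5, 4, 10):
    print(x, cdf(x))

# The CDF jumps by 1/6 at each face and is flat in between.
print(jump(4), jump(2.5))
```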
Example: CDF for Uniform[0,1]
Random Variable $X$: Uniform on $[0,1]$
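$$F_X(x) = \begin{cases} 0 & x < 0 \\ x & 0 \leq x \leq 1 \\ 1 & x > 1 \end{cases}$$
Verification (derivative of CDF is PDF):
$$f_X(x) = \frac{dF_X(x)}{dx} = \frac{d(x)}{dx} = 1 \quad \text{for } x \in (0,1) \quad \checkmark$$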
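The derivative check can also be done numerically. The sketch below assumes the Uniform$[0,1]$ CDF $F_X(x) = x$ on $[0,1]$ (0 below, 1 above) and estimates its slope with a central difference; the step size `h` is an arbitrary choice.

```python
def uniform_cdf(x):
    """CDF of Uniform[0, 1]: 0 below the interval, x inside, 1 above."""
    return min(max(x, 0.0), 1.0)


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Inside (0, 1) the slope of the CDF recovers the density 1;
# outside the interval the CDF is flat, so the density is 0.
inside = numerical_derivative(uniform_cdf, 0.3)
below = numerical_derivative(uniform_cdf, -0.5)
above = numerical_derivative(uniform_cdf, 1.5)

print(inside, below, above)
```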
Using CDF for Probability Calculations
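Because $F_X(x) = P(X \leq x)$, two identities hold for any random variable: $P(a < X \leq b) = F_X(b) - F_X(a)$ and $P(X > a) = 1 - F_X(a)$. The sketch below applies them to the fair die and to Uniform$[0,1]$; the helper names are assumptions for illustration.

```python
from fractions import Fraction


def die_cdf(x):
    """CDF of a fair six-sided die: (number of faces <= x) / 6."""
    faces = sum(1 for k in range(1, 7) if k <= x)
    return Fraction(faces, 6)


def uniform_cdf(x):
    """CDF of Uniform[0, 1]."""
    return min(max(x, 0.0), 1.0)


def prob_interval(cdf, a, b):
    """P(a < X <= b) = F(b) - F(a)."""
    return cdf(b) - cdf(a)


def prob_greater(cdf, a):
    """P(X > a) = 1 - F(a)."""
    return 1 - cdf(a)


die_3_or_4 = prob_interval(die_cdf, 2, 4)       # P(2 < X <= 4)
die_above_4 = prob_greater(die_cdf, 4)          # P(X > 4)
unif_mid = prob_interval(uniform_cdf, 0.2, 0.5)  # P(0.2 < X <= 0.5)

print(die_3_or_4, die_above_4, unif_mid)
```

Note that the left endpoint is excluded in `prob_interval`. For the die this matters (face 2 is not counted); for the uniform case it makes no difference.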
Comparison: Discrete vs Continuous
Property | Discrete | Continuous |
---|---|---|
Values | Countable set | Interval (uncountable) |
Distribution | PMF (Probability Mass) | PDF (Probability Density) |
Exact probability | $P(X = x) = p_X(x)$ | $P(X = x) = 0$ |
Interval probability | $\sum_{k=a}^b p_X(k)$ | $\int_a^b f_X(x) \, dx$ |
Normalization | $\sum_{\text{all } x} p_X(x) = 1$ | $\int_{-\infty}^{\infty} f_X(x) \, dx = 1$ |
CDF formula | $F_X(x) = \sum_{k \leq x} p_X(k)$ | $F_X(x) = \int_{-\infty}^x f_X(t) \, dt$ |
CDF shape | Step function (jumps) | Continuous curve (no jumps) |
Examples | Coin flips, dice, counts | Height, time, temperature |
Key Insights
Mapping outcomes to numbers allows us to use calculus and algebra to analyze random events mathematically.
PMF vs PDF - Critical Difference:
PMF gives exact probabilities ($P(X=x)$), but PDF gives densities. For continuous variables, $P(X=x) = 0$ always!
PDF Can Exceed 1:
A common misconception is that PDF values must be at most 1. They can exceed 1 because they are densities, not probabilities. Only integrals (areas) are probabilities.
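One way to see this is a uniform distribution on a short interval. The sketch below assumes the interval $[0, 0.5]$ (chosen only for illustration), where the density is $2$ everywhere on the support, yet the total area is still 1.

```python
def narrow_uniform_pdf(x):
    """Density of Uniform[0, 0.5]: height 1 / 0.5 = 2 on the interval."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0


def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with a midpoint Riemann sum."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


density = narrow_uniform_pdf(0.25)                    # a density value above 1
total_area = integrate(narrow_uniform_pdf, 0.0, 0.5)  # still about 1
p_small = integrate(narrow_uniform_pdf, 0.1, 0.2)     # P(0.1 <= X <= 0.2)

print(density, total_area, p_small)
```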
CDF is Universal:
The CDF works for both discrete and continuous variables. It is always non-decreasing and goes from 0 to 1. The PMF can be recovered from its jumps, and the PDF from its derivative.
Discrete: Sum, Continuous: Integrate:
This is the fundamental difference in calculations. Discrete uses sums ($\sum$), continuous uses integrals ($\int$).
Endpoints Don't Matter for Continuous:
$P(a < X < b) = P(a \leq X \leq b)$ for continuous variables (since $P(X=a) = 0$). This is NOT true for discrete!
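The fair die shows why endpoints matter in the discrete case: including the endpoints 2 and 5 adds two faces' worth of probability. A minimal sketch with exact fractions:

```python
from fractions import Fraction

# PMF of a fair six-sided die.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

# P(2 < X < 5): only faces 3 and 4 qualify.
open_interval = sum(p for k, p in pmf.items() if 2 < k < 5)

# P(2 <= X <= 5): faces 2, 3, 4, and 5 qualify.
closed_interval = sum(p for k, p in pmf.items() if 2 <= k <= 5)

print(open_interval, closed_interval)
```

The difference between the two is exactly $P(X = 2) + P(X = 5)$, which is zero for a continuous variable but $\frac{1}{3}$ here.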
CDF Always Defined:
Even for mixed distributions (part discrete, part continuous), CDF is always well-defined. It's the most general description.
Visualization:
PMF: vertical bars at discrete points
PDF: smooth curve, area under curve = probability
CDF (discrete): step function
CDF (continuous): continuous curve rising from 0 to 1 (often S-shaped)