Random Variables

Understanding discrete and continuous random variables, PMF, PDF, and CDF

What is a Random Variable?

🎲 Random Variable - Mapping Outcomes to Numbers

Definition: A function that assigns a numerical value to each outcome in a sample space

Notation: Usually denoted by capital letters: $X$, $Y$, $Z$

Purpose: Allows us to perform mathematical operations on random events

Example: $X$ = number of heads in 3 coin flips (maps outcomes like HHT → 2)

Formal Definition
$$X: \Omega \rightarrow \mathbb{R}$$
Where:
$$\Omega = \text{sample space (set of all possible outcomes)}$$
$$\mathbb{R} = \text{real numbers}$$
$$X(\omega) = \text{numerical value assigned to outcome } \omega$$
Concrete Example - Coin Flip:
$$\text{Sample space: } \Omega = \{H, T\}$$
$$\text{Random variable: } X(H) = 1, \quad X(T) = 0$$
$$\text{Now we can do math: } P(X = 1) = 0.5, \quad E[X] = 0.5, \text{ etc.}$$
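This mapping really is just a function, which a few lines of Python make concrete. A minimal sketch of the coin-flip example above (the name `X` mirrors the notation; nothing here is a library API):

```python
# A random variable is just a function from outcomes to numbers.
def X(outcome):
    """Coin-flip random variable: H -> 1, T -> 0."""
    return 1 if outcome == "H" else 0

sample_space = ["H", "T"]  # Omega, with equally likely outcomes

# P(X = 1): fraction of outcomes mapped to 1.
p_X_equals_1 = sum(1 for w in sample_space if X(w) == 1) / len(sample_space)

# E[X]: sum of X(w) * P(w) over the sample space.
expected_X = sum(X(w) / len(sample_space) for w in sample_space)

print(p_X_equals_1, expected_X)  # 0.5 0.5
```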

Intuitive Examples

🎲 Rolling a Die
Sample Space: $$\{1, 2, 3, 4, 5, 6\}$$
Random Variable $X$: The number shown
Values: $$X \in \{1, 2, 3, 4, 5, 6\}$$
Type: Discrete
🌡️ Temperature
Sample Space: All possible temperature readings
Random Variable $X$: Temperature in °C
Values: $$X \in \mathbb{R} \quad \text{(any real number)}$$
Type: Continuous
📞 Phone Calls
Sample Space: All possible call patterns
Random Variable $X$: Number of calls per hour
Values: $$X \in \{0, 1, 2, 3, ...\}$$
Type: Discrete
⏱️ Waiting Time
Sample Space: All possible wait times
Random Variable $X$: Time until bus arrives (minutes)
Values: $$X \in [0, \infty)$$
Type: Continuous

Discrete Random Variables

📊 Discrete Random Variables - Countable Values

Definition: Takes on a countable number of distinct values

Examples: Number of heads in coin flips, count of defects, number of customers

Key property: Can list all possible values (finite or countably infinite)

Probability Mass Function (PMF)
$$p_X(x) = P(X = x)$$
Definition: $$\text{Probability that random variable } X \text{ takes value } x$$
Properties:
1. Non-negative: $$p_X(x) \geq 0 \text{ for all } x$$
2. Sums to 1: $$\sum_{\text{all } x} p_X(x) = 1$$
3. Only defined at discrete points
Notation: $$\text{Also written as } P(X = x) \text{ or } f_X(x)$$ $$\text{"Mass" because probability is concentrated at discrete points}$$
Interpretation: $$\text{PMF gives exact probabilities for discrete values}$$

Example: Fair Die Roll

Random Variable $X$: Outcome of rolling a fair six-sided die

PMF: $$p_X(x) = \begin{cases} \frac{1}{6} & \text{if } x \in \{1, 2, 3, 4, 5, 6\} \\ 0 & \text{otherwise} \end{cases}$$
Verification: $$\sum_{x=1}^{6} p_X(x) = 6 \times \frac{1}{6} = 1 \quad \checkmark$$
Calculations:
$$P(X = 3) = \frac{1}{6} \approx 0.167$$
$$P(X \leq 2) = P(X=1) + P(X=2) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$$
$$P(X > 4) = P(X=5) + P(X=6) = \frac{2}{6} = \frac{1}{3}$$
$$P(X \text{ is even}) = P(X=2) + P(X=4) + P(X=6) = \frac{1}{2}$$
Visual Representation: $$\text{PMF would be plotted as vertical bars at } x = 1, 2, 3, 4, 5, 6, \text{ each with height } \frac{1}{6}$$
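These sums are easy to reproduce with a dictionary PMF; `fractions.Fraction` keeps the arithmetic exact (a sketch with illustrative names, not a library API):

```python
from fractions import Fraction

# PMF of a fair die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(pmf.values()) == 1  # property 2: probabilities sum to 1

# The calculations above, written as sums over the PMF:
p_at_most_2 = sum(p for x, p in pmf.items() if x <= 2)  # P(X <= 2)
p_greater_4 = sum(p for x, p in pmf.items() if x > 4)   # P(X > 4)
p_even = sum(p for x, p in pmf.items() if x % 2 == 0)   # P(X is even)

print(p_at_most_2, p_greater_4, p_even)  # 1/3 1/3 1/2
```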

Example: Number of Heads in 3 Coin Flips

Random Variable $X$: Number of heads when flipping a fair coin 3 times

Possible values: $$X \in \{0, 1, 2, 3\}$$
Sample space: $$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\} \quad \text{(8 outcomes)}$$
PMF Calculation:
$$P(X = 0) = P(\{TTT\}) = \frac{1}{8}$$
$$P(X = 1) = P(\{HTT, THT, TTH\}) = \frac{3}{8}$$
$$P(X = 2) = P(\{HHT, HTH, THH\}) = \frac{3}{8}$$
$$P(X = 3) = P(\{HHH\}) = \frac{1}{8}$$
| $x$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| $p_X(x)$ | $\frac{1}{8}$ | $\frac{3}{8}$ | $\frac{3}{8}$ | $\frac{1}{8}$ |
| Decimal | 0.125 | 0.375 | 0.375 | 0.125 |
Verification: $$\frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = 1 \quad \checkmark$$
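The same PMF falls out of brute-force enumeration of the sample space, which makes for a quick sanity check (a Python sketch):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of 3 coin flips.
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each number of heads, then normalize.
counts = Counter(seq.count("H") for seq in outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in counts.items()}

print(sorted(pmf.items()))
# [(0, Fraction(1, 8)), (1, Fraction(3, 8)), (2, Fraction(3, 8)), (3, Fraction(1, 8))]
assert sum(pmf.values()) == 1
```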

Continuous Random Variables

📈 Continuous Random Variables - Uncountable Values

Definition: Takes on any value in an interval (uncountably infinite values)

Examples: Height, weight, time, temperature, voltage

Key property: Cannot list all values - must use intervals

Important: $P(X = x) = 0$ for any specific value! (Probability is over intervals)

Probability Density Function (PDF)
$$f_X(x) \geq 0, \quad \int_{-\infty}^{\infty} f_X(x) \, dx = 1$$
Definition: $$\text{Function whose integral over an interval gives probability}$$
Key Difference from PMF:
$$\text{PMF gives exact probabilities: } P(X = x)$$
$$\text{PDF does NOT give probabilities directly!}$$
$$f_X(x) \text{ is a probability \textit{density}, not a probability}$$
$$f_X(x) \text{ can be } > 1 \text{ (densities are not probabilities!)}$$
Probability Calculation:
$$P(a \leq X \leq b) = \int_a^b f_X(x) \, dx$$
$$\text{Probability is the \textit{area under the curve}}$$
Properties:
1. Non-negative: $$f_X(x) \geq 0 \text{ for all } x$$
2. Integrates to 1: $$\int_{-\infty}^{\infty} f_X(x) \, dx = 1$$
3. Zero at any point: $$P(X = x) = 0 \text{ for any specific } x \text{ (zero area at a point)}$$
4. Endpoints don't matter: $$P(a \leq X \leq b) = P(a < X < b)$$

Example: Uniform Distribution on [0, 1]

Random Variable $X$: Uniformly distributed on interval $[0, 1]$

PDF: $$f_X(x) = \begin{cases} 1 & \text{if } 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}$$
Verification: $$\int_{-\infty}^{\infty} f_X(x) \, dx = \int_0^1 1 \, dx = 1 \quad \checkmark$$
Probability Calculations:
$$P(X = 0.5) = 0 \quad \text{(probability at a single point is zero)}$$
$$P(0.2 \leq X \leq 0.7) = \int_{0.2}^{0.7} 1 \, dx = 0.7 - 0.2 = 0.5$$
$$P(X \leq 0.3) = \int_0^{0.3} 1 \, dx = 0.3$$
$$P(X > 0.8) = \int_{0.8}^1 1 \, dx = 0.2$$
Interpretation:
$$\text{PDF is constant (height = 1) over } [0,1]$$
$$\text{Probability of any interval is proportional to its length}$$
$$\text{The area under the curve from 0.2 to 0.7 is } 1 \times 0.5 = 0.5$$
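The interval probabilities above can also be approximated by numerically integrating the PDF, mirroring the definition $P(a \leq X \leq b) = \int_a^b f_X(x)\,dx$. A sketch using a simple midpoint rule (the helper names are illustrative):

```python
def uniform_pdf(x):
    """PDF of Uniform[0, 1]: height 1 on the interval, 0 elsewhere."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def prob_interval(pdf, a, b, n=10_000):
    """Approximate P(a <= X <= b) by midpoint-rule integration of the PDF."""
    h = (b - a) / n
    return sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h

print(prob_interval(uniform_pdf, 0.2, 0.7))  # ~0.5
print(prob_interval(uniform_pdf, 0.0, 0.3))  # ~0.3
print(prob_interval(uniform_pdf, 0.8, 1.0))  # ~0.2
```

For this constant PDF the midpoint rule is exact up to rounding; for a curved PDF it would only approximate the area.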

Example: Exponential Distribution

Use case: Modeling waiting times, time until failure (memoryless process)

PDF: $$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}$$ $$\text{where } \lambda > 0 \text{ is the rate parameter}$$
Example with $\lambda = 1$:
$$P(X \leq 2) = \int_0^2 e^{-x} \, dx = [-e^{-x}]_0^2 = 1 - e^{-2} \approx 0.865$$
$$P(X > 1) = \int_1^{\infty} e^{-x} \, dx = e^{-1} \approx 0.368$$
$$P(0.5 \leq X \leq 1.5) = \int_{0.5}^{1.5} e^{-x} \, dx = e^{-0.5} - e^{-1.5} \approx 0.383$$
Note: $$\text{The PDF } f_X(0) = \lambda \text{ can be greater than 1 if } \lambda > 1$$ $$\text{This is okay because PDF is a density, not a probability!}$$
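These numbers can be checked in Python with the closed-form CDF $F_X(x) = 1 - e^{-\lambda x}$, which follows from integrating the PDF (the helper name is illustrative):

```python
import math

def expon_cdf(x, lam=1.0):
    """CDF of the exponential distribution: P(X <= x) = 1 - e^(-lambda*x)."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

# The calculations above, with lambda = 1:
print(expon_cdf(2.0))                   # 1 - e^-2, ~0.865
print(1.0 - expon_cdf(1.0))             # P(X > 1) = e^-1, ~0.368
print(expon_cdf(1.5) - expon_cdf(0.5))  # P(0.5 <= X <= 1.5), ~0.383

# The density at 0 equals lambda, which exceeds 1 whenever lambda > 1:
print(3.0 * math.exp(-3.0 * 0.0))  # PDF at 0 with lambda = 3 is 3.0
```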

Cumulative Distribution Function (CDF)

📊 CDF - The Universal Description

Works for both: Discrete and continuous random variables

Key property: Always non-decreasing, ranges from 0 to 1

Advantage: Unified way to describe any random variable

CDF Definition
$$F_X(x) = P(X \leq x)$$
Interpretation: $$\text{Probability that } X \text{ is at most } x$$
Properties:
1. Non-decreasing: $$\text{If } x_1 < x_2, \text{ then } F_X(x_1) \leq F_X(x_2)$$
2. Right-continuous: $$\lim_{h \to 0^+} F_X(x+h) = F_X(x)$$
3. Limits: $$\lim_{x \to -\infty} F_X(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} F_X(x) = 1$$
4. Range: $$0 \leq F_X(x) \leq 1 \text{ for all } x$$
Key Insight: $$\text{CDF is defined for ALL random variables (discrete, continuous, or mixed)}$$
Relationship: PMF/PDF ↔ CDF
For Discrete Random Variables:
From PMF to CDF: $$F_X(x) = \sum_{k \leq x} p_X(k)$$
From CDF to PMF: $$p_X(x) = F_X(x) - F_X(x^-)$$ $$\text{where } F_X(x^-) \text{ is the left limit}$$
For Continuous Random Variables:
From PDF to CDF: $$F_X(x) = \int_{-\infty}^x f_X(t) \, dt$$
From CDF to PDF: $$f_X(x) = \frac{dF_X(x)}{dx}$$ $$\text{(derivative of CDF is PDF)}$$
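For the discrete direction, the conversions are just a running sum and its jump sizes. A Python sketch using the fair die; note that taking `x - 1` as the left limit works here only because the die's values are integers at least 1 apart:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

def cdf(x, pmf):
    """F_X(x) = sum of p_X(k) over all k <= x."""
    return sum(p for k, p in pmf.items() if k <= x)

# Recover the PMF from the CDF jumps: p_X(x) = F_X(x) - F_X(x^-).
recovered = {x: cdf(x, pmf) - cdf(x - 1, pmf) for x in pmf}
assert recovered == pmf

print(cdf(2.5, pmf))  # 1/3, the same as P(X <= 2)
```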

Example: CDF for Fair Die

Random Variable $X$: Outcome of fair six-sided die

PMF: $$p_X(k) = \frac{1}{6} \quad \text{for } k \in \{1,2,3,4,5,6\}$$
CDF: $$F_X(x) = \begin{cases} 0 & \text{if } x < 1 \\ \frac{1}{6} & \text{if } 1 \leq x < 2 \\ \frac{2}{6} & \text{if } 2 \leq x < 3 \\ \frac{3}{6} & \text{if } 3 \leq x < 4 \\ \frac{4}{6} & \text{if } 4 \leq x < 5 \\ \frac{5}{6} & \text{if } 5 \leq x < 6 \\ 1 & \text{if } x \geq 6 \end{cases}$$
| $x$ | $F_X(x) = P(X \leq x)$ | Interpretation |
|---|---|---|
| $x = 2.5$ | $\frac{2}{6} \approx 0.333$ | Prob. of rolling ≤ 2 |
| $x = 4$ | $\frac{4}{6} \approx 0.667$ | Prob. of rolling ≤ 4 |
| $x = 10$ | $1$ | Certain (all outcomes ≤ 10) |
Visualization:
$$\text{CDF for discrete variables is a step function with jumps at each possible value}$$
$$\text{Jump size equals PMF at that point}$$

Example: CDF for Uniform[0,1]

Random Variable $X$: Uniform on $[0,1]$

PDF: $$f_X(x) = 1 \quad \text{for } x \in [0,1]$$
CDF: $$F_X(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \leq x \leq 1 \\ 1 & \text{if } x > 1 \end{cases}$$
Derivation for $0 \leq x \leq 1$: $$F_X(x) = \int_{-\infty}^x f_X(t) \, dt = \int_0^x 1 \, dt = x$$
Examples:
$$F_X(0.3) = 0.3 \quad \rightarrow \quad \text{30% chance } X \leq 0.3$$
$$F_X(0.7) = 0.7 \quad \rightarrow \quad \text{70% chance } X \leq 0.7$$
$$F_X(-1) = 0 \quad \rightarrow \quad \text{Impossible for } X \leq -1$$
$$F_X(2) = 1 \quad \rightarrow \quad \text{Certain that } X \leq 2$$
Verification (PDF from CDF):
$$f_X(x) = \frac{dF_X(x)}{dx} = \frac{d(x)}{dx} = 1 \quad \text{for } x \in [0,1] \quad \checkmark$$
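The same check can be done numerically: a central difference applied to the CDF recovers the PDF at interior points of $[0,1]$ (a sketch; function names are illustrative):

```python
def uniform_cdf(x):
    """CDF of Uniform[0, 1]: 0 below 0, x on [0, 1], 1 above 1."""
    return min(max(x, 0.0), 1.0)

def pdf_from_cdf(cdf, x, h=1e-6):
    """Approximate f_X(x) = dF_X/dx with a central difference."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)

print(uniform_cdf(0.3))   # 0.3
print(uniform_cdf(-1.0))  # 0.0
print(uniform_cdf(2.0))   # 1.0
print(pdf_from_cdf(uniform_cdf, 0.5))  # ~1.0
```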

Using CDF for Probability Calculations

Important Formulas
$$P(a < X \leq b) = F_X(b) - F_X(a)$$
$$P(X > a) = 1 - F_X(a)$$
$$P(X < a) = F_X(a^-)$$
Example with Uniform[0,1]:
$$P(0.2 < X \leq 0.7) = F_X(0.7) - F_X(0.2) = 0.7 - 0.2 = 0.5$$
$$P(X > 0.6) = 1 - F_X(0.6) = 1 - 0.6 = 0.4$$
$$P(X \leq 0.25) = F_X(0.25) = 0.25$$
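Applied to the Uniform[0,1] CDF from the previous example, these formulas become one-liners (floating-point results only approximate the exact fractions):

```python
def F(x):
    """CDF of Uniform[0, 1] (the example distribution above)."""
    return min(max(x, 0.0), 1.0)

p_between = F(0.7) - F(0.2)  # P(0.2 < X <= 0.7), ~0.5
p_greater = 1.0 - F(0.6)     # P(X > 0.6), ~0.4
p_at_most = F(0.25)          # P(X <= 0.25) = 0.25

print(p_between, p_greater, p_at_most)
```

For a continuous variable $F_X(a^-) = F_X(a)$, so `P(X < a)` and `P(X <= a)` coincide here.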

Comparison: Discrete vs Continuous

| Property | Discrete | Continuous |
|---|---|---|
| Values | Countable set | Interval (uncountable) |
| Distribution | PMF (Probability Mass) | PDF (Probability Density) |
| Exact probability | $P(X = x) = p_X(x)$ | $P(X = x) = 0$ |
| Interval probability | $\sum_{k=a}^b p_X(k)$ | $\int_a^b f_X(x) \, dx$ |
| Normalization | $\sum_{\text{all } x} p_X(x) = 1$ | $\int_{-\infty}^{\infty} f_X(x) \, dx = 1$ |
| CDF formula | $F_X(x) = \sum_{k \leq x} p_X(k)$ | $F_X(x) = \int_{-\infty}^x f_X(t) \, dt$ |
| CDF shape | Step function (jumps) | Smooth curve |
| Examples | Coin flips, dice, counts | Height, time, temperature |

Key Insights

Random Variables Map Outcomes to Numbers:
This allows us to use calculus and algebra to analyze random events mathematically.

PMF vs PDF - Critical Difference:
PMF gives exact probabilities ($P(X=x)$), but PDF gives densities. For continuous variables, $P(X=x) = 0$ always!

PDF Can Exceed 1:
Common misconception - PDF values can be > 1 because they are densities, not probabilities. Only integrals (areas) are probabilities.

CDF is Universal:
Works for both discrete and continuous. Always non-decreasing, goes from 0 to 1. You can always recover PMF/PDF from CDF.

Discrete: Sum, Continuous: Integrate:
This is the fundamental difference in calculations. Discrete uses sums ($\sum$), continuous uses integrals ($\int$).

Endpoints Don't Matter for Continuous:
$P(a < X < b) = P(a \leq X \leq b)$ for continuous variables (since $P(X=a) = 0$). This is NOT true for discrete!

CDF Always Defined:
Even for mixed distributions (part discrete, part continuous), CDF is always well-defined. It's the most general description.

Visualization:
PMF: vertical bars at discrete points
PDF: smooth curve, area under curve = probability
CDF (discrete): step function
CDF (continuous): smooth S-shaped curve