04: Probability Distributions

Why Probability Distributions Matter

A probability distribution is a mathematical model that describes how outcomes are spread across possible values.

| What It Does | Business Application |
|---|---|
| Models frequency of events | Customer arrivals per hour (Poisson) |
| Quantifies risk | Stock returns (Normal / t-distribution) |
| Enables prediction | Default probability (Binomial) |
| Justifies inference | Sample mean behavior (CLT → Normal) |

Key idea: Once you identify the right distribution, you unlock a full toolkit of probabilities, expectations, and confidence intervals.

Random Variables: From Outcomes to Numbers

A random variable \(X\) maps sample space outcomes to real numbers.

| Type | Example | Values |
|---|---|---|
| Discrete | Number of defaults in portfolio | 0, 1, 2, … |
| Continuous | Daily stock return | Any real number |

Notation:

  • \(P(X = x)\) — probability mass (discrete)
  • \(f(x)\) — probability density (continuous)
  • \(F(x) = P(X \leq x)\) — cumulative distribution function (CDF)

Key distinction: For continuous \(X\), \(P(X = x) = 0\) for any single point. We only talk about intervals: \(P(a < X < b) = \int_a^b f(x)\,dx\).
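A minimal Python sketch makes the interval idea concrete: \(P(a < X < b)\) is the integral of the density over \((a, b)\). The Uniform(0, 10) density here is a hypothetical example.

```python
# P(a < X < b) = integral of the density f over (a, b).
# Hypothetical example: X ~ Uniform(0, 10), so f(x) = 0.1 on [0, 10].
def f(x):
    return 0.1 if 0 <= x <= 10 else 0.0

# Midpoint-rule numerical integration over (2, 5):
a, b, steps = 2.0, 5.0, 10_000
dx = (b - a) / steps
prob = sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx  # ≈ 0.3
```

Note that \(P(X = 2)\) itself is 0; only intervals carry probability.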

Discrete Distributions: The Probability Mass Function (PMF)

For a discrete random variable \(X\):

\[ \large{ P(X = x_i) = p_i, \quad \sum_i p_i = 1 } \]

Expected Value:

\[ \large{ E[X] = \sum_i x_i \cdot p_i } \]

Variance:

\[ \large{ \text{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2 } \]
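In code, these definitions are one-liners; a sketch using a hypothetical four-point PMF:

```python
# Expected value and variance from a PMF (hypothetical four-point example).
pmf = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.3}  # values -> probabilities; sums to 1

mean = sum(x * p for x, p in pmf.items())              # E[X] ≈ 2.9
second_moment = sum(x**2 * p for x, p in pmf.items())  # E[X^2]
variance = second_moment - mean**2                     # E[X^2] - (E[X])^2 ≈ 0.89
```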

The Binomial Distribution: Counting Successes

Setting: \(n\) independent trials, each with success probability \(p\).

\[ \large{ P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}, \quad k = 0, 1, \ldots, n } \]

Moments:

  • \(E[X] = np\)
  • \(\text{Var}(X) = np(1-p)\)

Financial Example: Portfolio of 100 bonds, each with 5% default probability:

  • Expected defaults: \(E[X] = 100 \times 0.05 = 5\)
  • Std dev: \(\sqrt{100 \times 0.05 \times 0.95} \approx 2.18\)
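A sketch of the bond example in Python, using the PMF formula above (`math.comb` is the standard-library binomial coefficient):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 100, 0.05                 # 100 bonds, 5% default probability each
mean = n * p                     # expected defaults: 5
sd = math.sqrt(n * p * (1 - p))  # standard deviation: ≈ 2.18
prob_at_most_2 = sum(binom_pmf(k, n, p) for k in range(3))  # P(X <= 2)
```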

Case: Converting Website Visitors to Customers

Setup: 1,000 visitors/day, conversion rate = 3%.

  • Number of conversions \(X \sim \text{Binomial}(n=1000, p=0.03)\)
  • \(E[X] = 30\) conversions
  • \(\text{SD}(X) = \sqrt{1000 \times 0.03 \times 0.97} \approx 5.39\)

Business question: What is \(P(X < 20)\)?

Using the Normal approximation:

\[ Z = \frac{20 - 30}{5.39} \approx -1.86 \implies P(X < 20) \approx 3.1\% \]

Fewer than 20 conversions is a rare event — investigate if it occurs!
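The approximation is easy to reproduce; the standard library exposes the error function, from which the standard normal CDF follows. A sketch with the case's numbers:

```python
import math

def normal_cdf(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu = 1000 * 0.03                       # 30 expected conversions
sigma = math.sqrt(1000 * 0.03 * 0.97)  # ≈ 5.39
z = (20 - mu) / sigma                  # standardize 20 conversions
prob = normal_cdf(z)                   # ≈ 0.03, i.e. about 3%
```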

The Poisson Distribution: Counting Rare Events

Setting: Events occur randomly at an average rate \(\lambda\) per time interval.

\[ \large{ P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots } \]

Remarkable property: \(E[X] = \text{Var}(X) = \lambda\)

Financial Applications:

| Event | \(\lambda\) (per period) |
|---|---|
| Customer complaints per day | 4.2 |
| Trading system failures per month | 1.5 |
| Credit defaults per quarter | 2.8 |
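A quick numerical check of the remarkable property, using the complaints rate above (a sketch; the sums are truncated at \(k = 100\), where the Poisson tail is negligible):

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.2        # customer complaints per day
ks = range(100)  # tail beyond k = 100 is negligible for lam = 4.2
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum(k**2 * poisson_pmf(k, lam) for k in ks) - mean**2
# Both come out equal to lam, as claimed.
```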

Poisson as the Limit of Binomial

When \(n\) is large, \(p\) is small, and \(\lambda = np\) is moderate:

\[ \large{ \binom{n}{k}p^k(1-p)^{n-k} \xrightarrow{n\to\infty} \frac{e^{-\lambda}\lambda^k}{k!} } \]

The derivation sketch:

  1. \(\binom{n}{k} \approx \frac{n^k}{k!}\) for large \(n\)
  2. \(p^k = \left(\frac{\lambda}{n}\right)^k\)
  3. \((1-p)^{n-k} \approx e^{-\lambda}\)

Rule of thumb: Use Poisson when \(n > 100\) and \(p < 0.01\).
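The limit can also be seen numerically; a sketch comparing the two PMFs for a hypothetical large-\(n\), small-\(p\) setting with \(\lambda = np = 5\):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical setting: n large, p small, lambda = np = 5.
n, p = 1000, 0.005
lam = n * p
max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
              for k in range(30))
# The two PMFs agree to within a few thousandths at every k.
```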

Continuous Distributions: The Normal Distribution

\[ \large{ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) } \]

The 68-95-99.7 Rule:

| Range | Probability |
|---|---|
| \(\mu \pm 1\sigma\) | 68.3% |
| \(\mu \pm 2\sigma\) | 95.4% |
| \(\mu \pm 3\sigma\) | 99.7% |

Standardization: \(Z = \frac{X - \mu}{\sigma} \sim N(0,1)\)
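The rule follows directly from the standard normal CDF; a sketch:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# After standardizing, P(mu - k*sigma < X < mu + k*sigma) depends only on k:
within = {k: normal_cdf(k) - normal_cdf(-k) for k in (1, 2, 3)}
# within[1] ≈ 0.683, within[2] ≈ 0.954, within[3] ≈ 0.997
```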

Why the Normal Distribution Dominates Statistics

Four reasons:

  1. The Central Limit Theorem (later in this chapter) — sample means are approximately normal
  2. Mathematical convenience — closed-form likelihood, conjugate prior
  3. Maximum entropy — among all distributions with given mean and variance, the normal has maximum entropy (least assumptions)
  4. Historical momentum — Gauss used it; it became the default

But beware: Stock returns are NOT truly normal (fat tails, skewness). Using normal models for risk management led to massive underestimation of tail risk in 2008.

The Exponential Distribution: Time Between Events

\[ \large{ f(x) = \lambda e^{-\lambda x}, \quad x \geq 0 } \]

Moments: \(E[X] = \frac{1}{\lambda}\), \(\text{Var}(X) = \frac{1}{\lambda^2}\)

Memoryless Property:

\[ \large{ P(X > s + t \mid X > s) = P(X > t) } \]

“Given you’ve already waited \(s\) minutes, the probability of waiting at least \(t\) more is the same as if you just started.”

Application: Time between customer arrivals, system failures, or transactions.
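The memoryless property can be verified from the survival function \(P(X > x) = e^{-\lambda x}\); a sketch with a hypothetical arrival rate:

```python
import math

lam = 0.5  # hypothetical rate: 0.5 arrivals per minute

def survival(x):
    """P(X > x) for X ~ Exponential(lam)."""
    return math.exp(-lam * x)

s, t = 3.0, 2.0
conditional = survival(s + t) / survival(s)  # P(X > s + t | X > s)
unconditional = survival(t)                  # P(X > t)
# The two agree: the exponential "forgets" the s minutes already waited.
```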

The Central Limit Theorem: The Most Important Theorem in Statistics

Statement: If \(X_1, X_2, \ldots, X_n\) are i.i.d. with mean \(\mu\) and variance \(\sigma^2\), then as \(n \to \infty\):

\[ \large{ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0,1) } \]

In words: The sample mean is approximately normal for large \(n\), regardless of the original distribution.

This is revolutionary: You don’t need to know the population distribution to do inference!

CLT: Visual Intuition

[Figure: CLT demonstration. Three rows show sample means from a Uniform, an Exponential, and a Bimodal source distribution at n = 5 and n = 30; all converge to the bell curve as n increases.]
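The same convergence can be checked by simulation; a sketch drawing sample means from a heavily skewed exponential source (seed and sizes are arbitrary choices):

```python
import math
import random

random.seed(42)  # arbitrary seed, for reproducibility

# Sample means from a heavily skewed source: Exponential with mean 1.
n, reps = 30, 20_000
means = [sum(random.expovariate(1.0) for _ in range(n)) / n
         for _ in range(reps)]

grand_mean = sum(means) / reps
se = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / reps)
# CLT predictions: grand_mean ≈ mu = 1, se ≈ sigma/sqrt(n) = 1/sqrt(30) ≈ 0.183
```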

How Large Must \(n\) Be? Practical Guidelines

| Source Population Shape | Minimum \(n\) for CLT |
|---|---|
| Symmetric (e.g., uniform) | ≥ 15 |
| Moderately skewed | ≥ 30 |
| Heavily skewed / outliers | ≥ 50–100 |

Financial data warning: Stock returns have fat tails, so:

  • Daily returns → \(n \geq 50\) recommended
  • Monthly returns → \(n \geq 30\) usually sufficient
  • For extreme quantiles (VaR) → CLT is inadequate; use Bootstrap

‘Dirty Work’: Mediocristan vs. Extremistan

Nassim Taleb’s classification of random phenomena:

| Characteristic | Mediocristan | Extremistan |
|---|---|---|
| Tail behavior | Thin (exponential decay) | Fat (power law) |
| Extreme events | Negligible impact | Dominate the total |
| CLT applies? | Yes | No (or slowly) |
| Example | Height, weight, IQ | Wealth, city size, book sales |
| Financial analog | Interest on savings | Venture capital returns |

The 80/20 Rule: In Extremistan, 20% of causes produce 80% of effects. The top 1% of stocks drive a disproportionate share of index returns.

Sampling Distribution of the Sample Mean

If \(X_1, \ldots, X_n \overset{iid}{\sim} (\mu, \sigma^2)\), then the sample mean:

\[ \large{ \bar{X} \sim \left(\mu, \frac{\sigma^2}{n}\right) } \]

Key implications:

  1. Unbiased: \(E[\bar{X}] = \mu\) — the sample mean targets the population mean
  2. Precision increases with \(n\): \(\text{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}\)
  3. To halve the standard error, you need 4× the sample size
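A tiny sketch of implications 2 and 3 (the population \(\sigma\) is a hypothetical value):

```python
import math

sigma = 12.0  # hypothetical population standard deviation

def standard_error(n):
    """SE of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

se_100 = standard_error(100)  # 1.2
se_400 = standard_error(400)  # 0.6: quadrupling n halves the SE
```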

The Chi-Square Distribution

If \(Z_1, \ldots, Z_k \overset{iid}{\sim} N(0,1)\):

\[ \large{ \chi^2_k = Z_1^2 + Z_2^2 + \cdots + Z_k^2 } \]

Key properties:

  • \(E[\chi^2_k] = k\), \(\text{Var}(\chi^2_k) = 2k\)
  • Right-skewed, but approaches normality as \(k\) increases
  • Application: testing variance, goodness-of-fit tests

Connection to variance: for a sample of \(n\) observations from a normal population,

\[ \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \]
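The definition and moments can be checked by simulation (a sketch; seed and replication count are arbitrary):

```python
import random

random.seed(7)  # arbitrary seed

k, reps = 5, 50_000
# Each draw is a sum of k squared standard normals, i.e. one chi^2_k sample.
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))
         for _ in range(reps)]

mean = sum(draws) / reps                          # should be ≈ k = 5
var = sum((d - mean) ** 2 for d in draws) / reps  # should be ≈ 2k = 10
```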

The t-Distribution: For Small Samples with Unknown Variance

\[ \large{ t_\nu = \frac{Z}{\sqrt{\chi^2_\nu / \nu}}, \quad Z \sim N(0,1), \quad \chi^2_\nu \text{ independent} } \]

Compared to the Normal:

| Feature | Normal | t-distribution |
|---|---|---|
| Tails | Thin | Heavier |
| Shape parameter | None | \(\nu\) (degrees of freedom) |
| As \(\nu \to \infty\) | | Converges to Normal |
| Use when | \(\sigma\) known or \(n\) large | \(\sigma\) unknown and \(n\) small |

Rule of thumb: Use \(t\) when \(n < 30\) and population variance is unknown (most real situations).

The F-Distribution: Comparing Two Variances

\[ \large{ F_{\nu_1,\nu_2} = \frac{\chi^2_{\nu_1}/\nu_1}{\chi^2_{\nu_2}/\nu_2} } \]

Key application: Testing whether two normal populations have equal variance. With \(\nu_1 = n_1 - 1\) and \(\nu_2 = n_2 - 1\), under \(H_0: \sigma_1^2 = \sigma_2^2\):

\[ F = \frac{s_1^2}{s_2^2} \sim F_{\nu_1, \nu_2} \]

The F-distribution also appears in:

  • ANOVA F-test (Chapter 9)
  • Regression overall significance test (Chapter 8)
  • Any hypothesis testing that compares variance ratios

The St. Petersburg Paradox: When Expected Value Breaks Down

The game: Flip a coin until heads. If heads appears on flip \(n\), you win \(2^n\) dollars.

Expected payoff:

\[ E[X] = \sum_{n=1}^{\infty}\frac{1}{2^n}\cdot 2^n = \sum_{n=1}^{\infty}1 = \infty \]

Yet nobody would pay more than ~$20 to play!

Resolution (Daniel Bernoulli, 1738): Use logarithmic utility \(U(x) = \ln(x)\):

\[ E[U(X)] = \sum_{n=1}^{\infty}\frac{1}{2^n}\cdot \ln(2^n) = \ln 2 \sum_{n=1}^{\infty}\frac{n}{2^n} = 2\ln 2 \approx 1.39 \]

The certainty equivalent is \(e^{2\ln 2} = \$4\): a finite, modest fair price, which resolves the paradox.
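Both computations are easy to reproduce; a sketch that truncates the infinite sums where the tails vanish:

```python
import math

# The expected payoff grows by 1 per term, so truncating at N terms gives N:
expected_payoff_40 = sum((1 / 2**n) * 2**n for n in range(1, 41))  # = 40.0

# Bernoulli's expected log-utility converges:
expected_utility = sum((1 / 2**n) * math.log(2**n) for n in range(1, 200))
# expected_utility ≈ 2 ln 2 ≈ 1.386
certainty_equivalent = math.exp(expected_utility)  # ≈ 4 dollars
```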

Benford’s Law: A Tool for Fraud Detection

Claim: In many natural datasets, the leading digit \(d\) follows:

\[ \large{ P(\text{first digit} = d) = \log_{10}\left(1 + \frac{1}{d}\right) } \]

| Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Prob | 30.1% | 17.6% | 12.5% | 9.7% | 7.9% | 6.7% | 5.8% | 5.1% | 4.6% |

Application: If financial statements deviate significantly from Benford’s Law, it may indicate data fabrication. Auditors use this as a screening tool.
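A sketch of the formula, plus an empirical check on a classic Benford-conforming dataset, the leading digits of powers of 2:

```python
import math

def benford(d):
    """Benford probability that the leading digit is d (d = 1, ..., 9)."""
    return math.log10(1 + 1 / d)

# The nine probabilities telescope and sum to exactly 1:
total = sum(benford(d) for d in range(1, 10))

# Empirical check: leading digits of 2^1, ..., 2^1000.
digits = [int(str(2**n)[0]) for n in range(1, 1001)]
freq_1 = digits.count(1) / 1000  # ≈ 0.301, matching benford(1)
```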

Chapter 4 Summary

Discrete Distributions:

  • Binomial (fixed trials, success counting) and Poisson (rare events, no fixed \(n\))

Continuous Distributions:

  • Normal (ubiquitous, CLT foundation), Exponential (waiting times)

The Central Limit Theorem:

  • Sample means → Normal, regardless of source — the foundation of all inference

Sampling Distributions:

  • \(\chi^2\) (variance testing), \(t\) (mean testing, small \(n\)), \(F\) (variance comparison, ANOVA)

Key Warning: CLT may not apply in Extremistan — fat-tailed data requires special methods.