A probability distribution is a mathematical model that describes how outcomes are spread across possible values.
| What It Does | Business Application |
|---|---|
| Models frequency of events | Customer arrivals per hour (Poisson) |
| Quantifies risk | Stock returns (Normal / t-distribution) |
| Enables prediction | Default probability (Binomial) |
| Justifies inference | Sample mean behavior (CLT → Normal) |
Key idea: Once you identify the right distribution, you unlock a full toolkit of probabilities, expectations, and confidence intervals.
A random variable \(X\) maps sample space outcomes to real numbers.
| Type | Example | Values |
|---|---|---|
| Discrete | Number of defaults in portfolio | 0, 1, 2, … |
| Continuous | Daily stock return | Any real number |
Notation: we write \(p(x) = P(X = x)\) for a discrete PMF, \(f(x)\) for a continuous density, and \(F(x) = P(X \leq x)\) for the CDF in both cases.
Key distinction: For continuous \(X\), \(P(X = x) = 0\) for any single point. We only talk about intervals: \(P(a < X < b) = \int_a^b f(x)\,dx\).
For a discrete random variable \(X\):
\[ \large{ P(X = x_i) = p_i, \quad \sum_i p_i = 1 } \]
Expected Value:
\[ \large{ E[X] = \sum_i x_i \cdot p_i } \]
Variance:
\[ \large{ \text{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2 } \]
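These two formulas can be checked directly on a small hand-made distribution (the values and probabilities below are purely illustrative):

```python
# Expected value and variance for a small discrete distribution.
# Illustrative PMF: X takes values 0, 1, 2 with the given probabilities.
values = [0, 1, 2]
probs = [0.5, 0.3, 0.2]

mean = sum(x * p for x, p in zip(values, probs))        # E[X]
mean_sq = sum(x**2 * p for x, p in zip(values, probs))  # E[X^2]
variance = mean_sq - mean**2                            # E[X^2] - (E[X])^2

print(mean, variance)
```

Here \(E[X] = 0.7\) and \(\text{Var}(X) = 1.1 - 0.49 = 0.61\), matching the shortcut formula.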
Setting: \(n\) independent trials, each with success probability \(p\).
\[ \large{ P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}, \quad k = 0, 1, \ldots, n } \]
Moments: \(E[X] = np\), \(\text{Var}(X) = np(1-p)\)
Financial Example: Portfolio of 100 bonds, each with 5% default probability: \(X \sim \text{Binomial}(100, 0.05)\), so the expected number of defaults is \(E[X] = np = 5\) with standard deviation \(\sqrt{np(1-p)} \approx 2.18\).
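A minimal sketch of this bond-portfolio calculation (the 10-default stress threshold is an illustrative choice, not part of the example above):

```python
import math

# Binomial model for bond defaults: n = 100 bonds, p = 0.05 default probability.
n, p = 100, 0.05

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

mean = n * p                     # E[X] = np = 5 expected defaults
sd = math.sqrt(n * p * (1 - p))  # sqrt(np(1-p)) ≈ 2.18

# Probability of 10 or more defaults (an illustrative stress scenario):
p_tail = 1 - sum(binom_pmf(k, n, p) for k in range(10))
print(f"E[X] = {mean}, sd = {sd:.2f}, P(X >= 10) = {p_tail:.4f}")
```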
Setup: 1,000 visitors/day, conversion rate = 3%.
Business question: What is \(P(X < 20)\)?
Using the Normal approximation with \(\mu = np = 30\) and \(\sigma = \sqrt{np(1-p)} \approx 5.39\):
\[ Z = \frac{20 - 30}{5.39} \approx -1.86 \implies P(X < 20) \approx 3.1\% \]
Fewer than 20 conversions is a rare event — investigate if it occurs!
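The same calculation can be sketched in Python, using the error function to evaluate the standard normal CDF:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 1000, 0.03
mu = n * p                          # 30 expected conversions
sigma = math.sqrt(n * p * (1 - p))  # ≈ 5.39

z = (20 - mu) / sigma
# z ≈ -1.85, giving a tail probability of roughly 3%
print(f"z = {z:.2f}, P(X < 20) ≈ {normal_cdf(z):.3f}")
```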
Setting: Events occur randomly at an average rate \(\lambda\) per time interval.
\[ \large{ P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots } \]
Remarkable property: \(E[X] = \text{Var}(X) = \lambda\)
Financial Applications:
| Event | \(\lambda\) (per period) |
|---|---|
| Customer complaints per day | 4.2 |
| Trading system failures per month | 1.5 |
| Credit defaults per quarter | 2.8 |
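The Poisson PMF is easy to evaluate directly; here is a sketch using the complaints rate from the table above (the "at most 2" threshold is an illustrative query):

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Customer complaints per day with lam = 4.2 (from the table above).
lam = 4.2
p_quiet_day = sum(poisson_pmf(k, lam) for k in range(3))  # P(X <= 2)
print(f"P(at most 2 complaints) = {p_quiet_day:.3f}")
```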
When \(n\) is large, \(p\) is small, and \(\lambda = np\) is moderate:
\[ \large{ \binom{n}{k}p^k(1-p)^{n-k} \xrightarrow{n\to\infty} \frac{e^{-\lambda}\lambda^k}{k!} } \]
The derivation sketch: write \(p = \lambda/n\). For large \(n\), \(\binom{n}{k} \approx \frac{n^k}{k!}\), so \(\binom{n}{k}p^k \to \frac{\lambda^k}{k!}\), while \((1 - \lambda/n)^{n-k} \to e^{-\lambda}\).
Rule of thumb: Use Poisson when \(n > 100\) and \(p < 0.01\).
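A quick numerical check of the limit, for an illustrative large-\(n\), small-\(p\) case:

```python
import math

# Compare the exact Binomial(n, p) pmf with its Poisson(lam = np) limit
# for n = 2000, p = 0.001, so lam = 2.
n, p = 2000, 0.001
lam = n * p

for k in range(5):
    exact = math.comb(n, k) * p**k * (1 - p)**(n - k)
    approx = math.exp(-lam) * lam**k / math.factorial(k)
    print(f"k={k}: binomial={exact:.5f}  poisson={approx:.5f}")
```

The two columns agree to about three decimal places, as the rule of thumb predicts.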
\[ \large{ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) } \]
The 68-95-99.7 Rule:
| Range | Probability |
|---|---|
| \(\mu \pm 1\sigma\) | 68.3% |
| \(\mu \pm 2\sigma\) | 95.4% |
| \(\mu \pm 3\sigma\) | 99.7% |
Standardization: \(Z = \frac{X - \mu}{\sigma} \sim N(0,1)\)
Four reasons the Normal shows up everywhere:
1. The CLT: sums and averages of many independent effects tend toward normality.
2. Tractability: linear combinations of normals are normal, and many results have closed forms.
3. Parsimony: two parameters, \(\mu\) and \(\sigma\), describe the whole distribution.
4. Empirical fit: many measurement errors and aggregate quantities are approximately normal.
But beware: Stock returns are NOT truly normal (fat tails, skewness). Using normal models for risk management led to massive underestimation of tail risk in 2008.
\[ \large{ f(x) = \lambda e^{-\lambda x}, \quad x \geq 0 } \]
Moments: \(E[X] = \frac{1}{\lambda}\), \(\text{Var}(X) = \frac{1}{\lambda^2}\)
Memoryless Property:
\[ \large{ P(X > s + t \mid X > s) = P(X > t) } \]
“Given you’ve already waited \(s\) minutes, the probability of waiting at least \(t\) more is the same as if you just started.”
Application: Time between customer arrivals, system failures, or transactions.
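The memoryless property can be verified by simulation; the rate \(\lambda = 0.5\) and the waits \(s = 1\), \(t = 2\) below are illustrative choices:

```python
import random

# Memorylessness of Exponential(lam = 0.5) waiting times:
# P(X > s + t | X > s) should match P(X > t).
random.seed(42)
lam, s, t = 0.5, 1.0, 2.0
samples = [random.expovariate(lam) for _ in range(200_000)]

p_t = sum(x > t for x in samples) / len(samples)
survivors = [x for x in samples if x > s]
p_cond = sum(x > s + t for x in survivors) / len(survivors)

# Both estimates should be near exp(-lam * t) = exp(-1) ≈ 0.368.
print(f"P(X > t) ≈ {p_t:.3f}, P(X > s+t | X > s) ≈ {p_cond:.3f}")
```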
Statement: If \(X_1, X_2, \ldots, X_n\) are i.i.d. with mean \(\mu\) and variance \(\sigma^2\), then as \(n \to \infty\):
\[ \large{ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0,1) } \]
In words: The sample mean is approximately normal for large \(n\), regardless of the original distribution.
This is revolutionary: You don’t need to know the population distribution to do inference!
| Source Population Shape | Minimum \(n\) for CLT |
|---|---|
| Symmetric (e.g., uniform) | ≥ 15 |
| Moderately skewed | ≥ 30 |
| Heavily skewed / outliers | ≥ 50–100 |
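The table's guidance can be seen in a small simulation: averages of \(n = 30\) draws from a (symmetric) uniform distribution already behave like a normal with the predicted standard deviation. The sample sizes here are illustrative:

```python
import random, statistics

# CLT in action: means of n Uniform(0,1) draws concentrate around mu = 0.5
# with standard deviation sigma / sqrt(n), where sigma^2 = 1/12.
random.seed(0)
n, trials = 30, 20_000
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

# Theory: mean 0.5, sd = sqrt(1 / (12 * 30)) ≈ 0.0527.
print(f"mean of means ≈ {statistics.fmean(means):.3f}")
print(f"sd of means  ≈ {statistics.stdev(means):.4f}")
```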
Financial data warning: Stock returns have fat tails, so:
- CLT convergence is much slower than the usual \(n \geq 30\) rule suggests
- a few extreme observations can dominate the sample mean
- larger samples and fat-tail-aware methods are needed before normal-based inference is trustworthy
Nassim Taleb’s classification of random phenomena:
| Characteristic | Mediocristan | Extremistan |
|---|---|---|
| Tail behavior | Thin (exponential decay) | Fat (power law) |
| Extreme events | Negligible impact | Dominates total |
| CLT applies? | Yes | No (or slowly) |
| Example | Height, weight, IQ | Wealth, city size, book sales |
| Financial analog | Interest on savings | Venture capital returns |
The 80/20 Rule: In Extremistan, 20% of causes produce 80% of effects. The top 1% of stocks drive a disproportionate share of index returns.
If \(X_1, \ldots, X_n \overset{iid}{\sim} (\mu, \sigma^2)\), then the sample mean:
\[ \large{ \bar{X} \sim \left(\mu, \frac{\sigma^2}{n}\right) } \]
Key implications:
- The standard error of the mean is \(\sigma/\sqrt{n}\): averaging reduces noise.
- Precision improves slowly: quadrupling \(n\) only halves the standard error.
- \(E[\bar{X}] = \mu\): the sample mean is unbiased.
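The \(\sigma/\sqrt{n}\) scaling is worth seeing in numbers (the value \(\sigma = 20\) is an illustrative population standard deviation):

```python
import math

# Standard error of the sample mean: sigma / sqrt(n).
# Quadrupling n halves the standard error.
sigma = 20
for n in (25, 100, 400):
    print(f"n = {n:3d}: SE = {sigma / math.sqrt(n):.1f}")  # 4.0, 2.0, 1.0
```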
If \(Z_1, \ldots, Z_k \overset{iid}{\sim} N(0,1)\):
\[ \large{ \chi^2_k = Z_1^2 + Z_2^2 + \cdots + Z_k^2 } \]
Key properties: \(E[\chi^2_k] = k\) and \(\text{Var}(\chi^2_k) = 2k\); the distribution is right-skewed and approaches the Normal as \(k\) grows.
Connection to variance:
\[ \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \]
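A simulation check of this connection, under assumed illustrative values \(n = 10\), \(\sigma = 2\): the scaled sample variance should have mean \(n-1\) and variance \(2(n-1)\), matching \(\chi^2_{n-1}\).

```python
import random, statistics

# Check that (n-1) s^2 / sigma^2 behaves like chi-square with n-1 df.
random.seed(1)
n, sigma, trials = 10, 2.0, 50_000
stats = []
for _ in range(trials):
    sample = [random.gauss(0, sigma) for _ in range(n)]
    stats.append((n - 1) * statistics.variance(sample) / sigma**2)

print(f"mean ≈ {statistics.fmean(stats):.2f} (expect {n - 1})")
print(f"variance ≈ {statistics.variance(stats):.2f} (expect {2 * (n - 1)})")
```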
\[ \large{ t_\nu = \frac{Z}{\sqrt{\chi^2_\nu / \nu}}, \quad Z \sim N(0,1), \quad \chi^2_\nu \text{ independent} } \]
Compared to the Normal:
| Feature | Normal | t-distribution |
|---|---|---|
| Tails | Thin | Heavier |
| Shape parameter | None | \(\nu\) (degrees of freedom) |
| As \(\nu \to \infty\) | — | Converges to Normal |
| Use when | \(\sigma\) known or \(n\) large | \(\sigma\) unknown and \(n\) small |
Rule of thumb: Use \(t\) when \(n < 30\) and population variance is unknown (most real situations).
\[ \large{ F_{\nu_1,\nu_2} = \frac{\chi^2_{\nu_1}/\nu_1}{\chi^2_{\nu_2}/\nu_2} } \]
Key application: Testing whether two groups have equal variance
\[ F = \frac{s_1^2}{s_2^2} \sim F_{n_1 - 1,\, n_2 - 1} \quad \text{under } H_0: \sigma_1^2 = \sigma_2^2 \]
The F-distribution also appears in ANOVA (comparing means across several groups) and in the overall significance test of a regression model.
The game: Flip a coin until heads. If heads appears on flip \(n\), you win \(2^n\) dollars.
Expected payoff:
\[ E[X] = \sum_{n=1}^{\infty}\frac{1}{2^n}\cdot 2^n = \sum_{n=1}^{\infty}1 = \infty \]
Yet nobody would pay more than ~$20 to play!
Resolution (Daniel Bernoulli, 1738): Use logarithmic utility \(U(x) = \ln(x)\):
\[ E[U(X)] = \sum_{n=1}^{\infty}\frac{1}{2^n}\cdot \ln(2^n) = \ln 2 \sum_{n=1}^{\infty}\frac{n}{2^n} = 2\ln 2 \approx 1.39 \]
The certainty equivalent is \(e^{E[U(X)]} = e^{2\ln 2} = \$4\), much closer to what people will actually pay, which resolves the paradox.
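A simulation makes both halves of the paradox visible: the raw average payoff never settles down, while the average log-utility converges to \(2\ln 2\):

```python
import math, random, statistics

# Simulate the St. Petersburg game: flip until heads; payoff 2^n on flip n.
random.seed(7)

def play():
    n = 1
    while random.random() < 0.5:  # tails with probability 1/2, keep flipping
        n += 1
    return 2 ** n

payoffs = [play() for _ in range(100_000)]
log_utils = [math.log(x) for x in payoffs]

print(f"average payoff (unstable, grows with sample size): {statistics.fmean(payoffs):.1f}")
print(f"average log-utility ≈ {statistics.fmean(log_utils):.3f} "
      f"(theory: 2 ln 2 ≈ {2 * math.log(2):.3f})")
```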
Claim: In many natural datasets, the leading digit \(d\) follows:
\[ \large{ P(\text{first digit} = d) = \log_{10}\left(1 + \frac{1}{d}\right) } \]
| Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Prob | 30.1% | 17.6% | 12.5% | 9.7% | 7.9% | 6.7% | 5.8% | 5.1% | 4.6% |
Application: If financial statements deviate significantly from Benford’s Law, it may indicate data fabrication. Auditors use this as a screening tool.
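The table of first-digit probabilities follows directly from the formula:

```python
import math

# Benford's Law: P(first digit = d) = log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
for d, p in benford.items():
    print(f"{d}: {p:.1%}")

# The nine probabilities telescope to log10(10) = 1, i.e. they sum to 1.
print(f"total: {sum(benford.values()):.3f}")
```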
Discrete Distributions: Binomial (successes in \(n\) trials) and Poisson (event counts at rate \(\lambda\)).
Continuous Distributions: Normal (the workhorse of inference) and Exponential (memoryless waiting times).
The Central Limit Theorem: sample means are approximately \(N(\mu, \sigma^2/n)\) for large \(n\), whatever the population shape.
Sampling Distributions: \(\chi^2\) for variances, \(t\) for means when \(\sigma\) is unknown, \(F\) for variance ratios.
Key Warning: CLT may not apply in Extremistan — fat-tailed data requires special methods.