Probability is the mathematical language of uncertainty.
| Application | Probability Concept |
|---|---|
| Credit risk models | P(Default) — conditional probability |
| Option pricing | Risk-neutral probability |
| Portfolio risk | Joint probability of losses |
| A/B testing | p-value: P(result at least this extreme, given no true effect) |
| Insurance pricing | Expected loss = P(event) × severity |
Every quantitative decision under uncertainty relies on probability theory.
| Approach | Formula | Best For |
|---|---|---|
| Classical | \(P(A) = \frac{\lvert A\rvert}{\lvert\Omega\rvert}\) | Equally likely outcomes (dice, cards) |
| Frequentist | \(P(A) = \lim_{n\to\infty}\frac{n_A}{n}\) | Repeatable experiments (quality control) |
| Subjective | Expert belief in \([0,1]\) | Unique events (startup valuation) |
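
The classical and frequentist rows can be compared directly on a fair die. The following is a minimal sketch, not part of the original material: the helper names `classical_probability` and `relative_frequency` are hypothetical, and the simulated die is assumed to be fair.

```python
import random
from fractions import Fraction


def classical_probability(event, sample_space):
    """Classical definition: favourable outcomes divided by all outcomes."""
    return Fraction(len(event), len(sample_space))


def relative_frequency(event, sample_space, n_trials, seed=0):
    """Frequentist estimate: share of simulated trials in which the event occurs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    outcomes = sorted(sample_space)
    hits = sum(rng.choice(outcomes) in event for _ in range(n_trials))
    return hits / n_trials


die = {1, 2, 3, 4, 5, 6}
six = {6}

exact = classical_probability(six, die)
estimate = relative_frequency(six, die, n_trials=100_000)
print(f"classical: {float(exact):.4f}, frequentist estimate: {estimate:.4f}")
```

With more trials the estimate should settle closer to the classical value, which is the limit in the frequentist formula.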
Key Insight: In finance, all three coexist:
Given sample space \(\Omega\) and event \(A\):
Axiom 1 (Non-negativity): \(P(A) \geq 0\)
Axiom 2 (Normalization): \(P(\Omega) = 1\)
Axiom 3 (Countable Additivity): For mutually exclusive events \(A_1, A_2, \ldots\)
\[ \large{ P\left(\bigcup_{i=1}^{\infty}A_i\right) = \sum_{i=1}^{\infty}P(A_i) } \]
These three axioms are all we need. Every probability rule we’ll learn is derived from these.
Complement Rule:
\[ \large{ P(A^c) = 1 - P(A) } \]
“The probability something doesn’t happen = 1 minus the probability it does.”
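As one instance of deriving a rule from the axioms, the complement rule can be sketched in three lines. The sketch uses additivity for the two disjoint events \(A\) and \(A^c\), which follows from Axiom 3 by padding the sequence with empty sets:

\[ \begin{aligned} A \cup A^c &= \Omega, \qquad A \cap A^c = \varnothing \\ P(A) + P(A^c) &= P(A \cup A^c) = P(\Omega) = 1 \\ P(A^c) &= 1 - P(A) \end{aligned} \]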
General Addition Rule:
\[ \large{ P(A \cup B) = P(A) + P(B) - P(A \cap B) } \]
The subtraction avoids double-counting the overlap.
Special case: If \(A\) and \(B\) are mutually exclusive, \(P(A \cap B) = 0\), so \(P(A \cup B) = P(A) + P(B)\).
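The complement and addition rules can be checked by brute-force counting on a standard 52-card deck. This is an illustrative sketch: the deck, the events, and the helper `prob` are assumptions chosen for the example, not taken from the original.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely outcomes


def prob(event):
    """Classical probability of an event (a set of cards)."""
    return Fraction(len(event), len(deck))


hearts = {card for card in deck if card[1] == "hearts"}
kings = {card for card in deck if card[0] == "K"}

# General addition rule: subtract the overlap (the king of hearts) once.
p_union_direct = prob(hearts | kings)
p_union_rule = prob(hearts) + prob(kings) - prob(hearts & kings)

# Complement rule: "not a heart" is everything else in the deck.
p_not_heart = prob(deck - hearts)

print(p_union_direct, p_union_rule, p_not_heart)
```

Exact fractions are used instead of floats so that both sides of each rule can be compared with `==`.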
\[ \large{ P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 } \]
Intuition: “Given that \(B\) has occurred, what fraction of \(B\)’s probability also belongs to \(A\)?”
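Banking Example: Of 10,000 loan applicants, 500 are flagged (\(H\)), and 150 of those flagged go on to default (\(D\)):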
\[ P(D|H) = \frac{150/10000}{500/10000} = \frac{150}{500} = 30\% \]
30% of flagged applicants default — vs. only 2% overall.
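The same calculation can be done from raw counts. This is a small sketch using the figures that appear in the formula above (150 flagged defaulters, 500 flagged applicants, 10,000 applicants in total); the function name `conditional_probability` is hypothetical.

```python
from fractions import Fraction


def conditional_probability(p_a_and_b, p_b):
    """P(A|B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive to condition on B")
    return p_a_and_b / p_b


total_applicants = 10_000
flagged = 500
flagged_and_default = 150

p_flagged = Fraction(flagged, total_applicants)
p_flagged_and_default = Fraction(flagged_and_default, total_applicants)

p_default_given_flagged = conditional_probability(p_flagged_and_default, p_flagged)
print(f"P(D|H) = {float(p_default_given_flagged):.0%}")
```

The total number of applicants cancels, which is why conditioning amounts to restricting attention to the flagged group.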
Events \(A\) and \(B\) are independent if and only if:
\[ \large{ P(A \cap B) = P(A) \cdot P(B) } \]
Equivalently: \(P(A|B) = P(A)\) — knowing \(B\) doesn’t change your belief about \(A\).
Critical distinction:
| Concept | Definition | Example |
|---|---|---|
| Independent | \(P(A \cap B) = P(A)P(B)\) | Two unrelated stocks (maybe) |
| Mutually exclusive | \(P(A \cap B) = 0\) | Heads and tails on same flip |
These are almost opposites! If \(A\) and \(B\) are mutually exclusive (and both have \(P > 0\)), they cannot be independent.
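The distinction can be verified by enumerating two fair dice. This is an illustrative sketch: the events and helper names are assumptions chosen for the example.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls


def prob(event):
    """Probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def joint(a, b):
    return prob(lambda w: a(w) and b(w))


def is_independent(a, b):
    return joint(a, b) == prob(a) * prob(b)


def is_mutually_exclusive(a, b):
    return joint(a, b) == 0


def first_even(w):
    return w[0] % 2 == 0


def sum_is_7(w):
    return sum(w) == 7


def sum_is_2(w):
    return sum(w) == 2


def sum_is_12(w):
    return sum(w) == 12


# Independent but not exclusive: the two events can happen together.
print(is_independent(first_even, sum_is_7), is_mutually_exclusive(first_even, sum_is_7))
# Exclusive but not independent: one event rules out the other.
print(is_independent(sum_is_2, sum_is_12), is_mutually_exclusive(sum_is_2, sum_is_12))
```

In the second pair, learning that the sum is 2 drops the probability of a sum of 12 to zero, which is as strong a dependence as there can be.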
Normal markets: Correlations between stocks ≈ 0.3 → assets appear only weakly dependent.
Crisis (2008, 2020): Correlations spike to 0.8+ → everything falls together.
Implication:
Normal distribution predicts: Events beyond 3σ occur with probability 0.27%
Empirical reality (A-share data):
| Threshold | Theoretical (Normal) | Empirical | Ratio |
|---|---|---|---|
| \(\lvert r\rvert > 3\sigma\) | 0.27% | 1.54% | 5.7× |
The real market produces extreme events ~6× more often than the bell curve predicts.
This is why risk managers add fat-tail adjustments to VaR models, and why market moves in the 2007-2008 crisis were described as “25-sigma events” under normal assumptions: essentially impossible, yet they happened.
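The theoretical column can be reproduced from the standard normal distribution. This is a minimal sketch: the empirical 1.54% is copied from the table above rather than recomputed from data, and `normal_two_sided_tail` is a hypothetical helper name.

```python
import math


def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2))


p_3sigma = normal_two_sided_tail(3)
empirical_3sigma = 0.0154  # empirical frequency quoted in the table above

ratio = empirical_3sigma / p_3sigma
p_25sigma = normal_two_sided_tail(25)

print(f"P(|Z| > 3)  = {p_3sigma:.4%}")
print(f"empirical / theoretical = {ratio:.1f}x")
print(f"P(|Z| > 25) = {p_25sigma:.3e}")
```

The 25-sigma tail probability is so small that, under the normal model, such a move should never be observed at all.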
If \(B_1, B_2, \ldots, B_k\) form a partition of \(\Omega\):
\[ \large{ P(A) = \sum_{j=1}^{k}P(A|B_j) \cdot P(B_j) } \]
Intuition: Break a complex probability into simpler conditional pieces.
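Marketing Example: Three groups \(B_1, B_2, B_3\) make up 40%, 35%, and 25% of the audience and buy with probability 5%, 3%, and 8% respectively: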
\[ P(\text{buy}) = 0.05 \times 0.4 + 0.03 \times 0.35 + 0.08 \times 0.25 = 5.05\% \]
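The weighted sum above is easy to express as a small function. This sketch reads the numbers in the formula as conditional purchase rates \(P(\text{buy}|B_j)\) and group shares \(P(B_j)\); the function name `total_probability` is hypothetical.

```python
import math


def total_probability(conditionals, priors):
    """P(A) = sum_j P(A|B_j) * P(B_j) for a partition B_1, ..., B_k."""
    if len(conditionals) != len(priors):
        raise ValueError("need one conditional probability per partition cell")
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("partition probabilities must sum to 1")
    return sum(c * p for c, p in zip(conditionals, priors))


p_buy_given_group = [0.05, 0.03, 0.08]  # P(buy | B_j), from the formula above
p_group = [0.40, 0.35, 0.25]            # P(B_j), shares summing to 1

p_buy = total_probability(p_buy_given_group, p_group)
print(f"P(buy) = {p_buy:.2%}")
```

The check that the shares sum to 1 guards the one assumption the rule depends on: the groups must form a partition.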
\[ \large{ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} } \]
Or in the expanded form using Total Probability:
\[ \large{ P(A_i|B) = \frac{P(B|A_i) \cdot P(A_i)}{\sum_{j=1}^k P(B|A_j) \cdot P(A_j)} } \]
The key idea:
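A fraud detection system has a 1% fraud base rate, \(P(\text{Fraud}) = 0.01\); a 99% detection rate, \(P(\text{Alert}|\text{Fraud}) = 0.99\); and a 5% false-alarm rate, \(P(\text{Alert}|\text{No fraud}) = 0.05\).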
Question: If the alarm triggers, what is \(P(\text{Fraud}|\text{Alert})\)?
\[ P(\text{Fraud}|\text{Alert}) = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99} = \frac{0.0099}{0.0594} \approx \mathbf{16.7\%} \]
83% of alerts are false alarms! The low base rate dominates.
Lesson: Even a test that catches 99% of fraud, with only a 5% false-alarm rate, produces mostly false positives when the event is rare.
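The base-rate effect is easier to see when the prior is varied. This sketch reads the numbers in the formula above as a 1% base rate, a 99% detection rate, and a 5% false-alarm rate; the function name `posterior` and the extra base rates in the loop are assumptions added for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(Fraud|Alert) via Bayes' theorem with a two-cell partition."""
    true_alerts = sensitivity * prior
    false_alerts = false_positive_rate * (1 - prior)
    return true_alerts / (true_alerts + false_alerts)


p_fraud_given_alert = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(Fraud|Alert) at a 1% base rate: {p_fraud_given_alert:.1%}")

# Hypothetical base rates, to show how strongly the prior drives the answer.
for base_rate in (0.001, 0.01, 0.10):
    p = posterior(base_rate, sensitivity=0.99, false_positive_rate=0.05)
    print(f"base rate {base_rate:.1%} -> posterior {p:.1%}")
```

Holding the detector fixed, the posterior rises with the base rate, so the same alert means very different things in different populations.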
A probability tree maps out all outcomes of a multi-stage process.
Example: Two-stage Marketing Funnel
\[ P(\text{click}) = P(\text{open}) \times P(\text{click}|\text{open}) = 0.40 \times 0.15 = 6\% \]
Probability trees make Total Probability and Bayes’ Theorem computations visual and intuitive.
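A tree can be written down as a mapping from each path to the product of the branch probabilities along it. This sketch uses the two figures from the funnel example and assumes, as the formula does, that a click is only possible after an open; the dictionary layout is a hypothetical choice.

```python
p_open = 0.40
p_click_given_open = 0.15

# Each leaf is one root-to-leaf path; its probability is the product of its branches.
# Assumption: an email that is not opened cannot be clicked.
leaves = {
    ("open", "click"): p_open * p_click_given_open,
    ("open", "no click"): p_open * (1 - p_click_given_open),
    ("not opened",): 1 - p_open,
}

p_click = leaves[("open", "click")]

# Total probability: add up every leaf that ends without a click.
p_no_click = leaves[("open", "no click")] + leaves[("not opened",)]

# Bayes: among emails with no click, what share was at least opened?
p_open_given_no_click = leaves[("open", "no click")] / p_no_click

print(f"P(click) = {p_click:.0%}")
print(f"P(open | no click) = {p_open_given_no_click:.1%}")
```

The leaf probabilities sum to 1, which is a quick sanity check on any tree.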
Setup: 3 doors — 1 car, 2 goats. You pick a door. The host opens another door (always a goat). Should you switch?
| Strategy | Probability of Winning |
|---|---|
| Stay with original choice | 1/3 |
| Switch to other door | 2/3 |
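Why switch wins 2/3 of the time: your first pick is wrong with probability 2/3, and whenever it is wrong the host must reveal the only other goat, so the remaining door hides the car.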
The host’s action gives you information — ignoring it is irrational.
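The two strategies can be compared by simulation. This is a minimal sketch under the standard assumptions stated in the setup (the host always opens a goat door that is not your pick, choosing at random when two are available); the function names are hypothetical.

```python
import random


def play(switch, rng):
    """Play one round; return True if the final pick is the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the single door that is neither the pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, n_trials=100_000, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return sum(play(switch, rng) for _ in range(n_trials)) / n_trials


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```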
Question: How many people are needed for a >50% chance that at least two share a birthday?
Answer: Only 23 people!
With just 23 people, there are \(\binom{23}{2} = 253\) possible pairs. The probability of no matches:
\[ P(\text{no match}) = \frac{365}{365} \times \frac{364}{365} \times \cdots \times \frac{343}{365} \approx 0.493 \]
So \(P(\text{at least one match}) \approx 50.7\%\).
Why it is surprising: We tend to think only of the \(n-1\) pairs that involve ourselves, but a match can occur in any of the \(\binom{n}{2}\) pairs.
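The product above can be evaluated exactly for any group size. This sketch assumes 365 equally likely birthdays and independent people, as in the formula; the function name `p_shared_birthday` is hypothetical.

```python
from math import comb


def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming uniform, independent birthdays."""
    p_no_match = 1.0
    for k in range(n):
        p_no_match *= (days - k) / days  # person k+1 avoids the k birthdays already taken
    return 1 - p_no_match


# Smallest group size at which a shared birthday becomes more likely than not.
threshold = 1
while p_shared_birthday(threshold) <= 0.5:
    threshold += 1

print(f"threshold group size: {threshold}")
print(f"pairs among 23 people: {comb(23, 2)}")
print(f"P(match | n=23) = {p_shared_birthday(23):.3f}")
```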
Observation: Among admitted students, SAT scores and GPA appear negatively correlated.
Reality: In the general population, they’re slightly positively correlated!
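Mechanism: Admission requires either high SAT or high GPA (or both). By conditioning on admission, an admitted student with a low SAT score must have a high GPA, and vice versa, which induces a negative correlation inside the admitted group.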
In finance: Among observed (survived) hedge funds, returns and AUM may show spurious negative correlation because both helped them survive.
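The selection effect can be reproduced with simulated data. This is a sketch under loudly hypothetical assumptions: standardized SAT and GPA scores drawn from a bivariate normal with population correlation 0.2, and admission whenever either score exceeds 1 standard deviation. None of these numbers come from the original.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is reproducible
rho = 0.2     # assumed slight positive correlation in the population
cutoff = 1.0  # assumed admission bar, in standard deviations

population = []
for _ in range(50_000):
    sat = rng.gauss(0, 1)
    gpa = rho * sat + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
    population.append((sat, gpa))

# Admission conditions on "high SAT or high GPA", which is the selection step.
admitted = [(sat, gpa) for sat, gpa in population if sat > cutoff or gpa > cutoff]

corr_all = pearson(*zip(*population))
corr_admitted = pearson(*zip(*admitted))
print(f"population: {corr_all:+.2f}, admitted only: {corr_admitted:+.2f}")
```

The same pattern applies to the hedge fund case: filtering on survival plays the role of the admission rule.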
Core Framework:
Essential Rules:
Practical Warnings: