03: Probability Basics

Why Probability Matters in Business and Finance

Probability is the mathematical language of uncertainty.

Application	Probability Concept
Credit risk models	P(Default) — conditional probability
Option pricing	Risk-neutral probability
Portfolio risk	Joint probability of losses
A/B testing	P(result at least this extreme | no real effect), i.e. the p-value
Insurance pricing	Expected loss = P(event) × severity

Every quantitative decision under uncertainty relies on probability theory.

Three Definitions of Probability

Approach	Formula	Best For
Classical	\(P(A) = \frac{|A|}{|\Omega|}\)	Equally likely outcomes (dice, cards)
Frequentist	\(P(A) = \lim_{n\to\infty}\frac{n_A}{n}\)	Repeatable experiments (quality control)
Subjective	Expert belief in \([0,1]\)	Unique events (startup valuation)
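Here is a minimal Python sketch of the first two definitions on a repeatable experiment. It assumes two fair six-sided dice and the illustrative event “the sum is 7”. The helper names `classical_prob` and `frequentist_prob` are hypothetical. The subjective definition has no formula to compute, so it is left out.

```python
import random
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair dice.
OMEGA = list(product(range(1, 7), repeat=2))


def is_seven(outcome):
    """Event A: the two dice sum to 7."""
    return sum(outcome) == 7


def classical_prob(event):
    """Classical definition: |A| / |Omega|, valid when outcomes are equally likely."""
    favorable = sum(1 for w in OMEGA if event(w))
    return Fraction(favorable, len(OMEGA))


def frequentist_prob(event, n_trials, seed=0):
    """Frequentist estimate: relative frequency n_A / n over simulated rolls."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        roll = (rng.randint(1, 6), rng.randint(1, 6))
        if event(roll):
            hits += 1
    return hits / n_trials


print(classical_prob(is_seven))              # exact: 1/6
for n in (100, 10_000, 200_000):
    print(n, frequentist_prob(is_seven, n))  # drifts toward 1/6 as n grows
```

The classical answer is exact because all 36 outcomes are equally likely. The simulated frequency only approaches it as the number of rolls grows, which is the limit in the frequentist formula.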

Key Insight: In finance, all three coexist:

Classical → lottery pricing
Frequentist → historical default rates
Subjective → “What is the probability of a trade war?”

Kolmogorov’s Axioms: The Foundation of All Probability

Given sample space \(\Omega\) and event \(A\):

Axiom 1 (Non-negativity): \(P(A) \geq 0\)

Axiom 2 (Normalization): \(P(\Omega) = 1\)

Axiom 3 (Countable Additivity): For any sequence of pairwise mutually exclusive events \(A_1, A_2, \ldots\):

\[ \large{ P\left(\bigcup_{i=1}^{\infty}A_i\right) = \sum_{i=1}^{\infty}P(A_i) } \]

These three axioms are all we need. Every probability rule we’ll learn is derived from these.

Complement Rule and Addition Rule

Complement Rule:

\[ \large{ P(A^c) = 1 - P(A) } \]

“The probability something doesn’t happen = 1 minus the probability it does.”
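The claim that every rule comes from the axioms can be checked directly on the complement rule. \(A\) and \(A^c\) are mutually exclusive and \(A \cup A^c = \Omega\). Axiom 3 therefore applies to the sequence \(A, A^c, \emptyset, \emptyset, \ldots\), using \(P(\emptyset) = 0\), which itself follows from Axiom 3. Combining this with Axiom 2 gives:

\[ 1 = P(\Omega) = P(A \cup A^c) = P(A) + P(A^c) \quad\Longrightarrow\quad P(A^c) = 1 - P(A) \]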

General Addition Rule:

\[ \large{ P(A \cup B) = P(A) + P(B) - P(A \cap B) } \]

The subtraction avoids double-counting the overlap.

Special case: If \(A\) and \(B\) are mutually exclusive, \(P(A \cap B) = 0\), so \(P(A \cup B) = P(A) + P(B)\).
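Here is a quick exact check of the addition rule on a single fair die. The events are chosen purely for illustration: \(A\) = “roll is even” and \(B\) = “roll is greater than 3”. Their overlap {4, 6} is exactly what the subtraction removes.

```python
from fractions import Fraction

OMEGA = set(range(1, 7))                 # one fair die
A = {w for w in OMEGA if w % 2 == 0}     # even: {2, 4, 6}
B = {w for w in OMEGA if w > 3}          # greater than 3: {4, 5, 6}


def P(event):
    """Classical probability on equally likely outcomes."""
    return Fraction(len(event), len(OMEGA))


lhs = P(A | B)                   # direct: {2, 4, 5, 6} -> 4/6
rhs = P(A) + P(B) - P(A & B)     # 3/6 + 3/6 - 2/6
naive = P(A) + P(B)              # forgets the overlap
print(lhs, rhs)                  # both 2/3
print(naive)                     # 1: overcounts {4, 6}
```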

Conditional Probability: Updating Beliefs with New Information

\[ \large{ P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 } \]

Intuition: “Given that \(B\) has occurred, what fraction of \(B\)’s probability also belongs to \(A\)?”

Banking Example:

Universe: 10,000 loan applications
200 eventually default (\(D\))
500 flagged as high-risk (\(H\))
150 are both high-risk AND default

\[ P(D|H) = \frac{150/10000}{500/10000} = \frac{150}{500} = 30\% \]

30% of flagged applicants default, vs. only 2% (200 of 10,000) overall.
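The same computation in Python uses only the counts from the banking example. Dividing both counts by 10,000 and then taking the ratio gives the same answer as the direct ratio 150/500, because the scale cancels.

```python
# Counts from the banking example (10,000 loan applications).
total = 10_000
defaults = 200      # D
high_risk = 500     # H
both = 150          # H and D

p_D = defaults / total
p_H = high_risk / total
p_D_and_H = both / total

# Definition of conditional probability: P(D|H) = P(D ∩ H) / P(H)
p_D_given_H = p_D_and_H / p_H

print(f"P(D)   = {p_D:.1%}")           # base default rate
print(f"P(D|H) = {p_D_given_H:.1%}")   # default rate among flagged applicants
print(f"lift   = {p_D_given_H / p_D:.0f}x")
```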

Independence: When Knowing One Thing Tells You Nothing About Another

Events \(A\) and \(B\) are independent if and only if:

\[ \large{ P(A \cap B) = P(A) \cdot P(B) } \]

Equivalently: \(P(A|B) = P(A)\) — knowing \(B\) doesn’t change your belief about \(A\).

Critical distinction:

Concept	Definition	Example
Independent	\(P(A \cap B) = P(A)P(B)\)	Two unrelated stocks (maybe)
Mutually exclusive	\(P(A \cap B) = 0\)	Heads and tails on the same flip

These are almost opposites! If \(A\) and \(B\) are mutually exclusive (and both have \(P > 0\)), they cannot be independent.
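The following is a small exact check of both definitions, assuming two fair dice; the event names are illustrative. Events that concern different dice pass the independence test. Two mutually exclusive events on the same die fail it, as stated above.

```python
from fractions import Fraction
from itertools import product

OMEGA = list(product(range(1, 7), repeat=2))   # two fair dice, 36 outcomes


def P(event):
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def is_independent(e1, e2):
    """Check the defining identity P(A ∩ B) = P(A) * P(B)."""
    return P(lambda w: e1(w) and e2(w)) == P(e1) * P(e2)


def first_even(w):    # depends only on die 1
    return w[0] % 2 == 0


def second_high(w):   # depends only on die 2
    return w[1] > 4


def first_is_1(w):
    return w[0] == 1


def first_is_2(w):    # can never occur together with first_is_1
    return w[0] == 2


print(is_independent(first_even, second_high))  # True: 1/6 == 1/2 * 1/3
print(is_independent(first_is_1, first_is_2))   # False: 0 != 1/36
```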

‘Dirty Work’: Independence Breaks Down in Crises

Normal markets: Correlations between stocks ≈ 0.3 → assets move only loosely together, so diversification works.

Crisis (2008, 2020): Correlations spike → 0.8+ → everything falls together.

Implication:

Portfolio diversification that “works” in normal times fails when you need it most
This is why stress testing uses crisis correlations, not average correlations
Models that assume stable, mild dependence between defaults (like the Gaussian-copula models used in early CDO pricing) can be catastrophically wrong
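The formula below quantifies why diversification fails in a crisis. Consider an equally weighted portfolio of \(n\) assets, each with volatility \(\sigma\) and a common pairwise correlation \(\rho\). Its variance is \(\sigma^2\left(\frac{1}{n} + \left(1 - \frac{1}{n}\right)\rho\right)\). The sketch plugs in the two correlation levels above. The 20 assets and 30% volatility are hypothetical inputs, and real correlation matrices are not this uniform.

```python
import math


def portfolio_vol(n_assets, sigma, rho):
    """Volatility of an equal-weight portfolio, assuming identical asset vols
    and a single common pairwise correlation (a simplifying assumption)."""
    variance = sigma**2 * (1 / n_assets + (1 - 1 / n_assets) * rho)
    return math.sqrt(variance)


n, sigma = 20, 0.30   # hypothetical: 20 stocks, 30% annual volatility each
for label, rho in [("normal market", 0.3), ("crisis", 0.8)]:
    vol = portfolio_vol(n, sigma, rho)
    print(f"{label:13s} rho={rho}: portfolio vol = {vol:.1%} "
          f"(vs {sigma:.0%} for a single stock)")
```

At \(\rho = 0.3\) the portfolio's volatility is about 17%. At \(\rho = 0.8\), twenty stocks still leave about 27%, barely below a single stock's 30%. As \(n\) grows, volatility can never fall below \(\sigma\sqrt{\rho}\), so higher correlation raises a floor that no amount of diversification removes.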

‘Dirty Work’: Fat Tails — Theory vs. Reality

Normal distribution predicts: Events beyond 3σ (in either tail) occur with probability 0.27%

Empirical reality (A-share data):

Threshold	Theoretical (Normal)	Empirical	Ratio
\(|r| > 3\sigma\)	0.27%	1.54%	5.7×

The real market produces extreme events ~6× more often than the bell curve predicts.

This is why risk managers add fat-tail adjustments to VaR models. It is also why market moves during the 2007–2008 crisis were described as “25-sigma events”: essentially impossible under normal assumptions, yet they happened.
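The normal-theory column of the table can be reproduced with Python's standard-library complementary error function, `math.erfc`. It uses \(P(|Z| > k) = \operatorname{erfc}(k/\sqrt{2})\). The only data point below is the 1.54% empirical A-share frequency quoted above, which this sketch takes as given rather than recomputing.

```python
import math


def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2))


theoretical = normal_two_sided_tail(3)   # about 0.27%
empirical = 0.0154                        # A-share figure quoted in the table
print(f"Normal P(|r| > 3σ): {theoretical:.2%}")
print(f"Empirical / normal: {empirical / theoretical:.1f}x")

# One-sided probability of a 25-sigma move under the normal model
print(f"P(Z > 25) ≈ {normal_two_sided_tail(25) / 2:.1e}")
```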

The Law of Total Probability

If \(B_1, B_2, \ldots, B_k\) form a partition of \(\Omega\) (mutually exclusive, together covering \(\Omega\), each with \(P(B_j) > 0\)):

\[ \large{ P(A) = \sum_{j=1}^{k}P(A|B_j) \cdot P(B_j) } \]

Intuition: Break a complex probability into simpler conditional pieces.

Marketing Example:

Traffic channels: email (\(B_1\), 40% of traffic), social (\(B_2\), 35%), search (\(B_3\), 25%)
Conversion rates: \(P(\text{buy}|B_1) = 5\%\), \(P(\text{buy}|B_2) = 3\%\), \(P(\text{buy}|B_3) = 8\%\)

\[ P(\text{buy}) = 0.05 \times 0.4 + 0.03 \times 0.35 + 0.08 \times 0.25 = 5.05\% \]
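The same weighted sum can be written in Python with the marketing example's numbers. The assertion checks the partition condition, because the law only holds when the channel shares sum to 1.

```python
# Marketing example: traffic share P(B_j) and conversion rate P(buy | B_j) per channel
channels = {
    "email":  {"share": 0.40, "conversion": 0.05},
    "social": {"share": 0.35, "conversion": 0.03},
    "search": {"share": 0.25, "conversion": 0.08},
}

# The channels must form a partition: shares sum to 1
assert abs(sum(c["share"] for c in channels.values()) - 1.0) < 1e-12

# Law of Total Probability: P(buy) = sum_j P(buy | B_j) * P(B_j)
p_buy = sum(c["conversion"] * c["share"] for c in channels.values())
print(f"P(buy) = {p_buy:.2%}")   # 5.05%
```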

Bayes’ Theorem: Reversing Conditional Probability

\[ \large{ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} } \]

Or in the expanded form using Total Probability:

\[ \large{ P(A_i|B) = \frac{P(B|A_i) \cdot P(A_i)}{\sum_{j=1}^k P(B|A_j) \cdot P(A_j)} } \]

The key idea:

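\(P(A)\): Prior (your belief before seeing evidence)
\(P(B|A)\): Likelihood (how probable the evidence is if \(A\) is true)
\(P(B)\): Evidence (the overall probability of observing \(B\); the denominator, from Total Probability)
\(P(A|B)\): Posterior (your updated belief after seeing evidence)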

Case: Financial Fraud Detection (Base Rate Fallacy)

A fraud detection system has:

Sensitivity: \(P(\text{Alert}|\text{Fraud}) = 99\%\) (catches 99% of frauds)
False positive rate: \(P(\text{Alert}|\text{No Fraud}) = 5\%\)
Base rate: \(P(\text{Fraud}) = 1\%\), so \(P(\text{No Fraud}) = 99\%\)

Question: If the alarm triggers, what is \(P(\text{Fraud}|\text{Alert})\)?

\[ P(\text{Fraud}|\text{Alert}) = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99} = \frac{0.0099}{0.0594} \approx \mathbf{16.7\%} \]

83% of alerts are false alarms! The low base rate dominates.
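The same 16.7% can be reached without the formula by counting “natural frequencies”. This sketch assumes a hypothetical batch of 10,000 transactions, a size chosen only so that every count comes out as a whole number.

```python
from fractions import Fraction

transactions = 10_000                  # hypothetical batch size
frauds = transactions * 1 // 100       # base rate 1%           -> 100
legit = transactions - frauds          #                        -> 9,900
true_alerts = frauds * 99 // 100       # sensitivity 99%        -> 99
false_alerts = legit * 5 // 100        # false-positive rate 5% -> 495

alerts = true_alerts + false_alerts    # 594 alerts in total
p_fraud_given_alert = Fraction(true_alerts, alerts)

print(alerts, p_fraud_given_alert, float(p_fraud_given_alert))
# 594 alerts, of which only 99 are real fraud: P(Fraud | Alert) = 1/6
```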

Base Rate Fallacy: Why It’s Dangerous in Practice

Lesson: Even a detector that catches 99% of frauds produces mostly false positives when the event is rare relative to its false-positive rate.
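To see how strongly the base rate drives the answer, this sketch holds the detector fixed at 99% sensitivity and a 5% false-positive rate from the fraud case. It then sweeps a few illustrative base rates. The `posterior` helper is hypothetical.

```python
def posterior(prior, sensitivity=0.99, false_positive_rate=0.05):
    """Bayes' theorem for a binary alarm: P(Fraud | Alert)."""
    p_alert = sensitivity * prior + false_positive_rate * (1 - prior)  # Total Probability
    return sensitivity * prior / p_alert


for base_rate in (0.001, 0.01, 0.05, 0.20):
    print(f"base rate {base_rate:6.1%} -> P(Fraud | Alert) = {posterior(base_rate):5.1%}")
```

Solving \(0.99p = 0.05(1 - p)\) gives \(p \approx 4.8\%\). Below that base rate, an alert from this detector is more likely false than real.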

Probability Trees for Sequential Decisions

A probability tree maps out all outcomes of a multi-stage process.

Example: Two-stage Marketing Funnel

Stage 1: Customer opens email → P(open) = 40%
Stage 2: Given opened, customer clicks → P(click|open) = 15%

\[ P(\text{click}) = P(\text{open}) \times P(\text{click}|\text{open}) = 0.40 \times 0.15 = 6\% \]

Probability trees make Total Probability and Bayes’ Theorem computations visual and intuitive.
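Here the two-stage funnel is written as an explicit tree. It assumes, as the funnel implies, that an unopened email cannot be clicked. Each leaf's probability is the product of the branches along its path. Summing leaves gives Total Probability, and dividing one leaf by a sum of leaves gives a Bayes-style reverse question, here \(P(\text{open}|\text{no click})\).

```python
# Branch probabilities from the funnel example
p_open = 0.40
p_click_given_open = 0.15

# Each leaf of the tree: path -> product of branch probabilities along it
leaves = {
    ("open", "click"):    p_open * p_click_given_open,         # 0.06
    ("open", "no click"): p_open * (1 - p_click_given_open),   # 0.34
    ("not open",):        1 - p_open,                          # 0.60 (assumed: no click possible)
}

# Leaves are mutually exclusive and exhaustive, so they sum to 1
assert abs(sum(leaves.values()) - 1.0) < 1e-12

# Total Probability: add up every leaf whose path ends in "click"
p_click = sum(p for path, p in leaves.items() if path[-1] == "click")

# Reverse question (Bayes): of those who did not click, what share opened?
p_open_given_no_click = leaves[("open", "no click")] / (1 - p_click)

print(f"P(click) = {p_click:.0%}")
print(f"P(open | no click) = {p_open_given_no_click:.1%}")
```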

The Monty Hall Problem: Why Intuition Fails

Setup: 3 doors: 1 car and 2 goats. You pick a door. The host, who knows where the car is, always opens one of the other two doors to reveal a goat and offers you the chance to switch. Should you switch?

Strategy	Probability of Winning
Stay with original choice	1/3
Switch to other door	2/3

Why switch wins 2/3 of the time:

If you initially picked a goat (2/3 probability) → switching wins
If you initially picked the car (1/3 probability) → switching loses
Since you’re more likely to have picked a goat, switching is optimal

The host’s action gives you information — ignoring it is irrational.
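A Monte Carlo sketch confirms the table. It assumes the standard rules: the host knows where the car is, always opens an unpicked goat door, and always offers the switch. `play` and `win_rate` are hypothetical helper names, and the fixed seed only makes runs repeatable.

```python
import random


def play(switch, rng):
    """One round under the standard rules; returns True if you win the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither your pick nor the car
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=42):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print(f"stay:   {win_rate(False):.3f}")   # close to 1/3
print(f"switch: {win_rate(True):.3f}")    # close to 2/3
```

When your first pick is the car, the host has two goat doors to choose from. The code picks one at random, which does not change either win rate.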

The Birthday Paradox: How Many People for a Match?

Question: How many people are needed for a >50% chance that at least two share a birthday?

Answer: Only 23 people!

With just 23 people, there are \(\binom{23}{2} = 253\) possible pairs. Assuming 365 equally likely birthdays (ignoring leap years), the probability of no matches is:

\[ P(\text{no match}) = \frac{365}{365} \times \frac{364}{365} \times \cdots \times \frac{343}{365} \approx 0.493 \]

So \(P(\text{at least one match}) \approx 50.7\%\).

Why surprising: We think about pairs involving ourselves (\(n-1\)), but should count all \(\binom{n}{2}\) pairs.
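The product can be evaluated directly, and the smallest group size that pushes the match probability above 50% can be searched for. As in the calculation above, this assumes 365 equally likely birthdays; real birthdays are not perfectly uniform. The helper names are hypothetical.

```python
def p_no_match(n, days=365):
    """P(all n birthdays distinct), assuming `days` equally likely birthdays."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p


def smallest_group_for(target=0.5, days=365):
    """Smallest n with P(at least one shared birthday) > target."""
    n = 1
    while 1 - p_no_match(n, days) <= target:
        n += 1
    return n


print(f"P(no match, n=23) = {p_no_match(23):.3f}")
print(f"P(match,    n=23) = {1 - p_no_match(23):.3f}")
print("smallest n for > 50%:", smallest_group_for())
```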

Berkson’s Paradox: Selection Bias Creates False Correlations

Observation: Among admitted students, SAT scores and GPA appear negatively correlated.

Reality: In the general population, they’re slightly positively correlated!

Mechanism: Admission requires either high SAT or high GPA (or both). By conditioning on admission:

High-SAT admits don’t need high GPAs
High-GPA admits don’t need high SATs

In finance: Among observed (survived) hedge funds, returns and AUM may show spurious negative correlation because both helped them survive.
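A simulation shows the mechanism with entirely hypothetical numbers. SAT and GPA are modeled as standardized scores that are bivariate normal with correlation 0.2, standing in for “slightly positive”. Admission requires either z-score to exceed 1, a made-up cutoff. Any negative correlation among admits is produced by the admission filter alone.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
rho = 0.2       # hypothetical: slight positive correlation in the population
cutoff = 1.0    # hypothetical: admit if either z-score exceeds 1

sat, gpa = [], []
for _ in range(100_000):
    s = rng.gauss(0, 1)
    g = rho * s + math.sqrt(1 - rho**2) * rng.gauss(0, 1)   # corr(s, g) = rho
    sat.append(s)
    gpa.append(g)

# Condition on admission: high SAT OR high GPA
admitted = [(s, g) for s, g in zip(sat, gpa) if s > cutoff or g > cutoff]
adm_sat, adm_gpa = zip(*admitted)

print(f"population corr: {pearson(sat, gpa):+.2f}")
print(f"admitted corr:   {pearson(adm_sat, adm_gpa):+.2f}  (n = {len(admitted)})")
```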

Chapter 3 Summary

Core Framework:

Three probability definitions → Classical, Frequentist, Subjective
Kolmogorov’s three axioms → foundation for everything

Essential Rules:

Complement, Addition, Conditional Probability, Independence
Total Probability → decompose complex events
Bayes’ Theorem → update beliefs with evidence

Practical Warnings:

Independence breaks down in crises → tail risk is correlated
Fat tails: extreme events are ~6× more common than normal theory predicts
Base Rate Fallacy: high-accuracy tests still produce mostly false alarms for rare events