03: Probability Basics

Why Probability Matters in Business and Finance

Probability is the mathematical language of uncertainty.

| Application | Probability Concept |
|---|---|
| Credit risk models | P(Default) — conditional probability |
| Option pricing | Risk-neutral probability |
| Portfolio risk | Joint probability of losses |
| A/B testing | P(result due to chance) |
| Insurance pricing | Expected loss = P(event) × severity |

Every quantitative decision under uncertainty relies on probability theory.

Three Definitions of Probability

| Approach | Formula | Best For |
|---|---|---|
| Classical | \(P(A) = \frac{\lvert A\rvert}{\lvert\Omega\rvert}\) | Equally likely outcomes (dice, cards) |
| Frequentist | \(P(A) = \lim_{n\to\infty}\frac{n_A}{n}\) | Repeatable experiments (quality control) |
| Subjective | Expert belief in \([0,1]\) | Unique events (startup valuation) |

Key Insight: In finance, all three coexist:

  • Classical → lottery pricing
  • Frequentist → historical default rates
  • Subjective → “What is the probability of a trade war?”
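The frequentist definition can be made concrete with a short simulation. The sketch below (a minimal illustration, with a fixed seed chosen here for reproducibility) estimates \(P(\text{heads})\) for a fair coin as the relative frequency \(n_A/n\) and shows it settling toward the classical answer 0.5 as \(n\) grows:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

def relative_frequency(n_flips: int) -> float:
    """Frequentist estimate of P(heads): n_A / n over repeated fair-coin flips."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The estimate drifts toward the classical value 0.5 as n grows.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: P(heads) ~ {relative_frequency(n):.4f}")
```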

Kolmogorov’s Axioms: The Foundation of All Probability

Given sample space \(\Omega\) and event \(A\):

Axiom 1 (Non-negativity): \(P(A) \geq 0\)

Axiom 2 (Normalization): \(P(\Omega) = 1\)

Axiom 3 (Countable Additivity): For mutually exclusive events \(A_1, A_2, \ldots\)

\[ \large{ P\left(\bigcup_{i=1}^{\infty}A_i\right) = \sum_{i=1}^{\infty}P(A_i) } \]

These three axioms are all we need. Every probability rule we’ll learn is derived from these.

Complement Rule and Addition Rule

Complement Rule:

\[ \large{ P(A^c) = 1 - P(A) } \]

“The probability something doesn’t happen = 1 minus the probability it does.”

General Addition Rule:

\[ \large{ P(A \cup B) = P(A) + P(B) - P(A \cap B) } \]

The subtraction avoids double-counting the overlap.

Special case: If \(A\) and \(B\) are mutually exclusive, \(P(A \cap B) = 0\), so \(P(A \cup B) = P(A) + P(B)\).
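Both rules can be checked exactly with a classical card example (an illustration chosen here, not from the text above): draw one card from a standard 52-card deck, with \(A\) = "heart" and \(B\) = "face card". Using exact fractions avoids floating-point noise:

```python
from fractions import Fraction

# One draw from a 52-card deck; A = "heart", B = "face card" (J, Q, K).
p_A = Fraction(13, 52)
p_B = Fraction(12, 52)
p_A_and_B = Fraction(3, 52)          # J, Q, K of hearts

p_A_or_B = p_A + p_B - p_A_and_B     # general addition rule
p_not_A = 1 - p_A                    # complement rule

print(p_A_or_B)   # 11/26
print(p_not_A)    # 3/4
```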

Conditional Probability: Updating Beliefs with New Information

\[ \large{ P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0 } \]

Intuition: “Given that \(B\) has occurred, what fraction of \(B\)’s probability also belongs to \(A\)?”

Banking Example:

  • Universe: 10,000 loan applications
  • 200 eventually default (\(D\))
  • 500 flagged as high-risk (\(H\))
  • 150 are both high-risk AND default

\[ P(D|H) = \frac{150/10000}{500/10000} = \frac{150}{500} = 30\% \]

30% of flagged applicants default — vs. only 2% overall.
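The same computation, written out from the raw counts, shows that \(P(D|H)\) is just the share of the high-risk group that defaults:

```python
# Counts from the banking example above.
total = 10_000
defaults = 200          # D: eventually default
high_risk = 500         # H: flagged as high-risk
both = 150              # D and H

p_D_given_H = (both / total) / (high_risk / total)   # P(D|H) = P(D n H) / P(H)
p_D = defaults / total                               # unconditional default rate

print(f"P(D|H) = {p_D_given_H:.0%}, P(D) = {p_D:.0%}")   # 30% vs 2%
```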

Independence: When Knowing One Thing Tells You Nothing About Another

Events \(A\) and \(B\) are independent if and only if:

\[ \large{ P(A \cap B) = P(A) \cdot P(B) } \]

Equivalently: \(P(A|B) = P(A)\) — knowing \(B\) doesn’t change your belief about \(A\).

Critical distinction:

| Concept | Definition | Example |
|---|---|---|
| Independent | \(P(A \cap B) = P(A)P(B)\) | Two unrelated stocks (maybe) |
| Mutually exclusive | \(P(A \cap B) = 0\) | Heads and tails on same flip |

These are almost opposites! If \(A\) and \(B\) are mutually exclusive (and both have \(P > 0\)), they cannot be independent.

‘Dirty Work’: Independence Breaks Down in Crises

Normal markets: Correlations between stocks ≈ 0.3 → dependence is modest, and diversification does meaningful work.

Crisis (2008, 2020): Correlations spike → 0.8+ → everything falls together.

Implication:

  • Portfolio diversification that “works” in normal times fails when you need it most
  • This is why stress testing uses crisis correlations, not average correlations
  • Models that assume constant independence (like early CDO pricing) can be catastrophically wrong

‘Dirty Work’: Fat Tails — Theory vs. Reality

Normal distribution predicts: Events beyond 3σ occur with probability 0.27%

Empirical reality (A-share data):

| Threshold | Theoretical (Normal) | Empirical | Ratio |
|---|---|---|---|
| \(\lvert r\rvert > 3\sigma\) | 0.27% | 1.54% | 5.7× |

The real market produces extreme events ~6× more often than the bell curve predicts.

This is why risk managers add fat-tail adjustments to VaR models and why the 2008 crisis was a “25-sigma event” under normal assumptions — essentially impossible, yet it happened.
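The gap between the bell curve and fat-tailed reality can be reproduced in a few lines. The sketch below is an illustration, not the A-share data from the table: it uses a Student-t distribution with 3 degrees of freedom as a stand-in for fat-tailed returns (generated from stdlib primitives as \(Z/\sqrt{\chi^2_\nu/\nu}\)) and compares its 3σ exceedance rate with the normal prediction:

```python
import math
import random
from statistics import NormalDist

random.seed(0)

# Theoretical normal tail: P(|Z| > 3) = 0.27%
p_normal = 2 * (1 - NormalDist().cdf(3))

def draw_t(df: int) -> float:
    """One draw from a Student-t(df) distribution: Z / sqrt(chi2/df)."""
    z = random.gauss(0, 1)
    chi2 = random.gammavariate(df / 2, 2)   # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)

df, n = 3, 200_000
sigma_t = math.sqrt(df / (df - 2))          # std dev of t(3) is sqrt(3)
exceed = sum(abs(draw_t(df)) > 3 * sigma_t for _ in range(n)) / n

print(f"Normal tail:   {p_normal:.2%}")
print(f"Fat-tail rate: {exceed:.2%}  (~{exceed / p_normal:.1f}x the normal prediction)")
```

The exact multiple depends on the distribution and sample, but the qualitative point matches the table: a fat-tailed process breaches 3σ several times more often than the normal model says it should.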

The Law of Total Probability

If \(B_1, B_2, \ldots, B_k\) form a partition of \(\Omega\):

\[ \large{ P(A) = \sum_{j=1}^{k}P(A|B_j) \cdot P(B_j) } \]

Intuition: Break a complex probability into simpler conditional pieces.

Marketing Example:

  • Channel email (\(B_1\), 40% of traffic), social (\(B_2\), 35%), search (\(B_3\), 25%)
  • Conversion rates: \(P(\text{buy}|B_1) = 5\%\), \(P(\text{buy}|B_2) = 3\%\), \(P(\text{buy}|B_3) = 8\%\)

\[ P(\text{buy}) = 0.05 \times 0.4 + 0.03 \times 0.35 + 0.08 \times 0.25 = 5.05\% \]
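The decomposition is a one-line weighted sum in code. A minimal sketch using the channel shares and conversion rates above:

```python
# Channel shares and conditional conversion rates from the marketing example.
channels = {
    "email":  (0.40, 0.05),   # (P(B_j), P(buy | B_j))
    "social": (0.35, 0.03),
    "search": (0.25, 0.08),
}

# Law of total probability: P(buy) = sum_j P(buy | B_j) * P(B_j)
p_buy = sum(share * conv for share, conv in channels.values())
print(f"P(buy) = {p_buy:.2%}")   # 5.05%
```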

Bayes’ Theorem: Reversing Conditional Probability

\[ \large{ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} } \]

Or in the expanded form using Total Probability:

\[ \large{ P(A_i|B) = \frac{P(B|A_i) \cdot P(A_i)}{\sum_{j=1}^k P(B|A_j) \cdot P(A_j)} } \]

The key idea:

  • \(P(A)\): Prior — your belief before seeing evidence
  • \(P(B|A)\): Likelihood — how probable is the evidence given your belief
  • \(P(A|B)\): Posterior — updated belief after seeing evidence

Case: Financial Fraud Detection (Base Rate Fallacy)

A fraud detection system has:

  • Sensitivity: \(P(\text{Alert}|\text{Fraud}) = 99\%\) (catches 99% of frauds)
  • False positive rate: \(P(\text{Alert}|\text{No Fraud}) = 5\%\)
  • Base rate: \(P(\text{Fraud}) = 1\%\)

Question: If the alarm triggers, what is \(P(\text{Fraud}|\text{Alert})\)?

\[ P(\text{Fraud}|\text{Alert}) = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99} = \frac{0.0099}{0.0594} \approx \mathbf{16.7\%} \]

83% of alerts are false alarms! The low base rate dominates.

Base Rate Fallacy: Why It’s Dangerous in Practice

Figure: Base Rate Fallacy Illustration. 10,000 cases split into 100 fraud (1%) and 9,900 non-fraud (99%). The system raises 99 true-positive alerts and 495 false-positive alerts, 594 alerts in total, so \(P(\text{Fraud}|\text{Alert}) = 99/594 \approx 16.7\%\).

Lesson: Even a 99%-accurate test produces mostly false positives when the event is rare.
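A small helper makes the update explicit: the denominator is just the Law of Total Probability applied to \(P(\text{Alert})\). The function name and parameters here are illustrative, not from a particular library:

```python
def posterior(prior: float, sensitivity: float, false_pos: float) -> float:
    """Bayes' theorem: P(Fraud | Alert) from prior, P(Alert|Fraud), P(Alert|No Fraud)."""
    # P(Alert) via total probability over Fraud / No Fraud
    evidence = sensitivity * prior + false_pos * (1 - prior)
    return sensitivity * prior / evidence

p = posterior(prior=0.01, sensitivity=0.99, false_pos=0.05)
print(f"P(Fraud|Alert) = {p:.1%}")   # 16.7%
```

Try raising the base rate to 10%: the same test suddenly yields a posterior near 69%, which is the whole point of the fallacy.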

Probability Trees for Sequential Decisions

A probability tree maps out all outcomes of a multi-stage process.

Example: Two-stage Marketing Funnel

  • Stage 1: Customer opens email → P(open) = 40%
  • Stage 2: Given opened, customer clicks → P(click|open) = 15%

\[ P(\text{click}) = P(\text{open}) \times P(\text{click}|\text{open}) = 0.40 \times 0.15 = 6\% \]

Probability trees make Total Probability and Bayes’ Theorem computations visual and intuitive.
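In code, a tree is just multiplication along each branch; as a sanity check, the leaf probabilities must sum to 1. A minimal sketch of the funnel above:

```python
# Two-stage funnel as a probability tree: multiply probabilities along each branch.
p_open = 0.40
p_click_given_open = 0.15

leaves = {
    ("open", "click"):     p_open * p_click_given_open,
    ("open", "no click"):  p_open * (1 - p_click_given_open),
    ("no open",):          1 - p_open,
}

print(f"P(click) = {leaves[('open', 'click')]:.0%}")   # 6%
assert abs(sum(leaves.values()) - 1) < 1e-12           # leaves partition the sample space
```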

The Monty Hall Problem: Why Intuition Fails

Setup: 3 doors — 1 car, 2 goats. You pick a door. The host, who knows where the car is, opens one of the other doors to reveal a goat (he always can, and always does). Should you switch?

| Strategy | Probability of Winning |
|---|---|
| Stay with original choice | 1/3 |
| Switch to other door | 2/3 |

Why switch wins 2/3 of the time:

  • If you initially picked a goat (2/3 probability) → switching wins
  • If you initially picked the car (1/3 probability) → switching loses
  • Since you’re more likely to have picked a goat, switching is optimal

The host’s action gives you information — ignoring it is irrational.
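If the argument still feels wrong, simulation settles it. This sketch plays the game many times under the standard rules (host never opens your door or the car's door):

```python
import random

random.seed(1)

def play(switch: bool) -> bool:
    """One round of Monty Hall; returns True if the final pick wins the car."""
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Host opens a door that is neither your pick nor the car.
    host = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != host)
    return pick == car

n = 100_000
stay = sum(play(False) for _ in range(n)) / n
swap = sum(play(True) for _ in range(n)) / n
print(f"stay: {stay:.3f}, switch: {swap:.3f}")   # ~0.333 vs ~0.667
```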

The Birthday Paradox: How Many People for a Match?

Question: How many people needed for a >50% chance that two share a birthday?

Answer: Only 23 people!

With just 23 people, there are \(\binom{23}{2} = 253\) possible pairs. The probability of no matches:

\[ P(\text{no match}) = \frac{365}{365} \times \frac{364}{365} \times \cdots \times \frac{343}{365} \approx 0.493 \]

So \(P(\text{at least one match}) \approx 50.7\%\).

Why surprising: We think about pairs involving ourselves (\(n-1\)), but should count all \(\binom{n}{2}\) pairs.
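The product above is easy to compute exactly for any group size (assuming a 365-day year and uniform birthdays, as in the standard version of the problem):

```python
def p_no_match(n: int) -> float:
    """Probability that n people all have distinct birthdays (365-day year)."""
    p = 1.0
    for k in range(n):
        p *= (365 - k) / 365
    return p

# 23 people already give a >50% chance of at least one shared birthday.
for n in (10, 23, 50):
    print(f"n = {n:2d}: P(match) = {1 - p_no_match(n):.1%}")
```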

Berkson’s Paradox: Selection Bias Creates False Correlations

Observation: Among admitted students, SAT scores and GPA appear negatively correlated.

Reality: In the general population, they’re slightly positively correlated!

Mechanism: Admission requires either high SAT or high GPA (or both). By conditioning on admission:

  • High-SAT admits don’t need high GPAs
  • High-GPA admits don’t need high SATs

In finance: Among observed (survived) hedge funds, returns and AUM may show spurious negative correlation because both helped them survive.

Chapter 3 Summary

Core Framework:

  • Three probability definitions → Classical, Frequentist, Subjective
  • Kolmogorov’s three axioms → foundation for everything

Essential Rules:

  • Complement, Addition, Conditional Probability, Independence
  • Total Probability → decompose complex events
  • Bayes’ Theorem → update beliefs with evidence

Practical Warnings:

  • Independence breaks down in crises → tail risk is correlated
  • Fat tails: extreme events are ~6× more common than normal theory predicts
  • Base Rate Fallacy: high-accuracy tests still produce mostly false alarms for rare events