05: Inferential Statistics
Why Inference? From Sample to Population
In practice, we never observe the entire population.
A bank cannot survey every potential customer
An asset manager cannot wait for infinite return data
A regulator cannot audit every transaction
Inferential statistics provides the mathematical machinery to draw rigorous conclusions about a population from a limited sample, and to quantify our uncertainty.
This chapter bridges descriptive statistics and decision-making. Everything we have computed so far (means, variances, correlations) was a sample statistic. The big question is: what can these numbers tell us about the true, unknown population parameters?
The Three Pillars of Inference
Point Estimation
What is our best single guess for \(\theta\) ?
\(\hat{\theta}\)
Interval Estimation
What range plausibly contains \(\theta\) ?
Confidence Interval
Hypothesis Testing
Is a specific claim about \(\theta\) supported by data?
p-value, decision
These three tools form a complete inferential toolkit for data-driven decision making in finance and business.
Think of it this way: point estimation gives you a single number, interval estimation gives you a range with a confidence level, and hypothesis testing gives you a yes/no decision framework. We need all three.
Point Estimation: Three Desirable Properties
A good estimator \(\hat{\theta}\) should satisfy:
1. Unbiasedness: On average, it hits the target.
\[E[\hat{\theta}] = \theta\]
2. Efficiency: Among all unbiased estimators, it has the smallest variance (Cramér-Rao Lower Bound).
3. Consistency: As \(n \to \infty\) , the estimator converges to the true value.
\[\hat{\theta}_n \xrightarrow{p} \theta\]
Unbiasedness means no systematic error. Efficiency means minimum random error. Consistency means that with enough data, you will eventually get the right answer. Note that even the sample variance uses \(n-1\) (Bessel’s correction) precisely to achieve unbiasedness.
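A quick simulation makes the bias of the divide-by-\(n\) estimator visible. This is a minimal sketch assuming normally distributed data with known true variance; the sample size, trial count, and seed are arbitrary choices:

```python
import random

random.seed(42)

true_var = 4.0          # variance of N(0, 2^2)
n, trials = 5, 20000

biased, unbiased = 0.0, 0.0
for _ in range(trials):
    x = [random.gauss(0, 2) for _ in range(n)]
    m = sum(x) / n
    ss = sum((xi - m) ** 2 for xi in x)
    biased += ss / n          # MLE-style estimator: divides by n
    unbiased += ss / (n - 1)  # Bessel-corrected sample variance

print(round(biased / trials, 2))    # systematically below 4.0
print(round(unbiased / trials, 2))  # close to 4.0
```

The divide-by-\(n\) average lands near \(\sigma^2 (n-1)/n\), exactly the shortfall Bessel's correction removes.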
Maximum Likelihood Estimation (MLE)
MLE answers: What parameter value makes the observed data most probable?
Given observations \(x_1, x_2, \ldots, x_n\) , the likelihood function is:
\[L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)\]
The MLE maximizes this (or equivalently, the log-likelihood):
\[\hat{\theta}_{MLE} = \arg\max_\theta \sum_{i=1}^{n} \ln f(x_i \mid \theta)\]
Key property: MLE is asymptotically efficient — it achieves the Cramér-Rao bound as \(n \to \infty\) .
MLE is the workhorse of modern statistics. For the Bernoulli case, if we observe \(k\) successes in \(n\) trials, the MLE for \(p\) is simply \(k/n\). For normal data, the MLE for \(\mu\) is the sample mean. The beauty of MLE is its generality — it works for any parametric model.
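The Bernoulli claim can be checked numerically. This sketch maximizes the log-likelihood over a simple grid rather than solving the first-order condition analytically; the grid resolution is an arbitrary choice:

```python
import math

def bernoulli_loglik(p, k, n):
    """Log-likelihood of k successes in n Bernoulli(p) trials."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

k, n = 7, 10
# Grid search over p in (0, 1); the maximizer should land at k/n = 0.7.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: bernoulli_loglik(p, k, n))
print(p_hat)  # 0.7
```

The same grid-search pattern works for any one-parameter likelihood, which is the generality the text refers to.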
MLE in Action: YRD Profitability Rate
Case: Among 1,978 listed companies in the Yangtze River Delta (2023 Q3), what fraction is profitable?
Model: Each company’s profitability is Bernoulli(\(p\) )
Data: 1,678 companies reported positive net income
MLE: \(\hat{p} = 1678 / 1978 = 84.83\%\)
Caution: the MLE for the variance is \(\frac{1}{n}\sum(x_i - \bar{x})^2\), which is biased. The unbiased version divides by \(n-1\).
This illustrates a general lesson: MLE is not always unbiased, but it is always consistent.
This is real data from the Yangtze River Delta. The MLE for a proportion is beautifully simple — just the sample proportion. But remember, for the variance parameter, MLE gives a biased estimate. That’s why we always use n-1 in practice for sample variance.
Confidence Intervals: The Concept
A 95% confidence interval does NOT mean:
“There is a 95% probability that \(\mu\) lies in this interval.”
It means:
“If we repeated this sampling procedure infinitely many times, 95% of the constructed intervals would contain the true \(\mu\) .”
The randomness is in the interval, not in \(\mu\). The population parameter \(\mu\) is fixed but unknown.
This is one of the most commonly misunderstood concepts in statistics. The frequentist interpretation says \(\mu\) is a fixed constant — it’s the interval that’s random. Each time you draw a new sample, you get a different interval, and 95% of those intervals will capture the true \(\mu\).
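The repeated-sampling interpretation can be demonstrated directly. This sketch assumes normal data and uses the approximate critical value \(t_{0.025,\,29} \approx 2.045\); it builds 5,000 intervals and counts how many capture the true mean:

```python
import random
import statistics

random.seed(0)

mu, sigma, n, trials = 10.0, 2.0, 30, 5000
t_crit = 2.045  # approx t_{0.025, 29}

covered = 0
for _ in range(trials):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(x)
    s = statistics.stdev(x)
    half = t_crit * s / n ** 0.5
    if xbar - half <= mu <= xbar + half:
        covered += 1

print(covered / trials)  # close to 0.95
```

About 95% of the intervals contain \(\mu\), even though any single interval either contains it or does not.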
CI for the Mean: Two Cases
Case 1: \(\sigma\) known (rare in practice)
\[\bar{X} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]
Case 2: \(\sigma\) unknown (the standard case)
\[\bar{X} \pm t_{\alpha/2, \, n-1} \cdot \frac{s}{\sqrt{n}}\]
CI for a proportion:
\[\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
The margin of error shrinks at rate \(1/\sqrt{n}\): to halve the margin, you need 4× the sample size.
The key insight is that margin of error decreases with the square root of n, not linearly. This has enormous practical implications for sample size planning. Doubling your sample size only reduces the margin by about 30%. To cut it in half, you need four times as many observations.
CI Case: Average ROE of YRD Electronics Firms
Data: 190 electronic industry firms in YRD, 2023 Q3.
Sample mean ROE: \(\bar{X} = 2.10\%\)
Sample std dev: \(s = 12.38\%\)
90% confidence, \(t = 1.653\): [0.62%, 3.59%]
95% confidence, \(t = 1.973\): [0.33%, 3.87%]
99% confidence, \(t = 2.602\): [−0.24%, 4.44%]
Interpretation: We are 95% confident that the true average ROE for YRD electronics firms lies between 0.33% and 3.87%.
Notice how the interval widens as confidence level increases. At 99% confidence, the interval actually includes zero — meaning we can’t even be sure the average ROE is positive! This illustrates the fundamental trade-off between confidence and precision.
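The intervals can be reproduced from the summary statistics alone. This sketch hardcodes the critical values reported on the slide; tiny differences in the last decimal against the slide can arise from rounding:

```python
import math

n, xbar, s = 190, 2.10, 12.38  # percent units, from the slide
se = s / math.sqrt(n)

intervals = {}
for conf, t_crit in [(90, 1.653), (95, 1.973), (99, 2.602)]:
    half = t_crit * se
    intervals[conf] = (xbar - half, xbar + half)
    print(f"{conf}%: [{xbar - half:.2f}%, {xbar + half:.2f}%]")
```

Note how the 99% interval straddles zero while the narrower ones do not, matching the widening pattern described above.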
Sample Size Planning
How large a sample do we need?
For estimating a mean with margin of error \(E\) :
\[n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2\]
For estimating a proportion:
\[n = \frac{z_{\alpha/2}^2 \cdot p(1-p)}{E^2}\]
Example: Zhejiang textile industry revenue survey (\(\sigma = 50\) million CNY):
Margin ±10 million CNY: \(n = 97\)
Margin ±5 million CNY: \(n = 385\)
Sample size planning should happen BEFORE data collection. Notice the quadratic relationship — halving the margin of error from 10 million to 5 million requires roughly 4 times the sample size (from 97 to 385). In corporate research, budget constraints often determine the achievable precision.
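The mean-case formula takes only a few lines. This is a sketch using the 95% value \(z_{0.025} = 1.96\) and the slide's \(\sigma = 50\) million CNY:

```python
import math

def sample_size_mean(z, sigma, E):
    """Required n to estimate a mean within margin E at critical value z."""
    return math.ceil((z * sigma / E) ** 2)

z95, sigma = 1.96, 50  # million CNY, from the slide
print(sample_size_mean(z95, sigma, 10))  # 97
print(sample_size_mean(z95, sigma, 5))   # 385
```

Rounding up with `ceil` is deliberate: rounding down would leave the margin of error slightly larger than the target \(E\).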
Dirty Work: P-Hacking
P-hacking = testing many hypotheses until you find a “significant” one.
Simulation: Test 100 completely random features against a random target at \(\alpha = 0.05\) .
Expected false positives: \(100 \times 0.05 = 5\)
Observed: ~5 features pass the significance test purely by chance
The lesson: With enough tests, you will find “significant” results even when nothing is real.
Defense: Pre-registration, multiple testing corrections (Bonferroni, FDR), and replication.
This is one of the most important warnings in modern statistics. If you test 100 hypotheses at the 5% level, you expect about 5 false positives. Some researchers — consciously or not — keep testing different variables, subgroups, or model specifications until they find a significant p-value. This is p-hacking, and it’s a major contributor to the replication crisis.
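The simulation described above can be sketched as 100 one-sample t-tests on pure noise, so the null hypothesis is true by construction. The approximate critical value \(t_{0.025,\,49} \approx 2.01\), the sample size, and the seed are arbitrary choices:

```python
import random
import statistics

random.seed(1)

n, n_tests, t_crit = 50, 100, 2.01  # approx t_{0.025, 49}
false_positives = 0
for _ in range(n_tests):
    x = [random.gauss(0, 1) for _ in range(n)]  # null is TRUE: mean is 0
    t = statistics.mean(x) / (statistics.stdev(x) / n ** 0.5)
    if abs(t) > t_crit:
        false_positives += 1

print(false_positives)  # typically around 5, purely by chance
```

Reporting only the handful of "significant" tests from a run like this, while hiding the rest, is exactly what p-hacking amounts to.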
Dirty Work: The File Drawer Problem
Publication bias creates a distorted view of reality:
Studies with \(p < 0.05\) get published
Studies with \(p > 0.05\) stay in the “file drawer”
Published literature systematically overestimates effect sizes
File Drawer Problem (illustration): of 20 hypothetical studies on the same question, the 3 with \(p < 0.05\) get published (a 15% publication rate), while the other 17 with \(p \geq 0.05\) stay hidden in the file drawer (85% unpublished). Result: the literature overestimates true effects.
The file drawer problem means that what we see in published journals is not representative of all research conducted. If 20 teams independently study the same question and only the 3 that find significant results publish, the literature will suggest a strong effect where there may be none. This is why replication studies and meta-analyses are so important.
The p-Value: What It Actually Means
\[p\text{-value} = P(\text{data this extreme or more} \mid H_0 \text{ is true})\]
Three critical misconceptions:
“p = probability \(H_0\) is true”
p measures data surprise, not hypothesis probability
“\(p < 0.05\) means the effect is large”
Statistical significance ≠ practical importance
“\(p > 0.05\) means no effect exists”
Absence of evidence ≠ evidence of absence
The ASA Statement (2016): “A p-value does not measure the probability that the studied hypothesis is true.”
The p-value is perhaps the most misunderstood quantity in all of science. It is NOT the probability that the null hypothesis is true. It is the probability of observing data as extreme as what we got, assuming the null IS true. This is a subtle but crucial distinction. Also, remember that a small p-value with a tiny effect size may be statistically significant but practically meaningless.
One-Sample t-Test
Question: Does the population mean equal a hypothesized value \(\mu_0\) ?
\[t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{n-1}\]
Case: YRD Bank ROE vs. 2.5% Benchmark
18 YRD-region banks, mean ROE = 8.54%
\(t = 8.13\) , \(p \approx 0.000\)
Cohen’s d = 1.92 (very large effect)
Conclusion: YRD banks’ ROE is significantly and substantially above the 2.5% benchmark.
The t-test is beautifully simple: take the difference between the sample mean and the hypothesized value, divide by the standard error. If this ratio is large (far from zero), the data are inconsistent with the null. In this bank case, the t-statistic of 8.13 is enormous — the p-value is essentially zero. But notice we also report Cohen’s d, which shows the effect is practically large, not just statistically significant.
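The arithmetic can be verified from summary statistics. The slide does not report the sample standard deviation; \(s = 3.15\%\) below is back-derived from the reported \(t\) and Cohen's d, so treat it as an assumption rather than the actual data:

```python
def one_sample_t(xbar, s, n, mu0):
    """One-sample t statistic for H0: mu = mu0, from summary statistics."""
    return (xbar - mu0) / (s / n ** 0.5)

# Slide's YRD bank case: n = 18, mean ROE 8.54%, benchmark 2.5%.
# s = 3.15% is an implied value, not reported on the slide.
t = one_sample_t(8.54, 3.15, 18, 2.5)
d = (8.54 - 2.5) / 3.15  # Cohen's d: mean difference in std-dev units
print(round(t, 2))  # close to the reported 8.13
print(round(d, 2))  # close to the reported 1.92
```

Note that \(d\) divides by \(s\) while \(t\) divides by the standard error \(s/\sqrt{n}\); that is why \(t\) grows with \(n\) but \(d\) does not.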
Two-Sample t-Test
Question: Do two populations have the same mean?
\[t = \frac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{1/n_1 + 1/n_2}}\]
where the pooled standard deviation is:
\[S_p = \sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}}\]
Case: Shanghai (425 firms) vs. Anhui (168 firms) ROE
Shanghai: \(\bar{X}_1 = 2.19\%\) , Anhui: \(\bar{X}_2 = 0.32\%\)
\(p = 0.054\) → Fail to reject at 5% level
Borderline result — more data might tip the balance
The two-sample test compares means across two independent groups. The pooled standard deviation combines information from both samples. In this Shanghai vs Anhui comparison, the p-value of 0.054 is tantalizingly close to the 0.05 threshold. This is exactly the kind of result that should make you uncomfortable with rigid cutoffs — the difference is 1.87 percentage points, which might be economically meaningful even if not statistically significant at the conventional level.
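The pooled statistic can be computed from summary numbers alone. The slide does not report the two groups' standard deviations, so the values \(s_1 = s_2 = 11.0\) below are purely illustrative placeholders, not the actual data:

```python
import math

def pooled_two_sample_t(x1, s1, n1, x2, s2, n2):
    """Pooled two-sample t statistic (equal-variance assumption)."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (x1 - x2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Means and group sizes from the slide; std devs are placeholders.
t = pooled_two_sample_t(2.19, 11.0, 425, 0.32, 11.0, 168)
print(round(t, 2))
```

With these placeholder spreads the statistic lands a little below the two-sided 5% cutoff, consistent with the borderline \(p = 0.054\) the slide reports.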
Statistical Significance vs. Practical Importance
A result can fall into one of four cells:
Significant and practically important: the ideal finding
Significant but practically trivial: the large-\(n\) trap
Not significant but practically important: an underpowered study
Not significant and unimportant: a true null
Example: a conversion-rate difference of 0.05 percentage points (5.00% vs 5.05%) is far too small to detect with 100,000 users per group, yet with a couple of million users per group it becomes statistically significant (\(p \approx 0.02\)).
Always report effect sizes alongside p-values:
Cohen’s d for means: small (0.2), medium (0.5), large (0.8)
Odds ratio for proportions
This is arguably the most important slide in this chapter. With enough data, you can detect arbitrarily small differences. A conversion rate improvement of 0.05 percentage points might be statistically detectable with 100,000 users, but is it worth redesigning your website for? Always pair statistical significance with a measure of practical importance.
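The large-\(n\) trap can be illustrated with a two-proportion z-test on a fixed 0.05-percentage-point lift at growing sample sizes. This is a sketch using the normal approximation with equal group sizes (so the pooled proportion is just the average):

```python
from statistics import NormalDist

def two_prop_z_pvalue(p1, p2, n):
    """Two-sided two-proportion z-test p-value, equal group sizes n."""
    pbar = (p1 + p2) / 2                      # pooled proportion (equal n)
    se = (2 * pbar * (1 - pbar) / n) ** 0.5
    z = (p2 - p1) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same tiny effect, increasingly large samples:
for n in [100_000, 500_000, 2_000_000]:
    print(n, round(two_prop_z_pvalue(0.0500, 0.0505, n), 3))
```

The effect never changes; only the sample size does, and the p-value marches from clearly non-significant to significant. That is exactly why significance alone cannot certify practical importance.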
Heuristic: The Hot Hand Fallacy
Scenario: A basketball player makes 8 consecutive shots. Is she “hot”?
Statistical reality:
In a sequence of Bernoulli trials (50% success rate), runs of 8 occur more often than intuition suggests
Our brains are pattern-seeking machines — we see streaks where randomness exists
The same fallacy applies to fund manager “hot streaks”
Lesson: Before attributing performance to skill, always test against the null hypothesis of pure randomness.
Humans are terrible at recognizing randomness. We see patterns in coin flips, stock charts, and sports statistics that aren’t really there. When a fund manager has three great years in a row, we assume skill. But with thousands of fund managers, some will have great runs purely by chance. This connects directly to our hypothesis testing framework — always ask: could this have happened by chance?
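The streak claim is easy to check by simulation: generate fair 0/1 shot sequences and count how often a run of 8 makes appears. The sequence length (200 shots) and seed are arbitrary choices:

```python
import random

random.seed(7)

def has_run(shots, length=8):
    """True if the 0/1 sequence contains a success run of `length`."""
    streak = 0
    for s in shots:
        streak = streak + 1 if s else 0
        if streak >= length:
            return True
    return False

trials, n_shots = 10_000, 200
hits = sum(has_run([random.random() < 0.5 for _ in range(n_shots)])
           for _ in range(trials))
print(hits / trials)  # roughly 3 in 10 sequences contain such a run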
Heuristic: Regression to the Mean
Observation: Extreme values in one period tend to be less extreme in the next.
The top-performing fund this year will likely underperform next year
A company with an exceptionally high ROE will likely see it decline
Students who score highest on one exam tend to score lower on the next
This is NOT mysterious — it’s a direct mathematical consequence of imperfect correlation between successive measurements.
Implication: Don’t confuse regression to the mean with actual causal deterioration.
Francis Galton discovered this phenomenon studying the heights of parents and children. It’s purely statistical: if your first measurement is extreme (far from the mean), your second measurement — which is imperfectly correlated — will tend to be closer to the mean. This has enormous implications in finance: the “best” stocks or funds in one period rarely stay the best. This is regression to the mean, not necessarily a change in fundamental quality.
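A small simulation shows the effect. The two periods share a common component so that their correlation is \(\rho = 0.5\) (an assumed value); the top 5% of performers in period 1 are then tracked into period 2:

```python
import random

random.seed(3)

rho, n = 0.5, 50_000  # rho: assumed correlation between the two periods
pairs = []
for _ in range(n):
    shared = random.gauss(0, 1)
    # Each period = shared component + independent noise, so that
    # corr(x1, x2) = rho while each period is standard normal.
    x1 = rho ** 0.5 * shared + (1 - rho) ** 0.5 * random.gauss(0, 1)
    x2 = rho ** 0.5 * shared + (1 - rho) ** 0.5 * random.gauss(0, 1)
    pairs.append((x1, x2))

# Top 5% in period 1, then the same entities' average in period 2.
pairs.sort(key=lambda p: p[0], reverse=True)
top = pairs[: n // 20]
m1 = sum(p[0] for p in top) / len(top)
m2 = sum(p[1] for p in top) / len(top)
print(round(m1, 2), round(m2, 2))  # period-2 mean is pulled toward 0
```

In expectation the period-2 mean of the top group is \(\rho\) times its period-1 mean: nothing about the entities changed, only the selection.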
Chapter Summary
Point Estimation
MLE is consistent and asymptotically efficient
Confidence Intervals
Width \(\propto 1/\sqrt{n}\) ; 4× data for half the margin
Hypothesis Testing
Proof by contradiction; control Type I error at \(\alpha\)
p-Value
Measures data surprise, NOT probability of \(H_0\)
Effect Size
Always report alongside p-value
P-Hacking
Multiple testing inflates false positive rate
File Drawer
Publication bias overestimates effects
The golden rule: Statistical significance without practical significance is meaningless.
Let me leave you with one thought. The machinery of inference — estimation, confidence intervals, hypothesis tests — is powerful but dangerous. These tools tell you about statistical patterns, but they don’t automatically tell you about real-world importance. Always combine statistical results with domain knowledge and common sense.