The Law of Large Numbers
If you repeat an experiment many times, the average of the results tends to stabilize. This simple observation underpins much of statistics and numerical methods.
Consider a collection of independent and identically distributed observations \(X_1,X_2,\dots\) with finite expectation \(\mu=E[X_1]\). The quantity we observe is the sample mean
\[
\overline{X}_n = \frac{1}{n}\sum_{i=1}^n X_i.
\]
The intuitive question is: what happens to \(\overline{X}_n\) as \(n\) grows? The answer: it tends to \(\mu\) according to various notions of convergence. The two most common are convergence in probability (WLLN) and almost sure convergence (SLLN).
Assume the variance \(\sigma^2=\operatorname{Var}(X_1)\) exists. The variance of the sum \(\sum_{i=1}^n X_i\) is \(n\sigma^2\), growing linearly with \(n\). Dividing the sum by \(n\) divides its variance by \(n^2\), giving the variance of the mean:
\[
\operatorname{Var}(\overline{X}_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},
\]
which goes to zero as \(n\to\infty\). Chebyshev's inequality turns this into a probabilistic statement: for every \(\varepsilon>0\),
\[
P\bigl(|\overline{X}_n-\mu|\ge\varepsilon\bigr) \le \frac{\sigma^2}{n\varepsilon^2}.
\]
This is the weak law of large numbers (WLLN): for large \(n\) the probability of a significant deviation becomes small. It is also a quantitative guideline: to make the probability of a deviation larger than \(\varepsilon\) at most \(\delta\), it suffices to choose \(n\ge\sigma^2/(\delta\varepsilon^2)\).
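The shrinking deviation probability, and the Chebyshev bound on it, can be checked numerically. The sketch below uses Exponential(1) variables (so \(\mu=1\), \(\sigma^2=1\)); the distribution, \(\varepsilon\), and the trial counts are illustrative choices, not part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate P(|mean - mu| > eps) for growing n, using Exponential(1)
# samples (mu = 1, sigma^2 = 1), and compare with the Chebyshev bound.
mu, sigma2, eps = 1.0, 1.0, 0.1
trials = 2000

for n in [100, 1000, 10000]:
    samples = rng.exponential(1.0, size=(trials, n))
    means = samples.mean(axis=1)
    p_dev = np.mean(np.abs(means - mu) > eps)   # empirical deviation prob.
    chebyshev = min(1.0, sigma2 / (n * eps**2))  # Chebyshev upper bound
    print(f"n={n:6d}  empirical={p_dev:.4f}  Chebyshev<= {chebyshev:.4f}")
```

The empirical probability falls well below the bound: Chebyshev is valid but far from tight, which is the point of the sharper inequalities discussed later.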
The WLLN does not assert that for each individual sequence of outcomes the mean converges; it only says the probability that the mean is far from \(\mu\) goes to zero. The SLLN strengthens the conclusion: with probability 1 the sequence of sample means converges to \(\mu\). Proving the SLLN requires controlling distribution tails: rare but very large values can spoil convergence unless controlled.
The standard method is truncation: define truncated variables \(X_i^{(M)} = X_i\mathbf{1}_{\{|X_i|\le M\}}\), typically with a level growing with the index (e.g. \(M=i\) for the \(i\)-th variable). The truncated versions have controlled tails, so strong results apply to them; the Borel–Cantelli lemma then shows that, when \(E|X_1|<\infty\), the events \(\{|X_i|>i\}\) occur only finitely many times almost surely, so the truncated and original sequences eventually agree. Combining these steps yields the SLLN for the original variables.
The simple Borel–Cantelli lemma: if \(\sum_n P(A_n)<\infty\) then the probability that infinitely many of the events \(A_n\) occur is 0. This connects sums of probabilities with almost-sure behavior.
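The lemma is easy to check by simulation. The sketch below takes independent events \(A_n\) with \(P(A_n)=1/n^2\), so \(\sum_n P(A_n)=\pi^2/6<\infty\); the horizon \(N\) and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Independent events A_n with P(A_n) = 1/n^2; the probabilities sum to
# pi^2/6 < inf, so Borel-Cantelli predicts only finitely many occurrences.
N = 100_000
probs = 1.0 / np.arange(1, N + 1) ** 2
occurred = rng.random(N) < probs

count = int(occurred.sum())
last = int(np.max(np.nonzero(occurred)[0])) + 1  # 1-based index of last event
print(f"events occurred: {count}, last occurrence at n = {last}")
```

In a typical run only a handful of events fire, all at small indices, exactly as the lemma predicts.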
Knowing convergence is useful, but knowing the scale of fluctuations is often more important in practice. The Central Limit Theorem (CLT) states that if \(\operatorname{Var}(X_1)=\sigma^2<\infty\), then
\[
\sqrt{n}\,\frac{\overline{X}_n-\mu}{\sigma} \xrightarrow{d} N(0,1) \quad\text{as } n\to\infty.
\]
This implies typical deviations of \(\overline{X}_n\) around \(\mu\) are of order \(1/\sqrt{n}\). For large \(n\), \(\overline{X}_n\) is approximately normal with mean \(\mu\) and variance \(\sigma^2/n\). Confidence intervals are therefore justified by the CLT.
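The normal approximation behind confidence intervals can be tested directly: standardize many sample means and count how often they land within \(\pm 1.96\). The sketch again uses a skewed Exponential(1) distribution to show the CLT does not need symmetry; the sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Standardized sample means of a skewed distribution (Exponential(1))
# should be approximately N(0, 1), so about 95% fall within +-1.96.
mu, sigma, n, trials = 1.0, 1.0, 500, 20000
means = rng.exponential(1.0, size=(trials, n)).mean(axis=1)
z = np.sqrt(n) * (means - mu) / sigma
coverage = np.mean(np.abs(z) < 1.96)
print(f"fraction of |z| < 1.96: {coverage:.3f}  (normal theory: 0.950)")
```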
Quantitative results (Berry–Esseen) provide an \(O(1/\sqrt{n})\) bound on the approximation error, with constants depending on third moments of the \(X_i\).
Non-asymptotic bounds giving exponentially small probabilities of large deviations are often useful. For bounded or sub-Gaussian variables there are inequalities such as Hoeffding's and Bernstein's.
Compared to Chebyshev these bounds are much stronger: the probability of large deviations decays exponentially in \(n\), not just as \(1/n\).
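The gap is easy to see numerically. For variables bounded in \([0,1]\), Chebyshev with the worst-case variance \(1/4\) gives \(P(|\overline{X}_n-\mu|\ge\varepsilon)\le 1/(4n\varepsilon^2)\), while Hoeffding's inequality gives \(2e^{-2n\varepsilon^2}\). The choice of \(\varepsilon\) below is illustrative.

```python
import math

# Tail bounds on P(|mean - mu| >= eps) for i.i.d. variables in [0, 1]:
# Chebyshev decays like 1/n, Hoeffding exponentially in n.
eps = 0.05
for n in [100, 1000, 10000]:
    chebyshev = min(1.0, 0.25 / (n * eps**2))
    hoeffding = min(1.0, 2 * math.exp(-2 * n * eps**2))
    print(f"n={n:6d}  Chebyshev<= {chebyshev:.3e}  Hoeffding<= {hoeffding:.3e}")
```

At \(n=10{,}000\) the Chebyshev bound is \(10^{-2}\) while the Hoeffding bound is around \(10^{-21}\): twenty orders of magnitude sharper.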
For a simple didactic case let \(X_i\) be Bernoulli with parameter \(p\): \(X_i\in\{0,1\}\), \(P(X_i=1)=p\). Then \(\mu=p\) and \(\sigma^2=p(1-p)\). The variance of the mean is:
\[
\operatorname{Var}(\overline{X}_n) = \frac{p(1-p)}{n}.
\]
The typical standard deviation is \(\sqrt{p(1-p)/n}\). In the demo, increasing \(n\) makes the individual trajectories \(f_i(t)\) approach the horizontal line at \(p\), and the histogram of the final frequencies \(f_i(n)\) becomes more concentrated around \(p\). Increasing \(m\) (the number of trajectories) with \(n\) fixed makes the histogram less noisy and a better approximation of the sampling distribution of \(\overline{X}_n\).
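A minimal offline version of the demo can be sketched as follows; the values of \(p\), \(n\), and \(m\) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Running frequencies f_i(t) of m Bernoulli(p) trajectories. By the LLN
# each trajectory settles near p; the final frequencies spread as
# sqrt(p(1-p)/n), matching the CLT scale.
p, n, m = 0.3, 5000, 200
flips = rng.random((m, n)) < p
freqs = np.cumsum(flips, axis=1) / np.arange(1, n + 1)  # f_i(t), t = 1..n

final = freqs[:, -1]
print(f"mean of final frequencies: {final.mean():.4f}  (p = {p})")
print(f"std  of final frequencies: {final.std():.4f}  "
      f"(theory: {np.sqrt(p * (1 - p) / n):.4f})")
```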
An approximate 95% confidence interval for the final frequency \(\hat p=\overline{X}_n\) is
\[
\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{n}}.
\]
For \(p\) near 0 or 1 and for small \(n\) this approximation can be poor; in those cases use exact methods (Clopper–Pearson) or transformations.
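The normal-approximation (Wald) interval above is a one-line computation; the function name and the example counts below are purely illustrative.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% interval for a Bernoulli frequency.

    For p near 0 or 1, or small n, prefer an exact Clopper-Pearson
    interval instead of this approximation.
    """
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

lo, hi = wald_interval(310, 1000)  # e.g. 310 successes out of 1000 trials
print(f"p_hat = 0.310, approximate 95% CI = ({lo:.3f}, {hi:.3f})")
```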
The LLN explains why repeating measurements and computing averages is effective: in labs or Monte Carlo simulations, averaging \(n\) independent repetitions shrinks the typical error at the \(1/\sqrt{n}\) rate predicted by the CLT.
Now try the interactive simulation below to verify these points.