1. Interpretations of probability
The term probability is used in different contexts and with meanings that, at first glance, may appear confusing or in conflict. The most common readings are:
1.1 Classical interpretation (Laplace)
This applies when the outcome space is finite and the elementary outcomes are considered equally likely for reasons of symmetry. The probability of an event \(A\) is the ratio of the number of favorable outcomes to the total number of possible outcomes:
\[
P(A) = \frac{|A|}{|\Omega|}.
\]
Limitation: it is not directly applicable to infinite spaces or continuous variables without an additional criterion that justifies equal weighting.
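As a concrete illustration of the classical rule (a minimal sketch; the die example is chosen only for illustration), counting favorable and possible outcomes can be done directly:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
# (equally likely outcomes, by symmetry).
omega = {1, 2, 3, 4, 5, 6}

# Event: the roll is even.
event = {w for w in omega if w % 2 == 0}

# Classical (Laplace) probability: favorable / possible.
p = Fraction(len(event), len(omega))
print(p)  # 1/2
```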
1.2 Frequentist interpretation
Probability is seen as the limit of relative frequencies in the long run: if an experiment is repeated many times, the fraction of times the event occurs tends to its probability.
Limitation: it requires the notion of repeatability and does not assign a probability to single non-repeatable events (e.g., the probability of a unique historical event).
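The frequentist reading can be illustrated empirically (a sketch; the fair-coin model and trial count are illustrative choices): the relative frequency of an event stabilizes near its probability as the number of repetitions grows.

```python
import random

random.seed(0)

# Estimate P(heads) for a fair coin by long-run relative frequency.
n_trials = 100_000
heads = sum(random.random() < 0.5 for _ in range(n_trials))
freq = heads / n_trials
print(freq)  # typically close to 0.5
```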
1.3 Bayesian interpretation
Probability is a coherent measure of degree of belief or information of a rational agent. An updating rule (Bayes' theorem) is needed to change beliefs in light of new information.
Advantage: it allows assigning probabilities to hypotheses or parameters. Criticism: it requires an explicit choice of prior (but there are principles for 'non-informative' priors).
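The Bayesian updating rule can be sketched numerically (all numbers here are hypothetical, chosen only for illustration): a prior degree of belief is revised by the likelihood of the observed evidence.

```python
# Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E).
# Hypothetical values for a generic hypothesis H and evidence E.
prior = 0.01            # P(H): prior degree of belief in H
p_e_given_h = 0.95      # P(E | H): likelihood of the evidence under H
p_e_given_not_h = 0.05  # P(E | not H)

# Total probability of the evidence, then the posterior.
p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
posterior = p_e_given_h * prior / p_e
print(round(posterior, 3))
```

Note how a strong likelihood ratio still yields a modest posterior when the prior is small; this is the role the explicit prior plays.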
1.4 Geometric / measure interpretation
For continuous phenomena one introduces a measure on a space (e.g., Lebesgue measure on an interval) and normalizes it to obtain probabilities. A point chosen "at random" is modeled via the measure.
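A quick empirical check of the geometric construction (a sketch; the subinterval is an arbitrary illustrative choice): for a point chosen uniformly on \([0,1]\), the probability of a subinterval is its length.

```python
import random

random.seed(3)

# Geometric probability: a uniform point on [0, 1] lands in a
# subinterval with probability equal to its (normalized) length.
a, b = 0.2, 0.5                     # illustrative subinterval
n = 100_000
hits = sum(a <= random.random() <= b for _ in range(n))
print(hits / n)  # close to the length b - a = 0.3
```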
1.5 Propensity (physical tendencies)
In physics and philosophy of science the probability is sometimes interpreted as a tendency or disposition of the system to produce certain outcomes: it is an objective property of the system, not merely a matter of our ignorance.
1.6 Apparent conflicts and how axiomatization resolves them
The differences often arise from how one constructs the outcome space \((\Omega,\mathcal{F},P)\). The axiomatic approach (Kolmogorov) does not impose a unique construction of \(\Omega\) or \(P\): it imposes formal properties that any construction must satisfy (non-negativity, normalization, σ-additivity). In this way:
- The classical construction is a special case (finite space with equal weights).
- Lebesgue measure provides the geometric construction for continuous spaces.
- The frequentist approach gives an empirical justification for certain constructions of \(P\).
- The Bayesian uses the same axiomatic structure but interprets the assigned value as a degree of belief.
Thus there is no mathematical contradiction: the different readings operate at a semantic/interpretative level but converge on the same mathematical language.
2. Probability as a measure — technical details
2.1 Probability space
The formal model is the triple \((\Omega,\mathcal{F},P)\), where:
- \(\Omega\): the set of outcomes (sample space).
- \(\mathcal{F}\): a σ-algebra of measurable subsets (closed under complements and countable unions).
- \(P:\mathcal{F}\to[0,1]\): a measure with \(P(\Omega)=1\) and σ-additivity.
Example of a σ-algebra: on \(\Omega=\mathbb{R}\) the standard σ-algebra is the Borel σ-algebra \(\mathcal{B}(\mathbb{R})\), generated by open intervals; one often also works with the completion with respect to a measure (e.g., Lebesgue measure).
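On a finite space the definition can be verified mechanically (a minimal sketch: \(\Omega\) is a toy four-point space, \(\mathcal{F}\) its power set, \(P\) the uniform measure, all chosen for illustration):

```python
from fractions import Fraction
from itertools import combinations

# A tiny probability space: Omega finite, F = power set, P uniform.
# This packages the classical construction as a triple (Omega, F, P).
omega = frozenset({1, 2, 3, 4})

F = [frozenset(c) for r in range(len(omega) + 1)
     for c in combinations(sorted(omega), r)]   # power set of Omega

def P(A):
    return Fraction(len(A), len(omega))         # uniform measure

# F is closed under complement and (finite) union, as a σ-algebra requires.
assert all(omega - A in F for A in F)
assert all(A | B in F for A in F for B in F)

# Kolmogorov's axioms: non-negativity, normalization, additivity.
assert all(P(A) >= 0 for A in F)
assert P(omega) == 1
assert all(P(A | B) == P(A) + P(B) for A in F for B in F if not A & B)
print("axioms verified")
```

On a finite space additivity over pairs of disjoint sets suffices; σ-additivity proper only becomes a genuine restriction on infinite spaces.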
2.2 Random variables and measurability
A random variable is a measurable function \(X:(\Omega,\mathcal{F})\to(\mathbb{R},\mathcal{B}(\mathbb{R}))\).
Measurability means that for every Borel set \(B\) the preimage \(X^{-1}(B)\in\mathcal{F}\). This ensures we can speak of the probability that \(X\) falls in a given set and define the pushforward distribution \(P_X\).
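The pushforward construction can be made concrete on a finite space (a sketch; the parity variable on a die roll is an illustrative choice — on a finite \(\Omega\) every function is automatically measurable):

```python
from fractions import Fraction
from collections import Counter

# Pushforward distribution P_X(B) = P(X^{-1}(B)) for a toy variable:
# X maps a fair die roll to its parity.
omega = range(1, 7)
weight = Fraction(1, 6)             # uniform P on each outcome

def X(w):
    return w % 2                    # the random variable (parity)

# Accumulate the P-weight of each preimage X^{-1}({value}).
P_X = Counter()
for w in omega:
    P_X[X(w)] += weight

print(dict(P_X))  # parities 0 and 1 each carry probability 1/2
```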
2.3 Expectation and the Lebesgue integral
The expectation of a random variable is the Lebesgue integral with respect to the probability measure:
\[
\mathbb{E}[X] = \int_\Omega X(\omega)\,dP(\omega).
\]
This formalism is more general and robust than the Riemann integral: it allows handling heavy-tailed variables, discontinuous functions, convergence issues, etc.
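The integral over \(\Omega\) can be approximated by averaging \(X\) over samples drawn from \(P\) (a Monte Carlo sketch; the choice \(X(\omega)=\omega^2\) with \(\omega\) uniform on \([0,1]\), for which \(\mathbb{E}[X]=1/3\), is illustrative):

```python
import random

random.seed(1)

# Monte Carlo approximation of an expectation:
# E[X] ~ (1/n) * sum of X(omega_i) with omega_i sampled from P.
# Here X(omega) = omega**2, omega uniform on [0, 1], so E[X] = 1/3.
n = 200_000
estimate = sum(random.random() ** 2 for _ in range(n)) / n
print(estimate)  # close to 1/3
```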
2.4 Main notions of convergence
- Almost sure convergence (a.s.): \(X_n(\omega)\to X(\omega)\) for every \(\omega\) outside a set of probability zero.
- Convergence in probability: \(P(|X_n-X|>\varepsilon)\to 0\) for every \(\varepsilon>0\).
- Convergence in \(L^1\): \(\mathbb{E}[|X_n-X|]\to 0\).
Fundamental results (law of large numbers, central limit theorems) are expressed in these terms.
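Convergence in probability can be illustrated empirically for the sample mean of fair coin flips, in the spirit of the weak law of large numbers (a sketch; the threshold \(\varepsilon = 0.05\) and the sample sizes are illustrative choices):

```python
import random

random.seed(2)

# Empirical illustration of convergence in probability:
# P(|mean of n flips - 1/2| > eps) shrinks as n grows.
def tail_prob(n, eps=0.05, reps=1000):
    """Estimate P(|sample mean of n fair flips - 1/2| > eps)."""
    bad = 0
    for _ in range(reps):
        mean = sum(random.random() < 0.5 for _ in range(n)) / n
        if abs(mean - 0.5) > eps:
            bad += 1
    return bad / reps

probs = [tail_prob(n) for n in (10, 100, 1000)]
print(probs)  # decreasing toward 0
```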
3. Required axiomatic derivations
3.1 Subadditivity (Boole)
Statement. For any sequence of events \(A_1,A_2,\dots\in\mathcal{F}\) we have
\[
P\Big(\bigcup_{n=1}^{\infty} A_n\Big) \le \sum_{n=1}^{\infty} P(A_n).
\]
Proof. Define the disjoint sets
\[
B_1 = A_1, \qquad B_n = A_n \setminus \bigcup_{k=1}^{n-1} A_k \quad (n \ge 2).
\]
Then the \(B_n\) are disjoint and \(\bigcup_n B_n = \bigcup_n A_n\). By σ-additivity,
\[
P\Big(\bigcup_n A_n\Big) = P\Big(\bigcup_n B_n\Big) = \sum_n P(B_n).
\]
Since \(B_n\subseteq A_n\), it follows that \(P(B_n)\le P(A_n)\); summing term by term yields the inequality.
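The inequality can be checked numerically on a small uniform space (a sketch; the three overlapping events are chosen only for illustration — equality would hold if they were disjoint):

```python
from fractions import Fraction

# Numerical check of Boole's inequality on a small uniform space:
# P(union of A_i) <= sum of P(A_i).
omega = set(range(10))

def P(A):
    return Fraction(len(A), len(omega))

events = [{0, 1, 2}, {2, 3}, {3, 4, 5}]   # overlapping events
union = set().union(*events)

assert P(union) <= sum(P(A) for A in events)
print(P(union), sum(P(A) for A in events))  # 3/5 4/5
```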
3.2 Inclusion–exclusion principle
For a finite number of events \(A_1,\dots,A_n\) we have
\[
P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i\cap A_j) + \sum_{i<j<k} P(A_i\cap A_j\cap A_k) - \cdots + (-1)^{n+1}\,P(A_1\cap\cdots\cap A_n).
\]
Proof (idea). For each \(\omega\) consider the indicator functions \(1_{A_i}(\omega)\). Expanding the combinatorial identity \(1_{\bigcup_i A_i} = 1 - \prod_i (1 - 1_{A_i})\) gives the correct pointwise count; integrating with respect to \(P\) yields the formula for probabilities. An alternative proof proceeds by induction on \(n\) using the identity
\[
P(A\cup B) = P(A) + P(B) - P(A\cap B).
\]
Practical note: the formula is exact, but its \(2^n - 1\) terms make it impractical for large \(n\); one therefore resorts to the Bonferroni inequalities or to probabilistic estimates.
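On a small uniform space the alternating sum can be computed term by term and checked against the direct probability of the union (a sketch; the three events are illustrative):

```python
from fractions import Fraction
from itertools import combinations

# Inclusion-exclusion on a small uniform space, checked against the
# direct probability of the union.
omega = set(range(12))

def P(A):
    return Fraction(len(A), len(omega))

events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}]

def inclusion_exclusion(events):
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)          # +, -, +, ... by term order
        for combo in combinations(events, r):
            total += sign * P(set.intersection(*combo))
    return total

assert inclusion_exclusion(events) == P(set().union(*events))
print(inclusion_exclusion(events))  # 2/3
```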
4. References
- A. N. Kolmogorov — Foundations of the Theory of Probability (1933).
- P. Billingsley — Probability and Measure.
- R. Durrett — Probability: Theory and Examples.
- J. F. C. Kingman — Poisson Processes.