Probability, Rényi-style
2023-10-16 — 2026-01-07
Wherein conditional probability is taken as primitive, an admissible “bunch” of conditions is specified (often excluding Ω), and σ‑finite measures are treated up to scale so probabilities are furnished by ratios.
1 Motivation: why change the primitives?
In Kolmogorov’s formulation, we start with an absolute probability measure \(P\) on a measurable space \((\Omega,\mathcal A)\), and then define conditional probabilities \(P(A\mid B)\) as derived objects (often via Radon–Nikodym derivatives or regular conditional probabilities). This works well when:
- \(P\) is a finite measure (so \(P(\Omega)=1\)),
- we condition on events \(B\) with \(P(B)>0\),
- and we have the regularity needed for “conditioning on \(\sigma\)-fields” to be represented by a version \(P(\,\cdot\mid\mathcal G)\).
But in applied work we constantly encounter “probabilities” that are only specified up to a normalizing constant (e.g., likelihoods, Gibbs measures, Bayesian priors/posteriors before normalization). We might want to use “improper priors” that aren’t probability measures at all (e.g., a flat prior on \(\mathbb R\)). In these cases, the conditional statements can still be coherent and operational, even when there is no global \(P(\Omega)=1\).
Rényi’s move makes this work better by making conditional probability the primitive notion. Absolute probabilities are secondary—available only when “conditioning on \(\Omega\)” is part of the system.
2 Basic objects
A Rényi conditional probability space consists of:
- a measurable space \((\Omega,\mathcal A)\),
- a nonempty family \(\mathcal B\subseteq \mathcal A\) of admissible conditioning events (which Rényi called a bunch),
- a map \[ P(\,\cdot\mid\cdot): \mathcal A\times \mathcal B \to [0,1], \qquad (A,B)\mapsto P(A\mid B), \] interpreted as “the probability of \(A\) given condition \(B\)”.
Intuition: \(\mathcal B\) encodes which “experimental conditions” are legitimate. In classical probability we might take \(\mathcal B=\{B\in\mathcal A: P(B)>0\}\).
In σ-finite measure models (improper priors, invariant measures), \(\mathcal B\) is typically the family of sets with finite, positive mass.
A minimal structural property usually assumed for \(\mathcal B\) is closure under finite intersections: \[ B_1,B_2\in\mathcal B \implies B_1\cap B_2\in\mathcal B, \] so that conditions can be refined.
3 Rényi’s axioms
The axioms are designed so that every fixed condition \(B\) induces an ordinary probability measure, and different conditions are mutually consistent. NB: I haven’t read these from Rényi’s original text; I’ve just summarized secondary sources that notionally present nicer variants of the axioms.
3.1 Axiom R1 (probability in the first argument)
For every \(B\in\mathcal B\), the function \[ A \mapsto P(A\mid B) \] is a (Kolmogorov) probability measure on \((\Omega,\mathcal A)\). Concretely:
- \(P(\Omega\mid B)=1\),
- if \((A_i)\) is a countable sequence of pairwise disjoint sets, then \(P(\bigcup_i A_i \mid B)=\sum_i P(A_i\mid B)\),
- \(P(A\mid B)\ge 0\).
So “given \(B\)” we are in familiar territory.
3.2 Axiom R2 (chain rule / consistency across conditions)
For all \(A\in\mathcal A\) and \(B,C\in\mathcal B\) with \(B\cap C\in\mathcal B\), \[ P(A\cap B \mid C)=P(A\mid B\cap C)\,P(B\mid C). \] This is the conditional-probability product rule, taken as an axiom. It enforces coherence when we change the condition from \(C\) to a refinement \(B\cap C\).
Immediate consequences:
- \(P(B\mid B)=1\) (conditions are “self-certain”). Strictly speaking this must be postulated alongside R1–R2 (Rényi’s original axioms include it); it holds automatically in the σ-finite ratio model below.
- Setting \(C=B\) then yields \(P(A\cap B\mid B)=P(A\mid B)\,P(B\mid B)=P(A\mid B)\).
- Bayes’ rule and the usual algebra of conditional probability follow wherever both sides are defined.
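These consistency properties are easy to check numerically. Below is a minimal sanity check (my own illustration, not from Rényi) of R1 and R2 using unnormalized weights on a four-point space, anticipating the ratio model \(P(A\mid B)=\mu(A\cap B)/\mu(B)\) of the next section:

```python
from itertools import combinations

# An unnormalized measure on Omega = {0, 1, 2, 3}: weights need not sum to 1.
weights = {0: 2.0, 1: 5.0, 2: 1.0, 3: 7.0}
omega = set(weights)

def mu(s):
    """Total mass of a subset of Omega."""
    return sum(weights[x] for x in s)

def cond(a, b):
    """P(A | B) = mu(A ∩ B) / mu(B); B must have positive mass."""
    a, b = set(a), set(b)
    assert mu(b) > 0
    return mu(a & b) / mu(b)

def subsets(s):
    """All subsets of s, as sets."""
    return [set(c) for r in range(len(s) + 1) for c in combinations(sorted(s), r)]

# R1: for each admissible B, A -> P(A|B) is a probability measure.
for b in subsets(omega):
    if mu(b) == 0:
        continue
    assert abs(cond(omega, b) - 1.0) < 1e-12                      # P(Omega|B) = 1
    assert abs(cond({0, 1}, b) + cond({2, 3}, b) - 1.0) < 1e-12   # additivity

# R2: P(A ∩ B | C) = P(A | B ∩ C) P(B | C) whenever B ∩ C is admissible.
for a in subsets(omega):
    for b in subsets(omega):
        for c in subsets(omega):
            if mu(c) == 0 or mu(b & c) == 0:
                continue
            lhs = cond(a & b, c)
            rhs = cond(a, b & c) * cond(b, c)
            assert abs(lhs - rhs) < 1e-12

print("R1 and R2 verified on all admissible triples")
```

In the ratio model R2 is an algebraic identity, since \(\mu(A\cap B\cap C)/\mu(C)\) factors through \(\mu(B\cap C)\); the loop above just confirms this exhaustively.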
3.3 Recovering Kolmogorov probability
If \(\Omega\in\mathcal B\) holds, we define the unconditional probability by \[ P(A) := P(A\mid \Omega). \] Then R1 says that \(P\) is a probability measure on \((\Omega,\mathcal A)\), and taking \(C=\Omega\) in R2 recovers the standard product rule \[ P(A\cap B)=P(A\mid B)\,P(B). \] Kolmogorov spaces are a special case; Rényi spaces generalize them by allowing \(\Omega\notin\mathcal B\).
4 The canonical model: ratios of a σ-finite measure
The most important representation is:
If \(\mu\) is a σ-finite measure on \((\Omega,\mathcal A)\), we define \[ \mathcal B := \{B\in\mathcal A: 0<\mu(B)<\infty\} \] and \[ P(A\mid B) := \frac{\mu(A\cap B)}{\mu(B)}. \] Then \((\Omega,\mathcal A,\mathcal B,P)\) satisfies R1–R2.
This looks like ordinary conditioning, except that \(\mu\) need not be a probability measure. The restriction \(0<\mu(B)<\infty\) ensures the ratio is meaningful.
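As a quick illustration (a sketch of my own, not from the text), counting measure on the nonnegative integers is σ-finite with infinite total mass, yet any finite nonempty conditioning set produces ordinary probabilities:

```python
def cond(a, b):
    """P(A | B) = #(A ∩ B) / #B under counting measure; B finite and nonempty."""
    b = set(b)
    assert len(b) > 0, "B must have positive finite mass"
    return len(set(a) & b) / len(b)

# The "event" of being even, intersected with a large finite window:
evens = set(range(0, 1000, 2))

print(cond(evens, range(10)))   # 0.5: evens {0,2,4,6,8} out of {0,...,9}
print(cond(evens, range(5)))    # 0.6: evens {0,2,4} out of {0,...,4}
```

No normalization of the ambient measure is ever needed; only the mass of the conditioning window enters.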
4.1 Scale invariance
If we replace \(\mu\) with \(c\mu\) for any \(c>0\), then \(P(A\mid B)\) is unchanged. Hence the underlying “state” is really the equivalence class \[ [\mu] = \{c\mu: c>0\}, \] not any particular normalization. We call \([\mu]\) a Rényi state.
This formalizes what applied probabilists routinely do with unnormalized densities: we work with something proportional to a measure, and only ratios matter.
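A tiny script makes the scale invariance concrete (the weights below are arbitrary illustrative choices):

```python
# Scale invariance in the ratio model: replacing mu by c*mu for any c > 0
# leaves every conditional probability unchanged, so only the class [mu] matters.
weights = {"a": 1.0, "b": 3.0, "c": 6.0}

def cond(a, b, scale=1.0):
    """P(A | B) computed from the rescaled measure scale * mu."""
    def mass(s):
        return sum(scale * weights[x] for x in s)
    return mass(set(a) & set(b)) / mass(set(b))

# The conditional probabilities agree for every positive scale factor:
for c in (1.0, 0.01, 1e6):
    assert abs(cond({"a"}, {"a", "b"}, scale=c) - 0.25) < 1e-12

print("conditionals are invariant under mu -> c*mu")
```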
5 Practical implications
5.1 Improper priors become legitimate states
In Bayesian inference, we may take a prior proportional to Lebesgue measure on \(\mathbb R\) (the “flat prior”). This is not a probability measure (\(\mu(\mathbb R)=\infty\)), but it is σ-finite. Rényi says: fine — treat it as a state \([\mu]\). We can still form posteriors whenever the normalizing integral over the relevant set is finite.
Example:
- Prior: \(\mu(d\theta)=d\theta\) on \(\mathbb R\).
- Conditioning event \(B\): the set “\(\theta\in[-M,M]\)” has finite mass \(2M\).
- Then \(P(\theta\in A\mid \theta\in[-M,M])\) is the usual uniform conditional distribution on \([-M,M]\).
The point isn’t that \(\theta\) is “uniform on \(\mathbb R\)” (it isn’t, as a probability statement); it’s that all finite-window conditionals are coherent.
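Here is the flat-prior example in code, a sketch that uses interval lengths to stand in for Lebesgue measure:

```python
# mu is Lebesgue measure on R; B = [-M, M] has finite mass 2M, and the
# conditional P(. | B) is the uniform distribution on [-M, M], even though
# mu(R) = infinity and no global normalization exists.

def interval_mass(lo, hi):
    """Lebesgue mass of [lo, hi] (0 if the interval is empty)."""
    return max(0.0, hi - lo)

def cond(a, b):
    """P(theta in A | theta in B) for intervals a = (lo, hi), b = (lo, hi)."""
    inter = (max(a[0], b[0]), min(a[1], b[1]))
    return interval_mass(*inter) / interval_mass(*b)

M = 5.0
print(cond((0.0, M), (-M, M)))      # 0.5: half the window
print(cond((-1.0, 1.0), (-M, M)))   # 0.2: mass 2 out of 2M = 10
```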
5.2 Likelihood-first modeling
In statistical mechanics and MCMC, we often specify a target distribution by an unnormalized density \(f(x)\propto e^{-H(x)}\). Rényi’s perspective is that an unnormalized density defines a measure up to scale — hence a state, hence a web of conditional probabilities.
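This is visible in the mechanics of the Metropolis algorithm: the acceptance step only ever evaluates ratios \(f(x')/f(x)\) of the unnormalized density, so the normalizing constant (the scale of the Rényi state) never appears. A minimal sketch, with a standard Gaussian target and illustrative parameters of my choosing:

```python
import math
import random

random.seed(0)

def f(x):
    """Unnormalized target density: any positive multiple samples identically."""
    return math.exp(-0.5 * x * x)

def metropolis(n_steps, x0=0.0, step=1.0):
    """Random-walk Metropolis driven purely by density ratios."""
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + random.uniform(-step, step)
        # Accept with probability min(1, f(x')/f(x)): a ratio, scale-free.
        if random.random() < min(1.0, f(proposal) / f(x)):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(50_000)
mean = sum(samples) / len(samples)
print(f"sample mean ≈ {mean:.2f}")  # near 0 for the standard Gaussian target
```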
5.3 Conditioning on null sets
Kolmogorov conditioning on events with \(P(B)=0\) is not defined as a number \(P(A\mid B)\); instead we pass to conditioning on \(\sigma\)-fields and use versions defined almost surely. Rényi's framework does not magically assign numbers to null events either. Instead, it proposes choosing \(\mathcal B\) to be the conditions we can actually condition on, typically those with finite positive mass under a σ-finite measure. This is often exactly what we do operationally.
6 Relation to standard measure-theoretic conditioning
If we start from a probability space \((\Omega,\mathcal A,P)\) in the Kolmogorov sense, we can build a Rényi space by taking \[ \mathcal B = \{B\in\mathcal A: P(B)>0\},\quad P_{\text{Rényi}}(A\mid B)=\frac{P(A\cap B)}{P(B)}. \] Rényi doesn’t conflict with Kolmogorov; it reorganizes the foundations to:
- treat conditional probability as primitive,
- make normalization optional,
- and cleanly accommodate σ-finite “probability up to scale”.
7 Crib sheet
- Primitive: \(P(A\mid B)\) for \(B\) in an admissible class \(\mathcal B\).
- Local Kolmogorov: for each fixed \(B\), \(A\mapsto P(A\mid B)\) is a probability measure.
- Global coherence: the chain rule \(P(A\cap B\mid C)=P(A\mid B\cap C)P(B\mid C)\).
- σ-finite representation: most useful Rényi spaces come from ratios of a σ-finite measure \(\mu\).
- Normalization is irrelevant: \(\mu\) is defined only up to multiplication by a positive constant (Rényi state).
If we already think in terms of “densities up to proportionality” and “condition then normalize,” we’re already using Rényi’s logic; the axioms make it precise.
