Calibration of probabilistic forecasts

Proper scoring rules, skill scores, etc.

2015-06-16 — 2023-11-15

Wherein the Notion of Probabilistic Calibration Is Treated as a Rule to Be Enforced, the Dictum That Eighty‑percent Forecasts Are Mistaken in About Twenty Percent of Cases Is Adduced, and Assessment Methods Are Noted.

model selection
regression
signal processing
statistics
stochastic processes
time series
Figure 1

Intuitively speaking, we want our stated certainties to match observed frequencies: of all the predictions we make with 80% certainty, we should turn out to be wrong as close to 20% of the time as possible. The same applies to every other level of certainty.
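To make that intuition concrete, here is a minimal sketch of a binned calibration check: group forecasts by stated probability and compare each group's mean forecast with how often the event actually happened. The data is simulated, and the function names (`reliability_table`, `expected_calibration_error`), the ten equal-width bins and the "overconfident" distortion are all illustrative assumptions rather than a canonical recipe. The summary number is the binned expected calibration error, one of the measures discussed in Nixon et al.

```python
import random


def reliability_table(probs, outcomes, n_bins=10):
    """Bin forecasts by stated probability (equal-width bins, an assumption)
    and return (mean forecast, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        table.append((mean_p, freq, len(members)))
    return table


def expected_calibration_error(table):
    """Count-weighted mean absolute gap between forecast and frequency."""
    n = sum(count for _, _, count in table)
    return sum(count * abs(mean_p - freq) for mean_p, freq, count in table) / n


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated world: each event happens with exactly its true probability.
true_probs = [rng.random() for _ in range(20000)]
outcomes = [1 if rng.random() < p else 0 for p in true_probs]

# A hypothetical overconfident forecaster pushes probabilities away from 0.5.
overconfident = [min(1.0, max(0.0, 0.5 + 1.5 * (p - 0.5))) for p in true_probs]

ece_calibrated = expected_calibration_error(reliability_table(true_probs, outcomes))
ece_overconfident = expected_calibration_error(reliability_table(overconfident, outcomes))

print(f"calibrated forecaster ECE:    {ece_calibrated:.3f}")
print(f"overconfident forecaster ECE: {ece_overconfident:.3f}")
```

The forecaster who reports the true probabilities should show only sampling noise, while the overconfident one shows a systematic gap between stated certainty and hit rate. Binned estimates like this are sensitive to the choice of bins, so treat the numbers as a diagnostic rather than a definitive score.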

Placeholder.

I do not know much about this, but I could probably start from the compact literature review in Gneiting and Raftery (2007), or chapter 2 of Neyman (2024), which generalises from calibration to all sorts of interesting topics in Bayesian epistemics. The same scoring-rule primitive also underwrites the loss-as-voting framing in AI alignment to collective values, the truth-from-strategic-agents formalisms in learning from the madness of crowds, and the futarchy / prediction-market line in utopian governance.
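For orientation on what the scoring-rule primitive buys us: a scoring rule is proper when a forecaster minimises their expected loss by reporting what they actually believe. The sketch below checks this numerically for a single binary event with true probability 0.8, comparing the Brier score and the log score (both proper) against absolute error (not proper). The helper names and the grid search over reports are illustrative assumptions, not anything taken from the cited papers.

```python
import math


def brier(q, y):
    """Brier (quadratic) loss for reported probability q and outcome y in {0, 1}."""
    return (q - y) ** 2


def log_loss(q, y):
    """Logarithmic loss: negative log of the probability assigned to what happened."""
    return -math.log(q if y == 1 else 1.0 - q)


def absolute(q, y):
    """Absolute error, included as an example of a rule that is not proper."""
    return abs(q - y)


def expected_loss(loss, q, p):
    """Expected loss of reporting q when the event truly occurs with probability p."""
    return p * loss(q, 1) + (1 - p) * loss(q, 0)


def best_report(loss, p, grid_size=99):
    """Grid-search the report q in {0.01, ..., 0.99} that minimises expected loss."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda q: expected_loss(loss, q, p))


p_true = 0.8
best_brier = best_report(brier, p_true)
best_log = best_report(log_loss, p_true)
best_abs = best_report(absolute, p_true)

print("Brier score is minimised by reporting", best_brier)
print("Log score is minimised by reporting", best_log)
print("Absolute error is minimised by reporting", best_abs)
```

Under the two proper rules the best report is the true probability itself, whereas absolute error rewards exaggerating towards the more likely outcome, which is exactly the kind of incentive that would wreck calibration.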

1 See also

  • Bayesian epistemics — proper scoring rules, peer prediction, truth elicitation
  • AI alignment to collective values — losses as scoring-rule-implementing voting mechanisms
  • Learning from the madness of crowds — extracting calibrated belief from biased reporters
  • Groupthink and the wisdom of crowds — when many forecasters are better calibrated than one
  • Utopian governance — futarchy, prediction markets, scoring-rule mechanisms

2 Incoming

3 References

Buja, Stuetzle, and Shen. 2005. “Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications.”
Gneiting, and Raftery. 2007. “Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association.
Henzi, Shen, Law, et al. 2023. “Invariant Probabilistic Prediction.”
Neyman. 2024. “Algorithmic Bayesian Epistemology.”
Nixon, Dusenberry, Zhang, et al. n.d. “Measuring Calibration in Deep Learning.”
Pacchiardi, and Dutta. 2022. “Generalized Bayesian Likelihood-Free Inference Using Scoring Rules Estimators.” arXiv:2104.03889 [Stat].
Reid, and Williamson. 2010. “Composite Binary Losses.” Journal of Machine Learning Research.
Székely, and Rizzo. 2013. “Energy Statistics: A Class of Statistics Based on Distances.” Journal of Statistical Planning and Inference.