Over at statistical mechanics of statistics we wonder about the connection between statistical mechanics and statistics, which suggests we might consider the connection between entropy and information. Entropy, the physics concept, and information, the computer science concept, look dang similar and yet they are defined for different things. How do they connect?

Connected somehow: algorithmic statistics, information geometry.

Michael Betancourt says:

Friendly reminder that entropy is not a property of individual states/configurations of a system but rather a probability distribution of all the possible states/configurations (relative to some reference measure, if we want to get technical).

An apparently “messy” or “disorganized” configuration of a room is not by itself high entropy. By definition, any room configuration completely describes the room. In other words, there is no uncertainty about where every individual object is placed.

On the other hand, if we don’t know what the configuration of the room is, then we might describe its possible configurations with a probability distribution over room configurations.

If this probability distribution exhibits high entropy (relative to a uniform measure), then all room configurations will be nearly equally probable. Moreover, if there are many more messy configurations than clean configurations, then we can say that clean room configurations are rare.
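To make that concrete, here is a minimal Python sketch. The toy room, with 1,000 possible configurations of which exactly one counts as clean, is an illustrative assumption and not part of Betancourt's remarks. It compares the Shannon entropy (in nats, relative to the counting measure) of knowing the configuration exactly against knowing nothing about it.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in nats of a discrete distribution (list of probabilities)."""
    # Terms with p == 0 contribute nothing, by the convention 0 * log(0) = 0.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical toy room: index 0 is the single "clean" configuration,
# indices 1..999 are "messy" ones.
n_configs = 1000

# Case 1: we know exactly which (messy) configuration the room is in.
known = [0.0] * n_configs
known[137] = 1.0

# Case 2: we know nothing, so every configuration is equally probable.
uniform = [1.0 / n_configs] * n_configs

h_known = shannon_entropy(known)
h_uniform = shannon_entropy(uniform)
p_clean = uniform[0]

print(f"entropy, known messy configuration: {h_known:.4f} nats")
print(f"entropy, uniform over configurations: {h_uniform:.4f} nats")
print(f"probability of the clean configuration under uniform: {p_clean}")
```

Knowing the configuration exactly gives zero entropy however messy that configuration looks. Total ignorance gives log(1000) ≈ 6.9 nats, and under that distribution the lone clean configuration has probability 1/1000, which is the sense in which clean rooms are rare.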

Too plain? Try this. Shalizi and Moore (2003):

We consider the question of whether thermodynamic macrostates are objective consequences of dynamics, or subjective reflections of our ignorance of a physical system. We argue that they are both; more specifically, that the set of macrostates forms the unique maximal partition of phase space which 1) is consistent with our observations (a subjective fact about our ability to observe the system) and 2) obeys a Markov process (an objective fact about the system’s dynamics). We review the ideas of computational mechanics, an information-theoretic method for finding optimal causal models of stochastic processes, and argue that macrostates coincide with the “causal states” of computational mechanics. Defining a set of macrostates thus consists of an inductive process where we start with a given set of observables, and then refine our partition of phase space until we reach a set of states which predict their own future, i.e. which are Markovian. Macrostates arrived at in this way are provably optimal statistical predictors of the future values of our observables.

1 MaxEnt

The basic principle from my school days was selecting a probability distribution that maximises the entropy subject to certain constraints. The rationale is that the maximum entropy distribution is the ‘least biased’ estimate possible while still satisfying those given constraints. So much is relatively uncontroversial. If I am doing Bayesian inference, for example, I might want to select my priors to be the maximum entropy distribution that satisfies the constraints of my prior knowledge. If I am lucky, this might even be tractable. For example, if for some reason my constraints are that my prior must be a continuous distribution on the whole real line with a specified mean and variance, then the Gaussian distribution is the maximum entropy distribution that satisfies those constraints. It gets a bit harder if I have weird constraints, such as “my prior must be over valid covariance functions” or “my prior must comprise valid solutions to the Navier-Stokes equations”. Nonetheless, the idea seems simple and nifty.
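A tractable discrete case can be sketched in a few lines of Python. The example is the classic loaded-die setup: a distribution over the faces 1 to 6 constrained only by its mean. For a mean constraint the maximum entropy solution has the exponential-family form p_i ∝ exp(λ x_i), so all that remains is to find the Lagrange multiplier λ. The function name `maxent_given_mean`, the bisection bracket and the tolerance are hypothetical choices for illustration, and the sketch assumes the target mean lies strictly between the smallest and largest value.

```python
import math


def maxent_given_mean(values, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum-entropy distribution on `values` with a prescribed mean.

    Uses the form p_i proportional to exp(lam * x_i) and finds lam by
    bisection, relying on the mean being monotonically increasing in lam.
    Assumes min(values) < target_mean < max(values).
    """
    values = list(values)

    def dist(lam):
        exponents = [lam * x for x in values]
        shift = max(exponents)  # subtract the max for numerical stability
        weights = [math.exp(e - shift) for e in exponents]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * x for p, x in zip(dist(lam), values))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


faces = [1, 2, 3, 4, 5, 6]
fair = maxent_given_mean(faces, 3.5)    # constraint matches a fair die
loaded = maxent_given_mean(faces, 4.5)  # constraint forces a tilt to high faces

print("mean 3.5:", [round(p, 4) for p in fair])
print("mean 4.5:", [round(p, 4) for p in loaded])
print("entropies:", round(entropy(fair), 4), round(entropy(loaded), 4))
```

With the mean fixed at 3.5 the multiplier is zero and the uniform distribution is recovered. Raising the mean to 4.5 tilts the probabilities geometrically towards the high faces and lowers the entropy, because the constraint now carries information.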

However, from context cues (the fact that people introduced a camel-case acronym) I deduce there is more going on here. I first looked into it 10+ years ago and then ignored it for ages. My interest was piqued again when Bert de Vries claimed to have actioned MaxEnt as a particularly useful phenomenological idea within the predictive coding theory of mind. That inclines me to return to the original MaxEnt work by Caticha, which now has textbooks (Caticha 2015, 2008) and review articles (Caticha 2021, 2014).

2 Incoming

Statistical Physics of Inference and Bayesian Estimation
Informational entropy versus thermodynamic entropy
John Baez’s A Characterisation of Entropy etc. There is now a book (John Carlos Baez 2024)! See also the post What is Entropy?, which introduces it with a pitch including:

I have largely avoided the second law of thermodynamics, which says that entropy always increases. While fascinating, this is so problematic that a good explanation would require another book! I have also avoided the role of entropy in biology, black hole physics, etc. Thus, the aspects of entropy most beloved by physics popularisers will not be found here. I also never say that entropy is ‘disorder’.
Wolpert (2006)
David Ellerman’s Logical Entropy stuff, which he has now written up as Ellerman (2017).
Information loss and entropy
Feldman, A Brief Introduction to: Information Theory, Excess Entropy and Computational Mechanics
Aurelien Pelissier, It Took Me 10 Years to Understand Entropy, Here is What I Learned.

3 References

Alexiadis, Alessio. 2019. “Deep Multiphysics and Particle–Neuron Duality: A Computational Framework Coupling (Discrete) Multiphysics and Deep Learning.” Applied Sciences.

Alexiadis, A., Simmons, Stamatopoulos, et al. 2020. “The Duality Between Particle Methods and Artificial Neural Networks.” Scientific Reports.

Altaner. 2017. “Nonequilibrium Thermodynamics and Information Theory: Basic Concepts and Relaxing Dynamics.” Journal of Physics A: Mathematical and Theoretical.

Baez, John Carlos. 2024. “What Is Entropy?”

Baez, John C., Fritz, and Leinster. 2011. “A Characterization of Entropy in Terms of Information Loss.” Entropy.

Barnum, Barrett, Clark, et al. 2010. “Entropy and Information Causality in General Probabilistic Theories.” New Journal of Physics.

Beck, and Schlögl. 1995. Thermodynamics of Chaotic Systems.

Beretta. 2020. “The Fourth Law of Thermodynamics: Steepest Entropy Ascent.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

Bialek, Nemenman, and Tishby. 2001. “Complexity Through Nonextensivity.” Physica A: Statistical and Theoretical Physics.

———. 2006. “Predictability, Complexity, and Learning.” Neural Computation.

Bieniawski, and Wolpert. 2004. “Adaptive, Distributed Control of Constrained Multi-Agent Systems.” In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 3.

Caticha. 2007. “Information and Entropy.” In AIP Conference Proceedings.

———. 2008. “Lectures on Probability, Entropy, and Statistical Physics.”

———. 2009. “Quantifying Rational Belief.”

———. 2011. “Entropic Inference.”

———. 2014. “Towards an Informational Pragmatic Realism.” Minds and Machines.

———. 2015. “The Basics of Information Geometry.”

———. 2021. “Entropy, Information, and the Updating of Probabilities.” Entropy.

———. 2022. Probability, Entropy, and the Foundations of Physics.

Caticha, and Giffin. 2006. “Updating Probabilities.” In AIP Conference Proceedings.

Cuzzolin. 2021. “A Geometric Approach to Conditioning Belief Functions.”

Dewar. 2003. “Information Theory Explanation of the Fluctuation Theorem, Maximum Entropy Production and Self-Organized Criticality in Non-Equilibrium Stationary States.” Journal of Physics A: Mathematical and General.

Earman, and Norton. 1998. “Exorcist XIV: The Wrath of Maxwell’s Demon. Part I. From Maxwell to Szilard.” Studies in History and Philosophy of Modern Physics.

———. 1999. “Exorcist XIV: The Wrath of Maxwell’s Demon. Part II. From Szilard to Landauer and Beyond.” Studies in History and Philosophy of Modern Physics.

Elith, Phillips, Hastie, et al. 2011. “A Statistical Explanation of MaxEnt for Ecologists.” Diversity and Distributions.

Ellerman. 2017. “Logical Information Theory: New Foundations for Information Theory.” arXiv:1707.04728 [Quant-Ph].

Gelman, and Shalizi. 2013. “Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology.

Gray. 1991. Entropy and Information Theory.

Hoel. 2017. “When the Map Is Better Than the Territory.” Entropy.

Hoel, Albantakis, and Tononi. 2013. “Quantifying Causal Emergence Shows That Macro Can Beat Micro.” Proceedings of the National Academy of Sciences.

Jaynes. 1990. “Probability in Quantum Theory.”

Jaynes, and Bretthorst. 2003. Probability Theory: The Logic of Science.

Machta. 1999. “Entropy, Information, and Computation.” American Journal of Physics.

Natal, Ávila, Tsukahara, et al. 2021. “Entropy: From Thermodynamics to Information Processing.” Entropy.

Nemenman, Shafee, and Bialek. 2001. “Entropy and Inference, Revisited.” In arXiv:physics/0108025.

Sethna. 2006. Statistical Mechanics: Entropy, Order Parameters, and Complexity.

Shalizi, and Moore. 2003. “What Is a Macrostate? Subjective Observations and Objective Dynamics.”

Still. 2020. “Thermodynamic Cost and Benefit of Memory.” Physical Review Letters.

Tseng, and Caticha. 2002. “Yet Another Resolution of the Gibbs Paradox: An Information Theory Approach.” In AIP Conference Proceedings.

Wolpert. 2006. “Information Theory — The Bridge Connecting Bounded Rational Game Theory and Statistical Physics.” In Complex Engineered Systems. Understanding Complex Systems.

Yedidia, Freeman, and Weiss. 2005. “Constructing Free-Energy Approximations and Generalized Belief Propagation Algorithms.” IEEE Transactions on Information Theory.

Zellner. 2002. “Information Processing and Bayesian Analysis.” Journal of Econometrics, Information and Entropy Econometrics.