Entropy vs information
MaxEnt(?), macrostates, subjective updating, epistemic randomness, Szilard engines, Gibbs paradox…
December 2, 2010 — May 22, 2024
Over at statistical mechanics of statistics we wonder about the connection between statistical mechanics and statistics, which suggests we might consider the connection between entropy and information. Entropy, the physics concept, and information, the computer science concept, look dang similar and yet they are defined for different things. How do they connect?
Connected somehow: algorithmic statistics, information geometry.
Friendly reminder that entropy is not a property of an individual state/configuration of a system but rather of a probability distribution over all the possible states/configurations (relative to some reference measure, if we want to get technical).
An apparently “messy” or “disorganized” configuration of a room is not by itself high entropy. By definition, any room configuration completely describes the room. In other words, there is no uncertainty about where every individual object is placed.
On the other hand, if we don’t know what the configuration of the room is, then we might describe its possible configurations with a probability distribution over room configurations.
If this probability distribution has near-maximal entropy (relative to a uniform measure), then all room configurations are nearly equally probable. Moreover, if there are many more messy configurations than clean configurations, then a clean room is improbable under this distribution; that is the sense in which clean room configurations are rare.
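To make the room example concrete, here is a minimal sketch in Python. The toy room (3 objects, 4 possible spots each) and the choice of exactly one "clean" configuration are hypothetical, picked only so the numbers are easy to read. It compares the Shannon entropy of a distribution that knows the configuration exactly with that of a uniform distribution over all configurations.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical toy room: 3 objects, each in one of 4 spots.
n_configs = 4 ** 3  # 64 possible configurations

# Case 1: we know the configuration exactly (a point mass).
# This holds whether that configuration looks "messy" or "clean".
known = [1.0] + [0.0] * (n_configs - 1)

# Case 2: we know nothing, so every configuration is equally probable.
uniform = [1.0 / n_configs] * n_configs

h_known = shannon_entropy(known)      # zero bits: no uncertainty
h_uniform = shannon_entropy(uniform)  # log2(64) bits: maximal uncertainty

# Assumption for illustration: exactly one configuration counts as "clean".
n_clean = 1
p_clean = n_clean / n_configs  # probability of a clean room under `uniform`

print(f"entropy, known configuration: {abs(h_known):.3f} bits")
print(f"entropy, uniform over configurations: {h_uniform:.3f} bits")
print(f"P(clean room) under uniform: {p_clean:.4f}")
```

The entropy is attached to the two lists of probabilities, not to any single configuration; a messy room that we have fully inspected sits in the zero-entropy case.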
Too plain? Try this. Shalizi and Moore (2003):
We consider the question of whether thermodynamic macrostates are objective consequences of dynamics, or subjective reflections of our ignorance of a physical system. We argue that they are both; more specifically, that the set of macrostates forms the unique maximal partition of phase space which 1) is consistent with our observations (a subjective fact about our ability to observe the system) and 2) obeys a Markov process (an objective fact about the system’s dynamics). We review the ideas of computational mechanics, an information-theoretic method for finding optimal causal models of stochastic processes, and argue that macrostates coincide with the “causal states” of computational mechanics. Defining a set of macrostates thus consists of an inductive process where we start with a given set of observables, and then refine our partition of phase space until we reach a set of states which predict their own future, i.e. which are Markovian. Macrostates arrived at in this way are provably optimal statistical predictors of the future values of our observables.
1 MaxEnt
The basic principle from my school days was selecting a probability distribution that maximises the entropy subject to certain constraints. The rationale is that the maximum entropy distribution is the ‘least biased’ estimate possible while still satisfying those given constraints. So much is relatively uncontroversial. If I am doing Bayesian inference, for example, I might want to select my priors to be the maximum entropy distribution that satisfies the constraints of my prior knowledge. If I am lucky, this might even be tractable. For example, if for some reason my constraints are that my prior must be a continuous distribution on the real line with a specified mean and variance, then the Gaussian distribution is the maximum entropy distribution that satisfies those constraints. It gets a bit harder if I have weird constraints, such as “my prior must be over valid covariance functions” or “my prior must comprise valid solutions to the Navier-Stokes equations”. Nonetheless, the idea seems simple and nifty.
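As a quick sanity check of the Gaussian claim, here is a small sketch that compares the differential entropy of three densities sharing the same variance, using the standard closed-form expressions. The choice of comparison families (Laplace and uniform) and the value of `sigma` are mine, purely for illustration; this checks the claim against two competitors rather than proving it.

```python
import math

sigma = 1.0  # common standard deviation (assumed value for illustration)

# Differential entropies in nats, all with variance sigma**2.
# The mean shifts the density but does not change its entropy.

# Gaussian: h = 0.5 * log(2 * pi * e * sigma^2)
h_gaussian = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

# Laplace with scale b has variance 2 * b^2, so b = sigma / sqrt(2);
# h = 1 + log(2 * b)
b = sigma / math.sqrt(2)
h_laplace = 1 + math.log(2 * b)

# Uniform on an interval of width w has variance w^2 / 12,
# so w = sigma * sqrt(12); h = log(w)
w = sigma * math.sqrt(12)
h_uniform = math.log(w)

for name, h in [("Gaussian", h_gaussian),
                ("Laplace", h_laplace),
                ("uniform", h_uniform)]:
    print(f"{name:>8}: {h:.4f} nats")
```

With matched variance the Gaussian comes out highest of the three, as the maximum entropy characterisation says it must.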
However, from context cues (the fact people introduced a camelcase acronym) I deduce there are more things happening here. I first looked into it 10+ years ago and then ignored it for ages. My interest has been piqued again because Bert de Vries claimed to have actioned MaxEnt as a particularly useful phenomenological idea within the predictive coding theory of mind. That inclines me to return to the original MaxEnt work by Caticha, which now has textbooks (Caticha 2015, 2008) and review articles (Caticha 2021, 2014).
2 Incoming
Statistical Physics of Inference and Bayesian Estimation. Informational entropy versus thermodynamic entropy.
John Baez’s A Characterisation of Entropy etc. There is now a book (John Carlos Baez 2024)! See also the post What is Entropy?, which introduces the book with a pitch including:
I have largely avoided the second law of thermodynamics, which says that entropy always increases. While fascinating, this is so problematic that a good explanation would require another book! I have also avoided the role of entropy in biology, black hole physics, etc. Thus, the aspects of entropy most beloved by physics popularisers will not be found here. I also never say that entropy is ‘disorder’.
Wolpert (2006)
David Ellerman’s Logical Entropy stuff, which he has now written up as Ellerman (2017).
Feldman, A Brief Introduction to: Information Theory, Excess Entropy and Computational Mechanics
Aurelien Pelissier, It Took Me 10 Years to Understand Entropy, Here is What I Learned.