Not “free as in speech” or “free as in beer”, nor “free energy” in the sense of perpetual motion machines, zero point energy or pills that turn your water into petroleum, but rather a particular mathematical definition.

This would connect us to message passing in variational inference.

## In variational Bayes

Variational Bayes inference is a formalism for learning that borrows bits from statistical mechanics and graphical models.

Free energy shows up in variational Bayes as the negative of the
ELBO, the *evidence lower bound*, which AFAICT means that this term is defined via
the Kullback-Leibler divergence between the approximating density and the true posterior.
Presumably an analogous term would pop up in
non-KL approximations.
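
To make the link explicit, here is the standard identity, written in generic notation that is my assumption rather than anything fixed elsewhere on this page: \(x\) for observations, \(z\) for latent variables, and \(q(z)\) for the approximating density.

\[\ln p(x) = \underbrace{\langle \ln p(x,z) - \ln q(z) \rangle_q}_{\mathrm{ELBO}(q)} + \mathrm{KL}\left(q(z) \,\|\, p(z|x)\right)\]

\[F(q) := -\mathrm{ELBO}(q) = \mathrm{KL}\left(q(z) \,\|\, p(z|x)\right) - \ln p(x) \geq -\ln p(x)\]

Since the KL term is non-negative, minimising \(F\) over \(q\) tightens the bound on the negative log evidence, with equality exactly when \(q\) is the posterior.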

## As a model for cognition

This term, with the same (?) definition, appears to pop up in a “free energy principle”, where it is instrumental as a unifying concept for learning systems such as brains.

Here is the most compact version I could find:

The free energy principle (FEP) claims that self-organization in biological agents is driven by variational free energy (FE) minimization in a generative probabilistic model of the agent’s environment.

The chief pusher of this wheelbarrow appears to be Karl Friston.

He starts his *Nature Reviews Neuroscience* paper with this statement of the
principle:

The free-energy principle says that any self-organizing system that is at equilibrium with its environment must minimize its free energy.

Is that “must” in

- the sense of moral obligation, or is it
- a testable conservation law of some kind?

If the latter, self-organising in what sense? What type of equilibrium? For which definition of the free energy? What is our chief experimental evidence for this hypothesis?

I think it means that any right thinking brain, seeking to avoid the vice of slothful and decadent perception after the manner of foreigners and compulsive masturbators, would do well to seek to minimise its free energy before partaking of a stimulating and refreshing physical recreation such as a game of cricket.

We do get a definition of free energy itself, with a diagram, which

…shows the dependencies among the quantities that define free energy. These include the internal states of the brain \(\mu(t)\) and quantities describing its exchange with the environment: sensory signals (and their motion) \(\bar{s}(t) = [s,s',s'',\ldots]^T\) plus action \(a(t)\). The environment is described by equations of motion, which specify the trajectory of its hidden states. The causes \(\vartheta \supset \{\bar{x}, \theta, \gamma\}\) of sensory input comprise hidden states \(\bar{x}(t)\), parameters \(\theta\), and precisions \(\gamma\) controlling the amplitude of the random fluctuations \(\bar{z}(t)\) and \(\bar{w}(t)\). Internal brain states and action minimize free energy \(F(\bar{s}, \mu)\), which is a function of sensory input and a probabilistic representation \(q(\vartheta|\mu)\) of its causes. This representation is called the recognition density and is encoded by internal states \(\mu\).

The free energy depends on two probability densities: the recognition density \(q(\vartheta|\mu)\) and one that generates sensory samples and their causes, \(p(\bar{s},\vartheta|m)\). The latter represents a probabilistic generative model (denoted by \(m\)), the form of which is entailed by the agent or brain…

\[F = -\langle \ln p(\bar{s},\vartheta|m) \rangle_q + \langle \ln q(\vartheta|\mu) \rangle_q\]

This is (minus the actions) the variational principle in Bayesian inference.
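
As a sanity check on that claim, here is a minimal numerical sketch, reading \(F\) as expected energy minus the entropy of the recognition density. The three-state toy model, its probabilities, and the function name `free_energy` are all made up for illustration; nothing here comes from Friston's paper. It checks that \(F\) equals the surprise \(-\ln p(s)\) when \(q\) is the exact posterior and exceeds it otherwise.

```python
import math


def free_energy(q, joint):
    """F = -<ln p(s, theta)>_q + <ln q(theta)>_q for a discrete cause theta.

    q     : recognition density over causes, a list of probabilities
    joint : p(s, theta) evaluated at the single observed s, one entry per cause
    """
    energy = -sum(qi * math.log(ji) for qi, ji in zip(q, joint) if qi > 0)
    neg_entropy = sum(qi * math.log(qi) for qi in q if qi > 0)
    return energy + neg_entropy


# Hypothetical toy generative model: three causes, one fixed observation s.
prior = [0.5, 0.3, 0.2]        # p(theta)
likelihood = [0.1, 0.6, 0.3]   # p(s | theta) at the observed s
joint = [p * l for p, l in zip(prior, likelihood)]

evidence = sum(joint)                       # p(s)
posterior = [j / evidence for j in joint]   # p(theta | s)
surprise = -math.log(evidence)              # -ln p(s)

f_posterior = free_energy(posterior, joint)        # q is the exact posterior
f_uniform = free_energy([1 / 3] * 3, joint)        # q is a poor guess

print(f"surprise       = {surprise:.6f}")
print(f"F at posterior = {f_posterior:.6f}")
print(f"F at uniform q = {f_uniform:.6f}")
```

The gap between `f_uniform` and `surprise` is the KL divergence from the uniform guess to the posterior, which is the quantity variational inference drives towards zero.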

See also: Exergy, Landauer’s Principle, the Slate Star Codex Friston dogpile, based on an exposition by Wolfgang Schwarz.

Bengio, Yoshua. 2009. “Learning Deep Architectures for AI.” *Foundations and Trends® in Machine Learning* 2 (1): 1–127. https://doi.org/10.1561/2200000006.

Castellani, Tommaso, and Andrea Cavagna. 2005. “Spin-Glass Theory for Pedestrians.” *Journal of Statistical Mechanics: Theory and Experiment* 2005 (05): P05012. https://doi.org/10.1088/1742-5468/2005/05/P05012.

Frey, B. J., and Nebojsa Jojic. 2005. “A Comparison of Algorithms for Inference and Learning in Probabilistic Graphical Models.” *IEEE Transactions on Pattern Analysis and Machine Intelligence* 27 (9): 1392–1416. https://doi.org/10.1109/TPAMI.2005.169.

Friston, Karl. 2010. “The Free-Energy Principle: A Unified Brain Theory?” *Nature Reviews Neuroscience* 11 (2): 127. https://doi.org/10.1038/nrn2787.

———. 2013. “Life as We Know It.” *Journal of the Royal Society Interface* 10 (86). https://doi.org/10.1098/rsif.2013.0475.

Geirhos, Robert, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann. 2020. “Shortcut Learning in Deep Neural Networks,” April. http://arxiv.org/abs/2004.07780.

Jordan, Michael I., Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul. 1999. “An Introduction to Variational Methods for Graphical Models.” *Machine Learning* 37 (2): 183–233. https://doi.org/10.1023/A:1007665907178.

Jordan, Michael I., and Yair Weiss. 2002. “Probabilistic Inference in Graphical Models.” *Handbook of Neural Networks and Brain Theory*. http://mlg.eng.cam.ac.uk/zoubin/course03/hbtnn2e-I.pdf.

LeCun, Yann, Sumit Chopra, Raia Hadsell, M. Ranzato, and F. Huang. 2006. “A Tutorial on Energy-Based Learning.” *Predicting Structured Data*. http://classes.soe.ucsc.edu/cmps290c/Spring12/lect/9/energytut.pdf.

Montanari, Andrea. 2011. “Lecture Notes for Stat 375 Inference in Graphical Models.” http://www.stanford.edu/~montanar/TEACHING/Stat375/handouts/notes_stat375_1.pdf.

Wainwright, Martin J., and Michael I. Jordan. 2008. *Graphical Models, Exponential Families, and Variational Inference*. Vol. 1. Foundations and Trends® in Machine Learning. http://www.cs.berkeley.edu/~jordan/papers/wainwright-jordan-fnt.pdf.

Wainwright, M., and M. Jordan. 2005. “A Variational Principle for Graphical Models.” In *New Directions in Statistical Signal Processing*. Vol. 155. MIT Press.

Wang, Chaohui, Nikos Komodakis, and Nikos Paragios. 2013. “Markov Random Field Modeling, Inference & Learning in Computer Vision & Image Understanding: A Survey.” *Computer Vision and Image Understanding* 117 (11): 1610–27. https://doi.org/10.1016/j.cviu.2013.07.004.

Williams, Daniel. 2020. “Predictive Coding and Thought.” *Synthese* 197 (4): 1749–75. https://doi.org/10.1007/s11229-018-1768-x.

Xing, Eric P., Michael I. Jordan, and Stuart Russell. 2003. “A Generalized Mean Field Algorithm for Variational Inference in Exponential Families.” In *Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence*, 583–91. UAI’03. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. http://arxiv.org/abs/1212.2512.

Yedidia, Jonathan S., W. T. Freeman, and Y. Weiss. 2005. “Constructing Free-Energy Approximations and Generalized Belief Propagation Algorithms.” *IEEE Transactions on Information Theory* 51 (7): 2282–2312. https://doi.org/10.1109/TIT.2005.850085.

Yedidia, J. S., W. T. Freeman, and Y. Weiss. 2003. “Understanding Belief Propagation and Its Generalizations.” In *Exploring Artificial Intelligence in the New Millennium*, edited by G. Lakemeyer and B. Nebel, 239–269. Morgan Kaufmann Publishers. http://www.merl.com/publications/TR2001-22.