Mind as statistical learner

Various morsels on the theme of what-machine-learning-teaches-us-about-our-own-learning. Thus biomimetic algorithms find their converse in our algo-mimetic biology.

This should be more about general learning theory insights. Nitty-gritty details about how computing is done by biological systems are more what I think of as biocomputing. If you can unify those then well done, you can grow minds in a petri dish.

Eliezer Yudkowsky’s essay, How an algorithm feels from the inside.

Language theory

The OG test-case of mind-like behaviour is grammatical inference, where a lot of ink was spilled over the learnability of languages of various kinds. This is less popular nowadays, now that natural language processing by computer is doing rather interesting things without bothering with the details of formal syntax or traditional semantics. What does that mean? I do not hazard opinions on that because I am too busy for now to form them.
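The classic positive result in that literature can be sketched in a few lines of Python (a toy illustration of my own, not from any of the cited papers): for the class of *finite* languages, the learner that simply conjectures the set of strings seen so far identifies the target in the limit, because its conjecture stops changing once every member has appeared. Gold (1967) shows this strategy breaks down for richer classes, e.g. any class containing all finite languages plus one infinite one.

```python
def finite_language_learner(presentation):
    """Gold-style learner for finite languages: conjecture exactly
    the set of strings observed so far, yielding a conjecture after
    each example."""
    conjecture = set()
    for string in presentation:
        conjecture.add(string)
        yield frozenset(conjecture)

target = {"a", "ab", "abb"}
# A (prefix of an) infinite presentation that eventually shows every member.
stream = ["a", "ab", "a", "abb", "ab", "a"]
conjectures = list(finite_language_learner(stream))

assert conjectures[-1] == target          # the learner has converged...
assert all(c == conjectures[-1] for c in conjectures[3:])  # ...and stays put
```

The point of the "in the limit" framing is visible in the assertions: correctness is a property of the tail of the conjecture sequence, not of any single guess.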

Descriptive statistical models of cognition

See, e.g., Probabilistic Models of Cognition, by Noah Goodman, Joshua Tenenbaum, and others: a Bayesian model of human problem-solving which doubles as a probabilistic programming textbook.

This book explores the probabilistic approach to cognitive science, which models learning and reasoning as inference in complex probabilistic models. We examine how a broad range of empirical phenomena, including intuitive physics, concept learning, causal reasoning, social cognition, and language understanding, can be modeled using probabilistic programs (using the WebPPL language).

Disclaimer, I have not actually read the book.
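To make "learning and reasoning as inference" concrete, here is a minimal sketch in plain Python, in the spirit of Tenenbaum-style concept learning (the hypotheses and the "size principle" likelihood are my illustrative choices, not taken from the book, which works in WebPPL):

```python
from fractions import Fraction

# Toy "learning as inference": given positive examples of a hidden number
# concept, score hypotheses by Bayes' rule. The "size principle" likelihood
# assigns each example probability 1/|h| under hypothesis h, so small
# consistent hypotheses beat large ones.
hypotheses = {
    "even":         {n for n in range(1, 101) if n % 2 == 0},
    "powers_of_2":  {1, 2, 4, 8, 16, 32, 64},
    "multiples_10": {10, 20, 30, 40, 50, 60, 70, 80, 90, 100},
}

def posterior(data, hyps):
    """Exact posterior over hypotheses given positive examples, under a
    uniform prior and the size-principle likelihood."""
    prior = Fraction(1, len(hyps))
    scores = {}
    for name, extension in hyps.items():
        if all(x in extension for x in data):
            scores[name] = prior * Fraction(1, len(extension)) ** len(data)
        else:
            scores[name] = Fraction(0)  # hypothesis ruled out by the data
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

post = posterior([16, 2, 64], hypotheses)
# "powers_of_2" dominates "even": both are consistent with the data, but
# the smaller hypothesis makes the observed examples far less of a coincidence.
```

The same move — a generative model plus conditioning on data — is what the WebPPL programs in the book automate for much richer model classes.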

This descriptive model is not the same thing as the normative model of Bayesian cognition.

(Dezfouli, Nock, and Dayan 2020; Peterson et al. 2021) do something different again, finding ML models that are good “second-order” fits to how people seem to learn things in practice.

That free energy thing

This section is dedicated to vivisecting a confusing discussion happening in the literature which I have not looked into. It could be a profound insight, or terminological confusion, or a re-statement of the Bayes brain stuff with a weirder prose style. I may return one day and decide which.

In this realm, the “free energy principle” is advanced as a unifying concept for learning systems such as brains.

Here is the most compact version I could find:

The free energy principle (FEP) claims that self-organization in biological agents is driven by variational free energy (FE) minimization in a generative probabilistic model of the agent’s environment.

The chief pusher of this wheelbarrow appears to be Karl Friston. (Friston 2010, 2013; Williams 2020)

He starts his Nature Reviews Neuroscience with this statement of the principle:

The free-energy principle says that any self-organizing system that is at equilibrium with its environment must minimize its free energy.

Is that “must” in the sense of

  • a moral obligation, or
  • a testable conservation law of some kind?

If the latter, self-organising in what sense? What type of equilibrium? For which definition of the free energy? What is our chief experimental evidence for this hypothesis?

I think it means that any right-thinking brain, seeking to avoid the vice of slothful and decadent perception after the manner of foreigners and compulsive masturbators, would do well to seek to minimise its free energy before partaking of a stimulating and refreshing physical recreation such as a game of cricket.

We do get a definition of free energy itself, with a diagram, which

…shows the dependencies among the quantities that define free energy. These include the internal states of the brain \(\mu(t)\) and quantities describing its exchange with the environment: sensory signals (and their motion) \(\bar{s}(t) = [s,s',s''…]^T\) plus action \(a(t)\). The environment is described by equations of motion, which specify the trajectory of its hidden states. The causes \(\vartheta \supset \{\bar{x}, \theta, \gamma\}\) of sensory input comprise hidden states \(\bar{x}(t)\), parameters \(\theta\), and precisions \(\gamma\) controlling the amplitude of the random fluctuations \(\bar{z}(t)\) and \(\bar{w}(t)\). Internal brain states and action minimize free energy \(F(\bar{s}, \mu)\), which is a function of sensory input and a probabilistic representation \(q(\vartheta|\mu)\) of its causes. This representation is called the recognition density and is encoded by internal states \(\mu\).

The free energy depends on two probability densities: the recognition density \(q(\vartheta|\mu)\) and one that generates sensory samples and their causes, \(p(\bar{s},\vartheta|m)\). The latter represents a probabilistic generative model (denoted by \(m\)), the form of which is entailed by the agent or brain…

\[F = -\langle\ln p(\bar{s},\vartheta|m)\rangle_q + \langle\ln q(\vartheta|\mu)\rangle_q\]

This is (minus the actions) the variational free energy principle in Bayesian inference.
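The identity behind that claim is easy to verify numerically for a discrete model (my own illustration, not Friston's code; the two-state weather model is made up): \(F = \mathbb{E}_q[\ln q(\vartheta)] - \mathbb{E}_q[\ln p(\bar{s},\vartheta)] = -\ln p(\bar{s}) + \operatorname{KL}(q \| p(\vartheta|\bar{s}))\), so \(F\) upper-bounds the negative log evidence and touches it exactly when the recognition density equals the true posterior.

```python
import math

# A two-state generative model: hidden cause theta, one observation s = "wet".
p_theta = {"rainy": 0.3, "sunny": 0.7}        # prior over causes
p_s_given = {"rainy": 0.9, "sunny": 0.2}      # likelihood p(s = "wet" | theta)
p_joint = {th: p_theta[th] * p_s_given[th] for th in p_theta}
p_s = sum(p_joint.values())                   # model evidence p(s)
true_posterior = {th: p_joint[th] / p_s for th in p_joint}

def free_energy(q):
    """Variational free energy F = E_q[ln q(theta)] - E_q[ln p(s, theta)]."""
    return sum(q[th] * (math.log(q[th]) - math.log(p_joint[th])) for th in q)

# Any other recognition density pays a KL penalty above -ln p(s)...
naive_q = {"rainy": 0.5, "sunny": 0.5}
assert free_energy(naive_q) > -math.log(p_s)
# ...and the bound is tight exactly at the true posterior.
assert abs(free_energy(true_posterior) + math.log(p_s)) < 1e-12
```

So far this is just the standard evidence lower bound from variational inference; the contested part of the FEP is everything layered on top of it.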

OK, so self-organising systems must improve their variational approximations to posterior beliefs? What is the contentful prediction here?

See also: the Slate Star Codex Friston dogpile, based on an exposition by Wolfgang Schwarz.


Blazek, Paul J., and Milo M. Lin. 2020. “A Neural Network Model of Perception and Reasoning.” arXiv:2002.11319 [cs, q-Bio], February. http://arxiv.org/abs/2002.11319.
Dezfouli, Amir, Richard Nock, and Peter Dayan. 2020. “Adversarial Vulnerabilities of Human Decision-Making.” Proceedings of the National Academy of Sciences 117 (46): 29221–28. https://doi.org/10.1073/pnas.2016921117.
Freer, Cameron E., Daniel M. Roy, and Joshua B. Tenenbaum. 2012. “Towards common-sense reasoning via conditional simulation: legacies of Turing in Artificial Intelligence.” In Turing’s Legacy: Developments from Turing’s Ideas in Logic. Cambridge, United Kingdom: Cambridge University Press. http://arxiv.org/abs/1212.4799.
Friston, Karl. 2010. “The Free-Energy Principle: A Unified Brain Theory?” Nature Reviews Neuroscience 11 (2): 127. https://doi.org/10.1038/nrn2787.
———. 2013. “Life as We Know It.” Journal of The Royal Society Interface 10 (86). https://doi.org/10.1098/rsif.2013.0475.
Gold, E Mark. 1967. “Language Identification in the Limit.” Information and Control 10 (5): 447–74. https://doi.org/10.1016/S0019-9958(67)91165-5.
Gopnik, Alison. 2020. “Childhood as a Solution to Explore–Exploit Tensions.” Philosophical Transactions of the Royal Society B: Biological Sciences 375 (1803): 20190502. https://doi.org/10.1098/rstb.2019.0502.
Greibach, Sheila A. 1966. “The Unsolvability of the Recognition of Linear Context-Free Languages.” J. ACM 13 (4): 582–87. https://doi.org/10.1145/321356.321365.
Griffiths, Thomas L, Nick Chater, Charles Kemp, Amy Perfors, and Joshua B Tenenbaum. 2010. “Probabilistic Models of Cognition: Exploring Representations and Inductive Biases.” Trends in Cognitive Sciences 14 (8): 357–64. https://doi.org/10.1016/j.tics.2010.05.004.
Hasson, Uri, Samuel A. Nastase, and Ariel Goldstein. 2020. “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks.” Neuron 105 (3): 416–34. https://doi.org/10.1016/j.neuron.2019.12.002.
Kemp, Charles, and Joshua B Tenenbaum. 2008. “The Discovery of Structural Form.” Proceedings of the National Academy of Sciences 105 (31): 10687–92. https://doi.org/10.1073/pnas.0802631105.
Ma, Wei Ji, and Benjamin Peters. 2020. “A Neural Network Walks into a Lab: Towards Using Deep Nets as Models for Human Behavior.” arXiv:2005.02181 [cs, q-Bio], May. http://arxiv.org/abs/2005.02181.
Mansinghka, Vikash, Charles Kemp, Thomas Griffiths, and Joshua Tenenbaum. 2012. “Structured Priors for Structure Learning.” arXiv:1206.6852, June. http://arxiv.org/abs/1206.6852.
Millidge, Beren, Alexander Tschantz, and Christopher L. Buckley. 2020. “Predictive Coding Approximates Backprop Along Arbitrary Computation Graphs.” arXiv:2006.04182 [cs], October. http://arxiv.org/abs/2006.04182.
O’Donnell, Timothy J., Joshua B. Tenenbaum, and Noah D. Goodman. 2009. “Fragment Grammars: Exploring Computation and Reuse in Language,” March. http://dspace.mit.edu/handle/1721.1/44963.
Peterson, Joshua C., David D. Bourgin, Mayank Agrawal, Daniel Reichman, and Thomas L. Griffiths. 2021. “Using Large-Scale Experiments and Machine Learning to Discover Theories of Human Decision-Making.” Science 372 (6547): 1209–14. https://doi.org/10.1126/science.abe2629.
Saxe, Andrew, Stephanie Nelli, and Christopher Summerfield. 2020. “If Deep Learning Is the Answer, Then What Is the Question?” arXiv:2004.07580 [q-Bio], April. http://arxiv.org/abs/2004.07580.
Steyvers, Mark, and Joshua B. Tenenbaum. 2005. “The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth.” Cognitive Science 29 (1): 41–78. https://doi.org/10.1207/s15516709cog2901_3.
Tenenbaum, Joshua B, Charles Kemp, Thomas L Griffiths, and Noah D Goodman. 2011. “How to Grow a Mind: Statistics, Structure, and Abstraction.” Science 331 (6022): 1279. https://doi.org/10.1126/science.1192788.
Ullman, Tomer D., Noah D. Goodman, and Joshua B. Tenenbaum. 2012. “Theory Learning as Stochastic Search in the Language of Thought.” Cognitive Development. https://doi.org/10.1016/j.cogdev.2012.07.005.
Williams, Daniel. 2020. “Predictive Coding and Thought.” Synthese 197 (4): 1749–75. https://doi.org/10.1007/s11229-018-1768-x.
Wolff, J Gerard. 2000. “Syntax, Parsing and Production of Natural Language in a Framework of Information Compression by Multiple Alignment, Unification and Search.” Journal of Universal Computer Science 6 (8): 781–829.
Yuan, Lei, Violet Xiang, David Crandall, and Linda Smith. 2020. “Learning the generative principles of a symbol system from limited examples.” Cognition 200 (July): 104243. https://doi.org/10.1016/j.cognition.2020.104243.
