Mind as statistical learner

Shoggoth pride

2020-06-23 — 2025-08-31

Wherein human learning is depicted as statistical inference, and Bayesian probabilistic programs and grammatical-inference schemes are employed to explain perception, language, and problem‑solving.

AI safety

collective knowledge

learning

life

mind

probability

statistics

statmech

Figure 1: The figure represents the contents of the consciousness; a part, under A, being present in attention, the portion P representing self-consciousness; a part, under B, being outside the range of attention and hence subconscious, a part, under C, being so far removed from consciousness as to be almost inaccessible.

Models of mind are popular among ML nerds. Here are various morsels on the theme of what machine learning teaches us about our own learning. Biomimetic algorithms find their converse, perhaps, in our algo-mimetic biology.

This notebook is about general learning-theory insights. The nitty-gritty details of how biological systems compute are what I think of as biocomputing. If we can unify the two, then well done: we can grow minds in a petri dish.

1 Language theory

The OG test-case of mind-like behaviour is grammatical inference, where a lot of ink was spilled on the learnability of various kinds of languages. This is less popular nowadays, as computer-based natural language processing is doing rather interesting things without bothering with the details of formal syntax or traditional semantics. What does that mean? I don’t hazard an opinion on that because I’m too busy right now to form one.
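For readers who have not met the learnability literature, here is a toy sketch of the classic setting from Gold (1967): a learner sees a stream of positive examples and, after each one, guesses the first language in a fixed enumeration that is consistent with everything seen so far. The three candidate languages, their regex encoding, and the function name are all hypothetical choices made for illustration. The candidates are deliberately ordered so that smaller languages come before the languages that contain them, which is what lets this naive learner settle on the right answer here.

```python
import re

# Hypothetical candidate languages over the alphabet {a, b}, written as regexes.
# They form a chain: a+ is contained in a+b*, which is contained in (a|b)+.
# Smaller languages come first so the learner never overgeneralises early.
CANDIDATES = [
    ("a+", re.compile(r"a+")),
    ("a+b*", re.compile(r"a+b*")),
    ("(a|b)+", re.compile(r"(a|b)+")),
]


def learn_by_enumeration(examples, candidates=CANDIDATES):
    """Yield the learner's guess after each positive example.

    The guess is the name of the first candidate language that contains
    every example seen so far, or None if no candidate fits.
    """
    seen = []
    for s in examples:
        seen.append(s)
        guess = None
        for name, pattern in candidates:
            if all(pattern.fullmatch(x) for x in seen):
                guess = name
                break
        yield guess
```

Fed the examples `"a", "aa", "ab", "aabb"`, the learner guesses `a+` twice, switches to `a+b*` on seeing `"ab"`, and then stays put. Gold's negative results concern richer classes of languages, where no ordering trick of this kind can work from positive examples alone.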

There are some provocative results, though, coming from the theory of vector embeddings, e.g. Goldstein et al. (2025).

2 Descriptive statistical models of cognition

See, for example, the Bayesian model of human problem-solving in Probabilistic Models of Cognition, by Noah Goodman, Joshua Tenenbaum, and others; it’s also a probabilistic programming textbook.

This book explores the probabilistic approach to cognitive science, which models learning and reasoning as inference in complex probabilistic models. We examine how a broad range of empirical phenomena, including intuitive physics, concept learning, causal reasoning, social cognition, and language understanding, can be modelled using probabilistic programs (using the WebPPL language).

Disclaimer: I have not actually read the book.

This descriptive use of Bayesian models (how people actually seem to reason) isn’t the same as the normative model of Bayesian cognition (how an ideal reasoner should).

Dezfouli, Nock, and Dayan (2020) and Peterson et al. (2021) do something different: they find ML models that are good “second-order” fits for how people seem to learn in practice.
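To give a flavour of what “learning as inference” means computationally, here is a minimal sketch in plain Python (not WebPPL, and not taken from the book), in the style of Tenenbaum’s “number game”. The hypothesis names, the 1..100 range, and the uniform prior are all assumptions made for illustration. The likelihood uses the so-called size principle: each example is assumed to be drawn uniformly from the set of numbers a hypothesis covers, so smaller hypotheses that still fit the data are favoured.

```python
from fractions import Fraction

# Hypothetical hypothesis space over the integers 1..100 (illustrative only).
HYPOTHESES = {
    "even": {n for n in range(1, 101) if n % 2 == 0},
    "multiples_of_4": {n for n in range(1, 101) if n % 4 == 0},
    "powers_of_2": {2 ** k for k in range(1, 7)},  # 2, 4, ..., 64
}


def posterior(data, hypotheses=HYPOTHESES):
    """Return the posterior over hypotheses given observed examples.

    Assumes a uniform prior and the size principle: each example is drawn
    uniformly at random from the extension of the true hypothesis.
    """
    prior = Fraction(1, len(hypotheses))
    unnormalised = {}
    for name, extension in hypotheses.items():
        if all(x in extension for x in data):
            likelihood = Fraction(1, len(extension)) ** len(data)
        else:
            likelihood = Fraction(0)  # hypothesis is ruled out by the data
        unnormalised[name] = prior * likelihood
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the data")
    return {name: p / total for name, p in unnormalised.items()}
```

With the single example `16`, `posterior([16])` already puts probability 25/34 on `powers_of_2`. After `[16, 8, 2]` the `multiples_of_4` hypothesis is eliminated and `powers_of_2` holds over 99% of the mass, even though `even` is still perfectly consistent with the data.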

3 Biologically (more) plausible neural nets

See, e.g., forward-forward.

4 Free energy principle

See predictive coding.

5 Little shoggoths, we

Shoggoth with Smiley Face (Artificial Intelligence) | Know Your Meme

I will put the obvious question: Can I know I am not a shoggoth?

6 Life as ML

Michael Levin talks a good game here:

SC: Does this whole philosophy help us, either philosophically or practically, when it comes to our ambitions to go in there and change organisms, not just solve, cure diseases, but to make new organisms to do synthetic biology to create new things from scratch and vice versa, does it help us in what we would think of usually as robotics or technology, can we learn lessons from the biological side of things?

ML: Yeah, I think absolutely. And there’s two ways to… There’s sort of a short-term view and a longer-term view of this. The short-term view is that, absolutely, so we work very closely with roboticists to take deep concepts in both directions. So on the one hand, take the things that we’ve learned from the robustness and intelligence… I mean, the intelligent problem-solving of these living forms is incredibly high, and even organisms without brains, this whole focus on kind of like neuromorphic architectures for AI, I think is really a very limiting way to look at it. And so we try very hard to export some of these concepts into machine learning, into robotics, and so on, multi-scale robotics… I gave a talk called why robots don’t get cancer. And this is, this is exactly the problem, is we make devices where the pieces don’t have sub-goals, and that’s the good news is, yes, no, you’re not going to have a robots where part of it decides to defect and do something different, but on the other hand, the robots aren’t very good, they’re not very flexible.

ML: So part of this we’re trying to export, and then going in the other direction and take interesting concepts from computer science, from cognitive science, into biology to help us understand how this works. I fundamentally think that computer science and biology are not really different fields, I think we are all studying computation just in different media, and I do think there’s a lot of opportunity for back and forth. But now, the other thing that you mentioned is really important, which is the creation of novel systems. We are doing some work on synthetic living machines and creating new life forms by basically taking perfectly normal cells and giving them additional freedom and then some stimulation to become other types of organisms.

ML: We, I think in our lifetime, I think, we are going to be surrounded by… Darwin had this phrase, endless forms most beautiful. I think the reality is going to be a variety of living agents that he couldn’t have even conceived of, in the sense that the space, and this is something I’m working on now, is to map out at least the axes of this option space of all possible agents, because what the bioengineering is enabling us to do is to create hybrid… To create hybrid agents that are in part biological, in part electronic, the parts are designed, parts are evolved. The parts that are evolved might have been biologically evolved or they might have been evolved in a virtual environment using genetic algorithms on a computer, all of these combinations, and this… We’re going to see everything from household appliances that are run in part by machine learning and part by living brains that are sort of being controllers for various things that we would like to optimize, to humans and animals that have various implants that may allow them to control other devices and communicate with each other.

7 Shard Theory

Shard Theory: An Overview

Shard theory is a research program aimed at explaining the systematic relationships between the reinforcement schedules and learned values of reinforcement-learning agents. It consists of a basic ontology of reinforcement learners, their internal computations, and their relationship to their environment. It makes several predictions about a range of RL systems, both RL models and humans. Indeed, shard theory can be thought of as simply applying the modern ML lens to the question of value learning under reinforcement in artificial and natural neural networks!

Some of shard theory’s confident predictions can be tested immediately in modern RL agents. Less confident predictions about i.i.d.-trained language models can also be tested now. Shard theory also has numerous retrodictions about human psychological phenomena that are otherwise mysterious from only the viewpoint of EU maximization, with no further substantive mechanistic account of human learned values. Finally, shard theory fails some retrodictions in humans; on further inspection, these lingering confusions might well falsify the theory.

If shard theory captures the essential dynamic relating reinforcement schedules and learned values, then we’ll be able to carry out a steady stream of further experiments yielding a lot of information about how to reliably instill more of the values we want in our RL agents and fewer of those we don’t.

8 References

Addicott, Pearson, Schechter, et al. 2021. “Attention-Deficit/Hyperactivity Disorder and the Explore/Exploit Trade-Off.” Neuropsychopharmacology.

Aimone, and Parekh. 2023. “The Brain’s Unique Take on Algorithms.” Nature Communications.

Beniaguev, Segev, and London. 2021. “Single Cortical Neurons as Deep Artificial Neural Networks.” Neuron.

Binz, Dasgupta, Jagadish, et al. 2024. “Meta-Learned Models of Cognition.” Behavioral and Brain Sciences.

Blazek, and Lin. 2020. “A Neural Network Model of Perception and Reasoning.” arXiv:2002.11319 [Cs, q-Bio].

Dabagia, Papadimitriou, and Vempala. 2023. “Computation with Sequences in the Brain.”

Dabagia, Vempala, and Papadimitriou. 2022. “Assemblies of Neurons Learn to Classify Well-Separated Distributions.” In Proceedings of Thirty Fifth Conference on Learning Theory.

Dezfouli, Nock, and Dayan. 2020. “Adversarial Vulnerabilities of Human Decision-Making.” Proceedings of the National Academy of Sciences.

Doyle, and Csete. 2011. “Architecture, Constraints, and Behavior.” Proceedings of the National Academy of Sciences.

Drugowitsch, Mendonça, Mainen, et al. 2019. “Learning Optimal Decisions with Confidence.” Proceedings of the National Academy of Sciences.

Du, Fu, Wen, et al. 2025. “Human-Like Object Concept Representations Emerge Naturally in Multimodal Large Language Models.” Nature Machine Intelligence.

Freer, Roy, and Tenenbaum. 2012. “Towards common-sense reasoning via conditional simulation: legacies of Turing in Artificial Intelligence.” In Turing’s Legacy: Developments from Turing’s Ideas in Logic.

Friston, Karl. 2010. “The Free-Energy Principle: A Unified Brain Theory?” Nature Reviews Neuroscience.

———. 2013. “Life as We Know It.” Journal of The Royal Society Interface.

Friston, Karl J., Parr, and de Vries. 2017. “The Graphical Brain: Belief Propagation and Active Inference.” Network Neuroscience.

Glymour. 2007. “When Is a Brain Like the Planet?” Philosophy of Science.

Gold. 1967. “Language Identification in the Limit.” Information and Control.

Goldstein, Wang, Niekerken, et al. 2025. “A Unified Acoustic-to-Speech-to-Language Embedding Space Captures the Neural Basis of Natural Language Processing in Everyday Conversations.” Nature Human Behaviour.

Gopnik. 2020. “Childhood as a Solution to Explore–Exploit Tensions.” Philosophical Transactions of the Royal Society B: Biological Sciences.

Greibach. 1966. “The Unsolvability of the Recognition of Linear Context-Free Languages.” J. ACM.

Griffiths, Chater, Kemp, et al. 2010. “Probabilistic Models of Cognition: Exploring Representations and Inductive Biases.” Trends in Cognitive Sciences.

Hasson, Nastase, and Goldstein. 2020. “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks.” Neuron.

Hinton. 2022. “The Forward-Forward Algorithm: Some Preliminary Investigations.”

Hoel. 2021. “The Overfitted Brain: Dreams Evolved to Assist Generalization.” Patterns.

Hulsbosch, Beckers, De Meyer, et al. n.d. “Instrumental Learning and Behavioral Persistence in Children with Attention-Deficit/Hyperactivity-Disorder: Does Reinforcement Frequency Matter?” Journal of Child Psychology and Psychiatry.

Jaeger, Noheda, and van der Wiel. 2023. “Toward a Formal Theory for Computing Machines Made Out of Whatever Physics Offers.” Nature Communications.

Jha, Zhang, Shmatikov, et al. 2025. “Harnessing the Universal Geometry of Embeddings.”

Kemp, and Tenenbaum. 2008. “The Discovery of Structural Form.” Proceedings of the National Academy of Sciences.

Kemp, Tenenbaum, Niyogi, et al. 2010. “A Probabilistic Model of Theory Formation.” Cognition.

Kosinski. 2023. “Theory of Mind May Have Spontaneously Emerged in Large Language Models.”

Kosoy, Chan, Liu, et al. 2022. “Towards Understanding How Machines Can Learn Causal Overhypotheses.”

Lee, Leibo, An, et al. 2022. “Importance of prefrontal meta control in human-like reinforcement learning.” Frontiers in Computational Neuroscience.

Lillicrap, and Santoro. 2019. “Backpropagation Through Time and the Brain.” Current Opinion in Neurobiology, Machine Learning, Big Data, and Neuroscience.

Mainen, Häusser, and Pouget. 2016. “A Better Way to Crack the Brain.” Nature.

Ma, Wei Ji, Kording, and Goldreich. 2022. Bayesian Models of Perception and Action.

Mansinghka, Kemp, Griffiths, et al. 2012. “Structured Priors for Structure Learning.” arXiv:1206.6852.

Ma, Wei Ji, and Peters. 2020. “A Neural Network Walks into a Lab: Towards Using Deep Nets as Models for Human Behavior.” arXiv:2005.02181 [Cs, q-Bio].

McGee, Kosterlitz, Kaznatcheev, et al. 2022. “The Cost of Information Acquisition by Natural Selection.”

Meyniel, Sigman, and Mainen. 2015. “Confidence as Bayesian Probability: From Neural Origins to Behavior.” Neuron.

Millidge, Tschantz, and Buckley. 2020. “Predictive Coding Approximates Backprop Along Arbitrary Computation Graphs.” arXiv:2006.04182 [Cs].

Mitropolsky, Collins, and Papadimitriou. 2021. “A Biologically Plausible Parser.”

Nissan, Hertz, Shahar, et al. 2023. “Distinct Reinforcement Learning Profiles Distinguish Between Language and Attentional Neurodevelopmental Disorders.” Behavioral and Brain Functions.

O’Donnell, Tenenbaum, and Goodman. 2009. “Fragment Grammars: Exploring Computation and Reuse in Language.”

Ororbia, and Mali. 2023. “The Predictive Forward-Forward Algorithm.”

Papadimitriou, and Vempala. 2018. “Random Projection in the Brain and Computation with Assemblies of Neurons.” In 10th Innovations in Theoretical Computer Science Conference (ITCS 2019). Leibniz International Proceedings in Informatics (LIPIcs).

Papadimitriou, Vempala, Mitropolsky, et al. 2020. “Brain computation by assemblies of neurons.” Proceedings of the National Academy of Sciences of the United States of America.

Peterson, Bourgin, Agrawal, et al. 2021. “Using Large-Scale Experiments and Machine Learning to Discover Theories of Human Decision-Making.” Science.

Pollak. 2023. “Poor Learning or Hyper‐exploration?: A Commentary on Hulsbosch Et Al. (2023).” Journal of Child Psychology and Psychiatry.

Porr, and Miller. 2020. “Forward Propagation Closed Loop Learning.” Adaptive Behavior.

Ren, Kornblith, Liao, et al. 2022. “Scaling Forward Gradient With Local Losses.”

Robertazzi, Vissani, Schillaci, et al. 2022. “Brain-Inspired Meta-Reinforcement Learning Cognitive Control in Conflictual Inhibition Decision-Making Task for Artificial Agents.” Neural Networks.

Saxe, Nelli, and Summerfield. 2020. “If Deep Learning Is the Answer, Then What Is the Question?” arXiv:2004.07580 [q-Bio].

Shiffrin, and Mitchell. 2023. “Probing the Psychology of AI Models.” Proceedings of the National Academy of Sciences.

Smith, Taylor, Wilson, et al. 2022. “Lower Levels of Directed Exploration and Reflective Thinking Are Associated With Greater Anxiety and Depression.” Frontiers in Psychiatry.

Starr. 1913. Organic and functional nervous diseases; a text-book of neurology.

Steyvers, and Tenenbaum. 2005. “The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth.” Cognitive Science.

Tenenbaum, Kemp, Griffiths, et al. 2011. “How to Grow a Mind: Statistics, Structure, and Abstraction.” Science.

Ullman, Goodman, and Tenenbaum. 2012. “Theory Learning as Stochastic Search in the Language of Thought.” Cognitive Development.

Vanchurin, Wolf, Katsnelson, et al. 2021. “Towards a Theory of Evolution as Multilevel Learning.”

Wang, Kurth-Nelson, Kumaran, et al. 2018. “Prefrontal cortex as a meta-reinforcement learning system.” Nature Neuroscience.

Williams. 2020. “Predictive Coding and Thought.” Synthese.

Wolff. 2000. “Syntax, Parsing and Production of Natural Language in a Framework of Information Compression by Multiple Alignment, Unification and Search.” Journal of Universal Computer Science.

Yuan, Xiang, Crandall, et al. 2020. “Learning the generative principles of a symbol system from limited examples.” Cognition.

Yu, Xu, Weston, et al. 2024. “Distilling System 2 into System 1.”