World models arising in foundation models.

2024-12-20 — 2026-02-05

Wherein embeddings from sundry models are found mappable by structure alone, without paired data, and neural speech activity is aligned linearly with such contextual vectors.

AI safety

approximation

Bayes

generative

language

machine learning

meta learning

Monte Carlo

neural nets

NLP

optimization

probabilistic algorithms

probability

statistics

stringology

time series

Placeholder notes on what kinds of world models sit inside large neural nets. They seem to have some kind of internal model of the outside world; in practice, what kind of thing is it?

1 Representational similarity

Are the semantics of embeddings or other internal representations in different models or modalities represented in a common “Platonic” space that’s universal in some sense (Huh et al. 2024b)? If so, should we care?

I confess I struggle to make this concrete enough to produce testable hypotheses; that’s probably because I haven’t read enough of the literature. Here’s something that might be progress:

Jack Morris “Excited to finally share on arXiv what we’ve known for a while now: All Embedding Models Learn The Same Thing. Embeddings from different models are so similar that we can map between them based on structure alone — without any paired data. Feels like magic, but it’s real:🧵” (Jha et al. 2025)
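One way I can make "similar representations" concrete is with an off-the-shelf similarity index. Below is a minimal sketch of linear centered kernel alignment (CKA), on synthetic data. To be clear about the assumptions: this is not the unpaired translation method of Jha et al.; CKA needs the same inputs pushed through both models, and the "models" here are made-up matrices (`model_a`, `model_b`, `unrelated` are hypothetical names), one a rotated and rescaled copy of the other.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between two (n_samples, dim) representation matrices.

    Rows must correspond to the same inputs in both matrices.
    """
    # Centre each feature so the index ignores constant offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))


def demo(seed=0, n=500, d=8):
    """Compare a rotated copy of a representation against an unrelated one."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, d))                    # shared latent structure
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
    model_a = Z
    model_b = 3.0 * Z @ Q                          # same geometry, new basis and scale
    unrelated = rng.normal(size=(n, d))            # independent noise
    return linear_cka(model_a, model_b), linear_cka(model_a, unrelated)


if __name__ == "__main__":
    same, different = demo()
    print(f"rotated copy: {same:.3f}, unrelated: {different:.3f}")
```

Linear CKA is invariant to orthogonal transformations and isotropic rescaling, so the rotated copy should score essentially 1 while independent noise scores near 0. A testable version of the hypothesis would then be something like "CKA between independently trained models, on shared inputs, rises with scale".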

My friend Pascal Hirsch mentions the hypothesis that this convergence should also apply to the embeddings people have in their brains, referring to this fascinating recent Google paper (Goldstein et al. 2025):

[…] neural activity in the human brain aligns linearly with the internal contextual embeddings of speech and language within LLMs as they process everyday conversations.
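"Aligns linearly" can be read as a statement about linear encoding models: fit a regularised linear map from embeddings to a neural signal, then check how well it predicts held-out data. Here is a minimal sketch under that reading, using closed-form ridge regression on synthetic data. Everything in it is an assumption of mine for illustration (the function names, the penalty `lam`, the simulated "electrode"), not the pipeline of Goldstein et al.

```python
import numpy as np


def fit_ridge(E, y, lam=1.0):
    """Closed-form ridge regression weights for y ~ E @ w."""
    d = E.shape[1]
    return np.linalg.solve(E.T @ E + lam * np.eye(d), E.T @ y)


def encoding_score(E_train, y_train, E_test, y_test, lam=1.0):
    """Correlation between held-out signal and its linear prediction."""
    mu_E = E_train.mean(axis=0)
    mu_y = y_train.mean()
    w = fit_ridge(E_train - mu_E, y_train - mu_y, lam)
    pred = (E_test - mu_E) @ w + mu_y
    return float(np.corrcoef(pred, y_test)[0, 1])


def demo(seed=0, n=400, d=16, noise=1.0):
    """Simulate one 'electrode' that is a noisy linear readout of embeddings."""
    rng = np.random.default_rng(seed)
    E = rng.normal(size=(n, d))                 # stand-in contextual embeddings
    w_true = rng.normal(size=d)                 # hypothetical ground-truth readout
    y = E @ w_true + noise * rng.normal(size=n)
    shuffled = rng.permutation(y)               # control: break the word/signal pairing
    half = n // 2
    aligned = encoding_score(E[:half], y[:half], E[half:], y[half:])
    control = encoding_score(E[:half], shuffled[:half], E[half:], shuffled[half:])
    return aligned, control


if __name__ == "__main__":
    aligned, control = demo()
    print(f"held-out correlation: {aligned:.2f}, shuffled control: {control:.2f}")
```

The shuffled control matters: a high held-out correlation only means something if it collapses once the pairing between embeddings and signal is destroyed.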

2 Causal world models

A world model is a somewhat different concept from a “representation”; I’m not sure precisely how, but from skimming, it seems that world models might be easier to ground in causal abstraction and causal inference.

See causal abstraction for a discussion of the idea that a neural net’s latent space can end up discovering causal representations of the world.

3 Creating worlds to model

Rosas, Boyd, and Baltieri (2025) make a pleasing connection to the simulation hypothesis:

Recent work proposes using world models to generate controlled virtual environments in which AI agents can be tested before deployment to ensure their reliability and safety. However, accurate world models often have high computational demands that can severely restrict the scope and depth of such assessments. Inspired by the classic ‘brain in a vat’ thought experiment, here we investigate ways of simplifying world models that remain agnostic to the AI agent under evaluation. By following principles from computational mechanics, our approach reveals a fundamental trade-off in world model construction between efficiency and interpretability, demonstrating that no single world model can optimise all desirable characteristics. Building on this trade-off, we identify procedures to build world models that either minimise memory requirements, delineate the boundaries of what is learnable, or allow tracking causes of undesirable outcomes. In doing so, this work establishes fundamental limits in world modelling, leading to actionable guidelines that inform core design choices related to effective agent evaluation.

4 Incoming

Condensation: a theory of concepts — Sam Eisenstat at MAISU 2025, talk and discussion
John Wentworth’s Testing the Natural Abstraction Hypothesis: Project Intro
Jon Kleinberg, AI’s Models of the World, and Ours | Theoretically Speaking
How Does a Blind Model See the Earth? - by Henry — latent geographical “world” model (!)
NeurIPS 2023 Tutorial: Language Models meet World Models

5 References

Basu, Grayson, Morrison, et al. 2024. “Understanding Information Storage and Transfer in Multi-Modal Large Language Models.”

Bengio, Courville, and Vincent. 2013. “Representation Learning: A Review and New Perspectives.” IEEE Transactions on Pattern Analysis and Machine Intelligence.

Cai. 2024. Invitation to Supervisory Control of Discrete-Event Systems: with Hands-On PyTCT.

Chirimuuta. 2025. “The Prehistory of the Idea That Thinking Is Modelling.” Human Arenas.

Costa-Castello, Nebot, and Grino. 2005. “Demonstration of the Internal Model Principle by Digital Repetitive Control of an Educational Laboratory Plant.” IEEE Transactions on Education.

Du, Fu, Wen, et al. 2025. “Human-Like Object Concept Representations Emerge Naturally in Multimodal Large Language Models.” Nature Machine Intelligence.

Francis, and Wonham. 1976. “The Internal Model Principle of Control Theory.” Automatica.

Ge, Huang, Zhou, et al. 2024. “WorldGPT: Empowering LLM as Multimodal World Model.” In Proceedings of the 32nd ACM International Conference on Multimedia. MM ’24.

Goldstein, Wang, Niekerken, et al. 2025. “A Unified Acoustic-to-Speech-to-Language Embedding Space Captures the Neural Basis of Natural Language Processing in Everyday Conversations.” Nature Human Behaviour.

Hafner, Pasukonis, Ba, et al. 2024. “Mastering Diverse Domains Through World Models.”

Hao, Gu, Ma, et al. 2023. “Reasoning with Language Model Is Planning with World Model.”

Huang, Isidori, Marconi, et al. 2018. “Internal Models in Control, Biology and Neuroscience.” In 2018 IEEE Conference on Decision and Control (CDC).

Huh, Cheung, Wang, et al. 2024a. “Position: The Platonic Representation Hypothesis.” In Proceedings of the 41st International Conference on Machine Learning.

———, et al. 2024b. “The Platonic Representation Hypothesis.”

Hu, and Shu. 2023. “Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning.”

Jha, Zhang, Shmatikov, et al. 2025. “Harnessing the Universal Geometry of Embeddings.”

Kanerva. 2009. “Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors.” Cognitive Computation.

Klabunde, Amor, Granitzer, et al. 2023. “Towards Measuring Representational Similarity of Large Language Models.”

Klabunde, Schumacher, Strohmaier, et al. 2025. “Similarity of Neural Network Models: A Survey of Functional and Representational Measures.” ACM Computing Surveys.

Klabunde, Wald, Schumacher, et al. 2025. “ReSi: A Comprehensive Benchmark for Representational Similarity Measures.”

Lin, and Tegmark. 2016. “Why Does Deep and Cheap Learning Work so Well?” arXiv:1608.08225 [Cond-Mat, Stat].

Nielsen, Marconato, Dittadi, et al. 2025. “When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective.”

Park, Choe, and Veitch. 2024. “The Linear Representation Hypothesis and the Geometry of Large Language Models.”

Richens, Abel, Bellot, et al. 2025. “General Agents Contain World Models.” In ICML.

Richens, and Everitt. 2024. “Robust Agents Learn Causal World Models.”

Rosas, Boyd, and Baltieri. 2025. “AI in a Vat: Fundamental Limits of Efficient World Modelling for Agent Sandboxing and Interpretability.”

Saengkyongam, Rosenfeld, Ravikumar, et al. 2024. “Identifying Representations for Intervention Extrapolation.”

Wong, Grand, Lew, et al. 2023. “From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought.”

Wonham, and Cai. 2019. Supervisory Control of Discrete-Event Systems. Communications and Control Engineering.

Yildirim, and Paul. 2024. “From Task Structures to World Models: What Do LLMs Know?” Trends in Cognitive Sciences.