Platonic and convergent representations in neural nets

2024-12-20 — 2026-06-29

Wherein the Convergence of Internal Representations Across Neural Networks and Human Brains Is Examined, With Attention to Findings That Embedding Models of Differing Architectures May Be Mapped Between One Another Without Paired Data.

Tags: AI safety, approximation, Bayes, generative, language, machine learning, meta learning, Monte Carlo, neural nets, NLP, optimization, probabilistic algorithms, probability, statistics, stringology, time series

Placeholder notes on when representations of the world in learning systems converge, in some sense, to “universal” or “Platonic” representations. It seems that such systems, including LLMs and most neural networks, do have some kind of internal model of the outside world; what needs do such models share?

Are the semantics of embeddings or other internal representations in different models or modalities represented in a common “Platonic” space that’s universal in some sense (Huh et al. 2024b)? If so, should we care?
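One way to make "the same representation, in some sense" slightly more testable is to pick a similarity measure and ask which transformations it ignores. Below is a minimal sketch of linear centered kernel alignment (CKA), one of the measures covered in the representational-similarity surveys listed in the references. Everything here is an illustrative assumption on my part: the data are synthetic, and `model_a`, `model_b` and `unrelated` are hypothetical stand-ins for activations of different models on the same inputs. It is not the metric used in any particular paper cited here.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between X (n x d1) and Y (n x d2).

    Row i of X and row i of Y must be representations of the same input.
    """
    # Centre each feature so the score ignores constant offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)


rng = np.random.default_rng(0)

# Hypothetical "model A": 500 inputs embedded in 16 dimensions.
model_a = rng.normal(size=(500, 16))

# Hypothetical "model B": the same geometry, rotated and rescaled.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
model_b = 3.0 * model_a @ rotation

# A representation with no shared structure at all.
unrelated = rng.normal(size=(500, 16))

cka_same = linear_cka(model_a, model_b)
cka_diff = linear_cka(model_a, unrelated)
print(f"rotated copy: {cka_same:.3f}, unrelated: {cka_diff:.3f}")
```

Linear CKA is invariant to orthogonal transformations and isotropic rescaling, so the rotated copy scores essentially 1 while the unrelated representation scores near 0. Whether that invariance class is the right notion of "universal" is exactly the open question.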

I confess I struggle to make this concrete enough to produce testable hypotheses; that’s probably because I haven’t read enough of the literature. Here’s something that might be progress:

Jack Morris “Excited to finally share on arXiv what we’ve known for a while now: All Embedding Models Learn The Same Thing. Embeddings from different models are so similar that we can map between them based on structure alone — without any paired data. Feels like magic, but it’s real:🧵” (Jha et al. 2025)
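For orientation, here is what "mapping between embedding spaces" looks like in the much easier setting where paired data is available. This is a sketch of orthogonal Procrustes alignment on synthetic data, and it is explicitly not the method of Jha et al. (2025), whose claim is that a mapping can be found without any pairs. The spaces `A` and `B`, the noise level and the number of paired rows are all assumptions made for illustration.

```python
import numpy as np


def procrustes_rotation(A, B):
    """Orthogonal R minimising ||A @ R - B||_F for row-paired A and B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


rng = np.random.default_rng(1)
d = 32

# Hypothetical embedding space A, and a space B that is a noisy rotation of it.
A = rng.normal(size=(1000, d))
true_rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
B = A @ true_rotation + 0.01 * rng.normal(size=(1000, d))

# Fit the map on 200 paired rows, then evaluate on the remaining 800.
R = procrustes_rotation(A[:200], B[:200])
mapped = A[200:] @ R
target = B[200:]

cosine = np.sum(mapped * target, axis=1) / (
    np.linalg.norm(mapped, axis=1) * np.linalg.norm(target, axis=1)
)
mean_cos = cosine.mean()
print(f"mean held-out cosine similarity: {mean_cos:.4f}")
```

If two spaces really do differ by little more than a rotation, a handful of pairs recovers the map almost exactly. The surprising part of the unpaired claim is that the pairs can be dispensed with entirely.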

My friend Pascal Hirsch mentions the hypothesis that this should also apply to the embeddings people have in their brains, referring to this fascinating recent Google paper (Goldstein et al. 2025):

“[…] neural activity in the human brain aligns linearly with the internal contextual embeddings of speech and language within LLMs as they process everyday conversations.”
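To unpack what "aligns linearly" could mean operationally: a common way to test such a claim is to fit a regularised linear map from embeddings to recorded signals and score its predictions on held-out data. The sketch below does this with closed-form ridge regression on synthetic data. I am assuming this general shape of analysis for illustration only; the array sizes, the `alpha` value and the simulated `neural` signal are hypothetical and are not taken from Goldstein et al. (2025).

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights W minimising ||X W - Y||^2 + alpha ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def columnwise_corr(P, T):
    """Pearson correlation between matching columns of P and T."""
    P = P - P.mean(axis=0)
    T = T - T.mean(axis=0)
    return (P * T).sum(axis=0) / (
        np.linalg.norm(P, axis=0) * np.linalg.norm(T, axis=0)
    )


rng = np.random.default_rng(2)
n_words, emb_dim, n_channels = 600, 20, 5

# Synthetic stand-ins: per-word model embeddings, and a "neural" signal that
# is a noisy linear readout of them (this is the assumption being tested).
embeddings = rng.normal(size=(n_words, emb_dim))
true_weights = rng.normal(size=(emb_dim, n_channels))
neural = embeddings @ true_weights + rng.normal(size=(n_words, n_channels))

# Fit on the first 400 words, evaluate on the remaining 200.
fit_rows, held_out = slice(0, 400), slice(400, None)
W = fit_ridge(embeddings[fit_rows], neural[fit_rows], alpha=10.0)
predicted = embeddings[held_out] @ W

scores = columnwise_corr(predicted, neural[held_out])
print("held-out correlation per channel:", np.round(scores, 3))
```

High held-out correlations here only show that the simulated signal is linear in the embeddings by construction. With real recordings, the held-out score is the evidence, and the interesting comparisons are against shuffled or non-contextual embeddings.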

1 Incoming

Condensation: a theory of concepts — Sam Eisenstat at MAISU 2025, talk and discussion
John Wentworth’s Testing the Natural Abstraction Hypothesis: Project Intro
Jon Kleinberg, AI’s Models of the World, and Ours | Theoretically Speaking

2 References

Du, Fu, Wen, et al. 2025. “Human-Like Object Concept Representations Emerge Naturally in Multimodal Large Language Models.” Nature Machine Intelligence.

Goldstein, Wang, Niekerken, et al. 2025. “A Unified Acoustic-to-Speech-to-Language Embedding Space Captures the Neural Basis of Natural Language Processing in Everyday Conversations.” Nature Human Behaviour.

Huang, Isidori, Marconi, et al. 2018. “Internal Models in Control, Biology and Neuroscience.” In 2018 IEEE Conference on Decision and Control (CDC).

Huh, Cheung, Wang, et al. 2024a. “Position: The Platonic Representation Hypothesis.” In Proceedings of the 41st International Conference on Machine Learning.

Huh, Cheung, Wang, et al. 2024b. “The Platonic Representation Hypothesis.”

Jha, Zhang, Shmatikov, et al. 2025. “Harnessing the Universal Geometry of Embeddings.”

Klabunde, Amor, Granitzer, et al. 2023. “Towards Measuring Representational Similarity of Large Language Models.”

Klabunde, Schumacher, Strohmaier, et al. 2025. “Similarity of Neural Network Models: A Survey of Functional and Representational Measures.” ACM Computing Surveys.

Klabunde, Wald, Schumacher, et al. 2025. “ReSi: A Comprehensive Benchmark for Representational Similarity Measures.”

Nielsen, Marconato, Dittadi, et al. 2025. “When Does Closeness in Distribution Imply Representational Similarity? An Identifiability Perspective.”

Park, Choe, and Veitch. 2024. “The Linear Representation Hypothesis and the Geometry of Large Language Models.”