Semantics

Compressed representations of reality for syntactic agents, which might be what meaning means

2014-12-29 — 2022-08-27

Wherein the mapping between linguistic tokens and their referents is surveyed, and attention is given to vector grounding in transformers, MRI evidence for shared conceptualisations, and object‑anchored embeddings.

classification

communicating

feature construction

high d

language

machine learning

metrics

mind

NLP

“[…] archetypes don’t exist; the body exists.
The belly inside is beautiful, because the baby grows there,
because your sweet cock, all bright and jolly, thrusts there,
and good, tasty food descends there,
and for this reason the cavern, the grotto, the tunnel
are beautiful and important, and the labyrinth, too,
which is made in the image of our wonderful intestines.
When somebody wants to invent something beautiful and important,
it has to come from there,
because you also came from there the day you were born,
because fertility always comes from inside a cavity,
where first something rots and then, lo and behold,
there’s a little man, a date, a baobab.

And high is better than low,
because if you have your head down, the blood goes to your brain,
because feet stink and hair doesn’t stink as much,
because it’s better to climb a tree and pick fruit
than end up underground, food for worms,
and because you rarely hurt yourself hitting something above
— you really have to be in an attic —
while you often hurt yourself falling.
That’s why up is angelic and down devilish.”

— Umberto Eco. Foucault’s Pendulum.

On the mapping between linguistic tokens and what they denote.

If I had time I would learn about Wierzbicka’s semantic primes, Wittgenstein, and probably Mark Johnson, if the over-egging doesn’t kill me; also logic-and-language philosophers, toy axiomatic worlds, and classic symbolic-reasoning approaches from AI. Perhaps I could drop in via game theory and neurolinguistics? Or ignore most of it and mention plausible models based on statistical learnability.

1 Symbol grounding

Piantadosi and Hill (2022) take on the symbol grounding problem for transformers. This is now charmingly called the “vector grounding problem” (Mollo and Millière 2023).

2 As a classification problem

Eliezer Yudkowsky’s essay “How an algorithm feels from the inside” inspired Scott Alexander’s “The Categories Were Made For Man, Not Man For The Categories”.

From a different direction, Microsoft Research argues that objects are a kind of anchor point for training cross-modal AI systems (Li et al. 2020):

…objects can be naturally used as anchor points to ease the learning of semantic alignments between images and texts. This discovery leads to a novel VLP framework that creates new state-of-the-art performance on six well-established vision-and-language tasks. … Though the observed data varies among different channels (modalities), we hypothesize that important factors tend to be shared among multiple channels (for example, dogs can be described visually and verbally), capturing channel-invariant (or modality-invariant) factors at the semantic level. In vision-and-language tasks, salient objects in an image can be mostly detected by modern object detectors, and such objects are often mentioned in the paired text.
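To make the anchoring intuition concrete, here is a toy sketch of the shared-token idea in that quote: labels from an object detector overlap with words in the paired caption, and those overlaps are candidate anchor points between the two modalities. Everything here is assumed for illustration: `Detection`, `anchor_tokens`, the 0.5 score threshold and the example detections are hypothetical, and exact string matching stands in for whatever alignment the model actually learns. As I understand it, Oscar feeds detected object tags to the model alongside caption tokens and image region features rather than matching strings like this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector output: an object class label and a confidence score."""
    label: str
    score: float


def anchor_tokens(caption: str, detections: list[Detection], min_score: float = 0.5) -> list[str]:
    """Return caption words that also occur among confident detector tags.

    Hypothetical helper: naive whitespace tokenisation plus exact,
    case-insensitive matching. Results keep caption order, without duplicates.
    """
    tags = {d.label.lower() for d in detections if d.score >= min_score}
    anchors: list[str] = []
    for raw in caption.split():
        word = raw.strip(".,;:!?\"'").lower()
        if word in tags and word not in anchors:
            anchors.append(word)
    return anchors


detections = [
    Detection("dog", 0.97),
    Detection("frisbee", 0.88),
    Detection("tree", 0.40),  # below the threshold, so ignored
    Detection("grass", 0.71),
]
print(anchor_tokens("A dog catches a frisbee on the grass.", detections))
```

Here the call returns `['dog', 'frisbee', 'grass']`: the words the picture and the sentence agree on. The low-confidence "tree" never becomes an anchor, which is the toy version of "salient objects … are often mentioned in the paired text".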

Also, what does embodiment mean for this stuff, in terms of priors?

3 As an evolutionary phenomenon

Moved to Language games.

4 Simulacra

See simulacra.

5 Neurology of

What does MRI tell us about denotation in the brain?

Stolk et al. (2014) is worth reading for the tagline alone: “experimental semiotics”.

How can we understand each other during communicative interactions? An influential suggestion holds that communicators are primed by each other’s behaviours, with associative mechanisms automatically coordinating the production of communicative signals and the comprehension of their meanings. An alternative suggestion posits that mutual understanding requires shared conceptualisations of a signal’s use, i.e., “conceptual pacts” that are abstracted away from specific experiences. Both accounts predict coherent neural dynamics across communicators, aligned either to the occurrence of a signal or to the dynamics of conceptual pacts. Using coherence spectral-density analysis of cerebral activity simultaneously measured in pairs of communicators, this study shows that establishing mutual understanding of novel signals synchronises cerebral dynamics across communicators’ right temporal lobes. This interpersonal cerebral coherence occurred only within pairs with a shared communicative history, and at temporal scales independent from signals’ occurrences. These findings favour the notion that meaning emerges from shared conceptualisations of a signal’s use.
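The abstract leans on spectral coherence. The textbook version of that quantity is magnitude-squared coherence, C(f) = |Sxy(f)|^2 / (Sxx(f) Syy(f)), where Sxy is the cross-spectrum of two signals and Sxx, Syy their power spectra, each estimated by averaging windowed spectra over segments (Welch's method). Below is a minimal numpy sketch on synthetic data, assuming two "communicators" share a 10 Hz rhythm with a constant phase lag, plus independent noise. It illustrates the estimator only; it is not a reconstruction of Stolk et al.'s pipeline, and `msc` is a hypothetical name.

```python
import numpy as np


def msc(x: np.ndarray, y: np.ndarray, fs: float, nperseg: int):
    """Magnitude-squared coherence |Sxy|^2 / (Sxx * Syy), Welch-style.

    Splits both signals into non-overlapping Hann-windowed segments, sums the
    auto- and cross-spectra over segments, and returns (freqs, coherence).
    Normalisation constants cancel in the ratio, so they are omitted.
    """
    n_seg = len(x) // nperseg
    win = np.hanning(nperseg)
    n_freq = nperseg // 2 + 1
    Sxx = np.zeros(n_freq)
    Syy = np.zeros(n_freq)
    Sxy = np.zeros(n_freq, dtype=complex)
    for k in range(n_seg):
        seg = slice(k * nperseg, (k + 1) * nperseg)
        X = np.fft.rfft(win * x[seg])
        Y = np.fft.rfft(win * y[seg])
        Sxx += np.abs(X) ** 2
        Syy += np.abs(Y) ** 2
        Sxy += X * np.conj(Y)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, np.abs(Sxy) ** 2 / (Sxx * Syy)


# Synthetic signals: a shared 10 Hz rhythm (phase-lagged in y) plus
# independent noise in each. Made-up data, not brain recordings.
fs = 100.0
t = np.arange(6000) / fs  # 60 s sampled at 100 Hz
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)
y = np.sin(2 * np.pi * 10 * t + 0.8) + 0.5 * rng.standard_normal(t.size)

f, C = msc(x, y, fs, nperseg=100)  # 1 Hz resolution, 60 segments
i10 = int(np.argmin(np.abs(f - 10.0)))
print(f"coherence at 10 Hz: {C[i10]:.2f}")
print(f"mean coherence, 20-45 Hz: {C[(f >= 20) & (f <= 45)].mean():.3f}")
```

Coherence comes out close to 1 at 10 Hz even though the two rhythms are not simultaneous (the constant lag does not matter, only its consistency across segments), and close to the small-sample floor of roughly 1/(number of segments) at noise-only frequencies. That floor is why a coherence value means little without a baseline to compare against.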

6 Word vector models

Where the action is at. See vector embeddings.
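The oldest word vector model in the reference list, latent semantic analysis (Deerwester et al. 1990), shows the core move compactly: count which words occur in which contexts, then compress that matrix so that words with similar contexts get nearby vectors. Here is a minimal numpy sketch under simplifying assumptions: a made-up seven-document corpus, raw counts with no term weighting (practical LSA usually reweights, e.g. with tf-idf), a rank-2 truncated SVD, and cosine similarity for comparison. The names `lsa_vectors` and `cosine` are my own.

```python
import numpy as np

# Toy corpus (made up): two topics that never share a document.
docs = [
    "cat dog pet", "dog pet", "cat pet", "dog cat",
    "stock market price", "market price", "stock price",
]
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix: rows are words, columns are documents.
X = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for w in doc.split():
        X[index[w], j] += 1.0


def lsa_vectors(X: np.ndarray, k: int) -> np.ndarray:
    """Rank-k word vectors from a truncated SVD: the rows of U_k diag(S_k)."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


W = lsa_vectors(X, k=2)
for a, b in [("cat", "dog"), ("cat", "stock"), ("market", "price")]:
    print(a, b, round(cosine(W[index[a]], W[index[b]]), 3))
```

With two cleanly separated topics, the two retained dimensions line up with the topics, so "cat" and "dog" land near cosine 1 while "cat" and "stock" land near 0. Real corpora are messier, which is where the compression starts to do interesting work.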

7 Incoming

8 References

Abend, and Rappoport. 2017. “The State of the Art in Semantic Representation.”

Arbib. 2002. “The Mirror System, Imitation, and the Evolution of Language.” In Imitation in Animals and Artifacts.

Baronchelli, Gong, Puglisi, et al. 2010. “Modeling the emergence of universality in color naming patterns.” Proceedings of the National Academy of Sciences of the United States of America.

Bender, and Koller. 2020. “Climbing Towards NLU: On Meaning, Form, and Understanding in the Age of Data.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.

Bengio, Ducharme, Vincent, et al. 2003. “A Neural Probabilistic Language Model.” Journal of Machine Learning Research.

Bishop. 2021. “Artificial Intelligence Is Stupid and Causal Reasoning Will Not Fix It.” Frontiers in Psychology.

Cancho, and Solé. 2003. “Least Effort and the Origins of Scaling in Human Language.” Proceedings of the National Academy of Sciences.

Cao, Hripcsak, and Markatou. 2007. “A statistical methodology for analyzing co-occurrence data from a large sample.” Journal of Biomedical Informatics.

Christiansen, and Chater. 2008. “Language as Shaped by the Brain.” Behavioral and Brain Sciences.

Corominas-Murtra, and Solé. 2010. “Universality of Zipf’s Law.” Physical Review E.

Cunningham, Ewart, Riggs, et al. 2023. “Sparse Autoencoders Find Highly Interpretable Features in Language Models.”

Deerwester, Dumais, Furnas, et al. 1990. “Indexing by Latent Semantic Analysis.”

Elman. 1990. “Finding Structure in Time.” Cognitive Science.

———. 1993. “Learning and Development in Neural Networks: The Importance of Starting Small.” Cognition.

———. 1995. “Language as a Dynamical System.”

Feldman, and Choi. 2022. “Meaning and Reference from a Probabilistic Point of View.” Cognition.

Gärdenfors. 2014. Geometry of Meaning: Semantics Based on Conceptual Spaces.

Goldstein, Wang, Niekerken, et al. 2025. “A Unified Acoustic-to-Speech-to-Language Embedding Space Captures the Neural Basis of Natural Language Processing in Everyday Conversations.” Nature Human Behaviour.

Gozli. 2023. “Principles of Categorization: A Synthesis.” Seeds of Science.

Guthrie, Allison, Liu, et al. 2006. “A Closer Look at Skip-Gram Modelling.”

Jha, Zhang, Shmatikov, et al. 2025. “Harnessing the Universal Geometry of Embeddings.”

Kiros, Zhu, Salakhutdinov, et al. 2015. “Skip-Thought Vectors.” arXiv:1506.06726 [Cs].

Lazaridou, Nguyen, Bernardi, et al. 2015. “Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation.” arXiv:1506.03500 [Cs].

Le, and Mikolov. 2014. “Distributed Representations of Sentences and Documents.” In Proceedings of The 31st International Conference on Machine Learning.

Li, Yin, Li, et al. 2020. “Oscar: Object-Semantics Aligned Pre-Training for Vision-Language Tasks.”

Loreto, Mukherjee, and Tria. 2012. “On the Origin of the Hierarchy of Color Names.” Proceedings of the National Academy of Sciences of the United States of America.

Mikolov, Chen, Corrado, et al. 2013. “Efficient Estimation of Word Representations in Vector Space.” arXiv:1301.3781 [Cs].

Mikolov, Le, and Sutskever. 2013. “Exploiting Similarities Among Languages for Machine Translation.” arXiv:1309.4168 [Cs].

Mikolov, Sutskever, Chen, et al. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In arXiv:1310.4546 [Cs, Stat].

Mikolov, Yih, and Zweig. 2013. “Linguistic Regularities in Continuous Space Word Representations.” In HLT-NAACL.

Mollo, and Millière. 2023. “The Vector Grounding Problem.”

Narayanan, Chandramohan, Venkatesan, et al. 2017. “Graph2vec: Learning Distributed Representations of Graphs.” arXiv:1707.05005 [Cs].

Nunes, and Antunes. 2024. “Machines of Meaning.”

“Oscar: Objects Are the Secret Key to Link Between Language and Vision.” 2020. Microsoft Research (blog).

Park, Choe, and Veitch. 2024. “The Linear Representation Hypothesis and the Geometry of Large Language Models.”

Pennington, Socher, and Manning. 2014. “GloVe: Global Vectors for Word Representation.” Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014).

Petersson, Folia, and Hagoort. 2012. “What Artificial Grammar Learning Reveals about the Neurobiology of Syntax.” Brain and Language, The Neurobiology of Syntax.

Piantadosi, and Hill. 2022. “Meaning Without Reference in Large Language Models.”

Rizzolatti, and Craighero. 2004. “The Mirror-Neuron System.” Annual Review of Neuroscience.

Smith, and Kirby. 2008. “Cultural Evolution: Implications for Understanding the Human Language Faculty and Its Evolution.” Philosophical Transactions of the Royal Society B: Biological Sciences.

Steyvers, and Tenenbaum. 2005. “The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth.” Cognitive Science.

Stolk, Noordzij, Verhagen, et al. 2014. “Cerebral Coherence Between Communicators Marks the Emergence of Meaning.” Proceedings of the National Academy of Sciences.

Zanette. 2006. “Zipf’s Law and the Creation of Musical Context.” Musicae Scientiae.