Knowledge geometry

2012-05-23 — 2025-04-18

Wherein the adjacency of disciplines is considered, and embeddings of papers into hyperbolic metric spaces and citation networks are examined as models for the growth and gaps of collective knowledge.

collective knowledge

how do science

stringology

topology

1 What is the shape of collected human knowledge?

Figure 1: Visualising the NeurIPS paper series by Hendrik Strobelt, Benjamin Hoover, Lee Campbell, and Marc’Aurelio Ranzato. See the blogpost for more info.

I don’t know how to make this idea precise, but I’d like to spitball a few ideas about how branches of knowledge might be adjacent to each other or not, in various senses.

For example, what metric space provides a natural embedding for the articles of an encyclopedia, so that articles that are near each other in (the metric of) that space have similar subject classifications? I’ve seen arguments that hyperbolic and other non-Euclidean geometries arise here. But then, because our monkey minds work this way, we seem to project knowledge onto 2D Euclidean maps anyway, and the results are at least interesting.

What kinds of mechanisms for relating pieces of knowledge are plausible? Could you mine patent networks or theorem networks to parameterise a stochastic process, so that it became a plausible model for theorem growth? If not, what quality does knowledge possess that such a process could not encapsulate? If we have a good metric on knowledge that we can learn from (e.g.) the text of papers, can we use it to discover similar research? Or to identify gaps in knowledge that we can fill?

See also:

platonic representation hypothesis
Innovation: Is a material basis for technology plus a knowledge topology equal to a model of technology? I suspect not; surely there are emergent effects. But there must be a relationship.
Spaces of strings
String dynamics
Embeddings for search which attempt to index documents about knowledge; a surely related topic.

Figure 3: Franka Miriam Brückler’s famous map of Middle Math

Or should we think about the mechanism of knowledge generation? Can we represent knowledge as a network (or a landscape?) that grows around agent activity? Some kind of growth process around researchers? (keywords: “models of growth aggregation”, “rough interfaces”, “growth with surface diffusion”, “nucleation”, “morphogenesis”) Is this a constrained growth problem, like the one that governs coral drills?
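To make the growth-process keywords above slightly more concrete, here is a minimal sketch of Eden-style growth on a square lattice, one of the simplest aggregation models I know of. Reading occupied sites as "known results" and the frontier as "adjacent possible results" is purely my illustrative assumption; `eden_growth`, the lattice, and the uniform choice rule are all hypothetical stand-ins, not a fitted model of anything.

```python
import random

# The four nearest neighbours on a square lattice.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def eden_growth(n_steps, seed=0):
    """Grow a cluster from a single seed site at the origin.

    At each step, one empty site adjacent to the cluster is chosen
    uniformly at random and occupied.
    """
    rng = random.Random(seed)
    occupied = {(0, 0)}
    frontier = set(NEIGHBOURS)  # empty sites touching the cluster
    for _ in range(n_steps):
        # Sort before choosing so runs are reproducible for a given seed.
        site = rng.choice(sorted(frontier))
        frontier.remove(site)
        occupied.add(site)
        x, y = site
        for dx, dy in NEIGHBOURS:
            neighbour = (x + dx, y + dy)
            if neighbour not in occupied:
                frontier.add(neighbour)
    return occupied

cluster = eden_growth(200)
print(len(cluster), "sites occupied")
```

Constrained variants would restrict which frontier sites are eligible, or weight the choice by something like researcher activity near each site.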

Investigate configuration spaces of technologies. (See configuration space of the economy.) Genotype-phenotype interactions as a model of knowledge-economic systems? What is the most basic stochastic process that would serve as a model of these?

(Practical aside: How much area must a new thesis carve out from the unmade world?)

Now, going out on a limb, consider a problem domain that looks evolutionary if you squint at it: creating mathematical theorems. Certainly Gödel and Turing invite us to look at theorems as symbol strings. I saw a presentation (Leibon and Rockmore 2013) suggesting that there is a natural embedding of mathematical fields into hyperbolic space. Sure, their data set was Wikipedia mathematical article links, and the whole idea was tongue-in-cheek. But it feels like there is something in there, if not a whole-cloth topological theory of human knowledge. Is there some process driving mathematical innovation that makes the links between fields sit so naturally in hyperbolic space? Is it some characteristic of the subject matter itself? If either of these is true, would it be true of other fields? Science in general? Philosophy? Engineering? Design? Biological fitnesses?
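For intuition about what a hyperbolic embedding buys us, here is a small sketch of distances in the Poincaré disk, one standard model of hyperbolic space. It is not the method of Leibon and Rockmore; the coordinates are made up, and treating the origin as a "root field" with two "subfields" near the boundary is my own illustrative assumption.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)

def euclidean_distance(u, v):
    """Ordinary straight-line distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical coordinates: a root at the centre, two leaves near the rim.
origin = (0.0, 0.0)
a = (0.9, 0.0)
b = (0.0, 0.9)

# Direct leaf-to-leaf distance versus the path through the root.
d_ab = poincare_distance(a, b)
d_via_origin = poincare_distance(a, origin) + poincare_distance(origin, b)

e_ab = euclidean_distance(a, b)
e_via_origin = euclidean_distance(a, origin) + euclidean_distance(origin, b)

print("hyperbolic direct/via-root ratio:", d_ab / d_via_origin)
print("euclidean  direct/via-root ratio:", e_ab / e_via_origin)
```

Near the boundary, the direct hyperbolic path is barely shorter than the detour through the centre, which is the sense in which hyperbolic space behaves like a tree and can accommodate branching hierarchies with little distortion.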

2 Topic modelling in text databases

Various text similarity measures in NLP, especially vector embeddings, provide a space for topic modelling free text.
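As a minimal sketch of the idea, here is a bag-of-words TF-IDF vectoriser with cosine similarity, using only the standard library. This is a crude stand-in for learned vector embeddings, not a substitute for them; the toy abstracts, the whitespace tokeniser, and the smoothed IDF formula are all my assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document to a sparse {term: weight} TF-IDF vector."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency; one common convention.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy "abstracts".
abstracts = [
    "hyperbolic embedding of citation networks",
    "embedding citation networks in hyperbolic space",
    "autocatalytic string chemistry and turing gas",
]
vecs = tfidf_vectors(abstracts)
sim_01 = cosine_similarity(vecs[0], vecs[1])
sim_02 = cosine_similarity(vecs[0], vecs[2])
print("similar topics:", sim_01, "unrelated topics:", sim_02)
```

Swapping `tfidf_vectors` for a learned sentence-embedding model would leave the similarity step unchanged, since the geometry lives in whatever vectors we feed it.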

3 Citation graphs

Many people try to construct a network topology of research that incorporates not just “content” but also the human means of communication: citation networks.

There are enough of these studies that the citation network citation network is already a non-trivial dataset to study. If you publish on that dataset in particular, you bring us closer to the day when we can discuss the citation network citation network citation network.

Useful for research discovery.
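One simple way a citation graph can support research discovery is bibliographic coupling: two papers count as related when their reference lists overlap. The sketch below scores that overlap with the Jaccard index; the paper and reference identifiers are invented, and `bibliographic_coupling` is a hypothetical helper rather than any particular tool's API.

```python
def bibliographic_coupling(references):
    """Jaccard overlap of reference sets for every pair of papers.

    `references` maps a paper id to the set of ids it cites.
    """
    papers = sorted(references)
    scores = {}
    for i, p in enumerate(papers):
        for q in papers[i + 1:]:
            shared = references[p] & references[q]
            union = references[p] | references[q]
            scores[(p, q)] = len(shared) / len(union) if union else 0.0
    return scores

# Hypothetical toy citation data.
references = {
    "paper_a": {"ref_1", "ref_2", "ref_3"},
    "paper_b": {"ref_2", "ref_3", "ref_4"},
    "paper_c": {"ref_5"},
}
scores = bibliographic_coupling(references)
most_similar = max(scores, key=scores.get)
print(most_similar, scores[most_similar])
```

The all-pairs loop is quadratic in the number of papers, so anything at the scale of a real citation database would need an inverted index from references to citing papers instead.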

Here are some samples.

KWRegan’s Connect the Stars: How papers are like constellations
Microsoft Academic Knowledge Graph

We present the Microsoft Academic Knowledge Graph (MAKG), a large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is based on the Microsoft Academic Graph and licensed under the Open Data Attributions license. Furthermore, we provide entity embeddings for all 210M represented scientific papers.

4 Artificial chemistry

The class of things I think of as string dynamical: autocatalytic systems and the like. (cf. Fontana’s “Turing gas”)

5 References

Andjelković, Tadić, Mitrović Dankulov, et al. 2016. “Topology of Innovation Spaces in the Knowledge Networks Emerging Through Questions-And-Answers.” PLoS ONE.

Bloom, Jones, Van Reenen, et al. 2020. “Are Ideas Getting Harder to Find?” American Economic Review.

Iacopini, Milojević, and Latora. 2018. “Network Dynamics of Innovation Processes.” Physical Review Letters.

König, Battiston, Napoletano, et al. 2011. “Recombinant Knowledge and the Evolution of Innovation Networks.” Journal of Economic Behavior & Organization.

Leibon and Rockmore. 2013. “Orienteering in Knowledge Spaces: The Hyperbolic Geometry of Wikipedia Mathematics.” PLoS ONE.

Loreto, Servedio, Strogatz, et al. 2016. “Dynamics on Expanding Spaces: Modeling the Emergence of Novelties.” In Creativity and Universality in Language. Lecture Notes in Morphogenesis.

Napolitano, Evangelou, Pugliese, et al. n.d. “Technology Networks: The Autocatalytic Origins of Innovation.” Royal Society Open Science.

Tadić, Dankulov, and Melnik. 2017. “Mechanisms of Self-Organized Criticality in Social Processes of Knowledge Creation.” Physical Review E.

Thorngate, Liu, and Chowdhury. 2011. “The Competition for Attention and the Evolution of Science.” Journal of Artificial Societies and Social Simulation.