Knowledge geometry
May 23, 2012 — April 18, 2025
1 What is the shape of collected human knowledge?
I don’t know how to make this idea precise yet, but I’d like to spitball a few ideas about how branches of knowledge might be adjacent to each other or not, in various senses.
For example, what metric space provides a natural embedding for the articles of an encyclopedia, so that articles that are close together in (the metric of) that space have similar subject classifications? I’ve seen arguments that hyperbolic and other non-Euclidean geometries arise here. But then, because our monkey minds work this way, we seem to project knowledge onto 2D Euclidean maps, and the result seems to be at least interesting.
What kinds of knowledge-relationship mechanisms are plausible? Could you mine patent networks or theorem networks to parameterise a stochastic process that would make this a plausible model for theorem growth? If not, what quality does knowledge possess that such a process could not encapsulate? If we have a good metric on knowledge that we can learn from (e.g.) the text of papers, can we use it to discover similar research? Or to identify gaps in knowledge that we can fill?
See also:
- Innovation. Is a material basis for technology plus a knowledge topology equal to a model of technology? I suspect not; surely there are emergent effects. But there must be a relationship.
- Spaces of strings
- String dynamics
- Embeddings for search
Or should we think about the mechanism of knowledge generation? Can we represent knowledge as a network (or a landscape?) that grows around agent activity? Some kind of growth process around researchers? (keywords: “models of growth aggregation”, “rough interfaces”, “growth with surface diffusion”, “nucleation”, “morphogenesis”) Is this a constrained growth problem, like the one that governs coral drills?
Investigate configuration spaces of technologies. (See configuration space of the economy.) Genotype-phenotype interactions as a model of knowledge-economic systems? What is the most basic stochastic process that would serve as a model of these?
(Practical aside: How much area must a new thesis carve out from the unmade world?)
Now, going out on a limb, consider a problem domain that looks evolutionary if you squint at it: creating mathematical theorems. Certainly Gödel and Turing invite looking at these things as symbol strings. I saw a presentation (Leibon and Rockmore 2013) suggesting that there was a natural embedding of mathematical fields onto hyperbolic geometry. Sure, their data set was Wikipedia mathematical article links, and the whole idea was tongue-in-cheek. But it feels like there is something in there, if not a whole-cloth topological theory of human knowledge. Is there some process driving mathematical innovation that means that the links between fields sit so naturally in hyperbolic space? Is it some characteristic of the subject matter itself? If either of these is true, would it be true of other fields? Science in general? Philosophy? Engineering? Design? Biological fitnesses?
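To make "sits naturally in hyperbolic space" a little more concrete, here is a minimal sketch of the distance function in the Poincaré disk model, one standard model of hyperbolic geometry. This is not the method of Leibon and Rockmore; the coordinates for the "fields" below are made up for illustration. The point is only that two nodes near the rim are almost as far apart as the path through the centre, which is how trees behave and why hierarchies tend to embed well.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit disk")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)

def euclidean_distance(u, v):
    """Ordinary straight-line distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical coordinates: a general field at the centre and two
# specialised subfields near the rim of the disk.
root = (0.0, 0.0)
leaf_a = (0.9, 0.0)
leaf_b = (0.0, 0.9)

# Ratio of the direct leaf-to-leaf distance to the detour via the root.
# A ratio near 1 means the geometry is behaving like a tree.
hyp_ratio = poincare_distance(leaf_a, leaf_b) / (
    poincare_distance(leaf_a, root) + poincare_distance(root, leaf_b)
)
euc_ratio = euclidean_distance(leaf_a, leaf_b) / (
    euclidean_distance(leaf_a, root) + euclidean_distance(root, leaf_b)
)
print(f"hyperbolic ratio: {hyp_ratio:.2f}, Euclidean ratio: {euc_ratio:.2f}")
```

Pushing the leaves closer to the rim drives the hyperbolic ratio towards 1, while the Euclidean ratio stays fixed.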
2 Topic modelling in text databases
Various text similarity measures in NLP, especially vector embeddings, provide a space for topic modelling free text.
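As a toy illustration of the idea, the sketch below builds TF-IDF vectors from a few made-up abstracts and compares them with cosine similarity. It is a bag-of-words stand-in for the learned embeddings one would use in practice; the abstracts, the whitespace tokenizer and the smoothed IDF weighting are all my assumptions rather than anything prescribed above.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer that keeps purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            # Smoothed inverse document frequency, so no weight is zero.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical abstracts, invented for this example.
abstracts = [
    "hyperbolic geometry of citation networks",
    "embedding citation networks in hyperbolic space",
    "growth of coral under surface diffusion",
]
vecs = tfidf_vectors(abstracts)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first pair shares several content words and so scores higher than the second, which shares only "of". Whether a similarity like this deserves to be called a metric on knowledge is exactly the open question.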
3 Citation graphs
Many try to construct a network topology of research that incorporates not just “content” but also human means of communication: citation networks.
There are enough of these that the citation network of citation-network papers is already a non-trivial dataset to study. If you publish on that dataset in particular, you bring us closer to the day when we can discuss the citation network citation network citation network.
Useful for research discovery.
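A minimal sketch of how that discovery might work, assuming the citation graph is stored as a plain dict from each paper to the set of papers it cites. It computes two classic bibliometric similarities: bibliographic coupling (shared references) and co-citation (shared citers). The paper labels and links are invented for illustration.

```python
# Hypothetical toy citation graph: paper -> set of papers it cites.
cites = {
    "A": {"X", "Y", "Z"},
    "B": {"Y", "Z"},
    "C": {"W"},
    "D": {"A", "B"},
    "E": {"A", "B", "C"},
}

def bibliographic_coupling(cites, p, q):
    """Number of references that papers p and q have in common."""
    return len(cites.get(p, set()) & cites.get(q, set()))

def cocitation(cites, p, q):
    """Number of papers whose reference lists contain both p and q."""
    return sum(1 for refs in cites.values() if p in refs and q in refs)

def most_coupled(cites, paper):
    """Other papers, ranked by shared references (ties broken by name)."""
    others = [q for q in cites if q != paper]
    return sorted(
        others,
        key=lambda q: (-bibliographic_coupling(cites, paper, q), q),
    )

print(bibliographic_coupling(cites, "A", "B"))  # shared references
print(cocitation(cites, "A", "B"))              # shared citers
print(most_coupled(cites, "A"))
```

Coupling is fixed once a paper is published, whereas co-citation counts keep changing as new papers arrive, so the two measures give somewhat different notions of adjacency.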
Here are some samples.
KWRegan’s Connect the Stars: How papers are like constellations
Microsoft Academic Knowledge Graph
We present the Microsoft Academic Knowledge Graph (MAKG), a large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is based on the Microsoft Academic Graph and licensed under the Open Data Attributions license. Furthermore, we provide entity embeddings for all 210M represented scientific papers.
4 Artificial chemistry
The class of things I think of as string dynamical: autocatalytic systems and the like. (cf. Fontana’s “Turing gas”)