Differentiable learning of automata
October 14, 2016 — November 4, 2024
Learning stack machines, random access machines, nested hierarchical parsing machines, Turing machines, and whatever other automata-with-memory you wish, from data. In other words, teaching computers to program themselves via a deep learning formalism.
This is an obvious idea; indeed, it is one of the original ideas from the GOFAI days. How to make it work in neural networks is not obvious, but there are some charming toy examples. Certainly a hypothetical superhuman Artificial General Intelligence would need to be good at handling computer-science problems like these.
Directly learning automata is not the absolute hippest research area right now, though, on account of being hard in general, although some progress has been made. My sense is that most of the hyped research that looks like differentiable computer learning happens either in the slightly better-contained area of reinforcement learning, where more progress can be made, or in the hot area of large language models, which are harder to explain but solve the same kinds of problems whilst looking different inside.
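To make "differentiable automaton" concrete, here is a minimal sketch of a soft stack in the spirit of Grefenstette et al.'s *Learning to Transduce with Unbounded Memory*: push and pop become continuous strengths in [0, 1], so the whole data structure admits gradients. All the names here (`soft_stack_step`, `push`, `pop`) are my own, not from any particular library.

```python
import torch

def soft_stack_step(values, strengths, v, push, pop):
    """One step of a continuous ("soft") stack.

    values:    (n, d) stored vectors, index 0 = bottom of stack
    strengths: (n,)   how much of each entry remains, in [0, 1]
    v:         (d,)   vector to push
    push, pop: 0-dim tensors in [0, 1]
    """
    def above(s):
        # Exclusive suffix sum: total strength sitting above each entry.
        return torch.flip(torch.cumsum(torch.flip(s, [0]), 0), [0]) - s

    # Pop: consume `pop` units of strength, starting from the top.
    strengths = torch.relu(strengths - torch.relu(pop - above(strengths)))
    # Push: append the new vector with strength `push`.
    values = torch.cat([values, v.unsqueeze(0)], dim=0)
    strengths = torch.cat([strengths, push.reshape(1)], dim=0)
    # Read: blend (at most) the top one unit of strength.
    weights = torch.minimum(strengths, torch.relu(1.0 - above(strengths)))
    read = (weights.unsqueeze(1) * values).sum(dim=0)
    return values, strengths, read

# A controller network would emit v, push, pop each step; here, fixed values.
values, strengths = torch.zeros(0, 3), torch.zeros(0)
for v, p, u in [(torch.ones(3), 0.9, 0.0), (2 * torch.ones(3), 0.8, 0.1)]:
    values, strengths, read = soft_stack_step(
        values, strengths, v, torch.tensor(p), torch.tensor(u))
print(read)  # mostly the last-pushed vector, faded by the partial pop
```

Trained end-to-end, the gradient through `read` tells the controller how much pushing and popping it should have done, which is the whole trick.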
Related: grammatical inference, memory machines.
1 Incoming
Language Generation in the Limit (Kleinberg and Mullainathan 2024) looks interesting.
Cheng Soon Ong described it as a
talk by Jon Kleinberg on language generation being easier than language identification. He mentions an interesting tension between hallucination and mode collapse.
sanderwood/bgpt: Beyond Language Models: Byte Models are Digital World Simulators (Wu et al. 2024)
Blazek claims his neural networks implement predicate logic directly and yet remain tractable, which would be interesting to look into (Blazek and Lin 2021, 2020; Blazek, Venkatesh, and Lin 2021).
Google-branded: Differentiable neural computers. (A toy sketch of the soft addressing these machines share appears after the links below.)
Christopher Olah’s characteristically pedagogic intro.
Adrian Colyer’s introduction to neural Turing machines.
Andrej Karpathy’s memory machine list.
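The machines in these links share one differentiable trick: content-based addressing, where a soft attention over memory rows replaces a discrete pointer. A toy version, assuming PyTorch, with all names my own invention rather than the DeepMind code:

```python
import torch
import torch.nn.functional as F

def content_read(memory, key, beta):
    """memory: (N, d) rows; key: (d,); beta: sharpness scalar >= 0."""
    sim = F.cosine_similarity(memory, key.unsqueeze(0), dim=1)  # (N,)
    w = torch.softmax(beta * sim, dim=0)   # a soft, differentiable address
    return w @ memory, w                   # read vector and its weighting

def erase_add_write(memory, w, erase, add):
    """NTM-style write: each row is partly erased, then added to,
    in proportion to its address weight."""
    memory = memory * (1 - w.unsqueeze(1) * erase.unsqueeze(0))
    return memory + w.unsqueeze(1) * add.unsqueeze(0)

memory = torch.randn(8, 4, requires_grad=True)
read, w = content_read(memory, key=torch.randn(4), beta=torch.tensor(5.0))
new_memory = erase_add_write(memory, w, erase=torch.rand(4), add=torch.randn(4))
(read.sum() + new_memory.sum()).backward()  # gradients flow through addressing
```

The erase/add write is the neural Turing machine recipe; the differentiable neural computer layers temporal links and usage-based allocation on top of the same soft addressing.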
Facebook’s GTN:
GTN is an open source framework for automatic differentiation with a powerful, expressive type of graph called weighted finite-state transducers (WFSTs). Just as PyTorch provides a framework for automatic differentiation with tensors, GTN provides such a framework for WFSTs. AI researchers and engineers can use GTN to more effectively train graph-based machine learning models.
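As a sketch of the kind of computation GTN differentiates (this is not GTN's actual API, just an illustrative toy in PyTorch): the forward score of a weighted acceptor is a log-sum-exp over all accepting paths, and its gradient with respect to the arc weights gives posterior arc occupancies.

```python
import torch

def forward_score(num_states, arcs, weights, start, accept):
    """Log-sum-exp over all start->accept paths of a weighted acceptor.

    arcs:    list of (src, dst) state pairs
    weights: (num_arcs,) tensor of log arc weights (learnable)
    Assumes the graph is acyclic with states in topological order.
    """
    alpha = [torch.tensor(float("-inf"))] * num_states
    alpha[start] = torch.tensor(0.0)
    for dst in range(num_states):
        incoming = [alpha[src] + weights[i]
                    for i, (src, d) in enumerate(arcs) if d == dst]
        if incoming:
            alpha[dst] = torch.logsumexp(torch.stack([alpha[dst], *incoming]), 0)
    return alpha[accept]

# Two paths from state 0 to state 2: via state 1, or a direct arc.
arcs = [(0, 1), (1, 2), (0, 2)]
weights = torch.tensor([0.1, 0.3, -0.2], requires_grad=True)
score = forward_score(3, arcs, weights, start=0, accept=2)
score.backward()
print(weights.grad)  # posterior occupancy of each arc under the path weights
```

GTN's pitch is to do exactly this sort of thing, but over full weighted finite-state transducers composed like tensors in PyTorch.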