Differentiable learning of automata

Learning stack machines, random access machines, nested hierarchical parsing machines, Turing machines and whatever other automata-with-memory you wish, from data. In other words, teaching computers to program themselves, via a deep learning formalism.
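To make "a stack machine you can train by gradient descent" a little more concrete, here is a minimal sketch of a continuous stack, loosely in the spirit of Grefenstette et al. (2015). Everything here (the `SoftStack` class, its method names, using plain Python lists instead of tensors) is my own illustrative assumption rather than anyone's reference implementation. The idea is that push and pop become real-valued strengths instead of discrete operations, so the read-out is a piecewise-linear function of them.

```python
class SoftStack:
    """Continuous stack: each stored vector carries a non-negative strength."""

    def __init__(self):
        self.values = []     # stored vectors, oldest first
        self.strengths = []  # one strength per stored vector

    def step(self, value, push, pop):
        """Remove `pop` units of strength from the top, then push `value`
        with strength `push`. Returns the current read-out."""
        remaining = pop
        # Walk from the top downwards until the pop budget is used up.
        for i in reversed(range(len(self.strengths))):
            removed = min(self.strengths[i], remaining)
            self.strengths[i] -= removed
            remaining -= removed
        self.values.append(list(value))
        self.strengths.append(push)
        return self.read()

    def read(self):
        """Blend the topmost vectors, using at most one unit of strength.
        Assumes step() has been called at least once."""
        out = [0.0] * len(self.values[-1])
        budget = 1.0
        for value, strength in zip(reversed(self.values), reversed(self.strengths)):
            weight = min(strength, budget)
            budget -= weight
            out = [o + weight * v for o, v in zip(out, value)]
        return out


stack = SoftStack()
stack.step([1.0], push=1.0, pop=0.0)
# A half-strength push leaves half of the read-out to the element below:
print(stack.step([3.0], push=0.5, pop=0.0))  # [2.0], i.e. 0.5 * 3.0 + 0.5 * 1.0
```

In an actual model the `push` and `pop` strengths would presumably come from a neural controller (say, sigmoid outputs), and the whole thing would live in an autodiff framework so that gradients flow through `min` and the subtractions.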

This is a kind of obvious idea and there are some charming toy examples. Indeed this is sort of what we have traditionally imagined AI might do.

Obviously a hypothetical superhuman Artificial General Intelligence would be good at handling such problems. It’s not the absolute hippest research area right now though, on account of being hard in general, just as earlier attempts led us to expect. Some progress has been made. My sense is that most of the hyped research that looks like differentiable computer learning is either in the slightly-better-contained area of reinforcement learning, where more progress can be made, or in the hot area of transformer networks, which are harder to explain but solve the same kind of problems.

Related: grammatical inference.

Google branded: Differentiable neural computers.

Christopher Olah’s characteristically pedagogic intro.

Adrian Colyer’s introduction to neural Turing machines.

Andrej Karpathy’s memory machine list has some good starting points.

Facebook’s GTN might be a tool here:

GTN is an open source framework for automatic differentiation with a powerful, expressive type of graph called weighted finite-state transducers (WFSTs). Just as PyTorch provides a framework for automatic differentiation with tensors, GTN provides such a framework for WFSTs. AI researchers and engineers can use GTN to more effectively train graph-based machine learning models.
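To get a feel for what "automatic differentiation with WFSTs" means, here is a hand-rolled sketch that deliberately does not use GTN's API. It computes the log-sum-exp score of all paths through a small acyclic weighted acceptor, plus the gradient of that score with respect to each arc weight, which works out to the posterior probability of a path using that arc. The function names, the arc encoding as `(src, dst)` pairs, and the assumptions that states are topologically numbered with state 0 as start and the last state as final (and reachable) are all mine.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a non-empty list."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_score_and_grad(num_states, arcs, weights):
    """arcs[k] = (src, dst) with src < dst; weights[k] is that arc's log-weight.

    Returns (score, grad): the log-semiring score of all start-to-final
    paths, and d(score)/d(weights[k]) for every arc k.
    """
    final = num_states - 1
    # Forward pass: alpha[q] = log-sum of path weights from the start to q.
    alpha = [-math.inf] * num_states
    alpha[0] = 0.0
    for q in range(1, num_states):
        incoming = [alpha[s] + w for (s, d), w in zip(arcs, weights) if d == q]
        if incoming:
            alpha[q] = logsumexp(incoming)
    # Backward pass: beta[q] = log-sum of path weights from q to the final state.
    beta = [-math.inf] * num_states
    beta[final] = 0.0
    for q in range(final - 1, -1, -1):
        outgoing = [w + beta[d] for (s, d), w in zip(arcs, weights) if s == q]
        if outgoing:
            beta[q] = logsumexp(outgoing)
    score = alpha[final]
    # The gradient of the score w.r.t. an arc weight is the arc's posterior.
    grad = [math.exp(alpha[s] + w + beta[d] - score)
            for (s, d), w in zip(arcs, weights)]
    return score, grad


# Three paths from state 0 to state 2: two via state 1, one direct.
arcs = [(0, 1), (0, 1), (1, 2), (0, 2)]
score, grad = forward_score_and_grad(3, arcs, [0.0, 0.0, 0.0, 0.0])
print(score)  # log(3), since each of the three paths has total weight 0
print(grad)   # roughly [1/3, 1/3, 2/3, 1/3]
```

A framework like GTN presumably handles this bookkeeping for general graphs and operations such as composition; the sketch only shows why the path score is a smooth function of the arc weights, and hence trainable by gradient descent.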


Bottou, Leon. 2011. “From Machine Learning to Machine Reasoning.” February 9, 2011. http://arxiv.org/abs/1102.1808.
Clark, Peter, Oyvind Tafjord, and Kyle Richardson. 2020. “Transformers as Soft Reasoners over Language.” In IJCAI 2020. http://arxiv.org/abs/2002.05867.
Ellis, Kevin, Armando Solar-Lezama, and Josh Tenenbaum. 2016. “Sampling for Bayesian Program Learning.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 1289–97. Curran Associates, Inc. http://papers.nips.cc/paper/6082-sampling-for-bayesian-program-learning.pdf.
Graves, Alex, Greg Wayne, and Ivo Danihelka. 2014. “Neural Turing Machines.” October 20, 2014. http://arxiv.org/abs/1410.5401.
Graves, Alex, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwińska, Sergio Gómez Colmenarejo, et al. 2016. “Hybrid Computing Using a Neural Network with Dynamic External Memory.” Nature advance online publication (October). https://doi.org/10.1038/nature20101.
Grefenstette, Edward, Karl Moritz Hermann, Mustafa Suleyman, and Phil Blunsom. 2015. “Learning to Transduce with Unbounded Memory.” June 8, 2015. http://arxiv.org/abs/1506.02516.
Gulcehre, Caglar, Sarath Chandar, Kyunghyun Cho, and Yoshua Bengio. 2016. “Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes.” June 30, 2016. http://arxiv.org/abs/1607.00036.
Hannun, Awni, Vineel Pratap, Jacob Kahn, and Wei-Ning Hsu. 2020. “Differentiable Weighted Finite-State Transducers.” October 2, 2020. http://arxiv.org/abs/2010.01003.
Kaiser, Łukasz, and Ilya Sutskever. 2015. “Neural GPUs Learn Algorithms.” November 25, 2015. http://arxiv.org/abs/1511.08228.
Lamb, Luis C., Artur Garcez, Marco Gori, Marcelo Prates, Pedro Avelar, and Moshe Vardi. 2020. “Graph Neural Networks Meet Neural-Symbolic Computing: A Survey and Perspective.” In IJCAI 2020. http://arxiv.org/abs/2003.00330.
Lample, Guillaume, and François Charton. 2019. “Deep Learning for Symbolic Mathematics.” December 2, 2019. http://arxiv.org/abs/1912.01412.
Looks, Moshe, Marcello Herreshoff, DeLesley Hutchins, and Peter Norvig. 2017. “Deep Learning with Dynamic Computation Graphs.” In Proceedings of ICLR. http://arxiv.org/abs/1702.02181.
Perez, Julien, and Fei Liu. 2016. “Gated End-to-End Memory Networks.” October 13, 2016. http://arxiv.org/abs/1610.04211.
Putzky, Patrick, and Max Welling. 2017. “Recurrent Inference Machines for Solving Inverse Problems.” June 13, 2017. http://arxiv.org/abs/1706.04008.
Wei, Qi, Kai Fan, Lawrence Carin, and Katherine A. Heller. 2017. “An Inner-Loop Free Solution to Inverse Problems Using Deep Neural Networks.” September 6, 2017. http://arxiv.org/abs/1709.01841.
Weston, Jason, Sumit Chopra, and Antoine Bordes. 2014. “Memory Networks.” October 14, 2014. http://arxiv.org/abs/1410.3916.
