Grammar induction

Mathematically speaking, inferring the “formal language” which can describe a set of expressions. In the slightly looser sense used by linguists studying natural human language, discovering the syntactic rules of a given language, which is kinda the same thing but with every term sloppier, and the subject matter itself messier.

This is already a crazily complex area, and being naturally perverse, I am interested in an especially esoteric corner of it, to wit, grammars of things that aren’t speech; inferring design grammars, say, could allow you to produce more things off the same “basic plan” from some examples of the thing. Look at enough trees and you know how to build the rest of the forest, that kind of thing. That is called inverse procedural modelling, apparently. I’m especially interested in things expressed not as a sequence of symbols from a finite alphabet - i.e. not over the free monoid - or in a partially-observed context. This is a boutique interest, although not entirely novel.
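For intuition, here is a minimal sketch of the generative direction that inverse procedural modelling tries to run backwards: a deterministic L-system unfolds a short axiom into tree-like strings by parallel rewriting (cf. Smith 1984). The rules here are Lindenmayer’s textbook algae example, chosen purely for illustration, not any particular induction method.

```python
def expand(axiom, rules, steps):
    """Apply the rewrite rules to every symbol in parallel, `steps` times."""
    s = axiom
    for _ in range(steps):
        # symbols with no rule rewrite to themselves
        s = "".join(rules.get(c, c) for c in s)
    return s

# Lindenmayer's algae system: A -> AB, B -> A
algae = {"A": "AB", "B": "A"}
print(expand("A", algae, 4))  # ABAABABA; string lengths follow the Fibonacci sequence
```

Grammar induction is the inverse problem: given a pile of strings like these, recover `rules` (and ideally the axiom).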

Fun aside: In the 19th century it was fashionable to think of design as a grammatical thing, although I am not exactly clear on how similar their notion of grammar was to mine.

I also care about probabilistic grammars, i.e. assigning measures to these things.
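Concretely, a probabilistic context-free grammar attaches a weight to each production, which induces a measure over derivations and hence over strings. A toy sketch (the grammar and its weights are invented for illustration):

```python
import random

# Hypothetical toy PCFG over {a, b}: each nonterminal maps to
# (probability, right-hand side) pairs; weights per nonterminal sum to 1.
PCFG = {"S": [(0.7, ["a", "S", "b"]), (0.3, ["a", "b"])]}

def sample(symbol="S", rng=random):
    """Sample a string top-down; the production weights define the measure."""
    if symbol not in PCFG:  # terminal symbol
        return symbol
    r, acc = rng.random(), 0.0
    for p, rhs in PCFG[symbol]:
        acc += p
        if r <= acc:
            return "".join(sample(s, rng) for s in rhs)
    # guard against floating-point underrun: fall back to the last rule
    return "".join(sample(s, rng) for s in PCFG[symbol][-1][1])

# Every sample has the form a^n b^n, with P(n) = 0.3 * 0.7 ** (n - 1).
```

Inference runs the other way: given observed strings, estimate the weights (e.g. by inside-outside, Lari and Young 1990) or the rules themselves.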

Normally design grammars deal with simple languages, such as, say, “regular” languages. I’m interested in things a rung or two up the Chomsky hierarchy - context-free grammars, maybe even context-sensitive ones.
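The canonical illustration of that rung: the language a^n b^n is context-free but not regular, because a finite automaton cannot count unbounded nesting. A sketch, with a regex standing in for the regular-language machinery:

```python
import re

def is_anbn(w):
    """Exact recognizer for a^n b^n (context-free, not regular)."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * n + "b" * n

# A regular expression can only enumerate the language up to a fixed
# depth; this one handles n <= 3 and nothing beyond.
approx = re.compile(r"^(ab|aabb|aaabbb)$")

assert is_anbn("aaaabbbb")          # the recognizer counts
assert not approx.match("aaaabbbb")  # the regular approximation gives up
```

Any fixed regular language either misses some a^n b^n or admits an unbalanced string, which is why inducing grammars above the regular rung is a qualitatively harder problem.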

See also design grammars, iterated function systems and my defunct research proposal in this area, Cosma Shalizi’s inevitable mention in 3, 2, 1… go

A nice position paper by Peter Norvig on Chomsky and statistical versus explanatory models of natural language syntax. Full of sick burns.

Basically, Chomsky says

It’s true there’s been a lot of work on trying to apply statistical models to various linguistic problems. I think there have been some successes, but a lot of failures. There is a notion of success … which I think is novel in the history of science. It interprets success as approximating unanalyzed data.

Norvig then says that, actually, opaque, predictive approximations are OK and scientifically interesting. That was in 2011, and since then the scruffy, dense, opaque, predictive models have continued to be ascendant in language processing, particularly transformer networks, which do a disturbingly good job of handling language without an explicit grammar.


Angluin, Dana. 1987. “Learning Regular Sets from Queries and Counterexamples.” Information and Computation 75 (2): 87–106.
———. 1988. “Identifying Languages from Stochastic Examples.” Technical Report YALEU/DCS/RR-614.
Badii, Remo, and Antonio Politi. 1999. Complexity: Hierarchical Structures and Scaling in Physics. Cambridge Nonlinear Science Series. Cambridge University Press.
Bolhuis, Johan J., Ian Tattersall, Noam Chomsky, and Robert C. Berwick. 2014. “How Could Language Have Evolved?” PLoS Biol 12 (8): e1001934.
Boulanger-Lewandowski, Nicolas, Yoshua Bengio, and Pascal Vincent. 2012. “Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.” In 29th International Conference on Machine Learning.
Casacuberta, Francisco, and Colin de la Higuera. 2000. “Computational Complexity of Problems on Probabilistic Grammars and Transducers.” In Grammatical Inference: Algorithms and Applications, edited by Arlindo L. Oliveira, 1891:15–24. Berlin, Heidelberg: Springer Berlin Heidelberg.
Charniak, Eugene. 1996a. “Tree-Bank Grammars.” In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 1031–36.
———. 1996b. Statistical Language Learning. Reprint. A Bradford Book.
Chater, Nick, and Christopher D. Manning. 2006. “Probabilistic Models of Language Processing and Acquisition.” Trends in Cognitive Sciences 10 (7): 335–44.
Chomsky, N. 1956. “Three Models for the Description of Language.” IRE Transactions on Information Theory 2 (3): 113–24.
Chomsky, Noam. 2002. Syntactic Structures. 2nd ed. Walter de Gruyter.
Clark, Alexander, and Rémi Eyraud. 2005. “Identification in the Limit of Substitutable Context-Free Languages.” In Algorithmic Learning Theory, edited by Sanjay Jain, Hans Simon, and Etsuji Tomita, 3734:283–96. Lecture Notes in Computer Science. Springer Berlin / Heidelberg.
Clark, Peter, Oyvind Tafjord, and Kyle Richardson. 2020. “Transformers as Soft Reasoners over Language.” In IJCAI 2020.
Dehbi, Youness, Fabian Hadiji, Gerhard Gröger, Kristian Kersting, and Lutz Plümer. 2017. “Statistical Relational Learning of Grammar Rules for 3D Building Reconstruction.” Transactions in GIS 21 (1): 134–50.
Duvenaud, David, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Zoubin Ghahramani. 2013. “Structure Discovery in Nonparametric Regression Through Compositional Kernel Search.” In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 1166–74.
Elman, Jeffrey L. 1990. “Finding Structure in Time.” Cognitive Science 14: 179–211.
———. 1991. “Distributed Representations, Simple Recurrent Networks, and Grammatical Structure.” Machine Learning 7: 195–225.
———. 1993. “Learning and Development in Neural Networks: The Importance of Starting Small.” Cognition 48: 71–99.
Fee, Michale S., Alexay A. Kozhevnikov, and Richard H. Hahnloser. 2004. “Neural Mechanisms of Vocal Sequence Generation in the Songbird.” Annals of the New York Academy of Sciences 1016: 153–70.
Gaillard, Pierre, and Paul Baudin. 2014. “A Consistent Deterministic Regression Tree for Non-Parametric Prediction of Time Series.” arXiv:1405.1533 [Cs, Math, Stat], May.
Giesa, Tristan, David Spivak, and Markus Buehler. 2011. “Reoccurring Patterns in Hierarchical Protein Materials and Music: The Power of Analogies.” BioNanoScience 1 (4): 153–61.
Gold, E. Mark. 1967. “Language Identification in the Limit.” Information and Control 10 (5): 447–74.
Grosse, Roger, Ruslan R. Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. 2012. “Exploiting Compositionality to Explore a Large Space of Model Structures.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
Grünwald, Peter. 1996. “A Minimum Description Length Approach to Grammar Inference.” In Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, 1040:203–16. Lecture Notes in Computer Science. London, UK: Springer-Verlag.
Hannun, Awni, Vineel Pratap, Jacob Kahn, and Wei-Ning Hsu. 2020. “Differentiable Weighted Finite-State Transducers.” arXiv:2010.01003 [Cs, Stat], October.
Heckel, Reiko, Georgios Lajios, and Sebastian Menge. 2004. “Stochastic Graph Transformation Systems.” In Graph Transformations, edited by Hartmut Ehrig, Gregor Engels, Francesco Parisi-Presicce, and Grzegorz Rozenberg, 3256:243–46. Lecture Notes in Computer Science. Springer Berlin / Heidelberg.
Higuera, Colin de la. 2000. “Current Trends in Grammatical Inference.” In Advances in Pattern Recognition, edited by Francesc Ferri, José Iñesta, Adnan Amin, and Pavel Pudil, 1876:28–31. Lecture Notes in Computer Science. Springer Berlin / Heidelberg.
———. 2005. “A Bibliographical Study of Grammatical Inference.” Pattern Recognition 38 (9): 1332–48.
———. 2010. Grammatical Inference: Learning Automata and Grammars. Cambridge; New York: Cambridge University Press.
Higuera, Colin de la, Frédéric Piat, and Frédéric Tantini. 2006. “Learning Stochastic Finite Automata for Musical Style Recognition.” In Implementation and Application of Automata, edited by Jacques Farré, Igor Litovsky, and Sylvain Schmitz, 3845:345–46. Lecture Notes in Computer Science. Springer Berlin / Heidelberg.
Hopcroft, John E., and Jeffrey D. Ullman. 1979. Introduction to Automata Theory, Languages and Computation. 1st ed. Addison-Wesley Publishing Company.
Hsu, Anne S., Nick Chater, and Paul M. B. Vitányi. 2011. “The Probabilistic Analysis of Language Acquisition: Theoretical, Computational, and Experimental Analysis.” Cognition 120 (3): 380–90.
Jackendoff, Ray. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford University Press, USA.
Jin, Dezhe Z., and Alexay A. Kozhevnikov. 2011. “A Compact Statistical Model of the Song Syntax in Bengalese Finch.” PLoS Comput Biol 7 (3): e1001108.
Johnson, Kent. 2004. “Gold’s Theorem and Cognitive Science.” Philosophy of Science 71 (4): 571–92.
Jones, Owen, J. B. Waring, J. O. Westwood, and M. Digby Wyatt. 1868. The Grammar of Ornament. London: Bernard Quaritch.
Keller, Bill, and Rudi Lutz. 1997. “Evolving Stochastic Context-Free Grammars from Examples Using a Minimum Description Length Principle.” In 1997 Workshop on Automata Induction Grammatical Inference and Language Acquisition. Citeseer.
Kontorovich, Leonid, Corinna Cortes, and Mehryar Mohri. 2006. “Learning Linearly Separable Languages.” In Algorithmic Learning Theory, edited by José L. Balcázar, Philip M. Long, and Frank Stephan, 288–303. Lecture Notes in Computer Science 4264. Springer Berlin Heidelberg.
Lari, K., and S. J. Young. 1990. “The Estimation of Stochastic Context-Free Grammars Using the Inside-Outside Algorithm.” Computer Speech & Language 4 (1): 35–56.
López, Damián, José M. Sempere, and Pedro García. 2004. “Inference of Reversible Tree Languages.” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (4): 1658–65.
Manning, Christopher D. 2002. “Probabilistic Syntax.” In Probabilistic Linguistics, 289–341. Cambridge, MA: MIT Press.
Manning, Christopher D, and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, Mass.: MIT Press.
Meyer, Franz Sales. 1900. Handbook of Ornament: A Grammar of Art, Industrial and Architectural Designing in All Its Branches, for Practical as Well as Theoretical Use. New York: B. Hessling.
Mohri, Mehryar, and Michael D. Riley. 2016. “A Disambiguation Algorithm for Weighted Automata.” Theoretical Computer Science.
Nevill-Manning, Craig G, and Ian H Witten. 1997. “Identifying Hierarchical Structure in Sequences: A Linear-Time Algorithm.”
Nowak, Martin A., Natalia L. Komarova, and Partha Niyogi. 2001. “Evolution of Universal Grammar.” Science 291: 114–18.
Pinker, Steven, and Paul Bloom. 1990. “Natural Language and Natural Selection.” Behavioral and Brain Sciences 13: 707.
Shalizi, Cosma Rohilla, and James P Crutchfield. 2000. “Computational Mechanics: Pattern and Prediction, Structure and Simplicity.”
Smith, Alvy Ray. 1984. “Plants, Fractals, and Formal Languages.” In SIGGRAPH Comput. Graph., 18:1–10. ACM.
Talton, Jerry, Lingfeng Yang, Ranjitha Kumar, Maxine Lim, Noah Goodman, and Radomír Měch. 2012. “Learning Design Patterns with Bayesian Grammar Induction.” In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology - UIST ’12, 63. Cambridge, Massachusetts, USA: ACM Press.
Tong, Matthew H., Adam D. Bickett, Eric M. Christiansen, and Garrison W. Cottrell. 2007. “Learning Grammatical Structure with Echo State Networks.” Neural Networks 20 (3): 424–32.
Valiant, L. G. 1984. “A Theory of the Learnable.” Commun. ACM 27 (11): 1134–42.
Valiant, Leslie G. 2009. “Evolvability.” J. ACM 56 (1): 3:1–21.
Vidal, Enrique, Frank Thollard, Colin de la Higuera, Francisco Casacuberta, and Rafael C. Carrasco. 2005. “Probabilistic Finite-State Machines - Part I.” IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (7): 1013–25.
Vinyals, Oriol, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015. “Grammar as a Foreign Language.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2755–63. Curran Associates, Inc.
