Grammatical inference

Mathematically speaking: inferring the formal language that can describe a given set of expressions. In the slightly looser sense used by linguists studying natural human language: discovering the syntactic rules of a given language, which is roughly the same thing, only with every term sloppier and the subject matter itself messier.
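To make the mathematical version concrete, here is a minimal sketch (plain Python, all names my own invention) of the classic starting point for inference from positive examples: the prefix tree acceptor, a trie-shaped automaton accepting exactly the sample strings, which algorithms such as RPNI then generalise by merging states.

```python
def prefix_tree_acceptor(samples):
    """Build a prefix tree acceptor (PTA): a trie-shaped DFA that accepts
    exactly the given positive example strings. State 0 is the root."""
    transitions = {}   # (state, symbol) -> state
    accepting = set()
    next_state = 1
    for word in samples:
        state = 0
        for symbol in word:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        accepting.add(state)
    return transitions, accepting

def accepts(transitions, accepting, word):
    """Run the automaton; reject on any missing transition."""
    state = 0
    for symbol in word:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return state in accepting
```

On its own the PTA does no generalisation at all; the interesting part of grammatical inference is deciding which of its states to merge, and on what evidence.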

This is already a crazy complex area, and, being naturally perverse, I am interested in an especially esoteric corner of it, to wit, grammars of things that aren’t speech. Inferring design grammars, say, could allow you to produce more things off the same “basic plan” from some examples of the thing: look at enough trees and you know how to build the rest of the forest, that kind of thing. I’m especially interested in structures not expressed as a sequence of symbols from a finite alphabet - i.e. not over the free monoid - or expressed over the free monoid but with the symbol sequence hidden. This is a boutique interest. I also care about probabilistic grammars, i.e. assigning probability measures to these things. That part is not at all boutique.
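The generative direction of that “forest from the trees” idea can be sketched as a toy probabilistic grammar: a hypothetical two-rule stochastic branching grammar in the L-system style (cf. Smith 1984 below), where each sample draws a different structure from the one basic plan. All rules and probabilities here are made up for illustration.

```python
import random

# Toy stochastic grammar over "designs": nonterminal T rewrites to one of
# several right-hand sides, chosen with the stated probability. Symbols not
# in the grammar (F, [, ]) are terminals, emitted as-is.
GRAMMAR = {
    "T": [(0.4, ["F", "[", "T", "]", "T"]),   # sprout a branch, continue
          (0.6, ["F"])],                      # terminate in a segment
}

def sample(symbol="T", rng=random):
    """Sample one string from the grammar by recursive expansion.
    Subcritical branching (0.4 * 2 < 1), so this terminates almost surely."""
    if symbol not in GRAMMAR:
        return symbol
    r, acc = rng.random(), 0.0
    for p, rhs in GRAMMAR[symbol]:
        acc += p
        if r <= acc:
            return "".join(sample(s, rng) for s in rhs)
    return ""  # unreachable if the probabilities sum to 1
```

The inference problem is this run in reverse: given a bag of such strings (or worse, the drawn trees they denote, with the symbols hidden), recover the rules and their probabilities.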

Normally design grammars deal with simple languages, such as, say, “regular” languages. I’m interested in things a rung or two up the Chomsky hierarchy: context-free grammars, maybe even context-sensitive ones.
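For a concrete sense of what that extra rung buys you: the textbook context-free language { a^n b^n : n >= 1 } requires unbounded matched counting, so no regular grammar generates it, but the two-rule grammar S → aSb | ab does. A minimal recursive recogniser for it (my own illustrative code, not from any particular library):

```python
def match_S(word, i=0):
    """Try to match nonterminal S (rules S -> aSb | ab) starting at index i.
    Return the index just past the match, or None on failure."""
    if word[i:i + 1] != "a":
        return None
    # Try S -> a S b first (the recursive, "counting" rule).
    j = match_S(word, i + 1)
    if j is not None and word[j:j + 1] == "b":
        return j + 1
    # Fall back to the base rule S -> a b.
    if word[i + 1:i + 2] == "b":
        return i + 2
    return None

def in_anbn(word):
    """Accept exactly the strings a^n b^n, n >= 1."""
    return match_S(word) == len(word)
```

The recursion depth here is doing exactly the work a finite automaton cannot: remembering how many a’s are still waiting for their b’s.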

See also design grammars, iterated function systems and my research proposal in this area.

Things to read

  • Peter Norvig on Chomsky and statistical versus explanatory models of natural language syntax. Full of sick burns.

    In January of 2011, television personality Bill O’Reilly weighed in on more than one culture war with his statement “tide goes in, tide goes out. Never a miscommunication. You can’t explain that,” which he proposed as an argument for the existence of God. O’Reilly realizes that it doesn’t matter what his detractors think of his astronomical ignorance, because his supporters think he has gotten exactly to the key issue: why? He doesn’t care how the tides work, tell him why they work. Why is the moon at the right distance to provide a gentle tide, and exert a stabilizing effect on earth’s axis of rotation, thus protecting life here? Why does gravity work the way it does? Why does anything at all exist rather than not exist? O’Reilly is correct that these questions can only be addressed by mythmaking, religion or philosophy, not by science.

    Chomsky has a philosophy based on the idea that we should focus on the deep whys and that mere explanations of reality don’t matter. In this, Chomsky is in complete agreement with O’Reilly. (I recognize that the previous sentence would have an extremely low probability in a probabilistic model trained on a newspaper or TV corpus.)

  • Cosma Shalizi’s inevitable mention in 3, 2, 1… go

References

Badii, Remo, and Antonio Politi. 1999. Complexity: Hierarchical Structures and Scaling in Physics. Cambridge Nonlinear Science Series. Cambridge University Press.

Bolhuis, Johan J., Ian Tattersall, Noam Chomsky, and Robert C. Berwick. 2014. “How Could Language Have Evolved?” PLoS Biol 12 (8): e1001934.

Boulanger-Lewandowski, Nicolas, Yoshua Bengio, and Pascal Vincent. 2012. “Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.” In 29th International Conference on Machine Learning.

Charniak, Eugene. 1996a. “Tree-Bank Grammars.” In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 1031–36.

———. 1996b. Statistical Language Learning. Reprint. A Bradford Book.

Chater, Nick, and Christopher D Manning. 2006. “Probabilistic Models of Language Processing and Acquisition.” Trends in Cognitive Sciences 10 (7): 335–44.

Chomsky, Noam. 1956. “Three Models for the Description of Language.” IRE Transactions on Information Theory 2 (3): 113–24.

Chomsky, Noam. 2002. Syntactic Structures. 2nd ed. Walter de Gruyter.

Duvenaud, David, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Ghahramani Zoubin. 2013. “Structure Discovery in Nonparametric Regression Through Compositional Kernel Search.” In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 1166–74.

Elman, Jeffrey L. 1990. “Finding Structure in Time.” Cognitive Science 14: 179–211.

———. 1991. “Distributed Representations, Simple Recurrent Networks, and Grammatical Structure.” Machine Learning 7: 195–225.

———. 1993. “Learning and Development in Neural Networks: The Importance of Starting Small.” Cognition 48: 71–99.

Fee, Michale S, Alexay A Kozhevnikov, and Richard H Hahnloser. 2004. “Neural Mechanisms of Vocal Sequence Generation in the Songbird.” Annals of the New York Academy of Sciences 1016: 153–70.

Gaillard, Pierre, and Paul Baudin. 2014. “A Consistent Deterministic Regression Tree for Non-Parametric Prediction of Time Series,” May.

Giesa, Tristan, David Spivak, and Markus Buehler. 2011. “Reoccurring Patterns in Hierarchical Protein Materials and Music: The Power of Analogies.” BioNanoScience 1 (4): 153–61.

Gold, E Mark. 1967. “Language Identification in the Limit.” Information and Control 10 (5): 447–74.

Grosse, Roger, Ruslan R. Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. 2012. “Exploiting Compositionality to Explore a Large Space of Model Structures.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

Grünwald, Peter. 1996. “A Minimum Description Length Approach to Grammar Inference.” In Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, 1040:203–16. Lecture Notes in Computer Science. London, UK: Springer-Verlag.

Heckel, Reiko, Georgios Lajios, and Sebastian Menge. 2004. “Stochastic Graph Transformation Systems.” In Graph Transformations, edited by Hartmut Ehrig, Gregor Engels, Francesco Parisi-Presicce, and Grzegorz Rozenberg, 3256:243–46. Lecture Notes in Computer Science. Springer Berlin / Heidelberg.

Higuera, Colin de la. 2000. “Current Trends in Grammatical Inference.” In Advances in Pattern Recognition, edited by Francesc Ferri, José Iñesta, Adnan Amin, and Pavel Pudil, 1876:28–31. Lecture Notes in Computer Science. Springer Berlin / Heidelberg.

———. 2005. “A Bibliographical Study of Grammatical Inference.” Pattern Recognition 38 (9): 1332–48.

———. 2010. Grammatical Inference: Learning Automata and Grammars. Cambridge; New York: Cambridge University Press.

Higuera, Colin de la, Frédéric Piat, and Frédéric Tantini. 2006. “Learning Stochastic Finite Automata for Musical Style Recognition.” In Implementation and Application of Automata, edited by Jacques Farré, Igor Litovsky, and Sylvain Schmitz, 3845:345–46. Lecture Notes in Computer Science. Springer Berlin / Heidelberg.

Jackendoff, Ray. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford University Press, USA.

Jin, Dezhe Z, and Alexay A Kozhevnikov. 2011. “A Compact Statistical Model of the Song Syntax in Bengalese Finch.” PLoS Comput Biol 7 (3): e1001108.

Johnson, Kent. 2004. “Gold’s Theorem and Cognitive Science.” Philosophy of Science 71 (4): 571–92.

Keller, Bill, and Rudi Lutz. 1997. “Evolving Stochastic Context-Free Grammars from Examples Using a Minimum Description Length Principle.” In 1997 Workshop on Automata Induction Grammatical Inference and Language Acquisition. Citeseer.

López, Damián, José M. Sempere, and Pedro García. 2004. “Inference of Reversible Tree Languages.” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (4): 1658–65.

Manning, Christopher D. 2002. “Probabilistic Syntax.” In Probabilistic Linguistics, 289–341. Cambridge, MA: MIT Press.

Manning, Christopher D, and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, Mass.: MIT Press.

Mohri, Mehryar, and Michael D. Riley. 2016. “A Disambiguation Algorithm for Weighted Automata.” Theoretical Computer Science.

Nevill-Manning, Craig G, and Ian H Witten. 1997. “Identifying Hierarchical Structure in Sequences: A Linear-Time Algorithm.”

Nowak, Martin A, Natalia L Komarova, and Partha Niyogi. 2001. “Evolution of Universal Grammar.” Science 291: 114–18.

Pinker, Steven, and Paul Bloom. 1990. “Natural Language and Natural Selection.” Behavioral and Brain Sciences 13: 707.

Shalizi, Cosma Rohilla, and James P Crutchfield. 2000. “Computational Mechanics: Pattern and Prediction, Structure and Simplicity.”

Smith, Alvy Ray. 1984. “Plants, Fractals, and Formal Languages.” In SIGGRAPH Comput. Graph., 18:1–10. ACM.

Tong, Matthew H., Adam D. Bickett, Eric M. Christiansen, and Garrison W. Cottrell. 2007. “Learning Grammatical Structure with Echo State Networks.” Neural Networks 20 (3): 424–32.

Valiant, Leslie G. 2009. “Evolvability.” J. ACM 56 (1): 3:1–3:21.

Valiant, L. G. 1984. “A Theory of the Learnable.” Commun. ACM 27 (11): 1134–42.

Vidal, Enrique, Frank Thollard, Colin de la Higuera, Francisco Casacuberta, and Rafael C Carrasco. 2005a. “Probabilistic Finite-State Machines - Part I.” IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (7): 1013–25.

———. 2005b. “Probabilistic Finite-State Machines - Part II.” IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (7): 1026–39.

Vinyals, Oriol, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015. “Grammar as a Foreign Language.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2755–63. Curran Associates, Inc.