Algorithmic statistics


The intersection between probability, ignorance, and algorithms, butting up against computational complexity, coding theory, dynamical systems, ergodic theory, minimum description length and probability. When is the relation between things sufficiently unstructured that we may treat them as random? Stochastic approximations to deterministic algorithms. Kolmogorov complexity. Compressibility, Shannon information. Sideswipe at deterministic chaos. Chaotic systems treated as if stochastic. (Are “real” systems not precisely that?) Statistical mechanics and ergodicity.

I saw a provocative talk by Daniela Andrés on, nominally, Parkinson’s disease. The grabbing part was talking about the care and feeding of neural “codewords”, and the information theory of the brain, which she did in the foreign (to me) language of “algorithmic statistics”, and “Kolmogorov structure functions”. I have no idea what she meant. This is a placeholder to remind me to come back and see if it is as useful as it sounded like it might be.

To consider: relationship between underlying event space and measures we construct on such spaces. How much topology is lost by laundering our events through pullback of a (e.g. probability) measure?

Chazelle:

The discrepancy method has produced the most fruitful line of attack on a pivotal computer science question: What is the computational power of random bits? It has also played a major role in recent developments in complexity theory. This book tells the story of the discrepancy method in a few succinct independent vignettes. The chapters explore such topics as communication complexity, pseudo-randomness, rapidly mixing Markov chains, points on a sphere, derandomization, convex hulls and Voronoi diagrams, linear programming, geometric sampling and VC-dimension theory, minimum spanning trees, circuit complexity, and multidimensional searching. The mathematical treatment is thorough and self-contained, with minimal prerequisites. More information can be found on the book’s home page.

Random number generation

Cosma Shalizi’s upcoming textbook has the world’s pithiest summary:

In fact, what we really have to assume is that the relationships between the causes omitted from the DAG and those included is so intricate and convoluted that it might as well be noise, along the lines of algorithmic information theory (Li and Vitányi, 1997), whose key result might be summed up as “Any determinism distinguishable from randomness is insufficiently complex”.

  • As far as I can tell, this is the main research thrust of the Max Planck institute at Leipzig to solve a kind of inverse problem, inspecting inferred probabilities for evidence of computation. Is that… sane? Why would one wish to do that?

  • Just had this recommended by David Donoho: Mezard, M. (2009). Information, physics, and computation. Oxford ; New York: Oxford University Press. Online.

Here, a John Baez talk on foundational issues.

Information-based complexity theory

Is a specialty within this field?

ICB website:

Information-based complexity (IBC) is the branch of computational complexity that studies problems for which the information is partial, contaminated, and priced.

To motivate these assumptions about information consider the problem of the numerical computation of an integral. Here, the integrands consist of functions defined over the d-dimensional unit cube. Since a digital computer can store only a finite set of numbers, these functions must be replaced by such finite sets (by, for example, evaluating the functions at a finite number of points). Therefore, we have only partial information about the functions. Furthermore, the function values may be contaminated by round-off error. Finally, evaluating the functions can be expensive, and so computing these values has a price.

  • The RJLipton post on IBC linking it to complexity theory:

    Now as ordinary complexity theorists, our first instinct would be to define properties intrinsic to the function {f} and try to prove they cause high complexity for any algorithm. Making a continuous analogy to concepts in discrete Boolean complexity, drawing on papers like this by Noam Nisan and Mario Szegedy, we would try to tailor an effective measure of “sensitivity.” We would talk about functions {f} that resemble the {n}-ary parity function in respect of sensitivity but don’t have a simple known integral. Notions of {f} being “isotropic” could cut both ways—they could make the sensitivity pervasive but could enable a good global estimate of the integral.

    IBC, however, focuses on properties of algorithms and restrictions on the kind of inputs they are given. Parlett’s general objection is that doing so begs the question of a proper complexity theory and reverts to the standard—and hallowed enough—domain of ordinary numerical analysis of algorithms.

Ben-David, Shai, Pavel Hrubeš, Shay Moran, Amir Shpilka, and Amir Yehudayoff. 2019. “Learnability Can Be Undecidable.” Nature Machine Intelligence 1 (1): 44. https://doi.org/10.1038/s42256-018-0002-3.

Bhattacharyya, Arnab, Philips George John, Suprovat Ghoshal, and Raghu Meka. 2019. “Average Bias and Polynomial Sources.” 079. https://eccc.weizmann.ac.il/report/2019/079/.

Blumer, Anselm, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. 1987. “Occam’s Razor.” Information Processing Letters 24 (6): 377–80. https://doi.org/10.1016/0020-0190(87)90114-1.

Braverman, Mark. 2008. “Polylogarithmic Independence Fools AC0 Circuits.” J. ACM 57 (5): 28:1–28:10. https://doi.org/10.1145/1754399.1754401.

Calude, Cristian S. 2002. Information and Randomness : An Algorithmic Perspective. Springer.

Chaitin, Gregory J. 1966. “On the Length of Programs for Computing Finite Binary Sequences.” Journal of the ACM (JACM) 13 (4): 547–69.

———. 1977. “Algorithmic Information Theory.” IBM Journal of Research and Development.

———. 1988. Information, Randomness and Incompleteness: Papers on Algorithmic Information Theory (World Scientific Series in Computer Science, Vol 8). World Scientific Pub Co Inc.

———. 2002. “The Intelligibility of the Universe and the Notions of Simplicity, Complexity and Irreducibility.”

Chaitin, Gregory J. 1969. “On the Simplicity and Speed of Programs for Computing Infinite Sets of Natural Numbers.” J. ACM 16 (3): 407–22. https://doi.org/10.1145/321526.321530.

Chazelle, Bernard. 2001. The Discrepancy Method: Randomness and Complexity. 1st paperback ed. Cambridge: Cambridge Univ. Press. http://www.cs.princeton.edu/~chazelle/book.html.

Clark, Alexander, Christophe Costa Florêncio, Chris Watkins, and Mariette Serayet. 2006. “Planar Languages and Learnability.” In Grammatical Inference: Algorithms and Applications, edited by Yasubumi Sakakibara, Satoshi Kobayashi, Kengo Sato, Tetsuro Nishino, and Etsuji Tomita, 148–60. Lecture Notes in Computer Science 4201. Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/11872436_13.

Cohen, Paul J. 1963. “The Independence of the Continuum Hypothesis.” Proceedings of the National Academy of Sciences 50 (6): 1143–8. https://doi.org/10.1073/pnas.50.6.1143.

———. 1964. “The Independence of the Continuum Hypothesis, Ii.” Proceedings of the National Academy of Sciences 51 (1): 105–10. https://doi.org/10.1073/pnas.51.1.105.

Cortes, Corinna, Mehryar Mohri, Ashish Rastogi, and Michael Riley. 2008. “On the Computation of the Relative Entropy of Probabilistic Automata.” International Journal of Foundations of Computer Science 19 (01): 219–42. https://doi.org/10.1142/S0129054108005644.

Cover, Thomas M., Péter Gács, and Robert M. Gray. 1989. “Kolmogorov’s Contributions to Information Theory and Algorithmic Complexity.” The Annals of Probability 17 (3): 840–65. https://doi.org/10.1214/aop/1176991250.

Davies, Paul C W. 2007. “The Implications of a Cosmological Information Bound for Complexity, Quantum Information and the Nature of Physical Law.” arXiv:Quant-Ph/0703041. http://arxiv.org/abs/quant-ph/0703041.

Diakonikolas, Ilias, Parikshit Gopalan, Ragesh Jaiswal, Rocco Servedio, and Emanuele Viola. 2009. “Bounded Independence Fools Halfspaces.” 016. http://eccc.hpi-web.de/report/2009/016/.

Gács, Péter, JohnT. Tromp, and Paul M. B. Vitányi. 2001. “Algorithmic Statistics.” IEEE Transactions on Information Theory 47 (6): 2443–63. https://doi.org/10.1109/18.945257.

Gödel, K. 1940. The Consistency of the Continuum HypothesisThe Consistency of the Continuum Hypothesis.

Gödel, Kurt. 1931. “Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I.” Monatshefte für Mathematik und Physik 38 (1): 173–98. https://doi.org/10.1007/BF01700692.

Hansen, Mark H., and Bin Yu. 2001. “Model Selection and the Principle of Minimum Description Length.” Journal of the American Statistical Association 96 (454): 746–74. http://www.jstor.org/stable/2670311.

Kolmogorov, A N. 1968. “Three Approaches to the Quantitative Definition of Information.” International Journal of Computer Mathematics 2 (1): 157–68.

Kontorovich, Leonid, Corinna Cortes, and Mehryar Mohri. 2006. “Learning Linearly Separable Languages.” In Algorithmic Learning Theory, edited by José L. Balcázar, Philip M. Long, and Frank Stephan, 288–303. Lecture Notes in Computer Science 4264. Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/11894841_24.

Kuipers, L., and H. Niederreiter. 2012. Uniform Distribution of Sequences. Dover Publications.

Legg, Shane. 2006. “Is There an Elegant Universal Theory of Prediction?” In Algorithmic Learning Theory, edited by José L. Balcázar, Philip M. Long, and Frank Stephan, 274–87. Lecture Notes in Computer Science 4264. Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/11894841_23.

Li, Ming, and Paul MB Vitányi. 2009. An Introduction to Kolmogorov Complexity and Its Applications. 3rd ed. Springer. http://books.google.com/books?hl=en&lr=&id=25fue3UYDN0C&oi=fnd&pg=PR3&ots=U62oahU4ci&sig=dcCX1ETvTRC1X_2MVB0Vo5gH49g.

Littlestone, N., and M. K. Warmuth. 1986. Relating Data Compression and Learnability.

Mezard, Marc, and Andrea Montanari. 2009. Information, Physics, and Computation. 1 edition. Oxford Graduate Texts. Oxford ; New York: Oxford University Press. http://web.stanford.edu/~montanar/RESEARCH/book.html.

Norton, John D. 2013. “All Shook up: Fluctuations, Maxwell’s Demon and the Thermodynamics of Computation.” Entropy 15 (10): 4432–83. https://doi.org/10.3390/e15104432.

Reyzin, Lev. 2019. “Unprovability Comes to Machine Learning.” Nature 565 (7738): 166. https://doi.org/10.1038/d41586-019-00012-4.

Rissanen, Jorma. 2007. Information and Complexity in Statistical Modeling. Information Science and Statistics. New York: Springer. http://www.springer.com/mathematics/probability/book/978-0-387-36610-4.

Shibata, Takeshi, Ryo Yoshinaka, and Takashi Chikayama. 2006. “Probabilistic Generalization of Simple Grammars and Its Application to Reinforcement Learning.” In Algorithmic Learning Theory, edited by José L. Balcázar, Philip M. Long, and Frank Stephan, 348–62. Lecture Notes in Computer Science 4264. Springer Berlin Heidelberg. http://link.springer.com/chapter/10.1007/11894841_28.

Solomonoff, R. J. 1964a. “A Formal Theory of Inductive Inference. Part I.” Information and Control 7 (1): 1–22. https://doi.org/10.1016/S0019-9958(64)90223-2.

———. 1964b. “A Formal Theory of Inductive Inference. Part II.” Information and Control 7 (2): 224–54. https://doi.org/10.1016/S0019-9958(64)90131-7.

Sterkenburg, Tom F. 2016a. “Solomonoff Prediction and Occam’s Razor.” Philosophy of Science 83 (4): 459–79. https://doi.org/10.1086/687257.

———. 2016b. “Solomonoff Prediction and Occam’s Razor.” Philosophy of Science 83 (4): 459–79. https://doi.org/10.1086/687257.

Valiant, L. G. 1984. “A Theory of the Learnable.” Commun. ACM 27 (11): 1134–42. https://doi.org/10.1145/1968.1972.

Vapnik, V., and A. Chervonenkis. 1971. “On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities.” Theory of Probability & Its Applications 16 (2): 264–80. https://doi.org/10.1137/1116025.

Vereshchagin, N. K., and Paul MB Vitányi. 2004. “Kolmogorov’s Structure Functions and Model Selection.” IEEE Transactions on Information Theory 50 (12): 3265–90. https://doi.org/10.1109/TIT.2004.838346.

———. 2010. “Rate Distortion and Denoising of Individual Data Using Kolmogorov Complexity.” IEEE Transactions on Information Theory 56 (7): 3438–54. https://doi.org/10.1109/TIT.2010.2048491.

Vitányi, Paul M. 2006. “Meaningful Information.” IEEE Transactions on Information Theory 52 (10): 4617–26. https://doi.org/10.1109/TIT.2006.881729.