Statistical inference when computation isn’t free and memory isn’t infinite; what does this tell us about the learnable?
Much of statistics is already concerned with how computation scales; in Gaussian process regression, for example, we work hard to avoid naïve methods that scale as \(\mathcal{O}(N^3)\) in the number of observations. In optimisation we care greatly about the trade-off between closeness to the optimum and the number of steps taken.
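A minimal sketch of the kind of trade-off I mean (my own toy example, not from any of the references below): exact GP regression via a Cholesky factorisation costs \(\mathcal{O}(N^3)\), while a random-Fourier-feature approximation (Rahimi & Recht style) with \(m\) features costs \(\mathcal{O}(Nm^2)\). The data and hyperparameters here are arbitrary assumptions.

```python
# Sketch: exact O(N^3) GP posterior mean vs an O(N m^2) random-feature approximation.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
N, m, noise = 1000, 100, 0.1
X = rng.uniform(-3, 3, size=(N, 1))
y = np.sin(X[:, 0]) + noise * rng.standard_normal(N)

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

# Exact posterior mean at the training inputs: factorising the N x N kernel
# matrix costs O(N^3) time and O(N^2) memory.
Kxx = rbf_kernel(X, X)
alpha = cho_solve(cho_factor(Kxx + noise**2 * np.eye(N)), y)  # O(N^3)
exact_mean = Kxx @ alpha

# Random Fourier features: m cosine features approximate the same kernel, and
# the weight-space solve costs O(N m^2) time and O(N m) memory.
W = rng.standard_normal((1, m))              # spectral frequencies, lengthscale 1
b = rng.uniform(0, 2 * np.pi, m)
Phi = np.sqrt(2.0 / m) * np.cos(X @ W + b)   # N x m feature matrix
A = Phi.T @ Phi + noise**2 * np.eye(m)
approx_mean = Phi @ np.linalg.solve(A, Phi.T @ y)

print(np.abs(exact_mean - approx_mean).mean())  # small for moderate m
```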
There is a converse question: holding a computational or memory budget fixed, how good a model can you fit? General results of this flavour are what I might collate here.
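To make the budgeted setting concrete, a toy illustration (again mine, not drawn from the references): a one-pass streaming estimator commits to a constant memory footprint up front, which in turn restricts which quantities it can estimate at all.

```python
# Welford's one-pass algorithm: mean and variance of an arbitrarily long
# stream in O(1) memory, no matter how many observations arrive.
import numpy as np

def welford(stream):
    """Return (mean, unbiased variance) in a single pass with constant memory."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return mean, m2 / (n - 1)

rng = np.random.default_rng(1)
stream = (rng.standard_normal() for _ in range(100_000))  # never materialised in memory
print(welford(stream))  # approximately (0.0, 1.0)
```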
For now, there are some fun ones about large models.
References
Bossaerts, Peter, Nitin Yadav, and Carsten Murawski. 2019. “Uncertainty and Computational Complexity.” Philosophical Transactions of the Royal Society B: Biological Sciences 374 (1766): 20180138.
Frey, B.J., and Nebojsa Jojic. 2005. “A Comparison of Algorithms for Inference and Learning in Probabilistic Graphical Models.” IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (9): 1392–1416.
Jordan, Michael I., and Yair Weiss. 2002. “Probabilistic Inference in Graphical Models.” Handbook of Neural Networks and Brain Theory.
Moshkovitz, Dana, and Michal Moshkovitz. 2017. “Mixing Implies Lower Bounds for Space Bounded Learning.” In Conference on Learning Theory, 1516–66.