Meta learning

Few-shot learning, learning fast weights, learning to learn

Placeholder for what we now call few-shot learning, I think?

Is this what Schmidhuber means when he discusses neural nets learning to program neural nets with fast weights? He dates that idea to the 1990s (Schmidhuber 1992) and relates it via Schlag, Irie, and Schmidhuber (2021) to transformer models.
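The fast-weight connection can be made concrete in a few lines. This is a minimal sketch under simplifying assumptions, not the construction from either paper: it uses unnormalised causal attention with a plain dot-product kernel (the linear-transformer literature typically adds a feature map and a normalising term, both omitted here), and the function names are mine. It checks that writing outer products of values and keys into a "fast" weight matrix and reading it out with a query gives the same outputs as causal linear attention.

```python
import numpy as np

def fast_weight_outputs(keys, values, queries):
    """Write outer products into a fast weight matrix, then read with the query."""
    d_v, d_k = values.shape[1], keys.shape[1]
    W = np.zeros((d_v, d_k))            # fast weights start empty
    outputs = []
    for k, v, q in zip(keys, values, queries):
        W = W + np.outer(v, k)          # "programming" step: additive outer-product write
        outputs.append(W @ q)           # read-out with the current query
    return np.stack(outputs)

def causal_linear_attention(keys, values, queries):
    """Unnormalised causal attention with a linear (dot-product) kernel."""
    scores = queries @ keys.T                 # scores[t, i] = q_t . k_i
    causal = np.tril(np.ones_like(scores))    # position t sees only i <= t
    return (scores * causal) @ values

rng = np.random.default_rng(0)
T, d_k, d_v = 6, 4, 3                         # arbitrary toy sizes
keys = rng.normal(size=(T, d_k))
values = rng.normal(size=(T, d_v))
queries = rng.normal(size=(T, d_k))

fw_out = fast_weight_outputs(keys, values, queries)
attn_out = causal_linear_attention(keys, values, queries)
print(np.allclose(fw_out, attn_out))
```

The two agree because the fast weight matrix after step t is the sum of `np.outer(v_i, k_i)` over i ≤ t, so applying it to `q_t` yields the values weighted by the dot products of `q_t` with each key.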

A mainstream and current approach is to discuss meta-learning:

On the futility of trying to be clever (the bitter lesson redux) summarises some recent negative results:

two recent papers, (Raghu et al. 2020; Tian et al. 2020), show that in practice the inner loop run doesn’t really do much in these algorithms, so much so that one can safely do away with the inner loop entirely. This means that the success of these algorithms can be explained completely by standard (single-loop) learning on the entire lumped meta-training dataset. Another recent beautiful theory paper (Du et al. 2021) sheds some light on these experimental results.
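To make the "no inner loop" recipe concrete, here is a toy sketch as I read it: learn one shared representation from all the meta-training tasks, freeze it, and fit only a small head on the few labelled examples of a new task. Everything in it is an assumption for illustration: the synthetic linear tasks, the sizes, and the closed-form shortcut (an SVD of per-task least-squares fits) that stands in for ordinary training on the pooled data. It is not the procedure of any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 2                 # input dimension, shared representation dimension
n_tasks, n_train = 50, 40    # meta-training tasks, examples per task
n_shots = 5                  # labelled examples available for the new task
noise = 0.01

# Hypothetical ground truth: every task is linear in the same k-dim subspace.
B_true, _ = np.linalg.qr(rng.normal(size=(d, k)))   # d x k, orthonormal columns

def sample_task(n):
    """Draw one task: inputs X, noisy targets y = X B w + noise, true weights B w."""
    w = rng.normal(size=k)
    X = rng.normal(size=(n, d))
    y = X @ B_true @ w + noise * rng.normal(size=n)
    return X, y, B_true @ w

# Stage 1 (no inner loop): estimate the shared subspace from all training tasks.
thetas = []
for _ in range(n_tasks):
    X, y, _ = sample_task(n_train)
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # per-task least squares
    thetas.append(theta_hat)
U, _, _ = np.linalg.svd(np.stack(thetas, axis=1), full_matrices=False)
B_hat = U[:, :k]                                        # learned representation

# Stage 2: new task, few shots. Fit only a k-dim head on the frozen features.
X_new, y_new, theta_new = sample_task(n_shots)
w_hat, *_ = np.linalg.lstsq(X_new @ B_hat, y_new, rcond=None)
err_representation = np.linalg.norm(B_hat @ w_hat - theta_new)

# Baseline: fit all d weights from the same few shots (minimum-norm solution).
theta_scratch, *_ = np.linalg.lstsq(X_new, y_new, rcond=None)
err_scratch = np.linalg.norm(theta_scratch - theta_new)

print(f"frozen representation + head: {err_representation:.3f}")
print(f"from scratch:                 {err_scratch:.3f}")
```

With five examples and twenty input dimensions, the from-scratch fit is underdetermined, while the head has only two parameters to estimate. That gap is the whole effect in this toy, and no task-specific adaptation of the representation is involved.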


Antoniou, Antreas, Harrison Edwards, and Amos Storkey. 2019. “How to Train Your MAML.” arXiv:1810.09502 [Cs, Stat], March.
Arnold, Sébastien M. R., Praateek Mahajan, Debajyoti Datta, Ian Bunner, and Konstantinos Saitas Zarkias. 2020. “Learn2learn: A Library for Meta-Learning Research.” arXiv:2008.12284 [Cs, Stat], August.
Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” arXiv:2005.14165 [Cs], June.
Du, Simon S., Wei Hu, Sham M. Kakade, Jason D. Lee, and Qi Lei. 2021. “Few-Shot Learning via Learning the Representation, Provably.” arXiv.
Erven, Tim van, and Wouter M Koolen. 2016. “MetaGrad: Multiple Learning Rates in Online Learning.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3666–74. Curran Associates, Inc.
Fiebrink, Rebecca, Dan Trueman, and Perry R. Cook. 2009. “A Metainstrument for Interactive, on-the-Fly Machine Learning.” In Proceedings of NIME, 2:3.
Finn, Chelsea, Pieter Abbeel, and Sergey Levine. 2017. “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.” In Proceedings of the 34th International Conference on Machine Learning, 1126–35. PMLR.
Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences 116 (10): 4156–65.
Lee, Kwonjoon, Subhransu Maji, Avinash Ravichandran, and Stefano Soatto. 2019. “Meta-Learning with Differentiable Convex Optimization,” April.
Medasani, Bharat, Anthony Gamst, Hong Ding, Wei Chen, Kristin A. Persson, Mark Asta, Andrew Canning, and Maciej Haranczyk. 2016. “Predicting Defect Behavior in B2 Intermetallics by Merging Ab Initio Modeling and Machine Learning.” Npj Computational Materials 2 (1): 1.
Mikulik, Vladimir, Grégoire Delétang, Tom McGrath, Tim Genewein, Miljan Martic, Shane Legg, and Pedro A. Ortega. 2020. “Meta-Trained Agents Implement Bayes-Optimal Agents.” arXiv.
Munkhdalai, Tsendsuren, Alessandro Sordoni, Tong Wang, and Adam Trischler. 2019. “Metalearned Neural Memory.” In Advances In Neural Information Processing Systems.
Oreshkin, Boris N., Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. 2020. “Meta-Learning Framework with Applications to Zero-Shot Time-Series Forecasting.” arXiv.
Ortega, Pedro A., Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, et al. 2019. “Meta-Learning of Sequential Strategies.” arXiv.
Pestourie, Raphaël, Youssef Mroueh, Thanh V. Nguyen, Payel Das, and Steven G. Johnson. 2020. “Active Learning of Deep Surrogates for PDEs: Application to Metasurface Design.” Npj Computational Materials 6 (1): 1–7.
Raghu, Aniruddh, Maithra Raghu, Samy Bengio, and Oriol Vinyals. 2020. “Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML.” arXiv.
Rajeswaran, Aravind, Chelsea Finn, Sham Kakade, and Sergey Levine. 2019. “Meta-Learning with Implicit Gradients,” September.
Schlag, Imanol, Kazuki Irie, and Jürgen Schmidhuber. 2021. “Linear Transformers Are Secretly Fast Weight Programmers.” arXiv:2102.11174 [Cs], June.
Schmidhuber, Jürgen. 1992. “Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks.” Neural Computation 4 (1): 131–39.
Tian, Yonglong, Yue Wang, Dilip Krishnan, Joshua B. Tenenbaum, and Phillip Isola. 2020. “Rethinking Few-Shot Image Classification: A Good Embedding Is All You Need?” arXiv.
Uttl, Bob, Carmela A. White, and Daniela Wong Gonzalez. 2017. “Meta-Analysis of Faculty’s Teaching Effectiveness: Student Evaluation of Teaching Ratings and Student Learning Are Not Related.” Studies in Educational Evaluation, Evaluation of teaching: Challenges and promises, 54 (September): 22–42.
Zhang, Kaiqi, and Yu-Xiang Wang. 2022. “Deep Learning Meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?” arXiv.
