Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. "Layer Normalization." arXiv.
Bach, Francis. 2014. "Breaking the Curse of Dimensionality with Convex Neural Networks." arXiv:1412.8690 [Cs, Math, Stat], December.
Bahadori, Mohammad Taha, Krzysztof Chalupka, Edward Choi, Robert Chen, Walter F. Stewart, and Jimeng Sun. 2017. "Neural Causal Regularization Under the Independence of Mechanisms Assumption." arXiv:1702.02604 [Cs, Stat], February.
Baldi, Pierre, Peter Sadowski, and Zhiqin Lu. 2016. "Learning in the Machine: Random Backpropagation and the Learning Channel." arXiv:1612.02734 [Cs], December.
Bartlett, Peter L., Andrea Montanari, and Alexander Rakhlin. 2021. "Deep Learning: A Statistical Viewpoint." Acta Numerica 30 (May): 87–201.
Baydin, Atilim Gunes, and Barak A. Pearlmutter. 2014. "Automatic Differentiation of Algorithms for Machine Learning." arXiv:1404.7456 [Cs, Stat], April.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019. "Reconciling Modern Machine-Learning Practice and the Classical Bias–Variance Trade-Off." Proceedings of the National Academy of Sciences 116 (32): 15849–54.
Belkin, Mikhail, Siyuan Ma, and Soumik Mandal. 2018. "To Understand Deep Learning We Need to Understand Kernel Learning." In International Conference on Machine Learning, 541–49.
Bengio, Yoshua. 2000. "Gradient-Based Optimization of Hyperparameters." Neural Computation 12 (8): 1889–1900.
Dasgupta, Sakyasingha, Takayuki Yoshizumi, and Takayuki Osogami. 2016. "Regularized Dynamic Boltzmann Machine with Delay Pruning for Unsupervised Learning of Temporal Sequences." arXiv:1610.01989 [Cs, Stat], September.
Finlay, Chris, Jörn-Henrik Jacobsen, Levon Nurbekyan, and Adam M. Oberman. 2020. "How to Train Your Neural ODE: The World of Jacobian and Kinetic Regularization." In ICML.
Gal, Yarin, and Zoubin Ghahramani. 2016. "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks." arXiv:1512.05287 [Stat].
Golowich, Noah, Alexander Rakhlin, and Ohad Shamir. 2017. "Size-Independent Sample Complexity of Neural Networks." arXiv:1712.06541 [Cs, Stat], December.
Graves, Alex. 2011. "Practical Variational Inference for Neural Networks." In Proceedings of the 24th International Conference on Neural Information Processing Systems, 2348–56. NIPS'11. USA: Curran Associates Inc.
Hardt, Moritz, Benjamin Recht, and Yoram Singer. 2015. "Train Faster, Generalize Better: Stability of Stochastic Gradient Descent." arXiv:1509.01240 [Cs, Math, Stat], September.
Im, Daniel Jiwoong, Michael Tao, and Kristin Branson. 2016. "An Empirical Analysis of the Optimization of Deep Network Loss Surfaces." arXiv:1612.04010 [Cs], December.
Immer, Alexander, Matthias Bauer, Vincent Fortuin, Gunnar Rätsch, and Khan Mohammad Emtiyaz. 2021. "Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning." In Proceedings of the 38th International Conference on Machine Learning, 4563–73. PMLR.
Izmailov, Pavel, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. 2018. "Averaging Weights Leads to Wider Optima and Better Generalization," March.
Kawaguchi, Kenji, Leslie Pack Kaelbling, and Yoshua Bengio. 2017. "Generalization in Deep Learning." arXiv:1710.05468 [Cs, Stat], October.
Kelly, Jacob, Jesse Bettencourt, Matthew James Johnson, and David Duvenaud. 2020. "Learning Differential Equations That Are Easy to Solve." In Advances in Neural Information Processing Systems.
Klambauer, Günter, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. 2017. "Self-Normalizing Neural Networks." In Proceedings of the 31st International Conference on Neural Information Processing Systems, 972–81. Red Hook, NY, USA: Curran Associates Inc.
Koch, Parker, and Jason J. Corso. 2016. "Sparse Factorization Layers for Neural Networks with Limited Supervision." arXiv:1612.04468 [Cs, Stat], December.
Lee, Jaehoon, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, and Jeffrey Pennington. 2019. "Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent." In Advances in Neural Information Processing Systems, 8570–81.
Lobacheva, Ekaterina, Nadezhda Chirkova, and Dmitry Vetrov. 2017. "Bayesian Sparsification of Recurrent Neural Networks." In Workshop on Learning to Generate Natural Language.
Loog, Marco, Tom Viering, Alexander Mey, Jesse H. Krijthe, and David M. J. Tax. 2020. "A Brief Prehistory of Double Descent." Proceedings of the National Academy of Sciences 117 (20): 10625–26.
Maclaurin, Dougal, David Duvenaud, and Ryan Adams. 2015. "Gradient-Based Hyperparameter Optimization Through Reversible Learning." In Proceedings of the 32nd International Conference on Machine Learning, 2113–22. PMLR.
Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. 2017. "Variational Dropout Sparsifies Deep Neural Networks." In Proceedings of ICML.
Nguyen Xuan Vinh, Sarah Erfani, Sakrapee Paisitkriangkrai, James Bailey, Christopher Leckie, and Kotagiri Ramamohanarao. 2016. "Training Robust Models Using Random Projection," 531–36. IEEE.
Nøkland, Arild. 2016. "Direct Feedback Alignment Provides Learning in Deep Neural Networks." In Advances in Neural Information Processing Systems.
Pan, Wei, Hao Dong, and Yike Guo. 2016. "DropNeuron: Simplifying the Structure of Deep Neural Networks." arXiv:1606.07326 [Cs, Stat], June.
Papyan, Vardan, Yaniv Romano, Jeremias Sulam, and Michael Elad. 2017. "Convolutional Dictionary Learning via Local Processing." In Proceedings of the IEEE International Conference on Computer Vision, 5296–5304.
Prechelt, Lutz. 2012. "Early Stopping – But When?" In Neural Networks: Tricks of the Trade, edited by Grégoire Montavon, Geneviève B. Orr, and Klaus-Robert Müller, 53–67. Lecture Notes in Computer Science 7700. Springer Berlin Heidelberg.
Salimans, Tim, and Diederik P. Kingma. 2016. "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks." In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 901–9. Curran Associates, Inc.
Santurkar, Shibani, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. 2019. "How Does Batch Normalization Help Optimization?" arXiv:1805.11604 [Cs, Stat], April.
Scardapane, Simone, Danilo Comminiello, Amir Hussain, and Aurelio Uncini. 2016. "Group Sparse Regularization for Deep Neural Networks." arXiv:1607.00485 [Cs, Stat], July.
Srinivas, Suraj, and R. Venkatesh Babu. 2016. "Generalized Dropout." arXiv:1611.06791 [Cs], November.
Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15 (1): 1929–58.
Taheri, Mahsa, Fang Xie, and Johannes Lederer. 2020. "Statistical Guarantees for Regularized Neural Networks." arXiv:2006.00294 [Cs, Math, Stat], May.
Xie, Bo, Yingyu Liang, and Le Song. 2016. "Diversity Leads to Generalization in Neural Networks." arXiv:1611.03131 [Cs, Stat], November.
You, Zhonghui, Jinmian Ye, Kunming Li, and Ping Wang. 2018. "Adversarial Noise Layer: Regularize Neural Network By Adding Noise." arXiv:1805.08000 [Cs], May.
Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2017. "Understanding Deep Learning Requires Rethinking Generalization." In Proceedings of ICLR.