Amari, Shun-ichi. 1998.
“Natural Gradient Works Efficiently in Learning.” Neural Computation 10 (2): 251–76.
https://doi.org/10.1162/089976698300017746.
Andrychowicz, Marcin, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, and Nando de Freitas. 2016.
“Learning to Learn by Gradient Descent by Gradient Descent.” June 14, 2016.
http://arxiv.org/abs/1606.04474.
Arel, I., D. C. Rose, and T. P. Karnowski. 2010.
“Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier].” IEEE Computational Intelligence Magazine 5 (4): 13–18.
https://doi.org/10.1109/MCI.2010.938364.
Arora, Sanjeev, Rong Ge, Tengyu Ma, and Ankur Moitra. 2015.
“Simple, Efficient, and Neural Algorithms for Sparse Coding.” In
Proceedings of The 28th Conference on Learning Theory, 40:113–49. Paris, France: PMLR.
http://proceedings.mlr.press/v40/Arora15.html.
Bach, Francis. 2014.
“Breaking the Curse of Dimensionality with Convex Neural Networks.” December 30, 2014.
http://arxiv.org/abs/1412.8690.
Baldassi, Carlo, Christian Borgs, Jennifer T. Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, and Riccardo Zecchina. 2016.
“Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes.” Proceedings of the National Academy of Sciences 113 (48): E7655–62.
https://doi.org/10.1073/pnas.1608103113.
Barron, A. R. 1993.
“Universal Approximation Bounds for Superpositions of a Sigmoidal Function.” IEEE Transactions on Information Theory 39 (3): 930–45.
https://doi.org/10.1109/18.256500.
Baydin, Atılım Güneş, Barak A. Pearlmutter, and Jeffrey Mark Siskind. 2016.
“Tricks from Deep Learning.” November 10, 2016.
http://arxiv.org/abs/1611.03777.
Bengio, Yoshua. 2009.
Learning Deep Architectures for AI. Foundations and Trends in Machine Learning 2 (1): 1–127.
https://doi.org/10.1561/2200000006.
Bengio, Yoshua, Aaron Courville, and Pascal Vincent. 2013.
“Representation Learning: A Review and New Perspectives.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8): 1798–1828.
https://doi.org/10.1109/TPAMI.2013.50.
Bengio, Yoshua, and Yann LeCun. 2007.
“Scaling Learning Algorithms Towards AI.” Large-Scale Kernel Machines 34: 1–41.
http://www.iro.umontreal.ca/~lisa/bib/pub_subject/language/pointeurs/bengio+lecun-chapter2007.pdf.
Bengio, Yoshua, Nicolas L. Roux, Pascal Vincent, Olivier Delalleau, and Patrice Marcotte. 2005.
“Convex Neural Networks.” In
Advances in Neural Information Processing Systems, 123–30.
http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2005_583.pdf.
Boser, B. 1991.
“An Analog Neural Network Processor with Programmable Topology.” IEEE Journal of Solid-State Circuits 26: 2017–25.
https://doi.org/10.1109/4.104196.
Brock, Andrew, Theodore Lim, J. M. Ritchie, and Nick Weston. 2017.
“FreezeOut: Accelerate Training by Progressively Freezing Layers.” June 15, 2017.
http://arxiv.org/abs/1706.04983.
Cadieu, C. F. 2014.
“Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition.” PLoS Computational Biology 10: e1003963.
https://doi.org/10.1371/journal.pcbi.1003963.
Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. 2015.
“Net2Net: Accelerating Learning via Knowledge Transfer.” November 17, 2015.
http://arxiv.org/abs/1511.05641.
Cho, Kyunghyun, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014.
“On the Properties of Neural Machine Translation: Encoder-Decoder Approaches.” 2014.
http://arxiv.org/abs/1409.1259.
Choromanska, Anna, Mikael Henaff, Michael Mathieu, Gerard Ben Arous, and Yann LeCun. 2015.
“The Loss Surfaces of Multilayer Networks.” In
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 192–204.
http://proceedings.mlr.press/v38/choromanska15.html.
Ciodaro, T. 2012.
“Online Particle Detection with Neural Networks Based on Topological Calorimetry Information.” Journal of Physics: Conference Series 368: 012030.
https://doi.org/10.1088/1742-6596/368/1/012030.
Ciresan, D. 2012.
“Multi-Column Deep Neural Network for Traffic Sign Classification.” Neural Networks 32: 333–38.
https://doi.org/10.1016/j.neunet.2012.02.023.
Cybenko, G. 1989.
“Approximation by Superpositions of a Sigmoidal Function.” Mathematics of Control, Signals and Systems 2: 303–14.
https://doi.org/10.1007/BF02551274.
Dahl, G. E. 2012.
“Context-Dependent Pre-Trained Deep Neural Networks for Large Vocabulary Speech Recognition.” IEEE Transactions on Audio, Speech, and Language Processing 20: 33–42.
https://doi.org/10.1109/TASL.2011.2134090.
Dauphin, Yann, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. 2014.
“Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization.” In
Advances in Neural Information Processing Systems 27, 2933–41.
Curran Associates, Inc.
http://arxiv.org/abs/1406.2572.
Dieleman, Sander, and Benjamin Schrauwen. 2014.
“End to End Learning for Music Audio.” In
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6964–68.
IEEE.
https://doi.org/10.1109/ICASSP.2014.6854950.
Erhan, Dumitru, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. 2010.
“Why Does Unsupervised Pre-Training Help Deep Learning?” Journal of Machine Learning Research 11: 625–60.
http://www.jmlr.org/papers/v11/erhan10a.html.
Farabet, C. 2013.
“Learning Hierarchical Features for Scene Labeling.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35: 1915–29.
https://doi.org/10.1109/TPAMI.2012.231.
Fukumizu, K., and S. Amari. 2000.
“Local Minima and Plateaus in Hierarchical Structures of Multilayer Perceptrons.” Neural Networks 13 (3): 317–27.
https://doi.org/10.1016/S0893-6080(00)00009-5.
Fukushima, Kunihiko, and Sei Miyake. 1982.
“Neocognitron: A New Algorithm for Pattern Recognition Tolerant of Deformations and Shifts in Position.” Pattern Recognition 15 (6): 455–69.
https://doi.org/10.1016/0031-3203(82)90024-3.
Gal, Yarin, and Zoubin Ghahramani. 2016.
“A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 29.
http://arxiv.org/abs/1512.05287.
Garcia, C. 2004.
“Convolutional Face Finder: A Neural Architecture for Fast and Robust Face Detection.” IEEE Transactions on Pattern Analysis and Machine Intelligence 26: 1408–23.
https://doi.org/10.1109/TPAMI.2004.97.
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. 2015.
“A Neural Algorithm of Artistic Style.” August 26, 2015.
http://arxiv.org/abs/1508.06576.
Giryes, R., G. Sapiro, and A. M. Bronstein. 2016.
“Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?” IEEE Transactions on Signal Processing 64 (13): 3444–57.
https://doi.org/10.1109/TSP.2016.2546221.
Giryes, Raja, Guillermo Sapiro, and Alex M. Bronstein. 2014.
“On the Stability of Deep Networks.” December 18, 2014.
http://arxiv.org/abs/1412.5896.
Globerson, Amir, and Roi Livni. 2016.
“Learning Infinite-Layer Networks: Beyond the Kernel Trick.” June 16, 2016.
http://arxiv.org/abs/1606.05316.
Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. 2014.
“Explaining and Harnessing Adversarial Examples.” December 19, 2014.
http://arxiv.org/abs/1412.6572.
Goodfellow, Ian J., Oriol Vinyals, and Andrew M. Saxe. 2014.
“Qualitatively Characterizing Neural Network Optimization Problems.” December 19, 2014.
http://arxiv.org/abs/1412.6544.
Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014.
“Generative Adversarial Nets.” In
Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 2672–80.
NIPS’14. Cambridge, MA, USA: Curran Associates, Inc.
http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf.
Hadsell, R., S. Chopra, and Y. LeCun. 2006.
“Dimensionality Reduction by Learning an Invariant Mapping.” In
2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2:1735–42.
https://doi.org/10.1109/CVPR.2006.100.
Hasson, Uri, Samuel A. Nastase, and Ariel Goldstein. 2020.
“Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks.” Neuron 105 (3): 416–34.
https://doi.org/10.1016/j.neuron.2019.12.002.
He, Kun, Yan Wang, and John Hopcroft. 2016.
“A Powerful Generative Model Using Random Weights for the Deep Image Representation.” In
Advances in Neural Information Processing Systems.
http://arxiv.org/abs/1606.04801.
Helmstaedter, M. 2013.
“Connectomic Reconstruction of the Inner Plexiform Layer in the Mouse Retina.” Nature 500: 168–74.
https://doi.org/10.1038/nature12346.
Hinton, G., Li Deng, Dong Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, et al. 2012.
“Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.” IEEE Signal Processing Magazine 29 (6): 82–97.
https://doi.org/10.1109/MSP.2012.2205597.
Hinton, G. E. 1995.
“The Wake-Sleep Algorithm for Unsupervised Neural Networks.” Science 268 (5214): 1158–61.
https://doi.org/10.1126/science.7761831.
Hinton, Geoffrey. 2010.
“A Practical Guide to Training Restricted Boltzmann Machines.” In
Neural Networks: Tricks of the Trade, 9:926. Lecture Notes in Computer Science 7700. Springer Berlin Heidelberg.
http://www.csri.utoronto.ca/~hinton/absps/guideTR.pdf.
Hinton, Geoffrey E. 2007.
“To Recognize Shapes, First Learn to Generate Images.” In
Progress in Brain Research, edited by Paul Cisek, Trevor Drew, and John F. Kalaska, 165:535–47. Computational Neuroscience: Theoretical Insights into Brain Function. Elsevier.
http://www.cs.toronto.edu/~hinton/absps/montrealTR.pdf.
Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. 2006.
“Reducing the Dimensionality of Data with Neural Networks.” Science 313 (5786): 504–7.
https://doi.org/10.1126/science.1127647.
Hinton, G., S. Osindero, and Y. Teh. 2006.
“A Fast Learning Algorithm for Deep Belief Nets.” Neural Computation 18 (7): 1527–54.
https://doi.org/10.1162/neco.2006.18.7.1527.
Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. 1989.
“Multilayer Feedforward Networks Are Universal Approximators.” Neural Networks 2 (5): 359–66.
https://doi.org/10.1016/0893-6080(89)90020-8.
Hu, Tao, Cengiz Pehlevan, and Dmitri B. Chklovskii. 2014.
“A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization.” In
2014 48th Asilomar Conference on Signals, Systems and Computers.
https://doi.org/10.1109/ACSSC.2014.7094519.
Huang, Guang-Bin, and Chee-Kheong Siew. 2005.
“Extreme Learning Machine with Randomly Assigned RBF Kernels.” International Journal of Information Technology 11 (1): 16–24.
http://pop.intjit.org/journal/volume/11/1/111_2.pdf.
Huang, Guang-Bin, Dian Hui Wang, and Yuan Lan. 2011.
“Extreme Learning Machines: A Survey.” International Journal of Machine Learning and Cybernetics 2 (2): 107–22.
https://doi.org/10.1007/s13042-011-0019-y.
Huang, Guang-Bin, Qin-Yu Zhu, and Chee-Kheong Siew. 2004.
“Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks.” In
2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings, 2:985–90.
https://doi.org/10.1109/IJCNN.2004.1380068.
———. 2006.
“Extreme Learning Machine: Theory and Applications.” Neurocomputing 70 (1–3): 489–501.
https://doi.org/10.1016/j.neucom.2005.12.126.
Hubel, D. H., and T. N. Wiesel. 1962.
“Receptive Fields, Binocular Interaction and Functional Architecture in the Cat’s Visual Cortex.” The Journal of Physiology 160: 106–54.
https://doi.org/10.1113/jphysiol.1962.sp006837.
Jaderberg, Max, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, and Koray Kavukcuoglu. 2016.
“Decoupled Neural Interfaces Using Synthetic Gradients.” August 18, 2016.
http://arxiv.org/abs/1608.05343.
Kaiser, Łukasz, and Ilya Sutskever. 2015.
“Neural GPUs Learn Algorithms.” November 25, 2015.
http://arxiv.org/abs/1511.08228.
Kalchbrenner, Nal, Ivo Danihelka, and Alex Graves. 2016.
“Grid Long Short-Term Memory.” January 7, 2016.
http://arxiv.org/abs/1507.01526.
Kavukcuoglu, Koray, Marc’Aurelio Ranzato, and Yann LeCun. 2010.
“Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition.” October 17, 2010.
http://arxiv.org/abs/1010.3467.
Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016.
“Improving Variational Inference with Inverse Autoregressive Flow.” In
Advances in Neural Information Processing Systems 29.
Curran Associates, Inc.
http://arxiv.org/abs/1606.04934.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012.
“ImageNet Classification with Deep Convolutional Neural Networks.” In
Advances in Neural Information Processing Systems, 1097–1105.
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.
Kulkarni, Tejas D., Will Whitney, Pushmeet Kohli, and Joshua B. Tenenbaum. 2015.
“Deep Convolutional Inverse Graphics Network.” March 11, 2015.
http://arxiv.org/abs/1503.03167.
Larsen, Anders Boesen Lindbo, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. 2015.
“Autoencoding Beyond Pixels Using a Learned Similarity Metric.” December 31, 2015.
http://arxiv.org/abs/1512.09300.
Lawrence, S. 1997.
“Face Recognition: A Convolutional Neural-Network Approach.” IEEE Transactions on Neural Networks 8: 98–113.
https://doi.org/10.1109/72.554195.
LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. 1998.
“Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324.
https://doi.org/10.1109/5.726791.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015.
“Deep Learning.” Nature 521 (7553): 436–44.
https://doi.org/10.1038/nature14539.
LeCun, Yann, Sumit Chopra, Raia Hadsell, M. Ranzato, and F. Huang. 2006.
“A Tutorial on Energy-Based Learning.” Predicting Structured Data.
http://classes.soe.ucsc.edu/cmps290c/Spring12/lect/9/energytut.pdf.
Lee, Honglak, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. 2009.
“Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations.” In
Proceedings of the 26th Annual International Conference on Machine Learning, 609–16.
ICML ’09.
New York, NY, USA:
ACM.
https://doi.org/10.1145/1553374.1553453.
Lee, Wee Sun, Peter L. Bartlett, and Robert C. Williamson. 1996.
“Efficient Agnostic Learning of Neural Networks with Bounded Fan-in.” IEEE Transactions on Information Theory 42 (6): 2118–32.
https://doi.org/10.1109/18.556601.
Leung, M. K. 2014.
“Deep Learning of the Tissue-Regulated Splicing Code.” Bioinformatics 30: i121–29.
https://doi.org/10.1093/bioinformatics/btu277.
Liang, Feynman, Marcin Tomczak, Matt Johnson, Mark Gotham, Jamie Shotton, and Bill Byrne. n.d. “BachBot: Deep Generative Modeling of Bach Chorales.”
Lin, Henry W., and Max Tegmark. 2016a.
“Critical Behavior from Deep Dynamics: A Hidden Dimension in Natural Language.” June 21, 2016.
http://arxiv.org/abs/1606.06737.
———. 2016b.
“Why Does Deep and Cheap Learning Work So Well?” August 29, 2016.
http://arxiv.org/abs/1608.08225.
Lipton, Zachary C. 2016a.
“Stuck in a What? Adventures in Weight Space.” February 23, 2016.
http://arxiv.org/abs/1602.07320.
———. 2016b.
“The Mythos of Model Interpretability.” In Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI).
http://arxiv.org/abs/1606.03490.
Lipton, Zachary C., John Berkowitz, and Charles Elkan. 2015.
“A Critical Review of Recurrent Neural Networks for Sequence Learning.” May 29, 2015.
http://arxiv.org/abs/1506.00019.
Ma, J. 2015.
“Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships.” Journal of Chemical Information and Modeling 55: 263–74.
https://doi.org/10.1021/ci500747n.
Maclaurin, Dougal, David K. Duvenaud, and Ryan P. Adams. 2015.
“Gradient-Based Hyperparameter Optimization Through Reversible Learning.” In
ICML, 2113–22.
http://www.jmlr.org/proceedings/papers/v37/maclaurin15.pdf.
Mallat, Stéphane. 2012.
“Group Invariant Scattering.” Communications on Pure and Applied Mathematics 65 (10): 1331–98.
https://doi.org/10.1002/cpa.21413.
———. 2016.
“Understanding Deep Convolutional Networks.” January 19, 2016.
http://arxiv.org/abs/1601.04920.
Mehta, Pankaj, and David J. Schwab. 2014.
“An Exact Mapping Between the Variational Renormalization Group and Deep Learning.” October 14, 2014.
http://arxiv.org/abs/1410.3831.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013.
“Efficient Estimation of Word Representations in Vector Space.” January 16, 2013.
http://arxiv.org/abs/1301.3781.
Mikolov, Tomas, Quoc V. Le, and Ilya Sutskever. 2013.
“Exploiting Similarities Among Languages for Machine Translation.” September 16, 2013.
http://arxiv.org/abs/1309.4168.
Mnih, V. 2015.
“Human-Level Control Through Deep Reinforcement Learning.” Nature 518: 529–33.
https://doi.org/10.1038/nature14236.
Mohamed, A.-r., G. E. Dahl, and G. Hinton. 2012.
“Acoustic Modeling Using Deep Belief Networks.” IEEE Transactions on Audio, Speech, and Language Processing 20 (1): 14–22.
https://doi.org/10.1109/TASL.2011.2109382.
Monner, Derek, and James A. Reggia. 2012.
“A Generalized LSTM-Like Training Algorithm for Second-Order Recurrent Neural Networks.” Neural Networks 25 (January): 70–83.
https://doi.org/10.1016/j.neunet.2011.07.003.
Montufar, G. 2014.
“When Does a Mixture of Products Contain a Product of Mixtures?” SIAM Journal on Discrete Mathematics 29: 321–47.
https://doi.org/10.1137/140957081.
Mousavi, Ali, and Richard G. Baraniuk. 2017.
“Learning to Invert: Signal Recovery via Deep Convolutional Networks.” In
ICASSP.
http://arxiv.org/abs/1701.03891.
Ning, F. 2005.
“Toward Automatic Phenotyping of Developing Embryos from Videos.” IEEE Transactions on Image Processing 14: 1360–71.
https://doi.org/10.1109/TIP.2005.852470.
Nøkland, Arild. 2016.
“Direct Feedback Alignment Provides Learning in Deep Neural Networks.” In
Advances in Neural Information Processing Systems.
http://arxiv.org/abs/1609.01596.
Olshausen, B. A., and D. J. Field. 1996.
“Natural Image Statistics and Efficient Coding.” Network: Computation in Neural Systems 7 (2): 333–39.
https://doi.org/10.1088/0954-898X/7/2/014.
Olshausen, Bruno A., and David J. Field. 1996.
“Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images.” Nature 381 (6583): 607–9.
https://doi.org/10.1038/381607a0.
Olshausen, Bruno A., and David J. Field. 2004.
“Sparse Coding of Sensory Inputs.” Current Opinion in Neurobiology 14 (4): 481–87.
https://doi.org/10.1016/j.conb.2004.07.007.
Oord, Aäron van den. 2016. “WaveNet: A Generative Model for Raw Audio.”
Oord, Aäron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. 2016.
“Pixel Recurrent Neural Networks.” January 25, 2016.
http://arxiv.org/abs/1601.06759.
Oord, Aäron van den, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, and Koray Kavukcuoglu. 2016.
“Conditional Image Generation with PixelCNN Decoders.” June 16, 2016.
http://arxiv.org/abs/1606.05328.
Pan, Wei, Hao Dong, and Yike Guo. 2016.
“DropNeuron: Simplifying the Structure of Deep Neural Networks.” June 23, 2016.
http://arxiv.org/abs/1606.07326.
Parisotto, Emilio, and Ruslan Salakhutdinov. 2017.
“Neural Map: Structured Memory for Deep Reinforcement Learning.” February 27, 2017.
http://arxiv.org/abs/1702.08360.
Pascanu, Razvan, Yann N. Dauphin, Surya Ganguli, and Yoshua Bengio. 2014.
“On the Saddle Point Problem for Non-Convex Optimization.” May 19, 2014.
http://arxiv.org/abs/1405.4604.
Paul, Arnab, and Suresh Venkatasubramanian. 2014.
“Why Does Deep Learning Work? - A Perspective from Group Theory.” December 20, 2014.
http://arxiv.org/abs/1412.6621.
Pinkus, Allan. 1999.
“Approximation Theory of the MLP Model in Neural Networks.” Acta Numerica 8 (January): 143–95.
https://doi.org/10.1017/S0962492900002919.
Radford, Alec, Luke Metz, and Soumith Chintala. 2015.
“Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.” In Proceedings of ICLR 2016.
http://arxiv.org/abs/1511.06434.
Ranzato, M. 2013.
“Modeling Natural Images Using Gated MRFs.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9): 2206–22.
https://doi.org/10.1109/TPAMI.2013.29.
Ranzato, Marc’Aurelio, Y-Lan Boureau, and Yann LeCun. 2008.
“Sparse Feature Learning for Deep Belief Networks.” In
Advances in Neural Information Processing Systems 20, edited by J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, 1185–92.
Curran Associates, Inc.
http://papers.nips.cc/paper/3363-sparse-feature-learning-for-deep-belief-networks.pdf.
Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986.
“Learning Representations by Back-Propagating Errors.” Nature 323 (6088): 533–36.
https://doi.org/10.1038/323533a0.
Sagun, Levent, V. Ugur Guney, Gerard Ben Arous, and Yann LeCun. 2014.
“Explorations on High Dimensional Landscapes.” December 20, 2014.
http://arxiv.org/abs/1412.6615.
Salimans, Tim, and Diederik P. Kingma. 2016.
“Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.” In
Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 901–9.
Curran Associates, Inc.
http://papers.nips.cc/paper/6114-weight-normalization-a-simple-reparameterization-to-accelerate-training-of-deep-neural-networks.pdf.
Scardapane, Simone, Danilo Comminiello, Amir Hussain, and Aurelio Uncini. 2016.
“Group Sparse Regularization for Deep Neural Networks.” July 2, 2016.
http://arxiv.org/abs/1607.00485.
Shazeer, Noam, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017.
“Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.” January 23, 2017.
http://arxiv.org/abs/1701.06538.
Shwartz-Ziv, Ravid, and Naftali Tishby. 2017.
“Opening the Black Box of Deep Neural Networks via Information.” March 2, 2017.
http://arxiv.org/abs/1703.00810.
Smith, Leslie N., and Nicholay Topin. 2017.
“Exploring Loss Function Topology with Cyclical Learning Rates.” February 14, 2017.
http://arxiv.org/abs/1702.04283.
Springenberg, Jost Tobias, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 2014.
“Striving for Simplicity: The All Convolutional Net.” In
Proceedings of International Conference on Learning Representations (ICLR) 2015.
http://arxiv.org/abs/1412.6806.
Telgarsky, Matus. 2015.
“Representation Benefits of Deep Feedforward Networks.” September 27, 2015.
http://arxiv.org/abs/1509.08101.
Turaga, S. C. 2010.
“Convolutional Networks Can Learn to Generate Affinity Graphs for Image Segmentation.” Neural Computation 22: 511–38.
https://doi.org/10.1162/neco.2009.10-08-881.
Urban, Gregor, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, and Matt Richardson. 2016.
“Do Deep Convolutional Nets Really Need to Be Deep (Or Even Convolutional)?” March 17, 2016.
http://arxiv.org/abs/1603.05691.
Wiatowski, Thomas, and Helmut Bölcskei. 2015.
“A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction.” In
Proceedings of IEEE International Symposium on Information Theory.
http://arxiv.org/abs/1512.06293.
Wiatowski, Thomas, Philipp Grohs, and Helmut Bölcskei. 2018.
“Energy Propagation in Deep Convolutional Neural Networks.” IEEE Transactions on Information Theory 64 (7): 4819–42.
https://doi.org/10.1109/TIT.2017.2756880.
Xie, Bo, Yingyu Liang, and Le Song. 2016.
“Diversity Leads to Generalization in Neural Networks.” November 9, 2016.
http://arxiv.org/abs/1611.03131.
Yu, D., and L. Deng. 2011.
“Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP].” IEEE Signal Processing Magazine 28 (1): 145–54.
https://doi.org/10.1109/MSP.2010.939038.
Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2017.
“Understanding Deep Learning Requires Rethinking Generalization.” In
Proceedings of ICLR.
http://arxiv.org/abs/1611.03530.
Zhang, Sixin, Anna Choromanska, and Yann LeCun. 2015.
“Deep Learning with Elastic Averaging SGD.” In
Advances in Neural Information Processing Systems.
http://arxiv.org/abs/1412.6651.