Neural nets

Designing the fanciest usable differentiable loss surface



Bjorn Stenger’s brief history of machine learning.

Modern computational neural network methods reascend the hype phase transition. a.k.a. deep learning, or double-plus-fancy brainbots, or please give the department a bigger GPU budget, it’s not to play video games I swear.

I don’t intend to write an introduction to deep learning here; that ground has been tilled already.

But here are some handy links to resources I frequently use and a bit of under-discussed background.

What?

To be specific, deep learning is

  • a library of incremental improvements in areas such as Stochastic Gradient Descent, approximation theory, graphical models, and signal processing research, plus some handy advancements in SIMD architectures that, taken together, surprisingly elicit the kind of results from machine learning that everyone was hoping we’d get at least 20 years ago, yet without requiring us to develop substantially more clever grad students to do so, or,
  • the state-of-the-art in artificial kitten recognition.
  • a metastasizing buzzword.

It’s a frothy (some might say foamy-mouthed) research bubble right now, with such cuteness at the extrema as, e.g., Inceptionising inceptionism (Andrychowicz et al. 2016), which learns to learn neural networks using neural networks (well, it sort of does that, but it is a long way from a bootstrapping general AI). Stay tuned for more of this.

There is not much to do with “neurons” left in the paradigm at this stage. What there is, is a bundle of clever tricks for training deep constrained hierarchical predictors and classifiers on modern computer hardware. Something closer to a convenient technology stack than a single “theory”.

Some network methods hew closer to the behaviour of real neurons, although not that close; simulating actual brains is a different discipline with only an intermittent and indirect connection.

Subtopics of interest to me:

Why bother?

There are many answers.

The ultimate regression algorithm

…until the next ultimate regression algorithm.

It turns out that this particular learning model (class of learning models) and its training technologies are surprisingly good at getting ever better models out of ever more data. Why burn three grad students on a perfectly tractable and specific regression algorithm when you can use one algorithm to solve a whole bunch of regression problems, one which improves with the number of computers and the amount of data you have? How much of a relief is it to capital to decouple its effectiveness from the uncertainty and obstreperousness of human labour?

Cool maths

Function approximations, interesting manifold inference. Weird product measure things, e.g. (Montufar 2014).

Even the stuff I’d assumed was trivial, like backpropagation, has a few wrinkles in practice. See Michael Nielsen’s chapter and Christopher Olah’s visual summary.
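
One of those wrinkles is simply convincing yourself that hand-derived gradients are right. Here is a minimal sketch, assuming nothing beyond numpy and illustrative dimensions: backpropagation through a one-hidden-layer network, checked against a finite difference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer network: x -> tanh(W1 x) -> W2 h, squared loss.
W1 = rng.normal(size=(4, 3)) * 0.1
W2 = rng.normal(size=(1, 4)) * 0.1
x = rng.normal(size=(3,))
y = np.array([1.0])

def loss(W1, W2):
    h = np.tanh(W1 @ x)
    yhat = W2 @ h
    return 0.5 * np.sum((yhat - y) ** 2)

# Forward pass, keeping intermediates for the backward pass.
a = W1 @ x
h = np.tanh(a)
yhat = W2 @ h
err = yhat - y                      # dL/dyhat

# Backward pass: chain rule, layer by layer.
dW2 = np.outer(err, h)              # dL/dW2
dh = W2.T @ err                     # dL/dh
da = dh * (1 - h ** 2)              # back through tanh
dW1 = np.outer(da, x)               # dL/dW1

# Finite-difference check on one entry of W1.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
numeric = (loss(W1p, W2) - loss(W1, W2)) / eps
print(dW1[0, 0], numeric)           # should agree to several decimals
```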

Yes, this is a regular paper mill. Not only are there probably new insights to be had, but also you can recycle any old machine learning insight, replace a layer in a network with that and poof: new paper.

Insight into the mind

πŸ— Maybe.

There are claims of communication between real neurology and neural networks in computer vision, but elsewhere neural networks are driven by their similarities to other things: being differentiable relaxations of traditional models (differentiable stack machines!), or being a license to fit hierarchical models without regard for statistical niceties.

There might be some kind of occasional “stylised fact”-type relationship.

For some works which lean harder into this, try neuronal neural networks.

Trippy art projects

See generative art and neural networks

Hip keywords for NN models

Not necessarily mutually exclusive; some design patterns you can use.

There are many summaries floating around. Some that I looked at are Tomasz Malisiewicz’s summary of Deep Learning Trends @ ICLR 2016, or the Neural network zoo or Simon Brugman’s deep learning papers.

Some of these are descriptions of topologies, others of training tricks or whatever. Recurrent and convolutional are two types of topologies you might have in your ANN. But there are so many other possible ones: “Grid”, “highway”, “Turing” and others…

Many are mentioned in passing in David McAllester’s Cognitive Architectures post.

Probabilistic/variational

See probabilistic Neural Networks.

Convolutional

See the convnets entry.

Generative Adversarial Networks

Train two networks to beat each other.
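
A minimal sketch of the adversarial game, assuming PyTorch and a toy one-dimensional “dataset” (everything here is illustrative): the discriminator is trained to tell real samples from generated ones, and the generator is trained to fool it.

```python
import torch
from torch import nn

# Toy 1-D GAN: G maps noise to samples, D scores how "real" a sample looks.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=64):
    return torch.randn(n, 1) * 0.5 + 2.0        # stand-in "data": a shifted Gaussian

for step in range(1000):
    # Discriminator step: push real samples towards label 1, generated ones towards 0.
    x = real_batch()
    fake = G(torch.randn(x.shape[0], 8)).detach()   # detach: don't update G here
    d_real, d_fake = D(x), D(fake)
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to make D label fresh fakes as real.
    scores = D(G(torch.randn(64, 8)))
    loss_g = bce(scores, torch.ones_like(scores))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```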

Recurrent neural networks

Feedback neural network structures with memory, a notion of time, and a distinction between “current” and “past” state. See recurrent neural networks.
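
A minimal sketch of the idea in numpy, with illustrative dimensions: the hidden state is the only thing carried forward, so it has to summarise everything seen so far.

```python
import numpy as np

rng = np.random.default_rng(1)

# A bare-bones recurrent cell: the hidden state h carries "past" into "present".
W_xh = rng.normal(size=(16, 4)) * 0.1   # input -> hidden
W_hh = rng.normal(size=(16, 16)) * 0.1  # hidden -> hidden (the feedback loop)
b = np.zeros(16)

def rnn_step(h, x):
    return np.tanh(W_xh @ x + W_hh @ h + b)

h = np.zeros(16)
sequence = rng.normal(size=(10, 4))     # 10 time steps of 4-dimensional input
for x_t in sequence:
    h = rnn_step(h, x_t)                # h now summarises the sequence so far
```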

Transfer learning

I have seen two versions of this term.

One starts from the idea that if you have, say, a network that solves some particular computer vision problem well, you can possibly use it to solve another computer vision problem without starting from scratch. This is the Recycling someone else’s features framing. I don’t know why this has a special term - I think it’s so that you can claim to do “end-to-end” learning, but then actually do what everyone else has done forever and which works totally OK, which is to re-use other people’s work like real scientists.
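
A minimal sketch of that feature-recycling version, assuming PyTorch and torchvision are installed (the exact `weights` argument varies across torchvision versions, and the 10-class head is a hypothetical example): freeze a pretrained backbone and train only a new output layer.

```python
import torch
from torch import nn
from torchvision import models

# Grab a network trained on ImageNet, freeze its feature extractor, and bolt
# on a new head for a hypothetical 10-class problem.
backbone = models.resnet18(weights="IMAGENET1K_V1")
for p in backbone.parameters():
    p.requires_grad = False                             # keep recycled features fixed

backbone.fc = nn.Linear(backbone.fc.in_features, 10)    # only this layer will train

optimiser = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# ...then the usual loop: criterion(backbone(images), labels), backward, step.
```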

The other version is you would like to do domain adaptation, which is to say, to learn on one dataset but still make good predictions on a different dataset.

These two things can clearly be related if you squint hard. Using ‘transfer learning’ in this second sense irritates me slightly because it already has so many names: I would describe that problem as external validity instead of domain adaptation, but other names spotted in the wild include dataset shift, covariate shift, data fusion, and there are probably more. This is a fundamental problem in statistics, and in the philosophy of science generally, and has been for a long time.

Attention mechanism

See Attention mechanism.

Spike-based

Most simulated neural networks are based on a continuous activation potential and discrete time, unlike spiking biological ones, which are driven by discrete events in continuous time. There are a great many other differences from real biology. What difference does this one in particular make? I suspect it means that time is handled differently.
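
For a flavour of the difference, here is a minimal leaky integrate-and-fire neuron in numpy: the state evolves continuously (crudely Euler-discretised here) and the output is a set of discrete spike times rather than an activation value. All constants are illustrative.

```python
import numpy as np

# Leaky integrate-and-fire neuron: the membrane potential v integrates input
# current, and a discrete spike is emitted whenever v crosses a threshold,
# after which it resets.
dt, T = 1e-3, 0.5                     # time step (s), duration (s)
tau, v_thresh, v_reset = 0.02, 1.0, 0.0
rng = np.random.default_rng(2)

v, spikes = 0.0, []
for step in range(int(T / dt)):
    current = 1.2 + 0.5 * rng.normal()        # noisy input drive
    v += dt / tau * (-v + current)            # leaky integration
    if v >= v_thresh:                         # threshold crossing = spike event
        spikes.append(step * dt)
        v = v_reset

print(f"{len(spikes)} spikes in {T} s")
```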

Kernel networks

Kernel trick + ANN = kernel ANNs.

(Stay tuned for reframing more things as deep learning.)

Is this what convex networks (Bengio et al. 2005) are?

Francis Bach:

I’m sure the brain totes does this

AFAICT these all boil down to rebadged extensions of Gaussian processes but maybe I’m missing something?
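
One concrete point of contact, sketched here with random Fourier features in plain numpy (an illustration of the random-features/kernel connection, not any particular paper’s construction): a fixed random “hidden layer” whose inner products approximate an RBF kernel, with only a linear readout being fitted.

```python
import numpy as np

rng = np.random.default_rng(3)

# Random Fourier features: a fixed random cosine layer approximating an RBF
# kernel; learning happens only in the linear readout on top.
def rff(X, n_features=200, lengthscale=1.0):
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = rng.normal(size=(100, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

Phi = rff(X)
lam = 1e-2                               # ridge-regularised linear readout
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
print("train MSE:", np.mean((Phi @ w - y) ** 2))
```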

Autoencoding

πŸ— Making a sparse encoding of something by demanding your network reproduces the after passing the network activations through a narrow bottleneck. Many flavours.

Optimisation methods

Backpropagation plus stochastic gradient descent rules at the moment.

Does anything else get performance at this scale? What other techniques can be extracted from variational inference or MC sampling, or particle filters, since there is no clear reason that shoving any of these in as intermediate layers in the network is any less well-posed than a classical backprop layer? Although it does require more nous from the enthusiastic grad student.
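
For concreteness, here is the workhorse recipe the previous paragraphs refer to, sketched in PyTorch on a toy regression problem: compute a minibatch loss, backpropagate, take a stochastic gradient step, repeat.

```python
import torch
from torch import nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

X = torch.randn(512, 10)
y = X.sum(dim=1, keepdim=True)          # toy regression target

for epoch in range(20):
    perm = torch.randperm(X.shape[0])   # shuffle each epoch
    for i in range(0, X.shape[0], 64):  # minibatches of 64
        idx = perm[i:i + 64]
        loss = F.mse_loss(model(X[idx]), y[idx])
        opt.zero_grad()
        loss.backward()                 # backpropagation via autograd
        opt.step()                      # stochastic gradient update
```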

Preventing overfitting

See regularising deep learning.

Activations for neural networks

See activation functions

Implementing

See implementing neural nets.

References

Amari, Shun-ichi. 1998. “Natural Gradient Works Efficiently in Learning.” Neural Computation 10 (2): 251–76.
Amari, Shunichi. 1967. “A Theory of Adaptive Pattern Classifiers.” IEEE Transactions on Electronic Computers EC-16 (3): 299–307.
Andrychowicz, Marcin, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, and Nando de Freitas. 2016. “Learning to Learn by Gradient Descent by Gradient Descent.” arXiv:1606.04474 [Cs], June.
Arel, I, D C Rose, and T P Karnowski. 2010. “Deep Machine Learning - A New Frontier in Artificial Intelligence Research [Research Frontier].” IEEE Computational Intelligence Magazine 5 (4): 13–18.
Arora, Sanjeev, Rong Ge, Tengyu Ma, and Ankur Moitra. 2015. “Simple, Efficient, and Neural Algorithms for Sparse Coding.” In Proceedings of The 28th Conference on Learning Theory, 40:113–49. Paris, France: PMLR.
Bach, Francis. 2014. “Breaking the Curse of Dimensionality with Convex Neural Networks.” arXiv:1412.8690 [Cs, Math, Stat], December.
Baldassi, Carlo, Christian Borgs, Jennifer T. Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, and Riccardo Zecchina. 2016. “Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes.” Proceedings of the National Academy of Sciences 113 (48): E7655–62.
Barron, A.R. 1993. “Universal Approximation Bounds for Superpositions of a Sigmoidal Function.” IEEE Transactions on Information Theory 39 (3): 930–45.
Baydin, Atılım Güneş, Barak A. Pearlmutter, and Jeffrey Mark Siskind. 2016. “Tricks from Deep Learning.” arXiv:1611.03777 [Cs, Stat], November.
Bengio, Yoshua. 2009. Learning Deep Architectures for AI. Vol. 2.
Bengio, Yoshua, Aaron Courville, and Pascal Vincent. 2013. “Representation Learning: A Review and New Perspectives.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35: 1798–828.
Bengio, Yoshua, and Yann LeCun. 2007. “Scaling Learning Algorithms Towards AI.” Large-Scale Kernel Machines 34: 1–41.
Bengio, Yoshua, Nicolas L. Roux, Pascal Vincent, Olivier Delalleau, and Patrice Marcotte. 2005. “Convex Neural Networks.” In Advances in Neural Information Processing Systems, 18:123–30. MIT Press.
Boser, B. 1991. “An Analog Neural Network Processor with Programmable Topology.” J. Solid State Circuits 26: 2017–25.
Brock, Andrew, Theodore Lim, J. M. Ritchie, and Nick Weston. 2017. “FreezeOut: Accelerate Training by Progressively Freezing Layers.” arXiv:1706.04983 [Cs, Stat], June.
Cadieu, C. F. 2014. “Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition.” PLoS Comp. Biol. 10: e1003963.
Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. 2015. “Net2Net: Accelerating Learning via Knowledge Transfer.” arXiv:1511.05641 [Cs], November.
Cho, Kyunghyun, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. “On the Properties of Neural Machine Translation: Encoder-Decoder Approaches.” arXiv Preprint arXiv:1409.1259.
Choromanska, Anna, Mikael Henaff, Michael Mathieu, Gerard Ben Arous, and Yann LeCun. 2015. “The Loss Surfaces of Multilayer Networks.” In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 192–204.
Ciodaro, T. 2012. “Online Particle Detection with Neural Networks Based on Topological Calorimetry Information.” J. Phys. Conf. Series 368: 012030.
Ciresan, D. 2012. “Multi-Column Deep Neural Network for Traffic Sign Classification.” Neural Networks 32: 333–38.
Cybenko, G. 1989. “Approximation by Superpositions of a Sigmoidal Function.” Mathematics of Control, Signals and Systems 2: 303–14.
Dahl, G. E. 2012. “Context-Dependent Pre-Trained Deep Neural Networks for Large Vocabulary Speech Recognition.” IEEE Transactions on Audio, Speech and Language Processing 20: 33–42.
Dauphin, Yann, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. 2014. “Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization.” In Advances in Neural Information Processing Systems 27, 2933–41. Curran Associates, Inc.
Dieleman, Sander, and Benjamin Schrauwen. 2014. “End to End Learning for Music Audio.” In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6964–68. IEEE.
Erhan, Dumitru, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. 2010. “Why Does Unsupervised Pre-Training Help Deep Learning?” Journal of Machine Learning Research 11 (Feb): 625–60.
Farabet, C. 2013. “Learning Hierarchical Features for Scene Labeling.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35: 1915–29.
Fukumizu, K., and S. Amari. 2000. “Local Minima and Plateaus in Hierarchical Structures of Multilayer Perceptrons.” Neural Networks 13 (3): 317–27.
Fukushima, Kunihiko, and Sei Miyake. 1982. “Neocognitron: A New Algorithm for Pattern Recognition Tolerant of Deformations and Shifts in Position.” Pattern Recognition 15 (6): 455–69.
Gal, Yarin, and Zoubin Ghahramani. 2016. “A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.” In arXiv:1512.05287 [Stat].
Garcia, C. 2004. “Convolutional Face Finder: A Neural Architecture for Fast and Robust Face Detection.” IEEE Transactions on Pattern Analysis and Machine Intelligence 26: 1408–23.
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. 2015. “A Neural Algorithm of Artistic Style.” arXiv:1508.06576 [Cs, q-Bio], August.
Giryes, Raja, Guillermo Sapiro, and Alex M. Bronstein. 2014. “On the Stability of Deep Networks.” arXiv:1412.5896 [Cs, Math, Stat], December.
Giryes, R., G. Sapiro, and A. M. Bronstein. 2016. “Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?” IEEE Transactions on Signal Processing 64 (13): 3444–57.
Globerson, Amir, and Roi Livni. 2016. “Learning Infinite-Layer Networks: Beyond the Kernel Trick.” arXiv:1606.05316 [Cs], June.
Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” arXiv:1412.6572 [Cs, Stat], December.
Goodfellow, Ian J., Oriol Vinyals, and Andrew M. Saxe. 2014. “Qualitatively Characterizing Neural Network Optimization Problems.” arXiv:1412.6544 [Cs, Stat], December.
Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. “Generative Adversarial Nets.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 2672–80. NIPS’14. Cambridge, MA, USA: Curran Associates, Inc.
Hadsell, R., S. Chopra, and Y. LeCun. 2006. “Dimensionality Reduction by Learning an Invariant Mapping.” In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2:1735–42.
Hasson, Uri, Samuel A. Nastase, and Ariel Goldstein. 2020. “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks.” Neuron 105 (3): 416–34.
He, Kun, Yan Wang, and John Hopcroft. 2016. “A Powerful Generative Model Using Random Weights for the Deep Image Representation.” In Advances in Neural Information Processing Systems.
Helmstaedter, M. 2013. “Connectomic Reconstruction of the Inner Plexiform Layer in the Mouse Retina.” Nature 500: 168–74.
Hinton, G. E. 1995. “The Wake-Sleep Algorithm for Unsupervised Neural Networks.” Science 268 (5214): 1158–61.
Hinton, G., Li Deng, Dong Yu, G.E. Dahl, A. Mohamed, N. Jaitly, A. Senior, et al. 2012. “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.” IEEE Signal Processing Magazine 29 (6): 82–97.
Hinton, Geoffrey. 2010. “A Practical Guide to Training Restricted Boltzmann Machines.” In Neural Networks: Tricks of the Trade, 9:926. Lecture Notes in Computer Science 7700. Springer Berlin Heidelberg.
Hinton, Geoffrey E. 2007. “To Recognize Shapes, First Learn to Generate Images.” In Progress in Brain Research, edited by Paul Cisek, Trevor Drew, and John F. Kalaska, Volume 165:535–47. Computational Neuroscience: Theoretical Insights into Brain Function. Elsevier.
Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313 (5786): 504–7.
Hinton, G, S Osindero, and Y Teh. 2006. “A Fast Learning Algorithm for Deep Belief Nets.” Neural Computation 18 (7): 1527–54.
Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. 1989. “Multilayer Feedforward Networks Are Universal Approximators.” Neural Networks 2 (5): 359–66.
Hu, Tao, Cengiz Pehlevan, and Dmitri B. Chklovskii. 2014. “A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization.” In 2014 48th Asilomar Conference on Signals, Systems and Computers.
Huang, Guang-Bin, and Chee-Kheong Siew. 2005. “Extreme Learning Machine with Randomly Assigned RBF Kernels.” International Journal of Information Technology 11 (1): 16–24.
Huang, Guang-Bin, Dian Hui Wang, and Yuan Lan. 2011. “Extreme Learning Machines: A Survey.” International Journal of Machine Learning and Cybernetics 2 (2): 107–22.
Huang, Guang-Bin, Qin-Yu Zhu, and Chee-Kheong Siew. 2004. “Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks.” In 2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings, 2:985–990 vol.2.
Huang, Guang-Bin, Qin-Yu Zhu, and Chee-Kheong Siew. 2006. “Extreme Learning Machine: Theory and Applications.” Neurocomputing, Neural Networks Selected Papers from the 7th Brazilian Symposium on Neural Networks (SBRN ’04) 7th Brazilian Symposium on Neural Networks, 70 (1–3): 489–501.
Hubel, D. H. 1962. “Receptive Fields, Binocular Interaction, and Functional Architecture in the Cat’s Visual Cortex.” J. Physiol. 160: 106–54.
Jaderberg, Max, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, and Koray Kavukcuoglu. 2016. “Decoupled Neural Interfaces Using Synthetic Gradients.” arXiv:1608.05343 [Cs], August.
Kaiser, Łukasz, and Ilya Sutskever. 2015. “Neural GPUs Learn Algorithms.” arXiv:1511.08228 [Cs], November.
Kalchbrenner, Nal, Ivo Danihelka, and Alex Graves. 2016. “Grid Long Short-Term Memory.” arXiv:1507.01526 [Cs], January.
Kavukcuoglu, Koray, Marc’Aurelio Ranzato, and Yann LeCun. 2010. “Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition.” arXiv:1010.3467 [Cs], October.
Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. “Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29. Curran Associates, Inc.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012. “Imagenet Classification with Deep Convolutional Neural Networks.” In Advances in Neural Information Processing Systems, 1097–1105.
Kulkarni, Tejas D., Will Whitney, Pushmeet Kohli, and Joshua B. Tenenbaum. 2015. “Deep Convolutional Inverse Graphics Network.” arXiv:1503.03167 [Cs], March.
Larsen, Anders Boesen Lindbo, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. 2015. “Autoencoding Beyond Pixels Using a Learned Similarity Metric.” arXiv:1512.09300 [Cs, Stat], December.
Lawrence, S. 1997. “Face Recognition: A Convolutional Neural-Network Approach.” IEEE Transactions on Neural Networks 8: 98–113.
LeCun, Y. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–44.
LeCun, Yann, Sumit Chopra, Raia Hadsell, M. Ranzato, and F. Huang. 2006. “A Tutorial on Energy-Based Learning.” In Predicting Structured Data.
Lee, Honglak, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. 2009. “Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations.” In Proceedings of the 26th Annual International Conference on Machine Learning, 609–16. ICML ’09. New York, NY, USA: ACM.
Lee, Wee Sun, Peter L. Bartlett, and Robert C. Williamson. 1996. “Efficient Agnostic Learning of Neural Networks with Bounded Fan-in.” IEEE Transactions on Information Theory 42 (6): 2118–32.
Leung, M. K. 2014. “Deep Learning of the Tissue-Regulated Splicing Code.” Bioinformatics 30: i121–29.
Liang, Feynman, Marcin Tomczak, Matt Johnson, Mark Gotham, Jamie Shotton, and Bill Byrne. n.d. “BachBot: Deep Generative Modeling of Bach Chorales,” 1.
Lin, Henry W., and Max Tegmark. 2016a. “Critical Behavior from Deep Dynamics: A Hidden Dimension in Natural Language.” arXiv:1606.06737 [Cond-Mat], June.
Lin, Henry W., and Max Tegmark. 2016b. “Why Does Deep and Cheap Learning Work so Well?” arXiv:1608.08225 [Cond-Mat, Stat], August.
Lipton, Zachary C. 2016a. “Stuck in a What? Adventures in Weight Space.” arXiv:1602.07320 [Cs], February.
Lipton, Zachary C. 2016b. “The Mythos of Model Interpretability.” In arXiv:1606.03490 [Cs, Stat].
Lipton, Zachary C., John Berkowitz, and Charles Elkan. 2015. “A Critical Review of Recurrent Neural Networks for Sequence Learning.” arXiv:1506.00019 [Cs], May.
Ma, J. 2015. “Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships.” J. Chem. Inf. Model. 55: 263–74.
Maclaurin, Dougal, David Duvenaud, and Ryan Adams. 2015. “Gradient-Based Hyperparameter Optimization Through Reversible Learning.” In Proceedings of the 32nd International Conference on Machine Learning, 2113–22. PMLR.
Mallat, Stéphane. 2012. “Group Invariant Scattering.” Communications on Pure and Applied Mathematics 65 (10): 1331–98.
Mallat, Stéphane. 2016. “Understanding Deep Convolutional Networks.” arXiv:1601.04920 [Cs, Stat], January.
Mehta, Pankaj, and David J. Schwab. 2014. “An Exact Mapping Between the Variational Renormalization Group and Deep Learning.” arXiv.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” arXiv:1301.3781 [Cs], January.
Mikolov, Tomas, Quoc V. Le, and Ilya Sutskever. 2013. “Exploiting Similarities Among Languages for Machine Translation.” arXiv:1309.4168 [Cs], September.
Mnih, V. 2015. “Human-Level Control Through Deep Reinforcement Learning.” Nature 518: 529–33.
Mohamed, A.-r., G. E. Dahl, and G. Hinton. 2012. “Acoustic Modeling Using Deep Belief Networks.” IEEE Transactions on Audio, Speech, and Language Processing 20 (1): 14–22.
Monner, Derek, and James A. Reggia. 2012. “A Generalized LSTM-Like Training Algorithm for Second-Order Recurrent Neural Networks.” Neural Networks 25 (January): 70–83.
Montufar, G. 2014. “When Does a Mixture of Products Contain a Product of Mixtures?” J. Discrete Math. 29: 321–47.
Mousavi, Ali, and Richard G. Baraniuk. 2017. “Learning to Invert: Signal Recovery via Deep Convolutional Networks.” In ICASSP.
Ning, F. 2005. “Toward Automatic Phenotyping of Developing Embryos from Videos.” IEEE Transactions on Image Processing 14: 1360–71.
Nøkland, Arild. 2016. “Direct Feedback Alignment Provides Learning in Deep Neural Networks.” In Advances In Neural Information Processing Systems.
Olshausen, B. A., and D. J. Field. 1996. “Natural image statistics and efficient coding.” Network (Bristol, England) 7 (2): 333–39.
Olshausen, Bruno A., and David J. Field. 1996. “Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images.” Nature 381 (6583): 607–9.
Olshausen, Bruno A, and David J Field. 2004. “Sparse Coding of Sensory Inputs.” Current Opinion in Neurobiology 14 (4): 481–87.
Oord, Aaron van den, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. 2016. “WaveNet: A Generative Model for Raw Audio.” In 9th ISCA Speech Synthesis Workshop.
Oord, Aäron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. 2016. “Pixel Recurrent Neural Networks.” arXiv:1601.06759 [Cs], January.
Oord, Aäron van den, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, and Koray Kavukcuoglu. 2016. “Conditional Image Generation with PixelCNN Decoders.” arXiv:1606.05328 [Cs], June.
Pan, Wei, Hao Dong, and Yike Guo. 2016. “DropNeuron: Simplifying the Structure of Deep Neural Networks.” arXiv:1606.07326 [Cs, Stat], June.
Parisotto, Emilio, and Ruslan Salakhutdinov. 2017. “Neural Map: Structured Memory for Deep Reinforcement Learning.” arXiv:1702.08360 [Cs], February.
Pascanu, Razvan, Yann N. Dauphin, Surya Ganguli, and Yoshua Bengio. 2014. “On the Saddle Point Problem for Non-Convex Optimization.” arXiv:1405.4604 [Cs], May.
Paul, Arnab, and Suresh Venkatasubramanian. 2014. “Why Does Deep Learning Work? - A Perspective from Group Theory.” arXiv:1412.6621 [Cs, Stat], December.
Pinkus, Allan. 1999. “Approximation Theory of the MLP Model in Neural Networks.” Acta Numerica 8 (January): 143–95.
Radford, Alec, Luke Metz, and Soumith Chintala. 2015. “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.” In arXiv:1511.06434 [Cs].
Ranzato, M. 2013. “Modeling Natural Images Using Gated MRFs.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9): 2206–22.
Ranzato, Marc’aurelio, Y.-lan Boureau, and Yann L. Cun. 2008. “Sparse Feature Learning for Deep Belief Networks.” In Advances in Neural Information Processing Systems 20, edited by J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, 1185–92. Curran Associates, Inc.
Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323 (6088): 533–36.
Sagun, Levent, V. Ugur Guney, Gerard Ben Arous, and Yann LeCun. 2014. “Explorations on High Dimensional Landscapes.” arXiv:1412.6615 [Cs, Stat], December.
Salimans, Tim, and Diederik P Kingma. 2016. “Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 901–1. Curran Associates, Inc.
Scardapane, Simone, Danilo Comminiello, Amir Hussain, and Aurelio Uncini. 2016. “Group Sparse Regularization for Deep Neural Networks.” arXiv:1607.00485 [Cs, Stat], July.
Schmidhuber, Juergen. 2022. “Annotated History of Modern AI and Deep Learning.” arXiv.
Shazeer, Noam, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017. “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.” arXiv:1701.06538 [Cs, Stat], January.
Shwartz-Ziv, Ravid, and Naftali Tishby. 2017. “Opening the Black Box of Deep Neural Networks via Information.” arXiv:1703.00810 [Cs], March.
Smith, Leslie N., and Nicholay Topin. 2017. “Exploring Loss Function Topology with Cyclical Learning Rates.” arXiv:1702.04283 [Cs], February.
Springenberg, Jost Tobias, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 2014. “Striving for Simplicity: The All Convolutional Net.” In Proceedings of International Conference on Learning Representations (ICLR) 2015.
Starr, M. Allen (Moses Allen). 1913. Organic and functional nervous diseases; a text-book of neurology. New York, Philadelphia: Lea & Febiger.
Steeg, Greg ver, and Aram Galstyan. 2015. “The Information Sieve.” arXiv:1507.02284 [Cs, Math, Stat], July.
Telgarsky, Matus. 2015. “Representation Benefits of Deep Feedforward Networks.” arXiv:1509.08101 [Cs], September.
Turaga, S. C. 2010. “Convolutional Networks Can Learn to Generate Affinity Graphs for Image Segmentation.” Neural Comput. 22: 511–38.
Urban, Gregor, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, and Matt Richardson. 2016. “Do Deep Convolutional Nets Really Need to Be Deep (Or Even Convolutional)?” arXiv:1603.05691 [Cs, Stat], March.
Wiatowski, Thomas, and Helmut Bölcskei. 2015. “A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction.” In Proceedings of IEEE International Symposium on Information Theory.
Wiatowski, Thomas, Philipp Grohs, and Helmut Bölcskei. 2018. “Energy Propagation in Deep Convolutional Neural Networks.” IEEE Transactions on Information Theory 64 (7): 1–1.
Xie, Bo, Yingyu Liang, and Le Song. 2016. “Diversity Leads to Generalization in Neural Networks.” arXiv:1611.03131 [Cs, Stat], November.
Yu, D., and L. Deng. 2011. “Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP].” IEEE Signal Processing Magazine 28 (1): 145–54.
Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2017. “Understanding Deep Learning Requires Rethinking Generalization.” In Proceedings of ICLR.
Zhang, Sixin, Anna Choromanska, and Yann LeCun. 2015. “Deep Learning with Elastic Averaging SGD.” In Advances In Neural Information Processing Systems.
