Agostinelli, Forest, Matthew Hoffman, Peter Sadowski, and Pierre Baldi. 2015. “Learning Activation Functions to Improve Deep Neural Networks.” In Proceedings of the International Conference on Learning Representations (ICLR). http://arxiv.org/abs/1412.6830.
Anil, Cem, James Lucas, and Roger Grosse. 2018. “Sorting Out Lipschitz Function Approximation.” November 2018. https://arxiv.org/abs/1811.05381v1.
Arjovsky, Martin, Amar Shah, and Yoshua Bengio. 2016. “Unitary Evolution Recurrent Neural Networks.” In Proceedings of the 33rd International Conference on Machine Learning (ICML), 48:1120–28. New York, NY, USA: JMLR.org. http://arxiv.org/abs/1511.06464.
Balduzzi, David, Marcus Frean, Lennox Leary, J. P. Lewis, Kurt Wan-Duo Ma, and Brian McWilliams. 2017. “The Shattered Gradients Problem: If Resnets Are the Answer, Then What Is the Question?” In Proceedings of the 34th International Conference on Machine Learning (ICML), PMLR 70:342–50. http://proceedings.mlr.press/v70/balduzzi17b.html.
Cho, Youngmin, and Lawrence K. Saul. 2009. “Kernel Methods for Deep Learning.” In Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS), 22:342–50. Red Hook, NY, USA: Curran Associates Inc. https://papers.nips.cc/paper/2009/hash/5751ec3e9a4feab575962e78e006250d-Abstract.html.
Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter. 2016. “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs).” In Proceedings of ICLR. http://arxiv.org/abs/1511.07289.
Duch, Włodzisław, and Norbert Jankowski. 1999. “Survey of Neural Transfer Functions.” Neural Computing Surveys 2: 163–212. ftp://ftp.icsi.berkeley.edu/pub/ai/jagota/vol2_6.pdf.
Glorot, Xavier, and Yoshua Bengio. 2010. “Understanding the Difficulty of Training Deep Feedforward Neural Networks.” In Proceedings of AISTATS, 9:249–56. http://www.jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf.
Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. 2011. “Deep Sparse Rectifier Neural Networks.” In Proceedings of AISTATS, 15:315–23. http://www.jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf.
Goodfellow, Ian J., David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. 2013. “Maxout Networks.” In Proceedings of ICML, 28 (3): 1319–27. http://arxiv.org/abs/1302.4389.
Hayou, Soufiane, Arnaud Doucet, and Judith Rousseau. 2019. “On the Impact of the Activation Function on Deep Neural Networks Training.” May 26, 2019. http://arxiv.org/abs/1902.06853.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015a. “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.” February 6, 2015. http://arxiv.org/abs/1502.01852.
———. 2015b. “Deep Residual Learning for Image Recognition.” http://arxiv.org/abs/1512.03385.
———. 2016. “Identity Mappings in Deep Residual Networks.” In Proceedings of the European Conference on Computer Vision (ECCV). http://arxiv.org/abs/1603.05027.
Hochreiter, Sepp. 1998. “The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions.” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6 (2): 107–16. http://www.worldscientific.com/doi/abs/10.1142/S0218488598000094.
Hochreiter, Sepp, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. 2001. “Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies.” In A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press. http://www.bioinf.jku.at/publications/older/ch7.pdf.
Klambauer, Günter, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. 2017. “Self-Normalizing Neural Networks.” June 8, 2017. http://arxiv.org/abs/1706.02515.
Laurent, Thomas. n.d. “The Multilinear Structure of ReLU Networks.”
Lederer, Johannes. 2021. “Activation Functions in Artificial Neural Networks: A Systematic Overview.” January 25, 2021. http://arxiv.org/abs/2101.09957.
Lee, Jaehoon, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. 2018. “Deep Neural Networks as Gaussian Processes.” In Proceedings of ICLR. http://arxiv.org/abs/1711.00165.
Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. 2013. “Rectifier Nonlinearities Improve Neural Network Acoustic Models.” In Proceedings of ICML, Vol. 30. https://web.stanford.edu/~awni/papers/relu_hybrid_icml2013_final.pdf.
Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. 2013. “On the Difficulty of Training Recurrent Neural Networks.” In Proceedings of ICML, 1310–18. http://arxiv.org/abs/1211.5063.
Sitzmann, Vincent, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, and Gordon Wetzstein. 2020. “Implicit Neural Representations with Periodic Activation Functions.” June 17, 2020. http://arxiv.org/abs/2006.09661.
Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. 2015. “Highway Networks.” http://arxiv.org/abs/1505.00387.
Wisdom, Scott, Thomas Powers, John Hershey, Jonathan Le Roux, and Les Atlas. 2016. “Full-Capacity Unitary Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 29, 4880–88. http://papers.nips.cc/paper/6327-full-capacity-unitary-recurrent-neural-networks.
Yang, Greg, and Hadi Salman. 2020. “A Fine-Grained Spectral Perspective on Neural Networks.” April 9, 2020. http://arxiv.org/abs/1907.10599.