There is a whole cottage industry in showing that neural networks are universal function approximators for fairly general nonlinearities as activations, under fairly general conditions. Nonetheless, you might like to play with the precise form of the nonlinearity, even making it directly learnable, because some function shapes might have better approximation properties in a sense I will not trouble to make rigorous now; vague hand-waving arguments are, after all, the whole point of deep learning.
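As a concrete instance of a learnable activation, the PReLU of He et al. (2015a, below) promotes the negative-side slope of a leaky ReLU to a trainable parameter, fitted by gradient descent alongside the weights. A minimal NumPy sketch (the toy objective and learning rate are my own illustrative choices, not from the paper):

```python
import numpy as np

def prelu(x, a):
    # Parametric ReLU (He et al. 2015a): identity for x > 0,
    # learnable slope a on the negative side.
    return np.where(x > 0, x, a * x)

def prelu_grad_a(x, a):
    # Gradient of the PReLU output with respect to the slope a:
    # zero where x > 0, x where x <= 0.
    return np.where(x > 0, 0.0, x)

# Toy demonstration: recover a "true" negative slope of 0.1 by SGD.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
target = prelu(x, 0.1)

a = 0.5  # initial slope guess
for _ in range(100):
    err = prelu(x, a) - target
    grad = np.mean(2 * err * prelu_grad_a(x, a))  # d(MSE)/da
    a -= 0.5 * grad
# a has converged to roughly 0.1
```

In a real network each layer (or even each channel) gets its own slope parameter, updated by backprop along with everything else; the point here is only that the activation shape itself sits on the gradient tape.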

I think a large part of this field has been subsumed into the stability-of-dynamical-systems setting?

Nonetheless, here are some handy references.

Agostinelli, Forest, Matthew Hoffman, Peter Sadowski, and Pierre Baldi. 2015. "Learning Activation Functions to Improve Deep Neural Networks." In *Proceedings of International Conference on Learning Representations (ICLR) 2015*. http://arxiv.org/abs/1412.6830.

Anil, Cem, James Lucas, and Roger Grosse. 2018. "Sorting Out Lipschitz Function Approximation," November. https://arxiv.org/abs/1811.05381v1.

Arjovsky, Martin, Amar Shah, and Yoshua Bengio. 2016. "Unitary Evolution Recurrent Neural Networks." In *Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48*, 1120–8. ICML'16. New York, NY, USA: JMLR.org. http://arxiv.org/abs/1511.06464.

Balduzzi, David, Marcus Frean, Lennox Leary, J. P. Lewis, Kurt Wan-Duo Ma, and Brian McWilliams. 2017. "The Shattered Gradients Problem: If Resnets Are the Answer, Then What Is the Question?" In *PMLR*, 342–50. http://proceedings.mlr.press/v70/balduzzi17b.html.

Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter. 2016. "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)." In *Proceedings of ICLR*. http://arxiv.org/abs/1511.07289.

Glorot, Xavier, and Yoshua Bengio. 2010. "Understanding the Difficulty of Training Deep Feedforward Neural Networks." In *AISTATS*, 9:249–56. http://www.jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf.

Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. 2011. "Deep Sparse Rectifier Neural Networks." In *AISTATS*, 15:275. http://www.jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf.

Goodfellow, Ian J., David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. 2013. "Maxout Networks." In *ICML (3)*, 28:1319–27. http://arxiv.org/abs/1302.4389.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015a. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," February. http://arxiv.org/abs/1502.01852.

———. 2015b. "Deep Residual Learning for Image Recognition." http://arxiv.org/abs/1512.03385.

———. 2016. "Identity Mappings in Deep Residual Networks." http://arxiv.org/abs/1603.05027.

Hochreiter, Sepp. 1998. "The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions." *International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems* 6: 107–15. http://www.worldscientific.com/doi/abs/10.1142/S0218488598000094.

Hochreiter, Sepp, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. 2001. "Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies." In *A Field Guide to Dynamical Recurrent Neural Networks*. IEEE Press. http://www.bioinf.jku.at/publications/older/ch7.pdf.

Klambauer, Günter, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. 2017. "Self-Normalizing Neural Networks," June. http://arxiv.org/abs/1706.02515.

Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. 2013. "Rectifier Nonlinearities Improve Neural Network Acoustic Models." In *Proceedings of ICML*. Vol. 30. https://web.stanford.edu/~awni/papers/relu_hybrid_icml2013_final.pdf.

Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. 2013. "On the Difficulty of Training Recurrent Neural Networks." In *ICML*, 1310–8. http://arxiv.org/abs/1211.5063.

Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. 2015. "Highway Networks." http://arxiv.org/abs/1505.00387.

Wisdom, Scott, Thomas Powers, John Hershey, Jonathan Le Roux, and Les Atlas. 2016. "Full-Capacity Unitary Recurrent Neural Networks." In *Advances in Neural Information Processing Systems*, 4880–8. http://papers.nips.cc/paper/6327-full-capacity-unitary-recurrent-neural-networks.