There is a whole cottage industry in showing that neural networks are reasonably universal function approximators, with various nonlinearities as activations, under various conditions. In practice you can take this as a given. Nonetheless, you might like to play with the precise form of the nonlinearities, even making them directly learnable, because some function shapes may have better approximation properties with respect to various assumptions about the learning problem, in a sense I will not attempt to make rigorous now; vague hand-waving arguments are, after all, the whole point of deep learning.
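The simplest instance of a directly learnable activation is probably a parametric ReLU, where the negative-side slope is itself a trained parameter. A minimal sketch, in plain Python with the gradient step written out by hand (the function names and the toy fitting problem are mine, not from any particular framework):

```python
def prelu(x, a):
    # Parametric ReLU: identity for x >= 0, learnable slope a for x < 0.
    return x if x >= 0.0 else a * x

def dprelu_da(x, a):
    # Gradient of the activation with respect to its own slope parameter:
    # zero on the positive side, x on the negative side.
    return 0.0 if x >= 0.0 else x

# Toy problem: learn the slope a so that prelu(-2.0, a) hits a target of
# -1.0, i.e. gradient descent on squared error should drive a -> 0.5.
a, lr, x, target = 0.0, 0.05, -2.0, -1.0
for _ in range(200):
    err = prelu(x, a) - target             # residual
    a -= lr * 2.0 * err * dprelu_da(x, a)  # chain rule: d(err^2)/da
```

In a real network the slope parameter would just be one more entry in the parameter vector handled by the optimiser; the point is only that the activation's shape sits inside the differentiable computation graph like any weight.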

I think a large part of this field has been subsumed into the stability-of-dynamical-systems setting?

Nonetheless, here are some handy references.

The current default activation function is the ReLU, i.e. \(x\mapsto \max\{0,x\}\), which has many nice properties. However, it leads to piecewise-linear spline approximators, whose second derivative vanishes almost everywhere, which makes them awkward for solving differential equations. Other classic activations such as \(x\mapsto\tanh x\) have fallen from favour. Sitzmann et al. (2020) argue for \(x\mapsto\sin x\), which has some handy properties but requires careful initialisation.
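To make the piecewise-linearity point concrete, here is a minimal sketch (plain Python, helper names my own) estimating second derivatives of these three activations by finite differences; ReLU's curvature is zero everywhere away from the kink at the origin, which is exactly what hurts when a differential-equation residual involves second derivatives:

```python
import math

def relu(x):
    return max(0.0, x)

def second_difference(f, x, h=1e-3):
    # Central finite-difference estimate of f''(x).
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

# Away from the kink at 0, ReLU has zero curvature (up to floating-point
# noise), while tanh and sin retain nonzero curvature almost everywhere.
for name, f in [("relu", relu), ("tanh", math.tanh), ("sin", math.sin)]:
    print(name, [round(second_difference(f, x), 3) for x in (-1.0, 0.5, 2.0)])
```

A network built from sums and compositions of such pieces inherits the same property: it is piecewise linear in its input, so its second derivative carries no information.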

The virtues of smooth, differentiable activation functions are argued in *Implicit Neural Representations with Periodic Activation Functions* (Sitzmann et al. 2020).
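A sketch of the careful initialisation that paper pairs with the sine activation, as I understand the scheme (the constants and uniform bounds below are my recollection of the paper's recipe, and the function names are mine): first-layer weights are drawn from \(U(-1/n, 1/n)\) and deeper layers from \(U(-\sqrt{6/n}/\omega_0, \sqrt{6/n}/\omega_0)\), so that pre-activations land where \(\sin\) is neither saturated nor effectively linear.

```python
import math
import random

def siren_layer_weights(fan_in, fan_out, omega_0=30.0, first_layer=False):
    # SIREN-style initialisation, as I read the paper: first layer from
    # U(-1/fan_in, 1/fan_in); deeper layers from
    # U(-sqrt(6/fan_in)/omega_0, sqrt(6/fan_in)/omega_0).
    bound = 1.0 / fan_in if first_layer else math.sqrt(6.0 / fan_in) / omega_0
    return [[random.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]

def siren_activation(z, omega_0=30.0):
    # The activation itself: a frequency-scaled sine.
    return math.sin(omega_0 * z)
```

Without this scaling, the composed sines either collapse toward a near-linear regime or produce wildly oscillating gradients, which is the "requires careful initialisation" caveat above.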

## References

Agostinelli, Forest, Matthew Hoffman, Peter Sadowski, and Pierre Baldi. 2015. "Learning Activation Functions to Improve Deep Neural Networks." In *Proceedings of International Conference on Learning Representations (ICLR) 2015*. http://arxiv.org/abs/1412.6830.

Arjovsky, Martin, Amar Shah, and Yoshua Bengio. 2016. "Unitary Evolution Recurrent Neural Networks." In *Proceedings of the 33rd International Conference on Machine Learning - Volume 48*, 1120–28. ICML'16. New York, NY, USA: JMLR.org. http://arxiv.org/abs/1511.06464.

Balduzzi, David, Marcus Frean, Lennox Leary, J. P. Lewis, Kurt Wan-Duo Ma, and Brian McWilliams. 2017. "The Shattered Gradients Problem: If Resnets Are the Answer, Then What Is the Question?" In *Proceedings of ICML*, PMLR 70:342–50. http://proceedings.mlr.press/v70/balduzzi17b.html.

Cho, Youngmin, and Lawrence K. Saul. 2009. "Kernel Methods for Deep Learning." In *Proceedings of the 22nd International Conference on Neural Information Processing Systems*, 22:342–50. NIPS'09. Red Hook, NY, USA: Curran Associates Inc. https://papers.nips.cc/paper/2009/hash/5751ec3e9a4feab575962e78e006250d-Abstract.html.

Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter. 2016. "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)." In *Proceedings of ICLR*. http://arxiv.org/abs/1511.07289.

Glorot, Xavier, and Yoshua Bengio. 2010. "Understanding the Difficulty of Training Deep Feedforward Neural Networks." In *AISTATS*, 9:249–56. http://www.jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf.

Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. 2011. "Deep Sparse Rectifier Neural Networks." In *AISTATS*, 15:275. http://www.jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf.

Goodfellow, Ian J., David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. 2013. "Maxout Networks." In *ICML (3)*, 28:1319–27. http://arxiv.org/abs/1302.4389.

Hochreiter, Sepp. 1998. "The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions." *International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems* 6: 107–15. http://www.worldscientific.com/doi/abs/10.1142/S0218488598000094.

Hochreiter, Sepp, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. 2001. "Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies." In *A Field Guide to Dynamical Recurrent Neural Networks*. IEEE Press. http://www.bioinf.jku.at/publications/older/ch7.pdf.

Lee, Jaehoon, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. 2018. "Deep Neural Networks as Gaussian Processes." In *ICLR*. http://arxiv.org/abs/1711.00165.

Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. 2013. "Rectifier Nonlinearities Improve Neural Network Acoustic Models." In *Proceedings of ICML*. Vol. 30. https://web.stanford.edu/~awni/papers/relu_hybrid_icml2013_final.pdf.

Sitzmann, Vincent, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, and Gordon Wetzstein. 2020. "Implicit Neural Representations with Periodic Activation Functions." In *Advances in Neural Information Processing Systems*. http://arxiv.org/abs/2006.09661.

Wisdom, Scott, Thomas Powers, John Hershey, Jonathan Le Roux, and Les Atlas. 2016. "Full-Capacity Unitary Recurrent Neural Networks." In *Advances in Neural Information Processing Systems*, 4880–88. http://papers.nips.cc/paper/6327-full-capacity-unitary-recurrent-neural-networks.