There is a whole cottage industry devoted to showing that neural networks are reasonably universal function approximators with various nonlinearities as activations, under various conditions. Usually we take it as given that the particular choice of activation function is not too important.

Sometimes, though, we might like to play with the precise form of the nonlinearities, even making them directly learnable, because some function shapes might have better approximation properties under various assumptions on the learning problem, in a sense I will not attempt to make rigorous here; vague hand-waving arguments are, after all, the whole point of deep learning.

I think part of this field has been subsumed into the stability-of-dynamical-systems setting? Or perhaps we do not care because something-something BatchNorm?

The current default activation function is ReLU, i.e. \(x\mapsto \max\{0,x\}\), which has many nice properties. However, it produces piecewise-linear spline approximators. One could regard that as a plus (Unser 2019), but on the other hand a piecewise-linear function has zero second derivative almost everywhere, which makes it awkward for, e.g., solving differential equations.
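To see the piecewise linearity concretely, here is a minimal NumPy sketch (all names mine): a random one-hidden-layer ReLU network, evaluated on a grid, has second finite differences that vanish everywhere except near the kinks contributed by the hidden units.

```python
import numpy as np

def relu(x):
    """ReLU activation: elementwise max(0, x)."""
    return np.maximum(0.0, x)

# A one-hidden-layer ReLU network is piecewise linear in its input:
# each hidden unit contributes at most one "kink", where its
# pre-activation crosses zero.
rng = np.random.default_rng(0)
W, b = rng.normal(size=5), rng.normal(size=5)
v = rng.normal(size=5)

def net(x):
    # x: (n,) grid of scalar inputs -> (n,) network outputs
    return relu(np.outer(x, W) + b) @ v

x = np.linspace(-3, 3, 601)
y = net(x)
# Second finite differences detect curvature: they are ~0 on the linear
# pieces, and nonzero only in the one or two grid cells containing a kink.
d2 = np.abs(np.diff(y, 2))
print((d2 > 1e-6).sum())  # small: curvature is confined to the kinks
```

With 5 hidden units the nonzero count is bounded by roughly two grid cells per kink; everywhere else the network is exactly linear, which is the property that makes higher-order derivatives uninformative.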

Sometimes, then, we want something different. Other classic activations such as \(x\mapsto\tanh x\) have fallen from favour, supplanted by ReLU. However, differentiable activations are useful, especially if higher-order gradients of the solution will be important. Many virtues of differentiable activation functions are documented in *Implicit Neural Representations with Periodic Activation Functions* (Sitzmann et al. 2020), which argues for \(x\mapsto\sin x\) on the basis of various handy properties. Ramachandran, Zoph, and Le (2017) advocate Swish, \(x\mapsto \frac{x}{1+\exp(-x)}\).
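For concreteness, here is a sketch of both activations in plain NumPy (the `beta` parameter and `w0` frequency scale follow the respective papers; function names are mine):

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish (Ramachandran et al. 2017): x * sigmoid(beta * x).
    With beta = 1 this is also known as SiLU."""
    return x / (1.0 + np.exp(-beta * x))

def sine(x, w0=1.0):
    """Sinusoidal activation as in SIREN (Sitzmann et al. 2020);
    w0 is a frequency scale (SIREN uses w0 = 30 in its first layer)."""
    return np.sin(w0 * x)

x = np.linspace(-5, 5, 11)
# Swish is smooth and slightly non-monotonic: it dips below zero for
# moderately negative inputs and approaches the identity for large x.
y = swish(x)
# sin is infinitely differentiable, and every derivative is again a
# shifted sinusoid, so higher-order gradients remain well-behaved.
z = sine(x)
```

Unlike ReLU, both of these have well-defined derivatives of all orders, which is the point when the loss involves derivatives of the network itself.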

Among other fun things, there is the “self-normalizing” SELU (scaled exponential linear unit) of Klambauer et al. (2017).
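The “self-normalizing” property comes from two fixed constants chosen so that the activation drives neuron statistics toward zero mean and unit variance. A quick NumPy check (constants copied from the paper; function names mine):

```python
import numpy as np

# SELU's constants (Klambauer et al. 2017) are derived so that the map
# on activation statistics has a fixed point at mean 0, variance 1.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit: scale * ELU(x, alpha)."""
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

# Rough check of the fixed point: push standard-normal inputs through
# SELU and look at the output moments.
rng = np.random.default_rng(0)
z = selu(rng.standard_normal(1_000_000))
print(z.mean(), z.std())  # both close to 0 and 1 respectively
```

This is only the activation-level half of the story; the paper's full argument also requires suitably normalized weights, so treat this as an illustration rather than a proof.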

## References

*Proceedings of International Conference on Learning Representations (ICLR) 2015*.

Arjovsky, Martin, Amar Shah, and Yoshua Bengio. 2016. “Unitary Evolution Recurrent Neural Networks.” In *Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48*, 1120–28. ICML’16. New York, NY, USA: JMLR.org.

*PMLR*, 342–50.

*Proceedings of the 22nd International Conference on Neural Information Processing Systems*, 22:342–50. NIPS’09. Red Hook, NY, USA: Curran Associates Inc.

*Proceedings of ICLR*.

Glorot, Xavier, and Yoshua Bengio. 2010. “Understanding the Difficulty of Training Deep Feedforward Neural Networks.” In *Aistats*, 9:249–56.

*Aistats*, 15:275.

Goodfellow, Ian J., David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. 2013. “Maxout Networks.” In *ICML (3)*, 28:1319–27.

*Proceedings of the 36th International Conference on Machine Learning*, 2672–80. PMLR.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.” *arXiv:1502.01852 [Cs]*, February.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. “Identity Mappings in Deep Residual Networks.” *arXiv:1603.05027 [Cs]*.

Hochreiter, Sepp. 1998. “The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions.” *International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems* 6: 107–15.

Kolen, John F., and Stefan C. Kremer, eds. 2001. *A Field Guide to Dynamical Recurrent Neural Networks*. IEEE Press.

Klambauer, Günter, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. 2017. “Self-Normalizing Neural Networks.” In *Proceedings of the 31st International Conference on Neural Information Processing Systems*, 972–81. Red Hook, NY, USA: Curran Associates Inc.

*arXiv:2101.09957 [Cs, Stat]*, January.

*ICLR*.

*Proceedings of ICML*. Vol. 30.

Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. 2013. “On the Difficulty of Training Recurrent Neural Networks.” *arXiv:1211.5063 [Cs]*, 1310–18.

Rahaman, Nasim, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, and Aaron Courville. 2019. “On the Spectral Bias of Neural Networks.” *arXiv:1806.08734 [Cs, Stat]*, May.

Ramachandran, Prajit, Barret Zoph, and Quoc V. Le. 2017. “Searching for Activation Functions.” *arXiv:1710.05941 [Cs]*, October.

Sitzmann, Vincent, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, and Gordon Wetzstein. 2020. “Implicit Neural Representations with Periodic Activation Functions.” *arXiv:2006.09661 [Cs, Eess]*, June.

Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. 2015. “Highway Networks.” *arXiv:1505.00387 [Cs]*.

Unser, Michael. 2019. “A Representer Theorem for Deep Neural Networks.” *Journal of Machine Learning Research* 20 (110): 1–30.

*Advances in Neural Information Processing Systems*, 4880–88.

*arXiv:1907.10599 [Cs, Stat]*, April.
