Neural net attention mechanisms

On brilliance through selective ignorance



Attention, self-attention… what are these things? I am no expert, so I defer to the many good blog posts that explain the basics elsewhere.

There is a lot of activity around one particular type of attention network, the transformer, a neural-network architecture that is very good at processing sequential data such as text. A transformer is, at its core, a stack of attention layers, and the attention mechanism is the key to its success.
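For concreteness, here is a minimal sketch (plain numpy; the function name and toy dimensions are my own) of the scaled dot-product attention that sits inside each such layer: every query is scored against every key, the scores are softmax-normalised to weights, and those weights mix the value vectors, i.e. softmax(QKᵀ/√d_k)V.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Each query is compared against every key; the resulting weights
    (summing to 1 over the keys) mix the value vectors.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarities
    weights = softmax(scores, axis=-1)   # attention weights per query
    return weights @ V                   # (n_queries, d_v) weighted values

# Self-attention: queries, keys, and values all come from the same sequence.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))              # a toy "sequence" of 5 tokens, dim 8
out = scaled_dot_product_attention(X, X, X)
print(out.shape)                         # (5, 8)
```

In practice the queries, keys, and values are learned linear projections of the input, and several such attention "heads" run in parallel, but the arithmetic above is the whole trick: the network learns what to attend to and, just as importantly, what to ignore.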

