Neural net attention mechanisms

On brilliance through selective ignorance

2017-12-20 — 2022-08-05

language

machine learning

neural nets

NLP


Attention, self-attention… what are these things? I am no expert, so here are some good blog posts that explain them:

“Attention? Attention!” credits, in order, Bahdanau, Cho, and Bengio (2015), then Luong, Pham, and Manning (2015), and then Vaswani et al. (2017) with the series of innovations that led to modern transformer models.
“Visualising A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)”, by Jay Alammar.
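The three papers credited above differ mainly in how they score a query against each key before the softmax. Here is a minimal plain-Python sketch of those score functions as I understand them:

- Bahdanau-style additive scoring.
- Luong-style “dot” and “general” scoring.
- Vaswani-style scaled dot-product scoring.

The function names (`attend`, `score_dot`, `make_score_general` and so on) are my own hypothetical helpers, not any library's API. In a real model the weight matrices would be learned rather than passed in by hand. In the translation setting, the query is a decoder state and the keys and values are encoder states. In self-attention, all three come from the same sequence.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating, for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    # M is a list of rows; returns the vector M v.
    return [dot(row, v) for row in M]

# Each score function maps (query, key) to a scalar.
def score_dot(q, k):
    # Luong "dot": plain inner product q . k.
    return dot(q, k)

def make_score_general(W):
    # Luong "general": bilinear form q^T W k, with W learned in practice.
    return lambda q, k: dot(q, matvec(W, k))

def make_score_additive(W_q, W_k, v):
    # Bahdanau-style additive score: v^T tanh(W_q q + W_k k).
    def score(q, k):
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W_q, q), matvec(W_k, k))]
        return dot(v, hidden)
    return score

def score_scaled_dot(q, k):
    # Vaswani et al.: inner product divided by sqrt(d_k).
    return dot(q, k) / math.sqrt(len(k))

def attend(query, keys, values, score):
    """Softmax the scores into weights, then return (weights, weighted sum of values)."""
    weights = softmax([score(query, k) for k in keys])
    dim = len(values[0])
    context = [sum(w * val[j] for w, val in zip(weights, values)) for j in range(dim)]
    return weights, context
```

For example, take query `[2.0, 0.0]` and keys `[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`. With `score_scaled_dot`, the first and third keys get equal weight, since both have inner product 2 with the query. The orthogonal second key gets less weight.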

Much current activity centres on one particular attention-based architecture, the transformer, which is very good at processing sequential data such as text. A transformer is a stack of blocks, each pairing a self-attention layer with a position-wise feed-forward network (plus residual connections and layer normalisation). The attention mechanism is widely credited as the key to its success.
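Concretely, the attention layer in Vaswani et al. (2017) computes softmax(QKᵀ/√d_k)V. The queries Q, keys K and values V are linear projections of the same input sequence X.

Below is a minimal plain-Python sketch of one such block. It makes several simplifying assumptions of my own:

- a single head instead of multi-head attention;
- no bias terms;
- no learned layer-norm gain or offset;
- the post-layer-norm ordering of the original paper.

The helper names and weight shapes are hypothetical, chosen for illustration. Setting `causal=True` masks out future positions, as in a decoder.

```python
import math

def matmul(A, B):
    # (n x m) @ (m x p), matrices stored as lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def softmax_row(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0, so masked entries vanish
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X, W_q, W_k, W_v, causal=False):
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    # X: n x d_model; W_q, W_k: d_model x d_k; W_v: d_model x d_v.
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    d_k = len(K[0])
    weights = []
    for i, row in enumerate(matmul(Q, transpose(K))):
        scaled = [s / math.sqrt(d_k) for s in row]
        if causal:
            # Position i may only attend to positions 0..i.
            scaled = [s if j <= i else -math.inf for j, s in enumerate(scaled)]
        weights.append(softmax_row(scaled))
    return matmul(weights, V), weights

def layer_norm(x, eps=1e-5):
    # Normalise one row to zero mean, unit variance (no learned gain or bias here).
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def transformer_block(X, W_q, W_k, W_v, W_o, W_1, W_2):
    """Post-LN block: H = LN(X + Attn(X) W_o), then LN(H + ReLU(H W_1) W_2)."""
    # W_o: d_v x d_model; W_1: d_model x d_ff; W_2: d_ff x d_model.
    attn, _ = self_attention(X, W_q, W_k, W_v)
    H = [layer_norm(r) for r in add(X, matmul(attn, W_o))]
    hidden = [[max(0.0, v) for v in r] for r in matmul(H, W_1)]  # ReLU feed-forward
    return [layer_norm(r) for r in add(H, matmul(hidden, W_2))]
```

Stacking `transformer_block` calls, each with its own weights, gives the “stack of blocks” described above. A practical implementation would use a tensor library and several heads per layer rather than nested Python lists.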

1 Incoming

ELI5: FlashAttention. Step by step explanation of how one of…
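As I understand it, the trick at the heart of FlashAttention is to compute exact softmax attention one block of keys at a time. It keeps a running maximum and a running normaliser, so the full score matrix never has to be stored.

The sketch below shows only that online-softmax bookkeeping, for a single query, in plain Python. It ignores the GPU memory hierarchy (where the real speed-ups come from), query tiling, and the backward pass. `naive_attention` and `blockwise_attention` are hypothetical names of mine.

```python
import math

def naive_attention(q, K, V):
    """Reference: softmax over all scaled scores at once, then a weighted sum of V rows."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [sum(wj * v[i] for wj, v in zip(w, V)) / total for i in range(len(V[0]))]

def blockwise_attention(q, K, V, block_size=2):
    """Same result, visiting keys/values one block at a time with an online softmax."""
    d = len(q)
    m = -math.inf             # running max of the scores seen so far
    l = 0.0                   # running sum of exp(score - m)
    acc = [0.0] * len(V[0])   # running unnormalised weighted sum of values
    for start in range(0, len(K), block_size):
        K_blk, V_blk = K[start:start + block_size], V[start:start + block_size]
        s = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K_blk]
        m_new = max(m, max(s))
        # Rescale the old accumulators to the new max (scale is 0.0 on the first block).
        scale = math.exp(m - m_new)
        p = [math.exp(sj - m_new) for sj in s]
        l = l * scale + sum(p)
        acc = [a * scale + sum(pj * v[i] for pj, v in zip(p, V_blk)) for i, a in enumerate(acc)]
        m = m_new
    return [a / l for a in acc]
```

The two functions should agree to floating-point precision for any block size. The rescaling by exp(m - m_new) is what makes that possible.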

2 References

Bahdanau, Cho, and Bengio. 2015. “Neural Machine Translation by Jointly Learning to Align and Translate.”

Bodnar, Bruinsma, Lucic, et al. 2024. “Aurora: A Foundation Model of the Atmosphere.”

Cao. 2021. “Choose a Transformer: Fourier or Galerkin.” In Advances in Neural Information Processing Systems.

Celikyilmaz, Deng, Li, et al. 2017. “Scaffolding Networks for Teaching and Learning to Comprehend.” arXiv:1702.08653 [Cs].

Chen, Chen, Wan, et al. 2021. “An Improved Data-Free Surrogate Model for Solving Partial Differential Equations Using Deep Neural Networks.” Scientific Reports.

Choy, Gwak, Savarese, et al. 2016. “Universal Correspondence Network.” In Advances in Neural Information Processing Systems 29.

Khatri, Laakkonen, Liu, et al. 2024. “On the Anatomy of Attention.”

Kim, Mnih, Schwarz, et al. 2019. “Attentive Neural Processes.”

Luong, Pham, and Manning. 2015. “Effective Approaches to Attention-Based Neural Machine Translation.”

Ortega, Kunesch, Delétang, et al. 2021. “Shaking the Foundations: Delusions in Sequence Models for Interaction and Control.” arXiv:2110.10819 [Cs].

Qin, Zhu, Qin, et al. 2019. “Recurrent Attentive Neural Process for Sequential Data.”

Ramsauer, Schäfl, Lehner, et al. 2020. “Hopfield Networks Is All You Need.” arXiv:2008.02217 [Cs, Stat].

Vaswani, Shazeer, Parmar, et al. 2017. “Attention Is All You Need.” arXiv:1706.03762 [Cs].

Yang and Hu. 2020. “Feature Learning in Infinite-Width Neural Networks.” arXiv:2011.14522 [Cond-Mat].