Neural net attention mechanisms

On brilliance through selective ignorance

Attention, self-attention… What are these things? I am no expert, so see some good blog posts explaining everything:

Much of the current activity centres on one particular type of attention network, the transformer, a neural network architecture that is very good at processing sequential data such as text. A transformer is essentially a stack of attention layers (interleaved with ordinary feed-forward layers), and the attention mechanism is the key to its success.
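To make "attention" a little more concrete, here is a minimal sketch of scaled dot-product self-attention, the basic operation inside a transformer layer. It is plain Python with no dependencies, for illustration only. The function names `softmax` and `attention` and the toy `tokens` are my own, and I have left out the learned query/key/value projections, multiple heads, and masking that a real layer would have.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention on lists of vectors.

    Each output is a weighted average of `values`; the weights come from
    a softmax over the scaled query-key dot products.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Mix the value vectors according to the attention weights.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Self-attention: queries, keys and values all come from the same sequence.
# (Assumption: no learned projections, so the raw token vectors are used.)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` sums to one, so every output is a convex combination of the inputs. The "selective ignorance" lives in how small some of those weights get: the first token, for instance, attends less to the second token (which is orthogonal to it) than to itself or to the third.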


