Learnable memory
2021-03-02 — 2021-03-02
Wherein the question of how learning algorithms store and retrieve memories at inference time is posed, how such memories are distinguished from storage in the weights is considered, and learnable longer-term memory for transformers, reaching beyond the context window, is introduced.
When we train a neural net we are, in a sense, storing memories in its weights. How should learning algorithms best store and retrieve memories at inference time?
As my colleague Tom Blau points out, this is perhaps best considered a topic in its own right. Is it distinct from continual learning?
Memory is implicit in recurrent networks. One of the chief advantages of neural Turing machines is that they make this need explicit. A great trick of transformers is that they have a notion of what to “remember” baked into their context window, in the sense of what they attend to, although it is not so clear how to make this general. Behrouz, Zhong, and Mirrokni (2024) introduce a learnable longer-term memory for transformers.
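To make the contrast concrete, here is a minimal NumPy sketch of the two kinds of memory. Reading from the context window is just softmax attention over cached key/value pairs; a learnable long-term memory, roughly in the spirit of Behrouz, Zhong, and Mirrokni (2024), is instead a small parametric map that is updated by gradient steps at inference time. All shapes, the learning rate, and the helper names are made up for illustration; this is a sketch of the idea, not their actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # key/value dimension (arbitrary for the sketch)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# (1) Context-window memory: a read is attention over cached (key, value) pairs.
keys = rng.normal(size=(16, d))    # cached keys for 16 context tokens
values = rng.normal(size=(16, d))  # cached values
query = rng.normal(size=d)

attn = softmax(keys @ query / np.sqrt(d))  # content-based addressing
context_read = attn @ values               # weighted recall from the window

# (2) Parametric long-term memory: a linear map M trained online to associate
# keys with values, updated by gradient descent at inference time.
M = np.zeros((d, d))  # the memory's parameters
lr = 0.1

def memory_write(M, k, v, lr):
    # One gradient step on ||M k - v||^2: the memory learns the association.
    err = M @ k - v
    return M - lr * np.outer(err, k)

def memory_read(M, q):
    return M @ q

# Stream the same associations into the parametric memory one at a time.
for k, v in zip(keys, values):
    M = memory_write(M, k, v, lr)

long_term_read = memory_read(M, query)
print(context_read.shape, long_term_read.shape)  # (8,) (8,)
```

The design difference is what persists: the attention read exists only while the tokens remain in the window, whereas the parametric memory keeps (a compressed trace of) the associations after the tokens are gone, at the cost of having to learn them online.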