Learnable memory
2021-03-02 — 2021-03-02
Wherein the question of how learning algorithms store and retrieve memories at inference time is posed, how such memories are distinguished from storage in the weights is considered, and learnable longer-term memory for transformers, reaching beyond the context window, is introduced.
When we train a neural net we are, in a sense, storing memories in its weights. How should learning algorithms best store and retrieve memories at inference time?
As my colleague Tom Blau points out, this is perhaps best considered a topic in its own right. Is it distinct from continual learning?
Memory is implicit in recurrent networks. One of the chief advantages of neural Turing machines is that they make this need explicit. A great trick of transformers is that they have a notion of what to “remember” baked into their context window, in the sense of what they attend to, although it is not so clear how to make this general. Behrouz, Zhong, and Mirrokni (2024) introduce a learnable longer-term memory for transformers.
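To make the contrast concrete, here is a minimal NumPy sketch of the two kinds of memory. Reading from the context window is just softmax attention over cached key/value pairs; a learnable long-term memory, roughly in the spirit of Behrouz, Zhong, and Mirrokni (2024), is instead a small parametric map that is updated by gradient steps at inference time. All shapes, the learning rate, and the helper names are made up for illustration; this is a sketch of the idea, not their actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # key/value dimension (arbitrary for the sketch)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# (1) Context-window memory: a read is attention over cached (key, value) pairs.
keys = rng.normal(size=(16, d))    # cached keys for 16 context tokens
values = rng.normal(size=(16, d))  # cached values
query = rng.normal(size=d)

attn = softmax(keys @ query / np.sqrt(d))  # content-based addressing
context_read = attn @ values               # weighted recall from the window

# (2) Parametric long-term memory: a linear map M trained online to associate
# keys with values, updated by gradient descent at inference time.
M = np.zeros((d, d))  # the memory's parameters
lr = 0.1

def memory_write(M, k, v, lr):
    # One gradient step on ||M k - v||^2: the memory learns the association.
    err = M @ k - v
    return M - lr * np.outer(err, k)

def memory_read(M, q):
    return M @ q

# Stream the same associations into the parametric memory one at a time.
for k, v in zip(keys, values):
    M = memory_write(M, k, v, lr)

long_term_read = memory_read(M, query)
print(context_read.shape, long_term_read.shape)  # (8,) (8,)
```

The design difference is what persists: the attention read exists only while the tokens remain in the window, whereas the parametric memory keeps (a compressed trace of) the associations after the tokens are gone, at the cost of having to learn them online.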