Recurrent neural networks



Feedback networks structured to have memory and a notion of “current” and “past” states, which can encode time (or whatever). Many wheels are re-invented with these, but the essential model is that we have a heavily nonlinear state filter inferred by gradient descent.

The connection between these and convolutional neural networks is suggestive for the same reason.

Many different flavours and topologies. On the border with deep automata.

Here I mostly talk about RNNs which have what I would call an uninterpretable hidden state. If we are interested in actually learning about dynamics from some meaningful state, I think of those more as Neural networks that learn dynamics.

Intro

As someone who does a lot of signal processing for music, I find the notion that these generalise linear systems theory suggestive of interesting DSP applications, e.g. generative music.

Flavours

Linear

If the NN has no nonlinear activations then it is simply a linear system, e.g. an ARIMA-type model as seen in classical signal processing. Learning such models by classical gradient descent can be painful, but the tools to mitigate that problem are well understood, even if they are not always feasible. The essential insight is that the propagation of linear updates through a dynamical system can be explosive, but there are analyses of the system which mitigate this problem. See Stability of linear dynamical systems for some useful tricks in the general systems-stability case. TBD: discuss this in the context of learning.
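
As a concrete (and entirely illustrative) sketch of that point, here is the linear case written as a plain state-space recursion. Names, dimensions and scales are arbitrary; the spectral radius of the transition matrix is the quantity that decides whether states, and hence backpropagated gradients, blow up or die out.

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, d_in = 8, 3

A = rng.normal(scale=0.3, size=(d_state, d_state))  # state transition
B = rng.normal(size=(d_state, d_in))                # input map
C = rng.normal(size=(1, d_state))                   # linear readout

# Spectral radius of A decides everything: > 1 and states (and backpropagated
# gradients) grow geometrically; < 1 and they decay geometrically.
rho = np.max(np.abs(np.linalg.eigvals(A)))
print(f"spectral radius = {rho:.3f}")

def run(x_seq):
    h = np.zeros(d_state)
    ys = []
    for x in x_seq:            # h_t = A h_{t-1} + B x_t ;  y_t = C h_t
        h = A @ h + B @ x
        ys.append(C @ h)
    return np.stack(ys)

y = run(rng.normal(size=(20, d_in)))   # shape (20, 1)
```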

Vanilla non-linear

Imagine an ARIMA-type model, as above, but now with a nonlinear activation applied in the state update at each time step (Werbos 1990; Elman 1990). These can be even less reliable to train than classic linear models (Y. Bengio, Simard, and Frasconi 1994). The next few flavours are proposed solutions for that.
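
A minimal Elman-style cell, for orientation (again purely illustrative): the only change from the linear sketch above is the tanh squashing at each step, and it is precisely the repeated product of Jacobians through that recursion which makes gradients vanish or explode (Y. Bengio, Simard, and Frasconi 1994).

```python
import numpy as np

rng = np.random.default_rng(1)
d_state, d_in = 8, 3
W_h = rng.normal(scale=0.3, size=(d_state, d_state))
W_x = rng.normal(size=(d_state, d_in))
b = np.zeros(d_state)

def elman_step(h, x):
    # h_t = tanh(W_h h_{t-1} + W_x x_t + b): the linear recursion above, squashed.
    return np.tanh(W_h @ h + W_x @ x + b)

# Backprop through T of these steps multiplies T Jacobians diag(1 - h_t^2) W_h
# together, which is exactly where vanishing/exploding gradients come from.
h = np.zeros(d_state)
for x in rng.normal(size=(20, d_in)):
    h = elman_step(h, x)
```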

Long Short-Term Memory (LSTM)

The workhorse.

As always, Christopher Olah wins the visual explanation prize: Understanding LSTM Networks. Also neat: LSTM Networks for Sentiment Analysis. Alex Graves (Graves 2013) generates handwriting.

In a traditional recurrent neural network, during the gradient back-propagation phase, the gradient signal can end up being multiplied a large number of times (as many as the number of timesteps) by the weight matrix associated with the connections between the neurons of the recurrent hidden layer. This means that, the magnitude of weights in the transition matrix can have a strong impact on the learning process. […]

These issues are the main motivation behind the LSTM model which introduces a new structure called a memory cell. […] A memory cell is composed of four main elements: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate and an output gate. […] The gates serve to modulate the interactions between the memory cell itself and its environment.
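
In symbols (not spelled out in the quote above, but this is the standard formulation, per Hochreiter and Schmidhuber 1997b with the forget gate of Gers, Schmidhuber, and Cummins 2000):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(self-recurrent memory cell)}\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The additive cell update, as opposed to yet another matrix multiplication, is what keeps the gradient path through time comparatively well behaved.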

Gated Recurrent Unit (GRU)

Simpler than the LSTM, although you end up needing a couple more units, so I am told. Swings and roundabouts. (Chung et al. 2014; Chung, Gulcehre, et al. 2015)
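
For comparison, the standard GRU update (following Chung et al. 2014; which way round the update gate interpolates varies between presentations):

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)}\\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr)\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$

Two gates, no separate memory cell.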

Unitary

Charming connection with my other research into acoustics: what I would call “Gerzon allpass” filters, i.e. orthonormal (unitary) transition matrices, are useful in recurrent networks because of favourable normalisation characteristics and general dynamical considerations.
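
A toy illustration of the norm-preservation point only (the actual unitary-RNN parameterisations of Arjovsky, Shah, and Bengio 2016, Wisdom et al. 2016 and Jing et al. 2017 are more elaborate and interleave a nonlinearity): the matrix exponential of a skew-symmetric matrix is orthogonal, and an orthogonal recurrence preserves the state norm exactly.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
d = 8
S = rng.normal(size=(d, d))
S = S - S.T                   # skew-symmetric: S^T = -S
Q = expm(S)                   # exp of a skew-symmetric matrix is orthogonal

h = rng.normal(size=d)
norms = []
for _ in range(1000):
    h = Q @ h                 # the (linear part of a) unitary recurrence
    norms.append(np.linalg.norm(h))
print(np.ptp(norms))          # ~0 up to float error: no explosion, no vanishing
```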

Probabilistic

i.e. Kalman filters, but rebranded in the fine neural-networks tradition of taking something uncontroversial from another field and putting the word “neural” in front. Practically these are usually variational, but there are some sampling-based ones.
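
For reference, the classical recursion these models dress up, as a minimal sketch; the deep/variational flavours (e.g. Krishnan, Shalit, and Sontag 2015; Fraccaro et al. 2016) replace the fixed linear maps and noise covariances here with learned, usually nonlinear, components and approximate the update.

```python
import numpy as np

def kalman_step(m, P, y, F, Q, H, R):
    """One predict/update cycle for x_t = F x_{t-1} + noise(Q), y_t = H x_t + noise(R)."""
    # Predict
    m_pred = F @ m
    P_pred = F @ P @ F.T + Q
    # Update on the new observation y
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = (np.eye(len(m)) - K @ H) @ P_pred
    return m_new, P_new
```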

🏗

Phased

Long story. Something I meant to follow up because I met a guy in a poster session (Neil, Pfeiffer, and Liu 2016). Possibly subsumed into attention mechanisms?

Attention

The current hotness in time series prediction is transformer-type methods, which are a whole research area unto themselves.

Reservoir computing

Reservoir computing models seem to be a kooky type of RNN.
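
Very roughly: the recurrent weights are random and fixed, and only a linear readout is trained. A hedged echo-state-style sketch (my own toy illustration, not any cited reference implementation):

```python
import numpy as np

rng = np.random.default_rng(3)
d_res, d_in = 200, 1
W = rng.normal(size=(d_res, d_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale to spectral radius ~0.9
W_in = rng.normal(scale=0.5, size=(d_res, d_in))

def reservoir_states(u_seq):
    h, states = np.zeros(d_res), []
    for u in u_seq:
        h = np.tanh(W @ h + W_in @ u)             # recurrent weights never trained
        states.append(h)
    return np.stack(states)

# One-step-ahead prediction of a sine wave; only W_out is fit (ridge regression).
u = np.sin(np.linspace(0, 50, 1000))[:, None]
H = reservoir_states(u[:-1])
target = u[1:, 0]
W_out = np.linalg.solve(H.T @ H + 1e-6 * np.eye(d_res), H.T @ target)
pred = H @ W_out
```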

Connection with continuous time

Clearly related to NODEs. Some methods exploit both, e.g. Gu et al. (2021).
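
One way to see the link (a standard observation, not specific to any reference here): treat the hidden state as obeying an ODE,

$$
\frac{\mathrm{d}h}{\mathrm{d}t} = -h(t) + \tanh\bigl(W h(t) + U x(t) + b\bigr),
$$

and an explicit Euler step of size $\Delta$ gives

$$
h_{t+1} = (1 - \Delta)\,h_t + \Delta \tanh(W h_t + U x_t + b),
$$

i.e. an ordinary RNN with a leaky/residual update. Gu et al. (2021) push this much further with structured linear state-space layers.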

Other

TBD

Recursive estimation

See recursive identification for generic theory of learning under the distribution shift induced by a moving parameter vector.

Practicalities

Loading data

pytorch-forecasting has a utility class TimeSeriesDataSet which loads up examples for us, which is nice. However, it seems to want the prediction target for each time step to be (comparatively) low-dimensional, possibly tabular, data, so it is not clear how to use it for predicting dense matrices or tensors; i.e. it looks handy for predicting stock prices, but not so much for predicting video frames.
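
For what it's worth, a hedged sketch of the long-format tabular data it seems to expect; the column names and parameter values here are my own guesses at a minimal setup, not canonical.

```python
import numpy as np
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet

# Two toy univariate series in "long" format: one row per (series, time step).
df = pd.DataFrame({
    "series": np.repeat(["a", "b"], 100),
    "time_idx": np.tile(np.arange(100), 2),   # integer time index within each series
    "value": np.random.randn(200).cumsum(),
})

dataset = TimeSeriesDataSet(
    df,
    time_idx="time_idx",
    target="value",                            # a single (low-dimensional) target column
    group_ids=["series"],
    max_encoder_length=24,                     # history window fed to the model
    max_prediction_length=6,                   # forecast horizon
    time_varying_unknown_reals=["value"],
)
loader = dataset.to_dataloader(train=True, batch_size=64)
```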

References

Aicher, Christopher, Nicholas J. Foti, and Emily B. Fox. 2020. Adaptively Truncating Backpropagation Through Time to Control Gradient Bias.” In Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, 799–808. PMLR.
Allen-Zhu, Zeyuan, and Yuanzhi Li. 2019. Can SGD Learn Recurrent Neural Networks with Provable Generalization? arXiv:1902.01028 [Cs, Math, Stat], February.
Anderson, Alexander G., and Cory P. Berg. 2017. The High-Dimensional Geometry of Binary Neural Networks.” arXiv:1705.07199 [Cs], May.
Arisoy, Ebru, Tara N. Sainath, Brian Kingsbury, and Bhuvana Ramabhadran. 2012. “Deep Neural Network Language Models.” In Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-Gram Model? On the Future of Language Modeling for HLT, 20–28. WLM ’12. Montreal, Canada: Association for Computational Linguistics.
Arjovsky, Martin, Amar Shah, and Yoshua Bengio. 2016. Unitary Evolution Recurrent Neural Networks.” In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, 1120–28. ICML’16. New York, NY, USA: JMLR.org.
Auer, Peter, Harald Burgsteiner, and Wolfgang Maass. 2008. A Learning Rule for Very Simple Universal Approximators Consisting of a Single Layer of Perceptrons.” Neural Networks 21 (5): 786–95.
Balduzzi, David, Marcus Frean, Lennox Leary, J. P. Lewis, Kurt Wan-Duo Ma, and Brian McWilliams. 2017. The Shattered Gradients Problem: If Resnets Are the Answer, Then What Is the Question? In PMLR, 342–50.
Bazzani, Loris, Lorenzo Torresani, and Hugo Larochelle. 2017. “Recurrent Mixture Density Network for Spatiotemporal Visual Attention,” 15.
Ben Taieb, Souhaib, and Amir F. Atiya. 2016. A Bias and Variance Analysis for Multistep-Ahead Time Series Forecasting.” IEEE transactions on neural networks and learning systems 27 (1): 62–76.
Bengio, Samy, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 28, 1171–79. NIPS’15. Cambridge, MA, USA: Curran Associates, Inc.
Bengio, Y., P. Simard, and P. Frasconi. 1994. Learning Long-Term Dependencies with Gradient Descent Is Difficult.” IEEE Transactions on Neural Networks 5 (2): 157–66.
Boulanger-Lewandowski, Nicolas, Yoshua Bengio, and Pascal Vincent. 2012. Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.” In 29th International Conference on Machine Learning.
Bown, Oliver, and Sebastian Lexer. 2006. Continuous-Time Recurrent Neural Networks for Generative and Interactive Musical Performance.” In Applications of Evolutionary Computing, edited by Franz Rothlauf, Jürgen Branke, Stefano Cagnoni, Ernesto Costa, Carlos Cotta, Rolf Drechsler, Evelyne Lutton, et al., 652–63. Lecture Notes in Computer Science 3907. Springer Berlin Heidelberg.
Buhusi, Catalin V., and Warren H. Meck. 2005. What Makes Us Tick? Functional and Neural Mechanisms of Interval Timing.” Nature Reviews Neuroscience 6 (10): 755–65.
Chang, Bo, Minmin Chen, Eldad Haber, and Ed H. Chi. 2019. AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks.” In Proceedings of ICLR.
Charles, Adam, Dong Yin, and Christopher Rozell. 2016. Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks.” arXiv:1605.08346 [Cs, Math, Stat], May.
Chevillon, Guillaume. 2007. Direct Multi-Step Estimation and Forecasting.” Journal of Economic Surveys 21 (4): 746–85.
Cho, Kyunghyun, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation.” In EMNLP 2014.
Cho, Kyunghyun, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches.” arXiv Preprint arXiv:1409.1259.
Chung, Junyoung, Sungjin Ahn, and Yoshua Bengio. 2016. Hierarchical Multiscale Recurrent Neural Networks.” arXiv:1609.01704 [Cs], September.
Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.” In NIPS.
Chung, Junyoung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2015. Gated Feedback Recurrent Neural Networks.” In Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37, 2067–75. ICML’15. JMLR.org.
Chung, Junyoung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C Courville, and Yoshua Bengio. 2015. A Recurrent Latent Variable Model for Sequential Data.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2980–88. Curran Associates, Inc.
Collins, Jasmine, Jascha Sohl-Dickstein, and David Sussillo. 2016. Capacity and Trainability in Recurrent Neural Networks.” In arXiv:1611.09913 [Cs, Stat].
Cooijmans, Tim, Nicolas Ballas, César Laurent, Çağlar Gülçehre, and Aaron Courville. 2016. Recurrent Batch Normalization.” arXiv Preprint arXiv:1603.09025.
Dasgupta, Sakyasingha, Takayuki Yoshizumi, and Takayuki Osogami. 2016. Regularized Dynamic Boltzmann Machine with Delay Pruning for Unsupervised Learning of Temporal Sequences.” arXiv:1610.01989 [Cs, Stat], September.
Doelling, Keith B., and David Poeppel. 2015. Cortical Entrainment to Music and Its Modulation by Expertise.” Proceedings of the National Academy of Sciences 112 (45): E6233–42.
Elman, Jeffrey L. 1990. Finding Structure in Time.” Cognitive Science 14: 179–211.
Fortunato, Meire, Charles Blundell, and Oriol Vinyals. 2017. Bayesian Recurrent Neural Networks.” arXiv:1704.02798 [Cs, Stat], April.
Fraccaro, Marco, Søren Kaae Sønderby, Ulrich Paquet, and Ole Winther. 2016. Sequential Neural Models with Stochastic Layers.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2199–2207. Curran Associates, Inc.
Gal, Yarin, and Zoubin Ghahramani. 2016. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.” In arXiv:1512.05287 [Stat].
Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. 2000. Learning to Forget: Continual Prediction with LSTM.” Neural Computation 12 (10): 2451–71.
Gers, Felix A., Nicol N. Schraudolph, and Jürgen Schmidhuber. 2002. Learning Precise Timing with LSTM Recurrent Networks.” Journal of Machine Learning Research 3 (Aug): 115–43.
Graves, Alex. 2011. Practical Variational Inference for Neural Networks.” In Proceedings of the 24th International Conference on Neural Information Processing Systems, 2348–56. NIPS’11. USA: Curran Associates Inc.
———. 2012. Supervised Sequence Labelling with Recurrent Neural Networks. Studies in Computational Intelligence, v. 385. Heidelberg ; New York: Springer.
———. 2013. Generating Sequences With Recurrent Neural Networks.” arXiv:1308.0850 [Cs], August.
Gregor, Karol, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. 2015. DRAW: A Recurrent Neural Network For Image Generation.” arXiv:1502.04623 [Cs], February.
Gruslys, Audrunas, Remi Munos, Ivo Danihelka, Marc Lanctot, and Alex Graves. 2016. Memory-Efficient Backpropagation Through Time.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 4125–33. Curran Associates, Inc.
Grzyb, B. J., E. Chinellato, G. M. Wojcik, and W. A. Kaminski. 2009. Which Model to Use for the Liquid State Machine? In 2009 International Joint Conference on Neural Networks, 1018–24.
Gu, Albert, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, and Christopher Ré. 2021. Combining Recurrent, Convolutional, and Continuous-Time Models with Linear State Space Layers.” In Advances in Neural Information Processing Systems, 34:572–85. Curran Associates, Inc.
Hardt, Moritz, Tengyu Ma, and Benjamin Recht. 2018. Gradient Descent Learns Linear Dynamical Systems.” The Journal of Machine Learning Research 19 (1): 1025–68.
Hazan, Elad, Karan Singh, and Cyril Zhang. 2017. Learning Linear Dynamical Systems via Spectral Filtering.” In NIPS.
Hazan, Hananel, and Larry M. Manevitz. 2012. Topological Constraints and Robustness in Liquid State Machines.” Expert Systems with Applications 39 (2): 1597–1606.
He, Kun, Yan Wang, and John Hopcroft. 2016. A Powerful Generative Model Using Random Weights for the Deep Image Representation.” In Advances in Neural Information Processing Systems.
Hinton, G., Li Deng, Dong Yu, G.E. Dahl, A. Mohamed, N. Jaitly, A. Senior, et al. 2012. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.” IEEE Signal Processing Magazine 29 (6): 82–97.
Hochreiter, Sepp. 1998. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions.” International Journal of Uncertainty Fuzziness and Knowledge Based Systems 6: 107–15.
Hochreiter, Sepp, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. 2001. Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies.” In A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997a. LSTM Can Solve Hard Long Time Lag Problems.” In Advances in Neural Information Processing Systems: Proceedings of the 1996 Conference, 473–79.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997b. Long Short-Term Memory.” Neural Computation 9 (8): 1735–80.
Huszár, Ferenc. 2015. How (Not) to Train Your Generative Model: Scheduled Sampling, Likelihood, Adversary? arXiv:1511.05101 [Cs, Math, Stat], November.
Jaeger, Herbert. 2002. Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the” Echo State Network” Approach. Vol. 5. GMD-Forschungszentrum Informationstechnik.
Jing, Li, Yichen Shen, Tena Dubcek, John Peurifoy, Scott Skirlo, Yann LeCun, Max Tegmark, and Marin Soljačić. 2017. Tunable Efficient Unitary Neural Networks (EUNN) and Their Application to RNNs.” In PMLR, 1733–41.
Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever. 2015. An Empirical Exploration of Recurrent Network Architectures.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2342–50.
Karpathy, Andrej, Justin Johnson, and Li Fei-Fei. 2015. Visualizing and Understanding Recurrent Networks.” arXiv:1506.02078 [Cs], June.
Katharopoulos, Angelos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. 2020. Transformers Are RNNs: Fast Autoregressive Transformers with Linear Attention.” arXiv:2006.16236 [Cs, Stat], August.
Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29. Curran Associates, Inc.
Koutník, Jan, Klaus Greff, Faustino Gomez, and Jürgen Schmidhuber. 2014. A Clockwork RNN.” arXiv:1402.3511 [Cs], February.
Krishnan, Rahul G., Uri Shalit, and David Sontag. 2015. Deep Kalman Filters.” arXiv Preprint arXiv:1511.05121.
Lamb, Alex, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, and Yoshua Bengio. 2016. Professor Forcing: A New Algorithm for Training Recurrent Networks.” In Advances In Neural Information Processing Systems.
Laurent, Thomas, and James von Brecht. 2016. A Recurrent Neural Network Without Chaos.” arXiv:1612.06212 [Cs], December.
LeCun, Y. 1998. Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324.
Legenstein, Robert, Christian Naeger, and Wolfgang Maass. 2005. What Can a Neuron Learn with Spike-Timing-Dependent Plasticity? Neural Computation 17 (11): 2337–82.
Lillicrap, Timothy P, and Adam Santoro. 2019. Backpropagation Through Time and the Brain.” Current Opinion in Neurobiology, Machine Learning, Big Data, and Neuroscience, 55 (April): 82–89.
Lipton, Zachary C., John Berkowitz, and Charles Elkan. 2015. A Critical Review of Recurrent Neural Networks for Sequence Learning.” arXiv:1506.00019 [Cs], May.
Lukoševičius, Mantas, and Herbert Jaeger. 2009. Reservoir Computing Approaches to Recurrent Neural Network Training.” Computer Science Review 3 (3): 127–49.
Maass, W., T. Natschläger, and H. Markram. 2004. Computational Models for Generic Cortical Microcircuits.” In Computational Neuroscience: A Comprehensive Approach, 575–605. Chapman & Hall/CRC.
MacKay, Matthew, Paul Vicol, Jimmy Ba, and Roger Grosse. 2018. Reversible Recurrent Neural Networks.” In Advances In Neural Information Processing Systems.
Maddison, Chris J., Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, and Yee Whye Teh. 2017. Filtering Variational Objectives.” arXiv Preprint arXiv:1705.09279.
Martens, James. 2010. Deep Learning via Hessian-Free Optimization.” In Proceedings of the 27th International Conference on International Conference on Machine Learning, 735–42. ICML’10. USA: Omnipress.
Martens, James, and Ilya Sutskever. 2011. Learning Recurrent Neural Networks with Hessian-Free Optimization.” In Proceedings of the 28th International Conference on International Conference on Machine Learning, 1033–40. ICML’11. USA: Omnipress.
———. 2012. Training Deep and Recurrent Networks with Hessian-Free Optimization.” In Neural Networks: Tricks of the Trade, 479–535. Lecture Notes in Computer Science. Springer.
Mhammedi, Zakaria, Andrew Hellicar, Ashfaqur Rahman, and James Bailey. 2017. Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections.” In PMLR, 2401–9.
Mikolov, Tomáš, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010. Recurrent Neural Network Based Language Model.” In Eleventh Annual Conference of the International Speech Communication Association.
Miller, John, and Moritz Hardt. 2018. When Recurrent Models Don’t Need To Be Recurrent.” arXiv:1805.10369 [Cs, Stat], May.
Mnih, V. 2015. Human-Level Control Through Deep Reinforcement Learning.” Nature 518: 529–33.
Mohamed, A. r, G. E. Dahl, and G. Hinton. 2012. Acoustic Modeling Using Deep Belief Networks.” IEEE Transactions on Audio, Speech, and Language Processing 20 (1): 14–22.
Monner, Derek, and James A. Reggia. 2012. A Generalized LSTM-Like Training Algorithm for Second-Order Recurrent Neural Networks.” Neural Networks 25 (January): 70–83.
Neil, Daniel, Michael Pfeiffer, and Shih-Chii Liu. 2016. Phased LSTM: Accelerating Recurrent Network Training for Long or Event-Based Sequences.” arXiv:1610.09513 [Cs], October.
Niu, Murphy Yuezhen, Lior Horesh, and Isaac Chuang. 2019. Recurrent Neural Networks in the Eye of Differential Equations.” arXiv:1904.12933 [Quant-Ph, Stat], April.
Nussbaum-Thom, Markus, Jia Cui, Bhuvana Ramabhadran, and Vaibhava Goel. 2016. Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional Units.” In, 390–94.
Oliva, Junier B., Barnabas Poczos, and Jeff Schneider. 2017. The Statistical Recurrent Unit.” arXiv:1703.00381 [Cs, Stat], March.
Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. 2013. On the Difficulty of Training Recurrent Neural Networks.” In arXiv:1211.5063 [Cs], 1310–18.
Patraucean, Viorica, Ankur Handa, and Roberto Cipolla. 2015. Spatio-Temporal Video Autoencoder with Differentiable Memory.” arXiv:1511.06309 [Cs], November.
Pillonetto, Gianluigi. 2016. The Interplay Between System Identification and Machine Learning.” arXiv:1612.09158 [Cs, Stat], December.
Ravanbakhsh, Siamak, Jeff Schneider, and Barnabas Poczos. 2016. Deep Learning with Sets and Point Clouds.” In arXiv:1611.04500 [Cs, Stat].
Roberts, Adam, Jesse Engel, Colin Raffel, Curtis Hawthorne, and Douglas Eck. 2018. A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music.” arXiv:1803.05428 [Cs, Eess, Stat], March.
Rohrbach, Anna, Marcus Rohrbach, and Bernt Schiele. 2015. The Long-Short Story of Movie Description.” arXiv:1506.01698 [Cs], June.
Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986. Learning Representations by Back-Propagating Errors.” Nature 323 (6088): 533–36.
Ryder, Thomas, Andrew Golightly, A. Stephen McGough, and Dennis Prangle. 2018. Black-Box Variational Inference for Stochastic Differential Equations.” arXiv:1802.03335 [Stat], February.
Sjöberg, Jonas, Qinghua Zhang, Lennart Ljung, Albert Benveniste, Bernard Delyon, Pierre-Yves Glorennec, Håkan Hjalmarsson, and Anatoli Juditsky. 1995. Nonlinear Black-Box Modeling in System Identification: A Unified Overview.” Automatica, Trends in System Identification, 31 (12): 1691–1724.
Song, Yang, Chenlin Meng, Renjie Liao, and Stefano Ermon. 2020. Nonlinear Equation Solving: A Faster Alternative to Feedforward Computation.” arXiv:2002.03629 [Cs, Stat], February.
Steil, J. J. 2004. Backpropagation-Decorrelation: Online Recurrent Learning with O(N) Complexity.” In 2004 IEEE International Joint Conference on Neural Networks, 2004. Proceedings, 2:843–848 vol.2.
Surace, Simone Carlo, and Jean-Pascal Pfister. 2016. “Online Maximum Likelihood Estimation of the Parameters of Partially Observed Diffusion Processes.” In.
Sutskever, Ilya. 2013. Training Recurrent Neural Networks.” PhD Thesis, Toronto, Ont., Canada, Canada: University of Toronto.
Takamoto, Makoto, Timothy Praditia, Raphael Leiteritz, Dan MacKinlay, Francesco Alesiani, Dirk Pflüger, and Mathias Niepert. 2022. PDEBench: An Extensive Benchmark for Scientific Machine Learning.” In.
Tallec, Corentin, and Yann Ollivier. 2017. Unbiasing Truncated Backpropagation Through Time.” arXiv.
Taylor, Graham W., Geoffrey E. Hinton, and Sam T. Roweis. 2006. Modeling Human Motion Using Binary Latent Variables.” In Advances in Neural Information Processing Systems, 1345–52.
Theis, Lucas, and Matthias Bethge. 2015. Generative Image Modeling Using Spatial LSTMs.” arXiv:1506.03478 [Cs, Stat], June.
Visin, Francesco, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, and Yoshua Bengio. 2015. ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks.” arXiv:1505.00393 [Cs], May.
Voelker, Aaron R, Ivana Kajic, and Chris Eliasmith. n.d. “Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks,” 10.
Wen, Ruofeng, Kari Torkkola, and Balakrishnan Narayanaswamy. 2017. A Multi-Horizon Quantile Recurrent Forecaster.” arXiv:1711.11053 [Stat], November.
Werbos, Paul J. 1988. Generalization of Backpropagation with Application to a Recurrent Gas Market Model.” Neural Networks 1 (4): 339–56.
———. 1990. Backpropagation Through Time: What It Does and How to Do It.” Proceedings of the IEEE 78 (10): 1550–60.
Williams, Ronald J., and Jing Peng. 1990. An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories.” Neural Computation 2 (4): 490–501.
Williams, Ronald J., and David Zipser. 1989. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks.” Neural Computation 1 (2): 270–80.
Wisdom, Scott, Thomas Powers, John Hershey, Jonathan Le Roux, and Les Atlas. 2016. Full-Capacity Unitary Recurrent Neural Networks.” In Advances in Neural Information Processing Systems, 4880–88.
Wisdom, Scott, Thomas Powers, James Pitton, and Les Atlas. 2016. Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery.” In Advances in Neural Information Processing Systems 29.
Wu, Yuhuai, Saizheng Zhang, Ying Zhang, Yoshua Bengio, and Ruslan R Salakhutdinov. 2016. On Multiplicative Integration with Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2856–64. Curran Associates, Inc.
Yao, Li, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, and Aaron Courville. 2015. Describing Videos by Exploiting Temporal Structure.” arXiv:1502.08029 [Cs, Stat], February.
