Learning Gamelan

Attention conservation notice: Crib notes for a two-year-long project, which I ultimately abandoned in late 2018, about approximating convnets with recurrent neural networks for analysing time series. This project currently exists purely as LaTeX files on my hard drive, which need to be imported here for future reference. I did learn some useful tricks along the way about controlling the poles of IIR filters for learning by gradient descent, and those will actually be interesting.

I feel a certain class of audio signal should be easy to decompose and thence learn in a musically useful way: signals well approximated by LTI, nearly-linear, nearly-additive filterbanks with sparse activations. Mostly we handle musical signals via convnets, which is not satisfying, and one feels one could do better with a more appropriate architecture. This project was about finding that architecture.
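To make that signal class concrete, here is a minimal sketch, not the code from the project. It assumes a bank of two-pole IIR resonators, each driven by a sparse impulse train, with the outputs summed. The pole radius is obtained by squashing an unconstrained parameter through a sigmoid, which is one common way to keep poles inside the unit circle while optimising by gradient descent; whether it matches the tricks mentioned above is an assumption. All function names, frequencies and parameter values are hypothetical.

```python
import numpy as np


def resonator_coeffs(raw_radius, freq_hz, sample_rate):
    """Denominator [1, a1, a2] of a two-pole resonator with poles r*exp(+-i*theta)."""
    # Sigmoid maps an unconstrained parameter to a pole radius in (0, 1)
    # for moderate inputs, so the filter stays stable.
    r = 1.0 / (1.0 + np.exp(-raw_radius))
    theta = 2.0 * np.pi * freq_hz / sample_rate
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])


def run_resonator(a, excitation):
    """Direct-form recursion y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        y[n] = excitation[n]
        if n >= 1:
            y[n] -= a[1] * y[n - 1]
        if n >= 2:
            y[n] -= a[2] * y[n - 2]
    return y


def synthesize(sample_rate=8000, n_samples=4000, seed=0):
    """Sum of resonators, each excited by a few sparse impulses."""
    rng = np.random.default_rng(seed)
    freqs = [220.0, 331.0, 523.0]  # hypothetical partial frequencies (Hz)
    raw_radii = [6.0, 5.5, 5.0]    # hypothetical unconstrained decay parameters
    signal = np.zeros(n_samples)
    for freq, raw in zip(freqs, raw_radii):
        excitation = np.zeros(n_samples)
        onsets = rng.choice(n_samples, size=3, replace=False)
        excitation[onsets] = rng.uniform(0.5, 1.0, size=3)  # sparse activations
        signal += run_resonator(resonator_coeffs(raw, freq, sample_rate), excitation)
    return signal


if __name__ == "__main__":
    audio = synthesize()
    print(audio.shape, float(np.max(np.abs(audio))))
```

The decomposition problem is then the inverse one: given only the summed signal, recover the pole parameters and the sparse excitations.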


Abdallah, Samer A., and Mark D. Plumbley. 2004. “Polyphonic Music Transcription by Non-Negative Sparse Coding of Power Spectra.” In. http://ismir2004.ismir.net/proceedings/p058-page-318-paper216.pdf.

Allen-Zhu, Zeyuan, and Yuanzhi Li. 2019. “Can SGD Learn Recurrent Neural Networks with Provable Generalization?” February 3, 2019. http://arxiv.org/abs/1902.01028.

Alliney, S. 1992. “Digital Filters as Absolute Norm Regularizers.” IEEE Transactions on Signal Processing 40 (6): 1548–62. https://doi.org/10.1109/78.139258.

Antoniou, Andreas. 2005. Digital Signal Processing: Signals, Systems and Filters. New York: McGraw-Hill.

Arjovsky, Martin, Amar Shah, and Yoshua Bengio. 2016. “Unitary Evolution Recurrent Neural Networks.” In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, 1120–8. ICML’16. New York, NY, USA: JMLR.org. http://arxiv.org/abs/1511.06464.

Ascher, Uri M. 2008. Numerical Methods for Evolutionary Differential Equations. Computational Science and Engineering 5. Philadelphia, Pa: SIAM, Soc. for Industrial and Applied Mathematics.

Atal, B. S. 2006. “The History of Linear Prediction.” IEEE Signal Processing Magazine 23 (2): 154–61. https://doi.org/10.1109/MSP.2006.1598091.

Bach, Francis R., and Michael I. Jordan. 2006. “Learning Spectral Clustering, with Application to Speech Separation.” Journal of Machine Learning Research 7 (Oct): 1963–2001. http://www.jmlr.org/papers/v7/bach06b.html.

Bach, Francis R., and Eric Moulines. 2013. “Non-Strongly-Convex Smooth Stochastic Approximation with Convergence Rate O(1/N).” In, 773–81. https://arxiv.org/abs/1306.2119v1.

Banitalebi-Dehkordi, Mehdi, and Amin Banitalebi-Dehkordi. 2014. “Music Genre Classification Using Spectral Analysis and Sparse Representation of the Signals.” Journal of Signal Processing Systems 74 (2): 273–80. https://doi.org/10.1007/s11265-013-0797-4.

Barron, A. R. 1993. “Universal Approximation Bounds for Superpositions of a Sigmoidal Function.” IEEE Transactions on Information Theory 39 (3): 930–45. https://doi.org/10.1109/18.256500.

Baydin, Atilim Gunes, and Barak A. Pearlmutter. 2014. “Automatic Differentiation of Algorithms for Machine Learning.” April 28, 2014. http://arxiv.org/abs/1404.7456.

Bayro-Corrochano, Eduardo. 2005. “The Theory and Use of the Quaternion Wavelet Transform.” Journal of Mathematical Imaging and Vision 24 (1): 19–35. https://doi.org/10.1007/s10851-005-3605-3.

Bengio, Samy, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. “Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 28, 1171–9. NIPS’15. Cambridge, MA, USA: Curran Associates, Inc. http://papers.nips.cc/paper/5956-scheduled-sampling-for-sequence-prediction-with-recurrent-neural-networks.

Bengio, Y., P. Simard, and P. Frasconi. 1994. “Learning Long-Term Dependencies with Gradient Descent Is Difficult.” IEEE Transactions on Neural Networks 5 (2): 157–66. https://doi.org/10.1109/72.279181.

Ben Taieb, Souhaib, and Amir F. Atiya. 2016. “A Bias and Variance Analysis for Multistep-Ahead Time Series Forecasting.” IEEE Transactions on Neural Networks and Learning Systems 27 (1): 62–76. https://doi.org/10.1109/TNNLS.2015.2411629.

Bertin, N., R. Badeau, and E. Vincent. 2010. “Enforcing Harmonicity and Smoothness in Bayesian Non-Negative Matrix Factorization Applied to Polyphonic Music Transcription.” IEEE Transactions on Audio, Speech, and Language Processing 18 (3): 538–49. https://doi.org/10.1109/TASL.2010.2041381.

Blackman, R. B., and J. W. Tukey. 1959. The Measurement of Power Spectra from the Point of View of Communications Engineering. New York: Dover Publications.

Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. 2017. “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association 112 (518): 859–77. https://doi.org/10.1080/01621459.2017.1285773.

Bogert, B P, M J R Healy, and J W Tukey. 1963. “The Quefrency Alanysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking.” In, 209–43.

Bojarski, Mariusz, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, et al. 2016. “End to End Learning for Self-Driving Cars.” April 25, 2016. http://arxiv.org/abs/1604.07316.

Bora, Ashish, Ajil Jalal, Eric Price, and Alexandros G. Dimakis. 2017. “Compressed Sensing Using Generative Models.” In International Conference on Machine Learning, 537–46. http://arxiv.org/abs/1703.03208.

Bordes, Antoine, Léon Bottou, and Patrick Gallinari. 2009. “SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent.” Journal of Machine Learning Research 10 (December): 1737–54. http://jmlr.org/papers/volume10/bordes09a/bordes09a.pdf.

Borzì, Alfio, and Volker Schulz. 2012. Computational Optimization of Systems Governed by Partial Differential Equations. Computational Science and Engineering Series. Philadelphia: Society for Industrial and Applied Mathematics.

Bottou, Léon. 1998. “Online Algorithms and Stochastic Approximations.” In Online Learning and Neural Networks, edited by David Saad, 17:142. Cambridge, UK: Cambridge University Press. http://leon.bottou.org/publications/pdf/online-1998.pdf.

———. 2012. “Stochastic Gradient Descent Tricks.” In Neural Networks: Tricks of the Trade, 421–36. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_25.

———. 2010. “Large-Scale Machine Learning with Stochastic Gradient Descent.” In Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT’2010), 177–86. Paris, France: Springer. http://leon.bottou.org/papers/bottou-2010.

Bottou, Léon, and Olivier Bousquet. 2008. “The Tradeoffs of Large Scale Learning.” In Advances in Neural Information Processing Systems, edited by J. C. Platt, D. Koller, Y. Singer, and S. Roweis, 20:161–68. NIPS Foundation (http://books.nips.cc). http://leon.bottou.org/papers/bottou-bousquet-2008.

Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. 2016. “Optimization Methods for Large-Scale Machine Learning.” June 15, 2016. http://arxiv.org/abs/1606.04838.

Bottou, Léon, and Yann LeCun. 2004. “Large Scale Online Learning.” In Advances in Neural Information Processing Systems 16, edited by Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf. Cambridge, MA: MIT Press. http://leon.bottou.org/papers/bottou-lecun-2004.

Boulanger-Lewandowski, Nicolas, Yoshua Bengio, and Pascal Vincent. 2012. “Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.” In 29th International Conference on Machine Learning. http://arxiv.org/abs/1206.6392.

Box, George E. P., Gwilym M. Jenkins, Gregory C. Reinsel, and Greta M. Ljung. 2016. Time Series Analysis: Forecasting and Control. Fifth edition. Wiley Series in Probability and Statistics. Hoboken, New Jersey: John Wiley & Sons, Inc.

Bridle, J. S., and M. D. Brown. 1974. “An Experimental Automatic Word Recognition System.” JSRU Report 1003 (5).

Buch, Michael, Elio Quinton, and Bob L Sturm. 2017. “NichtnegativeMatrixFaktorisierungnutzendesKlangsynthesenSystem (NiMFKS): Extensions of NMF-Based Concatenative Sound Synthesis.” In Proceedings of the 20th International Conference on Digital Audio Effects, 7. Edinburgh.

Cakir, Emre, Ezgi Can Ozan, and Tuomas Virtanen. 2016. “Filterbank Learning for Deep Neural Network Based Polyphonic Sound Event Detection.” In Neural Networks (IJCNN), 2016 International Joint Conference on, 3399–3406. IEEE. http://ieeexplore.ieee.org/abstract/document/7727634/.

Carabias-Orti, J. J., T. Virtanen, P. Vera-Candeas, N. Ruiz-Reyes, and F. J. Canadas-Quesada. 2011. “Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization.” IEEE Journal of Selected Topics in Signal Processing 5 (6): 1144–58. https://doi.org/10.1109/JSTSP.2011.2159700.

Carpenter, Bob, Matthew D. Hoffman, Marcus Brubaker, Daniel Lee, Peter Li, and Michael Betancourt. 2015. “The Stan Math Library: Reverse-Mode Automatic Differentiation in C++.” 2015. http://arxiv.org/abs/1509.07164.

Chang, Bo, Minmin Chen, Eldad Haber, and Ed H. Chi. 2019. “AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks.” In Proceedings of ICLR. http://arxiv.org/abs/1902.09689.

Chang, Bo, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert, and Elliot Holtham. 2018. “Reversible Architectures for Arbitrarily Deep Residual Neural Networks.” In. http://arxiv.org/abs/1709.03698.

Chang, Bo, Lili Meng, Eldad Haber, Frederick Tung, and David Begert. 2018. “Multi-Level Residual Networks from Dynamical Systems View.” In Proceedings of ICLR. http://arxiv.org/abs/1710.10348.

Charles, Adam, Aurele Balavoine, and Christopher Rozell. 2016. “Dynamic Filtering of Time-Varying Sparse Signals via L1 Minimization.” IEEE Transactions on Signal Processing 64 (21): 5644–56. https://doi.org/10.1109/TSP.2016.2586745.

Chen, Y., and A. O. Hero. 2012. “Recursive ℓ1,∞ Group Lasso.” IEEE Transactions on Signal Processing 60 (8): 3978–87. https://doi.org/10.1109/TSP.2012.2192924.

Chevillon, Guillaume. 2007. “Direct Multi-Step Estimation and Forecasting.” Journal of Economic Surveys 21 (4): 746–85. https://doi.org/10.1111/j.1467-6419.2007.00518.x.

Cho, Kyunghyun, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. “On the Properties of Neural Machine Translation: Encoder-Decoder Approaches.” 2014. http://arxiv.org/abs/1409.1259.

Choi, Keunwoo, George Fazekas, and Mark Sandler. 2016. “Automatic Tagging Using Deep Convolutional Neural Networks.” In Proceedings of ISMIR. http://arxiv.org/abs/1606.00298.

Choi, Keunwoo, George Fazekas, Mark Sandler, and Kyunghyun Cho. 2016. “Convolutional Recurrent Neural Networks for Music Classification.” In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2392–6. https://doi.org/10.1109/ICASSP.2017.7952585.

Choi, Keunwoo, György Fazekas, Kyunghyun Cho, and Mark Sandler. 2017. “A Tutorial on Deep Learning for Music Information Retrieval.” September 13, 2017. http://arxiv.org/abs/1709.04396.

Choi, Keunwoo, György Fazekas, Mark Sandler, and Kyunghyun Cho. 2017. “Transfer Learning for Music Classification and Regression Tasks.” In Proceedings of the 18th International Society for Music Information Retrieval (ISMIR) Conference 2017. Suzhou, China. http://arxiv.org/abs/1703.09179.

Chollet, François. 2016. “Xception: Deep Learning with Depthwise Separable Convolutions.” October 7, 2016. http://arxiv.org/abs/1610.02357.

Choromanska, Anna, Mikael Henaff, Michael Mathieu, Gerard Ben Arous, and Yann LeCun. 2015. “The Loss Surfaces of Multilayer Networks.” In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 192–204. http://proceedings.mlr.press/v38/choromanska15.html.

Chung, Junyoung, Sungjin Ahn, and Yoshua Bengio. 2016. “Hierarchical Multiscale Recurrent Neural Networks.” September 6, 2016. http://arxiv.org/abs/1609.01704.

Chung, Junyoung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2014. “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling.” In NIPS. http://arxiv.org/abs/1412.3555.

Chung, Junyoung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2015. “Gated Feedback Recurrent Neural Networks.” In Proceedings of the 32nd International Conference on Machine Learning - Volume 37, 2067–75. ICML’15. JMLR.org. http://arxiv.org/abs/1502.02367.

Chung, Junyoung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C Courville, and Yoshua Bengio. 2015. “A Recurrent Latent Variable Model for Sequential Data.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2980–8. Curran Associates, Inc. http://papers.nips.cc/paper/5653-a-recurrent-latent-variable-model-for-sequential-data.pdf.

Collins, Jasmine, Jascha Sohl-Dickstein, and David Sussillo. 2016. “Capacity and Trainability in Recurrent Neural Networks.” In. http://arxiv.org/abs/1611.09913.

Cooijmans, Tim, Nicolas Ballas, César Laurent, Çağlar Gülçehre, and Aaron Courville. 2016. “Recurrent Batch Normalization.” 2016. https://arxiv.org/abs/1603.09025.

Cybenko, G. 1989. “Approximation by Superpositions of a Sigmoidal Function.” Mathematics of Control, Signals and Systems 2: 303–14. https://doi.org/10.1007/BF02551274.

Cyrta, Pawel, Tomasz Trzciński, and Wojciech Stokowiec. 2017. “Speaker Diarization Using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings.” August 9, 2017. http://arxiv.org/abs/1708.02840.

Dai, Wei, Chia Dai, Shuhui Qu, Juncheng Li, and Samarjit Das. 2016. “Very Deep Convolutional Neural Networks for Raw Waveforms.” October 1, 2016. http://arxiv.org/abs/1610.00087.

Dauphin, Yann, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. 2014. “Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization.” In Advances in Neural Information Processing Systems 27, 2933–41. Curran Associates, Inc. http://arxiv.org/abs/1406.2572.

Davis, S., and P. Mermelstein. 1980. “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences.” IEEE Transactions on Acoustics, Speech, and Signal Processing 28 (4): 357–66. https://doi.org/10.1109/TASSP.1980.1163420.

Defferrard, Michaël, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson. 2017. “FMA: A Dataset for Music Analysis.” In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China. http://arxiv.org/abs/1612.01840.

Dieleman, Sander, and Benjamin Schrauwen. 2014. “End to End Learning for Music Audio.” In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6964–8. IEEE. https://doi.org/10.1109/ICASSP.2014.6854950.

Doerr, Andreas, Christian Daniel, Martin Schiegg, Duy Nguyen-Tuong, Stefan Schaal, Marc Toussaint, and Sebastian Trimpe. 2018. “Probabilistic Recurrent State-Space Models.” January 31, 2018. http://arxiv.org/abs/1801.10395.

Doucet, Arnaud, Nando Freitas, and Neil Gordon. 2001. Sequential Monte Carlo Methods in Practice. New York, NY: Springer New York. http://public.eblib.com/choice/publicfullrecord.aspx?p=3087052.

Dozat, Timothy. n.d. “NAdam Report.”

Duchi, John, Elad Hazan, and Yoram Singer. 2011. “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization.” Journal of Machine Learning Research 12 (Jul): 2121–59. http://www.jmlr.org/papers/v12/duchi11a.html.

Dumitrescu, Bogdan. 2017. Positive Trigonometric Polynomials and Signal Processing Applications. Second edition. Signals and Communication Technology. Cham: Springer. https://doi.org/10.1007/978-3-319-53688-0.

Durbin, J., and S. J. Koopman. 2012. Time Series Analysis by State Space Methods. 2nd ed. Oxford Statistical Science Series 38. Oxford: Oxford University Press.

Eichler, Michael, Rainer Dahlhaus, and Johannes Dueck. 2016. “Graphical Modeling for Multivariate Hawkes Processes with Nonparametric Link Functions.” Journal of Time Series Analysis, January. https://doi.org/10.1111/jtsa.12213.

Ekanadham, C., D. Tranchina, and E. P. Simoncelli. 2011. “Recovery of Sparse Translation-Invariant Signals with Continuous Basis Pursuit.” IEEE Transactions on Signal Processing 59 (10): 4735–44. https://doi.org/10.1109/TSP.2011.2160058.

Elbaz, Dan, and Michael Zibulevsky. 2017. “Perceptual Audio Loss Function for Deep Learning.” In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China. http://arxiv.org/abs/1708.05987.

Engel, Jesse, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, and Mohammad Norouzi. 2017. “Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders.” In PMLR. http://arxiv.org/abs/1704.01279.

Evensen, G. 2009. “The Ensemble Kalman Filter for Combined State and Parameter Estimation.” IEEE Control Systems 29 (3): 83–104. https://doi.org/10.1109/MCS.2009.932223.

Févotte, Cédric, Nancy Bertin, and Jean-Louis Durrieu. 2008. “Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis.” Neural Computation 21 (3): 793–830. https://doi.org/10.1162/neco.2008.04-08-771.

Finke, Axel, and Sumeetpal S. Singh. 2016. “Approximate Smoothing and Parameter Estimation in High-Dimensional State-Space Models.” June 28, 2016. http://arxiv.org/abs/1606.08650.

Flamary, Rémi, Cédric Févotte, Nicolas Courty, and Valentin Emiya. 2016. “Optimal Spectral Transportation with Application to Music Transcription.” In, 703–11. Curran Associates, Inc. http://papers.nips.cc/paper/6479-optimal-spectral-transportation-with-application-to-music-transcription.pdf.

Fonseca, Eduardo, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, and Xavier Serra. 2019. “Learning Sound Event Classifiers from Web Audio with Noisy Labels.” January 4, 2019. http://arxiv.org/abs/1901.01189.

Fraccaro, Marco, Søren Kaae Sønderby, Ulrich Paquet, and Ole Winther. 2016. “Sequential Neural Models with Stochastic Layers.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2199–2207. Curran Associates, Inc. http://papers.nips.cc/paper/6039-sequential-neural-models-with-stochastic-layers.pdf.

Friston, K. J. 2008. “Variational Filtering.” NeuroImage 41 (3): 747–66. https://doi.org/10.1016/j.neuroimage.2008.03.017.

Fukumizu, K., and S. Amari. 2000. “Local Minima and Plateaus in Hierarchical Structures of Multilayer Perceptrons.” Neural Networks 13 (3): 317–27. https://doi.org/10.1016/S0893-6080(00)00009-5.

Gal, Yarin, and Zoubin Ghahramani. 2016. “A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.” In. http://arxiv.org/abs/1512.05287.

———. 2015. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In Proceedings of the 33rd International Conference on Machine Learning (ICML-16). http://arxiv.org/abs/1506.02142.

Gemmeke, Jort F., Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin Ritter. 2017. “Audio Set: An Ontology and Human-Labeled Dataset for Audio Events.” In Proceedings of ICASSP 2017. New Orleans, LA. https://research.google.com/pubs/pub45857.html.

Geronimo, Jeffrey S., and Hugo J. Woerdeman. 2004. “Positive Extensions, Fejér-Riesz Factorization and Autoregressive Filters in Two Variables.” Annals of Mathematics 160 (3): 839–906. http://people.math.gatech.edu/~geronimo/GWfinal2.pdf.

Gers, Felix A., Nicol N. Schraudolph, and Jürgen Schmidhuber. 2002. “Learning Precise Timing with LSTM Recurrent Networks.” Journal of Machine Learning Research 3 (Aug): 115–43. http://www.jmlr.org/papers/v3/gers02a.html.

Ghosh, Tapabrata. 2017. “Towards a New Interpretation of Separable Convolutions.” January 16, 2017. http://arxiv.org/abs/1701.04489.

Goertzel, Gerald. 1958. “An Algorithm for the Evaluation of Finite Trigonometric Series.” The American Mathematical Monthly 65 (1): 34. https://doi.org/10.2307/2310304.

Goodfellow, Ian J., Oriol Vinyals, and Andrew M. Saxe. 2014. “Qualitatively Characterizing Neural Network Optimization Problems.” December 19, 2014. http://arxiv.org/abs/1412.6544.

Goodwin, M M, and M Vetterli. 1999. “Matching Pursuit and Atomic Signal Models Based on Recursive Filter Banks.” IEEE Transactions on Signal Processing 47 (7): 1890–1902. https://doi.org/10.1109/78.771038.

Goudarzi, Alireza, Peter Banda, Matthew R. Lakin, Christof Teuscher, and Darko Stefanovic. 2014. “A Comparative Study of Reservoir Computing for Temporal Signal Processing.” January 9, 2014. http://arxiv.org/abs/1401.2224.

Graves, Alex. 2012. Supervised Sequence Labelling with Recurrent Neural Networks. Studies in Computational Intelligence, v. 385. Heidelberg; New York: Springer. http://www.cs.toronto.edu/~graves/preprint.pdf.

Green, D., and S. Bass. 1984. “Representing Periodic Waveforms with Nonorthogonal Basis Functions.” IEEE Transactions on Circuits and Systems 31 (6): 518–34. https://doi.org/10.1109/TCS.1984.1085543.

Gregor, Karol, and Yann LeCun. 2010. “Learning Fast Approximations of Sparse Coding.” In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 399–406. http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_GregorL10.pdf.

———. 2011. “Efficient Learning of Sparse Invariant Representations.” May 26, 2011. http://arxiv.org/abs/1105.5307.

Gribonval, R. 2003. “Piecewise Linear Source Separation.” In Proc. Soc. Photographic Instrumentation Eng., 5207:297–310. San Diego, CA, USA. https://doi.org/10.1117/12.504790.

Gribonval, R., and Emmanuel Bacry. 2003. “Harmonic Decomposition of Audio Signals with Matching Pursuit.” IEEE Transactions on Signal Processing 51 (1): 101–11. https://doi.org/10.1109/TSP.2002.806592.

Gribonval, R., R. M. Figueras i Ventura, and P. Vandergheynst. 2006. “A Simple Test to Check the Optimality of a Sparse Signal Approximation.” Signal Processing, Sparse Approximations in Signal and Image Processing, 86 (3): 496–510. https://doi.org/10.1016/j.sigpro.2005.05.026.

Griewank, Andreas, and Andrea Walther. 2008. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. 2nd ed. Philadelphia, PA: Society for Industrial and Applied Mathematics.

Grosse, Roger, Rajat Raina, Helen Kwong, and Andrew Y. Ng. 2007. “Shift-Invariant Sparse Coding for Audio Classification.” In The Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007), 9:8. http://arxiv.org/abs/1206.5241.

Gruslys, Audrunas, Remi Munos, Ivo Danihelka, Marc Lanctot, and Alex Graves. 2016. “Memory-Efficient Backpropagation Through Time.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 4125–33. Curran Associates, Inc. http://papers.nips.cc/paper/6221-memory-efficient-backpropagation-through-time.pdf.

Gu, Shixiang, Sergey Levine, Ilya Sutskever, and Andriy Mnih. 2016. “MuProp: Unbiased Backpropagation for Stochastic Neural Networks.” In Proceedings of ICLR. https://arxiv.org/abs/1511.05176v3.

Gulrajani, Ishaan, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. 2017. “Improved Training of Wasserstein GANs.” March 31, 2017. http://arxiv.org/abs/1704.00028.

Ha, David, Andrew Dai, and Quoc V. Le. 2016. “HyperNetworks.” September 27, 2016. http://arxiv.org/abs/1609.09106.

Haber, Eldad, and Lars Ruthotto. 2018. “Stable Architectures for Deep Neural Networks.” Inverse Problems 34 (1): 014004. https://doi.org/10.1088/1361-6420/aa9a90.

Hamel, Philippe, Matthew E. P. Davies, Kazuyoshi Yoshii, and Masataka Goto. 2013. “Transfer Learning in MIR: Sharing Learned Latent Representations for Music Audio Classification and Similarity.” In. https://research.google.com/pubs/pub41530.html.

Hardt, Moritz, Tengyu Ma, and Benjamin Recht. 2016. “Gradient Descent Learns Linear Dynamical Systems.” September 16, 2016. http://arxiv.org/abs/1609.05191.

Harris, Fredric J. 1978. “On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform.” Proceedings of the IEEE 66 (1): 51–83. https://doi.org/10.1109/PROC.1978.10837.

Haykin, Simon S., ed. 2001. Kalman Filtering and Neural Networks. Adaptive and Learning Systems for Signal Processing, Communications, and Control. New York: Wiley. http://booksbw.com/books/mathematical/hayking-s/2001/files/kalmanfilteringneuralnetworks2001.pdf.

Hazan, Elad, Kfir Levy, and Shai Shalev-Shwartz. 2015. “Beyond Convexity: Stochastic Quasi-Convex Optimization.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 1594–1602. Curran Associates, Inc. http://papers.nips.cc/paper/5718-beyond-convexity-stochastic-quasi-convex-optimization.pdf.

Hazan, Elad, Karan Singh, and Cyril Zhang. 2017. “Learning Linear Dynamical Systems via Spectral Filtering.” In NIPS. http://arxiv.org/abs/1711.00946.

Helén, M., and T. Virtanen. 2005. “Separation of Drums from Polyphonic Music Using Non-Negative Matrix Factorization and Support Vector Machine.” In Signal Processing Conference, 2005 13th European, 1–4. http://www.cs.tut.fi/sgn/arg/music/tuomasv/EUSIPCO2005.pdf.

Helmholtz, Heinrich. 1863. Die Lehre von Den Tonempfindungen Als Physiologische Grundlage Für Die Theorie Der Musik. Braunschweig: J. Vieweg.

Henaff, Mikael, Kevin Jarrett, Koray Kavukcuoglu, and Yann LeCun. 2011. “Unsupervised Learning of Sparse Features for Scalable Audio Classification.” In ISMIR. http://ismir2011.ismir.net/papers/PS6-5.pdf.

Heyde, C. C. 1974. “On Martingale Limit Theory and Strong Convergence Results for Stochastic Approximation Procedures.” Stochastic Processes and Their Applications 2 (4): 359–70. https://doi.org/10.1016/0304-4149(74)90004-0.

Hinton, G., Li Deng, Dong Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, et al. 2012. “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.” IEEE Signal Processing Magazine 29 (6): 82–97. https://doi.org/10.1109/MSP.2012.2205597.

Hinton, G. E. 1995. “The Wake-Sleep Algorithm for Unsupervised Neural Networks.” Science 268 (5214): 1158–61. https://doi.org/10.1126/science.7761831.

Hochreiter, Sepp. 1998. “The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions.” International Journal of Uncertainty Fuzziness and Knowledge Based Systems 6: 107–15. http://www.worldscientific.com/doi/abs/10.1142/S0218488598000094.

Hochreiter, Sepp, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. 2001. “Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies.” In A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press. http://www.bioinf.jku.at/publications/older/ch7.pdf.

Hochreiter, Sepp, and Jürgen Schmidhuber. 1997a. “LSTM Can Solve Hard Long Time Lag Problems.” In Advances in Neural Information Processing Systems: Proceedings of the 1996 Conference, 473–79. https://papers.nips.cc/paper/1215-lstm-can-solve-hard-long-time-lag-problems.pdf.

Hochreiter, Sepp, and Jürgen Schmidhuber. 1997b. “Long Short-Term Memory.” Neural Computation 9 (8): 1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.

Hoffman, M D, and A Gelman. 2011. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” 2011. http://arxiv.org/abs/1111.4246.

Holan, Scott H., Robert Lund, and Ginger Davis. 2010. “The ARMA Alphabet Soup: A Tour of ARMA Model Variants.” Statistics Surveys 4: 232–74. https://doi.org/10.1214/09-SS060.

Hornik, Kurt. 1991. “Approximation Capabilities of Multilayer Feedforward Networks.” Neural Networks 4 (2): 251–57. https://doi.org/10.1016/0893-6080(91)90009-T.

Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. 1989. “Multilayer Feedforward Networks Are Universal Approximators.” Neural Networks 2 (5): 359–66. https://doi.org/10.1016/0893-6080(89)90020-8.

Hoshen, Yedid, Ron J. Weiss, and Kevin W. Wilson. 2015. “Speech Acoustic Modeling from Raw Multichannel Waveforms.” In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, 4624–8. IEEE. https://doi.org/10.1109/ICASSP.2015.7178847.

Hou, Elizabeth, Earl Lawrence, and Alfred O. Hero. 2016. “Penalized Ensemble Kalman Filters for High Dimensional Non-Linear Systems.” October 1, 2016. http://arxiv.org/abs/1610.00195.

Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.” April 16, 2017. http://arxiv.org/abs/1704.04861.

Hua, Yingbo, and Tapan K. Sarkar. 1990. “Matrix Pencil Method for Estimating Parameters of Exponentially Damped/Undamped Sinusoids in Noise.” IEEE Transactions on Acoustics, Speech and Signal Processing 38 (5): 814–24. https://doi.org/10.1109/29.56027.

Huang, Gao, Zhuang Liu, Kilian Q. Weinberger, and Laurens van der Maaten. 2016. “Densely Connected Convolutional Networks.” August 24, 2016. http://arxiv.org/abs/1608.06993.

Huggins, P S, and S W Zucker. 2007. “Greedy Basis Pursuit.” IEEE Transactions on Signal Processing 55 (7): 3760–72. https://doi.org/10.1109/TSP.2007.894287.

Huszár, Ferenc. 2015. “How (Not) to Train Your Generative Model: Scheduled Sampling, Likelihood, Adversary?” November 16, 2015. http://arxiv.org/abs/1511.05101.

Hürzeler, Markus, and Hans R. Künsch. 2001. “Approximating and Maximising the Likelihood for a General State-Space Model.” In Sequential Monte Carlo Methods in Practice, 159–75. Statistics for Engineering and Information Science. Springer, New York, NY. https://doi.org/10.1007/978-1-4757-3437-9_8.

Hyvärinen, Aapo, and Patrik Hoyer. 2000. “Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces.” Neural Computation 12 (7): 1705–20. https://doi.org/10.1162/089976600300015312.

Ignjatovic, Aleksandar, Chamith Wijenayake, and Gabriele Keller. 2019. “Chromatic Derivatives and Approximations in Practice (III): Continuous Time MUSIC Algorithm for Adaptive Frequency Estimation in Colored Noise,” 16.

———. 2018a. “Chromatic Derivatives and Approximations in Practice—Part I: A General Framework.” IEEE Transactions on Signal Processing 66 (6): 1498–1512. https://doi.org/10.1109/TSP.2017.2787127.

———. 2018b. “Chromatic Derivatives and Approximations in Practice—Part II: Nonuniform Sampling, Zero-Crossings Reconstruction, and Denoising.” IEEE Transactions on Signal Processing 66 (6): 1513–25. https://doi.org/10.1109/TSP.2017.2787149.

Ionides, Edward L., Anindya Bhadra, Yves Atchadé, and Aaron King. 2011. “Iterated Filtering.” The Annals of Statistics 39 (3): 1776–1802. https://doi.org/10.1214/11-AOS886.

Ionides, E. L., C. Bretó, and A. A. King. 2006. “Inference for Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences 103 (49): 18438–43. https://doi.org/10.1073/pnas.0603181103.

Jaeger, Herbert. 2002. Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the “Echo State Network” Approach. Vol. 5. GMD-Forschungszentrum Informationstechnik. http://minds.jacobs-university.de/sites/default/files/uploads/papers/ESNTutorialRev.pdf.

Jaganathan, Kishore, Yonina C. Eldar, and Babak Hassibi. 2015. “Phase Retrieval: An Overview of Recent Developments.” October 26, 2015. http://arxiv.org/abs/1510.07713.

Jing, Li, Yichen Shen, Tena Dubcek, John Peurifoy, Scott Skirlo, Yann LeCun, Max Tegmark, and Marin Soljačić. 2017. “Tunable Efficient Unitary Neural Networks (EUNN) and Their Application to RNNs.” In PMLR, 1733–41. http://proceedings.mlr.press/v70/jing17a.html.

Johnson, Matthew James. 2012. “A Simple Explanation of A Spectral Algorithm for Learning Hidden Markov Models.” April 11, 2012. http://arxiv.org/abs/1204.2477.

Jost, P., P. Vandergheynst, and P. Frossard. 2006. “Tree-Based Pursuit: Algorithm and Properties.” IEEE Transactions on Signal Processing 54 (12): 4685–97. https://doi.org/10.1109/TSP.2006.882080.

Jost, P., P. Vandergheynst, S. Lesage, and R. Gribonval. 2006. “MoTIF: An Efficient Algorithm for Learning Translation Invariant Dictionaries.” In 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings, 5:V–V. Toulouse, France. https://doi.org/10.1109/ICASSP.2006.1661411.

Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever. 2015. “An Empirical Exploration of Recurrent Network Architectures.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2342–50. http://machinelearning.wustl.edu/mlpapers/paper_files/icml2015_jozefowicz15.pdf.

Jung, Alexander. 2013. “An RKHS Approach to Estimation with Sparsity Constraints.” In Advances in Neural Information Processing Systems 29. http://arxiv.org/abs/1311.5768.

Kailath, Thomas. 1980. Linear Systems. Prentice-Hall Information and System Science Series. Englewood Cliffs, N.J: Prentice-Hall.

Kailath, Thomas, Ali H. Sayed, and Babak Hassibi. 2000. Linear Estimation. Prentice Hall Information and System Sciences Series. Upper Saddle River, N.J: Prentice Hall.

Kantas, N., A. Doucet, S. S. Singh, and J. M. Maciejowski. 2009. “An Overview of Sequential Monte Carlo Methods for Parameter Estimation in General State-Space Models.” IFAC Proceedings Volumes, 15th IFAC Symposium on System Identification, 42 (10): 774–85. https://doi.org/10.3182/20090706-3-FR-2004.00129.

Karpathy, Andrej, Justin Johnson, and Li Fei-Fei. 2015. “Visualizing and Understanding Recurrent Networks.” June 5, 2015. http://arxiv.org/abs/1506.02078.

Kavčić, A., and J. M. F. Moura. 2000. “Matrices with Banded Inverses: Inversion Algorithms and Factorization of Gauss-Markov Processes.” IEEE Transactions on Information Theory 46 (4): 1495–1509. https://doi.org/10.1109/18.954748.

Kingma, Diederik P., Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. “Improving Variational Inference with Inverse Autoregressive Flow.” In Advances in Neural Information Processing Systems 29. Curran Associates, Inc. http://arxiv.org/abs/1606.04934.

Klapuri, A., T. Virtanen, and T. Heittola. 2010. “Sound Source Separation in Monaural Music Signals Using Excitation-Filter Model and EM Algorithm.” In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 5510–3. https://doi.org/10.1109/ICASSP.2010.5495216.

Knudson, Karin C, Jacob Yates, Alexander Huk, and Jonathan W Pillow. 2014. “Inferring Sparse Representations of Continuous Signals with Continuous Orthogonal Matching Pursuit.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 27:1215–23. Curran Associates, Inc. http://papers.nips.cc/paper/5264-inferring-sparse-representations-of-continuous-signals-with-continuous-orthogonal-matching-pursuit.pdf.

Kong, Q., Y. Xu, W. Wang, and M. D. Plumbley. 2017. “A Joint Detection-Classification Model for Audio Tagging of Weakly Labelled Data.” In Proceedings of ICASSP 2017. New Orleans, USA. http://epubs.surrey.ac.uk/813128/.

Kreutz-Delgado, Kenneth, Joseph F. Murray, Bhaskar D. Rao, Kjersti Engan, Te-Won Lee, and Terrence J. Sejnowski. 2003. “Dictionary Learning Algorithms for Sparse Representation.” Neural Computation 15 (2): 349–96. https://doi.org/10.1162/089976603762552951.

Krishnan, Rahul G., Uri Shalit, and David Sontag. 2015. “Deep Kalman Filters.” 2015. https://arxiv.org/abs/1511.05121.

———. 2017. “Structured Inference Networks for Nonlinear State Space Models.” In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2101–9. http://arxiv.org/abs/1609.09869.

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” In Advances in Neural Information Processing Systems, 1097–1105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.

Kronland-Martinet, R., Ph. Guillemain, and S. Ystad. 1997. “Modelling of Natural Sounds by Time–Frequency and Wavelet Representations.” Organised Sound 2 (03): 179–91.

Kronland-Martinet, R, Ph. Guillemain, and S Ystad. 2001. “From Sound Modeling to Analysis-Synthesis of Sounds.” In Workshop on Proceedings of MOSART Current Research Directions in Computer Music Workshop, 217–24. http://mtg.upf.edu/mosart/papers/p40.pdf.

Kuleshov, Volodymyr, S. Zayd Enam, and Stefano Ermon. 2017. “Audio Super-Resolution Using Neural Nets.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.

Kumar, Anurag, and Bhiksha Raj. 2017. “Deep CNN Framework for Audio Event Recognition Using Weakly Labeled Web Data.” July 9, 2017. http://arxiv.org/abs/1707.02530.

Kutschireiter, Anna, Simone Carlo Surace, Henning Sprekeler, and Jean-Pascal Pfister. 2015a. “A Neural Implementation for Nonlinear Filtering.” 2015. http://arxiv.org/abs/1508.06818.

Kutschireiter, Anna, Simone C Surace, Henning Sprekeler, and Jean-Pascal Pfister. 2015b. “Approximate Nonlinear Filtering with a Recurrent Neural Network.” BMC Neuroscience 16 (Suppl 1): P196. https://doi.org/10.1186/1471-2202-16-S1-P196.

Kuznetsov, Vitaly, and Mehryar Mohri. 2014. “Generalization Bounds for Time Series Prediction with Non-Stationary Processes.” In Algorithmic Learning Theory, edited by Peter Auer, Alexander Clark, Thomas Zeugmann, and Sandra Zilles, 260–74. Lecture Notes in Computer Science. Bled, Slovenia: Springer International Publishing. https://doi.org/10.1007/978-3-319-11662-4_19.

Lamb, Alex, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron Courville, and Yoshua Bengio. 2016. “Professor Forcing: A New Algorithm for Training Recurrent Networks.” In Advances in Neural Information Processing Systems. http://arxiv.org/abs/1610.09038.

Laroche, Clément, Hélène Papadopoulos, Matthieu Kowalski, and Gaël Richard. 2017. “Drum Extraction in Single Channel Audio Signals Using Multi-Layer Non Negative Matrix Factor Deconvolution.” In ICASSP. New Orleans, United States. https://hal.archives-ouvertes.fr/hal-01438851.

Laurent, Thomas, and James von Brecht. 2016. “A Recurrent Neural Network Without Chaos.” December 19, 2016. http://arxiv.org/abs/1612.06212.

Law, Edith, Kris West, and Michael I. Mandel. 2009. “Evaluation of Algorithms Using Games: The Case of Music Tagging.” In. http://ismir2009.ismir.net/proceedings/OS5-5.pdf.

LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–44. https://doi.org/10.1038/nature14539.

Lee, Honglak, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. 2009. “Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations.” In Proceedings of the 26th Annual International Conference on Machine Learning, 609–16. ICML ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1553374.1553453.

Lee, Jongpil, Jiyoung Park, Keunhyoung Luke Kim, and Juhan Nam. 2017. “Sample-Level Deep Convolutional Neural Networks for Music Auto-Tagging Using Raw Waveforms.” In. http://arxiv.org/abs/1703.01789.

Leglaive, Simon, Roland Badeau, and Gaël Richard. 2017. “Multichannel Audio Source Separation: Variational Inference of Time-Frequency Sources from Time-Domain Observations.” In 42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP). New Orleans, LA, United States: IEEE. https://hal.archives-ouvertes.fr/hal-01416347.

Lei, Tao, and Yu Zhang. 2017. “Training RNNs as Fast as CNNs.” September 8, 2017. http://arxiv.org/abs/1709.02755.

Lewicki, Michael S. 2002. “Efficient Coding of Natural Sounds.” Nature Neuroscience 5 (4): 356–63. https://doi.org/10.1038/nn831.

Lewicki, Michael S., and Terrence J. Sejnowski. 2000. “Learning Overcomplete Representations.” Neural Computation 12 (2): 337–65. https://doi.org/10.1162/089976600300015826.

Lewicki, M S, and T J Sejnowski. 1999. “Coding Time-Varying Signals Using Sparse, Shift-Invariant Representations.” In NIPS, 11:730–36. Denver, CO: MIT Press. https://papers.cnl.salk.edu/PDFs/Coding%20Time-Varying%20Signals%20Using%20Sparse,%20Shift-Invariant%20Representations%201999-3580.pdf.

Li, Shuai, Wanqing Li, Chris Cook, Ce Zhu, and Yanbo Gao. 2018. “Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN.” In. http://arxiv.org/abs/1803.04831.

Li, Yanghao, Naiyan Wang, Jiaying Liu, and Xiaodi Hou. 2017. “Demystifying Neural Style Transfer.” In IJCAI. http://arxiv.org/abs/1701.01036.

Lindström, Erik, Edward Ionides, Jan Frydendall, and Henrik Madsen. 2012. “Efficient Iterated Filtering.” In IFAC-PapersOnLine (System Identification, Volume 16), 45:1785–90. 16th IFAC Symposium on System Identification. IFAC & Elsevier Ltd. https://doi.org/10.3182/20120711-3-BE-2027.00300.

Lindström, Erik, Jonas Ströjby, Mats Brodén, Magnus Wiktorsson, and Jan Holst. 2008. “Sequential Calibration of Options.” Computational Statistics & Data Analysis 52 (6): 2877–91. https://doi.org/10.1016/j.csda.2007.08.009.

Lipton, Zachary C. 2016. “Stuck in a What? Adventures in Weight Space.” February 23, 2016. http://arxiv.org/abs/1602.07320.

Lipton, Zachary C., John Berkowitz, and Charles Elkan. 2015. “A Critical Review of Recurrent Neural Networks for Sequence Learning.” May 29, 2015. http://arxiv.org/abs/1506.00019.

Liu, Jane, and Mike West. 2001. “Combined Parameter and State Estimation in Simulation-Based Filtering.” In Sequential Monte Carlo Methods in Practice, 197–223. Statistics for Engineering and Information Science. Springer, New York, NY. https://doi.org/10.1007/978-1-4757-3437-9_10.

Liu, Jen-Yu, Shyh-Kang Jeng, and Yi-Hsuan Yang. 2016. “Applying Topological Persistence in Convolutional Neural Network for Music Audio Signals.” August 26, 2016. http://arxiv.org/abs/1608.07373.

Ljung, L. 1979. “Asymptotic Behavior of the Extended Kalman Filter as a Parameter Estimator for Linear Systems.” IEEE Transactions on Automatic Control 24 (1): 36–50. https://doi.org/10.1109/TAC.1979.1101943.

Ljung, Lennart. 1999. System Identification: Theory for the User. 2nd ed. Prentice Hall Information and System Sciences Series. Upper Saddle River, NJ: Prentice Hall PTR.

Ljung, Lennart, Georg Ch Pflug, and Harro Walk. 2012. Stochastic Approximation and Optimization of Random Systems. Vol. 17. Birkhäuser. https://books.google.ch/books?hl=en&lr=&id=9Fr2BwAAQBAJ&oi=fnd&pg=PA2&ots=rPS2wp3kUH&sig=UKiDTNaSjUnznmD9OUtipdRK7nY.

Ljung, Lennart, and Torsten Söderström. 1983. Theory and Practice of Recursive Identification. The MIT Press Series in Signal Processing, Optimization, and Control 4. Cambridge, Mass: MIT Press.

Mallat, Stéphane G., and Zhifeng Zhang. 1993. “Matching Pursuits with Time-Frequency Dictionaries.” IEEE Transactions on Signal Processing 41 (12): 3397–3415. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=258082.

Marelli, D., and Minyue Fu. 2010. “A Recursive Method for the Approximation of LTI Systems Using Subband Processing.” IEEE Transactions on Signal Processing 58 (3): 1025–34. https://doi.org/10.1109/TSP.2009.2034933.

Martens, James. 2010. “Deep Learning via Hessian-Free Optimization.” In Proceedings of the 27th International Conference on Machine Learning, 735–42. ICML’10. USA: Omnipress. http://www.cs.utoronto.ca/~jmartens/docs/Deep_HessianFree.pdf.

Martens, James, and Ilya Sutskever. 2011. “Learning Recurrent Neural Networks with Hessian-Free Optimization.” In Proceedings of the 28th International Conference on Machine Learning, 1033–40. ICML’11. USA: Omnipress. http://dl.acm.org/citation.cfm?id=3104482.3104612.

———. 2012. “Training Deep and Recurrent Networks with Hessian-Free Optimization.” In Neural Networks: Tricks of the Trade, 479–535. Lecture Notes in Computer Science. Springer. http://www.cs.toronto.edu/~jmartens/docs/HF_book_chapter.pdf.

Masri, Paul, Andrew Bateman, and Nishan Canagarajah. 1997a. “A Review of Time–Frequency Representations, with Application to Sound/Music Analysis–Resynthesis.” Organised Sound 2 (03): 193–205. https://doi.org/10.1017/S1355771898009042.

———. 1997b. “The Importance of the Time–Frequency Representation for Sound/Music Analysis–Resynthesis.” Organised Sound 2 (03): 207–14. https://doi.org/10.1017/S1355771898009054.

Mattingley, J., and S. Boyd. 2010. “Real-Time Convex Optimization in Signal Processing.” IEEE Signal Processing Magazine 27 (3): 50–61. https://doi.org/10.1109/MSP.2010.936020.

McFee, Brian, Thierry Bertin-Mahieux, Daniel P. W. Ellis, and Gert R. G. Lanckriet. 2012. “The Million Song Dataset Challenge.” In, 909. ACM Press. https://doi.org/10.1145/2187980.2188222.

McFee, Brian, and Daniel PW Ellis. 2011. “Analyzing Song Structure with Spectral Clustering.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). http://www.ee.columbia.edu/~dpwe/pubs/McFeeE14-structure.pdf.

Megretski, A. 2003. “Positivity of Trigonometric Polynomials.” In 42nd IEEE International Conference on Decision and Control (IEEE Cat. No.03CH37475), 4:3814–7 vol.4. https://doi.org/10.1109/CDC.2003.1271743.

Mehri, Soroush, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, and Yoshua Bengio. 2017. “SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.” In Proceedings of International Conference on Learning Representations (ICLR) 2017. http://arxiv.org/abs/1612.07837.

Meinshausen, Nicolai, and Bin Yu. 2009. “Lasso-Type Recovery of Sparse Representations for High-Dimensional Data.” The Annals of Statistics 37 (1): 246–70. https://doi.org/10.1214/07-AOS582.

Mermelstein, Paul, and CH Chen. 1976. “Distance Measures for Speech Recognition: Psychological and Instrumental.” In Pattern Recognition and Artificial Intelligence, 101:374–88. Academic Press. http://web.haskins.yale.edu/sr/SR047/SR047_07.pdf.

Meyer, Matthias, Jan Beutel, and Lothar Thiele. 2017. “Unsupervised Feature Learning for Audio Analysis.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.

Mhammedi, Zakaria, Andrew Hellicar, Ashfaqur Rahman, and James Bailey. 2017. “Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections.” In PMLR, 2401–9. http://proceedings.mlr.press/v70/mhammedi17a.html.

Miron, Marius, Julio J. Carabias-Orti, Juan J. Bosch, Emilia Gómez, and Jordi Janer. 2016. “Score-Informed Source Separation for Multichannel Orchestral Recordings.” Journal of Electrical and Computer Engineering 2016 (December): e8363507. https://doi.org/10.1155/2016/8363507.

“MobileNets: Open-Source Models for Efficient on-Device Vision.” n.d. Research Blog. https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html.

Mohammed, Salah-Eldin A., and Michael K. R. Scheutzow. 1997. “Lyapunov Exponents of Linear Stochastic Functional-Differential Equations. II. Examples and Case Studies.” The Annals of Probability 25 (3): 1210–40. https://doi.org/10.1214/aop/1024404511.

Monner, Derek, and James A. Reggia. 2012. “A Generalized LSTM-Like Training Algorithm for Second-Order Recurrent Neural Networks.” Neural Networks 25 (January): 70–83. https://doi.org/10.1016/j.neunet.2011.07.003.

Moorer, J. A. 1974. “The Optimum Comb Method of Pitch Period Analysis of Continuous Digitized Speech.” IEEE Transactions on Acoustics, Speech and Signal Processing 22 (5): 330–38. https://doi.org/10.1109/TASSP.1974.1162596.

Moradkhani, Hamid, Soroosh Sorooshian, Hoshin V. Gupta, and Paul R. Houser. 2005. “Dual State–Parameter Estimation of Hydrological Models Using Ensemble Kalman Filter.” Advances in Water Resources 28 (2): 135–47. https://doi.org/10.1016/j.advwatres.2004.09.002.

Mozer, Michael C., Denis Kazakov, and Robert V. Lindsey. 2018. “State-Denoised Recurrent Neural Networks.” May 22, 2018. http://arxiv.org/abs/1805.08394.

Müller, M, F Kurth, and M Clausen. 2005a. “Audio Matching via Chroma-Based Statistical Features.” In Proc. Int. Conf. Music Info. Retrieval, 288–95. London, U.K. http://www.ismir2005.ismir.net/proceedings/1019.pdf.

———. 2005b. “Chroma-Based Statistical Audio Features for Audio Matching.” In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 275–78. New Paltz, NY. https://www.audiolabs-erlangen.de/content/05-fau/professor/00-mueller/03-publications/2005_MuellerKurthClausen_ChromaAudioFeatures_WASPAA.pdf.

Młynarski, Wiktor, and Josh H. McDermott. 2017. “Learning Mid-Level Auditory Codes from Natural Sound Statistics.” January 24, 2017. http://arxiv.org/abs/1701.07138.

Narayan, S. Shyamla, Andrei N. Temchin, Alberto Recio, and Mario A. Ruggero. 1998. “Frequency Tuning of Basilar Membrane and Auditory Nerve Fibers in the Same Cochleae.” Science 282 (5395): 1882–4. https://doi.org/10.1126/science.282.5395.1882.

Neal, Radford M., and Geoffrey E. Hinton. 1998. “A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants.” In Learning in Graphical Models, edited by Michael I. Jordan, 355–68. NATO ASI Series 89. Springer Netherlands. http://machinelearning.wustl.edu/uploads/Main/EM_algorithm.pdf.

Needell, D., and J. A. Tropp. 2008. “CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples.” March 17, 2008. http://arxiv.org/abs/0803.2392.

Nerrand, O., P. Roussel-Ragot, L. Personnaz, G. Dreyfus, and S. Marcos. 1993. “Neural Networks and Nonlinear Adaptive Filtering: Unifying Concepts and New Algorithms.” Neural Computation 5 (2): 165–99. https://doi.org/10.1162/neco.1993.5.2.165.

Nussbaum-Thom, Markus, Jia Cui, Bhuvana Ramabhadran, and Vaibhava Goel. 2016. “Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional Units.” In, 390–94. https://doi.org/10.21437/Interspeech.2016-212.

Nyquist, H. 1928. “Certain Topics in Telegraph Transmission Theory.” Transactions of the American Institute of Electrical Engineers 47 (2): 617–44. https://doi.org/10.1109/T-AIEE.1928.5055024.

Oliveira, Maurício C. de, and Robert E. Skelton. 2001. “Stability Tests for Constrained Linear Systems.” In Perspectives in Robust Control, 241–57. Lecture Notes in Control and Information Sciences. Springer, London. https://doi.org/10.1007/BFb0110624.

Oord, Aäron van den. 2016. “WaveNet: A Generative Model for Raw Audio.”

Pascanu, Razvan, Yann N. Dauphin, Surya Ganguli, and Yoshua Bengio. 2014. “On the Saddle Point Problem for Non-Convex Optimization.” May 19, 2014. http://arxiv.org/abs/1405.4604.

Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. 2013. “On the Difficulty of Training Recurrent Neural Networks.” In, 1310–8. http://arxiv.org/abs/1211.5063.

Patel, Vivak. 2017. “On SGD’s Failure in Practice: Characterizing and Overcoming Stalling.” February 1, 2017. http://arxiv.org/abs/1702.00317.

Peeters, G. 2004. “A Large Set of Audio Features for Sound Description (Similarity and Classification) in the CUIDADO Project.”

Pillonetto, Gianluigi. 2016. “The Interplay Between System Identification and Machine Learning.” December 29, 2016. http://arxiv.org/abs/1612.09158.

Polyak, B. T., and A. B. Juditsky. 1992. “Acceleration of Stochastic Approximation by Averaging.” SIAM Journal on Control and Optimization 30 (4): 838–55. https://doi.org/10.1137/0330046.

Pons, Jordi, Thomas Lidy, and Xavier Serra. 2016. “Experimenting with Musically Motivated Convolutional Neural Networks.” In 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), 1–6. Bucharest, Romania: IEEE. https://doi.org/10.1109/CBMI.2016.7500246.

Pons, Jordi, Oriol Nieto, Matthew Prockup, Erik M. Schmidt, Andreas F. Ehmann, and Xavier Serra. 2017. “End to End Learning for Music Audio Tagging at Scale.” In Proceedings of ISMIR. https://ismir2017.smcnus.org/lbds/Pons2017.pdf.

Pons, Jordi, and Xavier Serra. 2018. “Randomly Weighted CNNs for (Music) Audio Classification.” May 1, 2018. http://arxiv.org/abs/1805.00237.

Prandoni, Paolo, and Martin Vetterli. 2008. Signal Processing for Communications. Communication and Information Sciences. Lausanne: EPFL Press.

Preis, Douglas, and Voula Chris Georgopoulos. 1999. “Wigner Distribution Representation and Analysis of Audio Signals: An Illustrated Tutorial Review.” Journal of the Audio Engineering Society 47 (12): 1043–53. http://www.ece.rochester.edu/courses/ECE472/Site/Assignments/Entries/2009/2/9_Unit_2_-_Spectral_Analysis_files/Preis_1999_1.pdf.

Qu, Shuhui, Juncheng Li, Wei Dai, and Samarjit Das. 2016a. “Learning Filter Banks Using Deep Learning for Acoustic Signals.” November 29, 2016. http://arxiv.org/abs/1611.09526.

———. 2016b. “Understanding Audio Pattern Using Convolutional Neural Network from Raw Waveforms.” November 29, 2016. http://arxiv.org/abs/1611.09524.

Rafii, Z. 2018. “Sliding Discrete Fourier Transform with Kernel Windowing [Lecture Notes].” IEEE Signal Processing Magazine 35 (6): 88–92. https://doi.org/10.1109/MSP.2018.2855727.

Ragazzini, J. R., and L. A. Zadeh. 1952. “The Analysis of Sampled-Data Systems.” Transactions of the American Institute of Electrical Engineers, Part II: Applications and Industry 71 (5): 225–34. https://doi.org/10.1109/TAI.1952.6371274.

Rajan, Rajeev, Manaswi Misra, and Hema A. Murthy. 2017. “Melody Extraction from Music Using Modified Group Delay Functions.” International Journal of Speech Technology 20 (1): 185–204. https://doi.org/10.1007/s10772-017-9397-1.

Rall, Louis B. 1981. Automatic Differentiation: Techniques and Applications. Lecture Notes in Computer Science 120. Berlin ; New York: Springer-Verlag.

Ravelli, E, G Richard, and L Daudet. 2008. “Fast MIR in a Sparse Transform Domain.” In Int. Conf. Music Info. Retrieval. Philadelphia, PA. http://ismir2008.ismir.net/papers/ISMIR2008_141.pdf.

Rawat, Waseem, and Zenghui Wang. 2017. “Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review.” Neural Computation 29 (9): 2352–2449. https://doi.org/10.1162/neco_a_00990.

Rebollo-Neira, Laura. 2007. “Oblique Matching Pursuit.” IEEE Signal Processing Letters 14 (10): 703–6. https://doi.org/10.1109/LSP.2007.898317.

Rebollo-Neira, L., and D. Lowe. 2002. “Optimized Orthogonal Matching Pursuit Approach.” IEEE Signal Processing Letters 9 (4): 137–40. https://doi.org/10.1109/LSP.2002.1001652.

Rioul, O., and M. Vetterli. 1991. “Wavelets and Signal Processing.” IEEE Signal Processing Magazine 8 (4): 14–38. https://doi.org/10.1109/79.91217.

Robbins, Herbert, and Sutton Monro. 1951. “A Stochastic Approximation Method.” The Annals of Mathematical Statistics 22 (3): 400–407. https://doi.org/10.1214/aoms/1177729586.

Robbins, H., and D. Siegmund. 1971. “A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications.” In Optimizing Methods in Statistics, edited by Jagdish S. Rustagi, 233–57. Academic Press. https://doi.org/10.1016/B978-0-12-604550-5.50015-8.

Roberts, Adam, Jesse Engel, and Douglas Eck. 2017. “Hierarchical Variational Autoencoders for Music.” In NIPS Workshop on Machine Learning for Creativity and Design. https://nips2017creativity.github.io/doc/Hierarchical_Variational_Autoencoders_for_Music.pdf.

Robertson, Andrew, and Mark Plumbley. 2007. “B-Keeper: A Beat-Tracker for Live Performance.” In Proceedings of the 7th International Conference on New Interfaces for Musical Expression, 234–37. NIME ’07. New York, NY, USA: ACM. https://doi.org/10.1145/1279740.1279787.

Robertson, Andrew, Adam Stark, and Matthew EP Davies. 2013. “Percussive Beat Tracking Using Real-Time Median Filtering.” In Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. http://www.ecmlpkdd2013.org/wp-content/uploads/2013/09/MLMU_Robertson.pdf.

Robertson, Andrew, Adam M. Stark, and Mark D. Plumbley. 2011. “Real-Time Visual Beat Tracking Using a Comb Filter Matrix.” In Proceedings of the International Computer Music Conference 2011. https://www.eecs.qmul.ac.uk/~markp/2011/RobertsonStarkPlumbleyICMC2011_accepted.pdf.

Routtenberg, Tirza, and Joseph Tabrikian. 2010. “Blind MIMO-AR System Identification and Source Separation with Finite-Alphabet.” IEEE Transactions on Signal Processing 58 (3): 990–1000. https://doi.org/10.1109/TSP.2009.2036043.

Rubinstein, Ron, A. M. Bruckstein, and Michael Elad. 2010. “Dictionaries for Sparse Representation Modeling.” Proceedings of the IEEE 98 (6): 1045–57. https://doi.org/10.1109/JPROC.2010.2040551.

Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323 (6088): 533–36. https://doi.org/10.1038/323533a0.

Sagun, Levent, V. Ugur Guney, Gerard Ben Arous, and Yann LeCun. 2014. “Explorations on High Dimensional Landscapes.” December 20, 2014. http://arxiv.org/abs/1412.6615.

Sainath, Tara N., and Bo Li. 2016. “Modeling Time-Frequency Patterns with LSTM Vs. Convolutional Architectures for LVCSR Tasks.” Submitted to Proc. Interspeech. https://research.google.com/pubs/archive/45401.pdf.

Sainath, Tara N., Ron J. Weiss, Andrew W. Senior, Kevin W. Wilson, and Oriol Vinyals. 2015. “Learning the Speech Front-End with Raw Waveform CLDNNs.” In INTERSPEECH, 1–5. http://www.ee.columbia.edu/~ronw/pubs/interspeech2015-waveform_cldnn.pdf.

Sainath, T. N., B. Kingsbury, A. r Mohamed, and B. Ramabhadran. 2013. “Learning Filter Banks Within a Deep Neural Network Framework.” In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 297–302. https://doi.org/10.1109/ASRU.2013.6707746.

Särelä, Jaakko, and Harri Valpola. 2005. “Denoising Source Separation.” Journal of Machine Learning Research 6 (Mar): 233–72. http://www.jmlr.org/papers/v6/sarela05a.html.

Schniter, P., and S. Rangan. 2012. “Compressive Phase Retrieval via Generalized Approximate Message Passing.” In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 815–22. https://doi.org/10.1109/Allerton.2012.6483302.

Sefati, S., N. J. Cowan, and R. Vidal. 2015. “Linear Systems with Sparse Inputs: Observability and Input Recovery.” In 2015 American Control Conference (ACC), 5251–7. https://doi.org/10.1109/ACC.2015.7172159.

Seuret, Alexandre, and Frédéric Gouaisbaut. 2013. “Wirtinger-Based Integral Inequality: Application to Time-Delay Systems.” Automatica 49 (9): 2860–6. https://hal.archives-ouvertes.fr/hal-00855159.

Shah, Ankit, Anurag Kumar, Alexander G. Hauptmann, and Bhiksha Raj. 2018. “A Closer Look at Weak Label Learning for Audio Events.” April 24, 2018. http://arxiv.org/abs/1804.09288.

Shannon, C. E. 1949. “Communication in the Presence of Noise.” Proceedings of the IRE 37 (1): 10–21. https://doi.org/10.1109/JRPROC.1949.232969.

Simonyan, Karen, and Andrew Zisserman. 2014. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” September 4, 2014. http://arxiv.org/abs/1409.1556.

Sjöberg, Jonas, Qinghua Zhang, Lennart Ljung, Albert Benveniste, Bernard Delyon, Pierre-Yves Glorennec, Håkan Hjalmarsson, and Anatoli Juditsky. 1995. “Nonlinear Black-Box Modeling in System Identification: A Unified Overview.” Automatica, Trends in System Identification, 31 (12): 1691–1724. https://doi.org/10.1016/0005-1098(95)00120-8.

Smaragdis, Paris. 2004. “Non-Negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs.” In Independent Component Analysis and Blind Signal Separation, edited by Carlos G. Puntonet and Alberto Prieto, 494–99. Lecture Notes in Computer Science. Granada, Spain: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-30110-3_63.

Smaragdis, P., and J. C. Brown. 2003. “Non-Negative Matrix Factorization for Polyphonic Music Transcription.” In Applications of Signal Processing to Audio and Acoustics, 2003 IEEE Workshop on., 177–80. https://doi.org/10.1109/ASPAA.2003.1285860.

Smith, Evan C., and Michael S. Lewicki. 2004. “Learning Efficient Auditory Codes Using Spikes Predicts Cochlear Filters.” In Advances in Neural Information Processing Systems, 1289–96. http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2005_832.pdf.

———. 2006. “Efficient Auditory Coding.” Nature 439 (7079): 978–82. https://doi.org/10.1038/nature04485.

Smith, Julius O. 2007. Introduction to Digital Filters with Audio Applications. http://www.w3k.org/books/: W3K Publishing. https://ccrma.stanford.edu/~jos/filters/filters.html.

Smith, Leonard A. 2000. “Disentangling Uncertainty and Error: On the Predictability of Nonlinear Systems.” In Nonlinear Dynamics and Statistics.

Smith, Leslie N. 2015. “Cyclical Learning Rates for Training Neural Networks.” In. http://arxiv.org/abs/1506.01186.

Smith, Leslie N., and Nicholay Topin. 2017. “Exploring Loss Function Topology with Cyclical Learning Rates.” February 14, 2017. http://arxiv.org/abs/1702.04283.

Smith, Steven W. 1997. The Scientist and Engineer’s Guide to Digital Signal Processing. 1st ed. San Diego, Calif: California Technical Pub.

Soh, Yong Sheng, and Venkat Chandrasekaran. 2017. “A Matrix Factorization Approach for Learning Semidefinite-Representable Regularizers.” January 4, 2017. http://arxiv.org/abs/1701.01207.

Söderström, T., and P. Stoica, eds. 1988. System Identification. Upper Saddle River, NJ, USA: Prentice-Hall, Inc.

Stepleton, Thomas, Razvan Pascanu, Will Dabney, Siddhant M. Jayakumar, Hubert Soyer, and Remi Munos. 2018. “Low-Pass Recurrent Neural Networks - A Memory Architecture for Longer-Term Correlation Discovery.” May 13, 2018. https://arxiv.org/abs/1805.04955v1.

Sutskever, Ilya. 2013. “Training Recurrent Neural Networks.” PhD Thesis, Toronto, Ont., Canada: University of Toronto. https://tspace.library.utoronto.ca/handle/1807/36012.

Sutskever, Ilya, James Martens, George E. Dahl, and Geoffrey E. Hinton. 2013. “On the Importance of Initialization and Momentum in Deep Learning.” In ICML (3), 28:1139–47. http://www.jmlr.org/proceedings/papers/v28/sutskever13.pdf.

Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. “Going Deeper with Convolutions.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1–9. https://doi.org/10.1109/CVPR.2015.7298594.

Tallec, Corentin, and Yann Ollivier. 2017. “Unbiasing Truncated Backpropagation Through Time.” May 23, 2017. http://arxiv.org/abs/1705.08209.

Telgarsky, Matus. 2017. “Neural Networks and Rational Functions.” In PMLR, 3387–93. http://proceedings.mlr.press/v70/telgarsky17a.html.

Thickstun, John, Zaid Harchaoui, and Sham Kakade. 2017. “Learning Features of Music from Scratch.” In Proceedings of International Conference on Learning Representations (ICLR) 2017. http://arxiv.org/abs/1611.09827.

Tippett, Michael K., Jeffrey L. Anderson, Craig H. Bishop, Thomas M. Hamill, and Jeffrey S. Whitaker. 2003. “Ensemble Square Root Filters.” Monthly Weather Review 131 (7): 1485–90. http://iri.columbia.edu/~tippett/pubs/srf_revised_again_submit.pdf.

Tong, Matthew H., Adam D. Bickett, Eric M. Christiansen, and Garrison W. Cottrell. 2007. “Learning Grammatical Structure with Echo State Networks.” Neural Networks 20 (3): 424–32. https://doi.org/10.1016/j.neunet.2007.04.013.

Tran, Dustin, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, and David M. Blei. 2017. “Deep Probabilistic Programming.” In ICLR. http://arxiv.org/abs/1701.03757.

Tran, Dustin, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, and David M. Blei. 2016. “Edward: A Library for Probabilistic Modeling, Inference, and Criticism.” October 31, 2016. http://arxiv.org/abs/1610.09787.

Triefenbach, F., A. Jalalvand, K. Demuynck, and J. P. Martens. 2013. “Acoustic Modeling with Hierarchical Reservoirs.” IEEE Transactions on Audio, Speech, and Language Processing 21 (11): 2439–50. https://doi.org/10.1109/TASL.2013.2280209.

Tropp, J A, M B Wakin, M F Duarte, D Baron, and R G Baraniuk. 2006. “Random Filters for Compressive Sampling and Reconstruction.” In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 3:872–75. https://doi.org/10.1109/ICASSP.2006.1660793.

Tsipas, Nikolaos, Lazaros Vrysis, Charalampos Dimoulas, and George Papanikolaou. 2017. “Efficient Audio-Driven Multimedia Indexing Through Similarity-Based Speech / Music Discrimination.” Multimedia Tools and Applications, January, 1–19. https://doi.org/10.1007/s11042-016-4315-0.

Tufts, D. W., and R. Kumaresan. 1982. “Estimation of Frequencies of Multiple Sinusoids: Making Linear Prediction Perform Like Maximum Likelihood.” Proceedings of the IEEE 70 (9): 975–89. https://doi.org/10.1109/PROC.1982.12428.

Uncini, Aurelio. 2003. “Audio Signal Processing by Neural Networks.” Neurocomputing, Evolving Solution with Neural Networks, 55 (3–4): 593–625. https://doi.org/10.1016/S0925-2312(03)00395-3.

Van Eeghem, Frederik, and Lieven De Lathauwer. 2013. “Blind System Identification as a Compressed Sensing Problem.” ftp://ftp.esat.kuleuven.ac.be/pub/stadius/fvaneegh/vaneeghem2015blind.pdf.

Vaz, Colin, Asterios Toutios, and Shrikanth S. Narayanan. 2016. “Convex Hull Convolutive Non-Negative Matrix Factorization for Uncovering Temporal Patterns in Multivariate Time-Series Data.” In Interspeech 2016, 963–67. https://doi.org/10.21437/Interspeech.2016-571.

Venkataramani, Shrikant, and Paris Smaragdis. 2017. “End to End Source Separation with Adaptive Front-Ends.” May 6, 2017. http://arxiv.org/abs/1705.02514.

Venkataramani, Shrikant, Y. Cem Subakan, and Paris Smaragdis. 2017. “Neural Network Alternatives to Convolutive Audio Models for Source Separation.” September 20, 2017. http://arxiv.org/abs/1709.07908.

Vincent, E., N. Bertin, and R. Badeau. 2008. “Harmonic and Inharmonic Nonnegative Matrix Factorization for Polyphonic Pitch Transcription.” In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 109–12. https://doi.org/10.1109/ICASSP.2008.4517558.

Virtanen, T. 2007. “Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria.” IEEE Transactions on Audio, Speech, and Language Processing 15 (3): 1066–74. https://doi.org/10.1109/TASL.2006.885253.

Virtanen, Tuomas. 2006. “Unsupervised Learning Methods for Source Separation in Monaural Music Signals.” In Signal Processing Methods for Music Transcription, 267–96. Springer. https://www.cs.tut.fi/sgn/arg/music/tuomasv/unsupervised_virtanen.pdf.

Wang, Xinxi, and Ye Wang. 2014. “Improving Content-Based and Hybrid Music Recommendation Using Deep Learning.” In Proceedings of the 22nd ACM International Conference on Multimedia, 627–36. MM ’14. New York, NY, USA: ACM. https://doi.org/10.1145/2647868.2654940.

Wang, Zhong-Qiu, Jonathan Le Roux, DeLiang Wang, and John R. Hershey. 2018. “End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.” April 26, 2018. http://arxiv.org/abs/1804.10204.

Welch, Peter D. 1967. “The Use of Fast Fourier Transform for the Estimation of Power Spectra: A Method Based on Time Averaging over Short, Modified Periodograms.” IEEE Transactions on Audio and Electroacoustics 15 (2): 70–73. https://doi.org/10.1109/TAU.1967.1161901.

Werbos, Paul J. 1988. “Generalization of Backpropagation with Application to a Recurrent Gas Market Model.” Neural Networks 1 (4): 339–56. https://doi.org/10.1016/0893-6080(88)90007-X.

Werbos, P. J. 1990. “Backpropagation Through Time: What It Does and How to Do It.” Proceedings of the IEEE 78 (10): 1550–60. https://doi.org/10.1109/5.58337.

Wiatowski, Thomas, Philipp Grohs, and Helmut Bölcskei. 2018. “Energy Propagation in Deep Convolutional Neural Networks.” IEEE Transactions on Information Theory 64 (7): 1–1. https://doi.org/10.1109/TIT.2017.2756880.

Williams, Ronald J., and Jing Peng. 1990. “An Efficient Gradient-Based Algorithm for on-Line Training of Recurrent Network Trajectories.” Neural Computation 2 (4): 490–501. https://doi.org/10.1162/neco.1990.2.4.490.

Williams, Ronald J., and David Zipser. 1989. “A Learning Algorithm for Continually Running Fully Recurrent Neural Networks.” Neural Computation 1 (2): 270–80. https://doi.org/10.1162/neco.1989.1.2.270.

Wisdom, Scott, Thomas Powers, John Hershey, Jonathan Le Roux, and Les Atlas. 2016. “Full-Capacity Unitary Recurrent Neural Networks.” In Advances in Neural Information Processing Systems, 4880–8. http://papers.nips.cc/paper/6327-full-capacity-unitary-recurrent-neural-networks.

Wisdom, Scott, Thomas Powers, James Pitton, and Les Atlas. 2016. “Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery.” In Advances in Neural Information Processing Systems 29. http://arxiv.org/abs/1611.07252.

Wright, Matthew, James Beauchamp, Kelly Fitz, Xavier Rodet, Axel Röbel, Xavier Serra, and Gregory Wakefield. 2001. “Analysis/Synthesis Comparison.” Organised Sound 5 (03): 173–89. http://www.journals.cambridge.org/abstract_S1355771800005070.

Wu, Xiaoxia, Rachel Ward, and Léon Bottou. 2018. “WNGrad: Learn the Learning Rate in Gradient Descent.” March 7, 2018. http://arxiv.org/abs/1803.02865.

Wu, Yuhuai, Saizheng Zhang, Ying Zhang, Yoshua Bengio, and Ruslan R Salakhutdinov. 2016. “On Multiplicative Integration with Recurrent Neural Networks.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2856–64. Curran Associates, Inc. http://papers.nips.cc/paper/6215-on-multiplicative-integration-with-recurrent-neural-networks.pdf.

Wyse, L. 2017. “Audio Spectrogram Representations for Processing with Convolutional Neural Networks.” In Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE]). http://arxiv.org/abs/1706.09559.

Xie, Bo, Yingyu Liang, and Le Song. 2016. “Diversity Leads to Generalization in Neural Networks.” November 9, 2016. http://arxiv.org/abs/1611.03131.

Yaghoobi, M., Sangnam Nam, R. Gribonval, and M. E. Davies. 2013. “Constrained Overcomplete Analysis Operator Learning for Cosparse Signal Modelling.” IEEE Transactions on Signal Processing 61 (9): 2341–55. https://doi.org/10.1109/TSP.2013.2250968.

Yin, W, S Osher, D Goldfarb, and J Darbon. 2008. “Bregman Iterative Algorithms for $\ell_1$-Minimization with Applications to Compressed Sensing.” SIAM Journal on Imaging Sciences 1 (1): 143–68. https://doi.org/10.1137/070703983.

Yoshii, Kazuyoshi, and Masataka Goto. 2012. “Infinite Composite Autoregressive Models for Music Signal Analysis.” In ISMIR 2012. http://www.ismir2012.ismir.net/event/papers/079_ISMIR_2012.pdf.

Yu, D., and L. Deng. 2011. “Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP].” IEEE Signal Processing Magazine 28 (1): 145–54. https://doi.org/10.1109/MSP.2010.939038.

Yu, Dong, and Jinyu Li. 2018. “Recent Progresses in Deep Learning Based Acoustic Models (Updated).” April 24, 2018. http://arxiv.org/abs/1804.09298.

Yu, Guoshen, and Jean-Jacques Slotine. 2009. “Audio Classification from Time-Frequency Texture.” In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1677–80. Los Alamitos, CA, USA: IEEE Computer Society. https://doi.org/10.1109/ICASSP.2009.4959924.

Yu, Haizi, and Lav R. Varshney. 2017. “Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.

Zhang, X., and Z. W. Ras. 2007. “Analysis of Sound Features for Music Timbre Recognition.” In International Conference on Multimedia and Ubiquitous Engineering, 2007. MUE ’07, 3–8. Washington, DC. https://doi.org/10.1109/MUE.2007.85.

Zhang, Yuchen, Percy Liang, and Martin J. Wainwright. 2016. “Convexified Convolutional Neural Networks.” September 4, 2016. http://arxiv.org/abs/1609.01000.

Zhu, Zhenyao, Jesse H. Engel, and Awni Hannun. 2016. “Learning Multiscale Features Directly from Waveforms.” In Interspeech 2016, 1305–9. http://arxiv.org/abs/1603.09509.

Zils, A, and F Pachet. 2001. “Musical Mosaicing.” In Proceedings of DAFx-01, 2:135. Limerick, Ireland. http://csl.sony.fr/downloads/papers/2001/zils-dafx2001.pdf.

Zinkevich, Martin. 2003. “Online Convex Programming and Generalized Infinitesimal Gradient Ascent.” In Proceedings of the Twentieth International Conference on Machine Learning, 928–35. ICML’03. Washington, DC, USA: AAAI Press. http://dl.acm.org/citation.cfm?id=3041838.3041955.