Machine listening

Statistical models for audio

Machine listening! My preferred term for a diverse field that is also called Music Information Retrieval, speech processing and probably other names.

I’m not going to talk about speech recognition; that boat is full.

Machine listening: machine learning, from audio. Everything from that Shazam app doohickey, to teaching computers to recognise speech, to doing artsy things with sound. I’m mostly concerned with the third one. Statistics, features, descriptors, metrics, kernels and affinities, and the spaces and topologies they induce, for musical audio, e.g. your MP3 pop song library. This has considerable overlap with musical metrics, but there I start from scores and transcriptions.

Polyphony and its problems.

Approximate logarithmic perception and its problems.
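As a rough illustration of what "approximately logarithmic" means for pitch, here is a minimal sketch of two standard frequency warpings: the MIDI note scale (purely logarithmic) and one common mel-scale formula (roughly linear at low frequencies, logarithmic higher up). The function names are made up for this example, and the mel formula shown is only one of several variants in use.

```python
import math


def hz_to_midi(freq_hz: float) -> float:
    """Map a frequency in Hz to a fractional MIDI note number (440 Hz -> 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def hz_to_mel(freq_hz: float) -> float:
    """One common mel-scale formula; other variants exist."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


if __name__ == "__main__":
    # Each doubling of frequency adds exactly 12 MIDI notes,
    # but a varying number of mels.
    for f in (110.0, 220.0, 440.0, 880.0):
        print(f, round(hz_to_midi(f), 2), round(hz_to_mel(f), 1))
```

The mismatch between these two scales is one small instance of the "problems": perception is only approximately logarithmic, so any single warping is a compromise.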

Should I create a separate psychoacoustics notebook? Yes.

Should I create a separate features notebook? Yes.

See also musical corpora, musical metrics, synchronisation, sparse basis dictionaries, speech recognition, learning gamelan, analysis/resynthesis, etc.


See auditory features.


Here are some options for doing machine listening.


musicbricks is an umbrella project to unify (post hoc) many of the efforts mentioned individually below, plus a few other new ones.

  • Fraunhofer ML software (C++) is part of this project, including such things as

    • Real-Time Pitch Detection
    • MusicBricksTranscriber
    • Goatify Pdf
    • Time Stretch Pitch Shift Library


LibROSA I have been using a lot recently, and I highly recommend it, especially if your pipeline already includes python. Sleek minimal design, with a curated set of algorithms (compare and contrast with the chaos of the vamp plugins ecosystem). Python-based, but fast enough because it uses the numpy numerical libraries well. The API design meshes well with Scikit-learn, the de facto python machine learning standard, and it’s flexible and hackable.

  • see also talkbox for a nice-looking but abandoned (?) alternative, which is nonetheless worth it for Alexander Schindler’s lovely MIR lecture based around it.

  • amen is a remix program built on librosa


SonicAnnotator seems to be about cobbling together vamp plugins for batch analysis. That is more steps than I want in an already clunky workflow for my current projects. It’s also more about RDF ontologies, whereas I want matrices of floats.


For C++ and Python there is Essentia, as seen in Freesound, which is a high recommendation IMO. (Watch out, the source download is enormous; just shy of half a gigabyte.) It features Python and vamp integration, and a great many algorithms. I haven’t given it a fair chance because LibROSA has been such a joy to use. However, the intriguing Dunya project is based on it.


echonest is (was?) a proprietary system that was used to generate the Million Song Dataset. It seems to be gradually decaying, and was bought up by Spotify. It has great demos, such as autocanonisation.



aubio is a tool designed for the extraction of annotations from audio signals. Its features include segmenting a sound file before each of its attacks, performing pitch detection, tapping the beat and producing midi streams from live audio.…

aubio currently provides the following features:

  • digital filters
  • phase vocoder
  • onset detection (several methods)
  • pitch tracking (several methods)
  • beat and tempo tracking
  • mel frequency cepstrum coefficients (MFCC)
  • transient / steady-state separation

…aubio is written in C and is known to run on most modern architectures and platforms.

The Python interface has been written in C so that aubio arrays can be viewed directly in Python as NumPy arrays. This makes the aubio module quite efficient, not to say fast.
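Of the features listed above, pitch tracking is perhaps the easiest to illustrate. The following is a minimal sketch of one textbook approach, autocorrelation peak-picking, in pure Python. It is not aubio's implementation and does not use aubio's API; the function name `estimate_pitch_autocorr` and its parameters are made up for illustration.

```python
import math


def estimate_pitch_autocorr(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of a monophonic frame.

    Returns the frequency whose period (in samples) maximises the
    length-normalised autocorrelation, searching only periods that
    correspond to frequencies between fmin and fmax.
    """
    n = len(frame)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    if min_lag > max_lag:
        raise ValueError("frame too short for the requested pitch range")

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        # Divide by the number of terms so longer lags are not penalised.
        score /= n - lag
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 200.0 * i / sr) for i in range(1024)]
    print(estimate_pitch_autocorr(tone, sr, fmin=150.0, fmax=500.0))
```

A naive estimator like this is prone to octave errors when the search range admits several multiples of the true period, which is one reason real libraries offer several methods.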



RP extract


phonological corpus tools

Speech-focussed, phonological corpus tools is another research library for largeish corpus analysis, similarity-classification etc. As presaged, I’m not really here for speech stuff.

Metamorph, smstools

John Glover, soundcloud staffer, has several analysis libraries culminating in Metamorph,

a new open source library for performing high-level sound transformations based on a sinusoids plus noise plus transients model. It is written in C++, can be built as both a Python extension module and a Csound opcode, and currently runs on Mac OS X and Linux.

It is designed to work primarily on monophonic, quasi-harmonic sound sources and can be used in a non-real-time context to process pre-recorded sound files or can operate in a real-time (streaming) mode.

See also the related spectral modeling and synthesis package, smstools.

Sinusoidal modelling with simplsound

“sinusoidal modelling”: Simplsound (Glover, Lazzarini, and Timoney 2009) is a python implementation of that technique.
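To give a flavour of the idea, here is a minimal sketch of the analysis half of sinusoidal modelling: take a frame, compute its spectrum, and describe it as a short list of (frequency, amplitude) peaks. This is not Simplsound's API; the function names are hypothetical, the DFT is deliberately naive, and real systems add windowing, peak interpolation and partial tracking across frames.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the first N/2 DFT bins, scaled so a unit-amplitude
    sine at a bin centre has magnitude ~1.0."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
        )
        mags.append(2.0 * abs(acc) / n)
    return mags


def sinusoidal_peaks(frame, sample_rate, threshold=0.05):
    """Return a (frequency_hz, amplitude) pair for each local spectral
    maximum whose magnitude exceeds `threshold`."""
    mags = dft_magnitudes(frame)
    n = len(frame)
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > threshold and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            peaks.append((k * sample_rate / n, mags[k]))
    return peaks


if __name__ == "__main__":
    sr, n = 1024, 256  # bin spacing is sr / n = 4 Hz
    frame = [
        math.sin(2 * math.pi * 100 * i / sr) + 0.5 * math.sin(2 * math.pi * 220 * i / sr)
        for i in range(n)
    ]
    for freq, amp in sinusoidal_peaks(frame, sr):
        print(freq, round(amp, 3))
```

The test tones here sit exactly on bin centres, which hides the spectral leakage that makes the real problem harder.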


If you use a lot of SuperCollider, you might like SCMIR, a native SuperCollider thingy. It has the virtues that

  • it can run in realtime, which is lovely.

It has the vices that

  • It runs in SuperCollider, which is a backwater language unserviced by modern development infrastructure or decent machine learning libraries, and

  • It has a fraught development process; I can’t even link directly to it because the author doesn’t give it its own anchor tag, let alone a whole web page or source code repository. The release schedule is opaque and sporadic. Consequently, it is effectively a lone guy’s pet project, rather than an active community endeavour.

    That is to say this is the Etsy sweater of MIR. If on balance this sounds like a snug deal to you, you can download SCMIR from somewhere or other on Nick Collins’ homepage.


Abe, T., T. Kobayashi, and S. Imai. 1995. “Harmonics Tracking and Pitch Extraction Based on Instantaneous Frequency.” In International Conference on Acoustics, Speech, and Signal Processing, 1995. ICASSP-95, 1:756–759 vol.1.
Alías, Francesc, Joan Claudi Socoró, and Xavier Sevillano. 2016. “A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds.” Applied Sciences 6 (5): 143.
Anglade, Amélie, Emmanouil Benetos, Matthias Mauch, and Simon Dixon. 2010. “Improving Music Genre Classification Using Automatically Induced Harmony Rules.” Journal of New Music Research 39: 349–61.
Benetos, Emmanouil, Srikanth Cherla, and Tillman Weyde. 2013. “An Efficient Shift-Invariant Model for Polyphonic Music Transcription.” In 6th International Workshop on Machine Learning and Music, 4. Prague, Czech Republic.
Bertin-Mahieux, Thierry, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. 2011. “The Million Song Dataset.” In 12th International Society for Music Information Retrieval Conference (ISMIR 2011).
Bigand, Emmanuel, Richard Parncutt, and Fred Lerdahl. 1996. “Perception of Musical Tension in Short Chord Sequences: The Influence of Harmonic Function, Sensory Dissonance, Horizontal Motion, and Musical Training.” Perception & Psychophysics 58 (1): 125–41.
Blackman, R. B., and J. W. Tukey. 1959. The measurement of power spectra from the point of view of communications engineering. New York: Dover Publications.
Bogert, B P, M J R Healy, and J W Tukey. 1963. “The Quefrency Alanysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking.” In, 209–43.
Boulanger-Lewandowski, Nicolas, Yoshua Bengio, and Pascal Vincent. 2012. “Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.” In 29th International Conference on Machine Learning.
Box, George E. P., Gwilym M. Jenkins, Gregory C. Reinsel, and Greta M. Ljung. 2016. Time Series Analysis: Forecasting and Control. Fifth edition. Wiley Series in Probability and Statistics. Hoboken, New Jersey: John Wiley & Sons, Inc.
Burred, Juan José, Emmanuel Ponsot, Louise Goupil, Marco Liuni, and Jean-Julien Aucouturier. 2018. “CLEESE: An Open-Source Audio-Transformation Toolbox for Data-Driven Experiments in Speech and Music Cognition.” Preprint. Neuroscience.
Carabias-Orti, J. J., T. Virtanen, P. Vera-Candeas, N. Ruiz-Reyes, and F. J. Canadas-Quesada. 2011. “Musical Instrument Sound Multi-Excitation Model for Non-Negative Spectrogram Factorization.” IEEE Journal of Selected Topics in Signal Processing 5 (6): 1144–58.
Carmi, Avishy Y. 2013. “Compressive System Identification: Sequential Methods and Entropy Bounds.” Digital Signal Processing 23 (3): 751–70.
Carter, G. Clifford. 1987. “Coherence and Time Delay Estimation.” Proceedings of the IEEE 75 (2): 236–55.
Chen, Ning, and Shijun Wang. n.d. “High-Level Music Descriptor Extraction Algorithm Based on Combination of Multi-Channel Cnns and Lstm.” In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China.
Childers, D. G., D. P. Skinner, and R. C. Kemerait. 1977. “The Cepstrum: A Guide to Processing.” Proceedings of the IEEE 65 (10): 1428–43.
Choi, Jeong, Jongpil Lee, Jiyoung Park, and Juhan Nam. 2019. “Zero-Shot Learning for Audio-Based Music Classification and Tagging.” In, 8.
Choi, Keunwoo, and Kyunghyun Cho. 2019. “Deep Unsupervised Drum Transcription.” In, 9.
Choi, Keunwoo, György Fazekas, Mark Sandler, and Kyunghyun Cho. 2017. “Transfer Learning for Music Classification and Regression Tasks.” In Proceeding of The 18th International Society of Music Information Retrieval (ISMIR) Conference 2017. Suzhou, China.
Cochran, W.T., James W. Cooley, D.L. Favin, H.D. Helms, R.A. Kaenel, W.W. Lang, Jr. Maling G.C., D.E. Nelson, C.M. Rader, and Peter D. Welch. 1967. “What Is the Fast Fourier Transform?” Proceedings of the IEEE 55 (10): 1664–74.
Cooley, J. W., P. A. W. Lewis, and P. D. Welch. 1970. “The Application of the Fast Fourier Transform Algorithm to the Estimation of Spectra and Cross-Spectra.” Journal of Sound and Vibration 12 (3): 339–52.
Defferrard, Michaël, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson. 2017. “FMA: A Dataset For Music Analysis.” In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China.
Demopoulos, Ryan J., and Michael J. Katchabaw. 2007. “Music Information Retrieval: A Survey of Issues and Approaches.” Technical Report.
Dieleman, Sander, and Benjamin Schrauwen. 2014. “End to End Learning for Music Audio.” In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6964–68. IEEE.
Dörfler, Monika, Gino Velasco, Arthur Flexer, and Volkmar Klien. 2010. “Sparse Regression in Time-Frequency Representations of Complex Audio.” In.
Du, Pan, Warren A. Kibbe, and Simon M. Lin. 2006. “Improved Peak Detection in Mass Spectrum by Incorporating Continuous Wavelet Transform-Based Pattern Matching.” Bioinformatics 22 (17): 2059–65.
Elbaz, Dan, and Michael Zibulevsky. 2017. “Perceptual Audio Loss Function for Deep Learning.” In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China.
Fitzgerald, Derry. 2010. “Harmonic/Percussive Separation Using Median Filtering.”
Flamary, Rémi, Cédric Févotte, Nicolas Courty, and Valentin Emiya. 2016. “Optimal Spectral Transportation with Application to Music Transcription.” In arXiv:1609.09799 [Cs, Stat], 703–11. Curran Associates, Inc.
Fonseca, Eduardo, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra. 2017. “Freesound Datasets: A Platform for the Creation of Open Audio Datasets.” In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China.
Fu, Zhouyu, Guojun Lu, Kai Ming Ting, and Dengsheng Zhang. 2011. “A Survey of Audio-Based Music Classification and Annotation.” IEEE Transactions on Multimedia 13 (2): 303–19.
Fuentes, Magdalena, Lucas S Maia, Martín Rocamora, Luiz W P Biscainho, Helene C Crayencour, Slim Essid, and Juan P Bello. 2019. “Tracking Beats and Microtiming in Afro-Latin American Music Using Conditional Random Fields and Deep Learning.” In, 8.
Glover, John C., Victor Lazzarini, and Joseph Timoney. 2009. “Simpl: A Python Library for Sinusoidal Modelling.” In DAFx 09 Proceedings of the 12th International Conference on Digital Audio Effects, Politecnico Di Milano, Como Campus, Sept. 1-4, Como, Italy, 1–4. Dept. of Electronic Engineering, Queen Mary Univ. of London.
Godsill, S., and Manuel Davy. 2005. “Bayesian Computational Models for Inharmonicity in Musical Instruments.” In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, 283–86. IEEE.
Gómez, Emilia, and Perfecto Herrera. 2004. “Estimating The Tonality Of Polyphonic Audio Files: Cognitive Versus Machine Learning Modelling Strategies.” In ISMIR.
Grosche, Peter, Meinard Müller, and Craig Stuart Sapp. 2010. “What Makes Beat Tracking Difficult? A Case Study on Chopin Mazurkas.” In Proceedings of the International Conference on Music Information Retrieval (ISMIR 2010).
Grosche, P., M. Muller, and F. Kurth. 2010. “Cyclic Tempogram - a Mid-Level Tempo Representation for Music Signals.” In 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 5522–25. Piscataway, NJ.: IEEE.
Harris, Fredric J. 1978. “On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform.” Proceedings of the IEEE 66 (1): 51–83.
Helmholtz, Heinrich. 1863. Die Lehre von Den Tonempfindungen Als Physiologische Grundlage Für Die Theorie Der Musik. Braunschweig: J. Vieweg.
Hermes, Dik J. 1988. “Measurement of Pitch by Subharmonic Summation.” The Journal of the Acoustical Society of America 83 (1): 257–64.
Hershey, Shawn, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Channing Moore, Manoj Plakal, et al. 2017. “CNN Architectures for Large-Scale Audio Classification.” In Proc. IEEE ICASSP 2017.
Hinton, G., Li Deng, Dong Yu, G.E. Dahl, A. Mohamed, N. Jaitly, A. Senior, et al. 2012. “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.” IEEE Signal Processing Magazine 29 (6): 82–97.
Hoffman, Matthew D, David M Blei, and Perry R Cook. 2010. “Bayesian Nonparametric Matrix Factorization for Recorded Music.” In International Conference on Machine Learning, 8.
Hoffman, Matthew, Francis R. Bach, and David M. Blei. 2010. “Online Learning for Latent Dirichlet Allocation.” In Advances in Neural Information Processing Systems, 856–64.
Irizarry, Rafael A. 2001. “Local Harmonic Estimation in Musical Sound Signals.” Journal of the American Statistical Association 96 (454): 357–67.
Joël Bensoam, and David Roze. 2013. “Solving Interactions Between Nonlinear Resonators.” In Proceedings of the Sound and Music Computing Conference.
Kailath, Thomas, Ali H. Sayed, and Babak Hassibi. 2000. Linear Estimation. Prentice Hall Information and System Sciences Series. Upper Saddle River, N.J: Prentice Hall.
Kalouptsidis, Nicholas, Gerasimos Mileounis, Behtash Babadi, and Vahid Tarokh. 2011. “Adaptive Algorithms for Sparse System Identification.” Signal Processing 91 (8): 1910–19.
Kereliuk, Corey, Philippe Depalle, and Philippe Pasquier. 2013. “Audio Interpolation and Morphing via Structured-Sparse Linear Regression.” In.
Lahat, M., Russell J. Niederjohn, and D. Krubsack. 1987. “A Spectral Autocorrelation Method for Measurement of the Fundamental Frequency of Noise-Corrupted Speech.” IEEE Transactions on Acoustics, Speech and Signal Processing 35 (6): 741–50.
Lattner, Stefan, Monika Dorfler, and Andreas Arzt. 2019. “Learning Complex Basis Functions for Invariant Representations of Audio.” In Proceedings of the 20th Conference of the International Society for Music Information Retrieval, 8.
Lattner, Stefan, and Maarten Grachten. 2019. “High-Level Control of Drum Track Generation Using Learned Patterns of Rhythmic Interaction.” In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2019).
Ljung, Lennart. 1999. System Identification: Theory for the User. 2nd ed. Prentice Hall Information and System Sciences Series. Upper Saddle River, NJ: Prentice Hall PTR.
Luo, Yi, Zhuo Chen, John R. Hershey, Jonathan Le Roux, and Nima Mesgarani. 2016. “Deep Clustering and Conventional Networks for Music Separation: Stronger Together.” arXiv:1611.06265 [Cs, Stat], November.
Luo, Yin-Jyun, Kat Agres, and Dorien Herremans. 2019. “Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders.” In Proceedings of the 20th Conference of the International Society for Music Information Retrieval.
MacKinlay, Daniel, and Zdravko I Botev. 2019. “Mosaic Style Transfer Using Sparse Autocorrelograms.” In Proceedings of the 20th Conference of the International Society for Music Information Retrieval, 5. Delft.
Makhoul, John, Francis Kubala, Richard Schwartz, and Ralph Weischedel. 1999. “Performance Measures For Information Extraction.” In Proceedings of DARPA Broadcast News Workshop, 249–52.
Marchand, Ugo, and Geoffroy Peeters. 2014. “The Modulation Scale Spectrum And Its Application To Rhythm-Content Analysis.” In DAFX (Digital Audio Effects). Erlangen, Germany.
Maxwell, James B., Philippe Pasquier, and Brian Whitman. 2009. “Hierarchical Sequential Memory for Music: A Cognitive Model.” In Proceedings of the Tenth International Society for Music Information Retrieval Conference (ISMIR 2009), 429–34.
McFee, Brian, and Daniel PW Ellis. 2011. “Analyzing Song Structure with Spectral Clustering.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Mesaros, Annamaria, Toni Heittola, and Tuomas Virtanen. 2016. “Metrics for Polyphonic Sound Event Detection.” Applied Sciences 6 (6): 162.
Moorer, J.A. 1974. “The Optimum Comb Method of Pitch Period Analysis of Continuous Digitized Speech.” IEEE Transactions on Acoustics, Speech and Signal Processing 22 (5): 330–38.
Müller, Meinard, and Jonathan Driedger. 2012. “Data-Driven Sound Track Generation.” In Multimodal Music Processing, 3:175–94. Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum für Informatik.
Müller, M., D.P.W. Ellis, A. Klapuri, and G. Richard. 2011. “Signal Processing for Music Analysis.” IEEE Journal of Selected Topics in Signal Processing 5 (6): 1088–1110.
Noll, A. Michael. 1967. “Cepstrum Pitch Determination.” The Journal of the Acoustical Society of America 41 (2): 293–309.
Nussbaum-Thom, Markus, Jia Cui, Bhuvana Ramabhadran, and Vaibhava Goel. 2016. “Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional Units.” In, 390–94.
Oppenheim, A. V., and R. W. Schafer. 2004. “From Frequency to Quefrency: A History of the Cepstrum.” IEEE Signal Processing Magazine 21 (5): 95–106.
Parncutt, Richard. 1997. “A Model of the Perceptual Root(s) of a Chord Accounting for Voicing and Prevailing Tonality.” In Music, Gestalt, and Computing, edited by Marc Leman, 181–99. Lecture Notes in Computer Science 1317. Springer Berlin Heidelberg.
Pati, Ashis, Alexander Lerch, and Gaëtan Hadjeres. 2019. “Learning to Traverse Latent Spaces for Musical Score Inpainting.” In Proceedings of the 20th Conference of the International Society for Music Information Retrieval.
Paulus, Jouni, Meinard Müller, and Anssi Klapuri. 2010. “Audio-Based Music Structure Analysis.” In ISMIR, 625–36. ISMIR.
Phan, Huy, Lars Hertel, Marco Maass, and Alfred Mertins. 2016. “Robust Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks.” In Interspeech 2016.
Pickens, Jeremy, and Costas S. Iliopoulos. 2005. “Markov Random Fields and Maximum Entropy Modeling for Music Information Retrieval.” In ISMIR, 207–14. Citeseer.
Plomp, Reinier, and Willem JM Levelt. 1965. “Tonal Consonance and Critical Bandwidth.” The Journal of the Acoustical Society of America 38 (4): 548–60.
Pons, Jordi, Thomas Lidy, and Xavier Serra. 2016. “Experimenting with Musically Motivated Convolutional Neural Networks.” In 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), 1–6. Bucharest, Romania: IEEE.
Pons, Jordi, and Xavier Serra. 2017. “Designing Efficient Architectures for Modeling Temporal Features with Convolutional Neural Networks.” In.
Robertson, A. N., and M. D. Plumbley. 2006. “Real-Time Interactive Musical Systems: An Overview.” Proc. Of the Digital Music Research Network, Goldsmiths University, London, 65–68.
Robertson, Andrew N. 2011. “A Bayesian Approach to Drum Tracking.” In.
Robertson, Andrew, and Mark Plumbley. 2007. “B-Keeper: A Beat-Tracker for Live Performance.” In Proceedings of the 7th International Conference on New Interfaces for Musical Expression, 234–37. NIME ’07. New York, NY, USA: ACM.
Robertson, Andrew, and Mark D. Plumbley. 2013. “Synchronizing Sequencing Software to a Live Drummer.” Computer Music Journal 37 (2): 46–60.
Robertson, Andrew, Adam M. Stark, and Mark D. Plumbley. 2011. “Real-Time Visual Beat Tracking Using a Comb Filter Matrix.” In Proceedings of the International Computer Music Conference 2011.
Robertson, Andrew, Adam Stark, and Matthew EP Davies. 2013. “Percussive Beat Tracking Using Real-Time Median Filtering.” In Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.
Rochebois, Thierry, and Gérard Charbonneau. 1997. “Cross-Synthesis Using Interverted Principal Harmonic Sub-Spaces.” In Music, Gestalt, and Computing, edited by Marc Leman, 375–85. Lecture Notes in Computer Science 1317. Springer Berlin Heidelberg.
Sainath, Tara N., and Bo Li. 2016. “Modeling Time-Frequency Patterns with LSTM Vs. Convolutional Architectures for LVCSR Tasks.” Submitted to Proc. Interspeech.
Salamon, Justin, Emilia Gomez, Daniel PW Ellis, and Gael Richard. 2014. “Melody Extraction from Polyphonic Music Signals: Approaches, Applications, and Challenges.” IEEE Signal Processing Magazine 31 (2): 118–34.
Salamon, Justin, Joan Serrà, and Emilia Gómez. 2013. “Tonal Representations for Music Retrieval: From Version Identification to Query-by-Humming.” International Journal of Multimedia Information Retrieval 2 (1): 45–58.
Schlüter, J., and S. Böck. 2014. “Improved Musical Onset Detection with Convolutional Neural Networks.” In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6979–83.
Schmidt, E.M., and Y.E. Kim. 2011. “Learning Emotion-Based Acoustic Features with Deep Belief Networks.” In 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 65–68.
Scholler, S., and H. Purwins. 2011. “Sparse Approximations for Drum Sound Classification.” IEEE Journal of Selected Topics in Signal Processing 5 (5): 933–40.
Sergé, Arnauld, Nicolas Bertaux, Hervé Rigneault, and Didier Marguet. 2008. “Dynamic Multiple-Target Tracing to Probe Spatiotemporal Cartography of Cell Membranes.” Nature Methods 5 (8): 687–94.
Serrà, Joan, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll Arcos. 2012. “Measuring the Evolution of Contemporary Western Popular Music.” Scientific Reports 2 (July).
Smith, Evan C., and Michael S. Lewicki. 2004. “Learning Efficient Auditory Codes Using Spikes Predicts Cochlear Filters.” In Advances in Neural Information Processing Systems, 1289–96.
———. 2006. “Efficient Auditory Coding.” Nature 439 (7079): 978–82.
Smith, Evan, and Michael S. Lewicki. 2005. “Efficient Coding of Time-Relative Structure Using Spikes.” Neural Computation 17 (1): 19–45.
Smyth, Tamara, and Andrew R. Elmore. 2009. “Explorations in Convolutional Synthesis.” In Proceedings of the 6th Sound and Music Computing Conference, Porto, Portugal, 23–25.
Southall, Carl, Chih-Wei Wu, Alexander Lerch, and Jason A. Hockman. 2017. “MDB Drums — An Annotated Subset of MedleyDB for Automatic Drum Transcription.” In Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR). Suzhou: International Society for Music Information Retrieval (ISMIR).
Terhardt, Ernst, Gerhard Stoll, and Manfred Seewann. 1982. “Algorithm for Extraction of Pitch and Pitch Salience from Complex Tonal Signals.” The Journal of the Acoustical Society of America 71 (3): 679–88.
Thickstun, John, Zaid Harchaoui, Dean P. Foster, and Sham M. Kakade. 2018. “Invariances and Data Augmentation for Supervised Music Transcription.” In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2241–45.
Thickstun, John, Zaid Harchaoui, Dean Foster, and Sham M Kakade. 2017. “MIREX 2017: Frequency Domain Convolutions for Multiple F0 Estimation,” 1.
Thickstun, John, Zaid Harchaoui, and Sham Kakade. 2017. “Learning Features of Music from Scratch.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.
Venkataramani, Shrikant, and Paris Smaragdis. 2017. “End to End Source Separation with Adaptive Front-Ends.” arXiv:1705.02514 [Cs], May.
Welch, Peter D. 1967. “The Use of Fast Fourier Transform for the Estimation of Power Spectra: A Method Based on Time Averaging over Short, Modified Periodograms.” IEEE Transactions on Audio and Electroacoustics 15 (2): 70–73.
Wu, Chih-Wei, and Alexander Lerch. 2017. “Automatic Drum Transcription Using the Student-Teacher Learning Paradigm with Unlabeled Music Data.” In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR). Suzhou: ISMIR.
Yang, Li-Chia, Szu-Yu Chou, and Yi-Hsuan Yang. 2017. “MidiNet: A Convolutional Generative Adversarial Network for Symbolic-Domain Music Generation.” In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China.
Yoshii, Kazuyoshi, and Masataka Goto. 2012. “Infinite Composite Autoregressive Models for Music Signal Analysis.” In.
