ISMIR 2019

Music Nerds in Delft

2019-11-04 — 2019-11-09



I was at ISMIR 2019 in Delft, that is, the 20th conference of the International Society for Music Information Retrieval. I made a miscellaneous repo of stuff. Videos are online.

1 Tutorials

Generating Music with GANs: An Overview and Case Studies, by Hao-Wen Dong and Yi-Hsuan Yang.

Waveform-based music processing with deep learning, by Sander Dieleman, Jordi Pons and Jongpil Lee. I have blogged a bunch of Jordi’s work here under source separation. Sander’s presentation had some interesting framings of:

mode-seeking versus mode-covering approximations to probability distributions.
sparse versus dense conditioning signals.
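The mode-seeking versus mode-covering distinction is usually illustrated with the two directions of the KL divergence. Below is a minimal sketch of that textbook illustration (not taken from the talk), assuming a bimodal target p = 0.5 N(-3, 1) + 0.5 N(3, 1) and a single Gaussian q = N(mu, sigma) fitted by brute-force grid search, with integrals approximated by a Riemann sum on a hypothetical grid over [-20, 20]. Minimising KL(p‖q) should smear q over both modes; minimising KL(q‖p) should lock it onto one.

```python
import math

HALF_LOG_2PI = 0.5 * math.log(2 * math.pi)

def log_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - HALF_LOG_2PI

def log_target(x):
    """Log density of the assumed bimodal target p = 0.5 N(-3, 1) + 0.5 N(3, 1)."""
    a = math.log(0.5) + log_normal(x, -3.0, 1.0)
    b = math.log(0.5) + log_normal(x, 3.0, 1.0)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # stable log-sum-exp

# Riemann-sum quadrature grid on [-20, 20]; wide enough that truncation is negligible here.
DX = 0.1
GRID = [-20.0 + DX * i for i in range(401)]
LOG_P = [log_target(x) for x in GRID]

def forward_kl(mu, sigma):
    """KL(p || q): large whenever q misses a region where p has mass (mode-covering)."""
    return DX * sum(math.exp(lp) * (lp - log_normal(x, mu, sigma))
                    for x, lp in zip(GRID, LOG_P))

def reverse_kl(mu, sigma):
    """KL(q || p): large whenever q puts mass where p has little (mode-seeking)."""
    total = 0.0
    for x, lp in zip(GRID, LOG_P):
        lq = log_normal(x, mu, sigma)
        total += math.exp(lq) * (lq - lp)  # exp underflows to 0.0 far from mu, harmlessly
    return DX * total

def fit(divergence):
    """Brute-force search for the Gaussian (mu, sigma) minimising `divergence`."""
    candidates = [(-4.0 + 0.25 * i, 0.5 + 0.1 * j)
                  for i in range(33) for j in range(36)]
    return min(candidates, key=lambda ms: divergence(*ms))
```

Under these assumptions, `fit(forward_kl)` should land near mu = 0, sigma ≈ 3.2 (the moment-matched variance is 1 + 3² = 10), while `fit(reverse_kl)` should pick one of the two modes, near mu = ±3, sigma ≈ 1.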

2 Paper highlights

That is, papers that are useful for my own interests; leaving a paper out is not necessarily an indictment of it.

Or see the ISMIR paper explorer.

Obviously, I like my paper (MacKinlay and Botev 2019) and think it is the best and most eloquently explained.
Keunwoo Choi’s DrummerNet (K. Choi and Cho 2019) looks like a cunning hack to transcribe drums from audio by learning to play a drum synthesizer.
Jeong Choi et al. claim to solve many of the notorious problems with noisy labelling in music using a zero-shot learning model (J. Choi et al. 2019).
Stefan Lattner’s DrumNet (Lattner and Grachten 2019) is a remarkably simple model for rhythm generation.
Magdalena Fuentes et al.’s work on tracking beats and microtiming in Afro-Latin American rhythms was super fun (Fuentes et al. 2019).
Ashis Pati et al.’s Learning to Traverse Latent Spaces for Musical Score Inpainting is nice (Pati, Lerch, and Hadjeres 2019).
Generating Structured Drum Pattern Using Variational Autoencoder and Self-similarity Matrix (Wei, Wu, and Su 2019). I hope to track these folks down, but we are presenting our research at the same time. Still, this covariance structure appeals to me.
Supervised symbolic music style translation using synthetic data (Cífka and Richard 2019) is kind of an automated Señor Coconut.
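A self-similarity matrix simply records how alike each pair of bars is, so repeated material shows up as blocks. As a minimal sketch, assuming each bar is a 16-step binary onset vector and bars are compared by cosine similarity (Wei, Wu, and Su may well use different features and a different measure), it could be computed like this:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length onset vectors (0.0 if either bar is silent)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def self_similarity(bars):
    """Matrix S where S[i][j] is the similarity between bar i and bar j."""
    return [[cosine(a, b) for b in bars] for a in bars]

# Hypothetical example: a four-on-the-floor bar A and a 3-3-2 style bar B, arranged AABA.
A = [1, 0, 0, 0] * 4
B = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
S = self_similarity([A, A, B, A])
```

Here the three A bars score 1.0 against one another and about 0.41 against B, so the AABA form is visible directly in S.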

2.1 Source separation

Spleeter (Hennequin et al. 2019) from Deezer labs is one deep learning approach
Open Unmix (Stöter et al. 2019) from Sony CSL labs is another deep learning approach
UNMIXER (Smith, Kawasaki, and Goto 2019) is a web UI for a cute hand-rolled matrix factorisation method

All blogged under source separation.
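For orientation, the classic matrix factorisation baseline in this family is non-negative matrix factorisation (NMF), which approximates a non-negative matrix V as W H. The sketch below uses Lee and Seung's multiplicative updates for squared error in plain Python; it is a generic illustration, not UNMIXER's actual method, and the function names are hypothetical.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, rank, iters=500, eps=1e-9, seed=0):
    """Factor non-negative V (n x m) as W (n x rank) times H (rank x m)
    using multiplicative updates that keep every entry non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

def relative_error(v, w, h):
    """Frobenius norm of V - WH divided by the Frobenius norm of V."""
    wh = matmul(w, h)
    num = sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb))
    den = sum(a * a for row in v for a in row)
    return (num / den) ** 0.5
```

In a source-separation setting V would be a magnitude spectrogram (frequency bins by frames); each column of W is then a spectral template and each row of H says when that template is active.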

2.2 Decoupled representations

Many authors would like to impose a certain factorisation, or “near”-factorisation, of a latent space into humanly interpretable dimensions, so as to disentangle, say, timbre from pitch from loudness. I would like to return to this problem; it looks fun.

Coupled Recurrent Models for Polyphonic Music Composition (Thickstun et al. 2019). It is phrased as a neural network problem, but their central question is, to my mind: What graphical model structure best approximates polyphonic scores?
Hanoi Hantrakul presenting Fast and Flexible Neural Audio Synthesis. The oral presentation turned out to be an advertisement for the successor project, Differentiable DSP.
Yin-Jyun Luo et al. have done something interesting in Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders (Luo, Agres, and Herremans 2019). Check out the demo page.
Learning Complex Basis Functions for Invariant Representations of Audio (Lattner, Dörfler, and Arzt 2019): here the aim is to find basis functions that preserve a priori symmetries. I have appended it to the sparse coding page.
Deep Music Analogy Via Latent Representation Disentanglement (Yang et al. 2019)
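A toy illustration of an invariant built from complex basis functions: if transposition is modelled as a circular shift of a 12-bin chroma vector (a simplifying assumption), then the magnitudes of its discrete Fourier transform do not change under transposition, because a shift only multiplies each Fourier coefficient by a unit-modulus phase factor. Lattner, Dörfler, and Arzt learn their bases from data; the fixed Fourier basis here only demonstrates the principle, and the helper names are hypothetical.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def transpose_chroma(chroma, semitones):
    """Circularly shift a chroma vector up by `semitones` bins."""
    s = semitones % len(chroma)
    return chroma[-s:] + chroma[:-s] if s else list(chroma)

def invariant_features(chroma):
    """DFT magnitudes: the phase factor from any circular shift drops out."""
    return [abs(c) for c in dft(chroma)]

c_major = [1.0, 0, 0, 0, 1.0, 0, 0, 1.0, 0, 0, 0, 0]  # C, E, G
d_major = transpose_chroma(c_major, 2)                 # D, F#, A
```

The chroma vectors differ, but `invariant_features(c_major)` and `invariant_features(d_major)` agree up to floating-point error. The price is that the phase, which carries the key, is thrown away.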

3 Data sets

mirdata (Bittner et al. 2019)
The AcousticBrainz Genre Dataset (Bogdanov et al. 2019)
Harmonix set (Nieto et al. 2019)
AIST Dance DB: dance videos! (Tsuchida et al. 2019)

All blogged under audio corpora.

4 Excellent demos

🏗

https://acids-ircam.github.io/flow_synthesizer/ (Esling et al. 2019)
https://jazzomat.hfm-weimar.de/dbformat/dboverview.html
https://github.com/jeffreyjohnens/style_rank
https://www.ee.iitb.ac.in/student/~krishnasubramani/ismir_LBD_poster.html
https://github.com/KinWaiCheuk/nnAudio

5 Serendipity

Differentiable DSP
Orb composer
Leigh Smith and the LANDR semantic audio search Selector
cochlear.ai

The So Strangely music science podcast.

6 References

Bittner, Fuentes, Rubinstein, et al. 2019. “Mirdata: Software for Reproducible Usage of Datasets.” In International Society for Music Information Retrieval (ISMIR) Conference.

Bogdanov, Porter, Schreiber, et al. 2019. “The Acousticbrainz Genre Dataset: Multi-Source, Multi-Level, Multi-Label, and Large-Scale.” In.

Choi, Keunwoo, and Cho. 2019. “Deep Unsupervised Drum Transcription.” In.

Choi, Jeong, Lee, Park, et al. 2019. “Zero-Shot Learning for Audio-Based Music Classification and Tagging.” In.

Cífka, Ondřej. 2019. “Supplementary material: Supervised Symbolic Music Style Translation Using Synthetic Data.”

Cífka, Ondřej, and Richard. 2019. “Supervised Symbolic Music Style Translation Using Synthetic Data.” In.

Engel, Agrawal, Chen, et al. 2019. “GANSynth: Adversarial Neural Audio Synthesis.” In Seventh International Conference on Learning Representations.

Engel, Hantrakul, Gu, et al. 2019. “DDSP: Differentiable Digital Signal Processing.” In.

Esling, Masuda, Bardet, et al. 2019. “Flowsynth: Semantic and Vocal Synthesis Control.” In.

Foroughmand, and Peeters. 2019. “Deep-Rhythm for Tempo Estimation and Rhythm Pattern Recognition.”

Fuentes, Maia, Rocamora, et al. 2019. “Tracking Beats and Microtiming in Afro-Latin American Music Using Conditional Random Fields and Deep Learning.” In.

Hennequin, Khlif, Voituret, et al. 2019. “Spleeter: A Fast and State-of-the Art Music Source Separation Tool with Pre-Trained Models.” In.

Kalchbrenner, Elsen, Simonyan, et al. 2018. “Efficient Neural Audio Synthesis.” arXiv:1802.08435 [Cs, Eess].

Lattner, Dörfler, and Arzt. 2019. “Learning Complex Basis Functions for Invariant Representations of Audio.” In Proceedings of the 20th Conference of the International Society for Music Information Retrieval.

Lattner, and Grachten. 2019. “High-Level Control of Drum Track Generation Using Learned Patterns of Rhythmic Interaction.” In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2019).

López-Serrano, Dittmar, Özer, et al. 2019. “NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization.” In.

Luo, Agres, and Herremans. 2019. “Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders.” In Proceedings of the 20th Conference of the International Society for Music Information Retrieval.

MacKinlay, and Botev. 2019. “Mosaic Style Transfer Using Sparse Autocorrelograms.” In Proceedings of the 20th Conference of the International Society for Music Information Retrieval.

Nieto, McCallum, Davies, et al. 2019. “The Harmonix Set: Beats, Downbeats, and Functional Segment Annotations of Western Popular Music.” In.

Pati, Lerch, and Hadjeres. 2019. “Learning to Traverse Latent Spaces for Musical Score Inpainting.” In Proceedings of the 20th Conference of the International Society for Music Information Retrieval.

Pfleiderer, Frieler, Abeßer, et al., eds. 2017. Inside the Jazzomat - New Perspectives for Jazz Research.

Robinson, and Brown. 2019. “Automated Time-Frequency Domain Audio Crossfades Using Graph Cuts.” In.

Smaragdis. 2004. “Non-Negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs.” In Independent Component Analysis and Blind Signal Separation. Lecture Notes in Computer Science.

Smith, Kawasaki, and Goto. 2019. “Unmixer: An Interface for Extracting and Remixing Loops.” In.

Stöter, Uhlich, Liutkus, et al. 2019. “Open-Unmix - A Reference Implementation for Music Source Separation.” Journal of Open Source Software.

Thickstun, Harchaoui, Foster, et al. 2019. “Coupled Recurrent Models for Polyphonic Music Composition.” In.

Tsuchida, Fukayama, Hamasaki, et al. 2019. “AIST Dance Video Database: Multi-Genre, Multi-Dancer, and Multi-Camera Database for Dance Information Processing.” In.

Wei, Wu, and Su. 2019. “Generating Structured Drum Pattern Using Variational Autoencoder and Self-Similarity Matrix.” In.

Yang, Wang, Wang, et al. 2019. “Deep Music Analogy Via Latent Representation Disentanglement.” In Proceedings of the 20th Conference of the International Society for Music Information Retrieval.