Neural music synthesis

I have a lot of feelings and ideas about this, but no time to write them down. For now, here are some links and ideas by other people.

Sander Dielemann on waveform-domain neural synthesis. Matt Vitelli on music generation from MP3s (source). Alex Graves on RNN predictive synthesis. Parag Mittal on RNN style transfer.


Differentiable DSP

This is a really fun idea — do audio processing as normal, but using an NN framework so that the operations are differentiable.

Project site. Github. Twitter intro. Paper. Online supplement. Timbre transfer example. Tutorials.


Pixelrnn turns out to be good at music Dadabots have successfully weaponised samplernn and it’s cute.


Open AI Jukebox is the latest hot generative music thing that I should be across. I would personally take a rather different approach to them to solve this problem, but they are the current benchmark.


I’m not massively into spectral-domain synthesis because I think the stationarity assumption is a bit of a stretch. But if you’re into that, you might as well try Melnet

Existing generative models for audio have predominantly aimed to directly model time-domain waveforms. MelNet instead aims to model the frequency content of an audio signal. MelNet can be used to model audio unconditionally, making it capable of tasks such as music generation. It can also be conditioned on text and speaker, making it applicable to tasks such as text-to-speech and voice conversion.


Jlin and Holly Herndon show off some artistic use of messed-up neural nets.

Hung-yi Lee and Yu Tsao, Generative Adversarial nets for DSP.


Blaauw, Merlijn, and Jordi Bonada. 2017. “A Neural Parametric Singing Synthesizer.” arXiv:1704.03809 [Cs], April.
Carr, C. J., and Zack Zukowski. 2018. “Generating Albums with SampleRNN to Imitate Metal, Rock, and Punk Bands.” arXiv:1811.06633 [Cs, Eess], November.
Dieleman, Sander, Aäron van den Oord, and Karen Simonyan. 2018. “The Challenge of Realistic Music Generation: Modelling Raw Audio at Scale.” In Advances In Neural Information Processing Systems, 11.
Elbaz, Dan, and Michael Zibulevsky. 2017. “Perceptual Audio Loss Function for Deep Learning.” In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China.
Engel, Jesse, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, and Mohammad Norouzi. 2017. “Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders.” In PMLR.
Grais, Emad M., Dominic Ward, and Mark D. Plumbley. 2018. “Raw Multi-Channel Audio Source Separation Using Multi-Resolution Convolutional Auto-Encoders.” arXiv:1803.00702 [Cs], March.
Liu, Yuzhou, Balaji Thoshkahna, Ali Milani, and Trausti Kristjansson. 2020. “Voice and Accompaniment Separation in Music Using Self-Attention Convolutional Neural Network,” March.
Liutkus, Antoine, Roland Badeau, and Gäel Richard. 2011. “Gaussian Processes for Underdetermined Source Separation.” IEEE Transactions on Signal Processing 59 (7): 3155–67.
Mehri, Soroush, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, and Yoshua Bengio. 2017. “SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.
Sarroff, Andy M., and Michael Casey. 2014. “Musical Audio Synthesis Using Autoencoding Neural Nets.” In. Ann Arbor, MI: Michigan Publishing, University of Michigan Library.
Schlüter, J., and S. Böck. 2014. “Improved Musical Onset Detection with Convolutional Neural Networks.” In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6979–83.
Sprechmann, Pablo, Joan Bruna, and Yann LeCun. 2014. “Audio Source Separation with Discriminative Scattering Networks.” arXiv:1412.7022 [Cs], December.
Stöter, Fabian-Robert, Stefan Uhlich, Antoine Liutkus, and Yuki Mitsufuji. 2019. “Open-Unmix - A Reference Implementation for Music Source Separation.” Journal of Open Source Software 4 (41): 1667.
Tenenbaum, J. B., and W. T. Freeman. 2000. “Separating Style and Content with Bilinear Models.” Neural Computation 12 (6): 1247–83.
Tzinis, Efthymios, Zhepei Wang, and Paris Smaragdis. 2020. “Sudo Rm -Rf: Efficient Networks for Universal Audio Source Separation.” In, 6.
Venkataramani, Shrikant, and Paris Smaragdis. 2017. “End to End Source Separation with Adaptive Front-Ends.” arXiv:1705.02514 [Cs], May.
Venkataramani, Shrikant, Y. Cem Subakan, and Paris Smaragdis. 2017. “Neural Network Alternatives to Convolutive Audio Models for Source Separation.” arXiv:1709.07908 [Cs, Eess], September.
Verma, Prateek, and Julius O. Smith. 2018. “Neural Style Transfer for Audio Spectograms.” In 31st Conference on Neural Information Processing Systems (NIPS 2017).
Wyse, L. 2017. “Audio Spectrogram Representations for Processing with Convolutional Neural Networks.” In Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [Cs.NE]).

No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.