Teaching computers to write music

Seems like it should be easy, until you think about it.

Related: Arpeggiate by numbers, which discusses music theory, and analysis/resynthesis, which discusses audio from a non-neural-network perspective.

MUSAiC – ERC-2019-COG No. 864189

Artificial intelligence (AI) is an especially disruptive technology, impacting a growing number of domains in ways both beneficial and detrimental. It is even showing surprising impacts in the Arts, provoking questions fundamental to philosophy, law, and engineering, not to mention practices in the Arts themselves. MUSAiC is an interdisciplinary research venture confronting questions and challenges at the frontier of the AI disruption of music.

Useful infrastructure

pypianoroll and music21 handle piano rolls and MIDI scores, respectively. See also, e.g., MIDO for live MIDI.
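To make the piano-roll idea concrete, here is a minimal pure-Python sketch that rasterizes note events into a binary time-by-pitch grid. It does not use pypianoroll or music21; the `(pitch, start, duration)` event format and the function name `notes_to_piano_roll` are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical note-event format: (MIDI pitch, start step, duration in steps).
Note = Tuple[int, int, int]


def notes_to_piano_roll(
    notes: List[Note], n_steps: int, n_pitches: int = 128
) -> List[List[int]]:
    """Return a binary n_steps x n_pitches grid.

    roll[t][p] == 1 if pitch p is sounding at time step t.
    """
    roll = [[0] * n_pitches for _ in range(n_steps)]
    for pitch, start, duration in notes:
        # Clip notes that run past the end of the grid.
        for t in range(start, min(start + duration, n_steps)):
            roll[t][pitch] = 1
    return roll


# A C major triad held for two steps, then the G struck again for two steps.
notes = [(60, 0, 2), (64, 0, 2), (67, 0, 2), (67, 2, 2)]
roll = notes_to_piano_roll(notes, n_steps=4)

# List the active pitches at each time step.
active = [[p for p, on in enumerate(row) if on] for row in roll]
print(active)
```

Note that in this grid the G that is struck again at step 2 is indistinguishable from a G held for four steps. That ambiguity is a known cost of the plain binary piano roll.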


A tutorial on generating music using Restricted Boltzmann Machines for the conditional random field density and an RNN for the time dependence, after (Boulanger-Lewandowski, Bengio, and Vincent 2012).

Bob Sturm did a good one too.

Audio synthesis

See analysis/resynthesis, voice fakes.


Google has weighed in, like a gorilla on the metallophone, to do MIDI composition with TensorFlow as part of their Magenta project. Their NIPS 2016 demo won the best demo prize.

Google’s previous demo in this area was popular. Deep Bach (code) seems to be doing a related thing. Similar sets of authors have some other related work:

Modeling polyphonic music is a particularly challenging task because of the intricate interplay between melody and harmony. A good model should satisfy three requirements: statistical accuracy (capturing faithfully the statistics of correlations at various ranges, horizontally and vertically), flexibility (coping with arbitrary user constraints), and generalization capacity (inventing new material, while staying in the style of the training corpus). Models proposed so far fail on at least one of these requirements. We propose a statistical model of polyphonic music, based on the maximum entropy principle. This model is able to learn and reproduce pairwise statistics between neighboring note events in a given corpus. The model is also able to invent new chords and to harmonize unknown melodies. We evaluate the invention capacity of the model by assessing the amount of cited, re-discovered, and invented chords on a corpus of Bach chorales. We discuss how the model enables the user to specify and enforce user-defined constraints, which makes it useful for style-based, interactive music generation.
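For orientation, a pairwise maximum-entropy model of the kind the abstract describes generally takes an exponential-family form. The following is a generic sketch, not necessarily the authors' exact parameterization: the symbols $x_i$ (the $i$-th note event), $h_i$, $J_{ij}$ and the neighbourhood $N(i)$ are notation assumed here for illustration.

```latex
P(x) = \frac{1}{Z} \exp\Big( \sum_{i} h_i(x_i)
        + \sum_{i} \sum_{j \in N(i)} J_{ij}(x_i, x_j) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_{i} h_i(x'_i)
        + \sum_{i} \sum_{j \in N(i)} J_{ij}(x'_i, x'_j) \Big).
```

Among all distributions that reproduce the single-event and neighbouring-pair statistics of the corpus, this form is the one with maximum entropy. Fitting means choosing $h$ and $J$ so that the model's marginals match the corpus frequencies.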

Evan Chow represents for team non-deep-learning with jazzml:

Computer jazz improvisation powered by machine learning, specifically trigram modeling, K-Means clustering, and chord inference with SVMs.
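As a rough illustration of the trigram-modeling ingredient only, the sketch below counts which symbol follows each pair of preceding symbols and then samples from those counts. It is not jazzml's code; the toy corpus, the note-name alphabet, and the function names are assumptions.

```python
import random
from collections import Counter, defaultdict


def train_trigrams(sequence):
    """Count how often each symbol follows each pair of preceding symbols."""
    counts = defaultdict(Counter)
    for a, b, c in zip(sequence, sequence[1:], sequence[2:]):
        counts[(a, b)][c] += 1
    return counts


def generate(counts, seed, length, rng):
    """Extend the two-symbol seed by sampling from the trigram counts."""
    out = list(seed)
    while len(out) < length:
        followers = counts.get((out[-2], out[-1]))
        if not followers:  # unseen context: stop rather than guess
            break
        symbols, weights = zip(*followers.items())
        out.append(rng.choices(symbols, weights=weights)[0])
    return out


# Toy corpus of note names; a real system would use a transcribed corpus.
corpus = "C E G E C E G B A F D F A F D C".split()
model = train_trigrams(corpus)
melody = generate(model, seed=("C", "E"), length=12, rng=random.Random(0))
print(" ".join(melody))
```

Every three-note window in the output was seen in the corpus, which is both the strength of this approach (local plausibility) and its limit (no longer-range structure).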

Charles Martin’s Creative Prediction:

Creative Prediction is about applying predictive machine learning models to creative data. The focus is on recurrent neural networks (RNNs), deep learning models that can be used to generate sequential and temporal data. RNNs can be applied to many kinds of creative data including text and music. They can learn the long-range structure from a corpus of data and “create” new sequences by predicting one element at a time. When embedded in a creative interface, they can be used for “predictive interaction” where a human collaborates with, influences, and is influenced by a generative neural network.
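The phrase "predicting one element at a time" can be sketched as a sampling loop around a single recurrent step. This is a minimal pure-Python toy with random, untrained weights, so the output is not musically meaningful. The vocabulary, the layer sizes, and every name here are assumptions for illustration, not taken from Creative Prediction.

```python
import math
import random

VOCAB = ["C", "D", "E", "F", "G", "A", "B", "rest"]  # hypothetical symbol set
HIDDEN = 8
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


# Untrained, random weights: in practice these would be learned from a corpus.
W_xh = rand_matrix(HIDDEN, len(VOCAB))  # input -> hidden
W_hh = rand_matrix(HIDDEN, HIDDEN)      # hidden -> hidden (the recurrence)
W_hy = rand_matrix(len(VOCAB), HIDDEN)  # hidden -> output logits


def step(symbol_index, h):
    """One RNN step: update the hidden state, return next-symbol probabilities."""
    # The input is one-hot, so W_xh @ x is just column `symbol_index` of W_xh.
    h_new = [
        math.tanh(W_xh[i][symbol_index] + sum(W_hh[i][j] * h[j] for j in range(HIDDEN)))
        for i in range(HIDDEN)
    ]
    logits = [sum(W_hy[k][i] * h_new[i] for i in range(HIDDEN)) for k in range(len(VOCAB))]
    # Numerically stable softmax over the logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return h_new, [e / total for e in exps]


def sample_sequence(first_symbol, length):
    """Generate by feeding each sampled symbol back in as the next input."""
    h = [0.0] * HIDDEN
    idx = VOCAB.index(first_symbol)
    out = [first_symbol]
    for _ in range(length - 1):
        h, probs = step(idx, h)
        idx = rng.choices(range(len(VOCAB)), weights=probs)[0]
        out.append(VOCAB[idx])
    return out


sequence = sample_sequence("C", 16)
print(" ".join(sequence))
```

The hidden state `h` is what lets the prediction depend on more than the last symbol or two. In an interactive setting, a human could overwrite `idx` at any step and the network would continue from there.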

Daniel Johnson has a convolutional and recurrent architecture for taking into account multiple types of dependency in music, which he calls a biaxial neural network. See also Zhe LI, Composing Music With Recurrent Neural Networks.

Ji-Sung Kim’s deepjazz project is minimal, but does interesting jazz improvisations. Part of the genius is choosing totally chaotic music to try to ape, so you can ape it chaotically. (Code)

Nicolas Boulanger-Lewandowski provides code and data for the recurrent neural network composition of (Boulanger-Lewandowski, Bengio, and Vincent 2012), using Python/Theano. Christian Walder leads a project which shares some roots with that (Walder 2016a, 2016b). Bob Sturm’s FolkRNN does a related thing, but ingeniously redefines the problem by focussing on folk tune notation.
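One way to read the folk-notation idea: a tune written in a compact text notation such as ABC can be treated as a sequence of tokens for a sequence model, with no piano roll involved. The sketch below is a simplified, hypothetical tokenizer covering only a small subset of ABC. It is not FolkRNN's actual preprocessing, and the example tune is made up.

```python
import re

# Simplified token pattern for a small subset of ABC notation (an assumption,
# not a full ABC grammar).
TOKEN_PATTERN = re.compile(
    r"""
    [MK]:\S+                  # header fields such as metre (M:) and key (K:)
  | \|: | :\| | \|\] | \|     # repeat marks and bar lines
  | [\^=_]?[A-Ga-gz][,']*     # optional accidental, pitch letter or rest, octave marks
  | /?\d+ | /                 # duration multipliers and divisors
    """,
    re.VERBOSE,
)


def tokenize(tune: str) -> list:
    """Split an ABC-like string into tokens, skipping whitespace."""
    return TOKEN_PATTERN.findall(tune)


tune = "M:4/4 K:Cmaj |: G2 AB c2 BA | G4 z4 :|"
tokens = tokenize(tune)
print(tokens)

# Map tokens to integer ids, the usual input format for a sequence model.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[tok] for tok in tokens]
```

Compared with a piano roll, this representation keeps bar lines, repeats, metre and key as explicit symbols, so the model sees the structure that a human transcriber wrote down.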


Boulanger-Lewandowski, Nicolas, Yoshua Bengio, and Pascal Vincent. 2012. “Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.” In 29th International Conference on Machine Learning.
Bown, Oliver, and Sebastian Lexer. 2006. “Continuous-Time Recurrent Neural Networks for Generative and Interactive Musical Performance.” In Applications of Evolutionary Computing, edited by Franz Rothlauf, Jürgen Branke, Stefano Cagnoni, Ernesto Costa, Carlos Cotta, Rolf Drechsler, Evelyne Lutton, et al., 652–63. Lecture Notes in Computer Science 3907. Springer Berlin Heidelberg.
Briot, Jean-Pierre, and François Pachet. 2020. “Deep Learning for Music Generation: Challenges and Directions.” Neural Computing and Applications 32 (4): 981–93.
Dieleman, Sander, and Benjamin Schrauwen. 2014. “End to End Learning for Music Audio.” In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6964–68. IEEE.
Luo, Yi, Zhuo Chen, John R. Hershey, Jonathan Le Roux, and Nima Mesgarani. 2016. “Deep Clustering and Conventional Networks for Music Separation: Stronger Together.” arXiv:1611.06265 [Cs, Stat], November.
Sarroff, Andy M., and Michael Casey. 2014. “Musical Audio Synthesis Using Autoencoding Neural Nets.” In. Ann Arbor, MI: Michigan Publishing, University of Michigan Library.
Sigtia, Siddharth, Emmanouil Benetos, Nicolas Boulanger-Lewandowski, Tillman Weyde, Artur S. d’Avila Garcez, and Simon Dixon. 2015. “A Hybrid Recurrent Neural Network for Music Transcription.” In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2061–65. IEEE.
Sturm, Bob L., Oded Ben-Tal, Úna Monaghan, Nick Collins, Dorien Herremans, Elaine Chew, Gaëtan Hadjeres, Emmanuel Deruty, and François Pachet. 2018. “Machine Learning Research That Matters for Music Creation: A Case Study.” Journal of New Music Research 0 (0): 1–20.
Sun, Zheng, Jiaqi Liu, Zewang Zhang, Jingwen Chen, Zhao Huo, Ching Hua Lee, and Xiao Zhang. 2016. “Composing Music with Grammar Argumented Neural Networks and Note-Level Encoding.” arXiv:1611.05416 [Cs], November.
Walder, Christian. 2016a. “Modelling Symbolic Music: Beyond the Piano Roll.” arXiv:1606.01368 [Cs], June.
Walder, Christian. 2016b. “Symbolic Music Data Version 1.0.” arXiv:1606.02542 [Cs], June.
Wyse, L. 2017. “Audio Spectrogram Representations for Processing with Convolutional Neural Networks.” In Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [Cs.NE]).
Yu, Haizi, and Lav R. Varshney. 2017. “Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.
Zukowski, Zack, and Cj Carr. 2017. “Generating Black Metal and Math Rock: Beyond Bach, Beethoven, and Beatles.” In 31st Conference on Neural Information Processing Systems (NIPS 2017).
