Teaching computers to write music

2016-06-06 — 2020-03-25

Wherein the training of machines in musical composition is described, and MIDI sequence generation with TensorFlow’s Magenta, piano-roll representations, polyphonic and RNN-based models alongside maximum-entropy approaches are examined.

buzzword

computers are awful

generative art

machine learning

making things

music

neural nets

Seems like it should be easy until you think about it.

Related: Arpeggiate by numbers, which discusses music theory, and analysis/resynthesis, which discusses audio from a non-NN perspective.

MUSAiC – ERC-2019-COG No. 864189

Artificial intelligence (AI) is an especially disruptive technology, impacting a growing number of domains in ways both beneficial and detrimental. It is even showing surprising impacts in the Arts, provoking questions fundamental to philosophy, law, and engineering, not to mention practices in the Arts themselves. MUSAiC is an interdisciplinary research venture confronting questions and challenges at the frontier of the AI disruption of music.

1 Useful infrastructure

pypianoroll and music21 handle, respectively, piano-roll representations and symbolic scores (including MIDI). For live MIDI, see e.g. Mido.
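The piano-roll representation is simple enough to sketch without any library. The following is a stdlib-only illustration, not pypianoroll's API. It assumes notes are hypothetical `(pitch, start_step, end_step)` triples on a fixed time grid, with `end_step` exclusive.

```python
from typing import List, Tuple

Note = Tuple[int, int, int]  # hypothetical: (MIDI pitch 0-127, start step, end step)


def notes_to_piano_roll(notes: List[Note], n_steps: int) -> List[List[bool]]:
    """Rasterise note events onto a boolean (n_steps x 128) grid."""
    roll = [[False] * 128 for _ in range(n_steps)]
    for pitch, start, end in notes:
        if not 0 <= pitch < 128:
            raise ValueError(f"pitch {pitch} outside MIDI range")
        # Clip to the grid so notes running past the end are truncated.
        for t in range(max(start, 0), min(end, n_steps)):
            roll[t][pitch] = True
    return roll


def piano_roll_to_notes(roll: List[List[bool]]) -> List[Note]:
    """Recover (pitch, start, end) events by scanning each pitch for runs of True."""
    notes = []
    n_steps = len(roll)
    for pitch in range(128):
        start = None
        for t in range(n_steps + 1):
            active = t < n_steps and roll[t][pitch]
            if active and start is None:
                start = t
            elif not active and start is not None:
                notes.append((pitch, start, t))
                start = None
    return sorted(notes, key=lambda n: (n[1], n[0]))


# A C major triad held for 4 steps, then a second G immediately after the first.
events = [(60, 0, 4), (64, 0, 4), (67, 0, 4), (67, 4, 6)]
roll = notes_to_piano_roll(events, n_steps=8)
print(piano_roll_to_notes(roll))  # -> [(60, 0, 4), (64, 0, 4), (67, 0, 6)]
```

The two adjacent G notes come back as one long G, because a plain boolean roll records which pitches are sounding, not where notes begin. This loss of onset information is a standard weakness of the piano roll; a common workaround is a separate onset channel.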

2 Tutorials

A tutorial on generating music using a Restricted Boltzmann Machine to model the conditional density of the notes at each time step, and an RNN to model the dependence over time, following (Boulanger-Lewandowski, Bengio, and Vincent 2012).
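The RNN-RBM data flow can be sketched in plain Python. This is a generation-only toy under my own assumptions: the weights are random rather than trained (training by contrastive divergence is omitted), each time step is a small binary note vector, and names like `rnn_rbm_generate` are hypothetical. At each step the RNN state shifts the RBM's biases, a few Gibbs steps sample that step's notes, and the sampled notes then update the RNN state.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(M, x):
    """Matrix (list of rows) times vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def rnn_rbm_generate(n_steps, n_v=12, n_h=16, n_u=8, gibbs_k=10, seed=0):
    """Sample a binary piano roll (n_steps x n_v) from an UNTRAINED RNN-RBM."""
    rng = random.Random(seed)
    W = rand_matrix(n_v, n_h, rng)        # RBM visible-hidden weights
    W_T = [list(col) for col in zip(*W)]  # transpose, n_h x n_v
    b_v, b_h, b_u = [0.0] * n_v, [0.0] * n_h, [0.0] * n_u
    W_uv = rand_matrix(n_v, n_u, rng)     # RNN state -> visible bias
    W_uh = rand_matrix(n_h, n_u, rng)     # RNN state -> hidden bias
    W_uu = rand_matrix(n_u, n_u, rng)     # RNN recurrence
    W_vu = rand_matrix(n_u, n_v, rng)     # sampled notes -> RNN state
    u = [0.0] * n_u
    roll = []
    for _ in range(n_steps):
        # The previous RNN state sets this step's RBM biases.
        bv_t = [b + d for b, d in zip(b_v, matvec(W_uv, u))]
        bh_t = [b + d for b, d in zip(b_h, matvec(W_uh, u))]
        # Block Gibbs sampling in the conditional RBM.
        v = [0] * n_v
        for _ in range(gibbs_k):
            h = bernoulli([sigmoid(a + b) for a, b in zip(matvec(W_T, v), bh_t)], rng)
            v = bernoulli([sigmoid(a + b) for a, b in zip(matvec(W, h), bv_t)], rng)
        roll.append(v)
        # Deterministic RNN update from the previous state and the sampled notes.
        u = [math.tanh(b + r + c) for b, r, c in zip(b_u, matvec(W_uu, u), matvec(W_vu, v))]
    return roll


for frame in rnn_rbm_generate(8):
    print("".join("#" if x else "." for x in frame))
```

With untrained weights the output is noise; the point is only the structure, where a deterministic RNN carries the history and a stochastic RBM handles the many simultaneous notes within a step.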

Bob Sturm did a good one too.

3 Audio synthesis

See analysis/resynthesis, voice fakes.

4 Examples

Google has weighed in, like a gorilla on the metallophone, to do MIDI composition with TensorFlow as part of their Magenta project. Their NIPS 2016 demo won the best demo prize.

Google’s previous demo in this area was popular. DeepBach (code) seems to do a related thing. Overlapping sets of authors have other related work, e.g. this maximum-entropy model of polyphonic music:

Modeling polyphonic music is a particularly challenging task because of the intricate interplay between melody and harmony. A good model should satisfy three requirements: statistical accuracy (capturing faithfully the statistics of correlations at various ranges, horizontally and vertically), flexibility (coping with arbitrary user constraints), and generalization capacity (inventing new material, while staying in the style of the training corpus). Models proposed so far fail on at least one of these requirements. We propose a statistical model of polyphonic music, based on the maximum entropy principle. This model is able to learn and reproduce pairwise statistics between neighbouring note events in a given corpus. The model is also able to invent new chords and to harmonise unknown melodies. We evaluate the invention capacity of the model by assessing the amount of cited, re-discovered, and invented chords on a corpus of Bach chorales. We discuss how the model enables the user to specify and enforce user-defined constraints, which makes it useful for style-based, interactive music generation.
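A pairwise maximum-entropy model of this kind is an exponential family. Up to a normalising constant, the log-probability of a token sequence is a sum of per-token fields `h` plus couplings `J` between tokens up to `K` steps apart. The sketch below is a generic Potts-style version under that assumption, not the authors' exact parameterisation, and it omits learning the parameters from a corpus. Each token's conditional distribution given its neighbours is cheap to compute exactly, so Gibbs sampling is natural. User constraints can then be enforced by never resampling the constrained positions.

```python
import math
import random

# Hypothetical parameterisation: h[a] is a field for token a;
# J[k-1][a][b] couples token a with token b appearing k steps later.


def log_weight(seq, h, J):
    """Unnormalised log-probability of a token sequence."""
    total = sum(h[a] for a in seq)
    for k, Jk in enumerate(J, start=1):
        for i in range(len(seq) - k):
            total += Jk[seq[i]][seq[i + k]]
    return total


def gibbs_sweep(seq, h, J, rng, fixed=frozenset()):
    """One sweep: resample each unconstrained position from its exact
    conditional given all other positions. Positions in `fixed` are
    user constraints and are never changed."""
    seq = list(seq)
    n, q = len(seq), len(h)
    for i in range(n):
        if i in fixed:
            continue
        scores = []
        for a in range(q):
            s = h[a]
            for k, Jk in enumerate(J, start=1):
                if i - k >= 0:  # coupling to the token k steps earlier
                    s += Jk[seq[i - k]][a]
                if i + k < n:   # coupling to the token k steps later
                    s += Jk[a][seq[i + k]]
            scores.append(s)
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        seq[i] = rng.choices(range(q), weights=weights)[0]
    return seq


# Toy usage with random (untrained) parameters: 4 token types, K = 2.
rng = random.Random(1)
q, K = 4, 2
h = [rng.gauss(0, 1) for _ in range(q)]
J = [[[rng.gauss(0, 1) for _ in range(q)] for _ in range(q)] for _ in range(K)]
seq = [0, 0, 0, 0, 0, 0, 0, 3]
for _ in range(20):
    seq = gibbs_sweep(seq, h, J, rng, fixed={0, 7})  # first and last are constrained
print(seq, log_weight(seq, h, J))
```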

Evan Chow represents for team non-deep-learning with jazzml:

Computer jazz improvisation powered by machine learning, specifically trigram modelling, K-Means clustering, and chord inference with SVMs.
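The trigram part of that recipe is the easiest to illustrate. Below is a generic trigram model over hypothetical note-name tokens, with start and end padding. jazzml's actual pipeline (clustering, chord inference, its own tokenisation) is not reproduced here, and the toy corpus is made up.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


def train_trigrams(sequences):
    """Count next-token frequencies for each two-token context."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = [START, START] + list(seq) + [END]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            counts[(a, b)][c] += 1
    return counts


def sample_trigrams(counts, rng, max_len=32):
    """Generate tokens by repeatedly sampling the next token in proportion
    to how often it followed the current two-token context in training."""
    context, out = (START, START), []
    while len(out) < max_len:
        options = counts.get(context)
        if not options:  # defensive check for an unseen context
            break
        tokens, weights = zip(*options.items())
        nxt = rng.choices(tokens, weights=weights)[0]
        if nxt == END:
            break
        out.append(nxt)
        context = (context[1], nxt)
    return out


# Toy corpus of made-up phrases (pitch names only, durations ignored).
corpus = [
    ["D", "F", "A", "C", "B", "G"],
    ["D", "F", "A", "G", "E", "C"],
]
model = train_trigrams(corpus)
print(sample_trigrams(model, random.Random(0)))
```

Because both phrases share the context `("F", "A")`, the sampler can splice them together, which is about as much "invention" as a trigram model offers.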

Charles Martin’s Creative Predictions:

Creative Prediction is about applying predictive machine learning models to creative data. The focus is on recurrent neural networks (RNNs), deep learning models that can be used to generate sequential and temporal data. RNNs can be applied to many kinds of creative data including text and music. They can learn the long-range structure from a corpus of data and “create” new sequences by predicting one element at a time. When embedded in a creative interface, they can be used for “predictive interaction” where a human collaborates with, influences, and is influenced by a generative neural network.
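"Predicting one element at a time" amounts to an autoregressive sampling loop. In this sketch `predict_next` stands for any trained model (an RNN, say) that maps the sequence so far to unnormalised scores over a vocabulary. The toy stand-in `step_up` is a hypothetical placeholder, and the temperature parameter is one common knob for trading predictability against surprise.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature); greedy if temperature <= 0."""
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights)[0]


def generate_autoregressive(predict_next, prime, n_new, temperature=1.0, seed=0):
    """Extend `prime` one token at a time, feeding each sampled token back in."""
    rng = random.Random(seed)
    seq = list(prime)
    for _ in range(n_new):
        logits = predict_next(seq)  # the model sees everything generated so far
        seq.append(sample_with_temperature(logits, temperature, rng))
    return seq


# Hypothetical stand-in for a trained RNN over 8 scale degrees:
# it strongly prefers stepping up one degree, wrapping at the octave.
def step_up(seq):
    return [3.0 if i == (seq[-1] + 1) % 8 else 0.0 for i in range(8)]


print(generate_autoregressive(step_up, prime=[0], n_new=10, temperature=0))
# -> [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2]
print(generate_autoregressive(step_up, prime=[0], n_new=10, temperature=1.5))
```

In a "predictive interaction" setting the human side might supply or edit the prime, or overwrite tokens between calls; the loop itself does not care where the tokens came from.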

Daniel Johnson has a convolutional and recurrent architecture that accounts for several types of dependency in music, which he calls a biaxial neural network. See also Zhe Li, Composing Music With Recurrent Neural Networks.
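The two axes can be illustrated with a toy. Here a scalar tanh recurrence stands in for the LSTM layers, and the input is a small (time x pitch) grid. This is only a sketch of the data flow under those simplifications, not Johnson's model: one recurrence runs along time separately for each pitch, with the same weights at every pitch, and a second runs across pitch (low to high) within each time step. The weight values are arbitrary assumptions.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # grid[t][p]


def scan(xs: Sequence[float], w_in: float, w_rec: float) -> List[float]:
    """Toy scalar recurrence h_i = tanh(w_in * x_i + w_rec * h_{i-1})."""
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        out.append(h)
    return out


def biaxial_pass(roll: Grid,
                 w_time: Tuple[float, float] = (1.0, 0.5),
                 w_note: Tuple[float, float] = (1.0, 0.5)) -> Grid:
    n_t, n_p = len(roll), len(roll[0])
    # 1. Time axis: the SAME recurrence (shared weights) runs along time for each pitch.
    time_feats = [[0.0] * n_p for _ in range(n_t)]
    for p in range(n_p):
        column = scan([roll[t][p] for t in range(n_t)], *w_time)
        for t in range(n_t):
            time_feats[t][p] = column[t]
    # 2. Note axis: within each time step, a recurrence runs from low to high pitch.
    return [scan(time_feats[t], *w_note) for t in range(n_t)]


quiet = [[0.0] * 5 for _ in range(4)]
one_note = [row[:] for row in quiet]
one_note[2][4] = 1.0  # a single note at the highest pitch, time step 2
a, b = biaxial_pass(quiet), biaxial_pass(one_note)
print(all(a[t][p] == b[t][p] for t in range(4) for p in range(4)))  # -> True
```

The check at the end is the point of the note axis in this toy: the output at a given pitch depends only on pitches at or below it (and their history), so adding a high note leaves the lower outputs unchanged.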

Ji-Sung Kim’s deepjazz project is minimal, but does interesting jazz improvisations. Part of the genius is choosing totally chaotic music to try to ape, so you can ape it chaotically. (Code)

Nicolas Boulanger-Lewandowski has code and data for the recurrent neural network composition of (Boulanger-Lewandowski, Bengio, and Vincent 2012), using Python/Theano. Christian Walder leads a project which shares some roots with that (Walder 2016a, 2016b). Bob Sturm’s FolkRNN does a related thing, but ingeniously redefines the problem by focusing on folk tune notation.
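Folk tunes in ABC notation break naturally into a small vocabulary of tokens (meter, key, bar lines, pitches with accidentals and octave marks, durations), which is roughly the kind of sequence FolkRNN models. The regex tokenizer below is my own simplified assumption about such a vocabulary, not FolkRNN's code, and it handles only a subset of ABC.

```python
import re

# Hypothetical token classes; alternation order matters (e.g. "|:" before "|").
TOKEN_RE = re.compile(
    r"""
      (?P<meter>M:\d+/\d+)            # e.g. M:6/8
    | (?P<key>K:[A-G][b\#]?\w*)       # e.g. K:Gmaj, K:Ador
    | (?P<bar>\|:|:\||\|\]|\|\||\|)   # repeats, final bar, double bar, plain bar
    | (?P<tuplet>\(\d)                # e.g. (3
    | (?P<note>[_=^]?[A-Ga-gz][,']*)  # accidental, pitch or rest, octave marks
    | (?P<duration>\d*/\d*|\d+)       # e.g. 2, 3/2, /2
    """,
    re.VERBOSE,
)


def tokenize(abc: str) -> list:
    """Split an ABC tune body into tokens, rejecting anything unrecognised."""
    tokens, pos = [], 0
    while pos < len(abc):
        if abc[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(abc, pos)
        if m is None:
            raise ValueError(f"unrecognised ABC at {pos}: {abc[pos:pos + 10]!r}")
        tokens.append(m.group(0))
        pos = m.end()
    return tokens


print(tokenize("M:6/8 K:Gmaj |: GAB c2d | e3 d2B :|"))
# -> ['M:6/8', 'K:Gmaj', '|:', 'G', 'A', 'B', 'c', '2', 'd', '|', 'e', '3', 'd', '2', 'B', ':|']
```

Treating durations as separate tokens keeps the vocabulary small, at the cost of longer sequences; a language model then sees a tune as just another string of symbols.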

5 Incoming

https://www.vprobroadcast.com/titles/ai-songcontest/teams/australia
Fake Feelings—ai emo. When post-hardcore emo band Silverstein… | by Dadabots

6 References

Boulanger-Lewandowski, Bengio, and Vincent. 2012. “Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.” In 29th International Conference on Machine Learning.

Bown, and Lexer. 2006. “Continuous-Time Recurrent Neural Networks for Generative and Interactive Musical Performance.” In Applications of Evolutionary Computing. Lecture Notes in Computer Science 3907.

Briot, and Pachet. 2020. “Deep Learning for Music Generation: Challenges and Directions.” Neural Computing and Applications.

Dieleman, and Schrauwen. 2014. “End to End Learning for Music Audio.” In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Luo, Chen, Hershey, et al. 2016. “Deep Clustering and Conventional Networks for Music Separation: Stronger Together.” arXiv:1611.06265 [Cs, Stat].

Sarroff, and Casey. 2014. “Musical Audio Synthesis Using Autoencoding Neural Nets.”

Sigtia, Benetos, Boulanger-Lewandowski, et al. 2015. “A Hybrid Recurrent Neural Network for Music Transcription.” In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Sturm, Ben-Tal, Monaghan, et al. 2018. “Machine Learning Research That Matters for Music Creation: A Case Study.” Journal of New Music Research.

Sun, Liu, Zhang, et al. 2016. “Composing Music with Grammar Argumented Neural Networks and Note-Level Encoding.” arXiv:1611.05416 [Cs].

Walder. 2016a. “Modelling Symbolic Music: Beyond the Piano Roll.” arXiv:1606.01368 [Cs].

———. 2016b. “Symbolic Music Data Version 1.0.” arXiv:1606.02542 [Cs].

Wyse. 2017. “Audio Spectrogram Representations for Processing with Convolutional Neural Networks.” In Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [Cs.NE]).

Yu, and Varshney. 2017. “Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.

Zukowski, and Carr. 2017. “Generating Black Metal and Math Rock: Beyond Bach, Beethoven, and Beatles.” In 31st Conference on Neural Information Processing Systems (NIPS 2017).