Here’s how I would do art with machine learning if I had to

I’ve a weakness for ideas that give me plausible deniability for making generative art while doing my maths homework.

Quasimondo: so do you

This page is more chaotic than the already-chaotic median, sorry. Good luck making sense of it.

See also analysis/resynthesis.

See gesture recognition. Oh and also google’s AMI channel, and ml4artists, which has some sweet machine learning for artists topic guides.

Many neural networks, are generative in the sense that even if you train ’em to classify things, they can also predict new members of the class. e.g. run the model forwards, it recognizes melodies; run it “backwards”, it composes melodies. Or rather, you maybe trained them to generate examples in the course of training them to detect examples.

There are many definitional and practical wrinkles here, and this quality is not unique to artificial neural networks, but it is a great convenience, and the gods of machine learning have blessed us with much infrastructure to exploit this feature, because it is very close to actual profitable algorithms. Upshot: There is now a lot of computation and grad student labour directed at producing neural networks which as a byproduct can produce faces, chairs, film dialogue, symphonies and so on.

There are NIPS streams about this now.


Some as-yet-unfiled neural-artwork links I should think about.

  • So simple it’s cute, CPPNs are probably what Jonathan McCabe has been producing for years.
  • IGAN, iGAN: Interactive Image Generation via Generative Adversarial Networks
  • interpolating style transfer.
  • neurogram is a compact semi-untrained neural network image synthesis-in-the-browser
  • Adversarial generation is a cool hack if you hate boring stuff like labelling data sets e.g. chair generation
  • Autoencoding beyond pixels using a learned similarity metric (Larsen et al. 2015) code The clever hack here is the “generative adversarial networks”

Text synthesis

Visual synthesis

See those classic images from google’s tripped-out image recognition systems) or Gatys, Ecker and Bethge’s deep art Neural networks do a passable undergraduate Monet.

Here’s Frank Liu’s implementation of style transfer in pycaffe.

Alex Graves, Generating Sequences With Recurrent Neural Networks, generates handwriting. Relatedly, sketch-rnn is reaaaally cute.

Deep dreaming approaches are entertaining. (NSFW) Here’s a more pedestrian and slightly more informative version of that. has some lovely visual explanations of visual and other neural networks:

  • Experiments in Handwriting with a Neural Network

  • Deconvolution and Checkerboard Artifacts

  • How to Use t-SNE Effectively

  • Attention and Augmented Recurrent Neural Networks

  • hardmaru presents an amazing introduction to running sophisticated neural networks in the browser, targeted at artists, which goes over the handwriting post in a non-technical way.

  • progressive_growing_of_gans is neat, generating infinite celebrities at high resolution. (Karras et al. 2017)


    a platform for creators of all kinds to use machine learning tools in intuitive ways without any coding experience. Find resources here to start creating with RunwayML quickly.

    In particular it plugs into Blender and Photoshop and allows you to use those programs as a UI for ML-backed algorithms. Nice.


Symbolic composition via scores/MIDI/etc

Seems like it should be easy, until you think about it.

Related: Arpeggiate by numbers which discusses music-theory.

Google has weighed in, like a gorilla on the metallophone, to do midi composition with Tensorflow as part of their Magenta project. Their NIPS 2016 demo won the best demo prize.

There is some useful infrastructure:

pypianoroll for piano rolls music21 formidi scores respective and MIDO for live midi.

Daniel Johnson has a convolutional and recurrent architecture for taking into account multiple types of dependency in music, which he calls biaxial neural network Zhe LI, Composing Music With Recurrent Neural Networks.

Ji-Sung Kim’s deepjazz project is minimal, but does interesting jazz improvisations. Part of the genius here is choosing totally chaotic music to try to ape, so you can ape it chaotically. (Code)

Boulanger-Lewandowski, (code and data) for (Boulanger-Lewandowski, Bengio, and Vincent 2012)’s recurrent neural network composition using python/Theano. Christian Walder leads a project which shares some roots with that. (Walder 2016a, 2016b) Bob Sturm’s FolkRNN does a related thing, but ingeniously redefines the problem by focussing on folk tune notation.

A tutorial on generating music using Restricted Boltzmann Machines for the conditional random field density, and an RNN for the time dependence after (Boulanger-Lewandowski, Bengio, and Vincent 2012)

Bob Sturm did a good one

🏗 google’s latest demo in this area was popular. Deep Bach code) seems to be doing a related thing. Similar sets of authors have some other related work):

Modeling polyphonic music is a particularly challenging task because of the intricate interplay between melody and harmony. A good model should satisfy three requirements: statistical accuracy (capturing faithfully the statistics of correlations at various ranges, horizontally and vertically), flexibility (coping with arbitrary user constraints), and generalization capacity (inventing new material, while staying in the style of the training corpus). Models proposed so far fail on at least one of these requirements. We propose a statistical model of polyphonic music, based on the maximum entropy principle. This model is able to learn and reproduce pairwise statistics between neighboring note events in a given corpus. The model is also able to invent new chords and to harmonize unknown melodies. We evaluate the invention capacity of the model by assessing the amount of cited, re-discovered, and invented chords on a corpus of Bach chorales. We discuss how the model enables the user to specify and enforce user-defined constraints, which makes it useful for style-based, interactive music generation.

  • Evan Chow represents for team non-deep-learning with jazzml:

    Computer jazz improvisation powered by machine learning, specifically trigram modeling, K-Means clustering, and chord inference with SVMs.

Charles Martin’s Creative Predictions:

Creative Prediction is about applying predictive machine learning models to creative data. The focus is on recurrent neural networks (RNNs), deep learning models that can be used to generate sequential and temporal data. RNNs can be applied to many kinds of creative data including text and music. They can learn the long-range structure from a corpus of data and “create” new sequences by predicting one element at a time. When embedded in a creative interface, they can be used for “predictive interaction” where a human collaborates with, influences, and is influenced by a generative neural network.

Audio synthesis

See analysis/resynthesis, voice fakes.

Boulanger-Lewandowski, Nicolas, Yoshua Bengio, and Pascal Vincent. 2012. “Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.” In 29th International Conference on Machine Learning.

Bown, Oliver, and Sebastian Lexer. 2006. “Continuous-Time Recurrent Neural Networks for Generative and Interactive Musical Performance.” In Applications of Evolutionary Computing, edited by Franz Rothlauf, Jürgen Branke, Stefano Cagnoni, Ernesto Costa, Carlos Cotta, Rolf Drechsler, Evelyne Lutton, et al., 652–63. Lecture Notes in Computer Science 3907. Springer Berlin Heidelberg.

Champandard, Alex J. 2016. “Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks,” March.

Denton, Emily, Soumith Chintala, Arthur Szlam, and Rob Fergus. 2015. “Deep Generative Image Models Using a Laplacian Pyramid of Adversarial Networks,” June.

Dieleman, Sander, and Benjamin Schrauwen. 2014. “End to End Learning for Music Audio.” In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6964–8. IEEE.

Dosovitskiy, Alexey, Jost Tobias Springenberg, Maxim Tatarchenko, and Thomas Brox. 2014. “Learning to Generate Chairs, Tables and Cars with Convolutional Networks,” November.

Dumoulin, Vincent, Jonathon Shlens, and Manjunath Kudlur. 2016. “A Learned Representation for Artistic Style,” October.

Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. 2015. “A Neural Algorithm of Artistic Style,” August.

Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples,” December.

Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. “Generative Adversarial Nets.” In Advances in Neural Information Processing Systems 27, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 2672–80. NIPS’14. Cambridge, MA, USA: Curran Associates, Inc.

Gregor, Karol, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. 2015. “DRAW: A Recurrent Neural Network for Image Generation,” February.

Gregor, Karol, and Yann LeCun. 2010. “Learning Fast Approximations of Sparse Coding.” In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 399–406.

———. 2011. “Efficient Learning of Sparse Invariant Representations,” May.

Grosse, Roger, Ruslan R. Salakhutdinov, William T. Freeman, and Joshua B. Tenenbaum. 2012. “Exploiting Compositionality to Explore a Large Space of Model Structures.” In Proceedings of the Conference on Uncertainty in Artificial Intelligence.

He, Kun, Yan Wang, and John Hopcroft. 2016. “A Powerful Generative Model Using Random Weights for the Deep Image Representation.” In Advances in Neural Information Processing Systems.

Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313 (5786): 504–7.

Jetchev, Nikolay, Urs Bergmann, and Roland Vollgraf. 2016. “Texture Synthesis with Spatial Generative Adversarial Networks.” In Advances in Neural Information Processing Systems 29.

Jing, Yongcheng, Yezhou Yang, Zunlei Feng, Jingwen Ye, and Mingli Song. 2017. “Neural Style Transfer: A Review,” May.

Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. 2016. “Perceptual Losses for Real-Time Style Transfer and Super-Resolution,” March.

Karras, Tero, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2017. “Progressive Growing of GANs for Improved Quality, Stability, and Variation.” In Proceedings of ICLR.

Karras, Tero, Samuli Laine, and Timo Aila. 2018. “A Style-Based Generator Architecture for Generative Adversarial Networks,” December.

Larsen, Anders Boesen Lindbo, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. 2015. “Autoencoding Beyond Pixels Using a Learned Similarity Metric,” December.

Lazaridou, Angeliki, Dat Tien Nguyen, Raffaella Bernardi, and Marco Baroni. 2015. “Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation,” June.

Li, Yanghao, Naiyan Wang, Jiaying Liu, and Xiaodi Hou. 2017. “Demystifying Neural Style Transfer.” In IJCAI.

Luo, Yi, Zhuo Chen, John R. Hershey, Jonathan Le Roux, and Nima Mesgarani. 2016. “Deep Clustering and Conventional Networks for Music Separation: Stronger Together,” November.

Malmi, Eric, Pyry Takala, Hannu Toivonen, Tapani Raiko, and Aristides Gionis. 2016. “DopeLearning: A Computational Approach to Rap Lyrics Generation,” 195–204.

Mital, Parag K. 2017. “Time Domain Neural Audio Style Transfer,” November.

Mnih, Andriy, and Karol Gregor. 2014. “Neural Variational Inference and Learning in Belief Networks.” In Proceedings of the 31st International Conference on Machine Learning.

Olah, Chris, Alexander Mordvintsev, and Ludwig Schubert. 2017. “Feature Visualization.” Distill 2 (11): e7.

Oord, Aäron van den. 2016. “Wavenet: A Generative Model for Raw Audio.”

Oord, Aäron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. 2016. “Pixel Recurrent Neural Networks,” January.

Oord, Aäron van den, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, and Koray Kavukcuoglu. 2016. “Conditional Image Generation with PixelCNN Decoders,” June.

Sarroff, Andy M., and Michael Casey. 2014. “Musical Audio Synthesis Using Autoencoding Neural Nets.” In. Ann Arbor, MI: Michigan Publishing, University of Michigan Library.

Sigtia, Siddharth, Emmanouil Benetos, Nicolas Boulanger-Lewandowski, Tillman Weyde, Artur S. d’Avila Garcez, and Simon Dixon. 2015. “A Hybrid Recurrent Neural Network for Music Transcription.” In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2061–5. IEEE.

Smith, Evan C., and Michael S. Lewicki. 2006. “Efficient Auditory Coding.” Nature 439 (7079): 978–82.

Sturm, Bob L., Oded Ben-Tal, Úna Monaghan, Nick Collins, Dorien Herremans, Elaine Chew, Gaëtan Hadjeres, Emmanuel Deruty, and François Pachet. 2018. “Machine Learning Research That Matters for Music Creation: A Case Study.” Journal of New Music Research 0 (0): 1–20.

Sun, Zheng, Jiaqi Liu, Zewang Zhang, Jingwen Chen, Zhao Huo, Ching Hua Lee, and Xiao Zhang. 2016. “Composing Music with Grammar Argumented Neural Networks and Note-Level Encoding,” November.

Theis, Lucas, and Matthias Bethge. 2015. “Generative Image Modeling Using Spatial LSTMs,” June.

Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. 2016. “Instance Normalization: The Missing Ingredient for Fast Stylization,” July.

———. 2017. “Improved Texture Networks: Maximizing Quality and Diversity in Feed-Forward Stylization and Texture Synthesis,” January.

Walder, Christian. 2016a. “Modelling Symbolic Music: Beyond the Piano Roll,” June.

———. 2016b. “Symbolic Music Data Version 1.0,” June.

Wu, Qi, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, and Anthony Dick. 2015. “What Value High Level Concepts in Vision to Language Problems?” June.

Wyse, L. 2017. “Audio Spectrogram Representations for Processing with Convolutional Neural Networks.” In Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE]).

Yu, D., and L. Deng. 2011. “Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP].” IEEE Signal Processing Magazine 28 (1): 145–54.

Yu, Haizi, and Lav R. Varshney. 2017. “Towards Deep Interpretability (MUS-ROVER II): Learning Hierarchical Representations of Tonal Music.” In Proceedings of International Conference on Learning Representations (ICLR) 2017.

Zhu, Jun-Yan, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. 2016. “Generative Visual Manipulation on the Natural Image Manifold.” In Proceedings of European Conference on Computer Vision.

Zukowski, Zack, and Cj Carr. 2017. “Generating Black Metal and Math Rock: Beyond Bach, Beethoven, and Beatles.” In 31st Conference on Neural Information Processing Systems (NIPS 2017).