Audio synthesis in python

Sometimes it is the right time to use the wrong tool for the job

March 25, 2018 — November 6, 2019

computers are awful
generative art

Python’s audio analysis toolkit is impressive; see machine listening. Its synthesis tooling, however, is mediocre. Which is not to say terrible. Although I might, later, say that, when I am alone. Nonetheless, in a pinch, one can synthesize audio in Python and even, with a little struggle, produce real-time audio.

1 DIY, bareback non-realtime audio

Tedious, but doable. The best example I know is paulstretch, which implements a phase vocoder using raw FFTs, reasonably compactly.
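To give a flavour of the DIY approach, here is a minimal sketch of a paulstretch-style extreme time stretch in plain numpy. It is not paulstretch's actual code. The assumptions are mine: a sine analysis/synthesis window (applied twice it becomes a Hann window, which sums to one at 50% overlap), a fixed 50% output overlap, and fully randomised phases, which is the core trick paulstretch is known for. The function name and parameters are hypothetical.

```python
import numpy as np


def paulstretch(x, stretch=8.0, window_size=4096, seed=0):
    """Extreme time-stretch of a mono signal by randomising FFT phases.

    A simplified sketch in the spirit of paulstretch, not its actual code.
    `window_size` should be even; the output is roughly `stretch` times longer.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=np.float64)
    n = window_size
    hop_out = n // 2                 # 50% overlap in the output
    hop_in = hop_out / stretch       # walk through the input more slowly
    # Sine window: applied twice it is a Hann window, which sums to 1 at 50% overlap.
    window = np.sin(np.pi * np.arange(n) / n)
    padded = np.concatenate([x, np.zeros(n)])  # every analysis frame is full length
    n_frames = int(len(x) / hop_in) + 1
    out = np.zeros(n_frames * hop_out + n)
    for k in range(n_frames):
        start = int(k * hop_in)
        frame = padded[start:start + n] * window
        magnitude = np.abs(np.fft.rfft(frame))
        # Keep the magnitude spectrum; replace the phase with uniform noise.
        phase = rng.uniform(0.0, 2.0 * np.pi, size=magnitude.shape)
        grain = np.fft.irfft(magnitude * np.exp(1j * phase), n=n) * window
        out[k * hop_out:k * hop_out + n] += grain  # overlap-add
    return out
```

The output is unnormalised; scale it to peak below 1.0 before writing it to a file. Larger windows give smoother, more smeared results, which is usually what you want from this effect.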

1.1 Resampling

Not quite trivial.

Here is a comparison of options. Summary:

  • resampy is fast and works on python3, and is OK
  • NNresample optimises builtin scipy resampling for quality in audio
  • scikit-samplerate is hifi but infrequently maintained and YMMV with the external c-code dependencies.
  • if you are loading a file often the file loader can do sample rate conversion
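For a dependency-light baseline, scipy's own polyphase resampler, `scipy.signal.resample_poly`, handles any rational ratio. As far as I understand, this is the routine NNresample builds on with a better-tuned filter. The `resample` wrapper below is a hypothetical helper of mine that just reduces the two sample rates to an integer up/down pair.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample(x, sr_in, sr_out):
    """Resample by the rational factor sr_out / sr_in with a polyphase FIR filter."""
    g = gcd(sr_in, sr_out)
    up, down = sr_out // g, sr_in // g  # 44100 -> 48000 becomes up=160, down=147
    return resample_poly(np.asarray(x, dtype=np.float64), up, down)
```

`resample_poly` uses a default Kaiser-windowed low-pass filter; if quality matters, this default is what NNresample and the other options above try to improve on.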

1.2 Reading files

If you want to read MP3, audioread is simple and easy, but it breaks in opaque ways when used concurrently and has crappy error handling (everything is a NoBackendError). This is the system librosa uses.

If you don’t care about MP3 then SoundFile does the job, but it is hard to compile.

If you want to load everything, and do it really fast, at the cost of a trickier time trapping the cause of errors, you can invoke ffmpeg from Python, which is fast and gives you various FX processing for free. This is what I now do.
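A minimal sketch of the invoke-ffmpeg approach, assuming an `ffmpeg` binary on the `PATH`: decode to raw little-endian float32 PCM on stdout, letting ffmpeg handle downmixing and sample-rate conversion on the way in. The helper names are hypothetical. Capturing stderr is the main concession to error trapping, since that is the only place ffmpeg says what actually went wrong.

```python
import subprocess

import numpy as np


def ffmpeg_cmd(path, sr=44100, channels=1):
    """Build an ffmpeg command that decodes `path` to raw float32 PCM on stdout."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-i", str(path),
        "-f", "f32le",            # raw little-endian float32, no container
        "-acodec", "pcm_f32le",
        "-ac", str(channels),     # ffmpeg up/downmixes to this many channels
        "-ar", str(sr),           # ffmpeg resamples to this rate
        "-",                      # write to stdout
    ]


def load_audio(path, sr=44100, channels=1):
    """Return an array of shape (n_frames, channels); raise with ffmpeg's own message on failure."""
    proc = subprocess.run(ffmpeg_cmd(path, sr, channels), capture_output=True)
    if proc.returncode != 0:
        msg = proc.stderr.decode(errors="replace").strip()
        raise RuntimeError(f"ffmpeg failed on {path}: {msg}")
    audio = np.frombuffer(proc.stdout, dtype="<f4")
    return audio.reshape(-1, channels).copy()  # copy: frombuffer arrays are read-only
```

If ffmpeg is not installed at all, `subprocess.run` raises `FileNotFoundError` before any of this error handling gets a look in.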

2 Offline editing/FX

Notable non-abandoned projects include pydub, amen and pippi.

2.1 Pippi

pippi is idiosyncratic but seems to do synthesis quick-n-easy with reasonable optimisation using cython. The documentation and packaging are a mess, though. I think it might even do real time? Heavily developed.

brew install libsndfile
pip install pippi

2.2 Amen

Amen has strong machine-listening tools, integrating librosa, but its effects are weak sauce: just cutting and pasting audio around in time with nice crossfades. It does cute re-edits, but nothing else. The idea of keeping the analysis metadata attached to the samples is nice. I wonder if it is useful in practice.

sudo apt-get install libsndfile1 libav-tools
pip install amen

2.3 pydub

pydub has a lot of audio DSP, effects and editing procedures, but only basic audio analysis. Also, weirdly, it aims to be pure Python (i.e. no numpy), which makes some things embarrassingly slow and means a lot of re-implementing numpy. So it runs everywhere, but not great anywhere.

2.4 PyWorld Vocoder

More specialised: pyworld is a Python wrapper for WORLD, a speech-specific analysis-resynthesis method by Masanori Morise. This will really only work on material very much like solo human voice.

3 Real time

Almost like offline audio, but your work is never finished. This is difficult in python, but not impossible.

3.1 SignalFlow


SignalFlow is an audio DSP framework whose goal is to make it quick and intuitive to explore complex sonic ideas. It has a simple and consistent Python API, allowing for rapid prototyping in Jupyter, PyCharm, or on the command-line. It comes with over 100 built-in node classes for creative exploration.

Its core is implemented in C++11, with cross-platform hardware acceleration.

3.2 Pyo

pyo (github) is a Python audio processing framework by Olivier Bélanger. It supports Python 2.7 and 3.5+. It wants to run a wxPython GUI, which is its own kind of inconvenience, as it conscripts you into a toolkit war. Nonetheless it can do some neat stuff, and the wxPython GUI is pretty good, so if you don’t mind a mildly opinionated library this is a nice one to work with. It’s a one-man shop, which indicates impressive productivity on the part of its creator: he has more or less reimplemented the SuperCollider scsynth infrastructure.

It claims to be

a Python module written in C to help DSP script creation. Pyo contains classes for a wide variety of audio signal processing. With pyo, the user will be able to include signal processing chains directly in Python scripts or projects, and to manipulate them in real time through the interpreter. Tools in the pyo module offer primitives, like mathematical operations on audio signals, basic signal processing (filters, delays, synthesis generators, etc.), but also complex algorithms to create sound granulation and other creative audio manipulations. pyo supports the OSC protocol (Open Sound Control) to ease communications between softwares, and the MIDI protocol for generating sound events and controlling process parameters. pyo allows the creation of sophisticated signal processing chains with all the benefits of a mature, and widely used, general programming language.


>>> from pyo import *
>>> s = Server().boot()
>>> s.start()
>>> wav = SquareTable()
>>> env = CosTable([(0,0), (100,1), (500,.3), (8191,0)])
>>> met = Metro(.125, 12).play()
>>> amp = TrigEnv(met, table=env, dur=1, mul=.1)
>>> pit = TrigXnoiseMidi(met, dist='loopseg', x1=20, scale=1, mrange=(48,84))
>>> out = Osc(table=wav, freq=pit, mul=amp).out()

See also cecilia, a gui for pyo.

If you would rather not use the default distribution, i.e. weird OS-specific installer packages which want to invade your system Python installation, you can build from source instead.

Here is the macOS version:

brew install liblo libsndfile portaudio portmidi --universal
git clone
cd pyo
python setup.py install --use-coreaudio --use-double

Note that you might still have to deal with some wxPython weirdness on macOS.

3.3 csound

Csound supports Python in both directions: you can embed Csound within Python, and embed Python within Csound.

3.4 Foxdot

FoxDot runs actual SuperCollider scripts from Python. (As opposed to pyo, which implements a synthesis server that looks like SuperCollider but is not.) It comes with an IDE, which is a waste of time IMO; I do not need to add another IDE to the pile I already have. There are other worthwhile tricks, including a scheduler, which is painful to build yourself. It does not do everything. Rather than claiming to be a universal solution to audio, it’s a righteous hack that does some startling things well and some other things not at all. A good start, creatively speaking.

3.5 GStreamer

GStreamer is a generic multimedia pipeline library that seems to pop up in many open source projects. It happens to have extensive Python support. See Brett Virren’s tutorial.

3.6 Audiolazy

audiolazy, by Danilo de Jesus da Silva Bellini, looks great for technical audio analysis and synthesis, although a bit clunky for, you know, musical synths. Intermittently updated.

Prioritizing code expressiveness, clarity and simplicity, without precluding the lazy evaluation, and aiming to be used together with Numpy, Scipy and Matplotlib as well as default Python structures like lists and generators, AudioLazy is a package written in pure Python proposing digital audio signal processing (DSP), featuring:

  • A Stream class for finite and endless signals representation with elementwise operators (auto-broadcast with non-iterables) in a common Python iterable container accepting heterogeneous data;
  • Strongly sample-based representation (Stream class) with easy conversion to block representation using the Stream.blocks(size, hop) method;
  • Sample-based interactive processing with ControlStream;
  • Streamix mixer for iterables given their starting time deltas;
  • Multi-thread audio I/O integration with PyAudio;
  • Linear filtering with Z-transform filters directly as equations (e.g. filt = 1 / (1 - .3 * z ** -1)), including linear time variant filters (i.e., the a in a * z ** k can be a Stream instance), cascade filters (behaves as a list of filters), resonators, etc. Each LinearFilter instance is compiled just in time when called;
  • Zeros and poles plots and frequency response plotting integration with MatPlotLib;
  • Linear Predictive Coding (LPC) directly to ZFilter instances, from which you can find PARCOR coeffs and LSFs;
  • Both sample-based (e.g., zero-cross rate, envelope, moving average, clipping, unwrapping) and block-based (e.g., window functions, DFT, autocorrelation, lag matrix) analysis and processing tools;
  • A simple synthesizer (Table lookup, Karplus-Strong) with processing tools (Linear ADSR envelope, fade in/out, fixed duration line stream) and basic wave data generation (sinusoid, white noise, impulse);
  • Biological auditory periphery modeling (ERB and gammatone filter models);
  • Multiple implementation organization as StrategyDict instances: callable dictionaries that allows the same name to have several different implementations (e.g. erb, gammatone, lowpass, resonator, lpc, window);
  • Converters among MIDI pitch numbers, strings like “F#4” and frequencies;
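To make the Z-transform notation above concrete: `filt = 1 / (1 - .3 * z ** -1)` describes the one-pole recursion y[n] = x[n] + 0.3 y[n-1]. Below is a hand-written, lazy pure-Python equivalent in the same spirit as AudioLazy's streams. It is not AudioLazy's implementation, and the function names are hypothetical. Because it is a generator, it happily consumes an endless input.

```python
from itertools import chain, islice, repeat


def one_pole(samples, a=0.3):
    """Lazily apply H(z) = 1 / (1 - a z^-1), i.e. y[n] = x[n] + a * y[n-1]."""
    y_prev = 0.0
    for x_n in samples:
        y_prev = x_n + a * y_prev
        yield y_prev


def impulse_response(a, n):
    """First n output samples for a unit impulse followed by endless zeros."""
    endless_impulse = chain([1.0], repeat(0.0))  # infinite input: nothing is evaluated eagerly
    return list(islice(one_pole(endless_impulse, a), n))
```

The impulse response is a^n, i.e. 1, 0.3, 0.09, 0.027, ..., which decays because |a| < 1 puts the single pole inside the unit circle.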

3.7 GNURadio

Surprisingly, GNU Radio supports extensive optimized DSP from Python, via a high-performance compiled real-time dataflow graph.

3.8 Outsourcing

  • See LiveOSC, under scripting Live, for outsourcing your sound to Ableton Live.
  • You can control SuperCollider from Python, for example by sending OSC messages to its synthesis server.
  • Render audio using midi2audio, a minimalist wrapper for FluidSynth, which renders MIDI using SoundFonts.
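Controlling SuperCollider needs nothing more than UDP and the OSC wire format. Below is a standard-library-only sketch of OSC 1.0 message encoding. Addresses and strings are null-terminated and padded to 4 bytes, followed by a type-tag string such as ",sif", then big-endian int32/float32 arguments. It assumes scsynth is listening on its usual default UDP port, 57110. The helper names are hypothetical; in practice a library such as python-osc does the same job.

```python
import socket
import struct


def _osc_pad(b):
    """Null-terminate and pad to a multiple of 4 bytes, as OSC requires."""
    b = b + b"\x00"
    return b + b"\x00" * (-len(b) % 4)


def osc_message(address, *args):
    """Encode an OSC message with int32 ('i'), float32 ('f') and string ('s') arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)  # big-endian int32
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)  # big-endian float32
        elif isinstance(arg, str):
            tags += "s"
            payload += _osc_pad(arg.encode("ascii"))
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return _osc_pad(address.encode("ascii")) + _osc_pad(tags.encode("ascii")) + payload


def send_osc(message, host="127.0.0.1", port=57110):
    """Fire-and-forget UDP send to a (by assumption) local scsynth."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))
```

For example, `send_osc(osc_message("/s_new", "default", -1, 0, 0, "freq", 440.0))` asks the server for a new synth from the `default` SynthDef, with node ID -1 (server chooses), add action 0 (add to head) and target node 0 (the root group). This assumes that SynthDef is loaded, as it is when the server is booted from sclang.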

3.9 PyAudio

PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library. With PyAudio, you can easily use Python to play and record audio on a variety of platforms. PyAudio is inspired by:

  • pyPortAudio/fastaudio: Python bindings for PortAudio v18 API.
  • tkSnack: cross-platform sound toolkit for Tcl/Tk and Python.
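A minimal sketch of real-time output with PyAudio's callback mode. PortAudio repeatedly asks for `frame_count` frames, and the callback returns them as bytes plus `pyaudio.paContinue`. The one thing to get right is carrying the oscillator phase from block to block so that there are no clicks at block boundaries. `sine_block` and `play_sine` are hypothetical helpers. The pure-numpy rendering is kept separate from the PortAudio plumbing, so it can be tested without a sound card.

```python
import time

import numpy as np


def sine_block(phase, frame_count, freq=440.0, sr=44100, amp=0.2):
    """Render one float32 block of a sine; return (block, phase for the next block)."""
    step = 2.0 * np.pi * freq / sr
    block = (amp * np.sin(phase + step * np.arange(frame_count))).astype(np.float32)
    return block, (phase + step * frame_count) % (2.0 * np.pi)


def play_sine(seconds=2.0, freq=440.0, sr=44100):
    """Play a sine through the default output device using a PyAudio callback stream."""
    import pyaudio  # imported here so sine_block() works without PortAudio installed

    phase = 0.0

    def callback(in_data, frame_count, time_info, status):
        nonlocal phase
        block, phase = sine_block(phase, frame_count, freq, sr)
        return block.tobytes(), pyaudio.paContinue

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paFloat32, channels=1, rate=sr,
                     output=True, stream_callback=callback)
    try:
        time.sleep(seconds)  # audio is produced on PortAudio's thread meanwhile
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
```

The callback runs on an audio thread with a hard deadline, so anything slow (file I/O, big allocations, a GC pause) there turns into dropouts. That is most of why real-time audio in Python is "difficult but not impossible".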

4 Real time MIDI

Mido, iirc. TBC.
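Pending a proper write-up, here is what goes over the wire. Mido wraps MIDI messages as Python objects and delegates port I/O to a backend (python-rtmidi by default). The underlying channel-voice messages are tiny. A note-on is the status byte 0x90 OR'd with the channel, followed by two 7-bit data bytes; a note-off uses 0x80. The helpers below are hypothetical, standard-library-only encoders of those bytes, plus the usual equal-temperament conversion from MIDI note number to frequency.

```python
def _check(channel, note, velocity):
    if not (0 <= channel < 16 and 0 <= note < 128 and 0 <= velocity < 128):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")


def note_on(note, velocity=64, channel=0):
    """Raw MIDI note-on: status byte 0x90 | channel, then note and velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(note, velocity=0, channel=0):
    """Raw MIDI note-off: status byte 0x80 | channel, then note and release velocity."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])


def midi_to_hz(note, a4=440.0):
    """Equal-temperament frequency of a MIDI note number, with A4 = note 69."""
    return a4 * 2.0 ** ((note - 69) / 12.0)
```

With Mido installed, the equivalent should be something like `mido.open_output().send(mido.Message('note_on', note=60, velocity=64))`, which spares you the byte-twiddling and the port handling.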

5 Incoming

6 References

Glover, Lazzarini, and Timoney. 2009. “Simpl: A Python Library for Sinusoidal Modelling.” In DAFx 09 Proceedings of the 12th International Conference on Digital Audio Effects, Politecnico di Milano, Como Campus, Sept. 1-4, Como, Italy.
Morise. 2016. “D4C, a Band-Aperiodicity Estimator for High-Quality Speech Synthesis.” Speech Commun.
Morise, Yokomori, and Ozawa. 2016. “WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications.” IEICE Transactions on Information and Systems.