Python’s audio analysis toolkit is impressive; see machine listening. However, its synthesis is mediocre. Which is not to say terrible. Although I might, later, say that, when I am alone. Nonetheless, in a pinch, you can synthesize audio and even, if you struggle, produce real-time audio.
DIY, bareback non-realtime audio
Tedious, but doable. The best example I know is paulstretch, which builds a phase vocoder from raw FFTs, reasonably compactly.
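To give a flavour of how compact the raw-FFT approach can be, here is a minimal numpy sketch of the paulstretch idea: window the input, keep each frame's FFT magnitudes, randomize the phases, and overlap-add the frames at a slower rate than they were read. This is not a port of paulstretch; the function name, the Hann window and the defaults are my own assumptions, and the real thing uses a different window and takes more care with levels.

```python
import numpy as np


def paulstretch_sketch(x, stretch=8.0, window_size=4096, seed=0):
    """Time-stretch a mono float array `x` by roughly `stretch`.

    Hypothetical helper, loosely after paulstretch: each windowed frame
    keeps its FFT magnitudes but gets uniformly random phases, and frames
    are overlap-added at half the window size.
    """
    rng = np.random.default_rng(seed)
    window_size = int(window_size) // 2 * 2  # force an even frame length
    out_hop = window_size // 2
    in_hop = out_hop / stretch  # read the input more slowly than we write
    window = np.hanning(window_size)

    # Zero-pad so the last frames are full length.
    x = np.concatenate([np.asarray(x, dtype=float), np.zeros(window_size)])
    n_frames = int((len(x) - window_size) / in_hop) + 1
    out = np.zeros(n_frames * out_hop + window_size)

    for i in range(n_frames):
        start = int(i * in_hop)
        frame = x[start:start + window_size] * window
        mags = np.abs(np.fft.rfft(frame))
        phases = rng.uniform(0.0, 2 * np.pi, len(mags))
        frame = np.fft.irfft(mags * np.exp(1j * phases), n=window_size)
        out[i * out_hop:i * out_hop + window_size] += frame * window
    return out
```

On one second of input, `paulstretch_sketch(x, stretch=8.0)` returns a bit over eight seconds of smeared drone; you will probably want to normalize it before writing it out.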
Not quite trivial.
Here is a comparison of options. Summary:
If you want to read MP3, audioread is simple and easy, but it breaks in opaque ways when you use it concurrently and has crappy error handling (everything is …). This is the …
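For reference, audioread hands you an iterator of raw byte buffers (16-bit little-endian signed integer PCM, interleaved by channel), plus `channels` and `samplerate` attributes on the opened file. A minimal sketch of turning that into a numpy array; the helper names are hypothetical, and audioread is imported lazily so the conversion part works without it.

```python
import numpy as np


def pcm16_buffers_to_float(buffers, channels):
    """Join raw 16-bit little-endian PCM buffers into a float32 array.

    Returns shape (n_frames, channels), scaled to [-1, 1).
    """
    raw = b"".join(buffers)
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    return samples.reshape(-1, channels)


def read_with_audioread(path):
    """Decode anything audioread has a backend for; returns (samples, sr)."""
    import audioread  # lazy import: only needed for actual decoding

    with audioread.audio_open(path) as f:
        return pcm16_buffers_to_float(f, f.channels), f.samplerate
```

This does nothing about the concurrency problems; it just gets you from audioread's buffers to an array.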
If you want to load everything, really fast, and don’t mind a tricky time trapping the cause of errors, you can invoke ffmpeg from python; it is very fast and does various FX processing for free. This is what I now do.
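A minimal sketch of the ffmpeg route, assuming an `ffmpeg` binary on your PATH: decode to raw little-endian float32 on stdout (`-f f32le`), resample and remix with `-ar` and `-ac`, optionally pass an audio filter graph with `-af` for the free FX, and read the bytes with numpy. The helper names are hypothetical; capturing stderr and raising it is my attempt at making failures a little less opaque.

```python
import subprocess

import numpy as np


def ffmpeg_decode_cmd(path, sr=44100, channels=1, af=None, ffmpeg="ffmpeg"):
    """Build an ffmpeg command that decodes `path` to raw float32 on stdout."""
    cmd = [ffmpeg, "-v", "error", "-i", str(path)]  # only report real errors
    if af:
        cmd += ["-af", af]  # ffmpeg audio filter graph, e.g. "atempo=1.5"
    cmd += [
        "-f", "f32le",         # raw little-endian 32-bit float samples
        "-ac", str(channels),  # remix to this many channels
        "-ar", str(sr),        # resample to this rate
        "-",                   # write to stdout
    ]
    return cmd


def load_with_ffmpeg(path, sr=44100, channels=1, af=None):
    """Decode `path` via ffmpeg; returns a float32 array (n_frames, channels)."""
    proc = subprocess.run(ffmpeg_decode_cmd(path, sr, channels, af),
                          capture_output=True)
    if proc.returncode != 0:
        # ffmpeg's stderr is usually the only clue about what went wrong.
        msg = proc.stderr.decode(errors="replace").strip()
        raise RuntimeError(f"ffmpeg failed on {path!r}: {msg}")
    return np.frombuffer(proc.stdout, dtype="<f4").reshape(-1, channels)
```

Since ffmpeg does the resampling and downmixing, everything arrives at one rate and channel count, which keeps the numpy side simple.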
pippi is idiosyncratic but seems to do synthesis quick-n-easy with reasonable optimisation using cython. The documentation and packaging are a mess, though. I think it might even do real time? Heavily developed.
```bash
brew install libsndfile
pip install pippi
```
Amen has strong machine-listening tools, integrating … but its effects are weak sauce: just cutting and pasting audio around in time with nice crossfades. It does cute re-edits, but nothing else.
The idea of keeping the analysis metadata attached to the samples is nice. I wonder if it is useful in practice.
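Amen’s bread and butter, splicing segments with crossfades, is easy enough to approximate by hand. A minimal numpy sketch of an equal-power crossfade splice; this is not Amen’s API, and the function name and the equal-power curve are my own choices.

```python
import numpy as np


def crossfade_splice(a, b, fade_len):
    """Join mono arrays `a` and `b`, overlapping the last `fade_len` samples
    of `a` with the first `fade_len` samples of `b` (hypothetical helper)."""
    if fade_len <= 0:
        return np.concatenate([a, b])
    if fade_len > min(len(a), len(b)):
        raise ValueError("fade_len is longer than one of the segments")
    t = np.linspace(0.0, np.pi / 2, fade_len)
    fade_out = np.cos(t)  # 1 -> 0
    fade_in = np.sin(t)   # 0 -> 1; fade_out**2 + fade_in**2 == 1 everywhere
    overlap = a[-fade_len:] * fade_out + b[:fade_len] * fade_in
    return np.concatenate([a[:-fade_len], overlap, b[fade_len:]])
```

Equal-power gains keep perceived loudness roughly constant when the two segments are uncorrelated; for near-identical material the two gains sum to about √2 mid-fade (a 3 dB bump), so a linear fade is the better choice there.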
```bash
sudo apt-get install libsndfile1 libav-tools
pip install amen
```
pydub has lots of audio DSP, effects and editing procedures, but only basic audio analysis. Also, weirdly, it aims to be pure python (i.e. no numpy), which makes some things embarrassingly slow and means there is a lot of re-implementing numpy. So it runs everywhere, but not great anywhere.
Almost like offline audio, but your work is never finished. This is difficult in python, but not impossible.
pyo (github) is a python audio processing framework by Olivier Bélanger. Supports python 2.7 and 3.5+. It wants to run a wxPython gui, which is its own kind of inconvenience in turn, as it conscripts you into a toolkit war. Nonetheless it can do some neat stuff, and wxPython GUI is pretty good, so if you don’t mind a mildly opinionated library this is a nice one to work with. It’s a one-man shop, indicating very impressive productivity on the part of its creator. This guy has more or less reimplemented the supercollider scsynth infrastructure.
It claims to be
a Python module written in C to help DSP script creation. Pyo contains classes for a wide variety of audio signal processing. With pyo, the user will be able to include signal processing chains directly in Python scripts or projects, and to manipulate them in real time through the interpreter. Tools in the pyo module offer primitives, like mathematical operations on audio signals, basic signal processing (filters, delays, synthesis generators, etc.), but also complex algorithms to create sound granulation and other creative audio manipulations. pyo supports the OSC protocol (Open Sound Control) to ease communications between softwares, and the MIDI protocol for generating sound events and controlling process parameters. pyo allows the creation of sophisticated signal processing chains with all the benefits of a mature, and widely used, general programming language.
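Here is an example:

```python
>>> from pyo import *
>>> s = Server().boot()  # boot the audio server
>>> s.start()
>>> wav = SquareTable()  # waveform table for the oscillator
>>> env = CosTable([(0,0), (100,1), (500,.3), (8191,0)])  # amplitude envelope breakpoints
>>> met = Metro(.125, 12).play()  # trigger every 0.125 s
>>> amp = TrigEnv(met, table=env, dur=1, mul=.1)  # play the envelope on each trigger
>>> pit = TrigXnoiseMidi(met, dist='loopseg', x1=20, scale=1, mrange=(48,84))  # random pitches, MIDI 48-84, output in Hz
>>> out = Osc(table=wav, freq=pit, mul=amp).out()
```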
See also cecilia, a gui for pyo.
You don’t have to use the default distribution, which comes as weird OS-specific installer packages that want to invade your system python installation; building from source is an option:
Here is the macOS version:
```bash
brew install liblo libsndfile portaudio portmidi --universal
git clone https://github.com/belangeo/pyo.git
cd pyo
python setup.py install --use-coreaudio --use-double
```
Note that you might still have to deal with some wxPython weirdness on macOS.
csound supports python, specifically embedding csound within python and python within csound.
FoxDot runs actual supercollider scripts from python. (As opposed to pyo, which implements a synthesis server that looks like supercollider but is not.) It comes with an IDE, which is a waste of time IMO; I do not need to add another IDE to the pile I already have. There are other worthwhile tricks, including a scheduler, which is painful to do yourself.

It does not do everything. Rather than claiming to be a universal solution to audio, it’s a righteous hack that does some startling things very well and some other things not at all. A good start, creatively speaking.
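To see why a scheduler is worth having, here is a minimal sketch of the core of one: a heap of events keyed by beat, with each due time computed from a fixed start time so that late wake-ups do not accumulate into drift. This is not FoxDot’s scheduler or API; the class and method names are hypothetical, and the clock and sleep functions are injectable so it can be tested without actually waiting.

```python
import heapq
import itertools
import time
from typing import Callable


class BeatScheduler:
    """Minimal tempo-based event scheduler (hypothetical, not FoxDot's)."""

    def __init__(self, bpm: float = 120.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.seconds_per_beat = 60.0 / bpm
        self.clock = clock
        self.sleep = sleep
        self._queue = []  # heap of (beat, tie_breaker, callback)
        self._counter = itertools.count()  # keeps same-beat events in order

    def at(self, beat: float, callback: Callable[[float], None]) -> None:
        """Schedule `callback(beat)`; callbacks may schedule further events."""
        heapq.heappush(self._queue, (beat, next(self._counter), callback))

    def run(self) -> None:
        start = self.clock()
        while self._queue:
            beat, _, callback = heapq.heappop(self._queue)
            # Due time is relative to the fixed start, not the previous event,
            # so a late wake-up shortens the next wait instead of adding drift.
            delay = start + beat * self.seconds_per_beat - self.clock()
            if delay > 0:
                self.sleep(delay)
            callback(beat)
```

A callback can call `at` again, say one beat later, to make a repeating pattern. The genuinely painful parts, such as changing tempo while running, are not attempted here.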
audiolazy, by Danilo de Jesus da Silva Bellini, looks great for technical audio analysis and synthesis, although a bit clunky for, you know, musical synths. Intermittently updated.
Prioritizing code expressiveness, clarity and simplicity, without precluding the lazy evaluation, and aiming to be used together with Numpy, Scipy and Matplotlib as well as default Python structures like lists and generators, AudioLazy is a package written in pure Python proposing digital audio signal processing (DSP), featuring:
A Stream class for finite and endless signals representation with elementwise operators (auto-broadcast with non-iterables) in a common Python iterable container accepting heterogeneous data;
Strongly sample-based representation (Stream class) with easy conversion to block representation using the Stream.blocks(size, hop) method;
Sample-based interactive processing with ControlStream;
Streamix mixer for iterables given their starting time deltas;
Multi-thread audio I/O integration with PyAudio;
Linear filtering with Z-transform filters directly as equations (e.g. filt = 1 / (1 - .3 * z ** -1)), including linear time variant filters (i.e., the a in a * z ** k can be a Stream instance), cascade filters (behaves as a list of filters), resonators, etc. Each LinearFilter instance is compiled just in time when called;
Zeros and poles plots and frequency response plotting integration with MatPlotLib;
Linear Predictive Coding (LPC) directly to ZFilter instances, from which you can find PARCOR coeffs and LSFs;
Both sample-based (e.g., zero-cross rate, envelope, moving average, clipping, unwrapping) and block-based (e.g., window functions, DFT, autocorrelation, lag matrix) analysis and processing tools;
A simple synthesizer (Table lookup, Karplus-Strong) with processing tools (Linear ADSR envelope, fade in/out, fixed duration line stream) and basic wave data generation (sinusoid, white noise, impulse);
Biological auditory periphery modeling (ERB and gammatone filter models);
Multiple implementation organization as StrategyDict instances: callable dictionaries that allows the same name to have several different implementations (e.g. erb, gammatone, lowpass, resonator, lpc, window);
Converters among MIDI pitch numbers, strings like “F#4” and frequencies;
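To unpack what a filter written "directly as an equation" means: AudioLazy’s example filt = 1 / (1 - .3 * z ** -1) is the one-pole recursion y[n] = x[n] + 0.3 y[n−1]. A plain-python sketch of the same filter (not AudioLazy code; the function name is hypothetical). As far as I know, scipy.signal.lfilter([1.0], [1.0, -0.3], x) computes the same thing, much faster.

```python
import numpy as np


def one_pole(x, a=0.3):
    """y[n] = x[n] + a * y[n-1], i.e. H(z) = 1 / (1 - a z^-1)."""
    y = np.zeros(len(x))
    prev = 0.0  # y[n-1], zero initial state
    for n, xn in enumerate(x):
        prev = xn + a * prev
        y[n] = prev
    return y


# The impulse response decays geometrically as a**n.
h = one_pole(np.array([1.0, 0.0, 0.0, 0.0]))
```

The appeal of AudioLazy is that you write the transfer function and it builds this loop for you; the loop is also why pure-python filtering is slow on long signals.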
PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library. With PyAudio, you can easily use Python to play and record audio on a variety of platforms. PyAudio is inspired by:
pyPortAudio/fastaudio: Python bindings for PortAudio v18 API.
tkSnack: cross-platform sound toolkit for Tcl/Tk and Python.
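In real time the work is never finished because PortAudio keeps asking for the next block, and each block must continue exactly where the last one stopped. A minimal sketch using PyAudio’s callback mode (`stream_callback` receives `in_data, frame_count, time_info, status` and returns `(bytes, pyaudio.paContinue)`); the `SineBlocks` class and `play` function are hypothetical, and pyaudio is imported lazily so the generator can be used without it.

```python
import time

import numpy as np


class SineBlocks:
    """Phase-continuous sine generator that hands out blocks on demand."""

    def __init__(self, freq=440.0, sr=44100, amp=0.2):
        self.step = 2 * np.pi * freq / sr  # phase increment per sample
        self.amp = amp
        self.phase = 0.0

    def next_block(self, frame_count):
        n = np.arange(frame_count)
        block = self.amp * np.sin(self.phase + self.step * n)
        # Carry the phase over so consecutive blocks join without clicks.
        self.phase = (self.phase + self.step * frame_count) % (2 * np.pi)
        return block.astype(np.float32)


def play(seconds=2.0, freq=440.0, sr=44100):
    import pyaudio  # lazy import: only needed for actual playback

    gen = SineBlocks(freq, sr)

    def callback(in_data, frame_count, time_info, status):
        # Called from PortAudio's thread; must return promptly.
        return gen.next_block(frame_count).tobytes(), pyaudio.paContinue

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paFloat32, channels=1, rate=sr,
                     output=True, stream_callback=callback)
    try:
        time.sleep(seconds)  # the stream plays in the background meanwhile
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
```

Keeping the phase in the generator rather than recomputing from a sample counter is a small choice, but it means you can change `step` (the frequency) between blocks without a discontinuity.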
Real time MIDI
Mido, iirc. TBC.
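Whatever library does the sending, real-time MIDI comes down to short channel messages: a note-on is status byte 0x90 plus the channel (0 to 15), then note number and velocity (each 0 to 127); a note-off uses 0x80. A stdlib-only sketch with hypothetical helper names; if I recall Mido’s API correctly, mido.Message('note_on', note=60, velocity=100).bytes() should give the same three values, and mido.open_output() gives you a port to send messages through.

```python
def _check(name, value, top):
    """Reject values that do not fit in the MIDI field."""
    if not 0 <= value <= top:
        raise ValueError(f"{name} must be in 0..{top}, got {value}")


def note_on(note, velocity=64, channel=0):
    """Raw MIDI note-on message: status 0x90 | channel, note, velocity."""
    _check("note", note, 127)
    _check("velocity", velocity, 127)
    _check("channel", channel, 15)
    return bytes([0x90 | channel, note, velocity])


def note_off(note, velocity=0, channel=0):
    """Raw MIDI note-off message: status 0x80 | channel, note, velocity."""
    _check("note", note, 127)
    _check("velocity", velocity, 127)
    _check("channel", channel, 15)
    return bytes([0x80 | channel, note, velocity])
```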