I have a lot of feelings and ideas about this, but no time to write them down. For now, here are some links and ideas by other people.
Sander Dieleman on waveform-domain neural synthesis. Matt Vitelli on music generation from MP3s (source). Alex Graves on RNN predictive synthesis. Parag Mital on RNN style transfer.
I’m not massively into spectral-domain synthesis because I think the stationarity assumption is a bit of a stretch (heh). Very much into raw audio me.
Generative art using diffusion models, much like the diffusion image synthesis page.
- AudioGen: Textually Guided Audio Generation
- diffusion_models/diffusion_03_waveform.ipynb at main · acids-ircam/diffusion_models
- huggingface/diffusers: 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
- archinetai/audio-diffusion-pytorch: Audio generation using diffusion models, in PyTorch.
(Chen et al. 2020; Goel et al. 2022; Hernandez-Olivan, Hernandez-Olivan, and Beltran 2022; Kreuk, Taigman, et al. 2022; Kreuk, Synnaeve, et al. 2022; Lee and Han 2021; Pascual et al. 2022; von Platen et al. 2022)
This is a really fun idea — do audio processing as normal, but using an NN framework so that the operations are differentiable.
Project site. Github. Twitter intro. Paper. Online supplement. Timbre transfer example. Tutorials.
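To make the differentiable-audio-processing idea concrete, here is a minimal sketch of my own, not taken from any of the linked projects. It is a tiny additive synthesiser whose harmonic amplitudes are recovered from a target sound by gradient descent on the mean squared error. The names `harmonic_basis`, `harmonic_synth` and `fit_amplitudes`, the sample rate and the step size are all made up for illustration. To keep it dependency-free I have written the gradient out by hand; the point of doing this inside an NN framework is that autodiff would derive that line for you, and for much hairier signal chains.

```python
import math

SAMPLE_RATE = 8000  # Hz; an assumed value for illustration
N_SAMPLES = 256     # short analysis window


def harmonic_basis(f0, n_harmonics, n_samples=N_SAMPLES, sr=SAMPLE_RATE):
    """Return one sine wave per harmonic of f0, each as a list of samples."""
    return [
        [math.sin(2.0 * math.pi * (k + 1) * f0 * t / sr) for t in range(n_samples)]
        for k in range(n_harmonics)
    ]


def harmonic_synth(amps, f0, n_samples=N_SAMPLES, sr=SAMPLE_RATE):
    """Additive synthesiser: weighted sum of the harmonics of f0."""
    basis = harmonic_basis(f0, len(amps), n_samples, sr)
    return [sum(a * b[t] for a, b in zip(amps, basis)) for t in range(n_samples)]


def fit_amplitudes(target, f0, n_harmonics, steps=100, lr=0.5):
    """Recover harmonic amplitudes from audio by gradient descent on the MSE."""
    n = len(target)
    basis = harmonic_basis(f0, n_harmonics, n)
    amps = [0.0] * n_harmonics
    for _ in range(steps):
        pred = [sum(a * b[t] for a, b in zip(amps, basis)) for t in range(n)]
        err = [p - y for p, y in zip(pred, target)]
        # d(MSE)/d(amps[k]) = (2/n) * sum_t err[t] * basis[k][t].
        # Hand-written here; an autodiff framework would derive it for us.
        grads = [2.0 / n * sum(e * s for e, s in zip(err, b)) for b in basis]
        amps = [a - lr * g for a, g in zip(amps, grads)]
    return amps


# Synthesise a target with known amplitudes, then recover them from the audio.
target = harmonic_synth([1.0, 0.5, 0.25], f0=250.0)
fitted = fit_amplitudes(target, f0=250.0, n_harmonics=3)
print([round(a, 3) for a in fitted])
```

The printed amplitudes should land close to the ones used to build the target. In a real system the amplitudes (and the fundamental) would presumably be emitted by a network rather than fitted directly, with the loss gradient flowing back through the synthesiser into the network weights.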
PixelRNN turns out to be good at music. Dadabots have successfully weaponised SampleRNN, and it’s cute.
OpenAI Jukebox is the latest hot generative music thing that I should be across.
Existing generative models for audio have predominantly aimed to directly model time-domain waveforms. MelNet instead aims to model the frequency content of an audio signal. MelNet can be used to model audio unconditionally, making it capable of tasks such as music generation. It can also be conditioned on text and speaker, making it applicable to tasks such as text-to-speech and voice conversion.
Jlin and Holly Herndon show off some artistic use of messed-up neural nets.
Hung-yi Lee and Yu Tsao, Generative Adversarial nets for DSP.
What is Loris?
Soundtracking audio from video.
Andy Sarroff, Musical Audio Synthesis Using Autoencoding Neural Nets. (code)