Generative art with language+diffusion models

also some autoregressive models

2022-09-16 — 2026-05-05

Wherein the Principal Platforms for Model Discovery Are Contrasted — Hugging Face for Research Pipelines, CivitAI for Community Fine-Tunes — and the Fracture Between Corporate and Open Ecosystems Is Noted

buzzword
computers are awful
generative art
machine learning
making things
music
neural nets
photon choreography
UI
Figure 1

Generative art using modern diffusion-backed image generators. The name-brand models (DALL-E 2, Stable Diffusion, Midjourney, etc.) pair a diffusion model that generates the image with a transformer that handles the text-to-image conditioning.

This page is about image generation — prompt to image, with a focus on the models and the model ecosystems. For editing existing images with ML — instruction-following editors (FLUX.1 Kontext, Qwen-Image-Edit, Nano Banana, etc.), background removal, upscaling, inpainting — see editing images with machine learning. For the front-end software that runs these models locally (ComfyUI, InvokeAI, Draw Things, DiffusionBee, ChaiNNer, …) see front-end clients for AI image models. For the community back-story behind Stable Diffusion, Black Forest Labs and the open-vs-corporate fault line, see AI democratization.

I’m interested in this in general, but I am especially interested, practically, in models that

  1. work on macOS, and
  2. run on my local machine (i.e. use my local GPU).

A method that lets me use or train my own model is especially interesting. I like using the community-trained models for specialisation or jailbreaking. As with many other parts of AI, the community is incredible.

For audio stuff, see music diffusion.

1 Theory

For the math, Neural denoising diffusion models is the canonical home; the pre-diffusion lineage (DeepDream, GANs, CPPNs) is its own page.

Some pointers for image diffusion specifically:

Interestingly, there is a move to leave diffusion behind in favour of autoregressive models — see e.g. Alpha-VLLM/Lumina-mGPT-2.0.

2 Where to find generation models

Hugging Face is the heavy-hitter in neural networks generally and hosts most of the foundation diffusion models. Art diffusion models additionally have the specialised CivitAI, which is where the long tail of community fine-tunes — LoRAs, textual inversions, aesthetic gradients — actually lives. For the back-story of either, see AI democratization.

| Feature | Hugging Face | CivitAI |
|---|---|---|
| Focus | Research-first platform (300,000+ models) | Community-driven artistic hub |
| Model types | Stable Diffusion variants, ControlNet, LoRAs | Artistic models (anime, photorealistic, 3D), fine-tuned LoRAs |
| Discovery | Organised by pipeline tags and metrics | Visual browsing with instant output previews |
| Documentation | Comprehensive model cards with bias analysis | User-generated examples and prompt sharing |
| Community | Academic and ML practitioner oriented | Artist and creator focused |
| Integration | Native PyTorch/TensorFlow support, diffusers library | Simple download format for GUIs like DiffusionBee |
| Content policy | Stricter content guidelines | More permissive, with NSFW filters |
| Traffic | Research-focused userbase | 25M+ monthly visits, 500+ new models daily |
| Ecosystem | Central to ML research and deployment | Popular for artistic workflows and style training |

Most macOS clients (DiffusionBee, Mochi Diffusion, Draw Things, …) import from both ecosystems.
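For programmatic discovery on the Hugging Face side, the Hub's pipeline tags from the table above can be queried directly. A minimal sketch, assuming `huggingface_hub` is installed and network access is available; the function name is mine:

```python
def search_text_to_image(limit=5):
    """Return the ids of the `limit` most-downloaded text-to-image
    checkpoints on the Hugging Face Hub.

    Requires `pip install huggingface_hub` and network access.
    """
    from huggingface_hub import HfApi  # imported lazily so the sketch loads offline

    api = HfApi()
    models = api.list_models(
        pipeline_tag="text-to-image",  # the Hub's pipeline tag for these models
        sort="downloads",
        limit=limit,
    )
    return [m.id for m in models]
```

CivitAI exposes a public REST API for model search as well, but its schema shifts more often, so I won't sketch it here.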

3 Notable model lineages

Figure 2

The model landscape is fractured between corporate offerings (Flux, DALL-E 3, Midjourney — polish and ease, but advanced features behind APIs or subscriptions) and community-trained ones (SDXL, CivitAI LoRAs — customisation and local control, steeper learning curves). The full back-story of how this fracture happened — Stability AI’s 2024 splinter, the founding of Black Forest Labs, the Flux “dev” / “pro” split — is at AI democratization.

A few contenders I don’t know much about but want to track:

  • Ideogram (founded by ex-Google Imagen researchers): strong on text-in-images and typography.
  • PixArt-Σ (Huawei Noah’s Ark Lab): balances speed and photorealism for commercial workflows.
  • CogView-3 (Zhipu AI / Tsinghua): favoured for industrial design prototyping.

4 How do I download and use that cool model I found?

  • Hugging Face:

    • Download models via git lfs (or huggingface-cli) and place them in ~/Documents/DiffusionBee/models.
    • Use diffusers pipelines for custom scripted workflows (or import the weights into ComfyUI).
    • Extensive format conversions are available.
  • CivitAI:

    • Directly import .safetensors/LoRAs into Draw Things or DiffusionBee.
    • Filter models by “macOS-optimized” tags for CoreML/MLX compatibility.
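The steps above can be sketched with the diffusers library. This is a sketch to check against the current diffusers docs, not a recipe: the model id and checkpoint paths are illustrative, the helper names are mine, and the "mps" device line assumes Apple silicon.

```python
from pathlib import Path

# Where DiffusionBee looks for imported checkpoints (path from the notes above).
DIFFUSIONBEE_MODELS = Path.home() / "Documents" / "DiffusionBee" / "models"


def load_hub_pipeline(model_id="stabilityai/stable-diffusion-xl-base-1.0"):
    """Load a Hugging Face checkpoint with diffusers.

    Requires `pip install diffusers torch`; the first call downloads
    several GB of weights. The model id is an example, not a recommendation.
    """
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    return pipe.to("mps")  # Apple-silicon GPU backend; "cuda"/"cpu" elsewhere


def load_civitai_checkpoint(checkpoint_path, lora_path=None):
    """Load a single-file .safetensors download (CivitAI style),
    optionally stacking a LoRA on top. Paths are hypothetical.
    """
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_single_file(checkpoint_path)
    if lora_path is not None:
        pipe.load_lora_weights(lora_path)
    return pipe
```

The imports sit inside the functions deliberately, so the module loads even on a machine without diffusers installed; that suits a notes file where most snippets are aspirational.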

5 Model customisation and fine-tuning

6 Punditry

7 Incoming

8 References

Dhariwal, and Nichol. 2021. “Diffusion Models Beat GANs on Image Synthesis.” arXiv:2105.05233 [Cs, Stat].
Dutordoir, Saul, Ghahramani, et al. 2022. “Neural Diffusion Processes.”
Han, Zheng, and Zhou. 2022. “CARD: Classification and Regression Diffusion Models.”
Ho, Jain, and Abbeel. 2020. “Denoising Diffusion Probabilistic Models.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20.
Hoogeboom, Gritsenko, Bastings, et al. 2021. “Autoregressive Diffusion Models.” arXiv:2110.02037 [Cs, Stat].
Nichol, and Dhariwal. 2021. “Improved Denoising Diffusion Probabilistic Models.” In Proceedings of the 38th International Conference on Machine Learning.
Sohl-Dickstein, Weiss, Maheswaranathan, et al. 2015. “Deep Unsupervised Learning Using Nonequilibrium Thermodynamics.”
Song, Yang, and Ermon. 2020a. “Generative Modeling by Estimating Gradients of the Data Distribution.” In Advances In Neural Information Processing Systems.
———. 2020b. “Improved Techniques for Training Score-Based Generative Models.” In Advances In Neural Information Processing Systems.
Song, Jiaming, Meng, and Ermon. 2021. “Denoising Diffusion Implicit Models.” arXiv:2010.02502 [Cs].
von Platen, Patil, Lozhkov, et al. 2022. “Diffusers: State-of-the-Art Diffusion Models.”
Yang, Zhang, Song, et al. 2023. “Diffusion Models: A Comprehensive Survey of Methods and Applications.” ACM Computing Surveys.