Generative art with language+diffusion models
also some autoregressive models
2022-09-16 — 2026-05-05
Wherein the Principal Platforms for Model Discovery Are Contrasted — Hugging Face for Research Pipelines, CivitAI for Community Fine-Tunes — and the Fracture Between Corporate and Open Ecosystems Is Noted
Generative art using modern diffusion-backed image generators. The name-brand models (DALL-E 2, Stable Diffusion, Midjourney, etc.) pair a diffusion model for image synthesis with a transformer text encoder that handles the text-to-image conditioning.
This page is about image generation — prompt to image, with a focus on the models and the model ecosystems. For editing existing images with ML — instruction-following editors (FLUX.1 Kontext, Qwen-Image-Edit, Nano Banana, etc.), background removal, upscaling, inpainting — see editing images with machine learning. For the front-end software that runs these models locally (ComfyUI, InvokeAI, Draw Things, DiffusionBee, ChaiNNer, …) see front-end clients for AI image models. For the community back-story behind Stable Diffusion, Black Forest Labs and the open-vs-corporate fault line, see AI democratization.
I’m interested in this in general. In practice, I am especially interested in models that
- work on macOS
- run on my local machine (i.e. use my local GPU)
A method that lets me use or train my own model is especially interesting. I like using the community-trained models for specialisation or jailbreaking. As with many other parts of AI, the community is incredible.
For audio stuff, see music diffusion.
1 Theory
For the math, Neural denoising diffusion models is the canonical home; the pre-diffusion lineage (DeepDream, GANs, CPPNs) is its own page.
Some pointers for image-diffusion specifically:
- Geometry in Text-to-Image Diffusion Models
- The Annotated Diffusion Model
- Denoising Diffusion Restoration Models
- Google AI Blog: High Fidelity Image Generation Using Diffusion Models
Interestingly, there is a move to leave diffusion behind in favour of autoregressive models — see e.g. Alpha-VLLM/Lumina-mGPT-2.0.
2 Where to find generation models
Hugging Face is the heavy-hitter in neural networks generally and hosts most of the foundation diffusion models. Art diffusion models additionally have the specialised CivitAI, which is where the long tail of community fine-tunes — LoRAs, textual inversions, aesthetic gradients — actually lives. For the back-story of either, see AI democratization.
| Feature | Hugging Face | CivitAI |
|---|---|---|
| Focus | Research-first platform (300,000+ models) | Community-driven artistic hub |
| Model Types | Stable Diffusion variants, ControlNet, LoRAs | Artistic models (anime, photorealistic, 3D), fine-tuned LoRAs |
| Discovery | Organized by pipeline tags and metrics | Visual browsing with instant output previews |
| Documentation | Comprehensive model cards with bias analysis | User-generated examples and prompt sharing |
| Community | Academic and ML practitioner oriented | Artist and creator focused |
| Integration | Native PyTorch/TensorFlow support, diffusers library | Simple download format for GUIs like DiffusionBee |
| Content Policy | Stricter content guidelines | More permissive with NSFW filters |
| Traffic | Research-focused userbase | 25M+ monthly visits, 500+ new models daily |
| Ecosystem | Central to ML research and deployment | Popular for artistic workflows and style training |
Most macOS clients (DiffusionBee, Mochi Diffusion, Draw Things, …) import from both ecosystems.
3 Notable model lineages
The model landscape is fractured between corporate offerings (Flux, DALL-E 3, Midjourney — polish and ease, but advanced features behind APIs or subscriptions) and community-trained ones (SDXL, CivitAI LoRAs — customisation and local control, steeper learning curves). The full back-story of how this fracture happened — Stability AI’s 2024 splinter, the founding of Black Forest Labs, the Flux “dev” / “pro” split — is at AI democratization.
A few contenders I don’t know much about but want to track:
- Ideogram (founded by ex-Google Imagen researchers): strong at legible text-in-images and typography.
- PixArt-Σ (Huawei Noah’s Ark Lab): balances speed and photorealism for commercial workflows.
- CogView-3 (Zhipu AI / Tsinghua): favoured for industrial design prototyping.
4 How do I download and use that cool model I found?
Hugging Face:
- Download models via `git lfs` and place them in `~/Documents/DiffusionBee/models`.
- Use `diffusers` pipelines for custom workflows in ComfyUI.
- Extensive format conversions are available.
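Once a checkpoint is downloaded, running it with the `diffusers` library is a few lines. A minimal sketch, assuming an SD-1.5-style checkpoint (the model id below is a placeholder; substitute whatever you found on the Hub). The device-guessing helper is a stdlib-only heuristic of my own, not a torch API:

```python
import platform

def pick_device() -> str:
    """Guess a torch device string for this machine.

    Stdlib-only heuristic: Apple-silicon Macs get Metal ("mps"),
    everything else is assumed to have CUDA. If torch is installed,
    its own checks (torch.cuda.is_available(), etc.) are authoritative.
    """
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"
    return "cuda"

if __name__ == "__main__":
    # Heavy part: needs `pip install torch diffusers` and a network
    # connection for the first download. Model id is a placeholder.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to(pick_device())
    image = pipe("an ink drawing of a lighthouse").images[0]
    image.save("lighthouse.png")
```

Half-precision (`float16`) roughly halves memory use, which matters on laptop GPUs.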
CivitAI:
- Import `.safetensors` checkpoints and LoRAs directly into Draw Things or DiffusionBee.
- Filter models by “macOS-optimized” tags for CoreML/MLX compatibility.
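Before handing a CivitAI download to a GUI, it can be worth sanity-checking the file. A `.safetensors` file begins with an 8-byte little-endian header length followed by that many bytes of JSON describing every tensor, so you can peek at the contents without loading any weights. A stdlib-only sketch (the path in the usage comment is a placeholder):

```python
import json
import struct

def safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file.

    Format: the first 8 bytes are a little-endian uint64 giving the
    header length N, followed by N bytes of JSON mapping tensor names
    to {"dtype", "shape", "data_offsets"}. Reading just the header is
    a cheap way to confirm a download is an intact checkpoint or LoRA.
    """
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n).decode("utf-8"))

# Usage: list tensor names and shapes of a downloaded file.
# for name, info in safetensors_header("some-lora.safetensors").items():
#     if name != "__metadata__":
#         print(name, info["dtype"], info["shape"])
```

A truncated or mislabelled download typically fails immediately here, which beats waiting for a GUI import to error out.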

