Generative art with language+diffusion models

also some autoregressive models

2022-09-16 — 2025-03-11

buzzword

computers are awful

generative art

machine learning

Generative art using modern diffusion-backed image generators. The name-brand models are DALL-E 2, Stable Diffusion, Midjourney etc., which (as far as is public) pair a diffusion model that generates the image with a transformer text encoder that conditions it on the prompt.

I’m interested in this in general. I am especially interested in models that

- work on macOS
- run on my local machine (i.e. use my local GPU)

A method that allows me to use or train my own model is especially interesting. I like using the community-trained models for specialisation or jailbreaking. As with many other parts of AI, the community is incredible.

For audio stuff, see music diffusion.

1 Theory

Much to say here. Interestingly, there is a move to leave diffusion behind in favour of autoregressive models.

Alpha-VLLM/Lumina-mGPT-2.0: Lumina-mGPT 2.0: Stand-alone Autoregressive Image Modelling
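To make the contrast concrete, here is a toy sketch of the two generation styles in plain Python. It is not how any production model is implemented. The linear noise schedule (1000 steps, betas from 1e-4 to 0.02) is a common DDPM-style default, assumed here for illustration, and `next_token` is a hypothetical stand-in for a trained autoregressive model.

```python
import math
import random

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction alpha_bar_t under a linear beta schedule."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def noise_pixels(x0, alpha_bar_t, rng):
    """Diffusion forward process in closed form: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return [
        math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * rng.gauss(0.0, 1.0)
        for x in x0
    ]

def autoregressive_sample(next_token, length):
    """Autoregressive generation: each token is produced given all previous tokens."""
    tokens = []
    for _ in range(length):
        tokens.append(next_token(tokens))
    return tokens

alpha_bar = linear_alpha_bar()
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]                            # three toy "pixels"
x_early = noise_pixels(x0, alpha_bar[0], rng)    # almost unchanged
x_late = noise_pixels(x0, alpha_bar[-1], rng)    # almost pure noise
```

A diffusion model is trained to undo the noising step by step, refining the whole image at once; an autoregressive model instead emits image tokens one at a time, which is the same loop a text language model runs.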

2 Community Model Ecosystems

Hugging Face is the heavy-hitter in neural networks generally and has lots of diffusion models. Art diffusion models additionally have the specialized CivitAI:

CivitAI is a labour of love from a small team. After being inspired daily by the incredible progress of the Stable Diffusion community and the explosion of custom fine-tuned models, textual inversions, and more, we wanted to see if we could create something that would continue to help the community grow and thrive.

After seeing a gap around sharing the custom models that were being made by the community, we decided to try our hand at putting together a tool that would make it easy for anyone to share, find, and review models. While there were existing services like Hugging Face that allowed users to expose their models as repositories, we felt that it was missing a few key features that would really allow it to serve as a home for the growing community and use case:

A way for creators to tag models with things that make sense to the SD community

A good way for people interested in the model to review and share their creations

A simpler upload and download interface (how many of us are really familiar with code repos)

An indexed and visual browsing experience of all the models available

An API that can be used by SD tools to tap into the growing library of models, embeds, aesthetic gradients, and hyper networks available

Feature	Hugging Face	CivitAI
Focus	Research-first platform (300,000+ models)	Community-driven artistic hub
Model Types	Stable Diffusion variants, ControlNet, LoRAs	Artistic models (anime, photorealistic, 3D), fine-tuned LoRAs
Discovery	Organized by pipeline tags and metrics	Visual browsing with instant output previews
Documentation	Comprehensive model cards with bias analysis	User-generated examples and prompt sharing
Community	Academic and ML practitioner oriented	Artist and creator focused
Integration	Native PyTorch/TensorFlow support, `diffusers` library	Simple download format for GUIs like DiffusionBee
Content Policy	Stricter content guidelines	More permissive with NSFW filters
Traffic	Research-focused userbase	25M+ monthly visits, 500+ new models daily
Ecosystem	Central to ML research and deployment	Popular for artistic workflows and style training

Relationship to GUI Clients:

- Most macOS tools (e.g., DiffusionBee, Mochi Diffusion) support importing models from both ecosystems.
- CivitAI’s LoRAs/styles are popular for artistic workflows, while Hugging Face provides foundational models like SDXL.
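The API that the CivitAI blurb mentions can be poked at from a script. The following is a minimal sketch using only the standard library. The endpoint `https://civitai.com/api/v1/models`, the query parameters (`query`, `types`, `limit`) and the response fields (`items`, `modelVersions`, `downloadUrl`) are my understanding of the public REST API and should be treated as assumptions to check against the current CivitAI documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumed public REST endpoint; verify against the CivitAI API docs.
CIVITAI_MODELS_URL = "https://civitai.com/api/v1/models"

def build_search_url(query, model_type="LORA", limit=5):
    """Build a model-search URL for the given free-text query."""
    params = {"query": query, "types": model_type, "limit": limit}
    return CIVITAI_MODELS_URL + "?" + urllib.parse.urlencode(params)

def summarise(response):
    """Extract (model name, first download URL) pairs from a parsed JSON response."""
    rows = []
    for item in response.get("items", []):
        versions = item.get("modelVersions") or [{}]
        rows.append((item.get("name"), versions[0].get("downloadUrl")))
    return rows

def search(query, **kwargs):
    """Network call: needs internet access; some content may require an API key."""
    with urllib.request.urlopen(build_search_url(query, **kwargs), timeout=30) as resp:
        return summarise(json.load(resp))
```

Calling `search("watercolour")` should return a short list of LoRA names with download links, which can then be fed to whichever GUI client is in use.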

3 Folk history of Stability

The story of modern AI image tools begins with Stable Diffusion — a 2022 open-source project developed by Stability AI, CompVis (LMU Munich), and RunwayML. Its release democratised high-quality image generation, letting users run models locally and fine-tune them freely. In 2024 key researchers behind Stable Diffusion left Stability AI to form Black Forest Labs, citing disagreements over open-source commitments and commercialisation strategies. They went on to create Flux, a transformer-based model family praised for its precision but criticised for its hefty hardware demands (think 24GB VRAM for full features).

The landscape now is fractured.

- Corporate models (Flux, DALL-E 3, Midjourney) offer polish and ease but often lock advanced features behind APIs or subscriptions.
- Community-driven tools (SDXL, CivitAI LoRAs) prioritise customisation and local control, albeit with steeper learning curves.

Some interesting new contenders have also appeared. I don’t know much about those.

- Ideogram (founded by former Google Imagen researchers): Strong at text-in-images and typography.
- PixArt-Σ (Huawei Noah’s Ark Lab): Balances speed and photorealism for commercial workflows.
- CogView-3 (Tsinghua University and Zhipu AI): Favoured for industrial design prototyping.

Corporate models shine for plug-and-play reliability; community forks foster niche artistry and ethical transparency. Also, what the PR doesn’t tell you is that the community models have a lot more autonomy and fun; you can tweak them to do idiosyncratic styling, or to generate images that are difficult to coax out of the mainstream models with their guardrails, such as violence, sexual content, and nazis. Black Forest’s Flux has a foot in both worlds — its “dev” version is open-weights but non-commercial, while “pro” targets enterprise.

4 GUIs

I was using DiffusionBee and some other Hugging Face models which I have now forgotten. Since then, new clients have appeared. I got Perplexity to generate a features matrix of the promising ones.

Tool	Cost	Open Source	BYO Models	Apple Silicon Support	Ease of Use	Generation Types
DiffusionBee	Free	Yes	Yes (TensorFlow)	Native (M1/M2/M3)	🤪🤪🤪🤪	Text/Image-Conditional
Mochi Diffusion	Free	Yes	Yes (CoreML required)	CoreML Optimized	🤪🤪🤪	Text/Image-Conditional
ComfyUI+MLX	Free	Yes	Yes (PyTorch/GGUF)	MLX Accelerated	🤪🤪	Text/Image/Video
InvokeAI	Free	Yes	Yes	Optimized	🤪🤪🤪	Text/Image-Conditional
Draw Things	Free	No	Yes (CivitAI import)	Metal Acceleration	🤪🤪🤪🤪	Text/Image-Conditional

4.1 DiffusionBee

Description: User-friendly desktop app optimized for Apple Silicon. Offers offline generation, video tools, and FLUX model support.
Key Features:
- Simplified installation (drag-and-drop DMG)
- Direct CivitAI/Hugging Face model imports
- Image-to-image transformations and inpainting
Apple Silicon: Native M-series support via TensorFlow/Metal
Ease: 🤪🤪🤪🤪 (Beginner-friendly)

4.2 Mochi Diffusion

Native SwiftUI app using Apple’s Core ML framework for maximum hardware efficiency.
Key Features:
- ~150MB RAM usage with Neural Engine
- EXIF metadata preservation
- ControlNet and RealESRGAN upscaling
Apple Silicon: CoreML-optimized (3-4GB VRAM usage)
Ease: 🤪🤪🤪 (Moderate technical skill)

4.3 ComfyUI+MLX

Visual-programming node-based workflow system enabling granular control over generation pipelines.
Key Features:
- MLX acceleration on the Apple Silicon GPU (MLX does not use the Neural Engine)
- First access to new models (SD 3, VideoCrafter)
- 8K upscaling via custom nodes
Apple Silicon: Requires MLX setup
Ease: 🤪🤪 (Advanced users)

4.4 InvokeAI

Dual CLI/WebUI interfaces.
Key Features:
- Canvas-based iterative editing
- Multi-model blending
- Outpainting with context awareness
Apple Silicon: Optimized via Metal Performance Shaders
Ease: 🤪🤪🤪 (Web UI accessible)

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.

4.5 Draw Things

Description: App Store-native tool with AR previews and live generation tuning.
Key Features:
- One-click CivitAI model imports
- Real-time diffusion process visualization
- Model mixing via sliders
Apple Silicon: Metal API acceleration (5-7GB VRAM)
Ease: 🤪🤪🤪🤪 (TouchBar support)

4.6 Misc

AUTOMATIC1111 WebUI
- Description: Feature-rich browser interface for Stable Diffusion, built on the Gradio library.
- Apple Silicon: Runs on Apple Silicon, but requires manual PyTorch/MPS setup
- Best For: Users familiar with Linux-centric workflows

NMKD GUI

A handy GUI to run Stable Diffusion, a machine learning toolkit to generate images from text, locally on your own hardware.

It is completely uncensored and unfiltered - I am not responsible for any of the content generated with it. No data is shared/collected by me or any third party.

If I use the Hugging Face tooling, building a local UI is easy; it integrates easily with gradio. AFAICT AUTOMATIC1111 WebUI is one such. See also nitrosocke/diffusers-webui
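As a rough illustration of how little glue is needed, here is a sketch of a local UI built from `diffusers` and `gradio`. The model id `stabilityai/stable-diffusion-xl-base-1.0`, the `mps` device, and the helper names `make_generate` and `launch_ui` are my choices for the example, not anything prescribed by either library.

```python
def make_generate(pipe):
    """Wrap a text-to-image pipeline as a plain function that Gradio can call."""
    def generate(prompt, steps=25):
        # Sliders deliver floats, so coerce the step count to an int.
        result = pipe(prompt, num_inference_steps=int(steps))
        return result.images[0]
    return generate

def launch_ui(model_id="stabilityai/stable-diffusion-xl-base-1.0", device="mps"):
    """Download the model (on first run) and serve a small local web UI."""
    # Heavy imports live here so make_generate stays usable without them.
    import gradio as gr
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(model_id).to(device)
    demo = gr.Interface(
        fn=make_generate(pipe),
        inputs=[
            gr.Textbox(label="Prompt"),
            gr.Slider(1, 50, value=25, step=1, label="Steps"),
        ],
        outputs=gr.Image(label="Result"),
    )
    demo.launch()
```

Running `launch_ui()` should open a page on localhost with a prompt box and a step slider. Everything the fancier clients add (queues, galleries, inpainting canvases) is layered on top of this same pattern.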

5 How do I download and use that cool model I found?

Hugging Face:
- Download models via git lfs and place in ~/Documents/DiffusionBee/models.
- Use diffusers pipelines for custom workflows in ComfyUI.
- extensive conversions available
CivitAI:
- Directly import .safetensors/LoRAs into Draw Things or DiffusionBee.
- Filter models by “macOS-optimized” tags for CoreML/MLX compatibility.
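If I skip the GUI and work in Python instead, the two ecosystems map onto two `diffusers` loaders: a Hub repo id goes through `from_pretrained`, while a single `.safetensors` or `.ckpt` file (the usual CivitAI download) goes through `from_single_file`. This is a sketch assuming an SDXL-family checkpoint; other architectures need a different pipeline class. The helper names and the `mps` default are mine.

```python
from pathlib import Path

SINGLE_FILE_SUFFIXES = {".safetensors", ".ckpt"}

def is_single_file(source):
    """True for a local checkpoint file, False for a Hub repo id such as 'org/name'."""
    return Path(source).suffix.lower() in SINGLE_FILE_SUFFIXES

def load_pipeline(source, lora_path=None, device="mps"):
    """Load an SDXL-style pipeline from either ecosystem, optionally adding a LoRA."""
    # Imported lazily; requires the third-party packages diffusers and torch.
    from diffusers import StableDiffusionXLPipeline

    if is_single_file(source):
        pipe = StableDiffusionXLPipeline.from_single_file(source)
    else:
        pipe = StableDiffusionXLPipeline.from_pretrained(source)
    if lora_path is not None:
        lora = Path(lora_path)
        # Point at the containing directory and name the weight file explicitly.
        pipe.load_lora_weights(str(lora.parent), weight_name=lora.name)
    return pipe.to(device)
```

For example, `load_pipeline("stabilityai/stable-diffusion-xl-base-1.0", lora_path="loras/my_style.safetensors")` would fetch the base model from Hugging Face and apply a locally downloaded LoRA (the LoRA path here is made up).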

6 Model customisation and fine-tuning

That Pokemon diffusion post: Adventures in Finetuning Stable Diffusion.
PRIV-Creation/Awesome-Diffusion-Personalization: A collection of resources on personalisation with diffusion models.

7 Optimizing for Apple Silicon

Got a model that looks great but maybe doesn’t run so well on your Mac? Here are some tips.

7.1 Model Conversion

CoreML Tools: Convert PyTorch models to CoreML for Mochi Diffusion using Apple’s coremltools.

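import torch
import coremltools as ct
traced = torch.jit.trace(torch_model.eval(), torch.rand(1, 3, 512, 512))  # torch_model is your nn.Module; ct.convert expects a traced (TorchScript) model
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=(1, 3, 512, 512))])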

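MLX Framework: For ComfyUI, use MLX nodes to run inference through Apple’s MLX framework, which targets the Apple Silicon GPU via Metal rather than the Neural Engine (speed gains of 30–70% are claimed).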

7.2 Metal Acceleration

Select the Metal Performance Shaders (MPS) device in InvokeAI (i.e. the PyTorch `mps` backend), or use Draw Things’ native Metal API for faster inference.
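For PyTorch-based tools, checking that the Metal backend is actually in use takes a few lines. A minimal sketch: `choose_device` and `detect_device` are names made up for the example, while `torch.backends.mps.is_available()` and the `PYTORCH_ENABLE_MPS_FALLBACK` environment variable are standard PyTorch.

```python
import os

def choose_device(cuda_available, mps_available):
    """Prefer CUDA, then Apple's Metal backend (mps), then plain CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

def detect_device():
    """Ask PyTorch which accelerators exist on this machine."""
    # Set before importing torch: lets operators that MPS does not implement
    # fall back to the CPU instead of raising an error.
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
    import torch
    return choose_device(torch.cuda.is_available(), torch.backends.mps.is_available())
```

On an Apple Silicon Mac with a recent PyTorch build, `detect_device()` should return `"mps"`; if it returns `"cpu"`, the model is running without GPU acceleration and will be very slow.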

8 Hosted models

Just go to a website, give someone money and get images back. You trade privacy and control for convenience.

8.1 Runway.ml

Runway.ml

a platform for creators of all kinds to use machine learning tools in intuitive ways without any coding experience. Find resources here to start creating with RunwayML quickly.

In particular, it plugs into Blender and Photoshop and allows you to use those programs as a UI for ML-backed algorithms. Nice.

8.2 Midjourney

Midjourney produces high-quality images from text prompts. Addictive in that you can get better at it, which feels like mastering a real skill.

8.3 Nightcafe

NightCafe Creator

Stable Diffusion, DALL-E 2, CLIP-Guided Diffusion, VQGAN+CLIP and Neural Style Transfer are all available on NightCafe.

8.4 Playgroundai

Playground AI

9 Punditry

10 Theory

11 Incoming
