Front-end clients for AI image models
ComfyUI, InvokeAI, Draw Things, ChaiNNer and friends
2022-09-16 — 2026-05-05
In which the Several Runtimes Available on Apple Silicon are Surveyed, the Trade-offs of Each Client’s Pareto Position are Examined, and Guidance on GGUF Quantisation for 16GB Machines is Offered.
Front-end software for running AI image models on our own hardware — generation, editing, inpainting, upscaling, and chaining the lot together.
To assemble a whole stack, pair this page with:
- generative image models for the text-to-image generation side (Stable Diffusion, Flux, SDXL and friends);
- editing images with machine learning for the editing side (FLUX.1 Kontext, Qwen-Image-Edit, Nano Banana, single-task tools);
- AI democratization for the community and its spicy back-stories (Stability AI → Black Forest Labs, CivitAI, the open-vs-corporate fault line).
For non-ML editing — vips, ImageMagick, GIMP scripting — see editing images using code and editing images using a GUI.
I am principally interested in clients that
- work on macOS
- on my local machine (i.e. use my local GPU)
Most of the options here do something like that. Some of them only work on Very Serious Infrastructure Which I Cannot Afford To Own But Might Rent In The Cloud, which is a secondary concern for me personally, but there are some notes on that too.
1 How models actually run on a Mac
Three runtimes coexist on Apple Silicon and none has won. Picking a client mostly means picking which runtime it uses.
- PyTorch + MPS (Metal Performance Shaders) — what most Python ML code uses on Apple Silicon. Coverage of new model architectures is fastest because PyTorch is the lingua franca; raw speed is decent but not optimal. ComfyUI, InvokeAI and the A1111-family forks pick this path.
- Apple MLX — Apple’s own array framework. Faster than MPS where it has coverage; coverage lags PyTorch by months. On image clients it shows up as add-on node packs (ComfyUI-MLX, mflux-ComfyUI, mflux standalone). Trades coverage for speed.
- Apple Core ML — lowest RAM footprint, runs on the Neural Engine, but every model needs offline conversion via coremltools. Mochi Diffusion is the only mainstream consumer; this is why its model list lags everything else.
- Custom Swift + Metal kernels — Draw Things sidesteps all of the above with its own implementation. Faster than MPS, broader coverage than Core ML, no install hoops; the price is closed source.
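Most clients below sit on the PyTorch + MPS path, and the classic failure mode there is a silent fallback to CPU. A quick sanity check, sketched with nothing beyond stock PyTorch:
import torch
# True on Apple Silicon when the Metal backend is usable right now
print(torch.backends.mps.is_available())
# True when this PyTorch build was compiled with MPS support at all
print(torch.backends.mps.is_built())
# Clients on this path do the moral equivalent of:
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(torch.randn(4, 4, device=device).device)  # expect mps:0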
For Flux- and Qwen-class models on a 16GB Mac, GGUF quants are the difference between “runs” and “doesn’t run” — ComfyUI handles them via ComfyUI-GGUF, Draw Things handles them natively, others have partial support.
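Back-of-envelope arithmetic on why, as a sketch — the parameter count (~12B for Flux dev) and bits-per-weight figures are approximate, and real usage adds text encoders, VAE and activations on top of these floors:
# Weight-only memory floor for a ~12B-parameter Flux-class model
params = 12e9
bytes_per_weight = {
    "fp16/bf16": 2.0,  # full-precision release
    "Q8_0": 1.06,      # ~8.5 bits/weight incl. block scales (approx.)
    "Q4_K_S": 0.56,    # ~4.5 bits/weight (approx.)
}
for name, b in bytes_per_weight.items():
    print(f"{name}: {params * b / 2**30:.1f} GiB")
# fp16 at ~22 GiB cannot fit next to macOS on a 16GB machine;
# a Q4-class GGUF quant at ~6 GiB can.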
2 Best pick per backbone family (Mac, 2026)
- SD 1.5 / SDXL: Draw Things for one-click; InvokeAI if we want canvas + layers.
- Flux dev/schnell: Draw Things for ease; ComfyUI for speed via GGUF + MLX nodes.
- FLUX.1 Kontext (instruction edit): Draw Things and InvokeAI both work today; ComfyUI for graph-level control.
- Qwen-Image-Edit: Draw Things is the standout — supports the 2509, 2511 and Layered variants natively. InvokeAI and ComfyUI handle it too.
- Video (Wan / Hunyuan / LTX): ComfyUI is the realistic local option on Mac; Draw Things has Wan; for hosted, Runway.
- Just upscale / restore (no diffusion): ChaiNNer or Spandrel from a script.
3 Capability matrix
There is a trade-off between flexibility and usability. Nothing is free. Each tool sits at a different point on that trade-off. My wanker name for this is Pareto position.
| Tool | Mac install | Pareto position |
|---|---|---|
| Draw Things | App Store, signed | Mac-native sweet spot |
| ComfyUI (+MLX) | Python+MPS or +MLX add-ons | Bleeding-edge, max compatibility, node-graph |
| InvokeAI | Official installer (MPS) | Canvas studio with inpaint brushes |
| Mochi Diffusion | Drag-drop, CoreML conversion required | Low-RAM CoreML niche |
| DiffusionBee | Drag-drop DMG (semi-dormant) | Legacy easy on-ramp |
| Forge Neo / SDNext | Python+MPS | A1111-style WebUI, current models |
| ChaiNNer | Electron + Python+MPS | Upscale/restore pipelines (no diffusion) |
Backbone support (May 2026):
| Tool | SD/SDXL | Flux dev/schnell | FLUX.1 Kontext | Qwen-Image-Edit | Video (Wan/Hunyuan/LTX) |
|---|---|---|---|---|---|
| Draw Things | ✓ | ✓ | ✓ | ✓ (2509/2511/Layered) | Wan |
| ComfyUI | ✓ | ✓ (incl. Flux 2) | ✓ | ✓ | Wan / Hunyuan / LTX / + |
| InvokeAI | ✓ (incl. SD3.5) | ✓ (+Krea / Redux / Fill, Flux 2 Klein) | ✓ | ✓ | – |
| Mochi Diffusion | ✓ | Flux 2 Klein only | – | – | – |
| DiffusionBee | ✓ | partial | – | – | – |
| Forge Neo | ✓ | ✓ | – | ✓ | Wan 2.2 |
| ChaiNNer | – | – | – | – | – |
4 Generation + editing GUIs
These run text-to-image generation and image-to-image / inpainting on top of the same backbone (Stable Diffusion, SDXL, Flux, and so on). When the editing notebook talks about “mask + prompt” inpainting workflows, this is what it means.
4.1 Draw Things
The most actively-maintained Mac-native client in the field. App Store install, signed and sandboxed; uses a custom Swift+Metal stack rather than PyTorch, so it sidesteps most of the Mac-runtime gotchas above. Multiple updates per month.
Pareto: the closest thing to “ComfyUI’s model coverage with DiffusionBee’s install ease”. If we don’t already have a reason to prefer something else, this is a fine default.
Backbones: SD 1.5, SDXL, SD3, Flux dev/schnell, FLUX.1 Kontext for instruction edits, Qwen-Image-Edit 2509/2511 and Qwen-Image-Layered, Wan video (multiple variants), HiDream. CivitAI safetensors import works (auto-converts internally).
Affordances: LoRA loading and local LoRA training, ControlNet, inpainting, outpainting, infinite canvas. Failure mode for genuinely novel architectures: lags ComfyUI by weeks, not months.
5–7GB VRAM typical, they claim; runs comfortably on a 16GB Mac.
4.2 ComfyUI (+MLX)
Visual node-graph workflow system. The bleeding edge — first to support every new architecture, usually within days of release. Active main project; the MLX add-ons are smaller community efforts that improve speed on Apple Silicon.
Pareto: the power-user pick when a workflow needs to be saved, parameterised, re-run, or debugged at the node level. Steepest learning curve here.
Backbones: everything. SD 1.5, SDXL, SD3/3.5, Flux dev/schnell/2, FLUX.1 Kontext, Qwen-Image, Qwen-Image-Edit, Wan 2.1/2.2, HiDream E1.1, LTX-Video, Hunyuan Video/3D, Omnigen 2, ACE Step audio. LoRA, ControlNet, inpainting, img2img, regional prompting all native.
Mac install: pip install + PyTorch+MPS works out of the box. The optional MLX setup plus the ComfyUI-MLX / mflux-ComfyUI / Flux-MLX node packs accelerate Flux-class models. GGUF quants are essential on 16GB Macs — install ComfyUI-GGUF.
Affordances: training is out of scope (use kohya). Failure mode for unknown CivitAI models: usually “find the right node pack in the manager”; almost always loadable.
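Because saved workflows are plain JSON, the "saved, parameterised, re-run" part needs no browser: the local server exposes an HTTP endpoint for queueing prompts. A minimal sketch, assuming a default server on 127.0.0.1:8188 and a workflow exported in API format from the UI — the node id below is workflow-specific and purely illustrative:
import json
import urllib.request
with open("workflow_api.json") as f:
    workflow = json.load(f)
# Parameterise at the node level, e.g. reseed a KSampler node with id "3"
workflow["3"]["inputs"]["seed"] = 1234
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())  # returns a prompt_id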
4.3 InvokeAI
Polished web UI with a canvas-and-layers studio feel. Official installer underneath; PyTorch+MPS. Most actively-maintained “polished UI” option after Draw Things.
Pareto: the canvas-first pick for people who want layers and inpainting brushes rather than node spaghetti — closer to a Photoshop workflow than to a programming environment.
Backbones: SD 1.5/2.0, SDXL, SD 3.5 Medium/Large, Flux dev/schnell/Kontext/Krea/Redux/Fill, Flux 2 Klein 4B/9B, Qwen Image, Qwen Image Edit, CogView 4, Z-Image. Some GGUF, ckpt, diffusers. LoRA + embeddings, SAM/SAM2 segmentation, unified canvas inpainting/outpainting.
Affordances: no video, no training. Failure mode for unknown models: the model picker rejects an unknown architecture cleanly rather than crashing.
4.4 Mochi Diffusion
Native SwiftUI app on Apple’s CoreML framework, so it runs on the Neural Engine with the lowest RAM footprint of anything on this list (~150MB). Maintained, but small team.
Pareto: the niche pick when RAM is at a premium and we’re willing to live with Mochi’s small set of pre-converted models — most CivitAI checkpoints will not load without offline CoreML conversion.
Backbones: SD 1.5, SD 2.x, SDXL, plus Flux 2 Klein (4B/9B distilled, pre-built bundles available). ControlNet yes. No Flux dev/schnell, no Kontext, no Qwen, no SD3, no video, no LoRA loading in the headline featureset, no native inpainting.
Affordances: every model must be CoreML-converted via coremltools — a chore unless we grab pre-converted bundles from the Mochi wiki. See CoreML conversion in practice below for what self-conversion actually involves. EXIF metadata preservation; ControlNet; RealESRGAN upscaling.
4.5 DiffusionBee
The original drag-and-drop Mac client; native Apple Silicon, zero terminal. Project is semi-dormant as of writing — last GitHub release v2.5.3 was August 2024.
Pareto: the legacy easy-button on-ramp. Superseded on every axis by Draw Things; included for historical recognisability rather than current recommendation.
Backbones: SD 1.x, SD 2.x, SDXL, partial Flux, claims SD 3.5 support. No FLUX Kontext, no Qwen-Image-Edit, no full SD3, no video.
Affordances: inpainting, ControlNet, LoRA. Imports CivitAI safetensors but with frequent breakage (the issue tracker is full of “model won’t load”). Failure mode for unknown models: silent rejection or crash.
5 CoreML conversion in practice
A practical aside, since Mochi Diffusion is the one client on this page that depends on a separate offline model-conversion step.
Verdict first: don’t convert. Grab a pre-converted bundle from coreml-community (~145 community bundles covering SD 1.5, SD 2.1, SDXL forks like Animagine XL / AnythingXL / CounterfeitXL, ControlNets in both attention modes) or apple/coreml-stable-diffusion-xl-base for the official SDXL. That covers >95% of “things we actually want to run”. Self-conversion only earns its keep when we have a fresh CivitAI checkpoint nobody has packaged yet, and even then the experience is rough.
The conversion pipeline lives in apple/ml-stable-diffusion. The repo is in maintenance, not development — last tagged release 1.1.1 was May 2024, last Apple commit late 2024, with one community PR landing July 2025. Supported architectures: SD 1.4/1.5, SD 2.0/2.1, SDXL (+refiner), SD3-medium, ControlNet (SD1.x only). Anything 2025-onwards — Flux, SD 3.5, Wan, Qwen-Image, Lumina, Sana — has no Apple-blessed converter path and no community fork that covers it. For those, use Draw Things (its custom Swift+Metal stack handles them natively) or any PyTorch+MPS client.
5.1 Time and RAM budget for one SDXL conversion
| Hardware | Wall clock | Peak RAM | Notes |
|---|---|---|---|
| M1 / M1 Pro 16 GB | 30–60 min | 14–18 GB | OOM kills common; pass --half |
| M2 / M3 16 GB | 20–35 min | 12–16 GB | Marginal; close everything else |
| M2/M3/M4 Pro/Max 32 GB+ | 15–25 min | 18–22 GB | Comfortable |
Disk: source safetensors (~6.9 GB SDXL) + intermediate diffusers dump (~13 GB fp32) + CoreML output (~5–10 GB depending on attention mode). Budget 30 GB free.
5.2 The pipeline
The Mochi Diffusion repo vendors a wrapper around Apple’s converter. That’s the path the wiki documents and the only one that “just works”:
git clone https://github.com/MochiDiffusion/MochiDiffusion.git
cd MochiDiffusion/conversion
uv venv && source .venv/bin/activate
./download-script.sh
export MODEL_NAME=mySDXL
mv ~/Downloads/${MODEL_NAME}.safetensors .
# Phase 1: safetensors → diffusers (~1 min)
uv run python convert_original_stable_diffusion_to_diffusers.py \
--checkpoint_path ${MODEL_NAME}.safetensors --from_safetensors \
--pipeline_class_name StableDiffusionXLPipeline \
--device cpu --extract_ema --dump_path ${MODEL_NAME}_diffusers
# Phase 2: diffusers → Core ML (~20–40 min, ~16 GB RAM)
uv run python -m python_coreml_stable_diffusion.torch2coreml \
--xl-version --compute-unit CPU_AND_GPU \
--convert-vae-decoder --convert-vae-encoder \
--convert-unet --convert-text-encoder \
--model-version ${MODEL_NAME}_diffusers \
--bundle-resources-for-swift-cli \
--attention-implementation ORIGINAL \
-o ${MODEL_NAME}_original
# Mochi loads from: ${MODEL_NAME}_original/Resources/
5.3 Attention-mode gotcha
- ORIGINAL targets CPU + GPU (Metal). Pick this for Macs — best speed/quality on the M-series GPU. Pair with --compute-unit CPU_AND_GPU.
- SPLIT_EINSUM targets the Neural Engine. Faster on iPhone/iPad and on entry M-series for SD 1.5; for SDXL on a Mac it’s slower because the model is too big for the ANE to hold the weights anyway.
- SPLIT_EINSUM_V2 is a 10–30% mobile speedup; Apple’s own README warns SDXL compile times are “prohibitively long”. Skip on Mac.
Convert both modes only if we’re shipping the model to iPad too.
5.4 What breaks on a fresh CivitAI checkpoint
- SDXL fine-tune — works. Plan a quiet hour; expect one OOM retry on 16 GB; expect to add --half if it fails on shard size. (diffusers ≥0.29 default-shards the SDXL UNet at 10 GB, which breaks the next stage — bump max_shard_size to 15 GB on line ~188 of the script, sketched after this list. See Mochi issue #261.)
- Flux / SD 3.5 / Wan / Qwen / anything else recent — don’t try. The converter doesn’t know those architectures. Use Draw Things instead.
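For reference, the shard-size patch amounts to one keyword argument on the phase-1 save. A sketch, assuming the script ends with a standard diffusers save_pretrained call — max_shard_size is real diffusers API, but the exact line varies by script version:
# Tail of convert_original_stable_diffusion_to_diffusers.py (illustrative):
# keep the fp32 SDXL UNet in a single shard so torch2coreml can read it
pipe.save_pretrained(dump_path, max_shard_size="15GB")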
6 The AUTOMATIC1111 family
The original AUTOMATIC1111 WebUI — feature-rich Gradio interface — was the canonical Stable Diffusion UI for years. Its main branch is effectively abandoned: last release v1.10.1 from Feb 2025, no support for Flux, SD3, Kontext, Qwen, or anything more recent than SDXL. Active development moved to a thicket of forks:
- Forge (lllyasviel) — the original fork. Sporadic.
- reForge (Panchovix) — stability-focused fork of Forge.
- Forge Neo / Forge Classic (Haoming02) — current best A1111-style pick: supports Flux, Qwen, Wan 2.2.
- SDNext (vladmandic) — actively-maintained heavy refactor.
Pareto on Mac: if we want the WebUI feel with current models, try Forge Neo or SDNext. Vanilla A1111 is now of historical interest only. All of these run via PyTorch+MPS along the well-trodden install path, though the official MPS notes only cover the abandoned main branch.
If we use the Hugging Face tooling, rolling a local UI is straightforward — diffusers integrates cleanly with gradio. See also nitrosocke/diffusers-webui.
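A minimal sketch of that route — the checkpoint id here is the community SD 1.5 mirror on the Hub, and any diffusers-format repo id works the same way:
import torch
import gradio as gr
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("mps")  # the PyTorch + MPS path from section 1
def generate(prompt: str):
    return pipe(prompt, num_inference_steps=25).images[0]
gr.Interface(fn=generate, inputs="text", outputs="image").launch()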
7 Pipeline-chaining clients
7.1 ChaiNNer
A node-based GUI specifically for non-diffusion image-processing pipelines: Real-ESRGAN → GFPGAN → format conversion → batch over a folder. Electron app + Python sidecar with PyTorch+MPS; works on Apple Silicon. Maintained.
Pareto: the niche pick when we want repeatable upscale/restore pipelines without writing a script. Complementary to ComfyUI, not competitive — if the task is diffusion, this is the wrong tool.
Affordances: Spandrel-supported PyTorch arches (super-resolution, face restore, denoise, JPEG-deartefact, dehaze, low-light, colourisation), plus NCNN/ONNX/TensorRT model files. Loads .pth, .pt, .safetensors, some .ckpt.
Source: chaiNNer-org/chaiNNer.
7.2 Spandrel
ChaiNNer’s Python core, also usable on its own. Spandrel on PyPI. Pick when we want ChaiNNer’s arch coverage from a script rather than a node graph.
Catalogue: ESRGAN family, SwinIR, HAT, Real-ESRGAN, AuraSR plus ~20 other super-resolution arches; GFPGAN/CodeFormer/RestoreFormer face restorers; LaMa/MAT inpainting; NAFNet/SCUNet/Restormer denoisers; DDColor; DeJPEG. No diffusion, no Flux, no Qwen.
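A minimal sketch of that scripted path — the checkpoint filename is illustrative; any Spandrel-supported upscaler checkpoint on disk works:
import torch
from spandrel import ImageModelDescriptor, ModelLoader
# Spandrel sniffs the architecture from the checkpoint itself
model = ModelLoader().load_from_file("RealESRGAN_x4plus.pth")
assert isinstance(model, ImageModelDescriptor)
model.to("mps").eval()  # same MPS caveats as any PyTorch code here
# Input: float NCHW tensor, RGB in [0, 1]
img = torch.rand(1, 3, 64, 64, device="mps")
with torch.no_grad():
    out = model(img)
print(out.shape)  # (1, 3, 256, 256) for a 4x model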
8 Hosted-only services
When we don’t want to run anything locally — trade convenience for privacy and control.
8.1 Runway
Runway pivoted to video around 2024 and the image-generation product line is no longer a marketed offering. Current focus: Gen-4.5 video, GWM-1 world models, real-time video agents (Runway Characters). The Photoshop and Blender plugins still exist.
Pareto: the hosted pick for video and world-model work; for image-only workflows there are cheaper and more focused options elsewhere.
8.2 Midjourney
Midjourney — currently V7 (default) with V8 alpha shipped March 2026 (5x speed, native 2K out, image + video). Browser/Discord interface; no public API — anything calling itself a Midjourney API is an unofficial scraper. $10–$120/month subscription.
Pareto: the prompt-craft aesthetic frontier — addictive in that we can get better at it, which feels like mastering a real skill. Unsuitable for programmatic pipelines because of the no-API thing.
