Front-end clients for AI image models

ComfyUI, InvokeAI, Draw Things, ChaiNNer and friends

2022-09-16 — 2026-05-05

In which the Several Runtimes Available on Apple Silicon are Surveyed, the Trade-offs of Each Client’s Pareto Position are Examined, and Guidance on GGUF Quantisation for 16GB Machines is Offered.

computers are awful
generative art
machine learning
making things
photon choreography
UI

Front-end software for running AI image models on our own hardware — generation, editing, inpainting, upscaling, and chaining the lot together.

The aim is to assemble a whole stack you might actually use.

For non-ML editing — vips, ImageMagick, GIMP scripting — see editing images using code and editing images using a GUI.

I am principally interested in clients that

  1. work on macOS
  2. run on my local machine (i.e. use my local GPU)

Most of the options here do something like that. Some of them only work on Very Serious Infrastructure Which I Cannot Afford To Own But Might Rent In The Cloud, which is a secondary concern for me personally, but there are some notes on that too.

1 How models actually run on a Mac

Three runtimes coexist on Apple Silicon and none has won. Picking a client mostly means picking which runtime it uses.

  • PyTorch + MPS (Metal Performance Shaders) — what most Python ML code uses on Apple Silicon. Coverage of new model architectures is fastest because PyTorch is the lingua franca; raw speed is decent but not optimal. ComfyUI, InvokeAI and the A1111-family forks pick this path.
  • Apple MLX — Apple’s own array framework. Faster than MPS where it has coverage; coverage lags PyTorch by months. On image clients it shows up as add-on node packs (ComfyUI-MLX, mflux-ComfyUI, mflux standalone). Trades coverage for speed.
  • Apple Core ML — lowest RAM footprint, runs on the Neural Engine, but every model needs offline conversion via coremltools. Mochi Diffusion is the only mainstream consumer; this is why its model list lags everything else.
  • Custom Swift + Metal kernels — Draw Things sidesteps all of the above with its own implementation. Faster than MPS, broader coverage than Core ML, no install hoops; the price is closed source.
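Which of these runtimes a given Python environment can actually reach is easy to probe. A minimal sketch — the package names (`torch`, `mlx`, `coremltools`) are the real ones, but any of them may be absent on a given machine, and Draw Things' Swift stack is invisible from Python entirely:

```python
# Probe which runtimes this Python environment can reach.
# All three imports are optional; absence just means that path is closed.
available = []

try:
    import torch
    # mps backend exists in torch >= 1.12; is_available() is False off-Mac
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        available.append("pytorch-mps")
except ImportError:
    pass

try:
    import mlx.core  # Apple MLX: Apple Silicon only
    available.append("mlx")
except ImportError:
    pass

try:
    import coremltools  # conversion tooling; inference runs in the Core ML runtime
    available.append("coremltools")
except ImportError:
    pass

print(available)
```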

For Flux- and Qwen-class models on a 16GB Mac, GGUF quants are the difference between “runs” and “doesn’t run” — ComfyUI handles them via ComfyUI-GGUF, Draw Things handles them natively, others have partial support.
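The arithmetic is blunt. Flux dev is a roughly 12B-parameter transformer; a sketch of weight-file sizes at common GGUF quant levels (bits-per-weight figures are approximate block averages, and this ignores the text encoders and VAE, which also want memory):

```python
PARAMS = 12e9  # Flux-dev-class transformer, parameter count

# Approximate effective bits per weight for common GGUF quant types
BITS_PER_WEIGHT = {"fp16": 16.0, "q8_0": 8.5, "q5_k": 5.5, "q4_k": 4.5}

def weights_gb(quant: str, params: float = PARAMS) -> float:
    # params * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"{quant:>5}: ~{weights_gb(quant):4.1f} GB")
# fp16 (~24 GB) cannot sit beside macOS in 16 GB; q4_k (~6.8 GB) can.
```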

2 Best pick per backbone family (Mac, 2026)

  • SD 1.5 / SDXL — everything here runs them; Draw Things is the default.
  • Flux dev/schnell — Draw Things (native GGUF) or ComfyUI (with ComfyUI-GGUF).
  • FLUX.1 Kontext instruction edits — Draw Things or ComfyUI; InvokeAI if we want it on a canvas.
  • Qwen-Image-Edit — Draw Things (2509/2511/Layered) or ComfyUI.
  • Video (Wan / Hunyuan / LTX) — ComfyUI for breadth; Draw Things for Wan only.

3 Capability matrix

There is a trade-off between flexibility and usability. Nothing is free. Each tool sits at a different point on that trade-off; my wanker name for this is its Pareto position.

| Tool | Mac install | Pareto position |
|---|---|---|
| Draw Things | App Store, signed | Mac-native sweet spot |
| ComfyUI (+MLX) | Python+MPS or +MLX add-ons | Bleeding-edge, max compatibility, node-graph |
| InvokeAI | Official installer (MPS) | Canvas studio with inpaint brushes |
| Mochi Diffusion | Drag-drop, CoreML conversion required | Low-RAM CoreML niche |
| DiffusionBee | Drag-drop DMG (semi-dormant) | Legacy easy on-ramp |
| Forge Neo / SDNext | Python+MPS | A1111-style WebUI, current models |
| ChaiNNer | Electron + Python+MPS | Upscale/restore pipelines (no diffusion) |

Backbone support (May 2026):

| Tool | SD/SDXL | Flux dev/schnell | FLUX.1 Kontext | Qwen-Image-Edit | Video (Wan/Hunyuan/LTX) |
|---|---|---|---|---|---|
| Draw Things | ✓ | ✓ | ✓ | ✓ (2509/2511/Layered) | Wan |
| ComfyUI | ✓ | ✓ (incl. Flux 2) | ✓ | ✓ | Wan / Hunyuan / LTX / + |
| InvokeAI | ✓ (incl. SD3.5) | ✓ (+Krea / Redux / Fill, Flux 2 Klein) | ✓ | ✓ | — |
| Mochi Diffusion | ✓ | Flux 2 Klein only | — | — | — |
| DiffusionBee | ✓ | partial | — | — | — |
| Forge Neo | ✓ | ✓ | ? | ✓ | Wan 2.2 |
| ChaiNNer | — | — | — | — | — |

4 Generation + editing GUIs

These run text-to-image generation and image-to-image / inpainting on top of the same backbone (Stable Diffusion, SDXL, Flux, and so on). When the editing notebook talks about “mask + prompt” inpainting workflows, this is what it means.

4.1 Draw Things

The most actively-maintained Mac-native client in the field. App Store install, signed and sandboxed; uses a custom Swift+Metal stack rather than PyTorch, so it sidesteps most of the Mac-runtime gotchas above. Multiple updates per month.

Pareto: the closest thing to “ComfyUI’s model coverage with DiffusionBee’s install ease”. If we don’t already have a reason to prefer something else, this is a fine default.

Backbones: SD 1.5, SDXL, SD3, Flux dev/schnell, FLUX.1 Kontext for instruction edits, Qwen-Image-Edit 2509/2511 and Qwen-Image-Layered, Wan video (multiple variants), HiDream. CivitAI safetensors import works (auto-converts internally).

Affordances: LoRA loading and local LoRA training, ControlNet, inpainting, outpainting, infinite canvas. Failure mode for genuinely novel architectures: lags ComfyUI by weeks, not months.

5–7GB VRAM typical, they claim; runs comfortably on a 16GB Mac.

4.2 ComfyUI (+MLX)

Visual node-graph workflow system. The bleeding edge — first to support every new architecture, usually within days of release. Active main project; the MLX add-ons are smaller community efforts that improve speed on Apple Silicon.

Pareto: the power-user pick when a workflow needs to be saved, parameterised, re-run, or debugged at the node level. Steepest learning curve here.

Backbones: everything. SD 1.5, SDXL, SD3/3.5, Flux dev/schnell/2, FLUX.1 Kontext, Qwen-Image, Qwen-Image-Edit, Wan 2.1/2.2, HiDream E1.1, LTX-Video, Hunyuan Video/3D, Omnigen 2, ACE Step audio. LoRA, ControlNet, inpainting, img2img, regional prompting all native.

Mac install: pip install + PyTorch+MPS works out of the box. The optional MLX setup plus the ComfyUI-MLX / mflux-ComfyUI / Flux-MLX node packs accelerate Flux-class models. GGUF quants are essential on 16GB Macs — install ComfyUI-GGUF.

Affordances: training is out of scope (use kohya). Failure mode for unknown CivitAI models: usually “find the right node pack in the manager”; almost always loadable.
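That save/parameterise/re-run loop doesn't need the browser: a workflow exported via "Save (API Format)" is plain JSON and can be re-queued against the local server's `POST /prompt` endpoint. A sketch, assuming a server on the default port 8188 and a workflow whose node IDs we already know (the node ID "3" below is hypothetical):

```python
import json
import urllib.request

def set_seed(workflow: dict, node_id: str, seed: int) -> dict:
    """Parameterise a saved workflow: overwrite one sampler node's seed."""
    wf = json.loads(json.dumps(workflow))  # cheap deep copy
    wf[node_id]["inputs"]["seed"] = seed
    return wf

def queue_prompt(workflow: dict, server: str = "127.0.0.1:8188") -> dict:
    """POST the graph to ComfyUI's /prompt endpoint; returns the queue response."""
    body = json.dumps({"prompt": workflow, "client_id": "script"}).encode()
    req = urllib.request.Request(
        f"http://{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    with open("workflow_api.json") as f:  # exported via "Save (API Format)"
        wf = json.load(f)
    for seed in range(4):  # four variations of the same graph
        print(queue_prompt(set_seed(wf, "3", seed)))
```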

4.3 InvokeAI

Polished web UI with a canvas-and-layers studio feel. Official installer underneath; PyTorch+MPS. Most actively-maintained “polished UI” option after Draw Things.

Pareto: the canvas-first pick for people who want layers and inpainting brushes rather than node spaghetti — closer to a Photoshop workflow than to a programming environment.

Backbones: SD 1.5/2.0, SDXL, SD 3.5 Medium/Large, Flux dev/schnell/Kontext/Krea/Redux/Fill, Flux 2 Klein 4B/9B, Qwen Image, Qwen Image Edit, CogView 4, Z-Image. Some GGUF, ckpt, diffusers. LoRA + embeddings, SAM/SAM2 segmentation, unified canvas inpainting/outpainting.

Affordances: no video, no training. Failure mode for unknown models: the model picker rejects an unknown architecture cleanly rather than crashing.


4.4 Mochi Diffusion

Native SwiftUI app on Apple’s CoreML framework, so it runs on the Neural Engine with the lowest RAM footprint of anything on this list (~150MB). Maintained, but small team.

Pareto: the niche pick when RAM is at a premium and we’re willing to live with Mochi’s small set of pre-converted models — most CivitAI checkpoints will not load without offline CoreML conversion.

Backbones: SD 1.5, SD 2.x, SDXL, plus Flux 2 Klein (4B/9B distilled, pre-built bundles available). ControlNet yes. No Flux dev/schnell, no Kontext, no Qwen, no SD3, no video, no LoRA loading in the headline featureset, no native inpainting.

Affordances: every model must be CoreML-converted via coremltools — a chore unless we grab pre-converted bundles from the Mochi wiki. See CoreML conversion in practice below for what self-conversion actually involves. EXIF metadata preservation; ControlNet; RealESRGAN upscaling.

4.5 DiffusionBee

The original drag-and-drop Mac client; native Apple Silicon, zero terminal. Project is semi-dormant as of writing — last GitHub release v2.5.3 was August 2024.

Pareto: the legacy easy-button on-ramp. Superseded on every axis by Draw Things; included for historical recognisability rather than current recommendation.

Backbones: SD 1.x, SD 2.x, SDXL, partial Flux, claims SD 3.5 support. No FLUX Kontext, no Qwen-Image-Edit, no full SD3, no video.

Affordances: inpainting, ControlNet, LoRA. Imports CivitAI safetensors but with frequent breakage (the issue tracker is full of “model won’t load”). Failure mode for unknown models: silent rejection or crash.

5 CoreML conversion in practice

A practical aside, since Mochi Diffusion is the one client on this page that depends on a separate offline model-conversion step.

Verdict first: don’t convert. Grab a pre-converted bundle from coreml-community (~145 community bundles covering SD 1.5, SD 2.1, SDXL forks like Animagine XL / AnythingXL / CounterfeitXL, ControlNets in both attention modes) or apple/coreml-stable-diffusion-xl-base for the official SDXL. That covers >95% of “things we actually want to run”. Self-conversion only earns its keep when we have a fresh CivitAI checkpoint nobody has packaged yet, and even then the experience is rough.

The conversion pipeline lives in apple/ml-stable-diffusion. The repo is in maintenance, not development — last tagged release 1.1.1 was May 2024, last Apple commit late 2024, with one community PR landing July 2025. Supported architectures: SD 1.4/1.5, SD 2.0/2.1, SDXL (+refiner), SD3-medium, ControlNet (SD1.x only). Anything 2025-onwards — Flux, SD 3.5, Wan, Qwen-Image, Lumina, Sana — has no Apple-blessed converter path and no community fork that covers it. For those, use Draw Things (its custom Swift+Metal stack handles them natively) or any PyTorch+MPS client.

5.1 Time and RAM budget for one SDXL conversion

| Hardware | Wall clock | Peak RAM | Notes |
|---|---|---|---|
| M1 / M1 Pro, 16 GB | 30–60 min | 14–18 GB | OOM kills common; pass --half |
| M2 / M3, 16 GB | 20–35 min | 12–16 GB | Marginal; close everything else |
| M2/M3/M4 Pro/Max, 32 GB+ | 15–25 min | 18–22 GB | Comfortable |

Disk: source safetensors (~6.9 GB SDXL) + intermediate diffusers dump (~13 GB fp32) + CoreML output (~5–10 GB depending on attention mode). Budget 30 GB free.
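Worth checking before committing the evening to it; a trivial stdlib sketch to confirm the 30 GB is actually free:

```python
import shutil

def disk_ok(path: str = ".", needed_gb: float = 30.0) -> bool:
    # ~7 GB source + ~13 GB diffusers dump + up to ~10 GB Core ML output
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb

print("proceed" if disk_ok() else "free up space first")
```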

5.2 The pipeline

The Mochi Diffusion repo vendors a wrapper around Apple’s converter. That’s the path the wiki documents and the only one that “just works”:

git clone https://github.com/MochiDiffusion/MochiDiffusion.git
cd MochiDiffusion/conversion
uv venv && source .venv/bin/activate
./download-script.sh
export MODEL_NAME=mySDXL
mv ~/Downloads/${MODEL_NAME}.safetensors .

# Phase 1: safetensors → diffusers (~1 min)
uv run python convert_original_stable_diffusion_to_diffusers.py \
  --checkpoint_path ${MODEL_NAME}.safetensors --from_safetensors \
  --pipeline_class_name StableDiffusionXLPipeline \
  --device cpu --extract_ema --dump_path ${MODEL_NAME}_diffusers

# Phase 2: diffusers → Core ML (~20–40 min, ~16 GB RAM)
uv run python -m python_coreml_stable_diffusion.torch2coreml \
  --xl-version --compute-unit CPU_AND_GPU \
  --convert-vae-decoder --convert-vae-encoder \
  --convert-unet --convert-text-encoder \
  --model-version ${MODEL_NAME}_diffusers \
  --bundle-resources-for-swift-cli \
  --attention-implementation ORIGINAL \
  -o ${MODEL_NAME}_original
# Mochi loads from: ${MODEL_NAME}_original/Resources/

5.3 Attention-mode gotcha

  • ORIGINAL targets CPU + GPU (Metal). Pick this for Macs — best speed/quality on the M-series GPU. Pair with --compute-unit CPU_AND_GPU.
  • SPLIT_EINSUM targets the Neural Engine. Faster on iPhone/iPad and on entry M-series for SD 1.5; for SDXL on a Mac it’s slower because the model is too big for the ANE to hold the weights anyway.
  • SPLIT_EINSUM_V2 is a 10–30% mobile speedup; Apple’s own README warns SDXL compile times are “prohibitively long”. Skip on Mac.

Convert both modes only if we’re shipping the model to iPad too.
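The decision rule, condensed: a sketch mapping target device and model class to the converter's `--attention-implementation` value. The SDXL-on-mobile branch is my reading of the constraints above (the model doesn't fit the ANE), not an Apple recommendation:

```python
def attention_mode(target: str, model: str = "sdxl") -> str:
    """target: "mac" or "mobile"; model: "sd15" or "sdxl"."""
    if target == "mac":
        return "ORIGINAL"      # pair with --compute-unit CPU_AND_GPU
    if model == "sdxl":
        return "ORIGINAL"      # SDXL won't fit the Neural Engine anyway
    return "SPLIT_EINSUM"      # SD 1.5-class on iPhone/iPad: Neural Engine
```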

5.4 What breaks on a fresh CivitAI checkpoint

  • SDXL fine-tune — works. Plan a quiet hour; expect one OOM retry on 16 GB; expect to add --half if it fails on shard size. (diffusers ≥0.29 default-shards the SDXL UNet at 10 GB, which breaks the next stage — bump max_shard_size to 15 GB on line ~188 of the script. See Mochi issue #261.)
  • Flux / SD 3.5 / Wan / Qwen / anything else recent — don’t try. The converter doesn’t know those architectures. Use Draw Things instead.
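The shard-size edit in the first bullet can be scripted rather than done by hand. A stdlib-only sketch that appends the kwarg to the conversion script's `save_pretrained` call; it assumes the call fits on one line and doesn't already set `max_shard_size` (which is why it searches rather than blindly editing line 188):

```python
import re
from pathlib import Path

def pin_shard_size(script: Path, size: str = "15GB") -> bool:
    """Append max_shard_size=... to the first save_pretrained(...) call.

    Returns False if no call is found, in which case edit by hand
    (around line ~188 of the script at time of writing).
    """
    text = script.read_text()
    patched, n = re.subn(
        r"\.save_pretrained\(([^)\n]*)\)",
        rf'.save_pretrained(\1, max_shard_size="{size}")',
        text,
        count=1,
    )
    if n == 0:
        return False
    script.write_text(patched)
    return True
```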

6 The AUTOMATIC1111 family

The original AUTOMATIC1111 WebUI — feature-rich Gradio interface — was the canonical Stable Diffusion UI for years. Its main branch is effectively abandoned: last release v1.10.1 from Feb 2025, no support for Flux, SD3, Kontext, Qwen, or anything more recent than SDXL. Active development moved to a thicket of forks:

  • Forge (lllyasviel) — the original fork. Sporadic.
  • reForge (Panchovix) — stability-focused fork of Forge.
  • Forge Neo / Forge Classic (Haoming02) — current best A1111-style pick: supports Flux, Qwen, Wan 2.2.
  • SDNext (vladmandic) — actively-maintained heavy refactor.

Pareto on Mac: if we want the WebUI feel with current models, try Forge Neo or SDNext; vanilla A1111 is now of historical interest only. All of these run on PyTorch+MPS, the well-trodden install path, though the official MPS notes cover only the abandoned main branch.

If we use the Hugging Face tooling, building a local UI is straightforward; diffusers integrates cleanly with gradio. See also nitrosocke/diffusers-webui.

7 Pipeline-chaining clients

7.1 ChaiNNer

A node-based GUI specifically for non-diffusion image-processing pipelines: Real-ESRGAN → GFPGAN → format conversion → batch over a folder. Electron app + Python sidecar with PyTorch+MPS; works on Apple Silicon. Maintained.

Pareto: the niche pick when we want repeatable upscale/restore pipelines without writing a script. Complementary to ComfyUI, not competitive — if the task is diffusion, this is the wrong tool.

Affordances: Spandrel-supported PyTorch arches (super-resolution, face restore, denoise, JPEG-deartefact, dehaze, low-light, colourisation), plus NCNN/ONNX/TensorRT model files. Loads .pth, .pt, .safetensors, some .ckpt.

Source: chaiNNer-org/chaiNNer.

7.2 Spandrel

ChaiNNer’s Python core, also usable on its own. Spandrel on PyPI. Pick when we want ChaiNNer’s arch coverage from a script rather than a node graph.

Catalogue: ESRGAN family, SwinIR, HAT, Real-ESRGAN, AuraSR plus ~20 other super-resolution arches; GFPGAN/CodeFormer/RestoreFormer face restorers; LaMa/MAT inpainting; NAFNet/SCUNet/Restormer denoisers; DDColor; DeJPEG. No diffusion, no Flux, no Qwen.

8 Hosted-only services

When we don’t want to run anything locally — trade convenience for privacy and control.

8.1 Runway

Runway pivoted to video around 2024 and the image-generation product line is no longer a marketed offering. Current focus: Gen-4.5 video, GWM-1 world models, real-time video agents (Runway Characters). The Photoshop and Blender plugins still exist.

Pareto: the hosted pick for video and world-model work; for image-only workflows there are cheaper and more focused options elsewhere.

8.2 Midjourney

Midjourney — currently V7 (default) with V8 alpha shipped March 2026 (5x speed, native 2K out, image + video). Browser/Discord interface; no public API — anything calling itself a Midjourney API is an unofficial scraper. $10–$120/month subscription.

Pareto: the prompt-craft aesthetic frontier — addictive in that we can get better at it, which feels like mastering a real skill. Unsuitable for programmatic pipelines because of the no-API thing.