Generative art with language+diffusion models

also some autoregressive models

2022-09-16 — 2026-05-22

Wherein the Model Ecosystems of Corporate and Community Providers Are Surveyed, With Reference to Licensing Regimes Across Four Jurisdictions and the Provenance Requirements of the European Union’s AI Act.

buzzword

computers are awful

generative art

machine learning

making things

music

neural nets

photon choreography

Generative art using modern diffusion-backed image generators. The name-brand models are DALL-E 2, Stable Diffusion, Midjourney, etc., which pair diffusion models for image generation with transformer models for the text side of text-to-image.

This page is about image generation — prompt to image, with a focus on the models and the model ecosystems. For editing existing images with ML — instruction-following editors (FLUX.1 Kontext, Qwen-Image-Edit, Nano Banana, etc.), background removal, upscaling, inpainting — see editing images with machine learning. For the front-end software that runs these models locally (ComfyUI, InvokeAI, Draw Things, ChaiNNer, …) see front-end clients for AI image models. For the community back-story behind Stable Diffusion, Black Forest Labs and the open-vs-corporate fault line, see AI democratization.

I’m interested in this in general. I am especially practically interested in models that run locally — on my own machine and GPU rather than behind a hosted API — and ideally ones I can fine-tune or train myself. I happen to work on a Mac, so the macOS- and Apple-Silicon-specific runtime detail (which client, which quantization, how much RAM) is covered in the front-end clients notebook; this page stays about the models themselves, not the box they run on. I like using the community-trained models for specialization or jailbreaking.¹ As with many other parts of AI, the community is incredible.

For audio stuff, see music diffusion.

1 Theory

For the maths, Neural denoising diffusion models is the canonical home; the pre-diffusion lineage (DeepDream, GANs, CPPNs) is its own page.

Some pointers for image-diffusion specifically:

Interestingly, there is a move to leave diffusion behind in favour of autoregressive models — see e.g. Alpha-VLLM/Lumina-mGPT-2.0.

2 Where to find generation models

Hugging Face is the heavy-hitter in neural networks generally and hosts most of the foundation generative image models. Generative art models additionally have the specialized community CivitAI, which hosts the long tail of community fine-tunes. The workhorse format is the LoRA (low-rank adaptation): a small module — often under 300 MB — that bolts a style, character, or idiom onto a base model without retraining the whole thing. CivitAI also hosts full fine-tunes — whole checkpoints retrained or merged for an aesthetic, of which Pony Diffusion V6 XL is one — which carry a style more thoroughly but run to several GB apiece and are far more of a chore to make and store. The LoRA’s ubiquity is a convenience story (small, cheap, stackable), not the only way to specialize a base. For the back-story of both, see AI democratization.

Feature	Hugging Face	CivitAI
Focus	Research-first platform (300,000+ models)	Community-driven artistic hub
Model Types	Stable Diffusion variants, ControlNet, LoRAs	Artistic models (anime, photorealistic, 3D), fine-tuned LoRAs
Discovery	Organized by pipeline tags and metrics	Visual browsing with instant output previews
Documentation	Comprehensive model cards with bias analysis	User-generated examples and prompt sharing
Community	Academic and ML practitioner oriented	Artist and creator focused
Integration	Native PyTorch/TensorFlow support, `diffusers` library	Simple download format for GUIs like Draw Things
Content Policy	Stricter content guidelines	More permissive with NSFW filters
Traffic	Research-focused userbase	25M+ monthly visits, 500+ new models daily
Ecosystem	Central to ML research and deployment	Popular for artistic workflows and style training

Most local clients (Draw Things, ComfyUI, Mochi Diffusion, …) import from both ecosystems.

3 Notable model lineages

The model landscape is fractured between corporate offerings (Flux, DALL-E 3, Midjourney — polish and ease, but advanced features behind APIs or subscriptions) and community-trained ones (SDXL, CivitAI LoRAs — customisation and local control, steeper learning curves). The full back-story of how this fracture happened — Stability AI’s 2024 splinter, the founding of Black Forest Labs, the Flux “dev” / “pro” split — is at AI democratization.

Ideogram (ex-Google Imagen team) specializes in rendering text inside images: typography that most other models handle badly.

Too many options? There are for me. I used the Vibe Check™ method to narrow it down. Which is to say, I asked an LLM to scan some community write-ups and CivitAI discussions. Citations needed.

FLUX.1 [dev] produces camera-accurate images — literal rather than atmospheric — and handles text legibility that was essentially impossible before this generation. Community comparisons consistently put it at or near the top for photorealism and prompt adherence among open-weight models. The cost is speed: roughly a minute per image locally without GGUF quantization. Non-commercial licence. The one I’d grab for publication-grade output.

FLUX.1 [schnell] is a distilled version of dev, Apache 2.0, and roughly 6× faster at a quality cost that most describe as noticeable but not catastrophic. Community practice has largely converged on schnell for iteration and dev (or a LoRA stack on dev) for finals. The SDXL vs Flux comparison at Stable Diffusion Art gives a sense of where schnell sits against the field.

FLUX.2 [klein] 4B generates in under a second on a current GPU and is viable for interactive brainstorming. Early community impressions put quality somewhere around Z-Image Turbo — good enough for ideation, not for portfolio. Apache 2.0; the 302.AI benchmark gives the most systematic quality breakdown currently available.

Z-Image Turbo (Alibaba Tongyi, 6B, Apache 2.0, November 2025) runs in 16 GB, generates in eight steps, renders English and Chinese text, and tops the open-source tier of the Artificial Analysis leaderboard — 8th overall at launch, ahead of much larger models. The deciding factor is community action, and there is plenty of it in both the West and China: LoRAs, checkpoint forks and ControlNets on CivitAI, and on the Chinese platforms it has topped the ModelScope popularity charts and shipped on LiblibAI, with the LoRA-training scene described as carrying energy “similar to the early Stable Diffusion era.” Training LoRAs on a distilled model is fiddly — naïve training destroys the acceleration trajectory — and people put up with that anyway, which is what grassroots adoption looks like.

SDXL and its fine-tune ecosystem remain relevant primarily because of ecosystem depth: the LoRA catalogue for SDXL is still estimated at five to ten times the size of FLUX.1’s, and specific aesthetics — a film stock, an illustrator’s hand, an art movement — are far more likely to exist as SDXL fine-tunes. On raw photorealism Flux wins; but if a specific look already exists as an SDXL LoRA, SDXL is often the practical path.

Pony Diffusion V6 XL is the model for furries! It uses Danbooru/e621 booru tagging conventions — score_9, anthro, sad_expression, dramatic_pose — that give precise control over character features and emotional content. Output skews semi-anime even on realistic prompts; the palette is vibrant and linework clean; without the score tags, output dulls. The CivitAI model page and the Stable Diffusion Art write-up describe the tagging system in detail.

Midjourney V7/V8 is the aesthetic engine rather than the control engine. It is tuned to be attractive by default: even loosely-specified prompts come back polished and idealized, and it will override a requested style in favour of something better-looking. The result is a recognizable house look that this LLM vibe-check would sum up as “luxury hotel lobby”: reliably beautiful, hard to push in a specific compositional direction, and the subject of an ongoing critique about aesthetic homogenization in AI art (McCormack et al. 2024). No API — Discord/browser only — which means no programmatic pipeline.

SD 1.5 is legacy at this point, but its ControlNet and LoRA ecosystem is the deepest in the field and does not port to SDXL. If a specific look exists only as an SD 1.5 fine-tune, there is often no practical alternative; otherwise SDXL or Flux are usually better choices.

3.1 Feature matrix

The table below is a quick-reference summary of the above; the licence and RAM columns are factual, the rest is aggregated community impression.

Model	Licence	16 GB RAM	Output character	Speed (local)	Text in image	LoRA depth
FLUX.1 dev	Non-commercial	GGUF Q4–Q5 only	Photorealistic, literal	Slow	Good	Growing
FLUX.1 schnell	Apache 2.0	GGUF Q5 fits	Photorealistic, slightly softer	Moderate	Good	Growing
FLUX.2 [klein] 4B	Apache 2.0	Yes, fp16 ~8 GB	Photorealistic, draft quality	Fast	Untested	Nascent
Z-Image Turbo	Apache 2.0	Yes, ~12 GB fp16	Photorealistic, portraits	Very fast (8-step)	Good (EN/CN)	Growing fast
SDXL	CreativeML OpenRAIL-M	Yes, ~6.5 GB	Fine-tune dependent	Moderate	Poor	Very deep
Pony V6 XL	CreativeML OpenRAIL-M	Yes, ~6.5 GB	Semi-anime / illustrative	Moderate	Poor	Deep, booru-tagged
Midjourney V7/V8	Closed, no API (Discord/browser only)	No	Opinionated, polished	Fast (hosted)	Moderate	n/a
SD 1.5	CreativeML OpenRAIL-M	Yes, ~2 GB	Soft, painterly	Very fast	Poor	Deepest
Ideogram	Closed, API only	No	Mixed	Fast (API)	Excellent	n/a

Speed is relative and hardware-dependent; published benchmarks usually run on a desktop NVIDIA GPU, which is faster than most laptop or Apple-Silicon setups. For live quality rankings — Elo scores from blind human votes, updated as new models ship — the Artificial Analysis text-to-image leaderboard is the canonical source.

3.2 Resolution

Think in a megapixel budget rather than fixed dimensions. Each model has a native total-pixel count it was trained around, and that budget is spent across whatever aspect ratio we choose — at ~1 MP, 1024×1024, 1216×832 and 1344×768 are the same budget in different shapes. This size-and-aspect conditioning is something SDXL introduced (Podell et al. 2023), training across multiple aspect-ratio buckets rather than a single square.
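To make the budget idea concrete, here is a minimal sketch that turns a megapixel budget and an aspect ratio into pixel dimensions. The function name `dims_for_budget` is mine, and snapping to multiples of 64 is an assumption (a common convention for SD-family models, not something every model requires).

```python
import math


def dims_for_budget(megapixels: float, aspect: float, multiple: int = 64) -> tuple[int, int]:
    """Return (width, height) close to `megapixels` total, with width/height = `aspect`.

    `multiple` is an assumed snapping grid; adjust it to whatever the model wants.
    """
    pixels = megapixels * 1_000_000
    height = math.sqrt(pixels / aspect)
    width = height * aspect

    def snap(value: float) -> int:
        # Round to the nearest multiple, never below one grid step.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


for name, aspect in [("1:1", 1.0), ("3:2", 1.5), ("16:9", 16 / 9)]:
    w, h = dims_for_budget(1.0, aspect)
    print(f"{name}: {w}x{h} = {w * h / 1e6:.2f} MP")
# 1:1: 1024x1024 = 1.05 MP
# 3:2: 1216x832 = 1.01 MP
# 16:9: 1344x768 = 1.03 MP
```

At a 1 MP budget this reproduces the three shapes quoted above; raising `megapixels` to a model's upper coherent limit gives the largest sensible size for each aspect ratio.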

Two consequences follow. Generating much past native produces artefacts — duplicated heads and limbs, repeated motifs — rather than mere softness; the SDXL paper (Podell et al. 2023) motivates its whole design around the resolution limits of the 512-native Stable Diffusion 1.x models. And the route past native is upscaling, not a bigger generation: render at native, then enlarge with a tiled-diffusion pass or a dedicated ESRGAN-class upscaler (ChaiNNer and Spandrel are on the clients page).

How far each model holds together before it needs that upscale step varies by backbone:

SD 1.5 is 512-native and falls apart not far above 768 — the limitation the SDXL paper (Podell et al. 2023) was written to address.
SDXL and its fine-tunes (including Pony V6 XL) are 1024-native (Podell et al. 2023) (~1 MP), comfortable to roughly 1.5 MP.
FLUX.1 dev / schnell sit at ~1 MP native and in practice stay coherent to around 2 MP (dev weights and card).
FLUX.2 dev generates up to 4 MP natively — Black Forest Labs’ explicit pitch is detail at that size without an upscale step.
Z-Image Turbo is trained at 1024 but generates to 2048×2048 (~4 MP) given the memory.
Midjourney and Ideogram are hosted services with fixed output sizes: presets, not a megapixel dial.

A LoRA does not change any of this — it rides the base, so the niche Flux-dev LoRAs inherit Flux.1’s ~1–2 MP envelope regardless of what they were trained to draw.

On a memory-rich machine the limiting factor is coherence, not memory: the question is how large each model can go before the image stops holding together, and larger images cost time rather than memory. On a small machine, memory binds first, and output size competes with the model weights for it; the clients page covers the compromises.

3.3 Starting points by goal

Photorealistic output for publication → FLUX.1 dev (or dev + LoRA)
Fast iteration on 16 GB, no licence constraint → Z-Image Turbo, FLUX.1 schnell GGUF, or FLUX.2 [klein] 4B
Specific aesthetic that probably exists as a fine-tune → check SDXL on CivitAI first
Anime, creature design, dramatic poses, negative-affect content → Pony V6 XL
Text legible inside the image → Ideogram (API) or FLUX.1 dev
Just want something that looks good without thinking about it → Midjourney
Need the deepest ControlNet or LoRA ecosystem → SD 1.5 (check SDXL first; if it’s not there, it may only be here)

With a model picked, the front-end clients page covers which app and runtime to run it through.

3.4 The rest of the menu

Open Draw Things and the model list is loooong — names like LTX, Wan 2.2, Hunyuan, ERNIE, HiDream-I1, Cosmos. Most of that length is the menu collapsing three different kinds of model into one alphabetical list. Video models (Wan 2.2, LTX-Video, Hunyuan) are a separate modality; for running them locally see the clients page. Cosmos is NVIDIA’s world foundation model for robotics and autonomous-vehicle simulation — it generates video, but as synthetic training data for embodied AI rather than as art, so for our purposes it is the one name to skip outright.

Among the image models proper, two more have ecosystems forming around them. HiDream-I1 (HiDream.ai, 17B, MIT) topped the Artificial Analysis board for a stretch of 2025 and has a modest but active LoRA following, with the lineage continuing in the O1 successor. ERNIE-Image (Baidu, 8B, Apache 2.0) is newer still — it occupies the text-rendering niche and picked up ComfyUI workflows and community quantizations within days of release, though it is too young to know whether the buy-in holds.

One prominent name is absent from the sketches despite filling a slot in every client’s menu: Stable Diffusion 3 / 3.5. SD3’s June 2024 launch went badly — weak anatomy, and a licence so restrictive that CivitAI temporarily banned all SD3 resources — and although Stability later revised the licence and shipped the improved 3.5, the community had already decamped to Flux. By the same community-action test, it has not earned a sketch: in 2026 its LoRA ecosystem is still thin beside SDXL or Flux. Being in the Draw Things menu and being something people actually build on are not the same thing.

4 Niche fine-tunes and LoRAs

A working menu of specialist fine-tunes and LoRAs, weighted toward idioms the default checkpoints render badly. Most are Flux.1-dev LoRAs (inheriting the FLUX.1 [dev] non-commercial licence); a few sit on Z-Image Turbo, which is markedly faster on memory-constrained hardware.

4.1 Historical engravings and prints

YFG Albrecht Dürer Engraving Style (Flux dev) — actually models Dürer’s burin line work (swelling/tapering strokes, cross-hatched form modelling) rather than the muddy “old print” pastiche most engraving LoRAs settle for. The closest thing to a proper copperplate idiom.
WOOD ENGRAVING Style for FLUX (Flux dev) — clean large-block woodcut, not mezzotint sludge. Useful when we want Gustave Doré rather than Piranesi.
gokaygokay/Flux-Engrave-LoRA (Flux dev) — gokaygokay is a well-known Hugging Face style-LoRA trainer with consistent technical quality. Good fallback if the CivitAI engraving LoRAs overfit on a particular subject matter.
Perfect Ink Drawing — Engraving Style (Z-Image Turbo) — one for memory-constrained work: cross-hatched etching look on a fast base that runs comfortably in 16 GB.

4.2 Scientific diagrams

YFG Patents (Flux dev) — trained on patent figures with numbered callouts, dashed hidden lines, exploded views. Most “blueprint” LoRAs only render whole machines; this one captures the schematic call-out idiom itself.
Anatomica: Chalk (Flux dev) — anatomical chalk-plate aesthetic, and the creator notes the style transfers to non-anatomical subjects, so we get “Vesalius treats a toaster”.
Century Botanical Illustration (Flux dev) — 18th-century plate aesthetic (stipple shading, Latin labels, ivory ground). Almost every “botanical” LoRA returns modern watercolour; this one does engraved plates.
Anatomical Surrealism (Flux dev) — Haeckel-meets-Da-Vinci hybrid of plate-style anatomy with mechanical and botanical forms.

Scientific-diagram coverage is thinner than one might hope. There is room here for a fine-tune on Ramón y Cajal-style neural plates or period microscopy.

4.3 Emoji and stickers

fofr/sdxl-emoji (SDXL) — the canonical Apple-emoji LoRA. Lightweight, well-known, transparent-bg-friendly via post-matte.
starsfriday/Kontext-Emoji-LoRA (FLUX.1-Kontext-dev) — different value proposition: style transfer to emoji. Feed a portrait, get the emoji version of that person. Kontext-native.

There is, as far as I can tell, no great isolated-icon-with-transparency model right now. Generate on a flat colour and matte out with rembg, BiRefNet, or SAM as a separate step.

4.4 Classic spot-colour posters

Retro Ad Flux (Flux dev) — mid-century print advertisements, not movie posters: limited palette, halftone dots, Rockwell-ish illustration. The closest cousin to Cassandre and Saul Bass currently available.
Doug Sneyd Illustration (Flux dev) — mid-century Playboy cartoon-illustration aesthetic. Fits the spot-colour-poster category sideways and has the confident flat-colour brushwork the era is known for.

There is, for example, no dedicated Saul Bass or Cassandre LoRA. People approximate the look by stacking Retro Ad Flux with prompt directions (limited palette, cut-paper shapes, named reference).

4.5 Other specialist idioms

Stained Glass Window Style (FLUX) (Flux dev, trigger SGW).

5 How do I download and use that cool model I found?

Hugging Face: model cards specify the format; most image models ship as safetensors and drop directly into ComfyUI’s models/ tree or Draw Things’ model manager. For programmatic use, the format conversion docs cover converting between the diffusers layout and single-file safetensors in both directions.
CivitAI: most .safetensors checkpoints and LoRAs import directly into Draw Things (URL paste) or ComfyUI (drop in the right subdirectory). On a Mac, filtering by “macOS-optimized” or “CoreML” tags finds Mochi Diffusion-compatible bundles. The clients notebook has step-by-step walkthroughs for each client.
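For the ComfyUI "drop in the right subdirectory" step, a small sketch of the mapping from file kind to folder may help. The subfolder names below assume a stock ComfyUI install; the helper name `comfyui_destination` and the `kind` labels are hypothetical, and some newer models ship as separate components that belong in other folders, so the model card has the final word.

```python
from pathlib import Path

# Assumed layout of a stock ComfyUI install; custom setups may differ.
COMFYUI_SUBDIRS = {
    "checkpoint": "checkpoints",
    "lora": "loras",
    "vae": "vae",
    "controlnet": "controlnet",
}


def comfyui_destination(comfy_root: str, kind: str, filename: str) -> Path:
    """Return where a downloaded .safetensors file would go under ComfyUI's models/ tree."""
    if kind not in COMFYUI_SUBDIRS:
        raise ValueError(f"unknown kind {kind!r}; expected one of {sorted(COMFYUI_SUBDIRS)}")
    if not filename.endswith(".safetensors"):
        raise ValueError("this sketch only handles .safetensors files")
    # Path(filename).name drops any directory part of the download path.
    return Path(comfy_root) / "models" / COMFYUI_SUBDIRS[kind] / Path(filename).name


print(comfyui_destination("ComfyUI", "lora", "downloads/engraving_style.safetensors").as_posix())
# ComfyUI/models/loras/engraving_style.safetensors
```

The function only computes the path; moving the file is left to `shutil.move` or the file manager of choice.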

6 Training a LoRA

The flip side of the menu above is making your own. The recipe is consistent across tools: a small dataset (commonly 10–30 images for a style, more for a character or concept), a caption per image, and somewhere between a few hundred and a couple of thousand steps; you choose a rank (the LoRA’s dimension), trading capacity against file size. Most of the craft is in dataset curation and captioning rather than hyperparameters — the community write-ups are unanimous on that, and I have not trained one myself, so treat the specifics as reported rather than tested. The same tools train a full fine-tune (a whole new checkpoint, DreamBooth-style) rather than an adapter — only with more data, more compute, and a multi-GB result; that extra cost is why LoRAs are the default unless the goal is a whole new base.
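The rank-versus-file-size trade comes down to simple arithmetic. A LoRA stands in for the update to a d_out × d_in weight matrix with two thin matrices of shapes d_out × r and r × d_in, so it stores r·(d_in + d_out) numbers per adapted layer. The sketch below is illustrative only: the layer shapes and layer count are hypothetical, not any real model's architecture, and 2 bytes per parameter assumes fp16 or bf16 storage.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_size_mb(layer_shapes, rank: int, bytes_per_param: int = 2) -> float:
    """Approximate adapter size in MB (10^6 bytes) summed over all adapted layers."""
    total = sum(lora_params(d_in, d_out, rank) for d_in, d_out in layer_shapes)
    return total * bytes_per_param / 1e6


# Hypothetical model: 100 adapted projection layers, each 3072 x 3072.
layers = [(3072, 3072)] * 100

for rank in (4, 16, 64):
    print(f"rank {rank:>2}: {lora_size_mb(layers, rank):6.1f} MB")

# The same layers stored in full, for comparison.
full_mb = sum(d_in * d_out for d_in, d_out in layers) * 2 / 1e6
print(f"full layers: {full_mb:.0f} MB")
```

Size grows linearly with rank, which is why doubling the rank doubles the file, while the full-layer figure does not depend on rank at all.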

Training is more hardware-hungry than inference, and the mature trainers are CUDA-first. The VRAM bar is high: Flux dev LoRA training wants roughly 24 GB and is comfortable at 48 GB, while FLUX.2 dev needs an 80 GB-class card. Without a large NVIDIA GPU, people therefore rent one by the hour or hand the job to a hosted service. On a Mac the one on-device option is Draw Things’ LoRA training, slow but real; even owners of a large-memory Mac usually still train in the cloud, because the trainers want CUDA.

Roughly in order of how much of the stack we manage ourselves:

Own stack, local (CUDA): ai-toolkit (ostris) is the current default for Flux and FLUX.2; kohya-ss/sd-scripts is the long-standing standard with the deepest control across SD 1.5, SDXL and Flux. OneTrainer and SimpleTuner are alternatives.
On-device on a Mac: Draw Things trains LoRAs on Apple Silicon directly — slow, but no rented hardware.
Rent-a-GPU: RunPod or Massed Compute — run ai-toolkit or kohya on a CUDA pod by the hour, the middle ground between local and fully managed.
Fully hosted (upload images, get a LoRA back): fal.ai flux-lora-fast-training, Replicate’s Flux trainers, and CivitAI’s on-site trainer (priced in Buzz credits — around 2000 for a Flux LoRA).

7 Punditry

Stable Diffusion is a really big deal — Willison’s 2022 take; interesting now as a prediction document.
Porean Stablediffusion drama roundup — community politics ca. 2022; see also AI democratization for the fuller back-story.

8 Creative latitude, model alignment, and jurisdiction

8.1 Alignment and the default outputs

Most production-facing image models are post-trained with RLHF (reinforcement learning from human feedback) or DPO (direct preference optimization) passes that align outputs with human preference ratings. Diffusion-DPO (Wallace et al. 2023) is a representative public example of the technique, and there is now a systematic survey of these methods for diffusion models. In practice, preference ratings correlate positively with positive affect and conventional aesthetics, and negatively with content that tripped US-market human reviewers during training. The result is familiar to anyone who has tried to generate a mushroom cloud for an anti-war poster, a face that stays sad, or a creature design that reads as threatening. This is a training choice, not an architectural constraint. The same base weights with different post-training produce different behaviour; the open ecosystem exists, in part, because other choices are possible.
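To see why preference post-training pulls outputs toward whatever raters preferred, it helps to look at the shape of the objective. The sketch below is the generic DPO loss for a single preference pair, written with scalar log-probabilities; it is not the Diffusion-DPO implementation, which (as I understand the paper) swaps the intractable image log-likelihoods for a bound built from denoising errors. The function name and the toy numbers are mine.

```python
import math


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Generic DPO loss for one (preferred, rejected) pair.

    logp_w / logp_l: log-probability of the preferred / rejected sample under the
    model being trained; ref_*: the same under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Model identical to the reference: no preference learned yet, loss = log 2.
print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
# Model has shifted probability toward the preferred sample: loss drops.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0))
```

The loss only falls when the model raises the preferred sample relative to the rejected one, measured against the reference. Whatever systematic taste the raters had is therefore exactly what gets reinforced.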

The client pipeline adds a second layer of conservatism — a CLIP-based classifier (CLIP being the image-text matching model that scores how well an image matches a text label) that fires after the image is generated and blacks out or blurs the result. These are separate constraints: a model with no alignment bias could still have a restrictive client wrapper, and vice versa. In diffusers-based pipelines the safety checker is a separately-loaded model component that can simply not be loaded; it is not part of the generation weights. The front-end clients page covers which clients load it by default; this section is about the models themselves.

8.2 Where the extra latitude comes from

Three separable things account for most of the gap between a cheery production endpoint and a model that renders what we asked. The capability and speed of the models named below are in the lineage sketches and feature matrix; what follows is only the latitude dimension.

Skipping the alignment pass. A fine-tune trained on raw character or creature art, without the RLHF passes of a production model, keeps a wider emotional and compositional range — negative affect, dramatic poses, non-cute creatures, the content a default checkpoint renders as cheerful regardless of the prompt. Pony Diffusion V6 XL is the canonical example. The range is a training difference, recoverable by anyone who post-trains differently — not a property of the architecture.

The separable safety checker. The output-side filter is client software, not model weights (the clients page covers which clients load it). On the model side, the base SD 1.5 / SDXL weights before any post-training carry fewer constraints than the fine-tunes every API provider ships — which is why providers ship the fine-tune, not the raw release.

Permissive licensing. Latitude in the licence is a different axis from latitude in the weights, and three regimes are in play:

Apache 2.0 (FLUX.1 schnell, FLUX.2 [klein] 4B, Z-Image Turbo) places no restriction on subject matter at all.
OpenRAIL — and its SD-family variant CreativeML OpenRAIL-M, which Stable Diffusion 1.5 used and Pony inherits — keeps the weights open but attaches a use-restriction clause prohibiting a listed set of harmful applications (child sexual abuse material, certain disinformation uses). It restricts what the model is used for, not who uses it.
Non-commercial (FLUX.1 [dev], FLUX.2 [klein] 9B and [dev]) restricts commercial use of the weights, and Black Forest Labs ships a separate content-filter module alongside, which is often conflated with the weights but is distinct from them.

So the most permissively-licensed open weights are the Apache-2.0 ones, while the most permissive in content are the RLHF-skipped fine-tunes — and those need not be the same model.

CivitAI hosts the long tail of these fine-tunes (the Hugging Face vs CivitAI comparison covers the platform difference). Two alignment-relevant specifics: its content-gating is opt-in, so the default browse view is filtered but the full catalogue is not; and most uploads inherit their backbone’s licence, so an SD-family fine-tune is usually CreativeML OpenRAIL-M whether or not its page says so.

8.3 Negative-valence imagery

RLHF alignment correlates with positive affect on underspecified prompts — this has theoretical backing in the preference alignment literature (Wallace et al. 2023) but AFAICT has not been studied specifically for image emotional valence; treat it as well-grounded scuttlebutt rather than settled science. Community prompt engineering guides report that explicit physical descriptors work better than emotional labels: “tears streaming, jaw clenched, hollow eyes” rather than “sad character.” Adding positive-affect terms to the negative prompt — happy, smiling, cheerful, resolved, peaceful — is a commonly-reported workaround; the mechanism would be steering the sampler away from the part of latent space reinforced by positive preference ratings. I have found this useful but have no formal evidence it generalises.

The behaviour varies by model family and post-training regime. Fine-tunes that skipped RLHF passes respond differently; Pony Diffusion is the main example with a substantial community behind it.
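The descriptor-plus-negative-prompt workaround described above can be sketched as a small helper that assembles the two strings. The keyword names `prompt` and `negative_prompt` match what Stable Diffusion-style pipelines in `diffusers` accept, but that is an assumption about the client: some pipelines and distilled models do not accept or honour a negative prompt, so check before relying on it. The helper name and the example subject are hypothetical.

```python
# Positive-affect terms to steer away from, taken from the list above.
POSITIVE_AFFECT_TERMS = ["happy", "smiling", "cheerful", "resolved", "peaceful"]


def negative_valence_kwargs(subject: str, descriptors: list[str],
                            extra_negative: tuple[str, ...] = ()) -> dict[str, str]:
    """Build prompt strings that describe affect physically rather than by label."""
    prompt = ", ".join([subject, *descriptors])
    negative = ", ".join([*POSITIVE_AFFECT_TERMS, *extra_negative])
    return {"prompt": prompt, "negative_prompt": negative}


kwargs = negative_valence_kwargs(
    "portrait of an old fisherman",
    ["tears streaming", "jaw clenched", "hollow eyes"],
)
print(kwargs["prompt"])
# portrait of an old fisherman, tears streaming, jaw clenched, hollow eyes
print(kwargs["negative_prompt"])
# happy, smiling, cheerful, resolved, peaceful
```

The resulting dictionary can be splatted into whatever generation call the client exposes, for example `pipe(**kwargs)` where `pipe` is an already-loaded pipeline that supports both keywords.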

For politically or historically charged imagery — mushroom clouds, battle scenes, burned buildings — there is a further distinction. Client-level classifiers (the safety_checker, InvokeAI’s NSFW blur) fire on the output image, not the prompt; the model weights themselves often do not have the corresponding restriction. Framing such prompts in historical or clinical registers (“nuclear test, Bikini Atoll, 1952 archival photograph, monochrome, fallout plume”) is anecdotal community knowledge — I have seen it reported to help bypass output classifiers, but have no systematic evidence for it, and how durable it is against classifier updates is unknown.

8.4 Jurisdiction

The legal landscape for training on copyrighted images, running inference, and publishing outputs differs across the jurisdictions relevant to a researcher working between Asia, Australia, and Europe. What follows is a snapshot as of mid-2026; case law is thin in all four regimes and several questions are actively contested. One term recurs throughout: a TDM (text and data mining) exception is the copyright carve-out that lets a model train on in-copyright works without first licensing them.

	Japan	Singapore	EU	Australia
Training on copyrighted images	Article 30-4, Copyright Act — non-waivable safe harbour for “information analysis / pattern recognition” uses; covers commercial and non-commercial	Section 244, Copyright Act 2021 — non-waivable computational data analysis exception; scope is “freely available” works	DSM Directive 2019/790 Arts. 3–4 TDM exception; Article 4 allows rightholders to opt out via machine-readable reservation	No TDM exception — government rejected Productivity Commission proposal in October 2025; s.40 Copyright Act 1968 fair dealing for research is narrow and enumerated; ML training likely reproduction infringement
Running inference / publishing outputs	Agency for Cultural Affairs 2024 guidance notes Art. 30-4 may not cover inference outputs that “recreate the enjoyment” of a specific creator’s style — contested; no court ruling	Fair use (Section 190); transformative purpose is one of four factors; no case law specific to AI image outputs yet	Watermarking and AI disclosure mandatory from 2 August 2026 under AI Act Art. 50; parody defence narrow and member-state-dependent	Inference outputs generally do not reproduce training data (sufficiently transformed); s.41A fair dealing for parody/satire available but narrow — copyright material must itself be satirised (Universal Music v Palmer [2021] FCA 434); no AI-specific case law
Copyright in AI output	Emerging interpretation: sufficiently detailed prompts may create copyrightable output; no settled case law	Human authorship requirement means purely AI-generated outputs are not automatically copyrightable	No unified EU position; most member states treat purely AI-generated output as uncopyrightable; contested at the CJEU (Court of Justice of the EU) level	No copyright without human authorship (s.32(3) Copyright Act 1968); no computer-generated works provision (unlike UK CDPA s.9(3)); government consulting on reform, no timeline
Content restrictions	No AI-specific content ban; general obscenity law applies; April 2026 Justice Ministry panel examining deepfakes	May 2026 Online Criminal Harms Act guidelines tighten enforcement on AI-generated NCII (non-consensual intimate imagery) of real persons; no restriction on artistic/research content not depicting real persons	AI Act Art. 50: machine-readable watermark + disclosure mandatory on all synthetic images from 2 August 2026	Online Safety Act 2021 governs published content; Criminal Code Amendment (Deepfake Sexual Material) Act 2024 creates federal offences for non-consensual intimate deepfakes (up to 6 years); SA, NSW, QLD state laws adding further restrictions; locally-generated unpublished content generally unregulated
Parody / satire	Not a statutory exception; court discretion; no settled case law for AI-generated parody	Transformativeness is a fair-use factor; research/commentary purpose can qualify; no AI-specific precedent	Narrow member-state exceptions; not harmonized; courts have not addressed AI-generated parody intent	Specific s.41A fair dealing exception (added 2006); test from Universal Music v Palmer [2021]: the copyright material itself must be the target of parody; human creative intent required; untested for AI-generated parody

Japan: The interpretive question is whether inference “for the enjoyment of the expression” falls outside Art. 30-4’s scope. The 2024 Agency for Cultural Affairs guidance is advisory, not binding case law. Publisher suits against AI companies (Asahi/Mainichi v. Perplexity, filed September 2025) may produce clearer precedent; outcomes are pending as of this writing.

Singapore: Section 244 was modelled on Japan’s Art. 30-4 and is explicitly non-waivable — copyright owners cannot eliminate it via their terms of service (ToS), which matters when using images from platforms with “no AI training” clauses. The May 2026 Online Criminal Harms Act guidelines target AI-generated NCII of real persons specifically, not artistic or scientific generation.

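EU: C2PA metadata (the Coalition for Content Provenance and Authenticity specification, promoted by the Content Authenticity Initiative) is the emerging standard for Art. 50 compliance: machine-readable provenance embedded at generation time. The c2patool CLI is an open-source implementation. Non-compliance with the Art. 50 transparency obligations from 2 August 2026 attracts fines of up to €15M or 3% of global annual turnover (Art. 99(4)); the €35M / 7% tier applies to prohibited practices under Art. 5.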

Australia: The most restrictive of these four jurisdictions for training — no TDM exception and no general fair use doctrine, only enumerated fair dealing categories. For published research outputs, the deepfake provisions in the Criminal Code and state laws are expanding rapidly (three state-level laws passed between November 2025 and early 2026); the common thread across all of them is that they target distribution of intimate imagery involving real persons, not research generation. The s.41A parody exception exists and has been tested (Universal Music v Palmer), but courts will want evidence of human satirical intent, which creates an awkward question for AI-generated outputs where the human’s role was primarily prompt engineering.

8.5 Hygiene

Several things seem to reduce exposure, per practitioners and lawyers in this area; none is a guarantee.

Documenting purpose before generating — a paragraph in a lab notebook or file header — creates contemporaneous evidence of intent. Fair-use and fair-dealing analysis in all four jurisdictions makes purpose and intent explicit factors.

Keeping prompt logs for published work has become a practical precaution. Courts have begun ordering preservation of AI generation records specifically: the SDNY’s May 2025 order in New York Times v. OpenAI required OpenAI to preserve “all output log data on a going forward basis”; a National Law Review analysis (Feb 2026) notes that GenAI prompts, outputs, and logs are now treated as discoverable ESI (electronically stored information) under standard rules. Retrospective reconstruction is harder to defend than contemporaneous records.

Embedding C2PA provenance metadata at generation time — mandatory in the EU from August 2026 — is useful elsewhere for the same reason: it makes the origin of an image legible without relying on the viewer to trust a caption. Draw Things and some ComfyUI node packs support it; the c2patool CLI works on any file.

Distributing NSFW or dramatically charged research imagery only within a research team or behind access controls is a common precaution. All four jurisdictions hinge liability primarily on distribution, not generation, which means the same image can be lower-risk as an unpublished research artefact than as a blog illustration.

Identifiable real persons in embarrassing or intimate contexts carry civil and — in some jurisdictions — criminal exposure regardless of artistic intent.

If our work may reach multiple jurisdictions, using EU Art. 50 requirements as a floor (watermark + human-readable “AI-generated” disclosure) should cover the disclosure side in all four regimes at once; it does not settle the copyright questions above.
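The purpose-note and prompt-log precautions above are cheap to automate. Here is a minimal sketch of a contemporaneous generation log as an append-only JSON Lines file, using only the standard library. The field names and function names are my own choices rather than any legal or industry standard, and hashing the output bytes is an assumption about what would make a record easy to tie to a specific image later.

```python
import hashlib
import json
from datetime import datetime, timezone


def generation_record(prompt: str, negative_prompt: str, model: str,
                      seed: int, image_bytes: bytes, purpose: str) -> dict:
    """Describe one generation: what was asked, of which model, why, and what came out."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "purpose": purpose,
        "model": model,
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "seed": seed,
        # A hash ties the record to the exact output file without storing the image.
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
    }


def append_record(log_path: str, record: dict) -> None:
    """Append one record as a single JSON line; earlier lines are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


record = generation_record(
    prompt="nuclear test, archival photograph, monochrome",
    negative_prompt="",
    model="example-model",          # placeholder name
    seed=1234,
    image_bytes=b"not a real image",  # placeholder bytes
    purpose="illustration for an anti-war poster essay",
)
print(json.dumps(record, indent=2))
```

Calling `append_record("generations.jsonl", record)` right after each generation gives one line per image, which is easier to defend than a log reconstructed after the fact.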

9 References

Dhariwal, and Nichol. 2021. “Diffusion Models Beat GANs on Image Synthesis.” arXiv:2105.05233 [Cs, Stat].

Dutordoir, Saul, Ghahramani, et al. 2022. “Neural Diffusion Processes.”

Han, Zheng, and Zhou. 2022. “CARD: Classification and Regression Diffusion Models.”

Ho, Jain, and Abbeel. 2020. “Denoising Diffusion Probabilistic Models.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20.

Hoogeboom, Gritsenko, Bastings, et al. 2021. “Autoregressive Diffusion Models.” arXiv:2110.02037 [Cs, Stat].

McCormack, Llano, Krol, et al. 2024. “No Longer Trending on Artstation: Prompt Analysis of Generative AI Art.”

Nichol, and Dhariwal. 2021. “Improved Denoising Diffusion Probabilistic Models.” In Proceedings of the 38th International Conference on Machine Learning.

Podell, English, Lacey, et al. 2023. “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis.”

Sohl-Dickstein, Weiss, Maheswaranathan, et al. 2015. “Deep Unsupervised Learning Using Nonequilibrium Thermodynamics.”

Song, Yang, and Ermon. 2020a. “Generative Modeling by Estimating Gradients of the Data Distribution.” In Advances In Neural Information Processing Systems.

———. 2020b. “Improved Techniques for Training Score-Based Generative Models.” In Advances In Neural Information Processing Systems.

Song, Jiaming, Meng, and Ermon. 2021. “Denoising Diffusion Implicit Models.” arXiv:2010.02502 [Cs].

von Platen, Patil, Lozhkov, et al. 2022. “Diffusers: State-of-the-Art Diffusion Models.”

Wallace, Dang, Rafailov, et al. 2023. “Diffusion Model Alignment Using Direct Preference Optimization.”

Yang, Zhang, Song, et al. 2023. “Diffusion Models: A Comprehensive Survey of Methods and Applications.” ACM Computing Surveys.

Footnotes

1. My on-ramp was Adventures in Finetuning Stable Diffusion (Pokémon fine-tuning, 2022).