Editing images with machine learning

2018-10-16 — 2026-04-19

Wherein the text-instruction paradigm for image editing is surveyed alongside single-task tools for upscaling, background removal, and face restoration, and the field’s rapid rate of change is noted.

computers are awful
generative art
making things
photon choreography

This page is about editing an existing image with ML — not generating one from a prompt. For text-to-image generation, see generative art with diffusion models; for the pre-diffusion neural-art lineage (DeepDream, GANs, CPPNs and friends), see the historical record. For non-ML editing, see GUIs and the command line.

This list will rot. The state of the art in image-editing models moves in months, sometimes weeks. Hours if you are on the right Discord servers. For something current, the trending tab on Hugging Face, the news threads at r/StableDiffusion, and CivitAI are better than this notebook.

There was, around 2019–2023, a brief efflorescence of small ML companies doing one trick each — sharpen, upscale, remove a background, restore a face. Most have been bought up by Adobe, abandoned, or made redundant by general-purpose models. We will not eulogise them here. The survivors are below.

1 Instruction-following editors

The interesting category in 2026: image plus a text instruction in, edited image out. “Remove the lamppost.” “Make this a watercolour.” “Extend the canvas to 16:9 and fill the new space with sky.” The same pipeline that runs text-to-image now runs in reverse, conditioned on the input image.

The pattern has three flavours worth distinguishing:

  • Open-weights edit models that we can run on our own GPU or via API. Most flexible, slowest to set up.
  • Closed hosted edit endpoints from the big labs. Fast, cheap-per-call, opaque.
  • Frontier multimodal LLMs with image editing baked in. The chatbot we use for text now also edits images, often well, with the conversation history as implicit context.

Durable lineages worth tracking:

  • FLUX.1 Kontext (Black Forest Labs, the post-Stability splinter that built the Flux family). Open-weights “dev” version (non-commercial), paid “pro” / “max” via API. The first open-weights edit model that doesn’t feel like a toy — runs in ComfyUI / InvokeAI alongside generation pipelines.
  • Qwen-Image-Edit / Qwen-Image-Edit-2509 (Alibaba). Open-weights, strong on text-in-image edits and on Chinese-language prompts. The 2509 update is the one to grab as of this writing.
  • Gemini 2.5 Flash Image (“Nano Banana” in the Gemini app, available via Vertex AI). Cheap, fast, conservative about identity preservation — faces don’t drift much across edits, which is great for not terrifying our social brains.
  • GPT-Image-1 (OpenAI). Edit-mode endpoint of the same model that powers ChatGPT image generation.
  • Adobe Generative Fill inside Photoshop, on Adobe’s Firefly model. Most-deployed by a wide margin, because Photoshop.
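
Whichever endpoint we pick, the call shape is the same: an image, an instruction, a handful of knobs. A minimal sketch of packaging such a request; the field names here (`prompt`, `image`, `size`) follow the common pattern but are illustrative, not any one provider's schema:

```python
import base64
from pathlib import Path

def build_edit_request(image_path, instruction, size="1024x1024"):
    """Package an image plus a text instruction for a hosted edit endpoint.

    The field names (`prompt`, `image`, `size`) are a generic sketch of
    the pattern the big-lab edit APIs share; check your provider's docs,
    since every endpoint names and encodes them slightly differently.
    """
    img_b64 = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return {
        "prompt": instruction,  # the edit instruction, plain text
        "image": img_b64,       # the input image, base64-encoded
        "size": size,           # requested output resolution
    }
```

The actual POST, authentication, and response decoding differ per provider; this only shows the payload shape.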

Papers for the open-weights ones: the FLUX.1 Kontext report (Batifol, Blattmann, et al. 2025) and the Qwen-Image technical report (Wu, Li, Zhou, et al. 2025).

2 Single-task tools that are still useful

Even with instruction-following editors, a specialised tool is often cheaper, faster, or more predictable. The ones I keep returning to:

2.1 Background removal

  • remove.bg — hosted, cheap, fast.
  • Clipping Magic — hosted, marginally more configurable, with a nice user interface.
  • BiRefNet — open-weights, runs locally, scriptable.
  • RMBG-2.0 (Bria) — open-weights, commercial-friendly.
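
Whatever produces the matte, using it downstream is plain alpha compositing: each output pixel is `alpha * foreground + (1 - alpha) * background`. A dependency-free sketch on nested-list images (the matting model, e.g. BiRefNet or RMBG, supplies `alpha`; scale its output to [0, 1] first):

```python
def composite(fg, alpha, bg):
    """Composite a foreground over a background using an alpha matte.

    fg, bg: nested lists of [R, G, B] pixels (0-255).
    alpha:  matching nested list of floats in [0, 1], as produced
            (after scaling) by a matting model.
    """
    out = []
    for fg_row, a_row, bg_row in zip(fg, alpha, bg):
        row = []
        for f, a, b in zip(fg_row, a_row, bg_row):
            # per-channel linear blend, rounded back to integer levels
            row.append([round(a * fc + (1 - a) * bc) for fc, bc in zip(f, b)])
        out.append(row)
    return out
```

In practice we would do this with NumPy or PIL over real arrays; the arithmetic is identical.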

Papers: Zheng, Gao, Fan, et al. (2024) for BiRefNet.

2.2 Upscaling and super-resolution

  • Topaz Gigapixel — paid, the polished commercial option for photographic upscaling.
  • Upscayl — free, open-source, GUI wrapping Real-ESRGAN and friends. Cross-platform, drag-and-drop, no setup.
  • Real-ESRGAN — open-weights workhorse; runs as a node in ChaiNNer, or as a CLI on its own.
  • 4x-UltraSharp and many similar community-trained ESRGAN model files circulate on Hugging Face and CivitAI.
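
Large images get upscaled in overlapping tiles so the model fits in VRAM; Real-ESRGAN's tile option does the equivalent internally, and seam blending happens in the overlap region. The 1-D coordinate arithmetic, as a sketch:

```python
def tile_spans(length, tile, overlap):
    """(start, end) spans covering `length` pixels with `tile`-sized
    windows that overlap by `overlap` pixels, so seams can be blended.
    The last span is clamped to the image edge."""
    assert 0 <= overlap < tile  # otherwise the stride would not advance
    if length <= tile:
        return [(0, length)]  # image already fits in one tile
    stride = tile - overlap
    spans = []
    start = 0
    while True:
        end = start + tile
        if end >= length:
            spans.append((length - tile, length))  # clamp final tile
            break
        spans.append((start, end))
        start += stride
    return spans
```

Apply it to both axes and you have the tile grid; upscale each tile, then paste with a feathered blend over the overlaps.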

Papers: Wang, Xie, Dong, et al. (2021) for Real-ESRGAN.

2.3 Object removal and inpainting

  • cleanup.pictures — quick browser tool for removing people, text, and small defects from a single image.
  • For heavier inpainting workflows — mask + prompt, control over what fills the hole, regional generation — the local generation GUIs in the diffusion notebook (ComfyUI, InvokeAI, Draw Things) all do this on top of Stable Diffusion / Flux / SDXL backbones.
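
One practical detail for any of these workflows: inpainting masks usually want to be grown a few pixels past the object's true edge, or the fill leaves a halo of the original object behind. A sketch of binary dilation on a nested-list mask:

```python
def dilate(mask, r=1):
    """Grow a binary mask by r pixels (Chebyshev distance), so the
    inpainting model paints a little past the object's true edge.
    mask: 2-D list of 0/1."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # stamp an r-neighbourhood around every masked pixel
                for dy in range(-r, r + 1):
                    for dx in range(-r, r + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            out[yy][xx] = 1
    return out
```

The GUIs expose this as a "mask blur" or "mask expansion" slider; this is what the slider does, minus the Gaussian feathering.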

2.4 Face restoration

GFPGAN and CodeFormer are the standard open-weights face restorers — they fix small-resolution or compression-damaged faces. Both are available as nodes in ChaiNNer.
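
CodeFormer exposes a fidelity weight `w` in [0, 1]: low `w` favours the cleaned-up reconstruction, high `w` favours faithfulness to the input face. It applies the weight to internal features, not pixels, but the trade-off it controls can be sketched as a pixel-space blend:

```python
def blend(original, restored, w):
    """Linearly mix a restored face crop with the original pixels.
    w = 0 returns the restorer's output, w = 1 the untouched input.
    This is a pixel-space illustration only; CodeFormer's actual
    fidelity weight operates on internal features.
    original, restored: 2-D lists of grayscale values."""
    return [
        [round(w * o + (1 - w) * r) for o, r in zip(orow, rrow)]
        for orow, rrow in zip(original, restored)
    ]
```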

Papers: Wang, Li, Zhang, et al. (2021) for GFPGAN; Zhou, Chan, Li, et al. (2022) for CodeFormer.

2.5 Document cleanup, old-skool

ScanTailor Advanced does content-aware cropping, dewarping, and background removal of scanned documents. Strictly classical Computer Vision — no generative model — but the use case is alive and the tool still works well, albeit with spotty maintenance. The macOS bundler is at yb85/scantailor-advanced-osx.
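
As a toy illustration of the classical approach (this is not ScanTailor's actual algorithm, just the simplest version of the idea): estimate the paper shade from the pixel statistics, then push everything at or above it to pure white, leaving the ink alone.

```python
def flatten_background(page, white=255):
    """Crude classical document cleanup: take the median pixel as a
    paper-shade estimate, clamp anything at least that bright to pure
    white, keep darker ink untouched. ScanTailor's binarisation stage
    is this idea with far better statistics (plus dewarping and
    content-aware crops). page: 2-D list of grayscale values 0-255."""
    flat = sorted(v for row in page for v in row)
    paper = flat[len(flat) // 2]  # median as paper-shade estimate
    return [[white if v >= paper else v for v in row] for row in page]
```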

3 Chaining model pipelines

ChaiNNer (source) is a node-based GUI for chaining image-processing operations, originally built for upscaling. Its Python core, Spandrel (PyPI), supports a wide catalogue of PyTorch architectures for super-resolution, restoration, and inpainting. Useful when we want to run Real-ESRGAN → GFPGAN → format conversion over a folder without writing a script.
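
The same chain can be scripted without the GUI: each stage is just a callable run in order over every file in a folder. A minimal sketch, with stages as bytes-to-bytes functions (in real use each stage would wrap a model call, e.g. via Spandrel):

```python
from pathlib import Path

def run_pipeline(stages, src_dir, dst_dir):
    """Apply a list of image-processing stages, in order, to every file
    in src_dir, writing results under the same name in dst_dir. Each
    stage is any callable bytes -> bytes (an upscaler, a face restorer,
    a format converter); this mirrors a ChaiNNer node graph, minus the
    GUI."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for f in sorted(Path(src_dir).iterdir()):
        if not f.is_file():
            continue
        data = f.read_bytes()
        for stage in stages:  # stages compose left to right
            data = stage(data)
        (dst / f.name).write_bytes(data)
```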

For heavier diffusion-based workflows, see ComfyUI / InvokeAI / Draw Things in the diffusion notebook.

4 References

Black Forest Labs (Batifol, Blattmann, et al.). 2025. “FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space.”
Soman. 2020. “GIMP-ML: Python Plugins for Using Computer Vision Models in GIMP.” arXiv:2004.13060 [cs].
Wang, Li, Zhang, et al. 2021. “Towards Real-World Blind Face Restoration with Generative Facial Prior.” In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Wang, Xie, Dong, et al. 2021. “Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data.” In 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).
Wu, Li, Zhou, et al. 2025. “Qwen-Image Technical Report.”
Zheng, Gao, Fan, et al. 2024. “Bilateral Reference for High-Resolution Dichotomous Image Segmentation.” CAAI Artificial Intelligence Research.
Zhou, Chan, Li, et al. 2022. “Towards Robust Blind Face Restoration with Codebook Lookup Transformer.” In Advances in Neural Information Processing Systems 35.