Code agents and assistants

Turing-complete autocorrect, vibe-coding, …

2021-10-14 — 2026-06-29

Wherein Various Terminal Harnesses, Cloud Editors, and Model-Agnostic Agents for AI-assisted Coding Are Surveyed, With Attention to How Each Retrieves Relevant Source Code From a Repository.

faster pussycat
language
machine learning
making things
neural nets
NLP
signal processing
stringology
UI

This is a cousin to neural automata—writing machines that generate code for us, because code generation is a fancy form of text generation, which uses similar technology, i.e. large language models.

Two aspects make this work: the model and the interface.

The terminology (harness, skill) etc. is covered elsewhere. Running backends offline, on a Mac at least, is elsewhere again. The mathematical analogue has some similarities.

Here, I write down which coding agents I can tolerate.

Figure 1

1 Security

I’m vaguely concerned about how much of the world’s source code now gets uploaded to a handful of code servers, and the potential for abuse scales with the code mountain, accreting there like a giant fatberg of latent expertise and corporate know-how. Anyway, the developer arms race is real, so let’s all ignore it and keep flushing our code into their pipes, eh?

2 General advice

Keep it simple.

3 Cloud and commercial harnesses

The cloud-hosted and commercial editors. None has a serious local-model path.

3.1 Claude-code

anthropics/claude-code: “Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you write code faster by executing routine tasks, explaining complex code, and handling git workflows — all through natural language commands.”

It’s a command-line tool — a nicely designed one that interacts gracefully with a normal IDE.

npm install -g @anthropic-ai/claude-code

Pro-tip: Hooks reference

I have little to say about Claude Code; it is too thoroughly documented elsewhere/everywhere and in any case moves too fast to be worth explaining.

Sebastian Raschka points out that we can force Claude Code to use a local model (!)

ollama launch claude

3.2 Cursor

Cursor:

Cursor is an AI-powered code editor that helps you build software faster.

It’s another VS Code fork with its own AI engine and some extra UI affordances. My colleagues assure me the fork causes far fewer annoying psychoses and sidetracks than Copilot. Cloud by design, though — there is no serious local-model path here.

3.3 GitHub Copilot

GitHub Copilot now uses some off-the-shelf GPT-4 model (if I recall correctly) for code completions. The original Codex engine was strikingly good, and I don’t think the general-purpose models have matched it, even years later.

Figure 2: Looks like AI Safety is going fine in GitHub Copilot.

GitHub Copilot has a great workflow for automatic completions, and that’s what originally made me pay for it. Still acceptably useful, this bit.

Since then, they’ve rolled out extra chat interfaces. The new generation of tools is janky and only semi-reliable, at least in my experience. They’re bad at following instructions, mess up basic stuff like indentation, and aren’t especially fast. Occasionally they’ll forget they’re supposed to edit code and instead talk about editing code, dump weird repeated sections into the file, or just delete random stuff. It’s a bit like coding with a drunk genius, which is to say occasionally brilliant but usually messy.

OpenAI squandered their early lead.

Pro-tip: Behind a corporate firewall Copilot needs a specific set of whitelist exceptions.

3.4 Warp

Warp is an all-in agentic terminal/IDE — mixed-model, codebase embeddings, MCP, the lot. Heavily hyped; I haven’t tried it.

4 Model-agnostic and offline harnesses

There are too many of these damn things now—everyone writes their own agent. They speak plain OpenAI-compatible HTTP, so I can point them at a commercial token host or a local server — Osaurus on localhost:1337, ds4-server on 127.0.0.1:8000, or Ollama on 11434 are the ones I like. With local models the limiting factor is the model, not the harness.

4.1 OpenCode

OpenCode is a generic terminal harness — see the main OpenCode write-up. As the name suggests, it is especially useful for coding, working even offline: point it at a local endpoint with a custom-provider baseURL and it wraps LSP, MCP, and a plugin system around the model. The official VS Code extension exists but looks shonky atm.

4.1.1 MiMo Code

Xiaomi’s MiMo Code (MIT) is a fork of OpenCode adding long-horizon memory and a self-improvement layer; it inherits the custom-provider path but probably works best with Xiaomi’s own model.

4.2 Codex CLI

OpenAI’s Codex CLI (openai/codex, Apache-2.0) is a terminal harness and — surprisingly for OpenAI — open-source. Naming caveat: this Codex is the agent, not the sadly-mourned original Codex code model that powered early Copilot nor the GPT-based successors. Still, it is worth taking on its own terms. For example, it need not plug in to OpenAI. We can instead point it at whatever we want, even a local model mode.

brew install --cask codex

Codex seems to occupy a similar niche to OpenCode but deviates in two ways: First, it sandboxes by default — Apple Seatbelt on macOS, Landlock/seccomp on Linux — reducing the blast-radius of rogue agents. Second, Sebastian Raschka clocked it as the most token-frugal of the three harnesses he measured (Codex < Qwen-Code < Claude Code), which matters especially with a slow or constrained local model. Pi is likely even more frugal, at the price of a steeper learning curve.

Config to use an open endpoint was not immediately obvious to me. We must configure a ~/.codex/local.config.toml profile invoked as codex --profile local:

# ~/.codex/local.config.toml — overlaid on config.toml by `codex --profile local`
model_provider = "osaurus"      # our own provider id (see table below)
model          = "qwen3.6:35b"  # whatever the server is serving

[model_providers.osaurus]
name     = "Osaurus"
base_url = "http://localhost:1337/v1"    # OpenAI-compatible endpoint; swap for ds4 on 8000, omlx, etc.
wire_api = "chat"                        # these servers speak Chat Completions, not OpenAI's Responses API

There are pre-defined model provider entries for openai/ollama/lmstudio.

Note that the project-level profile .codex/config.toml ignores model_provider/openai_base_url for trust reasons.

Interestingly Raschka also found Qwen3.6 scoring better under Codex than under its parent company’s own Qwen-Code.

4.3 pi

earendil-works/pi (Mario Zechner / badlogic, MIT) a.k.a. Pi Coding Agent, is a generic minimalist harness. Pi’s philosophy is idiosyncratic enough that I introduced it in the LLM Agents page as a whole alternate way of doing things. What matters here is that it runs offline and is the harness driving antirez’s famous ds4 on a Mac. There’s a learning curve, but notionally a rewarding one.

npm install -g @earendil-works/pi-coding-agent   # gives us the `pi` command
# extend it with packages: pi install npm:@foo/pi-tools

4.4 Aider

Aider (Aider-AI/aider, Apache-2.0) is “AI pair programming in your terminal.”

uv tool install --force --python python3.12 --with pip aider-chat@latest
# or the (uv-based) one-liner: curl -LsSf https://aider.chat/install.sh | sh

aider is a git-native AI assistant, in the following sense: we execute it inside a git repository, tell it which files we’re working on, and describe the change we want in plain English at a chat prompt. Aider sends the model those files (plus a repo-map of everything else so it has context), gets an edit back, and writes it straight to the files on disk — then commits it, each change as its own git commit with a generated message. So we never copy-paste code out of a chat window, and reviewing or undoing what the model did is just git diff and git revert.

That git-per-change discipline is whole potato. It is less an autonomous agent that wanders off and does ten things, more a tightly scoped tool kept well scope by using git affordances. Harper Reed’s workflow gives us the flavour.

For our purposes it is model-agnostic — point it at any OpenAI-compatible base_url, or use the ollama/ prefix for a local model — and the repo-map means a smaller local model still gets relevant context without us hand-feeding it files.

It has some nifty automations, linting and running the test suite after each change and using the logs from that to inform the model

Slight variant: --watch-files mode leaves aider running while we work in our normal editor, watching for AI! / AI? comments we drop into the code and acting on them, so we never have to leave the IDE.

# Make a snake game. AI!
# What is the purpose of this method AI?

Less ambitious, less risky, elegant.

4.5 Goose

Goose (Linux Foundation, Apache-2.0) is a general-purpose harness — desktop app plus CLI, grown-up enough to have been adopted by the Linux Foundation. For coding specifically it speaks the Agent Client Protocol, so it can back Zed or JetBrains as the agent behind the editor, and also has some code plugins I have not tried yet.

4.6 Cline

Have you, like me, grown flabby on a diet of soft, cosy code editors? If so, and, for example, we want VS Code agent sidebars, Cline (cline/cline, Apache-2.0) sees us, and it understands. That sidebar is where Cline began.

With Cline 2.0 they pulled the agent loop out of the extension into a standalone open-source harness, @cline/sdk, and the same engine now drives a terminal CLI, a JetBrains plugin, the VS Code sidebar, and a web-based multi-agent kanban board. The runtime has all the bells and whistles and rescue knives we’d expect to find strapped to a serious harness — subagent teams, MCP, scheduled cron agents, chat connectors for Slack/Telegram/Discord. They claim impressive performance on something called terminal-bench.

code --install-extension saoudrizwan.claude-dev   # the VS Code sidebar
npm install -g cline                              # the standalone CLI
npm install @cline/sdk                            # build our own agent on the same runtime

A little dynasty of editors descends from Cline: they are mostly forks of Cline’s old monolithic extension, not consumers of the new @cline/sdk. Roo Code was the prominent one — a Cline fork that added per-mode permission scoping (Code / Architect / Ask / Debug modes with different tool budgets) — but it shut down in May 2026 and its repo is archived, with the community decamping to ZooCode. Kilo Code (MIT) has a complicated family tree. The editor extension is a fork of that fork (Cline → Roo Code → Kilo) while its CLI is a separate fork of OpenCode.

4.7 Qwen Code

Qwen Code is the Qwen team’s terminal coding agent — originally a fork of Google’s Gemini CLI, since gone its own way. Multi-protocol like the rest of this section: OpenAI / Anthropic / Gemini / Qwen APIs plus a local Ollama or vLLM endpoint, switchable at runtime. It aims for Claude Code feature parity — subagents, MCP, plan mode, and the agentskills.io SKILL.md format (skills docs). Not to be confused with Qwen-Agent, the Python harness library, which is a separate codebase that does not natively consume skills or do that other fancy stuff.

4.8 Pour one out for the departed (and the renamed)

Fauxpilot (the self-hosted Copilot clone, long dead), Cody (Sourcegraph moved on to Amp), Kiro (Amazon’s cloud-bound spec-driven IDE), Codeium (now Windsurf), Amazon CodeWhisperer (now Amazon Q Developer), Codestral Mamba, Llama Coder, and Replit / Code Llama — all had their moment.

5 The models behind them

Choosing a good tool-using model seems to be important for most agentic workflows. tl;dr generic agentic tool-use is reputedly unreliable below roughly the 30B class models. This does not matter if you are using a frontier model in the cloud, but it does if you are trying to run a local model on a Mac.. In that latter case, a popular default is Qwen3.6-35B-A3B.

5.1 Oh damn language diffusions are weird

Those picks are all autoregressive: we append a prompt, the model appends an answer. A diffusion model like DiffusionGemma instead fixes a block of text and denoises the whole thing at once. Every token effectively attends to all the others. That is a weird different inference paradigm. I have always been slightly interested by it, but now that I think about the practicalities I have questions.

Streaming that into a chat box, we are watching a printing press pretend to be a typewriter. It would better suit an interface built around editing text rather than chatting — much as image diffusions get inpainting in their GUIs. Rather than only appending, we’d want to regenerate selections, infill gaps, rewrite under constraints. The closest things today are Aider’s AI! / AI? scoped edits and Continue’s inline-diff autocomplete, both of which already mutate selected text — though I’m unclear whether either explicitly supports the diffusion behaviour, and I don’t fully trust the process at arbitrary infill anyway. Also, the actual prompt window for each generation looks quite short (hundreds of tokens) which seems… not enough?

6 MCP

Model Context Protocol is a standard to enable an agent to access tools and data. Here are some fun code-specific servers — find more at the official MCP Registry or punkpeye/awesome-mcp-servers.

oraios/serena gives an agent LSP-style symbol navigation and refactoring across 30-odd languages. Since some heavy hitters (Claude Code) ship no codebase index of their own, Serena might beat groping by grep on a large repo:

uv tool install -p 3.13 serena-agent
serena init
claude mcp add serena -- serena start-mcp-server --context claude-code --project "$(pwd)"

For git and GitHub operations — repos, PRs, issues, Actions — the official GitHub MCP server does the job, though in the name of safety it wants a personal access token and a running docker daemon:

claude mcp add github \
  -e GITHUB_PERSONAL_ACCESS_TOKEN=$GITHUB_PAT \
  -- docker run -i --rm -e GITHUB_PERSONAL_ACCESS_TOKEN ghcr.io/github/github-mcp-server

For the well-known problem of models hallucinating outdated APIs there are docs-fetcher servers (Context7, Git-MCP), but a plain web-search/reader (jina-reader) seems to do the job for me. Browser automation (Playwright MCP, Chrome DevTools MCP) helps debug web apps, but I have not used those so speak of them no further.

For Apple-platform vibecoding — XcodeBuildMCP plus Apple’s own xcrun mcpbridge — see Vibecoding Apple apps.

8 Packing code for AI

Sometimes we just want to flatten a whole codebase into a single blob and paste it into a chat window or seed a fresh context. This matters less than it used to — an agent with the search tools above rarely needs the whole repo at once — but for that one-shot case it’s still handy, since a model can read one text file easily but finds a scatter of files more confusing.

yamadashy/repomix is the classic.

repomix --remote https://github.com/yamadashy/repomix

Also available online at repomix.com.

simonw/files-to-prompt is another favourite (“Concatenate a directory full of files into a single prompt for use with LLMs”). See files-to-prompt for background.

My fork supports a handy alternative usage pattern:

files-to-prompt --since v1.2.0 --since-scope working
files-to-prompt --since HEAD --since-scope staged

Also handy for online usage: cyclotruc/gitingest / Gitingest — “Replace ‘hub’ with ‘ingest’ in any GitHub URL to get a prompt-friendly extract of a codebase”.

9 ACP

The Agent Client Protocol (ACP) is how an editor talks to an agent. Zed wrote it, and the pitch is modelled on the Language Server Protocol: rather than every editor hand-rolling an integration for every agent — and every agent re-implementing each editor’s API — both sides speak one JSON-RPC dialect, and ideally any ACP editor can drive any ACP agent.

Mechanically, the agent runs as a subprocess of the editor and they talk JSON-RPC over stdio (or HTTP/WebSocket for a remote agent); the editor is the boss of the files and the terminal, so the agent’s edits arrive as native diffs and its shell commands run in the editor’s own terminal. It reuses MCP’s JSON types where it can and adds a few agentic-UX types of its own.

On the editor side the adopters are Zed, JetBrains, and community bridges exist for Neovim, Emacs, and VS Code. On the agent side, Goose ships a goose acp server, Google’s Gemini CLI likewise I think, and Claude Code and Codex have adapters, I think. cf. the clients list and Goose’s ACP-clients guide.

10 Workflows

The vibe coding workflow using git worktrees has been formalised, e.g. in Crystal: Supercharge Your Development with Multi-Session Claude Code Management / stravu/crystal.

11 Incoming

Figure 3

12 References

Beurer-Kellner, Fischer, and Vechev. 2022. Prompting Is Programming: A Query Language For Large Language Models.”
Bubeck, Chandrasekaran, Eldan, et al. 2023. Sparks of Artificial General Intelligence: Early Experiments with GPT-4.”
Din, Karidi, Choshen, et al. 2023. Jump to Conclusions: Short-Cutting Transformers With Linear Transformations.”
Suzgun, Scales, Schärli, et al. 2022. Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them.”
Wang, Wei, Schuurmans, et al. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models.”
Weller, Boratko, Naim, et al. 2025. On the Theoretical Limitations of Embedding-Based Retrieval.”