Code agents and assistants
Turing-complete autocorrect, vibe-coding, …
2021-10-14 — 2026-06-29
Wherein Various Terminal Harnesses, Cloud Editors, and Model-Agnostic Agents for AI-assisted Coding Are Surveyed, With Attention to How Each Retrieves Relevant Source Code From a Repository.
This is a cousin to neural automata—writing machines that generate code for us, because code generation is a fancy form of text generation, which uses similar technology, i.e. large language models.
Two aspects make this work: the model and the interface.
The terminology (harness, skill) etc. is covered elsewhere. Running backends offline, on a Mac at least, is elsewhere again. The mathematical analogue has some similarities.
Here, I write down which coding agents I can tolerate.
1 Security
I’m vaguely concerned about how much of the world’s source code now gets uploaded to a handful of code servers, and the potential for abuse scales with the code mountain, accreting there like a giant fatberg of latent expertise and corporate know-how. Anyway, the developer arms race is real, so let’s all ignore it and keep flushing our code into their pipes, eh?
2 General advice
3 Cloud and commercial harnesses
The cloud-hosted and commercial editors. None has a serious local-model path.
3.1 Claude-code
anthropics/claude-code: “Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you write code faster by executing routine tasks, explaining complex code, and handling git workflows — all through natural language commands.”
It’s a command-line tool — a nicely designed one that interacts gracefully with a normal IDE.
Pro-tip: Hooks reference
I have little to say about Claude Code; it is too thoroughly documented elsewhere/everywhere and in any case moves too fast to be worth explaining.
Sebastian Raschka points out that we can force Claude Code to use a local model (!)
3.2 Cursor
Cursor is an AI-powered code editor that helps you build software faster.
It’s another VS Code fork with its own AI engine and some extra UI affordances. My colleagues assure me the fork causes far fewer annoying psychoses and sidetracks than Copilot. Cloud by design, though — there is no serious local-model path here.
3.3 GitHub Copilot
GitHub Copilot now uses some off-the-shelf GPT-4 model (if I recall correctly) for code completions. The original Codex engine was strikingly good, and I don’t think the general-purpose models have matched it, even years later.
GitHub Copilot has a great workflow for automatic completions, and that’s what originally made me pay for it. Still acceptably useful, this bit.
Since then, they’ve rolled out extra chat interfaces. The new generation of tools is janky and only semi-reliable, at least in my experience. They’re bad at following instructions, mess up basic stuff like indentation, and aren’t especially fast. Occasionally they’ll forget they’re supposed to edit code and instead talk about editing code, dump weird repeated sections into the file, or just delete random stuff. It’s a bit like coding with a drunk genius, which is to say occasionally brilliant but usually messy.
OpenAI squandered their early lead.
Pro-tip: Behind a corporate firewall Copilot needs a specific set of whitelist exceptions.
3.4 Warp
Warp is an all-in agentic terminal/IDE — mixed-model, codebase embeddings, MCP, the lot. Heavily hyped; I haven’t tried it.
4 Model-agnostic and offline harnesses
There are too many of these damn things now—everyone writes their own agent. They speak plain OpenAI-compatible HTTP, so I can point them at a commercial token host or a local server — Osaurus on localhost:1337, ds4-server on 127.0.0.1:8000, or Ollama on 11434 are the ones I like. With local models the limiting factor is the model, not the harness.
4.1 OpenCode
OpenCode is a generic terminal harness — see the main OpenCode write-up. As the name suggests, it is especially useful for coding, working even offline: point it at a local endpoint with a custom-provider baseURL and it wraps LSP, MCP, and a plugin system around the model. The official VS Code extension exists but looks shonky atm.
4.1.1 MiMo Code
Xiaomi’s MiMo Code (MIT) is a fork of OpenCode adding long-horizon memory and a self-improvement layer; it inherits the custom-provider path but probably works best with Xiaomi’s own model.
4.2 Codex CLI
OpenAI’s Codex CLI (openai/codex, Apache-2.0) is a terminal harness and — surprisingly for OpenAI — open-source. Naming caveat: this Codex is the agent, not the sadly-mourned original Codex code model that powered early Copilot nor the GPT-based successors. Still, it is worth taking on its own terms. For example, it need not plug in to OpenAI. We can instead point it at whatever we want, even a local model mode.
Codex seems to occupy a similar niche to OpenCode but deviates in two ways: First, it sandboxes by default — Apple Seatbelt on macOS, Landlock/seccomp on Linux — reducing the blast-radius of rogue agents. Second, Sebastian Raschka clocked it as the most token-frugal of the three harnesses he measured (Codex < Qwen-Code < Claude Code), which matters especially with a slow or constrained local model. Pi is likely even more frugal, at the price of a steeper learning curve.
Config to use an open endpoint was not immediately obvious to me. We must configure a ~/.codex/local.config.toml profile invoked as codex --profile local:
# ~/.codex/local.config.toml — overlaid on config.toml by `codex --profile local`
model_provider = "osaurus" # our own provider id (see table below)
model = "qwen3.6:35b" # whatever the server is serving
[model_providers.osaurus]
name = "Osaurus"
base_url = "http://localhost:1337/v1" # OpenAI-compatible endpoint; swap for ds4 on 8000, omlx, etc.
wire_api = "chat" # these servers speak Chat Completions, not OpenAI's Responses APIThere are pre-defined model provider entries for openai/ollama/lmstudio.
Note that the project-level profile .codex/config.toml ignores model_provider/openai_base_url for trust reasons.
Interestingly Raschka also found Qwen3.6 scoring better under Codex than under its parent company’s own Qwen-Code.
4.3 pi
earendil-works/pi (Mario Zechner / badlogic, MIT) a.k.a. Pi Coding Agent, is a generic minimalist harness. Pi’s philosophy is idiosyncratic enough that I introduced it in the LLM Agents page as a whole alternate way of doing things. What matters here is that it runs offline and is the harness driving antirez’s famous ds4 on a Mac. There’s a learning curve, but notionally a rewarding one.
4.4 Aider
Aider (Aider-AI/aider, Apache-2.0) is “AI pair programming in your terminal.”
aider is a git-native AI assistant, in the following sense: we execute it inside a git repository, tell it which files we’re working on, and describe the change we want in plain English at a chat prompt. Aider sends the model those files (plus a repo-map of everything else so it has context), gets an edit back, and writes it straight to the files on disk — then commits it, each change as its own git commit with a generated message. So we never copy-paste code out of a chat window, and reviewing or undoing what the model did is just git diff and git revert.
That git-per-change discipline is whole potato. It is less an autonomous agent that wanders off and does ten things, more a tightly scoped tool kept well scope by using git affordances. Harper Reed’s workflow gives us the flavour.
For our purposes it is model-agnostic — point it at any OpenAI-compatible base_url, or use the ollama/ prefix for a local model — and the repo-map means a smaller local model still gets relevant context without us hand-feeding it files.
It has some nifty automations, linting and running the test suite after each change and using the logs from that to inform the model
Slight variant: --watch-files mode leaves aider running while we work in our normal editor, watching for AI! / AI? comments we drop into the code and acting on them, so we never have to leave the IDE.
Less ambitious, less risky, elegant.
4.5 Goose
Goose (Linux Foundation, Apache-2.0) is a general-purpose harness — desktop app plus CLI, grown-up enough to have been adopted by the Linux Foundation. For coding specifically it speaks the Agent Client Protocol, so it can back Zed or JetBrains as the agent behind the editor, and also has some code plugins I have not tried yet.
4.6 Cline
Have you, like me, grown flabby on a diet of soft, cosy code editors? If so, and, for example, we want VS Code agent sidebars, Cline (cline/cline, Apache-2.0) sees us, and it understands. That sidebar is where Cline began.
With Cline 2.0 they pulled the agent loop out of the extension into a standalone open-source harness, @cline/sdk, and the same engine now drives a terminal CLI, a JetBrains plugin, the VS Code sidebar, and a web-based multi-agent kanban board. The runtime has all the bells and whistles and rescue knives we’d expect to find strapped to a serious harness — subagent teams, MCP, scheduled cron agents, chat connectors for Slack/Telegram/Discord. They claim impressive performance on something called terminal-bench.
A little dynasty of editors descends from Cline: they are mostly forks of Cline’s old monolithic extension, not consumers of the new @cline/sdk. Roo Code was the prominent one — a Cline fork that added per-mode permission scoping (Code / Architect / Ask / Debug modes with different tool budgets) — but it shut down in May 2026 and its repo is archived, with the community decamping to ZooCode. Kilo Code (MIT) has a complicated family tree. The editor extension is a fork of that fork (Cline → Roo Code → Kilo) while its CLI is a separate fork of OpenCode.
4.7 Qwen Code
Qwen Code is the Qwen team’s terminal coding agent — originally a fork of Google’s Gemini CLI, since gone its own way. Multi-protocol like the rest of this section: OpenAI / Anthropic / Gemini / Qwen APIs plus a local Ollama or vLLM endpoint, switchable at runtime. It aims for Claude Code feature parity — subagents, MCP, plan mode, and the agentskills.io SKILL.md format (skills docs). Not to be confused with Qwen-Agent, the Python harness library, which is a separate codebase that does not natively consume skills or do that other fancy stuff.
4.8 Pour one out for the departed (and the renamed)
Fauxpilot (the self-hosted Copilot clone, long dead), Cody (Sourcegraph moved on to Amp), Kiro (Amazon’s cloud-bound spec-driven IDE), Codeium (now Windsurf), Amazon CodeWhisperer (now Amazon Q Developer), Codestral Mamba, Llama Coder, and Replit / Code Llama — all had their moment.
5 The models behind them
Choosing a good tool-using model seems to be important for most agentic workflows. tl;dr generic agentic tool-use is reputedly unreliable below roughly the 30B class models. This does not matter if you are using a frontier model in the cloud, but it does if you are trying to run a local model on a Mac.. In that latter case, a popular default is Qwen3.6-35B-A3B.
5.1 Oh damn language diffusions are weird
Those picks are all autoregressive: we append a prompt, the model appends an answer. A diffusion model like DiffusionGemma instead fixes a block of text and denoises the whole thing at once. Every token effectively attends to all the others. That is a weird different inference paradigm. I have always been slightly interested by it, but now that I think about the practicalities I have questions.
Streaming that into a chat box, we are watching a printing press pretend to be a typewriter. It would better suit an interface built around editing text rather than chatting — much as image diffusions get inpainting in their GUIs. Rather than only appending, we’d want to regenerate selections, infill gaps, rewrite under constraints. The closest things today are Aider’s AI! / AI? scoped edits and Continue’s inline-diff autocomplete, both of which already mutate selected text — though I’m unclear whether either explicitly supports the diffusion behaviour, and I don’t fully trust the process at arbitrary infill anyway. Also, the actual prompt window for each generation looks quite short (hundreds of tokens) which seems… not enough?
6 MCP
Model Context Protocol is a standard to enable an agent to access tools and data. Here are some fun code-specific servers — find more at the official MCP Registry or punkpeye/awesome-mcp-servers.
oraios/serena gives an agent LSP-style symbol navigation and refactoring across 30-odd languages. Since some heavy hitters (Claude Code) ship no codebase index of their own, Serena might beat groping by grep on a large repo:
For git and GitHub operations — repos, PRs, issues, Actions — the official GitHub MCP server does the job, though in the name of safety it wants a personal access token and a running docker daemon:
For the well-known problem of models hallucinating outdated APIs there are docs-fetcher servers (Context7, Git-MCP), but a plain web-search/reader (jina-reader) seems to do the job for me. Browser automation (Playwright MCP, Chrome DevTools MCP) helps debug web apps, but I have not used those so speak of them no further.
For Apple-platform vibecoding — XcodeBuildMCP plus Apple’s own xcrun mcpbridge — see Vibecoding Apple apps.
7 How agents find code
When a coding agent needs to find some code relevant to the current operation — the function to edit, the callers it might break — how does it do that? My first instinct was the text-search tools: point it at ripgrep and fzf and let ’er grep. for reals, Claude seems to do that. Is that the state of the art, or do we need the fancy vector-database RAG machinery and special vector embeddings?
At the moment it seems that grep mostly wins, and the agents that feel most capable are the ones that don’t build a semantic index. That is to say, the best agents do Ctrl-F and not Google Search. However, the best agents don’t do vanilla grep — they do fancy grep.
7.1 Agentic grep
Claude Code, Cline, and OpenCode ship no embedding index. The agent navigates the way a developer does — glob to list, grep to search, read to open — pulling files in just-in-time. Anthropic argues that just-in-time primitives bypass the issues of stale indexing, and increasingly good models are pretty good at exploring. Cline goes so far as to have a no-index manifesto. Moreover, because code is lexical, an exact-string match is often exactly what we want, which is something grep has a generational head start on.
7.2 Repo maps
Aider takes issue with this. It runs tree-sitter atop the repo to extract symbol definitions and references, builds a graph with files as nodes and dependencies as edges, and runs personalized PageRank over it to rank the most central symbols, packing the top ones into the prompt. The model gets some kind of structural/dependency map and centrality weighting or something like that.
I don’t use Aider tho.
7.3 LSP hacks
IDEs already have some programming language machinery built in via the Language Server Protocol. Serena exposes find_symbol, find_referencing_symbols, go-to-definition, and symbol-level edits as MCP tools, so “find every caller of this function” is an exact graph query rather than a fuzzy guess.
7.4 Embeddings after all
Cursor does embed the codebase — it trained its own embedding model and indexes for retrieval — though they frame it as complementary to grep, not a replacement.
Embeddings seem only worthwhile for really big things, like mega monorepos. A 2025 DeepMind result (Weller et al. 2025) proves a hard cap on how many query-document relationships a fixed embedding dimension can represent, and it all gets terribly messy.
7.5 Search subagents
We can always dispatch subagents to explore off their own bat and report back about what they found. This means they can use any of the other tricks mentioned here but
- in parallel, and
- without blowing the token budget of the main agent.
Morph’s WarpGrep runs grep in its own context window, fires a dozen tool calls in parallel, and returns a digest. I bet you could just set up subagents to do this yourself with any harness that supports subagents and the right skills.
7.6 Tying all that together
Shaped argues we should use: grep for exact identifiers, LSP for precise navigation, repo-maps for cheap structure, embeddings for the scale where the rest breaks down. Vector-DB vendor Milvus argues grep “burns too many tokens”, but they are trying to sell a thing.
The more elaborate search scaffolding might be especially useful for weaker local models to compensate for being worse at iterative search. If our offline coding agent feels lost in a big repo, Serena or Aider’s repo-map might help.
8 Packing code for AI
Sometimes we just want to flatten a whole codebase into a single blob and paste it into a chat window or seed a fresh context. This matters less than it used to — an agent with the search tools above rarely needs the whole repo at once — but for that one-shot case it’s still handy, since a model can read one text file easily but finds a scatter of files more confusing.
yamadashy/repomix is the classic.
Also available online at repomix.com.
simonw/files-to-prompt is another favourite (“Concatenate a directory full of files into a single prompt for use with LLMs”). See files-to-prompt for background.
My fork supports a handy alternative usage pattern:
Also handy for online usage: cyclotruc/gitingest / Gitingest — “Replace ‘hub’ with ‘ingest’ in any GitHub URL to get a prompt-friendly extract of a codebase”.
9 ACP
The Agent Client Protocol (ACP) is how an editor talks to an agent. Zed wrote it, and the pitch is modelled on the Language Server Protocol: rather than every editor hand-rolling an integration for every agent — and every agent re-implementing each editor’s API — both sides speak one JSON-RPC dialect, and ideally any ACP editor can drive any ACP agent.
Mechanically, the agent runs as a subprocess of the editor and they talk JSON-RPC over stdio (or HTTP/WebSocket for a remote agent); the editor is the boss of the files and the terminal, so the agent’s edits arrive as native diffs and its shell commands run in the editor’s own terminal. It reuses MCP’s JSON types where it can and adds a few agentic-UX types of its own.
On the editor side the adopters are Zed, JetBrains, and community bridges exist for Neovim, Emacs, and VS Code. On the agent side, Goose ships a goose acp server, Google’s Gemini CLI likewise I think, and Claude Code and Codex have adapters, I think. cf. the clients list and Goose’s ACP-clients guide.
10 Workflows
The vibe coding workflow using git worktrees has been formalised, e.g. in Crystal: Supercharge Your Development with Multi-Session Claude Code Management / stravu/crystal.
- ghuntley/groundhog: Groundhog’s primary purpose is to teach people how Cursor and all these other coding agents work under the hood. If you understand how these coding assistants work from first principles, then you can drive these tools harder (or perhaps make your own!).
- From Design doc to code: the Groundhog AI coding assistant (and new Cursor vibecoding meta)
11 Incoming
A better version of this very post by Sebastian Raschka: Using Local Coding Agents
Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - METR
LMQL: Programming Large Language Models: “LMQL is a programming language for language model interaction.”
LMQL generalises natural language prompting, making it more expressive while remaining accessible. For this, LMQL builds on top of Python, allowing users to express natural language prompts that also contain code. The resulting queries can be directly executed on language models like OpenAI’s GPT models > Fixed answer templates and intermediate instructions allow the user to steer the LLM’s reasoning process.
Mitchell Hashimoto on the mysterious ease of ChatGPT plugins
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | OpenAI
Introduction to Program Synthesis is an interesting MIT course that connects modern AI program synthesis to much older literature.


