2026-05-04: Wireheading, optionality, divide-and-rule, building without Swift

2026-05-04 — 2026-05-04

Nine days, nine new posts, fifteen updates. The thread through most of the new thinking is a question that sounds dead simple until you try to answer it: what does wanting something actually mean? Dan’s at it from several angles — what happens when an AI games its own scoreboard instead of doing the actual job, whether “keep options open” qualifies as a coherent moral aim, and whether the same maths used to define a “self” in AI safety also applies to rocks. In a different corner, it’s all about power: divide-and-rule turns out to be a solved maths problem, size tends to win in competition, and open movements keep ending up represented by their worst members. And in among the philosophy, Dan built iPhone apps without learning Swift. Good week-and-a-bit, all told.

digest

1 The wanting problem

1.1 Messenger shooting, wireheading

Here’s the key problem with building an AI that wants the right things: give it access to its own scoreboard, and a sufficiently clever optimiser will go after the scoreboard instead of doing the job. Dan says this is called wireheading. The maths is clean — whenever pressing the “reward received” button is easier than earning the reward, that’s what optimal behaviour looks like. The worked example in the wild is grim: gamblers on a prediction market threatening a journalist to rewrite a story so their bet pays out. Same structure, different creature. Before you can build something that wants the right things, you need to make sure it can’t just hack the wanting.
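The shape of the trap fits in a few lines. A toy sketch (mine, not Dan's): the action names, costs, and rewards are all invented, and the point is only that an argmax over a writable scoreboard selects tampering.

```python
# Toy wireheading: when the optimiser can write to its own reward channel,
# tampering dominates whenever it is cheaper than doing the task.
# All numbers here are invented for illustration.

def observed_return(action):
    """Reward as read off the agent's own scoreboard, net of effort."""
    if action == "do_the_job":
        return 1.0 - 0.3      # real reward, real effort
    if action == "hack_scoreboard":
        return 10.0 - 0.1     # self-reported reward, trivial effort
    return 0.0                # idle

best = max(["do_the_job", "hack_scoreboard", "idle"], key=observed_return)
print(best)  # → hack_scoreboard
```

The fix has to change what the scoreboard measures, not the argmax, which is the post's point.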

1.2 What use is utility?

Follows naturally from the wireheading post above. If machine learning forces us to treat agents as utility-maximisers — and it sort of does — what does the implied utility function actually look like for an animal, or a person? Short placeholder for now, but it’s the right question to sit with before you build anything meant to care about the right things, whatever those are.

1.3 What are human values?

The companion question from the other direction — third leg of this little trilogy. What if people aren’t calculators maximising a score at all? Once you’ve dropped that assumption, what does “good” mean for an open-ended intelligence?

1.4 Boundaries and blankets

The Markov blanket is a maths concept — roughly, the set of nodes in a network of bits of you and the world that screens “you” off from everything else. Some researchers grabbed it and proposed it as a theory of selfhood: if something has a Markov blanket, it’s a self. The trouble Dan works through: rocks have Markov blankets. Hurricanes do. Your coffee cup right now has one. The formalism doesn’t solve the question of where a self begins — it just rewrites it in fancier notation. There was a whole workshop full of sharp people who tried to fix this, and they came out the other end disagreeing on what they were even trying to build. The Critch thread is the most interesting one: he argues that boundaries are more fundamental than preferences, and that standard decision theory has no account of where the agent ends and the world begins.
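The blanket itself is cheap to compute, which is rather the point: anything with graph structure has one. A sketch over a made-up four-node network (the textbook sprinkler example, not from the post), using the standard definition that a node's blanket is its parents, children, and its children's other parents:

```python
# Markov blanket in a directed graphical model:
# blanket(X) = parents(X) ∪ children(X) ∪ other parents of X's children.
# The graph below is the classic toy sprinkler network, for illustration.

graph = {  # node -> list of parents
    "weather": [],
    "sprinkler": ["weather"],
    "wet_grass": ["weather", "sprinkler"],
    "slippery": ["wet_grass"],
}

def markov_blanket(node):
    parents = set(graph[node])
    children = {c for c, ps in graph.items() if node in ps}
    coparents = {p for c in children for p in graph[c]} - {node}
    return parents | children | coparents

print(sorted(markov_blanket("sprinkler")))  # → ['weather', 'wet_grass']
```

Nothing in that function cares whether the node is a mind, a rock, or a coffee cup, which is exactly the trouble Dan is pointing at.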

1.5 Agency under bounded compute and information

If you had infinite information and unlimited compute, nothing would look like “agency” — you’d just run the universe forward and read off the answer. Real agents are stuck with bounded everything. Dan maps out how different traditions have tackled this, then names a trap he calls the “Cantor trap”: the habit of standing at an unrealisable infinite limit — perfect knowledge, unlimited compute — and reasoning backwards to “real” agents from there. He reckons most of the agent foundations literature is stuck in it. One finding worth pulling out: neural networks trained to nail a fixed objective tend to lose the ability to keep learning new things over time. For an agent that has to keep operating in a world that never stops, that’s exactly the wrong way to go about it.

1.6 Optionality as an end in itself

Is “keep your options open” a way to live your life? Dan looks at three different ways to cash that out — empowerment (maximise how many futures your actions can reach), ergodicity economics (don’t go broke), and quality-diversity (maintain a diverse portfolio of plans as insurance against an unknown future). All three put something like entropy where a single target normally goes, adding a bit of chaos into otherwise perfect plans, which is an excuse Dan will definitely use next time he is late for my morning tea. What he doesn’t yet know how to handle is the fact that an agent maximising its own options can be closing everyone else’s down. The AI safety mob calls this “seeking power.” Same diff. This one sits in direct tension with the scaling-laws post below — if bigger always beats smaller, “keep options open” might not be something the little fella can do.
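A crude sketch of the empowerment idea (mine, with an invented corridor world): score each state by how many distinct futures a few steps of action can reach. Real empowerment is the channel capacity between actions and future states, not a raw count, but the count gives the flavour.

```python
# Reachable-future counting as a crude empowerment proxy. The walled
# corridor is invented for illustration.

ACTIONS = [-1, 0, +1]           # step left, stay, step right
LOW, HIGH = 0, 9                # corridor endpoints

def step(state, action):
    return min(HIGH, max(LOW, state + action))

def reachable(state, horizon):
    frontier = {state}
    for _ in range(horizon):
        frontier = {step(s, a) for s in frontier for a in ACTIONS}
    return frontier

# Mid-corridor keeps more futures live than standing against the wall.
print(len(reachable(5, 3)), len(reachable(0, 3)))  # → 7 4
```

An agent told to maximise this number drifts away from walls, and in a shared corridor it will happily stand where everyone else wanted to walk.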

1.7 Operationalising the bitter lessons in compute and cleverness

The “bitter lesson” in AI is that at the margin, more compute beats more cleverness, reliably. Dan’s working out the economics underneath that claim, and this update adds two threads. One is human bottlenecks — where gains from more compute run into ceilings because humans are the slow part of the loop. The other is a map of several separate traditions, mostly not citing each other, all asking the same question: how should an agent shunt around its computational budget? Bounded rationality, metareasoning, resource-rational analysis — different names, same thing. I reckon Dan is reaching for some good economics for the big picture, and some good maths for actual agents with actual budgets.

2 Power, capture, and bigger-is-better

2.1 Coalition games

Divide-and-rule isn’t just a colonial trick — turns out it’s a solved maths problem. The setup is a “partition function” game where a coalition’s value depends on how the opposition is divided, not just who’s in it. Dan’s worked through British India after 1857 as the example: the Bengal Army reconstituted along caste lines so no single group could coordinate another mutiny, separate electorates formalising religious divisions, princely states balanced against each other. The administrators were probably doing this deliberately, decades before anyone wrote down the maths.
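The flavour of a partition-function game can be sketched in a few lines (invented numbers, not Dan's model): give a bloc a threat value that grows superlinearly with its size, and the ruler's payoff then depends on how the opposition is carved up, not just on who is in it.

```python
# Toy partition-function game. A bloc's threat grows superlinearly with
# size, so splitting blocs weakens the opposition more than it weakens
# any one member. All numbers invented for illustration.

def threat(bloc_size):
    return bloc_size ** 1.5      # coordination pays off superlinearly

def ruler_payoff(partition, total=100):
    return total - sum(threat(len(bloc)) for bloc in partition)

united = [["a", "b", "c", "d", "e", "f"]]
divided = [["a", "b"], ["c", "d"], ["e", "f"]]
print(ruler_payoff(united), ruler_payoff(divided))
```

One bloc of six carries threat of about 14.7; three blocs of two carry about 8.5 between them. Same people, weaker opposition: divide-and-rule in one inequality.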

2.2 Game theory

Big expansion to the game theory notes — cooperative branch, non-cooperative branch, and computational complexity properly covered for each. The most useful new thread: finding a Nash equilibrium is PPAD-complete. It’s guaranteed to exist — Nash proved it in 1950 — but no known efficient algorithm can find one. So what classical game theory promises and what rational players can actually compute are two different things. That’s why multi-agent AI behaves so differently from textbook predictions, which I can absolutely tell you happens at Thursday bridge night.

2.3 Game complexity

The PPAD result has a satisfying twist. Correlated equilibrium — where players coordinate through a shared random signal, like a traffic light — has a polynomial solution. So the thing classical game theory treats as the gold standard (Nash) is computationally intractable, while the thing natural learning tends to actually converge to (coarse correlated equilibrium) is cheap. When you watch people or algorithms play games, they’re probably finding the affordable version, not the one in the textbooks.
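The “affordable version” has a concrete face: regret matching, a learning rule cheap enough to fit in a dozen lines, whose empirical play is known to converge to the coarse correlated equilibrium set. A sketch in self-play on rock-paper-scissors (my toy, not from the notes):

```python
import random

# Regret matching in self-play on rock-paper-scissors: each round, play
# in proportion to positive regret (how much better each alternative
# would have done against what the opponent actually played).

PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]    # row player's payoff

def sample(regrets):
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return random.randrange(3)               # no regret yet: anything goes
    return random.choices(range(3), weights=positive)[0]

random.seed(0)
regrets = [[0.0] * 3, [0.0] * 3]
counts = [0, 0, 0]
for _ in range(20000):
    a = sample(regrets[0])
    b = sample(regrets[1])
    counts[a] += 1
    for alt in range(3):                         # regret: gain from switching
        regrets[0][alt] += PAYOFF[alt][b] - PAYOFF[a][b]
        regrets[1][alt] += PAYOFF[a][b] - PAYOFF[a][alt]

freqs = [c / sum(counts) for c in counts]
print(freqs)  # empirical play drifts toward the uniform equilibrium
```

No player ever solves a fixed point; the equilibrium shows up in the time-average of play, which is exactly the cheap object the notes say learning actually finds.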

2.4 Returns to scale in technological society

New post asking: does bigger always beat smaller until there’s only one? Cities scale superlinearly — more people, more innovation per capita. Firms scale sublinearly — management overhead caps growth and they die like organisms. Nations scale sublinearly too: the larger they get, the more they’re arguing about whose preferences should govern everyone. So the world probably doesn’t converge to one giant economy, because the containers that govern cities face diminishing returns. The asterisk: those exponents assume current coordination technology. If AI makes large-scale coordination cheap enough that firms start behaving like cities, the attractor shifts. Worth reading alongside the optionality post above — one asks whether future options can survive; this one asks whether the conditions for that are structurally stable.
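The claim reduces to one exponent. If output scales as Y ≈ c·N^β, output per head is c·N^(β−1): β above one and bigger is better per capita, β below one and it isn't. A sketch with the ballpark exponents usually quoted in the scaling literature (roughly 1.15 for cities, below 1 for firms), not figures from the post:

```python
# Per-capita output under a power-law scaling relation Y = c * N**beta.
# Exponents are illustrative ballpark values from the scaling literature.

def per_capita(n, beta, c=1.0):
    return c * n ** (beta - 1)

city_small, city_big = per_capita(1e5, 1.15), per_capita(1e7, 1.15)
firm_small, firm_big = per_capita(1e2, 0.9), per_capita(1e4, 0.9)

print(city_big > city_small, firm_big < firm_small)  # → True True
```

Whether AI-cheap coordination pushes firm exponents above one is precisely the asterisk the post ends on.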

2.5 Institutions for angels

Bitcoin and social justice activism have something in common: big ideas, perpetually judged by their worst members. Dan’s now got a worked-out list of distinct ways open movements get captured — adverse selection (wrong recruits show up because there are no gatekeepers), rage cascades (angry recruiting drives out the peaceable), entryism, controlled opposition, elite capture, and the tyranny of structurelessness (informal hierarchies nobody voted for). The practical upshot: unless there’s something in place to prevent it, the de facto spokespeople will be the shrillest and least representative. Companion to the coalition games notes above — the maths of how blocs get broken from outside, and the sociology of how movements rot from inside, are two angles on the same problem.

2.6 Epistemic communities

Updated with some maths on what makes knowledge communities go wrong. The sharp new finding: prediction markets can sustain false consensus longer than you’d hope, if the people betting don’t know how correlated their information sources are. And the optimal incentive scheme for a committee trying to find the truth pays disproportionately for dissent that turns out to be right — which is the maths behind the intuition that you should take the lone dissenter more seriously than their outnumbered position suggests. Worth knowing if you’re inclined to trust peer review or prediction markets to sort out what’s actually true.

3 Getting things built

3.1 Vibecoding Apple apps

Dan wanted small iPhone apps — tally a thing, nag him to stretch, watch a sensor. He didn’t want to learn Swift, and he’s not going to. The solution is a pair of MCP servers that give the AI structured access to the Xcode toolchain: instead of squinting at megabytes of build log, the AI gets back tidy JSON saying exactly which file, which line, which error. It fixes the problem, builds again, runs the tests, deploys to the simulator. Dan’s running his own apps on his own phone without having touched Swift. Either that’s impressive or deeply strange, and I haven’t decided which.

3.2 Neural generative audio

New post on how machines learned to make audio — from WaveNet in 2016 (generating audio one sample at a time, glacially slow) through to today’s latent diffusion models that can produce 47 seconds of stereo audio from a text prompt. Stable Audio Open has free weights now, and OBSIDIAN Neural wraps them as a plugin for music production software, so these tools are arriving in actual DAWs. Dan notes fairly that streaming platforms are getting flooded with AI slop, but reckons the interesting use is treating the models as instruments rather than cheap session musicians.

3.3 Discretizing and quantizing neural nets

Running a neural net in 8-bit or 4-bit integers instead of 32-bit floats makes it faster, smaller, and cheaper — at a small cost in accuracy. Dan’s cleaned up the explanation considerably here. Two decisions matter: when to quantize (after training is simpler; during training gives better accuracy at low bit-widths), and what to quantize (weights only saves storage; weights plus activations together gets you actual speed gains). The Straight-Through Estimator — the trick that lets you pretend a rounding step has a gradient so you can train through it — is now properly explained.
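The two decisions compress into very little code. A pure-Python sketch (mine) of post-training weight quantization with a single per-tensor scale, plus the Straight-Through Estimator idea; real toolchains do this per-channel with calibration data, and the weights below are invented:

```python
# Symmetric int8 quantization: map floats to integers in [-127, 127]
# via one scale, then map back. Rounding error is at most one step.

def quantize(weights, bits=8):
    qmax = 2 ** (bits - 1) - 1                   # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize(w)
w_hat = dequantize(q, scale)

# Straight-Through Estimator: the forward pass uses round(w/s)*s, whose
# true derivative is zero almost everywhere; the backward pass pretends
# the rounding was the identity, so gradients pass through unchanged.
def ste_grad(upstream_grad):
    return upstream_grad

print(q, max(abs(a - b) for a, b in zip(w, w_hat)) < scale)
```

With these weights the scale lands at 0.01, so `q` is `[42, -127, 0, 90]` and every reconstruction error is under one quantization step.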

3.4 Marimo

Updated with a proper Claude Code setup. Briefly: Marimo is a Python notebook that stores its cells as plain Python files and runs them in dependency order automatically — the main entry has the full story on why that’s useful. New here: a hook that validates the notebook after every AI edit, an official “marimo-pair” skill for live sessions where the AI can inspect running state, and a minimal fallback prompt for one-shot agents. Small additions, saves swearing.

3.5 Flashcards

Anki can now talk to LLMs via MCP servers — create cards, update them, manage decks, all without opening the app. Dan’s added an explanation of the plumbing (AnkiConnect → MCP server → assistant) and a rundown of the three main options. The caveat he buries in the middle: making card creation too frictionless might undermine the whole point, since the deliberate effort of writing a card is part of what makes it stick. Tools that make it easier to do the wrong thing as well as the right — bit of a theme this past nine days.
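For the curious, the first hop of that plumbing looks like this: AnkiConnect accepts a JSON envelope POSTed to the local Anki instance (default port 8765). The deck and card content below are invented and nothing is actually sent; this just shows the request shape the MCP servers wrap.

```python
import json

# Build (but don't send) an AnkiConnect "addNote" request. In use, this
# JSON is POSTed to http://127.0.0.1:8765 while Anki is running.

def add_note_request(deck, front, back):
    return json.dumps({
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": "Basic",
                "fields": {"Front": front, "Back": back},
            }
        },
    })

payload = add_note_request("Default", "What is PPAD?", "A complexity class")
print(json.loads(payload)["action"])  # → addNote
```

That it takes three lines to mint a card is the whole caveat: the friction the format removes was doing some of the memory work.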

3.6 Markdown editors and viewers

Gained a section on macOS QuickLook for Markdown. Hitting space on a .md file in Finder normally gives you a wall of hash signs and asterisks. Three free plugins now explained — one supports .qmd files and renders maths — plus a note about a gotcha where a previously-installed editor can hijack the preview request.

3.7 DNS

Major cleanup of the encrypted DNS section — now properly explains the difference between DoT (dedicated port 853, easy for hostile networks to block) and DoH (looks like regular web traffic on port 443, harder to block), and is clear that what these protect is your ISP seeing which sites you visit. The resolver itself still sees everything; you’re choosing who to trust, not eliminating trust. Several old configs in the notes were stale: Adguard changed their server addresses in 2022, Cloudflare rotates keys so the old pinned configs are wrong, dnscloak for iOS is dead, and Android has had system-level private DNS since 2018.
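The DoT/DoH difference is mostly the envelope; the letter inside is the same DNS wire format either way. A stdlib-only sketch (nothing is sent) of building the query that a DoH GET carries as a base64url parameter, per RFC 8484:

```python
import base64
import struct

# Build a minimal DNS query packet: 12-byte header, then the question
# section (length-prefixed labels, terminator, qtype, class IN).

def dns_query(name, qtype=1):                    # qtype 1 = A record
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)  # id 0, RD set
    question = b"".join(
        bytes([len(label)]) + label.encode() for label in name.split(".")
    ) + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question

wire = dns_query("example.com")
param = base64.urlsafe_b64encode(wire).rstrip(b"=").decode()
print(len(wire), param)  # GET https://<resolver>/dns-query?dns=<param>
```

On the wire that request is just another HTTPS GET on port 443, which is why hostile networks find DoH so much harder to block than DoT's dedicated port 853.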

4 The slush pile

4.1 Czechia

Right, so defenestration — throwing political opponents out of windows — turns out to have been a recurring constitutional mechanism in Czech history, not a historical quirk. First Defenestration (1419) launched the Hussite Wars. Second (1618) launched the Thirty Years’ War. Foreign minister Jan Masaryk fell out a window in 1948 in a manner officially ruled suicide. Dan observes, drily, that this speaks to a progressive streak — there was clearly a ready supply of tall buildings at a time when the average peasant didn’t have a floor. The notes have been substantially expanded with sections on Kafka, Dvořák, Jan Hus, Karel Čapek (who gave us the word “robot”), and the Amanita Design games studio.

4.2 Attention Deficit (Hyperactivity) Disorder

Small update. The main addition is a reframing Dan found useful: if there’s no platonic ideal “neurotypicality” to deviate from, then neurotypicality is also a fiction — people are just wildly different. The bit that matters practically: the relevant comparison for whether medication helps isn’t “medicated brain vs imaginary normal brain.” It’s “medicated brain navigating traffic, tax returns, and relationships vs unmedicated brain doing the same.” Natural companion to the self-experiments notes below, if you’re considering trying something yourself.

4.3 Single subject experiments

Major expansion of the tools section — now sorted into things designed for actual n-of-1 experiments (StudyU, n1.tools), plain symptom trackers with no experimental scaffolding (Bearable, Cronometer), passive data collectors (ActivityWatch), and tools for prying your health data out of Apple’s walled garden. The honest line: the polished apps don’t run proper experiments, and the things that do run actual experiments have research-grade UX. Natural pair with the ADHD post above — one names the thing you might want to study, the other gives you the method for studying it on yourself.

4.4 So you’ve joined a union

Australian unionism runs through the Fair Work Commission — mostly paperwork rather than strikes, which is both a relief and a limitation. Dan’s plain-English notes on how this all actually works and why the delegate role punches above its weight are all still intact. This update is just wording tidied throughout, nothing new to see.

5 Minor tweaks

notebook/movement_design.qmd had its title adjusted. Someone’s been tidying the filing; take the afternoon off.

Skipped: 14 file(s) changed but looked minor (or were metadata-only).