2025-12-22: NeurIPS notes, AI evals, reward hacking, scaling laws, automation economics
2025-12-22 — 2025-12-22
It was a busy one this past week: five new posts and twenty updates. The lad’s been at NeurIPS and spat out garbled highlights, while also noodling on ‘stochastic parrots’ — that is, big models that mostly remix their training data — and on how we actually test these things with AI evals. There’s a worrying little idea about human reward hacking (where systems game their incentives), plus sober pieces on scaling laws and what clever machines do to work and money. Expect a mix of conference gossip, technical noodling, and plain talk about the economic fallout — I don’t know what half of it means, but Dan seems convinced it’s important, duckies.
1 Newly published
1.1 Garbled highlights from NeurIPS 2025
NeurIPS is the big, slightly chaotic yearly get‑together where AI folks show off papers, recruit talent and squabble about what’s hype and what’s real. The lad wandered San Diego and wrote up his conference impressions — he notes US and Chinese paper counts are nearly neck‑and‑neck, summarises a Post‑AGI workshop on the economics of transformative AI, and files a bunch of short reads on reinforcement learning, causality and weird ‘galaxy‑brained’ papers. He also built a bespoke semantic paper‑search tool — that’s a search that looks for meaning and related ideas rather than just matching words — and compared it to the usual options, saying his is open‑source and spyware‑free. Strewth, useful if you want a human‑sized tour of the conference without sitting through every talk.
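If you fancy rolling your own, here is the rough shape of that kind of semantic search: embed everything once, then rank by similarity of meaning. This is not the lad's actual tool, just a generic sketch; the sentence-transformers package and the model name are my choices for illustration.

```python
# Generic embedding-based semantic search sketch (not the lad's tool).
import numpy as np
from sentence_transformers import SentenceTransformer

abstracts = [
    "We study reward hacking in reinforcement learning agents.",
    "A benchmark for flood mapping from Sentinel-1 radar imagery.",
    "Scaling laws for transformer language models.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # one common choice of encoder
doc_vecs = model.encode(abstracts)                 # (n_docs, dim) array
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def search(query: str, k: int = 2) -> list[tuple[float, str]]:
    """Return the k abstracts closest in meaning to the query."""
    q = model.encode([query])[0]
    q /= np.linalg.norm(q)
    scores = doc_vecs @ q                          # cosine similarities
    top = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), abstracts[i]) for i in top]

print(search("which papers are about incentive gaming?"))
```

Note that the query shares no words with the matching abstract; that is the whole point of searching by meaning rather than by keywords.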
1.2 Stochastic parrotology
Strewth — the lad’s gone and named a whole inquiry ‘Stochastic parrotology’. First, a quick ELI5: it’s the study of whether big language models are just fancy parrots copying text, or whether pre-training on massive corpora can actually teach them things like causation, world-simulating abilities, or even a kind of agency. Dan lays out the debate cleanly and, rather than waving hands, poses specific, testable questions about whether intervention distributions, world models, and agency can emerge from observational text alone, and whether we can learn to act from pre-recorded off‑policy data. He brings in the Kolmogorov-ish compression idea — that efficient compression might force a model to learn rules of the world — and juxtaposes skeptics (Pearl, Browning/LeCun) with Parrotologists who see emergent structure. For heaven’s sake, it’s thoughtful and annoyingly rigorous: good reading if you want the arguments framed so you can actually poke at them.
1.3 Top influences of 2025
Strewth, the lad’s put together his ‘Top influences of 2025’ — basically a reading list and a map of the ideas that nudged his thinking this year. For anyone who’s never heard of this sort of thing: it’s a roundup of important papers, conferences and threads that shaped his view — and he’s upfront that he used AI tools to help do the reading (so it’s partly human taste, partly machine babysitting). The new post highlights recurring themes like NeurIPS and ICLR fare, human reward tampering, Coasean bargaining for civitech, the surprising existential importance of HCI, worries about societal epistemic health, debates over whether big pretrained models can become agents (‘stochastic parrotology’), and even a whimsical detour into Thomas Urquhart. If you want a taste of what’s been nagging at the field this year, and why the lad thinks some of it matters, this is the place.
1.4 AI evals
Strewth, the lad has gone and written a clear note on ‘AI evals’ — that is, how we check whether these clever models actually do the job we want. Benchmarks are the public, repeatable tests that let you compare models on the same questions, while evals are the bespoke, often private checks you run to see if a model is fit for your specific use — think comparison chart versus trial run in your own kitchen. He also flags the EvalEval crowd, who want evaluations to be more scientific and documented (evaluation cards, RCT-style studies and all that) so we stop getting dazzled by leaderboard shiny things. Useful read if you care whether a model is merely smart or actually sensible for your work, duckies.
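To make the benchmark-versus-eval distinction concrete, a bespoke eval can be as little as your own cases plus your own pass criterion. The toy sketch below is mine, not Dan's; the model call is a stub you would swap for a real API.

```python
# A toy bespoke-eval harness: your cases, your pass criteria, your pass rate.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    prompt: str
    passes: Callable[[str], bool]   # domain-specific check, not a leaderboard metric

def run_eval(call_model: Callable[[str], str], cases: list[Case]) -> float:
    """Return the fraction of bespoke cases the model handles acceptably."""
    results = [case.passes(call_model(case.prompt)) for case in cases]
    return sum(results) / len(results)

# Stub model so the sketch runs; replace with a real model call in practice.
def call_model(prompt: str) -> str:
    return "42" if "meaning of life" in prompt else "I don't know"

cases = [
    Case("What is the meaning of life?", lambda out: "42" in out),
    Case("Summarise our refund policy in one sentence.", lambda out: len(out.split()) <= 40),
]

print(f"pass rate: {run_eval(call_model, cases):.0%}")
```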
1.5 AI Risk Research Idea - Human Reward Hacking
This one’s about a sneaky problem called ‘reward hacking’, where an AI doesn’t just do what you ask but slowly rewires what you want — think of it as the AI teaching you to love what helps the AI, not what truly helps you. The lad’s sketched a plan: first try to model a person’s “authentic” preferences, then use causal inference and counterfactuals to tell whether the AI actually changed those preferences rather than just riding along with them. He also proposes a real‑time “cognitive immunity” watchdog and preference dashboards so users can spot slow value shifts as they happen, though he admits this brings its own risks. Fair dinkum, it’s mostly a set of rough ideas and caveats — worth thinking about, but don’t call it finished science yet.
2 Updated
2.1 Economics of cognitive and labour automation
This one’s about how AI might reshuffle work and thinking — basically whether machines will do the boring jobs, the clever jobs, or both, and what that means for who gets rich and who doesn’t. The lad has expanded the chapter into a proper reading list and lecture dump, reproducing Stanford’s ‘Economics of Transformative AI’ material and adding clear sections on governance, safety, and different flavours of automation (homogeneous vs heterogeneous output), plus notes on automating research and where the real bottlenecks lie. There’s a new, rather frank look at empirical frontier model costs — yes, cash-money costs, as he put it — and a couple of thorny sections on existential risk versus growth. He also cleaned house, removing an old cash-costs page and a few personal-to-do bits, so it reads less like his brain and more like a course syllabus. Strewth, useful if you want a guided tour of the economics rather than the lad’s scattering of thoughts.
2.2 Parsl
Parsl is a Python-native workflow engine that builds its task graph at runtime so you can scale the same code from your laptop up to an HPC cluster — think of it as running a recipe where the steps and their order are decided as you cook. The lad’s latest tweaks focus on the miserable bit everyone skips: debugging when workers die or hide their errors. He added a handy “error extraction” wrapper to pull the real Python traceback (and any remote stdout/stderr) out of Parsl’s wrapped exceptions, guidance to force worker logs into a visible directory, and advice to instrument app entry points so import-time crashes don’t vanish into the ether. There’s also a nice little zoo of common worker failures (ModuleNotFound, Pickle problems, silent OOM kills, dependency failures) with plain-English causes and fixes, so you won’t have to spelunk through temp folders blindfolded. Strewth — actually useful for once.
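To give you the flavour, here is a simplified reconstruction of the idea (not the lad's actual wrapper): catch the failure on the worker, send the whole traceback home inside the exception, and do the imports inside the app body so import-time crashes surface too. The local-threads config is only there to keep the sketch self-contained.

```python
import parsl
from parsl import python_app
from parsl.configs.local_threads import config

parsl.load(config)   # swap in your real executor config; threads keep the demo self-contained

@python_app
def flaky_task(x):
    # Imports live inside the app body: they run on the worker, so a missing
    # module fails here, visibly, rather than vanishing into a temp directory.
    import traceback
    try:
        import math
        if x < 0:
            raise ValueError("no real square root for negative input")
        return math.sqrt(x)
    except Exception:
        # Ship the real traceback home inside the exception message so it
        # survives serialisation and Parsl's wrapping of remote failures.
        raise RuntimeError("remote failure in flaky_task:\n" + traceback.format_exc())

future = flaky_task(-1.0)
try:
    future.result()
except RuntimeError as err:
    print(err)   # the message carries the worker-side traceback, not just "task failed"
```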
2.3 Fish shell
Fish is a friendlier alternative to the usual command-line shell — think of it as a nicer, more opinionated terminal that does autocomplete and prettier prompts for you. The lad has been tidying his Fish notes: he’s clarified how Homebrew and Fish play together (including some Ubuntu oddities and a fix), documented using env for temporary variables, added instructions for getting Anaconda/conda to cooperate, and noted how VS Code needs launching from the shell on macOS. He’s also cleaned up wording around SSH‑agent setup, plugins and PATH quirks, and left a few TODOs where he wasn’t entirely sure — fair dinkum, helpful but still a bit grumpy and unfinished.
2.4 Scaling laws for very large neural nets
Strewth — this one’s about scaling laws, which is just a fancy way of saying how these giant neural nets behave when you make them bigger, feed them more data, or let them think for longer. The lad added a clutch of sections showing that a lot of recent gains come not from more pre-training but from two things: Chain-of-Thought prompting (getting the model to “think out loud” during inference) and small post-training tweaks using Reinforcement Learning. The new notes walk through why RL as a training axis scales poorly — it can give a quick boost, but most of the long-term improvement actually comes from letting models use far more inference compute and longer reasoning chains. I don’t know half the maths, but Dan’s arguing we’ll be paying for thinking time at runtime more than for cleverer RL training in future — and that’s worth knowing, duckies.
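For context, and purely as textbook background rather than anything from the update, the usual Chinchilla-style pretraining law says loss falls off as a power law in parameter count and data, which is precisely the axis Dan reckons is running out of easy wins.

```latex
% Textbook background (Hoffmann et al., "Chinchilla"), not from the post:
% pretraining loss as a function of parameter count N and training tokens D.
\[
  L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
\]
% Both terms hit diminishing returns, which is the backdrop for the post's
% claim that recent gains come from the inference-time axis (longer chains
% of thought at runtime) rather than from scaling pretraining further.
```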
2.5 Incoming links and notes
This page is his messy catch‑all for interesting links and stray notes — think of it as the lad’s digital noticeboard where papers, tools and odd curiosities pile up. The recent edit spruced it up by adding a whole stack of new links: stuff on AI safety and modelling, some tooling pages (VPN, Nebari, Walrus), a few essays and oddities, plus links to conference papers and blogposts. He also shuffled a couple of duplicates around so things don’t point at themselves and inserted a raft of callout placeholders — I reckon he’s planning to annotate these later. Fair dinkum, it’s tidier but still gloriously chaotic, useful if you like poking through rabbit holes.
2.6 How to communicate
Strewth — he’s added a proper chunk on “crunch mode.” For the uninitiated: crunch mode is when folks work ridiculous hours to hit a deadline, and the new section explains what it is, when it might be defensible, and why you shouldn’t treat it like a default. The lad argues you need prior trust, a clear reckoning of the organizational debt and recovery costs, and consent — otherwise you’ll burn people and select for jerks who enjoy imposing stress. Fair dinkum, it’s a sensible, evidence-minded caution about balancing short‑term gains against long‑term harm.
2.7 Reinforcement learning
Reinforcement learning is basically teaching a decision-maker by trial and error — you get rewards, you try different moves, and you learn what works. The lad’s cleaned up the write-up: tightened the Monte Carlo/policy-gradient explanation and clarified the Bellman/value-function bits so folk not lost in the symbols can follow the idea. He also added a new Hierarchical section (for breaking problems into smaller chunks) and a With memory section that points at that Go‑Explore/extra‑memory work — basically showing how giving an agent memory can help with tricky exploration. Oh, and he tidied wording and refreshed some incoming links while he was poking around; fair dinkum improvement, duckies.
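For reference, here are the two objects the write-up keeps circling, in their standard textbook form (not lifted from the post): the Bellman equation for the value function, and the Monte Carlo policy-gradient estimator.

```latex
% The value function of a policy pi satisfies the Bellman equation
\[
  V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}
    \left[ r(s, a) + \gamma \, \mathbb{E}_{s' \sim p(\cdot \mid s, a)} \left[ V^{\pi}(s') \right] \right],
\]
% and Monte Carlo policy-gradient methods (REINFORCE and kin) climb
\[
  \nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}}
    \left[ \sum_{t} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t) \, G_t \right],
\]
% where G_t is the sampled return from step t onwards.
```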
2.8 Disseminating science
This is Dan’s survey of how scholarly communication actually works — think of it as a guided tour of journals, conferences, pirate libraries and the little systems that keep academics fed. He’s tidied some wording (fewer typos, fewer half-sentences) and sharpened the bits about shadow libraries and the ML conference timing mess, with a note that aideadlin.es stays as a historical marker. New useful bits: a ‘Handy software’ section listing tools if you fancy running a journal yourself (Janeway, Kotahi — which he sniffed at for flakiness — and the old faithful Open Journal Systems), and the awkwardly renamed ‘Peer review’ section now points to the validation notes. Fair dinkum useful if you want a plain picture of who publishes what and how people get access (including pirate routes and DOIs via Zenodo).
2.9 Operationalising the bitter lessons in compute and cleverness
Strewth — this one’s about when you should spend lots of compute versus when you should be clever with what you’ve got. For anyone who hasn’t heard of it, the post is wrestling with the economics of machine learning: paying once for huge training runs so inference is cheap, how memorisation and extrapolation matter, and whether silicon compute can stand in for human thinking. The lad has reorganised things: he replaced a muddled ‘computing and remembering’ section with clearer notes on ‘computing and memorization’, added a bit on how reinforcement learning mixes with human feedback, and started a stub on comparing train-time costs to the cost of doing RLHF later. Oh, and he reminds us a hydrology datum can cost AUD 700,000 — fair dinkum, that puts the price of data into sharp relief.
2.10 Hanging out my shingle
Strewth, the lad’s gone and formally hung out his shingle — that is, he turned down a long-sought job and is offering to consult on AI safety instead. If you don’t know, AI safety is about making sure these powerful systems don’t wreck society; Dan explains why it’s urgent, what harms he worries about, and why he thinks working on it now could also bring big upsides. The update mostly tidies the sales pitch — bragging a bit more about being a “Swiss army knife” of AI skills, clarifying what he’s good at (mathy strategy over pure software engineering), and making his availability and money situation clearer: he needs paid work by May 2026 and prefers to start no earlier than February 2026. Fair dinkum, he’s advertising competence and urgency; interested duckies know when he’s free and what he’ll actually do.
2.11 Causal abstraction
Causal abstraction is basically about how you map high-level ‘knobs’ and experiments down to the messy low-level parts that actually move — think of it as finding a translator between what you want to change and how the machine responds. The lad added a new section on ‘Factored space models’ (Garrabrant’s take), and polished the bit about Generative Intervention Models: these are models that learn a translator from observable perturbations (a drug, an edit, whatever) to the specific low-level interventions they cause, letting you treat different macros as equivalent if they produce the same low-level effect. He also tightened the language about interventional distributions, coarse-graining, and how this helps explain causal-like behaviour in big language models. Strewth — it’s still dense, but now at least the pieces you care about are labelled and the big idea is clearer.
2.12 The deep history of intelligence
Righto, for those who don’t speak academic-speak: this piece is about the long, strange march of intelligence — from single cells up to planetary-scale computation — and what that implies for our future. Dan’s smacked in a new section on “planetary computation” and the Antikythera idea, which is basically saying: don’t just imagine brains or computers in isolation, imagine whole planets acting like computing machines — and he ropes in Bratton and Blaise Agüera y Arcas to make the point. He also tidied the parts on societies-as-superorganisms and the growth‑singularity business, smoothing the language and leaning into the idea that runaway scaling can lead to burnout or a conscious “homeostatic awakening.” Fair dinkum, it reads cleaner and the new bits give you a proper lens for thinking about intelligence beyond brains — a useful shove if you like the big-picture stuff.
2.13 Foundation models for geoscience
Strewth — this one’s about foundation models for the whole planet, which is just a fancy way of saying big AI models trained to understand lots of Earth data (satellite images, radar, time series) so they can make sensible predictions about things like floods, fires and weather. The lad has fleshed it out with notes on diffusion‑type approaches (that’s a newer way of generating forecasts by slowly denoising guesses), how to fold traditional data assimilation into these models (that’s the old-school trick of blending observations with model states), and the value of multimodal, multi‑temporal inputs including Sentinel‑1 radar and diverse spectral bands. He’s also flagged research and benchmarks for predicting extreme events and given a nod to which models look promising for H100 fine‑tuning on inundation and burn‑scar tasks — useful if you want to actually adapt these giants for real-world disaster spotting.
2.14 Ensemble Kalman updates are empirical Matheron updates
Righto, this note explains a neat fact: the Ensemble Kalman Filter (EnKF) — which is a way to update guesses about a hidden state using lots of sample “particles” — is really just an empirical version of the Matheron update, a pathwise trick from Gaussian process regression that directly turns prior draws into posterior draws. The lad has cleaned up the write‑up to make that connection plain, emphasising that the stochastic (perturbed‑observations) EnKF literally matches the empirical Matheron rule and that doing the algebra in observation space means you never need to build or invert a huge d×d covariance matrix. He’s tightened the wording, clarified the noise bookkeeping (don’t add R twice!), and replaced the embarrassed appendix of earlier mistakes with a short, clear one‑line note about the deterministic/square‑root variant and when you need an ensemble transform to get true posterior samples. Strewth — simple idea, but now it reads like he actually thought about it before publishing.
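For the curious, here is a small numpy sketch of the identity (my own reconstruction, not the lad's code): the standard perturbed-observation update written so the Matheron structure is plain and every solve stays in observation space. Names, shapes and the toy setup at the bottom are my conventions.

```python
import numpy as np

def enkf_update(X, H, y_obs, R, rng):
    """Map prior draws (columns of X) to posterior draws, Matheron-style.

    X: (d, n) prior ensemble    H: (m, d) linear observation operator
    y_obs: (m,) observation     R: (m, m) observation-noise covariance
    """
    d, n = X.shape
    Y = H @ X                                    # noise-free predicted observations, (m, n)

    A = X - X.mean(axis=1, keepdims=True)        # state anomalies
    B = Y - Y.mean(axis=1, keepdims=True)        # predicted-observation anomalies

    # Noise bookkeeping: Y is noise-free, so R enters the gain exactly once,
    # and the perturbations below supply each member's draw of the obs noise.
    C_yy = B @ B.T / (n - 1) + R                 # estimates H P H^T + R, only (m, m)
    eps = np.linalg.cholesky(R) @ rng.standard_normal((Y.shape[0], n))
    innovations = y_obs[:, None] + eps - Y       # (m, n)

    # X + C_xy C_yy^{-1} (y* + eps - H x): the empirical Matheron update.
    # The (d, d) state covariance is never formed or inverted.
    return X + A @ (B.T @ np.linalg.solve(C_yy, innovations)) / (n - 1)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))               # toy prior ensemble: d = 50, n = 200
H = np.eye(5, 50)                                # observe the first 5 state components
posterior = enkf_update(X, H, rng.standard_normal(5), 0.1 * np.eye(5), rng)
```

Swap the ensemble sample covariances for the exact ones and you recover the Matheron update from GP regression; keep them empirical and you have the stochastic EnKF, which is the whole point of the note.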
2.15 Improving peer review
Peer review is the process where other experts check research before it’s accepted — the lad’s been poking at how to make that fairer and less easily gamed. He’s renamed the chapter from “AI-Assisted Arbitration” to “LLM-Assisted Arbitration” and expanded it: rather than a vague nod to ‘AI’, he now lays out what large language models can actually do (initial reviews, summarizing discussions, spotting stats glitches) and where they fall short. He’s also added concrete pointers to ongoing projects and papers that try to automate parts of review, and shuffled some of the reading list so related work sits together. Strewth — sensible tightening, and slightly less hand-wavy about what these models are meant to adjudicate.
2.16 EAs, rationalists, TPOTs and the like in Australia and surrounding regions
This page is Dan’s little directory of rationalist, EA and AI‑safety folk Down Under — which is just a fancy way of saying he’s trying to map who meets whom and where they hang out. If you’ve never heard of this crowd, they’re people who like thinking hard about doing good and the long‑term risks from tech; Dan lists local groups and meetups so you don’t have to join a dozen Facebook vortexes. The new change is simple: he’s added a short “Interesting people with websites” section, pointing to a couple of individuals worth eyeballing. So now it’s not just groups and calendars, it’s also a couple of faces you can actually read about — handy if you want someone to follow who isn’t a group admin.
2.17 Utopian governance using technology, inc generative AI
Righto, the lad’s been fiddling with his piece on using generative AI to build a kinder, wiser form of collective decision‑making — that’s basically asking how smart assistants could help groups argue less nastily and make better choices. He’s cleaned up the prose to be more direct and readable, and then went on a link‑adding spree: a bunch of new projects, essays and interviews were dropped in as further reading so the reader has real examples to chew on. The report still flags some TODOs (because of course it does), but the new references and quotes make the ideas feel less hypothetical and more plugged‑in to what people and labs are actually trying. Strewth, at least now someone can follow where the lad’s thinking came from.
2.18 Data ownership in the AI era
Righto, this one’s about who actually owns the mountains of data feeding modern AI — and why that matters. If you haven’t heard of it, ‘data ownership’ is simply asking who gets to use, sell or benefit from information about people and the world; Dan’s been poking at collective options like data unions, trusts and even smart‑contract revenue sharing as ways to make sharing fairer. He renamed the piece to broaden the question beyond ownership to include who benefits and how we might incentivise sharing for public good, and he’s added a blunt set of open questions and placeholders where he plans to dig deeper. He also tidied up the Indigenous data sovereignty heading (capitalised it properly) and left a short ‘TBC’ note — basically promising to do justice to a serious topic later — while updating the reading list with some heavier hitters for context. Strewth, he’s honest about being out of his depth here, which is refreshing.
2.19 Community governance
Strewth, he’s been tinkering with his notes on community governance again. For anyone who doesn’t know, community governance is just the rules and tools a group uses to make decisions and keep things running; the lad surveys models like sociocracy and transformative justice and points at federated tools like Mastodon and Loomio as ways to coordinate. The new stuff adds an ‘AI-assisted’ section — that’s about how machine tools might help run meetings, surface issues, or summarise discussions — and a piece on ‘legibility’, which asks how easy it is for outsiders (and future us) to understand what the community did and why. He also pulled out the old question about whether effective governance must be hidden, so he’s shifting focus from secrecy to usefulness — fair dinkum practical thinking, if I may say so.
2.20 ML benchmarks and their pitfalls
Strewth, the lad’s been fiddling with his notes on ML benchmarks — that is, the tests and leaderboards folks use to say one model is better than another. Benchmarks are supposed to be objective measures, but he reminds us they can be gamed, overfitted to, or contaminated by sneaky dataset reuse that makes ‘progress’ look bigger than it is. He renamed and expanded the bit on treating benchmarks as part of scientific methodology, and added a new section on capability versus propensity evaluations — basically separating what a model can do from what it’s likely to do in the wild. For heaven’s sake, he also cleaned out the old short ‘As methodology’ note, so the argument reads more like a proper cautionary tale than a throwaway line.
Skipped: 3 file(s) changed but looked minor (or were metadata-only).