ILIAD2

Oddesty

2025-02-22 — 2025-09-04

Wherein the Bay Area unconference is recorded, a neural‑network analogue of the computational no‑coincidence conjecture is outlined, and a phase transition in singular learning theory is noted.

Tags: adversarial, catastrophe, economics, faster pussycat, innovation, language, machine learning, mind, neural nets, NLP, security, tail risk, technology
Figure 1: Zephyrus blows Odysseus and the AI researchers towards an interesting new formalism.

ILIAD2 is an unconference about diverse mainstream and left-field approaches to technical AI Safety in the SF Bay Area.

Much happened when I attended in 2025, and I haven’t digested it all yet. What follows is my highlight list. It’s somewhat scattered.

1 Singular learning theory

The breakaway theme of the conference was the Singular Learning Theory work, which itself went through a phase transition: what started, in my opinion, as a set of suggestive results that merely gestured at an approach to AI safety became results that actually look like they might be useful for non‑trivial things. Colour me surprised. There’s too much to summarize here; it would be better if someone deeper into it had a go.

2 Textbook from the future

I mention it both because it’s a cool new project and because it aspires to serve as an introduction to the whole AI Safety field.

The metaphor I usually use is that if a textbook from one hundred years in the future fell into our hands, containing all of the simple ideas that actually work robustly in practice, we could probably build an aligned super‑intelligence in six months. — Eliezer Yudkowsky

They’re attempting to write it. See here.

3 Social Choice Theory: alignment as a Maximal Lottery

Roberto-Rafael Maura-Rivero explained:

When people disagree about what’s right, what should AI do? This is pluralistic AI alignment’s core challenge. Voting systems seem natural, but Arrow’s impossibility theorem crushed that hope: no perfect voting method exists. Until recently. Stochastic voting systems (maximal lotteries) sidestep these limitations. The question now: how do we make LLMs behave like this ideal system?

Next-token choice as a collective social choice problem? Sign me up!

Roberto-Rafael Maura-Rivero introduced Maximal Lotteries, the only (Brandl and Brandt 2020) stochastic voting rule with certain nice properties that are desirable in an AI context (read the paper for which; I forget). A toy sketch comparing it with Borda counting follows after the list below.

Things I learned:

RLHF is a “Borda count” vote (!) (Siththaranjan, Laidlaw, and Hadfield-Menell 2023), which performs poorly as a voting mechanism with respect to Condorcet outcomes.

Cf. “Nash learning from human feedback” (NashLHF), the “best” (democratically speaking) feedback system (Munos et al. 2024).

More info at Lanctot et al. (2025), Maura-Rivero, Lanctot, et al. (2025), Maura-Rivero, Nagpal, et al. (2025).
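
To make those voting-theory claims concrete, here is a toy numerical sketch (my own illustration, not material from the talk; it assumes numpy and scipy are available). For a small preference profile it computes Borda scores and the maximal lottery, the latter as a maximin mixed strategy of the symmetric zero-sum game whose payoffs are the pairwise majority margins. The profile is chosen so that the Borda winner and the Condorcet winner disagree:

```python
import numpy as np
from scipy.optimize import linprog

# Preference profile: each entry is one voter's strict ranking, best first.
profile = [("a", "b", "c")] * 3 + [("b", "c", "a")] * 2
alts = ["a", "b", "c"]
m = len(alts)

# Borda scores: an alternative gets (m - 1 - position) points from each voter.
borda = {x: sum(m - 1 - r.index(x) for r in profile) for x in alts}

# Pairwise majority margins: margins[i, j] = #(i over j) - #(j over i).
margins = np.zeros((m, m))
for i, x in enumerate(alts):
    for j, y in enumerate(alts):
        if i != j:
            wins = sum(r.index(x) < r.index(y) for r in profile)
            margins[i, j] = 2 * wins - len(profile)

# Maximal lottery = maximin strategy of the symmetric zero-sum game `margins`:
# maximise v subject to (p^T M)_j >= v for all j, sum(p) = 1, p >= 0.
c = np.zeros(m + 1)
c[-1] = -1.0                                      # linprog minimises, so use -v
A_ub = np.hstack([-margins.T, np.ones((m, 1))])   # v - (p^T M)_j <= 0
b_ub = np.zeros(m)
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = [1.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * m + [(None, None)])

print("Borda scores:   ", borda)                  # b wins the Borda count
print("Maximal lottery:", dict(zip(alts, res.x[:m].round(3))))
```

On this profile the Borda count elects b, while the maximal lottery, being Condorcet-consistent, puts all its probability on a; that is the sense in which Borda-style aggregation can come apart from Condorcet outcomes.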

4 Rosas and Boyd, AI in a vat via bisimulation

AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability (Rosas, Boyd, and Baltieri 2025). Connecting world models, simulation hypotheses, etc.

5 Oli Richardson’s probabilistic dependency graphs

Cool research on a generalized graphical model family. The author gave it a good pitch:

  • his PhD thesis (O. Richardson 2024), whose abstract is quoted below
  • see also O. Richardson and Halpern (2020), O. E. Richardson (2022)

This stuff was an extremely interesting way of approaching inconsistency as a kind of generalized inferential target, producing some classic losses and model structures as special cases to persuade us that it’s natural. Very tasty work.

This thesis develops a broad theory of how to approach probabilistic modeling with possibly-inconsistent information, unifying and reframing much of the literature in the process. The key ingredient is a novel kind of graphical model, called a Probabilistic Dependency Graph (PDG), which allows for arbitrary (even conflicting) pieces of probabilistic information. In Part I, we establish PDGs as a generalization of other models of mental state, including traditional graphical models such as Bayesian Networks and Factor Graphs, as well as causal models, and even generalizations of probability distributions, such as Dempster-Shafer Belief functions. In Part II, we show that PDGs also capture modern neural representations. Surprisingly, standard loss functions can be viewed as the inconsistency of a PDG that models the situation appropriately. Furthermore, many important algorithms in AI are instances of a simple approach to resolving inconsistencies. In Part III, we provide algorithms for PDG inference, and uncover a deep algorithmic equivalence between the problems of inference and calculating a PDG’s numerical degree of inconsistency.

He described it compactly as constraining an inference algorithm to be consistent (in some sense) with respect to beliefs rather than utilities, which is a desirable property.
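
For flavour, here is a deliberately crude sketch of “inconsistency as an inferential target” (my own toy, not Richardson’s actual PDG semantics or scoring function; numpy and scipy assumed): given two conflicting beliefs about a single binary variable, find the distribution that minimises the total KL divergence to both, and read off the irreducible residual as an inconsistency score.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, 1e-12, 1 - 1e-12), np.clip(q, 1e-12, 1 - 1e-12)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

belief_a = 0.9   # one source says P(X=1) = 0.9
belief_b = 0.2   # another source says P(X=1) = 0.2

# Resolve: pick the Bernoulli(mu) minimising total divergence to both beliefs.
objective = lambda mu: kl(mu, belief_a) + kl(mu, belief_b)
res = minimize_scalar(objective, bounds=(1e-6, 1 - 1e-6), method="bounded")

print(f"resolved P(X=1) ~= {res.x:.3f}")           # a compromise between 0.9 and 0.2
print(f"residual inconsistency ~= {res.fun:.3f}")  # would be 0 if the beliefs agreed
```

The real machinery handles whole networks of weighted conditional beliefs and recovers standard losses as special cases of that residual; this only gestures at the single-variable case.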

6 Daniel Herrmann and Aydin Mohseni on whether causal inference is even a thing

TBD; I lost my notes. But they argued that something like causal inference — in the sense of intervention inference — does not “require” the do operator, but can rather be constructed as standard conditionalization in an expanded graph. Except it ends up being computationally cheaper to use the do operator in practice.
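
Here is a minimal sketch of how I understood the construction (my reconstruction, on a toy confounded model with Z → X, Z → Y, X → Y): the interventional probability P(Y=1 | do(X=1)) obtained by truncated factorisation agrees with ordinary conditioning on a regime variable F in an expanded graph where X’s mechanism listens to F.

```python
from itertools import product

p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: 0.3, 1: 0.8}                       # P(X=1 | Z)
p_y_given_xz = {(0, 0): 0.1, (0, 1): 0.4,            # P(Y=1 | X, Z)
                (1, 0): 0.6, (1, 1): 0.9}

# 1. do-operator: P(Y=1 | do(X=1)) = sum_z P(z) * P(Y=1 | X=1, z)
p_do = sum(p_z[z] * p_y_given_xz[(1, z)] for z in (0, 1))

# 2. Expanded graph: add a regime variable F (any positive prior will do) and
#    let X's mechanism depend on it; then simply condition on F = "set_x_1".
p_f = {"idle": 0.5, "set_x_1": 0.5}

def p_x_given_zf(x, z, f):
    if f == "idle":
        return p_x_given_z[z] if x == 1 else 1 - p_x_given_z[z]
    return 1.0 if x == 1 else 0.0                    # clamped regime

num = den = 0.0
for z, x, y, f in product((0, 1), (0, 1), (0, 1), p_f):
    joint = (p_f[f] * p_z[z] * p_x_given_zf(x, z, f)
             * (p_y_given_xz[(x, z)] if y == 1 else 1 - p_y_given_xz[(x, z)]))
    if f == "set_x_1":
        den += joint
        if y == 1:
            num += joint

print(p_do, num / den)   # both routes give the same interventional probability
```

Even in this toy case the trade-off is visible: the expanded graph forces a sum over a strictly larger joint, which is presumably why the do operator remains the computationally cheaper shorthand in practice.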

It feels like something very deep is going on here.

7 Daniel Herrmann on principal-agent problems

When is it rational to outsource to an agent, e.g. an AI agent? It’s an interesting slice of the alignment pie.

8 Adam Goldstein from Softmax on enlightened machines

I don’t know what to make of this yet. He made an argument from human developmental psychology that training bots on the entirety of the internet implicitly trains little psychopaths, which we can only understand as objects of control because we cannot imagine them as co-subjects. Sounds bad? But I only saw the initial presentation, which wasn’t very quantitative or detailed, so I can’t speak to it. There were some follow-on presentations that may have filled in the necessary details.

9 Julian Gough on cosmic natural selection

Anthropic principles via natural selection upon black holes!

10 Linkdump

Things I learned or people I met but didn’t have time to make better notes about.

11 Official sound track

The official sound track of ILIAD2 was General Fuzz. Shout out to Headphone James.

12 Proceedings reviews

I reviewed exactly one paper; it was about computational no-coincidence conjectures.

13 References

Adlam, Lee, Xiao, et al. 2020. “Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit.” arXiv:2010.07355 [Cs, Stat].
Brandl, and Brandt. 2020. “Arrovian Aggregation of Convex Preferences.” Econometrica.
Brandl, Brandt, and Seedig. 2016. “Consistent Probabilistic Social Choice.” Econometrica.
Brandl, Brandt, and Stricker. 2022. “An Analytical and Experimental Comparison of Maximal Lottery Schemes.” Social Choice and Welfare.
Christiano, Neyman, and Xu. 2022. “Formalizing the Presumption of Independence.”
Dunbar, and Aaronson. 2025. “Wide Neural Networks at Initialization Can Be Random Functions.” In.
Herrmann, and Levinstein. 2025. “Margins of Misalignment.”
Lanctot, Larson, Kaisers, et al. 2025. “Soft Condorcet Optimization for Ranking of General Agents.”
Maura-Rivero, Lanctot, Visin, et al. 2025. “Jackpot! Alignment as a Maximal Lottery.”
Maura-Rivero, Nagpal, Patel, et al. 2025. “Utility-Inspired Reward Transformations Improve Reinforcement Learning Training of Language Models.”
Munos, Valko, Calandriello, et al. 2024. “Nash Learning from Human Feedback.” In Proceedings of the 41st International Conference on Machine Learning. ICML’24.
Neal. 1996. “Priors for Infinite Networks.” In Bayesian Learning for Neural Networks. Lecture Notes in Statistics.
Patell. 2025. “Cooperation as Bulwark: Evolutionary Game Theory and the Internal Institutional Structure of States.”
Richardson, Oliver E. 2022. “Loss as the Inconsistency of a Probabilistic Dependency Graph: Choose Your Model, Not Your Loss Function.”
Richardson, Oliver. 2024. “A Unified Theory of Probabilistic Modeling, Dependence, and Inconsistency.”
Richardson, Oliver, and Halpern. 2020. “Probabilistic Dependency Graphs.”
Roberts, Yaida, and Hanin. 2022. The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks.
Rosas, Boyd, and Baltieri. 2025. “AI in a Vat: Fundamental Limits of Efficient World Modelling for Agent Sandboxing and Interpretability.” In.
Siththaranjan, Laidlaw, and Hadfield-Menell. 2023. “Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF.” In.
Williams. 1996. “Computing with Infinite Networks.” In Proceedings of the 9th International Conference on Neural Information Processing Systems. NIPS’96.
Yang, and Hu. 2020. “Feature Learning in Infinite-Width Neural Networks.” arXiv:2011.14522 [Cond-Mat].
Zhou, Yang, Rossi, et al. 2022. “Neural Point Process for Learning Spatiotemporal Event Dynamics.” In Proceedings of The 4th Annual Learning for Dynamics and Control Conference.