Figure 1: Zephyrus blows Odysseus and the AI researchers towards an interesting new formalism.
ILIAD2 is a Bay Area unconference about diverse mainstream and left-field approaches to technical AI Safety.
Much happened when I attended in 2025, too much for me to have digested yet. What follows are my highlights.
1 Textbook from the future
The metaphor I usually use is that if a textbook from one hundred years in the future fell into our hands, containing all of the simple ideas that actually work robustly in practice, we could probably build an aligned super‑intelligence in six months. — Eliezer Yudkowsky
The breakaway star of the conference was the Singular Learning Theory work, which went through a phase transition from (IMO) a bunch of suggestive results which suggestively resemble an approach to AI safety to a bunch of results that actually look like they might be useful for non-trivial things. Colour me surprised. Too much to summarise about that, but I’m somewhat peripheral to that research program, so I’m confident the summarisation work will be done by someone deep into it.
3 Roberto-Rafael Maura-Riverso - Social Choice Theory: alignment as a Maximal Lottery
When people disagree about what’s right, what should AI do? This is pluralistic AI alignment’s core challenge. Voting systems seem natural, but Arrow’s impossibility theorem crushed that hope: no perfect deterministic voting method exists. Or so it seemed: stochastic voting systems (maximal lotteries) sidestep these limitations by randomising over outcomes. The question now: how do we make LLMs behave like this ideal system?
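I found it easiest to see what a maximal lottery is by computing one. This is my own toy sketch, not from the talk: a maximal lottery is a maximin mixed strategy of the symmetric zero-sum game whose payoff matrix is the pairwise majority margins, so for a small preference profile we can find one with an off-the-shelf LP solver (the example profile and variable names below are mine).

Code

# Toy sketch (mine, not from the talk): compute a maximal lottery for a small
# preference profile. A maximal lottery is a maximin mixed strategy of the
# symmetric zero-sum game whose payoff matrix is the pairwise majority margins.
import numpy as np
from scipy.optimize import linprog

# Each row is one voter's ranking, best to worst, over alternatives 0, 1, 2.
# This profile is a Condorcet cycle, so there is no deterministic winner.
profile = [
    [0, 1, 2],
    [1, 2, 0],
    [2, 0, 1],
]
k = 3  # number of alternatives

# Margin matrix: M[a, b] = (#voters preferring a to b) - (#voters preferring b to a).
M = np.zeros((k, k))
for ranking in profile:
    pos = {alt: i for i, alt in enumerate(ranking)}
    for a in range(k):
        for b in range(k):
            if pos[a] < pos[b]:
                M[a, b] += 1
            elif pos[a] > pos[b]:
                M[a, b] -= 1

# LP: maximise v subject to (p^T M)_b >= v for every alternative b, with p a lottery.
# Variables x = (p_0, ..., p_{k-1}, v); linprog minimises, so we minimise -v.
c = np.concatenate([np.zeros(k), [-1.0]])
A_ub = np.hstack([-M.T, np.ones((k, 1))])                  # v - (M^T p)_b <= 0
b_ub = np.zeros(k)
A_eq = np.concatenate([np.ones(k), [0.0]]).reshape(1, -1)  # probabilities sum to 1
b_eq = [1.0]
bounds = [(0, None)] * k + [(None, None)]                  # p >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("maximal lottery:", np.round(res.x[:k], 3))          # (1/3, 1/3, 1/3) for this symmetric cycle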
5 Oli Richardson’s probabilistic dependency graphs
see also O. Richardson and Halpern (2020), O. E. Richardson (2022)
This stuff was an extremely interesting way of approaching inconsistency as a kind of generalised inferential target, producing some classic losses and model structures as special cases to persuade us that it is natural. Very tasty work.
This thesis develops a broad theory of how to approach probabilistic modeling with possibly-inconsistent information, unifying and reframing much of the literature in the process. The key ingredient is a novel kind of graphical model, called a Probabilistic Dependency Graph (PDG), which allows for arbitrary (even conflicting) pieces of probabilistic information. In Part I, we establish PDGs as a generalization of other models of mental state, including traditional graphical models such as Bayesian Networks and Factor Graphs, as well as causal models, and even generalizations of probability distributions, such as Dempster-Shafer Belief functions. In Part II, we show that PDGs also capture modern neural representations. Surprisingly, standard loss functions can be viewed as the inconsistency of a PDG that models the situation appropriately. Furthermore, many important algorithms in AI are instances of a simple approach to resolving inconsistencies. In Part III, we provide algorithms for PDG inference, and uncover a deep algorithmic equivalence between the problems of inference and calculating a PDG’s numerical degree of inconsistency.
His compactified description was that it constrains an inference algorithm to be consistent (in some sense) with respect to beliefs rather than utilities, which is a desirable property.
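To get a feel for inconsistency-as-inferential-target, here is a toy gloss of my own, not Richardson’s actual definition (which is richer and includes structural terms): score a set of conflicting beliefs about one variable by the smallest achievable total KL divergence from a single reconciling distribution. The minimiser works out to be the normalised (weighted) geometric mean of the beliefs, which the snippet checks against brute force.

Code

# Toy gloss (mine, not the actual PDG machinery): score conflicting beliefs p_i
# about one discrete variable by  inc = min over mu of sum_i w_i * KL(mu || p_i).
# The minimiser is the normalised weighted geometric mean of the p_i.
import numpy as np

def kl(mu, p):
    return float(np.sum(mu * np.log(mu / p)))

def inconsistency(beliefs, weights):
    beliefs = np.asarray(beliefs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Closed-form minimiser: normalised geometric mean with weights w_i / sum(w).
    mu = np.exp(weights @ np.log(beliefs) / weights.sum())
    mu /= mu.sum()
    return mu, sum(w * kl(mu, p) for w, p in zip(weights, beliefs))

# Two sources flatly disagree about a binary variable.
p1, p2 = [0.9, 0.1], [0.2, 0.8]
mu, inc = inconsistency([p1, p2], [1.0, 1.0])
print("reconciling distribution:", np.round(mu, 3))   # [0.6, 0.4]
print("inconsistency:", round(inc, 3))                # ~0.693

# Brute-force check over candidate reconciling distributions.
grid = np.linspace(1e-6, 1 - 1e-6, 10001)
brute = min(
    kl(np.array([q, 1 - q]), np.array(p1)) + kl(np.array([q, 1 - q]), np.array(p2))
    for q in grid
)
print("brute-force minimum:", round(brute, 3))        # matches the closed form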
6 Adam Goldstein from softmax on enlightened machines
I don’t know what to make of this yet. An argument from human developmental psychology that training bots on the entirety of the internet is implicitly training little psychopaths, which we can then only understand as objects of control because we cannot imagine them as co-subjects.
I also reviewed exactly one paper for the conference proceedings (Dunbar and Aaronson 2025). Since it was an interesting learning experience, I want to release my review. I’m not 100% sure whether unilaterally de-anonymising myself like this violates the rules, but I think it is OK; moreover, any costs from that choice will be paid by me rather than the authors, and the value of the review is, I think, non-zero.
8.1 Reviewer abstract
I know a bit about random neural networks but less about computational complexity. I put all my notes in the “abstract” in the hope that it will be useful for others with my background, and because it might expose my errors of reasoning. If you already know about such things, you can skip to the end.
Note: since there was some notation clash between the original blog post and this paper, I introduce compromise notation: Dunbar’s \(n\) is written \(m\) here.
If an apparently outrageous coincidence happens in mathematics, then there is a reason for it.
They see his no-coincidence principle and counter-offer the computational no-coincidence conjecture (CNCC), which uses the machinery of computational complexity in random function spaces to convert it into something we can imagine operationalizing.
I don’t come from computational complexity, so I’ll spell the computational no-coincidence conjecture out here for myself and others like me, in the form we need for the paper, with enough abstraction to cover the cases we need but hopefully concrete enough to make it pedagogic. Suppose we have a universe of functions (circuits, neural networks, Turing machines), and suppose also that this universe comes equipped with a probability measure, such that we can draw at random from this universe of functions.
Working with random functions has some annoying technicalities; let’s barrel through them briskly. Because I am an ML guy, I assume we only care about real vector functions, which is specific enough for me to keep in my brain, and which has always worked for me so far. Call this universe \(\Omega\). Each function \(f \in \Omega\) is a map from \(\mathbb{R}^{m}\to\mathbb{R}^{m}\),1 i.e. we have an \(m\)-dimensional input space and an \(m\)-dimensional output space.2 We call the probability measure \(\mu\),3 and for technical, probability-theory reasons we want to be certain that there is a \(\sigma\)-algebra (“\(\mathcal{F}\)”) defined over our function space so that things do not go haywire when we attempt to compute probabilities of “things” in this function space. This is generally easy for discrete function spaces but gets tedious for function spaces between vectors over continuous fields like the reals. \(\mathcal{F}\) needs to hold all the sets over which we might attempt to compute probabilities. We wrap these up in a probability tuple \((\Omega, \mu, \mathcal{F})\) that lets us do probability stuff in a well-posed way in our universe of functions.
There are various desirable properties for this function space.
1. For our phenomena of interest, it should be easy to guesstimate how likely we are to observe them by chance, i.e. we should be able to bound the probabilities of the rare events we care about (we will come back to this)
2. We should be required to spend the absolute minimum of time arsing about with \(\sigma\)-algebras and other distracting technical machinery.
3. The function space should map cleanly onto some class of functions we care about (e.g. it should be “easy” to interpret things from our function space as neural nets or Turing machines or what-have-you)
The original blog post addressed these desiderata by dealing with the space of random reversible circuits, i.e. random Boolean functions with some nice structure from \(\{0,1\}^{3n}\to \{0,1\}^{3n}\), i.e. \(m=3n\). These objects are discrete so desideratum (2) is easy. They have a neat approximation in terms of random permutations, so that’s (1) dealt with. (3) is addressed by deciding that we care about Boolean circuits for now, and computer scientists are comfortable thinking about the space of Boolean circuits as a toy model for all kinds of things like formal logic and… other discrete problems that computer scientists care about.
Let’s fix on the setting of that Neyman blog post: random deep reversible Boolean circuits \(f: \{0,1\}^{3n} \to \{0,1\}^{3n}\). The distribution is concrete — e.g., pick a random sequence of 3-bit reversible gates, arranged in layers that touch all wires, to a given depth. For large enough depth, such circuits behave pseudorandomly: recent results apparently show they achieve strong \(k\)-wise independence.
This motivates modeling a deep random reversible circuit as though it were drawn uniformly from the symmetric group on \(\{0,1\}^{3n}\), \(\mathrm{Sym}(\{0,1\}^{3n})\).4 So that solves (1) for us. There are a lot of permutations but we have studied the crap out of them, so that should be manageable. We have a simple, albeit approximate, distribution for circuits here.
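To make “random deep reversible circuit” concrete for myself, here is a toy construction of my own (the real distribution in the blog post and paper may differ in details such as how gates are wired): each layer partitions the \(3n\) wires into triples and applies an independent uniformly random 3-bit permutation to each triple, and any such stack of layers is automatically a bijection on \(\{0,1\}^{3n}\).

Code

# My own toy construction, to make "random deep reversible circuit" concrete;
# the distribution used by Neyman / the paper may differ in wiring details.
# Each layer: partition the 3n wires into triples and apply an independent,
# uniformly random permutation of {0,1}^3 to each triple. Layers compose to a bijection.
import itertools
import random

rng = random.Random(0)
n, depth = 2, 8
wires = 3 * n

def random_layer():
    order = list(range(wires))
    rng.shuffle(order)
    triples = [tuple(order[i:i + 3]) for i in range(0, wires, 3)]
    # A gate is (its three wires, a random permutation of {0,...,7} as a lookup table).
    return [(t, rng.sample(range(8), 8)) for t in triples]

def apply_circuit(circuit, x_bits):
    bits = list(x_bits)
    for layer in circuit:
        for (w0, w1, w2), table in layer:
            local_in = bits[w0] * 4 + bits[w1] * 2 + bits[w2]
            local_out = table[local_in]
            bits[w0], bits[w1], bits[w2] = (local_out >> 2) & 1, (local_out >> 1) & 1, local_out & 1
    return tuple(bits)

circuit = [random_layer() for _ in range(depth)]
images = {apply_circuit(circuit, x) for x in itertools.product((0, 1), repeat=wires)}
print("is a permutation of {0,1}^(3n):", len(images) == 2 ** wires)   # True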
To bring us back to the main goal, we are doing this because we want a good notion for the distributions of such functions, which are circuits, which might be something a bit like deductions about the world if we squint at them, so that we can quantify what it means for them to be remarkable and what it might mean to explain a remarkable behaviour \(P\). We are working toward the idea that something is remarkable if its behaviour is “outrageously coincidental”, which we can now gloss as very low probability under the null distribution.
We focus on properties \(P\) that talk about the mapping from inputs to outputs across the entire admissible input set. Formally:
\(P(f)\) means: there is no input \(x\) such that \(p_{\text{in}}(x)\) and \(p_{\text{out}}(f(x))\) are both true.
Here \(p_{\text{in}} : \{0,1\}^m \to \{\text{True},\text{False}\}\) and \(p_{\text{out}} : \{0,1\}^m \to \{\text{True},\text{False}\}\) are Boolean predicates on inputs and outputs, respectively.5 In Neyman’s example, let’s label it \(P^{(0)}\): \(p^0_{\text{in}}(x)\) is “the last \(n\) bits of \(x\) are all 0” and \(p^0_{\text{out}}(y)\) is the same condition on \(y\). The \(P^{(0)}\) in the Neyman blog post says: for every input whose last \(n\) bits are 0, the corresponding output’s last \(n\) bits are not all 0.
Spelling it out again for myself: if there exists even one input in the \(p^0_{\text{in}}\) set whose output also lies in \(p^0_{\text{out}}\), then \(P^{(0)}(f)\) fails. That’s why a property like “output bit \(i > 2n\) is always 1 for all \(p_{\text{in}}\) inputs” is enough to guarantee \(P^{(0)}(f)\) — it eliminates the possibility of the output’s last \(n\) bits all being zero.
Binary strings of length \(3n\): there are \(2^{3n}\) of those. Ones ending in \(n\) zeros? \(2^{2n}\) of those. Permutations \(f\) that happen to never produce a string ending in \(n\) zeros from a string ending in \(n\) zeros? Feels… enumerable. If we model each output as an independent uniform draw (rather than worrying that a permutation samples without replacement), it feels like we can estimate the probability of that thing, and indeed the probability that a randomly chosen permutation does this is approximately \[
(1-2^{-n})^{2^{2n}} \approx e^{-2^n}
\] In reality, the reversible-circuit distribution is not exactly uniform over all permutations — structured outliers exist — but for the purposes of bounding the probability of \(P^{(0)}(f)\) they are likely negligible, so this simple form will do us for the moment. So if a randomly chosen circuit has this property, we are astonished if \(n\) is bigger than the number of fingers I have.
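Here is my own tiny sanity check of that arithmetic, not from the paper: sample uniformly random permutations of \(\{0,1\}^{3n}\) for small \(n\), count how often no last-\(n\)-zeros input maps to a last-\(n\)-zeros output, and compare with both the exact permutation answer and the with-replacement heuristic above.

Code

# My own sanity check of the back-of-envelope number, not from the paper.
# For a uniformly random permutation f of {0,1}^(3n): how often does no input
# ending in n zero bits map to an output ending in n zero bits?
import math
import random

def property_holds(perm, n):
    """P0(f): no input whose last n bits are zero maps to an output whose last n bits are zero."""
    step = 2 ** n
    return all(perm[x] % step != 0 for x in range(0, len(perm), step))

n = 2
N, K = 2 ** (3 * n), 2 ** (2 * n)
rng = random.Random(0)
trials, hits = 100_000, 0
identity = list(range(N))
for _ in range(trials):
    perm = identity[:]
    rng.shuffle(perm)
    hits += property_holds(perm, n)

heuristic = (1 - 2.0 ** (-n)) ** (2 ** (2 * n))   # with-replacement gloss, ~ e^(-2^n)
exact = math.exp(sum(math.log((N - K - i) / (N - i)) for i in range(K)))  # true permutation answer
print(f"Monte Carlo         : {hits / trials:.4f}")   # ~0.005
print(f"exact (permutation) : {exact:.4f}")           # ~0.005
print(f"heuristic           : {heuristic:.4f}")       # ~0.010: right ballpark; the gap is a
# modest constant factor, which is negligible on the e^(-2^n) scale as n grows.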
We are nearly at the computational no-coincidence conjecture, hang on; what we want to do is use this machinery to capture “structure” in functions that explains outrageous coincidences. We will consider an outrageous coincidence for some function \(f\) to be explained by structure if there is an advice string \(\pi(f)\) such that some verification algorithm \(V(f, \pi(f))\) can quickly check \(P(f)\). “Quickly” means polynomial time in the “size” of \(f\), which can be some count of the number of gates.
The computational no-coincidence conjecture is that there exists such a polynomial-time verifier \(V\) for which:
(a) for all \(f\) such that \(P(f)\) is true, there is an advice string \(\pi(f)\) with \(V(f, \pi(f))=\textrm{true}\);
(b) for 99% of functions \(f\) such that \(P(f)\) is false, there is no advice string \(\pi\) that gets us \(V(f, \pi)=\textrm{true}\).
The 99% thing is there to make sure that we don’t break the computational complexity hierarchy. We cannot permit ourselves perfect classifiers here.
Meanwhile, (a) is doing the work of “explaining” structure. It says that, for our fixed outrageously unlikely property \(P\) and universe \(\Omega\) and so forth, we can always find an advice string (let’s gloss that as a compact demonstration that the structure is real) such that the verifier \(V\), armed with that advice, has perfect recall (and good specificity) over the set of all random functions. In set diagrams, we are claiming that the set of functions \(f\) that satisfy \(P(f)\) but for which no proof \(\pi(f)\) will persuade our verifier, is empty.
So! A formalisation of the conjecture that all coincidences must be explainable.
Code
import matplotlib.pyplot as plt
from matplotlib_set_diagrams import EulerDiagram
from livingthing.matplotlib_style import set_livingthing_style, reset_default_style

set_livingthing_style()

P_true_size = 0.13
V_accepts_size = 0.20
intersection_size = 0.09

subset_sizes = {
    (1, 0): P_true_size - intersection_size,     # A only: "10"
    (0, 1): V_accepts_size - intersection_size,  # B only: "01"
    (1, 1): intersection_size,                   # A∩B: "11"
}
subset_labels = {
    (1, 0): "True\ncoincidences\nmissed",
    (0, 1): "V accepts\nbut P false",
    (1, 1): "Certified\ncoincidences",
}

fig, ax = plt.subplots(figsize=(6, 5))

# Try a cost objective that preserves *relative* sizes nicely.
diagram = EulerDiagram(
    subset_sizes,
    subset_labels=subset_labels,
    set_labels=["P(C) is true", "V accepts"],
    cost_function_objective="relative",  # good for preserving ordering
    verbose=False,
    ax=ax,
)

for key, text_artist in diagram.subset_label_artists.items():
    text_artist.set_color("black")

# Highlight the 10 region directly
artist_10 = diagram.subset_artists[(True, False)]
artist_10.set_facecolor("tab:orange")
artist_10.set_alpha(0.35)
artist_10.set_edgecolor("tab:orange")

# Use shapely centroid for a precise callout
geom_10 = diagram.subset_geometries[(True, False)]
x, y = geom_10.centroid.x, geom_10.centroid.y
ax.annotate(
    "Empty under the conjecture",
    xy=(x, y + 0.03),
    xycoords="data",
    xytext=(-15, 65),
    textcoords="offset points",
    ha="right",
    bbox=dict(boxstyle="round,pad=0.3", fc="w", ec="0.4"),
    arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=0.2"),
)

ax.set_title("Null model = universe of circuits")
ax.set_axis_off()
plt.tight_layout()
plt.show()
We can now see what a “structural” explanation of \(P^{(0)}(f)\) might look like. For instance, suppose there exists an output bit in position \(i > 2n\) — i.e., one of the final \(n\) bits — such that for every input whose last \(n\) bits are all zero, the circuit flips that bit from 0 to 1 on the first layer and never changes it again. This ensures that any input in the \(p^0_{\text{in}}\) set (last \(n\) bits zero) will produce an output where bit \(i\) is fixed to 1, so the output’s last \(n\) bits can never all be zero. In other words, \(P(f)\) holds for a simple, structural reason.
In the CNCC world, this is the kind of case where the “advice” \(\pi(f)\) could be very compact (e.g. “bit \(i\) is set to 1 on the first layer and never touched again”), and the verifier \(V\) could check it in polynomial time by inspecting the circuit’s gates, rather than by brute-forcing the exponentially many inputs in the \(p^0_{\text{in}}\) set. The conjecture’s claim, as I understand it, is that all circuits satisfying \(P\) have some succinct, checkable \(\pi\), not just this toy case.
8.1.1 Actual review abstract
One immediate problem, if you come from the ML world, is that Neyman’s reversible Boolean circuits look unlike the dominant paradigm for modern learning algorithms. It would be satisfying to have a function space that “looks” like neural networks, but still meets the convenience requirements (1)–(3) that let us pose formal, probability-based claims about explaining structure. Ideally, we’d have a big bag of random functions that look like NNs — then we could ask computational-no-coincidence-type questions directly in the neural-net domain.
The idea in Dunbar and Aaronson (2025) is that a good model for neural networks is… neural networks. Random ones. There’s a long lineage of work on this, from Neal (1996) to the omnibus Roberts, Yaida, and Hanin (2022). In the standard setup, each weight and bias is drawn independently from a Gaussian with zero mean and suitable variance. This is mathematically tractable and has lots of known limit theorems.
This choice satisfies desideratum (3) — the function space maps cleanly onto the objects of interest, because the objects of interest are literally neural nets. And it satisfies desideratum (2) — as width grows, the preactivations approach a Gaussian process, so the underlying measure space is “nice” and the \(\sigma\)-algebra headaches and \(\Omega\)-wrangling are outsourced to the Gaussian-process literature.
The extra work in Dunbar and Aaronson (2025) is aimed at desideratum (1), which is about being able to bound the probability of rare events. Their main theorem identifies conditions under which randomly initialised NNs behave, for the purpose of such probability estimates, like draws from a tractable, product-Gaussian null model. Concretely:
Assume a nonlinear activation \(\sigma\) with zero mean under \(\mathcal N(0,1)\): \(\mathbb{E}_{z\sim\mathcal N(0,1)}[\sigma(z)]=0\).
Use “critical” hyperparameters that keep the per-neuron variance stable layer-to-layer.
Under these conditions, for any fixed layer depth \(\ell\) and width \(m\) large enough, the outputs on different inputs have exponentially decaying covariance in \(\ell\), and the joint law over all neurons and inputs is within \(e^{-\Omega(\ell)}\) total-variation distance of an i.i.d. standard Gaussian. This means the probability of many interesting subsets is easy to compute because a (good approximation to) \(\mu\) is easy to work with. This exponential-decay regime is the NN-side analogue of Neyman’s “random permutation” assumption: it gives us a null distribution we can actually compute with.
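As a gut check on the flavour of that claim, here is my own toy simulation; it is not the paper’s construction, and I assume zero biases and a simple variance-preserving weight scale as a stand-in for their “critical” hyperparameters. The point it illustrates: a wide random tanh MLP keeps per-neuron preactivation variance near 1 while the correlation between the preactivations of two distinct inputs shrinks towards zero with depth (it does not verify the total-variation bound itself).

Code

# My own toy check of the flavour of the claim, not the paper's construction or
# its exact "critical" hyperparameters (zero biases assumed).
import numpy as np

rng = np.random.default_rng(0)
m, depth = 2000, 30

# Pick sigma_w^2 = 1 / E[tanh(z)^2] under z ~ N(0,1), so that (approximately)
# unit preactivation variance is preserved from layer to layer.
sigma_w2 = 1.0 / np.mean(np.tanh(rng.standard_normal(2_000_000)) ** 2)

# Two inputs with norm sqrt(m) and cosine similarity 0.5.
u = rng.standard_normal(m)
v = rng.standard_normal(m)
v -= (v @ u) / (u @ u) * u
u *= np.sqrt(m) / np.linalg.norm(u)
v *= np.sqrt(m) / np.linalg.norm(v)
x_a, x_b = u, 0.5 * u + np.sqrt(0.75) * v

# First layer: weights ~ N(0, 1/m), so preactivation variance is about 1 and the
# cross-input correlation starts near the inputs' cosine similarity (0.5).
W = rng.standard_normal((m, m)) / np.sqrt(m)
h_a, h_b = W @ x_a, W @ x_b
for layer in range(2, depth + 1):
    W = rng.standard_normal((m, m)) * np.sqrt(sigma_w2 / m)
    h_a, h_b = W @ np.tanh(h_a), W @ np.tanh(h_b)
    if layer in (2, 5, 10, 20, 30):
        corr = np.corrcoef(h_a, h_b)[0, 1]
        print(f"layer {layer:2d}: var(h_a) = {h_a.var():.2f}, corr(h_a, h_b) = {corr:.3f}")
# Expected: variance hovers near 1, correlation shrinks steadily towards 0,
# consistent with the decaying-covariance story.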
They then propose a neural-net analogue of \(P(C)\), call it \(P^{(1)}\), where the input and output predicates are
\(p_{\text{in}}^{(1)}(x)\) is trivial: it holds for all \(x\in \mathcal{D}\).
\(p_{\text{out}}^{(1)}(y)\) is “every coordinate of \(y\) is negative.”
In words: every dataset input maps to an all-negative output vector at the chosen layer. Under the product-Gaussian null model, this event is exponentially unlikely in the width, just as Neyman’s “no input with last \(n\) zeros maps to an output with last \(n\) zeros” is (doubly) exponentially unlikely under the uniform-permutation heuristic. That rarity, together with the tractable null, makes \(P^{(1)}\) a good candidate for exploring computational-no-coincidence in a neural-network setting.
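The arithmetic under the null is pleasantly simple; this is my own back-of-envelope version, not the paper’s. Under the i.i.d. standard Gaussian null, each of the \(m\cdot|\mathcal{D}|\) readout entries is negative independently with probability 1/2, so the all-negative event has probability \(2^{-m|\mathcal{D}|}\). A toy Monte Carlo agrees.

Code

# Back-of-envelope (mine): under the i.i.d. standard Gaussian null, each of the
# m * |D| readout entries is negative independently with probability 1/2, so
# P[P1 holds] = 2 ** (-m * |D|): exponentially small in the width m.
import numpy as np

m, n_data = 5, 2                          # toy width and dataset size
closed_form = 0.5 ** (m * n_data)
rng = np.random.default_rng(0)
draws = rng.standard_normal((400_000, n_data, m))
monte_carlo = (draws < 0).all(axis=(1, 2)).mean()
print(f"closed form: {closed_form:.5f}   Monte Carlo: {monte_carlo:.5f}")   # both ~0.001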
As with Neyman’s bit-flip example, we can imagine simple structural reasons a network might satisfy \(P^{(1)}\). For instance, suppose there’s a neuron in the chosen output layer whose bias is set to a large negative value, and whose incoming weights are all small enough that its preactivation stays negative for every input in \(\mathcal{D}\). If every output coordinate is built in this way (perhaps via repeated weight/bias motifs in the architecture), then the “all-negative” property holds for a mechanical, easily checkable reason. In CNCC terms, the advice \(\pi\) could specify the indices and parameters of these always-negative neurons, and the verifier \(V\) could confirm the property in polynomial time.
Tying it all up with a bow, if we buy that the Computational No-Coincidence Conjecture is a worthy angle of attack for understanding structural properties of networks, then this seems to give us a more comfy, learning-adjacent model to use in it.
8.2 Review
This paper does a lot of hard work to find criteria and conditions under which wide and deep neural networks give us a suitable, tractable null distribution for producing model Computational No-Coincidence Conjecture results, and it gives us one such example on a plate, along with a bunch of useful auxiliary results which would help us manufacture more. Insofar as I have successfully understood the conjecture, this paper mostly does the job of adapting it to neural-network-type domains, which seems useful for making progress on the conjecture. It is IMO clearly written, with some rough edges I suspect are to do with the paper’s location at the edge of two fields with strong opinions about terminology, and thus we may not all agree on the roughness of the edges.
I have examined the technical results at a surface level, and they seem credible, but errors could have escaped me.
I have questions.
8.2.1 Handling the finite dataset
Do we need to do a little more to the conjecture, or to the random neural network model, to handle the fixed-size dataset? As written, it seems to me that if \(P^{(1)}\) quantifies only over a finite dataset \(\mathcal{D}\), then a verifier could just forward-prop every \(x\in \mathcal{D}\) and check the predicate, at cost \(O(|\mathcal{D}|\cdot T_{\text{fwd}})\), independent of any “structure”, and polynomial in the size of the network. Neyman’s verifier doesn’t have this problem, I think, because the relevant input set grows exponentially with the width of the circuit, so it can’t brute-force the \(2^{2n}\) inputs.
Call the “size” of the network \(|f|=m\ell\) (\(n\ell\) in the notation of the paper). Then do we need something like the following?
Verifier time: \(\operatorname{poly}(|f|)\) and sublinear in \(|\mathcal{D}|\)
Advice size: \(\operatorname{poly}(|f|)\), independent of \(|\mathcal{D}|\).
8.2.2 Can we handle other activations actually?
We need the \(\mathbb{E}_{z\sim\mathcal N(0,1)}[\sigma(z)]=0\) condition to get decaying covariance between sites in the readout, which naturally suggests \(\operatorname{tanh}\). We could also translate a ReLU or GeLU so that it was centred. Any reason not to do that? I have the sense that there might be, because the paper mentions a connection to the complexity bias of tanh networks, but I am not seeing it.
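For what it’s worth, here is a quick numerical check of my own that the centring trick works: \(\mathbb{E}_{z\sim\mathcal N(0,1)}[\operatorname{ReLU}(z)] = 1/\sqrt{2\pi}\), so subtracting that constant gives a zero-mean activation.

Code

# Quick numerical check (mine): E[relu(z)] = 1/sqrt(2*pi) under z ~ N(0,1),
# so relu(z) - 1/sqrt(2*pi) is a zero-mean activation.
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000_000)
shift = 1 / np.sqrt(2 * np.pi)
print(np.maximum(z, 0).mean() - shift)   # ~0, up to Monte Carlo error of about 2e-4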
8.2.3 Why not just do \(P^{(0)}\) from the Neyman paper?
We can get a very near analogue to Neyman’s original \(P^{(0)}(C)\) by coarsening the neural network at input and readout with a 0–1 step function, and then it’s just a function from bits to bits again. We could set the neural network widths to \(m=3n\) and talk about getting the last-\(n\)-bits-are-zero conditions from Neyman. It’s not exactly the same, because the thresholded network behaves more like bits drawn with replacement than like a permutation, but the analogy looks closer. What do we get from the outputs-all-negative \(P^{(1)}\) instead?
8.2.4 Other architectures?
I think this is not really a question for this paper, because we want to do one thing at a time, but I’ll ask it anyway: let’s suppose we wanted to prove things about some other architecture, such as attention/transformer-type architectures. Do we have any hope of a nice null model like the one in this paper in that case?
Munos, Valko, Calandriello, et al. 2024. “Nash Learning from Human Feedback.” In Proceedings of the 41st International Conference on Machine Learning. ICML’24.
Neal. 1996. “Priors for Infinite Networks.” In Bayesian Learning for Neural Networks. Lecture Notes in Statistics.
Williams. 1996. “Computing with Infinite Networks.” In Proceedings of the 9th International Conference on Neural Information Processing Systems. NIPS’96.
We might want to restrict the input and output spaces to be subsets of these spaces, but let’s go with the full spaces for now; I don’t think restricting them in the usual ways does anything complicated.↩︎
I don’t think the spaces need to be the same size; generalisation is left as an exercise though↩︎
It’s not exact (the true circuit distribution gives extra weight to highly structured cases like the identity or bitwise NOT), but the authors argue that the uniform-permutation heuristic is likely close enough for estimating the tiny probabilities of “outrageous coincidences”.↩︎
They say nothing about the distribution of inputs, but we can imagine a logical extension to inputs sampled from some distribution in nature, so that we can make statements about noisy worlds in expectation? Anyway.↩︎
technical condition: no scalar multiples in the dataset↩︎