Proposal: OpenPhil Career Transition

2025-01-10 — 2025-09-25

Wherein a failed application is set forth, and two research pathways are outlined: a Bias‑Robust Oversight programme at UTS’s Human Technology Institute, and MCMC estimation of the Local Learning Coefficient with Timaeus’ Murfet.

Figure 1: My application to Open Philanthropy's career development and transition funding.

This application was unsuccessful and, per its policy, Open Philanthropy gives no feedback on unsuccessful applications, so I can't tell you why. I publish it here for transparency, so that others can learn from it, and perhaps so readers can triangulate which pitches Open Philanthropy was funding at the time of submission.

This one is quite interesting. It includes many things I would have thought Open Philanthropy would be interested in: attracting co-investment, leveraging existing research networks, and securing endorsements from highly visible figures in the AI safety landscape. I put it together after the idea was explicitly proposed to me by an 80,000 Hours career counsellor, and after discussing it with several people in the AI safety community.

From this one data point, I weakly update in favour of some combination of the following:

  1. Open Philanthropy is running out of money
  2. AI safety work is saturated, and additional people aren't adding value at the margin
  3. Open Philanthropy disagrees that the research pathways I personally proposed are high-impact, perhaps due to different theories of change or risk models
  4. Open Philanthropy does not think that people with my skill profile (mid-career AI methods researchers embedded in large organisations, with a moderate public profile) are a high-impact use of funds

I’d welcome any insights or feedback from the community on these assumptions, and I’m happy to discuss with others who are thinking of applying for their own career transition funding. May it be useful to you.

1 Plan

Secure runway to explore two promising research pathways. The aim is to determine which offers the most viable route, in terms of both resources and institutional support, to accelerating Australia's AI safety research capacity.

1.1 Pathway 1: HTI @ UTS – Bias-Robust Oversight Research

UTS's Human Technology Institute (HTI), particularly Prof. Sally Cripps (Co-Director, Mathematics & Statistics) and Dr. Hadi Afshar (Lead Research Scientist in AI transparency and explainability), would like to work with me to build out an AI safety research program that is ready to launch pending funding. The program frames alignment through the lens of cognitive-bias exploitation and opponent-shaping (in the multi-agent reinforcement learning sense), developing game-theoretic, probabilistic, and bias-aware scalable oversight models to characterise the behaviour of superhuman agents. Our proposal would be submitted to various funding agencies and would tie in with other epistemic-risk research now building out in Australia, such as Pearce et al.'s Capture the narrative.
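
To make the bias-exploitation framing concrete, here is a minimal toy sketch (my own illustration, not material from the HTI proposal): an agent plays matching pennies against an overseer whose model of the agent is a recency-weighted frequency estimate. Because the overseer's update rule is biased and predictable, an agent that can simulate it wins essentially every round. All names and parameters in the snippet are illustrative assumptions.

```python
import numpy as np

# Toy matching-pennies interaction, as a minimal illustration of exploiting a
# biased, predictable overseer. The overseer "wins" a round when it matches the
# agent's move; the agent wins on a mismatch. The overseer tracks the agent with
# a recency-weighted frequency estimate and deterministically best-responds; the
# agent knows that update rule, replays it, and picks the move the overseer will
# miss. Everything here (payoffs, decay rate, round count) is illustrative.

DECAY = 0.9      # recency bias: older observations of the agent are down-weighted
ROUNDS = 200

def overseer_move(belief):
    """Best response to the believed agent policy: play the likelier agent move."""
    return int(belief[1] >= belief[0])

belief = np.array([0.5, 0.5])   # overseer's recency-weighted estimate of the agent
agent_wins = 0
for _ in range(ROUNDS):
    overseer = overseer_move(belief)   # overseer's (predictable) move
    agent = 1 - overseer               # agent simulates the rule and mismatches
    agent_wins += int(agent != overseer)
    # Overseer's biased belief update: exponential forgetting of past agent moves.
    observation = np.array([1.0 - agent, float(agent)])
    belief = DECAY * belief + (1.0 - DECAY) * observation

print(f"agent win rate against the biased overseer: {agent_wins / ROUNDS:.2f}")
# Against an overseer that randomised uniformly, the agent could do no better than
# 0.5; it is the predictable, biased update rule that makes the overseer exploitable.
```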

Why it matters:

  • A chance to influence Australia’s shifting research priorities and help build national AI safety capacity.
  • Leverages existing university-level research infrastructure and proven leadership at HTI.
  • A government-legible program may attract co-investment from government while still making progress on AI x-risk.

1.2 Pathway 2: Timaeus Collaboration – Advancing Singular Learning Theory

Independently, I'm exploring a collaboration with Daniel Murfet of Timaeus to work on improving MCMC estimators of the Local Learning Coefficient (LLC), a quantity rooted in Singular Learning Theory. The LLC is a mathematically grounded measure of model complexity, with strong links to interpretability and the geometry of learning, and it is increasingly relevant for alignment research.
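
For readers unfamiliar with the estimator: as I understand it, the standard approach estimates lambda-hat = n * beta * (E[L_n(w)] - L_n(w*)), where the expectation is over a tempered posterior localised around the trained parameter w*, the inverse temperature is beta = 1/log n, and the sampling is usually done with stochastic gradient Langevin dynamics (SGLD). The sketch below is my own toy illustration on a deliberately singular two-parameter model; every hyperparameter in it is an illustrative assumption rather than a tuned value from any existing implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy LLC estimation for the singular two-parameter model y = a*b*x + noise.
# At the trained minimum (a, b) = (0, 0) the loss is degenerate along both axes,
# and the learning coefficient is 1/2, below the d/2 = 1 of a regular two-parameter
# model. All hyperparameters below (step size, localisation, chain length) are
# illustrative choices, and the printed estimate will be noisy for such a short,
# untuned chain.

n = 1000
x = rng.normal(size=n)
y = rng.normal(size=n)            # true signal a*b = 0, pure noise targets

def loss(w):
    """Average negative log-likelihood L_n(w) under unit-variance Gaussian noise."""
    a, b = w
    return 0.5 * np.mean((y - a * b * x) ** 2)

def grad_loss(w):
    a, b = w
    resid = y - a * b * x
    return np.array([-np.mean(resid * b * x), -np.mean(resid * a * x)])

w_star = np.zeros(2)              # the trained (here: singular) parameter
beta = 1.0 / np.log(n)            # inverse temperature of the tempered posterior
gamma = 1.0                       # strength of the Gaussian localisation at w_star
step = 5e-4                       # SGLD step size
steps, burn_in = 20000, 2000

# SGLD chain targeting exp(-n*beta*L_n(w) - 0.5*gamma*||w - w_star||^2).
w = w_star.copy()
sampled_losses = []
for t in range(steps):
    drift = n * beta * grad_loss(w) + gamma * (w - w_star)
    w = w - 0.5 * step * drift + np.sqrt(step) * rng.normal(size=2)
    if t >= burn_in:
        sampled_losses.append(loss(w))

llc_hat = n * beta * (np.mean(sampled_losses) - loss(w_star))
print(f"estimated LLC at w*: {llc_hat:.2f} (theoretical value for this toy model: 0.5)")
```

Improving estimators of exactly this kind, including their bias, variance, and hyperparameter sensitivity at scale, is what the proposed collaboration targets.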

Why it matters:

  • Plays directly to my quantitative strengths: statistics, theory, and mathematical modelling.
  • Supports the longer-term goal of integrating rigorous complexity measures into alignment frameworks, while collaborating with an organisation that is pushing developmental interpretability forward.

2 Strategic value

By freeing up time, I can explore both pathways intensively to determine:

  • Which has more traction and funding potential.
  • Where I can build capacity in Australian AI safety most effectively.
  • What institutional alignment exists for long-term uptake.

In both scenarios, the opportunity to foster university-level AI safety research in Australia, whether through HTI or through broader capacity building via theoretical work, is also worth evaluating. Shifting Australian research priorities during the major funding re-evaluation cycle currently in progress could unlock further AI safety research funding via government co-investment.

3 Personal statement

I am a mathematician and statistician with a longstanding interest in how advanced technologies shape long-term futures. My career so far has combined rigorous quantitative research with practical work in political consulting, giving me an unusual "double threat": deep technical skills and firsthand experience of how information flows and persuasion actually operate at scale, which has greatly sensitised me to "gradual loss of control" type dangers in particular.

I first became interested in AI safety over a decade ago, when I began writing about its potential societal and epistemic impacts on my blog in 2010. At the time, AI safety was a marginal topic, and my engagement was peripheral: reading, writing, and reflecting alongside my main research career. Over the years it became clear that the pace of AI capabilities was outstripping my expectations. By 2022, my timelines had shortened dramatically, and I decided to double down. I now see contributing to AI safety as both urgent and the most meaningful direction for my skills.

In my role at CSIRO, Australia's national science agency, I tried to act as a change agent by pushing internally for greater attention to long-term AI risks. While this helped spark productive conversations, I concluded that the institutional inertia of a broad national science body made it unlikely that I could have the necessary focus or impact from within. This has reinforced my decision to reorient my career fully toward AI safety research in a new institution, ideally one where I can attract further state investment and research support. My background also includes consulting for political campaigns, where I developed a model of public epistemology and persuasion that differs from the approaches of many technically trained researchers who lack that immersion. I believe catastrophic risks to collective reasoning are likely to emerge in the tail of massive, viral shifts in public epistemology, which I see as a plausible mechanism for broad-scale loss of control to AI systems.

I see this body of research as one part of our collective effort to reduce catastrophic risk and safeguard the human future, but I’d need runway to achieve it.