Quis computat?

A competing-risks model for doom and alignment as a function of compute allocation

2026-03-25

Wherein X-risk and guaranteed alignment are modelled as competing first-arrival events whose hazard rates grow in capability compute and safety compute respectively, and the probability of doom is expressed as an integral over the compute trajectory.

AI safety
bounded compute
point processes
probability
survival analysis
when to compute

Surely this analysis has been done before in the bowels of LessWrong. I gave up searching because it was too irritating trying to disambiguate the terms “survival” and “hazard” in the technical mathematical sense that I needed, and the more colloquial sense that they are used in AI safety discourse. Feel free to point me to prior work in the comments.

Figure 1

This post is an attempt to write down, in the language of survival analysis and competing risks, a simple mathematical model of the “race” between existential catastrophe and guaranteed alignment in the realm of AI safety. The motivation is the observation, explored in the economics of cognition, that compute is the fundamental currency of intelligence — and so we should be able to express both the problem and the solution in terms of how compute is allocated. By the end of it, I would like to have a simple model that I can use, under various solutions, to imagine whether more compute is “good” or “bad” for the outcome.

This analysis could be way more user-friendly than it is if I were trying to maximize its value to the public. But I don’t have time to do that, so for now it remains a sketch of an idea I would like to discuss with a friend.

So, to the modelling: Given a trajectory of total compute and a policy for splitting it between capability and safety, what is the probability that we attain doom before we attain deliverance?

Code
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
from scipy.integrate import cumulative_trapezoid

pio.renderers.default = "plotly_mimetype+notebook_connected"
from livingthing.plotly_style import set_livingthing_style
set_livingthing_style()

# Shared palette
C_DOOM = '#c0392b'
C_SALV = '#2471a3'
C_SURV = '#27ae60'
C_RUIN = '#8e44ad'
C_NEUT = '#7f8c8d'

LAYOUT = dict(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',
    hovermode='x unified',
    legend=dict(bgcolor='rgba(255,255,255,0.5)', bordercolor='#ccc', borderwidth=1),
)

# Response-function zoo
def g_lin(x): return 0.8 * x
def h_lin(x): return 0.5 * x
def g_pess(x): return 0.5 * x**2
def h_pess(x): return 0.5 * np.sqrt(x)
def g_opt(x): return 0.5 * np.sqrt(x)
def h_opt(x): return 0.5 * x**2
def sigmoid(x, k, x0): return 5.0 / (1.0 + np.exp(-k * (x - x0)))
def g_sig(x): return sigmoid(x, 2, 3)
def h_sig(x): return sigmoid(x, 2, 7)

1 Compute trajectories

I think we should distinguish between the rate at which compute is performed and the cumulative ‘stock’ of computation that has been done.

Let \(c(t)\) denote the compute rate — the FLOP/s of AI-related computation happening in the world at time \(t \geq 0\). This is exogenous capacity: hardware, data centres, investment. We treat it as non-decreasing and right-continuous. We make no distinction between training and inference — \(c(t)\) is unstructured compute, all of it. The more of it that is running, the more things are happening; at this granularity of analysis we don’t need to model which kinds of workloads are running.

The cumulative compute — total FLOPs performed by time \(t\) — is

\[ \mathcal{C}(t) = \int_0^t c(u)\, du. \]

A policy lever \(\alpha(t) \in [0,1]\) — the safety fraction — splits the compute rate into two streams:

  • Capability rate \(c_c(t) = (1-\alpha(t))\, c(t)\): compute aimed at expanding what AI systems can do.
  • Safety rate \(c_s(t) = \alpha(t)\, c(t)\): compute focused on ensuring AI systems do what we want.

The cumulative stocks in each pool are then

\[ \mathcal{C}_c(t) = \int_0^t (1-\alpha(u))\, c(u)\, du, \qquad \mathcal{C}_s(t) = \int_0^t \alpha(u)\, c(u)\, du, \]

with \(\mathcal{C}_c(t) + \mathcal{C}_s(t) = \mathcal{C}(t)\) at all times.

The stock/rate distinction matters in this model because we imagine that capabilities persist. We can’t un-train a frontier model by switching off the data centre; the weights, the papers, the algorithmic insights are already in the world. Safety progress persists too — proved theorems, verified architectures, and alignment techniques don’t evaporate. So the hazard of doom should depend on the cumulative stock of capability compute \(\mathcal{C}_c(t)\), not the instantaneous rate \(c_c(t)\). The rate \(c(t)\) determines how fast the stocks grow; the stocks determine the hazard.

The function \(\alpha(\cdot)\) is the policy lever — the thing a civilisation chooses.
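The bookkeeping above is easy to sketch numerically. The exponential trajectory and the ramping safety fraction below are illustrative assumptions only, not a forecast:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Sketch of the stock/rate bookkeeping. The exponential compute
# trajectory and the ramping safety fraction are illustrative only.
t = np.linspace(0, 20, 1000)
c = np.exp(0.1 * t)                       # compute rate c(t)
alpha = 0.2 + 0.6 * t / t[-1]             # policy lever α(t), ramping 0.2 → 0.8

C   = cumulative_trapezoid(c, t, initial=0)                # 𝒞(t)
C_c = cumulative_trapezoid((1 - alpha) * c, t, initial=0)  # 𝒞_c(t)
C_s = cumulative_trapezoid(alpha * c, t, initial=0)        # 𝒞_s(t)

# The two pools always add up to the total stock.
assert np.allclose(C_c + C_s, C)
```

Because the trapezoid rule is linear in the integrand, the identity \(\mathcal{C}_c + \mathcal{C}_s = \mathcal{C}\) holds exactly on the grid, not just approximately.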

2 Two competing events

We model two events as the first arrivals of inhomogeneous point processes (see point processes and survival analysis):

Doom (\(T_d\))
An X-risk catastrophe is realised. This is an irreversible absorbing state.
Deliverance (\(T_s\))
Guaranteed alignment is achieved — a state after which X-risk from AI is effectively zero.

Each event has a latent arrival time — \(T_d\) and \(T_s\) are the times at which doom and deliverance would fire if nothing else intervened. But the race ends at \(T^* = \min(T_d, T_s)\): we observe whichever fires first, and from that moment both hazard rates cease to apply. Doom either happens first or not at all; deliverance either happens first or not at all; or neither fires within any relevant timeframe. When we write “\(P(\text{doom})\)” below, we always mean the outcome “doom fires first” — i.e. \(P(T_d < T_s)\) — not the marginal probability that the doom process would eventually fire in isolation.

Each event has a hazard rate (instantaneous arrival intensity given that neither event has yet occurred — i.e. while still being in limbo) that depends on the cumulative stock of compute in its respective pool:

\[ \lambda_d(t) = g\bigl(\mathcal{C}_c(t)\bigr), \qquad \lambda_s(t) = h\bigl(\mathcal{C}_s(t)\bigr), \]

where \(g, h : [0, \infty) \to [0, \infty)\) are monotonically non-decreasing functions, with \(g(0) = 0\) and \(h(0) = 0\). Monotonicity captures the assumption that the more capability compute we’ve performed overall, the higher the catastrophe hazard is per unit time; the more safety compute we’ve performed overall, the higher the alignment-breakthrough hazard is per unit time.

Note that these hazard rates depend on time through the cumulative compute stocks. The compute rate \(c(t)\) does not appear directly — it enters only by determining how fast \(\mathcal{C}_c\) and \(\mathcal{C}_s\) grow. So the compute growth rate matters: doubling \(c(t)\) does not double the hazard at time \(t\), but it does make \(\mathcal{C}_c(t)\) hit any given threshold sooner.
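A quick numerical illustration of that point, with an arbitrary convex response \(g(x) = x^2\) and an illustrative threshold:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Doubling the compute rate doubles the stock at every t, so with a
# convex response such as g(x) = x**2 the hazard at any fixed time
# quadruples rather than doubles, and any given stock threshold is hit
# sooner. Trajectory and threshold are illustrative.
t = np.linspace(0, 10, 1000)
C1 = cumulative_trapezoid(np.exp(0.1 * t), t, initial=0)
C2 = cumulative_trapezoid(2 * np.exp(0.1 * t), t, initial=0)

t_hit_1 = t[np.searchsorted(C1, 5.0)]   # first time the stock exceeds 5
t_hit_2 = t[np.searchsorted(C2, 5.0)]   # reached earlier under doubled c(t)
```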

The assumption that doom risk is monotone in \(\mathcal{C}_c\) and safety progress is monotone in \(\mathcal{C}_s\) is agnostic about the shape of the response — linear, concave, convex, sigmoidal — which is where non-trivial disagreements in AI safety discourse might live. We return to this below.

Code
x = np.linspace(0, 10, 300)
scenarios = [
    ("Linear",              g_lin(x),  h_lin(x)),
    ("Pessimistic (g convex, h concave)", g_pess(x), h_pess(x)),
    ("Optimistic (g concave, h convex)",  g_opt(x),  h_opt(x)),
    ("Sigmoidal overhang",  g_sig(x),  h_sig(x)),
]

fig = go.Figure()
for name, gv, hv in scenarios:
    vis = name.startswith("Pessimistic")
    fig.add_trace(go.Scatter(x=x, y=gv, name="g(x)  doom", line=dict(color=C_DOOM, width=3),
                             visible=vis, hovertemplate="%{y:.2f}"))
    fig.add_trace(go.Scatter(x=x, y=hv, name="h(x)  safety", line=dict(color=C_SALV, width=3),
                             visible=vis, hovertemplate="%{y:.2f}"))

buttons = []
for i, (name, _, _) in enumerate(scenarios):
    vis = [False] * (2 * len(scenarios))
    vis[2*i] = True; vis[2*i+1] = True
    buttons.append(dict(label=name, method="update", args=[{"visible": vis}]))

fig.update_layout(
    **LAYOUT,
    updatemenus=[dict(type="buttons", direction="down", x=1.0, xanchor="left", y=1.0,
                      buttons=buttons, bgcolor="white")],
    xaxis_title="Cumulative compute 𝒞",
    yaxis_title="Hazard rate",
    height=420,
)
fig.show()
Figure 2: Response functions g (doom hazard, red) and h (safety hazard, blue) as functions of cumulative compute under four shape assumptions. Use the buttons to switch scenarios.

3 Three outcomes of the race

The probability the race is still in limbo at time \(t\) — neither event has fired — is what survival analysis calls the joint survival function:

\[ S(t) = \exp\!\left[-\Lambda_d(t) - \Lambda_s(t)\right], \]

where \(\Lambda_d(t) = \int_0^t g(\mathcal{C}_c(u))\, du\) and \(\Lambda_s(t) = \int_0^t h(\mathcal{C}_s(u))\, du\) are the cumulative hazard functions (cf. survival analysis). Note the nested structure: the cumulative hazard is an integral over time of a function of an integral over time. The computation rate \(c(t)\) enters indirectly — it governs how fast \(\mathcal{C}_c\) and \(\mathcal{C}_s\) grow, which in turn governs how fast \(\lambda_d\) and \(\lambda_s\) increase.

The race has three mutually exclusive outcomes:

\[ P(\text{doom}) + P(\text{deliverance}) + P(\text{limbo}) = 1, \]

where:

  • \(P(\text{doom}) = \int_0^\infty \lambda_d(t)\, S(t)\, dt\) — the doom event fires first,
  • \(P(\text{deliverance}) = \int_0^\infty \lambda_s(t)\, S(t)\, dt\) — the deliverance event fires first,
  • \(P(\text{limbo}) = \lim_{t\to\infty} S(t)\) — neither event ever fires.

In our model, with monotone \(g, h\) and ever-growing stocks, \(\Lambda_d(t) + \Lambda_s(t) \to \infty\), which means \(P(\text{limbo}) = 0\): the race always resolves eventually. But “eventually” can be a very long time. Over any finite horizon, there is residual probability mass on “neither yet” — the green curve \(S(t)\) in the plots below — and this residual is a practically relevant quantity. A world where \(S(t)\) stays large at human-relevant timescales is a world where we just muddle through indefinitely, which is arguably closer to most people’s baseline expectation than either doom or deliverance.

At each instant \(t\), while we’re still in limbo, the probability that some event fires in \([t, t+dt)\) is \((\lambda_d(t) + \lambda_s(t))\, dt\), and the conditional probability that the firing event is doom rather than deliverance is

\[ \pi_d(t) = \frac{\lambda_d(t)}{\lambda_d(t) + \lambda_s(t)}. \]

So doom probability can also be written

\[ P(\text{doom}) = \int_0^\infty \pi_d(t) \bigl[\lambda_d(t) + \lambda_s(t)\bigr] S(t)\, dt, \]

which decomposes neatly into “probability of still being in limbo at \(t\)\(\times\) “probability of some event firing at \(t\)\(\times\) “probability that the firing event is doom.”
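As a sanity check, the two expressions for \(P(\text{doom})\) can be compared numerically; the hazards here (α = 0.5, superlinear \(g\), linear \(h\)) are illustrative:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid, trapezoid

# Check that the direct and decomposed expressions for P(doom) agree,
# using illustrative hazards (alpha = 0.5, superlinear g, linear h).
t = np.linspace(0, 40, 4000)
C = cumulative_trapezoid(np.exp(0.1 * t), t, initial=0)
Cc, Cs = 0.5 * C, 0.5 * C
ld, ls = 0.005 * Cc**2, 0.015 * Cs
S = np.exp(-cumulative_trapezoid(ld + ls, t, initial=0))
pi_d = ld / (ld + ls + 1e-12)

direct = trapezoid(ld * S, t)                      # ∫ λ_d S dt
decomposed = trapezoid(pi_d * (ld + ls) * S, t)    # ∫ π_d (λ_d + λ_s) S dt
```

The two integrals agree up to the tiny regularisation constant in the denominator of `pi_d`.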

Code
alpha_race = 0.5
t = np.linspace(0, 35, 800)
c_rate = np.exp(0.1 * t)                          # compute rate c(t) in FLOP/s
Cc = cumulative_trapezoid((1 - alpha_race) * c_rate, t, initial=0)  # 𝒞_c(t)
Cs = cumulative_trapezoid(alpha_race * c_rate, t, initial=0)        # 𝒞_s(t)

# Response functions applied to cumulative stocks
ld = 0.005 * Cc**2       # superlinear doom
ls = 0.015 * Cs           # linear safety

cumhaz = cumulative_trapezoid(ld + ls, t, initial=0)
S = np.exp(-cumhaz)
doom_density = ld * S
deliv_density = ls * S
P_doom = np.trapezoid(doom_density, t)
P_deliv = np.trapezoid(deliv_density, t)
P_limbo = 1 - P_doom - P_deliv

fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(go.Scatter(x=t, y=deliv_density, name="λ_s S(t)  deliverance density",
                         fill='tozeroy', line=dict(color=C_SALV, width=1),
                         fillcolor='rgba(36,113,163,0.35)',
                         hovertemplate="t=%{x:.1f}  deliv=%{y:.4f}"),
              secondary_y=False)
fig.add_trace(go.Scatter(x=t, y=doom_density, name="λ_d S(t)  doom density",
                         fill='tozeroy', line=dict(color=C_DOOM, width=1),
                         fillcolor='rgba(192,57,43,0.35)',
                         hovertemplate="t=%{x:.1f}  doom=%{y:.4f}"),
              secondary_y=False)
fig.add_trace(go.Scatter(x=t, y=S, name="S(t)  limbo",
                         line=dict(color=C_SURV, width=3),
                         hovertemplate="t=%{x:.1f}  S=%{y:.3f}"),
              secondary_y=True)

# Annotate all three outcomes
fig.add_annotation(x=t[np.argmax(doom_density)], y=max(doom_density) * 1.15,
                   text=f"P(doom) = {P_doom:.2f}", showarrow=False,
                   font=dict(size=14, color=C_DOOM))
fig.add_annotation(x=t[np.argmax(deliv_density)], y=max(deliv_density) * 1.15,
                   text=f"P(deliverance) = {P_deliv:.2f}", showarrow=False,
                   font=dict(size=14, color=C_SALV))
fig.add_annotation(x=t[-1] * 0.85, y=0.05, yref="y2",
                   text=f"P(limbo) = {P_limbo:.2f}", showarrow=False,
                   font=dict(size=14, color=C_SURV))

fig.update_xaxes(title_text="Time t")
fig.update_yaxes(title_text="Event density", secondary_y=False)
fig.update_yaxes(title_text="S(t) — limbo", secondary_y=True, range=[0, 1.05])
fig.update_layout(**LAYOUT, height=440)
fig.show()
Figure 3: Resolution of the race with superlinear doom risk (g(𝒞) = 0.005𝒞²) and linear safety progress (h(𝒞) = 0.015𝒞). Compute rate c(t) = e^{0.1t}, α = 0.5. The green curve (right axis) is S(t), the probability of limbo — neither doom nor deliverance has yet occurred. The shaded areas (left axis) show doom density λ_d(t)S(t) (red) and deliverance density λ_s(t)S(t) (blue). Their integrals give P(doom) and P(deliverance); the residual 1 − P(doom) − P(deliverance) is P(limbo) over this time window.

4 The constant-hazard-ratio case

Consider an ultra-simple sanity-check case. If the ratio \(\pi_d(t) \equiv \pi_d\) is constant over time, then \(P(\text{doom}) = \pi_d\) holds regardless of the compute trajectory; this is a standard competing-risks identity. The ratio \(\pi_d\) is constant whenever \(g\) and \(h\) are both linear and \(\alpha\) is constant: if \(g(x) = ax\) and \(h(x) = bx\), then \(\mathcal{C}_c(t) = (1-\alpha)\mathcal{C}(t)\) and \(\mathcal{C}_s(t) = \alpha\mathcal{C}(t)\), so

\[ \pi_d = \frac{a(1-\alpha)\,\mathcal{C}(t)}{a(1-\alpha)\,\mathcal{C}(t) + b\alpha\,\mathcal{C}(t)} = \frac{a(1-\alpha)}{a(1-\alpha) + b\alpha}, \]

and the \(\mathcal{C}(t)\) cancels. The probability of a doom outcome \(P(\text{doom})\) depends only on \(\alpha\) and the ratio \(a/b\), not on the compute rate \(c(t)\) or how fast it grows. This is the regime in which the speed of progress doesn’t matter — the race is purely about allocation.
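A quick numerical check of this invariance, with illustrative values of \(a\), \(b\), \(\alpha\), and two different growth rates:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid, trapezoid

# Check the constant-hazard-ratio identity: with linear g, h and
# constant alpha, P(doom) should not depend on the growth rate r.
# a, b, alpha and the growth rates are illustrative values.
a, b, alpha = 0.8, 0.5, 0.4
t = np.linspace(0, 60, 4000)

results = []
for r in (0.05, 0.3):                             # slow and fast growth
    C = cumulative_trapezoid(np.exp(r * t), t, initial=0)
    ld = a * (1 - alpha) * C                      # linear g on 𝒞_c = (1-α)𝒞
    ls = b * alpha * C                            # linear h on 𝒞_s = α𝒞
    S = np.exp(-cumulative_trapezoid(ld + ls, t, initial=0))
    results.append(trapezoid(ld * S, t))

pi_d = a * (1 - alpha) / (a * (1 - alpha) + b * alpha)
# both entries of `results` should match pi_d, independent of r
```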

It’s also not very plausible. Let us get complicated.

5 Interesting response curves

With non-linear response functions, this invariance breaks down. \(P(\text{doom})\) now depends on the full trajectory \(c(t)\), because the time spent at each cumulative compute level determines how much hazard accumulates there. In the plot below, we compute \(P(\text{doom})\) by integration over a trajectory with exponential compute growth \(c(t) = e^{rt}\), for two different growth rates.

Code
alphas = np.linspace(0.01, 0.99, 80)
t_sim = np.linspace(0, 60, 2000)

def compute_pdoom(g_fn, h_fn, alpha_val, r):
    """Compute P(doom) by integration over a trajectory."""
    c_rate = np.exp(r * t_sim)
    Cc = cumulative_trapezoid((1 - alpha_val) * c_rate, t_sim, initial=0)
    Cs = cumulative_trapezoid(alpha_val * c_rate, t_sim, initial=0)
    ld = g_fn(Cc)
    ls = h_fn(Cs)
    cumhaz = cumulative_trapezoid(ld + ls, t_sim, initial=0)
    S = np.exp(-cumhaz)
    doom_density = ld * S
    return np.trapezoid(doom_density, t_sim)

curves = [
    ("Linear",      g_lin,  h_lin,  C_NEUT),
    ("Pessimistic",  g_pess, h_pess, C_DOOM),
    ("Optimistic",   g_opt,  h_opt,  C_SALV),
    ("Sigmoidal",    g_sig,  h_sig,  C_RUIN),
]

fig = go.Figure()
for name, gf, hf, col in curves:
    for r, dash, suffix in [(0.3, "solid", "fast r=0.3"), (0.05, "dash", "slow r=0.05")]:
        pr = np.array([compute_pdoom(gf, hf, a, r) for a in alphas])
        fig.add_trace(go.Scatter(x=alphas, y=pr, name=f"{name} ({suffix})",
                                 line=dict(color=col, width=2.5, dash=dash),
                                 hovertemplate="α=%{x:.2f}  P(doom)=%{y:.3f}"))

fig.add_hline(y=0.5, line_dash="dot", line_color="#bbb", annotation_text="P(doom) = ½",
              annotation_position="bottom right")

fig.update_layout(
    **LAYOUT,
    xaxis_title="Safety fraction α",
    yaxis_title="P(doom)",
    height=480,
)
fig.show()
Figure 4: Probability of doom as a function of safety fraction α, computed by integration over an exponential compute trajectory. Solid lines: fast growth (r = 0.3); dashed: slow growth (r = 0.05). For the linear case (grey), the curves coincide — growth rate doesn’t matter. For non-linear responses, faster growth shifts the doom probability because it changes how much time the system spends in different hazard regimes.

Interesting cases arise when \(g\) and \(h\) have different shapes.

I think I made an arithmetic error here; the pessimistic curve should be worse than the optimistic one, but the plot above shows the opposite. Five internet points if you can show me the error.

Some scenarios to consider:

Convex \(g\), concave \(h\). The doom hazard rate is superlinear in cumulative capability compute; the deliverance hazard rate has diminishing returns in cumulative safety compute. This is the pessimistic scenario: increasing \(\alpha\) helps at first, but as \(\mathcal{C}(t) \to \infty\) the doom hazard rate eventually dominates regardless of allocation, so \(\pi_d(t) \to 1\) and the doom outcome becomes near-certain. In this regime, slowing down \(c(t)\) itself — reducing the rate at which compute accumulates — is the only robust strategy.

Concave \(g\), convex \(h\). The doom hazard rate saturates; the deliverance hazard rate is superlinear once we invest enough. This is the optimistic scenario: there exists a threshold of cumulative safety compute above which the deliverance outcome is almost certain. This is the implicit model behind “we just need to invest enough in alignment.”

Sigmoidal \(g\) and \(h\) with different inflection points. Both processes have thresholds, but they may not be in the same place. If the safety threshold \(\mathcal{C}_s^*\) is much larger than the doom threshold \(\mathcal{C}_c^*\), there is a dangerous window where cumulative capability compute is in the steep part of \(g\) while cumulative safety compute is still in the flat part of \(h\). This is arguably the scenario that most alignment researchers are worried about: a capability overhang.

\(g\) depends on \(\mathcal{C}_s\) too. Safety research itself requires capable AI systems. If \(\mathcal{C}_s\) contributes to both \(\lambda_s\) and \(\lambda_d\) (because safety compute also advances capabilities as a side effect), the model needs modification. One could write \(\lambda_d(t) = g(\mathcal{C}(t))\) — making doom risk a function of total cumulative compute — while \(\lambda_s(t) = h(\mathcal{C}_s(t))\). This makes the allocation problem strictly harder, because safety investment has a capability externality.

Let’s plot the sigmoidal-overhang scenario, then return to the last one.

Code
alpha = 0.3
t = np.linspace(0, 25, 800)
c_rate = np.exp(0.3 * t)                               # compute rate
Cc = cumulative_trapezoid((1 - alpha) * c_rate, t, initial=0)  # 𝒞_c(t)
Cs = cumulative_trapezoid(alpha * c_rate, t, initial=0)        # 𝒞_s(t)

# Sigmoidal response to cumulative stocks
ld = sigmoid(Cc, 1.5, 4)
ls = sigmoid(Cs, 1.5, 10)

pi_d = ld / (ld + ls + 1e-12)

# Window boundaries (where doom is >0.5 of max but safety is <0.5 of max)
t_doom_on = t[np.searchsorted(ld, 0.5 * 5)]
t_salv_on = t[np.searchsorted(ls, 0.5 * 5)]

fig = make_subplots(rows=2, cols=1, shared_xaxes=True,
                    subplot_titles=["Hazard rates", "Conditional doom probability π_d(t)"],
                    vertical_spacing=0.12)

fig.add_trace(go.Scatter(x=t, y=ld, name="λ_d (doom)", line=dict(color=C_DOOM, width=2.5),
                          hovertemplate="%{y:.2f}"), row=1, col=1)
fig.add_trace(go.Scatter(x=t, y=ls, name="λ_s (safety)", line=dict(color=C_SALV, width=2.5),
                          hovertemplate="%{y:.2f}"), row=1, col=1)
fig.add_trace(go.Scatter(x=t, y=pi_d, name="π_d(t)", line=dict(color=C_RUIN, width=2.5),
                          hovertemplate="%{y:.3f}"), row=2, col=1)

fig.add_vrect(x0=t_doom_on, x1=t_salv_on, fillcolor="rgba(192,57,43,0.10)",
              line=dict(color=C_DOOM, width=1, dash="dash"), row=1, col=1)
fig.add_vrect(x0=t_doom_on, x1=t_salv_on, fillcolor="rgba(192,57,43,0.10)",
              line=dict(color=C_DOOM, width=1, dash="dash"), row=2, col=1)

fig.add_annotation(x=(t_doom_on + t_salv_on)/2, y=2.5, text="Dangerous<br>window",
                   showarrow=False, font=dict(size=13, color=C_DOOM), row=1, col=1)

fig.update_xaxes(title_text="Time t", row=2, col=1)
fig.update_yaxes(title_text="Hazard rate", row=1, col=1)
fig.update_yaxes(title_text="π_d(t)", range=[0, 1.05], row=2, col=1)
fig.update_layout(**LAYOUT, height=560)
fig.show()
Figure 5: The capability overhang. Sigmoidal g and h with the doom threshold reached before the safety threshold create a dangerous window (shaded) in which π_d ≈ 1. Compute rate c(t) = e^{0.3t}, α = 0.3.

6 What if safety compute is also capabilities compute?

In our base model, the doom hazard depends only on capabilities compute \(\mathcal{C}_c\) and the deliverance hazard only on safety compute \(\mathcal{C}_s\). But this separation is optimistic. Safety research requires running large models, probing their behaviour, red-teaming, training oversight systems — all of which advance capabilities as a side effect. In the limit, perhaps all compute advances capabilities regardless of intent.

The externality model modifies the hazard rates to reflect this:

\[ \lambda_d(t) = g\bigl(\mathcal{C}(t)\bigr), \qquad \lambda_s(t) = h\bigl(\mathcal{C}_s(t)\bigr), \]

where \(\mathcal{C}(t) = \mathcal{C}_c(t) + \mathcal{C}_s(t)\) is total cumulative compute. The doom hazard rate now depends on everything — capability compute and safety compute alike — while the deliverance hazard rate still depends only on the safety fraction.

In this model, the allocation \(\alpha\) cannot reduce the doom hazard rate at all. Since \(\mathcal{C}(t) = \int_0^t c(u)\,du\) holds regardless of how the compute is split, the doom hazard is set entirely by the exogenous trajectory \(c(t)\). The only thing \(\alpha\) can do is race the deliverance hazard rate up to compete with it.

The conditional doom probability becomes

\[ \pi_d(t) = \frac{g(\mathcal{C}(t))}{g(\mathcal{C}(t)) + h(\mathcal{C}_s(t))}. \]

In the linear case (\(g(x) = ax\), \(h(x) = bx\), with constant \(\alpha\)), this simplifies to

\[ \pi_d = \frac{a\,\mathcal{C}(t)}{a\,\mathcal{C}(t) + b\,\alpha\,\mathcal{C}(t)} = \frac{a}{a + b\alpha}. \]

Compare this with the separable model’s \(\pi_d = a(1-\alpha)/(a(1-\alpha) + b\alpha)\). The externality model is always worse: the numerator is \(a\) rather than \(a(1-\alpha)\), because diverting compute to safety no longer reduces the doom hazard — it only increases the deliverance hazard. With \(a = b\), the separable model gives \(P(\text{doom}) = 0.5\) at \(\alpha = 0.5\); the externality model gives \(P(\text{doom}) = 0.5\) at \(\alpha = 1\) — we need all compute on safety just to get even odds.

Even at \(\alpha = 1\), the doom hazard hasn’t gone away. It’s \(g(\mathcal{C}_s(t))\) — because all that safety compute is also capability compute. We’re in a race where every step toward deliverance also drags doom closer.
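The two linear closed forms are easy to compare directly; \(a = b = 1\) here purely for illustration:

```python
# Conditional doom probability in the linear case under each model
# (a = b = 1, illustrative). The externality model needs alpha = 1
# just to reach even odds.
a = b = 1.0

def pi_sep(al):
    """Separable model: pi_d = a(1-alpha) / (a(1-alpha) + b*alpha)."""
    return a * (1 - al) / (a * (1 - al) + b * al)

def pi_ext(al):
    """Externality model: pi_d = a / (a + b*alpha)."""
    return a / (a + b * al)
```

For every interior \(\alpha\), `pi_ext` exceeds `pi_sep`, because its numerator is \(a\) rather than \(a(1-\alpha)\).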

Code
alphas_ext = np.linspace(0.01, 0.99, 80)
t_ext = np.linspace(0, 60, 2000)

def compute_pdoom_externality(g_fn, h_fn, alpha_val, r):
    """P(doom) under the externality model: g depends on total compute."""
    c_rate = np.exp(r * t_ext)
    C_total = cumulative_trapezoid(c_rate, t_ext, initial=0)
    Cs = cumulative_trapezoid(alpha_val * c_rate, t_ext, initial=0)
    ld = g_fn(C_total)       # doom depends on TOTAL compute
    ls = h_fn(Cs)             # deliverance depends on safety compute only
    cumhaz = cumulative_trapezoid(ld + ls, t_ext, initial=0)
    S = np.exp(-cumhaz)
    return np.trapezoid(ld * S, t_ext)

def compute_pdoom_separable(g_fn, h_fn, alpha_val, r):
    """P(doom) under the separable model: g depends on capability compute only."""
    c_rate = np.exp(r * t_ext)
    Cc = cumulative_trapezoid((1 - alpha_val) * c_rate, t_ext, initial=0)
    Cs = cumulative_trapezoid(alpha_val * c_rate, t_ext, initial=0)
    ld = g_fn(Cc)
    ls = h_fn(Cs)
    cumhaz = cumulative_trapezoid(ld + ls, t_ext, initial=0)
    S = np.exp(-cumhaz)
    return np.trapezoid(ld * S, t_ext)

fig = go.Figure()
for g_fn, h_fn, name, col in [
    (g_lin, h_lin, "Linear", C_NEUT),
    (g_pess, h_pess, "Pessimistic", C_DOOM),
    (g_opt, h_opt, "Optimistic", C_SALV),
]:
    r = 0.1
    pd_sep = np.array([compute_pdoom_separable(g_fn, h_fn, a, r) for a in alphas_ext])
    pd_ext = np.array([compute_pdoom_externality(g_fn, h_fn, a, r) for a in alphas_ext])
    fig.add_trace(go.Scatter(x=alphas_ext, y=pd_sep, name=f"{name} (separable)",
                             line=dict(color=col, width=2, dash="dash"),
                             hovertemplate="α=%{x:.2f}  P(doom)=%{y:.3f}"))
    fig.add_trace(go.Scatter(x=alphas_ext, y=pd_ext, name=f"{name} (externality)",
                             line=dict(color=col, width=3),
                             hovertemplate="α=%{x:.2f}  P(doom)=%{y:.3f}"))

fig.add_hline(y=0.5, line_dash="dot", line_color="#bbb", annotation_text="P(doom) = ½",
              annotation_position="bottom right")
fig.update_layout(
    **LAYOUT,
    xaxis_title="Safety fraction α",
    yaxis_title="P(doom)",
    height=480,
)
fig.show()
Figure 6: P(doom) vs safety fraction α, comparing the separable model (dashed) with the externality model (solid) where doom risk depends on total compute. The gap between them is the cost of the capability externality. Exponential compute rate c(t) = e^{0.1t}.

The gap between the dashed (separable) and solid (externality) curves is the cost of the capability externality — the additional doom probability we bear because safety research also advances capabilities. In the pessimistic (convex \(g\)) regime, the externality model is especially punishing: the superlinear doom hazard is driven by total compute, which \(\alpha\) cannot touch.

7 What pauses do

This formulation makes the effect of a “compute pause” precise. Suppose we’re in the separable model, and at time \(t_0\) the compute rate drops to near zero: \(c(t) \approx 0\) for \(t \in [t_0, t_0 + \Delta]\). During the pause, the cumulative stocks \(\mathcal{C}_c\) and \(\mathcal{C}_s\) are frozen — no new capabilities, no new safety progress. But the hazard rates \(\lambda_d(t_0) = g(\mathcal{C}_c(t_0))\) and \(\lambda_s(t_0) = h(\mathcal{C}_s(t_0))\) are also frozen at their current values, and the cumulative hazard \(\Lambda_d\) continues to grow at that frozen rate. Doom can still arrive during a pause — we’ve stopped accumulating new risk, but we haven’t reduced the hazard we’ve already built up.

A pause buys time in a specific sense: it extends the interval over which \(\pi_d\) stays at its current value, rather than letting cumulative compute push us into a worse regime. If we’re in the flat part of \(g\) (low hazard), pausing wastes the opportunity to accumulate safety compute. If we’re in the steep part of \(g\) (high hazard), pausing prevents things from getting worse while the current hazard ticks away — useful only if we spend the pause changing \(\alpha\) or \(g\) itself (via policy, regulation, or new alignment techniques).

In the linear case (\(\pi_d\) constant), pausing is useless: the conditional doom probability \(\pi_d\) is the same before, during, and after the pause. In the convex-\(g\) case, pausing in the steep region is the only way to avoid the superlinear runaway.

The “externality” variant also changes the calculus of pauses. In the separable model, a pause freezes both hazard rates. In the externality model, a pause still freezes both — but the difference is what happens after the pause. Resuming compute at any \(\alpha\) feeds the doom hazard at the same rate, because all compute is capability compute. The only lever is \(\alpha\).
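A minimal simulation of a pause in the separable model (trajectory, pause window, and response coefficients all illustrative):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# A pause freezes the stocks but not the clock: the hazard rates hold
# at their frozen values and survival keeps decaying. All numbers are
# illustrative.
t = np.linspace(0, 30, 3000)
c = np.exp(0.2 * t)
c[(t >= 10) & (t < 20)] = 0.0                  # pause on [10, 20)
alpha = 0.3
Cc = cumulative_trapezoid((1 - alpha) * c, t, initial=0)
Cs = cumulative_trapezoid(alpha * c, t, initial=0)
ld = 0.001 * Cc**2                             # convex doom response
ls = 0.01 * Cs                                 # linear safety response
S = np.exp(-cumulative_trapezoid(ld + ls, t, initial=0))

# Compare two instants strictly inside the pause window.
i1, i2 = np.searchsorted(t, 12.0), np.searchsorted(t, 18.0)
stock_growth = Cc[i2] - Cc[i1]                 # ≈ 0: no new capability
hazard_paid = -np.log(S[i2] / S[i1])           # > 0: risk still accrues
```

During the pause the stocks are flat, yet \(S(t)\) keeps falling at the frozen hazard rate — which is exactly the “doom can still arrive during a pause” point.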

8 What this doesn’t model

This framework is minimal and stylized. Some important things it ignores:

  1. Discrete actors. There is no game theory here — just a single planner choosing \(\alpha\). In practice, the allocation is the outcome of many actors with misaligned incentives.

  2. We achieve alignment but it is expensive so we don’t use it. The deliverance event is modelled as a single point arrival — a moment at which we get guaranteed alignment. But in practice, we might have a breakthrough that gives us the option of guaranteed alignment, but it is so expensive to implement that we don’t actually deploy it.

  3. Partial doom, incremental deliverance. Both events are modelled as discrete point arrivals — single moments at which the state transitions irreversibly. This is reasonable for doom (a single catastrophe) but strange for deliverance. Real alignment progress is incremental: better interpretability, verified properties, scalable oversight, each partially reducing risk. A more plausible model would replace the deliverance point process with one whose arrivals down-modulate the doom hazard rate — each safety milestone reduces \(g\) rather than ending the race outright. We could also make doom incremental, with each catastrophe raising the baseline risk, although that feels less natural; if we were worried about bad-but-not-doom events, we would probably move to a continuous badness index, like “dollar value of harm” or “number of lives lost”, rather than a binary doom/deliverance outcome.

  4. The doom hazard might not be monotone in compute. If AI systems can be made robustly safe at high capability levels, the hazard might eventually decrease. This would require \(g\) to be non-monotone, which is a qualitatively different model.

  5. Correlation. Doom and deliverance might not be conditionally independent given the compute trajectory. Alignment breakthroughs might come from the same capability advances that increase risk. Or, the opposite: the compute that is used for safety might be the same compute that is used for capabilities, so \(\mathcal{C}_s\) and \(\mathcal{C}_c\) aren’t really separate stocks.

  6. Optimal control. We haven’t solved for the optimal \(\alpha(\cdot)\). This is a dynamic optimal control problem, because \(\alpha(t)\) affects cumulative stocks \(\mathcal{C}_c(u)\) and \(\mathcal{C}_s(u)\) at all future times \(u > t\). That sounds fun, but not worth investigating because even if the model were true we wouldn’t know the response functions well enough to solve it, and even if we knew how to solve it, I cannot imagine us coordinating to implement that solution.

  7. Granular allocation of compute to many different teams or ideas with different safety/capability profiles, rather than a single aggregate \(\alpha\). People have made the case to me that this matters. I think we might be able to produce a more granular model by allocating compute to buckets via some kind of stick-breaking process, then taking the max hazard? Definitely out of scope for this post, but maybe worth exploring in the future.

  8. A world with aligned AI could still suck.
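The down-modulation variant from item 3 is easy to sketch as a Monte-Carlo simulation. The linear-in-time hazards (i.e. a constant compute rate), the damping factor \(\kappa\), and the horizon below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte-Carlo sketch of "incremental deliverance": each safety arrival
# damps the doom hazard by a factor kappa instead of ending the race.
# Hazards grow linearly in time (constant compute rate). All parameters
# are illustrative.
def p_doom_incremental(kappa, a=0.05, b=0.05, T=50.0, dt=0.05, n=500):
    doom = 0
    for _ in range(n):
        mod, t = 1.0, 0.0
        while t < T:
            t += dt
            if rng.random() < mod * a * t * dt:    # doom fires
                doom += 1
                break
            if rng.random() < b * t * dt:          # safety milestone arrives
                mod *= kappa                       # damp the doom hazard
    return doom / n
```

With `kappa = 1.0` the milestones do nothing and doom is near-certain over this horizon; any `kappa < 1` lowers the doom fraction, without ever ending the race at a single deliverance instant.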
