Quis computat?

A competing-risks model for doom and alignment as a function of compute allocation

2026-03-25

Wherein X-risk and guaranteed alignment are modelled as competing first-arrival events whose hazard rates grow in capability compute and safety compute respectively, and the probability of doom is expressed as an integral over the compute trajectory.

AI safety
bounded compute
point processes
probability
survival analysis
when to compute

Surely this analysis has been done before in the bowels of LessWrong. I gave up searching because it was too irritating trying to disambiguate the terms “survival” and “hazard” in the technical mathematical sense that I needed, and the more colloquial sense that they are used in AI safety discourse. Feel free to point me to prior work in the comments.

Figure 1

This post is an attempt to write down, in the language of survival analysis and competing risks, a simple mathematical model of the “race” between existential catastrophe and guaranteed alignment in the realm of AI safety. The motivation is the observation, explored in the economics of cognition, that compute is the fundamental currency of intelligence — and so we should be able to express both the problem and the solution in terms of how compute is allocated. By the end of it, I would like to have a simple model that I can use, under various solutions, to imagine whether more compute is “good” or “bad” for the outcome.

This analysis could be way more user-friendly than it is if I were trying to maximize its value to the public. But I don’t have time to do that, so for now it remains a sketch of an idea I would like to discuss with a friend.

So, to the modelling: Given a trajectory of total compute and a policy for splitting it between capability and safety, what is the probability that we attain doom before we attain deliverance?

Code
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
from scipy.integrate import cumulative_trapezoid

pio.renderers.default = "plotly_mimetype+notebook_connected"
from livingthing.plotly_style import set_livingthing_style
set_livingthing_style()

# Shared palette
C_DOOM = '#c0392b'
C_SALV = '#2471a3'
C_SURV = '#27ae60'
C_RUIN = '#8e44ad'
C_NEUT = '#7f8c8d'

LAYOUT = dict(
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',
    hovermode='x unified',
    legend=dict(bgcolor='rgba(255,255,255,0.5)', bordercolor='#ccc', borderwidth=1),
)

# Response-function zoo
def g_lin(x): return 0.8 * x
def h_lin(x): return 0.5 * x
def g_pess(x): return 0.5 * x**2
def h_pess(x): return 0.5 * np.sqrt(x)
def g_opt(x): return 0.5 * np.sqrt(x)
def h_opt(x): return 0.5 * x**2
def sigmoid(x, k, x0): return 5.0 / (1.0 + np.exp(-k * (x - x0)))
def g_sig(x): return sigmoid(x, 2, 3)
def h_sig(x): return sigmoid(x, 2, 7)

1 Compute trajectories

I think we should distinguish between the rate at which compute is performed and the cumulative ‘stock’ of computation that has been done.

Let \(c(t)\) denote the compute rate — the FLOP/s of AI-related computation happening in the world at time \(t \geq 0\). This is exogenous capacity: hardware, data centres, investment. We treat it as non-decreasing and right-continuous. We make no distinction between training and inference — \(c(t)\) is unstructured compute, all of it. The more of it that is running, the more things are happening; at this granularity of analysis we don’t need to model which kinds of workloads are running.

The cumulative compute — total FLOPs performed by time \(t\) — is

\[ \mathcal{C}(t) = \int_0^t c(u)\, du. \]

A policy lever \(\alpha(t) \in [0,1]\) — the safety fraction — splits the compute rate into two streams:

  • Capability rate \(c_c(t) = (1-\alpha(t))\, c(t)\): compute aimed at expanding what AI systems can do.
  • Safety rate \(c_s(t) = \alpha(t)\, c(t)\): compute focused on ensuring AI systems do what we want.

The cumulative stocks in each pool are then

\[ \mathcal{C}_c(t) = \int_0^t (1-\alpha(u))\, c(u)\, du, \qquad \mathcal{C}_s(t) = \int_0^t \alpha(u)\, c(u)\, du, \]

with \(\mathcal{C}_c(t) + \mathcal{C}_s(t) = \mathcal{C}(t)\) at all times.

The stock/rate distinction matters in this model because we imagine that capabilities persist. We can’t un-train a frontier model by switching off the data centre; the weights, the papers, the algorithmic insights are already in the world. Safety progress persists too — proved theorems, verified architectures, and alignment techniques don’t evaporate. So the hazard of doom should depend on the cumulative stock of capability compute \(\mathcal{C}_c(t)\), not the instantaneous rate \(c_c(t)\). The rate \(c(t)\) determines how fast the stocks grow; the stocks determine the hazard.

The function \(\alpha(\cdot)\) is the policy lever — the thing a civilisation chooses.
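The bookkeeping above is easy to sketch numerically. The exponential trajectory and the ramping safety fraction below are illustrative assumptions only, not a forecast:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Sketch of the stock/rate bookkeeping. The exponential compute
# trajectory and the ramping safety fraction are illustrative only.
t = np.linspace(0, 20, 1000)
c = np.exp(0.1 * t)                       # compute rate c(t)
alpha = 0.2 + 0.6 * t / t[-1]             # policy lever α(t), ramping 0.2 → 0.8

C   = cumulative_trapezoid(c, t, initial=0)                # 𝒞(t)
C_c = cumulative_trapezoid((1 - alpha) * c, t, initial=0)  # 𝒞_c(t)
C_s = cumulative_trapezoid(alpha * c, t, initial=0)        # 𝒞_s(t)

# The two pools always add up to the total stock.
assert np.allclose(C_c + C_s, C)
```

Because the trapezoid rule is linear in the integrand, the identity \(\mathcal{C}_c + \mathcal{C}_s = \mathcal{C}\) holds exactly on the grid, not just approximately.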

2 Two competing events

We model two events as the first arrivals of inhomogeneous point processes (see point processes and survival analysis):

Doom (\(T_d\))
An X-risk catastrophe is realised. This is an irreversible absorbing state.
Deliverance (\(T_s\))
Guaranteed alignment is achieved — a state after which X-risk from AI is effectively zero.

Each event has a latent arrival time — \(T_d\) and \(T_s\) are the times at which doom and deliverance would fire if nothing else intervened. But the race ends at \(T^* = \min(T_d, T_s)\): we observe whichever fires first, and from that moment both hazard rates cease to apply. Doom either happens first or not at all; deliverance either happens first or not at all; or neither fires within any relevant timeframe. When we write “\(P(\text{doom})\)” below, we always mean the outcome “doom fires first” — i.e. \(P(T_d < T_s)\) — not the marginal probability that the doom process would eventually fire in isolation.

Each event has a hazard rate (instantaneous arrival intensity given that neither event has yet occurred — i.e. while still being in limbo) that depends on the cumulative stock of compute in its respective pool:

\[ \lambda_d(t) = g\bigl(\mathcal{C}_c(t)\bigr), \qquad \lambda_s(t) = h\bigl(\mathcal{C}_s(t)\bigr), \]

where \(g, h : [0, \infty) \to [0, \infty)\) are monotonically non-decreasing functions, with \(g(0) = 0\) and \(h(0) = 0\). Monotonicity captures the assumption that the more capability compute we’ve performed overall, the higher the catastrophe hazard is per unit time; the more safety compute we’ve performed overall, the higher the alignment-breakthrough hazard is per unit time.

Note that these hazard rates depend on time through the cumulative compute stocks. The compute rate \(c(t)\) does not appear directly — it enters only by determining how fast \(\mathcal{C}_c\) and \(\mathcal{C}_s\) grow. So the compute growth rate matters: doubling \(c(t)\) does not double the hazard at time \(t\), but it does make \(\mathcal{C}_c(t)\) hit any given threshold sooner.
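A quick numerical illustration of that point, with an arbitrary convex response \(g(x) = x^2\) and an illustrative threshold:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Doubling the compute rate doubles the stock at every t, so with a
# convex response such as g(x) = x**2 the hazard at any fixed time
# quadruples rather than doubles, and any given stock threshold is hit
# sooner. Trajectory and threshold are illustrative.
t = np.linspace(0, 10, 1000)
C1 = cumulative_trapezoid(np.exp(0.1 * t), t, initial=0)
C2 = cumulative_trapezoid(2 * np.exp(0.1 * t), t, initial=0)

t_hit_1 = t[np.searchsorted(C1, 5.0)]   # first time the stock exceeds 5
t_hit_2 = t[np.searchsorted(C2, 5.0)]   # reached earlier under doubled c(t)
```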

The assumption that doom risk is monotone in \(\mathcal{C}_c\) and safety progress is monotone in \(\mathcal{C}_s\) is agnostic about the shape of the response — linear, concave, convex, sigmoidal — which is where non-trivial disagreements in AI safety discourse might live. We return to this below.

Code
x = np.linspace(0, 10, 300)
scenarios = [
    ("Linear",              g_lin(x),  h_lin(x)),
    ("Pessimistic (g convex, h concave)", g_pess(x), h_pess(x)),
    ("Optimistic (g concave, h convex)",  g_opt(x),  h_opt(x)),
    ("Sigmoidal overhang",  g_sig(x),  h_sig(x)),
]

fig = go.Figure()
for name, gv, hv in scenarios:
    vis = name.startswith("Pessimistic")
    fig.add_trace(go.Scatter(x=x, y=gv, name="g(x)  doom", line=dict(color=C_DOOM, width=3),
                             visible=vis, hovertemplate="%{y:.2f}"))
    fig.add_trace(go.Scatter(x=x, y=hv, name="h(x)  safety", line=dict(color=C_SALV, width=3),
                             visible=vis, hovertemplate="%{y:.2f}"))

buttons = []
for i, (name, _, _) in enumerate(scenarios):
    vis = [False] * (2 * len(scenarios))
    vis[2*i] = True; vis[2*i+1] = True
    buttons.append(dict(label=name, method="update", args=[{"visible": vis}]))

fig.update_layout(
    **LAYOUT,
    updatemenus=[dict(type="buttons", direction="down", x=1.0, xanchor="left", y=1.0,
                      buttons=buttons, bgcolor="white")],
    xaxis_title="Cumulative compute 𝒞",
    yaxis_title="Hazard rate",
    height=420,
)
fig.show()
Figure 2: Response functions g (doom hazard, red) and h (safety hazard, blue) as functions of cumulative compute under four shape assumptions. Use the buttons to switch scenarios.

3 Three outcomes of the race

The probability the race is still in limbo at time \(t\) — neither event has fired — is what survival analysis calls the joint survival function:

\[ S(t) = \exp\!\left[-\Lambda_d(t) - \Lambda_s(t)\right], \]

where \(\Lambda_d(t) = \int_0^t g(\mathcal{C}_c(u))\, du\) and \(\Lambda_s(t) = \int_0^t h(\mathcal{C}_s(u))\, du\) are the cumulative hazard functions (cf. survival analysis). Note the nested structure: the cumulative hazard is an integral over time of a function of an integral over time. The computation rate \(c(t)\) enters indirectly — it governs how fast \(\mathcal{C}_c\) and \(\mathcal{C}_s\) grow, which in turn governs how fast \(\lambda_d\) and \(\lambda_s\) increase.

The race has three mutually exclusive outcomes:

\[ P(\text{doom}) + P(\text{deliverance}) + P(\text{limbo}) = 1, \]

where:

  • \(P(\text{doom}) = \int_0^\infty \lambda_d(t)\, S(t)\, dt\) — the doom event fires first,
  • \(P(\text{deliverance}) = \int_0^\infty \lambda_s(t)\, S(t)\, dt\) — the deliverance event fires first,
  • \(P(\text{limbo}) = \lim_{t\to\infty} S(t)\) — neither event ever fires.

In our model, with monotone \(g, h\) and ever-growing stocks, \(\Lambda_d(t) + \Lambda_s(t) \to \infty\), which means \(P(\text{limbo}) = 0\): the race always resolves eventually. But “eventually” can be a very long time. Over any finite horizon, there is residual probability mass on “neither yet” — the green curve \(S(t)\) in the plots below — and this residual is a practically relevant quantity. A world where \(S(t)\) stays large at human-relevant timescales is a world where we just muddle through indefinitely, which is arguably closer to most people’s baseline expectation than either doom or deliverance.

At each instant \(t\), while we’re still in limbo, the probability that some event fires in \([t, t+dt)\) is \((\lambda_d(t) + \lambda_s(t))\, dt\), and the conditional probability that the firing event is doom rather than deliverance is

\[ \pi_d(t) = \frac{\lambda_d(t)}{\lambda_d(t) + \lambda_s(t)}. \]

So doom probability can also be written

\[ P(\text{doom}) = \int_0^\infty \pi_d(t) \bigl[\lambda_d(t) + \lambda_s(t)\bigr] S(t)\, dt, \]

which decomposes neatly into “probability of still being in limbo at \(t\)\(\times\) “probability of some event firing at \(t\)\(\times\) “probability that the firing event is doom.”
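As a sanity check, the two expressions for \(P(\text{doom})\) can be compared numerically; the hazards here (α = 0.5, superlinear \(g\), linear \(h\)) are illustrative:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid, trapezoid

# Check that the direct and decomposed expressions for P(doom) agree,
# using illustrative hazards (alpha = 0.5, superlinear g, linear h).
t = np.linspace(0, 40, 4000)
C = cumulative_trapezoid(np.exp(0.1 * t), t, initial=0)
Cc, Cs = 0.5 * C, 0.5 * C
ld, ls = 0.005 * Cc**2, 0.015 * Cs
S = np.exp(-cumulative_trapezoid(ld + ls, t, initial=0))
pi_d = ld / (ld + ls + 1e-12)

direct = trapezoid(ld * S, t)                      # ∫ λ_d S dt
decomposed = trapezoid(pi_d * (ld + ls) * S, t)    # ∫ π_d (λ_d + λ_s) S dt
```

The two integrals agree up to the tiny regularisation constant in the denominator of `pi_d`.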

Code
alpha_race = 0.5
t = np.linspace(0, 35, 800)
c_rate = np.exp(0.1 * t)                          # compute rate c(t) in FLOP/s
Cc = cumulative_trapezoid((1 - alpha_race) * c_rate, t, initial=0)  # 𝒞_c(t)
Cs = cumulative_trapezoid(alpha_race * c_rate, t, initial=0)        # 𝒞_s(t)

# Response functions applied to cumulative stocks
ld = 0.005 * Cc**2       # superlinear doom
ls = 0.015 * Cs           # linear safety

cumhaz = cumulative_trapezoid(ld + ls, t, initial=0)
S = np.exp(-cumhaz)
doom_density = ld * S
deliv_density = ls * S
P_doom = np.trapezoid(doom_density, t)
P_deliv = np.trapezoid(deliv_density, t)
P_limbo = 1 - P_doom - P_deliv

fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(go.Scatter(x=t, y=deliv_density, name="λ_s S(t)  deliverance density",
                         fill='tozeroy', line=dict(color=C_SALV, width=1),
                         fillcolor='rgba(36,113,163,0.35)',
                         hovertemplate="t=%{x:.1f}  deliv=%{y:.4f}"),
              secondary_y=False)
fig.add_trace(go.Scatter(x=t, y=doom_density, name="λ_d S(t)  doom density",
                         fill='tozeroy', line=dict(color=C_DOOM, width=1),
                         fillcolor='rgba(192,57,43,0.35)',
                         hovertemplate="t=%{x:.1f}  doom=%{y:.4f}"),
              secondary_y=False)
fig.add_trace(go.Scatter(x=t, y=S, name="S(t)  limbo",
                         line=dict(color=C_SURV, width=3),
                         hovertemplate="t=%{x:.1f}  S=%{y:.3f}"),
              secondary_y=True)

# Annotate all three outcomes
fig.add_annotation(x=t[np.argmax(doom_density)], y=max(doom_density) * 1.15,
                   text=f"P(doom) = {P_doom:.2f}", showarrow=False,
                   font=dict(size=14, color=C_DOOM))
fig.add_annotation(x=t[np.argmax(deliv_density)], y=max(deliv_density) * 1.15,
                   text=f"P(deliverance) = {P_deliv:.2f}", showarrow=False,
                   font=dict(size=14, color=C_SALV))
fig.add_annotation(x=t[-1] * 0.85, y=0.05, yref="y2",
                   text=f"P(limbo) = {P_limbo:.2f}", showarrow=False,
                   font=dict(size=14, color=C_SURV))

fig.update_xaxes(title_text="Time t")
fig.update_yaxes(title_text="Event density", secondary_y=False)
fig.update_yaxes(title_text="S(t) — limbo", secondary_y=True, range=[0, 1.05])
fig.update_layout(**LAYOUT, height=440)
fig.show()
Figure 3: Resolution of the race with superlinear doom risk (g(𝒞) = 0.005𝒞²) and linear safety progress (h(𝒞) = 0.015𝒞). Compute rate c(t) = e^{0.1t}, α = 0.5. The green curve (right axis) is S(t), the probability of limbo — neither doom nor deliverance has yet occurred. The shaded areas (left axis) show doom density λ_d(t)S(t) (red) and deliverance density λ_s(t)S(t) (blue). Their integrals give P(doom) and P(deliverance); the residual 1 − P(doom) − P(deliverance) is P(limbo) over this time window.

4 The constant-hazard-ratio case

Consider an ultra-simple sanity-check case. If the ratio \(\pi_d(t) \equiv \pi_d\) is constant over time, then \(P(\text{doom}) = \pi_d\) holds regardless of the compute trajectory; this is a standard competing-risks identity. The ratio \(\pi_d\) is constant whenever \(g\) and \(h\) are both linear and \(\alpha\) is constant: if \(g(x) = ax\) and \(h(x) = bx\), then \(\mathcal{C}_c(t) = (1-\alpha)\mathcal{C}(t)\) and \(\mathcal{C}_s(t) = \alpha\mathcal{C}(t)\), so

\[ \pi_d = \frac{a(1-\alpha)\,\mathcal{C}(t)}{a(1-\alpha)\,\mathcal{C}(t) + b\alpha\,\mathcal{C}(t)} = \frac{a(1-\alpha)}{a(1-\alpha) + b\alpha}, \]

and the \(\mathcal{C}(t)\) cancels. The probability of a doom outcome \(P(\text{doom})\) depends only on \(\alpha\) and the ratio \(a/b\), not on the compute rate \(c(t)\) or how fast it grows. This is the regime in which the speed of progress doesn’t matter — the race is purely about allocation.
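A quick numerical check of this invariance, with illustrative values of \(a\), \(b\), \(\alpha\), and two different growth rates:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid, trapezoid

# Check the constant-hazard-ratio identity: with linear g, h and
# constant alpha, P(doom) should not depend on the growth rate r.
# a, b, alpha and the growth rates are illustrative values.
a, b, alpha = 0.8, 0.5, 0.4
t = np.linspace(0, 60, 4000)

results = []
for r in (0.05, 0.3):                             # slow and fast growth
    C = cumulative_trapezoid(np.exp(r * t), t, initial=0)
    ld = a * (1 - alpha) * C                      # linear g on 𝒞_c = (1-α)𝒞
    ls = b * alpha * C                            # linear h on 𝒞_s = α𝒞
    S = np.exp(-cumulative_trapezoid(ld + ls, t, initial=0))
    results.append(trapezoid(ld * S, t))

pi_d = a * (1 - alpha) / (a * (1 - alpha) + b * alpha)
# both entries of `results` should match pi_d, independent of r
```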

It’s also not very plausible. Let us get complicated.

5 Interesting response curves

With non-linear response functions, this invariance breaks down. \(P(\text{doom})\) now depends on the full trajectory \(c(t)\), because the time spent at each cumulative compute level determines how much hazard accumulates there. In the plot below, we compute \(P(\text{doom})\) by integration over a trajectory with exponential compute growth \(c(t) = e^{rt}\), for two different growth rates.

Code
alphas = np.linspace(0.01, 0.99, 80)
t_sim = np.linspace(0, 60, 2000)

def compute_pdoom(g_fn, h_fn, alpha_val, r):
    """Compute P(doom) by integration over a trajectory."""
    c_rate = np.exp(r * t_sim)
    Cc = cumulative_trapezoid((1 - alpha_val) * c_rate, t_sim, initial=0)
    Cs = cumulative_trapezoid(alpha_val * c_rate, t_sim, initial=0)
    ld = g_fn(Cc)
    ls = h_fn(Cs)
    cumhaz = cumulative_trapezoid(ld + ls, t_sim, initial=0)
    S = np.exp(-cumhaz)
    doom_density = ld * S
    return np.trapezoid(doom_density, t_sim)

curves = [
    ("Linear",      g_lin,  h_lin,  C_NEUT),
    ("Pessimistic",  g_pess, h_pess, C_DOOM),
    ("Optimistic",   g_opt,  h_opt,  C_SALV),
    ("Sigmoidal",    g_sig,  h_sig,  C_RUIN),
]

fig = go.Figure()
for name, gf, hf, col in curves:
    for r, dash, suffix in [(0.3, "solid", "fast r=0.3"), (0.05, "dash", "slow r=0.05")]:
        pr = np.array([compute_pdoom(gf, hf, a, r) for a in alphas])
        fig.add_trace(go.Scatter(x=alphas, y=pr, name=f"{name} ({suffix})",
                                 line=dict(color=col, width=2.5, dash=dash),
                                 hovertemplate="α=%{x:.2f}  P(doom)=%{y:.3f}"))

fig.add_hline(y=0.5, line_dash="dot", line_color="#bbb", annotation_text="P(doom) = ½",
              annotation_position="bottom right")

fig.update_layout(
    **LAYOUT,
    xaxis_title="Safety fraction α",
    yaxis_title="P(doom)",
    height=480,
)
fig.show()
Figure 4: Probability of doom as a function of safety fraction α, computed by integration over an exponential compute trajectory. Solid lines: fast growth (r = 0.3); dashed: slow growth (r = 0.05). For the linear case (grey), the curves coincide — growth rate doesn’t matter. For non-linear responses, faster growth shifts the doom probability because it changes how much time the system spends in different hazard regimes.

Interesting cases arise when \(g\) and \(h\) have different shapes.

I think I made an arithmetic error here; the pessimistic curve should be worse than the optimistic one, but the plot above shows the opposite. Five internet points if you can show me the error.

Some scenarios to consider:

Convex \(g\), concave \(h\). The doom hazard rate is superlinear in cumulative capability compute; the deliverance hazard rate has diminishing returns in cumulative safety compute. This is the pessimistic scenario: increasing \(\alpha\) helps at first, but as \(\mathcal{C}(t) \to \infty\) the doom hazard rate eventually dominates regardless of allocation, so \(\pi_d(t) \to 1\) and the doom outcome becomes near-certain. In this regime, slowing down \(c(t)\) itself — reducing the rate at which compute accumulates — is the only robust strategy.

Concave \(g\), convex \(h\). The doom hazard rate saturates; the deliverance hazard rate is superlinear once we invest enough. This is the optimistic scenario: there exists a threshold of cumulative safety compute above which the deliverance outcome is almost certain. This is the implicit model behind “we just need to invest enough in alignment.”

Sigmoidal \(g\) and \(h\) with different inflection points. Both processes have thresholds, but they may not be in the same place. If the safety threshold \(\mathcal{C}_s^*\) is much larger than the doom threshold \(\mathcal{C}_c^*\), there is a dangerous window where cumulative capability compute is in the steep part of \(g\) while cumulative safety compute is still in the flat part of \(h\). This is arguably the scenario that most alignment researchers are worried about: a capability overhang.

\(g\) depends on \(\mathcal{C}_s\) too. Safety research itself requires capable AI systems. If \(\mathcal{C}_s\) contributes to both \(\lambda_s\) and \(\lambda_d\) (because safety compute also advances capabilities as a side effect), the model needs modification. One could write \(\lambda_d(t) = g(\mathcal{C}(t))\) — making doom risk a function of total cumulative compute — while \(\lambda_s(t) = h(\mathcal{C}_s(t))\). This makes the allocation problem strictly harder, because safety investment has a capability externality.

Let’s plot the sigmoidal-overhang scenario, then return to the last one.

Code
alpha = 0.3
t = np.linspace(0, 25, 800)
c_rate = np.exp(0.3 * t)                               # compute rate
Cc = cumulative_trapezoid((1 - alpha) * c_rate, t, initial=0)  # 𝒞_c(t)
Cs = cumulative_trapezoid(alpha * c_rate, t, initial=0)        # 𝒞_s(t)

# Sigmoidal response to cumulative stocks
ld = sigmoid(Cc, 1.5, 4)
ls = sigmoid(Cs, 1.5, 10)

pi_d = ld / (ld + ls + 1e-12)

# Window boundaries (where doom is >0.5 of max but safety is <0.5 of max)
t_doom_on = t[np.searchsorted(ld, 0.5 * 5)]
t_salv_on = t[np.searchsorted(ls, 0.5 * 5)]

fig = make_subplots(rows=2, cols=1, shared_xaxes=True,
                    subplot_titles=["Hazard rates", "Conditional doom probability π_d(t)"],
                    vertical_spacing=0.12)

fig.add_trace(go.Scatter(x=t, y=ld, name="λ_d (doom)", line=dict(color=C_DOOM, width=2.5),
                          hovertemplate="%{y:.2f}"), row=1, col=1)
fig.add_trace(go.Scatter(x=t, y=ls, name="λ_s (safety)", line=dict(color=C_SALV, width=2.5),
                          hovertemplate="%{y:.2f}"), row=1, col=1)
fig.add_trace(go.Scatter(x=t, y=pi_d, name="π_d(t)", line=dict(color=C_RUIN, width=2.5),
                          hovertemplate="%{y:.3f}"), row=2, col=1)

fig.add_vrect(x0=t_doom_on, x1=t_salv_on, fillcolor="rgba(192,57,43,0.10)",
              line=dict(color=C_DOOM, width=1, dash="dash"), row=1, col=1)
fig.add_vrect(x0=t_doom_on, x1=t_salv_on, fillcolor="rgba(192,57,43,0.10)",
              line=dict(color=C_DOOM, width=1, dash="dash"), row=2, col=1)

fig.add_annotation(x=(t_doom_on + t_salv_on)/2, y=2.5, text="Dangerous<br>window",
                   showarrow=False, font=dict(size=13, color=C_DOOM), row=1, col=1)

fig.update_xaxes(title_text="Time t", row=2, col=1)
fig.update_yaxes(title_text="Hazard rate", row=1, col=1)
fig.update_yaxes(title_text="π_d(t)", range=[0, 1.05], row=2, col=1)
fig.update_layout(**LAYOUT, height=560)
fig.show()
Figure 5: The capability overhang. Sigmoidal g and h with the doom threshold reached before the safety threshold create a dangerous window (shaded) in which π_d ≈ 1. Compute rate c(t) = e^{0.3t}, α = 0.3.

6 What if safety compute is also capabilities compute?

In our base model, the doom hazard depends only on capabilities compute \(\mathcal{C}_c\) and the deliverance hazard only on safety compute \(\mathcal{C}_s\). But this separation is optimistic. Safety research requires running large models, probing their behaviour, red-teaming, training oversight systems — all of which advance capabilities as a side effect. In the limit, perhaps all compute advances capabilities regardless of intent.

The externality model modifies the hazard rates to reflect this:

\[ \lambda_d(t) = g\bigl(\mathcal{C}(t)\bigr), \qquad \lambda_s(t) = h\bigl(\mathcal{C}_s(t)\bigr), \]

where \(\mathcal{C}(t) = \mathcal{C}_c(t) + \mathcal{C}_s(t)\) is total cumulative compute. The doom hazard rate now depends on everything — capability compute and safety compute alike — while the deliverance hazard rate still depends only on the safety fraction.

In this model, the allocation \(\alpha\) cannot reduce the doom hazard rate at all. Since \(\mathcal{C}(t) = \int_0^t c(u)\,du\) holds regardless of how the compute is split, the doom hazard is set entirely by the exogenous trajectory \(c(t)\). The only thing \(\alpha\) can do is race the deliverance hazard rate up to compete with it.

The conditional doom probability becomes

\[ \pi_d(t) = \frac{g(\mathcal{C}(t))}{g(\mathcal{C}(t)) + h(\mathcal{C}_s(t))}. \]

In the linear case (\(g(x) = ax\), \(h(x) = bx\), with constant \(\alpha\)), this simplifies to

\[ \pi_d = \frac{a\,\mathcal{C}(t)}{a\,\mathcal{C}(t) + b\,\alpha\,\mathcal{C}(t)} = \frac{a}{a + b\alpha}. \]

Compare this with the separable model’s \(\pi_d = a(1-\alpha)/(a(1-\alpha) + b\alpha)\). The externality model is always worse: the numerator is \(a\) rather than \(a(1-\alpha)\), because diverting compute to safety no longer reduces the doom hazard — it only increases the deliverance hazard. With \(a = b\), the separable model gives \(P(\text{doom}) = 0.5\) at \(\alpha = 0.5\); the externality model gives \(P(\text{doom}) = 0.5\) at \(\alpha = 1\) — we need all compute on safety just to get even odds.

Even at \(\alpha = 1\), the doom hazard hasn’t gone away. It’s \(g(\mathcal{C}_s(t))\) — because all that safety compute is also capability compute. We’re in a race where every step toward deliverance also drags doom closer.
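The two linear closed forms are easy to compare directly; \(a = b = 1\) here purely for illustration:

```python
# Conditional doom probability in the linear case under each model
# (a = b = 1, illustrative). The externality model needs alpha = 1
# just to reach even odds.
a = b = 1.0

def pi_sep(al):
    """Separable model: pi_d = a(1-alpha) / (a(1-alpha) + b*alpha)."""
    return a * (1 - al) / (a * (1 - al) + b * al)

def pi_ext(al):
    """Externality model: pi_d = a / (a + b*alpha)."""
    return a / (a + b * al)
```

For every interior \(\alpha\), `pi_ext` exceeds `pi_sep`, because its numerator is \(a\) rather than \(a(1-\alpha)\).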

Code
alphas_ext = np.linspace(0.01, 0.99, 80)
t_ext = np.linspace(0, 60, 2000)

def compute_pdoom_externality(g_fn, h_fn, alpha_val, r):
    """P(doom) under the externality model: g depends on total compute."""
    c_rate = np.exp(r * t_ext)
    C_total = cumulative_trapezoid(c_rate, t_ext, initial=0)
    Cs = cumulative_trapezoid(alpha_val * c_rate, t_ext, initial=0)
    ld = g_fn(C_total)       # doom depends on TOTAL compute
    ls = h_fn(Cs)             # deliverance depends on safety compute only
    cumhaz = cumulative_trapezoid(ld + ls, t_ext, initial=0)
    S = np.exp(-cumhaz)
    return np.trapezoid(ld * S, t_ext)

def compute_pdoom_separable(g_fn, h_fn, alpha_val, r):
    """P(doom) under the separable model: g depends on capability compute only."""
    c_rate = np.exp(r * t_ext)
    Cc = cumulative_trapezoid((1 - alpha_val) * c_rate, t_ext, initial=0)
    Cs = cumulative_trapezoid(alpha_val * c_rate, t_ext, initial=0)
    ld = g_fn(Cc)
    ls = h_fn(Cs)
    cumhaz = cumulative_trapezoid(ld + ls, t_ext, initial=0)
    S = np.exp(-cumhaz)
    return np.trapezoid(ld * S, t_ext)

fig = go.Figure()
for g_fn, h_fn, name, col in [
    (g_lin, h_lin, "Linear", C_NEUT),
    (g_pess, h_pess, "Pessimistic", C_DOOM),
    (g_opt, h_opt, "Optimistic", C_SALV),
]:
    r = 0.1
    pd_sep = np.array([compute_pdoom_separable(g_fn, h_fn, a, r) for a in alphas_ext])
    pd_ext = np.array([compute_pdoom_externality(g_fn, h_fn, a, r) for a in alphas_ext])
    fig.add_trace(go.Scatter(x=alphas_ext, y=pd_sep, name=f"{name} (separable)",
                             line=dict(color=col, width=2, dash="dash"),
                             hovertemplate="α=%{x:.2f}  P(doom)=%{y:.3f}"))
    fig.add_trace(go.Scatter(x=alphas_ext, y=pd_ext, name=f"{name} (externality)",
                             line=dict(color=col, width=3),
                             hovertemplate="α=%{x:.2f}  P(doom)=%{y:.3f}"))

fig.add_hline(y=0.5, line_dash="dot", line_color="#bbb", annotation_text="P(doom) = ½",
              annotation_position="bottom right")
fig.update_layout(
    **LAYOUT,
    xaxis_title="Safety fraction α",
    yaxis_title="P(doom)",
    height=480,
)
fig.show()
Figure 6: P(doom) vs safety fraction α, comparing the separable model (dashed) with the externality model (solid) where doom risk depends on total compute. The gap between them is the cost of the capability externality. Exponential compute rate c(t) = e^{0.1t}.

The gap between the dashed (separable) and solid (externality) curves is the cost of the capability externality — the additional doom probability we bear because safety research also advances capabilities. In the pessimistic (convex \(g\)) regime, the externality model is especially punishing: the superlinear doom hazard is driven by total compute, which \(\alpha\) cannot touch.

7 What pauses do

This formulation makes the effect of a “compute pause” precise. Suppose we’re in the separable model, and at time \(t_0\) the compute rate drops to near zero: \(c(t) \approx 0\) for \(t \in [t_0, t_0 + \Delta]\). During the pause, the cumulative stocks \(\mathcal{C}_c\) and \(\mathcal{C}_s\) are frozen — no new capabilities, no new safety progress. But the hazard rates \(\lambda_d(t_0) = g(\mathcal{C}_c(t_0))\) and \(\lambda_s(t_0) = h(\mathcal{C}_s(t_0))\) are also frozen at their current values, and the cumulative hazard \(\Lambda_d\) continues to grow at that frozen rate. Doom can still arrive during a pause — we’ve stopped accumulating new risk, but we haven’t reduced the hazard we’ve already built up.

A pause buys time in a specific sense: it extends the interval over which \(\pi_d\) stays at its current value, rather than letting cumulative compute push us into a worse regime. If we’re in the flat part of \(g\) (low hazard), pausing wastes the opportunity to accumulate safety compute. If we’re in the steep part of \(g\) (high hazard), pausing prevents things from getting worse while the current hazard ticks away — useful only if we spend the pause changing \(\alpha\) or \(g\) itself (via policy, regulation, or new alignment techniques).

In the linear case (\(\pi_d\) constant), pausing is useless: the conditional doom probability \(\pi_d\) is the same before, during, and after the pause. In the convex-\(g\) case, pausing in the steep region is the only way to avoid the superlinear runaway.

The “externality” variant also changes the calculus of pauses. In the separable model, a pause freezes both hazard rates. In the externality model, a pause still freezes both — but the difference is what happens after the pause. Resuming compute at any \(\alpha\) feeds the doom hazard at the same rate, because all compute is capability compute. The only lever is \(\alpha\).
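A minimal simulation of a pause in the separable model (trajectory, pause window, and response coefficients all illustrative):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# A pause freezes the stocks but not the clock: the hazard rates hold
# at their frozen values and survival keeps decaying. All numbers are
# illustrative.
t = np.linspace(0, 30, 3000)
c = np.exp(0.2 * t)
c[(t >= 10) & (t < 20)] = 0.0                  # pause on [10, 20)
alpha = 0.3
Cc = cumulative_trapezoid((1 - alpha) * c, t, initial=0)
Cs = cumulative_trapezoid(alpha * c, t, initial=0)
ld = 0.001 * Cc**2                             # convex doom response
ls = 0.01 * Cs                                 # linear safety response
S = np.exp(-cumulative_trapezoid(ld + ls, t, initial=0))

# Compare two instants strictly inside the pause window.
i1, i2 = np.searchsorted(t, 12.0), np.searchsorted(t, 18.0)
stock_growth = Cc[i2] - Cc[i1]                 # ≈ 0: no new capability
hazard_paid = -np.log(S[i2] / S[i1])           # > 0: risk still accrues
```

During the pause the stocks are flat, yet \(S(t)\) keeps falling at the frozen hazard rate — which is exactly the “doom can still arrive during a pause” point.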

8 What this doesn’t model

This framework is minimal and stylized. Some important things it ignores:

  1. Discrete actors. There is no game theory here — just a single planner choosing \(\alpha\). In practice, the allocation is the outcome of many actors with misaligned incentives.

  2. We achieve alignment but it is expensive so we don’t use it. The deliverance event is modelled as a single point arrival — a moment at which we get guaranteed alignment. But in practice, we might have a breakthrough that gives us the option of guaranteed alignment, but it is so expensive to implement that we don’t actually deploy it.

  3. Partial doom, incremental deliverance. Both events are modelled as discrete point arrivals — single moments at which the state transitions irreversibly. This is reasonable for doom (a single catastrophe) but strange for deliverance. Real alignment progress is incremental: better interpretability, verified properties, scalable oversight, each partially reducing risk. A more plausible model would replace the deliverance point process with one whose arrivals down-modulate the doom hazard rate — each safety milestone reduces \(g\) rather than ending the race outright. We could also make doom incremental, with each catastrophe raising the baseline risk, although that feels less natural; if we were worried about bad-but-not-doom events, we would probably move to a continuous badness index, like “dollar value of harm” or “number of lives lost”, rather than a binary doom/deliverance outcome.

  4. The doom hazard might not be monotone in compute. If AI systems can be made robustly safe at high capability levels, the hazard might eventually decrease. This would require \(g\) to be non-monotone, which is a qualitatively different model.

  5. Correlation. Doom and deliverance might not be conditionally independent given the compute trajectory. Alignment breakthroughs might come from the same capability advances that increase risk. Or, the opposite: the compute that is used for safety might be the same compute that is used for capabilities, so \(\mathcal{C}_s\) and \(\mathcal{C}_c\) aren’t really separate stocks.

  6. Optimal control. We haven’t solved for the optimal \(\alpha(\cdot)\). This is a dynamic optimal control problem, because \(\alpha(t)\) affects cumulative stocks \(\mathcal{C}_c(u)\) and \(\mathcal{C}_s(u)\) at all future times \(u > t\). That sounds fun, but not worth investigating because even if the model were true we wouldn’t know the response functions well enough to solve it, and even if we knew how to solve it, I cannot imagine us coordinating to implement that solution.

  7. Granular allocation of compute to many different teams or ideas with different safety/capability profiles, rather than a single aggregate \(\alpha\). People have made the case to me that this matters. I think we might be able to produce a more granular model by allocating compute to buckets via some kind of stick-breaking process, then taking the max hazard? Definitely out of scope for this post, but maybe worth exploring in the future.

  8. A world with aligned AI could still suck.
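The down-modulation variant from item 3 is easy to sketch as a Monte-Carlo simulation. The linear-in-time hazards (i.e. a constant compute rate), the damping factor \(\kappa\), and the horizon below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte-Carlo sketch of "incremental deliverance": each safety arrival
# damps the doom hazard by a factor kappa instead of ending the race.
# Hazards grow linearly in time (constant compute rate). All parameters
# are illustrative.
def p_doom_incremental(kappa, a=0.05, b=0.05, T=50.0, dt=0.05, n=500):
    doom = 0
    for _ in range(n):
        mod, t = 1.0, 0.0
        while t < T:
            t += dt
            if rng.random() < mod * a * t * dt:    # doom fires
                doom += 1
                break
            if rng.random() < b * t * dt:          # safety milestone arrives
                mod *= kappa                       # damp the doom hazard
    return doom / n
```

With `kappa = 1.0` the milestones do nothing and doom is near-certain over this horizon; any `kappa < 1` lowers the doom fraction, without ever ending the race at a single deliverance instant.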
