AI Safety
Getting ready for the grown-ups to arrive
October 31, 2024 — December 19, 2024
Forked from superintelligence because the risk mitigation strategies are a field in themselves. Or rather, several distinct fields, which I need to map out in this notebook.
1 X-risk
X-risk (existential risk) is a term used in, e.g., the rationalist community to discuss risks from a possible AI intelligence explosion.
FWIW: I personally think that (various kinds of) AI catastrophic risk are plausible and serious enough to worry about, even if they are not the most likely outcome, because even moderately unlikely risks can be very high impact. In decision-theory terms, the expected value of the loss is high: if the possibility is that everyone dies, we should worry about it even if it is only a 1% chance.
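To make that concrete with a back-of-the-envelope calculation (my numbers, purely illustrative):

$$
\mathbb{E}[\text{loss}] = p \times L \approx 0.01 \times 8\times 10^{9}\ \text{lives} = 8\times 10^{7}\ \text{lives},
$$

so even at a 1% probability the expected loss is tens of millions of lives, which dominates many risks that are far more likely but far less severe.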
This kind of thing is empirically difficult for people to reason about.
2 Background
3 X-risk risk
Some people, notably accelerationists, think that focusing on x-risk is itself a risky distraction from more pressing problems.
For example: what if we fail to solve the climate crisis because we put our effort into AI risks instead? Or put in so much effort that we slow down the AI that could have saved us? Or get so distracted that we neglect other, more pressing risks?
Example: Superintelligence: The Idea That Eats Smart People.1 See also the currently-viral school of x-risk-risk critique that classifies x-risk concern as a tribal marker of TESCREALism.
AFAICT, the distinctions here are mostly sociological? The activities needed to manage x-risk are not necessarily in conflict with activities to manage other risks. Moreover, getting the human species ready to deal with catastrophes in general seems like a feasible intermediate goal. TBC.
3.1 Most-important century model
- Holden Karnofsky, The “most important century” blog post series
- Robert Wiblin’s analysis: This could be the most important century
4 Theoretical tools
4.1 Courses
4.2 SLT
Singular learning theory has been pitched to me as a tool with applications to AI safety.
4.3 Sparse AE
See Sparse Autoencoders for an explanation; they have had a moment.
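For orientation, a minimal sketch of the basic object, assuming PyTorch; the layer sizes, ReLU encoder, and L1 penalty weight are illustrative choices of mine, not any particular published recipe:

```python
# Minimal sparse autoencoder sketch of the kind used in mechanistic
# interpretability: reconstruct model activations through an overcomplete
# latent layer, with an L1 penalty encouraging sparse codes.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(z)           # reconstruction of the input
        return x_hat, z


def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    # reconstruction error plus an L1 sparsity penalty on the latent code
    recon = torch.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * z.abs().mean()
    return recon + sparsity


# toy usage on random "activations" (stand-ins for a real model's residual stream)
sae = SparseAutoencoder(d_model=512, d_hidden=4096)
x = torch.randn(64, 512)
x_hat, z = sae(x)
loss = sae_loss(x, x_hat, z)
loss.backward()
```

The interpretability hope is that the learned sparse latent directions line up with more human-interpretable features of the probed model than its raw neurons do.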
4.4 Algorithmic Game Theory
4.5 Aligning AI
Let us consider general alignment, because I have little that is AI-specific to say yet.
5 In Australia
6 Incoming
- Writing Doom – Award-Winning Short Film on Superintelligence (2024) - YouTube
- AiSafety.com’s landscape map: https://aisafety.world/
- Wong and Bartlett (2022):

  > we hypothesize that once a planetary civilization transitions into a state that can be described as one virtually connected global city, it will face an ‘asymptotic burnout’, an ultimate crisis where the singularity-interval time scale becomes smaller than the time scale of innovation. If a civilization develops the capability to understand its own trajectory, it will have a window of time to affect a fundamental change to prioritize long-term homeostasis and well-being over unyielding growth—a consciously induced trajectory change or ‘homeostatic awakening’. We propose a new resolution to the Fermi paradox: civilizations either collapse from burnout or redirect themselves to prioritising homeostasis, a state where cosmic expansion is no longer a goal, making them difficult to detect remotely.

- Ten Hard Problems in and around AI:

  > We finally published our big 90-page intro to AI. Its likely effects, from ten perspectives, ten camps. The whole gamut: ML, scientific applications, social applications, access, safety and alignment, economics, AI ethics, governance, and classical philosophy of life.

- The follow-on 2024 Survey of 2,778 AI authors: six parts in pictures
- Douglas Hofstadter changes his mind on Deep Learning & AI risk
- François Chollet, The implausibility of intelligence explosion
- Stuart Russell on Making Artificial Intelligence Compatible with Humans, an interview on various themes in his book (Russell 2019)
- Attempted Gears Analysis of AGI Intervention Discussion With Eliezer
- Kevin Scott argues for trying to find a unifying notion of what knowledge work is to unify what humans and machines can do (Scott 2022).
- Frontier AI systems have surpassed the self-replicating red line — EA Forum
7 References
Footnotes
Although I thought that effective-altruism meta-criticism was the idea that ate smart people.↩︎