AI Safety

Getting ready for the grown-ups to arrive

October 31, 2024 — December 19, 2024

adversarial
catastrophe
economics
faster pussycat
innovation
language
machine learning
mind
neural nets
NLP
tail risk
security
technology

Forked from superintelligence because the risk mitigation strategies are a field in themselves. Or rather, several distinct fields, which I need to map out in this notebook.

1 X-risk

X-risk is a term used in, e.g., the rationalist community to discuss the existential risks of a possible AI intelligence explosion.

FWIW: I personally think that (various kinds of) AI catastrophic risk are plausible and serious enough to worry about, even if they are not the most likely outcome, because even if they are moderately unlikely they are very high impact. In decision-theoretic terms, the expected cost of the risk is high. If the possibility is that everyone dies, then we should be worried about it, even if it is only a 1% chance.
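A toy calculation, with numbers invented purely to illustrate the expected-value point: compare a 1% chance of a catastrophe that kills 8 billion people with a 90% chance of a disaster that kills 1 million people.

$$
0.01 \times 8\times 10^{9} = 8 \times 10^{7}
\qquad \text{vs.} \qquad
0.9 \times 10^{6} = 9 \times 10^{5}.
$$

In expectation, the unlikely catastrophe is roughly ninety times worse, and that is before counting everyone who would otherwise have existed afterwards. This is the sense in which small probabilities of extreme outcomes can dominate the ledger.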

This kind of thing is empirically difficult for people to reason about.

2 Background

3 X-risk risk

There are people, notably accelerationists, who think that focusing on x-risk is itself a risky distraction from more pressing problems.

E.g., what if we fail to solve the climate crisis because we put our effort into AI risks instead? Or put in so much effort that we slow down the AI that could have saved us? Or so much effort that we get distracted from other, more pressing risks?

Example: Superintelligence: The Idea That Eats Smart People.1 See also the currently-viral school of x-risk-risk critique that classifies concern about x-risk as a tribal marker of TESCREALism.

AFAICT, the distinctions here are mostly sociological? Activities to manage x-risk are not necessarily in conflict with activities to manage other risks. Moreover, getting the human species ready to deal with catastrophes in general seems like a feasible intermediate goal. TBC.

3.1 Most-important century model

4 Theoretical tools

4.1 Courses

4.2 SLT

Singular learning theory has been pitched to me as a tool with applications to AI safety.

4.3 Sparse AE

Sparse autoencoders have had a moment; see Sparse Autoencoders for an explanation.

4.4 Algorithmic Game Theory

Sounds relevant.

4.5 Aligning AI

Let us consider general alignment, because I have little that is AI-specific to say yet.

5 In Australia

See AI Safety in Australia.

6 Incoming


Footnotes

  1. Although I thought that effective altruism meta-criticism was the idea that ate smart people.↩︎