Incentive alignment problems

What is your loss function?

2014-09-22 — 2025-02-02

adversarial

distributed

economics

extended self

faster pussycat

game theory

incentive mechanisms

institutions

networks

security

tail risk

Placeholder to discuss alignment problems in AI, economic mechanisms, and institutions.

Many things to unpack. What do we imagine alignment to be when our own goals are themselves a diverse evolutionary epiphenomenon? Does everything ultimately Goodhart? Is that the origin of Moloch?
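One way to make the Goodhart question concrete is a toy simulation of what is sometimes called regressional Goodhart: pick the winners by a noisy proxy, then check how much of their measured score was real. This is a minimal sketch under invented assumptions, not a model of any particular institution or AI system. The Gaussian setup, the function name `simulate_goodhart`, and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_goodhart(n_candidates=10_000, top_k=100, noise_sd=1.0, seed=0):
    """Select the top_k candidates by a noisy proxy; return (mean proxy, mean true value)."""
    rng = random.Random(seed)
    # Assumed setup: true value ~ N(0, 1); proxy = true value + independent N(0, noise_sd) noise.
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n_candidates)]
    proxies = [v + rng.gauss(0.0, noise_sd) for v in true_values]
    # "Optimise the metric": keep the candidates with the highest measured score.
    ranked = sorted(range(n_candidates), key=lambda i: proxies[i], reverse=True)
    chosen = ranked[:top_k]
    mean_proxy = statistics.fmean(proxies[i] for i in chosen)
    mean_true = statistics.fmean(true_values[i] for i in chosen)
    return mean_proxy, mean_true


if __name__ == "__main__":
    mean_proxy, mean_true = simulate_goodhart()
    print(f"mean proxy score of selected: {mean_proxy:.2f}")
    print(f"mean true value of selected:  {mean_true:.2f}")
```

Under these assumptions the selected candidates look better on the proxy than they truly are, and the gap widens as `noise_sd` grows or as selection gets harsher (smaller `top_k`). With `noise_sd=0.0` the proxy and the target coincide and the gap disappears.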

1 With what is it ultimately possible to align?

Consider the deep history of intelligence.

2 AI alignment

The knotty case of superintelligent AI in particular.

3 Incoming

Joe Edelman, Is Anything Worth Maximizing? How metrics shape markets, how we’re doing them wrong

Metrics are how an algorithm or an organisation listens to you. If you want to listen to one person, you can just sit with them and see how they’re doing. If you want to listen to a whole city — a million people — you have to use metrics and analytics

and

What would it be like, if we could actually incentivize what we want out of life? If we incentivized lives well lived.

Goal Misgeneralization: How a Tiny Change Could End Everything (YouTube)

This video explores how YOU, YES YOU, are a case of misalignment with respect to evolution’s implicit optimization objective. We also show an example of goal misgeneralization in a simple AI system, and explore how deceptive alignment shares similar features and may arise in future, far more powerful AI systems.
