Incentive alignment problems

What is your loss function?



Placeholder to discuss alignment problems in AI, economic mechanisms and institutions.

Many things to unpack here. What do we imagine we are aligning to, when our own goals are themselves a diverse evolutionary epiphenomenon? Does everything ultimately Goodhart? Is that the origin of Moloch?
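To make the Goodhart question concrete, here is a minimal sketch of one variant only, the regressional kind, in which the proxy is the true value plus independent noise. Everything in it is an assumption made for illustration: the Gaussian setup, the function name `goodhart_gap`, and its parameters are not drawn from any of the references. It selects the candidates that score best on the proxy and compares their average proxy score with their average true value.

```python
import random
import statistics


def goodhart_gap(n=10_000, top_k=100, noise_sd=1.0, seed=0):
    """Select the top_k candidates by a noisy proxy and report
    (mean proxy score, mean true value) for the selected group.

    Hypothetical toy model: true values are standard normal and the
    proxy is the true value plus independent Gaussian noise.
    """
    rng = random.Random(seed)
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    proxies = [v + rng.gauss(0.0, noise_sd) for v in true_values]

    # Optimise the proxy: keep the indices with the highest proxy scores.
    ranked = sorted(range(n), key=lambda i: proxies[i], reverse=True)
    chosen = ranked[:top_k]

    mean_proxy = statistics.fmean([proxies[i] for i in chosen])
    mean_true = statistics.fmean([true_values[i] for i in chosen])
    return mean_proxy, mean_true


if __name__ == "__main__":
    mean_proxy, mean_true = goodhart_gap()
    print(f"selected: mean proxy = {mean_proxy:.2f}, mean true value = {mean_true:.2f}")
```

In this toy model the selected group still beats the population average on the true value, but by less than its proxy scores suggest, because selecting on the proxy also selects for lucky noise. The other Goodhart variants, and anything Moloch-shaped, need more structure than this.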

Incoming

References

Aktipis, Athena. 2016. “Principles of Cooperation Across Systems: From Human Sharing to Multicellularity and Cancer.” Evolutionary Applications 9 (1): 17–36.
Bostrom, Nick. 2014. Superintelligence: Paths, Dangers, Strategies. Oxford, New York: Oxford University Press.
Daskalakis, Constantinos, Alan Deckelbaum, and Christos Tzamos. 2013. “Mechanism Design via Optimal Transport.” In, 269. ACM Press.
Ecoffet, Adrien, and Joel Lehman. 2021. “Reinforcement Learning Under Moral Uncertainty.” arXiv.
Hutson, Matthew. 2022. “Taught to the Test.” Science 376 (6593): 570–73.
Jackson, Matthew O. 2014. “Mechanism Theory.” SSRN Scholarly Paper ID 2542983. Rochester, NY: Social Science Research Network.
Manheim, David, and Scott Garrabrant. 2019. “Categorizing Variants of Goodhart’s Law.” arXiv.
Nowak, Martin A. 2006. “Five Rules for the Evolution of Cooperation.” Science 314 (5805): 1560–63.
Omohundro, Stephen M. 2008. “The Basic AI Drives.” In Proceedings of the 2008 Conference on Artificial General Intelligence 2008: Proceedings of the First AGI Conference, 483–92. NLD: IOS Press.
Ringstrom, Thomas J. 2022. “Reward Is Not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning.” arXiv.
Russell, Stuart. 2019. Human Compatible: Artificial Intelligence and the Problem of Control. Penguin Books.
Silver, David, Satinder Singh, Doina Precup, and Richard S. Sutton. 2021. “Reward Is Enough.” Artificial Intelligence 299 (October): 103535.
Taylor, Jessica, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch. 2020. “Alignment for Advanced Machine Learning Systems.” In Ethics of Artificial Intelligence, by Jessica Taylor, Eliezer Yudkowsky, Patrick LaVictoire, and Andrew Critch, 342–82. Oxford University Press.
Xu, Ruqing, and Sarah Dean. 2023. “Decision-Aid or Controller? Steering Human Decision Makers with Algorithms.” arXiv.
Zhuang, Simon, and Dylan Hadfield-Menell. 2021. “Consequences of Misaligned AI.” arXiv.
