Moral calculus

Pareto optimality, utilitarianism, and murderbots. A staple of science fiction since the first robot, and probably of all the holy books of all the religions; cf. golems, contracts with devils. This has all become much more legible and quantifiable now that the golems are weaponised, 3D-printable downloads. That said, as an ML researcher I am sympathetic to the idea that we are, at the moment, pretty far from needing machines to solve trolley problems for us. Or at least, the problems that seem plausible in the short term are more decision-theoretic: given my imperfect understanding of the world, how sure am I that I, a robot, am not killing my owner by doing my task badly? Weighing up multiple lives and potentialities does not seem to be on the short-term cards, except perhaps in a fairness-in-expectation context.
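That decision-theoretic framing can be sketched in a few lines. This is a toy, not a proposal: the belief probabilities, utilities, and action names are all made up for illustration.

```python
# Hypothetical sketch: a robot deciding whether to act, given uncertainty
# about the world state. All numbers and names here are illustrative.

def expected_utility(action, belief, utility):
    """Expected utility of an action under a belief over world states."""
    return sum(p * utility[action][state] for state, p in belief.items())

# Belief: 95% the path is clear, 5% the owner is in the way.
belief = {"clear": 0.95, "owner_in_path": 0.05}

# Utilities: doing the task is worth +1 if clear, catastrophic if not.
utility = {
    "act":  {"clear": 1.0, "owner_in_path": -100.0},
    "wait": {"clear": 0.0, "owner_in_path": 0.0},
}

best = max(utility, key=lambda a: expected_utility(a, belief, utility))
print(best)  # → wait: even 5% doubt swamps the task's upside
```

No weighing of multiple lives required here; the whole moral content is hidden in the utility table, which is of course where the hard part lives.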

Regardless, this notebook is for trolley problems in the age of machine agency, war drones and smart cars. (Also, what is “agency” anyway?) Hell, even if we can design robots to follow human ethics, do we want to? Do instinctual human ethics have an especially good track record? What, specifically, are the universals? Insert here: link to an AI alignment research notebook.


For machines

For humans as cogs in the machine

Try moral philosophy.

Whit Taylor asks a deep question.

Infinitesimal trolley problems

Something I would like to look into: What about trolley problems that exist as branching decision trees, or even a continuous limit of constantly branching stochastic trees? What happens to morality in the continuous limit?
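The discrete version of this question is at least easy to compute with. Below is a minimal, hypothetical sketch of a trolley problem as a stochastic decision tree: choice nodes where the agent picks a branch, chance nodes where nature does, and leaves carrying harm counts. The tree structure and all the numbers are invented for illustration; the continuous limit would replace this recursion with something like a controlled branching process.

```python
# Hypothetical sketch of a "branching trolley problem" as a stochastic
# decision tree. Each node is a choice (agent picks a branch), a chance
# node (nature picks, with given probabilities), or a leaf with a harm
# count. All structure and numbers are illustrative.

def expected_harm(node):
    kind = node["kind"]
    if kind == "leaf":
        return node["harm"]
    if kind == "choice":
        # A harm-minimising agent picks the best branch.
        return min(expected_harm(c) for c in node["children"])
    if kind == "chance":
        # Nature branches stochastically; take the expectation.
        return sum(p * expected_harm(c)
                   for p, c in zip(node["probs"], node["children"]))
    raise ValueError(kind)

tree = {
    "kind": "choice",
    "children": [
        {"kind": "leaf", "harm": 1},             # pull the lever: one dies
        {"kind": "chance", "probs": [0.5, 0.5],  # do nothing: coin flip
         "children": [
             {"kind": "leaf", "harm": 5},
             {"kind": "leaf", "harm": 0},
         ]},
    ],
}

print(expected_harm(tree))  # → 1: min of 1 and 0.5·5 + 0.5·0 = 2.5
```

Already in this tiny case the interesting question appears: the minimising recursion bakes in a pure expected-harm ethic, and it is unclear what, if anything, that converges to as the branching becomes continuous.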


