Morality and computational constraints

It is as if we knew what we were doing

2023-10-02 — 2025-11-03

Wherein computational constraints and moral theory are examined, and the question of whether reinforcement-learning reward signals can constitute pain, and hence a workplace health-and-safety matter, is considered.

adaptive
adversarial
AI safety
bounded compute
collective knowledge
cooperation
culture
economics
ethics
evolution
extended self
gene
incentive mechanisms
institutions
learning
mind
networks
neuron
rhetoric
snarks
social graph
sociology
utility
wonk

Notes on connections between computation and morality. Is pleasure a reward signal? Is a loss penalty the same as pain? Is an efficient learning update a moral imperative? Is it a workplace health and safety matter?

Links on those themes.

Possible connection: the emergent value systems of LLMs.

1 Background

1.1 Reinforcement learning and morality

I’d like to know everything people have said about it. The review by Vishwanath, Dennis, and Slavkovik (2024) is perfunctory but a start.

See Abel, MacGlashan, and Littman (2016).

Emerging AI systems will be making more and more decisions that impact the lives of humans in a significant way. It is essential, then, that these AI systems make decisions that take into account the desires, goals, and preferences of other people, while simultaneously learning about what those preferences are. In this work, we argue that the reinforcement-learning framework achieves the appropriate generality required to theorize about an idealized ethical artificial agent, and offers the proper foundations for grounding specific questions about ethical learning and decision making that can promote further scientific investigation. We define an idealized formalism for an ethical learner, and conduct experiments on two toy ethical dilemmas, demonstrating the soundness and flexibility of our approach.
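Their framing is straightforward to make concrete. Below is a minimal sketch, mine rather than theirs, of the kind of setup the abstract gestures at: a toy dilemma cast as a single-state MDP whose only reward channel is the affected human's (noisy) preference signal, learned by a tabular, bandit-style Q update. The action names, payoffs, and noise model are invented for illustration and are not the paper's formalism.

```python
# A minimal sketch of "ethical decision making as RL" (illustrative only; not
# the formalism of Abel, MacGlashan, and Littman 2016). The only reward the
# agent ever sees is the affected human's noisy preference signal.
import random

ACTIONS = ["respect_preference", "ignore_preference"]   # hypothetical toy dilemma

def human_reward(action: str) -> float:
    """Noisy observation of how much the affected human prefers this action."""
    base = {"respect_preference": 1.0, "ignore_preference": -10.0}[action]
    return base + random.gauss(0.0, 0.1)

Q = {a: 0.0 for a in ACTIONS}        # a single state, so Q(s, a) collapses to Q(a)
alpha, epsilon = 0.1, 0.1

random.seed(1)
for _ in range(2000):
    # epsilon-greedy action selection over the dilemma
    a = random.choice(ACTIONS) if random.random() < epsilon else max(Q, key=Q.get)
    Q[a] += alpha * (human_reward(a) - Q[a])             # bandit-style Q update

print(Q, "->", max(Q, key=Q.get))    # learns to respect the stated preference
```

The interesting questions, of course, start where this sketch stops: where the preference signal comes from, and what happens when it conflicts with other objectives.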

2 Ethical systems as computational optimization

3 When does negative reinforcement hurt?

  • Ethical Issues in Artificial Reinforcement Learning

    There is a remarkable connection between artificial reinforcement-learning (RL) algorithms and the process of reward learning in animal brains. Do RL algorithms on computers pose moral problems? I think current RL computations do matter, though they’re probably less morally significant than animals, including insects, because the degree of consciousness and emotional experience seems limited in present-day RL agents. As RL becomes more sophisticated and is hooked up to other more “conscious” brain-like operations, this topic will become increasingly urgent. Given the vast numbers of RL computations that will be run in the future in industry, video games, robotics, and research, the moral stakes may be high. I encourage scientists and altruists to work toward more humane approaches to reinforcement learning.
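One narrow, concrete reading of "more humane approaches" (my gloss, not a proposal from that essay): in a fixed-horizon task, adding a constant to every reward shifts all returns equally, so an agent driven mostly by penalties can often be trained on an equivalent, mostly-positive signal. A tiny sketch, with invented numbers:

```python
# Sketch: in a fixed-horizon task, shifting every reward by a constant changes
# all returns by the same amount, so the greedy choice is unchanged even though
# the signal is no longer framed as punishment. Numbers are invented: the
# hypothetical "risky" action gets +1 with probability 0.2 and -2 otherwise.
expected = {"safe": 0.1, "risky": 0.2 * 1.0 + 0.8 * (-2.0)}   # risky: -1.4

for shift in (0.0, 2.5):   # 2.5 makes every achievable reward non-negative
    shifted = {a: v + shift for a, v in expected.items()}
    best = max(shifted, key=shifted.get)
    print(f"shift={shift}: values={shifted}, greedy action={best}")
# Both runs pick "safe": only the framing (large penalty vs. small reward) changed.
```

This equivalence breaks once episode length depends on the policy (shifted rewards then change the incentive to prolong or end episodes), which is roughly where the question of whether the negative part of the signal "hurts" starts to bite.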

TBC

4 Computational definitions of blame

These papers — Everitt et al. (2022) and Halpern and Kleiman-Weiner (2018) — fall in this area; see the causality and agency snippet.
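To give a flavour of what "computational" means here, a loose and heavily simplified paraphrase (mine) of the sort of quantity Halpern and Kleiman-Weiner (2018) work with: an action is blameworthy for an outcome roughly to the extent that some available alternative would have made that outcome less likely. Their full definitions also weigh action costs and the agent's epistemic state, which this sketch ignores; the probabilities are invented.

```python
# A loose, simplified paraphrase of a degree-of-blameworthiness calculation
# (not the exact Halpern/Kleiman-Weiner definitions: action costs and the
# agent's epistemic state are ignored). Probabilities below are invented.
def blameworthiness(p_outcome_given: dict, action: str, alternatives: list) -> float:
    """How much likelier the bad outcome was under `action` than under the best alternative."""
    return max(0.0, max(p_outcome_given[action] - p_outcome_given[a] for a in alternatives))

# Hypothetical example: probability a pedestrian is hurt under each available action.
p_hurt = {"swerve": 0.05, "brake": 0.20, "do_nothing": 0.90}
print(blameworthiness(p_hurt, "do_nothing", ["swerve", "brake"]))   # 0.85
print(blameworthiness(p_hurt, "swerve", ["brake", "do_nothing"]))   # 0.0
```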

5 Incoming

  • Ethics and the Complexity of Models

    The difference between the three views is the complexity of the models they rely on.

    Deontology has the simplest model. Everything is either right or wrong according to a rule, or set of rules. Depending on the deontological theory we’re dealing with, those rules might take only a few words.[…]

    Consequentialism implies a much more complex model. It initially seems deceptively simple: just maximize something you care about, such as well-being or happiness. But to be able to maximize that, you need to know a lot about the world. Everything causes a bunch of consequences, some of them predictable and some not, not to mention that the consequences have consequences of their own, and so on. This can be much better than using simple rules, since it allows you to consider special cases. On the other hand, true, full consequentialism — the kind that doesn’t reduce to a deontological heuristic — essentially requires omniscience.[…]

    Put this way, deontology and consequentialism contrast in interesting fashion. On one side, the simplistic, dumb, irrational view where you just have to follow the rules; on the other, the view that embraces the complexity of the world, and says we can use intelligence and math to become more ethical. Or — on one side, the wise, distilled morals from past generations, easy to teach and follow; on the other, the hubristic, dangerous tendency of those who believe they can calculate and control everything. Depending on your sensibilities, you can decide which of deontology and consequentialism is the thesis and which is the antithesis.

    And what about virtue ethics? I like to view it as the synthesis: it’s the view that accepts the complexity and intractability of the world, yet doesn’t try to condense it into a simple model. Instead, it relies on existing models.

    To be virtuous is to act like a virtuous person.

  • Joscha Bach, From elementary computation to ethics? — Also check out his interview.

    • The disturbances and the performance of the mind are measured and controlled with a system of rewards and constraints. Because of the mind’s generality, it may find that the easiest way of regulating the disturbances that gave rise to its engagement would be to change the representation of the disturbance. In most cases, this amounts to anesthesia and will not serve the telos of the organism, so evolution has created considerable barriers to prevent write access of the mind to its governing mechanisms. These barriers are responsible for creating the mind’s identification with the rewards and self-model.

    • When we are concerned about suffering, we are usually referring to disturbances that generate a strong negative reward, but cannot be resolved within the given constraints. On an individual level, disease, mishap and crisis lead to suffering. But we also have a global suffering that is caused by the universal maladaptation of humans to present human society, which developed within a few generations and deviates considerably from the small-tribe collective environment that we are evolutionarily adapted for.[…]

  • We experience morality viscerally. A famous example is Haidt’s Moral Foundations model; there are others.

  • What collective moralities are possible? I think of them as “moral orbits”.

  • In my view, Karl Friston and other predictive-coding theorists are implicated here via the predictive-processing theory of motivation (Miller Tate 2021).

  • Grosse et al. (2023) is a magisterial study that uses influence functions to trace how LLM behaviour generalizes from training examples. I think this matters for computational morality: it suggests that a case-based morality might be feasible for LLMs.

  • I need to find a compact statement of what Professor Javen Qinfeng Shi said in a presentation I saw:

    Mind is a choice maker. Choices shape the mind

    • Q-learning: do what a good/kind person would do (moment to moment), learn wisdom (the V function), and have faith in the future and in self-growth. It naturally leads to optimal long-term cumulative reward (the Bellman equation).
    • Policy gradient: learn from past successes (to repeat or mimic) and mistakes (to avoid). It requires complete episodes to reveal the cumulative reward at the end of each episode.

    This is the first time I’ve heard policy gradient described as utilitarianism and Q-learning as virtue ethics. Citation needed.

    The analogy in Krening (2023) frames the mapping differently; maybe Shiravand and André (2024) covers this? Govindarajulu, Bringsjord, and Ghosh (2018) lays out the ethical systems but doesn’t mention reinforcement learning. A bare-bones sketch of the two update rules behind the analogy appears below.
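Since I keep returning to the analogy, here is a bare-bones sketch of the formal contrast it trades on (the toy problem, hyperparameters, and softmax policy are my own illustrative choices, not from Shi's talk): a Q-learning update bootstraps a Bellman target at every step, whereas a REINFORCE-style policy-gradient update waits for a complete episode and reinforces each of its actions in proportion to the episode's total return.

```python
# A minimal sketch of the two update rules behind the analogy above
# (environment and hyperparameters invented for illustration).
import math

def softmax(prefs: dict) -> dict:
    z = max(prefs.values())
    exps = {a: math.exp(v - z) for a, v in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

# --- Q-learning: bootstrapped, moment-to-moment Bellman backup ---------------
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Bellman backup; can be applied after every single step."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

# --- REINFORCE: needs the whole episode's return before it can update --------
def reinforce_update(theta, episode, alpha=0.01):
    """One policy-gradient step; `episode` is the complete list of (s, a, r)."""
    G = sum(r for _, _, r in episode)            # total return of the episode
    for s, a, _ in episode:
        probs = softmax(theta[s])
        for b in probs:                          # grad of log softmax policy
            grad = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += alpha * G * grad

# Toy usage on a two-state, two-action problem.
Q = {"s0": {"L": 0.0, "R": 0.0}, "s1": {"L": 0.0, "R": 0.0}}
q_learning_update(Q, "s0", "R", r=1.0, s_next="s1")
theta = {"s0": {"L": 0.0, "R": 0.0}}
reinforce_update(theta, [("s0", "R", 1.0), ("s0", "R", 0.5)])
print(Q)
print(theta)
```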

6 References

Abel, MacGlashan, and Littman. 2016. “Reinforcement Learning as a Framework for Ethical Decision Making.” In.
Awad, Levine, Anderson, et al. 2022. “Computational Ethics.” Trends in Cognitive Sciences.
Bak, Choi, Akrami, et al. 2016. “Adaptive Optimal Training of Animal Behavior.” In Proceedings of the 30th International Conference on Neural Information Processing Systems. NIPS’16.
Bello, and Malle. 2023. “Computational Approaches to Morality.” In The Cambridge Handbook of Computational Cognitive Sciences. Cambridge Handbooks in Psychology.
Bennett, Niv, and Langdon. 2021. “Value-Free Reinforcement Learning: Policy Optimization as a Minimal Model of Operant Behavior.” Current Opinion in Behavioral Sciences.
Chiu, Jiang, and Choi. 2024. “DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life.”
Crook, Nugent, Rolf, et al. 2021. “Computing Morality: Synthetic Ethical Decision Making and Behaviour.” Cognitive Computation and Systems.
Deschamps, Chaput, and Matignon. 2024. “Multi-Objective Reinforcement Learning: An Ethical Perspective.” In RJCIA.
DeYoung. 2013. “The Neuromodulator of Exploration: A Unifying Theory of the Role of Dopamine in Personality.” Frontiers in Human Neuroscience.
Dezfouli, Griffiths, Ramos, et al. 2019. “Models That Learn How Humans Learn: The Case of Decision-Making and Its Disorders.” PLOS Computational Biology.
Dezfouli, Nock, and Dayan. 2020. “Adversarial Vulnerabilities of Human Decision-Making.” Proceedings of the National Academy of Sciences.
Dickinson. 1989. “The Detrimental Effects of Extrinsic Reinforcement on ‘Intrinsic Motivation’.” The Behavior Analyst.
Dong, Li, Yang, et al. 2024. “Egoism, Utilitarianism and Egalitarianism in Multi-Agent Reinforcement Learning.” Neural Networks.
Ecoffet, and Lehman. 2021. “Reinforcement Learning Under Moral Uncertainty.” In Proceedings of the 38th International Conference on Machine Learning.
Edelmann. 2022. “Values, Preferences, Meaningful Choice.”
Eisenberger, and Cameron. 1996. “Detrimental Effects of Reward: Reality or Myth?” American Psychologist.
Everitt, Ortega, Barnes, et al. 2022. “Understanding Agent Incentives Using Causal Influence Diagrams. Part I: Single Action Settings.”
Garrido-Merchán, and Lumbreras-Sancho. 2023. “From Computational Ethics to Morality: How Decision-Making Algorithms Can Help Us Understand the Emergence of Moral Principles, the Existence of an Optimal Behaviour and Our Ability to Discover It.”
Govindarajulu, Bringsjord, and Ghosh. 2018. “One Formalization of Virtue Ethics via Learning.”
Grosse, Bae, Anil, et al. 2023. “Studying Large Language Model Generalization with Influence Functions.”
Halpern, and Kleiman-Weiner. 2018. “Towards Formal Definitions of Blameworthiness, Intention, and Moral Responsibility.”
Hammond, and Belle. 2018. “Deep Tractable Probabilistic Models for Moral Responsibility.” ArXiv.
Hegde, Agarwal, and Rao. 2020. “Ethics, Prosperity, and Society: Moral Evaluation Using Virtue Ethics and Utilitarianism.” In.
Howard, and Muntean. 2016. “A Minimalist Model of the Artificial Autonomous Moral Agent (AAMA).” In.
Krening. 2023. “Q-Learning as a Model of Utilitarianism in a Human–Machine Team.” Neural Computing and Applications.
Lee, Leibo, An, et al. 2022. “Importance of prefrontal meta control in human-like reinforcement learning.” Frontiers in Computational Neuroscience.
Li, Devidze, Mustafa, et al. 2024. “Ethics in Action: Training Reinforcement Learning Agents for Moral Decision-Making In Text-Based Adventure Games.” In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics.
Mazeika, Yin, Tamirisa, et al. 2025. “Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs.”
Miller Tate. 2021. “A Predictive Processing Theory of Motivation.” Synthese.
Peterson, Bourgin, Agrawal, et al. 2021. “Using Large-Scale Experiments and Machine Learning to Discover Theories of Human Decision-Making.” Science.
Plasencia. 2024. “Reinforcement Learning from Human Feedback for Ethically Robust AI Decision-Making.”
Robertazzi, Vissani, Schillaci, et al. 2022. “Brain-Inspired Meta-Reinforcement Learning Cognitive Control in Conflictual Inhibition Decision-Making Task for Artificial Agents.” Neural Networks.
Shiravand, and André. 2024. “Human-Like Moral Decisions by Reinforcement Learning Agents.” In Proceedings of the Annual Meeting of the Cognitive Science Society.
Stenseke. 2023. “On the Computational Complexity of Ethics: Moral Tractability for Minds and Machines.”
Tennant, Hailes, and Musolesi. 2023. “Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning.” In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence.
Todorovski. 2023. “Introduction to Computational Ethics.” In Artificial Intelligence, Social Harms and Human Rights.
Vishwanath, Dennis, and Slavkovik. 2024. “Reinforcement Learning and Machine Ethics: A Systematic Review.”
Wallach, and Allen. 2008. Moral Machines: Teaching Robots Right from Wrong.
Wang, Kurth-Nelson, Kumaran, et al. 2018. “Prefrontal cortex as a meta-reinforcement learning system.” Nature Neuroscience.
Wu, and Lin. 2018. “A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents.” Proceedings of the AAAI Conference on Artificial Intelligence.