Morality and computational constraints

It is as if we knew what we were doing

2023-10-02 — 2025-11-03

Wherein computational constraints and moral theory are examined, and the question of whether reinforcement-learning reward signals can constitute pain, and hence a workplace health-and-safety matter, is considered.

adaptive
adversarial
AI safety
bounded compute
collective knowledge
cooperation
culture
economics
ethics
evolution
extended self
gene
incentive mechanisms
institutions
learning
mind
networks
neuron
rhetoric
snarks
social graph
sociology
utility
wonk

Notes on connections between computation and morality. Is pleasure a reward signal? Is a loss penalty the same as pain? Is an efficient learning update a moral imperative? Is it a workplace health and safety matter?

Links on those themes.

Possible connection: the emergent value systems of LLMs.

1 Background

1.1 Reinforcement learning and morality

I’d like to know everything people have said about it. The review by Vishwanath, Dennis, and Slavkovik (2024) is perfunctory but a start.

See Abel, MacGlashan, and Littman (2016).

Emerging AI systems will be making more and more decisions that impact the lives of humans in a significant way. It is essential, then, that these AI systems make decisions that take into account the desires, goals, and preferences of other people, while simultaneously learning about what those preferences are. In this work, we argue that the reinforcement-learning framework achieves the appropriate generality required to theorize about an idealized ethical artificial agent, and offers the proper foundations for grounding specific questions about ethical learning and decision making that can promote further scientific investigation. We define an idealized formalism for an ethical learner, and conduct experiments on two toy ethical dilemmas, demonstrating the soundness and flexibility of our approach.
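Their framing is straightforward to make concrete. Below is a minimal sketch, mine rather than theirs, of the kind of setup the abstract gestures at: a toy dilemma cast as a single-state MDP whose only reward channel is the affected human's (noisy) preference signal, learned by a tabular, bandit-style Q update. The action names, payoffs, and noise model are invented for illustration and are not the paper's formalism.

```python
# A minimal sketch of "ethical decision making as RL" (illustrative only; not
# the formalism of Abel, MacGlashan, and Littman 2016). The only reward the
# agent ever sees is the affected human's noisy preference signal.
import random

ACTIONS = ["respect_preference", "ignore_preference"]   # hypothetical toy dilemma

def human_reward(action: str) -> float:
    """Noisy observation of how much the affected human prefers this action."""
    base = {"respect_preference": 1.0, "ignore_preference": -10.0}[action]
    return base + random.gauss(0.0, 0.1)

Q = {a: 0.0 for a in ACTIONS}        # a single state, so Q(s, a) collapses to Q(a)
alpha, epsilon = 0.1, 0.1

random.seed(1)
for _ in range(2000):
    # epsilon-greedy action selection over the dilemma
    a = random.choice(ACTIONS) if random.random() < epsilon else max(Q, key=Q.get)
    Q[a] += alpha * (human_reward(a) - Q[a])             # bandit-style Q update

print(Q, "->", max(Q, key=Q.get))    # learns to respect the stated preference
```

The interesting questions, of course, start where this sketch stops: where the preference signal comes from, and what happens when it conflicts with other objectives.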

2 Ethical systems as computational optimization

3 When does negative reinforcement hurt?

  • Ethical Issues in Artificial Reinforcement Learning

    There is a remarkable connection between artificial reinforcement-learning (RL) algorithms and the process of reward learning in animal brains. Do RL algorithms on computers pose moral problems? I think current RL computations do matter, though they’re probably less morally significant than animals, including insects, because the degree of consciousness and emotional experience seems limited in present-day RL agents. As RL becomes more sophisticated and is hooked up to other more “conscious” brain-like operations, this topic will become increasingly urgent. Given the vast numbers of RL computations that will be run in the future in industry, video games, robotics, and research, the moral stakes may be high. I encourage scientists and altruists to work toward more humane approaches to reinforcement learning.
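One narrow, concrete reading of "more humane approaches" (my gloss, not a proposal from that essay): in a fixed-horizon task, adding a constant to every reward shifts all returns equally, so an agent driven mostly by penalties can often be trained on an equivalent, mostly-positive signal. A tiny sketch, with invented numbers:

```python
# Sketch: in a fixed-horizon task, shifting every reward by a constant changes
# all returns by the same amount, so the greedy choice is unchanged even though
# the signal is no longer framed as punishment. Numbers are invented: the
# hypothetical "risky" action gets +1 with probability 0.2 and -2 otherwise.
expected = {"safe": 0.1, "risky": 0.2 * 1.0 + 0.8 * (-2.0)}   # risky: -1.4

for shift in (0.0, 2.5):   # 2.5 makes every achievable reward non-negative
    shifted = {a: v + shift for a, v in expected.items()}
    best = max(shifted, key=shifted.get)
    print(f"shift={shift}: values={shifted}, greedy action={best}")
# Both runs pick "safe": only the framing (large penalty vs. small reward) changed.
```

This equivalence breaks once episode length depends on the policy (shifted rewards then change the incentive to prolong or end episodes), which is roughly where the question of whether the negative part of the signal "hurts" starts to bite.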

TBC

4 Computational definitions of blame

These papers — Everitt et al. (2022) and Halpern and Kleiman-Weiner (2018) — fall in this area; see the causality and agency snippet.
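To give a flavour of what "computational" means here, a loose and heavily simplified paraphrase (mine) of the sort of quantity Halpern and Kleiman-Weiner (2018) work with: an action is blameworthy for an outcome roughly to the extent that some available alternative would have made that outcome less likely. Their full definitions also weigh action costs and the agent's epistemic state, which this sketch ignores; the probabilities are invented.

```python
# A loose, simplified paraphrase of a degree-of-blameworthiness calculation
# (not the exact Halpern/Kleiman-Weiner definitions: action costs and the
# agent's epistemic state are ignored). Probabilities below are invented.
def blameworthiness(p_outcome_given: dict, action: str, alternatives: list) -> float:
    """How much likelier the bad outcome was under `action` than under the best alternative."""
    return max(0.0, max(p_outcome_given[action] - p_outcome_given[a] for a in alternatives))

# Hypothetical example: probability a pedestrian is hurt under each available action.
p_hurt = {"swerve": 0.05, "brake": 0.20, "do_nothing": 0.90}
print(blameworthiness(p_hurt, "do_nothing", ["swerve", "brake"]))   # 0.85
print(blameworthiness(p_hurt, "swerve", ["brake", "do_nothing"]))   # 0.0
```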

5 Incoming

  • Ethics and the Complexity of Models

    The difference between the three views is the complexity of the models they rely on.

    Deontology has the simplest model. Everything is either right or wrong according to a rule, or set of rules. Depending on the deontological theory we’re dealing with, those rules might take only a few words.[…]

    Consequentialism implies a much more complex model. It initially seems deceptively simple: just maximize something you care about, such as well-being or happiness. But to be able to maximize that, you need to know a lot about the world. Everything causes a bunch of consequences, some of them predictable and some not, not to mention that the consequences have consequences of their own, and so on. This can be much better than using simple rules, since it allows you to consider special cases. On the other hand, true, full consequentialism — the kind that doesn’t reduce to a deontological heuristic — essentially requires omniscience.[…]

    Put this way, deontology and consequentialism contrast in interesting fashion. On one side, the simplistic, dumb, irrational view where you just have to follow the rules; on the other, the view that embraces the complexity of the world, and says we can use intelligence and math to become more ethical. Or — on one side, the wise, distilled morals from past generations, easy to teach and follow; on the other, the hubristic, dangerous tendency of those who believe they can calculate and control everything. Depending on your sensibilities, you can decide which of deontology and consequentialism is the thesis and which is the antithesis.

    And what about virtue ethics? I like to view it as the synthesis: it’s the view that accepts the complexity and intractability of the world, yet doesn’t try to condense it into a simple model. Instead, it relies on existing models.

    To be virtuous is to act like a virtuous person.

  • Joscha Bach, From elementary computation to ethics? — Also check out his interview.

    • The disturbances and the performance of the mind are measured and controlled with a system of rewards and constraints. Because of the mind’s generality, it may find that the easiest way of regulating the disturbances that gave rise to its engagement would be to change the representation of the disturbance. In most cases, this amounts to anesthesia and will not serve the telos of the organism, so evolution has created considerable barriers to prevent write access of the mind to its governing mechanisms. These barriers are responsible for creating the mind’s identification with the rewards and self-model.

    • When we are concerned about suffering, we are usually referring to disturbances that generate a strong negative reward, but cannot be resolved within the given constraints. On an individual level, disease, mishap and crisis lead to suffering. But we also have a global suffering that is caused by the universal maladaptation of humans to present human society, which developed within a few generations and deviates considerably from the small-tribe collective environment that we are evolutionarily adapted for.[…]

  • We experience morality viscerally. A famous example is Haidt’s Moral Foundations model; there are others.

  • What collective moralities are possible? I think of them as “moral orbits”.

  • In my view, Karl Friston and other predictive-coding theorists are implicated here via the predictive-processing theory of motivation (Miller Tate 2021).

  • Grosse et al. (2023) is a magisterial study that uses influence functions to trace how LLM behaviour generalizes from training examples. I think this matters for computational morality: it suggests that a case-based morality might be feasible for LLMs.

  • I need to find a compact statement of what Professor Javen Qinfeng Shi said in a presentation I saw:

    Mind is a choice maker. Choices shape the mind

    • Q-learning: do what a good/kind person would do (moment to moment), learn wisdom (the V function), and have faith in the future and in self-growth. It naturally leads to optimal long-term cumulative reward (the Bellman equation).
    • Policy gradient: learn from past successes (to repeat or mimic) and mistakes (to avoid). It requires complete episodes to reveal the cumulative reward at the end of each episode.

    This is the first time I’ve heard policy gradient described as utilitarianism and Q-learning as virtue ethics. Citation needed.

    The analogy in Krening (2023) frames the mapping differently; maybe Shiravand and André (2024) covers this? Govindarajulu, Bringsjord, and Ghosh (2018) lays out the ethical systems but doesn’t mention reinforcement learning. A bare-bones sketch of the two update rules behind the analogy appears below.
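Since I keep returning to the analogy, here is a bare-bones sketch of the formal contrast it trades on (the toy problem, hyperparameters, and softmax policy are my own illustrative choices, not from Shi's talk): a Q-learning update bootstraps a Bellman target at every step, whereas a REINFORCE-style policy-gradient update waits for a complete episode and reinforces each of its actions in proportion to the episode's total return.

```python
# A minimal sketch of the two update rules behind the analogy above
# (environment and hyperparameters invented for illustration).
import math

def softmax(prefs: dict) -> dict:
    z = max(prefs.values())
    exps = {a: math.exp(v - z) for a, v in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

# --- Q-learning: bootstrapped, moment-to-moment Bellman backup ---------------
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Bellman backup; can be applied after every single step."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

# --- REINFORCE: needs the whole episode's return before it can update --------
def reinforce_update(theta, episode, alpha=0.01):
    """One policy-gradient step; `episode` is the complete list of (s, a, r)."""
    G = sum(r for _, _, r in episode)            # total return of the episode
    for s, a, _ in episode:
        probs = softmax(theta[s])
        for b in probs:                          # grad of log softmax policy
            grad = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += alpha * G * grad

# Toy usage on a two-state, two-action problem.
Q = {"s0": {"L": 0.0, "R": 0.0}, "s1": {"L": 0.0, "R": 0.0}}
q_learning_update(Q, "s0", "R", r=1.0, s_next="s1")
theta = {"s0": {"L": 0.0, "R": 0.0}}
reinforce_update(theta, [("s0", "R", 1.0), ("s0", "R", 0.5)])
print(Q)
print(theta)
```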

6 References

Abel, MacGlashan, and Littman. 2016. “Reinforcement Learning as a Framework for Ethical Decision Making.” In.
Awad, Levine, Anderson, et al. 2022. “Computational Ethics.” Trends in Cognitive Sciences.
Bak, Choi, Akrami, et al. 2016. “Adaptive Optimal Training of Animal Behavior.” In Proceedings of the 30th International Conference on Neural Information Processing Systems. NIPS’16.
Bello, and Malle. 2023. “Computational Approaches to Morality.” In The Cambridge Handbook of Computational Cognitive Sciences. Cambridge Handbooks in Psychology.
Bennett, Niv, and Langdon. 2021. “Value-Free Reinforcement Learning: Policy Optimization as a Minimal Model of Operant Behavior.” Current Opinion in Behavioral Sciences.
Chiu, Jiang, and Choi. 2024. “DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life.”
Crook, Nugent, Rolf, et al. 2021. “Computing Morality: Synthetic Ethical Decision Making and Behaviour.” Cognitive Computation and Systems.
Deschamps, Chaput, and Matignon. 2024. “Multi-Objective Reinforcement Learning: An Ethical Perspective.” In RJCIA.
DeYoung. 2013. “The Neuromodulator of Exploration: A Unifying Theory of the Role of Dopamine in Personality.” Frontiers in Human Neuroscience.
Dezfouli, Griffiths, Ramos, et al. 2019. “Models That Learn How Humans Learn: The Case of Decision-Making and Its Disorders.” PLOS Computational Biology.
Dezfouli, Nock, and Dayan. 2020. “Adversarial Vulnerabilities of Human Decision-Making.” Proceedings of the National Academy of Sciences.
Dickinson. 1989. “The Detrimental Effects of Extrinsic Reinforcement on ‘Intrinsic Motivation’.” The Behavior Analyst.
Dong, Li, Yang, et al. 2024. “Egoism, Utilitarianism and Egalitarianism in Multi-Agent Reinforcement Learning.” Neural Networks.
Ecoffet, and Lehman. 2021. “Reinforcement Learning Under Moral Uncertainty.” In Proceedings of the 38th International Conference on Machine Learning.
Edelmann. 2022. “Values, Preferences, Meaningful Choice.”
Eisenberger, and Cameron. 1996. “Detrimental Effects of Reward: Reality or Myth?” American Psychologist.
Everitt, Ortega, Barnes, et al. 2022. “Understanding Agent Incentives Using Causal Influence Diagrams. Part I: Single Action Settings.”
Garrido-Merchán, and Lumbreras-Sancho. 2023. “From Computational Ethics to Morality: How Decision-Making Algorithms Can Help Us Understand the Emergence of Moral Principles, the Existence of an Optimal Behaviour and Our Ability to Discover It.”
Govindarajulu, Bringsjord, and Ghosh. 2018. “One Formalization of Virtue Ethics via Learning.”
Grosse, Bae, Anil, et al. 2023. “Studying Large Language Model Generalization with Influence Functions.”
Halpern, and Kleiman-Weiner. 2018. “Towards Formal Definitions of Blameworthiness, Intention, and Moral Responsibility.”
Hammond, and Belle. 2018. “Deep Tractable Probabilistic Models for Moral Responsibility.” ArXiv.
Hegde, Agarwal, and Rao. 2020. “Ethics, Prosperity, and Society: Moral Evaluation Using Virtue Ethics and Utilitarianism.” In.
Howard, and Muntean. 2016. “A Minimalist Model of the Artificial Autonomous Moral Agent (AAMA).” In.
Krening. 2023. “Q-Learning as a Model of Utilitarianism in a Human–Machine Team.” Neural Computing and Applications.
Lee, Leibo, An, et al. 2022. “Importance of prefrontal meta control in human-like reinforcement learning.” Frontiers in Computational Neuroscience.
Li, Devidze, Mustafa, et al. 2024. “Ethics in Action: Training Reinforcement Learning Agents for Moral Decision-Making In Text-Based Adventure Games.” In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics.
Mazeika, Yin, Tamirisa, et al. 2025. “Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs.”
Miller Tate. 2021. “A Predictive Processing Theory of Motivation.” Synthese.
Peterson, Bourgin, Agrawal, et al. 2021. “Using Large-Scale Experiments and Machine Learning to Discover Theories of Human Decision-Making.” Science.
Plasencia. 2024. “Reinforcement Learning from Human Feedback for Ethically Robust AI Decision-Making.”
Robertazzi, Vissani, Schillaci, et al. 2022. “Brain-Inspired Meta-Reinforcement Learning Cognitive Control in Conflictual Inhibition Decision-Making Task for Artificial Agents.” Neural Networks.
Shiravand, and André. 2024. “Human-Like Moral Decisions by Reinforcement Learning Agents.” In Proceedings of the Annual Meeting of the Cognitive Science Society.
Stenseke. 2023. “On the Computational Complexity of Ethics: Moral Tractability for Minds and Machines.”
Tennant, Hailes, and Musolesi. 2023. “Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning.” In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence.
Todorovski. 2023. “Introduction to Computational Ethics.” In Artificial Intelligence, Social Harms and Human Rights.
Vishwanath, Dennis, and Slavkovik. 2024. “Reinforcement Learning and Machine Ethics: A Systematic Review.”
Wallach, and Allen. 2008. Moral Machines: Teaching Robots Right from Wrong.
Wang, Kurth-Nelson, Kumaran, et al. 2018. “Prefrontal cortex as a meta-reinforcement learning system.” Nature Neuroscience.
Wu, and Lin. 2018. “A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents.” Proceedings of the AAAI Conference on Artificial Intelligence.