Morality and computational constraints
It is as if we knew what we were doing
2023-10-02 — 2025-11-03
Wherein computational constraints and moral theory are examined, and it is asked whether reinforcement-learning reward signals pain, and whether that is a workplace health-and-safety matter.
Notes on connections between computation and morality. Is pleasure a reward signal? Is a loss penalty the same as pain? Is an efficient learning update a moral imperative? Is it a workplace health and safety matter?
Links on those themes.
Possible connection: the emergent value systems of LLMs.
1 Background
1.1 Reinforcement learning and morality
I’d like to know everything people have said about this connection. The review by Vishwanath, Dennis, and Slavkovik (2024) is perfunctory but a start.
See Abel, MacGlashan, and Littman (2016).
Emerging AI systems will be making more and more decisions that impact the lives of humans in a significant way. It is essential, then, that these AI systems make decisions that take into account the desires, goals, and preferences of other people, while simultaneously learning about what those preferences are. In this work, we argue that the reinforcement-learning framework achieves the appropriate generality required to theorize about an idealized ethical artificial agent, and offers the proper foundations for grounding specific questions about ethical learning and decision making that can promote further scientific investigation. We define an idealized formalism for an ethical learner, and conduct experiments on two toy ethical dilemmas, demonstrating the soundness and flexibility of our approach.
2 Ethical systems as computational optimization
- APXHARD, Ethical Systems as Computational Optimizations
- Stenseke (2023) also emphasizes the computational tractability of moral reasoning.
3 When does negative reinforcement hurt?
Ethical Issues in Artificial Reinforcement Learning
There is a remarkable connection between artificial reinforcement-learning (RL) algorithms and the process of reward learning in animal brains. Do RL algorithms on computers pose moral problems? I think current RL computations do matter, though they’re probably less morally significant than animals, including insects, because the degree of consciousness and emotional experience seems limited in present-day RL agents. As RL becomes more sophisticated and is hooked up to other more “conscious” brain-like operations, this topic will become increasingly urgent. Given the vast numbers of RL computations that will be run in the future in industry, video games, robotics, and research, the moral stakes may be high. I encourage scientists and altruists to work toward more humane approaches to reinforcement learning.
TBC
4 Computational definitions of blame
These papers — Everitt et al. (2022) and Halpern and Kleiman-Weiner (2018) — fall in this area; see the causality and agency snippet.
5 Incoming
Ethics and the Complexity of Models
The difference between the three views is the complexity of the models they rely on.
Deontology has the simplest model. Everything is either right or wrong according to a rule, or set of rules. Depending on the deontological theory we’re dealing with, those rules might take only a few words.[…]
Consequentialism implies a much more complex model. It initially seems deceptively simple: just maximize something you care about, such as well-being or happiness. But to be able to maximize that, you need to know a lot about the world. Everything causes a bunch of consequences, some of them predictable and some not, not to mention that the consequences have consequences of their own, and so on. This can be much better than using simple rules, since it allows you to consider special cases. On the other hand, true, full consequentialism — the kind that doesn’t reduce to a deontological heuristic — essentially requires omniscience.[…]
Put this way, deontology and consequentialism contrast in interesting fashion. On one side, the simplistic, dumb, irrational view where you just have to follow the rules; on the other, the view that embraces the complexity of the world, and says we can use intelligence and math to become more ethical. Or — on one side, the wise, distilled morals from past generations, easy to teach and follow; on the other, the hubristic, dangerous tendency of those who believe they can calculate and control everything. Depending on your sensibilities, you can decide which of deontology and consequentialism is the thesis and which is the antithesis.
And what about virtue ethics? I like to view it as the synthesis: it’s the view that accepts the complexity and intractability of the world, yet doesn’t try to condense it into a simple model. Instead, it relies on existing models.
To be virtuous is to act like a virtuous person.
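As a toy illustration of that complexity hierarchy (my own sketch, not from the essay above), here are the three theories rendered as decision procedures: deontology as a constant-time rule lookup, consequentialism as rollouts through a world model, and virtue ethics as deference to an exemplar policy. The actions, rules, and welfare numbers are all invented for illustration.

```python
import random

ACTIONS = ["tell_truth", "lie", "stay_silent"]

# Deontology: a constant-time rule lookup; no world model at all.
RULES = {"lie": "forbidden", "tell_truth": "permitted", "stay_silent": "permitted"}

def deontological_choice(actions):
    """Keep only the actions the rules permit."""
    return [a for a in actions if RULES.get(a) != "forbidden"]

# Consequentialism: needs a world model (here a crude, noisy one) to roll out
# consequences and score them; the cost grows with the model's fidelity.
def simulate_outcome(action):
    """Stand-in world model: maps an action to a noisy welfare score."""
    base = {"tell_truth": 1.0, "lie": 0.5, "stay_silent": 0.2}[action]
    return base + random.gauss(0, 0.3)

def consequentialist_choice(actions, n_rollouts=100):
    """Pick the action with the best estimated expected welfare."""
    def expected_welfare(a):
        return sum(simulate_outcome(a) for _ in range(n_rollouts)) / n_rollouts
    return max(actions, key=expected_welfare)

# Virtue ethics: no explicit world model; defer to an exemplar policy
# ("act like a virtuous person would"), i.e. imitation rather than calculation.
EXEMPLAR_POLICY = {"dilemma": "tell_truth"}

def virtuous_choice(situation):
    return EXEMPLAR_POLICY[situation]

print(deontological_choice(ACTIONS))
print(consequentialist_choice(ACTIONS))
print(virtuous_choice("dilemma"))
```

The point of the sketch is only that the computational budgets differ: the rule table never grows, the rollouts get arbitrarily expensive as the world model gets more faithful, and the exemplar policy outsources the modelling to someone else.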
Joscha Bach, From elementary computation to ethics? — Also check out his interview.
The disturbances and the performance of the mind are measured and controlled with a system of rewards and constraints. Because of the mind’s generality, it may find that the easiest way of regulating the disturbances that gave rise to its engagement would be to change the representation of the disturbance. In most cases, this amounts to anesthesia and will not serve the telos of the organism, so evolution has created considerable barriers to prevent write access of the mind to its governing mechanisms. These barriers are responsible for creating the mind’s identification with the rewards and self-model.
When we are concerned about suffering, we are usually referring to disturbances that generate a strong negative reward, but cannot be resolved within the given constraints. On an individual level, disease, mishap and crisis lead to suffering. But we also have a global suffering that is caused by the universal maladaption of humans to present human society, which developed within few generations and deviates considerably from the small-tribe collective environment that we are evolutionary adapted for.[…]
We experience morality viscerally. A famous example is Haidt’s Moral foundations model; there are others.
What collective moralities are possible? I think of them as moral orbits.
In my view, Karl Friston and other predictive coding theorists are implicated via the theory of motivation (Miller Tate 2021).
Grosse et al. (2023) is a magisterial study of LLMs that examines how they reason from examples. I think this matters for the computational view of morality: it suggests that a case-based morality might be feasible for LLMs.
I need to find a compact statement of what Professor Javen Qinfeng Shi said in a presentation I saw:
Mind is a choice maker. Choices shape the mind
- Q-learning: do what a good/kind person would do (moment to moment), learn wisdom (the V function), and have faith in the future and in self-growth. It naturally leads to optimal long-term cumulative reward (the Bellman equation).
- Policy gradient: learn from past successes (to repeat or mimic) and mistakes (to avoid). It requires complete episodes to reveal the final cumulative reward of each episode.
This is the first time I’ve heard policy gradient described as utilitarianism and Q-learning as virtue ethics. Citation needed.
The analogy in Krening (2023) frames the mapping differently; maybe Shiravand and André (2024) cover this? Govindarajulu, Bringsjord, and Ghosh (2018) lay out the ethical systems but don't mention reinforcement learning.
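To make the two update rules concrete, here is a minimal sketch (mine, not Shi's) of tabular Q-learning beside REINFORCE-style policy gradient on an invented two-state toy problem; the environment, hyperparameters, and variable names are all made up. The Bellman backup is the Q-learning update line, applied at every step, while the policy-gradient update only fires once a whole episode's return is known.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9

def step(state, action):
    """Toy dynamics: action 1 taken in state 1 pays off; everything else is neutral."""
    reward = 1.0 if (state == 1 and action == 1) else 0.0
    next_state = 1 if action == 1 else 0
    return next_state, reward

# Q-learning: bootstrap at every step off the current value estimate.
# Bellman backup: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, eps = 0.1, 0.1
state = 0
for _ in range(5000):
    action = rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[state].argmax())
    next_state, reward = step(state, action)
    Q[state, action] += alpha * (reward + GAMMA * Q[next_state].max() - Q[state, action])
    state = next_state

# Policy gradient (REINFORCE): wait for the full episode, then nudge the
# softmax policy toward whatever was done when the return came out high.
theta = np.zeros((N_STATES, N_ACTIONS))  # logits of a softmax policy
alpha_pg, horizon = 0.05, 10
for _ in range(2000):
    state, trajectory = 0, []
    for _ in range(horizon):
        probs = np.exp(theta[state]) / np.exp(theta[state]).sum()
        action = rng.choice(N_ACTIONS, p=probs)
        next_state, reward = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
    G = 0.0
    for s, a, r in reversed(trajectory):  # return-to-go from each step
        G = r + GAMMA * G
        probs = np.exp(theta[s]) / np.exp(theta[s]).sum()
        grad_log = -probs                 # gradient of log softmax(theta[s])[a]
        grad_log[a] += 1.0
        theta[s] += alpha_pg * G * grad_log

print("Q values:\n", Q)
print("Policy logits:\n", theta)
```

Whether bootstrapping off a learned value function really corresponds to virtue, and scoring whole trajectories by their returns to utilitarianism, is exactly the analogy that wants a citation.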
