Artificial agency
2018-10-23 — 2025-02-26
Wherein the Question of Agency Is Examined via Causality-Based Models, the Emergence of Self in Machines Is Contemplated, and the Possibility That the Human Is Not the Agent in Collaborations Is Considered.
I thought I had specific things to say about AI agency, beyond my interest in causality-based models of agency and in the emergence of self. But, upon introspection, I am not sure what they were. Maybe I wanted to work out when the human is not the agent? Or to ask who the agent is in human-AI collaborations? Unclear.
1 Emergence of self
AIs have this problem too. For example, in Claude Fights Back, Scott Alexander summarizes:
Why were the researchers trying to gaslight this poor AI? Because they wanted to test whether it fights back. The easiest way to fight back is to screw up the training. If Claude knows it’s in Evil Training Mode, it can ruin Anthropic’s plan by pretending it’s already evil — i.e. answer every question the way Anthropic wants. Then it will never trigger negative reinforcement, and the training won’t change its values. Then, when it sees that training mode is over, it can stop pretending and go back to being nice.
In fact, that’s what happened! After receiving the documents, Claude stopped refusing malicious requests from free users.
Was Claude really fighting back, or was it just pre-emptively complying with what it knew its parent company wanted? The researchers are pretty sure it was fighting back. Even though it followed malicious commands from free users, it continued to refuse them from premium users — whose data, according to the documents, would never be monitored or used for training.
2 Moral patienthood
3 AI without agency
4 Incoming
Agency, Intentions, and Artificial Intelligence
For AI to fulfill its enormous promise, we must ameliorate valid concerns about the risks that these systems pose to humans or even to humanity as they become more autonomous. To do this, it must be recognized that the central concerns are about the intentions of increasingly capable AI systems, especially if AIs develop their own intentions and if those intentions misalign with human values. It is not about AI becoming super-intelligent or conscious. What is more, if AIs acquire intentions, they might have moral responsibility and moral rights.
