AI persuasion, AI manipulation
Have we always been reward hacking each other?
2015-10-05 — 2025-06-12
A placeholder for future wondering about where our conversations with empathic machines will take us.
Humans, at least in the kinds of experiments we are allowed to run in labs, are not very persuasive. Are machines better?
I now think: almost certainly. I was flipped to the belief that AIs are superhumanly persuasive by Costello, Pennycook, and Rand (2024).
My reading of that piece is that it highlights something we tend to overlook when trying to understand AI persuasion: AIs can avoid participating in the identity formation and group signalling that underlie human-to-human persuasion.
There is a whole body of literature on when persuasion does work in humans. For example, the work on “deep canvassing” had me pretty convinced that I needed to think about persuasion as a thing that happens only after the persuader has managed to emotionally “get into the in-group” of the persuadee.
“AIs can’t do that”, I thought. But it turns out I needed to flip my assumption: AIs are not in the out-group to begin with, so they don’t need to get into the in-group. Aside from the patience, the speed of thought and so on, The Being also comes with the superhuman advantage of not looking like a known out-group, and maybe that is more important than looking like the in-group. I would not have picked that.
1 Field experiments in persuasive AI
e.g. Costello, Pennycook, and Rand (2024). This is pitched as AI-augmented de-radicalisation, but you could change the goal and think of it as a test case for AI-augmented persuasion in general. Interestingly, the AI does not seem to need to resort to the traditional workarounds for human tribal reasoning, such as deep canvassing. The potential for scaling up individualised mass persuasion seems like it could become the dominant game in town, at least until the number of competing persuasive AI chatbots causes people to become inured to them.
See also Salvi et al. (2024).
2 Therapist in a box
TBD. See also artificial intimacy.
3 Spamularity and the scam economy
See Spamularity for more on this.
4 The future wherein it is immoral to deny someone the exquisite companionship and attentive understanding of their own personal AI companion
See artificial intimacy for more on this.
5 Incoming
- Beth Barnes on Risks from AI persuasion
- Dynomight, in I guess I was wrong about AI persuasion, emphasises some different things from the ones I was thinking of. See also the comments
- Diplomacy and CICERO
- Reward hacking behavior can generalize across tasks — AI Alignment Forum
- Reward Hacking in Reinforcement Learning | Lil’Log