AI persuasion, AI manipulation

Have we always been reward hacking each other?

2015-10-05 — 2025-06-12

adversarial
AI safety
bounded compute
communicating
cooperation
culture
economics
faster pussycat
language
machine learning
mind
neural nets
NLP
security
technology
wonk

A placeholder for future wondering about where our conversations with empathic machines will take us.

Humans, at least in the kind of experiments we are allowed to do in labs, are not very persuasive. Are machines better?

I now think: Almost certainly. I was flipped to the belief that AIs are superhumanly persuasive by Costello, Pennycook, and Rand (2024).

My reading of that piece is that it highlights something we don’t usually consider when trying to understand AI persuasion: AIs can avoid participating in the identity formation and group signalling that underlie human-to-human persuasion.

There is a whole body of literature on when persuasion does work in humans. The work on “Deep Canvassing”, for example, had me pretty convinced that I needed to think of persuasion as something that happens after the persuader has managed to emotionally “get into the in-group” of the persuadee.

“AIs can’t do that”, I thought. But perhaps they don’t need to, because AIs are not in the out-group to begin with. Aside from the patience, the speed of thought and so on, The Being also comes with the superhuman advantage of not looking like a known out-group, and maybe that is more important than looking like the in-group. I would not have picked that.

1 Field experiments in persuasive AI

e.g. Costello, Pennycook, and Rand (2024). This is pitched as AI-augmented de-radicalisation, but change the goal and it becomes a test case for AI-augmented persuasion in general. Interestingly, the AI does not seem to need the traditional workarounds for human tribal reasoning, such as deep canvassing, and scaled-up individualised mass persuasion looks like it could become the dominant game in town, at least until the number of competing persuasive AI chatbots causes people to become inured to them.
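To make the design concrete, here is a minimal sketch of that kind of experimental loop: elicit a belief rating, run a short dialogue in which the model rebuts the participant’s own stated reasons, then re-elicit the rating. The `persuader_reply` stub stands in for a real LLM call, and the ratings and turn count below are invented for illustration, not taken from the paper.

```python
import statistics

# Sketch of a Costello-et-al.-style persuasion experiment: participants
# rate a belief (0-100), hold a short dialogue with an AI persuader,
# then rate the same belief again. In the real study the persuader was
# a frontier LLM prompted to rebut each participant's stated reasons;
# here it is a stub so the protocol shape is runnable on its own.

def persuader_reply(belief_statement: str, turn: int) -> str:
    """Placeholder for an LLM call tailored to the participant's stated reasons."""
    return f"(turn {turn}) counter-evidence addressing: {belief_statement!r}"

def run_dialogue(belief_statement: str, n_turns: int = 3) -> list[str]:
    """Run a fixed-length persuasion dialogue and return the transcript."""
    return [persuader_reply(belief_statement, t) for t in range(1, n_turns + 1)]

def persuasion_effect(pre: list[float], post: list[float]) -> float:
    """Mean pre-to-post change in belief rating (negative = belief reduced)."""
    return statistics.mean(b - a for a, b in zip(pre, post))

if __name__ == "__main__":
    transcript = run_dialogue("the moon landings were staged")
    pre_ratings = [85.0, 70.0, 90.0]    # hypothetical 0-100 belief ratings
    post_ratings = [65.0, 55.0, 80.0]   # hypothetical ratings after dialogue
    print(persuasion_effect(pre_ratings, post_ratings))  # -15.0
```

The sketch only captures the shape of the protocol; the persuasive power in the real experiment comes from the model tailoring its rebuttals to each participant’s own stated evidence.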

See also Salvi et al. (2024).


2 Therapist in a box

TBD. See also artificial intimacy.

3 Spamularity and the scam economy

See Spamularity for more on this.

4 The future wherein it is immoral to deny someone the exquisite companionship and attentive understanding of their own personal AI companion

See artificial intimacy for more on this.

5 Incoming

6 References

Akerlof, and Shiller. 2015. Phishing for Phools: The Economics of Manipulation and Deception.
Bay. 2018. “Weaponizing the Haters: The Last Jedi and the Strategic Politicization of Pop Culture Through Social Media Manipulation.” First Monday.
Benkler, Faris, and Roberts. 2018. Network Propaganda: Manipulation, Disinformation, and Radicalization in American Politics.
Bradshaw, and Howard. 2017. “Troops, Trolls and Troublemakers: A Global Inventory of Organized Social Media Manipulation.”
Broockman, David, and Kalla. 2016. “Durably Reducing Transphobia: A Field Experiment on Door-to-Door Canvassing.” Science.
———. 2020. “When and Why Are Campaigns’ Persuasive Effects Small? Evidence from the 2020 US Presidential Election.”
Broockman, David E., Kalla, and Sekhon. 2016. “The Design of Field Experiments With Survey Outcomes: A Framework for Selecting More Efficient, Robust, and Ethical Designs.” SSRN Scholarly Paper ID 2742869.
Buçinca, Malaya, and Gajos. 2021. “To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-Assisted Decision-Making.” Proceedings of the ACM on Human-Computer Interaction.
Centola. 2013. “Homophily, Networks, and Critical Mass: Solving the Start-up Problem in Large Group Collective Action.” Rationality and Society.
Clark. 2004. Natural-Born Cyborgs: Minds, Technologies, and the Future of Human Intelligence.
———. 2011. Supersizing the Mind: Embodiment, Action, and Cognitive Extension. Philosophy of Mind.
Costello, Pennycook, and Rand. 2024. “Durably Reducing Conspiracy Beliefs Through Dialogues with AI.” Science.
Dezfouli, Nock, and Dayan. 2020. “Adversarial Vulnerabilities of Human Decision-Making.” Proceedings of the National Academy of Sciences.
Doudkin, Pataranutaporn, and Maes. 2025a. “AI Persuading AI Vs AI Persuading Humans: LLMs’ Differential Effectiveness in Promoting Pro-Environmental Behavior.”
———. 2025b. “From Synthetic to Human: The Gap Between AI-Predicted and Actual Pro-Environmental Behavior Change After Chatbot Persuasion.” In Proceedings of the 7th ACM Conference on Conversational User Interfaces. CUI ’25.
El-Sayed, Akbulut, McCroskery, et al. 2024. “A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI.”
Fang, Liu, Danry, et al. 2025. “How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Randomized Controlled Study.”
Jaidka, Chen, Chesterman, et al. 2024. “Misinformation, Disinformation, and Generative AI: Implications for Perception and Policy.” Digit. Gov.: Res. Pract.
Jecmen, Zhang, Liu, et al. 2020. “Mitigating Manipulation in Peer Review via Randomized Reviewer Assignments.” In Advances in Neural Information Processing Systems.
Kalla, and Broockman. 2018. “The Minimal Persuasive Effects of Campaign Contact in General Elections: Evidence from 49 Field Experiments.” American Political Science Review.
Kamenica. 2019. “Bayesian Persuasion and Information Design.” Annual Review of Economics.
Kanich, Kreibich, Levchenko, et al. 2008. “Spamalytics: An Empirical Analysis of Spam Marketing Conversion.” In Proceedings of the 15th ACM Conference on Computer and Communications Security. CCS ’08.
Kenton, Kumar, Farquhar, et al. 2023. “Discovering Agents.” Artificial Intelligence.
Khan, Hughes, Valentine, et al. 2024. “Debating with More Persuasive LLMs Leads to More Truthful Answers.”
Kulveit, Douglas, Ammann, et al. 2025. “Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development.”
Liu, Xu, Zhang, et al. 2025. “LLM Can Be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models.”
Manzini, Keeling, Marchal, et al. 2024. “Should Users Trust Advanced AI Assistants? Justified Trust As a Function of Competence and Alignment.” In The 2024 ACM Conference on Fairness, Accountability, and Transparency.
Martin, and Yurukoglu. 2017. “Bias in Cable News: Persuasion and Polarization.” American Economic Review.
Marwick, and Lewis. 2017. “Media Manipulation and Disinformation Online.”
Matz, Teeny, Vaid, et al. 2024. “The Potential of Generative AI for Personalized Persuasion at Scale.” Scientific Reports.
Mercier, and Sperber. 2011. “Why Do Humans Reason? Arguments for an Argumentative Theory.” Behavioral and Brain Sciences.
———. 2017. The Enigma of Reason.
Messeri, and Crockett. 2024. “Artificial Intelligence and Illusions of Understanding in Scientific Research.” Nature.
Meta Fundamental AI Research Diplomacy Team (FAIR), Bakhtin, Brown, et al. 2022. “Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning.” Science.
Meulemans, Schug, Kobayashi, et al. 2023. “Would I Have Gotten That Reward? Long-Term Credit Assignment by Counterfactual Contribution Analysis.”
Nisbett, and Spaiser. 2023. “How Convincing Are AI-Generated Moral Arguments for Climate Action?” Frontiers in Climate.
Pan, Jones, Jagadeesan, et al. 2024. “Feedback Loops With Language Models Drive In-Context Reward Hacking.”
Pataranutaporn, Leong, Danry, et al. 2022. “AI-Generated Virtual Instructors Based on Liked or Admired People Can Improve Motivation and Foster Positive Emotions for Learning.” In 2022 IEEE Frontiers in Education Conference (FIE).
Pataranutaporn, Liu, Finn, et al. 2023. “Influencing Human–AI Interaction by Priming Beliefs about AI Can Increase Perceived Trustworthiness, Empathy and Effectiveness.” Nature Machine Intelligence.
Pfeffer, and Gal. 2007. “On the Reasoning Patterns of Agents in Games.” In AAAI-07/IAAI-07 Proceedings. Proceedings of the National Conference on Artificial Intelligence.
Rao, and Reiley. 2012. “The Economics of Spam.” Journal of Economic Perspectives.
Salvi, Ribeiro, Gallotti, et al. 2024. “On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial.”
Schoenegger, Salvi, Liu, et al. 2025. “Large Language Models Are More Persuasive Than Incentivized Human Persuaders.”
Sperber, and Mercier. 2012. “Reasoning as a Social Competence.” In Collective Wisdom.
Swartz, Marwick, and Larson. 2025. ScamGPT: GenAI and the Automation of Fraud.
Taylor, and Hoffmann. n.d. “Industry Responses to Computational Propaganda and Social Media Manipulation.”
Teeny, Siev, Briñol, et al. 2021. “A Review and Conceptual Framework for Understanding Personalized Matching Effects in Persuasion.” Journal of Consumer Psychology.
Turkle. 2011. Alone Together: Why We Expect More from Technology and Less from Each Other.
Wen, Zhong, Khan, et al. 2024. “Language Models Learn to Mislead Humans via RLHF.”