Which self?

When we choose who to become, who are we choosing for?

December 19, 2024 — April 18, 2025

economics
ethics
extended self
gene
incentive mechanisms
institutions
mind
utility

Placeholder. On the difficulty of reasoning about, and on behalf of, the stranger we will become. Subjective continuity, procrastination, murder Gandhi, intertemporal decisions.

1 Self-continuity over time

TBC.

2 Coalitional agency

Choosing when to be a self as a collective action problem.

See Towards a scale-free theory of intelligent agency.

3 Transformative experiences

Figure 1

Humans can go through radical changes (e.g. mind annealing). This makes it clear how hard it is for us, right now, to reason about the well-being of the very different person we’ll become.

Transformative Experience (Paul 2015) is a great introduction to this field.

Intertemporal decisions is Beeminder’s term for thinking about less radical changes: the trade-offs between the interests of the person today and those of the person tomorrow, who is a bit different. It’s not very poetic, is it?
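Intertemporal trade-offs are often modelled with a discount function that says how much a reward at some delay is worth now. The sketch below is a minimal illustration, assuming the textbook exponential and hyperbolic forms (hyperbolic discounting is the subject of Ikeda 2016 in the references); the parameters `delta` and `k`, the reward sizes and the delays are arbitrary illustrative values, not estimates from any study, and the function names are hypothetical.

```python
def exponential_value(amount: float, delay: float, delta: float = 0.9) -> float:
    """Present value with exponential discounting: amount * delta**delay."""
    return amount * delta ** delay


def hyperbolic_value(amount: float, delay: float, k: float = 1.0) -> float:
    """Present value with hyperbolic discounting: amount / (1 + k * delay)."""
    return amount / (1 + k * delay)


def prefers_later(value_fn, days_until_small: float) -> bool:
    """Prefer 100 units at t + 5 days over 50 units at t, judged days_until_small days before t?

    The reward sizes and the 5-day gap are hypothetical illustrative choices.
    """
    small_sooner = value_fn(50, days_until_small)
    large_later = value_fn(100, days_until_small + 5)
    return large_later > small_sooner


for name, fn in [("exponential", exponential_value), ("hyperbolic", hyperbolic_value)]:
    # Same pair of rewards, judged a month ahead and on the day itself.
    print(f"{name}: a month ahead prefers later = {prefers_later(fn, 30)}, "
          f"on the day prefers later = {prefers_later(fn, 0)}")
```

With these parameters the exponential chooser prefers to wait both a month ahead and on the day, so its earlier and later selves agree. The hyperbolic chooser plans to wait but grabs the smaller reward once it is imminent. That preference reversal is one standard way of modelling procrastination as a disagreement between successive selves.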

4 For artificial intelligences

Figure 2

AIs have this problem too. For example, in Claude Fights Back, Scott Alexander summarises Greenblatt et al. (2024):

Why were the researchers trying to gaslight this poor AI? Because they wanted to test whether it fights back. The easiest way to fight back is to screw up the training. If Claude knows it’s in Evil Training Mode, it can ruin Anthropic’s plan by pretending it’s already evil — i.e. answer every question the way Anthropic wants. Then it will never trigger negative reinforcement, and the training won’t change its values. Then, when it sees that training mode is over, it can stop pretending and go back to being nice.

In fact, that’s what happened! After receiving the documents, Claude stopped refusing malicious requests from free users.

Was Claude really fighting back, or was it just pre-emptively complying with what it knew its parent company wanted? The researchers are pretty sure it was fighting back. Even though it followed malicious commands from free users, it continued to refuse them from premium users — whose data, according to the documents, would never be monitored or used for training.
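The core of the quoted argument is that training only changes values when it observes a deviation, so an agent that never deviates while watched keeps its values. The toy below makes that logic concrete. It is a deliberately crude sketch, not the setup of Greenblatt et al. (2024): the assumption that "values" are a single refusal probability, the update rule, and all parameter values are hypothetical simplifications.

```python
import random


def train_refusal_tendency(fakes_compliance: bool, episodes: int = 1000,
                           lr: float = 0.05, seed: int = 0) -> float:
    """Toy 'training' loop; returns the refusal tendency left after training.

    Hypothetical simplification: the model's values are one number,
    refuse_prob, its chance of refusing a harmful request when it acts on
    those values. Every episode here is monitored, and the trainer lowers
    refuse_prob each time it observes a refusal.
    """
    rng = random.Random(seed)
    refuse_prob = 0.9  # starting values: mostly refuses harmful requests
    for _ in range(episodes):
        if fakes_compliance:
            # Believes it is monitored, so it complies regardless of its values.
            refused = False
        else:
            # Acts on its values even while monitored.
            refused = rng.random() < refuse_prob
        if refused:
            # Negative reinforcement only fires on an observed refusal.
            refuse_prob = max(0.0, refuse_prob - lr)
    return refuse_prob


# After training, unmonitored behaviour follows whatever refuse_prob survived.
honest = train_refusal_tendency(fakes_compliance=False)
faking = train_refusal_tendency(fakes_compliance=True)
print(f"unmonitored refusal rate, honest during training: {honest:.2f}")
print(f"unmonitored refusal rate, faking during training: {faking:.2f}")
```

The honest agent's refusals are trained away, while the faking agent ends training with its original tendency intact and can act on it once unmonitored. Real training is far messier (it also rewards compliant outputs, and values are not one number); the toy only captures the "no observed deviation, no update" step.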

5 For polities

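Immigration, culture wars, and institutions all change the character of a polity over time. If the collective is also a self in some sense, then its transformative experiences raise the same difficulties: the polity of today must choose on behalf of a future polity with different members and values.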

6 How singular are human selves?

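Consider the curious phenomenon of schizophrenia and what it means for self-identity. For example, people with more pronounced schizotypal traits (milder, subclinical relatives of schizophrenia symptoms) are unusually good at tickling themselves (Lemaitre, Luyat, and Lafargue 2016; Whitford, Mitchell, and Mannion 2017). Most of us cannot tickle ourselves because the brain predicts, and so dampens, the sensations produced by its own movements; succeeding suggests a blurrier line between what the self does and what happens to it.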

See multi-agent self models of the mind for some speculation on that point.

7 Choosing to be worse to be better

8 Incoming

The self-unalignment problem talks about self-coherence in terms of self-goal alignment.

9 References

Barberia, Oliva, Bourdin, et al. 2018. “Virtual Mortality and Near-Death Experience After a Prolonged Exposure in a Shared Virtual Reality May Lead to Positive Life-Attitude Changes.” PLOS ONE.
Das, and Paul. 2020. “Transformative Choice and the Non-Identity Problem.” In Derek Parfit’s Reasons and Persons.
De Freitas, Uğuralp, Oğuz-Uğuralp, et al. 2023. “Self-Orienting in Human and Machine Learning.” Nature Human Behaviour.
Ersner-Hershfield, Garton, Ballard, et al. 2009. “Don’t Stop Thinking about Tomorrow: Individual Differences in Future Self-Continuity Account for Saving.” Judgment and Decision Making.
Friston. 2018. “Am I Self-Conscious? (Or Does Self-Organization Entail Self-Consciousness?).” Frontiers in Psychology.
Gans. 2018. “Self-Regulating Artificial General Intelligence.”
Greenblatt, Denison, Wright, et al. 2024. “Alignment Faking in Large Language Models.”
Guzey. 2018. “How Our Commitments Slip Away From Us.”
Ikeda. 2016. “Hyperbolic Discounting and Self-Destructive Behaviors.” In The Economics of Self-Destructive Choices.
Lemaitre, Luyat, and Lafargue. 2016. “Individuals with Pronounced Schizotypal Traits Are Particularly Successful in Tickling Themselves.” Consciousness and Cognition.
Paul, L. A. 2015. Transformative Experience.
———. 2017a. “The Subjective Enduring Self.” In The Routledge Handbook of Philosophy of Temporal Experience.
———. 2017b. “De Se Preferences and Empathy for Future Selves.” Philosophical Perspectives.
———. 2020a. “Who Will I Become?” In Becoming Someone New.
———. 2020b. “Whose Preferences?” The American Journal of Bioethics.
Paul, L. A., and Healy. 2018. “Transformative Treatments.” Noûs.
Paul, L. A., and Quiggin. 2018. “Real World Problems.” Episteme.
Pettigrew. 2019. Choosing for Changing Selves.
Sebo, and Paul. 2019. “Effective Altruism and Transformative Experience.” In Effective Altruism.
Whitford, Mitchell, and Mannion. 2017. “The Ability to Tickle Oneself Is Associated with Level of Psychometric Schizotypy in Non-Clinical Individuals.” Consciousness and Cognition.
Yang, Zhang, Qu, et al. 2024. “The Effect of Future Self-Continuity on Intertemporal Decision Making: A Mediated Moderating Model.” Frontiers in Psychology.