In-context learning
2024-08-29 — 2025-08-17
Wherein it is argued that transformers are to be regarded as generalized inference machines resembling set functions, and an inquiry is undertaken into whether they can be induced to perform formal causal inference.
Tags: approximation, Bayes, causal, generative, language, machine learning, meta learning, Monte Carlo, neural nets, NLP, optimization, probabilistic algorithms, probability, statistics, stringology, time series
Viewed as set functions, transformers look a lot like generalized inference machines acting on whatever you show them, which is why we call them in-context learners.
What exactly do we mean by that? Are transformers actually in-context learners? How sophisticated is this in-context inference? Can we make them do ‘proper’ causal inference in some formal sense?
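To make the set-function picture concrete, here is a minimal sketch of the in-context linear regression setup used to probe these questions in, e.g., Carroll et al. (2025): each training example is a fresh regression task, the context is a bag of (x, y) pairs, and the transformer must predict y for a query x from that context alone. The architecture, hyperparameters, and training loop below are illustrative choices of mine, not anyone's reference implementation.

```python
# Sketch of in-context linear regression: a small transformer is trained
# across many regression tasks and must infer each task's weights from
# the (x, y) pairs shown in its context. Hyperparameters are arbitrary.
import torch
import torch.nn as nn

d, n_ctx = 8, 16          # input dimension, number of context examples

class InContextRegressor(nn.Module):
    def __init__(self, d, width=64, depth=2, heads=4):
        super().__init__()
        self.embed = nn.Linear(d + 1, width)   # each token is an (x, y) pair
        layer = nn.TransformerEncoderLayer(
            width, heads, dim_feedforward=4 * width, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.readout = nn.Linear(width, 1)

    def forward(self, xs, ys):
        # xs: (batch, n_ctx + 1, d); ys: (batch, n_ctx + 1)
        # The last position is the query: zero out its y before encoding
        # (a crude but standard masking convention for this sketch).
        ys_in = ys.clone()
        ys_in[:, -1] = 0.0
        tokens = torch.cat([xs, ys_in.unsqueeze(-1)], dim=-1)
        h = self.encoder(self.embed(tokens))
        return self.readout(h[:, -1, :]).squeeze(-1)   # prediction at the query

def sample_tasks(batch):
    # A fresh linear regression task per batch row: y = w . x, noiseless.
    w = torch.randn(batch, d, 1)
    xs = torch.randn(batch, n_ctx + 1, d)
    ys = (xs @ w).squeeze(-1)
    return xs, ys

model = InContextRegressor(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    xs, ys = sample_tasks(64)
    loss = ((model(xs, ys) - ys[:, -1]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(step, loss.item())
```

Because no positional encodings are added, the prediction at the query position is invariant to permutations of the context tokens, so the set-function reading is literal here; the interesting question, pursued in the references below, is what kind of inference the trained network is actually performing on that set.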
1 References
Bai, Chen, Wang, et al. 2023. “Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection.” In Advances in Neural Information Processing Systems.
Carroll, Hoogland, Farrugia-Roberts, et al. 2025. “Dynamics of Transient Structure in In-Context Linear Regression Transformers.”
Delétang, Ruoss, Duquenne, et al. 2024. “Language Modeling Is Compression.”
Dong, Li, Dai, et al. 2024. “A Survey on In-Context Learning.”
Hollmann, Müller, Eggensperger, et al. 2023. “TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second.”
Madabushi, Torgbi, and Bonial. 2025. “Neither Stochastic Parroting nor AGI: LLMs Solve Tasks Through Context-Directed Extrapolation from Training Data Priors.”
Müller, Feurer, Hollmann, et al. 2023. “PFNs4BO: In-Context Learning for Bayesian Optimization.” arXiv Preprint arXiv:2305.17535.
Nguyen, and Grover. 2022. “Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling.”
Nichani, Damian, and Lee. 2024. “How Transformers Learn Causal Structure with Gradient Descent.”
Olsson, Elhage, Nanda, et al. 2022. “In-Context Learning and Induction Heads.”
Reuter, Rudner, Fortuin, et al. 2025. “Can Transformers Learn Full Bayesian Inference in Context?”
Riechers, Bigelow, Alt, et al. 2025. “Next-Token Pretraining Implies in-Context Learning.”
Yadlowsky, Doshi, and Tripuraneni. 2023. “Can Transformer Models Generalize Via In-Context Learning Beyond Pretraining Data?”
Ye, Yang, Siah, et al. 2024. “Pre-Training and In-Context Learning Is Bayesian Inference à la de Finetti.”
Zekri, Odonnat, Benechehab, et al. 2025. “Large Language Models as Markov Chains.”