In-context learning

2024-08-29 — 2025-08-17

Tags: approximation, Bayes, causal, generative, language, machine learning, meta learning, Monte Carlo, neural nets, NLP, optimization, probabilistic algorithms, probability, statistics, stringology, time series

Viewed as set functions, transformers look a lot like ‘generalized inference machines’, which is why we sometimes call them in-context learners.

What exactly does that mean? Are they? Can we make them do ‘proper’ causal inference in some formal sense?
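To make the set-function reading concrete, here is a minimal sketch in the spirit of prior-fitted networks and transformer neural processes (Hollmann et al. 2023; Müller et al. 2023; Nguyen and Grover 2022): a transformer is trained across many tasks sampled from a prior, and at test time a fresh task is ‘learned’ purely by conditioning on a context set of (x, y) pairs. The model name, sizes, and the synthetic linear-regression prior below are illustrative assumptions, not a reproduction of any cited method.

```python
# Minimal sketch of a transformer as an in-context learner (PFN-style amortization).
# All architectural and prior choices here are assumptions made for illustration.
import torch
import torch.nn as nn

class InContextRegressor(nn.Module):
    """Maps a context set {(x_i, y_i)} plus query points x_* to predictions for y_*."""
    def __init__(self, d_x=1, d_model=64, n_heads=4, n_layers=3):
        super().__init__()
        self.embed_ctx = nn.Linear(d_x + 1, d_model)   # embed (x, y) context pairs
        self.embed_qry = nn.Linear(d_x, d_model)       # embed query x (no y available)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x_ctx, y_ctx, x_qry):
        # No positional encoding: the context is treated as a set, not a sequence.
        ctx = self.embed_ctx(torch.cat([x_ctx, y_ctx], dim=-1))
        qry = self.embed_qry(x_qry)
        h = self.encoder(torch.cat([ctx, qry], dim=1))
        return self.head(h[:, ctx.shape[1]:, :])        # predictions at the queries

def sample_tasks(batch=32, n_ctx=16, n_qry=8, d_x=1):
    """A toy 'prior over tasks': random linear functions plus observation noise."""
    w = torch.randn(batch, d_x, 1)
    x = torch.randn(batch, n_ctx + n_qry, d_x)
    y = x @ w + 0.1 * torch.randn(batch, n_ctx + n_qry, 1)
    return x[:, :n_ctx], y[:, :n_ctx], x[:, n_ctx:], y[:, n_ctx:]

model = InContextRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):  # train across tasks; weights are never fit to any single task
    x_ctx, y_ctx, x_qry, y_qry = sample_tasks()
    loss = nn.functional.mse_loss(model(x_ctx, y_ctx, x_qry), y_qry)
    opt.zero_grad(); loss.backward(); opt.step()
```

Because the encoder sees the context without positional encodings, the predictor is permutation-invariant in the context pairs, i.e. a set function; adaptation to a new task happens entirely at inference time, by conditioning rather than by gradient updates.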

References

Bai, Chen, Wang, et al. 2023. “Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection.” In Advances in Neural Information Processing Systems.
Carroll, Hoogland, Farrugia-Roberts, et al. 2025. “Dynamics of Transient Structure in In-Context Linear Regression Transformers.”
Delétang, Ruoss, Duquenne, et al. 2024. “Language Modeling Is Compression.”
Dong, Li, Dai, et al. 2024. “A Survey on In-Context Learning.”
Hollmann, Müller, Eggensperger, et al. 2023. “TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second.”
Madabushi, Torgbi, and Bonial. 2025. “Neither Stochastic Parroting nor AGI: LLMs Solve Tasks Through Context-Directed Extrapolation from Training Data Priors.”
Müller, Feurer, Hollmann, et al. 2023. “PFNs4BO: In-Context Learning for Bayesian Optimization.” arXiv preprint arXiv:2305.17535.
Nguyen, and Grover. 2022. “Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling.”
Nichani, Damian, and Lee. 2024. “How Transformers Learn Causal Structure with Gradient Descent.”
Olsson, Elhage, Nanda, et al. 2022. “In-Context Learning and Induction Heads.”
Reuter, Rudner, Fortuin, et al. 2025. “Can Transformers Learn Full Bayesian Inference in Context?”
Riechers, Bigelow, Alt, et al. 2025. “Next-Token Pretraining Implies In-Context Learning.”
Yadlowsky, Doshi, and Tripuraneni. 2023. “Can Transformer Models Generalize Via In-Context Learning Beyond Pretraining Data?”
Ye, Yang, Siah, et al. 2024. “Pre-Training and In-Context Learning Is Bayesian Inference a La De Finetti.”
Zekri, Odonnat, Benechehab, et al. 2025. “Large Language Models as Markov Chains.”