In-context learning
2024-08-29 — 2025-08-17
Wherein it is argued that transformers are to be regarded as generalized inference machines resembling set functions, and an inquiry is undertaken into whether they can be induced to perform formal causal inference.
Tags: approximation, Bayes, causal, generative, language, machine learning, meta learning, Monte Carlo, neural nets, NLP, optimization, probabilistic algorithms, probability, statistics, stringology, time series
Viewed as set functions, transformers look a lot like generalized inference machines acting on whatever you show them, which is why we call them in-context learners.
What exactly do we mean by that? Are transformers actually in-context learners? How sophisticated is this in-context inference? Can we make them do ‘proper’ causal inference in some formal sense?
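To make the set-function picture concrete, here is a minimal sketch of the in-context linear regression setup used to probe these questions in, e.g., Carroll et al. (2025): each training example is a fresh regression task, the context is a bag of (x, y) pairs, and the transformer must predict y for a query x from that context alone. The architecture, hyperparameters, and training loop below are illustrative choices of mine, not anyone's reference implementation.

```python
# Sketch of in-context linear regression: a small transformer is trained
# across many regression tasks and must infer each task's weights from
# the (x, y) pairs shown in its context. Hyperparameters are arbitrary.
import torch
import torch.nn as nn

d, n_ctx = 8, 16          # input dimension, number of context examples

class InContextRegressor(nn.Module):
    def __init__(self, d, width=64, depth=2, heads=4):
        super().__init__()
        self.embed = nn.Linear(d + 1, width)   # each token is an (x, y) pair
        layer = nn.TransformerEncoderLayer(
            width, heads, dim_feedforward=4 * width, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.readout = nn.Linear(width, 1)

    def forward(self, xs, ys):
        # xs: (batch, n_ctx + 1, d); ys: (batch, n_ctx + 1)
        # The last position is the query: zero out its y before encoding
        # (a crude but standard masking convention for this sketch).
        ys_in = ys.clone()
        ys_in[:, -1] = 0.0
        tokens = torch.cat([xs, ys_in.unsqueeze(-1)], dim=-1)
        h = self.encoder(self.embed(tokens))
        return self.readout(h[:, -1, :]).squeeze(-1)   # prediction at the query

def sample_tasks(batch):
    # A fresh linear regression task per batch row: y = w . x, noiseless.
    w = torch.randn(batch, d, 1)
    xs = torch.randn(batch, n_ctx + 1, d)
    ys = (xs @ w).squeeze(-1)
    return xs, ys

model = InContextRegressor(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    xs, ys = sample_tasks(64)
    loss = ((model(xs, ys) - ys[:, -1]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(step, loss.item())
```

Because no positional encodings are added, the prediction at the query position is invariant to permutations of the context tokens, so the set-function reading is literal here; the interesting question, pursued in the references below, is what kind of inference the trained network is actually performing on that set.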
1 References
Bai, Chen, Wang, et al. 2023. “Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection.” In Advances in Neural Information Processing Systems.
Carroll, Hoogland, Farrugia-Roberts, et al. 2025. “Dynamics of Transient Structure in In-Context Linear Regression Transformers.”
Delétang, Ruoss, Duquenne, et al. 2024. “Language Modeling Is Compression.”
Dong, Li, Dai, et al. 2024. “A Survey on In-Context Learning.”
Hollmann, Müller, Eggensperger, et al. 2023. “TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second.”
Madabushi, Torgbi, and Bonial. 2025. “Neither Stochastic Parroting nor AGI: LLMs Solve Tasks Through Context-Directed Extrapolation from Training Data Priors.”
Müller, Feurer, Hollmann, et al. 2023. “PFNs4BO: In-Context Learning for Bayesian Optimization.” arXiv Preprint arXiv:2305.17535.
Nguyen, and Grover. 2022. “Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling.”
Nichani, Damian, and Lee. 2024. “How Transformers Learn Causal Structure with Gradient Descent.”
Olsson, Elhage, Nanda, et al. 2022. “In-Context Learning and Induction Heads.”
Reuter, Rudner, Fortuin, et al. 2025. “Can Transformers Learn Full Bayesian Inference in Context?”
Riechers, Bigelow, Alt, et al. 2025. “Next-Token Pretraining Implies in-Context Learning.”
Yadlowsky, Doshi, and Tripuraneni. 2023. “Can Transformer Models Generalize Via In-Context Learning Beyond Pretraining Data?”
Ye, Yang, Siah, et al. 2024. “Pre-Training and In-Context Learning Is Bayesian Inference à la de Finetti.”
Zekri, Odonnat, Benechehab, et al. 2025. “Large Language Models as Markov Chains.”