In-context learning
2024-08-29 — 2025-08-17
approximation
Bayes
causal
generative
language
machine learning
meta learning
Monte Carlo
neural nets
NLP
optimization
probabilistic algorithms
probability
statistics
stringology
time series
As set functions, transformers look a lot like ‘generalized inference machines’, and as such we sometimes refer to them as in-context learners.
What exactly does that mean? Are they really doing inference? Can we make them do ‘proper’ inference, e.g. causal inference, in some formal sense?
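To make the question concrete: under the Bayesian reading of in-context learning, an in-context learner is a set function mapping a context D_n = {(x_i, y_i)} plus a query x* to (an approximation of) the posterior predictive p(y* | x*, D_n). Below is a minimal numpy sketch of that idealised target map for conjugate Bayesian linear regression, the usual toy setting in this literature; the function name, prior and noise variances, and dimensions are all illustrative assumptions of mine, not anything a real transformer is known to compute.

```python
# A sketch of the map an ideal "in-context learner" would implement:
# context (X, y) + query x_star -> exact Bayesian posterior predictive,
# for linear regression with a Gaussian prior. Illustrative only.
import numpy as np

def posterior_predictive(X, y, x_star, prior_var=1.0, noise_var=0.25):
    """Exact predictive mean and variance at x_star given context (X, y)."""
    d = X.shape[1]
    # Posterior over weights w ~ N(mu, Sigma), prior w ~ N(0, prior_var * I).
    Sigma_inv = np.eye(d) / prior_var + X.T @ X / noise_var
    Sigma = np.linalg.inv(Sigma_inv)
    mu = Sigma @ X.T @ y / noise_var
    # Predictive distribution at the query point.
    mean = x_star @ mu
    var = noise_var + x_star @ Sigma @ x_star
    return mean, var

rng = np.random.default_rng(0)
w_true = rng.normal(size=3)
X = rng.normal(size=(16, 3))                  # in-context examples
y = X @ w_true + 0.5 * rng.normal(size=16)    # noisy labels
x_star = rng.normal(size=3)                   # query "token"
mean, var = posterior_predictive(X, y, x_star)
print(f"predictive mean {mean:.3f}, sd {np.sqrt(var):.3f}")
```

A transformer trained on many such regression tasks would see the same (X, y, x*) serialized as a token sequence; the claim under examination is that its output distribution approximates this predictive mean and variance. Whether, when, and in what sense it does is exactly the formal question.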