---
references:
- id: HenighanScaling2020
  accessed:
    - year: 2021
      month: 1
      day: 14
  author:
    - family: Henighan
      given: Tom
    - family: Kaplan
      given: Jared
    - family: Katz
      given: Mor
    - family: Chen
      given: Mark
    - family: Hesse
      given: Christopher
    - family: Jackson
      given: Jacob
    - family: Jun
      given: Heewoo
    - family: Brown
      given: Tom B.
    - family: Dhariwal
      given: Prafulla
    - family: Gray
      given: Scott
    - family: Hallacy
      given: Chris
    - family: Mann
      given: Benjamin
    - family: Radford
      given: Alec
    - family: Ramesh
      given: Aditya
    - family: Ryder
      given: Nick
    - family: Ziegler
      given: Daniel M.
    - family: Schulman
      given: John
    - family: Amodei
      given: Dario
    - family: McCandlish
      given: Sam
  citation-key: HenighanScaling2020
  container-title: arXiv:2010.14701 [cs]
  issued:
    - year: 2020
      month: 11
      day: 5
  language: en
  title: Scaling Laws for Autoregressive Generative Modeling
  type: article-journal
  URL: http://arxiv.org/abs/2010.14701
- id: HoffmannTraining2022
  accessed:
    - year: 2022
      month: 8
      day: 8
  author:
    - family: Hoffmann
      given: Jordan
    - family: Borgeaud
      given: Sebastian
    - family: Mensch
      given: Arthur
    - family: Buchatskaya
      given: Elena
    - family: Cai
      given: Trevor
    - family: Rutherford
      given: Eliza
    - family: Casas
      given: Diego de Las
    - family: Hendricks
      given: Lisa Anne
    - family: Welbl
      given: Johannes
    - family: Clark
      given: Aidan
    - family: Hennigan
      given: Tom
    - family: Noland
      given: Eric
    - family: Millican
      given: Katie
    - family: Driessche
      given: George
      dropping-particle: van den
    - family: Damoc
      given: Bogdan
    - family: Guy
      given: Aurelia
    - family: Osindero
      given: Simon
    - family: Simonyan
      given: Karen
    - family: Elsen
      given: Erich
    - family: Rae
      given: Jack W.
    - family: Vinyals
      given: Oriol
    - family: Sifre
      given: Laurent
  citation-key: HoffmannTraining2022
  DOI: 10.48550/arXiv.2203.15556
  issued:
    - year: 2022
      month: 3
      day: 29
  number: arXiv:2203.15556
  publisher: arXiv
  title: Training Compute-Optimal Large Language Models
  type: article
  URL: http://arxiv.org/abs/2203.15556
- id: HuTraining2022
  accessed:
    - year: 2022
      month: 8
      day: 11
  author:
    - family: Hu
      given: Hang
    - family: Song
      given: Zhao
    - family: Weinstein
      given: Omri
    - family: Zhuo
      given: Danyang
  citation-key: HuTraining2022
  DOI: 10.48550/arXiv.2208.04508
  issued:
    - year: 2022
      month: 8
      day: 8
  number: arXiv:2208.04508
  publisher: arXiv
  title: Training Overparametrized Neural Networks in Sublinear Time
  type: article
  URL: http://arxiv.org/abs/2208.04508
- id: KaplanScaling2020
  accessed:
    - year: 2021
      month: 3
      day: 9
  author:
    - family: Kaplan
      given: Jared
    - family: McCandlish
      given: Sam
    - family: Henighan
      given: Tom
    - family: Brown
      given: Tom B.
    - family: Chess
      given: Benjamin
    - family: Child
      given: Rewon
    - family: Gray
      given: Scott
    - family: Radford
      given: Alec
    - family: Wu
      given: Jeffrey
    - family: Amodei
      given: Dario
  citation-key: KaplanScaling2020
  container-title: arXiv:2001.08361 [cs, stat]
  issued:
    - year: 2020
      month: 1
      day: 22
  title: Scaling Laws for Neural Language Models
  type: article-journal
  URL: http://arxiv.org/abs/2001.08361
- id: KirstainFew2021
  accessed:
    - year: 2021
      month: 10
      day: 14
  author:
    - family: Kirstain
      given: Yuval
    - family: Lewis
      given: Patrick
    - family: Riedel
      given: Sebastian
    - family: Levy
      given: Omer
  citation-key: KirstainFew2021
  container-title: arXiv:2110.04374 [cs]
  issued:
    - year: 2021
      month: 10
      day: 8
  title: A Few More Examples May Be Worth Billions of Parameters
  type: article-journal
  URL: http://arxiv.org/abs/2110.04374
- id: KumarExploring2020
  accessed:
    - year: 2021
      month: 1
      day: 14
  author:
    - family: Kumar
      given: Sameer
    - family: Bradbury
      given: James
    - family: Young
      given: Cliff
    - family: Wang
      given: Yu Emma
    - family: Levskaya
      given: Anselm
    - family: Hechtman
      given: Blake
    - family: Chen
      given: Dehao
    - family: Lee
      given: HyoukJoong
    - family: Deveci
      given: Mehmet
    - family: Kumar
      given: Naveen
    - family: Kanwar
      given: Pankaj
    - family: Wang
      given: Shibo
    - family: Wanderman-Milne
      given: Skye
    - family: Lacy
      given: Steve
    - family: Wang
      given: Tao
    - family: Oguntebi
      given: Tayo
    - family: Zu
      given: Yazhou
    - family: Xu
      given: Yuanzhong
    - family: Swing
      given: Andy
  citation-key: KumarExploring2020
  container-title: arXiv:2011.03641 [cs]
  issued:
    - year: 2020
      month: 11
      day: 6
  title: Exploring the limits of Concurrency in ML Training on Google TPUs
  type: article-journal
  URL: http://arxiv.org/abs/2011.03641
- id: SharmaNeural2020
  accessed:
    - year: 2021
      month: 1
      day: 14
  author:
    - family: Sharma
      given: Utkarsh
    - family: Kaplan
      given: Jared
  citation-key: SharmaNeural2020
  container-title: arXiv:2004.10802 [cs, stat]
  issued:
    - year: 2020
      month: 4
      day: 22
  title: A Neural Scaling Law from the Dimension of the Data Manifold
  type: article-journal
  URL: http://arxiv.org/abs/2004.10802
- id: SorscherNeural2023
  accessed:
    - year: 2023
      month: 5
      day: 18
  author:
    - family: Sorscher
      given: Ben
    - family: Geirhos
      given: Robert
    - family: Shekhar
      given: Shashank
    - family: Ganguli
      given: Surya
    - family: Morcos
      given: Ari S.
  citation-key: SorscherNeural2023
  DOI: 10.48550/arXiv.2206.14486
  issued:
    - year: 2023
      month: 4
      day: 21
  number: arXiv:2206.14486
  publisher: arXiv
  title: 'Beyond neural scaling laws: beating power law scaling via data pruning'
  title-short: Beyond neural scaling laws
  type: article
  URL: http://arxiv.org/abs/2206.14486
  version: '6'
- id: TogeliusChoose2023
  accessed:
    - year: 2023
      month: 4
      day: 17
  author:
    - family: Togelius
      given: Julian
    - family: Yannakakis
      given: Georgios N.
  citation-key: TogeliusChoose2023
  issued:
    - year: 2023
      month: 3
      day: 31
  language: en
  number: arXiv:2304.06035
  publisher: arXiv
  title: 'Choose Your Weapon: Survival Strategies for Depressed AI Academics'
  title-short: Choose Your Weapon
  type: article
  URL: http://arxiv.org/abs/2304.06035
- id: ZhangWhen2020
  accessed:
    - year: 2021
      month: 1
      day: 14
  author:
    - family: Zhang
      given: Yian
    - family: Warstadt
      given: Alex
    - family: Li
      given: Haau-Sing
    - family: Bowman
      given: Samuel R.
  citation-key: ZhangWhen2020
  container-title: arXiv:2011.04946 [cs]
  issued:
    - year: 2020
      month: 11
      day: 10
  title: When Do You Need Billions of Words of Pretraining Data?
  type: article-journal
  URL: http://arxiv.org/abs/2011.04946
...