Got good behaviour from a million-parameter model? Want to see if stuff gets weirder as we hit a billion parameters? Turns out it does!

Brief links on the theme of scaling in the *extremely* large-model/large-data limit, and what that limit does to the behaviour of the models. A new front in the complexity, and/or statistical mechanics, of statistics.

As to *how* to scale up these models in practice, see distributed gradient descent.

## Side note: The bitter, better lesson

See optimal cleverness.

## Big transformers

One fun result comes from Transformer language models. An interesting observation way back in 2020 was that there seemed to be an unexpected trade-off: you can go faster by training a bigger network. Indeed, there is a whole family of observations in this vein trying to identify the actual scaling behaviour.

nostalgebraist summarises Henighan et al. (2020) and Kaplan et al. (2020):

## L(D): information

OpenAI derives a scaling law called L(D). This law is the best you could possibly do, even with arbitrarily large compute/models, if you are only allowed to train on D data points.

No matter how good your model is, there is only so much it can learn from a finite sample. L(D) quantifies this intuitive fact (if the model is an autoregressive transformer).
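Concretely, the law is a power law in the dataset size. A minimal sketch, assuming the functional form and the approximate fitted constants reported by Kaplan et al. (2020) for their WebText2-style setup (treat the numbers as illustrative, not authoritative):

```python
# Hedged sketch of the L(D) data-scaling law from Kaplan et al. (2020):
#     L(D) = (D_c / D) ** alpha_D
# i.e. the best achievable test loss (in nats/token) given D tokens of
# training data, regardless of model size or compute.
ALPHA_D = 0.095   # approximate fitted data-scaling exponent (assumption)
D_C = 5.4e13      # approximate fitted critical dataset size, tokens (assumption)

def loss_floor(d_tokens: float) -> float:
    """Best achievable loss with d_tokens of data, per the fitted power law."""
    return (D_C / d_tokens) ** ALPHA_D

for d in (1e9, 1e10, 1e11, 1e12):
    print(f"D = {d:.0e} tokens -> L(D) floor = {loss_floor(d):.2f}")
```

The small exponent is the point: each 10x of data shaves off only a constant factor of loss, but no amount of modelling cleverness gets you below the floor.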

## L(C): budgeting

OpenAI also derives a second scaling law, L(C). This is the best you can do with compute C, if you spend it optimally.

What does optimal spending look like? Remember, you can spend a unit of compute on

- a bigger model (N), or
- training the same model for longer (S).

…In the compute regime we are currently in, making the model bigger is way more effective than taking more steps.
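A minimal sketch of what "spend it on the model" means, assuming the rough allocation Kaplan et al. (2020) report, that the compute-optimal parameter count grows roughly as N ∝ C^0.73 (the exponent and normalisation here are illustrative assumptions):

```python
# Hedged sketch of compute allocation in the Kaplan et al. (2020) regime.
# Reported finding: compute-optimal model size grows roughly as
#     N_opt ∝ C ** 0.73,
# so most of an extra unit of compute should buy parameters (N),
# with training steps (S) soaking up the remainder.
KAPLAN_EXPONENT = 0.73  # approximate reported exponent (assumption)

def optimal_params(c: float, c_ref: float = 1.0, n_ref: float = 1.0) -> float:
    """Relative compute-optimal parameter count for budget c,
    normalised so that optimal_params(c_ref) == n_ref."""
    return n_ref * (c / c_ref) ** KAPLAN_EXPONENT

# A 10x bigger compute budget calls for a ~5.4x bigger model...
print(optimal_params(10.0))
# ...leaving only ~1.9x of the extra compute for more steps/data.
print(10.0 / optimal_params(10.0))
```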

Controversy! The scaling laws have since been revised.
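The revision I take this to refer to is the "Chinchilla" result (Hoffmann et al. 2022), which found that parameters and data should scale together, roughly N ∝ C^0.5 and D ∝ C^0.5, often quoted as ~20 training tokens per parameter. A hedged sketch under those assumptions, plus the standard C ≈ 6·N·D FLOP estimate:

```python
# Hedged sketch of the revised, Chinchilla-style allocation
# (Hoffmann et al. 2022): scale data and parameters together.
# Uses the common rule of thumb D ≈ 20·N and the estimate C ≈ 6·N·D;
# both are approximations, not the paper's exact fitted values.
TOKENS_PER_PARAM = 20  # rule-of-thumb ratio (assumption)

def chinchilla_optimal(c_flops: float) -> tuple[float, float]:
    """Split a FLOP budget into (params, tokens) via C = 6*N*D, D = 20*N."""
    n = (c_flops / (6 * TOKENS_PER_PARAM)) ** 0.5
    d = TOKENS_PER_PARAM * n
    return n, d

# Chinchilla itself used roughly 5.76e23 FLOPs; this recovers its
# ~70B-parameter, ~1.4T-token configuration.
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Under the older Kaplan-style allocation, the same budget would have gone into a much larger model trained on far less data, which is why the revision was controversial.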

## Incoming

- Zhang et al. (2020): how do NNs learn from language as *n* increases?
- DeepSpeed Compression: a composable library for extreme compression and zero-cost quantization, targeting large language models

## References

Henighan, Tom, et al. 2020. "Scaling Laws for Autoregressive Generative Modeling." *arXiv:2010.14701 [cs]*, November.

Kaplan, Jared, et al. 2020. "Scaling Laws for Neural Language Models." *arXiv:2001.08361 [cs, stat]*, January.

*arXiv:2110.04374 [cs]*, October.

*arXiv:2011.03641 [cs]*, November.

*arXiv:2004.10802 [cs, stat]*, April.

Zhang, Yian, et al. 2020. "When Do You Need Billions of Words of Pretraining Data?" *arXiv:2011.04946 [cs]*, November.
