ML scaling in the massive parameter limit

Brief links on the theme of scaling in the limit of extremely large models and datasets. For now, this is especially relevant to Transformer language models.

  • Exploring the limits of Concurrency in ML Training on Google TPUs, Kumar et al. (2020) (BERT in 23s on a TPU-4096; “We view the current competition in language understanding as a modern-day Space Race, with competing organizations assembling both giant machines and giant models in the quest for an Artificial General Intelligence breakthrough.”)
  • Zhang et al. (2020) (how do NNs learn from language as n increases?)
