For transformers at least, there seems to be an unexpectedly favourable computational trade-off: you can reach a given accuracy in less wall-clock time by training a bigger network for fewer steps and then compressing it (Li et al. 2020).
These networks are absolutely massive (heh) in natural language processing right now.
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2015. “Neural Machine Translation by Jointly Learning to Align and Translate.” http://arxiv.org/abs/1409.0473.
Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners,” June. http://arxiv.org/abs/2005.14165.
Celikyilmaz, Asli, Li Deng, Lihong Li, and Chong Wang. 2017. “Scaffolding Networks for Teaching and Learning to Comprehend,” February. http://arxiv.org/abs/1702.08653.
Choy, Christopher B, JunYoung Gwak, Silvio Savarese, and Manmohan Chandraker. 2016. “Universal Correspondence Network.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 2406–14. Curran Associates, Inc. http://papers.nips.cc/paper/6487-universal-correspondence-network.pdf.
Freeman, Alexandra L J. 2019. “How to Communicate Evidence to Patients.” Drug and Therapeutics Bulletin 57 (8): 119–24. https://doi.org/10.1136/dtb.2019.000008.
Huang, Cheng-Zhi Anna, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, and Douglas Eck. 2018. “Music Transformer,” September. https://arxiv.org/abs/1809.04281v3.
Huang, Cheng-Zhi Anna, Ashish Vaswani, Jakob Uszkoreit, Ian Simon, Curtis Hawthorne, Noam Shazeer, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, and Douglas Eck. 2018. “Music Transformer: Generating Music with Long-Term Structure,” September. https://openreview.net/forum?id=rJe4ShAcF7.
Li, Zhuohan, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, and Joseph E. Gonzalez. 2020. “Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers,” February. http://arxiv.org/abs/2002.11794.
Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. “Language Models Are Unsupervised Multitask Learners.”
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need,” June. http://arxiv.org/abs/1706.03762.