---
references:
  - id: KumarExploring2020
    accessed:
      - year: 2021
        month: 1
        day: 14
    author:
      - family: Kumar
        given: Sameer
      - family: Bradbury
        given: James
      - family: Young
        given: Cliff
      - family: Wang
        given: Yu Emma
      - family: Levskaya
        given: Anselm
      - family: Hechtman
        given: Blake
      - family: Chen
        given: Dehao
      - family: Lee
        given: HyoukJoong
      - family: Deveci
        given: Mehmet
      - family: Kumar
        given: Naveen
      - family: Kanwar
        given: Pankaj
      - family: Wang
        given: Shibo
      - family: Wanderman-Milne
        given: Skye
      - family: Lacy
        given: Steve
      - family: Wang
        given: Tao
      - family: Oguntebi
        given: Tayo
      - family: Zu
        given: Yazhou
      - family: Xu
        given: Yuanzhong
      - family: Swing
        given: Andy
    citation-key: KumarExploring2020
    container-title: arXiv:2011.03641 [cs]
    issued:
      - year: 2020
        month: 11
        day: 6
    title: Exploring the limits of Concurrency in ML Training on Google TPUs
    type: article-journal
    URL: http://arxiv.org/abs/2011.03641
  - id: RajbhandariZeRO2020
    accessed:
      - year: 2021
        month: 7
        day: 14
    author:
      - family: Rajbhandari
        given: Samyam
      - family: Rasley
        given: Jeff
      - family: Ruwase
        given: Olatunji
      - family: He
        given: Yuxiong
    citation-key: RajbhandariZeRO2020
    container-title: arXiv:1910.02054 [cs, stat]
    issued:
      - year: 2020
        month: 5
        day: 13
    title: 'ZeRO: Memory Optimizations Toward Training Trillion Parameter Models'
    type: article-journal
    URL: http://arxiv.org/abs/1910.02054
  - id: RajbhandariZeROInfinity2021
    accessed:
      - year: 2021
        month: 7
        day: 14
    author:
      - family: Rajbhandari
        given: Samyam
      - family: Ruwase
        given: Olatunji
      - family: Rasley
        given: Jeff
      - family: Smith
        given: Shaden
      - family: He
        given: Yuxiong
    citation-key: RajbhandariZeROInfinity2021
    container-title: arXiv:2104.07857 [cs]
    issued:
      - year: 2021
        month: 4
        day: 15
    title: 'ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning'
    type: article-journal
    URL: http://arxiv.org/abs/2104.07857
  - id: RasleyDeepSpeed2020
    accessed:
      - year: 2021
        month: 7
        day: 13
    author:
      - family: Rasley
        given: Jeff
      - family: Rajbhandari
        given: Samyam
      - family: Ruwase
        given: Olatunji
      - family: He
        given: Yuxiong
    citation-key: RasleyDeepSpeed2020
    collection-title: KDD '20
    container-title: >-
      Proceedings of the 26th ACM SIGKDD International Conference on Knowledge
      Discovery & Data Mining
    DOI: 10.1145/3394486.3406703
    event-place: New York, NY, USA
    ISBN: 978-1-4503-7998-4
    issued:
      - year: 2020
        month: 8
        day: 23
    page: 3505–3506
    publisher: Association for Computing Machinery
    publisher-place: New York, NY, USA
    title: >-
      DeepSpeed: System Optimizations Enable Training Deep Learning Models with
      Over 100 Billion Parameters
    type: paper-conference
  - id: RenZeROOffload2021
    accessed:
      - year: 2021
        month: 7
        day: 14
    author:
      - family: Ren
        given: Jie
      - family: Rajbhandari
        given: Samyam
      - family: Aminabadi
        given: Reza Yazdani
      - family: Ruwase
        given: Olatunji
      - family: Yang
        given: Shuangyan
      - family: Zhang
        given: Minjia
      - family: Li
        given: Dong
      - family: He
        given: Yuxiong
    citation-key: RenZeROOffload2021
    container-title: arXiv:2101.06840 [cs]
    issued:
      - year: 2021
        month: 1
        day: 17
    title: 'ZeRO-Offload: Democratizing Billion-Scale Model Training'
    type: article-journal
    URL: http://arxiv.org/abs/2101.06840
  - id: Tang1bit2021
    accessed:
      - year: 2021
        month: 7
        day: 14
    author:
      - family: Tang
        given: Hanlin
      - family: Gan
        given: Shaoduo
      - family: Awan
        given: Ammar Ahmad
      - family: Rajbhandari
        given: Samyam
      - family: Li
        given: Conglong
      - family: Lian
        given: Xiangru
      - family: Liu
        given: Ji
      - family: Zhang
        given: Ce
      - family: He
        given: Yuxiong
    citation-key: Tang1bit2021
    container-title: arXiv:2102.02888 [cs]
    issued:
      - year: 2021
        month: 6
        day: 29
    title: >-
      1-bit Adam: Communication Efficient Large-Scale Training with Adam's
      Convergence Speed
    type: article-journal
    URL: http://arxiv.org/abs/2102.02888
  - id: ZhangAccelerating2020
    accessed:
      - year: 2021
        month: 7
        day: 14
    author:
      - family: Zhang
        given: Minjia
      - family: He
        given: Yuxiong
    citation-key: ZhangAccelerating2020
    container-title: Advances in Neural Information Processing Systems
    issued:
      - year: 2020
    language: en
    page: 14011-14023
    title: >-
      Accelerating Training of Transformer-Based Language Models with
      Progressive Layer Dropping
    type: article-journal
    URL: http://arxiv.org/abs/2010.13369
    volume: '33'
...