Material basis of AI economics
Energy, chips, water
2023-03-23 — 2025-10-07
Wherein the economics of foundation models is examined and the disproportionate energy and water demands of large-scale training, including data-centre cooling and emissions accounting, are described.
Complicated and political, and extremely interesting. TODO.
Incoming
- From Hugging Face: “This project seeks to address this gap by establishing a standardized framework for reporting AI models’ energy efficiency, thereby enhancing transparency across the field.”
The Electric Slide - by Packy McCormick and Sam D’Amico: “America is, implicitly or explicitly, making a bet that whoever wins intelligence, in the form of AI, wins the future. China is making a different bet: that for intelligence to truly matter, it needs energy and action.”
(Oviedo2025Energy?): Most prior estimates extrapolated from lab or small-batch benchmarks. Oviedo et al. show, quantitatively, that this leads to 4–20× overestimation of per-query energy. They explicitly separate model-level, serving-level, and hardware-level contributions, and empirically model utilization and PUE distributions.
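To make that bottom-up decomposition concrete, here is a toy sketch; every constant is invented for illustration and none comes from Oviedo et al. It shows how small batches, idle-heavy harnesses, and facility overheads pull lab and production figures apart.

```python
# Illustrative bottom-up estimate of energy per query, separating hardware
# power, serving-level batching/utilization, and facility overhead (PUE).
# All numbers are made up for the example, not taken from any paper.

def energy_per_query_j(gpu_power_w: float, latency_s: float, batch_size: int,
                       utilization: float, pue: float) -> float:
    gpu_energy_j = gpu_power_w * latency_s / batch_size  # board energy per query
    return gpu_energy_j / utilization * pue              # charge idle time + facility

# Lab-style benchmark: batch of 1, idle-heavy harness, no facility overhead.
lab = energy_per_query_j(gpu_power_w=700, latency_s=2.0, batch_size=1,
                         utilization=0.5, pue=1.0)
# Production-style serving: continuous batching, high utilization, real PUE.
prod = energy_per_query_j(gpu_power_w=700, latency_s=2.0, batch_size=8,
                          utilization=0.85, pue=1.2)
print(f"lab ≈ {lab:.0f} J/query, production ≈ {prod:.0f} J/query, "
      f"ratio ≈ {lab / prod:.0f}×")
# -> ratio around 11x: batching and utilization alone span the kind of
#    4-20x gap attributed to benchmark-vs-production mismatch.
```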
(Li2025EcoServe?): Introduces one of the first system-level design frameworks for making inference carbon-aware, incorporating dynamic routing and carbon-intensity-aware scheduling. Sets a methodological precedent for integrating emissions directly into inference system design.
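The core scheduling move can be sketched in a few lines: send each request to whichever serving region currently has the cleanest grid, subject to a latency budget. Region names, intensities, and the interface below are all invented for illustration; this is not EcoServe’s actual API.

```python
# Minimal carbon-intensity-aware router: pick the feasible region with the
# cleanest grid right now. All data below is fabricated for illustration.
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    carbon_gco2_per_kwh: float  # current grid carbon intensity
    rtt_ms: float               # network round trip from the client

REGIONS = [
    Region("hydro-north", 30.0, 120.0),
    Region("coal-east", 650.0, 25.0),
    Region("solar-west", 90.0, 60.0),
]

def route(latency_budget_ms: float) -> Region:
    feasible = [r for r in REGIONS if r.rtt_ms <= latency_budget_ms]
    if not feasible:
        raise ValueError("no region meets the latency budget")
    return min(feasible, key=lambda r: r.carbon_gco2_per_kwh)

print(route(latency_budget_ms=80.0).name)   # -> solar-west (cleanest feasible)
print(route(latency_budget_ms=200.0).name)  # -> hydro-north
```

A real scheduler would also weigh queue depth and the emissions cost of moving data, but the objective stays the same: minimise gCO2eq per request, not just latency.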
(Elsworth2025Measuring?): Provides the first production-scale disclosure of energy intensity for Google’s AI workloads. Baseline for realistic inference efficiency, contrasting with overestimates from academic benchmarks.
(Gupta2020Chasing?): Classic conceptual framing of where energy and carbon footprints arise in computing stacks. Key precursor for the “bottom-up decomposition” approach used in (Oviedo2025Energy?).
(Chung2025MLENERGY?): Establishes standardized methodology for measuring inference energy under realistic serving conditions. Empirical validation for the (Oviedo2025Energy?) claim that non-production benchmarks overstate energy by 4–20×.
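For hands-on measurement of the same quantity: NVML exposes a cumulative on-board energy counter on Volta-class and newer GPUs, wrapped (as far as I know) by pynvml’s nvmlDeviceGetTotalEnergyConsumption. A crude per-call reading might look like the sketch below, with run_inference as a placeholder; the gap between this and a standardized benchmark is exactly the attribution and serving-condition problem ML.ENERGY tackles.

```python
# Rough energy measurement around an inference call via NVML's cumulative
# energy counter (millijoules since driver load; Volta GPUs or newer).
# run_inference() is a stand-in for whatever serving call you are profiling.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def measure_joules(fn, *args, **kwargs):
    start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    result = fn(*args, **kwargs)
    end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    return result, (end_mj - start_mj) / 1000.0  # mJ -> J

# result, joules = measure_joules(run_inference, prompt="...")
# Caveat: this charges the whole board to one request, so it only makes
# sense under controlled single-stream conditions -- precisely the gap
# between ad-hoc measurement and a standardized serving benchmark.
pynvml.nvmlShutdown()
```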
(Samsi2023Words?): Among the earliest direct measurements of LLM inference power draw. Provides raw empirical data that Oviedo et al. reinterpret to argue why scaling to production reduces per-query energy dramatically.
(Patel2024Characterizing?): Identifies dynamic voltage/frequency scaling and oversubscription as substantial efficiency levers (20–30% gains). Foundational for the datacenter-level interventions considered in (Oviedo2025Energy?).
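A first-order view of why frequency scaling is such a lever: in the textbook CMOS approximation, dynamic power scales as V²f and voltage tracks frequency, so power falls roughly as f³ while throughput falls only as f. The sketch below works through that arithmetic; it is the classroom model, not the measured GPU behaviour Patel et al. characterize.

```python
# First-order DVFS tradeoff under the textbook CMOS approximation:
# power ~ f**3 (since P ~ V^2 * f and V scales ~ f), throughput ~ f,
# so energy per token falls as f**2 when you down-clock, at the cost of
# latency. Real GPUs deviate (static power, memory-bound phases), which
# is why measured characterizations are needed.
for cap in (1.0, 0.9, 0.8, 0.7):           # frequency as a fraction of max
    power = cap ** 3                        # relative dynamic power
    throughput = cap                        # relative tokens/s
    energy_per_token = power / throughput   # relative J/token, = cap**2
    print(f"f={cap:.1f}: power={power:.2f}, "
          f"tokens/s={throughput:.2f}, J/token={energy_per_token:.2f}")
```

Even in this crude model a 10–20% frequency cap yields 19–36% energy-per-token savings, the right order of magnitude for the 20–30% gains reported.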
(Yang2025LServe?): Demonstrates practical serving-side gains (1.7× throughput improvement) for long-sequence inference. One of few works quantifying test-time scaling energy mitigation directly.
(Stojkovic2025DynamoLLM?): Provides cluster-level modeling of inference throughput vs. energy efficiency. Forms the basis for Oviedo et al.’s assumption that deployment-scale optimization halves energy per query.
(Luccioni2023Estimating?): The canonical pre-(Oviedo2025Energy?) estimate of inference footprint. Oviedo et al. explicitly contrast this with production-level figures to demonstrate that such academic benchmarks vastly overstate real-world energy use.
(Kamiya2025Data?): Provides the empirical historical context that forecasts routinely overestimate digital energy demand by ignoring compounding efficiency trends.
Can the climate survive the insatiable energy demands of the AI arms race?
James O’Donnell and Casey Crownhart: We did the math on AI’s energy footprint. Here’s the story you haven’t heard. | MIT Technology Review
- Methodological supplement: Everything you need to know about estimating AI’s energy and emissions burden | MIT Technology Review
kylemcdonald/nvidia-co2: Adds gCO2eq emissions to nvidia-smi.
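The underlying conversion such a tool performs is small: sample board power and multiply by a grid carbon-intensity factor. A minimal sketch, with the intensity constant assumed for illustration (real figures vary by grid and by hour):

```python
# Convert instantaneous GPU power draw into an emissions rate, the basic
# conversion behind a tool like nvidia-co2. The carbon intensity below is
# an assumed placeholder, not a sourced figure.
import subprocess

GRID_GCO2_PER_KWH = 400.0  # assumed grid intensity, g CO2eq per kWh

out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=power.draw",
     "--format=csv,noheader,nounits"],
    text=True,
)
for i, line in enumerate(out.strip().splitlines()):
    watts = float(line)
    gco2_per_hour = watts / 1000.0 * GRID_GCO2_PER_KWH
    print(f"GPU {i}: {watts:.0f} W ≈ {gco2_per_hour:.1f} gCO2eq/h")
```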