How to Tame AI’s Voracious Appetite for Energy
Why Read This
What Makes This Article Worth Your Time
Summary
What This Article Is About
Katarina Zimmer surveys the rapidly growing energy cost of artificial intelligence and the scientific race to address it. The problem originates in transformer architecture — the design underpinning most large language models — which scales energy use quadratically with text length and requires enormous computation both during training and, more worryingly, during everyday inference (generating responses). US data centres already consumed 224 terawatt-hours of electricity in 2025 — over 5% of the country’s total — up from just 1.9% in 2018, with much of this new demand met by fossil-fuel-powered plants. Without intervention, data centres could emit the equivalent of 44 megatons of CO₂ annually — comparable to Norway’s total yearly emissions.
Researchers are pursuing solutions on multiple fronts. On the software side, smaller task-specific models can use over 90% less energy than full LLMs, and “mixture of experts” architectures cut computation by activating only part of a model for each query. Alternative model designs such as xLSTM avoid the “quadratic curse” of transformers. On the hardware side, wafer-scale chips, custom AI processors, neuromorphic chips inspired by the brain, and photonic computing using light rather than electrons all promise significant gains. Beyond the machines themselves, better siting of data centres near renewable energy sources and policy intervention from governments are also critical. Zimmer closes by raising a broader question: even a more efficient AI must still justify its energy use, and not every problem needs an AI solution.
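The electricity figures in the summary can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (224 TWh, "over 5%", and 1.9%); the variable names are illustrative, and because the 2025 share is given only as "over 5%", both results are bounds rather than exact values.

```python
# Figures quoted in the summary; variable names are illustrative.
US_DATA_CENTRE_TWH_2025 = 224  # terawatt-hours used by US data centres in 2025
SHARE_2025 = 0.05              # "over 5%", so this is a lower bound on the share
SHARE_2018 = 0.019             # 1.9% of US electricity in 2018

# If 224 TWh is more than 5% of the total, the total is below 224 / 0.05.
implied_total_twh = US_DATA_CENTRE_TWH_2025 / SHARE_2025

# How many times larger the data-centre share became between 2018 and 2025.
share_growth = SHARE_2025 / SHARE_2018

print(f"Implied US total electricity use: below about {implied_total_twh:,.0f} TWh")
print(f"Data-centre share grew by a factor of at least {share_growth:.1f}")
```

Read this way, the share of US electricity going to data centres grew more than two and a half times in seven years.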
Key Points
Main Takeaways
Inference Is the Bigger Energy Problem
Training an LLM once is energy-intensive, but the real concern is inference — running the model for billions of users every day. As one expert puts it: “You train once, then you inference for a billion people in the world.”
Smaller, Specialised Models Save Over 90% Energy
A 2025 UNESCO study found that task-specific small models like DistilBART consumed more than 90% less energy than Meta’s full-scale Llama 3.1 model when used for the same tasks.
Transformer Architecture Has a “Quadratic Curse”
Doubling the length of text in a transformer model quadruples the number of computations required. Alternative architectures like xLSTM avoid this by storing a running summary rather than processing the full text each time.
Renewable Offsets Are Not Enough
Buying renewable energy credits elsewhere while running fossil-fuelled data centres merely keeps CO₂ emissions “in stasis” — it does not reduce them. Each new fossil megawatt installed, Masanet says, “sets us back on our progress.”
Better Siting Could Cut Footprints by Over 70%
Moving data centres to locations with more abundant renewable energy and water — such as the US Midwest — combined with efficient hardware and software could reduce carbon footprints by 73% and water footprints by 86%.
Not Every Problem Needs an AI Solution
Beyond making AI more efficient, experts argue we should ask where AI is truly needed. Using AI for tasks like customer service chatbots may not justify the environmental cost, regardless of how efficient the underlying models become.
Article Analysis
Breaking Down the Elements
Main Idea
AI’s Energy Problem Is Structural — and Solvable
Zimmer’s core argument is that AI’s enormous energy consumption is not an unavoidable feature of the technology but a consequence of specific architectural and infrastructure choices that can be redesigned. From software algorithms to chip design to site selection, multiple paths exist to significantly reduce AI’s environmental footprint — but they require active investment, policy support, and a willingness to ask whether AI is actually needed for a given task in the first place.
Purpose
To Inform and Galvanise Action Before the Window Closes
Zimmer writes for Knowable Magazine — a publication that translates peer-reviewed science for general audiences. Her purpose is both explanatory (helping readers understand why AI is so energy-hungry) and prescriptive (surveying solutions with enough specificity to move the conversation beyond vague concerns). The article is addressed as much to policymakers and industry as to curious readers, using expert voices to communicate urgency without alarmism.
Structure
Personal Hook → Scale of Problem → Root Causes → Software Solutions → Hardware Solutions → Siting & Policy → Broader Reflection
Zimmer opens with an intimate, first-person moment (coffee in Berlin, a question to Gemini) that immediately grounds the abstract problem in lived experience. She then escalates through scale data, causal explanation, and a layered set of solutions — software, then hardware, then location and governance — before stepping back to question whether efficiency alone is the answer. The structure moves from personal to planetary, and from diagnosis to prescription.
Tone
Informative, Concerned & Constructively Hopeful
Zimmer writes with the measured concern of a science journalist who has reported closely on the energy transition. She does not catastrophise — she balances alarming figures with concrete progress — but she does not minimise the stakes either. Fengqi You’s closing assertion that “we could really reshape the trajectory” captures her overall tone: urgent but not despairing, focused on solutions rather than blame.
Key Terms
Vocabulary from the Article
Tough Words
Challenging Vocabulary
Quadratically: growing in proportion to the square of a quantity, so that doubling one variable causes the dependent variable to quadruple rather than merely double.
“The number of computations the model performs… increases quadratically relative to the length of text (i.e., doubling the length of text quadruples the number of computations).”
Stasis: a state of no change or progress; equilibrium. Here it describes a situation where carbon emissions are held constant rather than actually reduced.
“This strategy — at best — keeps CO₂ emissions of centers in stasis rather than reducing them to a net of nothing.”
Labyrinthine: resembling a labyrinth; extremely complex, intricate, and difficult to navigate. Here it evokes the enormous, maze-like scale of data centre server halls.
“Somewhere inside the data center’s labyrinthine halls of stacked processors, my query gets converted into numbers…”
Pervasively: in a way that spreads widely through an area or group; thoroughly and throughout. Here it describes the hoped-for mainstream adoption of optical chips across data centres.
“Joshi hopes that, ‘in 10 years, we would have a practical solution that can be deployed pervasively across the data centers’.”
Frugal: sparing in the use of resources; economical in a way that avoids waste. Applied to AI here, it describes the goal of building systems that accomplish tasks using as little energy as possible.
“AI’s energy cost will ultimately be a balancing act… though building a more frugal, energy-saving AI is important…”
Voracious: having an extremely eager and seemingly insatiable appetite, whether for food or, metaphorically, for resources like energy or data.
“How to tame AI’s voracious appetite for energy” — the article’s title uses this word to convey the scale and urgency of AI’s resource consumption.
Reading Comprehension
Test Your Understanding
5 questions covering different RC question types
1. According to the article, GPUs were originally invented specifically for AI computations and were later adapted for video gaming.
2. According to the article, why does the xLSTM model use less energy than transformer-based models when generating long responses?
3. Which sentence best explains why tech companies’ strategy of buying renewable energy credits elsewhere does not adequately address the environmental cost of their data centres?
4. Evaluate whether each of the following statements is supported by the article.
According to IEA estimates, US data centres consumed approximately 224 terawatt-hours of electricity in 2025, representing more than 5% of the country’s total electricity use.
The article states that LSTM models are now considered superior to transformer models and are being widely adopted by major tech companies to replace them.
Wafer-scale chips are described in the article as consuming 143 times less electricity for communication than comparable GPUs, but carrying a greater risk of damage during manufacturing.
5. The article notes that some state and local governments are introducing policies that “mostly aim to incentivize and accelerate data center builds.” What concern does this detail implicitly raise about the overall policy landscape?
FAQ
Frequently Asked Questions
Training an AI model is a one-time event — expensive, but finite. Inference, by contrast, happens continuously, at enormous scale, every time any of the model’s hundreds of millions or billions of users asks it a question. As the article explains: “You train once, then you inference for a billion people in the world.” With ChatGPT alone receiving billions of queries every week, even individually small per-query energy costs accumulate into an enormous and growing total. Training GPT-4 may have consumed 50–60 gigawatt-hours, but the ongoing inference load dwarfs that figure many times over.
Transformer models process language by weighing every word against every other word in the text. The approach is computationally powerful, but its energy cost scales quadratically with text length: doubling the amount of text doesn’t double the computation, it quadruples it. For short prompts this is manageable, but as responses grow longer, the energy cost climbs steeply. The xLSTM model sidesteps this by maintaining a compressed summary rather than re-processing the full growing text each time it generates a new word, keeping the cost of each new word roughly flat regardless of length.
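The difference between the two scaling patterns can be sketched with a simple operation count. This is a toy model, not a measurement of energy and not the actual transformer or xLSTM implementation: the function names are hypothetical, and it assumes one unit of work per word-to-word comparison and one unit per summary update.

```python
def pairwise_comparisons(num_tokens: int) -> int:
    """Toy transformer cost: every token is weighed against every token."""
    return num_tokens * num_tokens


def running_summary_updates(num_tokens: int) -> int:
    """Toy running-summary cost: one fixed-size update per token."""
    return num_tokens


# Doubling the text length quadruples the first count but only doubles the second.
for n in (1_000, 2_000, 4_000):
    print(f"{n:>5} tokens: {pairwise_comparisons(n):>12,} comparisons, "
          f"{running_summary_updates(n):>6,} summary updates")
```

Under these assumptions, going from 1,000 to 2,000 tokens raises the comparison count from 1,000,000 to 4,000,000, while the summary-update count only rises from 1,000 to 2,000.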
A “mixture of experts” model is a large AI system that is internally divided into specialised sub-models, each expert at handling a different type of task or language pattern. Rather than activating the entire model for every query, the system routes each request to whichever sub-section is most relevant, leaving the rest dormant. This means far fewer parameters are activated per query, significantly reducing computation and energy use compared to running the full model each time. Google’s Gemini and OpenAI’s ChatGPT are described in the article as increasingly using this approach.
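The routing idea can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the expert names, keyword sets, and parameter counts are invented, and routing by keyword overlap stands in for the learned routing that production systems use. It shows only the principle that one sub-model is activated while the rest stay dormant.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    name: str
    keywords: frozenset
    num_parameters: int  # arbitrary illustrative units


# Hypothetical experts; real systems learn their specialisations.
FALLBACK = Expert("general", frozenset(), 10)
EXPERTS = [
    Expert("code", frozenset({"python", "bug", "function"}), 10),
    Expert("cooking", frozenset({"recipe", "bake", "oven"}), 10),
    Expert("travel", frozenset({"flight", "hotel", "berlin"}), 10),
    FALLBACK,
]


def route(query: str) -> Expert:
    """Pick the expert whose keywords overlap most with the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    best = max(EXPERTS, key=lambda expert: len(expert.keywords & words))
    # With no keyword match at all, use the general-purpose expert.
    return best if best.keywords & words else FALLBACK


def active_fraction(query: str) -> float:
    """Share of all parameters that are activated for this query."""
    total = sum(expert.num_parameters for expert in EXPERTS)
    return route(query).num_parameters / total


query = "How do I bake bread in an oven?"
print(route(query).name, active_fraction(query))
```

In this toy setup each query touches one of four equally sized experts, so only a quarter of the parameters are active; the proportions in real models differ.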
Readlite provides curated articles with comprehensive analysis including summaries, key points, vocabulary building, and practice questions across 9 different RC question types. Our Ultimate Reading Course offers 365 articles with 2,400+ questions to systematically improve your reading comprehension skills.
This article is rated Intermediate. Katarina Zimmer writes for a scientifically literate but non-specialist audience, explaining technical concepts (transformers, GPUs, neuromorphic computing) clearly without requiring prior knowledge. However, the article introduces a large number of distinct technologies across multiple sections, deploys precise quantitative comparisons throughout, and requires readers to track and distinguish between software-level, hardware-level, and infrastructure-level solutions. Students preparing for CAT or GMAT will find it excellent practice for the kind of technology-and-environment passages that regularly appear in those exams.
Katarina Zimmer is a Berlin-based science and environment journalist whose work appears in National Geographic, Scientific American, BBC Future, and Knowable Magazine. She specialises in the energy transition and planetary health. Knowable Magazine is published by Annual Reviews, a non-profit scientific publisher, and is dedicated to making peer-reviewed research accessible to general audiences. It commissions long-form science journalism grounded in original academic sources — making it one of the most reliable outlets for technically rigorous popular science writing.
The Ultimate Reading Course covers 9 RC question types: Multiple Choice, True/False, Multi-Statement T/F, Text Highlight, Fill in the Blanks, Matching, Sequencing, Error Spotting, and Short Answer. This comprehensive coverage prepares you for any reading comprehension format you might encounter.