Efficiency vs. Exponential Demand: Why the Natural Limit of AI Is Power
Published on September 3, 2025
By Jordi Visser

“AI is a learning machine. And in network-effect businesses, when the learning machine learns faster, everything accelerates. It accelerates to its natural limit. The natural limit is electricity. Not chips, electricity, really.” — Eric Schmidt

Let’s start this paper with a Mike Tyson uppercut. Efficiency is the most overused word in the AI debate. Ok, I said it. For those fading AI, efficiency is hope. For those riding AI, efficiency is fear. Efficiency cannot be thought of in a vacuum. Embedded in the word is a hidden assumption about the pace of demand. Which brings us back to this quote: “Jevons paradox strikes again! As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can’t get enough of.”

That was Satya Nadella on X after DeepSeek. He was 100% correct then, but people keep coming back to efficiency as if it were a magic-bullet solution just days away. Efficiency gains are like buying bigger clothes for a baby: you feel caught up for a moment, but the baby keeps growing so fast you’re instantly behind again. In AI, efficiency is the hope, but the exponential demand stats are the facts, and the data already shows that growth in reasoning, tokens, and power needs is outpacing every gain. There will certainly be continuous efficiency improvements along the way, but they won’t change the bigger truth: the massive wave of AI demand acceleration is still in front of us, and power is now the real story.

On a recent Moonshots with Peter Diamandis podcast, the panel discussed in detail the release of GPT-5 and its importance. The podcast came out just before the latest “AI bubble” warnings and fears hit the market. What made this release different was that GPT-5 introduced a router that automatically directs prompts to the right model in the background (a conceptual sketch of this routing pattern appears below). Users no longer have to choose; reasoning is simply built in. That design choice turned reasoning from an advanced option into the default experience. On the podcast, they said the release of GPT-5 marked a turning point in the democratization of reasoning, with “700 million people, many of them free-tier users, now using reasoning for the first time.” Unlike DeepSeek, which gave the world an open-source reasoning model but was mostly adopted by those already advanced in AI, GPT-5 removes that friction entirely. Now, everyday users “who don’t know which model to use” are benefiting from the same powerful tools.

And reasoning is not a lightweight feature. Unlike simple Q&A, it triggers multi-step chains of thought, extended context use, and higher token consumption. Each reasoning call is orders of magnitude more compute-intensive than a casual prompt. When democratized across hundreds of millions of users, that translates directly into exponential growth in tokens and power consumed. As one host put it, this shift “starts to recurse back through the system and feed more real capital into the system to build more data centers and empower more people.” Revolutionary capability equals mass demand, and that demand is increasingly constrained not by model availability, but by the electricity required to sustain it. Efficiency will matter, but history shows it will always be outrun by adoption.
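To make the routing idea concrete, here is a minimal sketch of what a prompt router can look like in principle. Everything in it is hypothetical: the model names, the complexity heuristic, and the threshold are illustrative stand-ins, and OpenAI has not published the internals of GPT-5’s router. The point is only the pattern: estimate how demanding a prompt is, then dispatch it either to a cheap default model or to an expensive reasoning model.

```python
# Minimal sketch of a prompt router. Model names, the heuristic, and the
# threshold are hypothetical; this is NOT how GPT-5's router actually works,
# only an illustration of the classify-then-dispatch pattern.

REASONING_MODEL = "reasoning-model"   # hypothetical heavy, multi-step model
FAST_MODEL = "fast-model"             # hypothetical lightweight default

def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a learned classifier: longer prompts and
    'why/how/plan'-style requests score higher (capped at 1.0)."""
    cues = ("why", "how", "plan", "prove", "step", "analyze", "compare")
    score = min(len(prompt) / 500, 1.0)
    score += 0.2 * sum(cue in prompt.lower() for cue in cues)
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Send the prompt to the heavy reasoning model only when the
    complexity estimate crosses the threshold; otherwise use the cheap one."""
    return REASONING_MODEL if estimate_complexity(prompt) >= threshold else FAST_MODEL

if __name__ == "__main__":
    print(route("What's the capital of France?"))                  # -> fast-model
    print(route("Plan a three-phase rollout and compare the "
                "trade-offs of each step, explaining why."))       # -> reasoning-model
```

The branch that matters for the demand argument is the second one: once the router, rather than the user, decides when to invoke reasoning, heavy multi-step calls become the default path for millions of prompts that previously would have gone to a lightweight model.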
Since the launch of GPT-5, Sam Altman has publicly confirmed a dramatic surge in reasoning usage, noting that it jumped from <1% to 7% for free users, and from 7% to 24% for Plus users. OpenAI responded by increasing rate limits for reasoning calls and promising that “all model-class limits will shortly be higher than they were before GPT-5.” Altman framed this as necessary to keep up with demand, saying, “I expect use of reasoning to greatly increase over time, so rate limit increases are important.” He also acknowledged a serious capacity crunch, “We have better models, and we just can’t offer them because we don’t have the capacity,” and forecast “trillions of dollars on data center construction in the not-very-distant future.” Efficiency gains in infrastructure will arrive, but Altman’s warning underscores the reality: the growth in usage is exponential, and infrastructure lags behind.

Adding to the pressure, GPT-5’s enhanced reasoning capability carries a heavy energy toll. A study by the University of Rhode Island’s AI lab found that a 1,000-token GPT-5 response consumes on average 18.35 Wh, compared to just 2.12 Wh for GPT-4, an increase of roughly 8.6× in energy per prompt. At ChatGPT-scale usage of around 2.5 billion requests per day, this translates to a staggering 45 GWh per day, equivalent to the output of two to three nuclear reactors or the consumption of about 1.5 million U.S. households (a back-of-envelope check of these figures follows below). What seems like a modest per-prompt increase compounds massively when adopted globally. Eric Schmidt’s insight is especially prescient here: “Every doubling in capability triggers a compounding need for compute and power infrastructures.” Scaling GPUs is tough, but scaling power grids, both generation and transmission, is a far longer-lead challenge. Efficiency improvements will come, but history shows they’re invariably outpaced by adoption: smarter models drive heavier use, and heavier use accelerates the demand curve further. Add in that this race to launch tools is funded through cash flow and supported by the government; historically, buildouts like this were funded by debt or revenues.

The scale challenge becomes even more vivid when viewed through the lens of token throughput. On a Q1 earnings call, Microsoft CEO Satya Nadella revealed that the company processed “over 100 trillion tokens this quarter, up 5× year-over-year, including a record 50 trillion tokens last month alone,” underscoring that AI workloads are quickly becoming an everyday utility. At Google I/O, Sundar Pichai announced that monthly token processing had jumped from 9.7 trillion in April 2024 to 480 trillion in April 2025, a 50× increase, and that the number doubled again by their Q2 earnings call to nearly one quadrillion per month. Demis Hassabis later confirmed: “We processed almost 1,000,000,000,000,000 tokens last month, more than double the amount from May.” Tokens are not just abstract units of computation; every token generated consumes real compute cycles and therefore electricity. When growth is measured in the hundreds of trillions, the translation into energy demand is immediate and immense. Efficiency may lower the cost per token, but with throughput scaling this quickly, total compute and power needs are accelerating even faster than infrastructure can keep up.
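The arithmetic behind those energy figures is simple enough to check on the back of an envelope. The sketch below uses the per-prompt numbers from the University of Rhode Island study and the cited request volume; the household and reactor comparisons rest on assumed averages (roughly 29 kWh per U.S. household per day and a ~1 GW reactor running around the clock), so the outputs should be read as order-of-magnitude estimates, not precise figures.

```python
# Back-of-envelope check of the energy and token figures cited above.
# Per-prompt Wh values come from the cited study; the request volume,
# household usage, and reactor size are rough assumed averages.

GPT5_WH_PER_PROMPT = 18.35      # Wh per ~1,000-token GPT-5 response (cited study)
GPT4_WH_PER_PROMPT = 2.12       # Wh per comparable GPT-4 response (cited study)
REQUESTS_PER_DAY = 2.5e9        # approximate ChatGPT-scale daily requests (cited)
HOUSEHOLD_KWH_PER_DAY = 29      # assumed average U.S. household consumption
REACTOR_GWH_PER_DAY = 24        # assumed ~1 GW reactor output over a full day

ratio = GPT5_WH_PER_PROMPT / GPT4_WH_PER_PROMPT           # per-prompt increase
daily_gwh = GPT5_WH_PER_PROMPT * REQUESTS_PER_DAY / 1e9   # Wh -> GWh per day
households = daily_gwh * 1e6 / HOUSEHOLD_KWH_PER_DAY      # GWh -> kWh, then divide
reactors = daily_gwh / REACTOR_GWH_PER_DAY

print(f"GPT-5 vs GPT-4 energy per prompt: {ratio:.1f}x")          # ~8.7x
print(f"Daily energy at 2.5B requests:    {daily_gwh:.0f} GWh")   # ~46 GWh
print(f"Equivalent U.S. households:       {households/1e6:.1f} million")
print(f"Equivalent ~1 GW reactors:        {reactors:.1f}")

# Token throughput growth cited from Google: 9.7T -> 480T tokens/month,
# then roughly doubling again toward one quadrillion per month.
print(f"Google token growth: {480e12 / 9.7e12:.0f}x; doubled = {480e12 * 2:.1e} tokens/month")
```

Run as written, this reproduces the roughly 8.6× per-prompt increase, the ~45 GWh per day total, and the ~1.5 million household equivalent cited above; at the assumed ~1 GW reactor size it lands near the low end of the two-to-three reactor range, and the final line confirms the ~50× token-throughput jump Google reported.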
The exponential demand doesn’t end with GPT-5. There will be new model releases from Google and xAI in the coming weeks, which will continue pushing us closer to AI agents. Once agents are unleashed, they will increase energy demand significantly. These “insatiable digital shift-workers” never log off, never sleep, and continuously generate demand around the clock; they need to be fed, and what they eat is power. Agents are where demand really escalates. AI agents, systems capable of planning, recalling, acting, and reasoning autonomously, can generate up to 25× more tokens per session through multi-step workflows, tool use, and API chaining. Adoption is fast: 79% of organizations are piloting or deploying agents, with 19% at scale. Their business impact is already tangible, as 66% of adopters report productivity gains and 88% plan to increase budgets to support agentic workflows. Yet agent workloads are among the least sustainable: dynamic, multi-turn reasoning consumes escalating infrastructure resources with diminishing efficiency returns. Here the efficiency paradox is stark: even if each agent call is made more efficient, the fact that agents run longer chains of calls means aggregate demand keeps accelerating. Efficiency buys time, but agents consume it. Efficiency isn’t the answer alone. The question isn’t w