Research Notes

From the Playbook to the Play: Why NPUs Are AI's Championship Moment

Published on October 7, 2025
By Jordi Visser

Executive Summary

I have published many papers with 22V Research since our relationship began. With all the fears about a bubble weighing on investors, I wanted to provide research on the next phase of AI, when it officially enters the ROI phase.

The AI investment narrative stands at a critical juncture. With over $1 trillion deployed into data center infrastructure and mounting concerns about utilization and returns, it makes sense for investors to question whether we're in a bubble. What makes AI different from historical bubbles is both the speed and the dependency: the products require the buildout. There can be no revenues without the massive data center infrastructure. The data centers are the brain, or in football terms, the playbook for the quarterback. But the revenues for AI, like wins for a football team, come from the quarterback's real-time decisions on the field.

This is where Neural Processing Units (NPUs) become critical. NPUs are the edge computing architecture that transforms cloud AI from expensive, centralized intelligence into distributed, real-time execution. Without NPUs, the massive GPU buildout risks becoming stranded infrastructure: brilliant models with no economical way to serve billions of users. With NPUs, the economics invert completely. By moving inference to the edge, into phones, PCs, cars, AI agents, and humanoid robots, NPUs collapse latency to milliseconds, slash serving costs by 90%, and generate the continuous utilization that makes trillion-dollar cloud investments profitable.

Together, this shift marks the handoff from the first stage of AI, the cloud-driven training era that built the foundation of intelligence, to the edge era, where that hard-won knowledge is deployed everywhere, decisions happen instantly, and the trillion-dollar AI buildout finally earns its return. This isn't about choosing between cloud and edge; it's about understanding that the cloud buildout was necessary preparation for the NPU era. Just as a quarterback needs thousands of hours in the film room before executing under pressure, AI needed the data center phase to train the models that NPUs now deploy at scale. The companies that control this edge intelligence layer, whether through silicon, software, or applications, are positioning themselves to capture the compounding returns that justify the entire AI infrastructure cycle. This is the inflection point where AI capex converts to ROI.

The Catch That Changed Everything

January 10, 1982. NFC Championship. Six seconds left. Joe Montana scrambles right, three Cowboys bearing down, the pocket collapsing. The called play, Sprint Right Option, is dead. Montana keeps drifting, buying time, eyes scanning. Then instinct takes over. He lofts the ball high toward the back corner, where Dwight Clark rises and makes The Catch.

This wasn't improvisation. It was preparation become reflex. Montana had absorbed thousands of hours of film study so completely that when chaos arrived, he didn't freeze or look to the sideline, he acted. The playbook was no longer something he consulted; it was encoded in his nervous system.

This is exactly what's happening in AI right now. For years, artificial intelligence has lived in the cloud, brilliant but distant, like coaches in a booth calling plays through headsets.
Large language models trained in massive data centers can tell you what to do, but they can't react when the defense changes. Every decision requires a round trip: device to cloud, cloud back to device. That works in principle, but in the real economy, where milliseconds determine outcomes and margins define winners, it breaks down.

Enter the Neural Processing Unit (NPU): AI's on-field quarterback. If GPUs built the playbook in the data center, NPUs bring it to life at the point of action. They're not just faster chips; they're the architecture that transforms AI from centralized intelligence into distributed instinct. For investors, this shift represents the inflection point where trillion-dollar AI infrastructure spending converts into actual returns.

Why the Cloud Alone Can't Win the Game

The first act of AI was about scale. Data centers became vast training facilities, absorbing trillions of data points and producing the most sophisticated models ever created. This was necessary; you can't execute without a playbook. But building the playbook isn't the same as winning games. The cloud model has three critical constraints:

1. Latency Kills Adoption

Every cloud query is like the quarterback running to the sideline for approval. Even at 200-300 milliseconds round-trip, it feels slow. AI that hesitates doesn't feel magical, it feels mechanical. The difference between 400ms and 40ms response time isn't incremental; it's the threshold between novelty and necessity.

2. Energy Economics Don't Scale

Running inference in the cloud is expensive. Training a model costs millions, but serving it to billions of users costs billions. Every query burns electricity, bandwidth, and dollars. For enterprises, AI remains a margin drag. For hyperscalers, it's a utilization problem: trillion-dollar GPU buildouts that can't possibly be saturated by cloud workloads alone.

3. The Edge Is Where Value Lives

Data is created at the edge, in phones, cars, factories, hospitals. Sending it to distant servers and back introduces not just latency but privacy risks, network dependencies, and coordination overhead. The cloud is brilliant at training, but terrible at real-time execution in context.

The breakthrough comes when intelligence moves to where decisions need to happen. NPUs enable devices to process, reason, and act locally, collapsing latency to near-zero, slashing energy costs by orders of magnitude, and making AI feel instantaneous. Just as Montana didn't need the coach's approval to make The Catch, NPUs don't need permission from the cloud to act.

What NPUs Actually Do: Instinct by Design

An NPU isn't a smaller GPU. It's purpose-built silicon for inference, the real-time translation of trained knowledge into action. Think of its architecture as the anatomy of instinct:

Compute Arrays: Dense grids of tiny processing units optimized for the matrix math AI requires, like neural pathways firing in parallel.

On-Chip Memory: Local storage that holds context, eliminating the need to fetch information from distant servers. This is short-term recall, knowing the coverage before the ball is snapped.

Power Management: Dynamic voltage and frequency tuning that makes continuous AI possible without draining batteries. Endurance over four quarters.

Specialized Compilers: Software that translates models trained in PyTorch or TensorFlow into optimized instructions the NPU can execute in microseconds.

Every component is engineered to shrink the gap between knowing and doing.
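To make the compiler step concrete, here is a minimal sketch of the hand-off from a training framework to an on-device runtime. It assumes PyTorch and the ONNX interchange format as a generic stand-in for whatever vendor toolchain actually targets a given NPU; the model, tensor names, and output path are hypothetical, chosen for illustration rather than taken from any specific product.

```python
# Minimal sketch: export a trained PyTorch model into a portable graph that an
# NPU vendor compiler could ingest. ONNX is used here as a generic interchange
# format; a real deployment would hand the file to a device-specific toolchain
# (an assumption for illustration, not a step the note itself prescribes).
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a model trained in the data center."""

    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


model = TinyClassifier().eval()
example_input = torch.randn(1, 128)  # a fixed input shape helps ahead-of-time compilation

# Export the graph so an NPU compiler can quantize it and map it onto its
# compute arrays and on-chip memory.
torch.onnx.export(
    model,
    example_input,
    "tiny_classifier.onnx",    # hypothetical output path
    input_names=["features"],
    output_names=["logits"],
    opset_version=17,
)
```

From there, a vendor toolchain would typically quantize the graph (to INT8, for example) and compile it into the NPU's native instructions; that compilation step, more than the export itself, is where the latency and power savings described above are realized.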
NPUs don't just run AI faster, they enable autonomous intelligence. A phone can transcribe speech offline. A PC can summarize documents without internet. A car can recognize obstacles before a network ping completes. A robot can adjust its grip mid-motion. Each of these moments is an audible, a micro-decision made at the line, powered by silicon designed for real-time cognition.

The Expansion: From Data Centers to Everywhere

For two years, the narrative has been singular: AI equals data center buildout. But that's only Act One. The next chapter unfolds at the edge, where NPUs transform every device category into an AI platform.

Smartphones: Billions of Intelligent Endpoints

Apple's A-series, Qualcomm's Snapdragon, MediaTek's Dimensity: every flagship phone now integrates NPUs handling vision, voice, and contextual inference. Each generation multiplies compute capacity (measured in TOPS, trillions of operations per second) while shrinking power draw. The math is staggering: With 1.5 billion smartphones sold annually, every NPU upgrade c