The Whole Rack: Agentic AI Expands the Infrastructure Opportunity From GPUs to the Full Stack

Published on March 16, 2026
By Jordi Visser

The Core Shift

For most of the AI boom, investors have viewed the infrastructure story through a narrow lens: AI equals GPUs. That framing made sense in the first phase. Training large language models and powering chatbot-style inference created an obvious bottleneck in accelerated compute, and the market’s attention naturally centered on the accelerator itself.

But the transition now underway is fundamentally different. AI is moving from answering questions to performing work, and that shift changes the entire infrastructure equation. Agentic AI expands the winners from “GPU vendors” to “the whole rack.”

A chatbot consumes tokens to generate a response. An agent does something more powerful: it converts tokens into actions. Those actions include searching internal data, calling APIs, pulling documents, querying databases, verifying outputs, routing tasks, and handing work to other systems or other agents. Each action multiplies infrastructure demand across CPUs, memory, storage, and networks. The opportunity set is no longer confined to the chip that powers inference. It now extends across the full infrastructure stack that allows AI to operate as digital labor inside the enterprise.

Why CPUs Are at the Center of This

In the chatbot phase, the model was the focal point. In the agentic phase, the system around the model becomes equally important. If AI is going to act inside a company, not merely respond to a prompt, it needs orchestration: scheduling, state management, file access, data preparation, API coordination, security controls, monitoring, and workflow logic. Most of that runs on CPUs.

David Zinsner, Chief Financial Officer of Intel, captured the shift in the simplest possible terms: “The CPU has become cool again.” He said the pickup in demand began in the second half of 2025 and that as the market moves into agentic AI, “the orchestration away from just running the LLM has to run on CPUs.” That is the key distinction. In the chatbot era, the model was the story. In the agentic era, the orchestration layer becomes part of the story too.

He also noted that customers are increasingly seeking long-term agreements on a three-to-five-year basis. When semiconductor buyers try to lock in supply over multiple years rather than placing spot orders, it signals they believe the underlying workload is structural, not cyclical.

Lisa Su, Chair and Chief Executive Officer of AMD, reinforced the shift from another angle: “Frankly we see tremendous demand for traditional compute as well. If you look at the CPU cycle, we always believed the computing stack is heterogeneous and you’re always going to need CPUs and GPUs. … that’s really coming to fruition here in 2026.” That framing matters because it pushes back against the simplistic idea that AI displaces traditional compute. What is actually happening is more powerful: the total system is becoming denser, more active, and more balanced.

She also highlighted a critical development: “You’re now seeing the growth of inference exceed training.” That matters enormously, because training is episodic and concentrated, while inference, especially agentic inference, is distributed, ongoing, and tied directly to daily enterprise activity. As she put it, “We’re seeing significant CPU demand, frankly, as a result of inference demand picking up.”

Her most useful framing may also have been her simplest: “If a company has 10,000 people and they add another 10,000 agents on top of that, they’re going to need a lot more compute to satisfy what all those agents are doing.” That is the right mental model. The question is no longer how many humans interact with a model. It is how much digital labor gets deployed inside the enterprise.
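To make that division of labor concrete, here is a minimal sketch of a single agentic step. Every function in it is a hypothetical stand-in, not any vendor’s actual API; the point is that only one line in the loop touches a GPU, while the security check, retrieval, tool calls, state management, and logging around it run on CPUs, storage, and the network.

```python
# Minimal sketch of one agentic step. All functions are hypothetical
# stand-ins, not a specific product's API; only call_model() would be
# GPU-bound in a real system.

import json
import time

def is_authorized(user: str, tools: list[str]) -> bool:
    return True  # stand-in for a real policy / security check (CPU)

def fetch_documents(query: str) -> list[str]:
    # Stand-in for vector or keyword retrieval (CPU + storage I/O)
    return [f"internal doc matching '{query}'"]

def call_model(prompt: str, context: list[str]) -> list[str]:
    # Stand-in for the one GPU-bound step: the model turns the prompt
    # and retrieved context into a list of planned tool actions
    return ["search_crm", "draft_reply"]

def execute_tool(action: str) -> str:
    # Stand-in for an API call, database query, or file operation (network)
    return f"result of {action}"

def run_agent_step(user: str, query: str, state: dict) -> list[str]:
    if not is_authorized(user, ["search_crm", "draft_reply"]):  # security (CPU)
        raise PermissionError("tool access denied")
    context = fetch_documents(query)              # retrieval (CPU + storage)
    actions = call_model(query, context)          # inference (GPU)
    results = [execute_tool(a) for a in actions]  # tool use (CPU + network)
    state.setdefault("history", []).append(       # state management (CPU)
        {"query": query, "results": results, "ts": time.time()})
    print(json.dumps({"query": query, "actions": len(actions)}))  # monitoring
    return results

state: dict = {}
run_agent_step("analyst@example.com", "Q4 server backlog", state)
```

Scale a loop like this across thousands of always-on agents and the orchestration work around the model stops being overhead; it becomes a workload in its own right.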
The Token Multiplication Effect

Jensen Huang explained the inflection in equally clear terms, pointing to “the ability for AI to use files, access files, and use tools.” Whether one agrees with the scale of his rhetoric or not, the logic is straightforward: once AI can use files, access systems, and invoke tools, it becomes something fundamentally different from a chatbot.

He also quantified the impact. “We went from one generative prompt, one generative response to now one that is 1,000 times more tokens.” At scale, he said, these agentic systems are consuming “1 million times more tokens” and “running continuously in the background.”

That is the bridge between the GPU story and the CPU story. Agents consume far more tokens, which is very bullish for inference accelerators. But those tokens are now attached to ongoing actions, and those actions require the rest of the stack. The infrastructure need widens from the model to the machine room.
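A back-of-envelope calculation shows how those multipliers can arise. Every input below is an assumed round number chosen to make the compounding visible, not a figure disclosed by NVIDIA or anyone else:

```python
# Illustrative token arithmetic; all inputs are assumptions, not vendor data.

chatbot_tokens = 1_000                  # one prompt, one response

# An agentic task: the model plans, retrieves files, calls tools, verifies
# results, and retries, re-reading accumulated context at each step.
steps_per_task = 50
tokens_per_step = 20_000
task_tokens = steps_per_task * tokens_per_step

print(task_tokens // chatbot_tokens)    # 1,000x: "1,000 times more tokens"

# Run such tasks continuously in the background across a fleet of agents
# and the multiplier compounds again.
tasks_per_agent_per_day = 10
agents = 100
daily_tokens = task_tokens * tasks_per_agent_per_day * agents

print(daily_tokens // chatbot_tokens)   # 1,000,000x: "1 million times more tokens"
```

The specific inputs are debatable; the structure is the point: tokens per step, times steps per task, times always-on tasks across a fleet, with each factor compounding the others.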
Enterprise Demand Is Already Confirming This

The early evidence is now showing up in enterprise behavior, not just chip-vendor commentary. Jeff Clarke, Vice Chairman and Chief Operating Officer of Dell Technologies, highlighted this dynamic during Dell’s recent earnings discussion with investors, noting that demand for traditional servers significantly outpaced supply in the fourth quarter, with strong double-digit growth across every region. At the same time, a majority of the installed base remains on older server generations. In other words, the industry is entering the agentic era with a massive stock of outdated infrastructure still in production.

He connected the refresh directly to AI: “Traditional x86 is benefiting from AI infrastructure buildouts. While many AI workloads rely on specialized GPUs, traditional compute remains essential for orchestration, data processing and inference support.” That is exactly the pattern investors should be watching. Agentic AI does not narrow infrastructure demand. It broadens it.

He also described an internal progression that captures the point well. First came coding assistants using some GPU capacity. Then came agents writing software from developer specifications. The result, in his words: “We saw an incredible need for more compute power. The amount of tokens that is required to do that well is significant. That’s just one use case in one company.”

That is how these shifts begin. One workflow becomes ten. Ten becomes a department. A department becomes an enterprise pattern. Then the baseline demand for compute is simply higher.

The Investment Framework: Five Lanes of Opportunity

Once AI moves from chatbot responses into production workflows, the infrastructure requirement broadens from a single bottleneck to the full system. GPUs still matter because they power model inference, but agents do much more than generate text. They access files, query systems, store state, route actions, verify outputs, and operate continuously. The result is five distinct lanes of investment opportunity:

CPU Vendors. Direct exposure to orchestration and inference-support demand. As agentic workloads scale, CPUs benefit from the scheduling, state management, data preparation, security, and workflow coordination that sit around the model.

Server OEMs. The physical layer of the refresh cycle. With much of the installed base still running on older architectures, the ROI case for consolidation and modernization is compelling. Agentic workloads make the refresh more urgent, not less.

Storage. Agentic systems need fast access to files, logs, vector databases, and persistent context. Unlike chatbots, which generate ephemeral responses, agents operate continuously on and with enterprise data stores. Retrieval, context expansion, and persistent memory all raise storage I/O and capacity requirements.

Networking. Agents increase east-west traffic inside the data center as they move between models, files, tools, APIs, and other agents. The communication pattern shifts from simple request-response to continuous, multi-hop data movement, benefiting switching, interconnects,