At GTC 2026, NVIDIA announced two hardware products that reframe the compute stack for AI agents: the Groq 3 Language Processing Unit (LPU) for inference and the Vera CPU for orchestration and real-world task execution.
Groq 3 LPU: Inference Acceleration
The Groq 3 LPU is built on intellectual property licensed from AI chip startup Groq in a non-exclusive $20 billion deal struck last December. The new generation is optimized strictly for low-latency token generation — the bottleneck in real-time AI agent deployment.
Groq’s architectural advantage: the chip interleaves processing units with SRAM memory embedded on-die, eliminating the high-latency memory fetches that plague GPU-based inference. Data flows linearly through the chip at full speed.
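The latency argument above can be made concrete with a toy model. The figures below are illustrative assumptions, not published specifications; the point is the order-of-magnitude gap between off-chip memory round trips and on-die SRAM access.

```python
# Toy latency model contrasting off-chip memory fetches (GPU-style) with
# on-die SRAM access (LPU-style). All figures are illustrative assumptions,
# not published specifications of either chip.

HBM_LATENCY_NS = 400        # assumed round trip to off-chip HBM
SRAM_LATENCY_NS = 2         # assumed on-die SRAM access
FETCHES_PER_TOKEN = 10_000  # assumed memory operations per generated token

def token_memory_stall_us(fetch_latency_ns: float, fetches: int) -> float:
    """Memory-stall time per token, in microseconds."""
    return fetch_latency_ns * fetches / 1_000

gpu_stall = token_memory_stall_us(HBM_LATENCY_NS, FETCHES_PER_TOKEN)
lpu_stall = token_memory_stall_us(SRAM_LATENCY_NS, FETCHES_PER_TOKEN)
print(f"GPU-style stall per token: {gpu_stall:.0f} us")  # 4000 us
print(f"LPU-style stall per token: {lpu_stall:.0f} us")  # 20 us
```

Under these assumed numbers, memory stalls dominate token latency on the fetch-heavy design and nearly vanish on the SRAM-resident one, which is the core of Groq's dataflow pitch.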
Performance Claims
NVIDIA claims that a Vera Rubin NVL72 combined with a Groq 3 LPU can boost inference throughput by 35x on trillion-parameter models compared with the prior-generation Blackwell NVL72. NVIDIA pegs the economics at $45 in revenue for AI model providers per million tokens served.
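To see what the $45-per-million-tokens figure implies at scale, here is a back-of-the-envelope calculation. The aggregate throughput number is an illustrative assumption, not an NVIDIA specification.

```python
# Back-of-the-envelope revenue math for the $45-per-million-tokens figure.
# REVENUE_PER_M_TOKENS comes from NVIDIA's claim; the throughput is an
# illustrative assumption for a single high-end rack.

REVENUE_PER_M_TOKENS = 45.0    # USD per million tokens served (claimed)
TOKENS_PER_SECOND = 1_000_000  # assumed aggregate rack throughput

daily_tokens = TOKENS_PER_SECOND * 86_400          # tokens served per day
daily_revenue = daily_tokens / 1_000_000 * REVENUE_PER_M_TOKENS
print(f"Daily revenue at {TOKENS_PER_SECOND:,} tok/s: ${daily_revenue:,.0f}")
```

At an assumed one million tokens per second, a rack would serve 86.4 billion tokens a day, or about $3.9 million in daily provider revenue at the claimed rate, which is why the throughput multiplier matters commercially, not just technically.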
Vera CPU: The Agentic Layer
The Vera CPU is NVIDIA’s new entry into general-purpose computing — and the first explicit signal that CPUs matter for agentic AI.
The Vera CPU rack packs 256 custom CPUs optimized for orchestration, data processing, and real-world task execution. NVIDIA positioned it as the layer for tasks that agents spend most of their time doing: web scraping, API calls, file I/O, conditional branching, and orchestration logic.
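That workload mix is easiest to see in code. Below is a minimal sketch of an agent control loop; the helper functions are hypothetical stand-ins for an accelerator-served model endpoint and ordinary CPU-side tools (HTTP requests, file I/O, parsing), not any real API.

```python
# Minimal sketch of an agentic control loop. call_model and run_tool are
# hypothetical placeholders: in a real deployment they would wrap an
# accelerator-served inference endpoint and CPU-side tooling respectively.

def call_model(prompt: str) -> str:
    """Placeholder for an inference call to an accelerator-served model."""
    return "fetch:https://example.com/data" if "start" in prompt else "done"

def run_tool(action: str) -> str:
    """Placeholder for CPU-side work: web requests, file I/O, parsing."""
    return f"result of {action}"

def agent_loop(task: str, max_steps: int = 5) -> list[str]:
    trace, prompt = [], task
    for _ in range(max_steps):
        action = call_model(prompt)     # accelerator: token generation
        if action == "done":            # CPU: conditional branching
            break
        observation = run_tool(action)  # CPU: real-world side effects
        trace.append(observation)       # CPU: state management
        prompt = observation
    return trace

print(agent_loop("start: summarize example.com"))
```

Note the shape of the loop: a single inference call per step, surrounded by branching, tool execution, and state bookkeeping, which is exactly the orchestration-heavy profile the Vera CPU is positioned for.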
As NVIDIA VP Ian Buck said in briefings, the Vera CPU enables agents to “power every phase of AI, from massive scale pre-training to post-training, test-time scaling and real-time agentic inference.”
Strategic Implication
The CPU play is a direct shot at Intel and AMD’s data center dominance — but reframed through the lens of AI agents. Instead of asking “which CPU is fastest,” NVIDIA is asking “which CPU is fastest for agents that don’t spend all their time running matrix math?”
This exposes a fundamental architectural difference. Traditional data center chips optimize for throughput on dense compute. Vera CPUs optimize for latency and responsiveness on sparse, orchestration-heavy workloads. An idle GPU waiting for an HTTP response or a database query is wasted silicon. A CPU handling that wait-time efficiently is valuable.
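The wait-time argument can be demonstrated with standard Python asyncio: while one agent awaits a simulated network response, the event loop runs others, so three waits overlap instead of serializing. The delays here are illustrative stand-ins for HTTP or database latency.

```python
# Sketch of why orchestration-heavy agents reward latency-oriented CPUs:
# overlapping I/O waits keeps the hardware busy. asyncio is standard
# library; the sleep delays stand in for real network round trips.
import asyncio
import time

async def agent_step(name: str, io_delay: float) -> str:
    await asyncio.sleep(io_delay)  # stands in for an HTTP or DB wait
    return f"{name} done"

async def main() -> float:
    start = time.perf_counter()
    # Three agents each waiting 0.1 s run concurrently on one event loop.
    await asyncio.gather(*(agent_step(f"agent-{i}", 0.1) for i in range(3)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"3 overlapped 0.1 s waits took {elapsed:.2f} s")  # ~0.1 s, not 0.3 s
```

A chip built for dense matrix math gains nothing from this pattern; a CPU built for fast context switching and concurrent I/O is doing exactly what it is designed for.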
The Full Stack
Each layer is optimized for a specific AI workload. Together, according to NVIDIA, they operate “as one incredible AI supercomputer.” The Vera Rubin + Groq 3 LPU combination delivers the claimed 35x inference gain.
Why It Matters
The Vera CPU reveal signals that NVIDIA sees agentic AI as fundamentally different from prior AI workloads. Scaling transformers for training required massive parallelism and dense matrix multiply. Serving them for inference required latency optimization. Running agents requires both, plus orchestration, branching, and real-world task execution — which look less like math and more like systems programming.
By explicitly building a CPU layer designed for agents, NVIDIA is betting that agentic AI will become the dominant AI workload — and that enterprises will pay for a full-stack solution optimized end-to-end for agent execution.
The 35x inference gain is real. But the strategic move is deeper: NVIDIA is no longer selling chips. It’s selling a complete system for agentic AI, with the Vera CPU as the proof that agents require a different hardware architecture than prior AI workloads.