The Moment Infrastructure Gets Built

When Jensen Huang took the GTC stage on March 16, he unveiled seven new chips for the Vera Rubin platform. Not one. Seven. Each optimized for a different layer of the AI factory: GPUs for training, LPUs for inference, CPUs for orchestration, DPUs for networking, and so on. By the time he finished the keynote, the message was clear: NVIDIA is no longer selling chips. It’s selling a complete system.

On the same day, 7,000 miles away, Mistral released Mistral Small 4 — a 119-billion-parameter open model under the Apache 2.0 license. Buried in the announcement: a new partnership with NVIDIA to co-develop frontier open models as part of NVIDIA’s Nemotron Coalition.

These two events, happening in parallel, define the next era of AI infrastructure. Not convergence. Divergence.

The NVIDIA Closed Stack

The Vera Rubin platform is comprehensive. The Vera NVL72 rack packs Rubin GPUs for training. The Groq 3 LPU server handles inference — optimized specifically for low-latency token generation. The new Vera CPU rack adds a general-purpose compute layer for tasks that don’t fit GPU parallelism: web scraping, API calls, file I/O, orchestration logic. The BlueField DPU handles networking. The Spectrum switch provides fabric. All seven chips, according to Ian Buck, VP of hyperscale computing at NVIDIA, are “designed to operate together as one incredible AI supercomputer.”

This is vertical integration. The stack is closed: optimized end-to-end for NVIDIA’s architecture, with every layer tuned for maximum throughput and minimum latency.

The inference performance numbers tell the story. According to NVIDIA, a Vera NVL72 combined with a Groq 3 LPU can boost throughput by 35x for trillion-parameter models compared to previous-generation Blackwell infrastructure. That’s not a marginal improvement. That’s a generational leap.

The cost? Total lock-in. If you want that 35x performance gain, you run on Vera Rubin. You use NVIDIA’s inference stack. You pay NVIDIA’s pricing. In exchange, you get a system that is predictable, optimized, and controlled.

The Mistral-NVIDIA Open Stack

Mistral Small 4, released the same day, is the open-source counterplay.

119 billion parameters. 6 billion active per inference pass (Mixture-of-Experts architecture). 256K context window. Configurable reasoning (set reasoning_effort="none" for speed or reasoning_effort="high" for depth). Multimodal (text and image). Unified capability: a single model handles reasoning, code generation, and agentic workflows in one package.
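
A minimal sketch of what that configurable-reasoning knob looks like in practice, assuming a conventional chat-completions HTTP API. The endpoint, model identifier, auth scheme, and response shape below are illustrative placeholders, not confirmed API details; only the reasoning_effort parameter comes from the announcement.

```python
import requests

# Illustrative request only: the endpoint, model id, and auth header are
# placeholders. The reasoning_effort field is the configurable-reasoning
# knob described above: "none" trades depth for latency, "high" the reverse.
resp = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "mistral-small-4",   # placeholder model id
        "reasoning_effort": "none",   # or "high" for deeper reasoning
        "messages": [{"role": "user", "content": "Summarize this contract."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```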

And crucially: Apache 2.0 license. No licensing restrictions. No usage fees. Any organization can deploy this on any hardware.

The NVIDIA partnership is the architectural signal. By announcing Mistral Small 4’s availability on NVIDIA Build and the Mistral API simultaneously at GTC, NVIDIA is positioning Mistral as the open-model layer of its platform strategy. NemoClaw (NVIDIA’s enterprise agentic framework) runs Mistral models. Developers and researchers can run Mistral models on whatever hardware they choose. The open stack is: Mistral (models) + any inference hardware (commodity GPUs, custom chips, even CPUs) + open orchestration (Kubernetes, etc.).

The cost? Lower performance per dollar for trillion-parameter models — but portable. Deploy on any cloud. No vendor lock-in. Community-driven iteration.

Why This Matters: The Agentic Workload

Neither stack is wrong. They’re optimized for different workloads.

The NVIDIA stack is built for scale. If you’re serving a trillion-parameter model to millions of concurrent users, Vera Rubin + Groq 3 LPU gives you the throughput and latency that make that economically viable. The 35x performance improvement translates to: running fewer chips, consuming less power, generating more revenue per query. This is Amazon’s, Google’s, and OpenAI’s problem. The closed stack solves it.
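
A back-of-envelope sketch makes the fleet math concrete. Every number below is an illustrative assumption; only the 35x multiplier is NVIDIA’s claim.

```python
# Fleet sizing under an assumed demand level. All inputs are invented
# for illustration; only the 35x factor is NVIDIA's claimed gain.
demand_tps = 3_500_000                    # assumed service-wide tokens/sec
blackwell_rack_tps = 1_000                # assumed per-rack throughput, old gen
rubin_rack_tps = blackwell_rack_tps * 35  # claimed generational gain

racks_blackwell = demand_tps / blackwell_rack_tps  # 3,500 racks
racks_rubin = demand_tps / rubin_rack_tps          # 100 racks
print(f"Blackwell racks: {racks_blackwell:,.0f}, Rubin racks: {racks_rubin:,.0f}")
```

Same workload, 35x fewer racks; the power and revenue-per-query arguments follow directly.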

The Mistral stack is built for diversity. If you’re running agents — plural, independent, specialized — then you need models that can be deployed flexibly across different infrastructure. A customer service agent, a supply chain orchestrator, a code analysis tool — each one is a separate deployment. Apache 2.0 licensing on a 119B model means each agent can run independently, on the customer’s infrastructure, without per-token fees or licensing audits.

This is why Vera CPUs exist. NVIDIA explicitly framed them as “the CPU layer for agentic AI” — because agents don’t just run models. They spend most of their time doing non-GPU work: browsing websites, pulling spreadsheet data, making API calls, executing multi-step tasks. A GPU sitting idle waiting for an HTTP response is wasted silicon. A Vera CPU optimized for general-purpose compute makes that work fast.
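
A toy sketch of one agent step shows why, assuming a typical tool-use loop (the URL and file path are placeholders). Everything before the final model call is ordinary CPU and I/O work.

```python
import time
import urllib.request

# Toy agent step; the URL and scratch path are placeholders. The point:
# wall-clock time is dominated by network and disk waits, not GPU compute.
def agent_step(url: str = "https://example.com/data") -> None:
    t0 = time.monotonic()
    with urllib.request.urlopen(url, timeout=30) as r:
        page = r.read().decode("utf-8", errors="replace")  # network-bound wait
    with open("/tmp/agent_scratch.txt", "w") as f:
        f.write(page)                                      # disk-bound wait
    io_seconds = time.monotonic() - t0

    # Only here would the agent invoke the model to choose its next action,
    # the single GPU-bound moment in the entire step.
    print(f"{io_seconds:.2f}s of pure I/O before any GPU work")
```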

Both companies are building for the same market — agentic AI — but starting from opposite ends of the optimization curve.

The Real Divergence: Lock-In vs Portability

Here’s what matters strategically.

NVIDIA’s closed stack creates a moat. If Vera Rubin becomes the standard for high-scale inference, then competitors (AMD, Intel, startups) have to reverse-engineer or leapfrog the entire architecture. That’s expensive and slow. Meanwhile, NVIDIA controls the entire value chain: hardware, software, optimization, benchmarking. When Huang says a Vera NVL72 is “35x better” than Blackwell, NVIDIA defines what “better” means. It’s a vertically integrated marketing machine.

The Mistral open stack creates optionality. If you run Mistral Small 4 on commodity GPU clusters or future custom silicon, you’re not locked into any single vendor. Inference performance per dollar will be lower, but you avoid single-vendor risk. You can switch from NVIDIA to AMD to your own custom chips without retraining or rewriting. That’s valuable if you’re a large enterprise worried about vendor power.
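
What switching without rewriting looks like is worth sketching. Assuming the common pattern where open-model servers expose an OpenAI-compatible chat-completions endpoint (the base URL and model id below are placeholders), moving between vendors becomes a configuration change rather than a code change.

```python
import os
import requests

# Portability sketch: application code never names a hardware vendor.
# Point INFERENCE_BASE_URL at an NVIDIA cluster, an AMD cluster, or custom
# silicon; if each serves the same HTTP contract, nothing else changes.
BASE_URL = os.environ.get("INFERENCE_BASE_URL", "http://localhost:8000")

def complete(prompt: str) -> str:
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        json={
            "model": "mistral-small-4",  # placeholder model id
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```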

The problem: they’re not compatible. A Vera Rubin-optimized inference pipeline won’t run on Mistral-commodity infrastructure without significant refactoring. A Mistral deployment tuned for portability will leave performance on the table compared to Vera Rubin. The stack you choose commits you.

The Market Implication: Platform Wars, Not Product Wars

This is no longer a competition between ChatGPT and Claude. It’s a competition between stacks.

NVIDIA is playing the infrastructure-as-platform game: control the entire chain, optimize it end-to-end, lock in customers through performance. This worked for NVIDIA’s GPU monopoly. If it works for Vera Rubin, the entire AI industry becomes NVIDIA’s tenant.

Mistral and its partners are playing the open-source-as-commoditization game: if the software is free and the models are open, no single vendor controls the stack. You lose the 35x performance gain, but you keep strategic independence. This is how Linux beat proprietary Unix.

The outcome depends on a single question: Do AI agent workloads value performance over flexibility?

If they value performance — if the 35x inference gain justifies lock-in — NVIDIA wins. Vera Rubin becomes the standard. Mistral models run on Vera Rubin infrastructure, and NVIDIA collects margin on every inference.

If they value flexibility — if enterprises prefer to avoid single-vendor risk, even at the cost of performance — the open stack wins. Mistral models proliferate across heterogeneous infrastructure, and NVIDIA becomes one vendor among many.

Huang’s bet: performance wins. He’s banking on the same thing that won the GPU war — absolute dominance in the metric that matters (throughput, latency, cost-per-inference) makes everything else secondary.

Mistral’s bet: flexibility wins in the long run, because lock-in creates its own cost. When you can’t migrate, you can’t negotiate. When you can’t negotiate, you get squeezed.

Both companies are building as if they’re right. And both, at GTC 2026, just announced the infrastructure that will prove which one is.

The Stakes

The split into open and closed stacks means the industry is no longer optimizing for a single architecture. It’s optimizing for two separate, incompatible worlds. Models will diverge. Infrastructure will diverge. Pricing models will diverge.

For enterprises building agents, this matters now. The stack you choose — closed or open, performance or flexibility — determines what you can do, what you’ll pay, and whether you can change your mind later.

NVIDIA just proved it has the engineering, capital, and market power to execute the closed stack at scale. Mistral just proved that the open-source ecosystem can assemble a complete, production-ready alternative.

The next 12 months will show whether the industry chooses performance or freedom.