NVIDIA released Nemotron 3 Ultra on June 4, a 550-billion-parameter mixture-of-experts open model designed for autonomous AI agents that run for hours or days at a time. The model became available through Hugging Face, ModelScope, OpenRouter, and build.nvidia.com as NIM microservices, per the NVIDIA Newsroom.

The release, announced at GTC Taipei alongside COMPUTEX, is NVIDIA’s clearest move yet from GPU supplier to enterprise AI software platform. NVIDIA claims Nemotron 3 Ultra delivers 5x faster inference and up to 30% lower cost on agentic tasks than other open frontier models in its class, according to the company’s technical blog.

Architecture and Performance

Nemotron 3 Ultra uses a hybrid Mamba-Transformer architecture with 55 billion active parameters out of 550 billion total. The Mamba layers handle long-context efficiency, while Transformer layers preserve precise retrieval from large context windows, per the technical blog.
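The gap between active and total parameters comes from sparse expert routing: for each token, a router selects a few experts and the rest stay idle. The sketch below illustrates generic top-k routing, assuming a softmax router; it is not NVIDIA’s implementation, and the function names, the eight router scores, and the choice of k=2 are hypothetical. Only the 55 billion and 550 billion figures come from the article.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def active_fraction(active_params, total_params):
    """Share of the model's parameters that run for a single token."""
    return active_params / total_params


if __name__ == "__main__":
    # Hypothetical router scores for 8 experts; only 2 run for this token.
    print(route_top_k([0.1, 2.0, -1.0, 1.5, 0.3, 0.0, 0.9, -0.5], k=2))
    # Figures from the article: 55B active out of 550B total.
    print(active_fraction(55e9, 550e9))
```

With the article’s figures, roughly one tenth of the weights participate in any single token, which is where the inference savings of a mixture-of-experts design come from.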

Three additional innovations drive the cost and speed claims. NVFP4 quantization lets a single checkpoint run across Ampere, Hopper, and Blackwell GPUs with up to 5x higher throughput per GPU. LatentMoE routes tokens to experts more efficiently across reasoning, code generation, and tool calling. Multi-token prediction cuts generation time by predicting several tokens in a single forward pass.
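The effect of multi-token prediction on generation time can be illustrated by counting forward passes. The sketch below uses a hypothetical toy "model" that simply continues an integer sequence and assumes every predicted token is accepted, which real systems do not guarantee. The names `generate` and `make_counter_model` are illustrative and not part of any NVIDIA API.

```python
def generate(next_tokens, prompt, max_new_tokens):
    """Generate tokens, counting how many forward passes were needed.

    `next_tokens(context)` stands in for one forward pass and may return
    one or more tokens.
    """
    output = list(prompt)
    passes = 0
    while len(output) - len(prompt) < max_new_tokens:
        remaining = max_new_tokens - (len(output) - len(prompt))
        # Keep at most `remaining` tokens so the length budget is exact.
        output.extend(next_tokens(output)[:remaining])
        passes += 1
    return output[len(prompt):], passes


def make_counter_model(tokens_per_pass):
    """Toy model (assumption): continues an integer sequence, k tokens per pass."""
    def next_tokens(context):
        last = context[-1]
        return [last + i for i in range(1, tokens_per_pass + 1)]
    return next_tokens


if __name__ == "__main__":
    for k in (1, 4):
        tokens, passes = generate(make_counter_model(k), [0], 12)
        print(k, passes, tokens)
```

In this idealized setting, emitting four tokens per pass produces the same 12 tokens in 3 passes instead of 12. Actual speedups depend on how often the extra predictions are correct.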

The training approach, called Multi-Teacher On-Policy Distillation (MOPD), uses more than 10 specialized teacher models covering coding, search, office work, and tool calling. The student model generates outputs, specialized teachers score them, and the process iterates. NVIDIA released the MOPD recipes through its NeMo-RL open library.
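The generate, score, iterate cycle described above can be sketched as a toy loop. This is not the MOPD objective and does not use the NeMo-RL API. It is a simple reward-weighted update under stated assumptions: the "student" is a table of preference weights over canned responses, each "teacher" is a hypothetical scoring function for one domain, and domains are visited round-robin.

```python
import random


def mopd_toy(teachers, candidates, steps=40, lr=0.5, seed=0):
    """Toy multi-teacher loop: sample from the student, score, update.

    teachers:   domain -> function(response) -> score in [0, 1]
    candidates: domain -> list of possible student responses
    """
    rng = random.Random(seed)
    domains = sorted(teachers)
    # Student "policy": one preference weight per candidate response.
    prefs = {d: {c: 1.0 for c in candidates[d]} for d in domains}
    for step in range(steps):
        domain = domains[step % len(domains)]  # round-robin over domains
        options = list(prefs[domain])
        weights = [prefs[domain][c] for c in options]
        # On-policy: the response is sampled from the student's own weights.
        sample = rng.choices(options, weights=weights, k=1)[0]
        # The specialized teacher for this domain scores the sample.
        score = teachers[domain](sample)
        # Scores above 0.5 raise the weight, scores below 0.5 lower it.
        prefs[domain][sample] *= 1.0 + lr * (score - 0.5)
    return prefs


if __name__ == "__main__":
    teachers = {
        "coding": lambda out: 1.0 if out == "patch passes tests" else 0.0,
        "tool_calling": lambda out: 1.0 if out == "valid JSON call" else 0.0,
    }
    candidates = {
        "coding": ["patch passes tests", "patch breaks build"],
        "tool_calling": ["valid JSON call", "malformed call"],
    }
    print(mopd_toy(teachers, candidates))
```

The key property the toy preserves is that the student is graded on its own outputs rather than on a fixed dataset, with a different teacher judging each domain.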

On benchmarks, Nemotron 3 Ultra scored 91% on PinchBench (agent productivity), 82% on IFBench (instruction following), and 95% on RULER @1M (long context), according to the technical blog. On SWE-Bench Verified, it scored between 65% and 70.4% depending on the agent harness used: Pi, OpenHands, Hermes, OpenCode, or Mini SWE Agent.

Post-Trained for Agent Harnesses

Unlike models released for general chat, Nemotron 3 Ultra is post-trained specifically for agent orchestration frameworks: Hermes Agent, LangChain Deep Agents, OpenClaw, OpenHands, and OpenCode. This means the model was fine-tuned on multi-turn agent trajectories extracted from these harnesses, not just single-turn instruction data, per NVIDIA.

The distinction matters for production agent deployments. Models optimized for chat often degrade when agents loop through planning, tool calls, observation parsing, sub-agent delegation, and error recovery across dozens of turns. NVIDIA’s approach trains the model on those exact patterns.
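A stripped-down version of that loop shows why multi-turn behavior matters: every step produces an observation, and a failed tool call has to be noticed and retried rather than ignored. The sketch below is a hypothetical harness, not the API of OpenClaw, OpenHands, or any framework named in this article. The plan is fixed in advance, whereas a real agent would ask the model for the next step after each observation.

```python
def run_agent(plan, tools, max_retries=2):
    """Execute (tool_name, argument) steps in order, retrying failed calls."""
    transcript = []
    for tool_name, argument in plan:
        for attempt in range(1, max_retries + 2):
            try:
                observation = tools[tool_name](argument)
            except Exception as exc:
                # Error recovery: record the failure and try again.
                transcript.append((tool_name, attempt, "error", str(exc)))
                continue
            transcript.append((tool_name, attempt, "ok", observation))
            break
        else:
            # Every attempt failed: stop instead of acting on missing data.
            transcript.append((tool_name, None, "gave_up", None))
            break
    return transcript


def make_flaky_tool(failures):
    """Hypothetical tool that raises `failures` times, then succeeds."""
    state = {"left": failures}

    def tool(argument):
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("temporary failure")
        return f"posted: {argument}"

    return tool


if __name__ == "__main__":
    tools = {
        "search": lambda query: f"results for {query}",
        "slack": make_flaky_tool(1),
    }
    for entry in run_agent([("search", "gpu"), ("slack", "report")], tools):
        print(entry)
```

Here the first Slack call fails and the second succeeds, so the transcript records both. A model that loses track of such failures across dozens of turns is the failure mode the paragraph above describes.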

Enterprise platform Aible confirmed the production relevance on June 4. In a joint hackathon with NVIDIA’s NemoClaw team, AibleClaw ran Nemotron 3 Ultra against a competing reasoning model using identical OpenClaw configurations inside NVIDIA OpenShell. Nemotron 3 Ultra completed the task first, followed all instructions on the first try, posted richer Slack reports, and saved the executed plan as a deterministic workflow for reuse. The comparison model missed a user instruction and failed its first Slack call.

The Broader Agent Toolkit

Nemotron 3 Ultra is one piece of a larger software stack NVIDIA is building around enterprise agents. The GTC Taipei announcements included NemoClaw blueprints (pre-built agent configurations), OpenShell (a secure runtime with policy and privacy controls), and CUDA-X libraries now accessible to agents as domain-specific skills covering data processing (cuDF), optimization (cuOpt), retrieval (AI-Q), model customization (NeMo), physics simulation (PhysicsNeMo), and quantum computing (CUDA-Q), per the NVIDIA Newsroom.

Jensen Huang framed the strategy explicitly: “NVIDIA NemoClaw provides enterprise software developers with the open building blocks to create more secure, long-running AI coworkers that amplify human expertise as they reshape how work gets done,” he said in the announcement.

Enterprise Adoption

The launch partners signal where NVIDIA sees agent demand. In semiconductor and industrial engineering, Cadence is using OpenShell to secure its ChipStack AI Super Agent for autonomous chip design verification. NVIDIA is ChipStack’s first customer. Dassault Systèmes is building long-running agents for design, simulation, and manufacturing on the 3DEXPERIENCE platform. Siemens is integrating NemoClaw into its Fuse EDA AI Agent for semiconductor and PCB design. Synopsys is building autonomous AI engineers for chip design workflows.

On the security and operations side, CrowdStrike and Palantir are deploying Nemotron models for cybersecurity analysis and operational decision-making across complex environments, per NVIDIA.

As Startup Fortune noted, the pattern is familiar: NVIDIA builds the surrounding stack, then gives developers reasons to stay inside it. “CUDA did that for accelerated computing. The question now is whether Nemotron, NIM and the agent tooling can do something similar for enterprise AI software.”

The Lock-in Calculus

The competitive dynamic is straightforward. If an enterprise already runs NVIDIA GPUs, deploys through NIM, uses NemoClaw blueprints, and chooses Nemotron models, NVIDIA captures a software relationship that extends well beyond hardware sales. Meta’s Llama and Mistral offer vendor-neutral alternatives, and NVIDIA’s performance claims will face real-world testing as developers benchmark Ultra against Qwen, Llama, and other open models on production workloads.

NVIDIA also released 10 million new SFT samples, 1 million new RL tasks, and 15 new RL environments as open data alongside the model, bringing cumulative Nemotron open data totals to 50 million SFT samples and 55 RL environments. The transparency play is deliberate: sovereign AI and enterprise customers increasingly demand training data provenance alongside model weights.