In late March 2026, Nvidia released ProRL Agent, open-source infrastructure that decouples the reinforcement learning training loop from agent rollout execution. Integrated into Nvidia’s NeMo Gym framework, the system exposes rollout as a standalone HTTP API service, letting teams scale each workload independently rather than forcing both into one tightly coupled process.

The core architectural argument: rollout (running an agent through thousands of real-world task sequences) and training (updating model weights based on those results) are fundamentally different workloads. Rollout is I/O-intensive; training is GPU-intensive. Existing frameworks such as SkyRL, VeRL-Tool, Agent Lightning, rLLM, and GEM embed rollout control directly inside the training process, according to the arXiv paper. ProRL Agent treats them as separate services that communicate through a unified API.
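To make the decoupling concrete, here is a minimal sketch of a rollout service exposed over plain HTTP, with a trainer process calling it as a client. The endpoint name, payload fields, and stub trajectory are invented for illustration and are not ProRL Agent's actual API.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class RolloutHandler(BaseHTTPRequestHandler):
    """Hypothetical rollout endpoint: the trainer POSTs a task, gets a trajectory."""

    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # A real service would drive the multi-turn agent loop here; this stub
        # just echoes a trajectory keyed to the requested task.
        trajectory = {"task_id": body["task_id"], "tokens": [1, 2, 3], "reward": 0.0}
        payload = json.dumps(trajectory).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # silence per-request logging
        pass

# Rollout service runs as its own process in the real architecture;
# a daemon thread stands in for it here.
server = HTTPServer(("127.0.0.1", 0), RolloutHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Trainer side: request a rollout over HTTP, fully decoupled from training code.
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/rollout",
    data=json.dumps({"task_id": "swe-0001"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
server.shutdown()
print(result["task_id"])
```

Because the boundary is just HTTP, rollout workers and training nodes can be scaled, scheduled, and restarted independently, which is the point of the split.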

How the Architecture Works

ProRL Agent runs rollout through a three-stage asynchronous pipeline: initialize (spin up sandbox containers and configure tools), run (drive the multi-turn agent loop and collect trajectories), and evaluate (score results against ground truth for reward signals). Each stage runs on independent worker pools, so a slow evaluation step doesn’t stall the rollout pipeline, Marktechpost reported.
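The three stages can be sketched as independent worker pools connected by queues, here modeled with asyncio. Stage names follow the paper's description; pool sizes, delays, and task contents are made up for illustration. Note how the slowest stage (evaluate) simply gets its own pool rather than blocking the run stage.

```python
import asyncio

results = []

async def worker(inbox, outbox, delay):
    """Generic stage worker: pull an item, do work, hand off to the next stage."""
    while True:
        item = await inbox.get()
        await asyncio.sleep(delay)  # stand-in for sandbox setup / agent loop / scoring
        if outbox is not None:
            await outbox.put(item)
        else:
            results.append(item)  # evaluate is the final stage
        inbox.task_done()

async def main(num_tasks=8):
    init_q, run_q, eval_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    pools = []
    # Independent pool sizes per stage: a slow evaluate stage gets extra
    # workers instead of stalling the run stage.
    for _ in range(2):
        pools.append(asyncio.create_task(worker(init_q, run_q, 0.01)))
    for _ in range(4):
        pools.append(asyncio.create_task(worker(run_q, eval_q, 0.02)))
    for _ in range(4):
        pools.append(asyncio.create_task(worker(eval_q, None, 0.03)))
    for i in range(num_tasks):
        await init_q.put(i)
    # Drain each stage in order; join() waits until every enqueued item
    # has been fully processed by that stage's pool.
    await init_q.join()
    await run_q.join()
    await eval_q.join()
    for t in pools:
        t.cancel()

asyncio.run(main())
print(sorted(results))  # every trajectory flowed through all three stages
```

In the real system the stages run as distributed services rather than coroutines, but the scheduling property is the same: backpressure in one stage is absorbed by its own queue and pool instead of propagating upstream.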

Several engineering details address friction points that matter at scale. The system uses token-in/token-out communication throughout the entire pipeline, which prevents re-tokenization drift between rollout and training. It uses Singularity-based rootless sandboxing, enabling deployment on shared HPC clusters where Docker-style root containers aren’t an option. Shell command latency dropped from 0.78 seconds to 0.42 seconds using direct pseudo-terminal backends instead of tmux-based multiplexing, according to the technical paper.
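A toy example shows the re-tokenization drift that token-in/token-out avoids. The three-piece vocabulary and greedy longest-match encoder below are invented for illustration; real tokenizers exhibit the same effect through merge rules and special characters.

```python
# Toy vocab: "ab" exists as a single merged token alongside "a" and "b".
VOCAB = {"ab": 2, "a": 1, "b": 3}
DECODE = {v: k for k, v in VOCAB.items()}

def encode(text):
    """Greedy longest-match tokenizer over the toy vocab."""
    ids, i = [], 0
    while i < len(text):
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, i):
                ids.append(VOCAB[piece])
                i += len(piece)
                break
    return ids

# Suppose the policy actually sampled "a" then "b" as two separate tokens.
sampled_ids = [1, 3]
text = "".join(DECODE[t] for t in sampled_ids)  # decodes to "ab"
reencoded_ids = encode(text)                     # re-encodes as the merged token

print(sampled_ids, reencoded_ids)  # [1, 3] vs [2]: the sequences no longer match
```

If the trainer re-tokenizes decoded text, it computes gradients against `[2]` while the policy actually emitted `[1, 3]`. Passing token IDs end to end through the pipeline keeps rollout and training aligned on the exact sequence the model sampled.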

Benchmark Results

Nvidia validated ProRL Agent on SWE-Bench Verified, which tests an agent’s ability to resolve real-world software engineering tasks. A Qwen3-8B model trained with ProRL Agent went from 9.6% to 18.0%. A 14B model moved from 15.4% to 23.6%. These gains came entirely from infrastructure improvements, not from scaling up model parameters or changing the training dataset, according to both the arXiv paper and Shashi Bellamkonda’s analysis.

The paper’s more significant claim is near-linear throughput scaling across compute nodes. Adding more nodes typically erodes gains through coordination overhead. The ProRL Agent team claims their decoupled architecture avoids this, suggesting the rollout bottleneck was a genuine constraint, not just architectural preference, per the arXiv paper.

Where It Fits in Nvidia’s Agentic Stack

ProRL Agent joins a growing set of open-source tools from Nvidia covering the full agent development lifecycle. It sits alongside the NemoClaw security agent stack announced at GTC 2026 and the broader NeMo framework for LLM training and post-training work. The release positions Nvidia as building coherent infrastructure from post-training to deployment to the RL loops that improve agents after they’re running in production, according to Bellamkonda’s analysis.

For teams building or evaluating AI agents that improve through reinforcement learning, the question shifts from “can RL improve agent behavior” to “can you run enough rollout trajectories efficiently to make the training economics work.” ProRL Agent is Nvidia’s open-source answer to the second question.