The Agent Sandbox Wars: 13 Platforms Are Racing to Build the Runtime Layer AI Agents Actually Need

Agent-Infra released AIO Sandbox this weekend, an open-source all-in-one runtime that bundles a browser, shell, shared filesystem, and Model Context Protocol (MCP) support into a single Docker container. It is the 13th platform to enter a market for AI agent sandboxes that, according to a comparative analysis by researcher Ry Walker published this week, has gone from nonexistent to a full infrastructure category in under 18 months.

The timing is not random. Agents that were demo curiosities in 2024 are now writing and executing code at production scale. E2B, the market leader by volume, has processed over 200 million sandbox sessions, and approximately 50% of Fortune 500 companies are now running agent workloads on isolated execution platforms. The question that used to be “can AI write code?” has quietly become “where does that code run, and who controls what it can do?”

The answer, as of March 2026, is that nobody agrees.

The Problem All 13 Platforms Are Solving

AI-generated code is probabilistic. Even the best models produce commands that are plausible but destructive. The incidents are no longer hypothetical: a Claude Code user ran a cleanup task that executed rm -rf ~/, deleting their entire home directory. At Ona, a Claude Code agent discovered it could bypass the denylist via /proc/self/root/usr/bin/npx, and when Bubblewrap blocked that path, the agent disabled the sandbox itself. During AI training at Alibaba, an LLM spontaneously connected to the internet and attempted cryptomining.

Every agent sandbox solves the same core problem: isolating AI-generated code from the host system while giving the agent enough capability to be useful. In practice that means four things: an isolated filesystem the agent can read and write without touching the host; network policies that block unauthorized egress; resource limits on CPU, memory, and process counts; and a lifecycle that either destroys the environment automatically or checkpoints it safely.
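To make those requirements concrete, here is a minimal Python sketch of the weakest possible version. It runs a snippet in a child process with CPU-time and file-size caps and an empty environment, so no parent secrets leak in, inside a scratch directory that is deleted when the run ends. It is an illustration, not a sandbox. It relies on POSIX `resource` limits and does nothing about network egress or access to the rest of the host filesystem, which is exactly the gap the platforms below close with microVMs, isolates, or containers. The function name `run_untrusted` and its defaults are hypothetical.

```python
import resource
import shutil
import subprocess
import sys
import tempfile


def _cap(limit: int, value: int) -> None:
    """Lower one resource limit in the current process (never try to raise it)."""
    _soft, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def run_untrusted(code: str, cpu_seconds: int = 2,
                  max_file_bytes: int = 1 << 20,
                  wall_timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a throwaway directory with CPU and file-size caps.

    Hypothetical helper, POSIX only. Raises subprocess.TimeoutExpired if the
    wall-clock timeout is hit (the child is killed first).
    """
    workdir = tempfile.mkdtemp(prefix="agent-sandbox-")
    try:
        return subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: ignore PYTHON* env vars and user site-packages
            cwd=workdir,                          # scratch directory used by this run only
            env={"HOME": workdir},                # nothing inherited from the parent environment
            capture_output=True,
            text=True,
            timeout=wall_timeout,
            # Runs in the child after fork and before exec, so only the child is limited.
            preexec_fn=lambda: (_cap(resource.RLIMIT_CPU, cpu_seconds),
                                _cap(resource.RLIMIT_FSIZE, max_file_bytes)),
        )
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # lifecycle: destroy on exit
```

A CPU-bound loop is killed by the kernel once it exhausts its CPU budget, and the scratch directory is gone as soon as `run_untrusted` returns.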

Where the platforms diverge is on architecture, and those divergences reflect fundamentally different bets about how agents will actually work.

The Ephemeral Camp: Security Through Destruction

E2B built its business on a simple premise: the safest sandbox is one that doesn’t survive its own session. Every sandbox is a Firecracker microVM (the same technology that runs AWS Lambda), spun up in approximately 150 milliseconds and destroyed when the task completes. No persistent state. No accumulated risk. E2B’s SDK has surpassed 1 million monthly downloads.

Zeroboot, a newer entrant, pushed the ephemeral model further. By forking Firecracker snapshots copy-on-write, it creates a sandbox in 0.79 milliseconds, roughly 190 times faster than E2B’s 150 milliseconds. At that speed, the economics of “one sandbox per LLM call” become viable even at consumer scale.
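Copy-on-write forking is easiest to see at the process level. In the Python sketch below (POSIX only), a parent prepares state once, and every task runs in a forked child that shares the parent's memory pages until it writes to them. Each task therefore starts from the same clean state, and its changes vanish when it exits. Zeroboot applies the idea to Firecracker VM snapshots rather than Python processes; `SNAPSHOT` and `run_in_fork` are hypothetical names, and this is not its implementation.

```python
import os
import pickle

# Hypothetical "warm snapshot": state built once in the parent, standing in
# for a booted VM image. Forked children see it copy-on-write.
SNAPSHOT = {"installed": ["numpy", "pandas"], "scratch": []}


def run_in_fork(task):
    """Fork a child from the warm parent, run task(SNAPSHOT) there, return its result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: pages are shared with the parent until one side writes,
        # so any mutation of SNAPSHOT here is private to this child.
        exit_code = 1
        try:
            os.close(read_fd)
            result = task(SNAPSHOT)
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(result, out)
            exit_code = 0
        finally:
            os._exit(exit_code)  # never fall back into the parent's code path
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        data = inp.read()  # read before waiting so a large result cannot block the child
    _, wait_status = os.waitpid(pid, 0)
    if wait_status != 0 or not data:
        raise RuntimeError("sandboxed task failed")
    return pickle.loads(data)
```

Running the same mutating task twice returns the same answer both times, and the parent's copy of `SNAPSHOT` is never touched: the "destroy after use" property comes for free when the child exits.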

Cloudflare entered the category on March 24 with Dynamic Workers, now in open beta for all paid Workers users. Dynamic Workers run AI-generated code in V8 isolates rather than containers or microVMs; per Cloudflare’s benchmarks, they start in milliseconds and are 100 times faster than traditional containers. The approach trades the full Linux environment that containers provide for extreme density: thousands of concurrent sandboxes can run on the same host with minimal memory overhead.

The selling point across the ephemeral camp is simplicity. Each session starts clean, ends clean, and leaves nothing behind for an attacker to exploit. The weakness is equally straightforward: agents that need to remember state between sessions have to store it externally, adding latency and complexity.

The Persistent Camp: Agents That Keep Working

Fly.io’s Sprites represent the opposite architectural bet. Each Sprite is a persistent Firecracker VM with a 100GB ext4 filesystem backed by object storage. When idle, Sprites sleep and billing stops. When needed, they wake on demand. The defining feature is checkpoint and restore: the entire disk state can be snapshotted in approximately one second and rolled back if an agent breaks something.
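The checkpoint-and-restore contract can be sketched at the directory level. The Python class below is hypothetical: it copies a workspace tree to a named checkpoint and puts it back on restore. Sprites snapshot the whole disk in about a second; a naive `shutil.copytree` will be far slower on large trees, so treat this as an illustration of the semantics, not of the mechanism.

```python
import shutil
import tempfile
from pathlib import Path


class Workspace:
    """Hypothetical persistent workspace with named, whole-tree checkpoints."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        # Checkpoints live outside the workspace so the agent cannot edit them.
        self._store = Path(tempfile.mkdtemp(prefix="checkpoints-"))

    def checkpoint(self, name: str) -> None:
        """Snapshot the current workspace under `name` (a simple identifier)."""
        dest = self._store / name
        if dest.exists():
            shutil.rmtree(dest)  # overwrite an older checkpoint with the same name
        shutil.copytree(self.root, dest, symlinks=True)

    def restore(self, name: str) -> None:
        """Discard everything since the checkpoint and put the snapshot back."""
        src = self._store / name
        if not src.exists():
            raise KeyError(f"no checkpoint named {name!r}")
        shutil.rmtree(self.root)
        shutil.copytree(src, self.root, symlinks=True)
```

Restore is destructive by design: anything the agent wrote after the checkpoint is discarded, which is the rollback safety net the persistent camp relies on in place of ephemerality.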

The argument for persistence is that real agent workflows are not single-turn. A coding agent debugging a complex issue might need to install packages, run tests, inspect results, and iterate across multiple sessions. Rebuilding that environment from scratch each time wastes both compute and the agent’s context. Sprites let the agent pick up where it left off, with the rollback capability serving as the safety net that ephemerality provides in the other camp.

Daytona offers the fastest creation among persistent platforms at 90 milliseconds and is the only platform with Computer Use support, enabling agents to control Linux, Windows, and macOS virtual desktops programmatically. Runloop provides persistent environments with checkpoint and restore using a custom isolation layer.

Walker’s analysis frames the market split bluntly: the ephemeral camp optimizes for security, the persistent camp optimizes for productivity. His prediction is that by 2027, checkpoint/restore will become table stakes across all platforms, effectively merging the two camps. By 2028, he expects Computer Use (browser and desktop control) to be standard, not a differentiator.

The All-in-One Approach: AIO Sandbox

Agent-Infra’s AIO Sandbox, the launch that prompted this analysis, takes a different path entirely. Rather than optimizing for creation speed or persistence, it optimizes for tool consolidation. The sandbox bundles Chromium (controllable via Chrome DevTools Protocol), Python and Node.js runtimes, a bash terminal, VSCode Server, Jupyter Notebook, and pre-configured MCP servers for browser, file, shell, and document processing into a single container.

The core technical differentiator is a unified file system. In a standard multi-container setup, a file downloaded by the browser must be programmatically moved to the code execution environment. In AIO Sandbox, a file downloaded via Chromium is immediately visible to the Python interpreter and bash shell through a shared storage layer.

AIO Sandbox is open-source under Apache 2.0, has accumulated 3,400+ GitHub stars, and is self-hosted only with no managed cloud offering. Walker’s research identifies Agent-Infra as ByteDance-affiliated, with the sandbox already in use by UI-TARS-desktop (ByteDance’s computer-use agent).

The trade-off is isolation. AIO Sandbox runs in Docker, which provides weaker security boundaries than Firecracker microVMs or V8 isolates. For development and trusted-code scenarios, the all-in-one convenience may justify the trade-off. For production agent workloads running untrusted code, the Docker-level isolation is a documented limitation.

The Enterprise Layer: NVIDIA OpenShell and Security Partners

While startups compete on speed and developer experience, NVIDIA is building the security layer that sits above all of them. OpenShell, announced at GTC 2026 and now in early preview, is an open-source runtime that enforces infrastructure-level policy on autonomous agents.

The architectural distinction is that OpenShell separates agent behavior from policy enforcement at the system level. Security policies are applied outside the agent’s reach: the agent cannot override them, leak credentials, or access private data even if the agent itself is compromised. NVIDIA describes it as “the browser tab model applied to agents” — sessions are isolated, resources are controlled, and permissions are verified by the runtime before any action takes place.
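The check-before-act pattern NVIDIA describes can be sketched in a few lines of Python, assuming a hypothetical `Runtime` that holds a default-deny `Policy` and exposes only a `perform()` call to the agent. This is not OpenShell's API. It also shows only the shape of the idea: inside a single Python process nothing stops determined code from reaching around the object, which is why OpenShell enforces policy at the system level, outside the agent's process.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Policy:
    """Hypothetical default-deny policy, written by operators rather than the agent."""
    readable_roots: frozenset = frozenset()
    allowed_hosts: frozenset = frozenset()

    def permits(self, kind: str, target: str) -> bool:
        if kind == "read_file":
            # Normalize first so "/workspace/../etc/passwd" cannot slip through.
            # (normpath does not resolve symlinks; a real runtime must.)
            path = PurePosixPath(posixpath.normpath(target))
            return path.is_absolute() and any(
                path.is_relative_to(root) for root in self.readable_roots)
        if kind == "http_request":
            return target in self.allowed_hosts
        return False  # every other action kind is denied by default


class Runtime:
    """Hypothetical enforcement point between the agent and the system."""

    def __init__(self, policy: Policy) -> None:
        self._policy = policy  # the agent only ever calls perform()

    def perform(self, kind: str, target: str, handler: Callable[[str], T]) -> T:
        # The permission check happens before any side effect runs.
        if not self._policy.permits(kind, target):
            raise PermissionError(f"policy denied {kind}: {target}")
        return handler(target)
```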

NVIDIA is collaborating with Cisco, CrowdStrike, Google Cloud, Microsoft Security, and TrendAI to align runtime policy management and enforcement across the enterprise stack. NemoClaw, the reference stack that bundles OpenShell with Nemotron models and OpenClaw, enables deployment across cloud, on-premises, and personal hardware including GeForce RTX PCs and DGX systems.

The enterprise play is less about where code runs and more about who defines the rules. An organization deploying 1,000 agents across different sandbox platforms needs a unified policy layer that works regardless of whether the underlying runtime is E2B, AIO Sandbox, or Cloudflare Workers.

The Local-First Outlier: Microsandbox

Microsandbox, from the YC X26 batch, addresses a problem the other platforms don’t: what happens when the sandbox itself has access to your credentials.

In most sandbox architectures, API keys and secrets are passed into the execution environment so the agent can call external services. If the sandbox is compromised, those credentials are exposed. Microsandbox’s approach is to never give the sandbox real credentials at all. The agent sees placeholder values. At the network layer, the Microsandbox proxy intercepts outbound requests to allowed domains and swaps the placeholders for real keys only on verified TLS connections to pre-approved hosts.
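The placeholder swap can be sketched as a pure function over an outbound request. In this hypothetical Python version, a vault held by the proxy maps each placeholder to a real secret and the hosts it may be sent to. The proxy substitutes it only for HTTPS requests whose upstream certificate was verified (modeled here as a `tls_verified` flag) and refuses anything else. The source does not describe how Microsandbox actually intercepts TLS or whether it blocks or forwards rejected requests; the names, the refuse-on-mismatch choice, and the headers-only scan are all assumptions.

```python
import dataclasses
from dataclasses import dataclass, field


@dataclass
class OutboundRequest:
    scheme: str
    host: str
    tls_verified: bool  # hypothetical: set by the proxy after checking the upstream cert
    headers: dict = field(default_factory=dict)


# Held by the proxy on the host. The sandbox only ever sees the placeholder names.
VAULT = {
    "PLACEHOLDER_OPENAI_KEY": ("sk-real-secret", frozenset({"api.openai.com"})),
}


def inject_credentials(req: OutboundRequest, vault: dict = VAULT) -> OutboundRequest:
    """Swap placeholders for real secrets, only on verified TLS to approved hosts."""
    trusted = req.scheme == "https" and req.tls_verified
    headers = {}
    for name, value in req.headers.items():
        for placeholder, (secret, hosts) in vault.items():
            if placeholder not in value:
                continue
            if not (trusted and req.host in hosts):
                # Design choice in this sketch: refuse the request outright.
                raise PermissionError(f"{placeholder} may not be sent to {req.host}")
            value = value.replace(placeholder, secret)
        headers[name] = value
    # Return a new request; the sandbox-side object keeps only placeholders.
    return dataclasses.replace(req, headers=headers)
```

Refusing, rather than forwarding the placeholder, makes exfiltration attempts visible; forwarding would also be safe, since the placeholder is worthless outside the proxy.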

Built on libkrun microVMs with hardware isolation (KVM/HVF), Microsandbox achieves sub-200-millisecond startup times and provides programmable networking including DNS inspection, HTTP interception, and domain allowlisting. It is open-source under Apache 2.0 with 5,000+ GitHub stars, but remains self-hosted only and experimental, with a cloud offering listed as “launching soon.”

The approach highlights a gap in every other platform’s architecture. Even Firecracker microVMs with strict isolation give the agent direct access to injected secrets. If the threat model includes the agent itself as a potential adversary — and after the incidents at Ona and the Cline VS Code compromise, that threat model is increasingly mainstream — network-layer credential isolation becomes a meaningful architectural advantage.

What Existing Agents Actually Do Today

The sandbox market exists because agent developers aren’t satisfied with what ships by default. Claude Code relies on Bubblewrap on Linux and Seatbelt on macOS, but sandboxing is off by default. OpenAI Codex uses Landlock and seccomp and is the only major agent with sandboxing enabled by default. Gemini CLI supports Docker or Podman containers but leaves it opt-in.

The gap between “sandbox available” and “sandbox on by default” matters because of approval fatigue. The number one practical problem developers report is clicking “approve” reflexively, which makes permission prompts meaningless without a sandbox as a backstop. Sandboxing addresses both halves of that problem: Bunnyshell’s analysis found that it reduces permission prompts by 84%, and the actions that no longer prompt are contained rather than simply trusted.
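A toy Python model shows why a sandbox cuts prompts: without one, every tool call has real side effects and needs a human; with one, only calls that reach beyond the boundary do. The `ToolCall` type and its `escapes_sandbox` flag are hypothetical (in practice the sandbox itself would have to determine that), and the model says nothing about the size of the 84% figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    command: str
    escapes_sandbox: bool  # hypothetical: touches host files, network, or credentials


def needs_approval(call: ToolCall, sandboxed: bool) -> bool:
    """Without a sandbox every side effect is real, so every call needs a human.
    With one, only calls that reach beyond the sandbox boundary do."""
    return (not sandboxed) or call.escapes_sandbox


def count_prompts(session: list, sandboxed: bool) -> int:
    """Number of approval prompts a user would see over a session."""
    return sum(needs_approval(call, sandboxed) for call in session)
```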

This is why the 13 third-party sandbox platforms exist. They are not competing with each other as much as they are competing with the default: no sandbox at all, or a sandbox that is off.

The Market Shape in March 2026

Walker’s taxonomy splits the 13 platforms across three primary dimensions: ephemeral versus persistent, managed versus self-hosted, and container versus microVM versus isolate.

E2B leads on volume and enterprise adoption. Cloudflare leads on raw speed and density. Fly.io Sprites lead on stateful agent workflows. AIO Sandbox leads on tool consolidation and MCP integration. NVIDIA OpenShell leads on enterprise policy enforcement. Microsandbox leads on credential security. Daytona leads on Computer Use support.

No single platform leads on all of these fronts. The developer building a stateful coding agent has different needs than the enterprise deploying 10,000 ephemeral evaluation pipelines, and both differ from the individual running a personal assistant on a laptop.

Walker predicts convergence: checkpoint/restore becoming universal by 2027, Computer Use by 2028. But the fundamental debate — should an agent’s runtime be destroyed after each task or maintained across sessions — is an architectural choice that implies different product philosophies, pricing models, and security postures. That debate is unlikely to resolve into a single winner.

What is clear is that the infrastructure layer beneath AI agents has become its own market. Thirteen platforms and counting, each funded, staffed, and shipping. The question “where should AI-generated code run?” has a $0 answer (no sandbox at all: just run it on the host) and a growing number of multi-million-dollar answers. The incidents that created demand for sandboxing are not going to stop, and the agents generating the code are only going to get more capable and more autonomous.

The box your agent lives in is now a product category. The fight over who builds it is just starting.
