OpenAI’s Python Agents SDK now ships with Sandbox agents, a container-based execution layer that gives agents isolated Unix-like environments with filesystem access, shell commands, package installation, port management, snapshots, and persistent memory across sessions. The feature, documented in OpenAI’s updated Agents SDK guide, supports eight sandbox providers: E2B, Modal, Docker, Vercel, Cloudflare, Daytona, Runloop, and Blaxel.

The core architectural decision is separating the harness (control plane) from compute (sandbox execution plane). According to OpenAI’s documentation, the harness owns the agent loop, model calls, tool routing, handoffs, approvals, tracing, and recovery. The sandbox handles file reads and writes, command execution, dependency installation, mounted storage, and exposed ports. This separation lets teams keep auth, billing, audit logs, and human review in trusted infrastructure while the sandbox runs model-directed work with narrow credentials.
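The split can be sketched in a few lines. This is an illustrative stand-in, not the SDK's actual classes: the point is that the harness routes model-directed tool calls across a narrow sandbox interface, and nothing in the sandbox can reach back into the control plane.

```python
# Illustrative sketch of the harness/sandbox boundary (names are hypothetical,
# not the SDK's real API). The harness owns routing and approvals; the sandbox
# exposes only file and command operations.
from typing import Protocol


class Sandbox(Protocol):
    def run_command(self, cmd: str) -> str: ...
    def write_file(self, path: str, content: str) -> None: ...
    def read_file(self, path: str) -> str: ...


class FakeSandbox:
    """In-memory stand-in for a provider-backed sandbox session."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def run_command(self, cmd: str) -> str:
        return f"ran: {cmd}"  # a real provider would execute this in isolation

    def write_file(self, path: str, content: str) -> None:
        self.files[path] = content

    def read_file(self, path: str) -> str:
        return self.files[path]


def harness_step(sandbox: Sandbox, tool_call: dict) -> str:
    """Control plane: route one model-directed tool call into the sandbox.
    Model calls, approvals, tracing, and audit logs live here, never inside
    the sandbox."""
    if tool_call["tool"] == "write_file":
        sandbox.write_file(tool_call["path"], tool_call["content"])
        return "ok"
    if tool_call["tool"] == "run":
        return sandbox.run_command(tool_call["cmd"])
    raise ValueError(f"unknown tool: {tool_call['tool']}")


sb = FakeSandbox()
harness_step(sb, {"tool": "write_file", "path": "notes.txt", "content": "hi"})
print(harness_step(sb, {"tool": "run", "cmd": "ls"}))  # → ran: ls
```

Swapping `FakeSandbox` for a provider-backed session changes where the work runs without touching the harness logic, which is the portability the separation is designed to buy.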

How Sandbox Agents Work

A SandboxAgent is still a standard Agent with instructions, tools, guardrails, and hooks. What changes is the execution boundary. The runner prepares the agent against a live sandbox session that owns files, commands, and provider-specific isolation.

Workspace setup uses a Manifest abstraction that defines starting contents: files, Git repos, local directories, cloud storage mounts (S3, GCS, Azure Blob, Cloudflare R2, Box), environment variables, and OS-level users and groups. Manifest paths are workspace-relative and portable across local, Docker, and hosted clients.
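A minimal sketch of that idea, with field names chosen for illustration rather than taken from the SDK, shows why workspace-relative paths make a manifest portable: the same declaration maps onto whatever root a local, Docker, or hosted client provides.

```python
# Hypothetical Manifest-like structure; fields and the bucket name are
# illustrative, not the SDK's real API.
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Manifest:
    files: dict[str, str] = field(default_factory=dict)   # relative path -> contents
    git_repos: list[str] = field(default_factory=list)    # clone URLs
    mounts: dict[str, str] = field(default_factory=dict)  # mount point -> storage URI
    env: dict[str, str] = field(default_factory=dict)

    def resolve(self, workspace_root: str) -> dict[str, str]:
        """Map workspace-relative file paths onto a concrete root. Because the
        manifest never hard-codes the root, it stays portable across clients."""
        return {
            str(PurePosixPath(workspace_root) / path): contents
            for path, contents in self.files.items()
        }


m = Manifest(
    files={"src/app.py": "print('hello')"},
    mounts={"data": "s3://example-bucket"},  # example URI
    env={"PIP_INDEX_URL": "https://pypi.org/simple"},
)
print(m.resolve("/workspace"))  # {'/workspace/src/app.py': "print('hello')"}
```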

Persistent workspaces survive container restarts. The SDK checkpoints state via snapshots and restores it in new containers, enabling long-running workflows like data analysis pipelines, CI/CD builds, and multi-session code refactoring to resume after failures.
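The mechanism behind checkpoint-and-restore can be illustrated with a plain tar archive. The SDK's actual snapshots are provider-managed; this sketch only shows the shape of the operation: capture the workspace, lose the container, rehydrate the state elsewhere.

```python
# Sketch of checkpoint/restore using a tar archive (illustrative only; the
# SDK's snapshots are handled by the sandbox provider).
import tarfile
import tempfile
from pathlib import Path


def snapshot(workspace: Path, archive: Path) -> None:
    """Capture the entire workspace into one archive."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(workspace, arcname=".")


def restore(archive: Path, new_workspace: Path) -> None:
    """Rehydrate a snapshot into a fresh directory (a new container's root)."""
    new_workspace.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(new_workspace)


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp) / "ws"
    ws.mkdir()
    (ws / "state.txt").write_text("step 42 complete")

    snap = Path(tmp) / "snap.tar.gz"
    snapshot(ws, snap)        # container dies here...

    ws2 = Path(tmp) / "ws2"
    restore(snap, ws2)        # ...a new container resumes from the snapshot
    print((ws2 / "state.txt").read_text())  # → step 42 complete
```

This is what lets a multi-hour refactoring or CI job treat a container crash as a pause rather than a restart from zero.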

Provider Trade-offs

The eight providers offer substantially different isolation models and performance. According to analysis from ByteIota, E2B uses microVMs with 90-150ms cold starts and 24-hour sessions. Blaxel leads at 25ms cold starts. Modal provides gVisor isolation with GPU access (A100/H100) but slower runtime assembly. Local Docker runs fast but shares the host kernel, making it unsuitable for untrusted code execution.

The provider choice determines security boundaries. MicroVM-based providers (E2B, Daytona) offer hardware-level isolation. gVisor providers (Modal) use kernel-level sandboxing. Docker provides process-level isolation only. For teams deploying agents that execute arbitrary model-generated code, the isolation model is a security architecture decision, not a convenience preference.
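The selection logic implied above can be made explicit. The isolation categories come from the article's comparison (Blaxel's model is not stated there, so it is omitted); the helper itself is an illustrative sketch, not part of the SDK.

```python
# Provider isolation levels as described in the article; the ranking and
# helper are illustrative, not an SDK feature.
ISOLATION_RANK = {"process": 0, "kernel": 1, "hardware": 2}

PROVIDERS = {
    "E2B": "hardware",      # microVM
    "Daytona": "hardware",  # microVM
    "Modal": "kernel",      # gVisor
    "Docker": "process",    # shares the host kernel
}


def safe_for_untrusted(min_level: str = "kernel") -> list[str]:
    """Providers whose isolation meets the floor for running arbitrary
    model-generated code."""
    floor = ISOLATION_RANK[min_level]
    return sorted(
        name for name, level in PROVIDERS.items()
        if ISOLATION_RANK[level] >= floor
    )


print(safe_for_untrusted())            # ['Daytona', 'E2B', 'Modal']
print(safe_for_untrusted("hardware"))  # ['Daytona', 'E2B']
```

The takeaway is that "which provider" is a policy input, not a tuning knob: Docker drops out of the candidate set entirely once untrusted code is in scope.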

Credential Isolation

OpenAI’s documentation explicitly treats sandbox credentials as runtime configuration, not prompt content. Agent API keys and control plane secrets stay in the harness. The sandbox receives only the narrow credentials needed for its specific mounted storage or package managers. This prevents a common attack vector where injected code in the execution environment attempts to exfiltrate control plane credentials.
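The credential split reduces to an allowlist: copy only named, task-scoped variables from the harness environment into the sandbox's runtime configuration. The variable names below are illustrative.

```python
# Sketch of credential isolation: the harness process holds control plane
# secrets; only an explicit allowlist reaches the sandbox. Names are
# illustrative, not real credentials or SDK configuration keys.
SANDBOX_ALLOWLIST = {"S3_READ_TOKEN", "PIP_INDEX_URL"}  # narrow, task-scoped


def sandbox_env(harness_env: dict[str, str]) -> dict[str, str]:
    """Build the sandbox's runtime environment. The agent's API key and other
    control plane secrets are never copied, so code injected into the sandbox
    has nothing of value to exfiltrate."""
    return {k: v for k, v in harness_env.items() if k in SANDBOX_ALLOWLIST}


harness = {
    "OPENAI_API_KEY": "sk-control-plane",  # stays in the harness
    "S3_READ_TOKEN": "scoped-read-only",   # mounted storage credential
    "PIP_INDEX_URL": "https://pypi.org/simple",
}
print(sandbox_env(harness))
# {'S3_READ_TOKEN': 'scoped-read-only', 'PIP_INDEX_URL': 'https://pypi.org/simple'}
```

Treating this as runtime configuration rather than prompt content also means the model never sees the secrets at all, so prompt injection cannot simply ask for them.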

The Production Deployment Signal

Sandbox agents address the gap between chat-based agents and autonomous coding agents. As ByteIota notes, most production agents today remain chat-based because giving agents code execution capabilities was too risky without proper isolation. Container-based sandboxing with credential separation makes autonomous coding agents, data analysis pipelines, and infrastructure automation workflows viable for enterprise deployment.

The feature is currently Python-only. OpenAI’s GitHub repository lists the JavaScript/TypeScript SDK separately, and sandbox support has not yet been announced for that platform.