OpenAI published Symphony, an open-source specification released under the Apache 2.0 license that turns issue trackers into control planes for autonomous Codex coding agents. The system assigns each open task to an agent, runs agents continuously in isolated workspaces, and prepares changes for human review. Internal teams reported a 500% increase in landed pull requests in the first three weeks.
From Prompting to Work-Pulling
Symphony emerged from a bottleneck OpenAI hit as engineers scaled up Codex usage. According to the company’s blog post, engineers could manage three to five simultaneous Codex sessions before context switching degraded productivity. “We had effectively built a team of extremely capable junior engineers, then assigned our human engineers to micromanaging them,” OpenAI wrote.
The fix was inverting the model. Instead of humans supervising coding sessions, agents pull work from a Linear-compatible task queue. Symphony monitors issue states, spawns agents for unblocked tasks, manages per-issue workspaces, watches CI, rebases changes, resolves conflicts, and shepherds pull requests through the merge pipeline.
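The pull-based model described above can be sketched in a few lines. This is an illustration only, assuming a hypothetical queue interface (`TaskQueue`, `dispatch_ready`, `spawn_agent`); it is not Symphony's actual API.

```python
class TaskQueue:
    """Toy Linear-style issue queue; the interface is an illustration,
    not Symphony's actual API."""

    def __init__(self, blockers):
        # blockers: {issue_id: set of issue_ids it depends on}
        self.blockers = {k: set(v) for k, v in blockers.items()}
        self.done = set()
        self.in_progress = set()

    def unblocked(self):
        # An issue is ready when every one of its blockers has completed.
        return [i for i, deps in self.blockers.items()
                if i not in self.done
                and i not in self.in_progress
                and deps <= self.done]

    def start(self, issue):
        self.in_progress.add(issue)

    def complete(self, issue):
        self.in_progress.discard(issue)
        self.done.add(issue)


def dispatch_ready(queue, spawn_agent):
    """One scheduling pass: hand every currently unblocked issue to a
    fresh agent. Symphony's orchestrator would repeat passes like this
    continuously, while also watching CI and rebasing open PRs."""
    started = []
    for issue in queue.unblocked():
        queue.start(issue)
        spawn_agent(issue)  # e.g. launch an agent in an isolated workspace
        started.append(issue)
    return started
```

A driver loop would call `dispatch_ready` repeatedly, marking issues complete as their pull requests merge, so newly unblocked work gets picked up on the next pass.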
Task Decomposition and Agent-Created Work
The spec supports hierarchical task graphs. An agent can analyze a codebase and generate an implementation plan, then break that plan into a directed acyclic graph of tasks with dependencies. Agents only start work on tasks whose blockers have cleared. During implementation, agents can also file follow-up issues they discover, creating work that other agents (or humans) can evaluate and schedule later.
OpenAI described a concrete example: when they marked a React upgrade as blocked on a migration to Vite, agents automatically started the React work only after the Vite migration completed.
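The blocking relationship in that example maps directly onto a dependency graph. A minimal sketch using Python's standard-library `graphlib` (the issue IDs are invented for illustration; Symphony's internal representation is not specified here):

```python
from graphlib import TopologicalSorter

# Hypothetical issue IDs modeling the example above: the React upgrade
# is declared blocked on the Vite migration.
graph = {
    "react-upgrade": {"vite-migration"},  # key depends on its values
    "vite-migration": set(),
}

ts = TopologicalSorter(graph)
ts.prepare()

ready = list(ts.get_ready())       # only the Vite migration is unblocked
ts.done("vite-migration")          # the migration's PR lands
ready_next = list(ts.get_ready())  # now the React upgrade can start
```

The same mechanism generalizes to any directed acyclic graph of tasks: each time an agent finishes an issue, the scheduler marks it done and the next batch of unblocked work becomes available.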
Proof of Work and Quality Signals
Symphony agents deliver review packets that include CI status, PR review feedback, complexity analysis, and video walkthroughs of the feature working in the product. According to the GitHub repository, the system is designed so engineers “do not need to supervise Codex; they can manage the work at a higher level.”
Industry Analysis
Analysts flagged both opportunity and risk. Sanchit Vir Gogia, chief analyst at Greyhound Research, told InfoWorld that Symphony “begins to resemble a lightweight operating system for software delivery.” Biswajeet Mahapatra, principal analyst at Forrester, warned that enterprises should track quality metrics beyond PR volume: “lead time to usable functionality, defect escape rates, rework and code churn, production stability, and perceived developer flow.”
Gogia cautioned against treating higher pull request volumes as proof of productivity: “Generation scales effortlessly, validation does not.”
OpenAI’s Own Caveats
OpenAI acknowledged the tradeoffs. Losing the ability to nudge agents mid-flight means some tasks miss the mark entirely. The company added guardrails over time, including end-to-end tests, Chrome DevTools integration, and QA smoke tests. Not every task fits the orchestration model: “ambiguous problems or work requiring strong judgment” still need interactive sessions.
The spec is labeled as a “low-key engineering preview for testing in trusted environments,” and OpenAI recommends it for codebases that have already adopted its harness engineering methodology. The reference implementation ships in Elixir, though the spec is designed to be implemented in any language.