OpenAI released GPT-5.5 on April 23, 2026, a frontier model built to complete multi-step work autonomously rather than answer one question at a time. The model is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with API access following on April 24. A higher-tier GPT-5.5 Pro variant is available to Pro, Business, and Enterprise subscribers.
Benchmark Numbers
GPT-5.5 posts 82.7% on Terminal-Bench 2.0, which tests complex command-line workflows requiring planning, iteration, and tool coordination. That is up from 75.1% for GPT-5.4 and ahead of Claude Opus 4.7 at 69.4% and Gemini 3.1 Pro at 68.5%, according to OpenAI.
On SWE-Bench Pro, which evaluates real-world GitHub issue resolution, the model reaches 58.6%. On OpenAI’s internal Expert-SWE eval for long-horizon coding tasks with a median estimated human completion time of 20 hours, GPT-5.5 outperforms GPT-5.4 at 73.1% versus 68.5%.
OpenAI says GPT-5.5 matches GPT-5.4's per-token latency in production serving while using significantly fewer tokens to complete equivalent Codex tasks. On the Artificial Analysis Coding Index, OpenAI claims the model delivers frontier intelligence at half the cost of competing models.
From Conversations to Execution
Co-founder and president Greg Brockman told journalists the release represents “a big advancement towards more agentic and intuitive computing,” calling GPT-5.5 “a faster, sharper thinker for fewer tokens” compared to GPT-5.4, according to TechCrunch.
Brockman framed GPT-5.5 as another step toward OpenAI’s planned “super app” combining ChatGPT, Codex, and an AI browser into a single enterprise service. Chief scientist Jakub Pachocki added that he expects significant improvements to continue: “In fact, I would say, like, I think the last two years have been surprisingly slow.”
The model is designed to handle loosely defined, multi-part tasks: planning workflows, using tools, checking its own output, and continuing through ambiguity without requiring constant user input, according to OpenAI’s announcement.
Early Enterprise Usage
OpenAI says more than 85% of the company already uses Codex weekly across engineering, finance, communications, marketing, and product management. The finance team used Codex to review 24,771 K-1 tax forms totaling 71,637 pages, finishing the work two weeks faster than projected. A go-to-market employee automated weekly business reports, saving 5 to 10 hours per week.
Michael Truell, co-founder and CEO of Cursor, said GPT-5.5 is “noticeably smarter and more persistent than GPT-5.4, with stronger coding performance and more reliable tool use. It stays on task for significantly longer without stopping early,” per OpenAI. An NVIDIA engineer with early access said losing access to the model “feels like I’ve had a limb amputated.”
Three Model Tiers
GPT-5.5 ships in three configurations: the base model for ChatGPT and Codex users, GPT-5.5 Pro with enhanced capabilities for higher-tier subscribers, and GPT-5.5 Thinking optimized for reasoning-heavy tasks. The release follows GPT-5.4 by roughly one month, continuing OpenAI’s accelerating release cadence.
The Competitive Axis Shift
GPT-5.5 confirms the frontier-model race has moved from single-turn quality to sustained autonomous execution. The benchmarks OpenAI chose to highlight tell the story: Terminal-Bench 2.0 (multi-step CLI workflows), SWE-Bench Pro (end-to-end issue resolution), and Expert-SWE (20-hour coding tasks). All three measure whether a model can work through a complex problem over time, not whether it can answer a question well once.
For teams building on agent frameworks, the practical question is whether fewer tokens per task and matched latency translate to lower operating costs at scale. OpenAI’s claim of half-cost frontier coding performance, if it holds under independent testing, would compress the economics of deploying autonomous coding agents in production.
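As a back-of-the-envelope check on that economics question, the sketch below shows how per-task token efficiency compounds across a fleet of autonomous agent runs. Every figure in it is a hypothetical assumption for illustration, not published OpenAI pricing or usage data.

```python
# Hypothetical back-of-the-envelope: how fewer tokens per task compound
# across many agent runs at a fixed per-token price. All numbers below
# are illustrative assumptions, not OpenAI's figures.

def monthly_cost(runs_per_month: int, tokens_per_run: int,
                 dollars_per_million_tokens: float) -> float:
    """Total monthly spend for a fleet of autonomous agent runs."""
    total_tokens = runs_per_month * tokens_per_run
    return total_tokens / 1_000_000 * dollars_per_million_tokens

PRICE = 10.0  # assumed $/M tokens, held identical for both models

baseline = monthly_cost(10_000, 40_000, PRICE)   # chattier older model
efficient = monthly_cost(10_000, 25_000, PRICE)  # fewer tokens per task

print(f"baseline:  ${baseline:,.0f}/month")   # baseline:  $4,000/month
print(f"efficient: ${efficient:,.0f}/month")  # efficient: $2,500/month
```

At identical prices and latency, the savings come entirely from the token-efficiency lever; if the half-cost pricing claim also holds, the two effects multiply.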