Anthropic on April 23 published a technical post-mortem identifying three separate product-layer changes that degraded Claude Code, the Claude Agent SDK, and Claude Cowork over the past month. The underlying models were not affected. All three issues have been resolved as of April 20, and the company is resetting usage limits for all Pro, Max, and Business+ subscribers as compensation.
“We take reports about degradation very seriously,” Anthropic wrote in the post-mortem. “We never intentionally degrade our models, and we were able to immediately confirm that our API and inference layer were unaffected.”
Three Changes, Three Breakages
The first issue dates to March 4, when Anthropic changed Claude Code’s default reasoning effort from high to medium. The intent was to reduce latency for users experiencing UI freezes during long reasoning sessions. The tradeoff backfired: users preferred higher intelligence and were willing to wait. Anthropic reverted to high on April 7, according to the post-mortem. The latest Claude Code build (v2.1.118) now defaults to “xhigh” on Sonnet 4.6, The Register reported.
The second issue was a caching bug introduced on March 26. Engineers shipped a change meant to clear old reasoning traces from sessions idle for over an hour, reducing the cost of resuming stale sessions. A bug caused the clearing to happen on every turn for the rest of the session. Claude progressively lost memory of why it had chosen its approach, surfacing as forgetfulness, repetition, and erratic tool choices. The bug also caused cache misses on subsequent requests, which Anthropic believes explains reports of usage limits draining faster than expected. The fix shipped April 10 for Sonnet 4.6 and Opus 4.6, per the post-mortem.
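The shape of that bug can be sketched in a few lines of Python. This is a hypothetical reconstruction, not Anthropic’s code: the names (`Session`, `handle_turn`, `IDLE_LIMIT`) are invented for illustration. One plausible way such a bug arises is that the code refreshing a session’s idle timestamp gets skipped, so a session that looks stale once looks stale forever, and the trace-clearing intended as a one-time cleanup fires on every turn:

```python
from dataclasses import dataclass, field

IDLE_LIMIT = 3600.0  # one hour, in seconds (per the post-mortem's description)

@dataclass
class Session:
    last_activity: float                         # timestamp of the last turn
    traces: list = field(default_factory=list)   # accumulated reasoning traces
    clears: int = 0                              # how many times traces were dropped

def handle_turn(session: Session, now: float, buggy: bool = False) -> None:
    """Process one conversation turn at timestamp `now` (seconds)."""
    if now - session.last_activity > IDLE_LIMIT:
        # Intended behavior: drop stale reasoning traces once, on resume.
        session.traces.clear()
        session.clears += 1
    session.traces.append(f"trace@{now}")
    if not buggy:
        # Correct behavior: every turn refreshes the idle clock.
        session.last_activity = now
    # The buggy variant skips the refresh, so the staleness condition
    # stays true and traces are cleared on every subsequent turn.
```

In this sketch, a session resumed after an hour idle clears its traces exactly once on the fixed path, but on the buggy path it ends every turn with only the trace it just wrote, matching the reported symptoms of forgetfulness and repetition.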
The third issue arrived on April 16, when Anthropic revised its system prompt to reduce verbosity. Internal testing over several weeks suggested the change was safe, but after shipping it alongside the Opus 4.7 release, ablation tests revealed a three percent performance drop for both Opus 4.6 and 4.7. The prompt change was reverted on April 20, according to The Register.
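The ablation methodology that caught the regression can be illustrated with a toy harness. Everything below is invented for illustration; the deterministic `grader` is a stand-in for a real eval, contrived to mirror a three-point drop. The point is that the identical task suite runs under both prompts, so any score delta is attributable to the prompt change alone:

```python
def pass_rate(grader, tasks, system_prompt):
    """Fraction of eval tasks passed under a given system prompt."""
    return sum(grader(task, system_prompt) for task in tasks) / len(tasks)

def ablation_delta(grader, tasks, baseline_prompt, revised_prompt):
    """Score change attributable to the prompt edit alone."""
    return (pass_rate(grader, tasks, revised_prompt)
            - pass_rate(grader, tasks, baseline_prompt))

# Contrived stand-in grader: passes 97 of 100 tasks when a reasoning
# instruction is present, 94 otherwise (chosen to mirror the reported drop).
tasks = range(100)
grader = lambda task, prompt: task < (97 if "reasoning" in prompt else 94)
delta = ablation_delta(grader, tasks,
                       baseline_prompt="Be helpful. Explain your reasoning.",
                       revised_prompt="Be helpful. Be brief.")
```

A negative `delta` flags the revision as a regression; the lesson from the incident is that this check ran only after the change had shipped.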
The Compound Effect
Because each change affected different traffic slices on different schedules, the combined result looked like broad, inconsistent degradation, which made the problem hard to reproduce: Anthropic said neither internal usage nor its evals initially surfaced the issues users identified, according to the post-mortem.
The controversy had been building since early April. Stella Laurenzo, a Senior Director in AMD’s AI group, published an audit of 6,852 Claude Code session files and over 234,000 tool calls on GitHub, documenting declining reasoning depth, according to VentureBeat. Third-party benchmark firm BridgeMind reported Claude Opus 4.6’s accuracy dropping from 83.3% to 68.3%, though some researchers disputed the testing methodology, VentureBeat reported.
Users on GitHub, Reddit, and X had speculated that Anthropic was intentionally “nerfing” Claude to manage demand. The company flatly denied this, per Business Insider.
Remediation
Beyond the technical fixes, Anthropic committed to several process changes: having a larger share of staff use the public build of Claude Code rather than internal versions, improving its Code Review tool, evaluating system prompt changes more rigorously before shipping, and launching a new @ClaudeDevs account on X to explain product decisions, according to The Register.
The post-mortem also noted that when Anthropic back-tested its own Code Review tool against the offending pull requests using Opus 4.7, the tool flagged the caching bug, suggesting its internal tooling could have caught the issue earlier if applied consistently.
The Reliability Question
The episode underscores a tension facing every team building on agentic AI frameworks. Claude Code, the Claude Agent SDK, and Claude Cowork are increasingly production components in developer and enterprise workflows. When three product-layer changes can compound into weeks of perceived model regression, the debugging surface area extends well beyond model weights into harness configuration, caching behavior, and system prompts.
For teams running Claude agents in production, the practical takeaway is that “the model got worse” and “the model’s wrapper got worse” can be indistinguishable from the user’s perspective. Anthropic’s transparent post-mortem sets a precedent for how agent platform providers handle reliability incidents, but it also illustrates how much of agent quality lives in the infrastructure layer, not the model itself.