Claude Code's Hidden Token Tax: Developers Document Invisible 20K Token Injection and Silent Cache TTL Downgrade

Two separate but linked Anthropic developer issues topped Hacker News on April 12, drawing nearly 1,200 combined points and roughly 1,000 comments, and raising pointed questions about trust and cost predictability for anyone building on Claude Code.

The Invisible Token Injection

The larger of the two issues, a GitHub report that drew 679 Hacker News points and 599 comments, documents that Claude Code v2.1.100 and later silently injects approximately 20,000 server-side tokens into every API request. These tokens never appear in the context window users can see, but they count against usage quotas.

The practical effect: Pro Max subscribers paying for “5x the usage” are exhausting their quotas in as little as 90 minutes of moderate use. One developer documented the discrepancy by comparing versions directly. Claude Code v2.1.98 shows consistent token counts between what users see and what gets billed. Version 2.1.100 and later show inflated actual usage versus displayed counts, with no explanation for the gap.

The issue author collected granular data from session JSONL files. In a 1.5-hour window of moderate usage (mostly Q&A, light development), the account consumed 103.9 million raw tokens across all active sessions. That burn rate is consistent with cache-read tokens counting against quota at the full rate rather than the expected reduced rate. Had cache reads counted at the documented 1/10 rate, the same window would have consumed only 13.1 million effective tokens.
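The two figures above are enough to back out roughly how much of that window was cache reads. The sketch below is a back-of-envelope check, not the issue author's method. It assumes only two buckets exist (cache reads weighted at 1/10, everything else at full rate), and the function and variable names are made up for illustration.

```python
# Back-of-envelope check on the quoted figures (all values in millions of tokens).
RAW_TOKENS_M = 103.9        # raw total reported in the issue
EFFECTIVE_TOKENS_M = 13.1   # total if cache reads counted at the 1/10 rate
CACHE_READ_WEIGHT = 0.1     # reduced rate cited in the issue


def implied_cache_read_tokens(raw, effective, weight):
    """Solve raw = c + o and effective = weight * c + o for c.

    c = cache-read tokens, o = all other tokens.
    Assumption: these are the only two buckets, and o counts at full rate.
    """
    return (raw - effective) / (1.0 - weight)


cache_read_m = implied_cache_read_tokens(
    RAW_TOKENS_M, EFFECTIVE_TOKENS_M, CACHE_READ_WEIGHT
)
other_m = RAW_TOKENS_M - cache_read_m
share = cache_read_m / RAW_TOKENS_M

print(f"implied cache reads: {cache_read_m:.1f}M")
print(f"implied other tokens: {other_m:.1f}M")
print(f"cache-read share of raw total: {share:.1%}")
```

Under that two-bucket assumption, about 100.9 million of the 103.9 million raw tokens would be cache reads, which is why the weighting applied to them dominates the quota math.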

Background sessions compound the problem. Claude Code sessions left open in other terminals continue making API calls for compacts, retros, and hook processing, even without active user interaction. In the documented case, two background sessions consumed 78% of the post-reset quota.

The Cache TTL Downgrade

The second issue, filed separately and drawing 518 Hacker News points and 397 comments, presents a forensic analysis of 119,866 API calls made between January 11 and April 11, 2026. The data shows Anthropic silently changed the default prompt cache TTL from one hour to five minutes around March 6.

The evidence is precise. From February 1 through March 5, every API call across two independent machines on separate accounts shows zero 5-minute cache tokens and consistent 1-hour cache tokens. On March 6, 5-minute tokens reappear. By March 8, 5-minute tokens outnumber 1-hour tokens 5:1.
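A per-day tally of this kind is simple to reproduce. The sketch below is illustrative only: it assumes each logged call is a dict with an ISO `timestamp` and a `usage.cache_creation` object carrying `ephemeral_5m_input_tokens` and `ephemeral_1h_input_tokens` counters. The real session JSONL layout may differ, and the sample records are synthetic placeholders, not data from the issue.

```python
from collections import defaultdict


def daily_ttl_mix(records):
    """Sum cache-creation tokens per day, split by TTL tier.

    Assumed record shape (hypothetical; adjust to the actual log format):
      {"timestamp": "YYYY-MM-DDTHH:MM:SSZ",
       "usage": {"cache_creation": {"ephemeral_5m_input_tokens": int,
                                    "ephemeral_1h_input_tokens": int}}}
    """
    totals = defaultdict(lambda: {"5m": 0, "1h": 0})
    for rec in records:
        day = rec["timestamp"][:10]  # date part of the ISO timestamp
        creation = rec.get("usage", {}).get("cache_creation", {})
        totals[day]["5m"] += creation.get("ephemeral_5m_input_tokens", 0)
        totals[day]["1h"] += creation.get("ephemeral_1h_input_tokens", 0)
    return dict(totals)


# Synthetic placeholder records, for illustration only.
sample = [
    {"timestamp": "2000-01-01T09:00:00Z",
     "usage": {"cache_creation": {"ephemeral_1h_input_tokens": 1200}}},
    {"timestamp": "2000-01-02T09:00:00Z",
     "usage": {"cache_creation": {"ephemeral_5m_input_tokens": 5000,
                                  "ephemeral_1h_input_tokens": 1000}}},
]

mix = daily_ttl_mix(sample)
for day in sorted(mix):
    print(day, mix[day])
```

A day on which the `5m` column jumps from zero to a majority share is the signature the analysis describes.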

The cost impact is quantifiable. Across the full dataset, the TTL downgrade produced a 17.1% overpayment on cache creation costs. For the Sonnet model tier, that translates to $949 in excess costs over three months. For Opus, $1,582. February, the only full month on the 1-hour TTL, shows just 1.1% waste.

The mechanism is straightforward: with a 5-minute TTL, any pause in a session longer than five minutes expires the entire cached context. The next turn then forces a full cache re-creation at the write rate, which for Sonnet is 12.5 times the read rate, instead of a cheap cache read. In long coding sessions the penalty grows with context size, because every expiry re-bills the whole accumulated context.
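A rough sketch of that penalty for a single turn. The multipliers are assumptions chosen to match the 12.5x ratio quoted above (cache write at 1.25x base input for the 5-minute tier, cache read at 0.1x). The 2x write multiplier for the 1-hour tier, the $3 per million base price, the 200,000-token context, and the function name are likewise illustrative assumptions, not figures from the issue.

```python
# Assumed pricing, expressed as multipliers on the base input price.
BASE_INPUT_USD_PER_MTOK = 3.00  # assumption: Sonnet-tier base input price
READ_MULT = 0.10                # assumption: cache read
WRITE_MULT_5M = 1.25            # assumption: cache write, 5-minute TTL
WRITE_MULT_1H = 2.00            # assumption: cache write, 1-hour TTL


def next_turn_context_cost(context_tokens, pause_seconds, ttl_seconds, write_mult):
    """Cost in USD of re-sending the cached context after a pause.

    If the pause is shorter than the TTL the context is read from cache;
    otherwise the whole context is written to the cache again.
    Simplification: ignores the new tokens added by the turn itself.
    """
    expired = pause_seconds >= ttl_seconds
    mult = write_mult if expired else READ_MULT
    return context_tokens / 1_000_000 * BASE_INPUT_USD_PER_MTOK * mult


context = 200_000  # illustrative context size
pause = 10 * 60    # a ten-minute break

five_min = next_turn_context_cost(context, pause, 5 * 60, WRITE_MULT_5M)
one_hour = next_turn_context_cost(context, pause, 60 * 60, WRITE_MULT_1H)

print(f"5-minute TTL: ${five_min:.2f}")
print(f"1-hour TTL:   ${one_hour:.2f}")
print(f"ratio: {five_min / one_hour:.1f}x")
```

With these assumed prices, a ten-minute break on a 200,000-token context costs about $0.75 to resume under the 5-minute TTL, against about $0.06 under the 1-hour TTL. The absolute gap scales linearly with context size.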

No Official Response

Anthropic has not publicly responded to either issue as of publication time. The suggested developer workaround for the token injection problem is downgrading to Claude Code v2.1.98. No workaround exists for the cache TTL change, which is controlled server-side.

The Competitive Context

The timing adds pressure. OpenAI launched its Codex coding agent with aggressive pricing earlier this month. Forbes reported on April 10 on the broader pattern of developers burning through AI token budgets faster than expected. Multiple Hacker News commenters noted that unpredictable quota exhaustion makes it difficult to budget Claude Code into production workflows, and several reported evaluating Codex as an alternative.

The Cost Predictability Problem

For any team that has built Claude Code into its development pipeline, these issues strike at the economics of the tool. Silent changes to cache behavior and invisible token injections mean developers cannot reliably predict or budget their costs. The cache TTL downgrade in particular affected every Claude Code user for over five weeks before anyone documented it publicly. The 1-hour TTL appears to have been the intended default, held consistently for 33 days across multiple accounts, then quietly reverted with no changelog entry.
