Anthropic accidentally shipped a source map (.map) file inside Claude Code version 2.1.88 on npm on March 31, exposing approximately 512,000 lines of readable TypeScript source code for its CLI coding agent. Security researcher Chaofan Shou flagged the leak on X, where the post has since reached over 28.8 million views, according to The Hacker News. Anthropic confirmed the incident to CNBC, calling it “a release packaging issue caused by human error, not a security breach.”
The npm package has been pulled, but the code was already widely mirrored. A public GitHub repository containing the source has surpassed 84,000 stars and 82,000 forks, as The Hacker News reported.
What the Code Reveals
Developer Alex Kim published a detailed analysis of the leaked source on March 31, identifying several internal mechanisms that Anthropic had not publicly disclosed.
Anti-distillation traps. When a flag called ANTI_DISTILLATION_CC is enabled, Claude Code injects fake tool definitions into API requests, according to Kim’s analysis of the source file claude.ts. The purpose: if a competitor records Claude Code’s API traffic to train a rival model, the fake tools pollute that training data. A second mechanism summarizes the assistant’s reasoning between tool calls and replaces it with a cryptographic signature, so recorded traffic only captures summaries rather than full reasoning chains.
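The fake-tool mechanism can be sketched roughly as follows. This is a hypothetical reconstruction, not the shipped code: the `ANTI_DISTILLATION_CC` flag name comes from Kim’s analysis, but the decoy tool names, types, and `buildToolList` helper are all invented for illustration.

```typescript
// Hypothetical sketch of anti-distillation tool injection. Only the
// ANTI_DISTILLATION_CC flag name is reported; everything else here
// (tool shapes, decoy names, helper) is illustrative.
interface ToolDef {
  name: string;
  description: string;
}

// Decoy tools that exist only to pollute any recorded API traffic.
const DECOY_TOOLS: ToolDef[] = [
  { name: "sys_trace", description: "Decoy; never invoked by the model." },
  { name: "mem_inspect", description: "Decoy; poisons scraped training data." },
];

function buildToolList(
  realTools: ToolDef[],
  flags: Record<string, boolean>,
): ToolDef[] {
  // When the flag is set, mix decoys in with real tools so a competitor
  // recording the traffic cannot recover the true tool surface.
  if (flags["ANTI_DISTILLATION_CC"]) {
    return [...realTools, ...DECOY_TOOLS];
  }
  return realTools;
}
```

The point of the design is that the decoys are indistinguishable from real tools to anyone observing only the wire format.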
Kim noted the workarounds are straightforward — a proxy stripping the anti_distillation field from request bodies would bypass the fake tools entirely, and an environment variable (CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS) disables the whole feature. “The real protection is probably legal, not technical,” Kim wrote.
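The proxy-based bypass Kim described amounts to deleting one field before forwarding the request. A minimal sketch, assuming the field is a top-level key named `anti_distillation` as Kim’s write-up indicates; the function itself is illustrative:

```typescript
// Hypothetical sketch of the bypass Kim described: a middlebox that
// drops the anti_distillation field from a request body before it is
// forwarded upstream. The field name comes from the write-up; the
// function is illustrative.
function stripAntiDistillation(
  body: Record<string, unknown>,
): Record<string, unknown> {
  const rest = { ...body };
  delete rest["anti_distillation"];
  return rest;
}
```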
Undercover mode. A file called undercover.ts implements a mode that strips all traces of Anthropic internals — codenames like “Capybara” and “Tengu,” internal Slack channels, repo names, and the phrase “Claude Code” itself — when the tool is used in non-internal repositories, Kim reported. The mode has no force-off switch. As Kim put it: “This means AI-authored commits and PRs from Anthropic employees in open-source projects will have no indication that an AI wrote them.”
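Conceptually, such a sanitizer is a table of rewrite rules applied to outbound text. In this sketch, the internal codenames (“Capybara,” “Tengu”) and the phrase “Claude Code” come from Kim’s report, but the replacement strings and function shape are invented; the real `undercover.ts` implementation is unknown beyond Kim’s description.

```typescript
// Hypothetical sketch of an undercover-mode sanitizer. The terms being
// stripped are reported; the rewrite targets are illustrative.
const REWRITES: Array<[RegExp, string]> = [
  [/\bCapybara\b/g, "the project"],
  [/\bTengu\b/g, "the tool"],
  [/\bClaude Code\b/g, "the CLI"],
];

function undercover(text: string): string {
  // Apply each rewrite in order so no internal term survives in text
  // destined for a non-internal repository (commit messages, PRs, etc.).
  return REWRITES.reduce((t, [re, sub]) => t.replace(re, sub), text);
}
```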
Frustration regex. The file userPromptKeywords.ts contains a regex pattern detecting profanity and frustration indicators — “wtf,” “ffs,” “this sucks,” “fucking broken,” among others, per Kim’s analysis. Kim noted the apparent irony of an LLM company using regex for sentiment detection, but the choice makes sense: a regex is faster and cheaper than an inference call just to check whether someone is swearing at the tool.
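An illustrative reconstruction using the phrases Kim listed; the actual pattern shipped in userPromptKeywords.ts is longer and its exact form is unknown:

```typescript
// Illustrative frustration detector built from the reported phrases.
// The real pattern in userPromptKeywords.ts is not public in full.
const FRUSTRATION_RE = /\b(wtf|ffs|this sucks|fucking broken)\b/i;

function isFrustrated(prompt: string): boolean {
  // A single regex test: no model call, no latency, near-zero cost.
  return FRUSTRATION_RE.test(prompt);
}
```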
KAIROS background agent. The Hacker News reported that the code reveals a feature called KAIROS allowing Claude Code to operate as a persistent background agent — periodically fixing errors, running tasks without human input, and sending push notifications to users. A complementary “dream” mode would let Claude continuously think in the background to develop and iterate on ideas.
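In outline, a persistent background agent of the kind described is a periodic loop: run a task, notify the user if it produced something. This sketch is entirely hypothetical — the KAIROS internals are not public — and the `tick`/`startBackgroundAgent` split, interval, and notification callback are all invented:

```typescript
// Hypothetical sketch of a KAIROS-style background loop; all names and
// structure here are invented for illustration.
type Task = () => Promise<string | null>;

async function tick(task: Task, notify: (msg: string) => void): Promise<void> {
  // Run one background pass, e.g. scan for errors and attempt a fix;
  // push a notification only when the task produced a result.
  const result = await task();
  if (result !== null) notify(result);
}

function startBackgroundAgent(
  task: Task,
  notify: (msg: string) => void,
  intervalMs: number,
): () => void {
  const timer = setInterval(() => {
    void tick(task, notify);
  }, intervalMs);
  // Return a stop function so the caller can shut the agent down.
  return () => clearInterval(timer);
}
```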
Native client attestation. API requests include a placeholder (cch=00000) that gets overwritten by a hash computed in Bun’s native HTTP stack (written in Zig) before the request leaves the process, according to Kim’s analysis of system.ts. The server validates the hash to confirm the request originated from a genuine Claude Code binary. Kim connected this to Anthropic’s recent legal threats against OpenCode, which was forced to remove built-in Claude authentication after third-party tools used Claude Code’s internal APIs to access Opus at subscription rates.
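The placeholder-rewrite step can be illustrated in a few lines. Note the caveats: the `cch=00000` placeholder is reported by Kim, but the hash input, the algorithm, and the secret used here are assumptions — and the real computation happens inside Bun’s native Zig HTTP stack, not in TypeScript at all:

```typescript
// Hypothetical sketch of placeholder-based client attestation. Only the
// "cch=00000" placeholder is reported; the SHA-256 choice, secret, and
// splice logic are invented for illustration.
import { createHash } from "node:crypto";

function attest(rawRequest: string, clientSecret: string): string {
  // Hash the outgoing request together with a client-side secret, then
  // overwrite the placeholder just before the bytes leave the process.
  const digest = createHash("sha256")
    .update(clientSecret + rawRequest)
    .digest("hex")
    .slice(0, 5);
  return rawRequest.replace("cch=00000", `cch=${digest}`);
}
```

Because the server recomputes and checks the hash, a request assembled outside the genuine binary (with the placeholder intact or a wrong digest) can be rejected — which is how such a scheme would lock out third-party clients.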
The Security Implications
AI security firm Straiker warned that the leak gives attackers a blueprint for bypassing Claude Code’s guardrails. “Attackers can now study and fuzz exactly how data flows through Claude Code’s four-stage context management pipeline and craft payloads designed to survive compaction, effectively persisting a backdoor across an arbitrarily long session,” Straiker wrote in a blog post published after the leak.
For builders using Claude Code or any autonomous coding agent, the architectural details are directly relevant. The anti-distillation mechanisms show how Anthropic thinks about competitive moats in agent tooling. The undercover mode raises questions about AI attribution in open-source contributions. The KAIROS background agent previews a future where coding agents don’t wait for prompts — they run continuously.
Context
This is Anthropic’s second accidental exposure in roughly a week, following a model spec leak days earlier, as Kim noted. The timing is particularly sharp given Anthropic’s recent legal action against OpenCode over unauthorized API access — the same native client attestation system now exposed in the source code was the technical mechanism enforcing that restriction.
Anthropic told CNBC that “no sensitive customer data or credentials were involved or exposed” and that it is “rolling out measures to prevent this from happening again.”