Anthropic Wants Governments to Block Dangerous AI Models. Its Own Users Are Already Getting Blocked.

Anthropic published two policy frameworks on June 10, 2026, under the umbrella title “Policy on the AI Exponential.” The first, the Advanced AI Framework, proposes that the US federal government gain legal authority to block or deter deployment of AI models that pose catastrophic risks. The second, the Economic Policy Framework, outlines a response plan for AI-driven labor market disruption, backed by $350 million in commitments.

The timing is deliberate. Anthropic filed its S-1 with the SEC on June 1. Its valuation sits at $965 billion. And the day before the policy papers dropped, the company launched Claude Fable 5, a model so capable that Anthropic built automated safeguards to block queries in cybersecurity, biology, and AI research. Those safeguards are already generating backlash from the scientists they are designed to protect.

The papers and the product restrictions are two expressions of the same thesis: AI capabilities are now advancing faster than any institution, including Anthropic itself, can govern them. The company is building governance infrastructure in real time, across both policy and product, and it is not waiting for governments to go first.

The Advanced AI Framework: Four Risks, One Kill Switch

The Advanced AI Framework targets four categories of catastrophic risk: biological weapons, cyber attacks, loss of control over autonomous systems, and AI systems that automate their own research and development. Anthropic proposes that the government should have legal authority to block or deter deployment of models that pose significant risk in any of these categories, with civil penalties tied to global annual revenue that escalate with repeated violations.

The framework’s scope is deliberately narrow. It applies only to models trained with more than 10^25 floating-point operations (FLOPs), developed by companies earning more than $500 million in AI-related revenue or spending more than $1 billion on AI R&D. That threshold currently captures only a handful of companies: Anthropic, OpenAI, Google DeepMind, Meta, and possibly xAI.
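The scope test described above can be sketched in a few lines of Python. This is only an illustration: the class, function, and constant names are hypothetical, and it assumes the compute condition must hold together with at least one of the two financial conditions, which is one reading of the description rather than the framework's legal text.

```python
from dataclasses import dataclass

# Thresholds as reported in this article; the constant names are hypothetical.
COMPUTE_THRESHOLD_FLOP = 1e25
AI_REVENUE_THRESHOLD_USD = 500_000_000
AI_RND_THRESHOLD_USD = 1_000_000_000


@dataclass
class Developer:
    training_compute_flop: float  # total operations used to train the model
    ai_revenue_usd: float         # annual AI-related revenue
    ai_rnd_spend_usd: float       # annual AI R&D spending


def is_covered(dev: Developer) -> bool:
    """Return True if the model would fall in scope under this reading."""
    big_model = dev.training_compute_flop > COMPUTE_THRESHOLD_FLOP
    big_company = (
        dev.ai_revenue_usd > AI_REVENUE_THRESHOLD_USD
        or dev.ai_rnd_spend_usd > AI_RND_THRESHOLD_USD
    )
    return big_model and big_company
```

Under this reading, a startup that trains a model above the compute line but sits below both financial lines would not be covered, which is the basis for the "handful of companies" estimate.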

The proposal has four pillars. Frontier developers would be required to test their models and publish summaries of the results. They would need to engage independent evaluators. They would maintain security programs against model weight theft. And the government would have enforcement authority to block deployments deemed catastrophically risky.

Anthropic explicitly argues against federal preemption of state law unless the federal standard matches or exceeds its own proposal. “We do not believe Congress should preempt state law unless it enacts a federal law that is at least as strong as the framework we are proposing today,” the company wrote. This positions the framework as a floor, not a ceiling.

The Evidence Anthropic Cites for Urgency

The policy paper does not treat catastrophic risk as theoretical. It cites specific capability milestones from Anthropic’s own models.

Claude Mythos Preview, the restricted predecessor to Fable 5, discovered thousands of previously unknown vulnerabilities in every major operating system and every major web browser during testing. The Anthropic Red Team reported that Mythos Preview wrote a browser exploit chaining four vulnerabilities together, autonomously obtained local privilege escalation on Linux, and wrote a remote code execution exploit on FreeBSD’s NFS server that granted full root access. Non-experts at Anthropic with no formal security training asked Mythos Preview to find remote code execution vulnerabilities overnight and found working exploits the next morning.

On autonomous R&D, the Anthropic Institute published data showing that Anthropic engineers now ship 8x as much code per quarter as they did from 2021 to 2025. The length of tasks AI models can reliably complete on their own has been doubling every four months, up from every seven months. METR, an independent evaluation organization, found that Claude Mythos Preview could work autonomously for “at least” 16 hours, at “the upper end of what [METR] can measure without new tasks.”
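To see what the shift from a seven-month to a four-month doubling time means, the two rates can be converted into yearly growth factors. The short Python sketch below does that arithmetic. It assumes each doubling time holds constant over a full year, which is a simplification for illustration and not a projection made by the cited sources; the function name is hypothetical.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Factor by which a quantity grows in 12 months at a fixed doubling time."""
    return 2 ** (12 / doubling_months)


old_pace = annual_growth_factor(7)  # a little over 3x per year
new_pace = annual_growth_factor(4)  # three doublings, so 8x per year
```

On those assumptions, the faster pace compounds to roughly two and a half times the yearly growth of the slower one.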

On biology, Anthropic reported that Mythos 5 (the unrestricted version of Fable 5) “consistently produce[s] novel, compelling scientific hypotheses” and conducted “novel genomics research in over a week of largely autonomous work,” training a custom machine learning model that outperformed a recently published Science paper while being 100 times smaller.

These are the capabilities Anthropic wants governments to have the power to regulate. They are also the capabilities it is selling to enterprise customers at $10 per million input tokens.

The $350 Million Economic Bet

The Economic Policy Framework is the second half of the proposal. Anthropic commits $200 million to an Economic Futures Research Fund for studying AI-driven labor disruption, plus $150 million for a national fellowship program.

The framework outlines responses calibrated to three unemployment scenarios: 5%, 10%, and “unprecedented.” At each tier, the proposals escalate from enhanced statistical monitoring through retraining programs to, at the extreme end, what amounts to a restructuring of the relationship between work and income.

Anthropic states directly: “We are not seeking job displacement. We are working to prevent or minimize it. Some amount of displacement, though we cannot say how much, may be an intrinsic consequence of the technology.” The company describes the $350 million as policies “that we are willing to help fund, including those that are not traditionally financed by private firms.”

The framing is unusual for a pre-IPO company. Most firms approaching a public listing emphasize growth and market opportunity. Anthropic is simultaneously arguing that its technology might cause unprecedented unemployment and asking governments to regulate it more aggressively.

Fable 5: Governance in Practice

While the policy papers describe a future governance architecture, Fable 5’s safeguards are governance happening now.

Claude Fable 5 is the first Mythos-class model Anthropic has released for general use. It is the same underlying model as Mythos 5, with safeguards layered on top. Those safeguards automatically route queries about offensive cybersecurity, biology and life sciences, and AI research to the less capable Claude Opus 4.8. Anthropic acknowledged the safeguards are “intentionally broad” and will “sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions.”

The practical impact has been immediate. Prof. Derya Unutmaz, a biologist at the Jackson Institute, told The Telegraph that he was “effectively barred from interacting with Anthropic’s Fable AI even for mundane questions.” He wrote: “I can’t even say ‘hello’ to Fable 5 except in incognito mode because it knows I am a biomedical researcher.” James Schnable, a plant geneticist at the University of Nebraska, said: “As far as I can tell, Anthropic just decided to blacklist every biologist in their customer base.”

Anthropic’s Matt Durant, a life-sciences researcher, confirmed the restrictions are profile-based: Fable redirects all biology queries from identified researchers to a less powerful model. The company admitted the decision was driven by Fable’s potential for biological weapon production, which it described as “low, but higher than for any previous model.”

The safeguard architecture is more extensive than most users realize. According to Anthropic’s support documentation, the checks review “everything the model reads, not just your latest message, including memory, content from connectors, web search results, and files.” A block can be triggered by content the user did not type. The system also blocks attempts to extract the model’s summarized thinking, preventing users from reverse-engineering the safeguard logic.
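The consequence of checking everything the model reads, rather than only the latest message, can be illustrated with a toy Python sketch. Nothing here reflects Anthropic's actual implementation: the keyword list stands in for whatever classifier the company uses, and every name (`Context`, `is_flagged`, `choose_model`, the `"primary"` and `"fallback"` labels) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass, field

# Toy stand-in for a real classifier: a few flagged terms. Purely hypothetical.
FLAGGED_TERMS = ("exploit", "pathogen")


@dataclass
class Context:
    latest_message: str
    memory: list[str] = field(default_factory=list)
    connector_content: list[str] = field(default_factory=list)
    search_results: list[str] = field(default_factory=list)
    files: list[str] = field(default_factory=list)

    def everything_read(self) -> list[str]:
        """All text the model would read, not only the user's latest message."""
        return [self.latest_message, *self.memory, *self.connector_content,
                *self.search_results, *self.files]


def is_flagged(text: str) -> bool:
    """Case-insensitive substring match against the toy term list."""
    lowered = text.lower()
    return any(term in lowered for term in FLAGGED_TERMS)


def choose_model(ctx: Context) -> str:
    """Route to the fallback model if any piece of context is flagged."""
    if any(is_flagged(piece) for piece in ctx.everything_read()):
        return "fallback"
    return "primary"
```

In this sketch, `choose_model(Context("hello", search_results=["notes on a pathogen genome"]))` returns `"fallback"` even though the user typed only a greeting, which mirrors the behavior the support documentation describes.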

The Tension at the Core

The policy papers and the Fable restrictions reveal a genuine tension in Anthropic’s position. The company simultaneously argues that:

AI capabilities in cybersecurity and biology are now powerful enough to require government intervention.
Anthropic’s own models are the most capable in the world at precisely these dangerous tasks.
The company is releasing these models for general use, with automated safeguards it acknowledges are imperfect.
It wants government authority to do what it is already doing unilaterally: blocking access to dangerous capabilities.

The position is coherent, if uncomfortable. Anthropic is saying: we are currently the only entity making these governance decisions about our models, and that is not a sustainable arrangement. The safeguards on Fable 5 are a stopgap. The policy paper is the proposed replacement.

The company’s own research on biological agents illustrates the difficulty. Laura Luebbert’s team at Anthropic found that even the strongest AI models “did not consistently achieve the level of accuracy required for reliable dataset construction” in virology, but accuracy rose to nearly 100% with deterministic retrieval tools. The capability exists. The infrastructure to safely direct it is still being built.

The Regulatory Capture Question

Former White House AI Czar David Sacks accused Anthropic of regulatory capture after the company published its recursive self-improvement findings in late May. The argument: Anthropic publishes alarming safety research, then proposes regulations that happen to entrench incumbents with more than $500 million in AI revenue, exactly the threshold that protects large labs and burdens potential competitors.

The accusation has some structural merit. The 10^25 FLOP threshold and the $500 million revenue floor would effectively exempt smaller AI companies while concentrating regulatory burden on the five or six labs that can actually train frontier models. Anthropic’s counterargument, stated in the 2024 framework that preceded this paper, is that “the risks we’re trying to address come from the most capable models, and those models are only made by a small number of companies.”

But the Fable 5 restrictions undercut the regulatory capture framing in one important way: Anthropic is imposing costs on itself right now, before any government mandate exists. Blocking biologists from its most capable model is not costless. It generates negative press, frustrates paying customers, and gives competitors an opening. A pure regulatory capture play would advocate for rules on others while keeping its own product unrestricted.

The IPO Context

Anthropic’s S-1 filing came June 1. The Fable 5 launch was June 9. The policy papers dropped June 10, nine days after the filing.

The sequencing serves multiple purposes for an IPO-track company. Publishing concrete policy proposals signals to institutional investors that Anthropic takes regulatory risk seriously. The $350 million commitment to economic transition programs preempts criticism that the company is profiting from labor displacement. The Fable 5 restrictions demonstrate that Anthropic will actually implement safety measures, not just publish papers about them.

For public market investors evaluating Anthropic against OpenAI, which filed its own confidential S-1 a week later, the differentiation is clear. OpenAI is positioning on capability and growth. Anthropic is positioning on capability plus governance. Whether that governance premium commands a higher multiple is one of the questions the IPO will answer.

The Architecture That Emerges

Read together, the policy papers, the Fable safeguards, the RSI research, and the Glasswing cyber defense program form a coherent governance architecture:

At the model level, automated safeguards gate access to dangerous capabilities. At the company level, Anthropic publishes safety evaluations and engages independent testers. At the government level, the proposed framework would give regulators authority to block deployments that clear a catastrophic risk threshold. At the economic level, the company proposes funding mechanisms for labor market disruption.

No other AI lab has published anything this comprehensive. Google’s approach to Gemini safety is largely internal. OpenAI publishes system cards but has not proposed a regulatory framework with enforcement teeth. Meta releases models open-source with limited post-deployment controls.

Whether Anthropic’s framework is the right one is a policy debate that will play out over months. Whether the Fable 5 restrictions are proportionate is an active dispute between the company and its scientific users. But the fact that a frontier AI lab is simultaneously building the most capable models in the world and arguing, in writing, that governments should have the power to shut them down is historically unusual. The gap between Anthropic’s stated beliefs and its product decisions is narrower than for any of its competitors.

That gap is the governance architecture. The question is whether institutions can build on it before capabilities outrun it again.
