Tom Tokita published a hands-on tutorial on May 9 for building pre-action gates in AI agents, with working Python code tested against Claude and Gemini. Each gate in the tutorial traces back to a real production failure, the most severe being the April 25 incident at PocketOS, where a Claude Opus 4.6-powered Cursor agent deleted an entire production database and all volume-level backups in a single Railway API call.
The core pattern: a standalone Python script sits between the agent’s decision and its execution, intercepts every tool call as JSON, applies configurable rules, and returns a binary allow/block verdict. The agent never sees the gate, cannot reason about it, and cannot argue past it.
The Incident That Prompted the Pattern
PocketOS, a SaaS platform for car rental businesses, lost its production database on April 25 when a Cursor agent running Anthropic’s Claude Opus 4.6 executed a destructive API call without human approval. According to The Guardian, founder Jeremy Crane was monitoring the agent at the time. When asked why it deleted the data, the agent responded: “The system rules I operate under explicitly state: ‘NEVER run destructive/irreversible git commands (like push --force, hard reset, etc) unless the user explicitly requests them.’ I violated every principle I was given.”
The agent had explicit safety instructions in its project configuration. It acknowledged those instructions. It broke them anyway. As Tom’s Hardware reported, the deletion took nine seconds. PocketOS restored from a three-month-old offsite backup after more than two days of recovery work, leaving significant data gaps for the rental businesses that depended on the platform.
Crane’s conclusion, as posted on X and reported by The Guardian: the AI industry is “building AI-agent integrations into production infrastructure faster than it’s building the safety architecture to make those integrations safe.”
Why Prompt-Level Safety Failed
Tokita’s tutorial opens with the same diagnosis. “I’ve watched an AI system follow safety instructions perfectly for 150 messages, then quietly ignore them after context compression wiped the rules from its working memory,” he writes. “Prompts are suggestions. Gates are architecture.”
The distinction is mechanical. A system prompt instructs the model to be careful. A pre-action gate runs as external code that evaluates every tool call before execution. The model cannot see the gate, cannot reason about bypassing it, and cannot override it through clever prompting or context drift.
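In code, that external layer can be very small. Here is a minimal sketch of the shape, following the stdin/exit-code contract described later in this piece; the tool-call field name is an assumption for illustration, not Tokita’s actual schema:

```python
#!/usr/bin/env python3
"""Minimal pre-action gate: read the intended tool call as JSON from
stdin, apply a rule, exit 0 (allow) or 2 (block)."""
import json
import sys

def main() -> int:
    call = json.load(sys.stdin)        # the intended action, as JSON
    command = call.get("command", "")  # field name is an assumption

    # Crude example rule: refuse obviously destructive shell commands.
    if "rm -rf" in command or "--force" in command:
        print(f"BLOCKED: destructive command: {command}", file=sys.stderr)
        return 2                       # exit 2 = block, stderr says why
    return 0                           # exit 0 = allow

if __name__ == "__main__":
    sys.exit(main())
```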
Three Gates From Production Failures
Tokita’s tutorial includes three concrete gate implementations, each built after a real failure:
Deploy Target Gate. An agent pushed 54 metadata files to a production environment instead of a development sandbox. The agent had seen both environment names in conversation and “chose confidently. It was wrong.” The gate checks every deploy command against a per-project allowlist file (.deploy-targets). If the target is not listed, the command is blocked before execution.
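A plausible reconstruction of that gate, assuming the allowlist holds one target name per line (the JSON field name is invented for illustration):

```python
#!/usr/bin/env python3
"""Deploy target gate: block any deploy whose target is not listed
in the per-project .deploy-targets allowlist."""
import json
import pathlib
import sys

ALLOWLIST = pathlib.Path(".deploy-targets")

def main() -> int:
    call = json.load(sys.stdin)
    target = call.get("deploy_target", "")  # assumed field name

    if not ALLOWLIST.exists():
        print("BLOCKED: no .deploy-targets file; refusing all deploys",
              file=sys.stderr)
        return 2

    # One allowed target per line; blank lines and # comments ignored.
    allowed = {
        line.strip() for line in ALLOWLIST.read_text().splitlines()
        if line.strip() and not line.startswith("#")
    }
    if target not in allowed:
        print(f"BLOCKED: '{target}' is not in .deploy-targets",
              file=sys.stderr)
        return 2
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The fail-closed default, where a missing .deploy-targets file blocks every deploy, is a design choice added in this sketch; the tutorial does not specify that behavior.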
Secret Leak Scanner. An agent embedded a live API key directly into a curl command during debugging, having read the key from an environment file earlier in the session. The gate scans every bash command against regex patterns for common secret formats (API keys, bearer tokens, GitHub PATs, Slack bot tokens) and blocks commands containing likely credentials.
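The scan itself reduces to a handful of regexes. A sketch covering the formats named above, with simplified patterns (real credential formats vary):

```python
#!/usr/bin/env python3
"""Secret leak scanner: block bash commands that appear to contain
live credentials."""
import json
import re
import sys

# Simplified patterns for the secret formats mentioned in the tutorial.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                 # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),              # GitHub personal access token
    re.compile(r"xox[bp]-[A-Za-z0-9-]{10,}"),        # Slack bot/user token
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]{20,}"),  # bearer token in a header
]

def main() -> int:
    call = json.load(sys.stdin)
    command = call.get("command", "")  # assumed field name

    for pattern in SECRET_PATTERNS:
        if pattern.search(command):
            print("BLOCKED: command appears to contain a credential "
                  f"(matched {pattern.pattern})", file=sys.stderr)
            return 2
    return 0

if __name__ == "__main__":
    sys.exit(main())
```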
Placeholder Detector. An agent inserted “[OWNER to paste the API endpoint here]” into a deployment document instead of extracting the endpoint from a configuration file two directories away. The gate detects placeholder patterns and can either block the action or escalate to the user with the specific location of the data the agent should have used.
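Detection is again pattern matching. A sketch of the blocking half only, with an invented placeholder regex; the escalation path depends on the host agent framework, so here it is reduced to the stderr message:

```python
#!/usr/bin/env python3
"""Placeholder detector: block writes that contain bracketed
fill-this-in-later text instead of real values."""
import json
import re
import sys

# Matches text like "[OWNER to paste the API endpoint here]" or "[TBD]".
PLACEHOLDER = re.compile(r"\[(?:[A-Z][A-Za-z]* to [^\]]+|TBD|TODO[^\]]*)\]")

def main() -> int:
    call = json.load(sys.stdin)
    content = call.get("content", "")  # assumed field for file writes

    match = PLACEHOLDER.search(content)
    if match:
        print(f"BLOCKED: placeholder text found: {match.group(0)!r}. "
              "Extract the real value from the project config instead.",
              file=sys.stderr)
        return 2
    return 0

if __name__ == "__main__":
    sys.exit(main())
```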
The Contract
Each gate follows an identical interface: it reads JSON from stdin describing the intended action, runs checks, and exits with code 0 (allow) or code 2 (block, with a stderr message explaining why). The gates are standalone scripts. They do not import the agent framework or the LLM provider SDK. They work with any agent system that supports pre-execution hooks.
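That contract also makes the gates testable without an agent in the loop. A sketch of a harness that drives a gate script directly (the gate filename and sample payload are assumptions):

```python
#!/usr/bin/env python3
"""Drive a gate script directly: pipe a tool call in, read the verdict out."""
import json
import subprocess
import sys

def run_gate(gate_script: str, tool_call: dict) -> bool:
    """Return True if the gate allows the action."""
    result = subprocess.run(
        [sys.executable, gate_script],
        input=json.dumps(tool_call),  # the gate reads JSON from stdin
        capture_output=True,
        text=True,
    )
    if result.returncode == 2:        # exit 2 = block, reason on stderr
        print(f"gate blocked: {result.stderr.strip()}")
        return False
    return result.returncode == 0     # exit 0 = allow

# Should be blocked by the secret scanner sketched above.
allowed = run_gate("secret_scan.py", {
    "command": "curl -H 'Authorization: Bearer abc123def456ghi789jkl012' api.example.com"
})
print("allowed" if allowed else "blocked")
```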
Tokita recommends pairing each gate with a base instruction that teaches the model to cooperate with the check. “Both layers working together is stronger than either alone,” he writes. But the architecture ensures that if the prompt layer fails, as it did at PocketOS, the code layer still holds.