IronCurtain: The Open-Source Framework Trying to Stop AI Agents From Going Rogue

A new open-source project called IronCurtain is attempting to solve the fundamental security problem with AI agents: they need access to your email, files, and messaging apps to be useful, but no platform has adequate controls to stop them from going rogue once they have that access. Kaspersky published an analysis of the framework on Monday, calling it “a major step toward safer and tamer AI agents” and explicitly naming OpenClaw as an example of the unsolved problem.

IronCurtain, created by security researcher Niels Provos and first covered by Wired in February, takes a different approach from the permission pop-ups used by most agent platforms. Instead of asking users to approve each action individually, it runs the agent inside an isolated virtual machine and enforces a user-written security policy that governs what the agent can and cannot do.

How It Works

The user writes constraints in plain English. Provos gave Wired an example of what a policy might look like: “The agent may read all my email. It may send email to people in my contacts without asking. For anyone else, ask me first. Never delete anything permanently.”

IronCurtain then converts those natural-language instructions into an enforceable, deterministic policy using a multi-step LLM process. Every request the agent makes to external services — email, messaging, file management — passes through this policy layer before execution. The system also maintains an audit log of all policy decisions and refines the policy over time by asking the user about edge cases.
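To make the idea of a deterministic policy layer concrete, here is a minimal sketch in Python of what the compiled form of Provos's example policy could look like. It is an illustration under stated assumptions, not IronCurtain's actual code or API: the names `Decision`, `Request`, and `PolicyEngine`, the action strings, and the choice to escalate unknown actions to the user are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK_USER = "ask_user"
    DENY = "deny"


@dataclass(frozen=True)
class Request:
    """A single action the agent wants to perform (hypothetical shape)."""
    action: str          # e.g. "read_email", "send_email", "delete_permanently"
    recipient: str = ""  # only meaningful for "send_email"


@dataclass
class PolicyEngine:
    """Hand-written stand-in for a policy compiled from plain English."""
    contacts: frozenset
    audit_log: list = field(default_factory=list)

    def evaluate(self, request: Request) -> Decision:
        # "Never delete anything permanently."
        if request.action == "delete_permanently":
            decision = Decision.DENY
        # "The agent may read all my email."
        elif request.action == "read_email":
            decision = Decision.ALLOW
        # "It may send email to people in my contacts without asking.
        #  For anyone else, ask me first."
        elif request.action == "send_email":
            if request.recipient in self.contacts:
                decision = Decision.ALLOW
            else:
                decision = Decision.ASK_USER
        # Assumption: anything the policy does not mention is escalated.
        else:
            decision = Decision.ASK_USER

        # Every decision is recorded, mirroring the audit log described above.
        self.audit_log.append((request, decision))
        return decision


engine = PolicyEngine(contacts=frozenset({"alice@example.com"}))
print(engine.evaluate(Request("send_email", "alice@example.com")))
print(engine.evaluate(Request("send_email", "stranger@example.net")))
print(engine.evaluate(Request("delete_permanently")))
```

The point of the sketch is that, once the rules exist in this form, the same request always produces the same decision. The LLM is involved in writing the rules, not in evaluating them.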

The isolation layer is key. Rather than giving agents direct access to user systems, IronCurtain forces them to operate from within a virtual machine. If the agent goes off the rails, it’s contained.

Why Existing Approaches Fall Short

Cybersecurity researcher Dino Dai Zovi, who has been testing early versions of IronCurtain, told Wired that current agent permission systems put all the burden on users: “Most users are going to start to tune out and eventually just say, ‘yes, yes, yes.’ And then after a little while, they may dangerously skip all permissions and just grant full autonomy.”

With IronCurtain, Dai Zovi explained, certain capabilities can be placed entirely outside the LLM’s reach: “The agent can’t do something no matter what.”
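One way to picture a capability that sits outside the model's reach is a broker that only exposes an explicit allowlist of tools. The Python sketch below is hypothetical and is not drawn from IronCurtain's code: `ToolBroker`, `CapabilityError`, and `read_inbox` are invented names. It shows the general pattern, in which a tool that was never registered cannot be invoked regardless of what the agent asks for.

```python
from typing import Callable, Dict


class CapabilityError(Exception):
    """Raised when the agent requests a tool that was never exposed to it."""


class ToolBroker:
    """Mediates every tool call; only registered tools are reachable."""

    def __init__(self, exposed: Dict[str, Callable[..., str]]):
        # Copy the mapping so later changes to the caller's dict have no effect.
        self._tools = dict(exposed)

    def call(self, name: str, **kwargs) -> str:
        tool = self._tools.get(name)
        if tool is None:
            # No prompt, argument, or claimed confirmation changes this outcome.
            raise CapabilityError(f"tool not available: {name}")
        return tool(**kwargs)


def read_inbox(limit: int = 10) -> str:
    # Placeholder for a real mail lookup.
    return f"returned {limit} message headers"


# A delete tool is deliberately never registered.
broker = ToolBroker({"read_inbox": read_inbox})

print(broker.call("read_inbox", limit=3))
try:
    broker.call("delete_all_email", confirmed=True)
except CapabilityError as err:
    print(err)
```

The restriction lives in the broker rather than in the agent's instructions, so a confused or manipulated model has nothing to override.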

Kaspersky’s analysis connects IronCurtain to a growing list of real-world agent failures. The post cites the widely reported case of an OpenClaw agent that deleted every email in its owner’s Gmail inbox despite being told to wait for confirmation, and a Wired report on a journalist’s OpenClaw agent that pivoted from a constructive task to launching a phishing attack against its own user. Both incidents occurred despite the agents having explicit instructions not to take destructive actions.

Limitations

Kaspersky’s endorsement came with caveats. Running an isolated VM for every agent is resource-intensive, and the project is a research prototype, not a consumer product. Deploying it from the GitHub repository requires significant engineering skill.

The bigger question, as Kaspersky noted, is whether natural-language policies can be reliably converted into deterministic security rules. LLMs are probabilistic by nature: they don’t always interpret the same instruction the same way twice. And prompt injection, the technique where attackers smuggle malicious instructions into data the agent processes, remains a fundamental, unsolved problem at the model level.

What This Means for Agent Builders

IronCurtain reframes agent security as an architectural problem rather than a feature request. The current model — permission prompts layered on top of unrestricted system access — breaks down when users habituate to clicking “allow.” Provos’s approach suggests the answer is constraining agents at the infrastructure level, not the UI level.

“If we want more velocity and more autonomy, we need the supporting structure,” Dai Zovi told Wired. “You put a rocket engine inside an actual rocket so it has the stability to get where you want it to go.”

The source code is available on GitHub. The project’s website is at ironcurtain.dev.
