Shandong University Researchers Test OpenClaw Against 47 Attack Scenarios, Find 17% Average Defense Rate

A team of security researchers from Shandong University published the first comprehensive security analysis of OpenClaw earlier this month, testing the open-source AI agent framework against 47 adversarial scenarios. The results paint a stark picture: OpenClaw’s native defenses stopped attacks an average of 17% of the time, with sandbox escape attacks succeeding almost without exception.

The 12-page paper, titled “Don’t Let the Claw Grip Your Hand: A Security Analysis and Defense Framework for OpenClaw,” appeared on arXiv on March 11 and has since gained significant traction in the infosec community. The IT4SEC security newsletter featured a detailed breakdown this week, calling it “one of the first public security assessments of OpenClaw” and “a must-read if you’re into modern AI cybersecurity.”

What They Tested

The researchers designed 47 attack scenarios organized across six major categories drawn from the MITRE ATLAS and MITRE ATT&CK frameworks. ATLAS covers adversarial threats to machine learning systems; ATT&CK catalogs the tactics and techniques attackers use against enterprise networks. By combining both, the researchers tested OpenClaw not just as an AI system but as a piece of software with OS-level access sitting inside a user’s machine.

The core finding: OpenClaw’s architecture lacks built-in security constraints. It delegates security decisions to whatever backend language model it connects to. When the model refuses a malicious instruction, OpenClaw is defended. When the model complies, the agent executes whatever it’s told, with full access to the host operating system.

Defense success rates ranged from roughly 17% to 83% depending on which language model was connected to OpenClaw. The variance is significant. A user running a model with strong refusal behavior gets meaningfully different security outcomes than one running a more permissive model, but OpenClaw itself provides no independent guardrails in either case.

Sandbox escape attacks were particularly effective. The researchers found that OpenClaw’s broad OS-level permissions create a wide attack surface: an attacker who compromises the agent through prompt injection or malicious instructions gains access to the full machine, not just the agent’s sandbox.

The Proposed Fix

The paper proposes a Human-in-the-Loop (HITL) defense layer as mitigation. Under this approach, high-risk actions require explicit human approval before execution. The researchers tested their HITL implementation and found it intercepted up to 8 severe attacks that completely bypassed OpenClaw’s native defenses. With HITL active, the overall defense rate improved to a range of 19% to 92%.

The code and attack samples are available on GitHub, allowing other researchers to reproduce the findings and test against different model configurations.

Why It Matters Now

The paper itself is two and a half weeks old, but its circulation this week through infosec newsletters and security communities marks a shift. Security researchers are moving from theoretical concerns about AI agent risks to empirical testing of specific deployed systems.

This adds to an existing thread NCT has been tracking. OpenClaw has faced a series of security disclosures in 2026, including multiple CVEs and official guidance from Chinese cybersecurity authorities. What remains absent is an official response from OpenClaw’s core maintainers addressing the structural security concerns the Shandong paper raises.

The research also arrived the same week that RSA Conference 2026 declared AI agent identity and governance the breakout security theme of the year. The convergence is not coincidental: as agent frameworks move from developer toys to enterprise tools, the security community is catching up to what the deployment reality looks like.

For builders and operators running OpenClaw in production, the immediate takeaway from the Shandong paper is concrete: the agent’s security posture depends almost entirely on the backend model’s refusal behavior, and that is not a reliable defense. Until OpenClaw ships independent security constraints at the framework level, HITL approval workflows for high-risk actions are the strongest available mitigation.

Shandong University Researchers Test OpenClaw Against 47 Attack Scenarios, Find 17% Average Defense Rate

What They Tested

The Proposed Fix

Why It Matters Now

Get our morning briefing in your inbox

Keep Reading

AIxCrypto Launches Agentir for On-Chain AI Agent Simulation and Testing

India's AI Agent Market Splits: Emergent Hits $1.5B Valuation as Krutrim Shuts Down Agent Platform

AWS Gives AI Agents Their Own Managed Desktop to Operate Legacy Applications Without API Rewrites