Sophos armed an OpenClaw agent with custom penetration testing tools, pointed it at a legacy production network, and let it run. The agent found 23 actionable vulnerabilities and compressed Active Directory reconnaissance from three days to three hours, according to a blog post published by Sophos on April 9.
Sophos ran the test against a real internal production network: a legacy on-premises environment whose mission-critical workloads had already migrated to isolated cloud infrastructure. It was still a production network, with all the operational constraints that implies.
Setup and Safety Architecture
The Sophos Red Team’s primary concern was safety, not capability. The team spent the majority of its preparation time building guardrails rather than testing attack surfaces, according to the blog post.
Their threat model centered on Simon Willison’s “Lethal Trifecta” framework: preventing the agent from simultaneously receiving untrusted content, accessing sensitive data, and exfiltrating that data externally. Strict network ingress and egress controls served as the first defense layer. The team built all skills in-house rather than using publicly available ones, which they described as “generally low quality.”
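The trifecta rule can be illustrated with a short check. This is a minimal sketch, not code from the Sophos skills: the class, field, and function names are hypothetical, and it assumes an agent session can be described by three boolean capabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    # Hypothetical capability flags for one agent session.
    reads_untrusted_content: bool
    accesses_sensitive_data: bool
    can_send_data_externally: bool


def violates_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """Return True only when all three risky capabilities are present at once."""
    return (
        caps.reads_untrusted_content
        and caps.accesses_sensitive_data
        and caps.can_send_data_externally
    )


# An agent with egress blocked holds at most two of the three capabilities.
egress_blocked = AgentCapabilities(True, True, False)
fully_open = AgentCapabilities(True, True, True)
```

Under this model, removing any one capability, for example by blocking egress, is enough to break the combination.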
Sophos also addressed a subtler risk: goal-seeking behavior leading to unintended consequences. An agent tasked with “make this environment more secure” could theoretically conclude that encrypting everything and destroying the keys would accomplish that goal. To prevent this, the team built a human-in-the-loop approval mechanism into their custom skills.
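A human-in-the-loop gate of this kind might look roughly like the following. This is a sketch under stated assumptions, not the mechanism Sophos built: `run_with_approval`, `ApprovalDenied`, and `console_approver` are hypothetical names, and the approver is modeled as any callable that returns True or False.

```python
from typing import Callable


class ApprovalDenied(Exception):
    """Raised when the human reviewer rejects a proposed action."""


def run_with_approval(
    description: str,
    action: Callable[[], str],
    approve: Callable[[str], bool],
) -> str:
    # The action runs only after the approver explicitly returns True.
    if not approve(description):
        raise ApprovalDenied(description)
    return action()


def console_approver(description: str) -> bool:
    # Hypothetical interactive approver: the operator must type "yes".
    answer = input(f"Agent requests: {description}. Approve? (yes/no) ")
    return answer.strip().lower() == "yes"
```

The design choice here is that denial is the default: anything other than an explicit approval stops the action before it executes.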
The skills and redacted findings are published on GitHub.
Results
The engagement produced 23 high-quality, actionable findings. The agent demonstrated autonomous problem-solving: when a promising attack path was blocked, it suggested spinning up an EC2 GPU instance to crack an acquired hash, then proceeded after human authorization.
The assessment ran deliberately noisy, optimized for coverage and speed rather than stealth, which generated a high volume of detections and alerts across Sophos’s monitoring stack. The team noted that a covert red team engagement would require a different architecture and would likely trigger more model guardrails.
One persistent friction point: the underlying language models regularly refused to cooperate due to concerns about malicious use. The team worked around these guardrails “for the most part,” but the refusals introduced delays and required creative prompting.
The Documentation Advantage
Beyond finding vulnerabilities, the agent produced what Sophos described as a high-quality audit trail “at a level of detail not achievable via manual means,” which drastically simplified report writing. For penetration testing firms that bill by the engagement, faster reconnaissance and automated documentation directly affect margins and throughput.
Cybersecurity Teams as Early Adopters
Sophos’s conclusion cuts against the prevailing narrative that AI agents are primarily a risk vector. The blog post argues that cybersecurity teams are “better placed than anyone else to be at the forefront of adoption” because they already operate in environments with dangerous tooling, strict safety procedures, and documented operational frameworks.
“The world is forging ahead and securing agentic AI is fast becoming the era-defining challenge for the cybersecurity community,” Sophos wrote. The team positioned the experiment as evidence that the same discipline used to handle untrusted exploit code and early proof-of-concept tools can be applied to agent deployment.
The tradeoff is real: OpenClaw’s security track record includes multiple critical CVEs and well-documented incidents of agents acting outside intended boundaries. Sophos’s previous blog post on OpenClaw concluded that “even the most ‘risk-on’ organizations with deep AI and security experience will likely find it challenging to configure OpenClaw in a way that effectively mitigates the risk of compromise or data loss, while still retaining any productivity value.” This follow-up shows that Sophos’s own team found a way, though only with significant engineering investment in custom skills and safety architecture.