Microsoft’s latest Patch Tuesday release includes fixes for 16 vulnerabilities that were discovered entirely by an agentic AI system, not human researchers. The system, codenamed MDASH (multi-model agentic scanning harness), orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models to find, validate, and dynamically prove exploitable bugs in the Windows codebase, according to a Microsoft Security Blog post by Taesoo Kim, VP of Agentic Security at Microsoft.

What MDASH Found

The 16 vulnerabilities span the Windows networking and authentication stack. Four are rated Critical, all remote code execution flaws:

  • CVE-2026-33827: Remote, unauthenticated use-after-free in tcpip.sys, triggered via IPv4 packets carrying the SSRR (Strict Source and Record Route) option
  • CVE-2026-33824: Double-free in ikeext.dll’s IKEv2 SA_INIT handler, leading to remote code execution as LocalSystem
  • CVE-2026-40361 and CVE-2026-40364: Two additional Critical RCE flaws that Microsoft flagged as more likely to be exploited, per Help Net Security

The remaining 12 include denial-of-service, information disclosure, and security feature bypass vulnerabilities across tcpip.sys, ikeext.dll, and adjacent networking components.

How the Agentic Architecture Works

MDASH is not a single model scanning code. It is a five-stage pipeline in which different agents handle different phases of vulnerability discovery, according to the Microsoft blog post (a minimal orchestration sketch follows the list):

  1. Prepare: Ingests source code, builds language-aware indices, maps the attack surface from past commit history
  2. Scan: Specialized auditor agents examine candidate code paths, producing findings with hypotheses and evidence
  3. Validate: A second cohort of debater agents argues for and against each finding’s reachability and exploitability
  4. Dedup: Collapses semantically equivalent findings
  5. Prove: Constructs and executes triggering inputs to dynamically confirm the vulnerability exists
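
The blog post describes this flow in prose rather than code, but the staging can be illustrated with a short, hypothetical Python sketch. Every name below (the Finding record, the stage functions, the example location string) is invented for illustration only and is not Microsoft’s implementation; the sketch merely shows how candidate findings might move from scan through proof.

    # Hypothetical sketch of a five-stage discovery pipeline:
    # Prepare -> Scan -> Validate -> Dedup -> Prove. All names are invented;
    # this mirrors the stages described in the blog post, not Microsoft's code.
    from dataclasses import dataclass, field

    @dataclass
    class Finding:
        location: str                 # file/function an auditor agent flagged
        hypothesis: str               # suspected bug class, e.g. "use-after-free"
        evidence: list = field(default_factory=list)
        validated: bool = False
        proven: bool = False

    def prepare(source_tree: str) -> dict:
        # Stage 1: build language-aware indices and an attack-surface map.
        return {"tree": source_tree, "index": {}, "attack_surface": []}

    def scan(context: dict) -> list[Finding]:
        # Stage 2: auditor agents emit candidate findings with hypotheses.
        return [Finding("tcpip.sys:example_option_parser", "use-after-free")]

    def validate(findings: list[Finding]) -> list[Finding]:
        # Stage 3: debater agents argue for and against reachability and
        # exploitability; keep only findings the debaters could not refute.
        for f in findings:
            f.validated = True        # placeholder for the real debate outcome
        return [f for f in findings if f.validated]

    def dedup(findings: list[Finding]) -> list[Finding]:
        # Stage 4: collapse semantically equivalent findings.
        unique = {(f.location, f.hypothesis): f for f in findings}
        return list(unique.values())

    def prove(findings: list[Finding]) -> list[Finding]:
        # Stage 5: construct and execute a triggering input; only dynamically
        # confirmed findings survive.
        for f in findings:
            f.proven = True           # placeholder for a reproduction harness
        return [f for f in findings if f.proven]

    def run_pipeline(source_tree: str) -> list[Finding]:
        return prove(dedup(validate(scan(prepare(source_tree)))))

    if __name__ == "__main__":
        for f in run_pipeline("windows/net"):
            print(f.location, f.hypothesis, "proven:", f.proven)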

The ensemble design uses frontier models as heavy reasoners, distilled models as cost-effective debaters for high-volume passes, and a separate frontier model as an independent counterpoint. Disagreement between models acts as a signal: when an auditor flags a finding and the debater can’t refute it, that finding’s credibility goes up.
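
The post does not give a scoring formula, but the disagreement-as-signal idea can be sketched in a few lines. The function name, weights, and inputs below are assumptions made purely for illustration, not Microsoft’s method.

    # Hypothetical credibility score combining an auditor's confidence with the
    # outcome of the debate; the weights are invented for illustration only.
    def credibility(auditor_confidence: float,
                    debater_refuted: bool,
                    counterpoint_agrees: bool) -> float:
        score = auditor_confidence
        if debater_refuted:
            score -= 0.4    # a successful refutation drags the finding down
        else:
            score += 0.3    # the debater could not refute it: credibility rises
        if counterpoint_agrees:
            score += 0.2    # the independent frontier model concurs
        return max(0.0, min(1.0, score))

    # Example: a moderately confident auditor, an unrefuted finding, and an
    # agreeing counterpoint model yield a score of roughly 0.9.
    print(credibility(0.4, debater_refuted=False, counterpoint_agrees=True))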

Benchmark Results

Microsoft tested MDASH against StorageDrive, a private Windows driver with 21 deliberately injected vulnerabilities that the company uses in internal offensive security interviews. Because the codebase was never published, it could not have appeared in any model’s training data. MDASH found all 21 with zero false positives, according to the blog post.

On production code, MDASH achieved 96% recall against five years of confirmed MSRC cases in clfs.sys and 100% recall in tcpip.sys. On CyberGym, a public benchmark of 1,507 real-world vulnerabilities drawn from OSS-Fuzz projects, it scored 88.45%, topping the leaderboard by roughly five percentage points.

Team Origins

The Autonomous Code Security team behind MDASH includes members of Team Atlanta, the group that took first place in DARPA’s $29.5 million AI Cyber Challenge by building an autonomous cyber-reasoning system that found and patched real bugs in open-source projects, as noted in the Microsoft blog. That competition experience informed MDASH’s core design principle: the durable advantage lies in the agentic system around the model, not in any single model itself.

The Competitive Landscape

MDASH enters an increasingly crowded field. OpenAI launched its Daybreak cybersecurity initiative, built on Codex Security and GPT-5.5-Cyber, just days earlier, while Anthropic’s Project Glasswing targets similar enterprise security workflows. Microsoft’s differentiator is that MDASH’s findings shipped as actual patches in a production Patch Tuesday, not as a research paper or benchmark claim. Sixteen CVEs with customer-facing fixes are a concrete output that competing systems have not yet publicly matched.

MDASH is currently in a limited private preview with a small set of customers; broader availability has not been announced.