Microsoft’s open-source Agent Governance Toolkit (AGT), currently in public preview, takes the position that prompt-level safety instructions are insufficient for production agent systems. Instead of asking agents to follow rules, AGT intercepts every tool call, message send, and delegation in deterministic application code, before the model’s intended action reaches external systems. The toolkit covers all ten risks in the OWASP Agentic Top 10 with more than 13,000 built-in tests, according to InfoWorld’s coverage.
Deterministic Enforcement Over Probabilistic Safety
AGT’s GitHub repository states the premise directly: “Prompt-level safety is not a control surface. It is a polite request to a stochastic system.” The README cites Andriushchenko et al. (ICLR 2025) reporting 100% attack success rates on GPT-4o, Claude 3, and Llama-3 using adaptive attacks, and Microsoft’s own AI Red Teaming Agent research concluding that “mitigations do not eliminate risk entirely.”
The alternative is deterministic enforcement: actions the AGT kernel denies are not merely unlikely outcomes; they are structurally impossible. Policy evaluation runs in less than 0.1 ms per operation, according to Microsoft’s figures cited by InfoWorld.
How Policy Enforcement Works
Developers define governance rules in YAML files that specify what agents can and cannot do. A MarkTechPost implementation walkthrough demonstrates the full stack: policies that block destructive database operations, require human approval for external email, sandbox shell commands with blocked-term lists, deny sensitive data access to low-trust agents, and gate financial transactions above thresholds.
Each rule evaluates the agent’s identity, trust score, risk tier, requested tool, action type, and sensitivity level. Possible outcomes are allow, deny, require_approval (routed to named approvers), or sandbox (restricted execution with constraints). The governance layer wraps any tool function in two lines of Python:
from agentmesh.governance import govern
safe_tool = govern(my_tool, policy="policy.yaml")
Every call is checked, logged, and enforced. Blocked actions raise GovernanceDenied exceptions.
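To make the evaluation flow concrete, here is a minimal sketch of first-match rule evaluation with a default-deny fallback. It is not AGT's implementation or API: the names `Rule`, `evaluate`, `govern_sketch`, `ActionDenied`, and the `trust_below` field are all hypothetical, and for brevity the wrapper treats every outcome other than allow as a refusal instead of routing to approvers or a sandbox.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


class ActionDenied(Exception):
    """Hypothetical stand-in for a governance denial error."""


@dataclass(frozen=True)
class Rule:
    name: str
    tool: str                     # tool name the rule applies to
    trust_below: Optional[float]  # match only agents under this trust score; None = any
    outcome: str                  # "allow", "deny", "require_approval", or "sandbox"


def evaluate(rules: List[Rule], tool: str, trust_score: float) -> Tuple[str, str]:
    """Return (outcome, matched rule name); the first matching rule wins."""
    for rule in rules:
        trust_matches = rule.trust_below is None or trust_score < rule.trust_below
        if rule.tool == tool and trust_matches:
            return rule.outcome, rule.name
    return "deny", "default"  # nothing matched: deny by default


def govern_sketch(fn: Callable, rules: List[Rule], trust_score: float) -> Callable:
    """Wrap fn so every call is evaluated before it runs."""
    def wrapper(*args, **kwargs):
        outcome, matched = evaluate(rules, fn.__name__, trust_score)
        if outcome != "allow":
            # Simplification: approval and sandbox outcomes are refused here too.
            raise ActionDenied(f"{fn.__name__}: {outcome} (rule: {matched})")
        return fn(*args, **kwargs)
    return wrapper


# Illustrative rules, ordered from most to least specific.
RULES = [
    Rule("block-drop", "drop_table", None, "deny"),
    Rule("low-trust-email", "send_email", 0.5, "require_approval"),
    Rule("allow-email", "send_email", None, "allow"),
]
```

Because the check lives in ordinary application code, a denied call never reaches the wrapped function at all, which is the property the toolkit's authors contrast with prompt-level instructions.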
Tamper-Evident Audit Trails
Every governance decision produces a chained audit record. Each record contains the policy name, agent identity, tool requested, action details, decision rendered, matched rule, severity, and reason. Records are hash-chained so that modifications to any entry break the verification chain. InfoWorld describes this as a “decision bill of materials” that tracks governance decisions with audit chains and trust-level details.
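The hash-chaining idea can be illustrated in a few lines. This is a generic sketch using SHA-256 from Python's standard library, not AGT's actual record format: the record layout, the `GENESIS` value, and the function names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def _digest(prev_hash: str, entry: dict) -> str:
    """Hash the entry together with the previous record's hash."""
    payload = json.dumps({"prev": prev_hash, "entry": entry}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain: list, entry: dict) -> None:
    """Append an audit entry linked to the record before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"entry": entry, "prev": prev_hash, "hash": _digest(prev_hash, entry)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited entry or broken link fails verification."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True
```

Each record's hash depends on the one before it, so rewriting an earlier decision invalidates that record and, unless every later hash is recomputed as well, everything after it.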
Kill Switches and Drift Detection
AGT includes a global kill switch that immediately blocks all agent actions when activated. The toolkit also uses declarative policies to detect agents drifting from set baselines, flagging issues before they cost money or affect operations, according to InfoWorld.
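A global kill switch of this kind can be pictured as a shared flag that every guarded action checks first. The sketch below is an illustration only, with a hypothetical `KillSwitch` class built on the standard library's `threading.Event`; it does not reflect how AGT implements the feature.

```python
import threading


class KillSwitch:
    """Hypothetical global switch: once activated, guarded calls are refused."""

    def __init__(self) -> None:
        self._tripped = threading.Event()  # thread-safe boolean flag

    def activate(self) -> None:
        self._tripped.set()

    def reset(self) -> None:
        self._tripped.clear()

    def guard(self, fn):
        """Wrap fn so it refuses to run while the switch is active."""
        def wrapper(*args, **kwargs):
            if self._tripped.is_set():
                raise RuntimeError("kill switch active: action blocked")
            return fn(*args, **kwargs)
        return wrapper
```

Wrapping every tool with the same `KillSwitch` instance means a single `activate()` call stops all of them, regardless of which agent is calling.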
Token budget management is built in. Policies can throttle activities as agents approach preset limits and reject actions likely to use excessive tokens. This addresses a documented pattern where agent context-seeking activities overwhelm APIs designed for human interaction volumes.
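As a rough illustration of budget-based throttling, the sketch below classifies a proposed action by its projected token use. The `TokenBudget` class, the 80% throttle threshold, and the three outcome strings are assumptions chosen for the example, not values taken from AGT.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    limit: int                 # hard cap on tokens for this agent or session
    throttle_at: float = 0.8   # assumed fraction of the limit where throttling starts
    used: int = 0

    def check(self, estimated_tokens: int) -> str:
        """Return "reject", "throttle", or "allow" for a proposed action."""
        projected = self.used + estimated_tokens
        if projected > self.limit:
            return "reject"      # action would exceed the hard cap
        if projected >= self.limit * self.throttle_at:
            return "throttle"    # close to the cap: slow the agent down
        return "allow"

    def record(self, actual_tokens: int) -> None:
        """Account for tokens actually consumed by a completed action."""
        self.used += actual_tokens
```

With a limit of 1,000 tokens and 700 already used, a 150-token action falls in the throttle band, while a 400-token action is rejected outright.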
Multi-Language, Multi-Cloud
AGT ships packages for Python, TypeScript, .NET, Rust, and Go (Python has the most complete implementation). The toolkit works with Azure Foundry, Amazon Bedrock, Google ADK, and most common agent orchestration frameworks. It integrates with Claude Code via a plugin marketplace and with MCP servers through a .NET governance extension.
The architectural decision InfoWorld highlights as most significant: AGT treats agents as code running on a secure operating system, using concepts from hypervisors to isolate agents from the underlying platform. The core governance package is named “Agent OS.”
The Enterprise Governance Shift
The toolkit reflects a broader shift in how enterprises approach agent safety. Instead of the “lock agents in a sandbox” approach or reliance on prompt engineering, the model is “deploy agents with visibility and control.” Policy evaluation happens at the action layer, not the model layer. Identity is per-agent, which solves the “five agents sharing one API key” attribution problem, and every decision is auditable for regulators and compliance teams.
For teams already running Microsoft’s Agent Framework, AGT plugs into the same Foundry deployment pipeline. For everyone else, it’s a pip install that wraps existing tool functions regardless of the underlying framework.