Microsoft published a technical blueprint on May 31 detailing a two-layer architecture for its open-source Agent Framework that formally separates agent construction from agent execution. The approach introduces SKILL files as structured contracts that teach coding agents how to produce framework-compliant artifacts, with Microsoft Foundry serving as the unified deployment and runtime platform.

The Two-Layer Split

The core thesis, outlined in a Microsoft Community Hub post, is that most agent development tutorials collapse two distinct jobs into one: building an agent (writing code, defining tools, evaluating, packaging) and running an agent (planning, reasoning, calling tools, remembering users). Microsoft’s architecture makes the separation explicit.

Layer 1 is the “Coding Agent,” which is GitHub Copilot Chat in Agent Mode configured with a domain-aware agent definition. It reads SKILL files, generates code, runs validation, and produces artifact bundles. Layer 2 is the “Runtime Agent,” the production system that handles planning, tool-calling, memory, and user interactions through channels like Microsoft Teams or custom AG-UI frontends.

What a SKILL File Contains

A SKILL is a versioned contract file stored in .github/skills/ that contains six sections: scope and when-to-use declarations, framework idioms (exact patterns for constructing clients and registering tools), code patterns (naming, import order, error handling), fixture/data contracts (how to load test data), anti-patterns (what not to do), and acceptance heuristics (how to map requirements to runnable checks).
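To make the six-section contract concrete, here is a minimal Python sketch of a completeness check for a SKILL file. It rests on assumptions: the source does not give the file format or the exact section headings, so the `## ` heading syntax, the names in `REQUIRED_SECTIONS`, and the `missing_sections` helper are all hypothetical.

```python
# Hypothetical section names: the source lists six sections but not their exact headings.
REQUIRED_SECTIONS = (
    "scope",
    "framework idioms",
    "code patterns",
    "fixture/data contracts",
    "anti-patterns",
    "acceptance heuristics",
)


def missing_sections(skill_text: str) -> list[str]:
    """Return the required sections that have no matching '## ' heading.

    Assumes (not confirmed by the source) that a SKILL file is Markdown-like
    and that each section starts with a second-level heading.
    """
    headings = {
        line[3:].strip().lower()
        for line in skill_text.splitlines()
        if line.startswith("## ")
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```

A check like this could run in CI so that a SKILL missing, say, its anti-patterns section is caught before any coding agent loads it.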

The framework ships six SKILLs covering three capability surfaces across Python and .NET: single-agent on Foundry, multi-agent workflows (WorkflowBuilder, executors, human-in-the-loop), and AG-UI server/client patterns.

When a developer gives the Coding Agent a plain-language instruction, it routes to the relevant SKILL, loads it into context, builds a plan where every item traces back to the SKILL (how) or the acceptance criteria (what), and only then writes code. The output is not just a script but a complete artifact bundle: source code, agent definitions, workflow graphs, eval rows, tests, configs, and documentation.
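The route-then-plan step can be illustrated with a short Python sketch. It is not the Coding Agent's actual mechanism, which the source does not describe: keyword routing stands in for whatever selection logic is used, and the SKILL names, `PlanItem`, `route`, and `untraced` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical routing table (keyword -> SKILL name); checked in insertion order.
SKILL_KEYWORDS = {
    "workflow": "multi-agent-workflow",
    "ag-ui": "ag-ui-server-client",
    "agent": "single-agent-foundry",
}


@dataclass(frozen=True)
class PlanItem:
    description: str
    source: str     # "skill" (the how) or "acceptance" (the what)
    reference: str  # the SKILL section or acceptance criterion it traces to


def route(instruction: str) -> str:
    """Pick the first SKILL whose keyword appears in the instruction."""
    text = instruction.lower()
    for keyword, skill in SKILL_KEYWORDS.items():
        if keyword in text:
            return skill
    raise ValueError("no SKILL matches this instruction")


def untraced(plan: list[PlanItem]) -> list[PlanItem]:
    """Return plan items that trace to neither a SKILL nor an acceptance criterion."""
    return [
        item
        for item in plan
        if item.source not in ("skill", "acceptance") or not item.reference
    ]
```

Under this sketch, code generation would start only once `untraced(plan)` returns an empty list, which mirrors the rule that every plan item must trace back to the SKILL or the acceptance criteria.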

Validation Before Deployment

Before any artifact reaches Foundry, four gates run: unit and integration tests against fixture data, lint and type checks (ruff/mypy for Python, build warnings for .NET), evaluation against a versioned eval set measuring tool-call accuracy, and red-team probes using Foundry’s adversarial testing SDK.
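The four gates can be thought of as an ordered list of pass/fail checks. The Python sketch below is illustrative only: the source does not say whether the gates short-circuit, so stopping at the first failure is an assumed design choice, and `run_gates` and `tool_call_accuracy` are hypothetical helpers rather than part of the framework or the Foundry SDK.

```python
from typing import Callable

Gate = tuple[str, Callable[[], bool]]


def tool_call_accuracy(expected: list[str], actual: list[str]) -> float:
    """Fraction of eval rows where the agent called the expected tool."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")
    if not expected:
        return 0.0
    matches = sum(e == a for e, a in zip(expected, actual))
    return matches / len(expected)


def run_gates(gates: list[Gate]) -> tuple[bool, list[str]]:
    """Run gates in order and stop at the first failure (an assumed policy).

    Returns (all_passed, names_of_gates_that_ran).
    """
    ran: list[str] = []
    for name, check in gates:
        ran.append(name)
        if not check():
            return False, ran
    return True, ran
```

In a real pipeline each callable would wrap an actual tool run, for example a test runner, `ruff` and `mypy`, an eval harness, and the red-team probes. The 0.75 threshold in a call such as `lambda: tool_call_accuracy(expected, actual) >= 0.75` would be a team-chosen value, not one taken from the source.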

The post positions this as the difference between “we built an agent” and “we built an agent with a pass rate on a versioned eval set plus a red-team report.”

Foundry as the Runtime Layer

Once validation passes, artifacts publish to Microsoft Foundry, where agent definitions, Skills (runtime capabilities), Toolbox tools, and custom evals register against a project. The same artifact set deploys to dev, staging, and production with no environment-specific code.
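One common way to keep artifacts free of environment-specific code is to read every environment-dependent value from configuration at startup. The Python sketch below shows that general pattern. It is an assumption about how such a setup might look, not Foundry's documented mechanism, and the variable names `AGENT_PROJECT_ENDPOINT` and `AGENT_MODEL_DEPLOYMENT` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DeployTarget:
    project_endpoint: str
    model_deployment: str


def load_target(env: Optional[Mapping[str, str]] = None) -> DeployTarget:
    """Read deployment settings from the environment (variable names are hypothetical).

    The same code runs in dev, staging, and production; only the
    environment variables differ between them.
    """
    source = os.environ if env is None else env
    try:
        return DeployTarget(
            project_endpoint=source["AGENT_PROJECT_ENDPOINT"],
            model_deployment=source["AGENT_MODEL_DEPLOYMENT"],
        )
    except KeyError as exc:
        raise RuntimeError(f"missing required setting: {exc.args[0]}") from exc
```

Failing fast on a missing setting keeps a misconfigured environment from silently falling back to another environment's values.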

The Agent Framework repository on GitHub supports Python and .NET as first-class languages, with features including graph-based workflow orchestration (sequential, concurrent, handoff, group collaboration), checkpointing, streaming, human-in-the-loop controls, and built-in OpenTelemetry integration for distributed tracing. Foundry Hosted Agents, marked as new, allow deployment to Foundry infrastructure with two additional lines of code.

The Convention Drift Problem

The SKILL pattern targets a specific enterprise pain point: convention drift across teams building agents. When a framework releases a new idiom, updating the SKILL file once means every agent built afterwards picks it up automatically. Without a SKILL, coding assistants produce “generic Copilot output” that doesn’t match organizational standards.

Microsoft frames the SKILL distinction carefully in the documentation: a SKILL file (uppercase, in .github/skills/) instructs the Coding Agent at build time. An Agent Skill (a Foundry concept) is a named runtime capability the deployed agent calls. Layer 1’s SKILLs produce, among other artifacts, Layer 2’s Skills.

Why the Architecture Matters for Agent Teams

The separation addresses a governance gap that grows with scale. When build-time concerns (code quality, eval coverage, red-team results) leak into runtime operations, teams lose visibility into which layer failed when something breaks. By making the boundary explicit and enforcing validation gates between the two layers, organizations can audit agent code the same way they audit application code: with versioned artifacts, reproducible evaluations, and clear ownership boundaries.