On April 20, Cloudflare published a detailed technical breakdown of iMARS (Internal MCP Agent/Server Rollout Squad), the internal agentic AI engineering stack it built over 11 months from products it ships to customers. The post, timed to close out Cloudflare’s Agents Week 2026, discloses production-scale numbers that few companies have made public.
Production Numbers
In the last 30 days, according to the Cloudflare blog post:
- 3,683 internal users actively using AI coding tools (60% of the company’s approximately 6,100 employees, 93% of R&D)
- 47.95 million AI requests
- 295 teams using agentic AI tools and coding assistants
- 20.18 million AI Gateway requests per month
- 241.37 billion tokens routed through AI Gateway
- 51.83 billion tokens processed on Workers AI
The velocity impact: weekly merge requests climbed from a Q4 baseline of roughly 5,600 to over 8,700, peaking at 10,952 in the week of March 23. That peak is nearly double the pre-iMARS baseline.
Architecture
Every layer of the stack maps to a Cloudflare product: Cloudflare Access for Zero Trust authentication, AI Gateway for centralized LLM routing with Zero Data Retention controls, Workers AI for on-platform inference with open-weight models, and the Agents SDK (McpAgent, Durable Objects) for stateful agent sessions.
The team deployed 13 production MCP servers exposing 182+ tools, built a 16,000+ entity knowledge graph using Backstage, and auto-generated AGENTS.md context files across 3,900 repositories. An AI Code Reviewer, processing 5.47 million requests, enforces standards compliance on every merge request.
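Cloudflare has not published the schema behind its AGENTS.md generator, but the idea of rendering a context file from catalog metadata can be sketched as follows. The `RepoMetadata` fields and the `renderAgentsMd` function are illustrative assumptions, not Cloudflare's actual pipeline:

```typescript
// Illustrative sketch: render an AGENTS.md context file from repository
// metadata. Field names and output layout are hypothetical, not
// Cloudflare's actual schema.

interface RepoMetadata {
  name: string;
  language: string;
  buildCommand: string;
  testCommand: string;
  owners: string[];
}

export function renderAgentsMd(repo: RepoMetadata): string {
  return [
    `# ${repo.name}`,
    ``,
    `Primary language: ${repo.language}`,
    `Owned by: ${repo.owners.join(", ")}`,
    ``,
    `## Commands`,
    `- Build: \`${repo.buildCommand}\``,
    `- Test: \`${repo.testCommand}\``,
    ``,
  ].join("\n");
}
```

In a setup like Cloudflare describes, the metadata would presumably come from the Backstage-backed knowledge graph, and the rendered file would be committed into each of the 3,900 repositories so coding agents pick it up automatically.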
A key design decision: all LLM requests route through a single proxy Worker from day one, rather than connecting clients directly to AI Gateway. This gave the team a central control plane for per-user attribution, model catalog management, and permission enforcement without touching client configurations.
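A minimal sketch of what such a proxy Worker might look like is below. The env bindings, model catalog, and `x-user` header are assumptions for illustration; only the `Cf-Access-Authenticated-User-Email` header (which Cloudflare Access injects on authenticated requests) and the AI Gateway URL scheme reflect documented behavior, and Cloudflare's real implementation is not public:

```typescript
// Sketch of a single proxy Worker fronting AI Gateway. Bindings and the
// model catalog are illustrative, not Cloudflare's actual implementation.

interface Env {
  ACCOUNT_ID: string; // hypothetical binding: Cloudflare account ID
  GATEWAY_ID: string; // hypothetical binding: AI Gateway name
}

// Central model catalog: which upstream provider serves each model.
const MODEL_CATALOG: Record<string, string> = {
  "claude-sonnet": "anthropic",
  "gpt-4o": "openai",
  "kimi-k2.5": "workers-ai",
};

// Pure routing decision, so attribution and permission checks live in one place.
export function resolveRoute(model: string, userEmail: string | null) {
  if (!userEmail) {
    return { ok: false as const, status: 401, reason: "no Access identity" };
  }
  const provider = MODEL_CATALOG[model];
  if (!provider) {
    return { ok: false as const, status: 403, reason: "model not in catalog" };
  }
  return { ok: true as const, provider };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Cloudflare Access injects the authenticated user's email on every request.
    const user = request.headers.get("Cf-Access-Authenticated-User-Email");
    const model = new URL(request.url).searchParams.get("model") ?? "";
    const route = resolveRoute(model, user);
    if (!route.ok) return new Response(route.reason, { status: route.status });

    // Forward to AI Gateway, tagging the request for per-user attribution.
    const upstream = `https://gateway.ai.cloudflare.com/v1/${env.ACCOUNT_ID}/${env.GATEWAY_ID}/${route.provider}`;
    return fetch(upstream, {
      method: "POST",
      headers: { ...Object.fromEntries(request.headers), "x-user": user! },
      body: request.body,
    });
  },
};
```

The design choice the post describes falls out naturally here: because every client talks only to the Worker, adding a model, revoking a user, or changing attribution logic is a single deploy rather than a fleet-wide client reconfiguration.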
Model Mix
Frontier models (OpenAI, Anthropic, Google) handle 91.16% of requests (13.38M per month), with Workers AI handling 8.84% (1.3M per month). Cloudflare highlighted Kimi K2.5 on Workers AI as a cost optimization lever: a security agent processing over 7 billion tokens per day costs an estimated 77% less than equivalent proprietary model pricing, according to the blog post.
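The post reports only the 77% figure, not the underlying rates, but the arithmetic behind such a claim is simple to reconstruct. The per-million-token prices below are made-up placeholders chosen purely to illustrate the calculation:

```typescript
// Back-of-envelope cost comparison. All prices are placeholder assumptions;
// the blog post discloses only the ~77% savings figure, not actual rates.

const TOKENS_PER_DAY = 7e9; // ~7 billion tokens/day for the security agent

// Hypothetical blended prices per million tokens (USD):
const PROPRIETARY_PER_M = 3.0; // assumed frontier-model rate
const WORKERS_AI_PER_M = 0.69; // assumed Kimi K2.5 on Workers AI rate

const costProprietary = (TOKENS_PER_DAY / 1e6) * PROPRIETARY_PER_M;
const costWorkersAi = (TOKENS_PER_DAY / 1e6) * WORKERS_AI_PER_M;

// Percentage saved by moving the workload to Workers AI.
export const savingsPct = Math.round((1 - costWorkersAi / costProprietary) * 100);
```

At that token volume, even a modest per-token price gap compounds into a large absolute difference, which is presumably why Cloudflare singled out this workload as the optimization example.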
The Replicability Argument
The notable aspect of this disclosure is not the adoption rate but the architecture’s replicability. Every component except Backstage (open-source) is a product Cloudflare sells. The implicit pitch: if you build on Cloudflare’s platform, you can assemble the same stack, with AI Gateway for routing and governance, Workers AI for cost-efficient inference, MCP servers for tool integration, and the Agents SDK for stateful orchestration. The dogfooding is the marketing.