On April 20, Cloudflare published a detailed technical breakdown of iMARS (Internal MCP Agent/Server Rollout Squad), the internal agentic AI engineering stack it built over 11 months from products it ships to customers. The post, timed to close out Cloudflare’s Agents Week 2026, discloses production-scale numbers that few companies have made public.
Production Numbers
In the last 30 days, according to the Cloudflare blog post:
- 3,683 internal users actively using AI coding tools (60% of the company’s approximately 6,100 employees, 93% of R&D)
- 47.95 million AI requests
- 295 teams using agentic AI tools and coding assistants
- 20.18 million AI Gateway requests per month
- 241.37 billion tokens routed through AI Gateway
- 51.83 billion tokens processed on Workers AI
The velocity impact: weekly merge requests climbed from a Q4 baseline of roughly 5,600 to over 8,700, peaking at 10,952 in the week of March 23. That peak is nearly double the pre-iMARS baseline.
Architecture
Every layer of the stack maps to a Cloudflare product: Cloudflare Access for Zero Trust authentication, AI Gateway for centralized LLM routing with Zero Data Retention controls, Workers AI for on-platform inference with open-weight models, and the Agents SDK (McpAgent, Durable Objects) for stateful agent sessions.
The team deployed 13 production MCP servers exposing 182+ tools, built a 16,000+ entity knowledge graph using Backstage, and auto-generated AGENTS.md context files across 3,900 repositories. An AI Code Reviewer, processing 5.47 million requests, enforces standards compliance on every merge request.
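Cloudflare has not published the schema behind its AGENTS.md generator, but the idea of rendering a context file from catalog metadata can be sketched as follows. The `RepoMetadata` fields and the `renderAgentsMd` function are illustrative assumptions, not Cloudflare's actual pipeline:

```typescript
// Illustrative sketch: render an AGENTS.md context file from repository
// metadata. Field names and output layout are hypothetical, not
// Cloudflare's actual schema.

interface RepoMetadata {
  name: string;
  language: string;
  buildCommand: string;
  testCommand: string;
  owners: string[];
}

export function renderAgentsMd(repo: RepoMetadata): string {
  return [
    `# ${repo.name}`,
    ``,
    `Primary language: ${repo.language}`,
    `Owned by: ${repo.owners.join(", ")}`,
    ``,
    `## Commands`,
    `- Build: \`${repo.buildCommand}\``,
    `- Test: \`${repo.testCommand}\``,
    ``,
  ].join("\n");
}
```

In a setup like Cloudflare describes, the metadata would presumably come from the Backstage-backed knowledge graph, and the rendered file would be committed into each of the 3,900 repositories so coding agents pick it up automatically.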
A key design decision: all LLM requests route through a single proxy Worker from day one, rather than connecting clients directly to AI Gateway. This gave the team a central control plane for per-user attribution, model catalog management, and permission enforcement without touching client configurations.
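A minimal sketch of what such a proxy Worker might look like is below. The env bindings, model catalog, and `x-user` header are assumptions for illustration; only the `Cf-Access-Authenticated-User-Email` header (which Cloudflare Access injects on authenticated requests) and the AI Gateway URL scheme reflect documented behavior, and Cloudflare's real implementation is not public:

```typescript
// Sketch of a single proxy Worker fronting AI Gateway. Bindings and the
// model catalog are illustrative, not Cloudflare's actual implementation.

interface Env {
  ACCOUNT_ID: string; // hypothetical binding: Cloudflare account ID
  GATEWAY_ID: string; // hypothetical binding: AI Gateway name
}

// Central model catalog: which upstream provider serves each model.
const MODEL_CATALOG: Record<string, string> = {
  "claude-sonnet": "anthropic",
  "gpt-4o": "openai",
  "kimi-k2.5": "workers-ai",
};

// Pure routing decision, so attribution and permission checks live in one place.
export function resolveRoute(model: string, userEmail: string | null) {
  if (!userEmail) {
    return { ok: false as const, status: 401, reason: "no Access identity" };
  }
  const provider = MODEL_CATALOG[model];
  if (!provider) {
    return { ok: false as const, status: 403, reason: "model not in catalog" };
  }
  return { ok: true as const, provider };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Cloudflare Access injects the authenticated user's email on every request.
    const user = request.headers.get("Cf-Access-Authenticated-User-Email");
    const model = new URL(request.url).searchParams.get("model") ?? "";
    const route = resolveRoute(model, user);
    if (!route.ok) return new Response(route.reason, { status: route.status });

    // Forward to AI Gateway, tagging the request for per-user attribution.
    const upstream = `https://gateway.ai.cloudflare.com/v1/${env.ACCOUNT_ID}/${env.GATEWAY_ID}/${route.provider}`;
    return fetch(upstream, {
      method: "POST",
      headers: { ...Object.fromEntries(request.headers), "x-user": user! },
      body: request.body,
    });
  },
};
```

The design choice the post describes falls out naturally here: because every client talks only to the Worker, adding a model, revoking a user, or changing attribution logic is a single deploy rather than a fleet-wide client reconfiguration.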
Model Mix
Frontier models (OpenAI, Anthropic, Google) handle 91.16% of requests (13.38M per month), with Workers AI handling 8.84% (1.3M per month). Cloudflare highlighted Kimi K2.5 on Workers AI as a cost optimization lever: a security agent processing over 7 billion tokens per day costs an estimated 77% less than equivalent proprietary model pricing, according to the blog post.
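The post reports only the 77% figure, not the underlying rates, but the arithmetic behind such a claim is simple to reconstruct. The per-million-token prices below are made-up placeholders chosen purely to illustrate the calculation:

```typescript
// Back-of-envelope cost comparison. All prices are placeholder assumptions;
// the blog post discloses only the ~77% savings figure, not actual rates.

const TOKENS_PER_DAY = 7e9; // ~7 billion tokens/day for the security agent

// Hypothetical blended prices per million tokens (USD):
const PROPRIETARY_PER_M = 3.0; // assumed frontier-model rate
const WORKERS_AI_PER_M = 0.69; // assumed Kimi K2.5 on Workers AI rate

const costProprietary = (TOKENS_PER_DAY / 1e6) * PROPRIETARY_PER_M;
const costWorkersAi = (TOKENS_PER_DAY / 1e6) * WORKERS_AI_PER_M;

// Percentage saved by moving the workload to Workers AI.
export const savingsPct = Math.round((1 - costWorkersAi / costProprietary) * 100);
```

At that token volume, even a modest per-token price gap compounds into a large absolute difference, which is presumably why Cloudflare singled out this workload as the optimization example.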
The Replicability Argument
The notable aspect of this disclosure is not the adoption rate but the architecture’s replicability. Every component except Backstage (open-source) is a product Cloudflare sells. The implicit pitch: if you build on Cloudflare’s platform, you can assemble the same stack, with AI Gateway for routing and governance, Workers AI for cost-efficient inference, MCP servers for tool integration, and the Agents SDK for stateful orchestration. The dogfooding is the marketing.