Cloudflare published a technical deep-dive on the internal AI engineering stack it built over the past eleven months, providing some of the most granular production metrics yet disclosed by a major tech company running agentic AI tools at scale. The system serves 3,683 active users across 295 teams, processes 241.37 billion tokens monthly through AI Gateway, and has pushed peak merge request volume to nearly double its Q4 baseline, according to the Cloudflare blog.
The Numbers
The 30-day metrics from Cloudflare’s internal deployment:
- 3,683 active users on AI coding tools (60% company-wide, 93% of R&D), out of approximately 6,100 total employees
- 47.95 million AI requests
- 20.18 million AI Gateway requests per month
- 241.37 billion tokens routed through AI Gateway
- 51.83 billion tokens processed on Workers AI
The developer velocity impact is quantifiable. The 4-week rolling average of merge requests climbed from roughly 5,600 per week to more than 8,700. The peak week of March 23 hit 10,952, nearly double the Q4 baseline.
Three-Layer Architecture
The stack is organized into three layers, all built on publicly shipped Cloudflare products rather than internal-only infrastructure.
The platform layer handles authentication via Cloudflare Access, centralized LLM routing through AI Gateway, and inference on Workers AI. Every request from every engineer flows through a single proxy Worker that enables per-user attribution, model catalog management, and permission enforcement. Cloudflare routes between frontier models (OpenAI, Anthropic, Google, handling 91% of requests) and its own Workers AI (9%) based on task complexity and security sensitivity.
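The routing decision described above can be sketched as a small function. This is an illustrative assumption of how such a proxy might branch, not Cloudflare's actual code; the model names, the complexity flag, and the attribution field are all hypothetical.

```typescript
// Hypothetical sketch of the proxy-Worker routing logic: per-user
// attribution plus a choice between frontier providers and Workers AI.
// All names and thresholds here are illustrative, not Cloudflare's.

type Route = { provider: "frontier" | "workers-ai"; model: string };

interface TaskMeta {
  userId: string;               // enables per-user attribution
  complexity: "low" | "high";   // stand-in for a real complexity signal
  securitySensitive: boolean;   // sensitive work stays on Workers AI
}

function routeRequest(task: TaskMeta): Route {
  // Security-sensitive workloads never leave Cloudflare-hosted inference.
  if (task.securitySensitive) {
    return { provider: "workers-ai", model: "open-weights-model" };
  }
  // Complex tasks go to a frontier provider; the rest stay on Workers AI.
  return task.complexity === "high"
    ? { provider: "frontier", model: "frontier-model" }
    : { provider: "workers-ai", model: "open-weights-model" };
}
```

In a sketch like this, the 91/9 split the post reports would emerge from how often tasks are classified as complex and non-sensitive, rather than from a fixed quota.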
The knowledge layer uses a 16,000-entity Backstage knowledge graph and AGENTS.md files across thousands of repositories to give agents context about Cloudflare’s systems. The team found that MCP servers alone were insufficient: agents needed to understand how standards are codified, how code gets reviewed, how engineers onboard, and how changes propagate.
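For readers unfamiliar with the convention, an AGENTS.md file is a repository-level instruction file that coding agents read for context. A minimal sketch of what one might contain follows; the section names and contents are illustrative, not taken from Cloudflare's files.

```markdown
# AGENTS.md (illustrative example)

## What this repo is
Edge request-logging service; ownership and dependencies are tracked
in the team's service catalog entry.

## Build and test
- `make build` compiles; `make test` must pass before opening a merge request.

## Conventions
- Never commit secrets or credentials; config changes go through review.
- Follow the lint and style rules enforced in CI.
```

Files like this are how "standards are codified" becomes machine-readable: the agent gets the same onboarding context a new engineer would.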
The enforcement layer includes an AI Code Reviewer that runs on 100% of merge requests, plus an Engineering Codex that encodes engineering standards as agent-consumable skills.
Cost Architecture
One detail stands out: Cloudflare’s security agent alone processes over 7 billion tokens per day running on Kimi K2.5 via Workers AI. The company estimates that workload would cost $2.4 million per year on a mid-tier proprietary model. Running it on Workers AI cuts that by 77%, according to the blog post.
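The arithmetic behind those figures can be checked from the numbers in the post. The per-million-token rate below is derived, not a quoted price, and the savings percentage is applied as stated.

```typescript
// Back-of-the-envelope check of the security-agent cost figures.
const tokensPerDay = 7e9;                        // >7B tokens/day, per the post
const tokensPerYear = tokensPerDay * 365;        // ~2.56 trillion tokens/year
const proprietaryCostPerYear = 2.4e6;            // $2.4M estimate, per the post

// Implied blended rate on the mid-tier proprietary model:
const impliedRatePerMTokens =
  proprietaryCostPerYear / (tokensPerYear / 1e6); // ~$0.94 per million tokens

// A 77% reduction leaves the Workers AI cost at roughly $552k/year:
const workersAiCostPerYear = proprietaryCostPerYear * (1 - 0.77);
```

So the claimed savings work out to roughly $1.85 million per year on this single workload, assuming the token volume and pricing hold steady.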
The setup started with a tiger team called iMARS (Internal MCP Agent/Server Rollout Squad) eleven months ago. The sustained work now sits with the Dev Productivity team, the same group that owns CI/CD, build systems, and automation.
What the Data Shows
Cloudflare is not reporting a pilot. These are production metrics from a company with 6,100 employees where the majority of R&D is actively using AI tools every day. The merge request data is particularly notable because it is a lagging indicator of actual engineering output, not just tool adoption.
The open question is whether the productivity gains compound or plateau. Cloudflare’s next step, mentioned in the Agents Week wrap-up, is background agents: autonomous systems that can clone repos, run builds, and execute tests in sandboxed environments without a developer actively driving the session.