Cloudflare announced it is expanding AI Gateway into a unified inference layer, giving developers a single API to call 70+ models across 12+ providers. The announcement, published April 16, positions Cloudflare as the abstraction tier between AI agents and a fragmented model market.
One API, All Models
Developers using Cloudflare Workers can now call third-party models from OpenAI, Anthropic, Google, and others through the same AI.run() binding they already use for Workers AI models, making a switch between providers a one-line code change. REST API support for non-Workers environments is planned for the coming weeks, according to the blog post.
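A minimal sketch of what that looks like from a Worker, assuming a binding shape of `env.AI.run(model, input)`; the third-party model identifiers and the exact input fields here are illustrative, since the announcement does not specify the naming scheme:

```typescript
// Minimal sketch of calling a model through the Workers AI binding.
// The binding interface and model identifiers are assumptions for
// illustration, not confirmed API details.
interface AiBinding {
  run(model: string, input: { prompt: string }): Promise<unknown>;
}
interface Env {
  AI: AiBinding;
}

// Switching providers is described as a one-line change: only the model
// identifier varies, e.g. "openai/gpt-4o-mini" (illustrative) instead of
// a Workers AI model.
const MODEL = "@cf/meta/llama-3.1-8b-instruct";

async function answer(env: Env, prompt: string): Promise<unknown> {
  // Same call shape regardless of which provider hosts the model.
  return env.AI.run(MODEL, { prompt });
}
```

In a deployed Worker, `env` would be supplied by the runtime; the point is that the call site does not change when the model behind it does.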
The expanded catalog includes models from Alibaba Cloud, AssemblyAI, ByteDance, Google, InWorld, MiniMax, OpenAI, Pixverse, Recraft, Runway, and Vidu. Cloudflare is also extending beyond text to include image, video, and speech models for multimodal agent applications.
The Agent Cost Problem
Cloudflare frames the product around a specific pain point for agent builders. A chatbot might make one inference call per user prompt. An agent might chain ten calls to complete a single task, turning a 50ms provider latency penalty into 500ms and converting a single failed request into a cascade of downstream failures.
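The arithmetic behind that claim can be made explicit. Sequential calls add their latency penalties, and because every call in the chain must succeed, per-call failure odds compound; the 99% per-call reliability figure below is an assumed number for illustration, not from the announcement:

```typescript
// Back-of-the-envelope math for chained agent calls: a fixed per-call
// provider latency penalty adds up across sequential calls.
function chainedPenaltyMs(perCallPenaltyMs: number, calls: number): number {
  return perCallPenaltyMs * calls;
}

// Every call in the chain must succeed for the task to succeed, so
// per-call success probabilities multiply.
function chainSuccessProbability(perCallSuccess: number, calls: number): number {
  return Math.pow(perCallSuccess, calls);
}

const penalty = chainedPenaltyMs(50, 10);          // 500 ms of added latency
const success = chainSuccessProbability(0.99, 10); // ~0.904: roughly 1 in 10 tasks hits a failure
```

Even a provider that looks fast and reliable for single requests degrades noticeably when an agent chains ten of them.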
The company cites research from AIDBIntel showing companies call an average of 3.5 models across multiple providers, which means no single provider gives a holistic view of AI usage or spend. AI Gateway consolidates cost monitoring with custom metadata tagging, letting teams break down spend by user tier, individual customer, or specific workflow.
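A rough sketch of what that consolidated breakdown amounts to: roll per-request costs up by a custom metadata tag. The record shape and field names here are assumptions for illustration, not Cloudflare's schema:

```typescript
// Hypothetical shape of per-request records from a consolidated gateway
// log, tagged with custom metadata (field names are illustrative).
interface InferenceRecord {
  costUsd: number;
  metadata: { userTier?: string; customerId?: string; workflow?: string };
}

// Sum spend by any metadata key, e.g. user tier, customer, or workflow.
function spendBy(
  records: InferenceRecord[],
  key: "userTier" | "customerId" | "workflow",
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    const tag = r.metadata[key] ?? "untagged";
    totals.set(tag, (totals.get(tag) ?? 0) + r.costUsd);
  }
  return totals;
}
```

The value of doing this at the gateway rather than per provider is that the same records cover all 3.5 models a team calls, so one query answers "what does the free tier cost us" across every vendor at once.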
Bring Your Own Model
Cloudflare also previewed a bring-your-own-model capability for Workers AI, leveraging Replicate’s Cog containerization technology. Teams can deploy custom fine-tuned models on Cloudflare’s infrastructure using a cog.yaml configuration file and a Python prediction script, without managing CUDA dependencies or weight loading.
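For context, a minimal cog.yaml following Replicate's Cog conventions looks roughly like the fragment below; the announcement does not specify which fields Cloudflare's bring-your-own-model flow requires, so treat this as a sketch of the general format:

```yaml
# Minimal Cog configuration (illustrative; package versions are examples).
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    - "torch==2.1.0"
# Points at the Python prediction script and its predictor class.
predict: "predict.py:Predictor"
```

The appeal is that this file plus a prediction script replaces hand-managing CUDA dependencies, base images, and model weight loading.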
Where This Fits
This announcement is the infrastructure complement to Cloudflare's Agents Week announcements from April 13-15, which delivered Agents SDK v2, Project Think for durable agent execution, and Browser Run for agent browser automation. Those products give agents tools to act. The expanded AI Gateway gives agents the ability to select and switch between the models powering their reasoning, without locking into any single vendor's pricing or availability.
For agent builders running multi-step workflows where each step may benefit from a different model (fast classification, deep reasoning, lightweight execution), a unified inference layer with automatic failover and consolidated billing solves an operational problem that currently requires custom plumbing.
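A sketch of the custom plumbing such a layer replaces: route each workflow step to a model suited to it, with an ordered fallback list per step. The step names, model identifiers, and `callModel` signature are all hypothetical:

```typescript
// Illustrative per-step model routing with manual failover, the kind of
// plumbing a unified inference layer absorbs. All identifiers are made up.
type Step = "classify" | "reason" | "execute";

const ROUTES: Record<Step, string[]> = {
  classify: ["fast-model-a", "fast-model-b"], // cheap, low-latency
  reason: ["deep-model-a", "deep-model-b"],   // slower, higher quality
  execute: ["light-model-a"],                 // lightweight tool calls
};

async function runStep(
  step: Step,
  prompt: string,
  callModel: (model: string, prompt: string) => Promise<string>,
): Promise<string> {
  let lastError: unknown;
  for (const model of ROUTES[step]) {
    try {
      return await callModel(model, prompt); // first healthy model wins
    } catch (err) {
      lastError = err; // fall through to the next model in the list
    }
  }
  throw lastError;
}
```

With automatic failover and a single billing surface at the gateway, the routing table and retry loop above shrink to configuration.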