Corporate America is starting to route AI agent tasks to the cheapest model that can handle them, and the numbers suggest frontier labs should pay attention. CNBC reported Thursday that enterprises deploying AI agents are shifting from running all work on premium models to intelligent routing layers that match task complexity to model capability, cutting token spend by 40 to 60 percent.

The Cost Math That Broke the Default

The problem is straightforward: most companies defaulted to the most powerful model for every query, regardless of complexity. Glean CEO Arvind Jain told CNBC that roughly 95% of enterprise AI usage still runs on the most expensive frontier models, even for tasks cheaper alternatives handle easily.

Cisco’s numbers illustrate why that default is unsustainable. Chief Product Officer Jeetu Patel laid out the arithmetic: roughly $200 of token usage per employee per week scales to about $10,000 per person annually. For a company with 90,000 employees, the annual bill approaches $900 million. Patel told CNBC that Cisco came in well over its own budget and has had to reallocate resources, prioritizing token spend over other costs, as 30,000 engineers now build products written largely with AI.
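
Patel's arithmetic can be checked in a few lines. The sketch below is in Python and uses only the figures quoted above; the 52-week year and the function names are assumptions made for illustration, not anything Cisco or CNBC published.

```python
# Figures quoted in the article; everything else is an illustrative assumption.
WEEKLY_TOKEN_SPEND_USD = 200      # per employee, per week (Patel's rough figure)
HEADCOUNT = 90_000                # company size used in the article
WEEKS_PER_YEAR = 52               # assumption: spend continues every week of the year


def annual_spend_per_employee(weekly_usd: int, weeks: int = WEEKS_PER_YEAR) -> int:
    """Annual token spend for one employee, in dollars."""
    return weekly_usd * weeks


def company_annual_spend(per_employee_usd: int, headcount: int) -> int:
    """Annual token spend across the whole company, in dollars."""
    return per_employee_usd * headcount


exact_per_person = annual_spend_per_employee(WEEKLY_TOKEN_SPEND_USD)
rounded_per_person = 10_000  # the article's rounded figure

print(f"Per employee, exact:   ${exact_per_person:,}")
print(f"Company, rounded rate: ${company_annual_spend(rounded_per_person, HEADCOUNT):,}")
print(f"Company, exact rate:   ${company_annual_spend(exact_per_person, HEADCOUNT):,}")
```

The unrounded rate is $10,400 per person, which puts the company total at $936 million. The $900 million figure rests on the rounded $10,000 per person.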

How Routing Works

Cognition CEO Scott Wu, whose company builds the coding agent Devin, described the gain to CNBC: for boilerplate work, companies can get five to ten times better cost efficiency using models that are still good enough for the task. His example was pointed: ask any model to name the third U.S. president, and whether it costs $4 or $0.50 per million tokens, the answer is Thomas Jefferson.

The routing concept is already shipping as product. Factory AI launched Factory Router on June 1, according to Startup Fortune, claiming a 20 to 25 percent cost reduction while preserving 99% of Claude Opus 4.7’s pass rate on Terminal-Bench 2. More aggressive routing dropped performance to 81% of Opus’s pass rate at 56% of the cost. Perplexity AI announced a hybrid local-server inference orchestrator at Computex 2026 that automatically routes agent tasks between on-device and cloud models.
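
A routing layer of the kind described above can be sketched as a lookup from task type to model tier. This is a minimal illustration in Python, not Factory Router's or any other vendor's implementation: the tier names, the set of routine task types, and the routing rule itself are assumptions, and only the two per-million-token prices come from Wu's example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                      # hypothetical label, not a real model name
    usd_per_million_tokens: float


CHEAP = ModelTier("cheap-model", 0.50)        # price from Wu's example
FRONTIER = ModelTier("frontier-model", 4.00)  # price from Wu's example

# Assumption: these task types are simple enough for the cheap tier.
ROUTINE_TASK_TYPES = {"classification", "extraction", "formatting", "retrieval"}


def route(task_type: str) -> ModelTier:
    """Send routine task types to the cheap tier and everything else to the frontier tier."""
    return CHEAP if task_type in ROUTINE_TASK_TYPES else FRONTIER


def cost_usd(tier: ModelTier, tokens: int) -> float:
    """Dollar cost of processing `tokens` tokens on the given tier."""
    return tier.usd_per_million_tokens * tokens / 1_000_000


# Hypothetical workload: (task type, tokens consumed).
tasks = [("classification", 2_000), ("extraction", 5_000), ("multi-file refactor", 40_000)]
for task_type, tokens in tasks:
    tier = route(task_type)
    print(f"{task_type}: {tier.name}, ${cost_usd(tier, tokens):.4f}")
```

Production routers presumably rely on richer signals than a fixed task label, but the cost logic is the same. At these two prices the cheap tier is 8 times cheaper per token, inside the five-to-ten-times range Wu cites.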

The Revenue Threat

The risk to the frontier labs’ business model follows directly. OpenAI and Anthropic built their businesses, and the IPO expectations around them, on the assumption of enormous demand at premium prices. If enterprises widely adopt routing, both companies lose the “run everything on Claude or GPT” premium they now collect on even the simplest tasks.

Patel told CNBC he does not think routing sinks the frontier labs, and that cutting-edge technology will remain valuable. But he sees the pricing model shifting: the labs will have to get more efficient with how the models are used rather than simply charging more.

Cognition is responding to the ROI pressure from a different angle. The company announced an “AI productivity guarantee”: if Devin delivers less engineering value than a customer pays for, Cognition will fund usage up to $10 million until performance catches up. Wu told CNBC that companies should be measuring output, not activity. “You can spend billions of tokens and be doing nothing with it,” he said.

The Valuation Question

Pricing power is shifting from the companies selling premium AI toward the companies buying it. The frontier labs will still command a premium for the hardest work. The question CNBC poses is how much of the market is “the other stuff,” the routine classification, extraction, formatting, and retrieval that agents do thousands of times per day. If most agent tasks are simple ones, and routing sends those to models priced at $0.50 per million tokens instead of $4, the total addressable revenue for frontier labs shrinks significantly. For OpenAI and Anthropic, both preparing for public markets, the answer determines whether current valuations hold.
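
How much that revenue shrinks depends on the share of tokens that routing can divert. The sketch below, in Python, works through the question with the two prices from Wu's example. The routed share is a free parameter, the function names are hypothetical, and the model ignores routing overhead, quality loss, and volume discounts.

```python
FRONTIER_PRICE = 4.00   # USD per million tokens (from Wu's example)
CHEAP_PRICE = 0.50      # USD per million tokens (from Wu's example)


def blended_price(routed_share: float) -> float:
    """Average price per million tokens when `routed_share` of tokens go to the cheap model."""
    if not 0.0 <= routed_share <= 1.0:
        raise ValueError("routed_share must be between 0 and 1")
    return routed_share * CHEAP_PRICE + (1.0 - routed_share) * FRONTIER_PRICE


def buyer_savings(routed_share: float) -> float:
    """Fraction of the all-frontier bill that the buyer no longer pays."""
    return 1.0 - blended_price(routed_share) / FRONTIER_PRICE


def frontier_revenue_retained(routed_share: float) -> float:
    """Fraction of frontier revenue left, assuming every token ran on the frontier model before routing."""
    return 1.0 - routed_share


for share in (0.25, 0.50, 0.75):
    print(f"routed {share:.0%}: buyer saves {buyer_savings(share):.1%}, "
          f"frontier keeps {frontier_revenue_retained(share):.0%}")
```

Under these assumptions, diverting half of all tokens saves the buyer about 44 percent, which falls inside the 40 to 60 percent range CNBC reports, and leaves the frontier model with half of its original token revenue.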