Google is building tensor processing units (TPUs) specialized for inference workloads, targeting the response-time requirements of AI agents. Chief Scientist Jeff Dean confirmed the shift in an interview with Bloomberg, saying it “now becomes sensible to specialize chips more for training or more for inference workloads.” The company plans to announce its next generation of TPUs at Google Cloud Next in Las Vegas on April 23-24.
From Training to Inference
Google has historically designed its TPUs as general-purpose AI accelerators handling both training and inference. Dean’s comments signal a strategic split: separate silicon optimized for each phase. “We are looking at a whole bunch of different things,” he told Bloomberg, including “the speed of AI results” the company wants to enable.
Amin Vahdat, who oversees Google’s AI infrastructure and chip work, declined to detail the inference chip specifically but said more would be shared “in the relatively near future.”
Customer Momentum
The timing aligns with growing TPU adoption among major AI players. According to the same Bloomberg report:
- Anthropic signed an expanded agreement for up to 1 million TPUs and separately partnered with Broadcom for chips enabling roughly 3.5 gigawatts of computing power starting in 2027.
- Meta signed a multibillion-dollar, multi-year deal and is currently testing what tasks TPUs are best suited for. Meta’s head of infrastructure Santosh Janardhan said “it does look like there might be inference advantages.”
- Citadel Securities plans to present at Cloud Next on how it used TPUs to train models faster than its earlier GPU-based work.
- G42, the Abu Dhabi tech conglomerate, has held “multiple discussions” about TPU access.
The Competitive Landscape
Nvidia remains dominant in training. But inference is a different race, and one with direct implications for agent performance. Every millisecond of latency matters when agents execute multi-step reasoning chains or respond to real-time user queries.
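To make the stakes concrete, here is a rough, illustrative sketch of how per-call inference latency compounds across a sequential agent run. The step count, per-call latencies, and overhead figure are assumptions for the example, not benchmarks of any chip mentioned in the reporting:

```python
# Illustrative only: assumed numbers, not measurements of any real hardware.
# A sequential agent multiplies per-call latency by the number of model calls,
# so shaving milliseconds per inference compounds across the whole task.

def agent_latency_ms(steps: int, per_call_ms: float, tool_overhead_ms: float = 50.0) -> float:
    """Total wall-clock latency for a sequential agent run:
    each step is one model call plus fixed tool/orchestration overhead."""
    return steps * (per_call_ms + tool_overhead_ms)

if __name__ == "__main__":
    steps = 20  # hypothetical multi-step reasoning chain
    for per_call_ms in (400.0, 250.0, 100.0):  # assumed per-inference latencies
        total = agent_latency_ms(steps, per_call_ms)
        print(f"{per_call_ms:>5.0f} ms/call -> {total / 1000:.1f} s end to end")
```

Under these assumed numbers, cutting per-inference latency from 400 ms to 100 ms takes a 20-call run from 9 seconds to 3 seconds of wall-clock time, which is the basic argument for pitching inference-specialized silicon at agent workloads.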
Gartner analyst Chirag Dekate told Bloomberg: “The battleground is shifting towards inference. In that battleground, Google has an infrastructure advantage.” Natalie Serrino, co-founder of Gimlet Labs (which routes AI tasks to optimal hardware), added that current TPUs are already “very good tools for the workload that is exploding,” referencing complex agent query processing.
The Next Web reported that Google is also in discussions with Marvell Technology about additional inference chip designs, which would add a third partner to its TPU supply chain at a time when custom ASIC sales are projected to grow 45% in 2026.
Cloud Next as the Stage
IT Pro noted that Cloud Next’s opening keynote is titled “The Agentic Cloud,” positioning agent deployment as the conference’s central theme. Google Cloud reported 48% year-over-year revenue growth in its Q4 2025 earnings, driven by demand for its AI platform. An inference chip announcement would give Google a concrete hardware story to match that narrative: purpose-built silicon for the agent workloads it wants enterprises to run on its cloud.
Nvidia began selling its own inference-focused chip last month, based on technology acquired from Groq in a reported $20 billion licensing deal. Google’s decade of custom silicon experience, combined with firsthand feedback loops between its model teams and hardware designers, gives it a differentiated approach. Among top AI developers, only Google makes its own chips at significant scale. OpenAI is only now starting to design custom silicon.