Google Cloud has announced two eighth-generation Tensor Processing Units: the TPU 8t, optimized for massive-scale training, and the TPU 8i, for low-latency inference. For the first time, Google is splitting its TPU line into specialized chips rather than shipping one design for all workloads, a decision the company says reflects how the computational demands of training and serving have fundamentally diverged in the agentic era.

TPU 8t: Training at Scale

The TPU 8t is built for pre-training and embedding-heavy workloads. Key specs from Google’s technical deep dive:

  • 9,600 chips per superpod, scaling to over 1 million TPU chips in a single training cluster via JAX and Pathways (see the sharding sketch after this list).
  • Native FP4 (4-bit floating point) support doubles MXU throughput and cuts energy-intensive data movement while preserving accuracy at lower precision.
  • SparseCore, a specialized accelerator that handles the irregular memory access patterns of embedding lookups, offloading data-dependent operations from the main matrix multiply unit (see the gather sketch after this list).
  • Virgo Network topology delivers up to 4x the data center network bandwidth of the previous generation, with over 47 petabits per second of non-blocking bisection bandwidth connecting 134,000+ chips in a single fabric.
  • TPUDirect Storage bypasses the CPU host bottleneck, giving TPUs direct memory access to Managed Lustre 10T storage and delivering 10x faster storage access than seventh-generation Ironwood TPUs.
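
Scaling to that chip count is as much a software question as a hardware one. As a rough illustration of the programming model, the JAX sketch below shards a batch across every chip visible to the runtime using a named device mesh; the mesh shape, axis names, and array sizes are placeholders rather than anything specific to the TPU 8t or Pathways.

```python
# Illustrative only: shard a batch across all visible TPU chips with JAX.
# Mesh shape, axis names, and array sizes are placeholders, not TPU 8t specifics.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D mesh over every chip the runtime can see.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("data",))

# Split the batch dimension across the "data" axis; keep the weights replicated.
x = jax.device_put(jnp.ones((4096, 1024)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((1024, 1024)), NamedSharding(mesh, P(None, None)))

@jax.jit
def forward(x, w):
    # XLA partitions this matmul across the mesh automatically.
    return jnp.dot(x, w)

y = forward(x, w)
print(y.shape)  # (4096, 1024), sharded along the "data" axis
```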

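The SparseCore bullet is easier to picture with a concrete lookup: an embedding access is a data-dependent gather over a large table, which is memory-bound and a poor fit for a matrix multiply unit. The sketch below uses placeholder table sizes and indices and is not SparseCore’s actual programming interface.

```python
# Illustrative embedding lookup: a data-dependent gather, not a dense matmul.
# Table size and ids are placeholders; this is not SparseCore's interface.
import jax.numpy as jnp

table = jnp.zeros((100_000, 128))          # large embedding table
ids = jnp.array([3, 91_204, 17, 50_321])   # irregular, data-dependent row indices
vectors = jnp.take(table, ids, axis=0)     # memory-bound gather, result shape (4, 128)
```
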
The training chip integrates Arm-based Axion CPU hosts to eliminate the host bottleneck caused by data preparation latency, keeping TPUs fed with preprocessed data during training runs.
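
The pattern those host CPUs are meant to support can be sketched in JAX: because jit dispatch is asynchronous, the host can preprocess the next batch while the accelerator is still executing the current step. The shapes, update rule, and preprocessing below are stand-ins, not Google’s actual input pipeline.

```python
# Illustrative host/accelerator overlap: JAX dispatch is asynchronous, so the
# host can prepare the next batch while the device is still running the
# current step. Shapes, the update rule, and the "preprocessing" are stand-ins.
import jax
import jax.numpy as jnp
import numpy as np

@jax.jit
def train_step(params, batch):
    # Stand-in update; the call returns to the host before the device finishes.
    return params - 1e-3 * jnp.mean(batch)

def host_batches(num_steps, shape=(1024, 512)):
    for _ in range(num_steps):
        # CPU-side work (decode, tokenize, augment) would happen here,
        # overlapping with the device step that is still in flight.
        yield np.random.rand(*shape).astype(np.float32)

params = jnp.zeros(())
for batch in host_batches(10):
    params = train_step(params, jnp.asarray(batch))
params.block_until_ready()  # wait for the last step before reading the result
```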

TPU 8i: Inference for Agents

The TPU 8i targets the other side of the pipeline: serving models that power autonomous agents requiring real-time, multi-turn reasoning. Google’s announcement frames the chip around the specific demands of agentic workloads, including long context windows, complex sequential logic, and continuous learning loops where agents simulate future scenarios before acting.

Both chips are designed to train and serve world models like Google DeepMind’s Genie 3, enabling millions of agents to practice and refine their reasoning in simulated environments, according to the Google Cloud Blog.

The Hardware Bet on Agents

General availability for both chips is planned for later in 2026. The timing aligns with Google’s broader push at Cloud Next 2026, where the company rebranded Vertex AI as the Gemini Enterprise Agent Platform and announced a $750 million agentic AI partner fund.

The split-chip strategy is Google’s clearest infrastructure signal yet that agentic workloads are not just a software layer on top of existing models. They require purpose-built silicon with different optimization profiles for training and serving. As CGTN reported, the announcement positions Google’s custom silicon directly against Nvidia’s dominance in the AI chip market at a moment when compute demand is outstripping supply across the industry.