Groq is raising up to $650 million from existing investors to scale its inference cloud business, Axios first reported on May 28. The fundraise marks a deliberate pivot: Groq has exited chip manufacturing entirely and is now betting on its inference cloud.

The round follows a $20 billion licensing deal with Nvidia announced in December 2025. In that arrangement, Nvidia licensed Groq’s hardware technology and hired much of Groq’s senior leadership, including founder Jonathan Ross and president Sunny Madra. Nvidia characterized the deal as a non-exclusive IP license, not an acquisition. Existing Groq investors were paid out in cash as part of the transaction, according to TechCrunch.

Those same investors are now being asked to fund what Groq is calling “Groq 2.0.”

The Inference Bet

Interim CEO Adam Winter and CFO Matt Eng are leading the company’s new direction, focused entirely on inference: the compute-intensive processing triggered by every AI prompt. Training builds models. Inference runs them. And inference is where the money is moving.

Groq’s GroqCloud platform hosted more than 3.5 million developers as of February 2026, up from 2 million at its September 2025 funding round, according to PYMNTS. That September raise was $750 million at a $6.9 billion valuation, itself more than double the $2.8 billion valuation from an August 2024 round.

Backers Disruptive and Infinitum have agreed to cover any pro-rata shares that other existing investors decline, effectively guaranteeing the round closes, Axios reported. Reuters independently confirmed the fundraise target through a source familiar with the matter.

Why Inference Matters for Agent Infrastructure

Autonomous AI agents are inference-intensive by design. Every tool call, every reasoning step, every action in an agentic loop triggers a model inference. A single agent task that involves planning, searching, reading, and executing can generate dozens of inference calls in seconds. Because many of those calls run one after another, per-call latency compounds across the task, which makes inference latency and throughput the bottleneck for scaling agent deployments from demos to production.
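To see why per-call speed matters so much for agents, a rough back-of-the-envelope sketch helps. The Python below assumes an agent whose inference calls run strictly in sequence, and every number in it (30 calls, 200 output tokens per call, the time-to-first-token and tokens-per-second figures) is a hypothetical chosen for illustration, not a measurement of Groq or any other provider. The function name `task_latency_seconds` is likewise made up for this example.

```python
def task_latency_seconds(num_calls, ttft_s, tokens_per_call, tokens_per_second):
    """Estimate wall-clock time for an agent task whose inference calls
    run one after another (no parallelism, no tool or network time).

    All inputs are illustrative assumptions:
      num_calls         -- inference calls in the agent loop
      ttft_s            -- time to first token per call, in seconds
      tokens_per_call   -- output tokens generated per call
      tokens_per_second -- decode throughput of the endpoint
    """
    per_call = ttft_s + tokens_per_call / tokens_per_second
    return num_calls * per_call


# Hypothetical "slower" endpoint: 0.5 s to first token, 50 tokens/s.
slow = task_latency_seconds(num_calls=30, ttft_s=0.5,
                            tokens_per_call=200, tokens_per_second=50)

# Hypothetical "faster" endpoint: 0.1 s to first token, 500 tokens/s.
fast = task_latency_seconds(num_calls=30, ttft_s=0.1,
                            tokens_per_call=200, tokens_per_second=500)

print(f"slower endpoint: {slow:.0f} s per task")
print(f"faster endpoint: {fast:.0f} s per task")
```

Under these assumed figures, the same 30-step task takes over two minutes on the slower endpoint and about 15 seconds on the faster one. Real agent loops also spend time on tool execution and network round trips, which this sketch deliberately ignores.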

Groq’s custom LPU (Language Processing Unit) architecture was built for exactly this workload profile: deterministic latency with high token throughput. That positions the company’s inference cloud as infrastructure for the agentic layer, competing with Databricks, OpenAI’s API, Anthropic’s inference endpoints, and Nvidia’s own inference platforms.

What Groq Left Behind

The December deal with Nvidia effectively split Groq into two stories. Nvidia got the hardware talent and IP license. Groq kept the cloud platform, the developer base, and enough investor confidence to raise again. Whether $650 million is enough to compete in an inference market where Nvidia, the hyperscalers, and well-funded startups like Cerebras are all building capacity remains the open question.

The fundraise signals that investors see inference as a distinct, defensible layer in the AI stack, separate from both model training and chip manufacturing. For teams building agent systems, the competitive dynamics in this layer will directly determine what inference costs and how fast agents can operate at scale.