General Compute, a California-based inference cloud company, announced a cloud platform built specifically for AI agent workloads, running on purpose-built AI accelerators rather than general-purpose GPUs. The company is working with early partners now, with general availability scheduled for May 15, 2026.

The platform’s architecture separates the prefill and decode stages of inference, allowing each stage to scale independently with workload. It is designed for AI agents that issue high volumes of LLM inference and tool calls, including agents that provision their own compute programmatically.
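The rationale for splitting the two stages is that they stress hardware differently: prefill processes the whole prompt in parallel and is compute-bound, while decode emits one token at a time per sequence and is memory-bandwidth-bound. A minimal sketch of what stage-disaggregated scaling could look like is below; the numbers, function names, and scaling rule are illustrative assumptions, not General Compute's actual implementation.

```python
import math

# Hypothetical autoscaling sketch: prefill and decode run on separate
# worker pools, so each pool sizes itself from its own backlog signal.
# Throughput figures below are made-up illustrations.

def workers_needed(pending_tokens: int, tokens_per_sec_per_worker: int,
                   target_latency_s: float) -> int:
    """Workers required to drain the pending token backlog within the target."""
    required_throughput = pending_tokens / target_latency_s
    return max(1, math.ceil(required_throughput / tokens_per_sec_per_worker))

# Prefill: large prompt backlog, but each worker chews through tokens fast.
prefill_workers = workers_needed(pending_tokens=2_000_000,
                                 tokens_per_sec_per_worker=50_000,
                                 target_latency_s=2.0)

# Decode: far fewer pending tokens, but per-worker throughput is much lower.
decode_workers = workers_needed(pending_tokens=30_000,
                                tokens_per_sec_per_worker=1_000,
                                target_latency_s=2.0)

print(prefill_workers, decode_workers)  # → 20 15
```

Because the two pools are sized from different signals, a burst of long prompts grows only the prefill pool, and a burst of long generations grows only the decode pool.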

Agent-Native Design

The most notable feature: AI agents can sign up on their own, provision API keys, and begin making inference calls without human intervention. “The last 20 years we built for developers, the next 20 we will build for agents,” said Jason Goodison, co-founder and CTO of General Compute, in the announcement. “Our docs and API are optimized for both human and AI agent consumption.”
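The announcement doesn't publish the provisioning API itself, but the described flow (an agent signs up, mints an API key, then starts calling inference) could look roughly like the sketch below. Every endpoint path, payload field, and key format here is a hypothetical stand-in, and the platform is stubbed in memory so the flow runs offline.

```python
from dataclasses import dataclass, field

# Hypothetical self-provisioning flow. The paths ("/v1/accounts", ".../keys"),
# payloads, and key format are invented for illustration — they are not
# General Compute's documented API.

@dataclass
class StubPlatform:
    """In-memory stand-in for the provisioning API, so the sketch runs offline."""
    accounts: dict = field(default_factory=dict)
    _next_id: int = 1

    def post(self, path: str, body: dict) -> dict:
        if path == "/v1/accounts":            # agent signs itself up
            account_id = f"acct_{self._next_id}"
            self._next_id += 1
            self.accounts[account_id] = {"name": body["name"], "keys": []}
            return {"account_id": account_id}
        if path.endswith("/keys"):            # agent mints its own API key
            account_id = path.split("/")[3]
            key = f"sk-{account_id}-{len(self.accounts[account_id]['keys'])}"
            self.accounts[account_id]["keys"].append(key)
            return {"api_key": key}
        raise ValueError(f"unknown path: {path}")

platform = StubPlatform()

# No human in the loop: the agent performs both steps itself.
acct = platform.post("/v1/accounts", {"name": "research-agent-7"})
creds = platform.post(f"/v1/accounts/{acct['account_id']}/keys", {})
print(creds["api_key"])  # → sk-acct_1-0
```

The design point worth noting is that both steps are plain API calls with machine-readable responses, so an agent can chain them without parsing a web UI.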

At launch, the platform will offer access to open-source LLMs across multiple model families and parameter sizes. Customers can also deploy their own models on the company’s infrastructure. The data center infrastructure operates on hydroelectric power with air-cooled accelerator hardware.

Context

General Compute enters a crowded inference market, but its agent-native positioning is distinct. Most inference clouds today are built around developer workflows: humans sign up, humans provision, humans manage. A platform designed from the ground up with agents as the primary consumers reflects the broader shift toward autonomous infrastructure. The company was founded by Goodison and Finn Puklowski. Enterprise inquiries for dedicated infrastructure and SLAs are being handled directly.