Sakana AI Launches Fugu, a Multi-Model Orchestration API for Agent Inference Workflows

Sakana AI, the Tokyo-based lab founded by former Google researchers including a co-author of “Attention Is All You Need,” released Fugu, a multi-agent orchestration system that serves a coordinated pool of AI models through a single OpenAI-compatible API. The product is live at sakana.ai/fugu.

Instead of manually routing prompts to different models, Fugu handles model selection and delegation automatically. Developers call one endpoint, and the system assembles, routes, and coordinates specialized models for each task, returning a single fused response.

Two Tiers, One Endpoint

Fugu ships in two configurations, both accessible through the same API, according to Sakana’s product page.

Fugu (standard) balances performance with low latency, designed for everyday coding, code review, and chatbot workflows. Fugu Ultra is tuned for harder, multi-step reasoning tasks, with Sakana claiming frontier-level performance on complex benchmarks.

On Sakana’s published benchmarks, Fugu Ultra scores 82.1 on Terminal Bench (compared to 80.4 for Fable 5) and 93.2 on Live Code Bench, according to Julian Goldie’s independent testing notes. These figures are self-reported by Sakana and should be verified independently.

Developers can also control which models participate in Fugu’s pool, opting out specific providers to meet data, privacy, or compliance requirements.

Research Foundation

The system builds on two ICLR 2026 papers from Sakana’s research team, according to Sakana. TRINITY uses a lightweight evolved coordinator to orchestrate multiple LLMs across several turns, assigning Thinker, Worker, and Verifier roles to delegate work across coding, math, reasoning, and knowledge tasks. The Conductor, trained with reinforcement learning, discovers natural-language coordination strategies that help diverse LLM pools outperform individual models on reasoning benchmarks.

The approach is one-shot by design. The model panel runs in parallel and returns a synthesized result rather than engaging in multi-turn back-and-forth loops.

Pricing and Availability

Sakana positions Fugu at roughly 25% of OpenRouter Fusion’s per-prompt cost, with an additional flat-rate subscription option for high-volume users, according to Julian Goldie.

One significant limitation: Fugu is not yet available in the EU or EEA. Sakana’s product page notes the team is working toward GDPR compliance, according to Sakana.

Where Multi-Model Orchestration Fits for Agent Builders

Fugu enters a market where multi-model orchestration is becoming a default layer in production agent stacks. OpenRouter Fusion, which uses a similar panel-and-synthesis approach, has gained traction with developers running high-volume inference loops. Fugu’s pitch is lower cost and lower latency for the same architectural pattern.

For teams building autonomous agents that need to balance model quality against inference speed and cost, orchestration APIs like Fugu reduce the integration burden of juggling multiple providers. The flat-rate pricing model is particularly relevant for agent workloads where per-token costs can compound across long-running loops.

Sakana AI Launches Fugu, a Multi-Model Orchestration API for Agent Inference Workflows

Two Tiers, One Endpoint

Research Foundation

Pricing and Availability

Where Multi-Model Orchestration Fits for Agent Builders

Get our morning briefing in your inbox

Keep Reading

Google Integrates Computer Use Natively Into Gemini 3.5 Flash, Matching GPT-5.5 at One-Third the Cost

Salesforce Publishes 12 Rules for Agentic AI After 20,000 Production Deployments Expose Common Failure Modes

Seltz Raises $12.5 Million Seed Round to Deploy Autonomous AI Agents on X and TikTok