Perplexity AI CEO Aravind Srinivas took the stage at Computex 2026 in Taipei on June 2 alongside Intel CEO Lip-Bu Tan to demonstrate what the company calls the first hybrid local-server inference orchestrator. The system automatically decides, in real time and mid-task, which AI workloads stay on a user’s device and which get routed to frontier models in the cloud. The feature is expected to arrive in Perplexity Computer in July 2026.
How the Routing Works
A compact AI model runs locally on the user’s device and acts as a classifier. For each incoming task or subtask, it evaluates three things: whether the data is sensitive, whether the computation requires frontier-scale capability, and whether a smaller model can handle the work adequately. Based on that evaluation, work either stays local or goes to the cloud.
Financial records, health information, and personal files stay on-device. Tasks requiring heavy reasoning get routed to frontier models. Most real workflows involve both, so the system splits them and coordinates the pieces, according to Perplexity’s announcement.
“No product has done this before,” a Perplexity spokesperson told VentureBeat. The system asks for user permission before sending sensitive tasks to the cloud, a design choice aimed at enterprise data governance concerns.
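The routing and permission steps described above can be sketched in a few lines. This is an illustration only, not Perplexity's implementation: the `Subtask` fields, the `Placement` values, and the rule ordering are all assumptions chosen to mirror the three checks the company describes, and a real classifier would be a learned model rather than hand-set flags.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    LOCAL = "local"
    CLOUD = "cloud"
    NEEDS_CONSENT = "needs_consent"  # sensitive work waiting on user approval


@dataclass(frozen=True)
class Subtask:
    # Hypothetical fields standing in for the local classifier's output.
    name: str
    sensitive: bool        # touches financial, health, or personal data
    needs_frontier: bool   # requires frontier-scale capability
    small_model_ok: bool   # a smaller on-device model is adequate


def route(task: Subtask, user_consented: bool = False) -> Placement:
    """Decide where one subtask runs (assumed rule order, for illustration)."""
    if task.small_model_ok and not task.needs_frontier:
        return Placement.LOCAL
    if task.sensitive and not user_consented:
        return Placement.NEEDS_CONSENT
    return Placement.CLOUD


def plan(tasks, consented=frozenset()):
    """Split a workflow into per-subtask placements."""
    return {t.name: route(t, t.name in consented) for t in tasks}


workflow = [
    Subtask("read_bank_statement", True, False, True),
    Subtask("forecast_cashflow", True, True, False),
    Subtask("summarize_public_news", False, True, False),
]
print(plan(workflow))
```

In this sketch, the bank statement stays on-device, the public news summary goes to the cloud, and the cash-flow forecast is held until the user approves sending it.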
The claim is not that a model can run locally. Dozens of tools already do that. The distinction, as VentureBeat noted, is that Perplexity’s system makes the routing decision itself, task by task, without requiring the user to choose in advance.
Product Context
Perplexity Computer launched in February 2026 as a cloud-based multi-model agent that coordinates up to 20 AI models in a single workflow, available on the $200/month Max subscription tier. Personal Computer, a Mac app that brought those capabilities onto the local device with access to local files and native apps, launched in April 2026.
The hybrid orchestrator extends Personal Computer’s architecture. Previously, the division of labor was relatively fixed: local file access on-device, heavy computation on Perplexity’s servers. The new system reasons about where each piece of a task should execute: not just which model to use, but which physical location should process it.
The Cost Angle
Srinivas has been explicit about the economics. In a Bloomberg Television interview at Computex, he said, according to Decrypt: “You don’t want all your compute centralized in servers and everything running through the largest models. Some people are spending half a billion dollars per month. What you actually want is efficient value per watt per user.”
Perplexity’s revenue grew fivefold to $500 million while headcount rose just 34%, per Decrypt. Offloading inference to user hardware could extend that efficiency by shifting part of the compute cost off Perplexity’s servers.
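A quick back-of-the-envelope calculation shows what those two figures imply. It assumes the revenue and headcount numbers cover the same period, which the reporting does not state; the function name is illustrative.

```python
def revenue_per_head_multiple(revenue_multiple: float, headcount_growth: float) -> float:
    """How much revenue per employee grows when revenue and headcount both change."""
    return revenue_multiple / (1.0 + headcount_growth)


# Fivefold revenue growth to $500 million implies a starting point of $100 million.
implied_prior_revenue_musd = 500 / 5

# Fivefold revenue on 34% more staff (assumes both figures span the same period).
multiple = revenue_per_head_multiple(5.0, 0.34)
print(f"Implied prior revenue: ${implied_prior_revenue_musd:.0f}M")
print(f"Revenue per employee: about {multiple:.1f}x")
```

Under that assumption, revenue per employee rose by a factor of roughly 3.7.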
Hardware Partners
The demonstration ran on Intel Core Ultra Series 3 processors. Perplexity confirmed the orchestration framework is model-agnostic and chip-agnostic, also supporting Nvidia RTX Spark hardware, according to MarkTechPost.
The timing aligns with the broader Computex 2026 theme: on-device AI. Nvidia’s RTX Spark superchip offers up to 20 Arm CPU cores, a Blackwell GPU with 6,144 CUDA cores, and 128GB of LPDDR5X RAM, enough for 120-billion-parameter models with million-token context lengths. RTX Spark systems begin shipping in the fall, per VentureBeat.
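Whether a 120-billion-parameter model fits in 128GB depends heavily on numeric precision. The sketch below estimates memory for the weights alone at three common precisions; it ignores the KV cache, activations, and system overhead, and the reporting does not say which precision the 120-billion figure assumes.

```python
def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


PARAMS = 120e9   # 120-billion-parameter model, per the article
RAM_GB = 128     # RTX Spark memory, per the article

for bits in (16, 8, 4):
    need = weight_memory_gb(PARAMS, bits)
    print(f"{bits}-bit weights: {need:.0f} GB, fits in {RAM_GB} GB: {need <= RAM_GB}")
```

At 16 bits the weights alone need 240 GB and would not fit. At 8 bits they need 120 GB, and at 4 bits 60 GB, so the claim implies quantized weights, with lower precision leaving more headroom for long contexts.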
Who Owns the Routing Layer
The strategic bet is straightforward: whoever controls the routing layer between local and cloud inference controls where agent compute happens, what data stays private, and which models get called. For agent builders, automatic hybrid orchestration across local and cloud is likely to become a standard infrastructure expectation rather than a novelty. For enterprises evaluating agent platforms, the question shifts from “which model do we use” to “who decides where inference runs, and can we trust that decision.”