The On-Device Agent Stack: How Edge-Native AI Is Splitting the Agent Market in Two

Two days after Google unveiled Gemini Spark, a cloud-based personal agent backed by 900 million existing users and Antigravity infrastructure, Oppo quietly open-sourced an Android agent that does the opposite. X-OmniClaw runs perception, memory, and action directly on the physical phone. Cloud models are called in only for heavy reasoning. No virtualized Android instance. No data leaving the device.

These two announcements, separated by 48 hours, frame the defining infrastructure question in the agent market: where does the compute live?

The answer is splitting the industry in two. On one side, Anthropic leases 220,000 GPUs from SpaceX’s Colossus data center and Google builds Antigravity to run agents at planetary scale. On the other, chipmakers, device manufacturers, and a growing number of startups are building agent stacks that never touch a cloud server at all. The edge AI market is projected to grow from $29.08 billion in 2025 to $37.51 billion in 2026, according to Research and Markets, a 29% year-over-year jump. Edge AI hardware alone is on track to reach $58.9 billion by 2030, per MarketsandMarkets.

The result is a parallel infrastructure layer, built by companies that believe the cloud model has fundamental limitations for agents that handle private data, operate in real time, or need to work without connectivity.

The Hardware Layer Takes Shape

The first agentic AI framework designed specifically for edge hardware shipped at CES 2026 in January. NXP Semiconductors announced the eIQ Agentic AI Framework, a toolkit for deploying autonomous agents on edge devices with deterministic real-time decision-making and multi-model coordination. The framework targets industrial applications where cloud round-trips are unacceptable: factory equipment that needs to shut down instantly when safety risks arise, medical devices that update patient information in real time, HVAC systems that autonomously respond to fire hazards.

“We’re empowering both novice and experienced developers with a secure, real-time, hardware-optimized software platform,” Charles Dachs, NXP’s Executive Vice President and General Manager of Secure Connected Edge, said in the announcement. GE HealthCare demonstrated anesthesia delivery and infant monitoring concepts running on NXP’s edge hardware at the same event.

Qualcomm is building the adjacent layer. Its Dragonwing platform, launched in March 2026, provides AI-optimized processors for industrial and commercial edge deployments. SECO announced Dragonwing-powered compute modules at Embedded World 2026 in February, with production units scheduled for Q3 2026. Advantech followed in March with a Dragonwing-powered on-prem AI appliance server. The pattern is consistent: chipmakers are not just shipping faster inference silicon. They are shipping complete agent orchestration stacks designed to run without network connectivity.

The Software Stack Goes Device-Native

The hardware would be academic without software that knows how to use it. Three distinct approaches to on-device agent software have emerged in the first five months of 2026.

Oppo’s X-OmniClaw, released as open source on May 17, represents the most complete mobile agent architecture to date. According to Decrypt’s analysis, the framework treats the smartphone as “the vehicle,” with X-OmniClaw serving as “the internal engine for control and perception” and cloud LLMs called in only as “fuel” when heavy reasoning is needed. Three subsystems run entirely on-device. Omni Perception bundles camera feeds, screen content, and voice into a single pipeline. Omni Memory maintains context across tasks, app switches, and sessions, and builds long-term semantic memory from the phone’s photo gallery. Omni Action clones user behavior into reusable skills and extracts deeplinks to skip multi-step navigation.

The critical design choice, as The Decoder reported, is the explicit rejection of cloud phone platforms like RedFinger, Alibaba’s Wuying, and Tencent Cloud Phone. Those services run agents inside virtualized Android instances in data centers, meaning they cannot access local sensors, cameras, or private data. X-OmniClaw eliminates that indirection entirely.

Liquid AI took a different path. Its LFM2.5 model family, released in February, ships with LEAP (Liquid’s Edge AI Platform) for deploying models to iOS and Android “as easily as calling a cloud API.” The models also ship as GGUF checkpoints for llama.cpp inference on any hardware and as MLX-optimized checkpoints for Apple Silicon’s unified memory architecture. This approach targets developers who want to swap cloud API calls for local inference without rewriting their agent logic.

The third approach comes from the model compression side. According to Vikas Chandra and Raghuraman Krishnamoorthi’s research presented at the Embedded Vision Summit, the major labs have converged on efficient on-device targets: Llama 3.2 (1B/3B), Gemma 3 (down to 270M parameters), Phi-4 mini (3.8B), SmolLM2 (135M to 1.7B), and Qwen2.5 (0.5B to 1.5B). The researchers found that the biggest breakthroughs came not from faster chips but from rethinking how models are built, trained, compressed, and deployed. Going from 16-bit to 4-bit quantization cuts memory traffic per token by 4x, which matters because token generation is limited by memory bandwidth: mobile devices have 50 to 90 GB/s of it, compared to 2 to 3 TB/s in data center GPUs.
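The 4x figure follows directly from the storage arithmetic. The sketch below is a rough illustration, not a benchmark: it uses the nominal parameter counts from the model names above, and it assumes each generated token reads every weight once while ignoring activations and the KV cache. The function and table names are made up for this example.

```python
# Rough weight-memory arithmetic for on-device decoding.
# Assumption: every generated token streams all weights once;
# activations and KV cache are ignored.

def weight_bytes(params: float, bits: int) -> float:
    """Bytes needed to store `params` weights at `bits` bits each."""
    return params * bits / 8

# Nominal parameter counts taken from the model names in the text.
MODEL_SIZES = {
    "Gemma 3 270M": 270e6,
    "Llama 3.2 1B": 1e9,
    "Llama 3.2 3B": 3e9,
    "Phi-4 mini 3.8B": 3.8e9,
}

def traffic_table(bit_widths=(16, 8, 4)):
    """Return {model: {bits: gigabytes read per token}}."""
    return {
        name: {b: weight_bytes(params, b) / 1e9 for b in bit_widths}
        for name, params in MODEL_SIZES.items()
    }

if __name__ == "__main__":
    for name, row in traffic_table().items():
        cells = ", ".join(f"{b}-bit: {gb:.2f} GB" for b, gb in row.items())
        print(f"{name:16s} {cells}")
```

Under these assumptions a nominal 3B-parameter model drops from 6 GB of weights at 16 bits to 1.5 GB at 4 bits, which is the 4x reduction in bytes moved per token.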

The Training Gap

Most on-device AI today relies on compressing large cloud-trained models, a process that can take months before deployment. This is the specific bottleneck that Out of Set, a South Korea-based startup, raised seed funding to address. The Ventures announced its investment on May 20, backing Out of Set’s approach of training models natively for edge deployment rather than compressing cloud models after the fact.

The company’s focus areas are speech recognition and synthesis, two capabilities in high demand across healthcare, legal, finance, smartphones, automobiles, and robotics. Out of Set has developed a system that automatically retrains models to meet client requirements, enabling faster customization with fewer engineers. Kim Chulwoo, CEO of The Ventures, cited the company’s “core technology to address the limitations of cloud AI at the device level” as key to capturing the emerging on-device agent market.

This is a meaningful technical distinction. Cloud-first compression follows a fixed pipeline: train a large model in a data center, quantize it, prune it, and hope it still works at 4 bits on a phone with 4 GB of available RAM. Device-native training starts from the target hardware and optimizes upward. The Embedded Vision Summit research supports this framing: at 2 bits and below, compressed models learn “fundamentally different representations, not just compressed versions of higher-precision models,” according to Chandra and Krishnamoorthi. Training natively for the target device is meant to sidestep the quality cliff that aggressive after-the-fact compression runs into.
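A toy example makes the "quantize it and hope" step concrete. The sketch below applies plain symmetric uniform quantization to random Gaussian weights and measures the round-trip error at several bit widths. It is a generic textbook scheme chosen for illustration, not the method used by Out of Set or by the researchers cited above, and it says nothing about learned representations. It only shows how quickly naive post-training rounding loses information as the bit width shrinks.

```python
import random

def quantize_roundtrip(weights, bits):
    """Symmetric uniform quantization to `bits` bits, then dequantize."""
    levels = 2 ** (bits - 1) - 1              # e.g. 7 positive steps at 4 bits
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / levels                  # width of one quantization step
    restored = []
    for w in weights:
        q = round(w / scale)                  # nearest integer code
        q = max(-levels, min(levels, q))      # clamp to the representable range
        restored.append(q * scale)
    return restored

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def error_by_bits(n=10_000, seed=0, bit_widths=(8, 4, 3, 2)):
    """Round-trip error for synthetic Gaussian weights at each bit width."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return {
        b: mean_abs_error(weights, quantize_roundtrip(weights, b))
        for b in bit_widths
    }

if __name__ == "__main__":
    for bits, err in error_by_bits().items():
        print(f"{bits}-bit mean absolute error: {err:.4f}")
```

At 2 bits this scheme has only three representable values per weight, so most small weights collapse to zero. Real pipelines use per-group scales, calibration data, and quantization-aware training to soften that loss.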

Where Cloud Agents Still Win

The on-device stack has clear limits. Frontier reasoning, long context windows, and multi-step planning across large knowledge bases still require cloud-scale compute. Anthropic’s Colossus lease gives Claude access to 300MW of GPU infrastructure. Google’s Gemini Spark runs on Antigravity, purpose-built for persistent agent workloads. OpenAI’s $122 billion raise in March funded infrastructure that no edge device can replicate.

The 30 to 50x memory bandwidth gap between mobile devices and data center GPUs, documented in the Edge AI and Vision Alliance research, means on-device agents generate tokens significantly slower than cloud agents running on A100s or H100s. Complex agentic workflows that require reasoning over thousands of tokens, maintaining long conversation histories, or coordinating with multiple external APIs remain cloud territory.
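The bandwidth gap translates into a ceiling on token rate. The sketch below is a back-of-the-envelope bound using the bandwidth ranges quoted earlier in this article, and it assumes a single request whose every token streams all weights from memory once. It ignores batching, caches, and compute limits, and it runs the same small model on both sides, which cloud deployments rarely do. The names are illustrative.

```python
# Bandwidth-bound upper limit on decode speed.
# Assumption: one request, every token reads all weights once.

MOBILE_GB_S = (50, 90)            # mobile range quoted in the article
DATACENTER_GB_S = (2000, 3000)    # 2 to 3 TB/s, as quoted in the article

def max_tokens_per_second(bandwidth_gb_s: float, params: float, bits: int) -> float:
    """Tokens per second if memory bandwidth is the only limit."""
    bytes_per_token = params * bits / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

def bandwidth_ratios():
    """Data center to mobile bandwidth ratio at each end of the ranges."""
    return {
        "low_vs_low": DATACENTER_GB_S[0] / MOBILE_GB_S[0],
        "high_vs_high": DATACENTER_GB_S[1] / MOBILE_GB_S[1],
    }

if __name__ == "__main__":
    params, bits = 3e9, 4         # nominal 3B-parameter model at 4 bits
    for label, (lo, hi) in (("mobile", MOBILE_GB_S), ("data center", DATACENTER_GB_S)):
        lo_rate = max_tokens_per_second(lo, params, bits)
        hi_rate = max_tokens_per_second(hi, params, bits)
        print(f"{label}: {lo_rate:.0f} to {hi_rate:.0f} tokens/s upper bound")
    print(bandwidth_ratios())
```

For a nominal 3B model at 4 bits, the bound works out to roughly 33 to 60 tokens per second on mobile and about 1,333 to 2,000 on data center memory. Comparing like ends of the two ranges gives ratios of 40x and about 33x, inside the 30 to 50x band.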

The convergence point is a hybrid architecture like X-OmniClaw’s: run perception, memory, and action locally, and escalate to the cloud only for reasoning that exceeds on-device capability. That split applies to both cost and privacy. Sensitive data stays on the device. Expensive reasoning calls go to the cloud. The user controls the boundary.
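The local-first, escalate-when-needed pattern can be sketched as a small routing function. Everything here is hypothetical: the `Task` fields, the `route` function, and the `allow_cloud_for_private` switch are invented for illustration and are not X-OmniClaw's API. The switch stands in for the user-controlled boundary described above.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Hypothetical description of one agent request."""
    prompt: str
    contains_private_data: bool
    needs_heavy_reasoning: bool

def route(task: Task, online: bool, allow_cloud_for_private: bool = False) -> str:
    """Return "local" or "cloud" for a task under a local-first policy."""
    if not task.needs_heavy_reasoning:
        return "local"                 # the on-device model is enough
    if not online:
        return "local"                 # no connectivity: degrade, do not fail
    if task.contains_private_data and not allow_cloud_for_private:
        return "local"                 # the user has not opened the boundary
    return "cloud"                     # escalate only the expensive reasoning

if __name__ == "__main__":
    trip = Task("plan a three-city trip", False, True)
    photos = Task("summarize my photo gallery", True, True)
    print(route(trip, online=True))
    print(route(photos, online=True))
    print(route(photos, online=True, allow_cloud_for_private=True))
```

Defaulting every branch to "local" means the cloud call is the exception that has to be justified, which matches the cost and privacy split the hybrid design aims for.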

The Market Fracture

The on-device agent market targets use cases that cloud agents cannot reach: air-gapped industrial environments where NXP’s eIQ framework runs safety-critical automation, mobile devices where Oppo’s X-OmniClaw accesses physical sensors that cloud platforms cannot touch, healthcare settings where GE HealthCare needs deterministic response times, and consumer devices where users want agents that work without an internet connection.

The fracture runs along four axes. Latency: cloud round-trips add hundreds of milliseconds, which breaks real-time control loops. Privacy: data that never leaves the device cannot be breached in transit. Cost: shifting inference to user hardware eliminates per-token API charges at scale. Availability: local agents work in tunnels, on planes, and in facilities without reliable connectivity.
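The latency axis is the easiest to check with arithmetic. The sketch below tests whether one inference fits inside a control loop's period once a network round trip is added. All of the numbers are hypothetical placeholders, not measurements from any of the products mentioned here.

```python
def slack_ms(loop_period_ms: float, inference_ms: float, network_rtt_ms: float = 0.0) -> float:
    """Time left in the loop after inference plus any network round trip."""
    return loop_period_ms - (inference_ms + network_rtt_ms)

def meets_deadline(loop_period_ms: float, inference_ms: float, network_rtt_ms: float = 0.0) -> bool:
    """True if the decision arrives before the next control cycle."""
    return slack_ms(loop_period_ms, inference_ms, network_rtt_ms) >= 0

if __name__ == "__main__":
    LOOP_MS = 50.0                # hypothetical control loop period
    # Hypothetical: slower local model, no network hop.
    print("local:", meets_deadline(LOOP_MS, inference_ms=20.0))
    # Hypothetical: faster cloud model, plus a 200 ms round trip.
    print("cloud:", meets_deadline(LOOP_MS, inference_ms=5.0, network_rtt_ms=200.0))
```

With these placeholder values the slower local model meets the deadline and the faster cloud model misses it by 155 ms, because the round trip dominates the budget.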

MarketsandMarkets projects the edge AI hardware market will reach $58.9 billion by 2030, growing at a 17.6% CAGR. Research and Markets puts the broader edge AI market at $37.51 billion in 2026. Within that market, Grand View Research expects the manufacturing segment to grow at a 23% CAGR through 2033.

Two Stacks, Two Markets

The agent infrastructure market is diverging. Cloud platforms will continue to dominate complex reasoning, multi-agent orchestration, and knowledge-intensive tasks. Edge platforms will own latency-critical, privacy-sensitive, and connectivity-constrained deployments.

The companies building each stack are making incompatible bets. Anthropic’s Colossus deal assumes agents will run centrally, with revenue tied to per-token pricing. NXP’s eIQ framework assumes agents will run on customer-owned hardware, with revenue tied to chip sales. Oppo’s X-OmniClaw assumes the phone itself is the agent platform. Out of Set assumes the training pipeline, not just the inference runtime, needs to be device-native.

For builders evaluating agent infrastructure in 2026, the question is no longer “which model should I use?” It is “where does my agent’s compute live?” The answer determines the hardware, the software stack, the cost model, the privacy profile, and the deployment constraints. Cloud and edge are two distinct architectures serving two markets that happen to share the word “agent.”
