Chinese tech companies are deploying AI agent frameworks into mass-produced robots, shifting the competitive battleground from chatbot benchmarks to physical-world autonomy. Alibaba’s Qwen3.7-Max model now orchestrates robotic navigation, obstacle avoidance, and task planning through tool-calling. Tencent’s OpenClaw framework powers the Zeroth M1 humanoid’s ability to translate human speech directly into robotic movement. The first units are shipping.
This is not a research demo. According to the South China Morning Post, the Zeroth M1 is the first mass-produced humanoid robot to integrate an AI agent framework for real-time speech-to-motion control. UBS has flagged embodied AI and autonomous agents as “one of the next major growth areas” for the sector. The AI robots market is projected to grow at a 17.7% compound annual rate, expanding from $17.19 billion in 2025 to $20.24 billion in 2026, according to Kavout’s analysis of International Federation of Robotics data.
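The two market figures are one year apart, so the quoted growth rate can be checked directly. A minimal sketch in Python, using only the numbers cited above; the function name is illustrative.

```python
def implied_annual_growth(start_value: float, end_value: float, years: int = 1) -> float:
    """Compound annual growth rate implied by two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above, in billions of US dollars (2025 and 2026).
growth = implied_annual_growth(17.19, 20.24, years=1)
print(f"{growth:.1%}")  # prints 17.7%
```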
Alibaba’s Robotics Stack
Alibaba launched Qwen3.7-Max at its Cloud Summit on May 20, 2026. The model scores 56.6 on the Artificial Analysis Intelligence Index, ranking fifth globally and first among Chinese models, according to Digital Applied. Pricing sits at $2.50/$7.50 per million input/output tokens, roughly half of Claude Opus 4.7’s rate.
The model’s significance for robotics is its tool-calling architecture. Alibaba designed Qwen3.7-Max to trigger external software and hardware components, acting as what the company calls a “digital brain” for robots. The SCMP reports that Alibaba has released a suite of supporting AI models specifically for robotics: a robotic gripper agent, a navigation model, and a vision-language system designed for physical-world interaction.
In a demonstration reported by The Decoder, Qwen3.7-Max ran autonomously for 35 hours optimizing a hardware attention kernel, executing 432 kernel tests with 1,158 tool calls. The model had never seen the target chip architecture during training. It started with no hardware documentation, no sample code, and no measurement data. The result: a 10x speedup over the reference implementation. Competing models fell well short: GLM 5.1 achieved 7.3x, Kimi K2.6 reached 5x, and DeepSeek V4 Pro managed 3.3x.
The 35-hour run was a coding benchmark, not a robotics test. It matters for embodied AI because it demonstrates the core capability that physical-world autonomy depends on: working with unfamiliar hardware through tool-calling loops, iterating without human intervention until the task is done.
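The shape of such a tool-calling loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not Alibaba's harness: `toy_benchmark` stands in for a real kernel test, and the "model" is a hypothetical proposer that simply doubles a tile size and ignores the feedback a real model would reason over.

```python
from typing import Callable, Optional, Tuple


def run_tool_loop(
    propose: Callable[[Optional[float]], int],
    run_test: Callable[[int], float],
    target: float,
    max_calls: int,
) -> Tuple[Optional[int], float, int]:
    """Propose, measure with a tool, and repeat until the target or call budget is hit."""
    best_candidate: Optional[int] = None
    best_score = 0.0
    last_score: Optional[float] = None
    calls = 0
    while calls < max_calls and best_score < target:
        candidate = propose(last_score)   # the model picks the next thing to try
        last_score = run_test(candidate)  # the tool call returns a measurement
        calls += 1
        if last_score > best_score:
            best_candidate, best_score = candidate, last_score
    return best_candidate, best_score, calls


def toy_benchmark(tile_size: int) -> float:
    """Hypothetical stand-in for a kernel test: speedup peaks at tile_size 64."""
    return 10.0 - abs(tile_size - 64) / 8


def make_doubling_proposer(start: int) -> Callable[[Optional[float]], int]:
    """Hypothetical stand-in for the model: doubles the candidate on every call."""
    state = {"next": start}

    def propose(last_score: Optional[float]) -> int:
        candidate = state["next"]
        state["next"] *= 2
        return candidate

    return propose


result = run_tool_loop(make_doubling_proposer(16), toy_benchmark, target=9.5, max_calls=10)
print(result)  # (64, 10.0, 3)
```

The call budget and the stopping target are the two levers that matter: without the first, a loop like this can run indefinitely; without the second, it cannot tell when the task is done.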
On May 26, Alibaba Cloud announced a Skills portal converting capabilities across 60+ cloud products into MCP-compatible formats, letting AI agents invoke cloud resources as function calls. The company also debuted the JVS Agent Suite, built on the OpenClaw framework, for enterprises to run AI agents with 24/7 cloud operation and centralized management.
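What "invoking cloud resources as function calls" means in practice can be illustrated with a small Python sketch. The descriptor below follows the general shape MCP uses for tools (a name, a description, and a JSON Schema for inputs); the `create_bucket` capability, the registry, and the `call_tool` helper are hypothetical and do not reproduce Alibaba's Skills portal or any MCP SDK.

```python
import json
from typing import Any, Dict


def create_bucket(name: str, region: str) -> Dict[str, Any]:
    """Hypothetical cloud capability; a real one would call a cloud API."""
    return {"bucket": name, "region": region, "status": "created"}


# Registry pairing each handler with an MCP-style tool descriptor.
TOOLS: Dict[str, Dict[str, Any]] = {
    "create_bucket": {
        "handler": create_bucket,
        "descriptor": {
            "name": "create_bucket",
            "description": "Create a storage bucket in a region.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "region": {"type": "string"},
                },
                "required": ["name", "region"],
            },
        },
    }
}


def call_tool(request_json: str) -> Dict[str, Any]:
    """Check a request against the descriptor, then run the handler."""
    request = json.loads(request_json)
    tool = TOOLS.get(request.get("name"))
    if tool is None:
        return {"isError": True, "message": "unknown tool"}
    args = request.get("arguments", {})
    required = tool["descriptor"]["inputSchema"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        return {"isError": True, "message": f"missing arguments: {missing}"}
    return {"isError": False, "result": tool["handler"](**args)}


request = {"name": "create_bucket", "arguments": {"name": "logs", "region": "example-region"}}
print(call_tool(json.dumps(request)))
```

The agent only ever sees the descriptor. The handler behind it can be a cloud API or, in the robotics case, a hardware driver.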
Tencent’s OpenClaw-Powered Robots
Tencent’s approach comes from the opposite direction. Rather than building a frontier model, it is pushing its OpenClaw agent framework into existing hardware.
Earlier in May, embodied AI startup Zeroth announced that its M1 humanoid had become the first mass-produced robot to integrate Tencent’s OpenClaw framework, according to the SCMP. The integration allows large language models to interpret human speech and translate it into robotic movements in real time.
Nikkei Asia reports that Tencent is taking a deliberately different path from Alibaba and ByteDance: focusing on smaller models and diverse user needs rather than racing for the largest parameter counts. This matters for robotics because edge deployment on robot hardware demands efficiency over raw scale.
Wu Bangyi, chief data officer at Chinese tech firm Tianyu Shuke, described the transition to Securities Daily: “The past few years of large language model development have mainly focused on solving problems in the digital world.” The pivot to embodied AI marks the moment those models start interacting with physical environments.
The Hardware Pipeline
The scale of China’s robotics manufacturing pipeline is becoming visible. Forbes Asia’s 2026 30 Under 30 list features seven Chinese humanoid robot makers, including Zeroth’s parent company JoyIn Technology. Founded by Guo Renjie in Suzhou in 2024, JoyIn has launched four humanoid models. The Zeroth M1, unveiled in January at 10,000 yuan ($1,500), had received nearly 20,000 pre-orders as of April, including from schools and elderly care centers. Deliveries are expected to start in December. JoyIn was valued at $351 million in October 2025 and has raised over $70 million from IDG Capital, Jinqiu Capital, and Eastern Bell Capital.
Other Chinese robotics startups on the Forbes list show the depth of investment:
- Noetix Robotics (Beijing): Five humanoid models; its robots competed in China’s robot half-marathons and performed at the 2026 Spring Festival gala. Over $150 million raised from CICC Capital, Unity Ventures, and CATL’s investment arm.
- Zerith: Flagship H1 humanoid performs cleaning, sorting, and warehouse tasks. Priced at 199,000 yuan ($29,000), it started shipping globally in Q2 2026.
The supply chain is scaling to match. Harmonious Drive Systems, a key precision reducer maker, shipped 500,000 units in 2025 and targets 800,000 for 2026, according to industry reporting.
The Agent Architecture Question
The convergence happening in China reveals an architectural bet that Western agent developers should watch closely. AI agent frameworks were designed for software: managing code, drafting documents, querying APIs. The Chinese approach treats agent frameworks as general-purpose control systems for any tool, including physical actuators.
Alibaba’s approach stacks a frontier model (Qwen3.7-Max) on top of purpose-built robotics models (gripper agent, navigation model, vision-language system), all connected through tool-calling. The model doesn’t need to understand motor control directly. It calls specialized tools that do.
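The division of labor described above, a planner on top and specialists underneath, might look roughly like this in Python. Every name here is hypothetical: the three specialist functions stand in for the gripper, navigation, and vision-language models, and `plan_fetch` stands in for the plan a frontier model would produce.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical specialists; real ones would wrap dedicated models and drivers.
def navigate_to(location: str) -> str:
    return f"arrived at {location}"


def locate_object(name: str) -> str:
    return f"{name} located"


def grasp(name: str) -> str:
    return f"{name} grasped"


SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "navigate_to": navigate_to,
    "locate_object": locate_object,
    "grasp": grasp,
}


def plan_fetch(item: str, location: str) -> List[Tuple[str, str]]:
    """Stand-in for the planner: an ordered list of (tool, argument) steps."""
    return [("navigate_to", location), ("locate_object", item), ("grasp", item)]


def execute(plan: List[Tuple[str, str]]) -> List[str]:
    """Dispatch each step to its specialist; the planner never touches motors."""
    return [SPECIALISTS[tool](argument) for tool, argument in plan]


print(execute(plan_fetch("cup", "kitchen")))
# ['arrived at kitchen', 'cup located', 'cup grasped']
```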
Tencent’s approach uses OpenClaw as a middleware layer between language models and robot hardware. The framework handles the translation from natural language intent to motor commands, letting any sufficiently capable LLM drive the robot through the agent’s tool interface.
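A middleware layer of this kind reduces to a translation step from structured intent to actuator commands. The Python sketch below is purely illustrative and is not OpenClaw's interface: it assumes a toy two-wheeled base rather than a humanoid, a hypothetical intent format that a language model would emit, and made-up speed constants.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

CRUISE_SPEED_MPS = 0.5  # assumed straight-line wheel speed
TURN_RATE_DPS = 90.0    # assumed in-place turn rate at opposing wheel speeds


@dataclass(frozen=True)
class MotorCommand:
    left_wheel_mps: float
    right_wheel_mps: float
    duration_s: float


def intent_to_commands(intent: Dict[str, Any]) -> List[MotorCommand]:
    """Translate one structured intent into low-level wheel commands."""
    action = intent["action"]
    if action == "move_forward":
        duration = intent["distance_m"] / CRUISE_SPEED_MPS
        return [MotorCommand(CRUISE_SPEED_MPS, CRUISE_SPEED_MPS, duration)]
    if action == "turn_left":
        duration = intent["angle_deg"] / TURN_RATE_DPS
        return [MotorCommand(-CRUISE_SPEED_MPS, CRUISE_SPEED_MPS, duration)]
    if action == "stop":
        return [MotorCommand(0.0, 0.0, 0.0)]
    # Refuse anything unrecognized instead of guessing at a movement.
    raise ValueError(f"unsupported action: {action}")


# "Go forward one meter" as a language model might structure it.
print(intent_to_commands({"action": "move_forward", "distance_m": 1.0}))
```

Rejecting unknown actions outright is a deliberate choice in this sketch: on a physical robot, a misread intent is safer dropped than approximated.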
Both architectures share one assumption: the agent framework is the control plane, and the robot is just another set of tools in the agent’s environment. This is the same pattern used in software agents, where the agent orchestrates file systems, APIs, and databases. The only difference is that some of the tools now have wheels.
The market is pricing in this assumption. The AI robots market’s 17.7% CAGR, cited by Kavout from IFR data, reflects growing capital flows into companies bridging AI software and physical hardware. The International Federation of Robotics highlights AI and autonomy as its top 2026 trend, noting that robots are moving beyond rule-based automation to systems that “comprehend abstract natural language, decompose complex tasks, and engage in strategic coordination.”
Where This Goes Next
Three dynamics will determine whether China’s embodied AI lead translates into a structural advantage.
First, cost. The Zeroth M1 at $1,500 and the Zerith H1 at $29,000 bracket a price range that runs from companion robots to industrial machines. If agent-powered robots hit price points comparable to annual worker compensation, adoption accelerates from “innovation project” to “capital expenditure decision.”
Second, latency. Real-time speech-to-motion requires the agent loop to complete in milliseconds, not seconds. Current LLM inference latency makes this challenging over cloud connections. Tencent’s focus on smaller, edge-deployable models is a direct response to this constraint.
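The constraint is easiest to see as a latency budget. The Python sketch below sums per-stage delays for a cloud path and an edge path; every number in it, including the 300 ms budget, is an illustrative placeholder and not a measurement of any real system.

```python
from typing import Dict


def loop_latency_ms(stages: Dict[str, float]) -> float:
    """Total time for one pass through the speech-to-motion loop."""
    return sum(stages.values())


def within_budget(stages: Dict[str, float], budget_ms: float) -> bool:
    return loop_latency_ms(stages) <= budget_ms


# Illustrative placeholder values in milliseconds, not measurements.
BUDGET_MS = 300.0
cloud_path = {
    "speech_to_text": 80.0,
    "network_round_trip": 120.0,
    "model_inference": 400.0,
    "motion_planning": 20.0,
}
edge_path = {
    "speech_to_text": 80.0,
    "network_round_trip": 0.0,
    "model_inference": 150.0,
    "motion_planning": 20.0,
}

for name, stages in (("cloud", cloud_path), ("edge", edge_path)):
    print(name, loop_latency_ms(stages), within_budget(stages, BUDGET_MS))
```

Under these assumed figures the cloud path misses the budget and the edge path meets it, which is the trade a smaller on-device model is meant to buy.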
Third, generalization. The 35-hour Qwen3.7-Max kernel optimization run showed a model adapting to an entirely unfamiliar hardware architecture through iterative tool use. If that capability transfers to unfamiliar physical environments, the same pattern could let robots adapt to new factory layouts, household configurations, or warehouse arrangements without retraining.
China’s tech giants are not just building better chatbots that happen to control robots. They are treating agent frameworks as the operating system layer between intelligence and physical action, and they are shipping hardware on that assumption while the rest of the industry is still debating agent governance for software.