WIRED senior writer Will Knight connected an OpenClaw agent to a HuggingFace LeRobot SO-101 robot arm and watched it configure the hardware, visually identify objects, and train a secondary model for autonomous pick-and-place tasks. The experiment, published May 20, is one of the clearest demonstrations yet that AI coding agents are crossing from digital automation into physical manipulation.
What Happened
Knight bought a prebuilt LeRobot SO-101, an open-source robot arm from HuggingFace’s robotics project that normally requires teleoperation: a human physically moves a controller arm, a follower arm with a camera mirrors the motion, and a model is trained on the recorded demonstrations to replicate the movements.
Instead of teleoperating, Knight handed the task to OpenClaw and Codex. The agent handled motor configuration, joint calibration, and connection setup, then wrote a Python script combining multiple libraries to detect and grip a red ball by sight. After that, OpenClaw guided Knight through training a model to control the arm for autonomous object placement, checking error rates after each training run.
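WIRED's account does not name the libraries the agent combined or show its script, so the following is only a minimal sketch of the "detect a red ball by sight" step, assuming simple color thresholding. It uses Python's standard library on a synthetic frame rather than a real camera feed, and the function names `is_red` and `find_red_centroid` and the threshold values are illustrative assumptions, not taken from Knight's setup.

```python
import colorsys

def is_red(rgb, min_saturation=0.5, min_value=0.3):
    """Return True if an (R, G, B) pixel with 0-255 channels looks red."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    # Red sits at both ends of the hue circle, so accept hues near 0 or 1.
    near_red = hue <= 0.05 or hue >= 0.95
    return near_red and saturation >= min_saturation and value >= min_value

def find_red_centroid(image):
    """Return the (row, col) centroid of red pixels, or None if there are none."""
    row_sum = col_sum = count = 0
    for row_index, row in enumerate(image):
        for col_index, pixel in enumerate(row):
            if is_red(pixel):
                row_sum += row_index
                col_sum += col_index
                count += 1
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)

# Synthetic 5x5 frame: grey background with a 2x2 red patch (illustrative data).
GREY, RED = (120, 120, 120), (220, 30, 30)
frame = [[GREY] * 5 for _ in range(5)]
for r in (1, 2):
    for c in (2, 3):
        frame[r][c] = RED

centroid = find_red_centroid(frame)
print(centroid)  # (1.5, 2.5)
```

On a real arm, a centroid like this would still have to be mapped from pixel coordinates to a gripper target, a step that depends on camera calibration and is not shown here.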
The entire process bypassed the conventional teleoperation pipeline. The robot learned through AI-generated code rather than human demonstration.
Code-as-Policy: The Method Behind It
The approach is called “code-as-policy,” first described in a 2022 research paper that proposed using language model-generated programs to drive robot behaviors. Rather than training a model end-to-end on physical demonstrations, the system generates executable code that orchestrates perception and control modules.
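To make the idea concrete, here is a minimal sketch of the pattern, assuming a toy setup: a language model emits a short program, and that program is executed against a small set of perception and control primitives. Everything named here is hypothetical (`SimulatedArm`, `locate`, `run_policy`, the scene coordinates), and the "generated" program is a hard-coded string standing in for model output. It is not the API of LeRobot, OpenClaw, or CaP-X.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SimulatedArm:
    """Hypothetical stand-in for a robot arm; it records commands instead of moving."""
    position: tuple = (0.0, 0.0, 0.0)
    holding: Optional[str] = None
    log: list = field(default_factory=list)

    def move_to(self, xyz):
        self.position = xyz
        self.log.append(("move_to", xyz))

    def close_gripper(self, target):
        self.holding = target
        self.log.append(("close_gripper", target))

    def open_gripper(self):
        self.holding = None
        self.log.append(("open_gripper",))

# Stand-in for text a language model might return for "put the red ball in the bin".
GENERATED_POLICY = """
ball = locate("red ball")
target = locate("bin")
arm.move_to(ball)
arm.close_gripper("red ball")
arm.move_to(target)
arm.open_gripper()
"""

def run_policy(policy_source, arm, scene):
    """Execute a generated program with access only to the exposed primitives."""
    def locate(name):
        # Perception primitive: here just a lookup in a known scene dictionary.
        return scene[name]

    namespace = {"arm": arm, "locate": locate}
    exec(policy_source, namespace)
    return arm

# Illustrative scene: object names mapped to made-up (x, y, z) coordinates.
scene = {"red ball": (0.1, 0.0, 0.02), "bin": (0.3, 0.2, 0.1)}
arm = run_policy(GENERATED_POLICY, SimulatedArm(), scene)
for step in arm.log:
    print(step)
```

In this sketch the generated program only sequences calls, while perception and motion sit behind fixed primitives, mirroring the split between generated code and perception and control modules described above. Running untrusted generated code with `exec` is unsafe outside a sandbox, so a real system would need isolation and safety checks.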
“AI-powered coding is super exciting because it has the potential to bridge the gap between conventional engineering methods, which are reliable but don’t generalize, and contemporary vision-language-action models, which generalize but are not yet reliable,” Ken Goldberg, a roboticist at UC Berkeley, told WIRED.
Goldberg’s group, collaborating with Nvidia, Carnegie Mellon, and Stanford, recently published CaP-X, a benchmark for measuring how well coding models control robots. The results are striking: on LIBERO-PRO, a suite of 30 manipulation tasks with position and instruction variations, several state-of-the-art Vision-Language-Action (VLA) models scored 0%, and even the best VLA model, π0.5, reached only 13% average success. CaP-Agent0, a training-free coding agent, hit 18% with no task-specific training.
One unexpected finding from CaP-X: Gemini outperformed Claude and ChatGPT on robot programming tasks, according to WIRED’s reporting. The researchers attributed this to Google DeepMind’s focus on multimodal training and physical-world understanding.
Nvidia’s Stake in the Approach
Spencer Huang, an Nvidia researcher and the son of Nvidia CEO Jensen Huang, has been organizing internal hackathons where employees vibe-code robots using the code-as-policy method. Huang is collaborating with Goldberg on research to make the approach compatible with more robot software tools.
“Nearly anyone can get into robotics, which is the true holy grail,” Huang told WIRED. He described making it possible for people to control robots with spoken or typed commands as the “critical unlock for robots in society.”
From Digital Tasks to Physical Control
AI agents have spent the past two years proving they can automate software workflows: browsing, coding, data processing, orchestration. Knight’s experiment shows the same agent infrastructure can extend into hardware. An OpenClaw instance that yesterday managed API calls and file operations today calibrates servos and trains manipulation models.
The barrier to entry has also collapsed. The LeRobot SO-101 is a 3D-printable, open-source design. The AI models are commercially available. Knight noted that traditional robot training “required considerable skill,” while the code-as-policy approach made it “almost easy.” For the robotics industry, that accessibility shift may matter more than any single benchmark result.
Will Knight is a senior writer at WIRED covering artificial intelligence.