Z.AI, the Chinese AI company formerly known as Zhipu AI, released GLM-5.1 on April 7 under an MIT license. The 754-billion parameter Mixture-of-Experts model scored 58.4 on SWE-Bench Pro, surpassing Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on the software engineering benchmark. It is the first open-weight model to claim the top spot on agentic coding tasks.

The model’s defining technical claim is sustained autonomous execution. Where previous agentic models plateau after a few dozen steps, GLM-5.1 can sustain productive coding sessions for more than eight hours. Z.AI leader Lou wrote on X that the model “can do 1,700 [steps] rn,” compared to roughly 20 steps for agents at the end of 2025, according to VentureBeat.

Architecture and the Plateau Problem

GLM-5.1 uses a Mixture-of-Experts architecture with 40 billion active parameters per forward pass, keeping inference costs manageable despite the headline parameter count. The model was trained using asynchronous reinforcement learning that decouples generation from training, a technique Z.AI says enables the model to learn from complex, long-horizon interactions more effectively than single-turn RL training, according to MarkTechPost.
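Z.AI has not published its training code, but the core idea of asynchronous RL, letting rollout generation and policy updates proceed concurrently instead of in lockstep single turns, can be sketched in a few lines. The queue, worker, and learner below are illustrative toys, not Z.AI's implementation:

```python
import queue
import random
import threading

def rollout_worker(traj_queue, n_episodes):
    """Generate toy trajectories asynchronously; never waits on the learner."""
    for _ in range(n_episodes):
        # A stand-in "trajectory": (step, reward) pairs sampled under whatever
        # policy snapshot is current, which may lag behind the learner.
        traj = [(step, random.random()) for step in range(5)]
        traj_queue.put(traj)

def learner(traj_queue, n_updates):
    """Consume trajectories as they arrive, applying one update per batch."""
    policy_version = 0
    while policy_version < n_updates:
        traj_queue.get()          # a real learner computes gradients here
        policy_version += 1       # mark one policy update
    return policy_version

traj_queue = queue.Queue()
worker = threading.Thread(target=rollout_worker, args=(traj_queue, 8))
worker.start()
updates = learner(traj_queue, 8)
worker.join()
```

The point of the decoupling is that generation (slow, long-horizon interactions) never blocks training, which is what single-turn RL setups force.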

The practical result: instead of applying familiar techniques for quick gains and then stalling, GLM-5.1 follows what Z.AI calls a “staircase pattern.” In one demonstration, the model optimized a vector database benchmark over 655 iterations and 6,000+ tool calls. It reached 21,500 queries per second, roughly six times the best result achieved in a single 50-turn session by Claude Opus 4.6 (3,547 queries per second), as reported by VentureBeat.
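The staircase dynamic, many small accepted improvements compounding over hundreds of iterations, is easiest to see in a toy hill-climbing loop. Nothing below is Z.AI's benchmark harness; the numbers and acceptance rule are illustrative assumptions:

```python
import random

def optimize(n_iterations, seed=0):
    """Toy optimization loop: each iteration proposes a tweak and keeps it
    only if throughput improves, producing a staircase-shaped curve."""
    rng = random.Random(seed)
    best = 3500.0  # hypothetical starting queries-per-second
    history = [best]
    for _ in range(n_iterations):
        candidate = best * (1 + rng.uniform(-0.01, 0.02))  # propose a tweak
        if candidate > best:  # accept only improvements -> monotone steps
            best = candidate
        history.append(best)
    return best, history

short_best, _ = optimize(50)    # a 50-turn session stalls early
long_best, _ = optimize(655)    # a long-horizon session keeps climbing
```

With the same random seed, the long run strictly extends the short one, so its final result can only match or exceed it, which is the mechanical reason horizon length, not just per-step quality, drives the gap Z.AI reports.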

Benchmark Profile

Beyond SWE-Bench Pro, GLM-5.1 posted strong numbers across reasoning and tool-use benchmarks: 95.3 on AIME 2026, 86.2 on GPQA-Diamond, 68.7 on CyberGym (up from GLM-5’s 48.3), 70.6 on τ³-Bench, and 71.8 on MCP-Atlas, according to MarkTechPost. On Terminal-Bench 2.0, the model scored 63.5, rising to 66.5 when evaluated with Claude Code as the scaffolding.

The Open-Source Bet

The MIT license is the strategic play. Z.AI, which listed on the Hong Kong Stock Exchange in early 2026 with a market capitalization of $52.83 billion, is making its frontier model fully available for commercial use, download, and modification via Hugging Face, according to VentureBeat.

For enterprises in regulated or security-sensitive sectors, the appeal is direct. “Data governance. Sensitive code and data do not have to be sent to external APIs, which is critical in sectors such as finance, healthcare, and defense,” Pareekh Jain, CEO of Pareekh Consulting, told Computerworld. Charlie Dai, VP and principal analyst at Forrester, told Computerworld that the MIT license “makes it a viable strategic option alongside commercial models, especially where regulatory constraints, IP sensitivity, or long-term platform control matter most.”

Jain flagged one caveat: geopolitical risk. Although the model is open source, its links to Chinese infrastructure and entities could still raise compliance concerns for some US companies.

The Competitive Shift

GLM-5.1’s release lands in a week in which Anthropic launched Claude Cowork GA, OpenAI added a $100/month ChatGPT Pro tier with expanded Codex access, and C3 AI shipped its own enterprise coding agent platform. The difference: those are all proprietary, cloud-only offerings. GLM-5.1 runs on your hardware, under your governance, with no per-token billing.

Dai framed the shift clearly: “Long-running autonomous agents are becoming more practical, provided enterprises layer in governance, monitoring, and escalation mechanisms to manage risk.” The question is no longer whether open-source models can compete with commercial ones on agentic workloads. On SWE-Bench Pro, they just won.