MiniMax, the Chinese AI lab with over 200 million users, today open-sourced MiniMax M2 and simultaneously shipped M2.7, a pair of models built from the ground up for agentic workloads. M2’s API is priced at $0.30 per million input tokens and $1.20 per million output tokens, according to MiniMax’s announcement. For comparison, Claude Sonnet 4.6 costs $3 per million input tokens and $15 per million output tokens, making M2 roughly 10% of the cost on input and 8% on output.

MiniMax describes M2 as “a model born for Agents and code,” and the architecture backs that claim. Both M2 and M2.7 are 230-billion-parameter sparse mixture-of-experts models that activate only 10 billion parameters per token, a 4.3% activation rate. The MoE design keeps inference costs low while preserving the full capacity of the larger parameter count. According to the announcement, the model generates around 100 tokens per second (TPS), roughly twice Claude Sonnet’s throughput.
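The sparse-activation arithmetic is easy to verify from the announced figures. The snippet below checks the 4.3% rate; the claim that per-token compute scales with active rather than total parameters is a standard first-order simplification of MoE inference cost, not a figure from MiniMax:

```python
# Sparse MoE arithmetic for M2/M2.7, using parameter counts from the announcement.
total_params = 230e9   # full parameter count
active_params = 10e9   # parameters activated per token

activation_rate = active_params / total_params
print(f"Activation rate: {activation_rate:.1%}")  # -> 4.3%

# To a first approximation, per-token inference compute scales with
# active parameters, so the model runs with roughly the cost profile
# of a 10B dense model while retaining 230B parameters of capacity.
print(f"Per-token compute vs. a dense 230B model: {activation_rate:.1%}")
```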

What M2.7 Adds

M2.7 extends M2 with what MiniMax calls self-evolution capabilities. During development, the model autonomously optimized a programming scaffold over 100+ rounds, analyzing failure trajectories, modifying code, running evaluations, and deciding whether to keep or revert changes. That process produced a 30% performance improvement on the scaffold task.
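The loop MiniMax describes (analyze failures, modify, evaluate, keep or revert) can be sketched as a toy, self-contained program. Here the “scaffold” is just a dictionary of numeric knobs and the “evaluation” a synthetic score; the real system analyzes failure trajectories and edits code, which this does not model. All names are illustrative, not MiniMax’s tooling:

```python
import random

def run_evaluation(scaffold):
    # Stand-in score: peaks when both knobs reach 1.0.
    return -((scaffold["a"] - 1.0) ** 2 + (scaffold["b"] - 1.0) ** 2)

def propose_patch(scaffold):
    # Stand-in for "analyze failures, modify code": nudge one knob.
    knob = random.choice(sorted(scaffold))
    return knob, random.uniform(-0.2, 0.2)

def self_evolve(scaffold, rounds=100):
    best = run_evaluation(scaffold)
    for _ in range(rounds):
        knob, delta = propose_patch(scaffold)
        scaffold[knob] += delta           # apply the candidate change
        score = run_evaluation(scaffold)  # re-run the evaluation suite
        if score > best:                  # keep only strict improvements,
            best = score
        else:                             # otherwise revert the change.
            scaffold[knob] -= delta
    return best

random.seed(0)
print(self_evolve({"a": 0.0, "b": 0.0}))
```

Because the loop only keeps strictly improving patches, the score is monotonically non-decreasing over rounds, which is what makes an unattended 100+ round run safe to leave alone.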

The benchmarks position M2.7 competitively with frontier closed-source models. On MLE Bench Lite (22 ML competitions), M2.7 achieved a 66.6% medal rate, second only to Claude Opus 4.6 and GPT-5.4, according to the GitHub repository. On SWE-Pro, M2.7 hit 56.22%, matching GPT-5.3-Codex. On VIBE-Pro (55.6%), it approaches Claude Opus 4.6 performance.

M2.7 also supports native Agent Teams for multi-agent collaboration, complex Skills, and dynamic tool search. MiniMax says it has used M2.7 internally to reduce live production incident recovery time to under three minutes on multiple occasions, according to the repository documentation.

NVIDIA’s Endorsement

NVIDIA featured M2.7 on its Technical Blog, which is notable. NVIDIA does not routinely feature open-source model releases from Chinese labs on its developer blog. The post details inference optimizations NVIDIA collaborated on with the open-source community, including FP8 MoE kernels and QK RMS Norm kernels integrated into vLLM and SGLang, delivering throughput improvements of 2.5-2.7x on Blackwell Ultra GPUs.

The open weights release of M2.7 is available through NVIDIA and across the open-source inference ecosystem. NVIDIA’s NemoClaw reference stack provides a one-click deployment path for running M2.7 with OpenClaw, according to the Technical Blog.

The Pricing Math

The cost comparison needs precision. MiniMax’s “8% of Claude Sonnet” claim holds specifically for output tokens ($1.20 vs. $15.00 per million). On input tokens, the ratio is closer to 10% ($0.30 vs. $3.00 per million). Either way, the result is roughly an order-of-magnitude cost reduction, and it matters most for output-heavy workloads, which most autonomous agent loops are.

MiniMax says M2 is “available for free for a limited time,” though the announcement does not specify when the free period ends.

The Open-Source Agent Cost Floor

For teams running high-volume agent workloads like customer support automation, code review loops, or content pipelines, M2’s pricing changes the economics meaningfully. A workflow that consumes 10 million output tokens would cost $150 on Claude Sonnet versus $12 on M2. Whether M2 delivers comparable quality at that price point is the question the open-weights release now lets any team answer for themselves.
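The per-workload math is simple to reproduce from the per-million-token prices quoted above (the 10-million-token workload is an illustrative assumption):

```python
# Cost comparison for an output-heavy agent workload, using the
# per-million-token prices quoted in this article.
M2_IN, M2_OUT = 0.30, 1.20          # $/1M tokens
SONNET_IN, SONNET_OUT = 3.00, 15.00  # $/1M tokens

output_tokens_m = 10  # a workload emitting 10M output tokens
print(f"Claude Sonnet: ${SONNET_OUT * output_tokens_m:.2f}")  # $150.00
print(f"MiniMax M2:    ${M2_OUT * output_tokens_m:.2f}")      # $12.00
print(f"Output ratio:  {M2_OUT / SONNET_OUT:.0%}")            # 8%
print(f"Input ratio:   {M2_IN / SONNET_IN:.0%}")              # 10%
```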

The model is available today on MiniMax’s API platform, GitHub, and through NVIDIA’s inference ecosystem.