MiniMax, the Shanghai-based AI lab, released its M3 model on June 1, 2026, positioning it as the first open-weight model to combine frontier-level coding performance, a one-million-token context window, and native multimodal processing in a single release. The model is available immediately through MiniMax’s API and OpenRouter, which listed it on May 31.
Architecture: Sparse Attention Returns
The core technical change is MiniMax Sparse Attention (MSA), a block-level sparse attention mechanism that lets the model attend only to relevant key-value blocks rather than computing full attention across the entire context window. According to FelloAI’s technical analysis, MSA cuts per-token compute at 1M context to one-twentieth of what M2’s full-attention approach requires, delivering over 9x faster prefill and over 15x faster decoding.
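To make the idea of block-level sparse attention concrete, here is a minimal pure-Python sketch of one generic variant: score each key block with a cheap summary, keep only the best few blocks, and run ordinary softmax attention over those. This is not MiniMax’s MSA, whose details await the technical report. The mean-key block score, the function name block_sparse_attention, and the block_size and top_k parameters are all assumptions chosen for illustration.

```python
import math


def block_sparse_attention(query, keys, values, block_size=4, top_k=2):
    """Attend one query to only the top_k highest-scoring key-value blocks.

    Illustrative only: the block-selection rule (dot product with the mean
    key of each block) is an assumption, not MiniMax's published method.
    Returns (output_vector, attended_positions).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scale = 1.0 / math.sqrt(len(query))

    # 1. Summarize each contiguous block of keys by its mean vector and
    #    score that summary against the query (one dot product per block).
    block_scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(column) / len(block) for column in zip(*block)]
        block_scores.append((dot(query, mean_key), start))

    # 2. Keep only the top_k blocks; every other position is skipped.
    selected = sorted(block_scores, reverse=True)[:top_k]
    positions = sorted(
        pos
        for _, start in selected
        for pos in range(start, min(start + block_size, len(keys)))
    )

    # 3. Standard scaled softmax attention, restricted to kept positions.
    logits = [dot(query, keys[pos]) * scale for pos in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    output = [
        sum(w * values[pos][d] for w, pos in zip(weights, positions)) / total
        for d in range(len(values[0]))
    ]
    return output, positions


# Toy usage: 8 positions in 2 blocks; the query matches the first block only.
keys = [[1.0, 0.0]] * 4 + [[-1.0, 0.0]] * 4
values = [[float(i)] for i in range(8)]
output, attended = block_sparse_attention([1.0, 0.0], keys, values, top_k=1)
```

In this toy case the query touches four of the eight positions. The savings in a sketch like this come from step 2: the per-token cost scales with top_k times block_size rather than with the full context length, at the price of possibly missing relevant tokens in blocks that scored poorly.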
The decision has a history. MiniMax kept full attention throughout its M2 generation (M2 through M2.7), and its engineering team wrote publicly that “efficient attention still has some way to go before it can definitively beat full attention.” M3 reverses that position. FelloAI described the move as a “public self-correction”: the same team now ships sparse attention in production with order-of-magnitude speedups.
Benchmark Performance
MiniMax reports 59.0% on SWE-Bench Pro, 66.0% on Terminal Bench 2.1, and 74.2% on MCP Atlas. According to FelloAI, those scores beat GPT-5.5 and Gemini 3.1 Pro on coding tasks and approach Claude Opus 4.7. Several benchmarks were run on MiniMax’s own infrastructure with agent scaffolding, and the results have not yet been independently verified.
Pricing and Access
OpenRouter lists M3 at $0.30 per million input tokens and $1.20 per million output tokens during an introductory 50% discount period. MyClaw’s analysis notes that blended costs can drop as low as $0.06 per million tokens with cache optimization. Open weights and a full technical report are promised within roughly ten days on Hugging Face and GitHub.
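As a rough illustration of what the listed rates mean per request, the sketch below computes cost from token counts using the introductory OpenRouter prices quoted above. The function names and the example token counts are hypothetical. Cache discounts are deliberately not modeled, because the cached-token rate behind the $0.06 figure is not given here.

```python
# Introductory OpenRouter rates quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_M = 0.30
OUTPUT_PRICE_PER_M = 1.20


def request_cost_usd(input_tokens, output_tokens,
                     input_price=INPUT_PRICE_PER_M,
                     output_price=OUTPUT_PRICE_PER_M):
    """Cost of one request in USD, ignoring any cache discount."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def blended_price_per_m(input_tokens, output_tokens,
                        input_price=INPUT_PRICE_PER_M,
                        output_price=OUTPUT_PRICE_PER_M):
    """Average USD per million tokens across input and output combined."""
    total_tokens = input_tokens + output_tokens
    cost = request_cost_usd(input_tokens, output_tokens, input_price, output_price)
    return cost * 1_000_000 / total_tokens


# Hypothetical long-context agent step: 800k tokens of history, 20k generated.
step_cost = request_cost_usd(800_000, 20_000)
step_blended = blended_price_per_m(800_000, 20_000)
```

For input-heavy agent workloads like this one, the blended rate sits close to the input price, which is why any discount on cached input tokens has an outsized effect on the total.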
What Agent Builders Should Watch
For agent runtime operators evaluating model selection, M3’s combination of long context and aggressive pricing makes it a candidate for workflows where session histories grow large: multi-step coding tasks, document comparison pipelines, and browser automation chains. The 1M-token window means fewer forced truncations during extended agentic loops. The open-weight promise, if delivered on schedule, would let self-hosted agent deployments run M3 locally, eliminating API dependency for teams with the hardware to support it. Whether the benchmarks hold under independent testing will determine if M3 moves from “interesting option” to “serious contender” in the agent reasoning layer.