Moonshot AI, the Beijing-based lab behind the Kimi assistant, open-sourced Kimi K2.6 on April 21, a 1-trillion-parameter Mixture-of-Experts model with native agent-swarm orchestration, long-horizon coding, and multimodal vision. Weights are available on Hugging Face under a Modified MIT License.
The headline numbers for autonomous coding are strong. K2.6 scores 58.6 on SWE-Bench Pro, compared to 57.7 for GPT-5.4 and 53.4 for Claude Opus 4.6, according to Moonshot’s technical blog. On Terminal-Bench 2.0, it hits 66.7 versus 65.4 for both GPT-5.4 and Claude Opus 4.6. On Humanity’s Last Exam with tools, it leads at 54.0, ahead of Claude Opus 4.6 (53.0) and GPT-5.4 (52.1), according to MarkTechPost.
Architecture
K2.6 uses a MoE architecture with 384 total experts, 8 selected per token plus 1 shared expert that is always active, resulting in 32 billion active parameters per forward pass out of the 1 trillion total. It supports a 256K-token context window, uses Multi-head Latent Attention, and includes a 400M-parameter MoonViT vision encoder for native image and video input, according to Moonshot’s blog.
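To make the routing concrete, here is a minimal sketch of standard top-k MoE gating using the numbers from the release (384 routed experts, top-8 per token, 1 always-on shared expert). The function names and the gating details are illustrative, not Moonshot's implementation.

```python
import math
import random

# Illustrative top-k MoE routing sketch (not Moonshot's code).
# Per the release: 384 routed experts, top-8 selected per token,
# plus 1 shared expert that is always active.
NUM_EXPERTS = 384
TOP_K = 8

def route_token(router_logits):
    """Pick the top-k experts for one token and softmax-normalize
    their gate weights over the selected set."""
    assert len(router_logits) == NUM_EXPERTS
    top = sorted(range(NUM_EXPERTS),
                 key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]  # (expert_id, weight)

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)
print(len(selected))  # 8 routed experts fire for this token
# The shared expert runs unconditionally alongside these 8, which is how
# only ~32B of the 1T parameters are active per forward pass.
```

Because only the selected experts' weights are touched per token, inference cost tracks the 32B active parameters rather than the full trillion.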
12-Hour Coding Sessions
Two case studies demonstrate what “long-horizon” means in practice. In the first, K2.6 deployed the Qwen3.5-0.8B model locally on a Mac and optimized inference in Zig, a niche programming language. Across 4,000+ tool calls over 12 hours and 14 iterations, it improved throughput from roughly 15 to 193 tokens per second, approximately 20% faster than LM Studio, according to Moonshot.
In the second, K2.6 overhauled exchange-core, an 8-year-old open-source financial matching engine. Over 13 hours, it iterated through 12 optimization strategies with 1,000+ tool calls, modifying 4,000+ lines of code. It analyzed CPU flame graphs, reconfigured the core thread topology, and extracted a 185% median throughput increase (0.43 to 1.24 MT/s), per MarkTechPost.
ZDNET reported that Moonshot also demonstrated K2.6 building a full SysY compiler from scratch in 10 hours, passing 140 functional tests without human input.
Agent Swarms
The swarm architecture scales to 300 sub-agents running simultaneously across up to 4,000 coordinated steps, according to MarkTechPost. Each agent specializes in a different skill: one researches, another writes, another tests, another organizes output. Moonshot AI founder Zhilin Yang told ZDNET: “By orchestrating 100 or even 1,000 sub-agents in parallel, we can accomplish complex tasks within a timeframe that is tolerable for the real world.”
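The fan-out pattern described above can be sketched in a few lines. Everything here is hypothetical scaffolding, not Moonshot's API: the role names, `run_agent`, and `orchestrate` are stand-ins for a coordinator that dispatches role-specialized sub-agents in parallel and merges their results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sub-agent fan-out sketch; names and roles are illustrative,
# not Moonshot's API. A real sub-agent would run its own model-plus-tool
# loop under a role-specific prompt; here we just tag the task.
def run_agent(role, task):
    return f"[{role}] done: {task}"

ROLES = ["research", "write", "test", "organize"]

def orchestrate(task, roles=ROLES):
    """Dispatch one task to specialized sub-agents in parallel and
    collect their results, in role order, for a coordinator to merge."""
    with ThreadPoolExecutor(max_workers=len(roles)) as pool:
        futures = [pool.submit(run_agent, role, task) for role in roles]
        return [f.result() for f in futures]

results = orchestrate("summarize the K2.6 release")
print(len(results))  # 4 — one result per specialized role
```

The point of the pattern is latency, matching Yang's quote: work that would take one agent many sequential steps finishes in roughly the time of the slowest sub-agent plus a merge pass.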
Industry Validation
Multiple companies are quoted providing beta feedback in the release. Augment Code noted K2.6’s “surgical precision in large codebases” and its ability to pivot when initial paths are blocked. Baseten said the model “excels on coding tasks at a level comparable to leading closed source models.” Factory.ai reported a 15% improvement over K2.5 on its benchmarks. Ollama confirmed it works out of the box with all integrations, according to Moonshot’s blog.
Open-Source Positioning
K2.6 is available via Kimi.com, the Kimi App, the API, and Kimi Code CLI. It shares the same architecture as K2.5, so existing deployment configurations carry over. The release continues a pattern of Chinese labs shipping competitive open-source models: together with Alibaba’s Qwen 3.6 release last week and DeepSeek’s V4, it points to an open-source agent infrastructure stack increasingly built in China, while the commercial stack remains Western. For builders choosing between closed-source APIs and self-hosted alternatives, the performance gap on coding benchmarks has effectively closed.