DeepSeek released preview versions of its V4 model family on April 24, a year after its R1 reasoning model rattled Silicon Valley. The release includes two open-source Mixture-of-Experts models with native 1 million token context windows and dedicated agentic capabilities, per DeepSeek’s API documentation.

Two Models, One Architecture

V4-Pro carries 1.6 trillion total parameters with 49 billion active per token. V4-Flash runs at 284 billion total with 13 billion active. Both ship with open weights on Hugging Face, support the OpenAI ChatCompletions and Anthropic API formats, and default to 1 million token context across all DeepSeek services, per the release notes.
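DeepSeek's existing API already accepts OpenAI-style requests at api.deepseek.com, so the compatibility claim likely means V4 is a drop-in model swap. A minimal sketch of such a request follows; the `deepseek-v4-flash` identifier is a placeholder, not a name confirmed by the release notes:

```python
import json

# Sketch of an OpenAI ChatCompletions-style request body for DeepSeek's
# compatible endpoint. The model identifier is a placeholder; check
# DeepSeek's API documentation for the official V4 model names.
API_URL = "https://api.deepseek.com/chat/completions"  # OpenAI-compatible path

payload = {
    "model": "deepseek-v4-flash",  # hypothetical V4-Flash identifier
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the attached repository."},
    ],
    "max_tokens": 1024,
}

# To send: POST this body as JSON to API_URL with an
# "Authorization: Bearer <api key>" header.
body = json.dumps(payload)
print(body[:60])
```

Because the request shape is the standard Chat Completions format, existing OpenAI-SDK client code would only need the base URL and model name changed.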

The 1 million token context represents an 8x jump from V3’s 128,000 token window. At that length, V4-Pro uses only 27% of the single-token inference FLOPs and 10% of the KV cache size compared to V3.2, Fello AI reported, citing the technical paper. V4-Flash pushes efficiency further, to 10% of FLOPs and 7% of cache.
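As a quick sanity check, the context multiple and the reported relative per-token costs tally as follows (figures are the ratios from the reporting above; the absolute baselines are not public):

```python
# Context window growth: V3's 128k window to V4's 1M window.
context_ratio = 1_000_000 / 128_000
print(f"context multiple: {context_ratio:.1f}x")  # ~7.8x, rounded up to 8x

# Reported per-token costs at 1M tokens, relative to a V3.2 baseline of 1.0.
costs = {
    "V4-Pro":   {"flops": 0.27, "kv_cache": 0.10},
    "V4-Flash": {"flops": 0.10, "kv_cache": 0.07},
}
for model, c in costs.items():
    print(f"{model}: {c['flops']:.0%} FLOPs, {c['kv_cache']:.0%} KV cache vs. V3.2")
```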

The architectural innovation behind these numbers is DeepSeek Sparse Attention (DSA), a novel attention mechanism using token-wise compression that the team frames as breaking “the efficiency barrier of ultra-long-context processing,” according to the technical report.

Benchmarks and Positioning

On standard reasoning benchmarks, V4-Pro-Max sits between OpenAI’s GPT-5.2 and GPT-5.4, falling “marginally short” of GPT-5.4 and Google’s Gemini 3.1-Pro, according to TechXplore’s AP reporting. On Codeforces, V4-Pro reaches a 3,206 rating, ranking 23rd among human competitors, per Fello AI.

On agentic tasks specifically, DeepSeek claims V4-Pro outperforms Anthropic’s Claude Sonnet 4.5 and approaches Claude Opus 4.5, TechXplore reported. V4-Flash performs on par with V4-Pro on simpler agent tasks.

“Based on the benchmark results, it does appear DeepSeek V4 is going to be very competitive against its U.S. rivals,” Lian Jye Su, chief analyst at research group Omdia, told TechXplore.

Huawei Chips, Not Nvidia

V4 was trained on Huawei Ascend chips, not Nvidia hardware. Huawei confirmed compatibility in a separate statement on Friday, TechXplore reported. Marina Zhang, associate professor at the University of Technology Sydney, called the release “a pivotal milestone for China’s AI industry,” describing it as a demonstration of technical feasibility outside the Nvidia-dominated computing ecosystem “amid sustained technological decoupling between China and the U.S.”

Agent Integration

DeepSeek’s release notes emphasize dedicated agent optimizations. V4 is “seamlessly integrated with leading AI agents like Claude Code, OpenClaw & OpenCode” and already drives DeepSeek’s internal agentic coding workflows, per the API documentation. The dual Thinking/Non-Thinking modes across both models enable agents to toggle between deep reasoning and fast execution.
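DeepSeek's current API exposes reasoning and non-reasoning behavior through separate model endpoints (deepseek-reasoner vs. deepseek-chat). Assuming V4 keeps that convention, an agent framework could toggle modes per step with routing logic like this sketch; both V4 identifiers below are placeholders:

```python
# Per-step mode selection for an agent, assuming V4 follows DeepSeek's
# convention of selecting thinking vs. non-thinking behavior via the
# model field. Both identifiers are hypothetical placeholders.
THINKING_MODEL = "deepseek-v4-pro-thinking"  # hypothetical deep-reasoning mode
FAST_MODEL = "deepseek-v4-flash"             # hypothetical fast-execution mode

def pick_model(step: dict) -> str:
    """Route deliberative steps (planning, debugging, review) to the
    thinking model and routine tool calls to the fast one."""
    if step.get("kind") in {"plan", "debug", "review"}:
        return THINKING_MODEL
    return FAST_MODEL

print(pick_model({"kind": "plan"}))       # deep reasoning
print(pick_model({"kind": "tool_call"}))  # fast execution
```

The design point is that mode selection is per request, so a single agent loop can spend reasoning compute only on the steps that need it.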

The legacy deepseek-chat and deepseek-reasoner endpoints will be fully retired after July 24, 2026, with traffic currently routing to V4-Flash.

Context as Competitive Axis

The release arrived a day after OpenAI shipped GPT-5.5 on April 23. Where OpenAI leads on raw capability at the frontier, DeepSeek is competing on the cost-efficiency curve for long-context agent workloads. For teams running autonomous agents that process large codebases, legal documents, or multi-step task histories, a 1 million token context at drastically reduced compute cost changes the economics of what agents can hold in working memory.