Mistral AI released Mistral Small 4 on March 16, a 119-billion-parameter Mixture-of-Experts model that consolidates multiple specialized capabilities into a single open-source package.

Model Architecture

Mistral Small 4 uses a 128-expert Mixture-of-Experts design with only 6 billion active parameters per inference pass. This sparse activation keeps inference fast while maintaining the representational power of a much larger dense model.
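The sparse activation described above is typically implemented with top-k gating: a small router scores all experts per token and only the top few run. Mistral has not published Small 4's router details, so the sketch below takes the 128-expert count from this article and assumes k=2 purely for illustration:

```python
import math
import random

NUM_EXPERTS = 128   # per the article
TOP_K = 2           # assumed for illustration; Mistral has not published the router's k

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_logits, k=TOP_K):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
chosen = route(logits)
# Only k of the 128 experts execute for this token, which is why
# the active parameter count stays far below the 119B total.
```

Because only the selected experts' weights participate in the forward pass, per-token compute scales with the 6B active parameters rather than the full 119B.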

The model supports two inference modes.

On internal benchmarks, Mistral Small 4 matches or exceeds the performance of Mistral's specialized Magistral reasoning models.

Unified Capabilities

The release consolidates three prior model families (reasoning, multimodal, and code models) into one.

The model supports a 256K context window, enabling longer documents and complex multi-step agent tasks without context truncation.
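A quick way to sanity-check whether a document will fit in the 256K window is a character-based token estimate. The ~4 characters-per-token ratio below is a common rough heuristic, not Small 4's actual tokenizer behavior:

```python
CONTEXT_WINDOW = 256 * 1024   # 262,144 tokens, per the article
CHARS_PER_TOKEN = 4           # rough heuristic; the real ratio depends on the tokenizer

def fits_in_context(text: str, reserved_for_output: int = 4096) -> bool:
    """Estimate token count from character length and compare against the
    window, leaving room for the model's generated output."""
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens <= CONTEXT_WINDOW - reserved_for_output

# Roughly 1 MB of plain text (~260K estimated tokens) is the point where
# truncation or chunking would become necessary even at 256K.
```

For production use, count tokens with the model's actual tokenizer rather than this heuristic; estimates can be off by 2x or more for code or non-English text.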

Licensing and Availability

Mistral Small 4 ships under the Apache 2.0 license, available on Hugging Face, the Mistral API, and NVIDIA Build. Apache 2.0 means any organization can download, deploy, and modify the model without licensing fees or usage restrictions.

Performance Gains

According to Mistral's internal testing, the model is 40% faster than its predecessor and serves three times as many queries per second.

NVIDIA Partnership Announcement

Alongside the Small 4 release, Mistral and NVIDIA announced a partnership at GTC 2026 to co-develop frontier open models. The partnership positions Mistral as NVIDIA's preferred open-model provider for the NemoClaw agentic framework and broader developer ecosystem.

This dual-track strategy — closed enterprise models (NemoClaw) alongside open community models (Mistral) — reflects NVIDIA’s shift toward a full-stack agentic AI platform play, where NVIDIA controls infrastructure and partners provide the model layer.

Why It Matters

Mistral Small 4’s Apache 2.0 license on a 119B model makes the open-source stack viable for production agentic AI. Prior open models at this scale required custom training or faced licensing restrictions. Small 4 unifies reasoning, multimodal, and code capabilities that previously required separate model deployments.

For teams building autonomous agents, this means deploying a single open model across multiple agent types — reasoning agents, code-generation agents, and multimodal agents — without vendor lock-in or per-token fees.
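In practice, backing several agent types with one model often comes down to varying only the system prompt per agent while every request targets the same deployment. A minimal sketch of that pattern follows; the model identifier and prompts here are illustrative assumptions, not official names:

```python
# Hypothetical identifiers: the model name and system prompts below are
# illustrative only; check your provider's model list for the real name.
MODEL = "mistral-small-4"

SYSTEM_PROMPTS = {
    "reasoning": "Think step by step before answering.",
    "code": "You are a code-generation assistant. Reply with code only.",
    "multimodal": "You may receive images alongside text. Describe them precisely.",
}

def build_request(agent_type: str, user_message: str) -> dict:
    """Build one chat-completion payload; the same model backs every agent type,
    so no per-agent deployment or separate license is needed."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPTS[agent_type]},
            {"role": "user", "content": user_message},
        ],
    }

req = build_request("code", "Write a function that reverses a string.")
```

The payload shape follows the common OpenAI-style chat format; only the serving layer changes if you later swap providers, which is the lock-in point this pattern avoids.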

The NVIDIA partnership signals that open-source infrastructure is no longer competing against proprietary closed models, but complementing them. NVIDIA runs enterprise proprietary models on Vera Rubin infrastructure. Developers run Mistral open models on commodity hardware and dev clouds. Both strategies coexist within a single platform narrative.