NVIDIA released Nemotron 3 Nano Omni on Tuesday, an open-weight multimodal model that processes video, audio, images, and text in a single architecture designed to power AI agents on edge hardware. The model has 30 billion parameters but activates only 3 billion per inference pass through a mixture-of-experts design, allowing it to run on a single GPU, according to the NVIDIA Blog.

NVIDIA claims 9x higher throughput than comparable open multimodal models at equivalent interactivity. The model leads six benchmarks spanning document intelligence, video understanding, and audio comprehension, according to The Next Web.

Why One Model Instead of Many

Most enterprise AI agent deployments currently stitch together separate models for vision, speech, and language understanding. Each handoff between models introduces latency and loses context. Nemotron 3 Nano Omni collapses that pipeline into a single unified architecture: vision, audio, and text tokens all flow through the same model, and a router activates 6 of 128 experts per token, so each modality engages different expertise without ever leaving the model, according to The Next Web.
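The 6-of-128 routing described above is standard top-k expert gating. The sketch below illustrates the mechanism only; the dimensions and gating function are hypothetical, not NVIDIA's implementation:

```python
import numpy as np

def route_tokens(token_states, gate_weights, k=6):
    """Top-k mixture-of-experts routing sketch.

    Each token's hidden state is scored against every expert by a
    learned gating matrix; only the k highest-scoring experts fire,
    and their outputs are blended by softmax weights.
    """
    logits = token_states @ gate_weights             # (tokens, n_experts)
    top_k = np.argsort(logits, axis=-1)[:, -k:]      # indices of chosen experts
    chosen = np.take_along_axis(logits, top_k, axis=-1)
    exp = np.exp(chosen - chosen.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)  # mixing weights, sum to 1
    return top_k, weights

rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 64))    # 4 tokens, toy hidden size of 64
gates = rng.standard_normal((64, 128))   # 128 experts, as in Nano Omni
experts, weights = route_tokens(hidden, gates, k=6)
print(experts.shape, weights.shape)      # (4, 6) (4, 6)
```

Because only the selected experts' weights participate in the forward pass, compute per token scales with the 6 active experts rather than all 128, which is how a 30B-parameter model inferences like a 3B one.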

“To build useful agents, you can’t wait seconds for a model to interpret a screen,” said Gautier Cloix, CEO of H Company. “By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings. This isn’t just a speed boost: It’s a fundamental shift in how our agents perceive and interact with digital environments in real time,” according to the NVIDIA Blog.

Architecture and Deployment

The model uses a hybrid Mamba-Transformer architecture with 23 Mamba-2 selective state-space layers, 23 mixture-of-experts layers, and 6 grouped-query attention layers. The vision encoder handles variable-resolution images with up to 13,312 visual patches. The audio encoder (Parakeet-TDT-0.6B-v2) processes speech and environmental audio. Video processing uses 3D convolutions to capture motion between frames, according to The Next Web.

The 3B active parameters at inference mean the model runs on hardware like NVIDIA’s DGX Spark and DGX Station workstations without requiring multi-GPU clusters, according to The Next Web.
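A rough back-of-envelope check (our arithmetic, not NVIDIA's figures) shows why this works: all 30B weights must still reside in memory, but only ~3B participate in compute per token, so precision and quantization determine whether the weights fit on one GPU:

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Approximate memory to hold model weights only (excludes
    KV cache, activations, and framework overhead)."""
    return n_params * bytes_per_param / 1024**3

TOTAL_PARAMS = 30e9   # full mixture-of-experts parameter count
ACTIVE_PARAMS = 3e9   # parameters touched per inference pass

for label, bytes_pp in [("FP16", 2), ("FP8", 1), ("4-bit", 0.5)]:
    print(f"{label}: weights ~{weight_memory_gb(TOTAL_PARAMS, bytes_pp):.1f} GB, "
          f"active per pass ~{weight_memory_gb(ACTIVE_PARAMS, bytes_pp):.1f} GB")
```

At FP16 the full weight set is roughly 56 GB, within reach of a single high-memory workstation GPU, and lower-precision formats shrink it further; the 3B active slice is what keeps per-token compute and bandwidth low.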

Nemotron 3 Nano Omni is available on Hugging Face, OpenRouter, and build.nvidia.com as an NVIDIA NIM microservice. Weights are open under NVIDIA’s Open Model Agreement with full commercial use rights, according to the NVIDIA Blog.

Early Adoption

Companies already adopting the model include Aible, Applied Scientific Intelligence, Eka Care, Foxconn, H Company, Palantir, and Pyler. Dell Technologies, Docusign, Infosys, Oracle, and Zefr are evaluating it, according to the NVIDIA Blog.

The Nemotron family (Nano, Super, and Ultra) has seen over 50 million downloads in the past year, according to SiliconANGLE. Nano Omni extends the family into multimodal and agentic domains, positioning NVIDIA as a direct competitor not just in AI infrastructure but in the models that run on it.