Microsoft released MAI-Image-2-Efficient on April 14, a lower-cost variant of its flagship text-to-image model priced at $5 per million input tokens and $19.50 per million image output tokens. That output pricing is 41% cheaper than MAI-Image-2’s $33 rate, according to VentureBeat. The model is available now in Microsoft Foundry and MAI Playground with no waitlist.
Performance Numbers
The model runs 22% faster than its flagship sibling and achieves 4x greater GPU throughput on NVIDIA H100 hardware at 1024x1024 resolution, according to Microsoft’s announcement. Microsoft also claims p50 latency an average of 40% lower than Google’s Gemini 3.1 Flash, Gemini 3.1 Flash Image, and Gemini 3 Pro Image, per VentureBeat.
The original MAI-Image-2 debuted at #3 on the Arena.ai leaderboard for image model families, according to Microsoft. The efficient variant is built on the same architecture but engineered for throughput.
Why Agent Builders Should Pay Attention
The cost and latency profile matters most for autonomous workflows. AI agents handling marketing asset generation, e-commerce product photography, and UI mockup creation burn through image generation calls at volumes that make per-image cost a first-order constraint. Microsoft is explicitly positioning the model for these “high-volume production workflows,” per its blog post, as well as “real-time and conversational experiences” where latency during agent interactions directly affects user experience.
The two-model pricing strategy (flagship for high-fidelity creative work, efficient for volume production) mirrors what OpenAI and Anthropic have done with language models. VentureBeat noted Microsoft is “telling enterprise customers: use the efficient model for your assembly line, and the flagship for your showcase.”
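The assembly-line-versus-showcase split can be sketched as a simple routing rule. The model names come from the article; the task type, threshold, and routing criteria below are illustrative assumptions, not a Microsoft API.

```python
# Hedged sketch of the two-model routing pattern: bulk or
# latency-sensitive work goes to the efficient variant, one-off
# creative work to the flagship. All routing logic is assumed.
from dataclasses import dataclass

FLAGSHIP = "MAI-Image-2"
EFFICIENT = "MAI-Image-2-Efficient"

@dataclass
class ImageTask:
    kind: str                       # e.g. "product_catalog", "hero_banner"
    volume: int                     # images requested in this job
    latency_sensitive: bool = False # conversational/real-time surface?

def pick_model(task: ImageTask, volume_threshold: int = 100) -> str:
    """Route to the efficient model for volume or real-time work,
    and to the flagship for low-volume, high-fidelity creative work."""
    if task.latency_sensitive or task.volume >= volume_threshold:
        return EFFICIENT
    return FLAGSHIP

# A 5,000-image catalog job routes to the efficient model;
# a single hero image routes to the flagship.
```

The threshold here is arbitrary; a real orchestrator would likely also weigh quality requirements and per-surface latency budgets.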
The OpenAI Independence Angle
SiliconANGLE framed the release as part of Microsoft’s broader push to reduce reliance on OpenAI. MAI-Image-2-Efficient was developed by Microsoft’s MAI superintelligence team led by Mustafa Suleyman. The fast turnaround from flagship to efficient variant (weeks, not months) signals the team is executing at pace.
The model is also rolling out across Copilot and Bing, with additional product surfaces to follow. Combined with Microsoft’s recently launched MAI-Voice-1 and MAI-Transcribe-1, the company now has a multimedia AI stack that covers image, voice, and transcription without touching OpenAI’s API, per Microsoft.
The Infrastructure Economics Shift
For teams building agent pipelines that include image generation (product catalogs, automated reporting with charts, marketing content agents), the math just changed. A 41% cost reduction on the output token side means high-volume agent workflows that were borderline cost-prohibitive at flagship pricing may now pencil out. The 4x throughput improvement per GPU also matters at the infrastructure layer, where agent orchestrators need image generation to keep pace with the rest of the pipeline rather than becoming a bottleneck.
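The cost math above can be made concrete. The two output rates come from the article; the tokens-per-image figure is an assumption for illustration, since per-image token counts aren't published here.

```python
# Hedged sketch: per-batch output cost at both published rates.
# tokens_per_image is an illustrative assumption.
FLAGSHIP_RATE = 33.00   # $ per million image output tokens (MAI-Image-2)
EFFICIENT_RATE = 19.50  # $ per million image output tokens (MAI-Image-2-Efficient)

def batch_cost(images: int, tokens_per_image: int, rate_per_million: float) -> float:
    """Output-side cost in dollars for a batch of generated images."""
    return images * tokens_per_image * rate_per_million / 1_000_000

# Example: 100,000 images/month at an assumed 1,000 output tokens each
flagship = batch_cost(100_000, 1_000, FLAGSHIP_RATE)    # $3,300.00
efficient = batch_cost(100_000, 1_000, EFFICIENT_RATE)  # $1,950.00
savings = 1 - efficient / flagship                      # ~0.41, the ~41% cut
```

Whatever the true per-image token count, the ratio between the two rates (and hence the ~41% saving) holds at any volume.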