Microsoft on Thursday launched three foundational AI models built entirely in-house and, in a Bloomberg interview published the same day, Microsoft AI CEO Mustafa Suleyman said the company aims to deliver “state-of-the-art” capabilities across text, image, and audio by 2027. The explicit goal: reduce Microsoft’s dependence on OpenAI and Anthropic for its AI product stack.

The three models — MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 — are available immediately through Microsoft Foundry, according to the company’s announcement. They cover speech-to-text, text-to-speech, and image generation. Together, they represent the first output from Suleyman’s superintelligence team, which he formed six months ago to pursue what he calls “AI self-sufficiency.”

What the Models Do

Microsoft claims MAI-Transcribe-1 posts the lowest average Word Error Rate (WER) on the FLEURS benchmark across 25 languages, at 3.8%. According to VentureBeat, it beats OpenAI’s Whisper-large-v3 on all 25 languages, Google’s Gemini 3.1 Flash on 22 of 25, and ElevenLabs’ Scribe v2 on 15 of 25. Microsoft says batch transcription runs 2.5x faster than its existing Azure Fast offering. The model is already being tested inside Copilot Voice and Microsoft Teams.
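For readers unfamiliar with the metric behind the FLEURS comparison: WER counts word-level substitutions, deletions, and insertions against a reference transcript, divided by the reference length. A minimal sketch of the standard edit-distance formulation (illustrative only, not Microsoft’s evaluation code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution in a four-word reference:
print(word_error_rate("the quick brown fox", "the quick brown fax"))  # 0.25
```

A 3.8% average WER means roughly one word-level error per 26 words of reference transcript, averaged across the 25 languages.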

MAI-Voice-1 generates 60 seconds of audio in one second and supports custom voice creation from a few seconds of sample audio. Microsoft is pricing it at $22 per million characters. MAI-Image-2 debuted as a top-three model family on the Arena.ai leaderboard with 2x faster generation than its predecessor, priced at $5 per million tokens for text input and $33 per million tokens for image output. Microsoft is deploying it across Bing and PowerPoint.
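At the listed rates, rough cost estimates are simple arithmetic. A sketch using only the figures above (real workload costs depend on tokenization and per-image token counts, which the announcement doesn’t detail):

```python
# Published rates from Microsoft's announcement, in USD
VOICE_PER_MILLION_CHARS = 22.00   # MAI-Voice-1, per million characters
IMAGE_TEXT_PER_MILLION = 5.00     # MAI-Image-2, per million text input tokens
IMAGE_OUT_PER_MILLION = 33.00     # MAI-Image-2, per million image output tokens

def voice_cost(characters: int) -> float:
    """Cost of synthesizing the given number of characters with MAI-Voice-1."""
    return characters / 1_000_000 * VOICE_PER_MILLION_CHARS

def image_cost(text_tokens: int, image_tokens: int) -> float:
    """Cost of one MAI-Image-2 call: text input plus image output tokens."""
    return (text_tokens / 1_000_000 * IMAGE_TEXT_PER_MILLION
            + image_tokens / 1_000_000 * IMAGE_OUT_PER_MILLION)

# e.g. synthesizing a 5,000-character script:
print(f"${voice_cost(5_000):.2f}")  # $0.11
```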

Fewer Than 10 Engineers Per Model

Suleyman told VentureBeat that each model was built by teams of fewer than 10 people. “The audio model was built by 10 people, and the vast majority of the speed, efficiency and accuracy gains come from the model architecture and the data that we have used,” he said. The image team was equally small. He described the approach as “fewer people who are more empowered” operating in “an extremely flat structure.”

This contrasts sharply with Meta’s strategy of offering $100 million to $200 million compensation packages to attract top researchers, as Suleyman noted in a separate Bloomberg interview in December.

The Contract That Made It Possible

Until October 2025, Microsoft was contractually barred from independently pursuing artificial general intelligence under its original 2019 deal with OpenAI, according to VentureBeat. When OpenAI expanded its compute partnerships to SoftBank and others, Microsoft renegotiated. “Since then, we’ve been convening the compute and the team and buying up the data that we need,” Suleyman told VentureBeat. Microsoft retains license rights to OpenAI models through 2032 under the revised terms.

What This Means for Agent Builders

Microsoft’s entire enterprise agent infrastructure — Copilot, Azure AI Services, Agent 365, M365 Copilot integrations — currently runs on OpenAI and Anthropic models. If Microsoft delivers frontier-class large language models by 2027, the model supply chain for enterprise agents shifts. Azure-hosted agents, Copilot agents, and Power Platform automations could all run on Microsoft-native models with different pricing, latency, and capability profiles than OpenAI or Anthropic currently offer.

The timing matters. Microsoft’s stock just closed its worst quarter since the 2008 financial crisis, per CNBC, with investors questioning whether AI infrastructure spending will convert to revenue. Shipping competitive models built by 10-person teams at half the GPU cost of competitors is Suleyman’s first answer to that pressure.