Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

New MAI models in Microsoft Foundry across text, image, voice, and speech

Continuing Microsoft AI momentum in Foundry

Since launching MAI-Image-2-Efficient, MAI-Image-2, MAI-Voice-1, and MAI-Transcribe-1 in Microsoft Foundry this spring, we've been laser-focused on one thing: giving developers the most complete first-party AI stack to build with.

Today, at Microsoft Build 2026, we're taking the next step. We're announcing the availability of new models from Microsoft AI (MAI) in Microsoft Foundry across 4 modalities:

  • Text/Reasoning: MAI-Thinking-1 is our first large language model, designed to deliver strong reasoning, math, and general intelligence at a fraction of the cost of other models.
  • Image: MAI-Image-2.5 is an updated image generation model that adds image-to-image editing and a suite of "control with preservation" capabilities, once again debuting at No. 3 on Arena.ai for image generation model families. We also have MAI-Image-2.5 Flash for a faster and more efficient option available in Foundry.
  • Voice: MAI-Voice-2 is an updated multilingual text-to-speech model that brings voice cloning and voice prompting to more than 15 languages. We also have MAI-Voice-2 Flash for a faster and more efficient option coming soon.
  • Speech: MAI-Transcribe-1.5 is an updated speech-to-text model that supports 43 total languages and adds content biasing and improved accuracy, retaining its #1 spot on the FLEURS benchmark [1].

These are the same models already powering experiences across Copilot, Bing, PowerPoint, and Azure Speech, and now they're available in Foundry for developers to build with.

Read on for a deeper look at each model and how to start building.

MAI-Thinking-1: Medium-size model that stands among the strongest in its weight class

MAI-Thinking-1 is MAI’s first large language model -- and it's purpose-built for the workloads enterprises run at scale. With MAI-Thinking-1, we’ve been listening to customer feedback on leading models and making a clear bet: deliver strong reasoning, math, and general intelligence at a price-performance point that makes high-volume, always-on AI workloads economically viable.

MAI-Thinking-1 uses a Mixture-of-Experts (MoE) architecture that selectively activates only the parts of the model needed for each request. The result: capability scales without compute scaling linearly. MAI-Thinking-1 is well-suited for enterprise use cases that often require deep context: analyzing long documents, complex multi-step reasoning, and processing extended agent traces without chunking and stitching.
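To make the "selectively activates" idea concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not MAI-Thinking-1's actual architecture, which this post does not describe. The toy experts, the hand-written gate scores, and the function names (`top_k_route`, `moe_forward`) are all illustrative assumptions. In a real MoE layer a learned router computes the gate scores from the input.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and blend their outputs.

    The experts that were not selected are never called, which is why
    compute grows with k rather than with the total number of experts.
    """
    routed = top_k_route(gate_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routed)
    return output, [i for i, _ in routed]


# Hypothetical toy experts: each just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical gate scores; a real router would compute these from x.
output, used = moe_forward(10.0, experts, gate_logits=[0.1, 2.0, -1.0, 2.0], k=2)
print(used, output)
```

With four experts and k=2, half of the expert parameters sit idle for this request, which is the intuition behind capability growing faster than per-request compute.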

MAI-Thinking-1 matches Claude Opus 4.6 on SWE-Bench Pro at substantially lower cost, while initial testing shows parity in preference with models such as Sonnet 4.6. We trained it from the ground up on clean data, without distillation from third-party models.

MAI-Image-2.5: Control with preservation for enterprise creative workflows

We’re also introducing the MAI-Image-2.5 family of models. This includes MAI-Image-2.5 for maximum fidelity and MAI-Image-2.5 Flash for fast, scalable production workloads. MAI-Image-2.5 debuted at No. 3 on Arena.ai and makes meaningful gains in text rendering, stylized illustration, and commercial imagery. Additionally, we're adding the editing surface enterprise creative teams have been asking for and optimizing for the way creative work actually gets done.

 

MAI-Image-2.5 introduces image-to-image editing with a suite of other capabilities that add control while preserving identity and brand:

  • Identity & character consistency: Preserves recognizable faces (plus hair, clothing, full-body identity) across stylization, pose, and layout changes — built for branded characters, spokespeople, and social campaigns.
  • Style & scene control: Applies full-frame restyling (anime, color grading, film grain, de-aging) and restructures shots by adding, removing, or repositioning objects and adjusting human pose and interactions.
  • Text, graphics & layout control: Generates typography, logos, and responsive text edits from natural cues ("make the text more rounded"), and produces PPT-ready infographics and slides with coherent hierarchy, alignment, and template adherence — including targeted edits like "convert to a 3-step flow."

These new features come with efficiency gains that we are passing directly to customers. Together, they deliver the best price-to-performance ELO in the market, giving customers the flexibility to optimize production image workflows for quality, speed, or cost. 

MAI-Voice-2 and MAI-Transcribe-1.5: A more accurate, multilingual audio stack

Voice and speech continue to be the primary interface for the next generation of AI agents, and with MAI-Voice-2 and MAI-Transcribe-1.5, we're closing some of the biggest gaps that have kept general models out of enterprise voice workflows.

MAI-Voice-2: One voice, many languages

MAI-Voice-2 adds two headline capabilities, identity preservation and voice prompting, with the expansion to 15+ languages in a single unified system:

  • Identity preservation recreates the unique vocal identity of a specific person, so the model can "speak as" that individual across markets – useful for consistent branded voices, localized spokesperson and celebrity campaigns, personalized digital assistants, and accessibility solutions.
  • Voice prompting takes a short audio sample as a reference for tone, emotion, accent, pacing, and speaking style and lets developers control delivery without managing a separate voice library.

Both capabilities now operate across all supported languages, so a single cloned voice or reference style carries naturally across markets without separate systems per language.

MAI-Transcribe-1.5: Faster and more accurate transcription

MAI-Transcribe-1.5 doubles down on the best-in-class speed and cost of MAI-Transcribe-1: it is now up to 5x more efficient than Gemini 3.1 Flash, ScribeV2, and gpt-4o-transcribe on the Artificial Analysis leaderboard. It also adds two highly requested capabilities:

  • Entity biasing primes the model with domain context – names, brand terms, industry vocabulary – so it transcribes specialized words correctly instead of guessing the closest common spelling. This was a heavily requested feature from our customers, and a long-standing failure mode for general speech models in sports, business, medical, and technical workflows.
  • Improved accuracy holds up in the conditions that enterprises operate in every day (cross-talk, background noise, and long-form meetings), where general models tend to drift. On FLEURS, the standard multilingual benchmark across 25 languages, Word Error Rate (WER) improved from 3.9% to 3.7%, maintaining our position as the most accurate model on this benchmark [1].
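For readers less familiar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below is a minimal illustration of that standard definition. It is not the FLEURS scoring pipeline, which may also normalize text before scoring. The function names and sample sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction of the old error rate that was eliminated."""
    return (old_rate - new_rate) / old_rate


# Illustrative sentences (not from any benchmark): one substituted word in six.
sample_wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")

# The figures quoted above: 3.9% down to 3.7%.
fleurs_gain = relative_reduction(3.9, 3.7)
print(f"sample WER: {sample_wer:.3f}, relative reduction: {fleurs_gain:.1%}")
```

Read this way, the move from 3.9% to 3.7% is a 0.2-point absolute drop, or roughly a 5% relative reduction in errors.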

Try them today

Try the models today in Microsoft Foundry:

  • MAI-Thinking-1: In private preview, request access here.
  • MAI-Image-2.5: Available directly in the Foundry Model Catalog. Pricing starts at $5 USD per 1M tokens for text input, $8 USD per 1M tokens for image input, and $47 USD per 1M tokens for image output.
  • MAI-Image-2.5 Flash: Available directly in the Foundry Model Catalog. Pricing starts at $1.75 USD per 1M tokens for text and image input and $33 USD per 1M tokens for image output.
  • MAI-Voice-2: Available through Azure Speech. Pricing starts at $22 USD per 1M characters.
  • MAI-Transcribe-1.5: Available through Azure Speech. Pricing starts at $0.36 USD per hour.
  • Experiment in MAI Playground: Try MAI models at the MAI Playground.
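As a rough planning aid, the list prices above can be turned into a back-of-the-envelope estimator. This is a sketch only. The rates are the "starts at" figures quoted in this post, so actual billing may differ. The token counts, character counts, and hours in the example are hypothetical, since the post does not say how many tokens an image consumes.

```python
# "Starts at" list prices quoted above, in USD.
IMAGE_PRICES_PER_MILLION_TOKENS = {
    "MAI-Image-2.5": {"text_in": 5.00, "image_in": 8.00, "image_out": 47.00},
    "MAI-Image-2.5 Flash": {"text_in": 1.75, "image_in": 1.75, "image_out": 33.00},
}
VOICE_PRICE_PER_MILLION_CHARS = 22.00   # MAI-Voice-2
TRANSCRIBE_PRICE_PER_HOUR = 0.36        # MAI-Transcribe-1.5


def estimate_image_cost(model, text_in=0, image_in=0, image_out=0):
    """Estimate USD cost for one image workload, given token counts."""
    rates = IMAGE_PRICES_PER_MILLION_TOKENS[model]
    return (text_in * rates["text_in"]
            + image_in * rates["image_in"]
            + image_out * rates["image_out"]) / 1_000_000


def estimate_voice_cost(characters):
    """Estimate USD cost to synthesize the given number of characters."""
    return characters * VOICE_PRICE_PER_MILLION_CHARS / 1_000_000


def estimate_transcribe_cost(audio_hours):
    """Estimate USD cost to transcribe the given hours of audio."""
    return audio_hours * TRANSCRIBE_PRICE_PER_HOUR


# Hypothetical workload: token counts below are assumptions, not published figures.
for model in IMAGE_PRICES_PER_MILLION_TOKENS:
    cost = estimate_image_cost(model, text_in=1_000, image_out=4_000)
    print(f"{model}: ${cost:.4f} per request")

print(f"Voice, 50,000 characters: ${estimate_voice_cost(50_000):.2f}")
print(f"Transcription, 10 hours: ${estimate_transcribe_cost(10):.2f}")
```

Because image output is the most expensive line item for both image models, output token volume is likely to dominate any estimate built this way.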

 

References

[1] 1st on overall WER on the FLEURS benchmark. Out of the top 25 global languages, MAI-Transcribe-1.5 ranks 1st by FLEURS in 11 core languages. It wins against Whisper-large-v3 on the remaining 14 and Gemini 3.1 Flash on 11 of those 14.


Building Premium Android Experiences at Google I/O ’26

Posted by Ataul Munim, Android Developer Relations Engineer
A truly differentiated Android experience is about delivering premium delight wherever your users are. At Google I/O ’26, we showcased how the latest advancements in the Android ecosystem can help you elevate your app's quality while maximizing development efficiency.

To help you build apps that stand out, we're diving into the key tools and libraries designed to optimize your core performance, extend the surfaces of your app to other devices, and streamline how your app handles high-quality media. 

Here is a recap of the essential updates and sessions you need to know to deliver a next-level experience across form factors!

Maximize app performance and ROI with the R8 Configuration Analyzer

A premium experience is only as good as its foundation, and a performant foundation is what allows your app to scale across the Android ecosystem. This is especially true with the release of Android 17, which introduces conservative, device RAM-based app memory limits to target extreme memory leaks and outliers before they cause system-wide instability. To stay below these new system thresholds and prevent your app from being terminated, having a lean footprint is no longer optional: it’s a critical requirement.

This year, we’re making it easier to build highly optimized, fast apps by introducing the R8 Configuration Analyzer in Android Studio. R8 is your most powerful tool for improving app performance, but its effectiveness is often limited by overly broad "keep rules" that prevent the compiler from stripping away unused code. The new Configuration Analyzer provides optimization, obfuscation, and shrinking scores, allowing you to identify specific rules that are preventing the benefits of R8 optimization.

By optimizing their R8 configurations, developers at Monzo achieved a 30% improvement in cold starts and a 35% reduction in ANRs. Smaller, faster code isn't just about efficiency; it's about ensuring your app has the memory headroom to deliver delight on every form factor, from the phone to the car.

Extend your reach with a unified approach to Widgets on Phones, Watches and Cars

User interaction is shifting toward quick, glanceable moments—short bursts of information that keep users connected without needing to open the full app. To help you increase the reach of your app content, we are unifying the development experience across the Android ecosystem with Jetpack Glance. By using a consistent, Compose-based model, you can elevate the content most important to your users straight to the phone’s home screen, Wear Widgets (previously Tiles!), and cars with a familiar workflow.

To help users engage with your content and features, even outside your app, we are making widgets more expressive and adaptive with RemoteCompose. On Wear OS, RemoteCompose allows you to use the Compose tools you’re already comfortable with to define UI logic that renders natively on remote surfaces, ensuring that your glanceable experiences remain highly performant and responsive even on resource-constrained hardware. On mobile and cars, RemoteCompose serves as a new framework that gives Widgets new expressive capabilities.

You can use Jetpack Glance (together with RemoteCompose on Wear) to deliver a cohesive user journey. Whether it’s viewing flight status on the car dashboard, checking a gate change on a watch, or managing a boarding pass from a phone widget, this shared approach maximizes your app’s presence while keeping your development effort focused and efficient.

Supercharge your media pipeline with a complete, production-ready toolkit

Android has become a world-class home for the entire media lifecycle, and we are simplifying the journey from the first capture to the final playback. By leveraging Jetpack CameraX and Media3, you can build professional-grade experiences that feel native across the entire ecosystem. 

It starts with high-fidelity capture using the CameraXViewfinder Composable, which ensures your preview remains perfectly scaled and responsive on any form factor, including foldables and tablets. Use this to build adaptive capture experiences, such as a picture-in-picture view for multi-tasking, or experiences that take advantage of modern features like high-frame-rate or slow-motion capture with CameraX v1.5.

The new Media3 AI Effects library will provide a unified interface for premium features like Image & Video Enhance, Magic Eraser, and Studio Sound. This allows you to focus on the creative intent while Media3 handles the heavy lifting of choosing the most efficient and reliable path for the device. Then, use the latest improvements in multi-asset editing with Media3 Transformer to composite your edited videos together!

Complete the pipeline with tools designed for professional-grade export and viewing, including:

  • CodecDB, which offers data-driven encoding recommendations tailored to specific chipsets, ensuring your exported videos maintain high visual quality with minimal noise or blurriness
  • Scrubbing Mode in ExoPlayer to provide the buttery-smooth seeking experience users expect from premium media apps
  • Enhanced Cast support with the new CastPlayer API in Media3

By unifying these technical pillars, you can build a cohesive, high-performance media journey that delivers both delight for your users and high ROI for your development team.

For more details, check out the premium Android experience YouTube playlist.


Announcing the new Work IQ APIs

Work IQ is a new intelligence layer for Microsoft 365, designed to understand how work gets done across your organization.

The post Announcing the new Work IQ APIs appeared first on Microsoft 365 Blog.


Introducing Microsoft Scout: Your always-on personal agent

Microsoft Scout is integrated across the Microsoft 365 apps you use every day, keeping it grounded in your flow of work.

The post Introducing Microsoft Scout: Your always-on personal agent appeared first on Microsoft 365 Blog.


Frontier Tuning: Teaching AI to work the way you do

Today at Microsoft Build, we introduced Frontier Tuning, a new approach to making AI work the way your business does by applying reinforcement learning inside your compliance boundary with your own data, processes, and conventions. We’re announcing private preview, available through Forward Deployed Engineers, and upcoming availability in Microsoft Copilot Studio and Microsoft Foundry. 

Inside Frontier Tuning  

Frontier Tuning has three parts that work together: the environment where learning happens; the unique inputs you provide from your own business; and the tuned output models, skills, and harness that the system produces. 

  1. A continuously evolving environment. Tuning runs in a managed Reinforcement Learning Environment (RLE) used both for post-training and inference. During training, the system learns from real workflows, tool usage, and eval signals without affecting production systems. At inference it explores multiple frontier and fine-tuned models, from Microsoft AI and OpenAI, across turns to find stronger candidate paths before returning an answer. The system improves continuously as it learns from each interaction.
  2. Your company’s data, domain knowledge, and workflows, in one platform. You bring your business data and know-how into the RLE: content, processes, conventions, terminology, and workflows that collectively define how your business runs. The experience is built to be easy to use, with no need for a data science degree. With a simple, guided approach, teams can bring data in and start tuning right away, enabling more people within your organization to capture the power of tuning.
  3. Tuned models, skills, and harness that stay within your compliance boundary. This system produces tuned models, embeddings, skills, orchestration logic, and a runtime harness. All of this runs on your data, with your controls, without leaving your compliance boundary. The models inherit your access controls, so only people who could see the underlying data can access models built from it. The tools are virtualized, so agents can improve without affecting production systems.

Together, these three pieces form a loop that gets sharper with every agent interaction. As knowledge in your environment grows, the models and harness evolve with it, so your agents keep getting better at the work you actually do. 
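One way to picture the access-control inheritance described in the third part is as a set intersection: a person can use a tuned model only if they could already read every data source it was built from. The sketch below illustrates that reading. The post does not describe how Frontier Tuning actually enforces this, so the intersection rule, the `DataSource` type, and the sample users are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """A hypothetical data source and the users allowed to read it."""
    name: str
    readers: frozenset


def model_readers(sources):
    """Users allowed to use a model tuned on the given sources.

    Assumption: access is the intersection of the sources' reader sets,
    so nobody gains visibility into data they could not already see.
    """
    if not sources:
        return frozenset()
    allowed = set(sources[0].readers)
    for source in sources[1:]:
        allowed &= source.readers
    return frozenset(allowed)


# Hypothetical sources and users.
hr_policies = DataSource("hr_policies", frozenset({"ana", "ben", "chen"}))
payroll = DataSource("payroll", frozenset({"ana", "chen"}))

allowed_users = model_readers([hr_policies, payroll])
print(sorted(allowed_users))
```

Under this rule, adding a more restricted source to a tuning run can only shrink the set of people who may use the resulting model.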

Fits the way you operate 

Frontier Tuning fits into how you already build and operate agents. Users interact with agents tuned on your company’s data and workflows. Makers and developers build and refine these agents in the tools you’re already using, including Microsoft Copilot Studio and Microsoft Foundry.  

For example, soon within Copilot Studio, you will be able to access the RLE and use data like transcripts, knowledge bases and Microsoft 365 artifacts to improve the agents you already rely on. We’re also bringing capabilities to Foundry to allow developers to tune agents alongside the tools you already use. Here too, you’ll be able to set up an RLE, bring in your data, and tune models, including Microsoft AI models, and runtime behavior. More details on Foundry support will come in the coming months. And today, Frontier Tuning is available in Private Preview through our Forward Deployed Engineering (FDE) team. FDEs can partner with you end to end: defining the scenario, setting eval criteria, running the tuning process and delivering the agent, all within your environment.  

 Whether you build in Copilot Studio, develop in Foundry, or partner with an FDE, Frontier Tuning fits how your business already operates.  

Frontier Tuning in action 

Frontier Tuning is already in the hands of customers. We’ve partnered with a focused set of organizations including Land O’Lakes, EY, Bristol Myers Squibb, Pearson, McKinsey, McCarthy Tétrault, and the Josh Bersin Company. 

“Microsoft Frontier Tuning enabled us to generate significantly better Copilot outputs for Communication Coach. The results were more closely aligned with Pearson’s learning science, giving learners clearer, more actionable feedback on how to strengthen workplace communication,” said Gian Paolo Perrucci, Product & Technology Officer, Pearson.

“Microsoft Frontier Tuning is set to transform the tax practice across the global EY organization. By combining a tax-domain–tuned reasoning LLM with our extensive enterprise knowledge and insights from our Tax Advisors, EY is elevating the delivery of tax services. Leveraging client context in Microsoft Work IQ and deep EY expertise, we are tuning an advisory agent within the RLE that will be deployed to 75,000 tax professionals globally in the coming period.” – Ben Ambrosino, EY Global Tax CTO, EY 

“Microsoft Frontier Tuning has given us a powerful way to bring Galileo’s research-backed HR intelligence into bespoke agents inside the Copilot experience. It is one of the most compelling capabilities we have seen for putting deep domain expertise into the daily flow of work.” – Josh Bersin, Founder and CEO, The Josh Bersin Company 

“With Frontier Tuning, we’re teaching the system how Microsoft HR works – capturing organizational knowledge in one connected environment that learns and improves with every use. We partnered with our product teams until the results were undeniable; successful task completion increased from 13% to 87%. Now we’re expanding to more HR workflows.” – Nathalie D’Hers, CVP Employee Experience

The pattern across these engagements is consistent: when you teach the system how your organization actually works, you get much higher fidelity output and more predictable execution. 

Get started 

If you’re interested in Frontier Tuning, visit aka.ms/frontiertuning to learn more. We will follow up with additional details. 

The post Frontier Tuning: Teaching AI to work the way you do appeared first on Microsoft 365 Developer Blog.


Five ways we’re confusing AI capability and AI reality—and how to bridge the gap

On the evening of June 1 at San Francisco’s Bartlett Hall, Microsoft CTO Kevin Scott spoke at a joint event with Lectures on Tap, attended by approximately 150 developers, founders, media, and tech industry leaders. His talk focused on what he described as the growing perceptual disconnect between AI capability and AI reality: the tendency to mistake rapid advances in model performance for equally rapid progress in deployment, organizational transformation, trust, and real-world value creation. 

Scott argued that the AI industry is at an inflection point where technical breakthroughs are arriving faster than institutions, workflows, and human systems can absorb them. While acknowledging the extraordinary pace of progress in areas like software development and agentic systems, he emphasized that the difficult challenge ahead is operationalizing these capabilities responsibly and meaningfully at scale. 

Here are his five observations of the ways in which AI reality is diverging from apparent capability. 

1. Capability ≠ deployment 

According to Scott, one of the biggest mistakes people are making right now is confusing technical capability with real-world deployment. The fact that a model can do something impressive doesn’t mean the surrounding systems, economics, governance, and human behaviors are ready to absorb it at scale. 

> We just shouldn’t have uniform faith that, as AI model capabilities improve, we’re going to get this crazy fast deployment everywhere.

“Today’s AI models are actually more capable than the things we’re using them for in the real world,” he said, addressing today’s “capability overhang,” as he has dubbed it. “We just shouldn’t have uniform faith that, as AI model capabilities improve, we’re going to get this crazy fast deployment everywhere.” 

2. Closed feedback loops ≠ universal progress 

Scott explained that some areas of AI (like agentic software development) are improving extraordinarily quickly because tight feedback loops allow those systems to iterate, evaluate, and refine outputs at high speed. But that dynamic doesn’t automatically extend to domains constrained by physical systems, regulation, or long experimental cycles.

> Tight feedback loops don’t automatically extend to domains constrained by physical systems.

“One of the things models can already do is postulate new ideas for particle physics experiments,” he said. “And the problem with particle physics experiments is that they take a lot of expert technical labor to set up and run, and they require the use of extremely expensive infrastructure. So there really isn’t a convenient way—other than publications in the scientific literature—to get the output of those experiments and feed it back into an actual model.” 

3. Software velocity ≠ organizational velocity 

AI is dramatically accelerating software development, but that doesn’t mean organizations can suddenly move faster. In many cases, speeding up code generation simply exposes the slower-moving bottlenecks that were already present: deployment, integration, governance, and organizational change.

“I build a lot of prototypes that are greenfield, where I have no constraints whatsoever,” noted Scott. “I just get an idea and there’s nothing stopping me from using an agentic coding system to produce a brand-new thing. But in many cases, the things we want to produce are fairly highly constrained.” 

> When things are moving this fast, it’s hard for people to notice the change and snap to.

Scott pointed to last-mile problems, the need for a lot of plumbing work, and human psychology as throttling issues. He also acknowledged the forecasting problem we will inevitably face as things move exponentially faster: “A lot of the stuff I’m doing right now using agentic coding to build things wasn’t even possible in November of last year,” he said. “When things are moving this fast, it’s hard for people to notice the change and snap to.” 

4. Activity ≠ value 

“Just because you’re using AI to create a lot of activity doesn’t necessarily mean that the activity you’re creating is valuable,” Scott said. 

The ability to generate enormous amounts of output doesn’t guarantee meaningful impact. As AI lowers the cost of creation, the defining question shifts from “How much can we produce?” to “What is actually worth building?”

> We have to pay close attention to how we measure value.

“We can have a lot of output, we can build more complex things than we built before,” added Scott. “That doesn’t necessarily mean that the things we’re building are super valuable. When they go into a user’s hand, are they solving a real problem? As developers, we have to pay especially close attention to how we measure value and to the feedback we get on the work that we’re doing.” 

5. Autonomy ≠ trust 

AI systems are becoming increasingly capable of operating autonomously, but autonomy alone does not create trust. Real-world deployment still requires governance, identity, access control, transparency, and meaningful human oversight. 

> That’s a new way of thinking about software.

“You’re always going to have human oversight, so this notion of autonomy is a little bit of a pipe dream,” Scott noted. “You have to build systems doing complex things in a way where people can trust that they’re doing them correctly and in a way that’s aligned with their interests and values. And that’s a new way of thinking about software.” 

Bridging the gap between capability and reality 

Ultimately, said Scott, “There’s a lot of work for all of us to do over the next months and years to fully unlock the potential of this crazy tool that we’ve built collectively. These problems that I enumerated don’t go away just as a function of scaling up an AI model. There is no silver bullet. That means there’s a bunch of technical work to be done, a bunch of societal work, a bunch of organizational work, and just dealing with legacy systems and plumbing.” 

AI capability gains will continue, but turning those gains into trusted systems that create meaningful, durable value is the harder and more important work. And that work starts today. 

“We need to engage more intensely than ever before,” said Scott, “because we can see the promise of this technology to benefit the world if we’re able to overcome these obstacles.” 

The post Five ways we’re confusing AI capability and AI reality—and how to bridge the gap appeared first on Command Line.
