Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Maia 200: The AI accelerator built for inference - The Official Microsoft Blog


Today, we’re proud to introduce Maia 200, a breakthrough inference accelerator engineered to dramatically improve the economics of AI token generation. Maia 200 is an AI inference powerhouse: an accelerator built on TSMC’s 3nm process with native FP8/FP4 tensor cores, a redesigned memory system with 216GB of HBM3e at 7 TB/s and 272MB of on-chip SRAM, plus data movement engines that keep massive models fed, fast and highly utilized. This makes Maia 200 the most performant first-party silicon from any hyperscaler, with three times the FP4 performance of the third-generation Amazon Trainium and FP8 performance above Google’s seventh-generation TPU. Maia 200 is also the most efficient inference system Microsoft has ever deployed, with 30% better performance per dollar than the latest-generation hardware in our fleet today.

Maia 200 is part of our heterogeneous AI infrastructure and will serve multiple models, including the latest GPT-5.2 models from OpenAI, bringing a performance-per-dollar advantage to Microsoft Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team will use Maia 200 for synthetic data generation and reinforcement learning to improve next-generation in-house models. For synthetic data pipeline use cases, Maia 200’s unique design helps accelerate the rate at which high-quality, domain-specific data can be generated and filtered, feeding downstream training with fresher, more targeted signals.

Maia 200 is deployed in our US Central datacenter region near Des Moines, Iowa, with the US West 3 datacenter region near Phoenix, Arizona, coming next and future regions to follow. Maia 200 integrates seamlessly with Azure, and we are previewing the Maia SDK, a complete set of tools to build and optimize models for Maia 200, including PyTorch integration, a Triton compiler and optimized kernel library, and access to Maia’s low-level programming language. This gives developers fine-grained control when needed while enabling easy model porting across heterogeneous hardware accelerators.


Engineered for AI inference

Fabricated on TSMC’s cutting-edge 3-nanometer process, each Maia 200 chip contains over 140 billion transistors and is tailored for large-scale AI workloads while also delivering efficient performance per dollar. It is designed for the latest models using low-precision compute, with each chip delivering over 10 petaFLOPS of 4-bit (FP4) performance and over 5 petaFLOPS of 8-bit (FP8) performance, all within a 750W SoC TDP envelope. In practical terms, Maia 200 can effortlessly run today’s largest models, with plenty of headroom for even bigger models in the future.

A close-up of the Maia 200 AI accelerator chip.

Crucially, FLOPS aren’t the only ingredient for faster AI; keeping the compute fed with data is equally important. Maia 200 attacks this bottleneck with a redesigned memory subsystem centered on narrow-precision datatypes, a specialized DMA engine, on-die SRAM and a dedicated NoC fabric for high-bandwidth data movement, increasing token throughput.
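
As a back-of-envelope illustration of why that balance matters, here is a quick roofline-style check using only the headline figures quoted in this article (over 10 petaFLOPS of FP4, 7 TB/s of HBM3e bandwidth); the model size and the interpretation below are our own assumptions, not published Maia 200 results:

```python
# Roofline-style sketch built from the figures quoted in this article.
# The specs are quoted peaks; the model size below is a hypothetical.
PEAK_FP4 = 10e15        # FLOP/s: "over 10 petaFLOPS" of FP4 per chip
HBM_BW = 7e12           # B/s:    216GB of HBM3e at 7 TB/s

# Arithmetic intensity (FLOPs per byte) needed before HBM stops being
# the bottleneck: roughly 1,400 FLOPs for every byte fetched.
print(f"break-even intensity: {PEAK_FP4 / HBM_BW:.0f} FLOPs/byte")

# Single-batch decode streams every weight once per token (~2 FLOPs per
# parameter, 0.5 bytes per FP4 weight), an intensity of only ~4 FLOPs/byte.
# Decode throughput is therefore bandwidth-bound, which is why the SRAM,
# DMA engines and NoC matter as much as the tensor cores.
params = 1e12                       # hypothetical 1T-parameter model
print(f"decode ceiling: {HBM_BW / (params * 0.5):.0f} tokens/s")
```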

A table with the title “Industry-leading capability” shows peak specifications for Azure Maia 200, AWS Trainium 3 and Google TPU v7.

Optimized AI systems

At the systems level, Maia 200 introduces a novel, two-tier scale-up network design built on standard Ethernet. A custom transport layer and a tightly integrated NIC unlock high performance, strong reliability and significant cost advantages without relying on proprietary fabrics.

Each accelerator exposes:

  • 2.8 TB/s of bidirectional, dedicated scale-up bandwidth
  • Predictable, high-performance collective operations across clusters of up to 6,144 accelerators

This architecture delivers scalable performance for dense inference clusters while reducing power usage and overall TCO across Azure’s global fleet.
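
To put that per-accelerator bandwidth in perspective, the textbook ring all-reduce cost model (bandwidth term only) applied to the quoted figures looks like the sketch below; the per-direction link split, the message size and the ring algorithm itself are illustrative assumptions, not a description of Maia’s actual collectives:

```python
# Textbook ring all-reduce estimate using the figures in this article.
# Assumes the 2.8 TB/s bidirectional figure means ~1.4 TB/s each way.
LINK_BW = 1.4e12            # B/s per direction (assumption)
N = 6_144                   # accelerators in the largest quoted cluster

def ring_all_reduce_seconds(message_bytes: float) -> float:
    # Each rank moves 2 * (N - 1) / N of the message over its slowest link;
    # latency and protocol overheads are ignored in this sketch.
    return 2 * (N - 1) / N * message_bytes / LINK_BW

gib = 2**30
print(f"1 GiB all-reduce: ~{ring_all_reduce_seconds(gib) * 1e6:.0f} us")
```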

Within each tray, four Maia accelerators are fully connected with direct, non-switched links, keeping high-bandwidth communication local for optimal inference efficiency. Intra-rack and inter-rack networking use the same Maia AI transport protocol, enabling seamless scaling across nodes, racks and clusters of accelerators with minimal network hops. This unified fabric simplifies programming, improves workload flexibility and reduces stranded capacity while maintaining consistent performance and cost efficiency at cloud scale.

A top-down view of the Maia 200 server blade.

A cloud-native development approach

A core principle of Microsoft’s silicon development programs is to validate as much of the end-to-end system as possible ahead of final silicon availability.

A sophisticated pre-silicon environment guided the Maia 200 architecture from its earliest stages, modeling the computation and communication patterns of LLMs with high fidelity. This early co-development environment enabled us to optimize silicon, networking and system software as a unified whole, long before first silicon.

We also designed Maia 200 for fast, seamless availability in the datacenter from the beginning, building out early validation of some of the most complex system elements, including the backend network and our second-generation, closed-loop liquid-cooling Heat Exchanger Unit (HXU). Native integration with the Azure control plane delivers security, telemetry, diagnostics and management capabilities at both the chip and rack levels, maximizing reliability and uptime for production-critical AI workloads.

As a result of these investments, AI models were running on Maia 200 silicon within days of first packaged part arrival. Time from first silicon to first datacenter rack deployment was reduced to less than half that of comparable AI infrastructure programs. And this end-to-end approach, from chip to software to datacenter, translates directly into higher utilization, faster time to production and sustained improvements in performance per dollar and per watt at cloud scale.

A view of the Maia 200 rack and the HXU cooling unit.

Sign up for the Maia SDK preview

The era of large-scale AI is just beginning, and infrastructure will define what’s possible. Our Maia AI accelerator program is designed to be multi-generational. As we deploy Maia 200 across our global infrastructure, we are already designing future generations, and we expect each one to set new benchmarks and deliver ever better performance and efficiency for the most important AI workloads.

Today, we’re inviting developers, AI startups and academics to begin exploring early model and workload optimization with the new Maia 200 software development kit (SDK). The SDK includes a Triton compiler, support for PyTorch, low-level programming in NPL, and a Maia simulator and cost calculator to optimize for efficiencies earlier in the code lifecycle. Sign up for the preview here.
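
The post doesn’t include sample code, but because the SDK compiles Triton, a standard open-source Triton kernel gives a feel for the programming model. The kernel below is ordinary Triton that runs today on supported GPUs; treating it as portable to Maia 200 unchanged is our assumption, not something the announcement states:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # One program instance per BLOCK_SIZE-wide tile of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements              # guard the ragged last tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)           # enough programs to cover n
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```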

Get more photos, video and resources on our Maia 200 site and read more details.

Scott Guthrie is responsible for hyperscale cloud computing solutions and services including Azure, Microsoft’s cloud computing platform, generative AI solutions, data platforms and information and cybersecurity. These platforms and services help organizations worldwide solve urgent challenges and drive long-term transformation.

Tags: AI, Azure, datacenters


Who Will Adapt Best to AI Disruption?

From: AIDailyBrief
Duration: 9:00
Views: 365

Brought to you by:
KPMG – Go to www.kpmg.us/ai to learn more about how KPMG can help you drive value with our AI solutions.
Vanta - Simplify compliance - https://vanta.com/nlw

The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Join our Discord: https://bit.ly/aibreakdown


973: The Web’s Next Form: MCP UI (with Kent C. Dodds)


Scott and Wes sit down with Kent C. Dodds to break down MCP, context engineering, and what it really takes to build effective AI-powered tools. They dig into practical examples, UI patterns, performance tradeoffs, and whether the future of the web lives in chat or the browser.
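
If MCP is new to you, the one-sentence version before the timestamps: an MCP server exposes tools over a standard protocol that AI clients can discover and call, and MCP UI extends that with renderable interfaces. As a minimal sketch, here is a toy server built with the official `mcp` Python SDK; the `weather` tool and its canned response are made up for illustration (the episode itself focuses on the web/TypeScript side):

```python
# Toy MCP server using the official Python SDK (pip install "mcp").
# The weather tool is a made-up example with a canned response.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def weather(city: str) -> str:
    """Return a (canned) weather report for a city."""
    # A real server would call a weather API; canned keeps this self-contained.
    return f"It is sunny in {city} today."

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio so an agent/client can connect
```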

Show Notes

  • 00:00 Welcome to Syntax!
  • 00:44 Introduction to Kent C. Dodds
  • 02:44 What is MCP?
  • 03:28 Context Engineering in AI
  • 04:49 Practical Examples of MCP
  • 06:33 Challenges with Context Bloat
  • 08:08 Brought to you by Sentry.io
  • 09:37 Why not give AI API access directly?
  • 12:28 How is an MCP different from Skills
  • 14:58 MCP optimizations and efficiency levers
  • 16:24 MCP UI and Its Importance
  • 19:18 Where are we at today with MCP
  • 24:06 What is the development flow for building MCP servers?
  • 27:17 Building out an MCP UI.
  • 29:29 Returning HTML, when to render.
  • 36:17 Calling tools from your UI
  • 37:25 What is Goose?
  • 38:42 Are browsers cooked? Is everything via chat?
  • 43:25 Remix3
  • 47:21 Sick Picks & Shameless Plugs

Sick Picks

Shameless Plugs

Hit us up on Socials!

Syntax: X Instagram Tiktok LinkedIn Threads

Wes: X Instagram Tiktok LinkedIn Threads

Scott: X Instagram Tiktok LinkedIn Threads

Randy: X Instagram YouTube Threads





Download audio: https://traffic.megaphone.fm/FSI8670091234.mp3

Stop Teaching and Start Doing—The Secret to Agile Adoption in Construction | Felipe Engineer-Manriquez



Read the full Show Notes and search through the world's largest audio library on Agile and Scrum directly on the Scrum Master Toolbox Podcast website: http://bit.ly/SMTP_ShowNotes.


"I forgot a couple key things. Number one, they don't have the enthusiasm and love for these new ways of working like I do because they didn't understand the problem that they were in." - Felipe Engineer-Manriquez


Felipe shares a powerful failure story from his early days adopting Lean and Agile in construction. After discovering Jeff Sutherland's "Red Book" and experiencing incredible results using Scrum with his 4-year-old son on a weekend project, he was eager to bring these methods to his construction team. The problem? He immediately went into teaching mode. His boss Nate and the rest of the team wanted nothing to do with Scrum—they Googled it, saw it was "a software thing," and shut down completely. This is what Felipe now calls the "Not Invented Here Syndrome"—people resist ideas that don't originate from their domain. The breakthrough came when Felipe stopped teaching and started doing. He calls it the "ninja Scrum approach"—embodying the processes and tools without labeling them, making work visible, and delivering results. 

When he managed $25 million worth of scopes using these methods silently, one project manager named Tom stopped him and said, "We've never come to a project where people held their promises." Within a year, even his resistant boss Nate acknowledged the transformation in a post-mortem review. The lesson: don't teach until people pull for the teaching.


In this episode, we refer to NoEstimates and Scrum: The Art of Doing Twice the Work in Half the Time by Jeff Sutherland.


Self-reflection Question: When you introduce new practices to a team, do you wait until they pull for the teaching, or do you default to explaining before they've seen the value?


[The Scrum Master Toolbox Podcast Recommends]

🔥In the ruthless world of fintech, success isn't just about innovation—it's about coaching!🔥

Angela thought she was just there to coach a team. But now, she's caught in the middle of a corporate espionage drama that could make or break the future of digital banking. Can she help the team regain their mojo and outwit their rivals, or will the competition crush their ambitions? As alliances shift and the pressure builds, one thing becomes clear: this isn't just about the product—it's about the people.


🚨 Will Angela's coaching be enough? Find out in Shift: From Product to People—the gripping story of high-stakes innovation and corporate intrigue.


Buy Now on Amazon


[The Scrum Master Toolbox Podcast Recommends]


About Felipe Engineer-Manriquez


Felipe Engineer-Manriquez is a best-selling author, international speaker, and host of The EBFC Show. A force in Lean and Agile, he helps teams build faster with less effort. Felipe trains and coaches changemakers worldwide—and wrote Construction Scrum to make work easier, better, and faster for everyone.


You can link with Felipe Engineer-Manriquez on LinkedIn.


You can also find Felipe at thefelipe.bio.link, check out The EBFC Show podcast, and join the EBFC Scrum Community of Practice.






Download audio: https://traffic.libsyn.com/secure/scrummastertoolbox/20260126_Felipe_Engineer_M.mp3?dest-id=246429

Episode 100: The 17th Annual Customer Choice Awards


The votes are in, and the time has come to reveal the winners of the 17th Annual Trader Joe's Customer Choice Awards! Seventeen years of celebrating our customers' favorite products, as voted by you, our customers. It's just as exciting to us this year as it was in year one.

Favorite cheese? It's here. Favorite snack? It's a crunchy one. Favorite Trader Joe's product overall? It's delicious! We have the results, including a couple of surprises – headscratchers, really, and we're here for 'em. Listen in for all the details, and visit traderjoes.com for the full list. We close out this episode with a little self-indulgence (even more than usual!), and a look toward what's ahead.

Transcript (PDF)






Download audio: https://traffic.libsyn.com/secure/insidetjs/The_17th_Annual_Customer_Choice_Awards.mp3?dest-id=704103

New in Excel for the web: The full Power Query experience


We’ve reached yet another milestone in Excel for the web: The full Power Query user experience is now generally available, including the import wizard and Power Query Editor.

After releasing the ability to refresh Power Query data from authenticated data sources, we were able to unlock the full user journey of importing data and editing it using Power Query.

Getting started

Go here to learn all about Power Query in Excel for the web: https://aka.ms/pqxlo

See this support article for more information on which Power Query data sources are supported in each version of Excel.


Note:

  • Viewing and refreshing queries is available to all Microsoft 365 subscribers.
  • The full Power Query experience is available to all Microsoft 365 subscribers with Business or Enterprise plans.


Importing data

You can import data into Excel using Power Query from a wide variety of data sources, for example: Excel Workbook, Text/CSV, XML, JSON, SQL Server Database, SharePoint Online List, OData, Blank Table, and Blank Query.

  1. Select Data > Get Data.

  2. In the Choose data source dialog box, select one of the available data sources.

  3. Connect to the data source.

  4. After you select the source, the authentication kind will be auto-populated according to the relevant source (you can still change it, if you like).

  5. Press Next, and choose the table you wish to import.

  6. Press Transform data to open the table in the Power Query Editor, where you can perform many powerful transformations.
    Note: You can open the editor whenever you need it, by using Data > Get Data > Launch Power Query Editor.

  7. When you are done, load the table: press Close & Load to load to the Excel grid, or Close & Load To to either load to the Excel grid or create a connection-only query.

  8. The new query appears in the Queries & Connections pane; if you loaded to a table, you can also see it on the Excel grid.

  9. You can refresh the created query from the Queries & Connections pane, or by using Data > Refresh/Refresh All. You can also perform other operations, such as editing the query (with the Power Query Editor), renaming it, and more.

What’s next? 

Future plans include adding more data sources and advanced features.

Feedback 

We hope you like this new addition to Excel and we’d love to hear what you think about it!

Let us know by using the Feedback button in the top right corner of Excel. Add #PowerQuery in your feedback so that we can find it easily.

 

Want to know more about Excel for the web? See What's new in Excel for the web and subscribe to our Excel Blog to get the latest updates. Stay connected with us and other Excel fans around the world – join our Excel Community and follow us on Twitter.

Jonathan Kahati, Gal Horowitz

~ Excel Team
