PowerToys just got even better with two useful new utilities and a massive set of improvements across the suite.
The post PowerToys 0.99 Arrives With Two New Utilities, Many Improvements appeared first on Thurrott.com.
In any non-trivial GenAI platform, you end up managing a fleet of models. Cheap models for classification and light chat. Reasoning models for multi-step tasks. Frontier models for the hard stuff. Specialty models for code, vision, or long context.
The architectural question isn’t which model is best — it’s how do we dispatch the right model per request, at scale, with governance and observability intact?
The usual patterns each have problems:
| Pattern | Trade-off |
| --- | --- |
| Single-model deployment | Overpays on simple prompts, underperforms on complex ones |
| Application-layer router (rules/classifier) | Brittle, needs constant retuning as models evolve |
| LLM-as-router | Adds a call hop, governance complexity, and its own failure modes |
| Per-use-case deployments | Explodes deployment surface; quota and cost reporting fragment |
Model Router in Microsoft Foundry is a platform-level answer to this: a trained routing model, deployed as a single endpoint, that dispatches across up to 18 underlying LLMs per prompt.
Design note: The routing decision is made by a trained model, not a rules engine. It analyzes the prompt itself — complexity, task type, reasoning requirements — and is updated by Microsoft as new underlying models are onboarded.
For architects, the division of responsibility is the key mental model:
| Mode | Quality band | When to use |
| --- | --- | --- |
| Balanced (default) | Within ~1–2% of the top model | General-purpose chat and agent surfaces |
| Quality | Always the top model | Regulated outputs, complex reasoning, RAG over critical docs |
| Cost | Within ~5–6% of the top model | High-volume classification, drafting, low-stakes chat |
Treat the routing mode as a deployment-scoped SLO lever. Different product surfaces can point at different Model Router deployments with different modes and subsets.
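In application config, that mapping can be as simple as a per-surface lookup. The surface keys and deployment names below are hypothetical, purely to illustrate the pattern:

```python
# Hypothetical surface-to-deployment mapping -- names are illustrative.
# Each entry would point at a separate Model Router deployment with its
# own routing mode and model subset.
SURFACE_DEPLOYMENTS = {
    "support-chat":    "model-router-balanced",  # Balanced mode
    "contract-review": "model-router-quality",   # Quality mode
    "ticket-triage":   "model-router-cost",      # Cost mode
}

def deployment_for(surface: str) -> str:
    # Unknown surfaces fall back to the balanced deployment.
    return SURFACE_DEPLOYMENTS.get(surface, "model-router-balanced")
```

The lookup result is what you pass as the `model` argument on each request, so one application can mix SLO tiers without any routing logic of its own.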
This is the feature most worth deliberate design thought. The subset list governs which models the router is allowed to select, and therefore your cost ceiling, your compliance surface, and your effective context window (which is bounded by the smallest model in the subset).
New models introduced in future router versions are not auto-added to your subset. That’s a deliberate guardrail — additions require explicit deployment changes.
Model Router is deployed like any Foundry model. Below is an indicative ARM/Bicep-style deployment snippet that sets Balanced mode and restricts routing to a curated subset — omit subset to accept the full default pool.
```bicep
resource modelRouter 'Microsoft.CognitiveServices/accounts/deployments@2024-10-01' = {
  name: 'model-router-prod'
  parent: foundryAccount
  sku: {
    name: 'GlobalStandard'
    capacity: 250
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'model-router'
      version: '2025-11-18'
    }
    routingConfiguration: {
      mode: 'Balanced' // Balanced | Quality | Cost
      modelSubset: [
        'gpt-5-mini'
        'gpt-5'
        'gpt-5.2'
        'claude-sonnet-4-5'
        'claude-opus-4-6'
        'o4-mini'
      ]
    }
  }
}
```
Confirm the exact schema against the current Foundry deployment API — parameter names can evolve between API versions.
If you prefer the portal over IaC, the flow is short:
Propagation note: changes to routing mode or model subset can take up to five minutes to take effect. Plan rollouts and tests accordingly.
Once deployed, Model Router is a standard chat-completions endpoint. Always capture response.model — it’s your per-request attribution for cost analysis and routing validation.
```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2025-11-18",
)

response = client.chat.completions.create(
    model="model-router-prod",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the trade-offs of event sourcing at scale."},
    ],
)

print(response.choices[0].message.content)
print("Served by:", response.model)  # e.g. "gpt-5-mini-2025-08-07"
```
Streaming works exactly as it does for any Azure OpenAI chat deployment. The routing decision happens before the first token; once chosen, the underlying model streams directly.
```python
stream = client.chat.completions.create(
    model="model-router-prod",
    messages=[
        {"role": "user", "content": "Walk me through CAP theorem with a concrete example."},
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
The 2025-11-18 release adds tool-use support, enabling Model Router inside the Foundry Agent Service. The router picks the right model per turn — cheap for trivial turns, reasoning-grade for multi-step ones.
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Retrieve the current status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The order ID."},
            },
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="model-router-prod",
    messages=[
        {"role": "system", "content": "You help customers track orders."},
        {"role": "user", "content": "Where is order A-4571?"},
    ],
    tools=tools,
    tool_choice="auto",
)

choice = response.choices[0]
if choice.message.tool_calls:
    call = choice.message.tool_calls[0]
    print("Tool requested:", call.function.name, call.function.arguments)
print("Served by:", response.model)
```
Agent Service caveat: if your agent flow uses Foundry Agent Service tools, routing is restricted to OpenAI models only. Plan your subset accordingly when the router sits behind agent flows that depend on those tools.
If you’re standardizing on the Microsoft Foundry SDK rather than the OpenAI Python SDK, the Responses API offers an equivalent path. Install the packages with `pip install "azure-ai-projects>=2.0.0" azure-identity`.
```python
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

project_endpoint = "<your-project-endpoint>"  # your Foundry project endpoint URL

with (
    DefaultAzureCredential() as credential,
    AIProjectClient(endpoint=project_endpoint, credential=credential) as project_client,
    project_client.get_openai_client() as openai_client,
):
    response = openai_client.responses.create(
        model="model-router-prod",
        input="In one sentence, name the most popular tourist destination in Seattle.",
    )
    print(response.output_text)
```
Because Model Router can dispatch to either chat or reasoning (o-series) models, parameter behavior shifts based on the actual model picked. Build your application around the union of both behaviors.
Practical rule: don’t rely on temperature/top-p for determinism in a router-fronted deployment, and treat reasoning_effort as the only knob with consistent meaning across reasoning vs. non-reasoning paths.
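That rule can be enforced at the call site. The sketch below assumes a defensive convention (not an official API requirement) of stripping sampling knobs before dispatch; parameter names follow the OpenAI Python SDK's chat-completions call, and you should verify how the current API treats `reasoning_effort` on non-reasoning routes:

```python
# Sketch: build kwargs that are safe whichever model the router picks.
# Dropping temperature/top_p/logprobs is an assumed convention here,
# since the served model may be a reasoning (o-series) model that
# rejects or ignores them.
UNSAFE_ACROSS_ROUTES = {"temperature", "top_p", "logprobs"}

def build_router_kwargs(messages, reasoning_effort=None, **extra):
    """Return chat-completions kwargs stripped of knobs whose behavior
    differs between chat and reasoning models."""
    kwargs = {k: v for k, v in extra.items() if k not in UNSAFE_ACROSS_ROUTES}
    kwargs["messages"] = messages
    if reasoning_effort is not None:
        kwargs["reasoning_effort"] = reasoning_effort
    return kwargs
```

Usage would look like `client.chat.completions.create(model="model-router-prod", **build_router_kwargs(msgs, reasoning_effort="low", max_tokens=256))`.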
The JSON shape is identical to a standard chat completion. The model field is the key signal — it tells you which underlying model actually served the request. The usage block also reveals cached_tokens (prompt-cache hits) and reasoning_tokens (when an o-series model handled the prompt).
```json
{
  "id": "xxxx-yyyy-zzzz",
  "object": "chat.completion",
  "model": "gpt-5-mini-2025-08-07",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Charismatic and bold—combining brash showmanship..."
      },
      "content_filter_results": {
        "hate": { "filtered": false, "severity": "safe" },
        ...
      }
    }
  ],
  "usage": {
    "prompt_tokens": 3254,
    "completion_tokens": 163,
    "total_tokens": 3417,
    "prompt_tokens_details": { "cached_tokens": 3200, "audio_tokens": 0 },
    "completion_tokens_details": { "reasoning_tokens": 128, "audio_tokens": 0 }
  }
}
```
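The `model` and `usage` fields together are enough to drive per-request cost attribution. A minimal sketch follows; the price table uses made-up placeholder numbers (not real rates), and longest-prefix matching on the model family is one of several reasonable conventions:

```python
# Hypothetical per-1K-token prices -- illustrative placeholders only.
PRICES_PER_1K = {
    "gpt-5-mini": {"prompt": 0.00025, "completion": 0.002},
    "gpt-5":      {"prompt": 0.00125, "completion": 0.010},
}

def attribute_cost(model: str, usage: dict) -> tuple[str, float]:
    """Map a versioned model name (e.g. 'gpt-5-mini-2025-08-07') to its
    price family and compute the request cost from the usage block."""
    # Longest-prefix match: 'gpt-5-mini-...' also starts with 'gpt-5'.
    families = [f for f in PRICES_PER_1K if model.startswith(f)]
    if not families:
        return ("unknown", 0.0)
    family = max(families, key=len)
    p = PRICES_PER_1K[family]
    cost = (usage["prompt_tokens"] / 1000) * p["prompt"] \
         + (usage["completion_tokens"] / 1000) * p["completion"]
    return (family, cost)
```

Feeding every `response.model` and `response.usage` through a function like this gives the per-model cost breakdown that a single-endpoint deployment would otherwise hide.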
Performance metrics
Cost attribution
Three practical recommendations
| Issue | Likely cause | Resolution |
| --- | --- | --- |
| Rate limit exceeded | Too many requests to the router deployment | Increase TPM quota or implement retry with exponential backoff |
| Unexpected model selection | Routing logic picked a different model than expected | Review routing mode; constrain via model subset |
| High latency | Router overhead plus underlying-model processing | Use Cost mode for latency-sensitive workloads; smaller models respond faster |
| Claude model not routing | Claude requires a separate catalog deployment | Deploy Claude models from the catalog before adding to subset |
| Context exceeded | Effective context = smallest model in subset | Curate subset to larger-context models, or summarize/truncate upstream |
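For the rate-limit row above, capped exponential backoff with jitter is the standard remedy. A minimal sketch, where the retry bounds and the `is_rate_limit` predicate are application choices (with the OpenAI SDK, the predicate might check for its `RateLimitError` type):

```python
import random
import time

def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Yield capped exponential delays: base * 2**attempt, up to cap."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))

def call_with_retry(fn, is_rate_limit, max_retries=5, base=1.0):
    """Retry fn on rate-limit errors with jittered exponential backoff;
    any other exception is re-raised immediately."""
    for delay in backoff_delays(max_retries, base=base):
        try:
            return fn()
        except Exception as exc:
            if not is_rate_limit(exc):
                raise
            time.sleep(delay + random.uniform(0, delay * 0.1))  # jitter
    return fn()  # final attempt; a persistent 429 surfaces to the caller
```

Jitter matters here: without it, clients that were throttled together retry together, re-triggering the same limit.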
Strong fit:
Weaker fit:
Model Router turns multi-model dispatch from an application concern into a platform concern, with governance levers (mode, subset, region) that map cleanly to the trade-offs architects actually negotiate: cost, quality, compliance, and resilience. That’s a meaningful simplification of an otherwise accidentally-complex part of production GenAI architecture.
Microsoft publishes several open-source samples in the foundry-samples GitHub organization that are useful for hands-on evaluation:
These samples are for learning and experimentation. Review them against your organization’s security, compliance, and Responsible AI policies before adapting any of them for production.
If you’re piloting Model Router, what subset and mode did you land on — and what surprised you in the routing distribution? Share in the comments.
Some journeys begin with a plan. Others begin with a brave choice—and the determination to keep showing up. For Ariane Djeupang, becoming a Microsoft Most Valuable Professional (MVP) wasn’t a trophy hunt. It was the next chapter in a much longer story: years of mentoring, organizing, writing, and building community - often with limited resources, but unlimited heart.
Ariane is a project manager and machine learning engineer based in Cameroon - and a leader across multiple open-source communities. “I’m currently chairing PyCon Africa,” she shared, describing a conference that rotates host countries across the continent. In addition, she volunteers and organizes across the Python and Django ecosystem, mentoring beginners, coaching Django Girls workshops, and helping events run smoothly behind the scenes.
“I also mentor people - newcomers and beginners in tech - and those who would like to start their Python or Django journey.”
Growing up, Ariane felt the familiar pressure many young people experience: someone else had already decided what her future “should” be. “My dad wanted me to become a doctor,” she said. But after high school, she made a bold pivot: “I instantly chose to register in computer science.”
From there, her world expanded beyond textbooks. A senior student introduced her to a local developer community, and Ariane started asking big questions: “What is a community? … What is that impact?” Soon she was volunteering at events - and then helping build new ones. She and peers co-founded Python Cameroon, fueled by a love for the language and its welcoming learning curve. “I used to describe the syntax as elegant,” she laughed, remembering how she encouraged friends to start learning Python.
Ariane didn’t set out to chase an award—she didn’t even know the program existed until a friend from the Django community pointed it out. “That was my first time … someone talking about that,” she said. After she was nominated and completed the application, she was welcomed into the Microsoft MVP community - one of only four MVPs in Cameroon, and the first (and currently only) woman MVP in the country.
“It’s like a validation of years of dedication for me - because late nights and written tutorials… organizing meetups and events, mentoring aspiring technologists… all these were done with limited resources.”
For Ariane, the recognition wasn’t just personal - it was also a statement about what meaningful contribution looks like. “Impact is not just measured by … demography, by geography or privilege,” she said. “It’s measured by consistency.”
Professionally, being an MVP opened doors she “never imagined even possible” - including “direct access to Microsoft product teams” and early previews of technology (she mentioned getting access to previews in GitHub Copilot). But she quickly returned to what matters most to her: the people around her. “Perhaps, most importantly, it’s a responsibility,” Ariane said. In a country where digital transformation is still emerging and opportunity can be unevenly distributed, she sees her MVP platform as a way to show others what’s possible - especially for those whose voices are too often overlooked.
When asked what helps communities become more welcoming - especially in global spaces - Ariane didn’t hesitate. She believes experienced community leaders have “a unique responsibility … to set the tone for inclusion.” Here are a few practices she shared that any of us can start using right now.
“Inclusivity is not just a one-time effort, it’s a continuous practice. By modeling openness, humility, and curiosity… we can create environments where everyone feels they belong, can contribute - contribute meaningfully.”
Ariane also offered a refreshingly grounded reminder about growth: it doesn’t have to be frantic to be real. She remembers the early days of MVP onboarding clearly. “There is a lot to know. There is a lot to read,” she said. Her advice: “Don’t rush… no rush. You will learn gradually.”
Most importantly, she encouraged new MVPs (and anyone stepping into leadership) to keep doing what earned trust in the first place: “Just continue to work as you used to work.” Yes, new doors open - talk opportunities, volunteering, collaborations - but sustainability matters. “At the end of the day, you are not like a robo,” she said. “You shouldn’t overstress yourself … trying to prove ‘I’m an MVP’ by doing everything at once.” “I’m giving 15 talks in one month… I’ve written 100 articles in one month… that’s not sustainable.”
In Ariane’s world, “inclusion” isn’t just what happens on stage - it’s whether people can even get into the room. She spoke candidly about the realities many African technologists face when attending global events: flights that can be “almost 2000” dollars, plus accommodation, ground transportation, and visa fees. Those constraints don’t reflect a lack of talent - they reflect a lack of access.
And sometimes, inclusion starts with language. Ariane helped change the name of a conference benefit from “financial aid” to “opportunity grant.” Why? “The main reason we changed it was because of inclusivity reasons,” she explained. Some people avoid applying because they don’t want to be seen as “broken”—when the reality is simply: “I cannot afford maybe a ticket or the flight to attend the conference.” Names matter. They can either add stigma - or open a door.
Ariane’s story is a celebration - but it’s also an invitation. In every region, in every user group, in every online forum, we can choose to be the kind of community member who makes someone feel seen. We can lead with empathy. We can simplify onboarding. We can amplify voices that are too often ignored. And we can sponsor - not just with money, but with introductions, speaking invites, leadership opportunities, and public credit.
If you want to learn from Ariane’s advice and support underrepresented voices in your tech community, start here:
Congratulations again to Ariane - an MVP whose work reminds us that community leadership isn’t about a spotlight. It’s about building ladders, widening doors, and making sure more people get to step into their future. Learn more and connect with Ariane Djeupang through her MVP Profile and on LinkedIn.
To find an MVP and learn more about the MVP Program visit the MVP Communities website and follow our updates on LinkedIn or #mvpbuzz.
Join us for a future live session through the Microsoft Reactor where we walk through what the MVP program is about, what we look for, and how nominations work. These sessions are designed to help you connect the dots between the work you’re already doing and the impact the MVP Program recognizes - with time for questions, examples, and real conversations.
Learn efficient strategies I’ve utilized to move data, from 1GB to 700+ GB, in Azure SQL DB with little to no downtime.
✅ Chapters:
0:00 Introduction
2:45 Demo
9:20 Tips
11:20 Getting started
✅ Resources:
Github:
About MVPs:
Microsoft Most Valuable Professionals, or MVPs, are technology experts who passionately share their knowledge with the community. They are always on the "bleeding edge" and have an unstoppable urge to get their hands on new, exciting technologies. They have very deep knowledge of Microsoft products and services, while also being able to bring together diverse platforms, products and solutions, to solve real world problems. MVPs make up a global community of over 4,000 technical experts and community leaders across 90 countries/regions and are driven by their passion, community spirit, and quest for knowledge. Above all and in addition to their amazing technical abilities, MVPs are always willing to help others - that's what sets them apart. Learn more: https://aka.ms/mvpprogram
📌 Let's connect:
Twitter: Anna Hoffman, https://twitter.com/AnalyticAnna
Twitter: AzureSQL, https://aka.ms/azuresqltw
🔴 To watch other MVP Edition episodes, see our playlist: https://aka.ms/dataexposedmvps
To check out even more Data Exposed episodes, see our playlist: https://aka.ms/dataexposedyt
🔔 Subscribe to our channels for even more SQL tips:
Microsoft Azure SQL: https://aka.ms/msazuresqlyt
Microsoft SQL Server: https://aka.ms/mssqlserveryt
Microsoft Developer: https://aka.ms/microsoftdeveloperyt
#AzureSQL #SQLServer
As AI agents become more capable—and more autonomous—one question rises fast: How do we draw agentic borders? For many organizations, this is also a sovereignty question: what stays in-country, who can access it, and how do you enforce policy across systems and regions?
In this episode of The Shift Podcast: Agentic Edition, leaders from Microsoft Azure explore the evolving boundaries between agents, humans, systems, and responsibility—and what it takes to keep trust and accountability as AI becomes more agentic.
The conversation covers:
What “agentic borders” mean in enterprise practice
How to approach responsibility and oversight in agent systems
How sovereignty requirements (data residency, jurisdiction, and access) shape agent design and deployment
Why governance matters as agents move into real use
What to consider when setting guardrails for agents
An honest discussion on moving forward responsibly—while still building with speed and ambition.
👉 Read Microsoft’s European Digital Commitments: https://aka.ms/MSFTEUDigitalCommit
👉 Join the Tech Community: https://techcommunity.microsoft.com/
Get to know the team:
Evelyn Ozzie https://www.linkedin.com/in/evelyn-ozzie-18062721/
Meena Gowdar https://www.linkedin.com/in/mjgowdar/
Edouard de Cremiers https://www.linkedin.com/in/edouarddecremiers/
Karim Batthish https://www.linkedin.com/in/karimondo/
The Shift Podcast: Agentic Edition is a place for experts to share their insights and opinions. As students of the future of technology, Microsoft values inputs from a diverse set of voices. That said, the opinions and findings of our guests are their own and they may not necessarily reflect Microsoft's positions as a company.
This episode of The Shift was recorded in February 2026. All information about products and offers is relevant to the time of recording.
#TheShiftPodcast #ResponsibleAI #AgenticAI #DigitalGovernance #MicrosoftAzure #DigitalSovereignty