
GitHub for Beginners: Getting started with Markdown


Welcome back to GitHub for Beginners. We’ve covered a wide range of topics so far this season, including GitHub Issues and Projects, GitHub Actions, security, and GitHub Pages. Now we’re going to teach you everything you need to know to get started with Markdown, the markup language used across GitHub.

Once you learn the basics of Markdown, you’ll develop an essential skill that transforms how you write READMEs and how you format issues, pull requests, and your agent instruction files. By the end of this post, you’ll have the knowledge you need to make your projects and contributions easier for others to explore.

As always, if you prefer to watch the video or want to reference it, we have all of our GitHub for Beginners episodes available on YouTube.

What is Markdown and why is it important?

Markdown is a lightweight language for formatting plain text. You can use Markdown syntax, along with some additional HTML tags, to format your writing on GitHub. You can do this in repository READMEs, issue and pull request descriptions, and comments on issues and pull requests.

Markdown gives you the ability to create clear, readable documentation. Having a clean README in your project or a well-formatted issue can make a huge difference when someone lands on your content for the first time.

And one of the best parts is that when you get the syntax down, you’ll find yourself using it in almost every project you work on!

Where can I use Markdown?

The most common place where you’ll encounter Markdown is in your repository’s README file. But you’ll also find yourself using it in issues, pull requests, discussions, and even wikis. Any time you write or communicate on GitHub, Markdown is behind the scenes, helping your text look clean and consistent.

Markdown extends beyond GitHub to modern note-taking apps, blog platforms, and documentation tools. It’s a widely adopted language used across the technical space, so learning how to use it can benefit you beyond just how you interact with GitHub.

Basic syntax

We’re going to start with the common features that you’ll use the most. While we’re going through these, you can try them out to see how they work. The easiest way to do this is by creating a Markdown file in your repository.

  1. Navigate to a repository you own on github.com.
  2. Make sure you are on the Code tab of your repository.
  3. Click Add file near the top of the window and select Create new file from the pull-down menu.
  4. In the box at the top of the editor, name your file. Make sure the filename ends in .md (e.g., markdownTestFile.md).
  5. Select the Edit button.
  6. Enter any Markdown syntax into the editor window.

You can see what the Markdown text you enter will look like by selecting the Preview button; there’s no need to make a commit unless you want to save your test file. Just select the Edit button to go back to editing so you can enter more Markdown text.

Now that you know how to try it out, let’s get started with the syntax. First up are headers. These are your title and section names. You create them by adding pound signs (#), also known as hashtags, in front of your text. One pound sign indicates a header, two will create a subheader, and so on.

# GitHub for Beginners

## Basic Markdown syntax

### Headers

If you want to emphasize your text, you can use bold and italics. You create these by using either asterisks (*) or underscores (_). Both symbols work the same way; just make sure to pair them up appropriately. A single character makes text italic, a double character makes text bold, and a triple character makes it both bold and italic. You can emphasize characters within a string or multiple strings within a line of text.

Here is some *italic text* 

Here is some **bold text** 

___Here is both bold and italic text
Over multiple lines___

Sometimes you may want to quote important text. To do this, add the greater than (>) symbol as the first character in a line of text. If you would like to quote something that spans multiple lines, you need to add the greater than symbol at the start of each individual line.

> No design skills required.
>
> No overthinking allowed.
>
> Just ship your work.

Lists

Now let’s get into something a little more involved: lists.

Lists are a common way to present steps and procedures, in either ordered or unordered form. To create an ordered list, number each element in the list (e.g., 1., 2., 3.).

While this can be clear to read, what if you want to add an element between two consecutive numbers? The good news is that you don’t need to renumber the entire list. Markdown interpreters allow you to order your items with any number, and they automatically interpret it as an ordered list from first to last.

1. Click the “Use this template” button at the top of this repo.

1. Name your new repository (e.g., my-portfolio). 

1. Clone your new repo and start customizing!

For an unordered list, start a line with either a hyphen (-), asterisk (*), or a plus sign (+). Markdown will render any of these characters as the start of an unordered list.

* Click the “Use this template” button at the top of this repo.

* Name your new repository (e.g., my-portfolio). 

* Clone your new repo and start customizing!

If you would like to create nested lists, indent four spaces to start a new indented list. You can do this with both ordered and unordered list items.

1. Click the “Use this template” button.

    - Located at the top of the repo. 

    - This will create a new repository using this template. 

1. Name your new repository. 

    - e.g., my-portfolio 

    - This can be created under your personal GitHub account. 

1. Clone your new repo and start customizing!

When you’re done with your list, hit Enter twice to go back to plain text.

Code

Sometimes you may want to display a snippet of code in your Markdown as an example. This could be for steps in a procedure or as part of your project’s installation process. Many Markdown interpreters render code snippets with formatting and syntax highlighting. You can denote code in Markdown by surrounding it with a backtick (`) character.

`git clone https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git` 

If you have code that spans multiple lines, you can use three backtick characters to create a code block. Any characters between these triple backticks, including spaces and new lines, will render as code.

```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/YOUR_REPO_NAME.git
cd YOUR_REPO_NAME

# Install dependencies
npm install

# Start the development server
npm run dev
```

Links and images

Now let’s learn how to spice up our Markdown files. We’ll start with links. Links allow you to point people to helpful resources, documentation, or other pages in your project. They’re written using brackets ([]) and parentheses (()). Place the text you want to display in the brackets, followed immediately by the URL in parentheses, with no space between the two. This keeps your writing clean and easy to follow.

Open [your local host](http://localhost:3000) to see your portfolio. 

Images work in almost the same way, but with one small difference: you need to add an exclamation point (!) at the beginning. This is perfect for adding screenshots, diagrams, or even a project logo to your README.

![Mona](https://avatars.githubusercontent.com/u/92997159?v=4) 

To make things even easier, on GitHub, you can just drag-and-drop an image into an issue or pull request, and it automatically generates the right Markdown for you.

Whether you’re linking out to a tutorial or showing off a screenshot, links and images help you add that extra bit of personality and clarity to your Markdown.

What’s next?

You now know the basics of Markdown, including what it is, why it matters, where you can use it, and how to start writing it with confidence. With just a few techniques, you can create clean, readable documentation that makes your GitHub projects stand out.

Whether you’re building a README, opening an issue, or writing project notes, Markdown is going to be one of the tools you use the most.

If you want to learn more about Markdown, here are some good places to get started:

Happy coding!

The post GitHub for Beginners: Getting started with Markdown appeared first on The GitHub Blog.


Unlocking human ambition to drive business growth with AI


As our customers progress toward becoming Frontier Firms, they are using AI not only to optimize how work gets done, but to reinvent their business on the promise of growth. Organizations can now unlock creativity, accelerate innovation and democratize intelligence by bringing Copilots and agents directly into the tools people love and use every day. As adoption continues to scale, business value is no longer measured solely by time saved or productivity gained, but in how effectively organizations translate their unique IQ into decisions that drive measurable impact across core business processes.

The two most important elements in any AI solution are Intelligence + Trust. At Microsoft, we are focused on providing a platform for both through Microsoft IQ and Agent 365, respectively, so customers can harness the power of AI, have it amplify their unique differentiation and do so in a model diverse, open and heterogeneous manner. Microsoft IQ brings context to your data and provides faster, more accurate, more trusted experiences across modalities of chat, artifact creation and augmentation, and agent development; all while safeguarding your assets and protecting your intellectual property. Agent 365 provides observability, governance and security across all the agents you build — whether on Microsoft’s platform or third-party environments — so you can trust the outcomes you achieve with AI and ensure a return on that investment.

With intelligence embedded into daily work, organizations are activating human ambition — engaging customers more effectively, reshaping business processes and accelerating innovation without adding operational complexity — turning gains into competitive advantage. Trust makes this durable, allowing organizations to scale securely with AI. The shift to becoming Frontier can be seen in our recent partnerships, with BMW Group selecting Microsoft for its large-scale deployment of Microsoft 365 Copilot across its global workforce and Accenture rolling out Copilot to more than 740,000 employees.

Frontier Transformation — built on a foundation of Intelligence + Trust — is how organizations are enabling AI for growth; moving from aspiration to outcome with confidence, driving measurable business gains and maintaining the rigor required to operate AI responsibly. Across industries, our customers and partners are putting AI to work to reveal new sources of innovation and business value. I am pleased to highlight additional stories from this past quarter.

With millions of customer queries overwhelming its support channels, Air India was facing rising costs, slower response times and growing frustration for customers and employees. Within six months, internal development teams built an agentic AI solution using Azure OpenAI in Foundry models. The resulting agent, AI.g, handles 40,000 customer queries daily and has saved the company millions of dollars since launch. It has resolved more than 13 million conversations with a 97% success rate, allowing employees to focus on contributing at a higher level — solving complex cases that require nuanced human judgement and problem-solving skills. Air India is the first airline worldwide to deploy generative AI for customer service at scale.

As the second-largest school district in Florida, Broward County Public Schools serves approximately 235,000 students across 235 schools and employs 25,000 people. Although the district had extensive data, it lacked the real-time insights required to support its students — while simultaneously facing a $90-million budget shortfall. Rather than slowing innovation, the district used financial pressure as a catalyst to modernize systems and rethink how work was done. By deploying Microsoft 365 Copilot, educators and staff reclaimed six to seven hours weekly — time redirected to students for direct interaction, coaching and feedback. The district also equipped students with Microsoft 365 Copilot Chat and Copilot Studio to provide faster access to learning resources and foster more equitable learning — providing support for students with disabilities, English language learners and those needing additional academic assistance. The district’s adoption of Copilot — the largest K-12 deployment globally — is also expected to generate $40 to $50 million in savings over five years.

Cemex is one of the world’s largest building materials companies, operating more than 50 cement plants and over 1,000 ready mix plants across four continents. To accelerate execution at scale, Cemex built LUCA Bot — an AI agent built in Microsoft Foundry with Azure OpenAI — giving approximately 100 senior business leaders visibility into company-wide performance across more than 120 KPIs. The self-service tool processes 400 to 500 queries per month with high accuracy, delivering real-time, conversational insights across global sales, plant operations and financial performance. By compressing decision cycles from days to seconds, the company shifted from reactive to real-time decision-making — allowing leaders to recognize demand signals faster, improve operational efficiency and drive business outcomes across its multi-billion-dollar enterprise.

Cybersecurity startup ContraForce is democratizing enterprise-grade protection for managed service providers by operationalizing Microsoft’s security — Microsoft Sentinel, Defender XDR, Entra ID and Azure OpenAI in Foundry models — into a turnkey, AI-driven platform. Built for environments where traditional tools were too complex and costly for most providers to operate efficiently, the solution automates more than 90% of incident response, reducing cost per incident and enabling 24/7 protection. Providers can onboard more customers, deliver higher-quality security services and scale operations without adding headcount — transforming security delivery into a growth engine. Analysts can manage significantly more volume with incidents resolving in minutes and teams freed to focus on more strategic advisory work.

As global professional services firm KPMG expanded its Digital Gateway platform to support secure, global engagement with clients and professionals, its data environment grew increasingly fragmented and complex — spanning multiple tools and systems that slowed collaboration and increased operational effort. The company established Microsoft Fabric as its strategic data platform; unifying its data engineering, storage, analytics, reporting and global security policies into a single, trusted environment and pacing adoption as it matured in enterprise governance. Client data onboarding times were 87% faster — from sixteen hours to two — and operational IT efforts were reduced by 25%. With a governed, real-time data foundation, KPMG is accelerating insights; enabling faster, more confident decisions and freeing teams to provide consistent, high-quality client value delivery across global engagements.

To democratize AI across its digital workplace, Mercedes-Benz is deploying Microsoft 365 Copilot company-wide — one of the largest, most comprehensive deployments in European industry. Moving beyond selective use of AI, the company is integrating it systematically into day-to-day work, supporting decision-making and core operational processes while reducing complexity. By placing secure, enterprise-grade AI in the hands of employees across functions, Mercedes-Benz is enabling faster execution, lowering operating costs and driving more consistent performance across its global business. Copilot is also helping teams strengthen decision quality at scale, enabling them to respond more precisely and compete more effectively in dynamic markets.

As one of the world’s busiest rail operators, MTR manages high‑volume, complex transit operations with strict service and reliability expectations. To simplify complex administrative workloads and accelerate decision‑making, MTR deployed Microsoft 365 Copilot alongside Power Platform, embedding role‑based copilots and low‑code workflows across drafting, summarization and analysis. The result reduced manual effort, shortened turnaround times and improved operational consistency across its network. To improve passenger services, the company also launched AI Tracy — a personalized assistant built on Microsoft Azure that provides real-time guidance on ticketing, station facilities and local amenities. With AI embedded into everyday workflows and passenger services, MTR is extending consistent, real-time service across its network and expanding what teams can achieve and execute with AI.

PepsiCo operates one of the world’s largest consumer goods enterprises, with 320,000 employees across more than 200 countries. The company faced a fragmented technology landscape, with disparate collaboration tools that slowed coordination. This limited PepsiCo’s ability to grow its business, so the company standardized on Microsoft Teams and deployed Microsoft 365 Copilot to create a unified, secure foundation to innovate with AI. With Teams and Copilot seamlessly embedded into the tools employees use every day, PepsiCo is improving how work gets done. With 90% to 95% daily Copilot usage, AI is embedded in everyday work — creating a more connected environment that fosters collaboration and reduces friction, saving employees hours each day and freeing up time for them to focus on higher-impact work.

Global real estate developer Tata Realty manages a diverse portfolio spanning residential, commercial, mixed-use and infrastructure in Southeast Asia. As the business grew, fragmented data across finance, operations, engineering, safety and HR made it difficult to generate insights — slowing decision-making, increasing costs and introducing operational risk. By adopting Microsoft Fabric as a unified, governed data platform, the company consolidated engineering, warehousing and reporting into a single environment. This move helped reduce data processing time by 20% and lower annual analytics costs by 20% to 30%. With real-time, cross-functional insights now embedded in core workflows, teams are making faster, more informed decisions and operating with greater speed, consistency and control across the business.

Tru Cooperative Bank (formerly First West Credit Union) serves more than 250,000 members with complex financial products and high service expectations. By deploying Microsoft 365 Copilot and Copilot Studio, the organization reduced administrative effort and accelerated service delivery, reaching 93% employee adoption and 90% weekly usage. With AI embedded in daily workflows, employees can instantly access and bring together member context, policies and procedures — enabling faster decision cycles and more proactive, personalized guidance. By automating routine work, teams are creating space to focus on higher-value client conversations that deepen relationships, advance financial outcomes and drive member growth — with human ambition at the center of every interaction. At the same time, moving from reactive work to real-time, insight-driven engagement is strengthening member trust and scaling consistent, high-quality delivery across the organization.

Frontier Transformation is changing how organizations operate, compete and grow. By embedding intelligence into the flow of work and grounding it in enterprise‑grade trust, businesses are operating with greater precision and expanding what their teams can achieve at global scale. With an open, model-diverse and secure platform, Microsoft enables organizations to unlock human ambition and leverage AI for growth — transforming their unique IQ into decisions and actions that drive measurable business outcomes. We are grateful for the continued trust of our customers and partners. Together, we are shaping how every organization can lead with AI.

Judson Althoff is the chief executive officer of the commercial business at Microsoft. He is responsible for the product strategy, sales, services, support, marketing, operations and revenue growth of the company’s commercial business, which operates in more than 120 regional and national subsidiaries globally.

The post Unlocking human ambition to drive business growth with AI appeared first on The Official Microsoft Blog.


Continuing the story of early DOS development

1 Share

Over the last few years, we’ve been working to open some of the earliest chapters of Microsoft’s operating system history. In 2018 we (re)-open-sourced MS‑DOS 1.25 and 2.0, and more recently in 2024 we were able to make the source for MS‑DOS 4.0 available to the public as well. Today, on 86-DOS 1.00’s 45th anniversary, we’re continuing that tradition by preserving the earliest DOS source code discovered to date. These releases are about making historically important systems software available for study, preservation, and plain ol’ curiosity.

But that work doesn’t end with a GitHub repo. Software history lives in code, yes, but also in scanned listings, internal documents, assembler printouts, and the sometimes wonderfully analog artifacts of how operating systems came together in the late 1970s and early 1980s. If you read the original announcement around re‑open sourcing MS‑DOS 1.25 and 2.0 on the Windows Command Line blog you’ll know how much context matters when trying to understand where today’s platforms came from.

We’re stoked today to showcase some newly available source code materials that provide an even earlier look into the development of PC-DOS 1.00, the first release of DOS for the IBM PC. A dedicated team of historians and preservationists led by Yufeng Gao and Rich Cini has worked to locate, scan, and transcribe the stack of DOS-era source listings from Tim Paterson, the author of DOS.

The listings include sources to the 86-DOS 1.00 kernel, several development snapshots of the PC-DOS 1.00 kernel, and some well-known utilities such as CHKDSK. Not only are there assembler listings of DOS itself, but also listings of the assembler that built it! This work offers rare insight into how MS-DOS/PC-DOS came to be, and how operating system development was done at the time, not as it was later reconstructed.

What DOS development on the IBM PC looked like in the early 80s. Credits: Rich Cini.

It’s also worth noting that these materials aren’t just operating system releases in the traditional sense. In several cases, the listings represent point‑in‑time working states and hand-written notes, preserved by Tim Paterson himself. Think of them as a printed commit history of a Git repository. They create a timeline of changes, showing which features were implemented when, what errors were made, and how they were fixed. Soon you’ll be able to visit these living artifacts at the Interim Computer Museum as they’ve been generously donated by Tim Paterson.

We want to thank everyone involved in curating and bringing these materials forward in a responsible and accessible way. This kind of software archaeology takes real effort across legal review, archival work, and technical validation, and it’s an important part of preserving the shared history of our industry.

If you’re interested in digging into these early listings yourself, we encourage you to check out the full posts from the team on Yufeng’s website and Rich’s website, as well as Joshua’s research on printer listing OCR. For an insider’s look at the process, explore the scanned listings and OCR’ed code at DOS-History/Paterson-Listings, which we’ve worked with the maintainers to license under MIT via pull request for researchers, hobbyists, and enthusiasts like us!


Explore listings and code

Dig into the DOS-History/Paterson-Listings today.

The post Continuing the story of early DOS development appeared first on Microsoft Open Source Blog.


Visual Studio 2026 Gives IntelliSense Priority in Longstanding Copilot Completion Clash

The April update suppresses Copilot completions while IntelliSense is active, addressing a long-running editor conflict.

PowerToys 0.99 Arrives With Two New Utilities, Many Improvements


PowerToys just got even better with two useful new utilities and a massive set of improvements across the suite.

The post PowerToys 0.99 Arrives With Two New Utilities, Many Improvements appeared first on Thurrott.com.


Architecting Cost-Aware LLM Workloads with Model Router in Microsoft Foundry


The architectural problem

In any non-trivial GenAI platform, you end up managing a fleet of models. Cheap models for classification and light chat. Reasoning models for multi-step tasks. Frontier models for the hard stuff. Specialty models for code, vision, or long context.

The architectural question isn’t which model is best — it’s how do we dispatch the right model per request, at scale, with governance and observability intact?

The usual patterns each have problems:

| Pattern | Trade-off |
| --- | --- |
| Single-model deployment | Overpays on simple prompts, underperforms on complex ones |
| Application-layer router (rules/classifier) | Brittle, needs constant retuning as models evolve |
| LLM-as-router | Adds a call hop, governance complexity, and its own failure modes |
| Per-use-case deployments | Explodes deployment surface; quota and cost reporting fragment |

Model Router in Microsoft Foundry is a platform-level answer to this: a trained routing model, deployed as a single endpoint, that dispatches across up to 18 underlying LLMs per prompt.

Conceptual architecture

 

 

Design note: The routing decision is made by a trained model, not a rules engine. It analyzes the prompt itself — complexity, task type, reasoning requirements — and is updated by Microsoft as new underlying models are onboarded.

What you govern, what the platform governs

For architects, the division of responsibility is the key mental model:

Platform-owned

  • Real-time prompt analysis and routing decisions
  • Automatic failover across the subset
  • Data-zone boundary enforcement
  • Prompt-caching passthrough to supporting models
  • Underlying-model versioning (via router versioning)

You own

  • Routing mode — Balanced (default), Quality, or Cost
  • Model subset — the allow-list of underlying models
  • Deployment type — Global Standard or Data Zone Standard
  • Region — East US 2 or Sweden Central (current availability)
  • Observability hooks — logging response.model for per-request attribution

Routing modes as design levers

| Mode | Quality band | When to use |
| --- | --- | --- |
| Balanced (default) | ~1–2% of top model | General-purpose chat and agent surfaces |
| Quality | Always top model | Regulated outputs, complex reasoning, RAG over critical docs |
| Cost | ~5–6% band | High-volume classification, drafting, low-stakes chat |

Treat the routing mode as a deployment-scoped SLO lever. Different product surfaces can point at different Model Router deployments with different modes and subsets.

The model subset: your governance surface

This is the feature most deserving of deliberate design thought. The subset list governs:

  1. Compliance — which vendors/regions your prompts can touch
  2. Context window — the effective context equals the smallest model in the subset; curate accordingly
  3. Cost ceiling — bound worst-case per-call cost
  4. Failover pool — keep at least two models in every subset
  5. Cache hit rate — narrower, more deterministic subsets improve the odds that consecutive overlapping prompts land on the same underlying model

 

New models introduced in future router versions are not auto-added to your subset. That’s a deliberate guardrail — additions require explicit deployment changes.

Code: deploying with a custom subset

Model Router is deployed like any Foundry model. Below is an indicative ARM/Bicep-style deployment snippet that sets Balanced mode and restricts routing to a curated subset — omit subset to accept the full default pool.

resource modelRouter 'Microsoft.CognitiveServices/accounts/deployments@2024-10-01' = {
  name: 'model-router-prod'
  parent: foundryAccount
  sku: {
    name: 'GlobalStandard'
    capacity: 250
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'model-router'
      version: '2025-11-18'
    }
    routingConfiguration: {
      mode: 'Balanced' // Balanced | Quality | Cost
      modelSubset: [
        'gpt-5-mini'
        'gpt-5'
        'gpt-5.2'
        'claude-sonnet-4-5'
        'claude-opus-4-6'
        'o4-mini'
      ]
    }
  }
}

 

 

Confirm the exact schema against the current Foundry deployment API — parameter names can evolve between API versions.

Deploying via the Foundry portal

If you prefer the portal over IaC, the flow is short:

  1. Sign in to Microsoft Foundry and ensure the New Foundry toggle is on.
  2. Open the model catalog, find model-router, and select it.
  3. Choose Default settings for Balanced mode across all supported models, or Custom settings to pick a routing mode and a model subset.
  4. Apply a content filter at the model router deployment — it covers all underlying models. Don’t set per-model content filters.
  5. Set the TPM rate limit at the model router level — it applies to all activity to and from the router. Don’t set rate limits per underlying model.
  6. (Claude only) Deploy Claude models separately from the catalog before adding them to your subset. Other vendors are invoked transparently.

 

Propagation note: changes to routing mode or model subset can take up to five minutes to take effect. Plan rollouts and tests accordingly.

Code: calling the endpoint (Python)

Once deployed, Model Router is a standard chat-completions endpoint. Always capture response.model — it’s your per-request attribution for cost analysis and routing validation.

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2025-11-18",
)

response = client.chat.completions.create(
    model="model-router-prod",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the trade-offs of event sourcing at scale."},
    ],
)

print(response.choices[0].message.content)
print("Served by:", response.model)   # e.g. "gpt-5-mini-2025-08-07"

 

Code: streaming responses

Streaming works exactly as it does for any Azure OpenAI chat deployment. The routing decision happens before the first token; once chosen, the underlying model streams directly.

stream = client.chat.completions.create(
    model="model-router-prod",
    messages=[
        {"role": "user", "content": "Walk me through CAP theorem with a concrete example."},
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

 

Code: tool use (agentic scenarios)

The 2025-11-18 release adds tool-use support, enabling Model Router inside the Foundry Agent Service. The router picks the right model per turn — cheap for trivial turns, reasoning-grade for multi-step ones.

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Retrieve the current status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The order ID."},
            },
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="model-router-prod",
    messages=[
        {"role": "system", "content": "You help customers track orders."},
        {"role": "user", "content": "Where is order A-4571?"},
    ],
    tools=tools,
    tool_choice="auto",
)

choice = response.choices[0]
if choice.message.tool_calls:
    call = choice.message.tool_calls[0]
    print("Tool requested:", call.function.name, call.function.arguments)
print("Served by:", response.model)

 

 

Agent Service caveat: if your agent flow uses Foundry Agent Service tools, routing is restricted to OpenAI models only. Plan your subset accordingly when the router sits behind agent flows that depend on those tools.

Code: alternative — Foundry Responses SDK

If you’re standardizing on the Microsoft Foundry SDK rather than the OpenAI Python SDK, the Responses API offers an equivalent path. Install: pip install azure-ai-projects>=2.0.0 azure-identity.

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

project_endpoint = "https://<your-foundry-project-endpoint>"  # your Foundry project endpoint

with (
    DefaultAzureCredential() as credential,
    AIProjectClient(endpoint=project_endpoint, credential=credential) as project_client,
    project_client.get_openai_client() as openai_client,
):
    response = openai_client.responses.create(
        model="model-router-prod",
        input="In one sentence, name the most popular tourist destination in Seattle.",
    )
    print(response.output_text)

 

Parameter handling when a reasoning model is selected

Because Model Router can dispatch to either chat or reasoning (o-series) models, parameter behavior shifts based on the actual model picked. Build your application around the union of both behaviors.

  • Temperature, Top_P — ignored when an o-series reasoning model is selected; honored otherwise.
  • stop, presence_penalty, frequency_penalty, logit_bias, logprobs — dropped for o-series; honored otherwise.
  • reasoning_effort — supported starting in the 2025-11-18 router release. When a reasoning model is selected, the router passes your value through to the underlying model.

 

Practical rule: don’t rely on temperature/top-p for determinism in a router-fronted deployment, and treat reasoning_effort as the only knob with consistent meaning across reasoning vs. non-reasoning paths.
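To make the parameter behavior concrete, here is a hedged sketch (reusing the deployment name model-router-prod from earlier) that sets both temperature and reasoning_effort and then inspects which path served the call — assuming, per the release notes above, that the router forwards or drops each parameter based on the model it selects:

from openai import AzureOpenAI

# Client setup as in the earlier snippet.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2025-11-18",
)

response = client.chat.completions.create(
    model="model-router-prod",
    messages=[{"role": "user", "content": "Plan a three-step migration from cron jobs to a message queue."}],
    temperature=0.2,            # honored on chat models, dropped for o-series
    reasoning_effort="medium",  # forwarded only when a reasoning model is selected (2025-11-18+)
)

# response.model reveals whether a reasoning or chat model actually served the call.
print("Served by:", response.model)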

Anatomy of the response

The JSON shape is identical to a standard chat completion. The model field is the key signal — it tells you which underlying model actually served the request. The usage block also reveals cached_tokens (prompt-cache hits) and reasoning_tokens (when an o-series model handled the prompt).

{
  "id": "xxxx-yyyy-zzzz",
  "object": "chat.completion",
  "model": "gpt-5-mini-2025-08-07",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Charismatic and bold—combining brash showmanship..."
      },
      "content_filter_results": { "hate": { "filtered": false, "severity": "safe" }, ... }
    }
  ],
  "usage": {
    "prompt_tokens": 3254,
    "completion_tokens": 163,
    "total_tokens": 3417,
    "prompt_tokens_details": { "cached_tokens": 3200, "audio_tokens": 0 },
    "completion_tokens_details": { "reasoning_tokens": 128, "audio_tokens": 0 }
  }
}

 

Monitoring in the Azure portal

Performance metrics

  • Open the Azure portal and navigate to Monitoring → Metrics for your Azure OpenAI / Foundry resource.
  • Filter by your model router deployment name.
  • Split the metrics by underlying model to see how traffic is actually being distributed across the routed models.

 

Cost attribution

  • Open Resource Management → Cost analysis in the Azure portal.
  • Filter by Tag, set the tag type to Deployment, and select your model router deployment name.
  • Total cost = sum of the underlying-model charges for requests that hit this deployment.

 

Three practical recommendations

  1. Log response.model on every call. This is your primary application-side signal for routing distribution and per-request attribution (see the sketch after this list).
  2. Expect mixed-model billing. Model Router charges at the rate of the underlying model that served each request. Cross-check Azure Cost analysis against your application logs.
  3. Watch cache hit rates per underlying model. Caching benefits apply only when consecutive overlapping prompts land on the same model. A too-permissive subset can silently degrade cache efficiency.
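As a minimal sketch of recommendations 1 and 3 combined — not a Foundry API, just a thin application-side wrapper whose name and log format are illustrative — the helper below records the served model and prompt-cache hits on every call:

import logging
from openai import AzureOpenAI

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model-router")

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2025-11-18",
)

def routed_chat(messages, deployment="model-router-prod"):
    """Call the router and log per-request attribution for cost and cache analysis."""
    response = client.chat.completions.create(model=deployment, messages=messages)
    usage = response.usage
    cached = usage.prompt_tokens_details.cached_tokens if usage.prompt_tokens_details else 0
    log.info(
        "served_by=%s prompt_tokens=%d completion_tokens=%d cached_tokens=%d",
        response.model, usage.prompt_tokens, usage.completion_tokens, cached,
    )
    return response

reply = routed_chat([{"role": "user", "content": "Classify this ticket: 'login page 500s'."}])
print(reply.choices[0].message.content)

Piping these log lines into your aggregator of choice gives you routing distribution and per-model cache hit rates without waiting on portal metrics.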

Failure modes to design around

  • Context-window overrun. The effective context is the smallest model in the subset. If a large prompt arrives, it fails unless routed to a larger-context model. Defend against this by curating the subset or by summarizing/truncating upstream (see the sketch after this list).
  • Claude model not routing. Claude requires a separate catalog deployment first. Surface a deployment health check.
  • Region/deployment-type mismatch. Currently East US 2 and Sweden Central; Global Standard and Data Zone Standard only. Plan DR accordingly.
  • Rate limits. 250 RPM / 250K TPM on Global Standard by default; higher on Enterprise/MCA-E. Build backpressure early.
  • Audio unsupported. Images are accepted but routing decisions are text-only.
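For the context-window overrun case, one upstream defense is a pre-flight token guard. The sketch below assumes tiktoken’s cl100k_base encoding as a rough counter and an 8,000-token budget; both are illustrative stand-ins for whatever the smallest model in your subset actually supports:

import tiktoken

# Illustrative budget; derive this from the smallest context window in your subset.
MAX_PROMPT_TOKENS = 8_000

# cl100k_base is an approximation — actual tokenizers vary by model.
enc = tiktoken.get_encoding("cl100k_base")

def fit_to_budget(text: str, budget: int = MAX_PROMPT_TOKENS) -> str:
    """Truncate text to the token budget, keeping the start of the prompt."""
    tokens = enc.encode(text)
    if len(tokens) <= budget:
        return text
    return enc.decode(tokens[:budget])

prompt = fit_to_budget(open("large_document.txt").read())

In practice you would summarize rather than hard-truncate for RAG-style prompts, but the guard keeps oversized requests from failing at the router.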

Common issues — quick reference

| Issue | Likely cause | Resolution |
| --- | --- | --- |
| Rate limit exceeded | Too many requests to the router deployment | Increase TPM quota or implement retry with exponential backoff (sketched below) |
| Unexpected model selection | Routing logic picked a different model than expected | Review routing mode; constrain via model subset |
| High latency | Router overhead plus underlying-model processing | Use Cost mode for latency-sensitive workloads; smaller models respond faster |
| Claude model not routing | Claude requires a separate catalog deployment | Deploy Claude models from the catalog before adding to subset |
| Context exceeded | Effective context = smallest model in subset | Curate subset to larger-context models, or summarize/truncate upstream |
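For the rate-limit row, the standard pattern is retry with exponential backoff and jitter. A minimal sketch — the helper name and retry budget are illustrative, not part of any SDK:

import random
import time

from openai import AzureOpenAI, RateLimitError

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2025-11-18",
)

def chat_with_backoff(messages, max_retries=5):
    """Retry 429s with exponential backoff plus jitter before giving up."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model="model-router-prod", messages=messages)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep((2 ** attempt) + random.random())  # 1s, 2s, 4s... plus jitter

Client-side backpressure keeps bursts from cascading, but for sustained overruns raise the TPM quota rather than retrying harder.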

 

When Model Router is the right architectural choice

Strong fit:

  • Heterogeneous traffic — wide variance in prompt complexity
  • Multi-vendor LLM strategy (OpenAI + Anthropic + open models) that you want to consolidate behind a single governed endpoint
  • Agent platforms where tasks span trivial to complex reasoning

Weaker fit:

  • Uniform workloads where a single well-chosen model is simpler
  • Workloads dominated by large-context prompts (unless the subset is curated for it)
  • Scenarios requiring deterministic, reproducible model selection per request — the router is intentionally adaptive

Recommended rollout path

  1. Phase 1 — Baseline. Deploy Model Router with Balanced mode and the full pool. Log response.model across representative traffic.
  2. Phase 2 — Govern. Introduce a model subset based on your compliance, context, and cost requirements. Ensure at least two models for failover.
  3. Phase 3 — Tune. Only after the baseline distribution tells you which way to lean, switch to Cost or Quality mode, or split product surfaces across two Model Router deployments with different profiles.
  4. Phase 4 — Integrate. Wire the router into Foundry Agent Service for agentic surfaces.

Closing thought

Model Router turns multi-model dispatch from an application concern into a platform concern, with governance levers (mode, subset, region) that map cleanly to the trade-offs architects actually negotiate: cost, quality, compliance, and resilience. That’s a meaningful simplification of an otherwise accidentally-complex part of production GenAI architecture.

Sample repositories

Microsoft publishes several open-source samples in the foundry-samples GitHub organization that are useful for hands-on evaluation:

  • Model Router Capabilities Interactive Demo (Python). Compare Balanced, Cost, and Quality routing modes against your own prompt sets; see live benchmark data for cost savings, latency, and routing distribution.
  • Routed Models Distribution Analysis (Python). Run prompt batches across routing profiles and model subsets to inspect which models the router selects and in what proportions — useful before committing to a routing policy.
  • Multi-team Quality & Cost Benchmarking (Python workshop). Deploy Model Router, benchmark against fixed-model deployments, and analyze cost/latency trade-offs in a multi-team enterprise scenario.
  • On-Call Copilot Multi-Agent Demo (Python). See per-step model selection inside an agent flow — fast/cheap models for classification, reasoning models for root-cause analysis.

 

These samples are for learning and experimentation. Review against your organization’s security, compliance, and Responsible AI policies before adapting any of it for production.

Learn more

If you’re piloting Model Router, what subset and mode did you land on — and what surprised you in the routing distribution? Share in the comments.
