Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

1005: Programmatic and Skill-Based Video Creation with Remotion

Scott and Wes are joined by Jonny Burger, creator of Remotion, to talk about the explosion of programmatic video, going from 125k to 800k installs per day, and how AI and a new HTML-in-Canvas Chrome spec are changing the game. They dig into monetization, the wild world of video slop, motion graphics workflows, and the new Media Bunny tool.

Show Notes

Sick Picks

Hit us up on Socials!

Syntax: X Instagram Tiktok LinkedIn Threads

Wes: X Instagram Tiktok LinkedIn Threads

Scott: X Instagram Tiktok LinkedIn Threads

Randy: X Instagram YouTube Threads





Download audio: https://traffic.megaphone.fm/FSI1805655428.mp3
Read the whole story
alvinashcraft
just a second ago
reply
Pennsylvania, USA
Share this story
Delete

The Courage Gap in Tech Leadership

Josh and Bob dig into the courage problem that runs through every layer of tech leadership right now. The courage to push back on your team. The courage to be honest with peers. And the hardest one, the courage to tell the people above you that the thing they want is the wrong thing.

They get into why most leaders are courageous downward but cave upward. Why fear of losing a job often costs you the job anyway. The "blinds" metaphor for how much truth you actually let out in the room. And the moment Josh told his boss "no" for the first time and watched the conversation turn into something better than he expected.

Then they bring it back to the current reality. Boards with youthful enthusiasm about cutting 50% of the workforce with AI. The agile-is-dead chorus. Senior leaders who want fast answers to questions that deserve careful ones. And the leaders in the middle who quietly comply instead of saying what they actually think.

This is a challenge episode. If you've been swallowing what you really believe in leadership conversations, this one is going to sit with you.

Stay Connected and Informed with Our Newsletters

Josh Anderson's "Leadership Lighthouse"

Dive deeper into the world of Agile leadership and management with Josh Anderson's "Leadership Lighthouse." This bi-weekly newsletter offers insights, tips, and personal stories to help you navigate the complexities of leadership in today's fast-paced tech environment. Whether you're a new manager or a seasoned leader, you'll find valuable guidance and practical advice to enhance your leadership skills. Subscribe to "Leadership Lighthouse" for the latest articles and exclusive content right to your inbox.

Subscribe here

Bob Galen's "Agile Moose"

Bob Galen's "Agile Moose" is a must-read for anyone interested in Agile practices, team dynamics, and personal growth within the tech industry. The newsletter features in-depth analysis, case studies, and actionable tips to help you excel in your Agile journey. Bob brings his extensive experience and thoughtful perspectives directly to you, covering everything from foundational Agile concepts to advanced techniques. Join a community of Agile enthusiasts and practitioners by subscribing to "Agile Moose."

Subscribe here

Do More Than Listen:

We publish video versions of every episode and post them on our YouTube page.

Help Us Spread The Word: 

Love our content? Help us out by sharing on social media, rating our podcast/episodes on iTunes, or by giving to our Patreon campaign. Every time you give, in any way, you empower our mission of helping as many agilists as possible. Thanks for sharing!





Download audio: https://episodes.captivate.fm/episode/59825da6-0d2c-4496-b8e8-8aba14755e9d.mp3

130. AI, Cybersecurity & Creativity in Modern Cloud Engineering - with Hannah King

In this episode, Rick & Oscar talk with Hannah King, a Cloud Solution Architect and cybersecurity expert at Microsoft, about cloud architecture, security and the rise of AI. Hannah shares how organizations are balancing innovation with security while still struggling with cloud fundamentals. They also explore how AI is changing security operations and why creativity plays a big role in engineering. Alongside the technical topics, Hannah reflects on her unconventional journey from teaching and fine art into tech.

About this episode, and Hannah King in particular: you can find Hannah on LinkedIn.

About Betatalks: watch our videos and follow us on Instagram, LinkedIn, and B 





Download audio: https://www.buzzsprout.com/1622272/episodes/19181483-130-ai-cybersecurity-creativity-in-modern-cloud-engineering-with-hannah-king.mp3

Launched: Microsoft 365 Copilot Adoption Hub Redesign

One of the biggest barriers to Copilot adoption is that people don’t always know where to start.

Today, we released the first version of our redesigned Microsoft 365 Copilot business user hub. Take a look.

We set out to simplify adoption by making the experience more practical and focused on key roles: AI Business User, AI Champion and AI Leader.

Microsoft 365 Copilot adoption hub

What you’ll find:

  • Prompts you can apply immediately in your work.
  • Real examples of how Copilot helps across tasks.
  • Clear guidance based on your role.

What's the same: 

  • Connection to what’s new, communities and event information.
  • Advanced guidance for User Enablement and IT Professionals.
  • Content from Microsoft Learn to advance your skilling through the AI Skills Navigator and learning paths for certification.

Your feedback was essential in crafting this evolution of how to learn and use AI experiences from Microsoft.  Keep sharing your insights via our feedback form at aka.ms/amc/feedback.  The entire team reads what you submit and innovates to support your needs. 

That’s really the focus here: helping people get started and keep going to get work done.

We’d love your input! What do you think, and what content would you like to see us build next?

#Copilot #AIAdoption


Now in Foundry: Tongyi-MAI Z-Image-Turbo, with FLUX.1-schnell and SDXL base 1.0

This week's Model Mondays edition features three models available through the Hugging Face collection in Microsoft Foundry: Tongyi-MAI's Z-Image-Turbo, a new model designed for lower latency on a single GPU and native bilingual text rendering; Black Forest Labs' FLUX.1-schnell, a 12B rectified flow transformer distilled to 1–4 step inference and one of the most adopted open-weight image models since its 2024 release; and Stability AI's stable-diffusion-xl-base-1.0 (SDXL), a latent diffusion research model that can be used to generate and modify images based on text prompts.

Models of the week

Tongyi-MAI: Z-Image-Turbo

Model Specs

  • Parameters / size: 6B (BF16)
  • Resolution: Up to 1024×1024 native
  • Primary task: Text-to-image generation (English and Chinese)

Why it's interesting (Spotlight)

  • Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture: Z-Image concatenates text tokens, visual semantic tokens, and image VAE tokens into a single unified input stream rather than running text and image through separate branches. This single-stream design can improve parameter efficiency relative to dual-stream DiT architectures at the same capacity. See the Z-Image technical report for details.
  • 8-step inference at sub-second latency, fits in 16GB VRAM: Z-Image-Turbo is distilled with Decoupled Distribution Matching Distillation (Decoupled-DMD) and further refined with DMDR, a method that fuses DMD with reinforcement learning during post-training. The result is a model that runs 8 Number-of-Function-Evaluations (NFE) per image with no Classifier-Free Guidance (CFG)—which roughly halves the per-step compute compared to CFG-based inference. See the Decoupled-DMD and DMDR papers.
  • Native bilingual text rendering and strong instruction adherence: Unlike most open-weight image models, which struggle with legible in-image text, Z-Image-Turbo renders complex English and Chinese text accurately, which is useful for posters, signage, packaging mockups, and marketing creative.
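
The "roughly halves" claim in the second bullet can be checked with simple arithmetic. The sketch below assumes classifier-free guidance is implemented the common way, with two network evaluations per step (one conditional, one unconditional); the function name is hypothetical and the numbers say nothing about wall-clock latency.

```python
def forward_passes(steps: int, uses_cfg: bool) -> int:
    """Count network evaluations for one image.

    Assumption: CFG costs two evaluations per step (conditional plus
    unconditional); guidance-free sampling costs one.
    """
    return steps * (2 if uses_cfg else 1)


no_cfg = forward_passes(steps=8, uses_cfg=False)
with_cfg = forward_passes(steps=8, uses_cfg=True)
print(no_cfg, with_cfg, no_cfg / with_cfg)
```

At the same step count, dropping CFG cuts the number of evaluations in half under this assumption.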

Try it

Figure 1. Cherry cake generated by Z-Image-Turbo

Figure 2. Using the original image to create a poster for marketing material

Imagine you're a community programs coordinator at your city's parks department, planning a new summer event series, a "Cake Picnic in the Park," designed to bring neighbors together over food in shared green space. The event is a few weeks out. You haven't booked bakery partners yet, so no actual cake exists, and you need marketing assets this week to start driving sign-ups: a hero image for the registration page, a flyer for community centers and libraries, social tiles for the city's channels. Use the prompt below to generate a photorealistic image, which can then be adapted in minutes into additional assets such as printed flyers or social images using image editing tools (or another model).

Prompt: A round layered cake displayed on a white ceramic cake stand, topped with glossy fresh red cherries and smooth pastel pink buttercream frosting piped in delicate rosettes around the edge. One generous slice has been cleanly cut and removed from the front, revealing a perfect cross-section: four distinct horizontal layers alternating between soft pink sponge cake and fluffy white vanilla cream frosting. Professional bakery photography, soft natural window light from the left, shallow depth of field, marble countertop, warm and inviting atmosphere, photorealistic detail on the cake texture, cherry highlights, and frosting swirls.

Black Forest Labs: FLUX.1-schnell

Model Specs

  • Parameters / size: 12B (rectified flow transformer)
  • Resolution: Flexible up to 2 megapixels
  • Primary task: Text-to-image generation

Why it's interesting (Spotlight)

  • Rectified flow transformer with adversarial distillation for 1–4 step inference: FLUX.1-schnell is the distilled, Apache 2.0 sibling of the FLUX.1 family. It uses a rectified flow formulation (a diffusion variant that learns straight-line probability paths between noise and data, reducing the number of solver steps needed) and is further compressed with latent adversarial diffusion distillation. The model generates high-quality images in 1–4 steps, which suits latency-sensitive workloads.
  • Permissive licensing for commercial use: Released under Apache 2.0, FLUX.1-schnell can be used for personal, scientific, and commercial purposes. This has driven broad adoption across product features that need an open, redistributable image backbone.
  • Strong prompt adherence at its parameter range: At 12B parameters, FLUX.1-schnell sits between the SDXL family and frontier proprietary image models, and it remains a common reference point for evaluating open image generation prompt following—particularly for complex compositional prompts and longer captions—roughly two years after its initial release.
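
Why straight-line paths need fewer solver steps can be seen in a one-dimensional toy, sketched below. This is not FLUX code: the velocity fields, function names, and endpoints (0.0 for "noise", 1.0 for "data") are all illustrative assumptions. With a constant velocity, a single Euler step lands exactly on the target; with a velocity that changes along the path, a coarse solver falls short.

```python
from typing import Callable


def euler_integrate(x0: float, velocity: Callable[[float, float], float], steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x += velocity(x, t) * dt
    return x


NOISE, DATA = 0.0, 1.0  # toy endpoints, chosen for illustration


def straight(x: float, t: float) -> float:
    # Straight-line path x(t) = NOISE + t * (DATA - NOISE): constant velocity.
    return DATA - NOISE


def curved(x: float, t: float) -> float:
    # Curved path x(t) = NOISE + t**2 * (DATA - NOISE): velocity grows with t.
    return 2.0 * t * (DATA - NOISE)


print(euler_integrate(NOISE, straight, steps=1))  # reaches DATA in one step
print(euler_integrate(NOISE, curved, steps=1))    # one step does not move at all
print(euler_integrate(NOISE, curved, steps=4))    # four steps still fall short
```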

Try it

Hugging Face Spaces give developers the ability to experiment and try new models before deploying them. Test out a few prompts here: 

https://black-forest-labs-flux-1-schnell.hf.space then when you are ready, deploy the model in Microsoft Foundry.

Stability AI: stable-diffusion-xl-base-1.0

Figure 2. Architectural diagram available here: stabilityai/stable-diffusion-xl-base-1.0 · Hugging Face

Model Specs

  • Parameters / size: 2.6B UNet (≈3.5B total with text encoders)
  • Resolution: 1024×1024 native
  • Primary task: Text-to-image generation

Why it's interesting (Spotlight)

  • Dual text encoder design and an ensemble-of-experts pipeline: SDXL uses two pretrained text encoders—OpenCLIP-ViT/G and CLIP-ViT/L—concatenated to capture both broad semantic alignment and finer-grained token-level cues. It can be run standalone or paired with the SDXL refiner in an ensemble-of-experts pipeline where the base model handles early denoising and the refiner specializes in the final steps. See the SDXL report for the original training and architecture details.
  • CreativeML Open RAIL++-M licensing for managed deployments: SDXL is distributed under the CreativeML Open RAIL++-M license, which permits commercial use and downstream fine-tuning with documented use restrictions. 

Try it

To go deeper on SDXL, take a look at Stability AI's generative-models GitHub repository, which implements the most popular diffusion frameworks for both training and inference and continues to expand with new capabilities like distillation. 

Getting started

You can deploy open-source Hugging Face models directly in Microsoft Foundry in two ways. The first is to browse the Hugging Face collection in the Foundry model catalog and deploy to managed endpoints in just a few clicks. The second is to go directly through the Hugging Face Hub: select any supported model, then choose "Deploy on Microsoft Foundry", which brings you straight into Azure. Learn how to discover and deploy models in the Microsoft Foundry documentation.


You Can Scale MCP Servers Behind a Load Balancer on App Service — Here's How

Most MCP servers in the wild are single-instance processes. That's fine when they're driving a local Claude or VS Code session — but it's the wrong shape for a production agent fleet that has to absorb traffic spikes, ride through deploys, and survive instance failures.

The good news: the MCP spec already grew up. The 2025-06-18 revision formalizes stateless HTTP transport (and the current 2025-11-25 revision keeps it), which means a single request carries everything the server needs to answer. No long-lived connection, no in-process session table, no sticky-session hacks to keep a client glued to one box.

That tiny protocol change unlocks something big: you can stick an MCP server behind App Service's built-in load balancer and scale it like any other web API. This post walks through how, with a runnable sample.

Sample: seligj95/app-service-mcp-stateless-scale-python. One azd up and you have a stateless FastAPI MCP server running on three App Service instances behind the platform load balancer, with a staging slot, Application Insights, and a k6 script that visualizes load distribution from the client side.

Why "stateless" is the whole story

Earlier MCP transports leaned on persistent connections — SSE channels and WebSocket-style sessions where the server held per-client state in memory (open tools, subscriptions, partial streams). That model is great for a local IDE talking to a local process. It's hostile to load balancing, because routing a follow-up request to a different instance breaks the session.

The stateless HTTP transport flips that. Each request is a complete JSON-RPC envelope (initialize, tools/list, tools/call), every response is self-contained, and the server is allowed to forget the client between requests. Any instance can serve any call. That is the property a load balancer needs.
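
To make "complete JSON-RPC envelope" concrete, here is a minimal sketch of the three request bodies a client might POST. The helper name and the id counter are assumptions for illustration; a real client's initialize request would also carry params such as the protocol version and client info, which are omitted here.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # illustrative request-id generator


def make_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build one self-contained JSON-RPC 2.0 request envelope."""
    envelope: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        envelope["params"] = params
    return envelope


requests_to_send = [
    make_request("initialize"),
    make_request("tools/list"),
    make_request("tools/call", {"name": "whoami", "arguments": {}}),
]

for req in requests_to_send:
    # Each body stands alone, so any instance could answer any of them.
    print(json.dumps(req))
```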

In the sample, every tool is a pure function of its arguments — whoami reports the serving instance, lookup_fact reads a static dictionary, compute_primes runs a sieve. None of them touches per-client memory. That's not a constraint of the protocol; it's a discipline you adopt to keep statelessness intact.
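
As an illustration of "a pure function of its arguments", a prime sieve might look like the sketch below. The signature and return shape are assumptions, not the sample's actual compute_primes (which is async and may differ); the point is only that the result depends on nothing but the input.

```python
from typing import Any, Dict


def compute_primes(limit: int) -> Dict[str, Any]:
    """Sieve of Eratosthenes: primes up to and including `limit`.

    No module-level state is read or written, so any instance returns
    the same answer for the same argument.
    """
    if limit < 2:
        return {"limit": limit, "count": 0, "primes": []}

    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    n = 2
    while n * n <= limit:
        if is_prime[n]:
            # Cross out multiples starting at n*n; smaller ones are already marked.
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
        n += 1

    primes = [i for i, flag in enumerate(is_prime) if flag]
    return {"limit": limit, "count": len(primes), "primes": primes}


print(compute_primes(30))
```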

Why App Service, and not Functions or AKS

A few defaults made App Service the right home for a scaled MCP server:

  • Always On. Reasoning tools call into LLMs and external APIs; latencies routinely sit in the multi-second range. On the Consumption plan, Functions caps a single execution at ten minutes (five by default) and aggressively scales workers to zero between bursts, which kills warm caches. App Service keeps the process resident.
  • Horizontal scale is one parameter. Pick a Premium SKU, set the plan's capacity to N, and you have N instances behind a managed load balancer. No VMSS to declare, no ingress controller to wire up, no Service to reconcile.
  • Deployment slots. Swap a warmed-up staging slot into production for zero-downtime deploys. Critical when your "API" is an LLM tool surface that an agent is actively driving.
  • Easy Auth. OAuth 2.1 in front of the MCP endpoint without writing the flow yourself — turn on the App Service authentication blade and point it at Entra ID. The sample leaves this off so the deploy is one command, but the wiring is a checkbox away.

The TL;DR: it's PaaS that already knows how to run a stateless long-lived process at horizontal scale, which is exactly the shape of a scaled MCP server.

The FastAPI MCP server, end-to-end stateless

The whole transport is one POST handler. The full source is in main.py, but here are the load-bearing pieces:

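import json

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/mcp")
async def mcp_endpoint(request: Request):
    # Each request is a complete JSON-RPC envelope; nothing is read from earlier calls.
    body = await request.json()
    method = body.get("method", "")
    msg_id = body.get("id")

    if method == "initialize":
        return {"jsonrpc": "2.0", "id": msg_id, "result": _server_info()}

    if method == "tools/list":
        return {"jsonrpc": "2.0", "id": msg_id, "result": {"tools": [...]}}

    if method == "tools/call":
        params = body.get("params", {})
        # MCP_TOOLS and _server_info are defined elsewhere in main.py.
        result = await MCP_TOOLS[params["name"]]["function"](**params.get("arguments", {}))
        return {
            "jsonrpc": "2.0",
            "id": msg_id,
            "result": {"content": [{"type": "text", "text": json.dumps(result)}]},
        }

    # Not part of the original excerpt: without a fallback, unknown methods return null.
    return {
        "jsonrpc": "2.0",
        "id": msg_id,
        "error": {"code": -32601, "message": f"Method not found: {method}"},
    }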

There is no session table. There is no client_id cookie. There is no AsyncIterator held open between requests. initialize, tools/list, and tools/call all return in a single round trip, which is the shape App Service's load balancer expects.

The most useful debugging tool in the sample is whoami:

async def tool_whoami() -> Dict[str, Any]:
    return {
        "instance_id": os.environ.get("WEBSITE_INSTANCE_ID", "local"),
        "hostname": socket.gethostname(),
        ...
    }

WEBSITE_INSTANCE_ID is unique per App Service worker. Call whoami a few times from your MCP client and the value rotates — that's the load balancer working. If it doesn't rotate, something is pinning your traffic (almost always the ARR Affinity cookie; we'll get there).
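
One way to turn "call whoami a few times" into a check is to tally the instance_id values you get back. The sketch below works on a plain list of IDs you have already collected; the function name and the sample IDs are hypothetical, and it makes no network calls.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_instances(instance_ids: Iterable[str]) -> Dict[str, Any]:
    """Tally whoami results and flag traffic that never leaves one instance."""
    hits = Counter(instance_ids)
    total = sum(hits.values())
    return {
        "hits": dict(hits),
        "distinct": len(hits),
        # Several calls, one instance: something is pinning the client.
        "pinned": total > 1 and len(hits) == 1,
    }


rotating = summarize_instances(["a1", "b2", "a1", "c3"])  # made-up IDs
stuck = summarize_instances(["a1", "a1", "a1"])
print(rotating)
print(stuck)
```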

The Bicep that actually makes it scale

The infra is a P0v3 plan with capacity: 3, a web app with affinity disabled, and a staging slot on the same plan:

resource appServicePlan 'Microsoft.Web/serverfarms@2024-04-01' = {
  name: name
  sku: {
    name: 'P0v3'
    capacity: instanceCount   // 3 by default
  }
  properties: { reserved: true }
}

resource web 'Microsoft.Web/sites@2024-04-01' = {
  name: name
  properties: {
    serverFarmId: appServicePlanId
    httpsOnly: true
    clientAffinityEnabled: false   // ← the one line that matters
    siteConfig: {
      linuxFxVersion: 'PYTHON|3.11'
      alwaysOn: true
      healthCheckPath: '/health'
      appCommandLine: 'python -m uvicorn main:app --host 0.0.0.0 --port 8000'
    }
  }
}

resource staging 'Microsoft.Web/sites/slots@2024-04-01' = {
  parent: web
  name: 'staging'
  properties: { /* same shape — separate hostname, same plan */ }
}

The single most important line in that template is clientAffinityEnabled: false. App Service defaults to on, which sets the ARRAffinity cookie and pins every subsequent request from a given client to the instance that handled the first one. That default exists because legacy ASP.NET apps used in-process session state. Stateless MCP does not. Leaving affinity on silently undoes everything we just built.
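
The effect of the affinity cookie can be shown with a toy model of a single client. This sketch assumes a plain round-robin rotation and one cookie slot; it does not model App Service's actual balancing algorithm, and the instance names are made up.

```python
from collections import Counter
from itertools import cycle
from typing import List, Optional


def route(requests: int, instances: List[str], affinity: bool) -> Counter:
    """Simulate one client sending `requests` calls through a toy balancer."""
    rotation = cycle(instances)
    cookie: Optional[str] = None  # stands in for the ARRAffinity cookie
    hits: Counter = Counter()
    for _ in range(requests):
        if affinity and cookie is not None:
            target = cookie            # pinned to the first instance seen
        else:
            target = next(rotation)    # balancer picks the next instance
            if affinity:
                cookie = target
        hits[target] += 1
    return hits


print(route(9, ["i0", "i1", "i2"], affinity=False))
print(route(9, ["i0", "i1", "i2"], affinity=True))
```

With affinity off, the nine calls split three ways; with it on, every call after the first follows the cookie to the same instance.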

Premium v3 (P0v3) is the floor for two reasons: it gives Always On and unlocks deployment slots. Below that tier you don't get either.

Application Insights without writing telemetry code

The sample drops one line of bootstrap into main.py:

from azure.monitor.opentelemetry import configure_azure_monitor

if os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING"):
    configure_azure_monitor(logger_name="mcp")

The Azure Monitor OpenTelemetry distro auto-instruments FastAPI and outbound HTTP. Every request span App Service emits is tagged with cloud_RoleInstance, which Application Insights populates from WEBSITE_INSTANCE_ID. That makes the question "is traffic actually spreading across my instances?" a one-liner in Logs:

requests
| where timestamp > ago(15m)
| where name contains "/mcp"
| summarize count() by cloud_RoleInstance
| order by count_ desc

If you see three roughly-equal rows, you're done. If you see one row, your client is sending ARRAffinity cookies — turn affinity off and redeploy.

Deploy

azd auth login
azd up

That provisions the resource group, plan, web app, staging slot, Log Analytics workspace, and Application Insights resource, then deploys the Python app via Oryx. The output prints both WEB_URI and WEB_STAGING_URI. Open the production URI — the home page renders the instance ID that served it. Refresh. The ID changes.

To swap the staging slot into production with no downtime:

az webapp deployment slot swap \
  --resource-group <rg> --name <app> \
  --slot staging --target-slot production

App Service warms the staging instances, redirects traffic, and the old production becomes the new staging — the classic blue-green pattern, but free.

Prove it scales

The sample ships a k6 script that hammers /mcp with tools/call requests and tags every response with the instance_id the server returned:

BASE_URL=https://<your-app>.azurewebsites.net \
  k6 run --summary-export=summary.json loadtest/k6-mcp.js
jq '.metrics.mcp_instance_hits.values' summary.json

The output groups hits per instance tag. On a three-instance plan with a 60-second steady load you should see something close to:

{
  "count": 1842,
  "instance0d3e2f...": 614,
  "instance7a91bc...": 612,
  "instance19f0c4...": 616
}

Roughly 33% on each box — the App Service load balancer round-robining new connections, with no help from the application.
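
The percentages follow directly from the counts in the output above. A small sketch, assuming the summary has the shape shown (a "count" key for the total, plus one key per instance tag); the function name is hypothetical and the instance keys are copied as printed, elisions included:

```python
from typing import Dict


def instance_shares(values: Dict[str, int]) -> Dict[str, float]:
    """Convert per-instance hit counts into percentages of the total."""
    total = values["count"]
    return {
        tag: round(100 * hits / total, 1)
        for tag, hits in values.items()
        if tag != "count"
    }


summary = {
    "count": 1842,
    "instance0d3e2f...": 614,
    "instance7a91bc...": 612,
    "instance19f0c4...": 616,
}
shares = instance_shares(summary)
print(shares)
```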

What I'd do next

The sample is intentionally a starting point. Two extensions are the obvious next moves:

  1. Add Easy Auth. Turn on App Service authentication, pick Entra ID, require auth on /mcp. The token surfaces as headers; your tool handlers can use it to identify the calling agent without you owning any of the OAuth machinery.
  2. Autoscale on CPU. instanceCount: 3 is a starting point. Wire up Microsoft.Insights/autoscalesettings against the plan and let it scale 3 → 10 on the prime-counting tool. The architecture already supports it — that's the whole point of stateless.

Try it

If you ship something with it, I'd love to hear how it held up.
