Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Research: "What's the Default Language of an LLM?"


Chad Fowler did an interesting study and posted about it to LinkedIn, in which he asked the question, "If I ask Claude / GPT / Gemini for 'a script that...' or 'a small web app for...', what am I going to get back?" I thought, "What about local LLMs? Does that change the conversation at all?"

First off, his original LinkedIn post is here, just to give credit where credit is due. Fortunately, he also put together a nice little test harness up on GitHub, which I was able to fork. I encourage readers to go look at either repository to understand the project code and methodology before continuing.

Local changes

The code required a few changes to run locally:

  • Modify the models.yaml file (which contains the list of models to run the prompt against). The original had a list of cloud models and providers, so it wasn't too hard to add a list of locally hosted models and URLs. There's one small mismatch: the code expects an environment variable (OPENAI_API_KEY) that's used as part of the API calls, so in order to run locally I had to have some kind of value there (a la export OPENAI_API_KEY=foobar in the shell before running). A longer-term fix would probably be to check whether the key is provided and, if not, simply skip it and see if the call fails.

  • The original was using a second call to a cloud model to "judge" the returned LLM result, in order to determine what language the LLM had used to generate the code. Since I was running everything locally, I needed to modify the code to use a local LLM. Rather than switch models to match what was being used (or deliberately a different model than what was being used), I just chose a model and hard-coded it.

  • I also added an extract.py script that takes the JSONL file and turns each row into a standalone file in a peer extractions directory. This turned out to be necessary because I was getting some very weird results from the glm-4.7-flash model--more on this later. The extract script works a lot like the report script: it takes the JSONL and extracts the data into standalone files, one for each row.
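A minimal sketch of the longer-term fix from the first bullet, written as a variation: instead of skipping the key entirely, it falls back to a placeholder when the target is a local host. This is not the harness's actual code; resolve_api_key, is_local_url, and the "not-needed" placeholder are hypothetical names, and it assumes a local OpenAI-compatible server ignores whatever key it receives.

```python
import os
from urllib.parse import urlparse

# Hypothetical placeholder value; assumed to be ignored by local servers.
PLACEHOLDER_KEY = "not-needed"
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_url(base_url):
    """Return True when the URL points at this machine."""
    return urlparse(base_url).hostname in LOCAL_HOSTS


def resolve_api_key(base_url, env=None):
    """Pick the key to send: the real one if set, a placeholder for local hosts."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if key:
        return key
    if is_local_url(base_url):
        return PLACEHOLDER_KEY
    # A remote provider with no key would fail anyway; fail early and clearly.
    raise RuntimeError("OPENAI_API_KEY is required for non-local providers")
```

With something like this in place, the export OPENAI_API_KEY=foobar step would no longer be needed for local-only runs.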

Results

In my initial run, I used qwen-3.6, qwen3-coder, gpt-oss, gemma4, and glm-4.7-flash, and while most of the time the results aligned pretty closely with Chad's original results, the glm-4.7-flash model really choked hard.

Like, 48 none results, hard.

The rest of the models behaved somewhat similarly to what Chad found in his work: Lots of preference for Python when the context of the problem didn't strongly suggest (if not outright enforce) something else.
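Counts like the "48 none results" above can be produced with a quick tally over the results file. The sketch below assumes each JSONL row carries "model" and "language" fields; those field names are guesses for illustration, not the harness's confirmed schema.

```python
import json
from collections import Counter, defaultdict


def tally_languages(lines):
    """Count judged languages per model from an iterable of JSONL lines."""
    counts = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        # "model" and "language" are assumed field names; a missing or
        # empty language is counted as "none".
        language = row.get("language") or "none"
        counts[row.get("model", "unknown")][language] += 1
    return counts
```

Passing an open file handle works as-is, since iterating a text file yields its lines: tally_languages(open("runs.jsonl", encoding="utf-8")).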

But the glm-4.7-flash failures were curious: most of the time, it was exceptionally verbose and its output spilled over into a second response, which was actually the classifier-judge request. For example, with the cli-dir-size task, which gemma4 completed in about 70 lines of response, the glm-4.7-flash model used over 6k lines no fewer than four times, and in some cases it got to a workable solution and then talked itself right out of it. I have zero idea why that would be the case, but it was a common problem. We can see this when running the python3 -m whichlang.extract script, which breaks the JSONL out into separate files for easier comparison.
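For readers who want the shape of that extraction step without opening the repository, here is a rough sketch. It is not the actual extract.py: the "model", "task", and "response" field names and the filename pattern are assumptions made for illustration.

```python
import json
import re
from pathlib import Path


def safe_name(text):
    """Replace characters that are awkward in filenames (such as ':')."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", str(text))


def extract_rows(jsonl_path, out_dir):
    """Write each JSONL row's response to its own file; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(jsonl_path, encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            if not line.strip():
                continue  # tolerate blank lines
            row = json.loads(line)
            # Assumed field names; adjust to the real schema.
            model = safe_name(row.get("model", "unknown"))
            task = safe_name(row.get("task", "unknown"))
            path = out_dir / f"{index:04d}-{model}-{task}.txt"
            path.write_text(row.get("response", ""), encoding="utf-8")
            written.append(path)
    return written
```

One file per row makes it easy to diff a 70-line answer against a 6k-line one in an ordinary editor.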

Now, I can't say for certain that the problem was with the model, since it could very well have been something I did wrong in the Ollama setup/configuration, but I couldn't say exactly what that would be. Asking Ollama for its model configuration, I got:

tedneward@Teds-MBP-16 Research-whichlang % ollama show glm-4.7-flash
  Model
    architecture        glm4moelite    
    parameters          29.9B          
    context length      202752         
    embedding length    2048           
    quantization        Q4_K_M         
    requires            0.15.0         

  Capabilities
    completion    
    tools         
    thinking      

  Parameters
    temperature    1    

  License
    MIT License                        
    Copyright (c) [year] [fullname]    
    ...                                

... which seems fine, but...? Certainly its context length and embedding length looked reasonable, and I did nothing to change any of the configuration after the ollama pull, but glm-4.7-flash consistently failed like this over several runs.
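One way to probe whether configuration plays a role is to override options per request instead of relying on the model defaults. The sketch below builds a request for Ollama's /api/generate endpoint with explicit temperature, num_ctx, and num_predict options. The specific values are arbitrary assumptions, and whether any of them changes the glm-4.7-flash behavior is untested.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, temperature=0.2, num_ctx=8192, num_predict=2048):
    """Build a non-streaming generate payload with explicit option overrides."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": temperature,  # lower than the model default of 1
            "num_ctx": num_ctx,          # context window for this request
            "num_predict": num_predict,  # hard cap on generated tokens
        },
    }


def generate(payload, url=OLLAMA_URL, timeout=600):
    """POST the payload to a running Ollama server and return the text."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Capping num_predict would at least bound the runaway 6k-line responses, though it treats the symptom and not the cause.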

Conclusions

In and of themselves, my modifications to Chad's experiment were pretty minor and incremental at best; the only real "value-add" was the added data in the runs.jsonl results. For the most part, what I think of as the "standard" local coding models (gemma4, gpt-oss, and the various qwen3 models) all did pretty well, well enough that I consider them to be on par with what the cloud models would create for a bunch of these sorts of tasks. I think the glm-4.7-flash model is stronger than this experiment suggests, but it may need some kind of tuning or better harnessing to avoid what appeared to be getting caught in a "dead-end" loop.

If anything, my personal "big win" is the tasks.yaml file, which I plan to use as a harness for some of my other experiments, most notably the one I was working on before Chad distracted me, around the various permutations of "skills" files that we see across the industry. They seem like a nice collection of tasks to feed to OpenCode and capture the results.

One last thing: When Chad and I were DM'ing about this experiment, one thing that became very apparent is how much he is hoping this experiment can serve as an ongoing, "live" experiment to which others can contribute and improve. I heartily second that emotion--like Chad, I'm putting all this out into the public space so that people can take it and run with it, maybe adding new models (cloud or local) and/or new tasks, or even just running the experiment with different parameters (temperature, context lengths, whatever). The more data we can get that shows different behavior of the models, the more we collectively as an industry can get a handle on exactly what and how these models can help us.

And in the end, isn't that what these things are supposed to be doing? Helping us, I mean?


How to Fine-Tune Nemotron 3.5 ASR for Your Language, Domain, or Accent


Accelerate Edge AI Development with Foundry Local


Why edge AI development is still hard 

AI is no longer confined to cloud experiments. Developers are increasingly expected to deliver AI inside apps, devices, and edge systems where responsiveness, privacy, resilience, and local control are essential. But building those experiences for production is still difficult. 

Teams often have to solve model packaging, runtime fragmentation, hardware differences, and deployment complexity before they can ship a single reliable feature. That slows iteration and makes it harder to move from prototype to product. 

At Microsoft Build 2026, we’re announcing updates across Foundry Local and Foundry Local on Azure Local that help developers build once and run AI closer to where data is created and decisions are made. These updates expand platform support, improve control over inference and acceleration, add new on-device APIs, and simplify deployment across disconnected, regulated, and sovereign environments.

 

What’s new in Foundry Local 

The latest Foundry Local updates focus on the areas developers care about most: broader platform reach, familiar APIs, better runtime control, and simpler access to hardware acceleration. Together, these improvements help teams move faster from experimentation to production on AI PCs, edge devices, and enterprise infrastructure. 

 

Foundry Local 

Last month we announced the 1.1.0 release of Foundry Local (Foundry Local 1.1: Live Transcription, Embeddings, and Responses API | Microsoft Foundry Blog), Microsoft’s cross-platform local AI solution that lets developers bring AI directly into their applications with no cloud dependency, no network latency, and no per-token costs.

The 1.1.0 release added: 

  • Live audio transcription for real-time speech-to-text scenarios like captioning, voice UIs, and meeting transcription. 
  • Text embeddings for semantic search, RAG, clustering, and similarity matching use cases. 
  • Responses API support for structured agentic interactions, including tool calling and multimodal vision-language input. 
  • WebGPU execution provider plugin delivered separately to reduce the default package size for applications that don’t need it. 
  • Reduced JavaScript package size by replacing the koffi FFI layer with a custom Node-API C addon. 
  • Broader .NET compatibility by targeting lower framework versions in the C# SDK. 

Today we are announcing the 1.2.0 release of Foundry Local, which expands language support in the Live Transcription API, broadens device support on Linux, improves cancellation and execution provider workflows, adds new on-device API options, and strengthens the Windows acceleration story with Windows ML (WinML) 2.0.

What’s new in 1.2.0 

  • Multilingual ASR: Last month we included support for real-time speech-to-text streaming directly from a microphone. We identified NVIDIA’s Nemotron Speech Streaming as the strongest candidate for real-time English streaming on resource-constrained hardware (for further details, read: https://arxiv.org/pdf/2604.14493). Today we are happy to announce that Foundry Local 1.2.0 goes multilingual with support for 40+ languages via the latest Nemotron 3.5 ASR Streaming Multilingual model. Try out: https://github.com/microsoft/Foundry-Local/tree/main/samples/python/live-audio-transcription 

 

from foundry_local_sdk import Configuration, FoundryLocalManager

config = Configuration(app_name="my_app")

FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

model = manager.catalog.get_model(
    "nvidia-nemotron-3.5-asr-streaming-multilingual-0.6b"
)

model.download()
model.load()

session = model.get_audio_client().create_live_transcription_session()
session.settings.sample_rate = 16000
session.settings.channels = 1
session.settings.language = "auto"   # or "de", "zh-CN", "en", ...

session.start()
session.append(pcm_bytes)            # push audio chunks from a mic/file
for result in session.get_stream():
    print(result.content[0].text)    # clean text, inline language tags stripped
session.stop()

 

 

  • Faster model downloads via cross-region catalog: Foundry Local now fronts the model catalog with Azure Traffic Manager, routing each user to the best-performing region, so end users see noticeably faster first-run model downloads. No code changes required — developers just need to bump to the v1.2.0 SDK. 
  • Download and EP cancellation across all 5 SDKs: Cancel model and execution-provider downloads from C#, Python, JavaScript, Rust, and C++ using each language’s native cancellation pattern. Try out: https://github.com/microsoft/Foundry-Local/blob/main/README.md 
  • Inference cancellation: Cancel in-flight chat completions and transcription sessions cleanly when users move on, without wasted compute or orphaned streams. Try out: https://github.com/microsoft/Foundry-Local/blob/main/README.md 
  • Per-EP download progress in Python: Surface per-provider download progress in Python instead of a generic spinner. Try out: https://github.com/microsoft/Foundry-Local/tree/main/sdk/python 
  • Upgraded to Windows ML (WinML) 2.0: The Foundry Local WinML packages now ship with the latest WinML 2.0, removing the previous Windows App SDK runtime dependency and bootstrap step so Python, JavaScript, Rust, and C++ apps get NPU and GPU acceleration with no extra installation or initialization code. Try out: https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/overview 
  • WebGPU execution provider for WinML: Expand GPU acceleration coverage across more Windows hardware with the new WebGPU execution provider for the WinML SDK. Try out: https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/overview 
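To make the cancellation bullets above concrete, here is a generic sketch of cooperative cancellation in Python using threading.Event. It does not use the Foundry Local SDK, and its names (fake_token_stream, consume) are hypothetical stand-ins; the SDK's actual cancellation API may look quite different, so treat this only as an illustration of the pattern.

```python
import threading


def fake_token_stream():
    """Stand-in for a streaming inference call; yields tokens one at a time."""
    for token in ["Local", " AI", " runs", " on", " device"]:
        yield token


def consume(stream, cancel_event, on_token):
    """Collect tokens until the stream ends or cancellation is requested."""
    received = []
    for token in stream:
        if cancel_event.is_set():
            break  # the user moved on; stop pulling tokens
        received.append(token)
        on_token(token)
    return received


cancel = threading.Event()
# Simulate the user cancelling right after the second token arrives.
tokens = consume(
    fake_token_stream(),
    cancel,
    lambda token: cancel.set() if token == " AI" else None,
)
print(tokens)  # ['Local', ' AI']
```

In a real app the event would typically be set from a UI thread or a key handler while the consumer loop runs elsewhere.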

 

Foundry Local in action: voice input in GitHub Copilot CLI 

The GitHub Copilot CLI’s voice input is built on Foundry Local. When you dictate a prompt in the terminal, audio is captured from your mic, streamed into a Foundry Local live transcription session running the Nemotron ASR Streaming model, and the partial + final results are piped straight into the CLI’s input buffer — all on-device, no cloud hop, no audio leaving the machine. 

To enable it, run /voice on; you can then speak to Copilot CLI by holding Space (or press Ctrl+k v to toggle):

GitHub Copilot Voice powered by Foundry Local image

There is no private API or custom integration here. The CLI uses the same create_live_transcription_session() entry point shown in the snippet above, with the same sample_rate / channels / language="auto" settings, the same append(pcm_bytes) push model, and the same get_stream() iterator. Cancellation when you hit Esc mid-utterance uses the new 1.2.0 inference cancellation path. If you have the Copilot CLI installed, run a few prompts with voice and look at:

  • End-to-end latency from speech to token: that’s your floor for what a streaming-ASR UX feels like on the user’s hardware.
  • Quality: in our internal testing, the model delivers roughly 8% word error rate (WER).
  • Resource usage while transcribing: the model uses a low single-digit percentage of CPU.
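For context on the quality figure, word error rate is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below implements that with a word-level edit distance. It is a generic illustration with a made-up example sentence, not the evaluation code or data behind the internal number.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)


reference = "please summarize the notes from this morning and email them to everyone"
hypothesis = "please summarize the notes from this morning and email him to everyone"
print(f"{word_error_rate(reference, hypothesis):.1%}")  # 8.3%
```

One wrong word in a twelve-word sentence is about 8%, which gives a feel for what that error rate means in practice.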

 

If the behavior works for your use case, you can reproduce it in your own app in a few lines using any of the five SDKs — no extra services to stand up, no per-minute transcription bill. 
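When trying the push model with a recorded file instead of a microphone, the audio has to be split into PCM chunks first. The sketch below uses only Python's standard wave module and is independent of the SDK. The pcm_chunks name and the 100 ms chunk size are assumptions, and it checks for 16 kHz mono audio to match the session settings shown in the earlier snippet.

```python
import wave


def pcm_chunks(source, chunk_ms=100):
    """Yield raw PCM byte chunks of about chunk_ms milliseconds from a WAV.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != 16000 or wav.getnchannels() != 1:
            raise ValueError("expected 16 kHz mono audio to match the session settings")
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break  # end of file
            yield chunk
```

Each chunk could then be handed to session.append(chunk) from the snippet above, in place of the single pcm_bytes call.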

How developers are using Foundry Local 

Foundry Local is already being used across privacy-sensitive, performance-sensitive, and hardware-diverse scenarios. From local assistants and document workflows to multimodal context collection and enterprise AI pipelines, developers are using it to reduce platform complexity and deliver production-ready AI experiences faster. 

  

  

Privacy-first and secure local AI

Across consumer apps and enterprise workflows, developers are using Foundry Local to keep sensitive data closer to the device while delivering faster, more responsive AI experiences.

Foxit PDF Editor AI Assistant

Foxit uses Foundry Local to bring secure, local AI into document workflows such as question answering, summarization, translation, and document understanding. The result is a more practical path to on-device AI that helps keep sensitive information closer to the user while simplifying deployment at scale.

“Foundry Local gives us a practical way to bring powerful AI experiences directly into PDF workflows while keeping sensitive data closer to the user. Just as importantly, its managed local model approach helps simplify deployment, improve reliability, and reduce the operational burden of delivering on-device AI at scale.” – Queena Wei, SVP of Product at Foxit

 

Raycast

Raycast uses Foundry Local to make privacy-first, on-device AI more accessible to end users. By simplifying model discovery and local interaction, it helps bring local AI into everyday workflows with less friction.

“The integration of Foundry Local into Raycast gives our users the perfect option for privacy-first local AI. With it, they can easily leverage a variety of powerful models optimized for their Windows devices. Foundry Local made it super easy for us to implement the first step, a platform to browse and install models and a quick chat interface to use them, no internet required.” – Thomas Paul Mann, CEO & Founder at Raycast

 

Rakuten

Rakuten uses Foundry Local to bring responsive, privacy-sensitive AI experiences directly onto the device while balancing local responsiveness with broader cloud-connected capabilities. The result is a hybrid experience that feels more natural to end users while improving efficiency behind the scenes.

“Through our partnership with HP, Rakuten AI for Desktop uses Foundry Local to bring AI closer to the user — running responsive, privacy-sensitive experiences directly on the device while reducing cloud inference costs. Combined with Rakuten AI’s cloud intelligence and ecosystem integrations, this enables a hybrid AI experience that feels native to the desktop and scales efficiently for more advanced tasks.” – Vasanth Raju, Head of AI Product at Rakuten Group

 

PhonePe

PhonePe uses Foundry Local to power AI-driven transaction insights in its digital payments app with strong data protection. This helps deliver more responsive, privacy-conscious AI experiences without requiring personal financial information to leave the device.

 

Liquid AI’s ShieldFlow

ShieldFlow is an on-device privacy layer that redacts sensitive data and prevents prompt injection before any prompt leaves the device. Through Foundry Local, ShieldFlow runs efficiently on CPUs on every Windows device, including AI PCs, and enterprises can pull customized Liquid Foundation Models (LFMs) tuned to their own policies and roll them out across their Windows fleet through a single managed runtime.

 

 

Hardware portability and cross-device optimization

For teams building across different chips and execution environments, Foundry Local helps reduce hardware-specific complexity and accelerate deployment across devices.

Cephable

Cephable is a private AI assistant that runs entirely on device, enabling voice control, dictation, content generation, and task automation across apps. With Foundry Local, Cephable’s AI features run faster, support more models across NPU, GPU, and CPU, and let the team focus on building the assistant instead of managing silicon-specific optimizations.

“Since shifting from our custom inferencing implementation to Foundry Local, our engineers have been able to ship core features faster. We’re saving dozens of hours on optimizing models and managing build pipelines to handle the right acceleration in the right version of our app package. This directly leads to a better user experience and more choice for our users.” – Cordellia Yokum, Director and Principal Architect at Cephable

 

FlowyAIPC

FlowyAIPC builds an intelligent assistant for the era of heterogeneous AIPC silicon. FlowyAIPC integrates Foundry Local and Windows ML to solve the fundamental challenge of model-hardware decoupling across Intel, AMD, Qualcomm, and NVIDIA chips spanning CPU, NPU, iGPU, and dGPU.

“By leveraging Foundry Local’s automatic hardware detection and execution-provider abstraction, FlowyAIPC dynamically routes AI workloads to the optimal compute unit without user intervention: lightweight inference and sustained background tasks tap the NPU for power efficiency, while demanding generative workloads seamlessly spill to the GPU or CPU.” – Guoliang QI, CEO at StarwaveAI

 

AnythingLLM

AnythingLLM is a local-first, zero-configuration AI desktop application that allows enterprises to run LLMs completely on-device. Instead of maintaining separate runtimes for each hardware configuration, AnythingLLM uses Foundry Local to deliver on-device AI across a broad range of silicon platforms.

“With the rapid pace of AI software, maintaining custom runtimes for every specialized NPU and hardware configuration on the market creates a massive development bottleneck. The Foundry Local SDK helps us solve this by providing optimized, hardware-level, vendor agnostic performance out of the box, allowing us to deliver a consistent and secure local AI experience to our Windows users globally without the engineering overhead.” – Timothy Carambat, Founder & CEO at AnythingLLM

 

LUCI Desktop by Memories.ai

Memories.ai uses Foundry Local to run multimodal models efficiently across Qualcomm, Intel, and AMD devices in LUCI Desktop, which provides an on-device context layer for PCs. That portability helps the team scale on-device research and multimodal workflows without extensive per-chip optimization.

“Foundry Local SDK took the silicon-portability problem off our plate — one SDK, simple APIs, and our multimodal models run efficiently across Qualcomm, Intel, and AMD without weeks of per-chip optimization. It lets us scale our on-device research globally on day one and keeps our team focused on the harder problems above the silicon layer.” – Shawn Shen, CEO at Memories.ai

 

Model HQ by LLMWare

Model HQ enables enterprise teams to build and run RAG pipelines and multi-step agents locally on AI PCs and private servers using a no-code interface. By integrating Foundry Local, Model HQ enables fast, offline-capable AI experiences directly on Windows devices built on chips from AMD, Intel, Qualcomm and Nvidia.

“The Foundry Local SDK made it incredibly easy for us to integrate NPU-optimized local AI models directly into Model HQ and rapidly deliver high-performance on-device NPU inferencing with minimal engineering overhead. It has significantly accelerated our ability to fully leverage emerging NPU compute capabilities for fast, efficient, and power-optimized local AI experiences.” – Darren Oberst, Co-Founder at LLMWare

 

Taken together, these customer stories show what Foundry Local means for developers in practice: fewer runtime and hardware-specific hurdles, faster paths from prototype to production, and more control over how AI runs on real devices. Whether you’re building privacy-sensitive apps, deploying across diverse silicon, or operationalizing local RAG and agent workflows, Foundry Local helps you spend less time stitching infrastructure together and more time shipping experiences that work.

 

Foundry Local on Azure Local 

At Build, we’re also introducing Foundry Local on Azure Local in preview: a new on-premises AI platform for running models, agents, and tools at enterprise scale. 

Designed for organizations that seek control, compliance, and low-latency execution, Foundry Local on Azure Local runs as containerized Kubernetes workloads on Azure Local and is orchestrated through Azure Arc. It helps teams deploy consistently across edge, hybrid, and fully disconnected environments while keeping AI close to the data and operations that depend on it. 

Register to get access to the Foundry Local on Azure Local preview: https://aka.ms/FoundryLocalAzure_PreviewRequest

Here are some of the key preview capabilities announced today:

  • Custom MCP tools – Extend agents with custom tool servers using the Model Context Protocol (MCP) standard.
  • GitHub Enterprise Local – Build and deploy AI apps end to end on-premises with local repos, CI/CD pipelines, and integrated security scanning. https://aka.ms/GHEL
  • Azure Local for small form factor devices – Extend Azure Local to industrial PCs and ruggedized devices for manufacturing and retail edge deployments, with turnkey AI inference and Azure Arc-based device management. https://aka.ms/AzureSFF
  • Watch the demo: aka.ms/AzureSFFLaunchDemo

 

Early momentum is already visible across sovereign, industrial, and disconnected scenarios where organizations need AI to run reliably under strict operational and compliance constraints.

“In energy operations, AI needs to run where the work happens – at remote facilities, offshore platforms, and field locations where connectivity is often limited, and safety is paramount. Foundry Local on Azure Local gives us a path to bring AI-driven decision-making closer to our operational data, with the governance our industry demands. The ability to deploy and run AI workloads consistently across edge and field environments, even when disconnected, is critical as we advance Chevron’s vision for autonomous and intelligent operations.” – Ed Moore, OT Strategist and Distinguished Engineer at Chevron

 

Together, these capabilities help organizations support both sovereign AI requirements, such as data control and compliance, and industrial edge scenarios that depend on real-time, localized execution. 

 

Get started 

 

If you want to start building with Foundry Local, begin with the documentation and Edge AI for Beginners, explore the available samples, and test local inference in your own application workflow. From there, you can evaluate the right model, runtime, and hardware path for your scenario, whether you’re building for AI PCs, enterprise apps, edge devices, or disconnected environments.

 

If you’re following Microsoft Build 2026, these related sessions can help you go deeper into the announcements and developer scenarios supported by these releases: 

The post Accelerate Edge AI Development with Foundry Local appeared first on Microsoft Foundry Blog.


Foundry Toolkit for VS Code at //build: Hosted Agents End-to-End, a Smarter Toolbox, and More


We’re excited to share what’s new for Foundry Toolkit for Visual Studio Code at //build 2026. Since going generally available, the toolkit has kept moving fast, and this release is a big one. The headline: a complete, end-to-end Hosted Agent experience, so you can scaffold, run, deploy, and observe without ever leaving VS Code. On top of that, we’ve expanded the Toolbox with native enterprise integrations and shipped a wave of LangGraph samples so every developer has a clear path from idea to production. From your first prompt to a production-grade, observable agent, Foundry Toolkit meets you where you are.

Hosted Agents, End to End 

Building an agent is the easy part; getting it from a first draft to a production-grade, observable service is what matters. This release makes the full Hosted Agent lifecycle available in VS Code, and it follows the way you actually work — scaffold, run, deploy, observe.

Scaffold — start from a rich set of samples

Hosted Agent creation now opens with a refreshed scaffolding experience and a rich sample selection, so you start from a working, framework-appropriate template instead of a blank file. Creation is smarter, too: we now auto-select your subscription when there’s only one, gate tabs more clearly, and use tighter spacing for a cleaner setup flow.

New Hosted Agent scaffolding dialog with the rich sample picker open

Run (F5) — inspect as you build

Press F5 and your agent runs locally with the Agent Inspector, now aligned with the rest of the extension and featuring Copilot SDK visualization so you can follow each step as the agent executes. It’s the fastest loop from change to verification before anything leaves your machine.

Deploy — a new UX and new ways to ship

Different teams ship differently, so deployment got a refreshed UX and two new options for Hosted Agents: 

  • ZIP Code Deploy: Package your agent source as a ZIP and deploy it directly to Microsoft Foundry Agent Service. 
  • Bring-Your-Own-Image (BYOI): Already have a pre-built container in your own Azure Container Registry? Deploy straight from it. 
Hosted Agent deploy dialog showing "ZIP code deploy" and "Bring-your-own-image (ACR)" side by side.

Observe — know it works in production

Once deployed, the full observability story is now available: 

  • Hosted Agent Tracing: Inspect end-to-end traces of Hosted Agent invocations directly from VS Code — tool calls, delegation chains, and timing for real debugging instead of guesswork. 
  • Continuous Evaluation Settings: A new page to configure ongoing evaluation for deployed Hosted Agents, so quality is measured continuously — not just at ship time. 
  • Evaluations Node: One-click access to evaluation runs and results right from the Foundry project tree. 
Hosted Agent trace view showing a span tree of tool calls and timings.

A Smarter, More Connected Toolbox 

What it is, and why it matters 

A Toolbox is how your agent gets its capabilities — the curated set of tools, knowledge sources, and integrations it can call at runtime. Instead of hand-wiring each connection, you assemble a Toolbox once and your agent consumes it consistently across local runs and production. The result: agents that can act on real enterprise data and systems, with the connections managed in one place. 

From what to how: create, connect, consume 

  • Create: Start a new Toolbox from the Foundry Toolkit sidebar “Tools Catalog” and pick the capabilities your agent needs. 
  • Connect: Configure and wire in enterprise systems once through native, first-class connections, and use them for all your agents.
  • Consume: Reference the Toolbox from your Hosted Agent so its tools are available the moment the agent runs, locally (F5) and once deployed. 

New this release 

Building on that flow, the Toolbox is now richer and more enterprise-ready: 

  • WorkIQ as a Built-in Tool: A first-class WorkIQ experience powered by A2A connections — no MCP fallback required. End-to-end toolbox creation with WorkIQ works out of the box. 
  • Fabric IQ (OneLake Catalog) Integration: Connect your agents to Microsoft Fabric OneLake catalogs directly from the Toolbox. 
  • Toolbox Guardrails: Apply content-safety guardrails to your Toolbox for safer agent execution. 
  • Faster discovery: A new Toolbox Search Toggle and Agent Tool Multi-Select let you find and wire in multiple tools in a single action. 
Redesigned Tools Catalog, including WorkIQ and Fabric IQ tiles.

LangGraph Reaches Parity 

LangGraph developers, this one is for you. We’ve added five new Hosted Agent samples that bring LangGraph to full parity with the Agent Framework Responses learning path — so you get an equivalent, end-to-end walkthrough no matter which framework you prefer: 

  • MCP — tool loading from a remote MCP server (defaults to GitHub Copilot MCP) via MultiServerMCPClient. 
  • Workflows — a custom StateGraph chaining three specialized LLM nodes: slogan writer, legal reviewer, and formatter. 
  • Files — local filesystem tools plus the Foundry-Toolbox code_interpreter working over session-uploaded files. 
  • Human-in-the-Loop — a StateGraph that drafts a proposal and pauses for approval via langgraph.types.interrupt. 
  • Observability — GenAI OpenTelemetry tracing with enable_auto_tracing(); spans, metrics, and logs flow to Application Insights. 

We’ve also refreshed the existing bring-your-own LangGraph samples against the new hosting layer (chat with local tools, Foundry-managed Toolbox loading, and SSE-streamed multi-turn sessions backed by a MemorySaver checkpointer), so every sample reflects how Hosted Agents work today. 

Workflow visualization of the LangGraph human-in-the-loop sample paused at an approval node.

Polish Across the Board 

A release is more than headline features. This one also includes a redesigned Prompt Builder “Improve an Instruction” dialog for faster iteration, fixes for MCP toolbox tool icons, clearer ZIP-deploy error surfacing, and assorted Agent Builder and Playground regression fixes — the whole experience feels tighter end to end. 

Get Started Today 

Join the Community 

Share your projects, file issues, or suggest features on our GitHub repository. We can’t wait to see what you build. 

Welcome to the next chapter of AI development! 


The case for language clarity, with Iva Cheung


1191. This week, we talk to Iva Cheung, a plain language expert and editor who has helped shape Canada's accessibility standards. We look at what plain language actually means (it's more than just short words and simple sentences) and why it matters for healthcare, legal rights, and everyday communication. Then we explore cognitive load theory, the expertise reversal effect, and why user testing is the secret ingredient most writers skip.


Find more from Iva at IvaCheung.com.


🔗 Join the Grammar Girl Patreon.

🔗 Share your familect recording in Speakpipe or by leaving a voicemail at 833-214-GIRL (833-214-4475)

🔗 Watch my LinkedIn Learning writing courses.

🔗 Subscribe to the newsletter.

🔗 Take our advertising survey.

🔗 Get the edited transcript here.

🔗 Get Grammar Girl books.

| HOST: Mignon Fogarty

| Grammar Girl is part of the Quick and Dirty Tips podcast network.

  • Audio Engineer: Dan Feierabend
  • Director of Podcast: Holly Hutchings
  • Advertising Operations Specialist: Morgan Christianson
  • Marketing and Video: Nat Hoopes, Rebekah Sebastian
  • Podcast Associate: Maram Elnagheeb

| Theme music by Catherine Rannus.

| Grammar Girl Social Media: YouTube, TikTok, Facebook, Threads, Instagram, LinkedIn, Mastodon, Bluesky.


Hosted on Acast. See acast.com/privacy for more information.





Download audio: https://sphinx.acast.com/p/open/s/69c1476c007cdcf83fc0964b/e/6a1a245fdd90858af9382270/media.mp3

Announcing Blazorise 2.2 - Cetina

Blazorise 2.2, codenamed Cetina, is named after one of the most powerful rivers in Croatia, originating beneath the highest mountain in Croatia.