
Learn T-SQL With Erik: Indexed View Maintenance


Chapters

  • 00:00:00 – Introduction
  • 00:02:16 – Indexed Views Considered Harmful
  • 00:04:03 – Common Misconceptions About Indexed Views
  • 00:05:33 – Disallowed Features in Indexed Views
  • 00:08:44 – Performance Comparison Before and After

Full Transcript

Erik Darling here with Darling Data, and today’s video should be a fun one. We’re going to talk about making sure that indexed views don’t ruin your modifications, because they sure can under certain circumstances.

Down in the video description, you will see all sorts of helpful links, including a link to purchase the full course material. Remember, these are just tiny little crumbs of the full course material, which you can go get down in the video description below.

Or if you attend one of my upcoming advanced T-SQL pre-cons, you will also get the full course material for free 99. It’s amazing how that works.

You can also hire me for consulting. You can become a supporting member of the channel. You can buy me half of a New York City cappuccino every month if you’d like. Is it $4?

It’s not an incredible amount of money. And if you would like to ask me office hours questions, you can do that. And of course, if you are not feeling monetarily obligated towards me, which I understand many people just are very happy to just take things for free, you can like, subscribe, and tell a friend.

Fill a hole, fill a void in someone’s life that this channel would obviously, it would just complete them in ways that you can’t imagine. Or maybe you can imagine.

Maybe it just completed you. Maybe it just completed you in those same ways. Over on my GitHub repo, I have a free open source SQL Server performance monitoring tool. And it doesn’t suck.

It is all the stuff that I care about monitoring performance-wise on the SQL Server packaged up, given to you. You can point it at your SQL Servers. You can start getting great information about what’s going wrong with them performance-wise. Excuse me.

Spring is springing here, and now I’m starting to get a little allergic to things, so you’ll have to forgive my throat clearing on that one. There’s also a built-in MCP server.

So if you like the robots and you want to have the robots talk to your performance data and give you summaries or some analysis on them, maybe even give you some feedback on what you should do to fix it, you can do that. And they’ll do it, I mean, depending on how you call the robots, it might be free.

It might not be. I don’t know. All right. Tokens ain’t free, I guess. But anyway, this will be the last video that I record before going to… Jacksonville, Florida, May 1st for an advanced T-SQL pre-con.

You can still buy tickets. At least you better be able to still buy tickets, because it’s not for a couple days after this, so…

Or maybe… Actually, this one’s Thursday. The next day, Friday. Hey, you better hurry up before Jacksonville is gone. After I get back from Jacksonville, I will be on my way to Chicago, Illinois, May 7th and 8th for PASS On Tour East.

I will also be doing an advanced T-SQL pre-con there. Mmm. Many chances. After that, I will be at the lovely SQL Day in Poland, May 11th through 13th. And I’m flying right from Chicago to Poland.

It’s gonna be crazy times. Boy, I hope the weather’s nice. And in Poland, I know it might be hard for you to believe, but I will also have an advanced T-SQL pre-con there.

After that, I will be at Data Saturday Croatia with, believe it or not, an advanced T-SQL pre-con. And after that, I will be at PASS Data Community Summit, the westest of all the summits, in Seattle, Washington.

And there, well, I’m just gonna have to surprise you with what I’ll be doing there. It’s gonna be out of this world. Anyway, it is still April-ing outside.

Next week’s video will… Next week I will be debuting the May graphics. It’s gonna be wonderful and fantastic. You’re gonna be just as terrified of it as I am, I think.

Anyway, to SQL Server Management Studio. When most people think about indexed views, they rightfully think about all the stuff they can’t do with them.

And I sympathize with that because, man, so many times I’ve been like, oh, if only you could do this, if only you could do that, it sure would be nice. And I realize that all the air has gone out of the room as far as making indexed views more powerful because everyone’s like, well, you could just use batch mode.

And that’s true in a lot of cases. But it’s also not true in a lot of cases. Batch mode is not always better than a completely pre-aggregated set of data. So getting things like a MIN and a MAX in an indexed view, you can’t do it, and that sucks, right? This is lazy. But really, having those aggregations maintained somewhere can make read queries a lot faster, especially when you consider that on Standard Edition, batch mode is still terribly hobbled. Everything maxes out at a DOP of two. So if you need a DOP above two, like say four or eight or even six, you don’t get it. It just sucks, and it’s annoying, because you didn’t pay the friendship tax to Microsoft.

But indexed views have many challenges of their own. We’re going to talk about those a little bit more. I have another file queued up for next week’s stuff, where we’ll talk a little bit about indexed views in a slightly different way. Aside from the things that are disallowed in them, you may still want and need NOEXPAND hints when you query them, to keep SQL Server from expanding them into the underlying queries and ruining all the hard work you did to index that view. Just like filtered indexes and computed columns, you need to have some ANSI SET options lined up correctly so that you do not experience terrible errors or queries not matching to your rocket science query tuning efforts. So you’ve got to get this stuff lined up if you want them to work correctly.

Just to make things nice and compact here, we’ve got this view, which is not indexed yet, but it is set up to be indexed by having a SCHEMABINDING thingy here and a COUNT_BIG thingy here, and of course we have the correct grouping that we need to do here. This view already exists. The problem is this view still takes 15 entire seconds to run. We are not having a good time with this view. Look at that. Well, 14.2 seconds, close enough.

SQL Server is like, “merge join, that’s a good idea.” When is a merge join ever a good idea? Make that a hash join, make that a parallel hash join, and this thing would probably be about five seconds. Let’s try that. Let’s see what happens. Let’s do this and... come on, I’m clicking on you, listen to me. Why don’t you ever listen to me? Let’s do an OPTION. I don’t know if you can hear the sirens outside, but that’s another lovely side effect of spring: the weather gets nice and I open my windows, and New York’s like, “screw you, here’s some sirens in your YouTube video.”

All right, so let’s see the estimated plan. What do we get? Look at that, parallel hash join. Isn’t that a thing of beauty? Oh my word, it’s gorgeous, it’s wonderful. Let’s see what happens. Remember, the last one was 15, well, 14.2 seconds. Wow. Parallel hash join, 2.2 seconds. Actually, let’s go to the tape. I’ve been lied to by SSMS before. But yeah, okay, fine, 2.2 seconds elapsed. That is beautiful. Why would SQL Server pick a serial merge join plan when it could have had a beautiful parallel hash join plan? I don’t know. SQL Server, sometimes I want to migrate to Postgres when I see what you do.

But when we’re talking about trade-offs and query tuning and “should I do this or should I do that”, it’s our job to test these things. It’s our job to make sure that the changes that we’re effecting have positive effects on the workload as a whole. If we make one query a little bit faster but we completely ruin a whole bunch of other queries, we didn’t do a good job. So we can’t have that.

Let’s take a look right now at what an update to the Posts table currently looks like. I’m going to do a BEGIN TRANSACTION and a ROLLBACK, and in the middle we’re going to hit this little helper, a table-valued function called WhatsUpLocks. If you’re not familiar with this, it’s available on my GitHub repo with all my other grand stuff, so you can go get it there if you really want it. If we run this and we look at what happens when we update 100 rows, we get a few X locks. It’s not that big a deal. The execution plan is pretty simple. Well, for the update it’s pretty simple. For WhatsUpLocks, it’s clearly a complete disaster. Well, not a disaster, but it is kind of a nightmare. But here we have this thing: we seek into an index and we do our update. Everything is just fine and dandy.

And even if we update 28,000 rows, if we hit Jon Skeet, well, it’s like 27,900 and something, this thing does sort of lock the entire table. But there are no other competing locks, so the lock escalation there is not really all that unexpected for updating that many rows. The execution plan does change a bit, but it’s still a pretty efficient plan for updating 27,900 and something rows. You can’t be too angry at that. That’s not so bad.

But now let’s come back to our view here, and let’s create a unique clustered index on our indexed view. This takes a second to create. It’s like creating any other index: it takes a moment. But while you’re sitting there waiting for that index to create, you get to do all sorts of other things, like run sp_WhoIsActive maniacally, or more manically, rather, and sit there and stare at your availability group and your blocking and whatever else, and wait for it to finish. But it’s finished now, so that’s great.

Now with this view indexed, this all happens relatively quickly. We do one tiny little seek into the clustered index on the view. SQL Server even suggests another index on here, which we’re not going to add, because we can just pretend 500 milliseconds is fast enough.

But now the query plan for our update is going to change a bit. It doesn’t really get meaningfully slower, for a couple reasons. I mean, it gets a little bit slower, but not terrible. And we have a lot more complexity in here now, because now we have to maintain the indexed view. So now we seek into the Posts table and we do all the updating we need to do to the Posts table, and then we have this Sequence operator. The Sequence operator says, “after you happen, you happen; I’m going to sequence you.” And now down here we have to maintain the indexed view, or rather the clustered index on the view, which requires touching both tables.

In our case, though, I have added good indexes to support my indexed view, so this is not a complete disaster. Sometimes you do need good indexes in place to support the query underneath your indexed view, to make reassembling the indexed view faster. It’s a crazy world. Like when people say it’s turtles all the way down: it’s indexes all the way down. “Oh, I need an index to tune this, and now I need an index to tune this, and I’m going to make an indexed view, but now I need indexes to make maintaining my indexed view faster.” It does take some effort, and it does take some testing and stuff.

And updating 28,000 rows, again, this was half a second last time. This does slow down a bit. It’s 2.6 seconds now. So this is not the perfect world, but you can’t have everything all the time. This plan was already a little bit more complex on top, with all the sorting and splitting and filtering and whatnot, but now maintaining the indexed view is the bulk of our effort. There’s like 600... ah, screw you... there’s like 600 milliseconds here, but two full seconds down here in the indexed view maintenance phase. Most of it is not assembling the indexed view, because we’re only at 325 milliseconds here. That’s like 1.7 seconds total just updating the values in the indexed view. Perhaps if we put that nonclustered index on the indexed view, it would be faster, but let’s not get ahead of ourselves.

So if you want indexed views to work well for you, you need to consider read queries and modification queries in your workload. Like with anything else modification-query-wise, the more rows you get involved, the longer something’s going to take. A lot of the time, when I’m looking at trying to tune modification queries, there’s almost nothing from the read portion to tune. Everything you need to tune is in the write portion, which is when things like batching become so much more valuable and useful and interesting to get into. It turns out that while updating in smaller chunks may take the same amount of time to iterate over the table and update all 100 million rows or something, you’re much kinder to your server in the process, and things get a lot less nuclear-meltdowny when you’re doing that.

Anyway, thank you for watching. Oh, yes: if you run a CREATE OR ALTER on an indexed view, it drops all the indexes. But thank you for watching. I hope you enjoyed yourselves. I hope you learned something. I hope that you will use indexes and indexed views responsibly in your SQL Server.
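The batching idea mentioned near the end of the video can be sketched roughly as follows. This is not code from the video or the course: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the Posts table, the Score column, and the helper names are made up for illustration. The point is only the shape of the loop: walk the key in order and commit one small transaction per batch instead of one giant one.

```python
import sqlite3


def update_in_batches(conn, batch_size=1000):
    """Add 1 to Score for every row, committing one small batch at a time."""
    total = 0
    last_id = 0
    while True:
        # Find the keys of the next batch by walking the primary key in order.
        rows = conn.execute(
            "SELECT Id FROM Posts WHERE Id > ? ORDER BY Id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        upper_id = rows[-1][0]
        with conn:  # commits (or rolls back) one short transaction per batch
            cur = conn.execute(
                "UPDATE Posts SET Score = Score + 1 WHERE Id > ? AND Id <= ?",
                (last_id, upper_id),
            )
        total += cur.rowcount
        last_id = upper_id
    return total


def make_demo_db(row_count=10_000):
    """Create an in-memory demo table (hypothetical schema, not Stack Overflow's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Score INTEGER NOT NULL)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO Posts (Id, Score) VALUES (?, 0)",
            ((i,) for i in range(1, row_count + 1)),
        )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    print(update_in_batches(demo, batch_size=1000))
```

The total elapsed time may not improve, which matches what the video says; the benefit is that each transaction touches fewer rows, so locks are held for shorter periods.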

Thank you for watching. I hope you learned something. I think I may have already said that. Anyway, I’m good now. Goodbye.

Thank you. You are very kind people in the world. All right. Adios.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

The post Learn T-SQL With Erik: Indexed View Maintenance appeared first on Darling Data.


Research: "What's the Default Language of an LLM?"


Chad Fowler did an interesting study and posted about it to LinkedIn, in which he asked the question, "if I ask Claude / GPT / Gemini for 'a script that...' or 'a small web app for...', what am I going to get back?" I thought, "What about local LLMs? Does that change the conversation at all?"

First off, his original LinkedIn post is here, just to give credit where credit is due. Fortunately, he also put together a nice little test harness up on GitHub, which I was able to fork. I encourage readers to go look at either repository to understand the project code and methodology before continuing.

Local changes

The code required a few changes to run locally:

  • Modify the models.yaml file (which contained the list of models to run the prompt against). The original had a list of cloud models and providers, so it wasn't too hard to add a list of local-hosted models and URLs. There's one small mismatch, in that the code expects there to be an environment variable (OPENAI_API_KEY) that's used as part of the API calls, so in order to run locally I had to have some kind of value there (a la export OPENAI_API_KEY=foobar in the shell before running). A longer-term fix would probably be to check whether it is provided and, if not, simply not go looking for it and see if the call fails.

  • The original was using a second call to a cloud model to "judge" the returned LLM result, in order to determine what language the LLM had used to generate the code. Since I was running everything locally, I needed to modify the code to use a local LLM. Rather than switch models to match what was being used (or deliberately a different model than what was being used), I just chose a model and hard-coded it.

  • I also added an extract.py script that takes the JSONL file and turns each row into a standalone file in a peer extractions directory. This turned out to be necessary because I was getting some very weird results from the glm-4.7-flash model--more on this later. The extract script works a lot like the report script: it takes the JSONL and extracts the data into standalone files, one for each row.
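The longer-term fix for the OPENAI_API_KEY mismatch from the first item might look something like the sketch below. I haven't made this change to the harness; build_headers is a hypothetical helper, and it assumes the key is sent as a Bearer token in the Authorization header, which is how OpenAI-compatible endpoints commonly take it.

```python
import os


def build_headers(env=None):
    """Return request headers, adding Authorization only when a key is set."""
    env = os.environ if env is None else env
    headers = {"Content-Type": "application/json"}
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if api_key:
        # Cloud providers need the key; a local server typically ignores it.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


if __name__ == "__main__":
    print(build_headers())
```

With this shape, a local run needs no dummy export, and a cloud run without a key fails at the provider with an authentication error rather than inside the harness.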

Results

In my initial run, I used qwen-3.6, qwen3-coder, gpt-oss, gemma4, and glm-4.7-flash, and while most of the time the results aligned pretty closely with Chad's original results, the glm-4.7-flash model really choked hard.

Like, 48 none results, hard.

The rest of the models behaved somewhat similarly to what Chad found in his work: Lots of preference for Python when the context of the problem didn't strongly suggest (if not outright enforce) something else.

But the glm-4.7-flash failures were curious: most of the time it was exceptionally verbose, and its output spilled over into a second response, which was actually the call to the classifier-judge request. For example, with the cli-dir-size task, which gemma4 completed in about 70 lines of response, the glm-4.7-flash model used over 6k lines no fewer than four times, and in some cases it got to a workable solution and then talked itself right out of it. I have zero idea why that would be the case, but it was a common problem. We can see this when running the python3 -m whichlang.extract script, which breaks the JSONL out into separate files for easier comparison.
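For readers who want the gist of that extraction step without opening the repository, here is a minimal sketch of the idea. It is not the actual extract.py: the function name, the output file naming, and the choice to pretty-print each row are my assumptions here, and it makes no assumptions about which fields each row contains.

```python
import json
from pathlib import Path


def extract_rows(jsonl_path, out_dir):
    """Write each non-empty JSONL row to its own pretty-printed JSON file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(jsonl_path, encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            line = line.strip()
            if not line:
                continue  # skip blank lines rather than failing on them
            row = json.loads(line)
            # File names carry the source line index so rows stay traceable.
            target = out / f"row-{index:04d}.json"
            target.write_text(json.dumps(row, indent=2), encoding="utf-8")
            written.append(target)
    return written
```

Having one file per row makes it much easier to open a single 6k-line response in an editor or diff two models' answers to the same task.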

Now, I can't say for certain that the problem was with the model, since it could very well have been something I did wrong in the Ollama setup/configuration, but I couldn't say exactly what that would be. Asking Ollama for its model configuration, we got:

tedneward@Teds-MBP-16 Research-whichlang % ollama show glm-4.7-flash
  Model
    architecture        glm4moelite    
    parameters          29.9B          
    context length      202752         
    embedding length    2048           
    quantization        Q4_K_M         
    requires            0.15.0         

  Capabilities
    completion    
    tools         
    thinking      

  Parameters
    temperature    1    

  License
    MIT License                        
    Copyright (c) [year] [fullname]    
    ...                                

... which seems fine, but...? Certainly its context length and embedding length seemed fine, and I did nothing to change any of the configuration after the ollama pull, but glm-4.7-flash consistently failed like this over several runs.

Conclusions

In and of itself, my modifications to Chad's experiment were pretty minor and incremental, at best--the only real "value-add" was the added data in the runs.jsonl results. For the most part, what I think of as the "standard" local coding models, gemma4, gpt-oss and the various qwen3 models, all did pretty well, well enough that I consider them to be on par with what the cloud models would create for a bunch of these sorts of tasks. The glm-4.7-flash model I think is stronger than this experiment suggests it to be, but it may need some kind of tuning or better harnessing to avoid what appeared to be getting caught in a "dead-end" loop.

If anything, my personal "big win" is the tasks.yaml file, which I plan to use as a harness for some of my other experiments, most notably the one I was working on before Chad distracted me, around the various permutations of "skills" files that we see across the industry. They seem like a nice collection of tasks to feed to OpenCode and capture the results.

One last thing: When Chad and I were DM'ing about this experiment, one thing that became very apparent is how much he is hoping this experiment can serve as an ongoing, "live" experiment to which others can contribute and improve. I heartily second that emotion--like Chad, I'm putting all this out into the public space so that people can take it and run with it, maybe adding new models (cloud or local) and/or new tasks, or even just run the experiment with different parameters (temperature, context lengths, whatever). The more we can get data that shows different behavior of the models, the more we collectively as an industry can get a handle on exactly what and how these models can help us.

And in the end, isn't that what these things are supposed to be doing? Helping us, I mean?


How to Fine-Tune Nemotron 3.5 ASR for Your Language, Domain, or Accent


Accelerate Edge AI Development with Foundry Local


Why edge AI development is still hard 

AI is no longer confined to cloud experiments. Developers are increasingly expected to deliver AI inside apps, devices, and edge systems where responsiveness, privacy, resilience, and local control are essential. But building those experiences for production is still difficult. 

Teams often have to solve model packaging, runtime fragmentation, hardware differences, and deployment complexity before they can ship a single reliable feature. That slows iteration and makes it harder to move from prototype to product. 

At Microsoft Build 2026, we’re announcing updates across Foundry Local and Foundry Local on Azure Local that help developers build once and run AI closer to where data is created and decisions are made. These updates expand platform support, improve control over inference and acceleration, add new on-device APIs, and simplify deployment across disconnected, regulated, and sovereign environments.

 

What’s new in Foundry Local 

The latest Foundry Local updates focus on the areas developers care about most: broader platform reach, familiar APIs, better runtime control, and simpler access to hardware acceleration. Together, these improvements help teams move faster from experimentation to production on AI PCs, edge devices, and enterprise infrastructure. 

 

Foundry Local 

Last month we announced the 1.1.0 release of Foundry Local (Foundry Local 1.1: Live Transcription, Embeddings, and Responses API | Microsoft Foundry Blog), Microsoft’s cross-platform local AI solution that lets developers bring AI directly into their applications with no cloud dependency, no network latency, and no per-token costs.

The 1.1.0 release added: 

  • Live audio transcription for real-time speech-to-text scenarios like captioning, voice UIs, and meeting transcription. 
  • Text embeddings for semantic search, RAG, clustering, and similarity matching use cases. 
  • Responses API support for structured agentic interactions, including tool calling and multimodal vision-language input. 
  • WebGPU execution provider plugin delivered separately to reduce the default package size for applications that don’t need it. 
  • Reduced JavaScript package size by replacing the koffi FFI layer with a custom Node-API C addon. 
  • Broader .NET compatibility by targeting lower framework versions in the C# SDK. 

Today we are announcing the 1.2.0 release of Foundry Local, which expands language support in the Live Transcription API, offers a wide range of device support for Linux, improves cancellation and execution provider workflows, adds new on-device API options, and strengthens the Windows acceleration story with Windows ML (WinML) 2.0. 

What’s new in 1.2.0 

  • Multilingual ASR: Last month we included support for real-time speech-to-text streaming directly from a microphone. We identified NVIDIA’s Nemotron Speech Streaming as the strongest candidate for real-time English streaming on resource-constrained hardware (for further details, read: https://arxiv.org/pdf/2604.14493). Today we are happy to announce that Foundry Local 1.2.0 goes multilingual with support for 40+ languages via the latest Nemotron 3.5 ASR Streaming Multilingual model. Try out: https://github.com/microsoft/Foundry-Local/tree/main/samples/python/live-audio-transcription 

 

from foundry_local_sdk import Configuration, FoundryLocalManager

config = Configuration(app_name="my_app")

FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

# Look up, download, and load the multilingual streaming ASR model.
model = manager.catalog.get_model(
    "nvidia-nemotron-3.5-asr-streaming-multilingual-0.6b"
)
model.download()
model.load()

# Configure a live transcription session for 16 kHz mono audio.
session = model.get_audio_client().create_live_transcription_session()
session.settings.sample_rate = 16000
session.settings.channels = 1
session.settings.language = "auto"   # or "de", "zh-CN", "en", ...

session.start()
# pcm_bytes is a placeholder for raw PCM audio supplied by your capture code.
session.append(pcm_bytes)            # push audio chunks from a mic/file
for result in session.get_stream():
    print(result.content[0].text)    # clean text, inline language tags stripped
session.stop()
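The snippet above leaves pcm_bytes undefined. One way to produce such chunks from a file is sketched below using only Python's standard wave module. It makes no Foundry Local SDK calls, the helper names are hypothetical, and it assumes 16-bit samples, which the snippet does not state (only the 16 kHz sample rate and single channel come from the settings shown).

```python
import io
import wave


def iter_pcm_chunks(wav_file, chunk_ms=100):
    """Yield raw PCM byte chunks of about chunk_ms each from a WAV source."""
    with wave.open(wav_file, "rb") as wav:
        # Match the session settings shown in the snippet: 16 kHz, mono.
        if wav.getframerate() != 16000 or wav.getnchannels() != 1:
            raise ValueError("expected 16 kHz mono audio")
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data


def make_silent_wav(seconds=1.0):
    """Build an in-memory 16 kHz mono 16-bit WAV of silence for testing."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit (an assumption)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * int(16000 * seconds))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for chunk in iter_pcm_chunks(make_silent_wav(0.5)):
        print(len(chunk))
```

Each yielded chunk could then be passed to session.append() in place of pcm_bytes; a path to a .wav file works as the wav_file argument as well as a file-like object.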

 

 

  • Faster model downloads via cross-region catalog: Foundry Local now fronts the model catalog with Azure Traffic Manager, routing each user to the best-performing region, so end users see noticeably faster first-run model downloads. No code changes required — developers just need to bump to the v1.2.0 SDK. 
  • Download and EP cancellation across all 5 SDKs: Cancel model and execution-provider downloads from C#, Python, JavaScript, Rust, and C++ using each language’s native cancellation pattern. Try out: https://github.com/microsoft/Foundry-Local/blob/main/README.md 
  • Inference cancellation: Cancel in-flight chat completions and transcription sessions cleanly when users move on, without wasted compute or orphaned streams. Try out: https://github.com/microsoft/Foundry-Local/blob/main/README.md 
  • Per-EP download progress in Python: Surface per-provider download progress in Python instead of a generic spinner. Try out: https://github.com/microsoft/Foundry-Local/tree/main/sdk/python 
  • Upgraded to Windows ML (WinML) 2.0: The Foundry Local WinML packages now ship with the latest WinML 2.0, removing the previous Windows App SDK runtime dependency and bootstrap step so Python, JavaScript, Rust, and C++ apps get NPU and GPU acceleration with no extra installation or initialization code. Try out: https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/overview 
  • WebGPU execution provider for WinML: Expand GPU acceleration coverage across more Windows hardware with the new WebGPU execution provider for the WinML SDK. Try out: https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/overview 
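To illustrate what a native cancellation pattern with progress reporting can look like in Python, here is a generic sketch built on threading.Event. It is deliberately not Foundry Local code: the post does not show the SDK's cancellation or progress signatures, so download_in_chunks and its parameters are hypothetical stand-ins, and the real SDK may use a different mechanism.

```python
import threading


def download_in_chunks(total_chunks, cancel_event, on_progress=None):
    """Simulated chunked download that stops once cancel_event is set."""
    done = 0
    for _ in range(total_chunks):
        # Check for cancellation before starting each unit of work.
        if cancel_event.is_set():
            return ("cancelled", done)
        done += 1  # stand-in for fetching and writing one chunk
        if on_progress is not None:
            on_progress(done, total_chunks)
    return ("completed", done)


if __name__ == "__main__":
    cancel = threading.Event()

    def report(done, total):
        print(f"{done}/{total} chunks")
        if done == 3:
            cancel.set()  # e.g. the user clicked Cancel

    print(download_in_chunks(10, cancel, report))
```

The same cooperative check is what makes cancellation clean: the worker stops at a well-defined boundary and reports how far it got, rather than being killed mid-write.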

 

Foundry Local in action: voice input in GitHub Copilot CLI 

The GitHub Copilot CLI’s voice input is built on Foundry Local. When you dictate a prompt in the terminal, audio is captured from your mic, streamed into a Foundry Local live transcription session running the Nemotron ASR Streaming model, and the partial + final results are piped straight into the CLI’s input buffer — all on-device, no cloud hop, no audio leaving the machine. 

To enable it, run /voice on; you can then speak into your Copilot CLI by holding space (or press Ctrl+k v to toggle).

There is no private API or custom integration here. The CLI uses the same create_live_transcription_session() entry point shown in the snippet above, with the same sample_rate / channels / language="auto" settings, the same append(pcm_bytes) push model, and the same get_stream() iterator. Cancellation when you hit Esc mid-utterance uses the new 1.2.0 inference cancellation path. If you have the Copilot CLI installed, run a few prompts with voice and look at:

  • End-to-end latency from speech to token: that’s your floor for what a streaming-ASR UX feels like on the user’s hardware.
  • Quality: in our internal testing, the model delivers roughly 8% Word Error Rate.
  • Resource usage while transcribing: the model uses a low single-digit percentage of CPU.
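If you want to check quality on your own recordings, Word Error Rate is commonly computed as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below implements that standard definition; it is not the evaluation code behind the figure above, and it assumes simple whitespace tokenization with no text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Return word-level edit distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the ref prefix so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Real evaluations usually normalize casing and punctuation before comparing, so numbers from this sketch are only comparable to each other, not to published figures.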

 

If the behavior works for your use case, you can reproduce it in your own app in a few lines using any of the five SDKs — no extra services to stand up, no per-minute transcription bill. 

How developers are using Foundry Local 

Foundry Local is already being used across privacy-sensitive, performance-sensitive, and hardware-diverse scenarios. From local assistants and document workflows to multimodal context collection and enterprise AI pipelines, developers are using it to reduce platform complexity and deliver production-ready AI experiences faster. 

  

  

Privacy-first and secure local AI

Across consumer apps and enterprise workflows, developers are using Foundry Local to keep sensitive data closer to the device while delivering faster, more responsive AI experiences.

Foxit PDF Editor AI Assistant

Foxit uses Foundry Local to bring secure, local AI into document workflows such as question answering, summarization, translation, and document understanding. The result is a more practical path to on-device AI that helps keep sensitive information closer to the user while simplifying deployment at scale.

“Foundry Local gives us a practical way to bring powerful AI experiences directly into PDF workflows while keeping sensitive data closer to the user. Just as importantly, its managed local model approach helps simplify deployment, improve reliability, and reduce the operational burden of delivering on-device AI at scale.” – Queena Wei, SVP of Product at Foxit

 

Raycast

Raycast uses Foundry Local to make privacy-first, on-device AI more accessible to end users. By simplifying model discovery and local interaction, it helps bring local AI into everyday workflows with less friction.

“The integration of Foundry Local into Raycast gives our users the perfect option for privacy-first local AI. With it, they can easily leverage a variety of powerful models optimized for their Windows devices. Foundry Local made it super easy for us to implement the first step, a platform to browse and install models and a quick chat interface to use them, no internet required.” – Thomas Paul Mann, CEO & Founder at Raycast

 

Rakuten

Rakuten uses Foundry Local to bring responsive, privacy-sensitive AI experiences directly onto the device while balancing local responsiveness with broader cloud-connected capabilities. The result is a hybrid experience that feels more natural to end users while improving efficiency behind the scenes.

“Through our partnership with HP, Rakuten AI for Desktop uses Foundry Local to bring AI closer to the user — running responsive, privacy-sensitive experiences directly on the device while reducing cloud inference costs. Combined with Rakuten AI’s cloud intelligence and ecosystem integrations, this enables a hybrid AI experience that feels native to the desktop and scales efficiently for more advanced tasks.” – Vasanth Raju, Head of AI Product at Rakuten Group

 

PhonePe

PhonePe uses Foundry Local to power AI-driven transaction insights in its digital payments app with strong data protection. This helps deliver more responsive, privacy-conscious AI experiences without requiring personal financial information to leave the device.

 

Liquid AI’s ShieldFlow

ShieldFlow is an on-device privacy layer that redacts sensitive data and prevents prompt injection before any prompt leaves the device. Through Foundry Local, ShieldFlow runs efficiently on the CPU of every Windows device, including AI PCs, and enterprises can pull customized Liquid Foundational Models (LFMs) tuned to their own policies and roll them out across their Windows fleet through a single managed runtime.

 

 

Hardware portability and cross-device optimization

For teams building across different chips and execution environments, Foundry Local helps reduce hardware-specific complexity and accelerate deployment across devices.

Cephable

Cephable is a private AI assistant that runs entirely on device, enabling voice control, dictation, content generation, and task automation across apps. With Foundry Local, Cephable’s AI features run faster, support more models across NPU, GPU, and CPU, and let the team focus on building the assistant instead of managing silicon-specific optimizations.

“Since shifting from our custom inferencing implementation to Foundry Local, our engineers have been able to ship core features faster. We’re saving dozens of hours on optimizing models and managing build pipelines to handle the right acceleration in the right version of our app package. This directly leads to a better user experience and more choice for our users.” – Cordellia Yokum, Director and Principal Architect at Cephable

 

FlowyAIPC

FlowyAIPC builds an intelligent assistant for the era of heterogeneous AIPC silicon. FlowyAIPC integrates Foundry Local and Windows ML to solve the fundamental challenge of model-hardware decoupling across Intel, AMD, Qualcomm, and NVIDIA chips spanning CPU, NPU, iGPU, and dGPU.

“By leveraging Foundry Local’s automatic hardware detection and execution-provider abstraction, FlowyAIPC dynamically routes AI workloads to the optimal compute unit without user intervention: lightweight inference and sustained background tasks tap the NPU for power efficiency, while demanding generative workloads seamlessly spill to the GPU or CPU.” – Guoliang QI, CEO at StarwaveAI
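The routing behavior described in that quote can be pictured with a small sketch. This is not Foundry Local's API or FlowyAIPC's code; the `Workload` type, the `pick_device` helper, and the device labels are hypothetical names used only to illustrate a preference-ordered fallback across compute units.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a job; not part of any real SDK."""
    name: str
    demanding: bool  # True for heavy generative work, False for light or background tasks


def pick_device(work: Workload, available: set[str]) -> str:
    """Return the first preferred compute unit that is present on the machine."""
    if work.demanding:
        # Heavy generative workloads prefer the GPU, then fall back to the CPU.
        preference = ["GPU", "CPU"]
    else:
        # Light or sustained background work prefers the NPU for power efficiency.
        preference = ["NPU", "GPU", "CPU"]
    for device in preference:
        if device in available:
            return device
    return "CPU"  # Assumption: a CPU is always usable as the last resort.
```

On a machine with all three units, a light background task would land on the NPU and a demanding generative task on the GPU; on a machine without an NPU, the light task falls through to the next unit in its preference list.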

 

AnythingLLM

AnythingLLM is a local-first, zero-configuration AI desktop application that allows enterprises to run LLMs completely on-device. Instead of maintaining separate runtimes for each hardware configuration, AnythingLLM uses Foundry Local to deliver on-device AI across a broad range of silicon platforms.

“With the rapid pace of AI software, maintaining custom runtimes for every specialized NPU and hardware configuration on the market creates a massive development bottleneck. The Foundry Local SDK helps us solve this by providing optimized, hardware-level, vendor agnostic performance out of the box, allowing us to deliver a consistent and secure local AI experience to our Windows users globally without the engineering overhead.” – Timothy Carambat, Founder & CEO at AnythingLLM

 

LUCI Desktop by Memories.ai

Memories.ai uses Foundry Local to run multimodal models efficiently across Qualcomm, Intel, and AMD devices in LUCI Desktop, which provides an on-device context layer for PCs. That portability helps the team scale on-device research and multimodal workflows without extensive per-chip optimization.

“Foundry Local SDK took the silicon-portability problem off our plate — one SDK, simple APIs, and our multimodal models run efficiently across Qualcomm, Intel, and AMD without weeks of per-chip optimization. It lets us scale our on-device research globally on day one and keeps our team focused on the harder problems above the silicon layer.” – Shawn Shen, CEO at Memories.ai

 

Model HQ by LLMWare

Model HQ enables enterprise teams to build and run RAG pipelines and multi-step agents locally on AI PCs and private servers using a no-code interface. By integrating Foundry Local, Model HQ delivers fast, offline-capable AI experiences directly on Windows devices built on chips from AMD, Intel, Qualcomm, and NVIDIA.

“The Foundry Local SDK made it incredibly easy for us to integrate NPU-optimized local AI models directly into Model HQ and rapidly deliver high-performance on-device NPU inferencing with minimal engineering overhead. It has significantly accelerated our ability to fully leverage emerging NPU compute capabilities for fast, efficient, and power-optimized local AI experiences.” – Darren Oberst, Co-Founder at LLMWare

 

Taken together, these customer stories show what Foundry Local means for developers in practice: fewer runtime and hardware-specific hurdles, faster paths from prototype to production, and more control over how AI runs on real devices. Whether you’re building privacy-sensitive apps, deploying across diverse silicon, or operationalizing local RAG and agent workflows, Foundry Local helps you spend less time stitching infrastructure together and more time shipping experiences that work.

 

Foundry Local on Azure Local 

At Build, we’re also introducing Foundry Local on Azure Local in preview: a new on-premises AI platform for running models, agents, and tools at enterprise scale. 

Designed for organizations that seek control, compliance, and low-latency execution, Foundry Local on Azure Local runs as containerized Kubernetes workloads on Azure Local and is orchestrated through Azure Arc. It helps teams deploy consistently across edge, hybrid, and fully disconnected environments while keeping AI close to the data and operations that depend on it. 

Here are some of the key preview capabilities announced today: 

  • Custom MCP tools – Extend agents with custom tool servers using the Model Context Protocol (MCP) standard. 
  • GitHub Enterprise Local – Build and deploy AI apps end to end on-premises with local repos, CI/CD pipelines, and integrated security scanning. https://aka.ms/GHEL 
  • Azure Local for small form factor devices – Extend Azure Local to industrial PCs and ruggedized devices for manufacturing and retail edge deployments, with turnkey AI inference and Azure Arc-based device management. https://aka.ms/AzureSFF  
  • Watch the demo: aka.ms/AzureSFFLaunchDemo

Register to get access to Foundry Local on Azure Local preview: https://aka.ms/FoundryLocalAzure_PreviewRequest  

 

Early momentum is already visible across sovereign, industrial, and disconnected scenarios where organizations need AI to run reliably under strict operational and compliance constraints. 

“In energy operations, AI needs to run where the work happens – at remote facilities, offshore platforms, and field locations where connectivity is often limited, and safety is paramount. Foundry Local on Azure Local gives us a path to bring AI-driven decision-making closer to our operational data, with the governance our industry demands. The ability to deploy and run AI workloads consistently across edge and field environments, even when disconnected, is critical as we advance Chevron’s vision for autonomous and intelligent operations.” – Ed Moore, OT Strategist and Distinguished Engineer at Chevron

 

Together, these capabilities help organizations support both sovereign AI requirements, such as data control and compliance, and industrial edge scenarios that depend on real-time, localized execution. 

 

Get started 

 

If you want to start building with Foundry Local, begin with the documentation and Edge AI for Beginners, explore the available samples, and test local inference in your own application workflow. From there, you can evaluate the right model, runtime, and hardware path for your scenario, whether you’re building for AI PCs, enterprise apps, edge devices, or disconnected environments. 

 

If you’re following Microsoft Build 2026, these related sessions can help you go deeper into the announcements and developer scenarios supported by these releases: 

The post Accelerate Edge AI Development with Foundry Local appeared first on Microsoft Foundry Blog.


Foundry Toolkit for VS Code at //build: Hosted Agents End-to-End, a Smarter Toolbox, and More


We’re excited to share what’s new for Foundry Toolkit for Visual Studio Code at //build 2026. Since going generally available, the toolkit has kept moving fast, and this release is a big one. The headline: a complete, end-to-end Hosted Agent experience that lets you scaffold, run, deploy, and observe without ever leaving VS Code. On top of that, we’ve expanded the Toolbox with native enterprise integrations and shipped a wave of LangGraph samples so every developer has a clear path from idea to production. From your first prompt to a production-grade, observable agent, Foundry Toolkit meets you where you are. 

Hosted Agents, End to End 

Building an agent is the easy part; getting it from a first draft to a production-grade, observable service is what matters. This release makes the full Hosted Agent lifecycle available in VS Code, and it follows the way you actually work — scaffold, run, deploy, observe.

Scaffold — start from a rich set of samples

Hosted Agent creation now opens with a refreshed scaffolding experience and a rich sample selection, so you start from a working, framework-appropriate template instead of a blank file. Creation is smarter, too: we auto-select your subscription when there’s only one, gate tabs more clearly, and use tighter spacing for a cleaner setup flow. 

New Hosted Agent scaffolding dialog with the rich sample picker open

Run (F5) — inspect as you build

Press F5 and your agent runs locally with the Agent Inspector, now aligned with the rest of the extension and featuring Copilot SDK visualization so you can watch what the agent does as it executes. It’s the fastest loop from change to verification before anything leaves your machine. 

Deploy — a new UX and new ways to ship

Different teams ship differently, so deployment got a refreshed UX and two new options for Hosted Agents: 

  • ZIP Code Deploy: Package your agent source as a ZIP and deploy it directly to Microsoft Foundry Agent Service. 
  • Bring-Your-Own-Image (BYOI): Already have a pre-built container in your own Azure Container Registry? Deploy straight from it. 
Hosted Agent deploy dialog showing "ZIP code deploy" and "Bring-your-own-image (ACR)" side by side.
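To make the ZIP option concrete, here is a minimal sketch of what packaging agent source into an archive could look like. The toolkit handles packaging itself, so this is only an illustration: the `zip_agent_source` helper and the `EXCLUDED_DIRS` list are assumptions, not part of Foundry Toolkit.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Assumption: directories that usually should not ship with agent source.
EXCLUDED_DIRS = {".git", ".venv", "__pycache__", "node_modules"}


def zip_agent_source(source_dir: str, zip_path: str) -> list[str]:
    """Write every file under source_dir into zip_path and return the archived names."""
    root = Path(source_dir).resolve()
    target = Path(zip_path).resolve()
    archived: list[str] = []
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it lives inside the source tree.
            if not path.is_file() or path == target:
                continue
            relative = path.relative_to(root)
            # Skip anything that sits inside an excluded directory.
            if EXCLUDED_DIRS.intersection(relative.parts):
                continue
            archive.write(path, relative.as_posix())
            archived.append(relative.as_posix())
    return archived
```

Storing paths relative to the source root keeps the archive layout independent of where the project lives on disk.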

Observe — know it works in production

Once deployed, the full observability story is now available: 

  • Hosted Agent Tracing: Inspect end-to-end traces of Hosted Agent invocations directly from VS Code — tool calls, delegation chains, and timing for real debugging instead of guesswork. 
  • Continuous Evaluation Settings: A new page to configure ongoing evaluation for deployed Hosted Agents, so quality is measured continuously — not just at ship time. 
  • Evaluations Node: One-click access to evaluation runs and results right from the Foundry project tree. 
Hosted Agent trace view showing a span tree of tool calls and timings.

A Smarter, More Connected Toolbox 

What it is, and why it matters 

A Toolbox is how your agent gets its capabilities — the curated set of tools, knowledge sources, and integrations it can call at runtime. Instead of hand-wiring each connection, you assemble a Toolbox once and your agent consumes it consistently across local runs and production. The result: agents that can act on real enterprise data and systems, with the connections managed in one place. 

From what to how: create, connect, consume 

  • Create: Start a new Toolbox from the Foundry Toolkit sidebar “Tools Catalog” and pick the capabilities your agent needs. 
  • Connect: Configure and wire in enterprise systems once through native, first-class connections, and use them for all your agents.
  • Consume: Reference the Toolbox from your Hosted Agent so its tools are available the moment the agent runs, locally (F5) and once deployed. 

New this release 

Building on that flow, the Toolbox is now richer and more enterprise-ready: 

  • WorkIQ as a Built-in Tool: A first-class WorkIQ experience powered by A2A connections — no MCP fallback required. End-to-end toolbox creation with WorkIQ works out of the box. 
  • Fabric IQ (OneLake Catalog) Integration: Connect your agents to Microsoft Fabric OneLake catalogs directly from the Toolbox. 
  • Toolbox Guardrails: Apply content-safety guardrails to your Toolbox for safer agent execution. 
  • Faster discovery: A new Toolbox Search Toggle and Agent Tool Multi-Select let you find and wire in multiple tools in a single action. 
Redesigned Tools Catalog, including WorkIQ and Fabric IQ tiles.

LangGraph Reaches Parity 

LangGraph developers, this one is for you. We’ve added five new Hosted Agent samples that bring LangGraph to full parity with the Agent Framework Responses learning path — so you get an equivalent, end-to-end walkthrough no matter which framework you prefer: 

  • MCP — tool loading from a remote MCP server (defaults to GitHub Copilot MCP) via MultiServerMCPClient. 
  • Workflows — a custom StateGraph chaining three specialized LLM nodes: slogan writer, legal reviewer, and formatter. 
  • Files — local filesystem tools plus the Foundry-Toolbox code_interpreter working over session-uploaded files. 
  • Human-in-the-Loop — a StateGraph that drafts a proposal and pauses for approval via langgraph.types.interrupt. 
  • Observability — GenAI OpenTelemetry tracing with enable_auto_tracing(); spans, metrics, and logs flow to Application Insights. 
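For readers new to the pattern behind the Workflows sample, the idea of chaining specialized nodes over shared state can be sketched in plain Python. This deliberately does not use LangGraph's API, and it is not the sample's code: the node bodies are placeholder logic standing in for LLM calls, and `State`, `run_chain`, and the toy review rule are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Callable, TypedDict


class State(TypedDict, total=False):
    product: str
    slogan: str
    approved: bool
    output: str


def slogan_writer(state: State) -> State:
    # Placeholder for an LLM call that drafts a slogan.
    return {**state, "slogan": f"{state['product']}: built for the road ahead"}


def legal_reviewer(state: State) -> State:
    # Toy rule standing in for an LLM review: reject absolute claims.
    banned = ("best", "guaranteed")
    ok = not any(word in state["slogan"].lower() for word in banned)
    return {**state, "approved": ok}


def formatter(state: State) -> State:
    status = "APPROVED" if state["approved"] else "NEEDS REVISION"
    return {**state, "output": f"[{status}] {state['slogan']}"}


def run_chain(state: State, nodes: list[Callable[[State], State]]) -> State:
    """Pass the state through each node in order, as a linear graph would."""
    for node in nodes:
        state = node(state)
    return state


result = run_chain({"product": "Contoso Bikes"}, [slogan_writer, legal_reviewer, formatter])
print(result["output"])
```

Each node reads the state it needs and returns an updated copy, which is what lets the three steps stay independent of one another.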

We’ve also refreshed the existing bring-your-own LangGraph samples against the new hosting layer (chat with local tools, Foundry-managed Toolbox loading, and SSE-streamed multi-turn sessions backed by a MemorySaver checkpointer), so every sample reflects how Hosted Agents work today. 

Workflow visualization of the LangGraph human-in-the-loop sample paused at an approval node.

Polish Across the Board 

A release is more than headline features. This one also includes a redesigned Prompt Builder “Improve an Instruction” dialog for faster iteration, fixes for MCP toolbox tool icons, clearer ZIP-deploy error surfacing, and assorted Agent Builder and Playground regression fixes — the whole experience feels tighter end to end. 

Get Started Today 

Join the Community 

Share your projects, file issues, or suggest features on our GitHub repository. We can’t wait to see what you build. 

Welcome to the next chapter of AI development! 


The case for language clarity, with Iva Cheung


1191. This week, we talk to Iva Cheung, a plain language expert and editor who has helped shape Canada's accessibility standards. We look at what plain language actually means (it's more than just short words and simple sentences) and why it matters for healthcare, legal rights, and everyday communication. Then we explore cognitive load theory, the expertise reversal effect, and why user testing is the secret ingredient most writers skip.


Find more from Iva at IvaCheung.com.


🔗 Join the Grammar Girl Patreon.

🔗 Share your familect recording in Speakpipe or by leaving a voicemail at 833-214-GIRL (833-214-4475)

🔗 Watch my LinkedIn Learning writing courses.

🔗 Subscribe to the newsletter.

🔗 Take our advertising survey.

🔗 Get the edited transcript here.

🔗 Get Grammar Girl books.

| HOST: Mignon Fogarty

| Grammar Girl is part of the Quick and Dirty Tips podcast network.

  • Audio Engineer: Dan Feierabend
  • Director of Podcast: Holly Hutchings
  • Advertising Operations Specialist: Morgan Christianson
  • Marketing and Video: Nat Hoopes, Rebekah Sebastian
  • Podcast Associate: Maram Elnagheeb

| Theme music by Catherine Rannus.

| Grammar Girl Social Media: YouTube, TikTok, Facebook, Threads, Instagram, LinkedIn, Mastodon, Bluesky.


Hosted on Acast. See acast.com/privacy for more information.





Download audio: https://sphinx.acast.com/p/open/s/69c1476c007cdcf83fc0964b/e/6a1a245fdd90858af9382270/media.mp3