Learn NuGet package metadata best practices. Configure README files, icons, license expressions, tags, and repository URLs to make your .NET package shine on NuGet.org.
Learn NuGet package metadata best practices. Configure README files, icons, license expressions, tags, and repository URLs to make your .NET package shine on NuGet.org.
This blog post was created with the help of AI tools. Yes, I used a bit of magic from language models to organize my thoughts and automate the boring parts, but the geeky fun and the
in C# are 100% mine.
Hi!
Part 1 of the CopilotHarness series

Big models are great for heavy thinking. But what about simple questions — “rename this variable”, “write a short docstring”, “what does this function do”? Those don’t need GPT-5 across the internet. They can be answered instantly by a model running on your own machine, offline, for free.
This post shows you how to wire GitHub Copilot in VS Code to local models using three minimal proxies — one for Ollama, one for Foundry Local, one for Azure OpenAI — and how to run all three together with a single command.
VS Code Copilot supports a Bring Your Own Model mechanism. You register any OpenAI-compatible endpoint in a config file, and Copilot treats it as just another model in the picker — no extension, no plugin.
Official docs: Bring your own model to GitHub Copilot Chat
The config lives in a file called chatLanguageModels.json in your VS Code user folder:
{
"providers": [
{
"name": "Ollama (local)",
"vendor": "customendpoint",
"url": "http://localhost:5099/v1",
"modelId": "llama3.1:8b",
"chatModelId": "copilot-chat-model"
}
]
}
That's it — point url at any OpenAI-compatible endpoint, give it a name, and it appears in Copilot's model picker.

A local model provider isn’t always OpenAI-compatible out of the box. Each proxy in this repo acts as a thin translation layer — it speaks OpenAI on one side and the local backend on the other.

| Proxy | Port | Backend | Best for |
|---|---|---|---|
| OllamaProxy | 5099 | Ollama (local) | Quickest start, huge model catalog |
| FoundryLocalProxy | 5101 | Foundry Local SDK (offline, NPU) | Offline/air-gapped, hardware acceleration |
| FoundryProxy | 5100 | Azure OpenAI / Foundry cloud | Production-grade, secret-managed cloud models |
Each proxy is a single ASP.NET Core Minimal API file — no frameworks, no abstractions. The entire proxy fits on one screen. That’s intentional: these are teaching samples, not production middleware.
Here’s something most developers don’t know: when GitHub Copilot Chat sends “hi”, it doesn’t send just "hi". It sends this:
<attachments>...file contents...</attachments>
<context>...editor state...</context>
<reminderInstructions>...workspace instructions...</reminderInstructions>
<userRequest>hi</userRequest>
The actual message is buried inside <userRequest>. A proxy that naively reads the last user message sees ~3 KB of boilerplate instead of the word "hi".
All three proxies share one class — CopilotMessageExtractor — that unwraps this envelope. It lives in the shared Proxies.Common library:
// The key method — finds <userRequest>...</userRequest> and returns its content.
// Falls back gracefully for non-Copilot clients (curl, SDK) that send plain text.
public static string ExtractTypedUserMessage(string rawUserMessage)
{
// Look for <userRequest> first (VS Code Copilot always uses this)
var userRequest = ExtractTagContent(rawUserMessage, "userRequest")
?? ExtractTagContent(rawUserMessage, "user-request");
if (!string.IsNullOrWhiteSpace(userRequest))
return userRequest.Trim();
// No tag — strip all known wrapper blocks and return what's left
var stripped = rawUserMessage;
foreach (var tag in CopilotWrapperTags)
stripped = RemoveTagBlock(stripped, tag);
// If stripping removed everything, fall back to the raw message
// (this path is hit for plain curl/SDK clients — they send plain text)
return string.IsNullOrWhiteSpace(stripped.Trim())
? rawUserMessage.Trim()
: stripped.Trim();
}
This class is why the proxies work correctly with both Copilot Chat and direct API calls. The logging in each proxy shows the real ask — not the 3 KB envelope.
Pre-requisite: Ollama running with at least one model pulled.
# Pull a model ollama pull llama3.1:8b # Start the proxy cd src/proxies/OllamaProxy dotnet run # → http://localhost:5099
The proxy auto-discovers your installed Ollama models and passes the model ID through. Add it to VS Code:
// chatLanguageModels.json (Windows: %APPDATA%\Code\User\)
{
"providers": [
{
"name": "Ollama — llama3.1:8b",
"vendor": "customendpoint",
"url": "http://localhost:5099/v1",
"modelId": "llama3.1:8b",
"chatModelId": "copilot-chat-model"
}
]
}
Verify it’s running:
curl http://localhost:5099/health
# → {"status":"ok","backend":"ollama","models":["llama3.1:8b",...]}

Microsoft Foundry Local runs models fully offline using the ONNX Runtime with hardware acceleration (CPU, GPU, NPU on Windows).
No pre-requisites — the SDK downloads the model on first run and caches it locally.
cd src/proxies/FoundryLocalProxy dotnet run # First run: downloads phi-4-mini (~2.5 GB) automatically # → http://localhost:5101
The Models page in the test app shows which models are cached, lets you load/unload (frees GPU RAM instantly), and delete models from disk:

Tip: Use the Models page to download a model before chatting with it. If you send a chat request to an unloaded model, you get a clear error explaining the model needs to be loaded first — not a cryptic 500.
Add it to VS Code alongside Ollama — Copilot lets you pick which model to use per conversation:
{
"name": "Foundry Local — phi-4-mini",
"vendor": "customendpoint",
"url": "http://localhost:5101/v1",
"modelId": "phi-4-mini",
"chatModelId": "copilot-chat-model"
}
For cloud models, FoundryProxy uses .NET User Secrets so your API key never touches the repo.
cd src/proxies/FoundryProxy # Store credentials locally (never committed to git) dotnet user-secrets set "Foundry:Endpoint" "https://your-resource.openai.azure.com" dotnet user-secrets set "Foundry:ApiKey" "your-key" dotnet user-secrets set "Foundry:Deployment" "gpt-4o-mini" dotnet run # → http://localhost:5100
The fastest way to run everything is via the .NET Aspire CLI. One command starts all three proxies, the Blazor test app, and the Aspire dashboard with logs, traces, and health checks:
cd src/proxies aspire start
What starts:
| Service | URL | What it is |
|---|---|---|
ollama-proxy | http://localhost:5099 | OllamaProxy |
foundry-proxy | http://localhost:5100 | FoundryProxy |
foundry-local-proxy | http://localhost:5101 | FoundryLocalProxy |
proxies-test-app | http://localhost:5102 | Blazor test UI |
| Aspire dashboard | printed in console | Logs, traces, health for all services |
Requires Aspire CLI:
dotnet workload install aspire
The Blazor test app at http://localhost:5102 gives you a browser UI to test all three proxies without writing any code:

traces for every request, including custom LlmActivity spans with prompt text, model ID, token counts, and latency. You can see exactly what Copilot sent and what the model returned.


To stop everything:
aspire stop
The /setup page at http://localhost:5102/setup generates the exact chatLanguageModels.json snippet for each running proxy, with the correct port and model ID. Copy and paste into your VS Code user config folder:

%APPDATA%\Code\User\chatLanguageModels.json~/Library/Application Support/Code/User/chatLanguageModels.json~/.config/Code/User/chatLanguageModels.jsonAfter saving, reload VS Code. Open Copilot Chat, click the model picker, and your local models appear alongside the built-in cloud models.
Shortcut: If you have the CopilotHarness CLI tool installed, running
harness initwrites this file automatically.
The proxies shown here are static: you pick a model manually per conversation. The next level is automatic routing — where every Copilot request is analyzed and sent to the best model automatically.
“Is this a simple rename? → local llama3.1:8b. Is this a complex architecture question? → cloud GPT-5. Is this about GitHub Actions? → a specialist agent.”
That’s what the full CopilotHarness router does — policy-based routing with semantic matching, local classifiers, and per-request telemetry. Part 2 of this series walks through building and using it.
Repo: github.com/elbruno/ElBruno.CopilotHarness
| Goal | Command |
|---|---|
| Start just Ollama proxy | cd src/proxies/OllamaProxy && dotnet run |
| Start all three + test UI | cd src/proxies && aspire start |
| Stop all | aspire stop |
| Generate VS Code config | Open http://localhost:5102/setup |
| View traces | Open Aspire dashboard URL printed in console |
| Manage Foundry Local models | Open http://localhost:5102/models |
| Test proxy health | curl http://localhost:5099/health |
This is Part 1 of the CopilotHarness series.
Next: Part 2 — Smart Routing: Sending Each Request to the Right Model Automatically
Code: github.com/elbruno/ElBruno.CopilotHarness/tree/main/src/proxies
Happy coding!
Greetings
El Bruno
More posts in my blog ElBruno.com.
More info in https://beacons.ai/elbruno

The short version is that newer Claude models sometimes call Pi’s edit tool with extra, invented fields in the nested
edits[]array. And not Haiku or some small model: Opus 4.8. The edit itself is usually correct but the arguments do not match the schema as the model invents made-up keys and Pi thus rejects the tool call and asks to try again.That alone is not too surprising as models emit malformed tool calls sometimes. Particularly small ones. What surprised me is that this is getting worse with newer Anthropic models as both Opus 4.8 and Sonnet 5 show it but none of the older models. In other words, the SOTA models of the family are worse at this specific tool schema than their older siblings.
Armin theorizes that this is because more recent Anthropic models have been specifically trained (presumably via Reinforcement Learning) to better use the edit tools that are baked into Claude Code. This has the unfortunate effect that other coding harnesses, such as Pi, may find that their own custom edit tools are more likely to be used incorrectly.
Claude's edit tool uses search and replace. OpenAI's Codex uses an apply_patch mechanism instead, and OpenAI have talked in the past about how their models are trained to use that tool effectively.
Does this mean third-party coding harnesses like Pi should implement multiple edit tools just so they can use the one with the best performance for the underlying model the user has selected?
Tags: armin-ronacher, ai, openai, generative-ai, llms, anthropic, llm-tool-use, coding-agents, pi
Hey all 👋
I’ve got a genuinely exciting update to share today, and it’s one that’s been a long time coming: Azure landing zone (ALZ) is now an official Microsoft product, owned by the Azure Migrate product team.
For the past five or so years, ALZ has been built completely in the open, in the open source repos, in community calls, in issues, in PRs, together with an incredible group of customers, partners, and Microsoft folks who cared enough to keep showing up and making it better. That community effort is the entire reason ALZ is what it is today, and it deserves a moment of recognition before we talk about what’s changing.
ALZ is graduating from a community-driven, open-source initiative into a fully-fledged, officially owned Microsoft product, with a dedicated product team behind it in the Azure Migrate team.
For you, practically? Nothing changes. The GitHub repos, the modules, the way you consume ALZ today all stay exactly as they are. What’s changing is who’s steering the ship, and that it now has the backing, investment, and roadmap of an official product team. If you’ve got an issue to raise, that still happens exactly where it always has, over at aka.ms/alz/issues.
I want to be upfront about this part: myself (Jack Tracey), Matt White, Jared Holgate, and Zach Trocinski are no longer involved in the day-to-day of ALZ. No more issue triage, no more day-to-day operations from us. That responsibility now sits with the Azure Migrate team.
We’re not disappearing entirely, though. Over the last couple of months we’ve been running a proper handover, and we’ll continue to be around behind the scenes for those moments when the Azure Migrate team needs a bit of extra context we didn’t manage to pass on during that process.
And honestly? You’re in great hands. The Azure Migrate team already know ALZ inside and out. They’ve been working alongside us building the Azure Migrate agent’s platform landing zone creation experience, which uses ALZ under the hood. This isn’t a handover to strangers, it’s a handover to people who’ve already been in the engine room with us.
The Azure Migrate team are keen to keep the community engagement up and active, just as we did in the ALZ team of old. They want to run community calls and be just as visible and active as we’ve always tried to be.
So, keep an eye out for blog posts and announcements from them over the coming months. This is very much a “watch this space” moment, and we’re confident you’ll see the same energy and openness from them that you’ve come to expect from ALZ.
Before I move on, I wanted to add something a bit more personal.
ALZ is one of the things I’m proudest of from my time at Microsoft so far. I’ve built and led it over the past five or six years, surrounded by genuinely great people who helped shape it, and backed by an amazing community, customer, and partner base who supported us every step of the way to make it the success it is today.
So, while I’m stepping away and I’m no longer involved in the day-to-day, ALZ will always hold a special place for me. I’ll forever be happy to chat about it socially. It’s something I still have real passion for, and that’s not going away just because my day job has moved on.
That said, I’m taking those learnings and that passion into other things at Microsoft, including now focusing on AVM (Azure Verified Modules) alongside Jared and several other great folks. We’ll have some announcements of a similar nature to share on that front soon, so watch this space 😁
And finally, the wider thanks. Alongside Matt White, Jared Holgate, and Zach Trocinski, huge thanks to: Paul Grimley, Rob Kuehfus, Sacha Narinx, Seif Bassem, Arjen Huitema, Nelson Pereira, Paulo Alves Oliveira Jr., Vishal Mehrotra, Charlie Grabiaud, Simona Tarantola, Bruno Gabrielli, Luke Taylor, Adam Tuckwell, and Kevin Rowlandson.
A special shout-out too to Remo Leone Laudo, Rhys Ash, Jamie Pla, Igor Jovovic, and Haflidi Fridthjofsson, who will continue to contribute to the ALZ IaC modules alongside their day jobs as CSAs, as and when they can 🙂
And to everyone else who’s contributed to ALZ over the past five years or so, through code, issues, feedback, conversations, or just using it and telling us what worked and what didn’t, thank you. This milestone is yours as much as anyone’s.
Here’s to the next chapter for ALZ. 🎉
Dave Plummer, the retired Microsoft engineer who built Task Manager and helped ship Space Cadet Pinball, has recreated Notepad in roughly 2.5 kilobytes. The project is called TinyRetroPad, and despite the size (or lack of it), it still has Open, Save, Find and Replace, printing, font selection, word wrap, and the unsaved changes prompt, packed into an executable that is significantly smaller than the featured image above this paragraph.

Plummer has spent recent months telling Microsoft what they do not want to hear about Windows 11. He argued the OS needs its own Windows XP SP2 moment, a stretch where Microsoft drops new features and only fixes what is broken. He has also said Windows 11 has turned into a sales channel for Microsoft’s other products, nudging users toward Edge, OneDrive, and Copilot.
At a time when Memory and storage cost a fortune, what we’re interested in is how an app was created with an install size that mocks the entire fabric of software development.

Plummer explains this isn’t really a magic trick. Windows already contains most of what makes up a Windows application: a window manager, menus, common dialogues, clipboard handling, edit controls, font selection, and file open and save dialogues, along with printing infrastructure. A tiny native Windows program doesn’t have to bring along its own entire civilization.

As Plummer puts it, “it arrives with a lunchbox and a map of the city.” A mature operating system is also a giant library of already solved problems, and because that machinery is already installed on the machine, a tiny executable can call into it and appear to perform miracles.

TinyRetroPad is a fork of Matt Power’s Dave’s Tiny Editor, itself built on tiny.asm, a project Plummer wrote years ago to prove what the smallest complete Windows application could look like. It’s a thin wrapper around RICHEDIT50W, the rich text control Windows has carried for decades. Drawing characters, managing the cursor, handling selection, cut, copy, paste, undo history,
Windows already does all of it inside that one control. Early versions used the plainer EDIT control and got down to 890 bytes, though Windows Defender wasn’t a fan of how aggressively that build was compressed. Later versions moved to RICHEDIT for cheap access to the Courier font and bigger file support, settling at 981 bytes before a single menu existed.

The growth log Plummer kept shows what each addition cost:

None of this works without Crinkler, a compression linker built for the demoscene that squeezes and rearranges the executable instead of just linking it. Sometimes a whole feature adds nothing to the file size because the code happens to compress well. Sometimes a clean function ends up bigger than an ugly, repetitive one, since Crinkler compresses repetition far more efficiently than a lookup table full of branches.
It’s also not a finished product. There’s no Releases page for some reason, and Crinkler-built executables may trigger antivirus false positives. The open GitHub issues read like a list of what a 2.5KB program gives up. One user reported it chewing through around 500MB of RAM on 64-bit Windows 7, and others found it won’t run on Windows XP SP3 at all.
Modern Notepad has spent the last couple of years turning into a case study in feature creep. The notepad.exe on a typical Windows 11 install comes in at around 352KB, with an install size closer to 808KB, because that exe is really a stub pointing at a UWP and WinUI app adding up to roughly 5MB on disk. The original XP-era Notepad was about 65KB in total.

Of course, you’re not losing any precious memory because of the bloated Notepad, but the way Microsoft deviated it from being a simple text editor is what created all this backlash.
Tabs and autosave were welcome additions, and now I can’t think of Notepad without these. But in June 2025, Notepad gained Markdown formatting, and users pointed out that Windows already had WordPad for that job before Microsoft killed it off.
By August, the right-click menu had grown so cluttered with Copilot options that Microsoft had to redesign it just to make cut and paste findable again. A Create a table tool arrived in January 2026, and image support followed in February, built on that same Markdown engine.

That month gave us proof that this feature creep costs something real. Microsoft confirmed an 8.8 rated remote code execution flaw, tracked as CVE-2026-20841, where a malicious Markdown link could let an attacker run code with the victim’s own permissions just by getting them to click it inside Notepad. A plain text editor with no link handling could never have that problem.
By March, Microsoft scaled back Copilot branding across several apps, and by April, Microsoft mostly just renamed Copilot to Writing Tools in Notepad instead of pulling the AI features out.

Windows 11 LTSC, the long-term servicing edition Microsoft builds for enterprises that can’t tolerate constant change, still ships the classic Notepad with no Copilot and no Markdown, and neither does Windows 10’s. The plain Notepad TinyRetroPad is recreating what was never deleted. Microsoft just quietly retired it from Windows 11.

Plummer has said the point was never to get anyone to use a hand-assembled 2.5KB editor. It’s to show how much untapped potential already sits inside Windows, because modern app development defaults to bundling everything an app might need instead of asking what the OS already provides.
In a recent test, Windows Latest found that Windows 11’s Media Player, takes a few seconds to open a video and uses 377MB idle, against 103.4MB and instant playback on the legacy version, one that predates HEVC yet plays it better than the modern app does without a $0.99 Store add-on.

Sure, we need modern-looking apps in Windows 11, but that mustn’t come at the cost of efficiency and control. We’re not saying that Microsoft isn’t allowed to bundle subscription plans in their inbox apps, but Windows 11 itself isn’t free. It’s paid software. Microsoft’s decades-old classic apps still look good and are robust. Also, the software giant built Calculator, Notepad, and Media Player decades ago without today’s tools and infrastructure. What needs to change isn’t the hardware. It’s the mindset that every rewrite needs to be as efficient as possible, just for the sake of it being possible.
The post Ex-Microsoft engineer rebuilds Notepad in 2.5KB using nothing but stuff Windows already had appeared first on Windows Latest
Read more of this story at Slashdot.