The US Department of Energy reportedly deleted about 6,000 pages related to energy conservation as a historic heatwave tears across the country.
The deletion was suspiciously timed, following Republican outrage over Mayor Zohran Mamdani asking New Yorkers to help reduce strain on the grid by setting their AC to 78 degrees. Republicans like Ted Cruz (who has famously fled severe weather in his home state), Nikki Haley, and Representative Nancy Mace (South Carolina) quickly pounced, framing the request as socialism and an act of war on women in menopause (the Republican Party is notoriously concerned about women's health).
Connor Jones reports: AdaptHealth says attackers used social engineering to breach its systems and steal sensitive patient data, including passwords associated with insurance billing. The medical equipment company disclosed the attack to the Securities and Exchange Commission (SEC) on Thursday, noting that attackers accessed internal patient management systems, document storage platforms, and external electronic health record system...
I blog because I want to share knowledge. That’s how I have fun. It feels rewarding to sit down and formulate your thoughts after spending a week researching a topic for a side project. Often, such topics are largely unexplored: the basics are too obvious and the popular topics are already laid out clearly, so whatever remains is bound to be unconventional.
To me, that’s a positive thing. It’s my chance to make progress in a stale and closed industry, not to mention that exploration is a lost art in the age of vibecoding.
I love weird and deeply technical blogs like The Old New Thing, jart’s, and Lemire’s, and for a long while I had assumed that others would love mine the same way. Sure, I’m not as good at writing, but the ideas should shine through regardless.
Learn NuGet package metadata best practices. Configure README files, icons, license expressions, tags, and repository URLs to make your .NET package shine on NuGet.org.
This blog post was created with the help of AI tools. Yes, I used a bit of magic from language models to organize my thoughts and automate the boring parts, but the geeky fun and the code in C# are 100% mine.
Hi!
Part 1 of the CopilotHarness series
Big models are great for heavy thinking. But what about simple questions like “rename this variable”, “write a short docstring”, or “what does this function do”? Those don’t need a round trip to GPT-5 in the cloud. They can be answered instantly by a model running on your own machine, offline, for free.
This post shows you how to wire GitHub Copilot in VS Code to local and self-managed models using three minimal proxies (one for Ollama, one for Foundry Local, and one for Azure OpenAI) and how to run all three together with a single command.
1. How BYOK Works in VS Code
VS Code Copilot supports a Bring Your Own Key (BYOK) mechanism. You register any OpenAI-compatible endpoint in a config file (chatLanguageModels.json, covered in section 8), and Copilot treats it as just another model in the picker, with no extension or plugin required.
That's it: point the url field at any OpenAI-compatible endpoint, give it a name, and it appears in Copilot's model picker.
2. The Three Proxy Flavors
A local model provider isn’t always OpenAI-compatible out of the box. Each proxy in this repo acts as a thin translation layer — it speaks OpenAI on one side and the local backend on the other.
| Proxy | Port | Backend | Best for |
|---|---|---|---|
| OllamaProxy | 5099 | Ollama (local) | Quickest start, huge model catalog |
| FoundryLocalProxy | 5101 | Foundry Local SDK (offline, NPU) | Offline/air-gapped, hardware acceleration |
| FoundryProxy | 5100 | Azure OpenAI / Foundry cloud | Production-grade, secret-managed cloud models |
Each proxy is a single ASP.NET Core Minimal API file with no extra layers or abstractions, and the entire proxy fits on one screen. That’s intentional: these are teaching samples, not production middleware.
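Here is a concrete example of the “thin translation layer” idea. This is a minimal sketch of one direction of that translation. It assumes the proxy calls Ollama’s native /api/chat endpoint with streaming turned off. It reshapes Ollama’s response fields (message.content, prompt_eval_count, eval_count) into the OpenAI chat.completion object. The function name ToOpenAiCompletion is hypothetical. The repo’s proxies may do this differently, or may call Ollama’s own OpenAI-compatible endpoint instead.

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical helper: translate a NON-streaming Ollama /api/chat response
// into the OpenAI "chat.completion" shape an OpenAI-compatible client expects.
static JsonObject ToOpenAiCompletion(string ollamaResponseJson)
{
    var ollama = JsonNode.Parse(ollamaResponseJson)!.AsObject();

    // Ollama puts the reply in message.content and token counts in
    // prompt_eval_count / eval_count; default to empty/zero if absent.
    var content = ollama["message"]?["content"]?.GetValue<string>() ?? "";
    var promptTokens = ollama["prompt_eval_count"]?.GetValue<int>() ?? 0;
    var completionTokens = ollama["eval_count"]?.GetValue<int>() ?? 0;

    return new JsonObject
    {
        ["id"] = "chatcmpl-" + Guid.NewGuid().ToString("N"),
        ["object"] = "chat.completion",
        ["created"] = DateTimeOffset.UtcNow.ToUnixTimeSeconds(),
        ["model"] = ollama["model"]?.GetValue<string>(),
        ["choices"] = new JsonArray
        {
            new JsonObject
            {
                ["index"] = 0,
                ["message"] = new JsonObject { ["role"] = "assistant", ["content"] = content },
                // Assumes generation ended normally; a fuller version would map Ollama's done_reason.
                ["finish_reason"] = "stop"
            }
        },
        ["usage"] = new JsonObject
        {
            ["prompt_tokens"] = promptTokens,
            ["completion_tokens"] = completionTokens,
            ["total_tokens"] = promptTokens + completionTokens
        }
    };
}
```

A real proxy would also need to handle streamed requests. That means translating Ollama’s newline-delimited JSON chunks into OpenAI-style Server-Sent Events.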
3. The Shared Secret: Unwrapping Copilot’s Envelope
Here’s something most developers don’t know: when you type “hi” into GitHub Copilot Chat, it doesn’t send just “hi”. It wraps your text in several kilobytes of context markup.
The actual message is buried inside <userRequest>. A proxy that naively reads the last user message sees ~3 KB of boilerplate instead of the word "hi".
All three proxies share one class — CopilotMessageExtractor — that unwraps this envelope. It lives in the shared Proxies.Common library:
// The key method: finds <userRequest>...</userRequest> and returns its content.
// Falls back gracefully for non-Copilot clients (curl, SDK) that send plain text.
public static string ExtractTypedUserMessage(string rawUserMessage)
{
    // Look for <userRequest> first (VS Code Copilot always uses this)
    var userRequest = ExtractTagContent(rawUserMessage, "userRequest")
        ?? ExtractTagContent(rawUserMessage, "user-request");
    if (!string.IsNullOrWhiteSpace(userRequest))
        return userRequest.Trim();

    // No tag: strip all known wrapper blocks and return what's left.
    // Plain curl/SDK text has no wrapper tags, so it passes through unchanged.
    var stripped = rawUserMessage;
    foreach (var tag in CopilotWrapperTags)
        stripped = RemoveTagBlock(stripped, tag);

    // If stripping removed everything (the message was nothing but
    // wrapper blocks), fall back to the raw message
    return string.IsNullOrWhiteSpace(stripped)
        ? rawUserMessage.Trim()
        : stripped.Trim();
}
This class is why the proxies work correctly with both Copilot Chat and direct API calls. The logging in each proxy shows the real ask — not the 3 KB envelope.
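The post doesn’t show the two helpers the method relies on, ExtractTagContent and RemoveTagBlock. Below is a hypothetical plain-string version that assumes tags never carry attributes. The contents of copilotWrapperTags are placeholders, not the real CopilotWrapperTags list from Proxies.Common. The method above is repeated as a local function so the sketch runs on its own.

```csharp
using System;

// Placeholder tag names for illustration only (not the real list).
string[] copilotWrapperTags = { "context", "instructions" };

// Returns the text between the first <tag> and the following </tag>, or null.
static string? ExtractTagContent(string text, string tag)
{
    string open = $"<{tag}>", close = $"</{tag}>";
    int start = text.IndexOf(open, StringComparison.Ordinal);
    if (start < 0) return null;
    start += open.Length;
    int end = text.IndexOf(close, start, StringComparison.Ordinal);
    return end < 0 ? null : text[start..end];
}

// Removes every complete <tag>...</tag> block; an unclosed tag is left as-is.
static string RemoveTagBlock(string text, string tag)
{
    string open = $"<{tag}>", close = $"</{tag}>";
    while (true)
    {
        int start = text.IndexOf(open, StringComparison.Ordinal);
        if (start < 0) return text;
        int end = text.IndexOf(close, start + open.Length, StringComparison.Ordinal);
        if (end < 0) return text;
        text = text.Remove(start, end + close.Length - start);
    }
}

// Same logic as ExtractTypedUserMessage above, wired to these helpers.
string ExtractTypedUserMessage(string raw)
{
    var userRequest = ExtractTagContent(raw, "userRequest")
        ?? ExtractTagContent(raw, "user-request");
    if (!string.IsNullOrWhiteSpace(userRequest))
        return userRequest.Trim();

    var stripped = raw;
    foreach (var tag in copilotWrapperTags)
        stripped = RemoveTagBlock(stripped, tag);

    return string.IsNullOrWhiteSpace(stripped) ? raw.Trim() : stripped.Trim();
}
```

With these helpers, an envelope that ends in `<userRequest>hi</userRequest>` yields just `hi`, and a plain `rename foo to bar` sent with curl comes back unchanged.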
4. OllamaProxy — 5 Minutes to Your First Local Model
Prerequisite: Ollama running with at least one model pulled.
# Pull a model
ollama pull llama3.1:8b
# Start the proxy
cd src/proxies/OllamaProxy
dotnet run
# → http://localhost:5099
The proxy auto-discovers your installed Ollama models and passes the model ID through. Then add it to VS Code (see section 8 for the config).
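The post doesn’t show the proxy’s discovery code. One way auto-discovery could work, and this is an assumption, is to ask Ollama for its installed models with GET /api/tags on Ollama’s default port 11434 and read each entry’s name field. The function names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Reads model names from the JSON returned by Ollama's GET /api/tags.
static List<string> ParseOllamaModelNames(string tagsJson)
{
    using var doc = JsonDocument.Parse(tagsJson);
    var names = new List<string>();
    if (doc.RootElement.TryGetProperty("models", out var models)
        && models.ValueKind == JsonValueKind.Array)
    {
        foreach (var model in models.EnumerateArray())
        {
            if (model.TryGetProperty("name", out var name) && name.GetString() is string id)
                names.Add(id);
        }
    }
    return names;
}

// Assumes Ollama listens on its default address, http://localhost:11434.
static async Task<List<string>> DiscoverOllamaModelsAsync(HttpClient http)
{
    var json = await http.GetStringAsync("http://localhost:11434/api/tags");
    return ParseOllamaModelNames(json);
}
```

The names it returns (for example `llama3.1:8b`) are the IDs a client would send back as the `model` field.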
5. FoundryLocalProxy: Fully Offline Models
Microsoft Foundry Local runs models fully offline using the ONNX Runtime with hardware acceleration (CPU, GPU, NPU on Windows).
No prerequisites: the SDK downloads the model on first run and caches it locally.
cd src/proxies/FoundryLocalProxy
dotnet run
# First run: downloads phi-4-mini (~2.5 GB) automatically
# → http://localhost:5101
The Models page in the test app shows which models are cached and lets you load or unload them (unloading frees GPU memory immediately) or delete them from disk.
Tip: Use the Models page to download and load a model before chatting with it. If you send a chat request to a model that isn’t loaded, you get a clear error explaining that it needs to be loaded first, not a cryptic 500.
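The tip above amounts to checking the model before forwarding the request. This hypothetical sketch does that check. If the requested model isn’t in the loaded set, it returns an OpenAI-style error body (`{"error": {...}}`) that names the model, instead of letting the backend fail. The function name, the `model_not_loaded` code, and the message wording are assumptions, not the proxy’s actual behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Returns null when the requested model is loaded; otherwise an error body in the
// OpenAI error format that a chat client can show to the user. The caller picks
// the HTTP status (for example 400) instead of surfacing a backend 500.
static JsonObject? ModelNotLoadedError(string requestedModel, IReadOnlyCollection<string> loadedModels)
{
    foreach (var loaded in loadedModels)
        if (string.Equals(loaded, requestedModel, StringComparison.OrdinalIgnoreCase))
            return null;

    return new JsonObject
    {
        ["error"] = new JsonObject
        {
            ["message"] = $"Model '{requestedModel}' is not loaded. Load it on the Models page, then retry.",
            ["type"] = "invalid_request_error",
            ["code"] = "model_not_loaded" // hypothetical code value
        }
    };
}
```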
Add it to VS Code alongside Ollama; Copilot lets you pick which model to use per conversation.
6. FoundryProxy — Azure OpenAI with Proper Secret Management
For cloud models, FoundryProxy uses .NET User Secrets so your API key never touches the repo.
cd src/proxies/FoundryProxy
# Store credentials locally (never committed to git)
dotnet user-secrets set "Foundry:Endpoint" "https://your-resource.openai.azure.com"
dotnet user-secrets set "Foundry:ApiKey" "your-key"
dotnet user-secrets set "Foundry:Deployment" "gpt-4o-mini"
dotnet run
# → http://localhost:5100
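This is roughly what FoundryProxy has to do with those three secrets, sketched under assumptions. It targets the Azure OpenAI REST route /openai/deployments/{deployment}/chat/completions with key auth in the api-key header. The default api-version is a placeholder, so match it to what your resource supports. In an ASP.NET Core app, the values would come from builder.Configuration["Foundry:Endpoint"], ["Foundry:ApiKey"], and ["Foundry:Deployment"]. WebApplication.CreateBuilder loads user secrets automatically in the Development environment. The function name BuildAzureChatRequest is hypothetical.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Text;

// Builds an Azure OpenAI chat completions request from endpoint, key, and deployment.
// The api-version default is an assumption; check which versions your resource supports.
static HttpRequestMessage BuildAzureChatRequest(
    string endpoint, string apiKey, string deployment, string bodyJson, string apiVersion = "2024-10-21")
{
    var url = $"{endpoint.TrimEnd('/')}/openai/deployments/{Uri.EscapeDataString(deployment)}" +
              $"/chat/completions?api-version={Uri.EscapeDataString(apiVersion)}";
    var request = new HttpRequestMessage(HttpMethod.Post, url)
    {
        Content = new StringContent(bodyJson, Encoding.UTF8, "application/json")
    };
    // Key-based auth; the key stays in user secrets and only travels in this header.
    request.Headers.Add("api-key", apiKey);
    return request;
}
```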
7. All Three Together — One Command with Aspire
The fastest way to run everything is via the .NET Aspire CLI. A single command starts all three proxies, the Blazor test app, and the Aspire dashboard with logs, traces, and health checks.
The Blazor test app at http://localhost:5102 gives you a browser UI to test all three proxies without writing any code.
The Aspire dashboard shows traces for every request, including custom LlmActivity spans with prompt text, model ID, token counts, and latency. You can see exactly what Copilot sent and what the model returned.
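The post doesn’t show how the LlmActivity spans are produced. A plausible minimal sketch uses System.Diagnostics.ActivitySource, which is what .NET’s OpenTelemetry tracing listens to. The source name “CopilotHarness.Proxies” and the tag keys are hypothetical. For the Aspire dashboard to receive these spans, the same source name would need to be registered with OpenTelemetry tracing via AddSource.

```csharp
using System;
using System.Diagnostics;

// Hypothetical source name; register it with OpenTelemetry (AddSource) to export spans.
var llmSource = new ActivitySource("CopilotHarness.Proxies");

Activity? StartLlmActivity(string modelId, string prompt)
{
    // StartActivity returns null when no listener samples this source.
    var activity = llmSource.StartActivity("LlmActivity", ActivityKind.Client);
    activity?.SetTag("llm.model_id", modelId);
    activity?.SetTag("llm.prompt", prompt);
    return activity;
}

static void CompleteLlmActivity(Activity? activity, int promptTokens, int completionTokens)
{
    if (activity is null) return;
    activity.SetTag("llm.usage.prompt_tokens", promptTokens);
    activity.SetTag("llm.usage.completion_tokens", completionTokens);
    activity.Stop(); // Stop() fixes Activity.Duration, which the dashboard shows as latency.
}
```

Latency needs no extra code: the span’s duration is measured between StartActivity and Stop.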
To stop everything:
aspire stop
8. Wire It to VS Code Copilot
The /setup page at http://localhost:5102/setup generates the exact chatLanguageModels.json snippet for each running proxy, with the correct port and model ID. Copy it into chatLanguageModels.json in your VS Code user configuration folder.
After saving, reload VS Code. Open Copilot Chat, click the model picker, and your local models appear alongside the built-in cloud models.
Shortcut: If you have the CopilotHarness CLI tool installed, running harness init writes this file automatically.
9. What’s Next — Smart Routing
The proxies shown here are static: you pick a model manually per conversation. The next level is automatic routing — where every Copilot request is analyzed and sent to the best model automatically.
“Is this a simple rename? → local llama3.1:8b. Is this a complex architecture question? → cloud GPT-5. Is this about GitHub Actions? → a specialist agent.”
That’s what the full CopilotHarness router does — policy-based routing with semantic matching, local classifiers, and per-request telemetry. Part 2 of this series walks through building and using it.
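Here is the quoted policy in code, as a deliberately naive keyword-based router. It is not the CopilotHarness router, which the post says uses semantic matching and local classifiers. The route names and keyword lists are hypothetical.

```csharp
using System;
using System.Linq;

// Naive keyword policy for illustration only; route names are hypothetical.
static string Route(string prompt)
{
    var text = prompt.ToLowerInvariant();

    // 1. Domain-specific topics go to a specialist agent, regardless of size.
    if (text.Contains("github actions") || text.Contains("workflow yaml"))
        return "agent:github-actions";

    // 2. Design-level or very long prompts go to the big cloud model.
    string[] heavyHints = { "architecture", "design", "trade-off", "migrate", "refactor the" };
    if (heavyHints.Any(hint => text.Contains(hint)) || text.Length > 500)
        return "cloud:gpt-5";

    // 3. Everything else (renames, docstrings, "what does this do") stays local.
    return "local:llama3.1:8b";
}
```

Keyword rules like these are brittle. For example, “design” also matches “redesign the icon”. Semantic matching and classifiers are meant to close that gap.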
Armin reports on a weird problem he ran into while hacking on Pi:
The short version is that newer Claude models sometimes call Pi’s edit tool with extra, invented fields in the nested edits[] array. And not Haiku or some small model: Opus 4.8. The edit itself is usually correct but the arguments do not match the schema as the model invents made-up keys and Pi thus rejects the tool call and asks to try again.
That alone is not too surprising as models emit malformed tool calls sometimes. Particularly small ones. What surprised me is that this is getting worse with newer Anthropic models as both Opus 4.8 and Sonnet 5 show it but none of the older models. In other words, the SOTA models of the family are worse at this specific tool schema than their older siblings.
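The failure mode is easy to picture in code. Below is a hypothetical strict check of the kind a harness might run before applying an edit. It walks the nested edits[] array and reports every key outside the schema. The field names oldText and newText are illustrative assumptions, not Pi’s actual schema.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Reports each key in edits[i] that is not in the allowed set, as "edits[i].key".
static List<string> FindUnknownEditKeys(string argumentsJson, HashSet<string> allowedKeys)
{
    var problems = new List<string>();
    using var doc = JsonDocument.Parse(argumentsJson);
    var root = doc.RootElement;

    if (root.ValueKind != JsonValueKind.Object
        || !root.TryGetProperty("edits", out var edits)
        || edits.ValueKind != JsonValueKind.Array)
    {
        problems.Add("edits: missing or not an array");
        return problems;
    }

    int index = 0;
    foreach (var edit in edits.EnumerateArray())
    {
        if (edit.ValueKind == JsonValueKind.Object)
        {
            foreach (var property in edit.EnumerateObject())
            {
                if (!allowedKeys.Contains(property.Name))
                    problems.Add($"edits[{index}].{property.Name}");
            }
        }
        index++;
    }
    return problems;
}
```

Returning the offending paths lets the retry message name the invented keys. A more lenient harness could instead drop unknown keys when all required ones are present. That trades schema strictness for fewer retries.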
Armin theorizes that this is because more recent Anthropic models have been specifically trained (presumably via Reinforcement Learning) to better use the edit tools that are baked into Claude Code. This has the unfortunate effect that other coding harnesses, such as Pi, may find that their own custom edit tools are more likely to be used incorrectly.
Does this mean third-party coding harnesses like Pi should implement multiple edit tools just so they can use the one with the best performance for the underlying model the user has selected?