Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

White House deletes thousands of web pages about energy conservation as heatwave slams US

1 Share
The sun flares over the sign marking the location of the US Department of Energy headquarters building

The US Department of Energy reportedly deleted about 6,000 pages related to energy conservation as a historic heatwave tears across the country.

The deletion was suspiciously timed, following Republican outrage over Mayor Zohran Mamdani asking New Yorkers to help reduce strain on the grid by setting their AC to 78 degrees. Republicans like Ted Cruz (who has famously fled severe weather in his home state), Nikki Haley, and Representative Nancy Mace (South Carolina) quickly pounced, framing the request as socialism and an act of war on women in menopause (the Republican Party is notoriously concerned about women's health).

Of course, this i …

Read the full story at The Verge.

Read the whole story
alvinashcraft
13 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

AdaptHealth says attackers sweet-talked their way into cloud systems and stole patient data

Connor Jones reports: AdaptHealth says attackers used social engineering to breach its systems and steal sensitive patient data, including passwords associated with insurance billing. The medical equipment company disclosed the attack to the Securities and Exchange Commission (SEC) on Thursday, noting that attackers accessed internal patient management systems, document storage platforms, and external electronic health record system...

Source


Does technical blogging spark joy?


I blog because I want to share knowledge. That’s how I have fun. It feels rewarding to sit down and formulate your thoughts after spending a week researching a topic for a side project. Oftentimes, such topics are largely unexplored: with the basics being too obvious and other topics laid out clearly, the remains are bound to be unconventional.

To me, that’s a positive thing. It’s my chance to make progress in a stale and closed industry, not to mention that exploration is a lost art in the age of vibecoding.

I love weird and deeply technical blogs like The Old New Thing, jart’s, and Lemire’s, and for a long while I had assumed that others would love mine the same way. Sure, I’m not as good at writing, but the ideas should shine through regardless.


NuGet Package Metadata Best Practices: README, Icon, Tags, and License


Learn NuGet package metadata best practices. Configure README files, icons, license expressions, tags, and repository URLs to make your .NET package shine on NuGet.org.


Run the Right AI Model for the Right Copilot Task — No Cloud Credits Wasted


⚠ This blog post was created with the help of AI tools. Yes, I used a bit of magic from language models to organize my thoughts and automate the boring parts, but the geeky fun and the 🤖 in C# are 100% mine.

Hi!

Part 1 of the CopilotHarness series


Hero: local AI running on your machine

Big models are great for heavy thinking. But what about simple questions — “rename this variable”, “write a short docstring”, “what does this function do”? Those don’t need GPT-5 across the internet. They can be answered instantly by a model running on your own machine, offline, for free.

This post shows you how to wire GitHub Copilot in VS Code to your own models using three minimal proxies (one for Ollama, one for Foundry Local, one for Azure OpenAI) and how to run all three together with a single command.


1. How BYOK Works in VS Code

VS Code Copilot supports a Bring Your Own Key (BYOK) mechanism for adding your own models. You register any OpenAI-compatible endpoint in a config file, and Copilot treats it as just another model in the picker, with no extension or plugin required.

Official docs: Bring your own model to GitHub Copilot Chat

The config lives in a file called chatLanguageModels.json in your VS Code user folder:

{
  "providers": [
    {
      "name": "Ollama (local)",
      "vendor": "customendpoint",
      "url": "http://localhost:5099/v1",
      "modelId": "llama3.1:8b",
      "chatModelId": "copilot-chat-model"
    }
  ]
}

That's it: point url at any OpenAI-compatible endpoint, give it a name, and it appears in Copilot's model picker.

BYOK flow: VS Code → config → proxy → model

2. The Three Proxy Flavors

A local model provider isn’t always OpenAI-compatible out of the box. Each proxy in this repo acts as a thin translation layer — it speaks OpenAI on one side and the local backend on the other.

Proxy overview: three proxies connecting VS Code to different backends

| Proxy | Port | Backend | Best for |
| --- | --- | --- | --- |
| OllamaProxy | 5099 | Ollama (local) | Quickest start, huge model catalog |
| FoundryLocalProxy | 5101 | Foundry Local SDK (offline, NPU) | Offline/air-gapped, hardware acceleration |
| FoundryProxy | 5100 | Azure OpenAI / Foundry cloud | Production-grade, secret-managed cloud models |

Each proxy is a single ASP.NET Core Minimal API file — no frameworks, no abstractions. The entire proxy fits on one screen. That’s intentional: these are teaching samples, not production middleware.
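
To make the "thin translation layer" idea concrete, here is a minimal sketch of the request-mapping step, assuming the backend is Ollama's native /api/chat endpoint. The class and method names are hypothetical and are not taken from the repo. Ollama also exposes an OpenAI-compatible endpoint, so the actual proxy may simply forward requests instead.

```csharp
using System.Text.Json.Nodes;

// Hypothetical sketch: maps an OpenAI-style chat completion request
// to the body shape used by Ollama's native /api/chat endpoint.
public static class TranslationSketch
{
    public static string ToOllamaChatBody(string openAiRequestJson)
    {
        var request = JsonNode.Parse(openAiRequestJson)!.AsObject();

        // Assumption: every message has a plain-string "content".
        // OpenAI also allows an array of content parts, which this sketch ignores.
        var messages = new JsonArray();
        foreach (var message in request["messages"]!.AsArray())
        {
            messages.Add(new JsonObject
            {
                ["role"] = message!["role"]!.GetValue<string>(),
                ["content"] = message["content"]!.GetValue<string>()
            });
        }

        var body = new JsonObject
        {
            ["model"] = request["model"]!.GetValue<string>(),
            ["messages"] = messages,
            // Default to a non-streaming reply when the client did not ask for streaming.
            ["stream"] = request["stream"]?.GetValue<bool>() ?? false
        };
        return body.ToJsonString();
    }
}
```

A real proxy would also translate the response back into the OpenAI shape and handle streaming, which is where most of the remaining work would sit.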


3. The Shared Secret: Unwrapping Copilot’s Envelope

Here’s something most developers don’t know: when GitHub Copilot Chat sends “hi”, it doesn’t send just "hi". It sends this:

<attachments>...file contents...</attachments>
<context>...editor state...</context>
<reminderInstructions>...workspace instructions...</reminderInstructions>
<userRequest>hi</userRequest>

The actual message is buried inside <userRequest>. A proxy that naively reads the last user message sees ~3 KB of boilerplate instead of the word "hi".

All three proxies share one class — CopilotMessageExtractor — that unwraps this envelope. It lives in the shared Proxies.Common library:

// The key method — finds <userRequest>...</userRequest> and returns its content.
// Falls back gracefully for non-Copilot clients (curl, SDK) that send plain text.
public static string ExtractTypedUserMessage(string rawUserMessage)
{
    // Look for <userRequest> first (VS Code Copilot always uses this)
    var userRequest = ExtractTagContent(rawUserMessage, "userRequest")
                   ?? ExtractTagContent(rawUserMessage, "user-request");

    if (!string.IsNullOrWhiteSpace(userRequest))
        return userRequest.Trim();

    // No tag — strip all known wrapper blocks and return what's left
    var stripped = rawUserMessage;
    foreach (var tag in CopilotWrapperTags)
        stripped = RemoveTagBlock(stripped, tag);

    // If stripping removed everything, fall back to the raw message
    // (this path is hit for plain curl/SDK clients — they send plain text)
    return string.IsNullOrWhiteSpace(stripped.Trim())
        ? rawUserMessage.Trim()
        : stripped.Trim();
}

This class is why the proxies work correctly with both Copilot Chat and direct API calls. The logging in each proxy shows the real ask — not the 3 KB envelope.
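
The method above calls ExtractTagContent and RemoveTagBlock, whose bodies are not shown in the post. As a rough sketch of what such helpers could look like, assuming simple non-nested tags and plain string searching (the repo's real implementation may differ):

```csharp
using System;

// Hypothetical helpers: the post names these methods but does not show them,
// so the bodies below are illustrative guesses, not the repo's code.
public static class EnvelopeSketch
{
    // Returns the text between <tag> and </tag>, or null when the pair is absent.
    public static string? ExtractTagContent(string text, string tag)
    {
        var open = $"<{tag}>";
        var close = $"</{tag}>";

        var start = text.IndexOf(open, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += open.Length;

        var end = text.IndexOf(close, start, StringComparison.OrdinalIgnoreCase);
        return end < 0 ? null : text.Substring(start, end - start);
    }

    // Removes every <tag>...</tag> block; an unclosed tag is left untouched.
    public static string RemoveTagBlock(string text, string tag)
    {
        var open = $"<{tag}>";
        var close = $"</{tag}>";

        while (true)
        {
            var start = text.IndexOf(open, StringComparison.OrdinalIgnoreCase);
            if (start < 0) return text;

            var end = text.IndexOf(close, start + open.Length, StringComparison.OrdinalIgnoreCase);
            if (end < 0) return text;

            text = text.Remove(start, end + close.Length - start);
        }
    }
}
```

With these helpers, ExtractTagContent applied to the envelope shown earlier would return "hi" for the userRequest tag, and null for plain text sent by curl or an SDK client.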


4. OllamaProxy — 5 Minutes to Your First Local Model

Prerequisite: Ollama running with at least one model pulled.

# Pull a model
ollama pull llama3.1:8b

# Start the proxy
cd src/proxies/OllamaProxy
dotnet run
# → http://localhost:5099

The proxy auto-discovers your installed Ollama models and passes the model ID through. Add it to VS Code:

// chatLanguageModels.json  (Windows: %APPDATA%\Code\User\)
{
  "providers": [
    {
      "name": "Ollama — llama3.1:8b",
      "vendor": "customendpoint",
      "url": "http://localhost:5099/v1",
      "modelId": "llama3.1:8b",
      "chatModelId": "copilot-chat-model"
    }
  ]
}

Verify it’s running:

curl http://localhost:5099/health
# → {"status":"ok","backend":"ollama","models":["llama3.1:8b",...]}

Proxies test app: health dashboard showing all three proxies green
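
If you want to check the health payload from code rather than by eye, a small sketch like the following could work. It assumes the JSON shape shown in the curl output above (status plus a models array); the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: reads the /health payload shape shown in the curl example.
public static class HealthSketch
{
    // Returns the model list when status is "ok"; otherwise an empty list.
    public static IReadOnlyList<string> ReadyModels(string healthJson)
    {
        using var doc = JsonDocument.Parse(healthJson);
        var root = doc.RootElement;

        if (!root.TryGetProperty("status", out var status)
            || status.ValueKind != JsonValueKind.String
            || status.GetString() != "ok")
        {
            return new List<string>();
        }

        var models = new List<string>();
        if (root.TryGetProperty("models", out var array) && array.ValueKind == JsonValueKind.Array)
        {
            foreach (var item in array.EnumerateArray())
            {
                // Assumption: model entries are plain strings; anything else is skipped.
                if (item.ValueKind == JsonValueKind.String && item.GetString() is { } name)
                    models.Add(name);
            }
        }
        return models;
    }
}
```

The input string could come from an HttpClient.GetStringAsync call against http://localhost:5099/health while the proxy is running.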

5. FoundryLocalProxy — Offline + NPU Inference

Microsoft Foundry Local runs models fully offline using the ONNX Runtime with hardware acceleration (CPU, GPU, NPU on Windows).

No prerequisites: the SDK downloads the model on first run and caches it locally.

cd src/proxies/FoundryLocalProxy
dotnet run
# First run: downloads phi-4-mini (~2.5 GB) automatically
# → http://localhost:5101

The Models page in the test app shows which models are cached, lets you load/unload (frees GPU RAM instantly), and delete models from disk:

Foundry Local model management — load, unload, delete cached models

💡 Tip: Use the Models page to download a model before chatting with it. If you send a chat request to an unloaded model, you get a clear error explaining the model needs to be loaded first — not a cryptic 500.

Add it to VS Code alongside Ollama — Copilot lets you pick which model to use per conversation:

{
  "name": "Foundry Local — phi-4-mini",
  "vendor": "customendpoint",
  "url": "http://localhost:5101/v1",
  "modelId": "phi-4-mini",
  "chatModelId": "copilot-chat-model"
}

6. FoundryProxy — Azure OpenAI with Proper Secret Management

For cloud models, FoundryProxy uses .NET User Secrets so your API key never touches the repo.

cd src/proxies/FoundryProxy

# Store credentials locally (never committed to git)
dotnet user-secrets set "Foundry:Endpoint"   "https://your-resource.openai.azure.com"
dotnet user-secrets set "Foundry:ApiKey"     "your-key"
dotnet user-secrets set "Foundry:Deployment" "gpt-4o-mini"

dotnet run
# → http://localhost:5100

7. All Three Together — One Command with Aspire

The fastest way to run everything is via the .NET Aspire CLI. One command starts all three proxies, the Blazor test app, and the Aspire dashboard with logs, traces, and health checks:

cd src/proxies
aspire start

What starts:

| Service | URL | What it is |
| --- | --- | --- |
| ollama-proxy | http://localhost:5099 | OllamaProxy |
| foundry-proxy | http://localhost:5100 | FoundryProxy |
| foundry-local-proxy | http://localhost:5101 | FoundryLocalProxy |
| proxies-test-app | http://localhost:5102 | Blazor test UI |
| Aspire dashboard | printed in console | Logs, traces, health for all services |

Requires Aspire CLI: dotnet workload install aspire

The Blazor test app at http://localhost:5102 gives you a browser UI to test all three proxies without writing any code:

Chat page: pick a proxy and model, then send a message with streaming support

Compare page: same prompt sent to all three proxies simultaneously

The Aspire dashboard shows traces for every request, including custom LlmActivity spans with prompt text, model ID, token counts, and latency. You can see exactly what Copilot sent and what the model returned.

Aspire dashboard — all four services running and healthy
Aspire traces — LLM spans with latency and token counts

To stop everything:

aspire stop

8. Wire It to VS Code Copilot

The /setup page at http://localhost:5102/setup generates the exact chatLanguageModels.json snippet for each running proxy, with the correct port and model ID. Copy and paste into your VS Code user config folder:

Setup page — auto-generated VS Code config snippets for each proxy
  • Windows: %APPDATA%\Code\User\chatLanguageModels.json
  • macOS: ~/Library/Application Support/Code/User/chatLanguageModels.json
  • Linux: ~/.config/Code/User/chatLanguageModels.json
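
The three locations listed above can be resolved from code as well. This is a minimal sketch, assuming a default VS Code install (not Insiders or a portable build); the class and method names are hypothetical and unrelated to the CopilotHarness CLI.

```csharp
using System;
using System.IO;

// Hypothetical helper: resolves the default chatLanguageModels.json location
// for the current OS, following the three paths listed in the post.
public static class ConfigPathSketch
{
    public static string ChatLanguageModelsPath()
    {
        var home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);

        string userDir;
        if (OperatingSystem.IsWindows())
        {
            // %APPDATA%\Code\User
            var appData = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
            userDir = Path.Combine(appData, "Code", "User");
        }
        else if (OperatingSystem.IsMacOS())
        {
            // ~/Library/Application Support/Code/User
            userDir = Path.Combine(home, "Library", "Application Support", "Code", "User");
        }
        else
        {
            // ~/.config/Code/User (assumed for Linux and other Unix-like systems)
            userDir = Path.Combine(home, ".config", "Code", "User");
        }

        return Path.Combine(userDir, "chatLanguageModels.json");
    }
}
```

Spelling out the macOS and Linux paths explicitly avoids relying on how SpecialFolder.ApplicationData maps on those platforms, which has varied between .NET versions.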

After saving, reload VS Code. Open Copilot Chat, click the model picker, and your local models appear alongside the built-in cloud models.

💡 Shortcut: If you have the CopilotHarness CLI tool installed, running harness init writes this file automatically.


9. What’s Next — Smart Routing

The proxies shown here are static: you pick a model manually per conversation. The next level is automatic routing — where every Copilot request is analyzed and sent to the best model automatically.

“Is this a simple rename? → local llama3.1:8b. Is this a complex architecture question? → cloud GPT-5. Is this about GitHub Actions? → a specialist agent.”

That’s what the full CopilotHarness router does — policy-based routing with semantic matching, local classifiers, and per-request telemetry. Part 2 of this series walks through building and using it.
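
As a toy illustration of the routing idea only, here is a keyword-based sketch. It is not the CopilotHarness router, which the post describes as using semantic matching and local classifiers; the rule list and target names below are made up for the example.

```csharp
using System;
using System.Linq;

// Toy illustration only: a keyword policy where the first matching rule wins.
// Rule keywords and target names are hypothetical.
public static class RoutingSketch
{
    private static readonly (string[] Keywords, string Target)[] Rules =
    {
        (new[] { "github actions", "workflow" }, "specialist-agent"),
        (new[] { "rename", "docstring" }, "local:llama3.1:8b"),
    };

    public static string Route(string userRequest)
    {
        foreach (var (keywords, target) in Rules)
        {
            if (keywords.Any(k => userRequest.Contains(k, StringComparison.OrdinalIgnoreCase)))
                return target;
        }

        // Anything that matches no rule falls through to the cloud model.
        return "cloud:default";
    }
}
```

Keyword matching breaks down quickly on real prompts, which is presumably why the full router relies on classifiers rather than string checks.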

Repo: github.com/elbruno/ElBruno.CopilotHarness


Quick Reference

| Goal | Command |
| --- | --- |
| Start just Ollama proxy | cd src/proxies/OllamaProxy && dotnet run |
| Start all three + test UI | cd src/proxies && aspire start |
| Stop all | aspire stop |
| Generate VS Code config | Open http://localhost:5102/setup |
| View traces | Open Aspire dashboard URL printed in console |
| Manage Foundry Local models | Open http://localhost:5102/models |
| Test proxy health | curl http://localhost:5099/health |

This is Part 1 of the CopilotHarness series.
Next: Part 2 — Smart Routing: Sending Each Request to the Right Model Automatically

Code: github.com/elbruno/ElBruno.CopilotHarness/tree/main/src/proxies

Happy coding!

Greetings

El Bruno

More posts in my blog ElBruno.com.

More info in https://beacons.ai/elbruno







Better Models: Worse Tools



Armin reports on a weird problem he ran into while hacking on Pi:

The short version is that newer Claude models sometimes call Pi’s edit tool with extra, invented fields in the nested edits[] array. And not Haiku or some small model: Opus 4.8. The edit itself is usually correct but the arguments do not match the schema as the model invents made-up keys and Pi thus rejects the tool call and asks to try again.

That alone is not too surprising as models emit malformed tool calls sometimes. Particularly small ones. What surprised me is that this is getting worse with newer Anthropic models as both Opus 4.8 and Sonnet 5 show it but none of the older models. In other words, the SOTA models of the family are worse at this specific tool schema than their older siblings.

Armin theorizes that this is because more recent Anthropic models have been specifically trained (presumably via Reinforcement Learning) to better use the edit tools that are baked into Claude Code. This has the unfortunate effect that other coding harnesses, such as Pi, may find that their own custom edit tools are more likely to be used incorrectly.

Claude's edit tool uses search and replace. OpenAI's Codex uses an apply_patch mechanism instead, and OpenAI have talked in the past about how their models are trained to use that tool effectively.

Does this mean third-party coding harnesses like Pi should implement multiple edit tools just so they can use the one with the best performance for the underlying model the user has selected?

Tags: armin-ronacher, ai, openai, generative-ai, llms, anthropic, llm-tool-use, coding-agents, pi
