Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Stop Wasting Tokens: Smart Tool Routing for LLMs with MCPToolRouter


⚠ This blog post was created with the help of AI tools. Yes, I used a bit of magic from language models to organize my thoughts and automate the boring parts, but the geeky fun and the 🤖 in C# are 100% mine.

Hi!

Today I want to share something I've been hearing about a couple of times: You know when you're building an AI agent or working with LLMs, and you have dozens (or hundreds) of tools available? What do you do? Send ALL of them to the LLM every single time, right?

Yeah, me too. And it's expensive. I had this as a pet project for a while, so I decided to finish it and give it a real try. Let's move on.

The Token Problem Nobody Talks About

TL;DR: every tool you send to an LLM costs tokens. Not just the tool names: the descriptions and full JSON schemas too. With 50+ tools, you can easily burn through 2,000+ tokens before the LLM even thinks about your question.

And that's just wasteful. For example: if I'm asking "What's the weather in Toronto?", why am I sending 47 other tools for databases, emails, and file systems?

Enter MCPToolRouter

So I built something to fix this: ElBruno.ModelContextProtocol.MCPToolRouter. A .NET library that uses semantic search to route your prompts to the most relevant tools. Think of it as a smart filter that sits between your user’s question and your LLM.

Here’s how it works:

  1. Index your tools once (using local embeddings, no API calls)
  2. Search semantically when a user asks something
  3. Get back only the relevant tools (top 3, top 5, whatever you need)
  4. Send those to your LLM instead of everything

The result? 70-80% token savings in my testing scenarios.
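The four steps above can be sketched in a few lines of Python. This is a conceptual illustration only, not MCPToolRouter's actual implementation: a toy bag-of-words vector stands in for the real ONNX embedding model, so it only matches shared words, while a real embedding model would also place "temperature" near "weather".

```python
# Conceptual sketch of semantic tool routing (not the library's real code):
# 1) embed each tool description once, 2) embed the incoming prompt,
# 3) rank tools by cosine similarity, 4) return only the top-k.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real model returns a dense vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

tools = {
    "get_weather": "Get weather for a location",
    "send_email": "Send an email message",
    "search_files": "Search files by name or content",
    "calculate": "Perform mathematical calculations",
}

# Step 1: index the tools once (one-time cost).
index = {name: embed(desc) for name, desc in tools.items()}

def route(prompt: str, top_k: int = 3) -> list[str]:
    # Steps 2-4: embed the prompt, score every tool, keep the best matches.
    q = embed(prompt)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:top_k]

print(route("What is the weather in Toronto?", top_k=1))  # ['get_weather']
```

The point of the sketch is the shape of the pipeline: the expensive part (embedding the tool descriptions) happens once, and each incoming prompt only costs one embedding plus a cheap similarity scan.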

Show Me the Code

Let’s get practical. Here’s the simplest possible example:

using ElBruno.ModelContextProtocol.MCPToolRouter;
using ModelContextProtocol.Protocol;

// Define your MCP tools (or pull them from an MCP server)
var tools = new[]
{
    new Tool { Name = "get_weather", Description = "Get weather for a location" },
    new Tool { Name = "send_email", Description = "Send an email message" },
    new Tool { Name = "search_files", Description = "Search files by name or content" },
    new Tool { Name = "calculate", Description = "Perform mathematical calculations" }
};

// Create the index (one-time cost)
await using var index = await ToolIndex.CreateAsync(tools);

// Find the most relevant tools for a prompt
var results = await index.SearchAsync("What's the temperature outside?", topK: 3);

foreach (var r in results)
    Console.WriteLine($"{r.Tool.Name}: {r.Score:F3}");

Output:

get_weather: 0.892
search_files: 0.234
calculate: 0.187

See that? It knows get_weather is the right tool. Now you send just that one (or top 3) to your LLM instead of all 50.

Real-World Integration with Azure OpenAI

Here’s where it gets practical. Let’s say you’re using Azure OpenAI and want to save tokens:

// Create Azure OpenAI client
var chatClient = new AzureOpenAIClient(
    new Uri("https://your-resource.openai.azure.com/"),
    new AzureKeyCredential("your-api-key"))
    .GetChatClient("gpt-5-mini");

// Route to relevant tools only
var userPrompt = "What's the weather in Seattle?";
var relevant = await index.SearchAsync(userPrompt, topK: 3);

// Add only filtered tools to the chat call β€” saving tokens!
var chatOptions = new ChatCompletionOptions();
foreach (var r in relevant)
    chatOptions.Tools.Add(ChatTool.CreateFunctionTool(r.Tool.Name, r.Tool.Description ?? ""));

var response = await chatClient.CompleteChatAsync(
    [new UserChatMessage(userPrompt)],
    chatOptions);

Instead of sending 50 tools (2,000 tokens), you send 3 tools (~300 tokens). That's an 85% reduction. Multiply that by thousands of API calls, and the savings add up fast. 😉
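To make the math concrete, here's the arithmetic behind that claim. The token counts are the rough figures from this post; the daily call volume is a made-up number purely for illustration:

```python
# Rough savings math using the article's example numbers.
all_tools_tokens = 2000   # ~50 tools with names, descriptions, JSON schemas
routed_tokens = 300       # ~3 filtered tools
calls_per_day = 10_000    # hypothetical traffic, just for illustration

saved_per_call = all_tools_tokens - routed_tokens   # 1,700 tokens per call
reduction = saved_per_call / all_tools_tokens       # 0.85
print(f"{reduction:.0%} fewer tool tokens per call")
print(f"{saved_per_call * calls_per_day:,} tokens saved per day")
```

At those assumed numbers that's 17 million prompt tokens a day that never leave your wallet.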

The Best Part: It’s All Local

Here’s what I love about this: no external API calls for embeddings. MCPToolRouter uses local ONNX models (via my ElBruno.LocalEmbeddings library), so everything runs on your machine or server. Fast, private, and cost-effective.

First run downloads a small embedding model (~25 MB), and after that, it’s instant.

Advanced Features

Once you get the basics, there’s more:

Save/Load Indexes

Pre-build your index and load it instantly on startup:

// Save
using var file = File.Create("tools.bin");
await index.SaveAsync(file);

// Load (instant warm-start)
using var stream = File.OpenRead("tools.bin");
await using var loaded = await ToolIndex.LoadAsync(stream);

Dynamic Updates

Add or remove tools at runtime without rebuilding everything:

await index.AddToolsAsync(new[] { new Tool { Name = "new_tool", Description = "..." } });
index.RemoveTools(new[] { "obsolete_tool" });

Dependency Injection

Works great with ASP.NET Core:

builder.Services.AddMcpToolRouter(tools, opts =>
{
    opts.QueryCacheSize = 20; // LRU cache for repeated queries
});

// Inject IToolIndex anywhere
app.MapGet("/search", async (IToolIndex index, string query)
    => await index.SearchAsync(query, topK: 3));

Try It Yourself

The library is available on NuGet and fully open source:

dotnet add package ElBruno.ModelContextProtocol.MCPToolRouter

I’ve also included 6 sample applications that show different use cases:

  • BasicUsage: Getting started with tool indexing and search
  • TokenComparison: Side-by-side comparison showing token savings
  • TokenComparisonMax: Extreme scenario with 120+ tools
  • FilteredFunctionCalling: End-to-end function calling with filtered tools
  • AgentWithToolRouter: Integration with Microsoft Agent Framework
  • FunctionalToolsValidation: 52 real tools with execution validation

Check out the full repo on GitHub: 👉 https://github.com/elbruno/ElBruno.ModelContextProtocol

Why This Matters

Look, I’m not saying you should always use semantic routing. If you only have 5-10 tools, sending them all is fine. But once you cross into dozens or hundreds of tools, the cost and context window bloat become real problems.

MCPToolRouter tries to solve this with a simple, pragmatic approach: send only what matters.

What’s Next?

If I see the need and some real traction, I'll keep working on this library, and I'd love your feedback.

Try it out, break it, and let me know what you think. Open issues, send PRs, or just drop a comment.

And if you find it useful, star the repo and share it with your team.


P.S.: The TokenComparisonMax sample has a beautiful Spectre.Console UI that shows live token savings. It's oddly satisfying to watch. 😄

Happy coding!

Greetings

El Bruno

More posts in my blog ElBruno.com.

More info in https://beacons.ai/elbruno




Read the whole story
alvinashcraft
3 hours ago
Pennsylvania, USA

Introducing Text Control Agent Skills

With the introduction of Text Control Agent Skills, AI coding assistants can now understand how to correctly work with the TX Text Control Document Editor and its APIs. This means that developers can now ask their AI coding assistants to write code that interacts with the Document Editor, making it easier than ever to create powerful applications with TX Text Control.


Google Made Gemini Code Assist Free. What's the Catch?


When a company that makes most of its money from ads starts giving away developer tools for free, you should ask questions. Google announced in March 2026 that Gemini Code Assist is now free for individual developers. No credit card. No trial period. Just sign in with your Google account and start coding.

I've been using it for two weeks. Here's what's actually going on.

What You Get For Free

The free tier includes code completion, chat-based assistance, and multi-file editing directly in VS Code and JetBrains IDEs. You get Gemini 2.5 Pro under the hood, which is Google's best reasoning model. The context window is large: Google claims full-repository understanding through their indexing system.

Setup is dead simple:

# VS Code
code --install-extension google.gemini-code-assist

# JetBrains (from marketplace)
# Search "Gemini Code Assist" in Plugins

After installation, you authenticate with Google and you're running. No API keys, no workspace configuration, no billing setup. It just works.

The free tier gives you what Google calls "generous daily limits" without specifying exact numbers. In my testing, I hit a rate limit exactly once during a particularly intense refactoring session (maybe 200+ completions in an hour). It reset within 30 minutes. For normal development, you won't notice the ceiling.

How It Compares to Copilot

GitHub Copilot costs $10/month for individuals. It's been the default for most developers since 2022. So the obvious question: is Gemini Code Assist good enough to cancel your Copilot subscription?

In short, it depends on your language. For Python, TypeScript, and Go, Gemini Code Assist is now roughly on par with Copilot. The completions are fast, contextually aware, and rarely wrong in ways that waste your time.

Where Gemini genuinely beats Copilot is multi-file awareness. When I'm refactoring a TypeScript project, Gemini understands how changes in one file affect imports, types, and tests across the codebase. Copilot is getting better at this but still feels more file-local in its suggestions.

// When you rename a type in types.ts, Gemini automatically suggests
// updates in files that import it

// types.ts - you change this:
export interface UserProfile {
  id: string;
  displayName: string;  // was: name
  email: string;
  avatarUrl: string;    // was: avatar
}

// Gemini immediately suggests fixes in user-service.ts:
// Before: const name = profile.name;
// After:  const name = profile.displayName;

// And in UserCard.tsx:
// Before: <img src={user.avatar} />
// After:  <img src={user.avatarUrl} />

Where Copilot still wins: niche languages and frameworks. If you're writing Rust, Elixir, or working with less popular libraries, Copilot's training data advantage shows. Gemini's suggestions get noticeably vaguer outside the mainstream.

How It Compares to Claude Code

Different category entirely. Claude Code is a terminal-based autonomous agent. You describe what you want, and it writes entire features, runs tests, and iterates. It costs money per token and targets developers who want AI to do more of the heavy lifting.

Gemini Code Assist is an IDE co-pilot. It helps you write code faster while you stay in the driver's seat. These aren't really competing products; they serve different workflows.

That said, Gemini Code Assist's chat mode is trying to bridge the gap. You can highlight code, ask it to refactor, and it'll propose multi-file changes. It's decent for small-to-medium refactors but can't match a dedicated agent for anything complex.

What's the Actual Catch?

Here's where I get opinionated. There are three catches, and none of them are hidden; they're just not in the marketing.

Catch 1: Your code goes to Google. The privacy policy is clear. Code snippets are sent to Google's servers for processing. For the free tier, Google reserves the right to use your interactions to improve their models. If you're working on proprietary code at a company with strict IP policies, this is a non-starter. The paid Enterprise tier ($19/user/month) has a no-training-data clause.

Catch 2: Google kills products. I don't need to list the graveyard. Google has a well-documented history of launching, underinvesting in, and then shutting down developer tools. If you build your workflow around Gemini Code Assist and it gets discontinued in 18 months, that's a real cost. Copilot has Microsoft's commitment to monetize it through GitHub. Google's incentive is less clear.

Catch 3: The ecosystem lock-in is real. Gemini Code Assist works best when you're also using Google Cloud, Firebase, and Google's other services. The suggestions subtly favor Google's ecosystem. Ask it to set up a database and it'll suggest Firestore before PostgreSQL. Ask about deployment and Cloud Run appears before anything else.

# Ask Gemini to help with caching and you'll get:
from google.cloud import memorystore

client = memorystore.CloudMemorystoreClient()
# ... Google Cloud Memorystore setup

# Ask Copilot the same question:
import redis

client = redis.Redis(host='localhost', port=6379, db=0)
# ... standard Redis setup

It's not that the Google suggestions are wrong. They're fine. But the nudge toward Google's cloud is consistent and worth being aware of.

My Recommendation

If you're a solo developer or student who doesn't want to pay $10/month for Copilot, Gemini Code Assist is a genuinely good free alternative. The quality gap has closed significantly, and for mainstream languages you won't feel like you're using a second-tier product.

If you're at a company, the calculation is different. The privacy implications and the risk of Google sunsetting the product make Copilot Business or Copilot Enterprise a safer bet. You're paying for stability and IP protection as much as for the AI itself.

If you're already deep in the Google Cloud ecosystem, this is a no-brainer. The integration is smooth and the cloud-specific suggestions are actually helpful rather than annoying.

And if you're using Claude Code or Cursor for agentic workflows, Gemini Code Assist isn't a replacement; it's a complement. Use the agent for big changes, use the co-pilot for line-by-line flow.

The AI coding tools market is getting crowded and competitive. Free is a strong price. Just remember that when you're not paying for the product, your data is part of the deal.


7 Mac Apps Every Startup CTO Should Have in 2026


Being a startup CTO means wearing every hat: writing code, reviewing PRs, managing infrastructure, talking to customers, and somehow keeping your own sanity intact. Your Mac is your command center, and the apps you run on it can make or break your day.

Here are 7 Mac apps I think every startup CTO should have installed in 2026.

1. Raycast

Free (Pro $8/mo)

raycast.com

I replaced Spotlight with Raycast and never looked back. It's a launcher, clipboard manager, snippet expander, and window manager rolled into one. The extensions ecosystem is wild: I have quick actions for GitHub, Linear, Slack, and Notion all accessible from a single keyboard shortcut. If you're constantly switching between tools (and as a CTO, you are), Raycast shaves minutes off every hour.

2. Warp

Free for individuals

warp.dev

Warp is a terminal built for the modern era. It has IDE-like features (command palette, block-based output, AI command suggestions) that make it feel less like a relic from the 80s. I like that I can share terminal sessions with my team and bookmark frequently used commands. If you SSH into servers, run deploys, or do anything in the terminal daily, Warp makes it noticeably faster.

3. CleanShot X

$29 one-time

cleanshot.com

When you're a CTO, you screenshot everything: bugs, UI mockups, Slack threads for documentation, architecture diagrams. CleanShot X is the best screenshot tool on Mac, period. Scrolling capture, annotations, screen recording, OCR built in. I use it multiple times a day for async communication with my team, and it's one of those tools you don't realize you needed until you have it.

4. TokenBar

$5 lifetime

tokenbar.site

If your startup uses LLMs at all (and in 2026, you probably do), you need visibility into what you're spending. TokenBar sits in your menu bar and gives you real-time token counts and cost tracking across providers. No dashboard to open, no spreadsheet to maintain; it's just there. As someone who has to justify AI infrastructure costs to investors, having this number glanceable at all times is a lifesaver. Five bucks, lifetime. No-brainer.

5. Fantastical

Free (Premium $4.75/mo)

flexibits.com/fantastical

Your calendar is probably your most important tool as a CTO. Fantastical makes it bearable. Natural language event creation ("coffee with Sarah Tuesday 3pm" just works), multi-calendar support, and a gorgeous menu bar widget. I switched from the default Calendar app and immediately wondered why I waited so long. The scheduling feature alone (sending availability links without a separate tool) pays for itself.

6. Monk Mode

$15 lifetime

mac.monk-mode.lifestyle

Here's the thing about being a CTO: everyone needs your attention, and every feed on the internet is designed to steal it. Monk Mode doesn't block apps; it blocks feeds within apps. So you can still use Twitter for DMs, Reddit for specific subreddits, or YouTube for tutorials, but the infinite scroll is gone. It's surgical where other blockers are blunt. I turn it on during my morning deep work block and get more done by noon than I used to in a full day.

7. Obsidian

Free for personal use

obsidian.md

Every CTO needs a second brain, and Obsidian is the best one I've found. Local-first markdown files, bidirectional linking, and a plugin ecosystem that does everything from Kanban boards to daily journaling. I keep architecture decisions, 1:1 notes, interview questions, and technical specs all in one vault. The graph view occasionally surfaces connections I didn't know existed. It's free, it's fast, and your notes are plain text files you actually own.

Honorable Mentions

  • Rectangle (free): window management via keyboard shortcuts. rectangleapp.com
  • Hand Mirror (free): quick webcam check from the menu bar before investor calls. handmirror.app
  • MetricSync ($5/mo): AI-powered nutrition tracking from your phone. Because CTOs who skip lunch and live on coffee eventually crash. metricsync.download

Wrapping Up

The common thread here: these are all tools that stay out of your way. No complex onboarding, no team-wide rollout needed, no enterprise pricing page. Install them, configure them once, and they quietly make your day better.

If you're a startup CTO (or aspiring to be one), invest in your personal tooling. It compounds faster than you think.

What's in your stack? Drop your must-haves in the comments β€” I'm always looking for the next tool that'll save me 10 minutes a day.


Copilot CLI Weekly: MCP Servers Get LLM Access


MCP Sampling Lands in v1.0.13-0

The most significant change this week is buried in a prerelease tag: MCP servers can now request LLM inference. Version 1.0.13-0, released today, adds sampling support to the Model Context Protocol implementation. MCP servers can call the user's LLM through a permission prompt, eliminating the need for servers to maintain their own API subscriptions.

This is a shift in how MCP servers work. Before this, an MCP server was a tool provider: it exposed functions the agent could call, but it couldn't reason on its own. Now, with sampling, an MCP server can delegate reasoning back to the user's LLM mid-execution. A recipe generator can ask the LLM to format output. A code analysis server can ask for natural language summaries. The user approves each request via a review prompt, maintaining control over what their LLM processes.
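To make that delegation concrete, here's roughly what it looks like on the wire. This is a hand-written sketch based on the MCP specification's sampling/createMessage method, not output captured from Copilot CLI, and the exact shape can vary by spec revision:

```python
# Sketch of the JSON-RPC request an MCP server sends to borrow the
# client's LLM (field names per the MCP sampling specification).
import json

sampling_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {
                "role": "user",
                "content": {
                    "type": "text",
                    "text": "Summarize this recipe in one friendly sentence.",
                },
            }
        ],
        "maxTokens": 100,
    },
}
print(json.dumps(sampling_request, indent=2))
```

The client (here, Copilot CLI) shows the user a permission prompt, runs the inference on their model if approved, and returns the completion to the server, so the server never needs its own API key.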

The feature has been in the MCP spec since VS Code shipped it last summer, but adoption has been slow. Copilot CLI supporting it means the entire GitHub-integrated toolchain now has access to this capability. If you're building MCP servers, you can now lean on the user's model instead of spinning up your own inference backend.

Model Picker Gets a Full Redesign

Version 1.0.12 landed March 26 with 28 improvements and fixes, but the UX standout is the full-screen model picker with inline reasoning effort controls. Previously, selecting a model and adjusting reasoning effort were separate steps. Now, the picker opens in full-screen mode, and you adjust reasoning effort with the arrow keys (←/→) while browsing models.

The picker also reorganizes models into three tabs: Available, Blocked/Disabled, and Upgrade. This solves the confusion around which models are accessible based on your plan and organization policy. If you're on a free tier and wondering why you can't select opus-4.6, the picker now tells you explicitly instead of silently blocking the selection.

The reasoning effort level also displays in the header next to the model name (e.g., claude-sonnet-4.5 (high)), so you always know your current configuration without running a command.

Organization Policy Enforcement and Memory Fixes

v1.0.12 also included critical stability work. The CLI no longer crashes with out-of-memory errors when shell commands produce high-volume output, a real problem when running builds or test suites that generate megabytes of logs. Memory usage improvements extend to the grep tool, which now handles large files without exhausting available RAM.

Version 1.0.11, released March 23, brought organization-level policy enforcement for third-party MCP servers. If your organization uses an MCP allowlist, that policy now applies universally. Blocked servers no longer show up in /mcp show, and the CLI displays a warning when policy blocks a server from loading. This matters for enterprise deployments where security teams need to audit what external tools can access organizational data.

Skills Directory Alignment and Hook Improvements

v1.0.11 also aligned the personal skills directory with VS Code's GitHub Copilot for Agents extension. The CLI now discovers skills in ~/.agents/skills/, matching the default used by the VS Code extension. If you've been maintaining separate skill directories for the CLI and the extension, you can consolidate them.

Extension hooks from multiple extensions now merge instead of overwriting each other. Previously, if two extensions defined a sessionStart hook, only one would fire. Now, both execute, and the additionalContext from sessionStart hooks is injected into the conversation. That's critical for building custom agents that layer multiple extension behaviors.

The /yolo Command Gets More Precise

The /allow-all command (aliased as /yolo) now supports subcommands: /yolo on, /yolo off, and /yolo show. This replaces the previous toggle behavior with explicit enable/disable semantics, reducing the risk of accidentally leaving permission-free mode enabled. The CLI also persists /yolo path permissions across /clear session resets, so you don't have to re-approve directories after clearing context.

What Else Shipped

The other notable fixes from v1.0.12:

  • Workspace MCP servers defined in .mcp.json now load correctly when your working directory is the git root
  • Sessions with active work are no longer cleaned up by the stale session reaper (a frustrating bug if you left a session idle mid-task)
  • Resume session now correctly restores your previously selected custom agent
  • Clipboard copy works on Windows even when a non-system clip.exe is in PATH
  • Emoji selection in the terminal now works correctly (yes, this matters)

Version 1.0.13-0 added a handful of additional fixes, including correct reasoning effort handling for Bring Your Own Model (BYOM) providers and better error messaging when using classic Personal Access Tokens.

The Platform Is Taking Shape

Three releases in seven days, with one major capability (MCP sampling), a redesigned model picker, enterprise policy enforcement, and dozens of stability fixes. The pattern I've been tracking since the biggest week yet continues: this isn't a standalone tool anymore. It's a platform. MCP servers can now request inference, extensions can inject context into subagents, and the SDK keeps expanding with hooks and custom commands.

The CLI is becoming the runtime for a growing ecosystem of agent tooling. If you're building developer tools and haven't looked at integrating with it, this is the week that makes the case.


Call For Papers Listings for 3/27


A collection of upcoming CFPs (call for papers) from across the internet and around the world.

The post Call For Papers Listings for 3/27 appeared first on Leon Adato.
