
State of Routing in Model Serving


By Nipun Kumar, Rajat Shah, Peter Chng

Introduction

This is the first blog post in a multi-part series that shares technical insights into how our ML model serving infrastructure powers several personalized experiences at scale across various domains (e.g., title recommendations, commerce). In this introductory blog post, we will dive into our domain-independent API abstraction and its traffic routing capabilities that the central ML model serving platform exposes to several domain-specific microservices for model inference. This singular API, or entry point, into the ML model serving platform has significantly increased the speed of innovation for iterating on newer versions of existing ML experiences, as well as enabling completely new product experiences with ML.

Machine Learning use cases powering member experiences on Netflix require rapid iteration and evolution in response to new learnings. The success of our ML model serving infrastructure largely depends on enabling researchers to rapidly experiment with new hypotheses and safely, at scale, release their models into production. Equally important is enabling multiple microservices at Netflix to seamlessly get model inference without being exposed to the complexities of ML model inference. To achieve this in a uniform and scalable manner, we created a centralized ML serving platform. As of 2025, the platform serves hundreds of model types and versions, handling over 1 million requests per second. In this post, we’ll zoom in on a core challenge of any large-scale ML serving system: how to route traffic to the right model instance, on the right cluster shard, for the right user and use case, while preserving a simple abstraction for both client services and model researchers.

Background

Models at Netflix

To properly frame our discussion, let’s first clarify the distinction between model serving and model inference. At Netflix, the definition of an ML model has historically been somewhat unique. While model inference typically focuses only on an infer(features) -> score capability, models at Netflix act as self-contained workflows that transform inputs to outputs. A “model” encapsulates pre- and post-processing, feature computation logic, and an optional ML-trained component, all packaged in a standard format suitable for use across multiple contexts. We refer to the end-to-end execution of this workflow as model serving. This distinction matters because our routing and API abstractions operate at the level of workflows, not just individual scoring functions.

A few simplified examples of model serving use cases:

Use case: Personalized Continue Watching row on Netflix Homepage

  • Input: UserId, Country, Device ID
  • Output: Ranked List of movies and shows (aka title): [titleId1, titleId2, titleId3,…]

Use case: Payment Fraud Detection

  • Input: UserId, Country, Payment Transaction details
  • Output: Probability of the transaction being fraudulent

A typical flow of this serving workflow is depicted below:

To achieve this higher level of abstraction, the model definition contains a list of facts (raw, unprocessed data or observations built as states in different business workflows) that it needs to compute features, and it relies on the model serving platform to supply these facts at serving time by calling several other microservices. Likewise, during offline training, Netflix’s ML fact store provides snapshots for bulk access to facilitate feature computation.

The important takeaway from this model definition is that the calling services only need to provide standard request context (such as userId, country, device), and the relevant domain context (such as titles to rank, or payment transaction for fraud detection), and the model can itself compute features and perform inference as part of the execution flow. This common set of request contexts across domains enables them to share a standard API abstraction and standardizes how various client microservices can uniformly integrate with the serving app. Furthermore, clients are shielded from the model selection and execution, allowing the model architecture and data inputs to evolve with minimal client coordination.
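To make the shape of this contract concrete, here is a hypothetical sketch of such a request as a plain JavaScript object. The field names are illustrative assumptions, not Netflix's actual API:

```javascript
// Hypothetical request shape: a standard request context shared across
// domains, plus a domain-specific context. Field names are illustrative only.
const inferenceRequest = {
  useCase: "ContinueWatchingRanking",   // the business use case being served
  requestContext: {                     // standard context common to all domains
    userId: "user-123",
    country: "US",
    deviceId: "device-456"
  },
  domainContext: {                      // domain-specific payload
    titleIds: [101, 202, 303]           // e.g. titles to rank
  }
};
```

Note that nothing in this payload names a concrete model: the client describes who and what, and the platform decides which model runs.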

This post focuses on showcasing the technical details to support this design paradigm. We’ll first describe how we implemented this abstraction with Switchboard, a centralized routing service, and then discuss the operational challenges we encountered at scale and how they led us to the Lightbulb architecture.

ML Model Serving Platform Principles

We envisioned a central model serving platform for all of Netflix’s member-facing ML Model serving needs. This ambitious effort required principled thinking to provide the right level of abstraction for both the researchers and client applications. The following ideas, which are relevant to the topic of this blog post, ensured that the platform acts as an enabler of rapid ML innovation and limits the exposure of ML model iterations to the client apps:

  • Model innovation independent of client apps: There should be only a one-time integration effort by the calling app with the ML serving platform for a new use case. After that, almost all model iterations, including intermediate model A/B experiments, should be mostly opaque to the calling apps. This implies that the platform should handle tasks such as model selection based on a user’s A/B allocation, fetching additional data needed by experimental models, logging for further training or observability, and more. This also benefits the ML researcher, as they only need to coordinate with one platform for model innovation.
  • Decouple clients from model sharding: Models are distributed across multiple serving compute cluster shards, each with its own Virtual IP (VIP) Address. Various factors, such as traffic patterns, SLAs, model architecture, and CPU/Memory availability, affect model-to-cluster mapping, and changes to this mapping result in changes to the VIP address at which a model is reachable. The serving platform should make clients agnostic to such frequent VIP address changes while ensuring high availability.
  • Flexible traffic routing rules: Support flexible mechanisms to introduce new traffic routing rules. This includes supporting traffic routing based on A/B experiments, providing a knob to slowly shift traffic to new models and VIP addresses, and allowing client overrides.

Introducing Switchboard

Standard out-of-the-box API Gateway solutions (such as AWS API Gateway or a standalone service mesh proxy) did not meet all our requirements. In particular, we needed first-class integration with Netflix’s experimentation platform, the ability to expose gRPC endpoints to clients, and the ability to use rich domain-specific context for routing customizations, which generic proxies were not designed to handle. Furthermore, the platform required customizations for model-specific lifecycle stages (shadow mode, canaries, rollbacks) to enable safe rollouts and migrations.

Hence, we embarked on building a custom service that serves as a flexible proxy layer for all traffic, handling over 1 million requests per second while maintaining high availability and reliability. We named it Switchboard.

Switchboard serves as the central entry point for the system, acting as a mandatory interface for all clients to access the appropriate model based on their context. Its role is to perform context-aware routing and to apply any configured context enrichment to the model inputs.

Here is a visual representation of the request flow from different clients to different serving clusters:

Objective Abstraction

To support this system design, we introduce the concept of an “Objective”: an enumeration defined by the serving platform that every request into the system must provide.

In short, an Objective is the serving platform’s name for a specific business use case (e.g., ContinueWatchingRanking), which decouples clients from concrete models and guides the platform’s routing and model selection decisions.
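As a rough sketch (the names below are assumptions, not the platform's actual definitions), an Objective can be modeled as a frozen enumeration that every request must reference:

```javascript
// Illustrative Objective enumeration; the entries are hypothetical examples
// drawn from the use cases discussed in this post.
const Objectives = Object.freeze({
  ContinueWatchingRanking: "ContinueWatchingRanking",
  PaymentFraudDetection: "PaymentFraudDetection"
});

// Every incoming request must name a known Objective; the platform can
// reject anything it does not recognize before any routing happens.
function isKnownObjective(name) {
  return Object.prototype.hasOwnProperty.call(Objectives, name);
}
```

Because clients only ever speak in Objectives, the mapping from Objective to concrete model can change freely behind the scenes.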

Key Capabilities of Switchboard

To summarize, these are the key capabilities of Switchboard:

  1. Common Client Abstraction: Switchboard provides a single point of contact for all our clients’ model needs. When clients wish to consume additional models for new ML applications addressing the same business need, there is no new service dependency to introduce or new clients to manage to make requests to the models. From an ML Ops perspective, this also gives us knobs to control client rate limits across model versions and manage central concurrency limits to deal with bad clients.
  2. Context-Aware Routing: Switchboard can route a request based on a rich set of contextual features, such as the user’s current device, locale, ranking surface type (e.g., home page vs. search results), or the current A/B test a user is in.
  3. Dynamic Traffic Splitting: It enables real-time traffic splitting for canary deployments and experimentation. This allows engineers to safely roll out a new model version to a small, controlled percentage of users before a full launch.
  4. Model Versioning and Lifecycle Management: Switchboard inherently manages concurrent request traffic to multiple versions of the same model. This is crucial for:
  • Shadow Mode Testing: Routing production traffic to a new model version without affecting the user experience, enabling performance comparisons.
  • Instant Rollback: Immediate switching of traffic away from a problematic new model version back to a stable one.

But is this the whole story? Not quite. Introducing this routing layer adds complexity to our model deployment cycles. In addition, we need a mechanism to collect the context-based routing information from the researchers when they choose to deploy model variants.

The Glue — Switchboard Rules

Given that Objectives serve as the contract between clients and the serving platform, we needed a way for researchers to attach model variants, experiments, and traffic splits to those Objectives without changing client code. This is where Switchboard Rules comes in.

Model researchers flexibly define the models associated with an Objective through a JavaScript configuration, which we call Switchboard Rules. It produces a set of rules (typically a JSON file) that primarily dictate the following to the serving platform:

  1. The default model to use for a given Objective
  2. A/B experiments to configure for a set of Objectives and the corresponding models to load for those experiments
  3. Customizations to gradually shift traffic to a new model

Here is an example of an A/B test rule in the context of the Continue Watching row:

/*
 * Configuration rule written by a model researcher to add an A/B experiment
 * to the model serving system.
 *
 * Cell 1: uses the default, currently productized model.
 * Cells 2 and 3: use different experimental (candidate) models.
 */
function defineAB12345Rule() {
  const abTestId = 12345;

  const objectives = Objectives.ContinueWatchingRanking;
  const abTestCellToModel = {
    1: {name: "netflix-continue-watching-model-default"},
    2: {name: "netflix-continue-watching-model-cell-2"},
    3: {name: "netflix-continue-watching-model-cell-3"}
  };

  return {
    cellToModel: abTestCellToModel,
    abTestId: abTestId,
    targetObjectives: [objectives],
    modelInputType: constants.TITLE_INPUT_TYPE,
    modelType: 'SCORER'
  };
}

These rules are consumed by both the Switchboard and the Model Serving clusters. Given these rules, the serving platform components can take various actions, some detailed below:

Control Plane Flow:

  1. Assignment: Produce model-to-cluster shard assignment.
  2. Validation: Load all specified models into the Serving Cluster Shard and validate model dependencies to ensure successful execution.
  3. Mapping: Provide the model-to-shard VIP address mapping to Switchboard.

Data Plane Flow:

  1. Allocation: If the request is for Objective=ContinueWatchingRanking, query the Experimentation Platform for the userId’s cell allocation.
  2. Model Selection: Use the allocation and A/B test rule to select the appropriate model.
  3. Request Routing: Route the request to the serving cluster shard with the selected model and context.
  4. Model Execution (on the serving host): Run the model workflow steps and return the response.
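The model-selection step above can be sketched as a small function over an A/B rule. This is a simplified illustration under assumed shapes, not the platform's actual implementation:

```javascript
// Given an A/B rule (shaped like the object a Switchboard Rules function
// returns), the user's cell allocation from the experimentation platform,
// and a default model, pick the model to route the request to.
function selectModel(rule, cellAllocation, defaultModel) {
  const entry =
    cellAllocation != null ? rule.cellToModel[cellAllocation] : undefined;
  // Users outside the experiment (or in an unmapped cell) get the default.
  return entry ? entry.name : defaultModel;
}
```

For the rule above, a user allocated to cell 2 would be routed to netflix-continue-watching-model-cell-2, while an unallocated user falls back to the default model.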

A key highlight of this setup is the decoupling of the experimentation config from the serving platform code. This includes having an independent release cycle for the rules, separate from the code deployments. Netflix’s Gutenberg system provides an excellent ecosystem that enables a flexible pub-sub architecture, facilitating proper versioning, dynamic loading, easy rollbacks, and more. Both Switchboard and the Serving Cluster Host subscribe to the same Switchboard Rules configuration.

To prevent race conditions and keep the dynamic Switchboard Rules configuration properly in sync, we use the following flow:

Evolving Challenges

Switchboard solved the primary problem of improving model iteration and innovation velocity, and provided an excellent ML serving abstraction to over 30 service clients. However, as the system scale increased, a few challenges and problems with this design became apparent:

  • Single point of failure: Placing Switchboard in the critical request path risks cutting off access to all serving hosts in extreme cases, such as unintentional bugs or noisy neighbors sending excessive traffic.
  • Why this matters: Switchboard became a shared dependency whose failure would degrade or disable multiple ML-powered experiences at Netflix.
  • Added latency due to an additional network hop: Switchboard in the request path adds 10–20ms of latency from serialization and deserialization, depending on payload size. It also further exposes a request to tail latency amplification.
  • Why this matters: The added latency is unacceptable for some latency-sensitive clients, resulting in end-user impact due to service timeouts.
  • Reduced Client flexibility: Switchboard obscures visibility into client request origins from the serving clusters. Consequently, distinguishing data logged for real vs artificial traffic, which is essential for model training, is difficult and requires ongoing customization and increased MLOps overhead.
  • Why this matters: It makes it harder to do tenant separation and test traffic isolation.

What Next? — Lightbulb

The aforementioned challenges of operating Switchboard at scale forced us to rethink the core implementation while retaining its key features. Our goal was not to throw away Switchboard’s design, but to refactor where and how its responsibilities were executed, keeping the benefits while reducing risk and latency. Particularly:

  • Common Client Abstraction
  • Decouple clients from model sharding
  • Flexible traffic routing rules
  • Lightweight system client
  • Single place to define model and experimentation config
  • Fast experimentation config propagation
  • Fallback and client-side caching in case of failures

However, there were some previous design choices we wanted to revisit:

  • Remove the routing service from the direct request path: Having a single service in the active request path introduces another failure mode and limits fallback flexibility. While routing rules change infrequently, maintaining consistency comes at the cost of increased availability risks.
  • Separate model inputs from the request metadata: In certain cases, the request payload could be quite large. Needing to deserialize and then re-serialize the payload as it flowed through Switchboard to make a routing decision was a significant contributor to latency and increased serving costs.
  • Provide better isolation for the routing layer: Consolidating multiple use cases (tenants) into a single routing cluster poses two main challenges. First, error propagation posed a risk, as a surge of problematic requests from one tenant could cascade errors back to Switchboard, potentially impacting other users. Second, the cluster had to accommodate diverse latency requirements because the requests from different use cases varied significantly in complexity.

This required some changes to our setup flow. While the flow largely remained unchanged, we split it into separate components for Routing and for Model Selection (Lightbulb):

We now take the rules for an Objective and break them into distinct sets of configuration:

  • Model Serving Configuration: This allows us to determine which model should be used at request time, along with the required metadata
  • Routing Rules: Given a model we want to serve at request time, this tells us which VIP the request should be routed to.
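As an illustration of this split, a single Switchboard rule might decompose into two pieces of configuration like the following. The shapes and names are assumptions for illustration, not the actual schema:

```javascript
// Model Serving Configuration: resolved at request time to decide which
// model should handle a given Objective, plus metadata the model needs.
const modelServingConfig = {
  objective: "ContinueWatchingRanking",
  modelId: "netflix-continue-watching-model-cell-2",
  modelType: "SCORER"
};

// Routing Rules: maps a model to the VIP of the shard currently hosting it.
// This mapping can change (e.g. on re-sharding) without touching the
// serving configuration above.
const routingRules = {
  "netflix-continue-watching-model-cell-2": "vip-serving-shard-07"
};

// Given a model id, look up the VIP the request should be routed to.
function resolveVip(modelId) {
  return routingRules[modelId];
}
```

Keeping the two concerns in separate documents means a shard move only republishes the routing rules, while the model-selection logic stays untouched.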

The Data Plane changes also reflect this separation, as we now rely on Envoy to take care of the routing details:

Envoy is already used for all egress communication between apps at Netflix, and it can route requests to different clusters (VIPs) based on the configurable Routing Rules published from our control plane. However, it lacks the information needed to make routing decisions and the ability to enrich the request body with additional serving parameters required for A/B testing model variants. We introduced Lightbulb to cover this gap:

  • Lightbulb consumes the minimal request context, which contains use-case information, and provides the metadata mapping required for routing at the Envoy layer.
  • Lightbulb resolves the request context to determine a routingKey configuration along with the ObjectiveConfig — this is where we place the model id along with other request-specific configurations required for model execution. This is done to separate the config resolution associated with the request from the placement and routing information needed to reach it on the inference cluster.
  • While the routingKey is added to the headers for Envoy proxy to consume, the client adds the ObjectiveConfig parameters to the request itself. This is done to avoid bloating the request headers while passing additional parameters for the model to process the request appropriately.
  • The routing of the actual request is performed by the Envoy proxy, which has the metadata to map the routingKey to the actual cluster VIP running the model. Because the routingKey is in a header, this determination can be made with minimal overhead.
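Putting these pieces together, the client-side flow described above can be sketched as follows. The header name and field shapes are illustrative assumptions:

```javascript
// Build the outgoing request: the routingKey rides in a header (which Envoy
// maps to a cluster VIP with minimal overhead), while the ObjectiveConfig
// parameters go into the request body so large serving parameters don't
// bloat the headers.
function buildOutgoingRequest(resolved, domainContext) {
  // resolved: { routingKey, objectiveConfig } as returned by Lightbulb
  return {
    headers: { "x-routing-key": resolved.routingKey },
    body: {
      objectiveConfig: resolved.objectiveConfig, // e.g. model id, exec params
      domainContext                              // e.g. titles to rank
    }
  };
}
```

Envoy never needs to deserialize the body: the header alone is enough to pick the VIP, which is what removes the serialization cost Switchboard used to pay.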

These changes retain the advantages of Switchboard (a single integration point, abstraction of the model id from the use case, and context-aware routing) while addressing the challenges we observed over time.

Conclusion

The evolution from Switchboard to Lightbulb marks a significant architectural refinement in our ML model serving infrastructure. While Switchboard provided the initial abstraction layer critical for rapid innovation, its added latency and single-point-of-failure risk posed scaling hurdles. The subsequent adoption of Lightbulb, a decoupled service focused solely on routing metadata, and its integration with Envoy resolved these challenges. The new architecture preserves the key benefits of seamless client integration and flexible experimentation while ensuring reliable, efficient, and scalable delivery of personalized member experiences, positioning us well for future ML growth.

In future posts in this series, we’ll dive deeper into other aspects of our ML serving platform, including inference and feature fetching, and how they interact with the routing architecture described here.

Special thanks to Sura Elamurugu, Sri Krishna Vempati, Ed Maddox, and Sreepathi Prasanna for their invaluable feedback and partnership in iterating on this idea and bringing this blog post to life.


State of Routing in Model Serving was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.


NAudio Modernization with Claude Code


Almost 25 years ago, I created NAudio, an open-source audio library for .NET. Over the years I've had periods where I've done a lot of work on it, and periods where I barely touched it. That's certainly been the case recently, partly because I've been busy with other projects, and partly because creating a version 3 of NAudio requires extensive modernization and refactoring, which under normal circumstances would be impossible for me to find time for.

Claude Code

However, recently Anthropic's "Claude for Open Source" program very generously offered me six months' free access to their "20x Max subscription plan". This allowed me to try Claude Code for the first time (I'd mainly been using GitHub Copilot and Google Gemini previously), and gave me the freedom to attempt some extremely ambitious coding tasks without worrying about burning through my token allowance too quickly.

I decided to run an experiment to see how well Claude Code could assist with modernizing the NAudio codebase, including adding some of my most wanted features that were previously out of scope due to their size.

Modernization

The first item on my to-do list was a wide-ranging modernization of the NAudio codebase. Right from the start of NAudio I've always tried to support as many versions of Windows and .NET as possible, and while I'm proud of how long I have kept that going, it has got in the way of adopting new features from .NET Core.

For example, the Span<T> feature is perfect for NAudio, but to fully embrace it means dropping support for the legacy .NET Framework. Another area that was in need of an overhaul was the COM interop, moving to the newer [GeneratedComInterface] approach instead of [ComImport] which opens the door to supporting IL trimming and Native AOT. I also wanted to tidy up the project structure, to clearly distinguish between the Windows specific, and cross-platform capabilities of NAudio.

Pair Programming with AI

What makes using coding assistants like Claude Code so fun is being able to treat them as an expert pair programmer, and run your crazy ideas past it. It's a great way to discover alternative approaches you hadn't thought of.

This was particularly valuable for revisiting some of my original API design decisions that I wasn't happy with. Some of these were due to my own inexperience, while others made sense in the past but are now well past their best-before date.

Examples include rethinking the approach to supporting custom chunks in WaveFileReader (which currently requires inheritance for every customized chunk), reconsidering how best to allow you to decorate a WaveStream with ISampleProvider effects without needing to hold a reference to the start and end of the chain in order to support repositioning, and considering whether the very Windows MME-centric WaveFormat class (based on WAVEFORMATEX) is the best approach or whether I should make a more generic abstraction (e.g. AudioFormat).

For each of these big design decisions, including how to make ASIO playback and recording much more pleasant to work with, I held an in-depth discussion with Claude Code up front. This helped solidify API design and make naming decisions that I was happy with, as well as thrashing out implementation plans that include testing and documentation.

With a well-defined concrete plan in place, in most cases I was able to just let Claude get on with the implementation. I did find myself needing to interrupt and course correct sometimes, but often I'd just do a thorough code review at the end. I'm still convinced that a manual code review of AI-generated code is vital. It regularly uncovered issues that I hadn't considered up front, and often resulted in several additional rounds of refactoring.

Test Coverage

AI assistants can be quite lazy about testing. They are so focused on achieving the 'goal' that they'll happily write code with no tests at all, or ask you to manually try things out for them and report back! You need to be clear about what level of testing you expect from them.

There are several areas of NAudio that were lacking in their unit test coverage. For example, Fast Fourier Transforms and pitch shifting algorithms are not straightforward to validate, especially if you don't fully trust your own DSP skills. So it was useful to get Claude Code to introduce sanity checks for some of these trickier areas.

Of course many NAudio capabilities require true "integration/end-to-end" testing. I need to play and record audio through real soundcards, and listen to the output of various operations such as decoding MP3s or applying audio effects in order to be confident that things are working as expected.

For years I've mostly made use of two test harnesses - one WinForms and one WPF app. These are very useful, but I've always wanted a console-based test harness, with a menu system that I could use to pick which test to run, but that would also support scripting so it could automatically run through a series of tests, and generate a test report, recording what tests were run, any errors encountered, and details of the system on which it was run. This would make it much easier for NAudio users to submit bug reports.

I got Claude Code to quickly scaffold my console test harness idea, and while it's still far from finished, it has already greatly accelerated the speed at which I can validate new features. It's an example of where a more "vibe coding" approach can be used - where you don't really need detailed scrutinization of all of the code generated, but just try it and see if it works. With auxiliary utilities like this, the stakes are a lot lower and technical debt is not really a major concern.

Performance Optimizations

One of the primary goals of introducing Span<T> into NAudio was improved performance, but just updating the public interface to use Span<T> wasn't enough. I had to flow that right through many of the classes in NAudio all the way into the interop layer, eliminating as many unnecessary copies as possible. Again this was something that Claude Code was able to greatly accelerate as it was able to search through hundreds of files and propose strategies for wide-scale refactoring that previously would have taken me days to plan.

It was also able to take additional steps to improve performance. For example, there are some parts of NAudio that would benefit from vectorization and SIMD optimizations, which are not my speciality at all. But with sufficient unit tests in place, I could safely implement the vectorization optimizations it had recommended and back them up with a BenchmarkDotNet project to validate and quantify how much faster the new code actually was.

Memory Management and Interop

A large part of NAudio consists of COM interop to Windows APIs, and that poses some tricky memory management challenges. In particular, COM works by reference counting, but .NET uses a mark-and-sweep approach to garbage collection, so bridging the two worlds can be complicated, and it can be difficult to know when it's safe to dispose of COM objects. I was able to discuss this problem with Claude Code and come up with a consistent strategy for how I wanted memory management to behave, and then quickly get it to roll that strategy out across all of the WASAPI wrappers in NAudio.

I was also able to get it to audit the coverage of Windows audio APIs and identify missing capabilities. This has allowed me to fill in a number of key gaps in NAudio. It wasn't all plain sailing though. Some of the capabilities offered by the Windows audio APIs have proved incredibly challenging to successfully wrap in C#. One of the most difficult so far is capturing audio from a specific process, which I'd spent many hours trying to do manually before using AI and every time failed miserably. And when I tried using Claude Code I ran into exactly the same problems and have had several failed attempts. I've not completely given up yet - hopefully the next try will be the successful one, but it certainly feels a lot less risky to attempt tasks like this now - a failed attempt is now just a few hours rather than days of wasted time.

Challenging Features

Although NAudio has lots of features, there are many missing capabilities that I would love to offer but have simply been too difficult for me to implement. Often it's a skill issue - I don't have enough deep understanding of digital signal processing or interop. But it's just as often a time constraint problem.

One of the great things about AI assistants like Claude Code is that (almost) no task is too daunting to attempt. Now I can realistically consider taking on challenges like creating my own synthesiser or VST3 plugin wrapper. So it's been really enjoyable with Claude Code to start tackling some of the more ambitious ideas on my backlog.

A simple example of something I really struggled with many years back was creating a spectrum analyzer visualization in WPF. This required me to work out what the best FFT windowing function to use was, and decide things like whether I should use linear or logarithmic scales for each axis.

So it was fascinating to talk through all of my questions with Claude Code and discuss each of the existing design decisions and my concerns about what I'd got wrong and what I wanted to be improved. Within a short period of time it had discovered several mistakes in my original implementation, and created a much better visualization.

NAudio FFT spectrum analyser display

Bug and PR backlog

One of my biggest regrets as a maintainer of an open source project like NAudio is that it has simply not been possible for me to keep up with the rate of issues and PRs that I've received. Several years ago I reached the point where I wasn't able to reply to every single issue any more. That means there's probably many valid bugs and feature requests that deserve to be looked at, and also probably many excellent contributions sitting idle that are worthy of being merged into the NAudio code base.

So one of the next tasks I have with Claude Code is to see if it can help me triage all of these legacy issues and pull requests. This will let me close the ones that don't make sense to keep open any more but also to respond to many of the existing issues, and fix the bugs that have been reported. For pull requests, the substantial changes in NAudio 3 will mean they won't necessarily merge easily, but it should be possible to take the key ideas and reimplement them in a way that fits with the NAudio 3 design.

I've made a small start on this recently, so if you're wondering why your bug report from 2017 is suddenly being looked at, you'll know why!

Documentation

Documenting a library like NAudio is a major task, and although I've written many blog posts and tutorials, there's certainly a lot of scope for improvement. Again this is something Claude Code is able to help a lot with. I've already asked it to audit all of the existing documentation, check it for mistakes and correct it.

I've also used it to draft tutorials for the new features, and I'm also eager to use it to generate a migration guide. This will be especially valuable for NAudio 3, as I have decided to allow myself to make a number of strategic breaking changes to the API. A good migration document should allow users to point their own coding assistants at it and get upgraded relatively painlessly.

Can I trust its output?

Perhaps the biggest question in the software development industry at the moment is this - can we really trust AI coding assistants to create high-quality, production-ready code? Are we in danger of just accepting code that seems superficially correct, but under the hood and behind the scenes, significant bugs or architectural issues have been introduced?

Certainly, it's not all been plain sailing even with Claude Code's most powerful Opus models. In fact, some of the recent work I've used it on to completely rewrite the COM interop has surfaced some extremely challenging access violations that have taken hours to troubleshoot (and in fact as I write this there's still a really nasty one I'm struggling to get to the bottom of).

AI coding assistants can fail in less spectacular ways as well. They might simply ignore instructions, or accidentally drop an important line while refactoring. Or they might implement a requirement that I didn't ask for or want. Or everything might seem great until, after "speed-running" your way through several large features, you discover you hadn't fully validated an earlier feature and now have to go back and unpick the mess.

I've been trying to be disciplined with thorough manual testing and careful reading of all of the code that Claude has generated, asking it questions and challenging its decisions. Often this leads to a much better implementation. But I must also admit that there have been times when I don't fully understand its changes, because it's doing things that are outside my comfort zone, such as modernizing the COM interop mechanisms.

This means I've spent a lot of time running manual tests, trying to find edge cases and race conditions. I'm determined that NAudio 3 is not going to just be a bunch of AI slop, but of course I can't guarantee that it will be bug free. For this reason I intend to release some early "alpha" versions of NAudio 3, allowing people to give feedback on the architectural changes as well as report bugs.

10x Speed-up?

Without doubt, AI has accelerated my progress way beyond what I could have achieved manually. But ironically, I've probably put in way more hours on NAudio 3 as a result of this speed-up than I was ever likely to have done without AI assistance. There are a few reasons for that - one is that the increased speed often results in me expanding the scope accordingly - attempting much more ambitious tasks than I would have previously dared. Another is that some of the access violations introduced by changes to the interop proved extremely time-consuming to root cause - you end up pulling the AI slot-machine lever repeatedly, hoping that this time it will fix the bug.

Another factor is that the huge speed increase allows you to try out wide-scale changes and then completely backtrack on them. The cost of prototyping has dropped dramatically, but that doesn't always result in as much of a speed increase as you might imagine, as you often allow yourself to go much further down a dead end before walking it back.

When is NAudio 3 coming?

I've been working on this modernisation of NAudio for over a month now and have committed a lot of work which you can find on the naudio3dev branch in GitHub if you're interested. There is still a lot that needs to be done, in terms of features to add, design decisions to be finalized and bugs to be fixed. I'm hoping to be in a place fairly soon where I can publish some pre-release NuGet packages allowing people to try out the changes and give me feedback. Hopefully people will be understanding of the reasoning behind making a number of breaking changes to the public API, but if there is enough pushback there will be time to reconsider some of the choices.

I'm also hoping to find some time to blog about a number of the important decisions so watch this space.

Read the whole story
alvinashcraft
32 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Daily Reading List – May 1, 2026 (#775)

1 Share

Happy May! I’m expecting another exciting month of tech updates and real-world stories of people learning the best ways to do modern work. Buckle up.

[blog] Databases Were Not Designed For This. Predictable queries and deterministic code? That's not what databases encounter today. Are you defensively designing your data layer for agents? This piece has specific advice.

[blog] How to correctly use MCP servers with your AI Agents. We’re getting smarter on how to best load the right tools when we need them.

[blog] Lessons on Building MCP Servers. More MCP lessons, especially around which tools to expose and how to chain activities together.

[article] Build To Learn FAQ. Great follow-up from Marty that answers questions about what a product manager really does now.

[blog] One Map Key, One Lookup. It’s a small waste, but you’re still wasting CPU cycles if your code uses maps this way.

[article] How can engineering leaders calculate the return on their AI investments? Very good analysis, and a killer quote you should share with your manager. Find the one about this being a systems decision, not a tooling decision.

[blog] Google Cloud Next 2026: The End Of The AI Pilot Era. Forrester Research takes a look at this important next step. Let’s get to work.

[article] AI agents are forcing enterprises to overhaul their operations. We’ve got a ways to go to land on autonomous ops, but the journey will trigger some important conversations.

[article] Are we ready to give AI agents the keys to the cloud? Cloudflare thinks so. Agents can do any commercial transaction on Cloudflare. Bold. Once the guardrails are truly in place, this likely becomes more acceptable.

[blog] The Journey Begins: Meet the 2026 GSoC Contributors! Wonderful program, glad to see this season get rolling.

[article] The Psychological Costs of Adopting AI. Let’s ensure we intentionally build the human infrastructure needed to get the most value from these AI tools.

[blog] Building with Gemini Embedding 2: Agentic multimodal RAG and beyond. Regardless of your media type, you can use this single Embeddings model. How amazing is that?

Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:




We Standardized the API. We Didn’t Standardize the Application.


During my Thursday office hours this week I explored adding applications and obtaining keys for six developer portals back to back: Notion, Slack, LinkedIn, GitHub, Cloudflare, and Google. Here was my plan — create an application against each one, the same way any developer would when wiring a new integration. I had Claude open in a side panel as a co-tracker, capturing every URL, every form field, every required input, every gating dialog. Three hours later I had a map of what it actually takes to onboard six common APIs (out of the hundreds we use).

Seeing the Diff Between Providers

Every provider exposes the same conceptual thing: a configurable “application” that holds the credentials, scopes, and metadata required to consume their API. None of the providers call it the same thing. None of them shape it the same way. None of them gate it on the same conditions. None of them export it in the same format.

Notion calls it an “internal connection.” It has a single Configuration page with capability checkboxes (read content, update content, insert content, read comments, insert comments, user info), a Content Access tab where you select individual pages and teamspaces, and a single installation access token. There is no client ID, no client secret, no verification gate. For internal use it’s the cleanest shape of the six providers I dug into.

Slack calls it an “App.” Slack has the largest surface area of any provider in this research, with seventeen distinct configuration pages spread across two host domains, three sidebar groups (Settings, Features, Submit to Marketplace), and two scope namespaces (bot token vs. user token). It has a Socket Mode option that routes events over WebSockets instead of public HTTP. It has a Block Kit framework for app home tabs. It has a JSON or YAML manifest that exports the entire app config, and that manifest is the most agent-friendly artifact across all six providers. No one else has anything like it.
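To make the "agent-friendly artifact" point concrete, here is a minimal sketch of what a Slack-style app manifest carries, expressed as a Python dict for illustration. The real manifest is a YAML or JSON document; the field names below are recalled from the manifest schema and may drift from Slack's current version, so treat them as assumptions.

```python
# Hypothetical, abbreviated Slack-style app manifest as a Python dict.
# Field names approximate the real schema; exact keys are assumptions.
manifest = {
    "display_information": {"name": "demo-integration"},
    "oauth_config": {
        # Two scope namespaces, mirroring the bot-token vs. user-token split.
        "scopes": {
            "bot": ["chat:write", "channels:read"],
            "user": [],
        }
    },
    "settings": {
        # Socket Mode routes events over WebSockets instead of public HTTP.
        "socket_mode_enabled": True,
    },
}

# The whole point: a document like this can be read, diffed, modified,
# and re-applied by a program without scraping a web UI.
bot_scopes = manifest["oauth_config"]["scopes"]["bot"]
```

That last property is what makes the manifest agent-friendly: the entire seventeen-page configuration surface collapses into one inspectable document.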

LinkedIn calls it an “App,” gated behind a “company-page verification” flow you complete by sending a magic URL to a Page Admin who has thirty days to act. Your scopes are not free-form — they are bundled inside “products” (Default tier, Standard tier, Development tier) and only the Default tier products are available without verification. The default access token TTL is sixty days. If you accidentally type a personal profile URL into the LinkedIn Page field instead of a Company Page URL, the form lets you submit and then warns you it cannot be undone.

GitHub gives you two completely separate paths. Personal Access Tokens (fine-grained or classic) are the path of least resistance for any application: name, expiration, repos, permission matrix. OAuth Apps are the path for multi-user or distributed scenarios: client ID, client secret, callback URL, optional Device Flow toggle. There is no verification gate on either. There is also no auto-cleanup; my own account had expired tokens piling up and one with no expiration date at all, which is its own problem (for me).
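The two GitHub paths differ mostly in ceremony, not in the request shape. A minimal sketch, assuming a hypothetical client ID (`Iv1.example`); nothing here actually sends a request, it only constructs the pieces each path needs:

```python
# Sketch of the two GitHub credential paths. The client_id is hypothetical.

def pat_headers(token: str) -> dict:
    """A PAT (fine-grained or classic) is just a bearer token on each request."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }

def device_flow_start(client_id: str) -> tuple[str, dict]:
    """OAuth Apps with Device Flow enabled begin by requesting a device code,
    which the user then confirms in a browser before the app can poll for a token."""
    return (
        "https://github.com/login/device/code",
        {"client_id": client_id, "scope": "repo"},
    )

url, payload = device_flow_start("Iv1.example")  # hypothetical client ID
```

The PAT path is one token and done; the OAuth path trades that simplicity for credentials that can be issued per user at scale.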

Cloudflare doesn’t have an “App” abstraction at all. It has profile-scoped API tokens. The entire authorization surface is a single flat list under your user profile, with permissions × resources × IP filtering × TTL configurable per token. Cloudflare is the only provider in this set that exposes TTL and IP allowlist as first-class fields on token creation. It is also the only one that ships a curated set of token templates for pre-built “common shapes” that bypass the robut yet complex permission matrix.

Google has eight surfaces. Credentials, OAuth Overview, Branding, Audience, Clients, Data Access (scopes), Verification Center, and the APIs Dashboard — split across two product groups, APIs & Services and the Google Auth Platform. You can have API Keys, OAuth 2.0 Client IDs, and Service Accounts coexisting on the same project. Verification has two independent axes — branding verification and data-access verification — surfaced as separate cards. There is a hard, irreversible 100-user cap for unapproved sensitive scopes that applies over the entire lifetime of the project. Google's setup makes my head hurt.

Six providers. Six different vocabularies for the same essential part of using an API.

Why No Standard?

OpenAPI exists. We have a shared way to describe what an API does. We do not have a shared way to describe what it takes to use one.

Part of the answer is incentives. The application-creation flow is the developer's first prolonged exposure to a provider's brand and product. It is where Slack shows you Block Kit, where LinkedIn shows you their product tiers, where Google shows you the consent screen they want you to brand. None of them are motivated to commodify that surface. It is also where the provider's monetization model first becomes visible — what's gated, what's free, what triggers a verification queue, what requires an account upgrade. That is product surface, and product surfaces resist standardization, especially across the competition.

Part of it is history. Each of these portals accreted over a decade or more, layer by layer. Slack’s seventeen pages did not appear at once; they grew as the platform shipped App Home, Socket Mode, Workflow Steps, Org Level Apps, MCP support. Google’s eight surfaces are the geological accumulation of OAuth 2.0 evolution, the GDPR-driven verification regime, the post-2020 Audience publishing model, and the per-API enablement architecture. There was never a moment when all of this could have been frozen into a clean standard, because none of it was finished, or is finished today.

Manageable with 10 APIs

Most of us internalized this divergence years ago without noticing. You learned the GitHub OAuth flow once and re-used it for ten years. You learned where Google hides the consent screen and you came back to it twice a year. You learned that LinkedIn product tiers existed because you had to once, for one integration, and then you forgot. The cost was distributed thinly enough across a long enough time horizon that it never registered as a tax we were all paying.

Then the number of APIs we touch in a given quarter went from ten to fifty, and the cost stopped being bearable. I have eight Google API Keys on my personal account, three of which are flagged as unrestricted, two of which date back to 2017 and have no business still being active. I have six Cloudflare API tokens, two of which were last used in 2019 and 2021 and somehow still have permissions. I have an OAuth client on my Google project that hasn’t been used in five months and is scheduled for automatic deletion. None of these portals talked to each other when I created the credentials, none of them talk to each other now, and none of them will talk to each other when the credentials expire or get revoked.

That was the state of the world before agents. This won't scale, y'all.

Agentic Bottleneck

When you point an agent or a copilot at the tools you actually use to do your work — Notion for notes, Slack for communication, GitHub for code, Google for everything else — you are asking it to navigate exactly this divergence. The agent does not have ten years of muscle memory for where Google hides the OAuth client editor. It does not remember that LinkedIn requires Page-Admin verification before the Products tab unlocks. It does not know that Slack's Token Rotation requires Redirect URLs to be configured first.

You can paper over this with bespoke per-provider integrations — every major agent harness today does — but that is the same trap we hit with the ten-to-fifty-API transition, just one layer up. Every new provider is bespoke. Every change to a provider’s portal silently breaks the integration. The cost compounds with the number of providers, not the number of agents. This means it will get much worse in a much faster timeline as the agent ecosystem expands.

What we actually need is some sort of manifest. Slack already has one — a JSON or YAML document that captures the full application configuration in a portable, inspectable format. Of the six providers I walked through, Slack’s manifest is the only artifact that an agent could read, modify, and re-apply without scraping a web UI. The other five require either browser automation or per-provider SDKs that wrap the underlying portal APIs (when those APIs exist at all, which is not always).

A cross-provider application manifest would not need to be that big. It would just need to capture the recurring things: the credential type (PAT, OAuth app, internal connection, service account), the scope vocabulary (with provider-specific mappings), the resource selection model (repos, pages, workspaces, accounts, projects), the verification state, and the TTL and rotation policy. It would not need to flatten every provider into one shape. It would need to be lossless enough that an agent could understand what an application is, what it can do, and what it would take to provision it against any new provider.

I have felt this pain for years. So much so that I am almost numb to it. But every time I hear someone say how easy it will be for agents to do all this work for us, I remember just how hard it is to do across all of the APIs I depend on as a small business. It ain't easy. I know we have OAuth automation moving in to support some of this, but I think we are going to need something like SLA4OAS in play, helping us with the application, SLA, pricing, and other layers too. It just feels like we are not tending to the economics of this at scale and are perpetually sweeping things under the rug.




Fast Focus: GitHub Models—An AI-infused Developer Experience | Visual Studio Live! Las Vegas 2026

From: VisualStudio
Duration: 21:20
Views: 67

Exploring AI models shouldn’t require complex setup or expensive infrastructure. In this Visual Studio Live! Las Vegas 2026 session, Brian Randell introduces GitHub Models, a free, cloud-based tool that lets developers experiment with different AI models side-by-side.

See how to evaluate models, compare outputs, and integrate them into your development workflow so you can make smarter decisions about performance, cost, and real-world use cases.

🔑 What You’ll Learn
• What GitHub Models is and how it fits into modern AI development
• How to explore and compare models from OpenAI, Meta, Mistral, and more
• How to test prompts and evaluate model performance side-by-side
• Key differences between model types, capabilities, and use cases
• How tokens, context windows, and parameters impact results
• How to generate code snippets to integrate models into your apps
• When to use hosted models vs. open source or self-hosted options
• How to think about cost, performance, and scalability when choosing models

⏱️ Chapters
00:28 What GitHub Models is and why it matters
02:11 Accessing GitHub Models and navigating the marketplace
03:52 Exploring the model catalog and capabilities
08:15 Using the playground to test prompts and responses
10:34 Comparing models side-by-side
13:54 Configuring models and calling them from your code
15:24 Understanding model size, parameters, and performance tradeoffs
18:02 Using models inside GitHub repositories
19:28 Final thoughts: choosing the right model for your needs

👤 Speaker
Brian Randell (@brianrandell)
Partner, MCW Technologies | VSLive! Conference Co-Chair

🔗 Links
• Download Visual Studio 2026: http://visualstudio.com/download
• Explore more VS Live! Las Vegas sessions: https://aka.ms/VSLiveLV26
• Join upcoming VS Live! events: https://aka.ms/VSLiveEvents

#github #ai #copilot #visualstudio #vslive


Busy .NET Developer's Guide to Python | Visual Studio Live! Las Vegas 2026

From: VisualStudio
Duration: 1:12:58
Views: 67

Curious about Python but coming from a .NET background? In this Visual Studio Live! Las Vegas 2026 session, Ted Neward walks through what .NET developers need to know to get started with Python.

From setup and environments to core language concepts, you’ll see how Python compares to C# and Java, where it fits best, and how to start building real applications quickly.

🔑 What You’ll Learn
• Where Python fits in modern development, including AI and automation
• How Python compares to C#, Java, and other object-oriented languages
• Setting up Python environments using common tools and installers
• Managing dependencies with pip and virtual environments
• Core language concepts like dynamic typing and simple syntax
• Working with common data structures such as lists, dictionaries, and sets
• How Python handles flow control, functions, and error handling
• Key conventions and philosophy behind Python

⏱️ Chapters
01:26 What Python is used for (AI, scripting, automation)
06:04 What Python is and its core philosophy
10:26 Installing Python and choosing a distribution
16:53 Running Python and using the REPL
24:29 Managing packages with pip and virtual environments
34:11 Python fundamentals: types, variables, and syntax
44:18 Core Python data types: strings, lists, and dictionaries
52:16 Flow control and pattern matching
58:46 Functions, type hints, and common patterns
1:10:45 Classes, globals, and Python scope behavior

👤 Speaker
Ted Neward
Principal, Neward & Associates

🔗 Links
• Download Visual Studio 2026: http://visualstudio.com/download
• Explore more VS Live! Las Vegas sessions: https://aka.ms/VSLiveLV26
• Join upcoming VS Live! events: https://aka.ms/VSLiveEvents

#python #dotnet #visualstudio #vslive
