Learn how to implement dark mode and a theme switch for Blazor web applications using standardized CSS features and no JavaScript code.
In this article, we’ll implement a minimal yet complete example of dark mode and a theme switch for Blazor web applications. We keep it simple: pure CSS styles the application, and Blazor manages the state. We do not use any JavaScript code for this solution.
Hint: Although the code used in the example project is based on the .NET 10 Blazor Web Application project template, which uses Bootstrap for its sample pages, we do not use Bootstrap or any other CSS library to implement the dark mode or the theme switch.
You can access the code used in this example on GitHub.
There are three core principles we want to follow:
- Use pure CSS and no JavaScript.
- Define CSS variables for the base (light) colors.
- Use a .dark-theme CSS class to override/replace those base colors.

This solution works for Blazor Server and Blazor WebAssembly.
In the app.css file or any other CSS file referenced in the index.html or App.razor file of your Blazor web application, we add the following CSS variable definitions:
:root {
    --background-color: #FFFFFF;
    --text-color: #1F1F39;
}

.dark-theme {
    --background-color: #1F1F39;
    --text-color: #FAFAFB;
}
Hint: We keep it simple in this example and use only CSS variables for the background color and text color. In a real-world implementation, you might also want to define CSS variables for border colors, primary and secondary colors, etc.
Now that we’ve defined the CSS variables for our desired colors, we need to apply them in CSS rules. Remember that the definitions above only declare the variables; they do not apply them.
.theme {
    background-color: var(--background-color);
    color: var(--text-color);
}
Again, we keep it simple and apply the background color and the text color to the theme CSS class. We could go on and define colors for buttons, headings and other parts of the web application.
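To illustrate, a hypothetical extension might style buttons with a primary color and a border color. Note that --primary-color and --border-color are not defined in the example project; they would need declarations in both :root and .dark-theme alongside the existing variables:

```css
/* Sketch only: --primary-color and --border-color are hypothetical
   variables that would need to be declared in :root and .dark-theme. */
.theme button {
    background-color: var(--primary-color);
    color: var(--text-color);
    border: 1px solid var(--border-color);
}
```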
Now that we have the basic building block for styling the component in place, we want to keep track of whether dark mode is in use.
A simple implementation alters the MainLayout component like this:
<div class=@($"page theme {ThemeCSSClass}")>
    <div class="sidebar">
        <NavMenu />
    </div>
    <main>
        <div class="top-row px-4">
            <button class="btn btn-primary" @onclick="ToggleTheme">
                Toggle Theme
            </button>
            <a href="https://learn.microsoft.com/aspnet/core/" target="_blank">About</a>
        </div>
        <article class="content px-4">
            @Body
        </article>
    </main>
</div>
We add two CSS classes to the class list of the outer div element. In addition to the page CSS class used in the default project template, we also want to add the previously defined theme class, followed by a class name managed by the component.
We also add a button to toggle the theme.
We add the following behavior to the code section of the component:
@code {
    private bool _darkModeEnabled = false;

    private string ThemeCSSClass => _darkModeEnabled ? "dark-theme" : "";

    private void ToggleTheme()
    {
        _darkModeEnabled = !_darkModeEnabled;
    }
}
The code defines a _darkModeEnabled variable that tracks the user’s theme choice. We conditionally return dark-theme or an empty string for the ThemeCSSClass property, which is then rendered as part of the CSS class name list for the Layout component’s outer div.
The ToggleTheme method is triggered when the user presses the button and wants to switch the theme. It flips the value of the _darkModeEnabled boolean variable.
The user can now toggle the light and dark themes using the button in the page header.


This solution leverages how Razor component rendering works. Whenever the state of a component changes, the component is re-rendered.
For the layout component, this means that whenever the user presses the button, the component’s state changes, so it will be re-rendered. As part of the rendering process, the correct CSS classes will be applied.
A declarative user interface definition combined with state-driven component rendering is the strength of Blazor, and our solution therefore fits it perfectly.
There are two main areas that could be improved to take our simple solution to the next level:
- The theme state lives in the MainLayout component. It means that if the user closes the browser tab or refreshes the page, the state is lost. Persisting the theme choice using local storage is a great idea to keep the user experience consistent. Learn more about accessing local storage in Blazor Server web applications or how to use local storage in Blazor WebAssembly applications.
- We could respect the operating system preference by using the prefers-color-scheme media feature. The code would look like this:

@media (prefers-color-scheme: dark) {
    :root {
        /* variable definitions */
    }
}

@media (prefers-color-scheme: light) {
    :root {
        /* variable definitions */
    }
}
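Filled in with the color values defined earlier in this article, the sketch could look like the following. Note that relying on prefers-color-scheme alone follows the OS setting; combining it with the manual toggle would need additional logic:

```css
/* Sketch: the same colors as the :root/.dark-theme definitions above,
   now keyed off the operating system preference. */
@media (prefers-color-scheme: dark) {
    :root {
        --background-color: #1F1F39;
        --text-color: #FAFAFB;
    }
}

@media (prefers-color-scheme: light) {
    :root {
        --background-color: #FFFFFF;
        --text-color: #1F1F39;
    }
}
```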
As shown in the previous section on improving the user experience of our simple solution, our approach is basic and doesn’t scale well. The following inherent limitations come to mind:
- Components outside the MainLayout component’s element tree are out of scope for theming and must be treated the same way (by adding the theme and dark-theme CSS classes).

For a small website, the approach shown in this article will probably work well. For larger web applications consisting of hundreds of pages and components, it can become a maintenance nightmare if the implementation is not carefully architected to properly handle state changes and apply the correct CSS classes for each component during rendering.
In modern large-scale web applications, theming is important and helps meet accessibility requirements and expected user experience standards.
The Progress Telerik UI for Blazor component library provides a Blazor ThemeBuilder tool that allows visual customization (color previews), SCSS-based design tokens and built-in dark and light themes.
And most importantly, the themes are applied consistently across all components.
It is still important to understand how theming works under the hood, but for a professional web application, using a professionally implemented theming system can prevent headaches and reduce the amount of custom code required. Plus, accessibility features are baked in.
Learn more about theming from Peter Vogel in the Themes Magic in Telerik UI for Blazor article.
We learned how to implement a simple solution for a theme switch and light and dark mode in a Blazor web application. This basic solution works for Blazor Server and Blazor WebAssembly because it uses standardized CSS features and no JavaScript.
We learned how to further improve the solution and what limitations it will face for a large-scale web application.
Professionally implemented user interface component libraries, such as Telerik UI for Blazor, implement complex theming systems that we can leverage to get around those limitations.
If you want to learn more about Blazor development, watch my free Blazor Crash Course on YouTube. And stay tuned to the Telerik blog for more Blazor Basics.
Pulumi ESC (Environments, Secrets, and Configuration) allows you to compose environments by importing configuration and secrets from other environments, but this also means a child environment can silently override a value set by a parent. When that value is a security policy or a compliance setting, an accidental override can cause real problems. With the new fn::final built-in function, you can mark values as final, preventing child environments from overriding them. If a child environment tries to override a final value, ESC raises a warning and preserves the original value.
Let’s say you have a parent environment that sets the AWS region for all deployments. You can use fn::final to ensure no child environment can change it:
# project/parent-env
values:
  aws-region:
    fn::final: us-east-1
If a child environment tries to override the final value, ESC raises a cannot override final value warning.
# project/child-env
imports:
  - project/parent-env
values:
  aws-region: eu-west-1 # raises a warning
This evaluates to:
{
  "aws-region": "us-east-1"
}
In this scenario, the ESC environment is still valid, but the final value remains unchanged.
Use fn::final for values that must not change downstream, such as security policies, compliance settings and organization-wide defaults like the deployment region.
The fn::final function is available now in all Pulumi ESC environments. For more information, check out the fn::final documentation!
This is a guest post by Faisal Waris, an AI strategist in the telecom industry. Faisal built RT.Assistant to explore how .NET, F#, and the OpenAI Realtime API can come together in a production-style, multi-agent voice application.
RT.Assistant is a voice-enabled, multi-agent assistant built entirely in .NET — combining the OpenAI Realtime API over WebRTC for low-latency, bidirectional voice; F# discriminated unions and async state machines for agent orchestration; .NET MAUI (via Fabulous) for cross-platform native UI on iOS, Android, macOS, and Windows; and Microsoft.Extensions.AI for portable LLM integration with both OpenAI and Anthropic models.
Under the hood, a custom RTFlow framework hosts multiple specialized agents — a Voice Agent, a CodeGen Agent, a Query Agent, and an App Agent — that communicate over a strongly-typed async bus, while a deterministic state-machine (the “Flow”) keeps the non-deterministic LLM behavior in check. The sample also showcases an unconventional RAG approach: instead of vector search, user queries are translated into Prolog and executed against a logic-programming knowledge base embedded in a .NET MAUI HybridWebView, yielding precise, hallucination-resistant answers.
Faisal works in the telecom industry, where one of the most common customer pain points is choosing the right phone plan. Carriers offer dozens of bundled plans that mix voice, data, hotspot, streaming, and promotional pricing in ways that are genuinely difficult to compare — even for the people selling them. It’s the kind of domain where a conversational AI assistant can make a real difference: customers ask natural-language questions and get precise, verifiable answers instead of sifting through comparison matrices or waiting on hold.
RT.Assistant uses this domain as a realistic proving ground. The application maintains a mocked — but representative — catalog of plans modeled after a major US carrier’s actual offerings. Let’s look at what makes these plans hard to compare in the first place.
Phone plans (like many other offerings these days) are bundled products and services, which makes it non-trivial to ascertain which plan will work best for one’s needs. A typical contemporary plan bundles components such as voice, data and hotspot allowances, streaming perks and promotional pricing.
Additionally, the available features may be dependent on the number of lines (distinct phone numbers). For example, Netflix may be excluded for a single line but included for two or more lines.
The system internally maintains a mocked—but representative—set of phone plans, modeled as Prolog facts, simulating offerings from a typical major telecom provider. From (a) capturing the voice input, to (b) querying the Prolog knowledge base, and finally (c) generating the results, multiple components work together seamlessly.
This sample highlights the integration of several frameworks and technologies.
There is a lot going on here: generative AI, old-school symbolic AI, multi-agents, realtime voice and cross-platform native mobile apps, to name a few. The following sections explain how these are all stitched together into a comprehensive system.
All agents are orchestrated by the RTFlow framework, which provides hosting and communication services. The diagram below illustrates the RTFlow agent arrangement for the RT.Assistant sample:

As there are multiple frameworks and technologies in play here, let’s briefly delve into each of them, in order of perceived importance.
RTFlow is a framework for building real-time, agentic applications. It is composed of three primary elements: Flow, Bus, and Agents.
The Bus provides the communication substrate that connects Agents to one another and to the Flow. It exposes two distinct logical channels: a broadcast channel for agent-to-agent collaboration, and a dedicated Flow input channel for signaling the system-level orchestrator.
This separation allows agent collaboration to occur independently of system-level orchestration, while still enabling agents to explicitly signal the Flow when required.
Both Flow and Agents maintain private internal state and communicate exclusively via strongly typed, asynchronous messages. Message ‘schemas’ are defined as F# discriminated union (DU) types and are fixed at implementation time, providing:
The Flow is an asynchronous, deterministic state machine. Its state transitions are triggered solely by messages arriving on the Flow input channel.
Depending on application requirements, the Flow can range from minimal to highly directive. At the minimal end, the Flow implements only a basic lifecycle (Start → Run → Terminate), where agents primarily interact with each other via the broadcast channel and the Flow plays a supervisory role. At the directive end, the Flow explicitly orchestrates each step. This design allows system-level determinism and control to be introduced incrementally, without constraining agent autonomy where it is unnecessary.
From a multi-agent systems perspective, RTFlow employs a hybrid bus–star topology: the broadcast channel acts as a shared bus connecting all agents, while the Flow input channel forms star-style links between each agent and the central Flow.
This hybrid model balances scalability and decoupling with deterministic system control.
The F# language offers a clean way to model asynchronous state machines (or, more precisely, Mealy machines) where the states are functions and transitions happen via pattern matching over messages (DUs) or with ‘active patterns’. In the snippet below, the s_xxx identifiers are functions serving as states and the M_Xxx identifiers are messages that arrive on the Bus. The structure F packages the next state along with any output messages to be sent to agents.
let rec s_start msg = async {
    match msg with
    | M_Start -> return F(s_run, [M_Started]) //transition to run
    | _ -> return F(s_start, []) //stay in start state
}
and s_run msg = async {
    match msg with
    | M_DoSomething ->
        do! doSomething()
        return F(s_run, [M_DidSomething])
    | M_Terminate -> return F(s_terminate, [])
    | _ -> return F(s_run, [])
}
and s_terminate msg = async {
    ...
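For readers less familiar with F#, here is a hypothetical Python analogue of the same pattern. The names mirror the snippet above, but none of this is RTFlow code: states are plain functions that take a message and return the next state plus any output messages.

```python
# States are functions: message -> (next_state, output_messages).
def s_start(msg):
    if msg == "M_Start":
        return s_run, ["M_Started"]   # transition to run
    return s_start, []                # stay in start state

def s_run(msg):
    if msg == "M_DoSomething":
        return s_run, ["M_DidSomething"]
    if msg == "M_Terminate":
        return s_terminate, []
    return s_run, []

def s_terminate(msg):
    return s_terminate, []            # absorbing terminal state

def run_flow(messages):
    """Drive the state machine over a message sequence, collecting outputs."""
    state, trace = s_start, []
    for m in messages:
        state, out = state(m)
        trace.extend(out)
    return trace

# run_flow(["M_Start", "M_DoSomething", "M_Terminate"])
# -> ["M_Started", "M_DidSomething"]
```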
LLMs are inherently non-deterministic. RTFlow offers a way to control non-determinism to keep the overall system stable. As applications move from being human-centric to being more autonomous, we will need increasingly sophisticated methods to manage non-determinism. RTFlow’s approach is to inject a deterministic state machine in the mix to effect such control.
Given the relatively simple building blocks of RTFlow, we can construct rich agentic systems that can support many realtime needs with the ability to dial-in the desired degree of control, when needed.
RTOpenAI wraps the OpenAI realtime voice API for native mobile(+) apps. Its two key features are a) support for the WebRTC protocol; and b) strongly-typed realtime protocol messages. These are discussed next.
The OpenAI voice API can be used via WebSockets or WebRTC, where WebRTC has some key advantages over WebSockets:
The RTOpenAI.Events library attempts to define F# types for all OpenAI realtime API protocol messages (that are currently documented).
Additionally, the server (and client) messages are wrapped in DUs, which is convenient for consuming applications; incoming events can be handled with simple pattern matching. After the realtime connection is established, there is a steady flow of incoming events from the server that the application needs to accept and handle. The following snippet is an impressionistic version of how the Voice Agent handles server events:
let handleEvent (ev: ServerEvent) = async {
    match ev with
    | SessionCreated -> ...
    | ResponseOutputItemDone ev when isFunctionCall ev -> ...
    | _ -> ... //choose to ignore
}
The RTOpenAI library is a cross-platform .NET MAUI (see next) library and as such supports realtime voice applications for iOS, macOS, Android and Windows.
Microsoft .NET MAUI is a technology for building cross-platform native apps. The F# library Fabulous.MauiControls enables building of .NET MAUI apps in F#.
Fabulous is a functional-reactive UI framework (influenced by Elm and React).
Fabulous is a joy to use. UIs can be defined declaratively in simple and understandable F#. UI ‘events’ are messages, which again are F# DU types that are ‘handled’ with pattern matching. In the simplest case, events update the application state, which Fabulous then renders to the screen.
Fabulous for .NET MAUI has a rich feature set, which cannot be fully covered here but the Counter App sample is replicated below to provide some sense of how the library works:
/// A simple Counter app
type Model = //application state
    { Count: int }

type Msg = //DU message types
    | Increment
    | Decrement

let init () =
    { Count = 0 }

let update msg model = //function to handle UI events/messages
    match msg with
    | Increment -> { model with Count = model.Count + 1 }
    | Decrement -> { model with Count = model.Count - 1 }

let view model =
    Application(
        ContentPage(
            VStack(spacing = 16.) { //view
                Image("fabulous.png")
                Label($"Count is {model.Count}")
                Button("Increment", Increment)
                Button("Decrement", Decrement)
            }
        )
    )
RT.Assistant is a .NET MAUI application, so the project structure is defined by .NET MAUI. It’s a single project that targets multiple platforms. Components specific to each target platform are under the Platforms folder:
/RT.Assistant
  /Platforms
    /Android
    /iOS
    /MacCatalyst
    /Windows
The platform-specific folders contain the components the native app requires (plists, app manifests, etc.). For example, here is the iOS plist.
RT.Assistant application code is 90% shared across platforms. However platform-specific libraries are required when interfacing with hardware that .NET MAUI does not cover. For WebRTC, RTOpenAI uses platform-native libraries with Native Library Interop. The iOS WebRTC binding library wraps the WebRTC.xcframework written in C++. And for Android the native libwebrtc.aar Android Archive is wrapped.
Since most mobile apps ship both iOS and Android versions, .NET MAUI makes a lot of sense: instead of maintaining multiple code bases and dev teams, one can maintain a single code base with 90% shared code across platforms. And unlike some other cross-platform options (e.g., React Native), .NET MAUI apps are proper native apps. For example, it would be problematic to host a realtime multi-agent system like RTFlow in a JavaScript-based framework like React Native.
To make the sample somewhat fun and interesting, I decided to use Prolog-based ‘RAG’. Generative AI meets Symbolic AI.
Prolog is a logic-programming language created almost 50 years ago, and it has endured to this day. The best-known open-source implementation is SWI-Prolog; however, here I am using the much lighter-weight Tau Prolog engine, which runs in the browser.
Fortunately, web content can easily be hosted in .NET MAUI apps via the HybridWebView control. In RT.Assistant there is a hidden web view that loads the Tau engine and the plan facts.
The typical phone plans from the major telecoms are ‘rich’ offerings. The interplay of base plans, number of lines, features and promotions suggest a rules engine based approach. This is precisely where Prolog excels. By representing valid combinations of plans, features, and pricing as logical facts, Prolog ensures consistency and removes ambiguity.
Prolog is a declarative language for first-order logic programming. A Prolog ‘database’ consists of facts (e.g. plans and their features) and rules (to derive new facts from existing ones). A Prolog implementation will find any and all possible solutions that satisfy a query, given a set of facts and rules.
The ‘schema’ for the plan and its features is in plan_schema.pl. The skeletal form is:
plan(title,category,prices,features)
% where each feature may have a different attribute set
A partial fact for the ‘Connect’ plan is given below:
plan(
    "Connect",
    category("all"),
    prices([
        line(1, monthly_price(20), original_price(25)),
        line(2, monthly_price(40), original_price(26)),
        ...
    ]),
    features([
        feature(
            netflix(
                desc("Netflix Standard with Ads On Us"),
                included(yes)
            ),
            applies_to_lines(lines(2, 2))
        ),
        feature(
            autopay_monthly_discount(
                desc("$5 disc. per line up to 8 lines w/AutoPay & eligible payment method."),
                discount_per_line(5),
                lines_up_to(8),
                included_in_monthly_price(yes)
            ),
            applies_to_lines(all)
        ),
        ...
    ])
).
Note
The full Prolog fact may seem complex; however, the same rules expressed in a relational database schema would be far more complex to understand and query. The metadata (columns, tables, relations) required to represent the rules and facts would be far greater than what is required under Prolog.

While we could obtain an answer by prompting the LLM with text descriptions of the plans along with the query, there is a sound reason for not doing so. LLMs are not perfect and can make mistakes, and here we desire a more precise answer. So instead we transform the natural-language user query into an equivalent Prolog query, with the help of an LLM. It is surmised that this reformulation of the question is easier for the LLM, i.e., the LLM is less likely to hallucinate compared to generating the answer directly. For direct answer generation, the LLM would need to sift through a much larger context: the entire plan database as plain text. For query generation, the LLM need only look at the database ‘schema’, which is much more compact, especially in the case of Prolog.
If query transformation goes awry then the Prolog query may fail entirely or produce strange results. Either way the user will be alerted and will not rely on the results to make a decision. If on the other hand, the answer is generated directly, a hallucination may subtly alter or miss facts. The user is likely to accept it without questioning because the answer looks plausible. This is a more egregious error.
Typical questions range from listing the plans in a given category to comparing costs for a specific number of lines.
The RT.Assistant application shows the natural language query; the generated Prolog; and the Prolog query results on the UI in realtime.
Example:
Natural language query generated by voice model from conversation:
Find the plans in the category 'military_veteran' for 2 lines and list their costs.
Prolog query:
plan(Title,
     category(military_veteran),
     prices(Lines),
     _),
member(line(2, monthly_price(Price), _), Lines).
Note
In Prolog, names starting with an uppercase letter are ‘free’ variables that can be bound to values. For example, ‘Title’ above will bind to each of the plan titles for the found solutions. A solution satisfies all constraints; one obvious constraint is ‘category = military_veteran’, so only Military Veteran plans will be considered.

Results:
Title = Connect Next Military, Lines =
[line(1,monthly_price(85),original_price(90)),
line(2,monthly_price(130),original_price(140)),
line(3,monthly_price(165),original_price(180)),
line(4,monthly_price(200),original_price(220)),
line(5,monthly_price(235),original_price(260))],
Price = 130
Title = Core, Lines = ...
If a Prolog error occurs, the system regenerates the Prolog query but this time includes the Prolog error message along with the original query. This cycle may be repeated up to a limit.
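The regenerate-on-error cycle could be sketched like this in Python. Here generate_prolog, run_prolog and the attempt limit are hypothetical stand-ins, not names from the RT.Assistant code:

```python
MAX_ATTEMPTS = 3  # hypothetical limit; the article only says "up to a limit"

def answer_with_retry(question, generate_prolog, run_prolog):
    """Generate a Prolog query, run it, and feed any error back into the next attempt."""
    error = None
    for _ in range(MAX_ATTEMPTS):
        # On retries, the previous Prolog error message is included so the
        # LLM can correct the query it generates.
        query = generate_prolog(question, previous_error=error)
        try:
            return run_prolog(query)
        except RuntimeError as exc:
            error = str(exc)
    raise RuntimeError(f"query still failing after {MAX_ATTEMPTS} attempts: {error}")
```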
For code generation, the application allows a choice between Claude Sonnet 4.5 and GPT-5.1 (via the app Settings). The GPT Codex model was also tested, but its latency is too high for realtime needs.
For this particular task, GPT-5.1 has the clear edge, generating code that produces concise and relevant output. See this analysis for more details.
For what it’s worth, both models generate syntactically correct Prolog 99% of the time. (A retry loop corrects generated errors, if any.)
For question-answering, the OpenAI realtime model generates satisfactory answers to user queries from the generated Prolog output. Note that for any real production system there should be a well-crafted ‘eval’ suite to truly gauge the performance.
The post RT.Assistant: A Multi-Agent Voice Bot Using .NET and OpenAI appeared first on .NET Blog.
Agent Caching in Fiddler Everywhere allows you to iterate as you build an agent without having to pay for every response when it hasn’t changed.
If you have ever built a model-powered agent, you know the development loop. Write some code, fire it at the endpoint, check the response, tweak the parsing, fire it again. Repeat until the output looks right. It is a perfectly normal workflow—and it quietly drains your token budget with every single iteration.
The new Progress Telerik Fiddler Everywhere Agent Cache feature is designed to break that cycle. Once you capture a response from a model-provider endpoint, you can flip a single switch and have Fiddler software replay that response for every subsequent matching call—without the request ever leaving your machine. Same output, zero additional tokens consumed on the provider side.
This post walks through exactly how that works, using a small open-source demo project to make everything concrete.
Building an agent that calls a completion endpoint involves a lot of repetition that has nothing to do with the model itself. You are iterating on things like prompt wording, JSON parsing, display logic and error handling.
None of those iterations require a new, unique response from the model. You already have a good one from the first call. But unless you manually save the raw response and mock it yourself, every invocation sends a fresh request, and the provider charges for it.
Once agents move beyond demos, three pressures show up together and stay for the duration of development: token costs, iteration speed and the scale of team-wide usage.
This is especially visible in teams that build many small, task-specific agents rather than one large agent. Even small per-run costs compound when iteration is constant—and none of that spend actually improves the agent.
Most teams already compensate for this manually. Common patterns include separating development runs from real execution, validating agent wiring before triggering model calls, reusing mocked or previously captured responses, and avoiding live execution early to keep iteration fast.
These approaches work, but they are fragmented. Provider-level caching helps in some cases but is limited. Custom mocks and fixtures are costly to maintain. Replay logic often lives outside the main development flow, and different teams end up solving the same problem with different local tooling.
The problem is not a lack of solutions. It is the lack of a low-friction one that fits naturally into everyday iteration.
Fiddler Everywhere acts as a proxy that sits between your agent and the remote endpoint. When your agent makes an HTTPS call to, say, api.anthropic.com, Fiddler software intercepts it, forwards it and logs the full request-response pair in the Traffic pane.
The new Agent Calls tab is a focused view inside that pane. It automatically filters and displays HTTPS sessions that target supported model-provider endpoints—such as OpenAI, Anthropic and Gemini—so you are not wading through noise from other traffic. Every captured call gets a Caching toggle.
Enable the toggle, and Fiddler software starts intercepting any outbound call that matches that session’s request. Instead of forwarding the request, it immediately returns the cached response. The endpoint never receives the duplicate call. Your agent sees the exact same payload it would have received from a live call. Token count: zero.
Disable the toggle at any time and live traffic resumes, no restarts required.
A few details matter when you start using it: caching is toggled per captured call, replay applies to any outbound request matching that session’s request, and disabling the toggle restores live traffic immediately.
Agent Cache is built around three practical benefits that matter most during active development: repeated runs cost zero tokens, replayed responses are byte-for-byte identical, and enabling it takes a single toggle with no code changes.
To make this tangible, walk through the agent-cache-demo—a minimal Python agent that takes a fixed bug report and returns a structured analysis (severity, category, a plain-English summary and a suggested next step).
The input never changes between runs, which makes it a perfect showcase for Agent Cache: the model’s answer to an identical prompt is always reusable, so there is genuinely no reason to pay for it more than once.
The core of agent.py is straightforward:
message = client.messages.create(
    model=MODEL,
    max_tokens=256,
    system=SYSTEM_PROMPT,
    messages=[
        {"role": "user", "content": f"Analyze this bug report:\n\n{report}"}
    ],
)
It sends the bug report to the Claude API and expects a JSON response like this:
{
  "severity": "high",
  "category": "crash",
  "summary": "App crashes with a NullPointerException when attempting to log in under no network connectivity.",
  "suggested_next_step": "Add a null or connectivity check in NetworkManager.checkConnectivity() before network calls."
}
That response is then formatted and printed to the terminal:
── Bug Report Analysis ─────────────────────────────────────
Severity : HIGH
Category : crash
Summary : App crashes with a NullPointerException when attempting to
log in under no network connectivity.
Next step : Add a null or connectivity check in
NetworkManager.checkConnectivity() before network calls.
─────────────────────────────────────
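A minimal sketch of how that formatting step could look; render_analysis is a hypothetical helper name, not necessarily the function used in agent.py:

```python
import json

def render_analysis(payload: str) -> str:
    """Turn the model's JSON analysis into boxed terminal output."""
    data = json.loads(payload)
    lines = [
        "── Bug Report Analysis ─────────────────────────────────────",
        f"Severity  : {data['severity'].upper()}",
        f"Category  : {data['category']}",
        f"Summary   : {data['summary']}",
        f"Next step : {data['suggested_next_step']}",
        "────────────────────────────────────────────────────────────",
    ]
    return "\n".join(lines)
```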
Clone the repository and install dependencies:
git clone https://github.com/NickIliev/agent-cache-demo
cd agent-cache-demo
python -m venv .venv
source .venv/bin/activate # macOS / Linux
.venv\Scripts\activate # Windows
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-ant-... # macOS / Linux (Git Bash)
set ANTHROPIC_API_KEY=sk-ant-... # Windows (CMD)
The demo supports routing traffic through the Fiddler proxy or running directly against the provider. It also covers SSL/TLS trust configuration for HTTPS interception. See the repository README for full details on proxy setup, environment variables and certificate options.
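One common way to route the agent through the proxy is via the standard proxy environment variables. Port 8866 is Fiddler Everywhere's usual default, but verify it in your own settings; HTTPS interception also requires trusting the Fiddler root certificate, as described in the README:

```shell
# Point HTTP(S) traffic at the local Fiddler Everywhere proxy.
# 8866 is the typical default port -- check Settings > Connections.
export HTTPS_PROXY=http://127.0.0.1:8866
export HTTP_PROXY=http://127.0.0.1:8866
# Python HTTP clients such as requests and httpx honor these variables.
```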
Start Fiddler Everywhere and run the agent:
python agent.py
The terminal shows the result and, crucially, the token consumption:
[tokens] Input: 312 | Output: 68 | Total: 380
Switch to Fiddler Everywhere and open Traffic > Agent Calls. You will see the captured call to api.anthropic.com with the full request and response visible.

This is your baseline. You paid for 380 tokens. That is fair—you needed the live call to validate the end-to-end flow.
In the Agent Calls grid, find the captured session and flip its Caching switch to on. That is the entire configuration step.

Run the agent again:
python agent.py
The output in the terminal is byte-for-byte identical to the first run, including the token count display. Because the Caching switch was on, Fiddler software served the stored response immediately and never forwarded the request to the provider. The endpoint never saw the call.

You can now iterate on agent.py as many times as you need—refactor the display logic, adjust the JSON parsing, add logging—and none of those runs cost a single token.

Agent Cache is a development-stage tool. It is particularly valuable when you are iterating on the same prompts repeatedly and would otherwise pay for identical responses on every run.
Agent Cache is available on Fiddler Everywhere Trial, Pro and Enterprise tiers. The feature is not included in Lite licenses.
The full demo is on GitHub: github.com/NickIliev/agent-cache-demo. Clone it, set your Anthropic API key, and you can see the before-and-after token counts yourself in under five minutes.
The point is not really the 380 tokens saved in a single run. It is the dozens of runs you make in a typical development session, the parallel runs across a team—all of which can stop paying for answers they already have.
Agent Cache does not change how you build agents. It just removes the tax on iterating.
If you aren’t already using Fiddler Everywhere, it does come with a free trial:
Agent development workflows are still evolving quickly, and your feedback shapes what comes next. If you try Agent Cache during development—or if there is something you wish it did differently—we want to hear about it.
As a software consultant, I’ve noticed a pattern play out at nearly every client over the last year. A team adopts Cursor or Claude Code or Copilot and their productivity, especially on greenfield tasks, jumps noticeably. And then, someone asks: “If the AI can do this, what are the developers for?”
It’s a valid question, and one I’ve been thinking about myself as AI has improved at many software development tasks over the last year or so. Using these tools daily on client projects, internal work and side projects has made the answer clear to me. No matter how good AI gets at some of our daily tasks, developers will still be needed for their systems thinking, their setting of guardrails for the AI and, most importantly, their human judgment.
For decades, a huge part of professional software development has been following the patterns that already exist in a codebase. You might stand up a new REST API, and that first endpoint is genuinely hard. You’re making real decisions about design patterns, systems architecture, URL structure, authentication/authorization, database access, caching, and error handling. But for endpoints two through twenty, you’re just following the recipe you already wrote. We run into this a lot with client teams. They might not have the experience to architect a well-designed system from scratch, but once we get them going on a good pattern, they can easily follow it for new features.
AI is also very good at following recipes. Point it at a codebase with established conventions and it’ll crank out the next endpoint, the next service method, or the next React component in the same shape as the ones before it. In that way it’s like a junior developer who reads the existing code before writing new code and carefully follows the established patterns.
So yes, a meaningful chunk of what we used to spend our days typing is now automatable. My fingers don’t hurt at the end of the day anymore, and I don’t think they’re going to again, especially as more developers leverage voice chat capabilities with their AI tools.
Here’s what I keep running into. These models are trained on an internet’s worth of code. A lot of that code, most of it really, is mediocre. Developers who have tried to find answers to their questions on Stack Overflow for the last decade already know this. Whether it is tutorial snippets, Reddit threads, Stack Overflow answers written in a hurry, or open source projects with no review process, a lot of the code on the internet (and in the world) is poor-to-mediocre. These models are fundamentally averaging machines that guess at the most likely next word or token. If you give them a vague prompt, you will most likely get back something that looks like the average of what’s out there on the internet. It might eventually compile, it might work, but it won’t reflect the specific decisions and tradeoffs your project needs.
I’ve never seen an AI look at a codebase and suggest a better architecture unless it’s specifically asked to by the developer running it. It probably won’t notice that your auth middleware has a subtle timing vulnerability. It won’t propose event sourcing because it picked up on a pattern of concurrency bugs in your shared state. It doesn’t know your deployment constraints, your team’s skill level, your future scalability needs, or the fact that your biggest customer hammers one particular endpoint at the same time every Monday morning.
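As a concrete example of the kind of subtlety involved: a timing vulnerability in auth code is often nothing more than an equality check that short-circuits on the first differing byte. The sketch below (function names are illustrative, not from any real codebase) contrasts that with a constant-time comparison:

```python
import hmac


def insecure_token_check(provided: str, expected: str) -> bool:
    # str equality short-circuits at the first mismatched character,
    # so response time leaks how many leading characters were correct.
    return provided == expected


def secure_token_check(provided: str, expected: str) -> bool:
    # hmac.compare_digest takes time independent of where the inputs
    # differ, closing the timing side channel.
    return hmac.compare_digest(provided.encode(), expected.encode())
```

Both functions return the same boolean for the same inputs; the difference is purely in timing behavior, which is exactly why it is easy for a reviewer, human or AI, to miss.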
Those are judgment calls that require human experience. In my experience, the developers getting the most out of AI right now are the ones whose judgment is already sharp. They can tell when the model missed and know exactly how to correct it.
We have a joke in consulting: clients say they have “detailed specs” and then hand you the title of their project. I’ve been doing this long enough to know that the gap between what someone says they want and what they actually need is where most project risk can be found.
AI has the exact same problem. A quick, vague prompt from someone without experience can generate a lot of impressive-looking code fast. Then you spend three times as long iterating it into something that actually meets the requirements: requirements you should have pinned down before you started generating anything.
The teams I’ve seen get real traction with AI-assisted development aren’t the ones who figured out some magic prompt template. They’re the ones who already had their software fundamentals together: clear requirements, fast CI pipelines, strong automated test coverage, pull request reviews that actually catch things. Those aren’t AI skills. Those are engineering discipline. AI just raised the stakes on them.
If your feedback loops are slow (if you don’t know your code is broken until it’s in staging) then AI is only going to make you produce broken code faster, and that’s not a win.
This one’s harder to talk about, and I think our industry is being too quiet about it. Hiring managers at companies I work with are pausing junior roles. Not eliminating them altogether, just pausing hiring for them, so they can see how far their current teams can scale with AI. Honestly, it’s rough for new graduates right now.
I lived through the outsourcing scare of the mid-2000s. Teams I worked on lost people to offshore replacements. For a while it felt like the bottom was falling out. But it didn’t. The work evolved, the value proposition shifted, and eventually it stabilized. Those who moved up the value chain survived. I think something similar is happening with AI, but I’m not going to pretend it’s happening on the same timeline. AI is a much faster-moving disruption.
What I’ll say is this: the developers entering the field now need to lead with judgment earlier than my generation did. Writing decent algorithms or code that compiles aren’t differentiators when AI can do that. Understanding why certain decisions matter and being able to look at generated code and say “this won’t scale” or “this misses the actual requirement,” that’s what will stand out in the marketplace. It’s a higher bar, and the industry owes it to junior devs to be upfront about that instead of pretending nothing has changed. Recently, Mark Russinovich and Scott Hanselman from Microsoft published an excellent paper about what organizations can do to avoid some of the pitfalls of this era for junior engineering talent.
After about a year of working with these tools, here’s where I’ve landed. Developers aren’t paid for typing code. The best ones never really were. It just felt that way because typing code took up so much of the day. Developers are paid for knowing which endpoint needs the cache and which one doesn’t. They are paid for catching that the generated migration will lock a production table for twenty minutes and slow down other critical tasks. They are paid for understanding that what the product owner thinks they need isn’t what they actually need, and for helping them see it.
AI now handles the mechanical translation of intent into code. Developers are the ones who make sure that intent is right in the first place and the ones who know how to fix it when it isn’t. And that’s not a new skill. It’s the skill that was always underneath the typing. We just get to spend more time on it now.
I recently went deep on the tactical side of all this with my guest Cory House on an extended edition of the Blue Blazes Podcast. We discuss choosing between AI harnesses, model selection, multi-agent workflows, and how to actually structure your prompts and feedback loops. If you’re adopting AI-assisted development on your team, give it a watch or listen.
The post With AI Writing Code, What Are Developers For? appeared first on Trailhead Technology Partners.