
Clean Architecture for Blazor with DDD & CQRS


Aliaksandr Marozka

Learn how to structure Blazor apps with Clean Architecture, DDD, and CQRS. Clear layers, EF Core mapping, and tested handlers.


Be honest: do your Blazor components still talk straight to EF Core and sprinkle business rules in event handlers? That “quick fix” is why the code gets hard to test, hard to change, and slow to ship. Let’s fix that with a clear structure you can apply today.

Why Blazor needs Clean Architecture

Blazor (Server or WebAssembly) makes UI work simple, but it’s easy to let components grow into mini “god objects”. Typical smells:

  • Data access inside components (new DbContext or heavy injected services)
  • Duplicated validation spread across UI and API
  • Stateful logic that is hard to cover with tests
  • Tight coupling between UI and persistence

Clean Architecture gives you guardrails:

  • Separation of concerns: UI shows data; Application orchestrates use cases; Domain holds rules; Infrastructure talks to the world (DB, HTTP, queues).
  • Dependency inversion: outer layers depend on inner abstractions, never the other way around.
  • Domain modeling: entities, value objects, and domain events keep rules close to the data that owns them.
  • CQRS: reads and writes follow different paths, which reduces accidental coupling and makes flows clear.

Solution layout that works

A tiny but complete folder layout for a Blazor app that scales:

src/
  BlazorApp/       # UI: Blazor Server or WASM host + Razor components
  Application/     # Use cases, CQRS handlers, DTOs, validation
  Domain/          # Entities, Value Objects, Aggregates, Events
  Infrastructure/  # EF Core, Repositories, Email/SMS, Outbox

tests/
  Domain.Tests/
  Application.Tests/
  Infrastructure.Tests/
  BlazorApp.Tests/ # bUnit or UI tests where needed

References (one-way)

BlazorApp → Application → Domain
BlazorApp → Domain (for shared contracts like primitive Value Objects)
Infrastructure → Application, Domain

# Startup wiring happens in BlazorApp, but implementations live in Infrastructure

This keeps the Domain and Application projects free of UI and database concerns.
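One way to keep those arrows honest is a tiny "architecture test" that inspects assembly references at runtime. This is a sketch, not code from the article: the `ArchitectureCheck` helper is a hypothetical name, and in the real solution you would pass `typeof(TodoList).Assembly` and assert it never references EF Core.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical guard: report whether an "inner" assembly references a forbidden "outer" one.
public static class ArchitectureCheck
{
    public static bool References(Assembly inner, string forbiddenAssemblyName) =>
        inner.GetReferencedAssemblies()
             .Any(a => string.Equals(a.Name, forbiddenAssemblyName, StringComparison.Ordinal));
}
```

In a Domain.Tests fixture you might then assert `Assert.False(ArchitectureCheck.References(typeof(TodoList).Assembly, "Microsoft.EntityFrameworkCore"))`, which turns the dependency diagram into a failing build instead of a wiki page.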

If this post helped, you’ll love the rest of my Blazor .Net Tips content:
✅Read more: .Net Code Chronicles
✅Get new posts: Subscribe on Medium

The layers at a glance

Domain

  • Core language of the business: Entities, Value Objects, Aggregates
  • Domain Events, business invariants
  • No EF Core, no HTTP, no logging abstractions needed here

Application

  • Use cases as Commands and Queries
  • Transaction boundaries, validation, mapping to DTOs
  • Interfaces (e.g., ITodoRepository, IEmailSender)

Infrastructure

  • EF Core DbContext, repository implementations
  • Email/SMS/HTTP clients, file storage, event outbox

UI (Blazor)

  • Razor components, minimal logic: send Commands/Queries, render state

Domain modeling: simple, strict, testable

We’ll build a tiny Todo feature. Core rules:

  • A TodoList owns many TodoItem entries (aggregate root is TodoList).
  • Title must be non‑empty and trimmed. Duplicate titles in the same list are not allowed.
  • Completing an item raises a domain event TodoItemCompleted.

Domain/ValueObjects/Title.cs

namespace CleanBlazor.Domain.ValueObjects;

public sealed record Title
{
    public string Value { get; }

    private Title(string value) => Value = value;

    public static Title From(string? input)
    {
        var value = (input ?? string.Empty).Trim();
        if (string.IsNullOrWhiteSpace(value))
            throw new ArgumentException("Title cannot be empty.");
        if (value.Length > 120)
            throw new ArgumentException("Title is too long (max 120).");
        return new Title(value);
    }

    public override string ToString() => Value;
}

Domain/Events/TodoItemCompleted.cs

namespace CleanBlazor.Domain.Events;

public sealed record TodoItemCompleted(Guid ListId, Guid ItemId, DateTime OccurredAtUtc);

Domain/Entities/TodoItem.cs

using CleanBlazor.Domain.Events;
using CleanBlazor.Domain.ValueObjects;

namespace CleanBlazor.Domain.Entities;

public class TodoItem
{
    public Guid Id { get; private set; } = Guid.NewGuid();
    public Title Title { get; private set; }
    public bool IsDone { get; private set; }

    private readonly List<object> _events = new();
    public IReadOnlyList<object> Events => _events;

    public TodoItem(Title title)
    {
        Title = title;
    }

    public void Complete()
    {
        if (IsDone) return;
        IsDone = true;
        // The item doesn't know its list id; the aggregate (or the dispatcher)
        // can stamp the real ListId on the event before publishing it.
        _events.Add(new TodoItemCompleted(default, Id, DateTime.UtcNow));
    }
}

Domain/Entities/TodoList.cs

using CleanBlazor.Domain.ValueObjects;

namespace CleanBlazor.Domain.Entities;

public class TodoList
{
    private readonly List<TodoItem> _items = new();

    public Guid Id { get; private set; } = Guid.NewGuid();
    public string Name { get; private set; }
    public IReadOnlyCollection<TodoItem> Items => _items.AsReadOnly();

    public TodoList(string name)
    {
        Name = string.IsNullOrWhiteSpace(name)
            ? throw new ArgumentException("Name required")
            : name.Trim();
    }

    public TodoItem AddItem(Title title)
    {
        if (_items.Any(i => i.Title.Value.Equals(title.Value, StringComparison.OrdinalIgnoreCase)))
            throw new InvalidOperationException($"Item with title '{title}' already exists.");
        var item = new TodoItem(title);
        _items.Add(item);
        return item;
    }
}

Tip: keep Domain clean of framework ties. No annotations, no EF types, no MediatR. Plain C#.

Application layer with CQRS

Define contracts that the UI and handlers use. You can use MediatR or a minimal interface of your own. I’ll show a plain version (easy to swap later).

Application/Abstractions/ITodoRepository.cs

using CleanBlazor.Domain.Entities;
using CleanBlazor.Domain.ValueObjects;

namespace CleanBlazor.Application.Abstractions;

public interface ITodoRepository
{
    Task<TodoList?> GetListAsync(Guid listId, CancellationToken ct);
    Task<Guid> CreateListAsync(string name, CancellationToken ct);
    Task<Guid> AddItemAsync(Guid listId, Title title, CancellationToken ct);
    Task CompleteItemAsync(Guid listId, Guid itemId, CancellationToken ct);

    Task<IReadOnlyList<TodoItemDto>> GetItemsAsync(Guid listId, CancellationToken ct);
}

public sealed record TodoItemDto(Guid Id, string Title, bool IsDone);

Commands

Application/Todos/AddItem/AddItemCommand.cs

namespace CleanBlazor.Application.Todos.AddItem;

public sealed record AddItemCommand(Guid ListId, string Title);

public interface ICommandHandler<TCommand>
{
    Task Handle(TCommand command, CancellationToken ct);
}

Application/Todos/AddItem/AddItemHandler.cs

using CleanBlazor.Application.Abstractions;
using CleanBlazor.Domain.ValueObjects;

namespace CleanBlazor.Application.Todos.AddItem;

public sealed class AddItemHandler : ICommandHandler<AddItemCommand>
{
    private readonly ITodoRepository _repo;
    public AddItemHandler(ITodoRepository repo) => _repo = repo;

    public async Task Handle(AddItemCommand command, CancellationToken ct)
    {
        var title = Title.From(command.Title);
        await _repo.AddItemAsync(command.ListId, title, ct);
    }
}

Queries


Application/Todos/GetItems/GetItemsQuery.cs

namespace CleanBlazor.Application.Todos.GetItems;

public sealed record GetItemsQuery(Guid ListId);

public interface IQueryHandler<TQuery, TResult>
{
    Task<TResult> Handle(TQuery query, CancellationToken ct);
}

Application/Todos/GetItems/GetItemsHandler.cs

using CleanBlazor.Application.Abstractions;

namespace CleanBlazor.Application.Todos.GetItems;

public sealed class GetItemsHandler : IQueryHandler<GetItemsQuery, IReadOnlyList<TodoItemDto>>
{
    private readonly ITodoRepository _repo;
    public GetItemsHandler(ITodoRepository repo) => _repo = repo;

    public Task<IReadOnlyList<TodoItemDto>> Handle(GetItemsQuery query, CancellationToken ct)
        => _repo.GetItemsAsync(query.ListId, ct);
}

This setup is tiny, testable, and leaves room to swap in MediatR later without touching Domain.

Infrastructure with EF Core

Keep EF Core out of Domain and Application by mapping in Infrastructure.

Infrastructure/Data/AppDbContext.cs

using CleanBlazor.Domain.Entities;
using Microsoft.EntityFrameworkCore;

namespace CleanBlazor.Infrastructure.Data;

public class AppDbContext : DbContext
{
    public DbSet<TodoList> Lists => Set<TodoList>();
    public DbSet<TodoItem> Items => Set<TodoItem>();

    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    protected override void OnModelCreating(ModelBuilder b)
    {
        b.Entity<TodoList>(e =>
        {
            e.HasKey(x => x.Id);
            e.Property(x => x.Name).IsRequired().HasMaxLength(80);
            e.HasMany<TodoItem>("_items").WithOne().OnDelete(DeleteBehavior.Cascade);
        });

        b.Entity<TodoItem>(e =>
        {
            e.HasKey(x => x.Id);
            e.OwnsOne(x => x.Title, nb =>
            {
                nb.Property(p => p.Value).HasColumnName("Title").HasMaxLength(120);
            });
        });
    }
}

Infrastructure/Repositories/TodoRepository.cs

using CleanBlazor.Application.Abstractions;
using CleanBlazor.Domain.Entities;
using CleanBlazor.Domain.ValueObjects;
using CleanBlazor.Infrastructure.Data;
using Microsoft.EntityFrameworkCore;

namespace CleanBlazor.Infrastructure.Repositories;

public sealed class TodoRepository : ITodoRepository
{
    private readonly AppDbContext _db;
    public TodoRepository(AppDbContext db) => _db = db;

    public async Task<TodoList?> GetListAsync(Guid listId, CancellationToken ct)
        => await _db.Lists.Include("_items").FirstOrDefaultAsync(l => l.Id == listId, ct);

    public async Task<Guid> CreateListAsync(string name, CancellationToken ct)
    {
        var list = new TodoList(name);
        _db.Add(list);
        await _db.SaveChangesAsync(ct);
        return list.Id;
    }

    public async Task<Guid> AddItemAsync(Guid listId, Title title, CancellationToken ct)
    {
        var list = await GetListAsync(listId, ct) ?? throw new KeyNotFoundException("List not found");
        var item = list.AddItem(title);
        await _db.SaveChangesAsync(ct);
        return item.Id;
    }

    public async Task CompleteItemAsync(Guid listId, Guid itemId, CancellationToken ct)
    {
        var list = await GetListAsync(listId, ct) ?? throw new KeyNotFoundException("List not found");
        var item = list.Items.First(i => i.Id == itemId);
        item.Complete();
        await _db.SaveChangesAsync(ct);
    }

    public async Task<IReadOnlyList<TodoItemDto>> GetItemsAsync(Guid listId, CancellationToken ct)
    {
        return await _db.Items
            .Where(i => EF.Property<Guid>(i, "TodoListId") == listId)
            .Select(i => new TodoItemDto(i.Id, i.Title.Value, i.IsDone))
            .ToListAsync(ct);
    }
}

Note: a casing slip like item.complete() instead of item.Complete() is cheap to catch in review or tests; in production, not so much.

Wiring in Blazor (composition root)

BlazorApp/Program.cs

using CleanBlazor.Application.Abstractions;
using CleanBlazor.Application.Todos.AddItem;
using CleanBlazor.Application.Todos.GetItems;
using CleanBlazor.Infrastructure.Data;
using CleanBlazor.Infrastructure.Repositories;
using Microsoft.EntityFrameworkCore;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRazorPages();
builder.Services.AddServerSideBlazor();

builder.Services.AddDbContext<AppDbContext>(opt =>
    opt.UseSqlite(builder.Configuration.GetConnectionString("Default")));

builder.Services.AddScoped<ICommandHandler<AddItemCommand>, AddItemHandler>();
builder.Services.AddScoped<IQueryHandler<GetItemsQuery, IReadOnlyList<TodoItemDto>>, GetItemsHandler>();

builder.Services.AddScoped<ITodoRepository, TodoRepository>();

var app = builder.Build();
if (!app.Environment.IsDevelopment())
{
    app.UseExceptionHandler("/Error");
}
app.UseStaticFiles();
app.UseRouting();
app.MapBlazorHub();
app.MapFallbackToPage("/_Host");
app.Run();

A thin component: no business rules inside

BlazorApp/Pages/Todos.razor

@page "/todos/{ListId:guid}"
@inject ICommandHandler<AddItemCommand> AddItem
@inject IQueryHandler<GetItemsQuery, IReadOnlyList<TodoItemDto>> GetItems

<h3>Todo</h3>
<input @bind="_newTitle" placeholder="What needs doing?" />
<button @onclick="OnAdd">Add</button>

@if (_items is null)
{
    <p>Loading…</p>
}
else if (_items.Count == 0)
{
    <p>No items yet.</p>
}
else
{
    <ul>
        @foreach (var i in _items)
        {
            <li>@i.Title (@(i.IsDone ? "done" : "open"))</li>
        }
    </ul>
}

@code {
    [Parameter] public Guid ListId { get; set; }
    private string _newTitle = string.Empty;
    private IReadOnlyList<TodoItemDto>? _items;

    protected override async Task OnParametersSetAsync()
    {
        _items = await GetItems.Handle(new GetItemsQuery(ListId), CancellationToken.None);
    }

    private async Task OnAdd()
    {
        await AddItem.Handle(new AddItemCommand(ListId, _newTitle), CancellationToken.None);
        _newTitle = string.Empty;
        _items = await GetItems.Handle(new GetItemsQuery(ListId), CancellationToken.None);
    }
}

UI stays dumb: it sends commands and renders data. All rules live in Domain and Application.


Validation: where and how

  • Validate shape at the boundary (DataAnnotations/FluentValidation on input models if you expose API endpoints).
  • Validate business rules in Domain (Title.From, TodoList.AddItem).
  • Validate use case flow in Application (e.g., user can only add items to lists they own).

This split keeps you from duplicating checks in random places.
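For the boundary check, here is a minimal sketch with DataAnnotations. The AddItemInput model is hypothetical (not from the article); it mirrors the 120-character Title rule at the boundary without replacing the Domain check in Title.From.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical boundary model: shape checks only; business rules stay in the Domain.
public sealed class AddItemInput
{
    [Required(AllowEmptyStrings = false)]
    [StringLength(120)]
    public string Title { get; set; } = string.Empty;
}

public static class BoundaryValidation
{
    // Runs the attribute-based validators and collects failures.
    public static IReadOnlyList<ValidationResult> Validate(object model)
    {
        var errors = new List<ValidationResult>();
        Validator.TryValidateObject(model, new ValidationContext(model), errors,
            validateAllProperties: true);
        return errors;
    }
}
```

The duplication here is intentional and small: the boundary rejects obviously bad input early, while the Domain remains the single authority on what a valid Title is.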

Transactions and domain events

Keep transactions at the Application layer. Let Infrastructure implement an outbox later if you publish events.

Simple approach to start:

  • Domain pushes events into an in‑memory list on the aggregate
  • Repository flushes changes, then publishes those events to an in‑process dispatcher

You can replace the dispatcher with a message bus when you need cross‑process delivery.
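A minimal in-process dispatcher could look like the sketch below. The InProcessDispatcher name and its Subscribe/Dispatch shape are assumptions for illustration, not part of the article's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-process dispatcher: handlers are registered per event type
// and invoked synchronously when events are dispatched.
public sealed class InProcessDispatcher
{
    private readonly Dictionary<Type, List<Action<object>>> _handlers = new();

    public void Subscribe<TEvent>(Action<TEvent> handler)
    {
        if (!_handlers.TryGetValue(typeof(TEvent), out var list))
            _handlers[typeof(TEvent)] = list = new List<Action<object>>();
        list.Add(e => handler((TEvent)e));
    }

    public void Dispatch(IEnumerable<object> events)
    {
        foreach (var e in events)
            if (_handlers.TryGetValue(e.GetType(), out var list))
                foreach (var handle in list)
                    handle(e);
    }
}
```

The repository would call Dispatch(item.Events) after SaveChangesAsync succeeds; a message bus (or an outbox writer) can later replace this class behind the same shape.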

Testing strategy that pays back

  • Domain.Tests: value objects, invariants, events. No mocks.
  • Application.Tests: handler behavior with a fake repository.
  • Infrastructure.Tests: EF Core mapping tests with Sqlite InMemory.
  • BlazorApp.Tests: bUnit to render components and assert markup/state.

Example: a fast test for Title and for duplicate item protection.

[Fact]
public void Title_cannot_be_empty()
{
    Assert.Throws<ArgumentException>(() => Title.From(" "));
}

[Fact]
public void TodoList_prevents_duplicates()
{
    var list = new TodoList("Home");
    list.AddItem(Title.From("Buy milk"));
    Assert.Throws<InvalidOperationException>(() => list.AddItem(Title.From("buy milk")));
}

Common mistakes (and quick fixes)

  • Putting EF types in Domain — move them to Infrastructure, use owned types for Value Objects.
  • Fat components with business rules — create Commands/Queries and handlers; inject them.
  • Anemic Domain (all rules in handlers) — move invariants into Entities/Value Objects.
  • Shared DbContext in UI — hide behind ITodoRepository.
  • Leaking domain entities to UI — map to DTOs in Application.
  • One handler that does everything — split by use case, keep handlers short and focused.

When to pick CQRS in Blazor

You don’t need two databases or a full event store to get value. Start simple:

  • Separate types and handlers for reads vs writes
  • Reads can bypass aggregates (project straight from EF to DTOs)
  • Writes go through aggregates to enforce rules

If reads get heavy, add pagination and projections. If writes get complex, domain events and outbox can keep things in sync.
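When reads do grow, a paged query shape keeps the read side explicit. The Paged<T> and Paging names below are illustrative, not from the article; with EF Core the same Skip/Take pair composes into SQL, while here it runs over plain LINQ to Objects.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical read-side contract for paged lists.
public sealed record Paged<T>(IReadOnlyList<T> Items, int Total, int Page, int PageSize);

public static class Paging
{
    // Pages are 1-based; Total is the count before paging so the UI can render a pager.
    public static Paged<T> Apply<T>(IEnumerable<T> source, int page, int pageSize)
    {
        var all = source.ToList();
        var items = all.Skip((page - 1) * pageSize).Take(pageSize).ToList();
        return new Paged<T>(items, all.Count, page, pageSize);
    }
}
```

A GetItemsPageQuery handler could return Paged<TodoItemDto> the same way GetItemsHandler returns the full list today, leaving the write side untouched.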

Checklist to apply in an existing project

  1. Create Domain and Application projects; move rules into them.
  2. Extract interfaces the UI depends on (repositories, senders).
  3. Move EF Core code into Infrastructure, map Value Objects as owned types.
  4. Introduce Commands and Queries for the top 3 flows.
  5. Add tests: start with Value Objects and one handler.
  6. Keep components thin; no data access, no business rules.
  7. Wire DI in Program.cs; the UI remains a client of Application.

Tape this list on the wall. Review each PR against it.

Conclusion: a simple structure that keeps you fast

Clean Architecture in Blazor is not theory; it’s a set of small rules that keep changes cheap. Keep the UI thin, push rules into Domain, and let Application coordinate with clear Commands and Queries. Try the skeleton above on one feature this week. If it feels simpler, spread it to the rest of the app. And now it’s your turn: what part of your Blazor app will you refactor first? Leave a comment — I read every one.

🔗Want to get in touch? Find me on LinkedIn


Custom Scripting for Web Applications


In this article, we're going to talk about using a scripting language for developing server-side scripts for web applications. As that scripting language, we'll be using CSCS (Customized Scripting in C#), an easy, lightweight open-source language described in previous CODE Magazine articles: https://www.codemag.com/article/1607081 introduced it, https://www.codemag.com/article/1711081 showed how to use it on top of Xamarin to create cross-platform native mobile apps, and https://www.codemag.com/article/1903081 showed how to use it for Unity programming.

CSCS resembles JavaScript with a few differences, such as variable and function names being case-insensitive. In this article, we'll be talking about developing web APIs with CSCS. For brevity, we'll refer to this version of CSCS as CSCS Web.

Working Principles

CSCS Web is very flexible so it can be used for developing SSR (server-side rendering) or CSR (client-side rendering) applications. An important feature is that you can use existing language functions, or you can also define and create new language functions.

In the current version of CSCS Web, most of the basic functions are implemented. Extensions can be made simply by registering new language functions and then implementing these functions. Later in this article, we're going to show you how to do that.

In SSR applications, you can use templating. In other words, you can predefine HTML pages and use them as templates that will be rendered in a browser. Or you can dynamically format an HTML page that can include HTML code, CSS code, and JavaScript code. In the CSR mode, CSCS Web is mainly used as an API server.

You can also mix SSR and CSR code. For example, you can first preload an HTML page rendered on the server, and then hydrate the whole application on the client side using JavaScript or a JavaScript framework. Below, we'll show how CSCS Web can be used for server-side rendering of web applications.

CSCS Web is still in development, but any developer can use and improve it on their own. It's 100% open-source and free to use. Check out the GitHub access links in the sidebar.

Web applications often need APIs with server-side logic for applications to communicate with each other. Servers respond to clients' requests sent to the server's API endpoints.

In this article, such endpoints will be written in CSCS. Interpreted CSCS scripts will use ASP.NET's Minimal API for creating endpoints. CSCS APIs are created as functions, so it's in accordance with the CSCS functional paradigm. When endpoints get called, script functions are executed, and responses are returned. Endpoints, defined in the script, will also be able to accept and access the request's headers and body, work with JSON data, and load, fill, and return HTML templates. We'll also use HTMX as a front-end library that needs a server to generate partial HTML code which, once returned, gets swapped into the DOM.

Use ASP.NET Core and CSCS together, because sometimes C# just isn't dynamic enough and sometimes you need an extra scripting language to blame.

– Anonymous

This covers many of the functionalities and gives the opportunity to write APIs inside of the customized scripting language.

Endpoints

In this section, we'll describe how you can create endpoints in CSCS, handle calls to these endpoints, access the request's parameters, and return responses to the client.

A web endpoint is a specific URL or URI that serves as an entry point for interacting with a web application, service, or API.

Creating Endpoints

For opening an endpoint in CSCS, we implemented the CreateEndpoint() CSCS function. Its usage is like that in the following code snippet:

CreateEndpoint( "GET", "/", "getRootHandlerFunction" );

You can check how this CSCS function is implemented in Listing 1.

Listing 1: Implementation of the CreateEndpoint() CSCS Function

class CreateEndpointFunction : ParserFunction
{
    private async Task<Variable> ExecScriptFunctionAsync(
        HttpContext context, string scriptFunctionName, string httpMethod)
    {
        var requestData = new Variable(Variable.VarType.ARRAY);
        requestData.SetHashVariable("HttpMethod", 
            new Variable(context.Request.Method));
        requestData.SetHashVariable("RequestPath", 
            new Variable(context.Request.Path));

        var routeParams = new Variable(Variable.VarType.ARRAY);
        foreach (var (key, value) in context.Request.RouteValues)
        {
            routeParams.SetHashVariable(key, new Variable(value?.ToString()));
        }

        requestData.SetHashVariable("RouteValues", routeParams);

        // Add query parameters
        var queryParams = new Variable(Variable.VarType.ARRAY);
        foreach (var (key, value) in context.Request.Query)
        {
            queryParams.SetHashVariable(key, new Variable(value.ToString()));
        }

        requestData.SetHashVariable("QueryParams", queryParams);

        // Add headers
        var headers = new Variable(Variable.VarType.ARRAY);
        foreach (var (key, value) in context.Request.Headers)
        {
            headers.SetHashVariable(key, new Variable(value.ToString()));
        }

        requestData.SetHashVariable("Headers", headers);

        // Add body
        var body = Variable.EmptyInstance;
        if (context.Request.ContentLength > 0)
        {
            try
            {
                using var reader = new StreamReader(context.Request.Body);
                var bodyContent = await reader.ReadToEndAsync();

                requestData.SetHashVariable("Body", new Variable(bodyContent));
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                requestData.SetHashVariable("Body", Variable.EmptyInstance);
            }
        }
        else
        {
            requestData.SetHashVariable("Body", Variable.EmptyInstance);
        }

        try
        {
            return CSCSWebApplication.Interpreter.Run(
                scriptFunctionName, requestData);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
            return new Variable("Server error.");
        }
    }
}

Here, the first parameter tells us which HTTP method should be used to invoke this endpoint, the second parameter tells us the route (relative web address) of the endpoint, and the third one tells us the name of the CSCS function that will get called when the endpoint gets called. This CSCS function is defined like this:

function getRootHandlerFunction() {
    responseBody = "<html><body><h1>Hello World" +
        " from Web API!</h1><a href='testingPage'>" +
        "Go to testing page</a></body></html>";
    responseHeaders = {"Content-Type": "text/html"};
    statusCode = 200;
    result = { "headers": responseHeaders,
               "body": responseBody,
               "statusCode": statusCode };
    return result;
}

This function builds a string containing simple HTML that will be returned to the client. When returning HTML, the response headers must set Content-Type to "text/html", but you can return any content type; just specify it in the response headers and include the matching body: application/json, text/plain, image/jpeg, etc.

We will go over the details on how to run the sample script later in this article, but you can see in Figure 1 how the main page looks.

Figure 1: Main screen when opening http://localhost:5058

There's a link to the testing page in Figure 1. Here's the CSCS function that implements what happens when the user clicks on the “Go to testing page” link (defined in the previous code snippet):

CreateEndpoint("GET", "/testingPage", "getTestingPageFunction");

function getTestingPageFunction() {
    responseBody = RenderHtml(LoadTemplate(
        ReadConfig("TemplatesDirectory") + "testingPage.html"));
    responseHeaders = {"Content-Type": "text/html"};
    statusCode = 200;
    return Response(responseHeaders, responseBody, statusCode);
}

You can also access the endpoint created above directly by typing http://localhost:5058/testingPage in your browser's address bar.

Returning Responses

In the previous code snippet, simple HTML was returned to the client. The responseHeaders dictionary defines the headers of the response, the responseBody string contains the body of the response, and the statusCode integer is the status code sent to the client.

Another way to create such a dictionary is to use the Response() function. All three of the mentioned variables can be passed to the Response() function that generates a dictionary required by the return statement.

Here's an example:

return Response(responseHeaders, responseBody, statusCode);

Accessing the Request's Headers and Body

Different request's parameters can be accessed in a CSCS script function in the following way:

CreateEndpoint("GET", "/testParameters", "testParameters");

function testParameters(request) {
    Method = request["HttpMethod"];
    RequestPath = request["RequestPath"];
    Headers = request["Headers"];         // dictionary
    QueryParams = request["QueryParams"]; // dictionary
    RouteValues = request["RouteValues"]; // dictionary
    Body = request["Body"];               // string
    // ...
}

As the function parameter, you can type any variable name and use it later to retrieve the request's arguments.

If you need to extract the query parameters, you can do it like in the following CSCS code:

CreateEndpoint("GET", "/listQueryParams", "listQueryParams");

function listQueryParams(request) {
    QueryParams = request["QueryParams"]; // dictionary
    for (key : QueryParams.Keys) {
        Console.Log(key + " = " + QueryParams[key]);
    }
    // ...
}

As the code snippet above shows, you can read any query parameter with a standard dictionary lookup, like QueryParams[key].

But if you need to get the route values, you can access them like in the following CSCS code:

CreateEndpoint("GET", "/listRouteValues/{value1}/{value2}", "listRouteValues");

function listRouteValues(request) {
    RouteValues = request["RouteValues"]; // dictionary
    print("value1=" + RouteValues["value1"]);
    print("value2=" + RouteValues["value2"]);
    // ...
}

Check out Listing 1 to see how this CSCS functionality has been implemented.

As you can see, the RouteValues data structure is a dictionary that contains all the values from the request route, and you can access any of them in the usual way.

Working with HTML Templates

In the first example, you saw an endpoint returning a string built inside of the function, but you can also read HTML from a template file. That file can include placeholders that can be filled with actual data, and if blocks that get included or excluded from the template depending on a condition.

Loading Templates

To simply load and render a template (a static page, without placeholders or if blocks) you can use the following CSCS code of the testingPage endpoint that was defined above:

function getTestingPageFunction() {
    responseBody = RenderHtml(LoadTemplate(
        ReadConfig("TemplatesDirectory") + "testingPage.html"));
    ...
}

Here, you can also see the ReadConfig() CSCS function that's used to retrieve different config entries from the appsettings.json file via a config entry key.

The LoadTemplate() CSCS function loads the HTML file into memory and returns an integer representing a handle to this template. This handle is used by the other templating functions in this article, like RenderHtml() in the code above.

Filling Template Placeholders

There are also a few placeholder replacement functions that should be called between the LoadTemplate() and RenderHtml() calls. They are the following:

  • FillTemplateFromDictionary()
  • FillTemplatePlaceholder()
  • RenderCondition()

FillTemplateFromDictionary() takes the template handle mentioned in the previous section and a dictionary of placeholder values. It finds every placeholder named after a key in the supplied dictionary and replaces it with the corresponding value. Here's an example CSCS script:

htmlHndl = LoadTemplate("path_to\\template.html");
valuesDictionary["companyName"] = "My Company";
valuesDictionary["companyAddress"] = "123 Main St, Anytown, USA";
FillTemplateFromDictionary(htmlHndl, valuesDictionary);

Let's look at how you can implement any CSCS function and register it with the CSCS interpreter. As an example, let's take a look at the implementation of FillTemplateFromDictionary.

This CSCS function is implemented in C# as follows:

class FillTemplateFromDictionaryFunction : ParserFunction
{
    protected override Variable Evaluate(ParsingScript script)
    {
        var args = script.GetFunctionArgs();
        int htmlHndl = Utils.GetSafeInt(args, 0);
        var valuesDict = Utils.GetSafeVariable(args, 1);
        var dictKeys = valuesDict.GetKeys();
        foreach (var key in dictKeys)
        {
            var newValue = valuesDict.GetVariable(key);
            Placeholders.ReplaceAll(htmlHndl, key, newValue.AsString());
        }
        return Variable.EmptyInstance;
    }
}

The following code registers this function with the CSCS Interpreter in the initialization phase:

interpreter.RegisterFunction(
    "FillTemplateFromDictionary",
    new FillTemplateFromDictionaryFunction());

In the same way, you can define any other function in CSCS. The syntax is the following:

interpreter.RegisterFunction( "FunctionName", new FunctionImplementationClass() );

Note that the FunctionImplementationClass must derive from the ParserFunction class.

Take a look at the CSCS repo for details (links in the sidebar).

To add a new CSCS function, follow this pattern:

  1. Implement the functionality in C# by creating a new class, deriving from the ParserFunction class and overriding its Evaluate() method.
  2. Register the newly created class with the interpreter like this: interpreter.RegisterFunction( FuncName, new FuncImplementationClass());

And here is an example of an HTML template:

<!DOCTYPE html>
<html lang="en">
<body>
    companyName: {{companyName}} <br>
    companyAddress: {{companyAddress}}
</body>
</html>

Here you can see that the placeholders are defined with the {{ and }} double curly braces.

You can also fill placeholders one at a time with a FillTemplatePlaceholder() CSCS function. Here is an example:

FillTemplatePlaceholder(htmlHndl, "companyName", "My Company");
FillTemplatePlaceholder(htmlHndl, "companyAddress", "123 Main St, Anytown, USA");

The FillTemplatePlaceholder() CSCS function is implemented similarly and you're encouraged to take a look at its implementation at https://github.com/AurasoftEnzo/cscs_web/blob/main/CSCSWebFunctions.cs.

Keep in mind that CSCS Web is an interpreted scripting language, so the fewer lines of script you execute (i.e., the more work you push into the template functions), the better the performance.

Rendering Conditions

There is also a feature to remove or retain code blocks from the template. These code blocks should be surrounded by {{IF_conditionName}}, like in the following example:

<!DOCTYPE html>
<html lang="en">
<body>
    {{IF_displayBlock}}
    <p>
        Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    </p>
    {{IF_displayBlock}}
</body>
</html>

Then, by using the RenderCondition() function, this “Lorem Ipsum…” block can be removed or retained. Here is how you can do that:

RenderCondition(htmlHndl, "displayBlock", 1 < 2);

Here, the first argument is the handle of the template, the second is the name of the condition in the template, and the third is the actual condition to be evaluated, which decides whether the code block is retained in or removed from the template.

Rendering HTML

Finally, the RenderHtml() function should be called to render the final HTML into a string variable that can then be returned to the client. You can do it as follows:

finalHtmlString = RenderHtml(htmlHndl);

Here, finalHtmlString holds the whole HTML content.
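Putting these pieces together, a minimal endpoint that fills a template and returns it could look like the following sketch. The endpoint path and function name are illustrative, and htmlHndl is assumed to have been obtained as described in the Load/Render template section:

CreateEndpoint("GET", "/company", "getCompanyPage");

function getCompanyPage() {
    // Fill the {{...}} placeholders one at a time
    FillTemplatePlaceholder(htmlHndl, "companyName", "My Company");
    FillTemplatePlaceholder(htmlHndl, "companyAddress", "123 Main St, Anytown, USA");

    // Keep or remove the {{IF_displayBlock}} section
    RenderCondition(htmlHndl, "displayBlock", 1 < 2);

    // Produce the final HTML string and send it to the client
    finalHtmlString = RenderHtml(htmlHndl);
    responseHeaders = {"Content-Type": "text/html"};
    return Response(responseHeaders, finalHtmlString, 200);
}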

You can look at the C# implementation of this function at https://github.com/AurasoftEnzo/cscs_web/blob/main/CSCSWebFunctions.cs

Working with JSON

In CSCS Web, there are also a few functions for working with JSON: one serializes arrays and dictionaries into JSON strings, another deserializes such strings back into arrays and dictionaries, and a third turns two-dimensional SQL results into JSON strings.

(De)Serializing JSON

When working with CSCS Web, you can turn the CSCS arrays and dictionaries into JSON strings. Suppose your data looks like this:

array1 = {};
array1.Add({"key1": "value1", "key2": "value2"});
array1.Add({"key1": "value3", "key2": "value4"});
JSONString = SerializeJson(array1);

Then the JSONString will look as follows:

[ { "key1": "value1", "key2": "value2" }, { "key1": "value3", "key2": "value4" } ]

The other way around: if you have a JSON string like this:

JSONString = '[{"key1": "value1", "key2": "value2"}, {"key1": "value3", "key2": "value4"}]';

You can deserialize it into an array or dictionary like this:

array2 = DeserializeJson(JSONString);
print(array2[1]["key2"]); // prints value4

It probably goes without saying that the CSCS array indices start at 0.

Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.

– Stan Kelly-Bootle
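Since DeserializeJson() returns a regular CSCS array, you can traverse it with the usual loop constructs. Here is a small sketch, using only the functions shown above (variable names are illustrative):

JSONString = '[{"key1": "value1", "key2": "value2"}, {"key1": "value3", "key2": "value4"}]';
array3 = DeserializeJson(JSONString);

// Size() returns the number of elements; indices start at 0
for (i = 0; i < Size(array3); i++) {
    print(array3[i]["key1"] + " / " + array3[i]["key2"]);
}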

Sql2Json

In CSCS, you can also use SQL connectivity, and you will often need to pass SQL results back to the client. The example in the last section shows some SQL setup, connectivity, and usage.

For all SQL queries, CSCS returns a two-dimensional array with a simple structure: the first row contains the table column names, and the remaining rows contain the query data. This two-dimensional array can be converted to a JSON string with the Sql2Json() function, making it very easy to return data from a database to the client browser. Here is an example from the sample application below that does just that:

sqlString = "SELECT * FROM employees"; sqlResult = sqlQuery(sqlString); jsonString = Sql2Json(sqlResult);

After executing the above statements, the contents of the jsonString will be the following:

[ { "id": 1, "name": "John", "surname": "Doe", "age": 30, "address": "123 Main St", "city": "Anytown", "email": "john.doe@example.com" }, { "id": 2, "name": "Jane", "surname": "Smith", "age": 25, "address": "456 Elm St", "city": "Othertown", "email": "jane.smith@example.com" }, … ]

SSR and CSR Paradigm

With CSCS Web, you can accomplish both SSR (server-side rendering) and CSR (client-side rendering). Pure HTML and CSS are usually not enough for an interactive application, so we have two options.

The first is to enrich the HTML with hypermedia-driven code. The second is to use more JavaScript on the client side. Which one you use depends on the type of application.

SSR offers simplicity of coding, especially when there's a lot of interaction with the database. If you want an SSR application and also want to avoid a lot of JavaScript code, you can use the techniques offered by a hypermedia library like HTMX.

To demonstrate SSR, we'll use HTMX, a lightweight JavaScript library that supercharges HTML by letting you use modern web features like AJAX, CSS transitions, WebSockets, and Server-Sent Events directly in your markup using custom attributes.
The front end issues requests to the API and, when the server responds, HTMX modifies the DOM with the data received from the server.

Template + HTMX (SSR)

We have an HTML template file on the server that will be sent to the client. This HTML file looks as follows:

<!DOCTYPE html> <html lang="en"> <head> <script src="<a href="https://unpkg.com/htmx.org%401.9.10" rel="nofollow">https://unpkg.com/htmx.org@1.9.10</a>"></script> </head> <body> <div hx-get="/getPartialHtml" hx-swap="outerHTML" hx-trigger="load"></div> </body> </html>

We can return this HTML in a way described in the Load/Render template section. Once the client gets this HTML and the page loads, the HTMX executes a request to /getPartialHtml, which is described as follows:

CreateEndpoint("GET", "/getPartialHtml", "getPartialHtml"); function getPartialHtml() { responseHeaders = {"Content-Type": "text/html"}; responseBody = "<h1>Hello World!</h1>"; return Response(responseHeaders, responseBody, 200); }

When the response reaches the client, HTMX swaps the outerHTML of the div element with the received HTML code. We will use this technique in the example section below.

Template + JavaScript (CSR)

With the use of JavaScript, we can make the page client-side rendered. The template file is the following one:

<!DOCTYPE html> <html lang="en"> <body> <script src="/csr.js"></script> </body> </html>

The corresponding JavaScript file is the following:

document.body.innerHTML = "<h1>Hello world!</h1>";

The JavaScript replaces the innerHTML of the body as soon as the script runs. It will all become clearer in the sample application that we discuss next.

Sample Application (Employees List)

The sample code for this application can be found in the CSCS Web GitHub repository (see links in the sidebar). The sample's entry point is: https://github.com/AurasoftEnzo/cscs_web/blob/main/wwwroot/scripts/article2/all.cscs

To be able to start the sample project, you will probably have to change the CSCSConfig section settings in the appsettings.json file. The default version is at https://github.com/AurasoftEnzo/cscs_web/blob/main/appsettings.json

And here is an example of the version to be used if your working environment is macOS:

{ "CSCSConfig": { "SQLConnectionString": "Data Source=localhost,1433; Database=T__DATAX_Y4__BY4; User Id=sa; password=Aura2025; TrustServerCertificate=True;", "ScriptsDirectory": "/Users/vass/GitHub/CSCSweb/wwwroot/scripts/", "TemplatesDirectory": "/Users/vass/GitHub/CSCSweb/wwwroot/template/", "StaticFilesDirectory": "/Users/vass/GitHub/CSCSweb/wwwroot/Static/", "StartScript": "article2/all.cscs" } }

The first step to get this sample working is setting up SQL Server. You can do it on a Mac as well—the easiest way is to run it inside a Docker container. Here's one of the possibilities for setting it up: https://devblogs.microsoft.com/azure-sql/development-with-sql-in-containers-on-macos/

Next, let's look at a few snippets.

The body of the HTML template without CSS and JavaScript code is the following:

<div class="mb-4 z-50 flex justify-between items-center bg-base-100 p-3 rounded-lg shadow-lg fixed-header"> <div class="flex items-center gap-4"> <h1 class="text-xl font-bold">Employee list</h1> <button class='btn btn-accent btn-sm' hx-get='/employees/new' hx-target='.datagrid-container-master' hx-swap='outerHTML'> Add New Employee </button> </div> </div> <div class="master-detail-layout"> <div class="master-table"> <div class="datagrid-container-master" hx-get="/employees" hx-trigger="load" hx-swap="innerHTML"></div> </div> </div>

The CSCS code for the endpoint that returns a list of the employees is shown in Listing 2.

Listing 2: Sample Code of the getEmployees() CSCS function

CreateEndpoint("GET", "/employees", "getEmployees");

function getEmployees(request) {
    // Pagination
    employees_page = 1;
    if (Contains(request["QueryParams"], "page")) {
        employees_page = int(request["QueryParams"]["page"]);
    }
    if (employees_page < 1) {
        getEmployees_responseHeaders = {"Content-Type": "text/html"};
        return Response(getEmployees_responseHeaders,
            "Error: 'page' must be a positive integer", 200);
    }
    
    // Sorting
    employees_sort = "id";
    if (Contains(request["QueryParams"], "sort")) {
        employees_sort = request["QueryParams"]["sort"];
    }
    employees_order = "asc";
    if (Contains(request["QueryParams"], "order")) {
        employees_order = request["QueryParams"]["order"];
    }
    
    // Fixed page size
    employees_pageSize = 10;
    employees_skip = (employees_page - 1) * employees_pageSize;
    
    // Alert after html
    employees_alert = "";
    if (Contains(request["QueryParams"], "alertText")) {
        employees_alert = request["QueryParams"]["alertText"];
    }
    
    employees_query = "SELECT " +
        "id, name, surname, age, address, city, email " +
        "FROM employees " +
        "ORDER BY " + employees_sort + " " + employees_order + " " +
        "OFFSET @skip ROWS FETCH NEXT @pageSize ROWS ONLY";
    
    employees_sqlParams = {};
    employees_sqlParams.Add({"@skip", employees_skip});
    employees_sqlParams.Add({"@pageSize", employees_pageSize});
    
    employees_records = sqlQuery(employees_query,
        employees_sqlParams);
    
    employees_countQuery = "SELECT COUNT(*) FROM employees";
    employees_countResult = sqlQuery(employees_countQuery);
    employees_totalRecords = employees_countResult[1][0];
    employees_totalPages = Math.Ceil(employees_totalRecords /
        employees_pageSize);
    
    // Build HTML
    // Column list assumed from the SELECT statement above; it is used
    // by the sorting links below but not defined elsewhere in the listing
    employees_columns = ["id", "name", "surname", "age",
        "address", "city", "email"];
    employees_html = "<div class='datagrid-container-master'>";
    employees_html += "<table class='datagrid-table'>";
    employees_html += "<thead>";
    employees_html += "<tr>";
    
    employees_newOrder = "asc";
    if (employees_sort == "id" && employees_order == "asc") {
        employees_newOrder = "desc";
    }
    
    employees_html += "<th><a class='link' hx-get=" +
        "'/employees?page=1&sort=id&order=" + employees_newOrder +
        "' hx-target='.datagrid-container-master' " +
        "hx-swap='outerHTML'>Id</a></th>";
    
    for(employees_i = 1; employees_i < Size(employees_columns);
        employees_i++) {
        employees_column = employees_columns[employees_i];
        
        employees_newOrder = "asc";
        if (employees_sort == employees_column &&
            employees_order == "asc") {
            employees_newOrder = "desc";
        }
        
        employees_html += "<th><a class='link' " +
            "hx-get='/employees?page=1&sort=" + employees_column +
            "&order=" + employees_newOrder +
            "' hx-target='.datagrid-container-master' " +
            "hx-swap='outerHTML'>" + employees_column +
            "</a></th>";
    }
    
    employees_column = "Actions";
    employees_html += "<th>" + employees_column + "</th>";
    employees_html += "</tr>";
    employees_html += "</thead>";
    employees_html += "<tbody>";
    
    if(employees_records != null && Size(employees_records) > 1) {
        for(employees_i = 1; employees_i < Size(employees_records);
            employees_i++) {
            employees_row = employees_records[employees_i];
            
            if (employees_i % 2 == 0) {
                employees_html += "<tr data-id='" + employees_row[0] +
                    "' class='bg-base-200 hover:bg-base-300 cursor-pointer'>";
            } else {
                employees_html += "<tr data-id='" + employees_row[0] +
                    "' class='hover:bg-base-300 cursor-pointer'>";
            }
            
            employees_html += "<td data-field='id'>" + employees_row[0] +
                "</td>";
            employees_html += "<td data-field='name' " +
                "class='text-center'>" + employees_row[1] + "</td>";
            employees_html += "<td data-field='surname' " +
                "class='text-center'>" + employees_row[2] + "</td>";
            employees_html += "<td data-field='age' " +
                "class='text-left'>" + employees_row[3] + "</td>";
            employees_html += "<td data-field='address' " +
                "class='text-center'>" + employees_row[4] + "</td>";
            employees_html += "<td data-field='city' " +
                "class='text-center'>" + employees_row[5] + "</td>";
            employees_html += "<td data-field='email' " +
                "class='text-center'>" + employees_row[6] + "</td>";
            employees_html += "<td class='flex gap-1'>";
            employees_html += "<button class='btn btn-info btn-xs " +
                "action-button' hx-get='/employees/" + employees_row[0] +
                "/edit' hx-target='.datagrid-container-master' " +
                "hx-swap='outerHTML'>Edit</button>";
            employees_html += "<button class='btn btn-error btn-xs " +
                "action-button' onclick='confirmDeleteEmployee(" +
                employees_row[0] + ")'>Delete</button>";
            employees_html += "</td>";
            employees_html += "</tr>";
        }
    }
    
    employees_html += "</tbody>";
    employees_html += "</table>";
    
    // Pagination
    employees_html += "<div class='pagination'>";
    if(employees_page > 1) {
        employees_html += "<a class='btn btn-sm' hx-" +
            "get='/employees?page=" + (employees_page - 1) + "&sort=" +
            employees_sort + "&order=" + employees_order +
            "' hx-target='.datagrid-container-master' hx-" +
            "swap='outerHTML'>Previous</a>";
    }
    
    employees_html += "<span class='page-info'>Page " +
        employees_page + " of " + employees_totalPages + "</span>";
    
    if(employees_page < employees_totalPages) {
        employees_html += "<a class='btn btn-sm' hx-" +
            "get='/employees?page=" + (employees_page + 1) + "&sort=" +
            employees_sort + "&order=" + employees_order +
            "' hx-target='.datagrid-container-master' " +
            "hx-swap='outerHTML'>Next</a>";
    }
    
    employees_html += "</div>";
    employees_html += "</div>";
    employees_html += employees_alert;
    
    getEmployees_responseHeaders_2 = {"Content-Type":"text/html"};
    return Response(getEmployees_responseHeaders_2,
        employees_html, 200);
}

Once you start the application, there are a few different options for testing CSCS with web applications. See Figure 2, which shows the contents of the https://github.com/AurasoftEnzo/cscs_web/blob/main/wwwroot/templates/testingPage.html page.

Figure 2: Different options to test CSCS WEB

The main example, and the subject of this article, is the last one, the Employee List. Once you click on it, you will get a view as in Figure 3.

Figure 3: The browser view of the list of employees

If you click on the “Add New Employee” button, you'll get a new page, as in Figure 4. You will also get a similar view if you click on the Edit button for any user.

Figure 4: Adding a new user

Once you fill out all the details and click on the Save button, you'll get the confirmation shown in Figure 5.

Figure 5: A pop-up message when a new user was added

Pretty cool, right? But the coolest part is that everything has been created using the CSCS Web scripting language. Look at Listing 2 to see how to get the list of all employees in CSCS, and at Listing 3 to check out how adding a new user is implemented.

Listing 3: Sample Code of the createEmployee() CSCS function

CreateEndpoint("POST", "/employees", "createEmployee");

function createEmployee(args) {
    create_fields = [
        "Name", "Surname", "Age", "Address", "City", "Email"
    ];
    
    create_values = {};
    for(create_i = 0; create_i < Size(create_fields); create_i++) {
        create_field = create_fields[create_i];
        create_values[create_i] = GetValueFromForm(args["Body"],
            create_field);
    }
    
    create_values2 = {};
    create_values2["Name"] = GetValueFromForm(args["Body"], "Name");
    create_values2["Surname"] = GetValueFromForm(args["Body"],
        "Surname");
    create_values2["Age"] = GetValueFromForm(args["Body"], "Age");
    create_values2["Address"] = GetValueFromForm(args["Body"],
        "Address");
    create_values2["City"] = GetValueFromForm(args["Body"], "City");
    create_values2["Email"] = GetValueFromForm(args["Body"], "Email");
    
    if (!IsInt(create_values2["age"])) {
        create_headers = {"Content-Type": "text/html"};
        create_employees_alert = "<script>Swal.fire({title: " +
            "'ERROR!', text: 'Age must be an integer.', icon: 'error'," +
            " confirmButtonText: 'Close'});</script>";
        
        create_employees_validationError_html =
            getNewEmployeeFormWithValues(create_values2,
                create_employees_alert);
        
        return Response(create_headers,
            create_employees_validationError_html, 200);
    }
    
    create_query = "INSERT INTO employees (";
    create_first = true;
    for(create_i = 0; create_i < Size(create_fields); create_i++) {
        if (!create_first) {
            create_query += ", ";
        }
        create_query += create_fields[create_i];
        create_first = false;
    }
    
    create_query += ") VALUES (";
    create_first = true;
    for(create_i = 0; create_i < Size(create_fields); create_i++) {
        if (!create_first) {
            create_query += ", ";
        }
        
        create_query += "@" + create_fields[create_i];
        create_first = false;
    }
    create_query += ")";
    
    create_sqlParams = {};
    for(create_i = 0; create_i < Size(create_fields); create_i++) {
        create_sqlParams.Add({"@" + create_fields[create_i],
            create_values[create_i]});
    }
    
    sqlNonQuery(create_query, create_sqlParams);
    
    create_args2 = {};
    create_args2["QueryParams"] = {"page": 1};
    
    create_employees_alert = "<script>Swal.fire({title:" +
        "'Successfully added!', text: 'Added employee " +
        create_values2["name"] + " " + create_values2["surname"] +
        ".', icon: 'success', confirmButtonText: 'Close'});</script>";
    
    create_args2["QueryParams"] = {"alertText":
        create_employees_alert};
    
    return getEmployees(create_args2);
}

Wrapping Up

In this article, you saw how to use the CSCS scripting language in ASP.NET Core. The main advantage is that you can add any functionality to your project without recompilation, because all the CSCS scripts are loaded at runtime. Of course, this can negatively affect performance, so make sure that no complex mathematical computations are done in CSCS; keep the scripts to mostly GUI-related code.

ASP.NET Core with CSCS? That's a Swiss Army knife—if the knife occasionally compiles itself.

– Anonymous

In this article, we've been talking mostly about just one example, a list of employees. There are many more interesting examples; see Figure 2 for the full list.

All of these examples are implemented in the https://github.com/AurasoftEnzo/cscs_web/blob/main/wwwroot/scripts/article2/all.cscs file.

We're looking forward to your feedback: specifically, your experience in applications where you're using CSCS Web, and what other features in CSCS you would like to see.


What’s next in AI: 7 trends to watch in 2026


AI is entering a new phase, one defined by real-world impact.

After several years of experimentation, 2026 is shaping up to be the year AI evolves from instrument to partner, transforming how we work, create and solve problems. Across industries, AI is moving beyond answering questions to collaborating with people and amplifying their expertise.

This transformation is visible everywhere. In medicine, AI is helping close gaps in care. In software development, it’s learning not just code but the context behind it. In scientific research, it’s becoming a true lab assistant. In quantum computing, new hybrid approaches are heralding breakthroughs once thought impossible.

As AI agents become digital colleagues and take on specific tasks at human direction, organizations are strengthening security to keep pace with new risks. The infrastructure powering these advances is also maturing, with smarter, more efficient systems.

These seven trends to watch in 2026 show what’s possible when people join forces with AI.

Aparna Chennapragada, Microsoft’s chief product officer for AI experiences, sees 2026 as a new era for alliances between technology and people. If recent years were about AI answering questions and reasoning through problems, the next wave will be about true collaboration, Chennapragada says.

“The future isn’t about replacing humans,” she says. “It’s about amplifying them.”

AI agents are set to become digital coworkers, she says, helping individuals and small teams punch above their weight. Chennapragada envisions a workplace where a three-person team can launch a global campaign in days, with AI handling data crunching, content generation and personalization while humans steer strategy and creativity. She predicts organizations that design for people to learn and work with AI “will get the best of both worlds,” helping teams tackle bigger creative challenges and deliver results faster.

Her advice for professionals: Don’t compete with AI, but focus on learning how to work alongside it. The coming year, she says, “belongs to those who elevate the human role, not eliminate it.”

AI agents will proliferate in 2026 and play a bigger role in daily work, acting more like teammates than tools, says Vasu Jakkal, corporate vice president of Microsoft Security. As organizations rely on these agents to help with tasks and decision-making, building trust in them will be essential, Jakkal says — starting with security.

“Every agent should have similar security protections as humans,” she says, “to ensure agents don’t turn into ‘double agents’ carrying unchecked risk.”

That means giving each agent a clear identity, limiting what information and systems it can access, managing the data it creates and protecting it from attackers and threats, Jakkal says. Security will become ambient, autonomous and built-in, she says, not something added on later. In addition, as attackers use AI in new ways, defenders will use security agents to spot those threats and respond faster, she says.

“Trust is the currency of innovation,” Jakkal says, making these shifts vital to helping organizations keep up with new risks as AI continues to become more central to how work gets done.

AI in healthcare is marking a turning point, says Dr. Dominic King, vice president of health at Microsoft AI.

“We’ll see evidence of AI moving beyond expertise in diagnostics and extending into areas like symptom triage and treatment planning,” King says. “Importantly, progress will start to move from research settings into the real world, with new generative AI products and services available to millions of consumers and patients.”

That shift matters because access to care is a global crisis. The World Health Organization projects a shortage of 11 million health workers by 2030 — a gap that leaves 4.5 billion people without essential health services.

King points to achievements demonstrated in 2025 by Microsoft AI’s Diagnostic Orchestrator (MAI-DxO), which solved complex medical cases with 85.5% accuracy, far above the 20% average for experienced physicians. With Copilot and Bing already answering more than 50 million health questions daily, he sees advances in AI as a way to give people more influence and control over their own health and wellbeing.

 AI is already speeding up breakthroughs in fields like climate modeling, molecular dynamics and materials design, says Peter Lee, president of Microsoft Research. But the next leap is coming. In 2026, AI won’t just summarize papers, answer questions and write reports — it will actively join the process of discovery in physics, chemistry and biology.

“AI will generate hypotheses, use tools and apps that control scientific experiments, and collaborate with both human and AI research colleagues,” Lee says.

This shift is creating a world where every research scientist soon could have an AI lab assistant that can suggest new experiments and even run parts of them. That’s the logical next step, Lee says, building on how AI works alongside developers with “pair programming,” for example, and uses apps to automate everyday tasks like shopping and scheduling in other domains.

It’s a transformation that promises to accelerate research and change how scientific discoveries are made, he says.

 AI’s growth isn’t just about building more and bigger datacenters anymore, says Mark Russinovich, chief technology officer, deputy chief information security officer and technical fellow for Microsoft Azure. The next wave is about making every ounce of computing power count.

“The most effective AI infrastructure will pack computing power more densely across distributed networks,” Russinovich says. Next year will see the rise of flexible, global AI systems — a new generation of linked AI “superfactories” — that will drive down costs and improve efficiency, he says.

AI will be “measured by the quality of intelligence it produces, not just its sheer size,” he says.

Think of it like air traffic control for AI workloads: Computing power will be packed more densely and routed dynamically so nothing sits idle. If one job slows, another moves in instantly — ensuring every cycle and watt is put to work. This shift will translate into smarter, more sustainable and more adaptable infrastructure to power AI innovations on a global scale, Russinovich says.

Software development is exploding, with activity on GitHub reaching new levels in 2025. Each month, developers merged 43 million pull requests — a 23% increase from the prior year in one of the main ways teams propose and review changes to their code. The annual number of commits pushed, which track those changes, jumped 25% year-over-year to 1 billion. The unprecedented pace signals a major shift in the industry as AI becomes increasingly central to how software is built and improved.

Mario Rodriguez, GitHub’s chief product officer, says that sheer volume is why 2026 will bring a new edge: “repository intelligence.” In plain terms, it means AI that understands not just lines of code but the relationships and history behind them.

By analyzing patterns in code repositories — the central hubs where teams store and organize everything they build — AI can figure out what changed, why and how pieces fit together. That context helps it make smarter suggestions, catch errors earlier and even automate routine fixes. The result will be higher quality software that helps developers move faster, Rodriguez says.

“It’s clear we’re at an inflection point,” he says. Repository intelligence “will become a competitive advantage by providing the structure and context for smarter, more reliable AI.”

Quantum computing has long felt like science fiction. But researchers are entering a “years, not decades” era where quantum machines will start tackling problems classical computers can’t, says Jason Zander, executive vice president of Microsoft Discovery and Quantum. That looming breakthrough, called quantum advantage, could help solve society’s toughest challenges, Zander says.

What’s different now is the rise of hybrid computing, where quantum works alongside AI and supercomputers. AI finds patterns in data. Supercomputers run massive simulations. And quantum adds a new layer that will drive far greater accuracy for modeling molecules and materials, he says. This progress coincides with advances in logical qubits, which are physical quantum bits grouped together so they can detect and correct errors and compute — a critical step toward reliability.

Microsoft’s Majorana 1 marks a major development toward more robust quantum systems, Zander says. It’s the first quantum chip built using topological qubits, a design that inherently makes fragile qubits more stable and reliable. It’s also the only quantum solution engineered to catch and correct errors. That architecture paves the way for machines with millions of qubits on a single chip, providing the processing power needed for complex scientific and industrial problems.

“Quantum advantage will drive breakthroughs in materials, medicine and more,” Zander says. “The future of AI and science won’t just be faster, it will be fundamentally redefined.”

Lead image created by Kathy Oneha / We. Communications. Illustrations produced with Create in Microsoft 365 Copilot. Story published on Dec. 8, 2025.



Learnings from 100K Lines of Rust with AI


In the past few months, I’ve been stress-testing how far AI coding agents can take us when building real, production-grade distributed systems.

The result: a Rust-based multi-Paxos consensus engine that not only implements all the features of Azure’s Replicated State Library (RSL) [1] — which underpins most major Azure services — but also modernizes it for today’s hardware.

The entire project took me ~3 months, with 100K lines of Rust code written in ~4 weeks and performance optimization from 23K operations/sec to 300K ops/sec achieved in ~3 weeks.

Besides unprecedented productivity, I discovered several techniques that were instrumental. This post shares my most valuable learnings on: ensuring correctness with code contracts, applying lightweight spec-driven development, and pursuing aggressive performance optimization — plus my wish list for the future of AI-assisted coding.

Why Modernize RSL?

Azure’s RSL implements the multi-Paxos consensus protocol and forms the backbone of replication in many Azure services. However, RSL was written more than a decade ago. While robust, it hasn’t evolved to match modern hardware and workloads.

Three key gaps motivated this project:

  1. No pipelining: When a vote is in flight, new requests must wait, inflating latency.
  2. No NVM support: Non-volatile memory is now common in Azure datacenters and can drastically reduce commit time.
  3. Limited hardware awareness: RSL wasn’t built to leverage RDMA, which is now pervasive in Azure data centers.

Removing these limitations could unlock significantly lower latency and higher throughput — critical for modern cloud workloads and AI-driven services.

Given my interest in Rust and AI-accelerated development, I set out to build a modern RSL equivalent from scratch.

Massive Productivity Boost

In roughly six weeks, I drove AI agents to implement over 130K lines of Rust code covering the full feature set of RSL, including multi-Paxos, leader election, log replication, snapshotting, and configuration changes.

I utilized many available AI coding agents: GitHub Copilot, Claude Code, Codex, Augment Code, Kiro, and Trae. My workflow evolved quickly, but today my main drivers are Claude Code and Codex CLI, with VS Code handling diffs and minor edits.

I’ve found that coding from the CLI creates a perfect asynchronous flow that maximizes my productivity. I also discovered a simple psychological trick:

I pay $100/month for Anthropic’s max plan. This became a forcing function — if I don’t kick off a coding task with Claude before bed, I feel like I’m wasting money.

When Codex CLI arrived, I added a second ChatGPT Plus subscription to handle rate limits — one subscription for Monday–Wednesday, the other for Thursday–Sunday.

Code Contracts — By AI, For AI

The question I get most often is: How can AI possibly implement something as complex as Paxos correctly?

Testing is the first layer of defense. My system now includes 1,300+ tests — from unit tests to minimal integration tests (e.g., proposer + acceptor only), all the way to multi-replica full integration tests with injected failures. See the project status.

But the real breakthrough came from AI-driven code contracts.

Code contracts specify preconditions, postconditions, and invariants for critical functions. These contracts are converted into runtime asserts during testing but can be disabled in production builds for performance. While I started using this approach long ago with .NET [2], AI has made contracts vastly more powerful.
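The project itself is in Rust, but the pattern is language-agnostic. Here is a minimal Python sketch of the idea; the phase-2a handler body, field names, and environment-variable toggle are all invented for illustration (the real process_2a carries 16 contracts):

```python
import os

# Contracts are active during testing but can be disabled in production
# builds for performance (here: a cheap check on an env var).
CONTRACTS_ENABLED = os.environ.get("ENABLE_CONTRACTS", "1") == "1"

def require(condition, message):
    """Precondition/postcondition check, enforced only when contracts are on."""
    if CONTRACTS_ENABLED and not condition:
        raise AssertionError(f"contract violated: {message}")

def process_2a(acceptor, ballot, slot, value):
    """Handle a Paxos phase-2a message (illustrative skeleton only)."""
    # Precondition: never accept a value under a ballot older than our promise.
    require(ballot >= acceptor["promised"], "ballot must be >= promised ballot")

    acceptor["promised"] = ballot
    acceptor["accepted"][slot] = (ballot, value)

    # Postcondition: the accepted entry must record exactly this ballot.
    require(acceptor["accepted"][slot][0] == ballot,
            "accepted ballot must equal the incoming ballot")
    return ("2b", ballot, slot)
```

During testing, a violated contract raises immediately at the faulty call site rather than surfacing later as silent state corruption.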

Here’s how I apply them at three levels:

1. Ask AI to write contracts. Opus 4.1 writes good contracts, but GPT-5 High writes excellent ones. I focus on reviewing and refining. For example, the process_2a method (handling phase 2a messages in Paxos) has 16 contracts, including this one:

2. Generate tests from contracts. Once contracts are defined, I ask AI to create targeted test cases for each post-condition. It excels at this, generating meaningful edge cases automatically.

3. Property-based tests for contracts. This is my favorite. AI translates contracts into property-based tests, exploring a vast space of randomized inputs. Any contract violation triggers a panic, exposing deep bugs early.
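In Rust this is typically done with a property-testing crate such as proptest; the same idea in miniature, hand-rolled in Python against a hypothetical promise-monotonicity contract:

```python
import random

def on_prepare(state, ballot):
    """Phase-1a handling: promise a ballot only if it is at least as new."""
    if ballot >= state["promised"]:
        state["promised"] = ballot
    return state

def check_invariant(history):
    """Contract: the promised ballot is monotonically non-decreasing."""
    for earlier, later in zip(history, history[1:]):
        assert later >= earlier, f"promise went backwards: {earlier} -> {later}"

# Explore many randomized input sequences; any violation panics immediately.
random.seed(0)
for _ in range(1000):
    state = {"promised": 0}
    history = [0]
    for _ in range(20):
        on_prepare(state, random.randint(0, 100))
        history.append(state["promised"])
    check_invariant(history)
```

A real property-based framework adds input shrinking on failure, but the core loop is just this: random inputs, checked contracts, fail loudly.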

For instance, one AI-generated contract found a subtle Paxos safety violation:

That single contract saved what could have been a serious replication consistency issue — well before it ever hits production.

Lightweight Spec-Driven Development

I’ve tried various Spec-Driven Development (SDD) tools. In fact, the earlier components (such as leader election, proposer, acceptor, and learner) were all implemented following a rigid SDD approach. I would start with a requirement markdown, turn it into a design markdown, and then a task list markdown. However, I gradually found this process too rigid; making changes along the way and ensuring all documents remained consistent became a headache.

I’ve now switched to a more lightweight approach. When I work on a feature (e.g., snapshotting), I use /specify from spec kit [3] to generate a spec markdown. This spec consists of a few user stories and acceptance criteria.

Here is an example user story for snapshotting:

I then use /clarify to ask the AI to self-critique and improve the user stories and criteria. I also ask it to suggest additional user stories not covered in the initial spec. I spend the majority of my time here.

Once satisfied, I jump into plan mode and ask AI to generate a plan for a specific user story. Given the capability of today’s AI coding agents, a single user story feels like the “sweet spot” unit of work they can manage effectively. Along the way, we may discover additions or tweaks, which are easy to handle in the same coding session (agents might compress context, but I typically don’t worry about losing too much).

Here is a sample /clarify interaction for configuration changes:

● Question 4: Starting Slot Determination

  How should the starting slot for the new configuration be determined relative to the ending slot?

  Recommended: Option A - Always exactly ending_slot + 1

  This ensures no gaps in the slot sequence and maintains continuity between configurations, preventing any slots from being orphaned or unassigned.

  | Option | Description                                                    |
  |--------|----------------------------------------------------------------|
  | A      | Always exactly ending_slot + 1                                 |
  | B      | Allow operator to specify any value > ending_slot              |
  | C      | Next multiple of 1000 after ending_slot for cleaner boundaries |
  | D      | Same as ending_slot (both configs share final slot)            |

  You can reply with the option letter (e.g., "A"), accept the recommendation by saying "yes" or "recommended", or provide your own short answer.

Aggressive Performance Optimization

Performance optimization is where AI really shines. After ensuring initial correctness, I spent about three weeks purely on throughput tuning — and AI became my co-pilot in performance engineering.

Through iterative cycles, we boosted throughput from ~23K ops/sec to ~300K ops/sec on a single laptop. Here’s the loop I followed repeatedly:

  1. Ask AI to instrument latency metrics across all code paths.
  2. Run performance tests and output trace logs.
  3. Let AI analyze latency breakdowns (it writes Python scripts to calculate quantiles and identify bottlenecks).
  4. Ask AI to propose optimizations, implement one, re-measure, and repeat.
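As a sketch of step 3, this is the kind of quantile script the AI would produce; the trace-line format and stage names here are invented for illustration:

```python
import re
from collections import defaultdict

# Hypothetical trace format: "stage=<name> latency_us=<int>"
LINE = re.compile(r"stage=(\w+) latency_us=(\d+)")

def quantile(sorted_vals, q):
    """Nearest-rank quantile over a pre-sorted list."""
    idx = min(len(sorted_vals) - 1, int(q * len(sorted_vals)))
    return sorted_vals[idx]

def analyze(log_lines):
    """Group latencies by pipeline stage and report p50/p95/p99 for each."""
    by_stage = defaultdict(list)
    for line in log_lines:
        m = LINE.search(line)
        if m:
            by_stage[m.group(1)].append(int(m.group(2)))
    report = {}
    for stage, vals in by_stage.items():
        vals.sort()
        report[stage] = {q: quantile(vals, q) for q in (0.5, 0.95, 0.99)}
    return report
```

The stage whose p99 dominates end-to-end latency is the next optimization target.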

This process surfaced insights I might have missed — for example, lock contention on async paths, redundant memory copies, and unnecessary task spawns.

Rust’s safety model made it easy to push these optimizations confidently. Key gains came from minimizing allocations, applying zero-copy techniques, avoiding locks, and selectively removing async overhead. Each improvement felt like peeling another layer of latency off a high-performance engine — without fear of corrupting memory.

Wish List for AI-Assisted Coding

Reflecting on my journey, I keep wondering where AI could deliver even more value. Here are some items on my wish list:

End-to-End User Story Execution: I still prefer to define the user stories myself. As an architect, I feel I have a better sense of what I’m building and how I’d like to build it. However, the delivery of a perfect execution is something I believe AI can handle increasingly well. Today, I still have to spend a fair amount of time steering the AI — telling it to continue when it pauses, suggesting refactoring, reviewing test coverage, and suggesting additional tests. I would prefer the AI take more autonomy to drive this end-to-end.

Automated Contract Workflows: The flow of applying contracts seems largely automatable. While I’d still want to review the contracts and offer suggestions, I’d like the AI to drive the rest: generating tests based on contracts, debugging individual test cases, ensuring consistency between tests and contracts, and writing property-based tests. When a test fails, I’d like the AI to debug and fix trivial issues automatically, only notifying me when there are genuine correctness issues in the contracts or the implementation.

Autonomous Performance Optimization: Performance tuning seems ripe for more automation. Much of what I’ve done is repetitive and parallelizable. Projects like AlphaEvolve (or OpenEvolve) show promise in this direction. Ideally, I would suggest potential optimization avenues, and the AI would execute the experiments completely by itself. While current tools handle small bodies of code, applying similar techniques to larger codebases with end-to-end measurement seems feasible.

Appendix: Project Status

The seed of the project is an elegant design markdown authored by Jay Lorch [4] from Microsoft Research. This design greatly simplifies all the components in multi-Paxos, making it easier to implement and reason about.

So far, 2 out of the 3 RSL limitations have been addressed: pipelining and NVM support (Jay integrated the fully verified persistence log for NVM which was published in the PoWER Never Corrupts paper [5] at OSDI 2025). The RDMA support is still TBD.

To date, the project has grown to over 130K lines of Rust code, with 1,300+ tests accounting for more than 65% of the codebase.

Read the whole story
alvinashcraft
2 hours ago
reply
Pennsylvania, USA
Share this story
Delete

The Unexpected Effectiveness of One-Shot Decompilation with Claude


Recently, I’ve been experimenting with ‘one-shot’ decompilation, leveraging Claude’s headless mode in a continuous loop.1 The results have been surprisingly positive. In the three weeks since adopting this workflow, I’ve made more progress on Snowboard Kids 2 than in the preceding three months.2

chart showing recent decompilation progress

‘One-shot’ in this context means that Claude follows the prompt and exits. You hand it a function; it tries to match it, and you move on. The lack of a human-piloted feedback loop allows for significantly more throughput. I’ve left Claude for 8+ hours unattended and it will happily process functions trying to find matches. It does come with some risk, however. We’ve all seen LLMs go off the rails. Without a human present to intervene, you may return hours later to find your Claude quota exhausted, with little progress to show for it. But with the right scaffolding, the risk becomes manageable.

The purpose of this post is to document the workflow I’ve arrived at, along with a few lessons that might apply to your own projects.

The Workflow

As a user, I just run:

The script (originally called ‘vacuum’ for its intended purpose of hoovering up simple functions) handles everything from there. It will churn through matchable functions until none remain, either because each has been matched or has been marked as too difficult.

diagram showing decompilation workflow

Under the hood, the system has four components:

  1. The scorer picks the next function to attempt, prioritising those most likely to match;
  2. Claude performs the actual decompilation using the provided tools;
  3. Tools give Claude what it needs to decompile the function;
  4. The driver manages the lifecycle: invoking Claude, handling failures, logging progress.

The following sections cover each in detail.

The Scorer

The purpose of the scorer is to find the next easiest function for Claude to decompile. Claude is less capable than a human at this task so it would be most efficient to spend our time and energy on areas where we’re likely to make meaningful progress. This also helps lay the foundation for decompiling more complex functions, which tend to call simpler ones; understanding those dependencies makes the larger routines easier to reason about.3

Early on, I used a hand-waved formula:

score = instruction_count + 3 * branch_count + 2 * jump_count + 2 * label_count + stack_size

The idea is that a function’s complexity, and therefore decompilation difficulty, will largely be determined by its instruction count. Control flow constructs (branches and labels) are likely to increase difficulty, as are other function calls (jumps). Managing a large stack might also be tricky so it is thrown in for good measure.
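Mirroring the hand-waved formula, the scorer is a few lines; the field names are assumptions of this sketch:

```python
def score(fn):
    """Heuristic difficulty score: lower scores get attempted first."""
    return (fn["instruction_count"]
            + 3 * fn["branch_count"]
            + 2 * fn["jump_count"]
            + 2 * fn["label_count"]
            + fn["stack_size"])

def next_function(candidates):
    """Pick the easiest remaining function for Claude to try."""
    return min(candidates, key=score)
```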

This scoring approach worked reasonably well at first. Once I had logged a few hundred matched and failed functions, I switched to a logistic regression model to tune the initial weights.
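The post doesn't say which tooling was used for the regression; here is a minimal pure-Python sketch of fitting match probability from logged outcomes (1 = matched, 0 = too difficult), with synthetic data in the test:

```python
import math

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by gradient descent on log loss.
    X rows are feature vectors (e.g. instruction/branch counts);
    y labels are 1 = matched, 0 = too difficult."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b
```

The fitted weights replace the guessed coefficients in the score formula, and retraining as new matched/failed data arrives keeps the priority queue honest.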

diagram showing decompilation workflow

Interestingly, the model showed that stack size had almost no predictive value and appeared to contribute to overfitting. In the end I removed it. The remaining features proved more robust, but my initial guesses at the weights were way off.

I periodically retrained the model as I gathered more data, which led to marginal improvements in its accuracy.

Claude

Claude is the brains of the operation, performing the actual function decompilation. Most of these matches were performed using Opus 4.5. I do not have much data comparing Opus to Sonnet. However, in a brief experiment I conducted, Opus was able to match five out of seven functions that had been deemed too difficult by Sonnet 4.5.

The full prompt lives in the repository, but the core instructions are straightforward:

  1. Create a matching environment for function X.
  2. Follow the instructions in that environment and use the provided tools to decompile the function.
  3. Give up if it’s too hard. If there is no progress after more than ten attempts, the agent should move on.
  4. If matched: integrate into the project, verify the build, and commit. Committing is critical. This preserves progress even if Claude later ends up borking the local environment.
  5. If not matched: add to the difficult_functions log and exit.

The ‘give up after ten attempts’ threshold aims to prevent Claude from wasting tokens when further progress is unlikely. It was only partially successful, as Claude would still sometimes make dozens of attempts.

The Toolbox

My tooling approach has remained largely unchanged from the previous post: provide simple, Unix-like programs that Claude can compose to solve problems. I don’t add any MCP (Model Context Protocol) servers. With that said, giving Claude more autonomy has highlighted the need for defensive coding in these tools: clear error messages, graceful failures, guardrails against misuse. An ambiguous error from a tool can send Claude down a rabbit hole, wasting time and tokens.

For example, Claude is instructed to use a single script to build and verify the project: build-and-verify.sh. To mitigate Claude’s tendency to misclassify outcomes, the script provides explicit instructions on handling failures and successes:

BUILD HAS FAILED. Claude, you should treat this as a build failure. Adding new warnings or accepting a non-matching checksum count as failures.

Similarly, Claude sometimes gets lost when moving between the matching environment and the main project. We can handle this with a catchall (%:) make rule in the problematic directories which just says:

You are in a matching environment for a specific function. Only use the tools explicitly listed in this directory’s CLAUDE.md. If you’re ready to build against the main project, you need to jump back two directory levels (cd ../..)

This defensive tooling strategy has proved far more effective than prompt engineering in mitigating specific Claude failure patterns.

Another important consideration is token efficiency, which has become significantly more relevant as Claude is now run for extended periods. This is what originally motivated the decision for vacuum.sh to invoke the scorer then pass Claude the cheapest function rather than having Claude choose for itself.4 Tweaks to tooling can also help. build-and-verify.sh significantly limits build output in an effort to save tokens.

The Driver

The outer loop is a simple bash script that calls Claude repeatedly with an optional maximum iteration count. The logic is as follows:

  • Call Claude with the next function to match;
  • If Claude returns non-zero, back off, eventually checking at five-minute intervals in case we’ve hit a usage limit;
  • Trap Ctrl-C so we can signal a stop without killing Claude mid-run and wasting the current attempt;
  • Log the function name and timestamp to stdout to maintain visibility into the process;
  • Append all Claude output to a file. This is invaluable for debugging failed matches; you can see exactly where Claude got stuck.
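The actual driver is a bash script, but the control flow is simple enough to sketch in Python; run_claude here stands in for the claude CLI invocation and is an assumption of this sketch:

```python
import signal
import time

def backoff_delay(failures, cap=300):
    """Exponential backoff, capped at five-minute checks for usage limits."""
    return min(cap, 2 ** failures)

def run_driver(next_function, run_claude, max_iterations=100):
    """Outer loop: hand Claude one function at a time; back off on failure.
    run_claude(fn) returns an exit code, 0 meaning success."""
    stop = {"requested": False}

    def on_sigint(signum, frame):
        # Trap Ctrl-C: stop after the current attempt instead of mid-run.
        stop["requested"] = True

    signal.signal(signal.SIGINT, on_sigint)
    failures = 0
    for _ in range(max_iterations):
        if stop["requested"]:
            break
        fn = next_function()
        if fn is None:
            break  # nothing left to match
        print(time.strftime("%H:%M:%S"), "attempting", fn)
        if run_claude(fn) == 0:
            failures = 0
        else:
            failures += 1
            time.sleep(backoff_delay(failures))
```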

Performance vs Other Agents

I briefly tried Codex, which was getting a lot of attention when I started down this path. The results were disappointing. Codex (including the 5.1-codex-max) struggled both with effective decompilation and with following instructions in general.

The Git-related issues were the most problematic. This appears to be a known issue, although upgrading did not help in my case. Poor decompilation strategy combined with unreliable Git usage makes for a painful combination.

I haven’t yet tested Gemini or other agents.

Final Thoughts

Traditional decompilation efforts have often been multi-year, team-based projects. The primary constraint has usually been the time and availability of a handful of experts. Coding agents shift that constraint. For Snowboard Kids 2, the data so far suggests that the vast majority of functions are within reach of Claude Opus 4.5; if current trends hold, roughly 79% of functions should be matchable. Looking ahead, the limiting factor is likely to be compute and access to frontier models rather than human attention.

histogram of functions bucketed by estimated difficulty

With that said, the tireless efforts of the decompilation community cannot be overstated. My project simply would not exist without the support of many patient people on Discord, and without tools such as Splat, m2c, decomp-permuter, asm-differ, and many others. We stand on the shoulders of giants. While the roles may change, I don’t see human experts becoming unnecessary any time soon.

The remaining functions will almost certainly be the most challenging to decompile (barring a few Claude quirks). Even when LLMs succeed, the output is often rough: pointer arithmetic instead of array access, control flow reliant on goto statements, awkward temporary variables, and other issues affecting code clarity. If the goal is to understand how these games work (or to modify them), byte-perfect but ugly matches don’t buy us much over the original assembly. It seems likely that future decompilation workflows will focus more on cleaning up and documenting LLM output than on writing code from scratch, using these matches as a base in much the same way earlier projects built on m2c output.

If you’ve made it this far, you probably have an interest in decompilation and Snowboard Kids. Give it a try. Take a look at difficult_functions on the Snowboard Kids 2 decomp GitHub page and see if you can beat the LLMs!

Something to say? You can upvote and/or join the discussion on Hacker News.


Effective harnesses for long-running agents


As AI agents become more capable, developers are increasingly asking them to take on complex tasks requiring work that spans hours, or even days. However, getting agents to make consistent progress across multiple context windows remains an open problem.

The core challenge of long-running agents is that they must work in discrete sessions, and each new session begins with no memory of what came before. Imagine a software project staffed by engineers working in shifts, where each new engineer arrives with no memory of what happened on the previous shift. Because context windows are limited, and because most complex projects cannot be completed within a single window, agents need a way to bridge the gap between coding sessions.

We developed a two-fold solution to enable the Claude Agent SDK to work effectively across many context windows: an initializer agent that sets up the environment on the first run, and a coding agent that is tasked with making incremental progress in every session, while leaving clear artifacts for the next session. You can find code examples in the accompanying quickstart.

The long-running agent problem

The Claude Agent SDK is a powerful, general-purpose agent harness adept at coding, as well as other tasks that require the model to use tools to gather context, plan, and execute. It has context management capabilities such as compaction, which enables an agent to work on a task without exhausting the context window. Theoretically, given this setup, it should be possible for an agent to continue to do useful work for an arbitrarily long time.

However, compaction isn’t sufficient. Out of the box, even a frontier coding model like Opus 4.5 running on the Claude Agent SDK in a loop across multiple context windows will fall short of building a production-quality web app if it’s only given a high-level prompt, such as “build a clone of claude.ai.”

Claude’s failures manifested in two patterns. First, the agent tended to try to do too much at once—essentially to attempt to one-shot the app. Often, this led to the model running out of context in the middle of its implementation, leaving the next session to start with a feature half-implemented and undocumented. The agent would then have to guess at what had happened, and spend substantial time trying to get the basic app working again. This happens even with compaction, which doesn’t always pass perfectly clear instructions to the next agent.

A second failure mode would often occur later in a project. After some features had already been built, a later agent instance would look around, see that progress had been made, and declare the job done.

These failure modes decompose the problem into two parts. First, we need to set up an initial environment that lays the foundation for all the features that a given prompt requires, which sets up the agent to work step-by-step and feature-by-feature. Second, we should prompt each agent to make incremental progress towards its goal while also leaving the environment in a clean state at the end of a session. By “clean state” we mean the kind of code that would be appropriate for merging to a main branch: there are no major bugs, the code is orderly and well-documented, and in general, a developer could easily begin work on a new feature without first having to clean up an unrelated mess.

When experimenting internally, we addressed these problems using a two-part solution:

  1. Initializer agent: The very first agent session uses a specialized prompt that asks the model to set up the initial environment: an init.sh script, a claude-progress.txt file that keeps a log of what agents have done, and an initial git commit that shows what files were added.
  2. Coding agent: Every subsequent session asks the model to make incremental progress, then leave structured updates.1

The key insight here was finding a way for agents to quickly understand the state of work when starting with a fresh context window, which is accomplished with the claude-progress.txt file alongside the git history. Inspiration for these practices came from knowing what effective software engineers do every day.

Environment management

In the updated Claude 4 prompting guide, we shared some best practices for multi-context window workflows, including a harness structure that uses “a different prompt for the very first context window.” This “different prompt” requests that the initializer agent set up the environment with all the necessary context that future coding agents will need to work effectively. Here, we provide a deeper dive on some of the key components of such an environment.

Feature list

To address the problem of the agent one-shotting an app or prematurely considering the project complete, we prompted the initializer agent to write a comprehensive file of feature requirements expanding on the user’s initial prompt. In the claude.ai clone example, this meant over 200 features, such as “a user can open a new chat, type in a query, press enter, and see an AI response.” These features were all initially marked as “failing” so that later coding agents would have a clear outline of what full functionality looked like.

{
    "category": "functional",
    "description": "New chat button creates a fresh conversation",
    "steps": [
      "Navigate to main interface",
      "Click the 'New Chat' button",
      "Verify a new conversation is created",
      "Check that chat area shows welcome state",
      "Verify conversation appears in sidebar"
    ],
    "passes": false
  }

We prompt coding agents to edit this file only by changing the status of a passes field, and we use strongly-worded instructions like “It is unacceptable to remove or edit tests because this could lead to missing or buggy functionality.” After some experimentation, we landed on using JSON for this, as the model is less likely to inappropriately change or overwrite JSON files compared to Markdown files.
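A harness could also enforce this rule mechanically rather than relying on prompting alone; here is a sketch of such a guardrail (an illustration, not something the post says was shipped):

```python
import json

def only_passes_changed(before_json, after_json):
    """Guardrail: the coding agent may flip `passes` flags, but must not
    add, remove, or reword feature entries."""
    before = json.loads(before_json)
    after = json.loads(after_json)
    if len(before) != len(after):
        return False  # entries were added or removed
    for b, a in zip(before, after):
        b_rest = {k: v for k, v in b.items() if k != "passes"}
        a_rest = {k: v for k, v in a.items() if k != "passes"}
        if b_rest != a_rest:
            return False  # something other than `passes` was edited
    return True
```

Run against the feature file before committing, this rejects any session that quietly rewrote the spec.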

Incremental progress

Given this initial environment scaffolding, the next iteration of the coding agent was then asked to work on only one feature at a time. This incremental approach turned out to be critical to addressing the agent’s tendency to do too much at once.

Once working incrementally, it’s still essential that the model leaves the environment in a clean state after making a code change. In our experiments, we found that the best way to elicit this behavior was to ask the model to commit its progress to git with descriptive commit messages and to write summaries of its progress in a progress file. This allowed the model to use git to revert bad code changes and recover working states of the code base.

These approaches also increased efficiency, as they eliminated the need for an agent to have to guess at what had happened and spend its time trying to get the basic app working again.

Testing

One final major failure mode that we observed was Claude’s tendency to mark a feature as complete without proper testing. Absent explicit prompting, Claude tended to make code changes, and even do testing with unit tests or curl commands against a development server, but would fail to recognize that the feature didn’t work end-to-end.

In the case of building a web app, Claude mostly did well at verifying features end-to-end once explicitly prompted to use browser automation tools and do all testing as a human user would.

Screenshots taken by Claude through the Puppeteer MCP server as it tested the claude.ai clone.

Providing Claude with these kinds of testing tools dramatically improved performance, as the agent was able to identify and fix bugs that weren’t obvious from the code alone.

Some issues remain: limitations in Claude’s vision and in the browser automation tools make it difficult to identify every kind of bug. For example, Claude can’t see browser-native alert modals through the Puppeteer MCP, and features relying on these modals tended to be buggier as a result.

Getting up to speed

With all of the above in place, every coding agent is prompted to run through a series of steps to get its bearings, some quite basic but still helpful:

  1. Run pwd to see the directory you’re working in. You’ll only be able to edit files in this directory.
  2. Read the git logs and progress files to get up to speed on what was recently worked on.
  3. Read the features list file and choose the highest-priority feature that’s not yet done to work on.

It also helps to ask the initializer agent to write an init.sh script that can run the development server; this saves Claude tokens in every session, since it doesn’t have to figure out how to run and test the code, and lets each coding agent run through a basic end-to-end test before implementing a new feature.

In the case of the claude.ai clone, this meant that the agent always started the local development server and used the Puppeteer MCP to start a new chat, send a message, and receive a response. This ensured that Claude could quickly identify if the app had been left in a broken state, and immediately fix any existing bugs. If the agent had instead started implementing a new feature, it would likely make the problem worse.

Given all this, a typical session starts off with the following assistant messages:

[Assistant] I'll start by getting my bearings and understanding the current state of the project.
[Tool Use] <bash - pwd>
[Tool Use] <read - claude-progress.txt>
[Tool Use] <read - feature_list.json>
[Assistant] Let me check the git log to see recent work.
[Tool Use] <bash - git log --oneline -20>
[Assistant] Now let me check if there's an init.sh script to restart the servers.
<Starts the development server>
[Assistant] Excellent! Now let me navigate to the application and verify that some fundamental features are still working.
<Tests basic functionality>
[Assistant] Based on my verification testing, I can see that the fundamental functionality is working well. The core chat features, theme switching, conversation loading, and error handling are all functioning correctly. Now let me review the tests.json file more comprehensively to understand what needs to be implemented next.
<Starts work on a new feature>

Agent failure modes and solutions

| Problem | Initializer Agent Behavior | Coding Agent Behavior |
|---------|----------------------------|-----------------------|
| Claude declares victory on the entire project too early. | Set up a feature list file: based on the input spec, set up a structured JSON file with a list of end-to-end feature descriptions. | Read the feature list file at the beginning of a session. Choose a single feature to start working on. |
| Claude leaves the environment in a state with bugs or undocumented progress. | An initial git repo and progress notes file is written. | Start the session by reading the progress notes file and git commit logs, and run a basic test on the development server to catch any undocumented bugs. End the session by writing a git commit and progress update. |
| Claude marks features as done prematurely. | Set up a feature list file. | Self-verify all features. Only mark features as “passing” after careful testing. |
| Claude has to spend time figuring out how to run the app. | Write an init.sh script that can run the development server. | Start the session by reading init.sh. |
Summarizing four common failure modes and solutions in long-running AI agents.

Future work

This research demonstrates one possible set of solutions in a long-running agent harness to enable the model to make incremental progress across many context windows. However, there remain open questions.

Most notably, it’s still unclear whether a single, general-purpose coding agent performs best across contexts, or if better performance can be achieved through a multi-agent architecture. It seems reasonable that specialized agents like a testing agent, a quality assurance agent, or a code cleanup agent, could do an even better job at sub-tasks across the software development lifecycle.

Additionally, this demo is optimized for full-stack web app development. A future direction is to generalize these findings to other fields. It’s likely that some or all of these lessons can be applied to the types of long-running agentic tasks required in, for example, scientific research or financial modeling.

Acknowledgements

Written by Justin Young. Special thanks to David Hershey, Prithvi Rajasakeran, Jeremy Hadfield, Naia Bouscal, Michael Tingley, Jesse Mu, Jake Eaton, Marius Buleandara, Maggie Vo, Pedram Navid, Nadine Yasser, and Alex Notov for their contributions.

This work reflects the collective efforts of several teams across Anthropic who made it possible for Claude to safely do long-horizon autonomous software engineering, especially the code RL & Claude Code teams. Interested candidates who would like to contribute are welcome to apply at anthropic.com/careers.

Footnotes

1. We refer to these as separate agents in this context only because they have different initial user prompts. The system prompt, set of tools, and overall agent harness was otherwise identical.
