Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

My View of Software Engineering Has Changed For Good


Some time ago, I wrote an article called The Illusion of Vibe Coding. In it, I argued that there are no shortcuts to mastery, and that relying on AI-generated code without deep understanding only shifts problems downstream. That article came from skepticism.

Recently, after seeing early autonomous agent systems like OpenClaw, I realized something important: the skepticism is still there, but my perspective has shifted.

In my view, we are no longer just discussing “better tools for developers.” We are starting to see the outlines of an entirely different operating model for software engineering.

We no longer write code; we express intent

For years, the industry has framed progress as AI “assisting” engineers: smarter autocomplete, faster refactoring, better suggestions. All useful, but still stuck in the same old mindset – and I don’t think that way of thinking works anymore.

The future of software engineering isn’t just developers using AI tools. It’s autonomous coding agents and humans working together, each guiding the other.

Today, software development is all about execution. Humans break work into tickets, write and review code, test, deploy, and repeat. Even with better tools, the mental load of coordination and context switching hasn’t changed.

In The Illusion of Vibe Coding, I argued that skipping understanding makes systems fragile. I still believe that. What’s different is who I think will do the executing.

I believe autonomous coding agents will take over much of the mechanical work, while humans move up the stack.

Instead of assigning tasks, we’ll define intent: outcomes, constraints, trade-offs. Agents will plan, execute, and consult humans only when uncertainty is high.

Humans are no longer the primary executors, we become the checkpoint. From the agent’s view, we’re part of the control loop, not collaborators.

Andrej Karpathy, former director of AI at Tesla, also notes that as execution shifts to machines, humans’ focus naturally moves toward guiding principles, architecture, and intent. This aligns with what I’ve observed: our role is increasingly supervisory and strategic, not mechanical.

Agents never forget the system, humans do

This shift isn’t just better code generation, it’s context.

Autonomous agents can read entire systems in ways humans can’t: traverse repos, inspect dependencies, analyze history, read docs, and link decisions across months or years.

What we call “tribal knowledge” is mostly a workaround for human memory. Agents don’t forget, avoid legacy code, or shy away from repos. They treat the system itself as the unit of change, not a ticket, file, or pull request.

Trust in software will shift from the output to the process itself

One aspect I think is often underestimated is verification.

I believe autonomous agents won’t just write code and wait for CI, they’ll run tests, add coverage, debug failures, and review their work against architecture.

In The Illusion of Vibe Coding, I warned against blind trust in generated code. Now, I think trust should shift from output to process. CI stops being a gate for humans and becomes a feedback loop agents actively use. Software moves from “written then checked” to continuously reasoned about.

Companies will adopt autonomous agents in different ways

I don’t think companies will adopt autonomous agents uniformly. Instead, I expect a clear split.

  • Some companies will deeply integrate autonomous agents into their systems, aligning them with architecture, security, and risk. The advantage won’t be the model itself, but the alignment layer, which will quietly become intellectual property.
  • Other companies will use cloud-based agents, prioritizing speed, accessibility, and low overhead. They’ll excel with standard architectures and fast-moving teams, even if they don’t fully grasp the business, often an acceptable trade-off.

Most organizations will likely go hybrid, using cloud agents for experimentation and self-hosted agents for core systems. As is the case today, the most sensitive parts of the business will stay in-house.

So, what does this mean for software engineers?

I don’t believe this future eliminates engineers, but it will force a sorting.

Experts remain essential, shifting from coding to orchestrating autonomous agents, ensuring output is secure, maintainable, and aligned with intent.

I think infrastructure engineers become even more critical – they build, run, and improve the systems autonomous agents rely on. I believe the most important production system won’t be the application itself, but the system that creates and evolves it.

What’s interesting is that this shift is already changing where the bottleneck lives.

Robert C. Martin recently described this succinctly:

In other words, humans are becoming the bottleneck.

Execution can be automated; judgment and responsibility cannot. The middle layer shrinks, and value moves to architecture, risk, and knowing what not to build.

One thing stays the same: responsibility cannot be outsourced.

What will happen to interns and junior engineers?

This is the part where I don’t have a confident or comforting answer, and pretending otherwise would be dishonest.

If agents handle most execution, the old path for juniors (doing small tasks to learn) won’t work anymore. That doesn’t doom them, but the way they gain experience will have to change.

I think Anthropic’s research is spot on: struggling and effort are essential for mastery, and removing friction too early quietly erodes learning.

And this isn’t just a junior problem; I think it applies equally to experienced developers.

When agents let us skip hard thinking, growth stalls: juniors may never build fundamentals, and experienced engineers can slowly lose judgment and architectural instinct. The ones who thrive will be those who actively choose to remain experts, even when tools make shortcuts tempting.

Two types of developers will emerge

As I described earlier, two broad roles are likely to emerge, and both will require deep knowledge:

  • One path is for engineers who orchestrate autonomous agents. They need a solid grasp of system architecture, can turn business needs into technical solutions, and have the judgment to know if a change is safe, maintainable, and scalable. That kind of judgment isn’t learned by cutting corners, it comes from wrestling with trade-offs, failure modes, and long-term consequences.
  • The other path is engineers who build and run the autonomy infrastructure itself. They need deep expertise in infrastructure, security, networking, permissions, observability, and cost management, and in self-hosted setups, also skills in model hosting, fine-tuning, or even training specialized systems.

Both paths lead to highly skilled experts. Neither is shallow.

What worries me isn’t juniors being replaced – it’s everyone getting passive, letting agents do the thinking that actually builds real skill.

My advice hasn’t changed since The Illusion of Vibe Coding: keep learning, keep experimenting, and don’t outsource your understanding. Use agents to amplify your thinking, not replace it. The difference now is urgency, and how we learn must evolve at every level.

Imagine an engineer adding a feature. Instead of just coding, they first understand why it matters, what could break, and how it fits in the system. They might use an agent to propose a solution but review it critically. Over time, they don’t just write better code, they get better at judging designs, spotting risks, and turning vague ideas into solid solutions. That’s the kind of skill that keeps you valuable when execution is cheap.

Machines will execute, humans will supervise and judge

We keep asking how AI will assist developers. That question assumes the old hierarchy stays. It won’t.

In my view, software engineering is heading toward a quiet inversion: machines will execute, and humans will supervise, constrain, and judge, not because humans are weaker, but because machines excel at execution, and humans excel at responsibility.

Vibe coding was never the goal – it was just a step. The future isn’t about coding faster. It’s about deciding, deliberately, what’s actually worth building.

The post My View of Software Engineering Has Changed For Good appeared first on ShiftMag.


Easy FunctionGemma finetuning with Tunix on Google TPUs

Finetuning the FunctionGemma model is made fast and easy using the lightweight JAX-based Tunix library on Google TPUs, a process demonstrated here using LoRA for supervised finetuning. This approach delivers significant accuracy improvements with high TPU efficiency, culminating in a model ready for deployment.

Glide: A Modern Kanban Board System


I’ve been working on Glide, a Kanban board management system that brings together some modern web technologies to create a lightweight, efficient project management tool.

What is Glide?

Glide is a self-hosted Kanban board application built with ASP.NET Core 10.0. It’s designed to be simple to deploy and easy to use, whether you’re managing personal projects or coordinating team workflows.

Key Features

  • Multiple Boards - Create and manage multiple Kanban boards with customizable columns
  • Team Collaboration - Add team members as owners or regular members
  • Dynamic UI - HTMX-powered interactions for a smooth, responsive experience without full page refreshes (see the sketch after this list)
  • Flexible Authentication - OAuth2 support for GitHub, password auth supported
  • Dark Theme - Modern, responsive dark-first UI with light theme also supported
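To give a flavour of the HTMX approach, here is a minimal, hypothetical sketch (not Glide’s actual code) of an ASP.NET Core minimal API endpoint that returns an HTML fragment for HTMX to swap into the board without a full page refresh; the route and field names are purely illustrative:

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Hypothetical endpoint: HTMX posts a form here (e.g. hx-post="/boards/1/cards", hx-swap="beforeend")
// and swaps the returned <li> fragment into the target column without reloading the page.
app.MapPost("/boards/{boardId}/cards", async (int boardId, HttpRequest request) =>
{
    var form = await request.ReadFormAsync();
    var title = form["title"].ToString();

    // A real implementation would validate and persist the card; here we just return the fragment
    return Results.Content($"<li class=\"card\">{title}</li>", "text/html");
});

app.Run();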

Technology Stack

The project leverages a mix of modern technologies, including ASP.NET Core 10.0, HTMX, and OAuth2 authentication.


Exploring the (underwhelming) System.Diagnostics.Metrics source generators: System.Diagnostics.Metrics APIs - Part 2


In my previous post I provided an introduction to the System.Diagnostics.Metrics APIs introduced in .NET 6. In this post I show how to use the Microsoft.Extensions.Telemetry.Abstractions source generator, look at how it changes the code you need to write, and explore the generated code.

I start the post with a quick refresher on the basics of the System.Diagnostics.Metrics APIs and the sample app we wrote last time. I then show how we can update this code to use the Microsoft.Extensions.Telemetry.Abstractions source generator instead. Finally, I show how we can also update our metric definitions to use strongly-typed tag objects for additional type-safety. In both cases, we'll update our sample app to use the new approach, and explore the generated code.

You can read about the source generators I discuss in this post in the Microsoft documentation here and here.

Background: System.Diagnostics.Metrics APIs

The System.Diagnostics.Metrics APIs were introduced in .NET 6 but are available in earlier runtimes (including .NET Framework) by using the System.Diagnostics.DiagnosticSource NuGet package. There are two primary concepts exposed by these APIs, Instrument and Meter:

  • Instrument: An instrument records the values for a single metric of interest. You might have separate Instruments for "products sold", "invoices created", "invoice total", or "GC heap size".
  • Meter: A Meter is a logical grouping of multiple instruments. For example, the System.Runtime Meter contains multiple Instruments about the workings of the runtime, while the Microsoft.AspNetCore.Hosting Meter contains Instruments about the HTTP requests received by ASP.NET Core.

There are also multiple types of Instrument: Counter<T>, UpDownCounter<T>, Gauge<T>, and Histogram<T> (as well as "observable" versions, which I'll cover in a future post). To create a custom metric, you need to choose the type of Instrument to use, and associate it with a Meter. In my previous post I created a simple Counter<T> for tracking how often a product page was viewed.
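As a rough orientation (my own sketch, not code from the previous post), creating the different instrument types from a Meter looks something like the following; the names are illustrative, and Gauge<T> requires a recent version of the APIs:

using System.Diagnostics.Metrics;

var meter = new Meter("MyApp.Example");

// Counter: a value that only ever goes up (e.g. products sold)
Counter<long> productsSold = meter.CreateCounter<long>("myapp.products_sold");

// UpDownCounter: a value that can go up and down (e.g. items currently in a queue)
UpDownCounter<int> queueLength = meter.CreateUpDownCounter<int>("myapp.queue_length");

// Histogram: records a distribution of values (e.g. invoice totals)
Histogram<double> invoiceTotal = meter.CreateHistogram<double>("myapp.invoice_total");

// Gauge: records the current value when you explicitly set it (a newer addition to the APIs)
Gauge<int> gcHeapSize = meter.CreateGauge<int>("myapp.gc_heap_size");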

Background: sample app with manual boilerplate

In this post I'm going to start from where we left off in the previous post, and update it to use a source generator instead. So that we know where we're coming from, the full code for that sample is shown below, annotated to explain what's going on; for the full details, see my previous post.

using System.Diagnostics.Metrics;
using Microsoft.Extensions.Diagnostics.Metrics;

var builder = WebApplication.CreateBuilder(args);

// 👇 Register our "metrics helper" in DI
builder.Services.AddSingleton<ProductMetrics>();

var app = builder.Build();

// Inject the "metrics helper" into the API handler 👇 
app.MapGet("/product/{id}", (int id, ProductMetrics metrics) =>
{
    metrics.PricingPageViewed(id); // 👈 Record the metric
    return $"Details for product {id}";
});

app.Run();


// The "metrics helper" class for our metrics
public class ProductMetrics
{
    private readonly Counter<int> _pricingDetailsViewed;

    public ProductMetrics(IMeterFactory meterFactory)
    {
        // Create a meter called MyApp.Products
        var meter = meterFactory.Create("MyApp.Products");

        // Create an instrument, and associate it with our meter
        _pricingDetailsViewed = meter.CreateCounter<int>(
            "myapp.products.pricing_page_requests",
            unit: "requests",
            description: "The number of requests to the pricing details page for the product with the given product_id");

    }

    // A convenience method for adding to the metric
    public void PricingPageViewed(int id)
    {
        // Ensure we add the correct tag to the metric
        _pricingDetailsViewed.Add(delta: 1, new KeyValuePair<string, object?>("product_id", id));
    }
}

In summary, we have a ProductMetrics "metrics helper" class which is responsible for creating the Meter and Instrument definitions, as well as providing helper methods for recording page views.

When we run the app and monitor it with dotnet-counters we can see our metric being recorded:

[Screenshot: the metrics being reported using dotnet-counters]
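For reference, monitoring the custom Meter with dotnet-counters looks something like the following (assuming the process is called MetricsDemo; substitute your own process name or id):

dotnet-counters monitor --name MetricsDemo --counters MyApp.Products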

Now that we have our sample app ready, let's explore replacing some of the boilerplate with a source generator.

Replacing boilerplate with a source generator

The Microsoft.Extensions.Telemetry.Abstractions NuGet package includes a source generator which, according to the documentation, generates code which:

…exposes strongly typed metering types and methods that you can invoke to record metric values. The generated methods are implemented in a highly efficient form, which reduces computation overhead as compared to traditional metering solutions.

In this section we'll replace some of the code we wrote above with the source generated equivalent!

First you'll need to install the Microsoft.Extensions.Telemetry.Abstractions package in your project using:

dotnet add package Microsoft.Extensions.Telemetry.Abstractions

Alternatively, update your project with a <PackageReference>:

<ItemGroup>
  <PackageReference Include="Microsoft.Extensions.Telemetry.Abstractions" Version="10.2.0" />
</ItemGroup>

Note that in this post I'm using the latest stable version of the package, 10.2.0.

Now that we have the source generator running in our app, we can put it to use.

Creating the "metrics helper" class

The main difference when you switch to the source generator is in the "metrics helper" class. There are lots of different ways you could structure these; what I've shown below is a fairly direct conversion of the previous code. But as I'll discuss later, this isn't necessarily the way you'll always want to use them.

As is typical for source generators, the metrics generator is driven by specific attributes. There's a different attribute for each Instrument type, and you apply them to a partial method definition which creates a strongly-typed metric, called PricingPageViewed in this case:

private static partial class Factory
{
    [Counter<int>("product_id", Name = "myapp.products.pricing_page_requests")]
    internal static partial PricingPageViewed CreatePricingPageViewed(Meter meter);
}

The example above uses the [Counter<T>] attribute, but there are equivalent versions for [Gauge<T>] and [Histogram<T>] too.

This creates the "factory" methods for defining a metric, but we still need to update the ProductMetrics type to use this factory method instead of our hand-rolled versions:

// Note, must be partial
public partial class ProductMetrics
{
    public ProductMetrics(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("MyApp.Products");
        PricingPageViewed = Factory.CreatePricingPageViewed(meter);
    }

    internal PricingPageViewed PricingPageViewed { get; }

    private static partial class Factory
    {
        [Counter<int>("product_id", Name = "myapp.products.pricing_page_requests")]
        internal static partial PricingPageViewed CreatePricingPageViewed(Meter meter);
    }
}

If you compare that to the code we wrote previously, there are two main differences:

  • The [Counter<T>] attribute is missing the "description" and "units" that we previously added.
  • The PricingPageViewed metric is exposed directly (which we'll look at shortly), instead of exposing a PricingPageViewed() method for recording values.

The first point is just a limitation of the current API. We actually can specify the units on the attribute, but if we do, we need to add a #pragma as this API is currently experimental:

private static partial class Factory
{
    #pragma warning disable EXTEXP0003 // Type is for evaluation purposes only and is subject to change or removal in future updates. Suppress this diagnostic to proceed.

                                                        //   Add the Unit here 👇
    [Counter<int>("product_id", Name = "myapp.products.pricing_page_requests", Unit = "views")]
    internal static partial PricingPageViewed CreatePricingPageViewed(Meter meter);
}

The second point is more interesting, and we'll dig into it when we look at the generated code.

Updating our app

Before we get to the generated code, let's look at how we use our updated ProductMetrics. We keep the existing DI registration of our ProductMetrics type; the only change is how we record a view of the page:

using System.Diagnostics.Metrics;
using System.Globalization;
using Microsoft.Extensions.Diagnostics.Metrics;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton<ProductMetrics>();
var app = builder.Build();

app.MapGet("/product/{id}", (int id, ProductMetrics metrics) =>
{
    // Update to call PricingPageViewed.Add() instead of PricingPageViewed(id)
    metrics.PricingPageViewed.Add(value: 1, product_id: id);
    return $"Details for product {id}";
});

app.Run();

As you can see, there's not much change there. Instead of calling PricingPageViewed(id), which internally adds a metric and tag, we call the Add() method, which is a source-generated method on the PricingPageViewed type. Let's take a look at all that generated code now, so we can see what's going on behind the scenes.

Exploring the generated code

We have various generated methods to look at, so we'll start with our factory methods and work our way through from there.

Note that in most IDEs you can navigate to the definitions of these partial methods and they'll show you the generated code.

Starting with our Factory method, the generated code looks like this:

public partial class ProductMetrics 
{
    private static partial class Factory 
    {
        internal static partial PricingPageViewed CreatePricingPageViewed(Meter meter)
            => GeneratedInstrumentsFactory.CreatePricingPageViewed(meter);
    }
}

So the generated code is calling a different generated type, which looks like this:

internal static partial class GeneratedInstrumentsFactory
{
    private static ConcurrentDictionary<Meter, PricingPageViewed> _pricingPageViewedInstruments = new();

    internal static PricingPageViewed CreatePricingPageViewed(Meter meter)
    {
        return _pricingPageViewedInstruments.GetOrAdd(meter, static _meter =>
            {
                var instrument = _meter.CreateCounter<int>(@"myapp.products.pricing_page_requests", @"views");
                return new PricingPageViewed(instrument);
            });
    }
}

This definition shows something interesting: the source generator caters to a pattern I was somewhat surprised to see, namely adding the same Instrument to multiple Meters.

That seems a little surprising to me, but that's possibly because I'm used to thinking in terms of OpenTelemetry, which (as far as I know) doesn't have the concept of Meters and largely ignores them. It seems like you would get some weird duplication issues if you tried to use this source-generator-suggested pattern with OpenTelemetry, so I personally wouldn't recommend it.

Other than the "dictionary" aspect, this generated code is basically creating the Counter instance, just as we were doing before, but is then passing it to a different generated type, the PricingPageViewed type:

internal sealed class PricingPageViewed
{
    private readonly Counter<int> _counter;
    public PricingPageViewed(Counter<int> counter)
    {
        _counter = counter;
    }

    public void Add(int value, object? product_id)
    {
        var tagList = new TagList
        {
            new KeyValuePair<string, object?>("product_id", product_id),
        };

        _counter.Add(value, tagList);
    }
}

This generated type provides roughly the same "public" API for recording metrics as we provided before:

public class ProductMetrics
{
    // Previous implementation
    public void PricingPageViewed(int id)
    {
        _pricingDetailsViewed.Add(delta: 1, new KeyValuePair<string, object?>("product_id", id));
    }
}

However, there are some differences. The generated code uses a more "generic" approach that wraps the tags in a TagList. This is a struct, which can support adding multiple tags without needing to allocate an array on the heap, so it's generally very efficient. But in this case, it doesn't add anything over the "manual" version I implemented.
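To make that concrete, here's a hand-written sketch (my own illustration, not code from the post's sample) showing that recording multiple tags via TagList needs no generator at all; the extra environment tag is purely hypothetical:

using System.Diagnostics;
using System.Diagnostics.Metrics;

public class ProductMetricsManual
{
    private readonly Counter<int> _pricingDetailsViewed;

    public ProductMetricsManual(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("MyApp.Products");
        _pricingDetailsViewed = meter.CreateCounter<int>("myapp.products.pricing_page_requests");
    }

    public void PricingPageViewed(int id, string environment)
    {
        // TagList is a struct, so a handful of tags doesn't require a heap-allocated array
        var tags = new TagList
        {
            { "product_id", id },
            { "environment", environment },
        };

        _pricingDetailsViewed.Add(1, tags);
    }
}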

So given all that, is this generated code actually useful?

Is the generated code worth it?

I love source generators, I think they're a great way to reduce boilerplate and make code easier to read and write in many cases, but frankly, I don't really see the value of this metrics source generator.

For a start, the source generator only really changes how we define and create metrics, which is generally one line of code to create the metric, plus a helper method for defining the tags (i.e. the PricingPageViewed() method). Is a source generator really necessary for that?

Also, the generator is limited in the API it provides compared to calling the System.Diagnostics.Metrics APIs directly. You can't provide a Description for a metric, for example, and providing a Unit requires a #pragma.

What's more, the fact that the generated code is generic means that the resulting usability is actually worse in my example, because you have to call:

metrics.PricingPageViewed.Add(value: 1, product_id: id);

and specify an "increment" value, as opposed to simply calling:

metrics.PricingPageViewed(productId: id);

(also note the "correct" argument names in my "manual" case). The source generator also supports scenarios that I don't envision needing (the same Instrument registered with multiple Meters), so the source-generated version does extra work (the ConcurrentDictionary lookup) that the manual version doesn't need.

So unfortunately, in this simple example, the source generator seems like a net loss. But there's an additional scenario it supports: strongly-typed tag objects.

Using strongly-typed tag objects

There's a common programming bug when calling methods that have multiple parameters of the same type: accidentally passing values in the wrong position:

Add(order.Id, product.Id); // Oops, those are wrong, but it's not obvious!

public void Add(int productId, int orderId) { /* */ }

One partial solution to this issue is to use strongly-typed objects to try to make the mistake more obvious. For example, if the method above instead took an object:

public void Add(Details details) { /* */ }

public readonly struct Details
{
    public required int OrderId { get; init; }
    public required int ProductId { get; init; }
}

Then at the callsite, you're less likely to make the same mistake:

// Still wrong, but the error is more obvious! 😅
Add(new()
{
    OrderId = product.Id,
    ProductId = order.Id,
});

It turns out that passing lots of similar values is exactly the issue you run into when you need to add multiple tags when recording a value with an Instrument. To help with this, the source generator code can optionally use strongly-typed tag objects instead of a list of parameters.

Updating the holder class with strongly-typed tags

In the examples I've shown so far, I've only been attaching a single tag to the PricingPageViewed metric, but I'll add an additional one, environment, just for demonstration purposes.

Let's again start by updating the Factory class to use a strongly-typed object instead of "manually" defining the tags:

private static partial class Factory
{
    // A Type that defines the tags 👇
    [Counter<int>(typeof(PricingPageTags), Name = "myapp.products.pricing_page_requests")]
    internal static partial PricingPageViewed CreatePricingPageViewed(Meter meter);
    // previously:
    // [Counter<int>("product_id", Name = "myapp.products.pricing_page_requests")]
    // internal static partial PricingPageViewed CreatePricingPageViewed(Meter meter);
}

public readonly struct PricingPageTags
{
    [TagName("product_id")]
    public required string ProductId { get; init; }
    public required Environment Environment { get; init; }
}

public enum Environment
{
    Development,
    QA,
    Production,
}

So we have two changes:

  • We're passing a Type in the [Counter<T>] attribute, instead of a list of tag arguments.
  • We've defined a struct type that includes all the tags we want to add to a value.
    • This is defined as a readonly struct to avoid additional allocations.
    • We specify the tag name for ProductId. By default, Environment uses the name "Environment" (which may not be what you want, but this is for demo reasons!).
    • We can only use string or enum types for the tags.

The source generator then does its thing, and so we need to update our API callsite to this:

app.MapGet("/product/{id}", (int id, ProductMetrics metrics) =>
{
    metrics.PricingPageViewed.Add(1, new PricingPageTags()
    {
         ProductId = id.ToString(CultureInfo.InvariantCulture),
         Environment = ProductMetrics.Environment.Production,
    });
    return $"Details for product {id}";
});

With the generated code, we pass a PricingPageTags object into the Add() method, instead of passing each tag value individually.

Note that we had to pass a string for ProductId; we can't use an int like we were before. That's not great perf-wise, but previously we were boxing the int to an object?, so that wasn't great either 😅. Avoiding this allocation would be recommended if possible, but that's out of scope for this post!

As before, let's take a look at the generated code.

Exploring the generated code

The generated code in this case is almost identical to before. The only difference is in the generated Add method:

internal sealed class PricingPageViewed
{
    private readonly Counter<int> _counter;

    public PricingPageViewed(Counter<int> counter)
    {
        _counter = counter;
    }

    public void Add(int value, PricingPageTags o)
    {
        var tagList = new TagList
        {
            new KeyValuePair<string, object?>("product_id", o.ProductId!),
            new KeyValuePair<string, object?>("Environment", o.Environment.ToString()),
        };

        _counter.Add(value, tagList);
    }
}

This generated code is almost the same as before. The only difference is that it's "splatting" the PricingPageTags object as individual tags in a TagList. So, does this mean the source generator is worth it?

Are the source generators worth using?

From my point of view, the strongly-typed tags scenario doesn't change any of the arguments I raised previously against the source generator. It's still mostly obfuscating otherwise simple APIs, not adding anything performance-wise as far as I can tell, and it still supports the "Instrument in multiple Meters" scenario that seems unlikely to be useful (to me, anyway).

The strongly-typed tags approach shown here, while nice, can just as easily be implemented manually. The generated code isn't really adding much. And in fact, given that it's calling ToString() on an enum (which is known to be slow), the "manual" version can likely also provide better opportunities for performance optimizations.
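As a sketch of what I mean (my own illustration, reusing the PricingPageTags and Environment types from above, and assuming the same usings as the earlier samples), a manual version can map the enum to constant strings instead of calling ToString() on every recording:

public class ProductMetricsManual
{
    private readonly Counter<int> _pricingDetailsViewed;

    public ProductMetricsManual(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("MyApp.Products");
        _pricingDetailsViewed = meter.CreateCounter<int>("myapp.products.pricing_page_requests");
    }

    public void PricingPageViewed(in PricingPageTags tags)
    {
        var tagList = new TagList
        {
            { "product_id", tags.ProductId },
            // Avoid Enum.ToString() on every call by mapping to constant strings
            { "Environment", EnvironmentName(tags.Environment) },
        };

        _pricingDetailsViewed.Add(1, tagList);
    }

    private static string EnvironmentName(Environment environment) => environment switch
    {
        Environment.Development => "Development",
        Environment.QA => "QA",
        Environment.Production => "Production",
        _ => "Unknown",
    };
}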

About the only argument I can see in favour of using the source generator is if you're using the "Instrument in multiple Meters" approach (let me know in the comments if you are, I feel like I'm missing something!). Or, I guess, if you just like the attribute-based generator approach and aren't worried about the points I raised. I'm a fan of source generators in general, but in this case, I don't think I would bother with them personally.

Overall, the fact the generators don't really add much maybe just points to the System.Diagnostics.Metrics APIs being well defined? If you don't need much boilerplate to create the metrics, and you get the "best performance" by default, without needing a generator, then that seems like a good thing 😄

Summary

In this post I showed how to use the source generators that ship in the Microsoft.Extensions.Telemetry.Abstractions package to help generate metrics with the System.Diagnostics.Metrics APIs. I showed how the source generator changes the way you define your metrics, but fundamentally generates roughly the same code as in my previous post. I then showed how you can also create strongly-typed tags, which helps avoid a typical class of bugs.

Overall, I didn't feel like the source generator saved much in the way of the code you write or provided performance benefits, unlike many other built-in source generators. The generated code does cater to additional scenarios, such as registering the same Instrument with multiple Meters, but that seems like a niche scenario.


The Bigger Infinity: Exploring the Exploration Gap


I want to talk about our testing efforts for AI features, but I’ll skip speaking to your conscience, which is already worried about quality. I know you’re losing sleep. This time, I want to look at it from a different aspect.

I’m going to talk about math.

Wait, come back. It’s going to get interesting, I promise.

In math, some infinities are larger than others. I know. There are more decimal numbers than integers, even though neither has a limit. Go figure.
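For the curious, that's Cantor's result about cardinalities, which in symbols reads:

$$|\mathbb{Z}| = \aleph_0 < 2^{\aleph_0} = |\mathbb{R}|$$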

Our infinities for today are the “what’s left to test.” In the before times (B. AI that is), we managed risk by doing some test planning. We were limited by what we could think of, but we could see the end of the road. Since we knew we couldn’t think of everything, we explored. We asked “what if” and got some answers.

We left out an infinite number of cases we didn’t test, because we didn’t have time. And we could live with that. We told ourselves we were “covered enough.”

That changed with AI-powered apps.

We’ve already said that our genie can spit out so many things, we know who’s got the bigger infinity. The “what’s left to test” got bigger in so many ways. Infinitely, some may say.

The Test Plan Experiment

Since my brain is small and the genie’s is… well, more equipped… I gave it a task. I took my API Analysis Agent for a ride, gave it an API, and got a test plan. Then, I started from that test plan, and told the genie to reverse-engineer a test plan for the Agent itself.

In other words: How much would I need to test my Agent to actually trust its output?

The genie gave me a whole list of examples of cases I should focus on. Remember, the output is a text plan with examples, so it can differ in many ways, depending on many things. Here’s just one: The complexity of the API request body I give it as input.

Here’s what I got:

  • A simple body: Known schema, clear names. The good stuff.
  • A vague body: Known schema, but generic field names like “input” or “t1.”
  • A nested body: A mess of good names and generic ones, all with a couple of data hierarchies.

These are not specific examples. These three are classes of cases. We can think of many example cases just for them. And there are a lot of other classes just about the complexity of the input API. There are a lot more on security, performance, valid inputs and so on. The number of permutations explodes.

Our test plan for the Agent is going to be huge. And it still won't be enough. Because these are only the cases we (and the genie) thought of.

To Infinity and Beyond?

What about the rest?

The thing is, our infinite list of uncharted cases has grown in size. This is the Exploration Gap. Can we cover it all? Nope.

But since we already have limited time, we need to manage our risks even more than we did before. Yes, this can mean using AI to run permutations of tests on our code. Yes, we still need to understand what it’s doing.

We need our test plans to make sure our starting point is correct. But we also need to carve out time for exploration. And even this exploration cannot be just a shot in the dark. We need to think about which areas are more important to explore.

And you know what? We already know how to do it. Because managing risks is what we do. From the before days until today. And beyond.

It’s just math.

Want to see what’s lurking in the dark?

On Feb 18, I'm hosting a webinar on "Anatomy of AI Quality." I'll show you how to identify more monsters hiding in your SDLC corners. With a little more light, they are less scary.

Register Now!

Suspicious you’re already outrunning your headlights?

I help R&D leaders find the blind spots before their customers do. If you need a Forensic AI Audit to see how big your Exploration Gap really is, let’s talk.

The post The Bigger Infinity: Exploring the Exploration Gap first appeared on TestinGil.

Training Design for Text-to-Image Models: Lessons from Ablations
