Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Happy birthday to the Trump phone

Photo of a sheet cake featuring a man holding the Trump Mobile phone.

From the day it was announced, on June 16th, 2025, the Trump phone sounded ridiculous. The T1 Phone 8002 (gold version), as it was officially called, was a combination of contradictory specs, product images that were clearly not photographs of a real phone, and the worrying requirement of a $100 deposit to secure a preorder of a $499 phone with no release date. But none of Trump Mobile's outlandish announcements were as bold as the claim that the phone would be "designed and built in the United States."

The US has next to no phone manufacturing infrastructure, few engineers with the required expertise, and little of the affordable, flexible …

Read the full story at The Verge.

alvinashcraft
1 hour ago
Pennsylvania, USA

The Small Lies Developers Tell to Keep Work Moving


We played “Truth or Dare” with developers again, truth only, with no option to dodge the answer by doing push-ups.

Developers from different parts of the industry spoke about how they survive tight deadlines, tension inside engineering teams, and the subtle, protective lies they use to get through the week.

There’s a pressure to sound confident

In engineering teams, pressure, shifting priorities, and the need to move quickly shape communication. In that environment, “truth” is more about signaling confidence, alignment, and forward progress than precision.

Weekly meetings, fast delivery cycles, and constantly changing priorities push developers to learn that how they communicate can matter as much as the work itself.

That is why we asked how developers manage expectations. The phrase “it will be done” often appears early, sometimes before anyone fully understands the scope of the task. Under pressure, estimates tend to be optimistic, and reality usually forces adjustments as the work moves forward.

At the same time, developers handle that pressure in different ways – some narrow their focus and push through tight deadlines, while others start by asking whether those deadlines even make sense.

The truth about deadlines

Deadlines sound like a nightmare for an engineering mind, so we asked our developers how they deal with them.

Some described how short deadlines increase focus and force prioritization. Others questioned whether extreme deadlines reflect process problems rather than urgency.

Hrvoje Rančić, Senior Software Engineer, described how constant pressure can indicate systemic misalignment rather than effective planning:

I realized that deadlines can sometimes be a mechanism of manipulation. If there are too many short deadlines, something is wrong in the process. Either people are making unrealistic promises, expectations are unrealistic, or there is poor communication between product and engineering.

At the same time, other participants admitted that deadlines can help with focus and discipline, but only when they are based on a realistic scope of work, not on constant escalation of pressure.

The point is not that deadlines are bad, but that misaligned and unrealistic deadlines often point to deeper organizational issues.

Engineering is often about people, not just code

The stereotype of the isolated, antisocial engineer still exists, even though modern software development is highly collaborative. That misconception often clashes with the reality of teamwork, where communication, negotiation, and constant alignment are a key part of the job.

Kristina Valjak, Engineering Lead, summarized it:

Some people think engineers are introverts who don’t socialize, who sit in basements and stare at screens all day. Maybe it’s true that we’re not social enough, but in reality, everyone is smart.

The irony is that engineering work is rarely isolated. Most of the tension developers describe does not come from code, but from coordination among developers, product teams, and management.

How developers smooth over the truth at work

Developers often tell small, strategic lies to keep work flowing smoothly. These are not dramatic deceptions, but everyday adjustments such as overconfidence in estimates, downplaying uncertainty, or agreeing in meetings while problems are resolved in the background.

Emin Mulaimović, Junior AI Engineer, reflected on this broader communication culture, recalling an anecdote from his first job:

At my first job they told me not to say I was happy at work, so I kept complaining all the time. I think that’s the only lie I tell. I don’t tell them how much fun I actually have working there and how much I enjoy the job.

In another example, a developer described solving an issue during a break and returning to the meeting without interrupting the flow of discussion. In other cases, developers admitted to deliberately inflating estimates or simplifying status reports to avoid unnecessary tension.

This behavior is not so much about deception as it is about reducing “noise” in systems that are already complex enough.

Want to hear more? Check out the video.

Special thanks to our colleagues at Infobip, the publisher of ShiftMag!

The post The Small Lies Developers Tell to Keep Work Moving appeared first on ShiftMag.


Who Owns the Code Claude Wrote?


The following article originally appeared on Sena Evren’s Legal Layer newsletter and is being reposted here with the author’s permission.

TL;DR

Agentic coding tools like Claude Code, Cursor, and Codex generate code that may be uncopyrightable, owned by your employer, or contaminated by open source licenses you cannot see. Some of this is settled law, some is actively contested, and this piece is clear about which is which. If you are shipping AI-assisted code and have not thought about any of this, this piece is for you.

If you shipped code this week, some of it was probably written by an AI. The question of who legally owns that code is less settled than most developers assume, and the answer depends on three things that have nothing to do with how good the code is:

  1. Whether a human made enough creative decisions to establish copyright
  2. Whether your employment contract already assigned it to your employer
  3. Whether the model pulled from GPL-licensed training data and quietly contaminated your codebase

On March 31, 2026, Anthropic accidentally published 512,000 lines of Claude Code’s source code in a routine software update through a missing configuration file. Before sunrise, the codebase was mirrored across GitHub. Before breakfast, a developer had used an AI tool to rewrite the entire thing in Python, and the “claw-code” repository hit 100,000 GitHub stars in a single day, the fastest in history. Then came the DMCA takedowns, and then came the question nobody had a clean answer to:

If Claude Code was, by Anthropic’s own lead engineer’s admission, predominantly written by Claude itself, does Anthropic even own it? Can you issue a DMCA takedown for code that copyright law may not protect?

That incident compressed every open question about AI-generated code ownership into a single news cycle. The same questions apply to your codebase.

Three risks in every AI-assisted codebase

The copyright rule nobody told you

Here is the legal baseline, in plain terms: Copyright only protects work created by a human.

The US Copyright Office has confirmed this consistently, and the DC Circuit upheld it in the Thaler case. When the Supreme Court declined to hear the Thaler appeal in March 2026, it did not endorse the lower court’s reasoning or settle the question nationally. Cert denial means the court chose not to hear the case, nothing more. What it does mean is that the DC Circuit’s ruling stands, the Copyright Office’s position is intact, and no court has yet gone the other way. Works predominantly generated by AI without meaningful human authorship are not eligible for copyright protection under current doctrine, and that position is stable even if it is not finally settled.

There are two important limits on what Thaler actually decided:

  1. The case involved a painting created with zero human involvement at all. Thaler listed the AI system as sole author and made no claim of any human creative contribution. The ruling does not directly address the harder question of AI-assisted work where a human was involved but the degree of that involvement is disputed.
  2. Thaler involved visual art. No court has yet applied the human authorship doctrine specifically to code output from an AI coding tool. The logic applies, but the direct precedent does not exist yet.

What it means for you: Code that Claude Code or Cursor generated and you accepted without meaningful modification may not be copyrightable by anyone. If a competitor copies it, you may have no legal recourse, because the code sits in the public domain in everything but name.

What counts as meaningful human authorship?

The phrase that determines whether your code is protected is “meaningful human authorship,” and the Copyright Office has deliberately refused to quantify it with a percentage or a number of edits, because what courts look for is evidence that a human made genuine creative decisions:

  • Choosing the architecture
  • Deciding what to reject
  • Restructuring the output to fit a specific design

Specifying an objective to the model is not enough. Directing how the work is constructed is what counts.

In an agentic workflow, this distinction is harder to establish than it sounds. Consider a typical Claude Code session:

  • You write a one-line prompt: “build a rate limiting module for the API.”
  • Claude Code plans the approach, generates five files, and iterates through three versions.
  • You review the output, run the tests, and merge.

Your contribution in that sequence is your architectural intent and your final approval. Whether that constitutes meaningful human authorship in a courtroom is an unresolved question with no definitive court ruling yet.

The honest answer is: probably yes for modules you substantially redirected, probably no for code you accepted verbatim, and unclear for everything in between.

The middle ground is actively being litigated right now. In Allen v. Perlmutter, artist Jason Allen is challenging the Copyright Office’s denial of registration for a work he created using more than 600 detailed prompts and subsequent editing in Photoshop. The Copyright Office acknowledged the Photoshop edits as human-authored but still denied registration for the AI-generated underlying elements. That case has not been decided yet, and whatever it decides will be the closest thing to a ruling on how much human involvement is enough.

The closest existing precedent on partial protection is Zarya of the Dawn, a graphic novel where the Copyright Office granted registration for the human-authored text but denied it for the Midjourney-generated images. That decision establishes a practical principle developers can use right now: The human-authored elements of an AI-assisted codebase may be separately protectable even if the generated code itself is not. Your architecture documents, your design decisions recorded in commit messages, your ADRs, and your prompt logs showing deliberate redirection may all be protectable as human-authored expression even if the code they produced is not. Protecting what you can starts with documenting what you actually did.

What your employer probably already owns

Before you think about whether your code is copyrightable, there is a more immediate question: Even if it is, is it actually yours?

Your employment contract almost certainly says that anything you build at work belongs to your employer. That principle has a name in copyright law: the work-for-hire doctrine. Under it, any code created by an employee within the scope of their employment is owned by the employer, who is treated as the legal author, regardless of whether the code was written by hand, generated by Claude Code, or some combination. Using an AI coding tool during work hours, on a work project, on a work machine, does not change who owns the result.

Most employment contracts go further than the doctrine’s defaults. Look for a section in yours called “Intellectual Property,” “IP Assignment,” or “Work Product.” Open the contract, search for those terms, and read that section. A clause that says any of the following almost certainly covers your AI-assisted code:

  • “Any work product created using company equipment or resources”
  • “Any invention or development made during the term of employment”
  • “Any software created with the assistance of company-licensed tools”

The third one is the one to watch. If your employer licenses Claude Code, Cursor, or Copilot for the team, and you use those same tools to build a side project, a broad IP assignment clause may give the employer a claim over that project, even if you built it on your own time.

A senior developer in San Francisco described exactly this situation earlier this year. He had used Claude Code for work projects and for a personal fitness tracking app built on evenings and weekends. His company updated its IP policy and claimed everything he had built with AI assistance, including the personal app, arguing that because Claude had access to open work files in the IDE, any AI output was a derivative work of company IP.

This is the clearest example of how far this can stretch. His company’s claim rested on one phrase: The AI tools were “context-aware” of his company’s codebase. The argument does not hold up legally, because context visibility in an IDE does not make AI output a derivative work of files that were open nearby, and the connection between what Claude can see and what it generates is probabilistic pattern completion, not copying. But the argument illustrates what employers are starting to claim. If the clause is broad enough, it has surface validity regardless of what the AI actually did.

The practical rule: If you are building something on the side, use a personal account, a personal machine, and tools you pay for yourself. Keep your employer’s licensed tools out of that workflow entirely.

The open source contamination problem

Even if you own your AI-generated code, you may have already contaminated it with an open source license you cannot see.

AI coding tools are trained on massive amounts of public code, including code licensed under the GPL, LGPL, and other copyleft licenses. Copyleft licenses carry a specific obligation that travels with the code:

  • If you distribute software that is a derivative of GPL-licensed code, you must release your own source code under the same license.
  • This applies even if you did not know the code you incorporated was GPL-licensed.
  • “I did not know” is not a defense to a copyleft violation.

The GPL contamination chain

When an AI tool reproduces a substantial verbatim portion of GPL-licensed code from its training data, and you ship that code in a commercial product without releasing source, you may have created a copyleft violation without ever touching the original repository. The legal standard for infringement is substantial verbatim reproduction, not functional similarity or resemblance, and this distinction matters: an AI tool generating code that works like GPL code is different from an AI tool that reproduces GPL code word for word. The risk sits at the verbatim end of that spectrum, and the problem is that you have no way to know which side of the line your codebase is on without running a scan.

The chardet community dispute made this concrete in early 2026. This was not a filed lawsuit but a public dispute within the open source community that raised the question without resolving it legally. A developer used Claude to rewrite chardet, a Python character encoding library, and rereleased it under an MIT license, arguing that the AI rewrite was a “clean room” implementation free of the original LGPL license.

The legal question the community fought over: If Claude was trained on the LGPL-licensed codebase and its output reproduces substantial verbatim portions of that code, can the output be treated as license-free? The chardet dispute did not resolve cleanly and no court has issued a definitive ruling on this specific question. What is settled is that verbatim copying of GPL code violates the license regardless of how it was produced. What is unsettled is whether AI-generated output that reproduces training data patterns counts as verbatim copying. The working assumption among lawyers advising companies through M&A is that it probably does, and that assumption is now showing up as a standard condition in acquisition due diligence.

The Doe v. GitHub litigation, still working through the Ninth Circuit as of April 2026, is asking whether GitHub Copilot reproduces licensed code without attribution in violation of copyright law and DMCA Section 1202. The district court dismissed most claims, but the appeal is live. Whatever the outcome, the litigation has already changed industry behavior: GitHub Copilot added duplicate detection filters, and acquisition due diligence now routinely includes an AI codebase license scan.

What to do about all of this

Your four actions before you ship

Four concrete actions, none of which require a lawyer.

1. Run a license scan on your AI-assisted codebase

Tools that do this well:

  • FOSSA—most comprehensive, widely used in enterprise
  • Snyk Open Source—good for dev-team workflows, integrates with GitHub
  • Black Duck—standard in M&A due diligence

Each will scan your codebase, flag code that matches known open source libraries, and identify the licenses attached. If you are shipping a commercial product and have never run one of these, you are operating on assumption. The scan takes an afternoon and costs less than the first hour of a copyright dispute.

2. Document your human creative contributions as you go

The evidence that establishes meaningful human authorship is the same evidence you already produce in a normal engineering workflow. You just have to keep it deliberately rather than letting it disappear.

What to preserve:

  • Commit messages that describe what you changed and why, not just what the AI generated. “Restructured Claude’s module architecture, rejected initial state management approach, rewrote error handling from scratch” is evidence. “Add rate limiting module” is not.
  • Prompt logs. Claude Code and Cursor both retain interaction history. Export or screenshot the sessions where you made significant architectural decisions.
  • Design documents, ADRs, or any notes that predate the generated code and show you specified the structure before the AI built it.

The first commit message versus the second is the difference between a defensible authorship claim and a clean “Claude wrote this” record.

3. Read the IP clause in your employment contract before you build anything on the side

Open your contract, search for “intellectual property,” “IP assignment,” or “work product,” and read that section carefully. The specific language determines your exposure:

  • “Work product created during employment hours” is narrower than “work product created using company resources.”
  • “Relating to the company’s business” is narrower than “any software development.”
  • “Company-licensed tools” is the phrase that captures AI coding tools even on personal projects.

If the clause is broad and you want to build something independently, you have three realistic options: negotiate a written carveout before you start (easier at the start of a new role than mid-employment), use entirely personal tools on entirely personal time on a personal machine, or accept that the claim exists and decide whether the risk is worth it.

4. Check which Anthropic plan you are on before shipping for commercial use

Go to anthropic.com/legal and compare the consumer terms against the commercial terms. The difference that matters:

  • Consumer terms (free and Pro plans): Anthropic assigns outputs to you, but the IP indemnification is narrower and covers fewer scenarios.
  • Commercial terms (API and enterprise): Anthropic assigns outputs to you and will defend you against copyright infringement claims arising from your authorized use of the service and its outputs.

If you are shipping AI-assisted code in a commercial product using the free or Pro plan, the indemnification gap is real. The API or enterprise agreement is the appropriate tier. Note that neither indemnification covers a downstream GPL violation from license contamination in your codebase. That is your governance problem to solve with the license scan in action 1.

The thing worth sitting with

Anthropic’s own lead engineer publicly stated that his recent contributions to Claude Code were written entirely by the AI, and the leaked codebase that Anthropic issued 8,000 DMCA takedowns to suppress may be predominantly AI-authored. Whether Anthropic’s copyright claims over that codebase are legally valid remains an open question no court has yet resolved.

If the company that built the tool cannot cleanly assert copyright over its own AI-assisted code, the question of whether you can is worth taking seriously before it becomes relevant in a transaction, a dispute, or an acquisition conversation. The developer who documents their creative contributions from the start is in a meaningfully different legal position than the one who accepted three thousand lines of Claude output and merged without review, even if both shipped the same product.

A note on what this piece covers and what it does not

Three things in it are settled law:

  • Works lacking human authorship are uncopyrightable.
  • The work-for-hire doctrine applies regardless of how code was generated.
  • Verbatim copying of GPL-licensed code violates the license.

Two things are emerging consensus without definitive court rulings yet:

  • How much human direction is enough to establish meaningful authorship in an agentic workflow
  • Whether AI output that reproduces training data patterns counts as verbatim copying

One thing is genuine speculation:

  • Whether any of this will be litigated at scale in the near term

Most code copyright claims never reach court. The place where the unsettled questions become concrete today is M&A due diligence and institutional fundraising, where acquirers and investors are already asking these questions as a condition of closing.

If neither of those applies to your situation right now, the four actions above are still worth doing, but the urgency is lower than the piece might imply.

Further reading

1. US Copyright Office—Copyright and Artificial Intelligence (Part 2: Copyrightability)
The primary regulatory source on what qualifies as meaningful human authorship in AI-assisted works. Part 2 covers the specific tests the Office applies when reviewing AI-generated content registrations. Essential if you want to understand exactly where the legal line sits.

2. Andersen v. Stability AI, Midjourney, DeviantArt—Ninth Circuit docket
The foundational case on AI training data and copyright infringement, currently shaping how courts think about what AI models learn and reproduce. Relevant to the GPL contamination question in a way most developers have not connected yet.

3. Doe v. GitHub, Inc.—Ninth Circuit appeal
The live litigation on whether Copilot reproduces licensed code without attribution. Track this one: The Ninth Circuit decision will set the standard that determines whether AI-generated code carrying open source patterns constitutes copyright infringement.

4. GitHub—Copilot and copyright: What you need to know
GitHub’s own legal position on why Copilot outputs are not infringing. Worth reading as a counterpoint: Understanding the argument they make helps you understand where it is strong and where it has limits, particularly on the GPL training data question.

5. FOSSA—Understanding open source license obligations
A developer-friendly reference to how copyleft obligations actually work in practice: what triggers the source disclosure requirement, what constitutes a derivative work, and how the GPL, LGPL, and AGPL differ in their reach. The clearest plain-language guide available on this topic.

6. Anthropic—Usage Policy and Terms of Service
The actual document that determines your IP rights and indemnification scope when you use Claude commercially. Read sections 7 and 8 specifically: output ownership and IP indemnification. The difference between the consumer and commercial terms is stated plainly and takes 10 minutes to understand.

I write about legal architecture for AI products at Legal Layer. This piece is informational and does not constitute legal advice.




Turn messy production code into a useful benchmark


TL;DR: Useful benchmarks are controlled experiments. Copy the relevant production code, trim away unrelated work, choose realistic parameters, measure one responsibility, and use short runs for direction before spending time on full runs.

Most benchmark examples look cleaner than the code we work with.

Suspiciously cleaner.

They compare string concatenation with StringBuilder. They call a static method. They pass one value in and return one value out. Those examples are useful for learning BenchmarkDotNet, but production code rarely looks like that.

Production software is usually a disgusting festering mess, but it makes money. It has dependency injection, callbacks, options, asynchronous boundaries, global state, caches, feature flags, and a decade of decisions hiding in the corners.

That mess does not mean we cannot benchmark it. It means we need to treat the benchmark as a controlled experiment.

Benchmarks only look like unit tests

The first trap is mental. A BenchmarkDotNet benchmark looks a little like a unit test: a class, attributes, methods, and a runner. But the result is completely different.

A unit test gives a pass or fail signal. A benchmark produces measurements: mean, median, standard deviation, allocation counts, ratios, outliers, and distributions. To get those measurements, the benchmark runs many times.

That changes what we should measure. Unit tests should cover edge cases. Benchmarks should focus on common hot-path cases unless an edge case is operationally important.

A benchmark should answer one cost question clearly. If it tries to answer five, the result becomes hard to trust.

Copy the code, then remove the noise

For the pipeline investigation, I copied the relevant pipeline infrastructure into a dedicated benchmark project. That sounds wrong because copy-paste is how many codebases get into trouble.

Here, copy-paste was the safety rail.

The copy was intentional. It froze the code under investigation and made the experiment controllable.

Once the code was copied, the cutting started. Irrelevant behaviors came out. The dependency injection container came out. Real input/output was replaced with completed tasks. The benchmark did not need to compare containers, transports, databases, serializers, or cloud services. It needed to compare pipeline invocation before and after the proposed optimization.

Figure: folder structure for the copied pipeline benchmark experiment. The benchmark project keeps multiple optimization steps side by side so they can be compared.

This kind of benchmark is not a second implementation of the product. It is a lab bench. The value comes from making the experiment small enough to understand while keeping enough real code that the result still means something.

Controlled does not mean fake.

Measure one thing at a time

A good benchmark follows the single responsibility principle. If the question is “how fast does the pipeline execute,” then pipeline construction should not be part of the measured method. Setup belongs in [GlobalSetup]. The benchmark method should execute the pipeline.

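using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

// Pipeline types and the CreateBeforePipeline/CreateAfterPipeline helpers
// come from the copied production code and are not shown here.
[ShortRunJob]
[MemoryDiagnoser]
public class PipelineExecutionBenchmark
{
    BaseLinePipeline<IBehaviorContext> pipelineBeforeOptimizations;
    PipelineOptimization<IBehaviorContext> pipelineAfterOptimizations;
    BehaviorContext behaviorContext;

    // Each value produces a separate set of results.
    [Params(10, 20, 40)]
    public int PipelineDepth { get; set; }

    // Construction happens once per parameter value, outside the measured path.
    [GlobalSetup]
    public void SetUp()
    {
        behaviorContext = new BehaviorContext();
        pipelineBeforeOptimizations = CreateBeforePipeline(PipelineDepth);
        pipelineAfterOptimizations = CreateAfterPipeline(PipelineDepth);
    }

    // Baseline: invocation before the optimization.
    [Benchmark(Baseline = true)]
    public Task Before()
    {
        return pipelineBeforeOptimizations.Invoke(behaviorContext);
    }

    // Candidate: invocation after the optimization, reported relative to the baseline.
    [Benchmark]
    public Task After()
    {
        return pipelineAfterOptimizations.Invoke(behaviorContext);
    }
}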

The shape matters. [MemoryDiagnoser] adds allocation data. [Params] runs the same benchmark at several pipeline depths. [GlobalSetup] keeps setup out of the measured path. The two benchmark methods compare old and new invocation.

If construction time matters, write a separate benchmark for construction time.
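A minimal sketch of what such a separate construction benchmark could look like. `SketchPipeline` is a hypothetical stand-in invented for this example, not the pipeline type from the investigation; only the BenchmarkDotNet attributes are real. The measured method does nothing but construct.

```csharp
using System;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

// Hypothetical stand-in for a real pipeline type (an assumption for illustration).
public sealed class SketchPipeline
{
    readonly Func<Task> entryPoint;

    public SketchPipeline(int depth)
    {
        // Compose `depth` nested steps once, at construction time.
        Func<Task> next = () => Task.CompletedTask;
        for (var i = 0; i < depth; i++)
        {
            var inner = next;
            next = () => inner();
        }
        entryPoint = next;
        Depth = depth;
    }

    public int Depth { get; }

    public Task Invoke() => entryPoint();
}

[ShortRunJob]
[MemoryDiagnoser]
public class PipelineConstructionBenchmark
{
    [Params(10, 20, 40)]
    public int PipelineDepth { get; set; }

    // Construction is the only measured work; returning the instance
    // lets BenchmarkDotNet consume the result.
    [Benchmark]
    public SketchPipeline Construct() => new SketchPipeline(PipelineDepth);
}
```

Keeping this in its own class means the execution numbers and the construction numbers never share a results table by accident.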

Choose parameters from reality

The pipeline benchmark used depths of 10, 20, and 40. Those values were not random. They came from real customer cases and support knowledge about how deep pipelines tend to get after NServiceBus and customers add behaviors.

Could the benchmark use 5, 10, 15, 20, 25, 30, 35, 40, and 50? Sure. It would also take longer and probably teach less.

Keep the matrix small enough that developers still run it while they are thinking.

Use production evidence where you can: telemetry, logs, support cases, known customer configurations, or load-test data. If you have none of that, start with a small set of plausible values and write down the assumption.

A benchmark with honest assumptions is useful. A benchmark with accidental assumptions is dangerous.

Use short runs to steer and long runs to trust

The improve-and-benchmark part of the performance loop is iterative. Try a change. Run the benchmark. Learn something. Try another change. If each run takes fifteen minutes, that loop becomes painful.

[ShortRunJob] is useful while steering. It gives quicker feedback, which is enough to tell whether an idea is obviously wrong or promising. It is not the final proof.

Once the direction looks good, remove the short-run shortcut and run a longer benchmark. Let BenchmarkDotNet do its job: warmups, iterations, statistical analysis, and allocation diagnostics.
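One way to switch between the two modes without editing attributes is to let the runner take the job from the command line. This is a sketch, assuming the benchmark project has a console entry point and that the `[ShortRunJob]` attribute has been removed from the benchmark class so the job comes only from the arguments.

```csharp
using BenchmarkDotNet.Running;

public static class Program
{
    // Forwarding args lets the job length be chosen per run
    // instead of being hard-coded on the benchmark class.
    public static void Main(string[] args) =>
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
}
```

With that in place, `dotnet run -c Release -- --job short --filter "*PipelineExecutionBenchmark*"` gives the steering run, and dropping `--job short` falls back to the default job for the run you intend to trust.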

Short runs help you move. Long runs help you trust.

They also help with coffee.

When the serious benchmark is running, do not start a video call, a build, or three browser-heavy dashboards on the same machine. Get a beverage of your choice. I prefer tasty espresso. If your boss sees you at the coffee machine for the tenth time that day, you have a perfectly reasonable answer: “I am measuring the performance optimization that should reduce our cloud bill.”

Protect benchmark correctness

Benchmark code needs tests around it too. Not because the benchmark itself ships to customers, but because broken benchmark setup produces confident nonsense.

Watch for side effects. If a collection grows across iterations, the first iteration may measure a different operation than the last one. If a cache warms during the measured method and never resets, the benchmark may measure the cache state more than the code under investigation.
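A small sketch of the growing-collection problem and one possible guard. The benchmark class and its names are hypothetical; `[IterationSetup]` is a real BenchmarkDotNet attribute, though its documentation cautions against using it for very short microbenchmarks, so treat this as suitable for longer-running measured methods.

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class GrowingStateBenchmark
{
    List<int> items;

    [Params(100_000)]
    public int Count { get; set; }

    // Runs before every iteration, outside the measured code,
    // so each iteration starts from the same empty list.
    [IterationSetup]
    public void ResetState() => items = new List<int>();

    [Benchmark]
    public int Fill()
    {
        for (var i = 0; i < Count; i++)
        {
            items.Add(i);
        }
        return items.Count;
    }
}
```

Without the reset, later iterations would append to an ever larger list and measure a different operation than the first one did.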

Also prevent dead code elimination. The just-in-time compiler is allowed to remove work that has no observable effect. BenchmarkDotNet helps by consuming return values, and it provides consumer helpers for cases where returning a value is not enough.

  • Keep setup outside the measured method unless setup is the thing being measured.
  • Avoid state that grows across iterations unless growth is the scenario.
  • Make sure the measured work is consumed and cannot be removed.
  • Prefer explicit types in benchmark code when it reduces confusion.
  • Do not run heavy applications, video calls, or builds during serious benchmark runs.

BenchmarkDotNet protects against many common mistakes. It cannot protect against measuring the wrong thing.

Sometimes edge cases deserve a benchmark

The general rule is to benchmark common hot-path cases. The pipeline investigation had one exception: exception handling.

Normally, exception paths should not dominate a throughput benchmark. But in a messaging system, production incidents can produce thousands of exceptions while messages move to an error queue. Framework overhead still matters in that path because the system may need to fail fast, move messages safely, and recover capacity.
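A failure-path benchmark can sit next to the happy path by making the outcome a parameter. This is a sketch under stated assumptions: Invoke is a hypothetical stand-in for a handler, and returning -1 stands in for moving a message to an error queue. It is not the pipeline code from the investigation.

```csharp
using System;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class FailurePathBenchmarks
{
    // One run for the success path, one for the failure path.
    [Params(false, true)]
    public bool Fail { get; set; }

    // Hypothetical handler: throws when asked to fail.
    private static int Invoke(bool fail)
    {
        if (fail) throw new InvalidOperationException("simulated handler failure");
        return 1;
    }

    [Benchmark]
    public int Handle()
    {
        try
        {
            return Invoke(Fail);
        }
        catch (InvalidOperationException)
        {
            return -1; // stand-in for "move the message to the error queue"
        }
    }
}
```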

That is the difference between a rule and a thoughtless rule. Common cases come first. Edge cases get benchmarks when production behavior makes them important.

The rule serves the system, not the other way around.

The benchmark is not the finish line

After the benchmark shows an improvement, the temptation is to celebrate. Hold that thought.

The benchmark compared a controlled slice. The system still has serializers, transports, handlers, persistence, transactions, logging, and runtime behavior.

A benchmark can prove a local improvement. It still cannot prove the system became faster.

The next step is to put the improvement back into the profiling harness and check the larger picture again.

A benchmark tells us the change worked in isolation. The second profile tells us whether it still matters in context.

The profile found the target. The benchmark compared the change. The profiling harness reconnects the result to the system.

Common questions

Is copy-pasting production code into a benchmark repository acceptable?

Yes, for a controlled experiment. Do not confuse the copy with production source. Keep the benchmark isolated, document what was removed, and avoid treating the copied code as another product implementation.

How many parameters should a benchmark have?

As few as possible while still representing the important production cases. If adding a parameter does not change a decision, it probably does not belong in the first benchmark.
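For illustration, a first benchmark might carry a single parameter with two values. The class name and the values 4 and 64 are assumptions standing in for a "typical" and a "large" case; real values should come from production evidence.

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

// Hypothetical benchmark: copies a header dictionary of a given size.
public class HeaderCopyBenchmarks
{
    // Assumed values: a typical message and a large one.
    [Params(4, 64)]
    public int HeaderCount { get; set; }

    private Dictionary<string, string> _headers = new();

    [GlobalSetup]
    public void Setup()
    {
        _headers = new Dictionary<string, string>(HeaderCount);
        for (int i = 0; i < HeaderCount; i++) _headers[$"header-{i}"] = "value";
    }

    [Benchmark]
    public Dictionary<string, string> Copy() => new(_headers);
}
```

Every extra [Params] property multiplies the number of runs, which is one more reason to add a parameter only when it can change a decision.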

Should benchmarks run in continuous integration right away?

Not necessarily. Local benchmark experiments are a good way to build performance awareness. Automated regression testing is a later maturity step, and it has its own problems.


How Duende IdentityServer Filters Claims (And Why It Matters)

The other day, Anders Abel, founder of Sustainsys, and I were chatting about the parts of Duende IdentityServer that quietly do the right thing without anyone noticing. The conversation started from a recent post about ASP.NET Core cookie size limits, specifically Fix #7, which recommends using IdentityServer’s configuration system to control which claims end up in tokens.

What GitHub Copilot Taught Me About the AI Agent Harness

TL;DR 

GitHub Copilot didn’t just change how I write code — it changed how I think about AI agents. 

What made the difference wasn’t the model itself, but the AI agent harness around it: the context, tools, guardrails, and workflows that allow AI to operate safely inside real systems. This post breaks down what Copilot gets right — and what every enterprise should learn before deploying AI agents at scale. 

The Shift in Experience

As an app developer, my first real encounter with agentic AI happened during the daily grind of building software.

Like most developers, I started using GitHub Copilot for the basics: autocomplete, unit tests, refactoring, and explaining legacy code. At first, it felt like a smarter-than-average assistant, but strictly reactive. 

Then the experience changed. 

Copilot stopped just suggesting the next line of code and started participating in the workflow. It began reasoning over entire repositories, drafting architectural changes, and even working through tasks on a branch before handing the results back for review. 

The breakthrough wasn’t the AI model alone. It was the harness surrounding the model that makes it an agent. 

Why an AI Agent Harness?

An AI agent harness is the control layer that connects a reasoning model to real systems through context, tools, and rules — enabling safe, reviewable execution rather than freeform responses. This is a foundational concept for building scalable, trusted enterprise AI agents. 

GitHub Copilot succeeds because it lives where developers work: the codebase, the IDE, the terminal, and the pull request. It isn’t just a window; it’s an integrated participant in our existing environment. 

An agent harness is the controlled environment that allows an AI model to do useful work safely. It provides the model with the right knowledge, tools, guardrails, and human touchpoints. Instead of existing in a vacuum, the model fits into established engineering controls like diffs, approvals, and automated tests. 

This structure makes the AI trustworthy. We are seeing this same pattern emerge in other domains through harnesses like Microsoft 365 Copilot and Claude Co-work.

The Anatomy of an AI Agent Harness 

A functional harness consists of four key pillars: 

  • The Model: The reasoning core (e.g., GPT-4o, Claude 3.5, Gemini 1.5). 
  • The Context: The specific data fed to the model at the time of the request (RAG, open files, recent emails, or session state and memory). 
  • The Toolset: The integrations that allow the agent to act in the real world—reading files, executing scripts, or calling APIs. 
  • The Ruleset: The guardrails and system instructions that define not just what the AI can do, but what it is forbidden from doing. This includes permissions, policies, approval flows, and safety constraints. 

 

While the harness defines the control layer, the execution layer sits on top of it. 

This is where the agent: 

  • Interprets intent 
  • Generates or refines a plan 
  • Selects and orchestrates tools (“skills”) 
  • Iteratively executes tasks and evaluates outcomes 
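The pillars and the execution steps above can be sketched in a few lines. Everything here is hypothetical: the type names, the delegate shapes, and the outcome strings are illustration only and not the API of GitHub Copilot or any real agent framework. The sketch makes a single pass over a plan, where a real agent would loop and re-plan.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types; not taken from any real agent framework.
public sealed record ToolCall(string Tool, string Argument);

public sealed class AgentHarness
{
    private readonly Func<string, string, IReadOnlyList<ToolCall>> _model;      // the model: (intent, context) -> plan
    private readonly string _context;                                           // the context
    private readonly IReadOnlyDictionary<string, Func<string, string>> _tools;  // the toolset
    private readonly ISet<string> _needsApproval;                               // the ruleset
    private readonly Func<ToolCall, bool> _approve;                             // the human touchpoint

    public AgentHarness(
        Func<string, string, IReadOnlyList<ToolCall>> model,
        string context,
        IReadOnlyDictionary<string, Func<string, string>> tools,
        ISet<string> needsApproval,
        Func<ToolCall, bool> approve)
    {
        _model = model;
        _context = context;
        _tools = tools;
        _needsApproval = needsApproval;
        _approve = approve;
    }

    public IReadOnlyList<string> Run(string intent)
    {
        var outcomes = new List<string>();

        // Interpret the intent and generate a plan.
        foreach (ToolCall call in _model(intent, _context))
        {
            // Only tools registered with the harness can run.
            if (!_tools.TryGetValue(call.Tool, out var tool))
            {
                outcomes.Add($"{call.Tool}: rejected (unknown tool)");
                continue;
            }

            // Guarded tools need an explicit human approval.
            if (_needsApproval.Contains(call.Tool) && !_approve(call))
            {
                outcomes.Add($"{call.Tool}: blocked (approval denied)");
                continue;
            }

            // Execute and record a reviewable outcome.
            outcomes.Add($"{call.Tool}: {tool(call.Argument)}");
        }

        return outcomes;
    }
}
```

The point of the design is that the model only proposes. The harness decides what runs, and every step leaves an outcome a human can review.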

Agent vs. Chatbot: What’s the Difference? 

When I use Copilot, I’m not looking for generic answers; I’m seeking help with a specific task. I might ask it to refactor a component or fix a bug. The value lies in the AI’s proximity to the work. 

That is the fundamental divide: 

  • A chatbot waits for a question. 
  • A harnessed agent understands the workspace, follows an approved process, and produces a reviewable outcome. 

In a business context, this means moving the AI out of a separate tab and into the flow of Outlook, Teams, and Excel—connected through enterprise knowledge like SharePoint. 

In short, chatbots optimise for conversation, while harnessed agents optimise for outcomes inside governed systems. 

Moving Beyond Traditional Apps 

Traditional apps rely on screens, forms, and menus. They are powerful, but they place the “cognitive load” on the user. A salesperson must know where the proposal template lives; a project manager must hunt through five different spreadsheets to find the “source of truth.” 

AI Agent Harnesses flip this dynamic. 

A harness sits across systems to bridge the gap between intent and outcome. Instead of navigating five different platforms, a user can simply say: “Prepare me for this customer meeting.” 

This doesn’t mean the agent should act autonomously. Just as GitHub Copilot shouldn’t blindly merge code into production, a business harness shouldn’t approve expenses or change records without oversight. The goal isn’t total autonomy; it’s assisted execution with reviewable outcomes. 

 

What Did GitHub Copilot Teach Me About Building Effective AI Agent Harnesses?

  1. Bring the agent to the work, not the work to the agent: Developers don’t copy-paste their entire repo into a chatbot. They use Copilot where the code lives. Effective business AI must live in the tools teams already use.
  2. Context is everything: Copilot shines when a repository is well-structured and documented. Similarly, a business agent is only as good as the data and context you provide it.
  3. Focus on outputs, not just answers: Don’t just ask, “Tell me about this project.” Ask the agent to “Create a one-page executive summary including risks and next actions.” Value lies in the artifact, not the conversation.
  4. Keep humans in the loop: We don’t treat generated code as perfect. We test it, review the diff, and then merge. Apply the same rigour to AI-generated business reports or data analysis.
  5. Turn repeated prompts into reusable patterns: If your team constantly asks an AI to summarise project status, don’t keep typing the prompt. Turn that pattern into a reusable agent skill or a specialised agent.

The Bottom Line 

The real unlock in AI isn’t autonomy; it’s alignment. When agents are harnessed to enterprise systems, governed by existing controls, and designed for human review, they stop being experiments and start becoming teammates. Organisations that invest in harnesses, not just models, will be the ones that turn AI from novelty into capability.

Thinking about building AI agents that actually fit your environment? 

Arinco works with engineering and platform teams to design agent harnesses that integrate with real workflows (Microsoft 365, Azure, GitHub, and line-of-business systems) with the guardrails enterprises need. Learn more about Arinco’s AI expertise.
