Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
156907 stories
·
33 followers

7.0.1767.0

1 Share
No content.
Read the whole story
alvinashcraft
15 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

dotnet-1.13.0

1 Share

What's Changed

  • .NET: Harden dotnet-format workflow shell handling by @SergeyMenshykh in #6796
  • .NET: Add AgentSkillsSourceContext to AgentSkillsSource.GetSkillsAsync by @SergeyMenshykh in #6797
  • .NET: Foundry Hosting per-user session isolation and Responses v2 protocol fast-fail by @rogerbarreto in #6832
  • .NET: Pin patched OpenAPI dependencies to unblock NU1903 in sample restores by @rogerbarreto with @Copilot in #6853
  • Make skills source classes public with Experimental attribute by @SergeyMenshykh in #6838
  • .NET: Consolidate skill-source caching and make skill sources disposable by @SergeyMenshykh in #6827
  • .NET: Add skill approval options by @SergeyMenshykh in #6843
  • .NET: [BREAKING] Add file editing tools and align FileAccess/FileMemory store API by @westey-m in #6807
  • .NET: [BREAKING] Refactor OpenAI Hosting OptionsMapping to disallow passing options by default by @westey-m in #6855
  • .NET: Remove Experimental attribute from Skills API in Microsoft.Agents.AI by @SergeyMenshykh in #6861
  • .NET: Foundry Hosting gracefully tolerates lacking user identity when run locally by @rogerbarreto in #6870
  • .NET: Bump Azure.AI.Projects to 2.1.0-beta.4 by @rogerbarreto in #6795
  • .NET: Make default-approval harness features configurable + customizable shell tool by @westey-m in #6880
  • .NET: Improving DotNet samples by @alliscode in #6869
  • .NET: fix: Require explicit TokenCredential in AddFoundryToolboxes by @SergeyMenshykh in #6877
  • .NET: Update .NET version to 1.13.0 by @SergeyMenshykh in #6900

New Contributors

Full Changelog: dotnet-1.12.0...dotnet-1.13.0

Read the whole story
alvinashcraft
20 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

The best code is the one you shift+delete

1 Share

Everyone talking about coding models fixates on the same number: how fast the thing generates code. This misses the point by a lot. The story isn't about how fast the model writes code I would have written anyway.

It's that the model lets me do things that I might have done before but were expensive enough that I didn’t bother. I had three separate interactions this week that led to this blog post.

We had a production problem on an instance and no clear idea what was going on. What we did have was the log: something like 25-30 MB of compressed text describing everything that happened. And the actual problem wasn't spotting an error: finding errors is easy. The problem was correlation. We needed to line up different events across the timeline and understand how they were related.

In the past, I would have to trawl through the log and hope that something would pop up. These days, we can try handing the whole thing to the model and let it figure it out. If the log file wasn’t that big, it might even work. At dozens of MB, it doesn’t work (and it is quite expensive to try).

I went the other way. I told the model: “Write me a script that looks at the structure of this log (I gave it the first ten rows). I want the script to extract and aggregate the parts I care about, and render the result in a nice table to make it easier to understand.”

I had the view in under a minute, then I could explore the log and iterate:

  • “Oh, I see that there are a lot of indexes. How many of them are for the same database?”
  • “Give me a histogram of index changes and their versions over time.”

The model wrote some code, produced a view, and I looked at it. Rinse & repeat until I had a pretty good idea what was going on.

The customer had several different versions of their application, each with its own set of indexes, and they kept overwriting one another, leading to a huge amount of indexing overhead. RavenDB actually has a dedicated feature for that scenario.

Here's the part that matters: I never read the code the model wrote. The moment the investigation was done, I threw all of it away. It's throwaway code whose entire purpose was to help me see, and once I had seen enough, I discarded it.

Without the model to write this code, I could have written it myself, but it is enough of a chore that it probably wouldn’t make sense. Doing that manually would have taken roughly the same amount of time.


The second interaction is the opposite kind of work. I'm doing a fairly significant refactor of how a particular query executes in Corax, and that code is going into the product and staying there for a decade or two.

Here, the model writes and I drive. I tell it the overall direction, it goes somewhere, and then I decide if I like the result. I find it genuinely easier to react to something than to produce it from a blank page — having a first draft to push against is faster than writing it all myself. Nevertheless, this is my code. I went over every single line, and I know exactly what's in there.

That last part takes real discipline, and it's worth being honest about why. When you're in the zone chasing a change (try something out, revert, try something else, etc.), it is very easy to surface a few hours later staring at two thousand lines of changes you never actually wrote. You went through a dozen iterations, and somewhere in there the code stopped being something you authored and became something that merely happened. Guarding against that is really important, because otherwise that isn’t your code.

How do I make sure it's still mine? I lean on tests, of course — regression tests to prove I didn't break the old behavior, and new tests built alongside the change to pin down the new behavior. That's the baseline for anything long-lived.

The technique I found most useful for confirming that the change is really mine is a little unusual. I had it build a harness that runs a set of scenarios against both the old version and the new one. It's a small app that issues queries and operations to the database and visualizes the results.

Here is what this looked like:

You can see that I have a bunch of scenarios that I’m testing, and it is very easy for me to track progress and know where I need to pay attention. The actual app had a lot more capabilities: what got faster, what got slower, the ranges, the memory used, everything and the kitchen sink went into that, in a format that made sense for the sort of work I was doing.

Each time I had a new direction, it was either driven by this application or I asked the model to add it to the application, so I could keep working on it. I kept working until nothing in the new version was slower than the old, and the headline paths were dramatically faster.

As an example of what this looked like most of the time, I ran a query, and then I inspected the structure we got back. Here is what some of that looked like:

And as I went, I kept changing the harness itself — show me this instead, group it that way. Trivial to do, because the harness is also throwaway. I'm not carrying it forward. I don't care about its code quality. I never even looked at its code. It exists to make a point, and once it's made the point, it's gone.

I also used the model to add introspection hooks and visibility into what was going on inside the system, surfacing stuff that you would usually have to scratch your head and debug to understand. That meant that I was able to look at a problematic query, then just look at its query plan and the timing in it. I usually knew where I needed to pay attention from there.

To be honest, that part feels a lot like cheating.


In the cases of the log analyzer and the comparison harness, the code is literally disposable. It’s scaffolding that would be thrown away after the work is done. I didn’t pay any attention to that code (I never read it), and it was never meant to be useful for anything else.

In the case of the production code, I went over each line of code so many times, I dreamt of it. A lot of the code there consists of annoying building blocks (building a visualization of the query plan as a graph, for example), which were sped up enormously by asking the model to build it for us. A lot of other code there is hand-crafted to say exactly what I needed it to.

But the fact that I can get good scaffolding from the model for cheap changes a lot of the usual considerations. Because scaffolding is literally disposable code, I don’t have to worry about the usual code quality concerns. The log analyzer would probably take two or three hours to write (without the pretty graphics, which were helpful for easily identifying what was going on).

The comparison harness would be multiple weeks of effort and would probably be a non-interactive ASCII table. In fact, I don’t need to guess. Scaffolding isn’t something that is new, I do that all the time. Here is an example of one, written about a decade ago:

In the image above, you can find the internal structure of a B+Tree inside RavenDB. Contrast that with the following scaffolding for query plans. That one, by the way, is actually staying in the product.

Compare that to the level of insight that you can derive from the query plan higher up in this post. The B+Tree scaffolding, by the way, is essential to understanding the more complex scenarios. It paid for the time it took to write it many times over.

The ability to now effectively do the same at very little cost means that the act of building software itself is now easier. Not because someone else is writing the core code, but because everything else that we need to do is also easier.

Read the whole story
alvinashcraft
34 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Open Source AI Gap Map

1 Share

Open Source AI Gap Map

Current AI is "a global partnership building a public option for AI", founded as a non-profit at the AI Action Summit in Paris in February 2025 and backed by serious capital ($400m already committed).

They launched their Gap Map a couple of days ago - an attempt at indexing the current state of open source AI:

The Gap Map v0.1 details 421 products in depth: 266 software tools and libraries, 85 models, 50 datasets, and 20 hardware projects, produced by 228 organizations. These products are organized into 14 categories across 3 layers of the stack (model components, product / UX, and infrastructure). The remaining 24,400 artifacts constitute the uncategorized long tail of the open source AI ecosystem, and will carry no score until they are researched and cited.

The map itself is interesting to explore, but I'm more excited about the underlying data - released under an MIT license in the currentai-org/os-ai-map GitHub account: 1,184 YAML files plus the notebooks, schemas and other scripts used to help gather them.

Since the files are on GitHub you can use Datasette Lite to explore some of them - here are 16,185 GitHub repos the project is tracking as a CSV file loaded into Datasette Lite.

Tags: open-source, ai, datasette-lite, generative-ai, local-llms, llms

Read the whole story
alvinashcraft
46 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Context vs. Memory Engineering in Agentic AI Systems

1 Share
Compression on Arrival Tool outputs should be compressed after a call returns, not after the window fills.

Read the whole story
alvinashcraft
58 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Call For Papers Listings for 7/03

1 Share

A collection of upcoming CFPs (call for papers) from across the internet and around the world.

The post Call For Papers Listings for 7/03 appeared first on Leon Adato.

Read the whole story
alvinashcraft
1 minute ago
reply
Pennsylvania, USA
Share this story
Delete
Next Page of Stories