
The disappearing AI middle class

[Illustration: a professional figure steps through a standalone red door onto a precipice, gazing over a vast cloudy abyss, symbolizing the widening gap in the AI industry’s new "barbell economy."]

In 24 hours last week, OpenAI and DeepSeek made opposite bets on what frontier AI is worth. One says it is a closed product that just got more expensive. The other says it is open infrastructure that just got dramatically cheaper. The price gap between the two ends of the market is now wider than it has been in years, and the comfortable middle that most coding agents have been routing through is thinning out.

Until last week, you could pick a model on a fairly smooth price-performance curve. There was a top tier, a middle tier, and a budget tier, and most workloads found a comfortable home somewhere on the slope. That curve still exists, but it has stretched. What used to be a continuous gradient now looks more like two clusters with a gap in between, and developers building agents, coding assistants, and high-volume inference pipelines now have to think harder about which side to route to.

The comfortable middle that most coding agents have been routing through is thinning out.

The 24-hour split

On April 23, OpenAI shipped GPT-5.5, priced at $5 per million input tokens and $30 per million output tokens. That is exactly double the GPT-5.4 rate of $2.50 and $15. The model uses a 1M token context window and scores 82.7% on Terminal-Bench 2.0, up from 75.1% on GPT-5.4. OpenAI argues that the price hike is offset by token efficiency, claiming that GPT-5.5 uses fewer tokens to complete the same Codex task. The company has not published a precise effective-cost figure on its launch page, so the per-task economics depend on the workload.

On April 24, DeepSeek released V4-Pro and V4-Flash. V4-Pro is listed at $1.74 per million input tokens and $3.48 per million output tokens, with a launch discount documented through May 5, 2026. V4-Flash is priced at $0.14 input and $0.28 output. Both ship under the MIT license with full open weights on Hugging Face, and both default to a 1-million-token context window. V4-Pro hits 80.6% on SWE-bench Verified per the model card, within striking distance of Claude Opus 4.6.

Two pricing announcements in 24 hours, in opposite directions. At list price, V4-Pro output tokens cost roughly one-ninth as much as GPT-5.5 output. Under the launch discount, the gap widens further. V4-Flash sits another order of magnitude below that. The arithmetic is striking. The framing matters more.
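
To make the arithmetic concrete, here is a back-of-envelope comparison in TypeScript. The prices are the list prices from the table below; the per-task token counts and the 30% token-efficiency credit for GPT-5.5 are illustrative assumptions, not published figures.

// List prices in $ per 1M tokens, from the pricing table in this article.
const price = {
  'gpt-5.5': { input: 5.0, output: 30.0 },
  'v4-pro': { input: 1.74, output: 3.48 },
  'v4-flash': { input: 0.14, output: 0.28 },
} as const;

function taskCost(model: keyof typeof price, inputTokens: number, outputTokens: number): number {
  const p = price[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// Hypothetical agentic task: 200k tokens read, 30k written.
// Credit GPT-5.5 with 30% fewer tokens for the same task (an assumption):
console.log(taskCost('gpt-5.5', 140_000, 21_000).toFixed(3)); // ≈ $1.330
console.log(taskCost('v4-pro', 200_000, 30_000).toFixed(3)); // ≈ $0.452
console.log(taskCost('v4-flash', 200_000, 30_000).toFixed(3)); // ≈ $0.036

Even with a generous efficiency credit, the flagship stays roughly three times the cost of V4-Pro and nearly forty times V4-Flash on this workload.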

The widening gap for AI costs

Model | Input (per 1M) | Output (per 1M) | Context
OpenAI GPT-5.5 | $5.00 | $30.00 | 1M tokens
Anthropic Opus 4.7 | $5.00 | $25.00 | 1M tokens
DeepSeek V4-Pro | $1.74 | $3.48 | 1M tokens
DeepSeek V4-Flash | $0.14 | $0.28 | 1M tokens

What OpenAI is actually selling

GPT-5.5 is not just a smarter model. It is the centerpiece of a stack. Codex inherits the upgrade with expanded computer use, browser interaction, and longer agentic runs. In ChatGPT, it is the default model for the Plus, Pro, Business, and Enterprise tiers. The API gets it with the same 1M context window the consumer surface now has.

The bet is that intelligence, the serving stack, the agent harness, and computer use are one product, and that product is worth twice the per-token price of the previous generation. Greg Brockman framed it during the launch briefing as a model that takes a sequence of actions, uses tools, checks its own work, and keeps going until a task is finished. The customer is the enterprise that wants the whole thing from a single vendor, with a single API key, a single safety review, and a single billing line. OpenAI is not selling tokens. It is selling outcomes, and outcomes are now priced accordingly.

OpenAI is not selling tokens. It is selling outcomes, and outcomes are now priced accordingly.

This also explains the cadence. GPT-5.4 shipped in early March. GPT-5.5 followed six weeks later. That is not a benchmark race. It is an enterprise procurement strategy. OpenAI is releasing fast enough to stay the default in every Q3 budget conversation, and pricing high enough to fund the next training run without diluting the premium positioning. The closed product is the moat.

OpenAI has not retired the cheaper tiers. GPT-5.4, GPT-5.4 mini, and GPT-5.4 nano remain on the price list, alongside Batch, Flex, Priority, and cached input rates. The middle of the OpenAI catalog still exists. What changed is where the flagship sits, and the flagship is what coding agents and frontier workloads default to.

What DeepSeek is actually shipping

V4 is not a price war move. The pricing is downstream of three different decisions.

The first is architectural. V4-Pro is a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion active per token. V4-Flash runs 284 billion total with 13 billion active. DeepSeek’s model card describes a hybrid attention scheme that combines compressed sparse attention with heavily compressed attention, designed to reduce 1M-token inference FLOPs and KV cache. The model achieves near-frontier benchmark scores while activating a small fraction of its weights per token. Smarter architecture, less compute.
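
For a sense of scale, the active fraction is simple arithmetic on the model-card figures quoted above (a quick TypeScript check, nothing more):

// Active-parameter fraction per token, from the figures above.
const variants = [
  { name: 'V4-Pro', totalParams: 1_600e9, activeParams: 49e9 },
  { name: 'V4-Flash', totalParams: 284e9, activeParams: 13e9 },
];

for (const v of variants) {
  const pct = ((100 * v.activeParams) / v.totalParams).toFixed(1);
  console.log(`${v.name}: ${pct}% of weights active per token`);
}
// V4-Pro: 3.1%, V4-Flash: 4.6%. Each token touches only a few percent
// of the network, which is where the inference savings come from.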

The second is distribution. The MIT license is the most permissive open-source license available. Anyone can download the weights, host them, fine-tune them, embed them in a product, and ship that product commercially. V4-Flash at 13B active parameters runs on a multi-GPU cluster that mid-size teams can afford. V4-Pro requires more serious infrastructure, but the option exists. DeepSeek is betting that frontier intelligence becomes infrastructure the way Linux did, and that the lab releasing the weights captures the ecosystem rather than the runtime margin.

DeepSeek is betting that frontier intelligence becomes infrastructure the way Linux did, and that the lab releasing the weights captures the ecosystem rather than the runtime margin.

The third is hardware. On the same day, Huawei announced that its Ascend supernodes offer full support for V4 inference. Reuters reported that V4 was adapted for Huawei’s most advanced Ascend AI chips and that Huawei said its chips were used for part of V4-Flash’s training.

DeepSeek did not say whether V4-Pro was trained on the same hardware as the earlier V3 and R1 models, which ran on Nvidia.

SMIC, the Chinese contract manufacturer that fabricates Ascend silicon, jumped 10% in Hong Kong trading on the news, and Hua Hong Semiconductor jumped 15%. The narrower signal is that high-end open-weight inference, and at least part of one model’s training, can be adapted to the Ascend stack. That is not the same as full independence from Nvidia, but it is the first frontier-tier release where the question is even worth asking.

One important caveat: DeepSeek V4 is text-only at launch. DeepSeek has stated that multimodal capabilities are in progress, but image and video are not yet supported. For workloads that require multimodal reasoning, V4 is not a drop-in alternative to GPT-5.5 or Opus 4.6 today.

Cheaper inference is the consequence of these three decisions, not the strategy. The strategy is to make text intelligence look like a commodity.

The middle is thinning, not gone

Before last week, a developer building a coding agent had a comfortable middle option. GPT-5.4 at $2.50 and $15 sat in a sweet spot. Cheap enough to scale, smart enough for most agentic work, hosted by a vendor everyone trusts. That tier is still on the price list, but it is no longer the flagship, and the new flagship costs twice as much.

GPT-5.5 took the upper slot at $5 and $30. V4-Pro took the lower slot at one-ninth of GPT-5.5 on output, before any discount. V4-Flash sits another order of magnitude below that. Anthropic’s Opus 4.7 at roughly $5 input and $25 output sits next to GPT-5.5 in the premium tier, not in the gap between premium and open-weight.

For developers, the choice is no longer purely about which model is on a smooth curve. The choice is which economics to route to for which task. Pay for the integrated product or run the open infrastructure. Many production stacks will end up routing across both because the price gap is now wide enough to justify the engineering cost of routing logic.

What this means for the harness layer

Three concrete shifts follow from the polarization.

The first shift is that agent harnesses become more model-agnostic by necessity. Cursor, Claude Code, OpenAI Codex, and the open-source harnesses OpenClaw and Hermes Agent now all benefit from clean routing logic that can move workloads between the two economies based on task complexity.

A coding agent that uses GPT-5.5 for the planning step and V4-Flash for the bulk-edit step is no longer exotic. It becomes an obvious architecture once the price gap is this wide. DeepSeek has noted that V4 is optimized for agent tools, including Claude Code and OpenClaw, suggesting the harness ecosystem has been waiting for this.
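
A minimal sketch of what that routing can look like, assuming hypothetical model IDs and OpenAI-compatible chat endpoints on both sides (each real harness has its own configuration surface for this):

type Step = 'plan' | 'bulk-edit' | 'review';

// Hypothetical endpoints and model IDs; substitute whatever your providers expose.
const targets = {
  premium: { baseUrl: 'https://api.openai.com/v1', model: 'gpt-5.5' },
  open: { baseUrl: 'https://api.deepseek.com/v1', model: 'deepseek-v4-flash' },
} as const;

// Planning and review stay on the premium tier; mechanical bulk edits go cheap.
function pickTarget(step: Step) {
  return step === 'bulk-edit' ? targets.open : targets.premium;
}

async function complete(step: Step, prompt: string): Promise<unknown> {
  const { baseUrl, model } = pickTarget(step);
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env['API_KEY']}`,
    },
    body: JSON.stringify({ model, messages: [{ role: 'user', content: prompt }] }),
  });
  return res.json();
}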

The second is that self-hosting math changes for the first time in two years. V4-Flash at 284B total parameters and 13B active runs on multi-GPU setups that mid-size teams can afford. The trade-off is real. You give up the managed reliability of a hyperscaler API in exchange for predictable inference costs and full control over the model. For workloads where token volume is the binding constraint and multimodality is not required, that trade-off is now sharper than it was a week ago.
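
The weights-only VRAM math behind that claim is easy to sketch (KV cache and activations add real overhead on top, especially at 1M-token contexts):

// Rough VRAM needed just to hold V4-Flash's 284B weights at common precisions.
const totalParams = 284e9;
const bytesPerParam = { fp16: 2, fp8: 1, int4: 0.5 };

for (const [format, bytes] of Object.entries(bytesPerParam)) {
  console.log(`${format}: ~${Math.round((totalParams * bytes) / 1e9)} GB`);
}
// fp16: ~568 GB, fp8: ~284 GB, int4: ~142 GB. Hence "multi-GPU setups":
// an 8x80 GB node holds the fp8 weights with headroom; no single card comes close.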

The third is that the Nvidia-only assumption is starting to look less absolute. The market reaction to V4 was not solely about DeepSeek. It was about the realization that a frontier-tier model can ship optimized for non-Nvidia silicon, and that Chinese AI infrastructure is closer to running on domestic chips than most observers assumed a year ago. For developers, this expands the long-run set of viable inference targets. For Nvidia, it tightens the timeline on the China question.

What’s next

The cost frontier no longer behaves like a smooth curve. It is two clusters of economics with a stretched gap in the middle, and the gap is not going to close on its own in the near term. OpenAI will continue to release fast and price up, because the integrated product is the moat. DeepSeek will continue to release open weights and price down, because the commodity infrastructure thesis depends on adoption. Both can be right for different workloads, and the same agent can route between both within a single task.

Anthropic’s Claude Opus 4.7 sits in the premium tier with OpenAI for now, but the next 90 days will reveal whether anyone tries to defend the thinning middle. The Chinese open-weight competition behind DeepSeek (Qwen, Kimi, GLM) will face pressure to match V4’s pricing and feature set, or risk ceding ground. And the harness layer is about to become the most interesting place in the stack, because routing logic across two economics is no longer optional. The next piece will look at how the open-source harnesses are positioning for exactly this moment. Stay tuned.

The post The disappearing AI middle class appeared first on The New Stack.


Announcing Files v4.0.41

Announcing Files Preview v4.0.41 for users of the preview version.


ReSharper Made VS Code a Real Option for My .NET Work


Most posts about ReSharper for VS Code want to convert you. This one doesn’t.

I write most of my .NET in Rider. I’m not moving. What changed for me back in March 2026, when JetBrains shipped the official ReSharper extension for VS Code, is something smaller and more useful than a conversion story. I can finally pick up VS Code for a C# task and not feel like I just took a forty-percent IQ hit at the door.

If that sounds dramatic, you have probably never tried to triage a production bug from a borrowed laptop.

The Quiet Tax on Doing .NET in VS Code

Here is the part nobody at Microsoft puts in a slide deck. The C# story in VS Code, even with C# Dev Kit, has always been the polite-but-shallow option. Rename works. Go-to-definition works. Build works. The basics behave. Then you ask for anything past the basics and you watch the seams come apart.

You want to extract an interface from a class. You can’t, really. You want to Ctrl-click into a NuGet package’s source. You can’t. You want to move six classes out of one giant file. You can do it the manual way, with cut and paste, like you’re back in 2010. You want a real Solution Explorer. You get a tree that almost works the way Visual Studio’s does and then surprises you somewhere uncomfortable.

This was the tax. Every time I opened VS Code for a C# task, I paid it. Pair programming on a teammate’s Mac, peeking at a coworker’s pull request over coffee, working inside a dev container, an SSH session into a build box, a quick fix on a Linux laptop that will never see Visual Studio. All of those moments came with the same compromise. Reach for VS Code, accept that you are about to do half the job in twice the time, get on with it.

That tax is what got cheaper this year. Not gone. Cheaper.

What Actually Showed Up

ReSharper for VS Code is the same engine that has been running in Visual Studio for two decades and powering Rider for ten years, packaged for VS Code and any compatible editor. It went through a public preview most of last year, and the official 1.0 landed in early March 2026.

It runs in VS Code, Cursor, Google Antigravity, VSCodium, Windsurf, basically anything VS Code-compatible. It is on the Open VSX Registry now, so the Cursor and Codium crowd gets auto-updates rather than chasing VSIX downloads.

Free for non-commercial work, including learning, OSS contributions that don’t earn you money, content creation, and hobby projects. Paid licenses for commercial use, included in the regular ReSharper, dotUltimate, and All Products Pack subscriptions.

Two extension IDs you’ll actually see when you install it:

  • JetBrains.resharper-code is the C# language support
  • JetBrains.ReSharper is the wrapper

In practice, type “ReSharper” into the Extensions panel, click Install, and the right pieces land.

Why You’d Pick This Over C# Dev Kit

C# Dev Kit is not bad. That is also the most generous thing I can say about it.

It is a perfectly serviceable answer to a question Microsoft mostly asked itself. JetBrains spent twenty years asking a harder version of the same question and shipping the answer in shorter loops. The gap shows.

A few places the gap shows most:

Inspections that earn their keep. Roslyn catches what Roslyn catches. ReSharper catches the rest. The 2026.1 release added a wave of inspections built around runtime safety problems. Short-lived HttpClient instances that will exhaust sockets. LINQ chains that quietly enumerate twice. ImmutableArray<T> initializations that compile fine and behave wrong at runtime. These are the bugs that ship to production and ruin a Friday.

Refactorings with teeth. Extract Interface, Change Signature, Convert Method to Property, Make Static, Safe Delete, Move Type to File, Invert If, Introduce Variable, Inline Variable. C# Dev Kit’s refactoring menu is a postcard. ReSharper’s is a phone book.

[Screenshot: ReSharper’s Alt+Enter menu open on a single line of code, fully expanded, eight to twelve entries deep, each with descriptive text and a hotkey. Caption: “One line. Bring snacks.”]

Navigation that works on code you don’t own. Ctrl-click into a method that lives inside a NuGet package and ReSharper decompiles the assembly so you can read the actual implementation right there in the editor. C# Dev Kit cannot do this. If you have ever debugged a third-party library by guessing what it does, you already know how badly you want this feature.

A real Solution Explorer. Source generators visible in the tree. NuGet packages manageable in place. Project references where a .NET dev’s hand reaches for them. The same one Rider users already trust.

Live templates and postfix completion. Type the name of a collection, then a dot, then foreach, then Tab. You get a fully-formed loop with the iterator variable named correctly. Type any expression, dot, notnull, and ReSharper wraps it in a guard for you. The first week, this feels like a parlor trick. The second week, plain VS Code feels broken.

Cross-editor consistency. If you also work in Rider or full Visual Studio with ReSharper, your code style settings, your shortcuts, and your muscle memory all ride along.

One honest caveat before we go further. ReSharper does not yet ship its own debugger for VS Code. It is on the roadmap. Until it lands, you keep Microsoft’s C# extension installed and use vsdbg. The setup section below covers exactly how to wire that up.

The Features That Pay Back the Install in the First Hour

Inspections that show up the moment you open a file

Open a project, watch the gutter start working. Most of what you see will be familiar style or correctness hints. Some of it won’t be. ReSharper flags an async method declared as returning void with a one-line explanation of why that pattern eats exceptions silently. It catches a LINQ chain that calls Count() inside a loop. It tells you when an ImmutableArray<T> got initialized in a way the runtime treats as the default value. Hit Alt+Enter (or Cmd+. on Mac) and the lightbulb usually has the fix queued and ready.

Refactorings you would trust on a Friday afternoon

Rename works across project boundaries, including in Razor and Blazor markup. Extract Interface lifts the members you check off, creates the new interface, and rewrites the source class to implement it. Move Type to File takes the six classes you crammed into one .cs file and gives each one its own home without breaking references. The first time you run Change Signature on a method that has eighteen call sites and watch ReSharper rewrite all of them correctly, you start trusting it.

Navigation, including into code you didn’t write

Go to Everything (Ctrl+T on Windows, Cmd+T on Mac) is the best fuzzy-finder in the .NET tooling field. Types, files, symbols, all in one box. Go to Implementation on an interface declared inside a NuGet package drops you into a decompiled view of the actual implementation. Find Usages groups results by project, file, and usage kind, so you can scan a hundred-result list without your eyes glazing over.

A Solution Explorer that should have been there from day one

Tree view of the solution, organized the way a .NET dev expects. Projects, dependencies, NuGet packages, source generators, all of it. Right-click for Add Project, Manage References, Manage NuGet. None of this was missing in C# Dev Kit, but it was rougher and a step or two slower at every interaction.

Unit testing that finds your tests on its own

NUnit, xUnit, and MSTest discovered automatically. Run from the gutter. Results in a panel. One click jumps from a failing test to the line that broke it. If you have used the test runner in Rider, this is the same one wearing a VS Code coat.

Live templates and postfix completion (the secret weapon)

The feature that turns ReSharper users into ReSharper missionaries. Type a collection name, dot, foreach, Tab, get a real loop with the iterator named for you. Postfix completion lets you type the expression first and the wrapper second. someValue.notnull becomes a null guard around someValue. someCollection.first becomes a First() call you can refine. Compounded over a workday, these save you hundreds of keystrokes you didn’t know you were spending.

The Setup: Five Minutes, Done Right

This is where most blog posts hand-wave. Don’t skip this part. There is one specific combination of extensions that gets you the JetBrains experience without giving up F5, and another combination that fights itself for an hour.

Install these:

  1. The ReSharper extension. Search “ReSharper” in the VS Code Extensions panel and install. Same name on Open VSX if you are in Cursor or VSCodium.
  2. The .NET SDK. ReSharper assumes a working dotnet is on your PATH. Run dotnet --info in a terminal to confirm. If that command works, you are set.
  3. The Microsoft C# extension (ms-dotnettools.csharp). Keep this one. ReSharper does not yet ship its own debugger, and the Microsoft C# extension is what brings the coreclr debugger you need for ASP.NET Core, console apps, Blazor Server, and the rest.

Switch off these:

  1. C# Dev Kit (ms-dotnettools.csdevkit). JetBrains’ own getting-started page recommends turning it off while ReSharper is active, and the recommendation is correct. The two extensions overlap heavily on Solution Explorer, the test runner, and project templates. Run them both at once and you get duplicate squigglies, doubled inspections, and slower indexing. You can disable per workspace if you want to keep it around for projects where you’d rather use it.
  2. IntelliCode for C# Dev Kit. Switch off alongside the Dev Kit, for the same reason.

The combination that works: ReSharper on for the smart features. Microsoft C# extension on for the debugger. C# Dev Kit off. That is the lineup.

One-time configuration:

  1. Open the workspace. ReSharper auto-detects .sln, .slnx, .slnf, or a bare .csproj. If it finds more than one solution, it shows an Open Solution picker.
  2. Let it index. First open on a big solution takes a minute. After that, opens are quick.
  3. launch.json for debugging. The Microsoft C# extension will offer to generate one if you don’t already have it. The minimal ASP.NET Core configuration looks like this:
{
  "name": ".NET Launch",
  "type": "coreclr",
  "request": "launch",
  "preLaunchTask": "build",
  "program": "${workspaceFolder}/bin/Debug/net9.0/YourApp.dll",
  "cwd": "${workspaceFolder}",
  "env": { "ASPNETCORE_ENVIRONMENT": "Development" },
  "serverReadyAction": {
    "action": "openExternally",
    "pattern": "\\bNow listening on:\\s+(https?://\\S+)"
  }
}
  4. Optional but worth doing. Open Settings, search “ReSharper,” and tune the inlay hints and inspection severity if the defaults feel too loud out of the gate.
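
One note on the launch.json above: the "build" value of preLaunchTask points at a task in .vscode/tasks.json. The Microsoft C# extension will offer to generate that file too; if you ever need to write it by hand, a minimal dotnet build task looks something like this:

{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "build",
      "command": "dotnet",
      "type": "process",
      "args": ["build", "${workspaceFolder}"],
      "problemMatcher": "$msCompile"
    }
  ]
}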

Where to Go When You Want More

If the install sticks for you, these are the bookmarks worth keeping:

  • The official docs at jetbrains.com/help/resharper-vscode/. Surprisingly readable, with separate getting-started pages for VS Code, Cursor, and Antigravity.
  • The .NET Tools Blog at blog.jetbrains.com/dotnet/. Release notes are where the real feature deltas live.
  • Issue tracker and community forum, both linked from inside the extension. Click the R# icon, then Contact Support.
  • Keyboard shortcuts. If you are coming from Visual Studio with ReSharper or from Rider, install the JetBrains keymap for VS Code so Ctrl+T, Alt+Enter, and the rest behave the way your fingers remember.
  • Pricing at jetbrains.com/resharper/buy/. Free for non-commercial use as noted above, paid otherwise.

A Tuesday That Actually Worked

Last Tuesday I cloned a thirty-project ASP.NET Core monolith I had never seen before. The kind of repo that opens in Visual Studio with a coffee break attached. I was on a Linux laptop in a coffee shop, on hotel WiFi, and I needed to ship a small fix before standup.

I opened it in VS Code. Not because I wanted to. Because Visual Studio was not on the laptop, and never will be.

ReSharper indexed the solution in about ninety seconds. Ctrl+T, three letters of the controller name, landed on the file. Ctrl+B into the service it called. Ctrl+B again into a method that lived inside a NuGet package, decompiled in front of me, the bug staring up. Alt+Enter, Extract Method, write a test, F5 to run with the Microsoft debugger, breakpoint hit, fix verified, push the branch, PR up.

Twenty-two minutes. From a laptop that will never see Visual Studio.

That’s the case I want to make. I am not switching editors. Rider is still where I live. I am just glad that VS Code stopped being the place where I do half the work in twice the time. The tax is cheaper. The option is real. And the next time I am on a borrowed Mac at three minutes to standup, I know exactly what extension I am installing first.

The post ReSharper Made VS Code a Real Option for My .NET Work first appeared on Chris Woody Woodruff | Fractional Architect.


Vibing, Harness and OODA loop



Hey, have a look at what I made during the weekend. I had some time, grabbed a beer, turned on the computer and tried to code this feature. If I could do so much during the weekend, how much could you and your team do with it in 2 weeks?

It’s almost a 1:1 quote of what I heard from the startup founder I worked with over 10 years ago. I’m sure that you’ve heard similar phrases from people you’ve worked with. We all know the annoying type of person who doesn’t code anymore but thinks, “I still got it!”. Then they throw a piece of stuff at you to “just fine-tune it a bit and do final touches”. Then they’re the first ones to ask “Why so long?“.

Nowadays, the Internet is full of such people. They shout about what they did with Claude or how much progress LLM tools have made. Some even predict the end of coding. I already wrote about why this is the wrong perspective. I won’t repeat it here, but I want to say that…

Vibing isn’t new and isn’t always an issue.

I’m saying that LLM tools reward ignorance. The more ignorant we are of the topic we’re working with, the better the outcomes look. And that, by itself, is not always bad, as there’s power in ignorance if we focus on getting it done with the simplest tools we have.

Still, this can be terrible if we fall in love too much with what we’ve vibed.

To understand why that “weekend beer” energy is both a superpower and a liability, we need to look at the OODA Loop.

OODA loop

Disclaimer: it’s not a competitor to the Ralph Wiggum Loop. It’s much older and more generic.

Military strategist John Boyd developed the OODA loop (Observe, Orient, Decide, Act) for fighter pilots. In a dogfight, the pilot who cycles through these four stages the fastest and most accurately survives.

In software, the “dogfight” is the gap between your intent and the production-ready feature.

The OODA loop is built from four steps:

  1. Observe - This is the intake of raw, unfiltered information. In our world, this means looking at the state of the system.
  2. Orient - This is the most critical and difficult stage. It’s where you filter your observations through your experience, culture, and technical knowledge.
  3. Decide - Based on your orientation, you formulate a hypothesis.
  4. Act - You execute.
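
If it helps to see the loop as code, here is the shape as a TypeScript sketch (my framing, not Boyd’s; the types are stand-ins):

// The loop as harness pseudocode. Act is one step out of four,
// and the loop only converges if Observe is cheap and trustworthy.
interface Observation {
  failingTests: string[];
  traces: string[];
}

async function oodaIteration(
  observe: () => Promise<Observation>, // raw signals: tests, traces, metrics
  orient: (o: Observation) => string, // filter through experience and context
  decide: (hypothesis: string) => () => Promise<void>, // commit to one action
): Promise<void> {
  const observation = await observe(); // Observe: unfiltered system state
  const hypothesis = orient(observation); // Orient: the hard, human part
  const act = decide(hypothesis); // Decide: formulate the change
  await act(); // Act: execute, then go around again
}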

Getting back to my favourite founder and LLM-based tools.

The reason the founder could build a PoC in a weekend, while the team needed more than two weeks, is that he bypassed the Observe and Orient phases. He went straight from a vague idea to Act.

If we skip or brush past the observation step, it feels like lightning speed. If the fancy UI grid is there and it does something we wanted, we move on. We’ve outsourced Orientation to our own ego. It’s too easy to assume that because we wrote it, it works.

Observation is the intake of raw data. In a professional environment, our eyes aren’t enough. We need a Harness. If we don’t have automations, tests, integration tests, and pristine traces, we aren’t observing the system; we’re just looking at it. If the inputs are messy, our observation is clouded.

But real engineering, the kind that takes those “two weeks”, is about closing the loop properly. That’s also where we need different perspectives and knowledge sharing.

Orientation is where you process those observations. This is the part where LLMs make us feel smarter than we are. If we don’t understand how a database handles concurrent connections, our “orientation” of a generated script will be shallow. We’ll see code that “looks” right, decide it’s fine, and act by deploying it.

The “I still got it” crowd loves the Decide and Act phases because that’s where the visible progress happens. LLM tools have made these phases nearly instantaneous. We can decide to build a feature and have the code for it in ten seconds.

The problem is that the faster we Act, the faster we need to Observe. If our “Act” phase takes seconds but our “Observe” phase requires a manual weekend of clicking around and drinking beer, our OODA loop is broken. We’re just generating a pile of stuff that we haven’t actually verified.

That’s why the team usually needs more than an imaginary “two weeks”. They are not “fine-tuning” the single-brilliant-dude masterpiece. They are building the infrastructure required to make the OODA loop sustainable.

And to make that possible, they need to run the full loop: Observe, Orient, Decide, Act. And do it multiple times. That takes time, but it’s required to assess the direction, automate what needs to be automated, and ensure they can iterate further and run this loop sustainably. That’s critical for delivering the outcome at the expected pace.

Of course, there’s a danger here: overfocusing on Orient and Decide can lead to overengineering, building stuff we don’t need. That’s where ignorance can be blissful, especially when we connect it with humility: being humble about what we don’t know, trying things the easiest way, then learning and making enhancements. Still, humility fails under deadline pressure. The harness doesn’t.

Let me give you…

The example

I’m adding proper Observability and OpenTelemetry to Emmett right now. I spent some time working on it and instrumented the first component: Command Handling.

Of course, I had tests to prove it works, but I don’t trust them enough on their own, and I wanted to try it on a real sample, since you never know until you run it. Even the best test suite won’t tell you everything.

So I decided to plug it into the sample. See if it works, how ergonomic the API is and how it fits conventions in this area.

To do it, I decided to use the Grafana stack and set it up with Docker Compose. So: a stable, boring stack. Not going to lie, I vibed the config. Not that there are no docs, but I intentionally wanted to see the typical config people use.

If someone says LLM-based tools are great at proofs of concept, they don’t run the stuff they vibed. If I had made my observation based on the initial config, the oriented decision would have been that it doesn’t work. Of course, I then did the typical back-and-forth, with the LLM tool doing some Linux command Voodoo to make it work. Once. Then, if you try to repeat it, you won’t know how to do it without doing the Voodoo again.

Again, that’s not much different from the other stuff we do. I’m sure you’ve seen multiple cases where someone didn’t use Continuous Deployment tools but clicked through the Azure, AWS, or GCP portal, deployed the stack, and then left no trace of how to set it up again (e.g. to have a different environment for testing or demos for customers).

So, we need a harness, we need a leash to keep our process on track.

How to do the harness? My advice is to start simple. We may ask LLMs to give us shell scripts, and we may ask them to run them multiple times. We also need experience and knowledge of what we want to achieve and the tools we use. It’s fine not to remember all the YAML config to set up the Grafana stack, but it’s not fine not to understand why you even use it, how it relates, and how to set it up.

Still, our first loop can close on the first working solution, even a manually vibed one. But that’s not even a PoC yet. We need to automate it.

I asked the LLM to take notes on the issues it hit and how it solved them. Then, based on that, I asked it to research how to code it in TypeScript, and to use tools I know and have used in the past, validating whether newer, more modern alternatives exist. For instance, I was a big fan of Gulp.js and Bullseye in the past, but they’re mostly dead. I wanted something in the same spirit, using native, maintained tooling.

I ended up with a small set of tools; you can see them at work in the snippets below, execa for shell commands and the built-in fetch for health checks among them.

Then I asked it to create the script to automate the shell Voodoo it had done to make the Grafana stack and Docker Compose work.

Essentially, it should:

  1. Run Docker Compose script starting up services (Grafana, Prometheus, Loki, Tempo, PostgreSQL, etc.).
  2. Wait for them to be ready (it usually takes some time).
  3. Start the application and make a request.
  4. Check if the predefined dashboard with Emmett metrics appears, and shows expected traces and metrics.
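
The excerpts below lean on two small helpers, waitFor and checkUrl, that aren’t shown in the post. A minimal version consistent with how they’re called could look like this (my sketch, not the original code):

type WaitOptions = { timeout: number; interval?: number; label: string };

// Poll a check until it returns true or the timeout elapses.
async function waitFor(check: () => Promise<boolean>, opts: WaitOptions): Promise<void> {
  const { timeout, interval = 2_000, label } = opts;
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    if (await check().catch(() => false)) {
      console.log(`  ✓ ${label} ready`);
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
  throw new Error(`Timed out after ${timeout}ms waiting for ${label}`);
}

// HTTP readiness probe with an optional extra validation step.
async function checkUrl(
  label: string,
  url: string,
  validate?: (res: Response) => Promise<boolean>,
): Promise<boolean> {
  try {
    const res = await fetch(url);
    if (!res.ok) return false;
    return validate ? await validate(res) : true;
  } catch {
    return false;
  }
}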

The initial diagnostic tools looked like this:

// Excerpt from the verification script. execa runs shell commands;
// URLS and COMPOSE are defined in the setup excerpt further below.
import { execa } from 'execa';

async function fetchWithDiag(label: string, url: string, init?: RequestInit) {
  const res = await fetch(url, init);
  if (!res.ok) {
    const body = await res.text().catch(() => '(could not read body)');
    console.error(`\n  ✗ ${label} → HTTP ${res.status}\n  body: ${body}\n`);
  }
  return res;
}

async function diagnoseCollector() {
  const text = await fetch(URLS.otelCollectorMetrics)
    .then((r) => r.text())
    .catch(() => 'unreachable');
  const emmett = text
    .split('\n')
    .filter((l) => l.startsWith('emmett_') && !l.startsWith('#'))
    .slice(0, 5);
  console.log(
    emmett.length
      ? `\n  collector /metrics (emmett lines):\n  ${emmett.join('\n  ')}`
      : '\n  collector /metrics: no emmett_* lines found',
  );
}

async function diagnosePrometheus() {
  const json = await fetch(
    `${URLS.prometheus}/api/v1/label/__name__/values`,
  )
    .then((r) => r.json() as Promise<{ data: string[] }>)
    .catch(() => ({ data: [] as string[] }));
  const emmett = json.data.filter((n) => n.startsWith('emmett_'));
  console.log(
    emmett.length
      ? `\n  Prometheus emmett_* metrics: ${emmett.join(', ')}`
      : '\n  Prometheus: no emmett_* metrics found yet',
  );
}

async function diagnoseLoki() {
  const labels = await fetch(`${URLS.loki}/loki/api/v1/labels`)
    .then((r) => r.json() as Promise<{ data?: string[] }>)
    .catch(() => ({ data: [] as string[] }));
  console.log(`\n  Loki labels: ${(labels.data ?? []).join(', ') || '(none)'}`);
}

async function diagnoseDockerLogs(service: string, lines = 10) {
  const { stdout } = await execa('docker', [
    ...COMPOSE,
    'logs',
    '--tail',
    String(lines),
    service,
  ]).catch(() => ({ stdout: '(could not get logs)' }));
  console.log(`\n  docker logs ${service} (last ${lines}):\n  ${stdout.split('\n').join('\n  ')}`);
}

Are they pretty? No. Can they be improved? Yes. Do they have to be improved at this specific moment? No.

The setup uses the test runner’s infrastructure:


// Same file, setup section. Assuming Node's built-in test runner here;
// the post doesn't name the framework, so treat these imports as a sketch.
import { before, after, test } from 'node:test';
import assert from 'node:assert/strict';
import { randomUUID } from 'node:crypto';

let app: ReturnType<typeof execa> | undefined; // the `npm start` process handle
let traceId: string | undefined; // captured in the first test below

const CLEANUP = process.env['CLEANUP'] === '1' || process.env['CLEANUP'] === 'true';
const CLEANUP_AFTER = process.env['CLEANUP_AFTER'] === '1' || process.env['CLEANUP_AFTER'] === 'true';
const NO_START = process.env['NO_START'] === '1' || process.env['NO_START'] === 'true';

// ─── configuration ───────────────────────────────────────────────────────────

const COMPOSE = ['compose', '-f', 'docker-compose.yml', '--profile', 'observability'];

const URLS = {
  app: 'http://localhost:3000',
  prometheus: 'http://localhost:9090',
  tempo: 'http://localhost:3200',
  loki: 'http://localhost:3100',
  grafana: 'http://localhost:3001',
  otelCollectorMetrics: 'http://localhost:8889/metrics',
};

// Fresh client per run — avoids stale cart state from previous runs.
const SERVICE_NAME = 'expressjs-with-postgresql';
const CLIENT_ID = randomUUID();
const CART_ENDPOINT = `${URLS.app}/clients/${CLIENT_ID}/shopping-carts/current/product-items`;
const CONFIRM_ENDPOINT = `${URLS.app}/clients/${CLIENT_ID}/shopping-carts/current/confirm`;

// Matches the .http file — unitPrice is resolved server-side.
const ADD_PRODUCT_BODY = JSON.stringify({ productId: randomUUID(), quantity: 10 });


before(async () => {
  console.log(`\n▶ client ID for this run: ${CLIENT_ID}\n`);

  if (NO_START) {
    console.log('▶ --no-start: skipping docker compose and app startup');
    return;
  }

  if (CLEANUP) {
    console.log('▶ --cleanup: killing port 3000 and tearing down stack (down -v)…');
    await execa('bash', ['-c', 'fuser -k 3000/tcp 2>/dev/null || true']).catch(() => {});
    await new Promise((r) => setTimeout(r, 500));
    await execa('docker', [...COMPOSE, 'down', '-v', '--remove-orphans'], {
      stdio: 'inherit',
    });
  }

  const stackReady = await fetch(`${URLS.prometheus}/-/ready`)
    .then((r) => r.ok)
    .catch(() => false);

  if (stackReady) {
    console.log('▶ observability stack already up — skipping docker compose up');
  } else {
    console.log('▶ starting observability stack…');
    await execa('docker', [...COMPOSE, 'up', '-d'], { stdio: 'inherit' });
  }

  console.log('▶ waiting for backends…');
  await waitFor(() => checkUrl('Prometheus', `${URLS.prometheus}/-/ready`), {
    timeout: 90_000, label: 'Prometheus',
  });
  await waitFor(() => checkUrl('Grafana', `${URLS.grafana}/api/health`), {
    timeout: 90_000, label: 'Grafana',
  });
  await waitFor(() => checkUrl('Tempo', `${URLS.tempo}/ready`), {
    timeout: 90_000, label: 'Tempo',
  });
  await waitFor(() => checkUrl('Loki', `${URLS.loki}/ready`), {
    timeout: 90_000, label: 'Loki',
  });

  // /health returns { status: 'ok', service: 'expressjs-with-postgresql' } —
  // checking service name lets us distinguish our app from other processes on :3000.
  const checkOurApp = () =>
    checkUrl('app /health', `${URLS.app}/health`, async (res) => {
      const json = (await res.json().catch(() => ({}))) as { service?: string };
      if (json.service !== SERVICE_NAME) {
        console.log(
          `    app /health: service="${json.service ?? '(missing)'}", expected="${SERVICE_NAME}"`,
        );
        return false;
      }
      return true;
    });

  const appIsOurs = stackReady && (await checkOurApp());

  if (appIsOurs) {
    console.log('▶ app already running and healthy — skipping npm start');
  } else {
    const portTaken = await fetch(URLS.app).then(() => true).catch(() => false);
    if (portTaken) {
      // Port is occupied but not by our app — stale process or unrelated service.
      console.error(
        '\n  ✗ Port 3000 is occupied by a process that is not this app.\n' +
          '  It may be a stale version of this app (connected to a wiped database)\n' +
          '  or a completely different service.\n' +
          '  Fix: run  npm run verify:observability:cleanup  to kill it and restart,\n' +
          '  or manually free port 3000.\n',
      );
      process.exit(1);
    }

    console.log('▶ starting app…');
    app = execa('npm', ['start'], { stdio: 'inherit' });

    await waitFor(checkOurApp, { timeout: 60_000, label: 'app /health' });
  }

  console.log('▶ setup complete\n');
});

As you can see, nothing fancy; the cleanup is even simpler:

after(async () => {
  if (app) {
    console.log('\n▶ stopping app…');
    app.kill('SIGTERM');
    await app.catch(() => {});
    console.log('▶ app stopped');
  }

  if (CLEANUP_AFTER) {
    console.log('▶ tearing down stack (down -v)…');
    await execa('docker', [...COMPOSE, 'down', '-v', '--remove-orphans'], {
      stdio: 'inherit',
    });
    console.log('▶ stack torn down');
  } else {
    console.log('▶ stack is still running');
    console.log('▶ to clean up: npm run verify:observability:cleanup');
  }
});

Having that, we can run the tests:


test('successful command returns x-trace-id header', async () => {
  const res = await fetchWithDiag('POST add product', CART_ENDPOINT, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: ADD_PRODUCT_BODY,
  });

  assert.equal(res.status, 204, `Expected 204 — body logged above`);

  const header = res.headers.get('x-trace-id');
  if (!header) {
    console.error(
      '  ✗ x-trace-id missing — verify the wrapper app in src/index.ts ' +
        'adds it via @opentelemetry/api before mounting the emmett app',
    );
  }
  assert.ok(header, 'x-trace-id header missing');
  assert.match(header, /^[0-9a-f]{32}$/, `"${header}" is not a 32-hex trace ID`);

  traceId = header;
  console.log(`  trace ID: ${traceId}`);
});

test('OTel collector exposes Emmett metrics on port 8889', async () => {
  // Send a few more requests so metrics are definitely recorded.
  for (let i = 0; i < 5; i++) {
    await fetch(CART_ENDPOINT, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: ADD_PRODUCT_BODY,
    });
  }

  try {
    await waitFor(
      async () => {
        let text: string;
        try {
          const res = await fetch(URLS.otelCollectorMetrics);
          text = await res.text();
        } catch {
          console.log('    collector :8889: connection refused');
          return false;
        }
        const emmettLines = text.split('\n').filter((l) => l.startsWith('emmett_') && !l.startsWith('#'));
        if (emmettLines.length === 0) {
          const allFamilies = [...new Set(text.split('\n').filter((l) => !l.startsWith('#') && l).map((l) => l.split('{')[0]))].slice(0, 5);
          console.log(`    collector :8889: no emmett_* metrics yet. Present: ${allFamilies.join(', ') || '(none)'}`);
          return false;
        }
        return true;
      },
      { timeout: 90_000, interval: 5_000, label: 'emmett metrics on collector :8889' },
    );
  } catch (err) {
    await diagnoseCollector();
    await diagnoseDockerLogs('otel-collector');
    throw err;
  }
});

I put it into a single file that can be run as a regular Node.js script.
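
For context, the npm wiring for such a script can be as small as this; the file name and the tsx runner are my assumptions, while the CLEANUP flag matches the environment variables in the setup excerpt:

{
  "scripts": {
    "verify:observability": "tsx verify-observability.ts",
    "verify:observability:cleanup": "CLEANUP=1 tsx verify-observability.ts"
  }
}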

It already showed me (and Claude) that what it initially did didn’t work if you tried to run it multiple times. It also showed that doing a full cleanup and rebuild, and making the whole thing reproducible, needs more work.

Is it done? Not yet; it takes too much time and too many resources to run continuously throughout the pipeline. The code is a bit messy, so it needs to be organised. But it is segmented into blocks, includes basic automation and tests, and has already gone through some failures to get this far.

Could I do it better? Sure, but that’s not the point. I wanted to show you my findings from weekend vibing (without beer, tho): the real, not-polished iteration.

The main idea behind OODA loops is not to be perfect, but to iterate quickly, gather feedback as soon as possible, learn from it, develop another theory, and verify it through action.

It’s not about vibing, but it’s also not about analysis paralysis.

I hope you’re now better equipped to think about when vibing — with beer or without, with LLMs or without — actually helps, and when it doesn’t.

Vibe coding is just high-frequency steering. It only works if you have a Harness: a mechanical way to observe and orient, so you don’t steer the whole project into a wall.

Act takes seconds now. Observe takes as long as it always did. Without a harness, you’re not going faster; you’re just making more stuff you haven’t checked.

A harness is not magic, a new discipline, or the next buzzword; I hope this article showed you a bit of what it may look like.

So iterate fast, but wisely, remembering to run the full loop. It’s great that LLMs can make Acting faster, but we should not skip the other steps. We should aim for a fast feedback loop so that we iterate in the right direction and achieve continuous improvement, delivering proper value.

Vibing isn’t new, and old engineering practices still apply; we shouldn’t abandon them. We should also not replace collaboration with solitary self-high-fives.

Check also:

Cheers!

Oskar

p.s. Ukraine is still under brutal Russian invasion. A lot of Ukrainian people are hurt, without shelter and need help. You can help in various ways, for instance, directly helping refugees, spreading awareness, putting pressure on your local government or companies. You can also support Ukraine by donating e.g. to Red Cross, Ukraine humanitarian organisation or donate Ambulances for Ukraine.


Things to Do in Philadelphia This Week & Weekend


All aboard! This week’s list of things to do in Philadelphia flips the calendar to May — and festival season is coming at you like a train.

Philly’s stacked with festivals this week, including the return of beloved faves like the Rittenhouse Row Spring Festival (Saturday), the South Street Spring Fest (Saturday), the Fishtown Music & Arts Festival (Saturday), and the Chestnut Hill Home + Garden Festival (Sunday). Whew!

Cheer on tens of thousands of runners as they lace up for one of Philly’s most iconic races: the annual Independence Blue Cross Broad Street Run (Sunday).

And Romeo and Juliet gets reimagined at the Academy of Music (opens Thursday), and it’s your last chance to experience Philly Theatre Week (through Sunday).

Plus, it’s both Asian American and Pacific Islander Heritage Month and Jewish American Heritage Month all throughout May, with events at Shofuso Japanese Cultural Center, the Weitzman National Museum of American Jewish History and more.

It’s Philly’s biggest year yet! Make the most of it by booking the Visit Philly Overnight Package, which comes with free hotel parking and complimentary tickets to some of the most popular attractions in each of Greater Philadelphia’s five counties including Universal Theme Parks: The Exhibition at The Franklin Institute, the Mercer Museum in Bucks County, Longwood Gardens in Chester County, the Brandywine Museum of Art in Delaware County and Elmwood Park Zoo in Montgomery County.

Below, find the best things to do in Philadelphia this week and weekend, April 27 to May 3, 2026.

Week in Review: Most popular stories on GeekWire for the week of April 19, 2026


Get caught up on the latest technology and startup news from the past week. Here are the most popular stories on GeekWire for the week of April 19, 2026.

Sign up to receive these updates every Sunday in your inbox by subscribing to our GeekWire Weekly email newsletter.

Most popular stories on GeekWire

Opinion: Which capitalism are we defending?

Seattle venture capitalist Nick Hanauer responds to Chris DeVore’s recent call for Democrats to embrace capitalism, arguing that the real question isn’t whether to support free markets but which version of capitalism America should be building. … Read More
