Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Is AI Safe Enough for Safety-critical

1 Share

There’s a lot being said and done about AI in automotive. One challenge is using AI inside the vehicle itself; another is using AI tools to test, and potentially fix, code issues in safety-critical software. It turns out the second category is already viable. Read my recent article “Accelerating MISRA compliance using AI-augmented static analysis” in Electronics World.


Using AI to test automotive software

 

Is AI Safe Enough for Safety-critical originally appeared on Code Curmudgeon on April 21, 2026.


Read the whole story
alvinashcraft
30 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Agent Sprawl Is Here. Your IaC Platform Is the Answer.

1 Share

Somewhere in your company right now, a developer is building an AI agent. Maybe it’s a release agent that cuts tags when tests pass. Maybe it’s a cost agent that shuts down idle EC2 overnight. It’s running, it’s in production, and there’s a decent chance the platform team doesn’t know it exists.

This isn’t a thought experiment. OutSystems just surveyed 1,900 IT leaders and the numbers are rough: 96% of enterprises run AI agents in production today, 94% say the sprawl is becoming a real security problem, and only 12% have any central way to manage it. Twelve percent. You can read the full report here.

The real question is where those agents run. Inside the platform you’ve already built, or somewhere off to the side where nobody on the platform team can see them.

The new platform tension

Platform teams have always had two jobs that pull in opposite directions. Let developers ship without waiting on a ticket. Keep the infrastructure coherent while they do. Golden paths, review stacks, a catalog of components that don’t fight each other.

Agents break the second half of that deal.

A developer with a sharp prompt can spin up an SRE agent that watches a queue, a release agent that cuts tags when the test suite goes green, or a cost agent that kills idle infra at 2 a.m. That’s useful. It’s also running on your production cloud account, using credentials you never provisioned, writing to systems you never approved, and the only audit trail is whatever the developer remembered to log. The Salesforce 2026 Connectivity Benchmark pegs the average enterprise at twelve agents today, projected to grow 67% over the next two years. Most teams aren’t ready for one, let alone twenty.

This is the same shape as every sprawl problem before it. I wrote about the last one in How Secrets Sprawl Is Slowing You Down, and the pattern keeps repeating. When something useful gets cheap, it proliferates. When it proliferates without structure, it becomes a liability.

The clock is also ticking on the compliance side. The EU AI Act’s high-risk obligations kick in on 2 August 2026. Colorado’s AI Act goes live on 30 June 2026 after last year’s delay. A folder of unreviewed agent scripts isn’t going to hold up against either of those.

Three ways to respond (only one of them works)

There are roughly three paths from here.

[Diagram: three platform responses to agent sprawl. Do nothing → unmanaged sprawl; mandate centralization → developer friction; make the platform the obvious path → voluntary adoption.]

Do nothing. Accept the sprawl and hope nothing catches fire. This is the default, and it’s also how you end up explaining to an auditor why some finance agent moved data between three systems last Thursday and nobody remembers which prompt triggered it.

Mandate centralization. Tell developers every agent has to be registered and approved before it runs. This sounds responsible on a slide, and it falls apart inside a sprint. Developers route around friction. If the official path takes a week and the unofficial path takes an afternoon, the unofficial path wins, and you’ve just pushed the sprawl underground where you can’t see it anymore.

Make the platform the obvious path. Build the thing developers actually want to use. A place where an agent inherits the guardrails, credentials, policies, and audit trail by default, because that’s what’s on offer. Adoption becomes a side effect of shipping something good.

Option three is the only one that scales. It’s also the one where most platform teams look at their existing stack and assume they need to build a pile of new scaffolding. I don’t think they do, and the rest of this post is why.

The seven things an AI agent needs from your platform

An agent needs seven concrete things from the platform it runs on. Each one maps to a Pulumi primitive you already own.

[Diagram: the seven things an AI agent needs from your platform, mapped to Pulumi primitives. Context → state graph and Neo; integrations → providers and ESC; actions → Deployments and Automation API; policy → CrossGuard; audit → Pulumi Cloud activity log; review → pulumi preview and review stacks; approval → Deployments approvals and Neo.]

1. A trustworthy context lake

Agents are only as good as the context they can reason over. Drop a generic LLM into your cloud account and you’ll get plausible-sounding nonsense, because the model has never seen your environment. What you actually need is a grounded source of truth: what resources exist, how they relate, which stack owns what, which version is running where.

Pulumi state is already that. Your program graph, your stack outputs, your resource metadata, all of it adds up to a structured record of what you’ve actually deployed. Pulumi Neo reasons directly over that graph, which is why it can tell you why a deployment drifted instead of guessing. I wrote the long version of that argument in an earlier post. Short version: you already have the context lake. Point agents at it.

2. Pre-cleared integrations

An agent that needs to touch five systems shouldn’t need five separate credential dances. That’s where credential sprawl starts. Every agent gets a long-lived key, every key ends up in somebody’s .env, and every rotation turns into an incident.

The Pulumi surface here is the 200+ providers plus Pulumi ESC handling dynamic credentials through OIDC. An agent doesn’t ask for an AWS access key. It asks ESC for a short-lived, scoped token bound to the environment it’s allowed to operate in, and the token expires when the task ends. No static keys, no rotation pain, no awkward postmortem about how something got committed to GitHub. The ESC patterns I walked through in the Claude skills post work just as well for an autonomous agent as they do for a human developer, which is really the whole point.

3. Governed actions

There’s a real difference between “an agent can see your infrastructure” and “an agent can change your infrastructure.” The second one is where you actually need structure. Pulumi Deployments gives you that structure: defined workflows, controlled triggers, running inside your Pulumi Cloud boundary instead of whatever environment the developer happened to spin up. The Automation API lets you build higher-order orchestration on the same primitives your developers already use.

The framing I keep coming back to goes like this. An agent shouldn’t call pulumi up directly. It should submit an action to a governed pipeline that runs pulumi up on its behalf, inside an environment you control, with a log trail and the guardrails already in place. Same effect, very different threat model.

4. Deterministic policy

Real governance lives outside the prompt. “Please don’t delete production” is a wish written into a system prompt, not an enforced control. And when an agent overrides your intent to do what it thought you meant, it’s behaving exactly the way the technology was designed to behave.

Pulumi CrossGuard is the answer the IaC community landed on years ago: policy as code, written in a real programming language, evaluated deterministically at preview and update time. Disallow production RDS deletions. Require encryption at rest. Block S3 buckets with public ACLs. An agent running through Pulumi hits those gates whether it “wants” to or not, because the gates live in the pipeline and not in the prompt. This is the pillar most teams underweight, and it’s the first one most auditors ask about.
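A check like the ones above can be sketched as a plain deterministic function. Real Pulumi policy packs are written in languages like TypeScript or Python; the Go below just models the shape of a resource validation: resource type and properties in, violations out, no prompt involved.

```go
// Deterministic policy check, modeled as a pure function.
package main

import "fmt"

// ValidateResource mirrors the shape of a resource validation policy:
// fixed rules evaluated at preview/update time, same input, same answer.
func ValidateResource(resType string, props map[string]string) []string {
	var violations []string
	if resType == "aws:s3/bucket:Bucket" {
		if acl := props["acl"]; acl == "public-read" || acl == "public-read-write" {
			violations = append(violations, "S3 buckets may not use public ACLs")
		}
	}
	if resType == "aws:rds/instance:Instance" && props["storageEncrypted"] != "true" {
		violations = append(violations, "RDS instances must enable encryption at rest")
	}
	return violations
}

func main() {
	fmt.Println(ValidateResource("aws:s3/bucket:Bucket",
		map[string]string{"acl": "public-read"})) // [S3 buckets may not use public ACLs]
	fmt.Println(ValidateResource("aws:rds/instance:Instance",
		map[string]string{"storageEncrypted": "true"})) // []
}
```

The point of the exercise: nothing in that function can be talked out of its answer, which is exactly the property a prompt-level instruction lacks.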

5. An audit trail

When something goes wrong at 3 a.m. (and with enough agents running, something will), you need answers fast. What changed, who changed it, and why. Not just “which agent,” but which version of which agent, triggered by what event, authorized by which policy, touching which resources.

Pulumi Cloud’s activity log, the stack update history, and ESC audit logs already capture all of that. Every update is versioned. Every secret access is logged. Every policy evaluation is recorded. When an agent submits a change through your Pulumi pipeline, it inherits that audit surface for free. The alternative is reconstructing an incident from a mix of Slack messages, container logs, and developer memory, which is roughly the state most teams without a platform are in today.

6. A review process

Not every agent action should wait for a human. But agents do need a promotion path, the same way new platform components do. Experimental, then reviewed, then trusted, then autonomous. That’s exactly what pulumi preview, review stacks, and Deployments PR workflows already model for human contributors. An agent that wants to make a change should have to submit it the same way a junior engineer would. As a diff, with a plan, against a preview environment, until it earns the trust to skip steps.

This connects back to the pattern I laid out in Golden Paths: Infrastructure Components and Templates. Golden paths were never only for humans. They’re just paths, and agents can walk them too.

7. Human-in-the-loop approval

The last pillar is the one that keeps the other six honest. Some decisions shouldn’t be automated, full stop. Production rollbacks outside business hours. Destructive changes above a certain blast-radius threshold. Anything that touches a regulated data boundary. For those cases, you want a forced human checkpoint that the agent can’t route around.

Pulumi Deployments approvals already play that role for human changes. Pulumi Neo’s review steps add the AI-aware version: a structured plan, a diff, a named approver, and a record of what they decided and why. I walked through what this looks like in practice in Self-Verifying AI Agents. Short version: an agent that proposes is much safer than an agent that commits.
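The propose-then-approve checkpoint reduces to a small amount of state. The names below (`Proposal`, `ApprovalGate`) are assumptions for illustration, not a Pulumi API: changes above a blast-radius threshold park in a queue until a named human records a decision.

```go
// Sketch of a human-in-the-loop gate keyed on blast radius.
package main

import "fmt"

type Proposal struct {
	Agent       string
	Description string
	BlastRadius int // e.g. number of resources touched
}

type Decision struct {
	Proposal Proposal
	Approver string
	Approved bool
}

type ApprovalGate struct {
	Threshold int
	Pending   []Proposal
	Record    []Decision
}

// Submit auto-applies small changes and parks large ones for a human.
func (g *ApprovalGate) Submit(p Proposal) (applied bool) {
	if p.BlastRadius <= g.Threshold {
		return true
	}
	g.Pending = append(g.Pending, p)
	return false
}

// Decide records who approved what -- the named approver and outcome
// are the part an auditor actually asks for.
func (g *ApprovalGate) Decide(i int, approver string, approved bool) {
	g.Record = append(g.Record, Decision{Proposal: g.Pending[i], Approver: approver, Approved: approved})
}

func main() {
	gate := &ApprovalGate{Threshold: 3}
	fmt.Println(gate.Submit(Proposal{Agent: "release-agent", Description: "bump tag", BlastRadius: 1}))      // true
	fmt.Println(gate.Submit(Proposal{Agent: "cost-agent", Description: "delete idle VPC", BlastRadius: 40})) // false
	gate.Decide(0, "oncall-lead", false)
	fmt.Println(len(gate.Record)) // 1
}
```

The agent can still propose anything; it just cannot commit past the threshold, which is the "proposes, not commits" distinction in code.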

Why IaC is the natural substrate for this

Step back from the seven pillars and look at what they have in common. Context, integrations, governed actions, deterministic policy, audit, review, approval. None of those are new problems that AI agents invented. They’re the problems infrastructure-as-code has been quietly solving for a decade, for human developers.

Every meaningful agent action ends up being a change, whether that’s to infrastructure, configuration, secrets, or state. IaC is the one layer in your stack that already treats change as the unit of work. Plan, preview, apply, record. If you want governance for agents and you don’t want to build it twice, the most efficient move is to route agent changes through the same substrate your humans already use.

I made the same point from a different angle in Token Efficiency vs Cognitive Efficiency: Choosing IaC for AI Agents. An IaC platform that models your world as a graph of typed resources is a much better reasoning surface for an agent than a stack of YAML or a bash script somebody wrote on a Friday. The structure is what makes it work.

What this means for the platform engineer

There’s a narrative floating around that AI is going to make platform engineers less relevant. I haven’t seen it hold up against an actual production environment. Every stat I’ve looked at points the other way. Gartner expects 70% of enterprises to deploy agentic AI as part of IT infrastructure and operations by 2029, up from less than 5% in 2025. LangChain’s State of Agent Engineering report already has 57% of teams running agents in production today. And Gartner projects that 80% of large software engineering orgs will have a platform team by end of 2026, up from 45% in 2022. More agents means more changes, more changes means more blast radius, and more blast radius means more need for the thing platform teams are uniquely equipped to provide.

Your classic responsibilities haven’t gone anywhere either. Golden paths, service catalogs, CI/CD, on-call rotations, all of that is still yours. Agents are an additional layer that needs the same discipline. The upside is that if your platform already runs on a mature IaC surface, you’re extending a muscle you’ve been building for years instead of growing a new one.

The developer-facing side matters too. A developer building an agent needs to know what’s available to them, needs templates that work on the first try, and needs to see what teammates have already built so they don’t start from a blank page. That’s the territory the Claude skills post and IDP Strategy: Self-Service Infrastructure That Balances Autonomy With Control cover. That’s the experience layer that makes developers actually choose your platform instead of routing around it. You need both sides working at once. The governance your security team cares about, and the experience your developers will actually reach for.

Close the window

The agents your developers are shipping this week are going to outlive the experiment that started them. Some of them will become critical. At least one will cause an incident. At least one will eventually show up in an audit. All of them are going to be easier to govern if they were built on your platform from day one than if you try to wrap policy around them later.

If you want the longer view on where this is going, AI Predictions for 2026: A DevOps Engineer’s Guide is the companion piece. If you want the developer-facing version of the grounding argument, Grounded AI is what to read next.

Either way, here’s where I land. The substrate for agent governance is already running in your stack. You’ve been pointing it at human changes for years. Now point it at the agents too.

See how Pulumi Neo governs agent actions

Setting Up Claude Code Agent Teams with WSL2 and tmux on Windows

1 Share

Claude Code’s agent teams feature lets you spin up multiple Claude instances that work in parallel, each in its own visible terminal pane. On Windows, visualizing this properly requires WSL2 and tmux — here’s how to set it up from scratch.

Prerequisites

  • Windows 10/11 with WSL2 available
  • A Claude Code subscription (Pro or Max) with Opus 4.6 access
  • Your repos somewhere accessible (e.g. C:\dev\github\reponame)

Step 1 — Enable WSL2 and Install Ubuntu

Open an elevated PowerShell and run:
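The story is truncated in this feed; the standard command on current Windows 10/11 builds to enable WSL2 and install the Ubuntu distribution in one step is:

```shell
wsl --install -d Ubuntu   # enables WSL2 and installs Ubuntu; reboot when prompted
```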


Getting Started with dbWatch

1 Share

Explore dbWatch, a powerful tool for database administrators that simplifies monitoring and managing databases across platforms.

The post Getting Started with dbWatch appeared first on MSSQLTips.com.


Orchestration Is All You Need

1 Share

What agentic engineering looks like when you really mean it.

In the last year, as our AI software engineering tools have evolved, so have our software engineering processes. We have learned to take advantage of the tools to cut down the latency of our SDLC and CI/CD processes. We’ve already learned that if we’re not careful, we’re going to vibe code “AI slop,” the unchecked code and configuration that an AI coding agent generates when run without guardrails.

The kind of guardrails I mean come straight out of the software engineering 101 playbook:

  • Getting feedback from multiple points of view
  • Checking technical designs for completeness, consistency and accuracy
  • Comprehensive unit, UI, integration, performance and security test suites
  • Thorough error instrumentation and run-time monitoring in operation
  • Human taste and judgment as the definition of “done”

Using agents for what they’re good at (coding, checking, testing, iterating) while letting the humans set the bar for quality elevates your process beyond “it works on my machine” and defines the “agentic engineering” discipline. You can use agents interactively to get the quality you’re looking for, if you’re careful to apply the guardrails. But if you want to build automated pipelines that require less and less human interaction over time, you’re going to have to start with something called harness engineering.

“Harness engineering” is the act of gathering together the underlying LLM with the tools and context (data, instructions, information about the environment) into a program optimized for a specific task. Each one of the CLI coding agents itself is a harness that’s optimized for getting the most out of the underlying LLM from a latency, throughput and quality point of view, packaged in a TUI with programmatic access built in.

And while it’s easy enough for a human to switch between tools from an interactive point of view, it’s real work to be able to switch between codex and opencode in your automated processes. While each of them provides the programmatic surface area you need to automate them, all agents are different in their invocation arguments, service endpoints, etc. To solve that problem, we need harness engineering again, but one level of abstraction up.

As an example, the Gas City OSS project defines a provider protocol that we implement on top of each of the major (and several of the long tail) CLI coding agents to enable them to be swapped in and out of city, pack and formula definitions (more on that later):

// Pseudo-code for the canonical in-memory worker API.
type FactoryWorker interface {
    // Lifecycle
    Start(context.Context) error
    StartResolved(context.Context, string, runtime.Config) error
    Attach(context.Context) error
    Create(context.Context, CreateMode) (sessionpkg.Info, error)
    Reset(context.Context) error
    Stop(context.Context) error
    Kill(context.Context) error
    Close(context.Context) error
    Rename(context.Context, string) error
    State(context.Context) (State, error)

    // Messaging
    Message(context.Context, MessageRequest) (MessageResult, error)
    Interrupt(context.Context, InterruptRequest) error
    Nudge(context.Context, NudgeRequest) (NudgeResult, error)

    // Transcript and History
    History(context.Context, HistoryRequest) (*HistorySnapshot, error)
    Transcript(context.Context, TranscriptRequest) (*TranscriptResult, error)
    TranscriptPath(context.Context) (string, error)
    AgentMappings(context.Context) ([]AgentMapping, error)
    AgentTranscript(context.Context, string) (*AgentTranscriptResult, error)

    // ... and more!
}

Not only does this eliminate the programmatic differences that get in the way of swapping yesterday’s best coding agent for today’s best in your automations, but it also lets you take advantage of the variations between models, e.g. cost, limits, quality, availability, and suitability to a specific job (e.g. planning, coding, image processing).

The use of the provider harness in Gas City allows you to turn a CLI coding agent into a “factory worker” that you can tailor to the specific job (“you’re an expert UX designer…”) and drop into your custom workflows (“write the plan, then get it reviewed by these other agents, then ask the UX designer agent to flesh out the jobs to be done, …”) that we call orchestrations.

orchestration all the way down

Reminding ourselves of human software engineering teams again, we have a series of workflows that human teams go through to get a product or feature out the door. For example, imagine an engineer that’s just written some code. Do they immediately release it to customers? No. They must solicit review feedback from their teammates first, dealing with any issues that arise. Only when they get the thumbs up are they allowed to merge new code into the main product codebase.

Now, if Eddie the Engineer is a human, he can do his coding, solicit feedback and iterate all on his own. Likewise, his teammates will get their notifications and do their part to keep the flow of software moving.

However, if Eddie is an AI coding agent, doing his part requires that he’s been “orchestrated,” e.g. defined as an expert software engineer running on the codex provider using the GPT5.4 model. Once an agent is included in an orchestration, we can spin up as many instances of it as we want, each dedicated to its own work item.

Once each Eddie is done with his work, the orchestration engine will hand it off to as many LLMs as you decide to review the work, gathering that feedback to iterate on until Eddie’s code gets a clean bill of health.

This isn’t theoretical. The code review orchestration we use with Gas City combines a dynamic set of agents configured to focus on security, performance, React, etc. that are then spread across the codex, gemini and claude providers so that we get a wide range of review feedback to iterate on. We know from experience that this greatly improves the quality of our code (and plans and tests and …).
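The fan-out review step described above can be sketched in a few lines of Go. The reviewer names and stub functions here are illustrative, not Gas City's implementation: one artifact goes to several reviewers concurrently, and their feedback is gathered for the next iteration.

```go
// Fan-out review: one artifact, many reviewer agents, feedback gathered.
package main

import (
	"fmt"
	"sort"
	"sync"
)

type review struct {
	Reviewer string
	Feedback string
}

// gatherReviews sends the artifact to every reviewer in parallel and
// collects the results -- the shape of a code review orchestration.
func gatherReviews(artifact string, reviewers map[string]func(string) string) []review {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out []review
	)
	for name, fn := range reviewers {
		wg.Add(1)
		go func(name string, fn func(string) string) {
			defer wg.Done()
			fb := fn(artifact) // a real reviewer is an agent instance on some provider
			mu.Lock()
			out = append(out, review{Reviewer: name, Feedback: fb})
			mu.Unlock()
		}(name, fn)
	}
	wg.Wait()
	// Deterministic ordering for the iteration loop that consumes this.
	sort.Slice(out, func(i, j int) bool { return out[i].Reviewer < out[j].Reviewer })
	return out
}

func main() {
	reviewers := map[string]func(string) string{
		"security":    func(a string) string { return "check input validation in " + a },
		"performance": func(a string) string { return "profile the hot loop in " + a },
	}
	for _, r := range gatherReviews("login.go", reviewers) {
		fmt.Printf("%s: %s\n", r.Reviewer, r.Feedback)
	}
}
```

Spreading the reviewer set across providers is then just a matter of which worker each stub wraps; the gathering loop stays the same.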

Every part of our human software engineering processes can be turned into an agentic orchestration. The artifact-review-iterate process is the same for engineers, PMs, UX designers, etc. And those processes string together into the overall software process itself.

The factory worker harnesses allow you to bring the CLI coding agents into your orchestrations, but it’s the orchestrations that allow you to spread the work across as many agents as you can spin up.

embracing the bitter lesson

There’s a tempting alternative to all of this: just wait for the models to get good enough that you don’t need orchestrations. Eventually, the thinking goes, GPT-7 or Claude 6 or Gemini Ultra will be capable enough that you can hand it your repo and your customer feedback and walk away.

This is the wrong side of the bitter lesson. Rich Sutton’s observation was that general methods that leverage compute have always eventually beaten methods that leverage human-encoded domain knowledge. Waiting for the perfect model is turning away from that lesson. It’s a bet that compute will eventually stop mattering, that there’s a model capability threshold past which the apparatus becomes unnecessary. And while the singularity is by definition the horizon beyond which we cannot predict the future, the past is pretty clear about the benefits of being on the right side of that lesson.

A software factory is on the right side. Orchestrations are how you turn more compute into better output, indefinitely. Five agents from different providers providing feedback is how you broaden your range of feedback no matter how good any specific agent’s response is. A better model just makes the orchestration better.

And now let’s talk about how you might build the software factory to run your orchestrations.

Gas City: build your software factory

Like Steve Yegge’s Gas Town that inspired it, Gas City provides the harness that turns CLI coding agents from different vendors into factory workers that can be plugged into “formulas” (orchestrations) to execute “beads” (work) against multiple “rigs” (projects).

Another name for an agentic orchestration system is a “software factory” or “dark factory” (so named because you don’t need humans to run it). Gas City provides the pieces — the agentic harness, the orchestration engine, the automation engine — and lets you configure the software factory to match the way that you want to work.

Want to start from scratch to build your factory alongside your software? You can. Want to have a fully featured Gas Town out of the box? Gas City provides that as a pre-packaged “pack” that you can choose at initialization time. Want to define your own pack that defines your favorite combo of agents, formulas (orchestrations) and orders (automatically triggered scripts or formulas) and share it with a team? That’s exactly what packs are for.

In fact, we’ve engineered Gas City such that any orchestration system you can think of — Ralph Loops, StrongDM Software Factories, Anthropic Agent Teams — can be captured in a pack. We’ve already started to gather some of them in the gascity-packs repo with more to come.

Those packs come from our own use of Gas City to build Gas City, as you might expect. On a single day, we were able to merge 74 PRs. Of course, the only way we can handle that kind of volume is by using Gas City itself.

But what surprised us wasn’t just that we’re using Gas City for implementation tasks. We’ve also built orchestrations for issue triage, deployment and operations. We’ve hired 15 agentic employees to help us run Gas City, Inc., ranging from marketing to release management. And, leaning on the techniques in OpenClaw, each of them has a personality and the ability to chat with us via Discord.

What we’re finding is that not only does Gas City evolve based on bug reports and feature requests, but the Gas City software factory itself evolves to meet the needs of the software. The more we push on the limits of what orchestrated agents can do, the more we realize that we’re just getting started!

getting involved

The Gas Town ecosystem has grown quite a bit in popularity since Steve Yegge submitted the first commit to the Beads project in October. Together, Beads, Gas Town and Gas City have gathered more than 35,000 stars, 5000 PRs and 2500 issues from more than 500 contributors.

In addition, we’ve got The Wasteland, which is our burgeoning set of federated Gas Towns and Cities for centralized management of distributed work. And we’ve got a hosted version of Gas City coming soon.

Of course, we’ve also got Gas City Hall, the community for all things in Gaslandia, including the Discord server, the social news feed, the docs and a link to the repo. Or perhaps you’d like some hands-on training? Then sign up for Software Factory Intensive, a two-day workshop from our partners at Actual.ai on how to build your very own software factory with Gas City.

where are we

It might not have sunk in yet, but your job has changed. Instead of building the software, you’re going to be building the factory that builds the software. And reviews it. And validates it. And deploys and operates and maintains it.

But the agents still need you to define what “done” looks like. Encode those definitions into reusable orchestrations, spinning up as many as you need, and let the factory carry the work your team used to carry by hand.

Start with Gas City. Plug in the Gas Town pack or build your own from scratch. The ecosystem is open source, the community is growing, and we’re using it ourselves every day to build the thing you’re about to install.

Go build your factory.


v2026.4.21

1 Share

OpenClaw 2026.4.21
