Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Control Planes for Autonomous AI: Why Governance Has to Move Inside the System


For most of the past decade, AI governance lived comfortably outside the systems it was meant to regulate. Policies were written. Reviews were conducted. Models were approved. Audits happened after the fact. As long as AI behaved like a tool—producing predictions or recommendations on demand—that separation mostly worked. That assumption is breaking down.

As AI systems move from assistive components to autonomous actors, governance imposed from the outside no longer scales. The problem isn’t that organizations lack policies or oversight frameworks. It’s that those controls are detached from where decisions are actually formed. Increasingly, the only place governance can operate effectively is inside the AI application itself, at runtime, while decisions are being made. This isn’t a philosophical shift. It’s an architectural one.

When AI Fails Quietly

One of the more unsettling aspects of autonomous AI systems is that their most consequential failures rarely look like failures at all. Nothing crashes. Latency stays within bounds. Logs look clean. The system behaves coherently—just not correctly. An agent escalates a workflow that should have been contained. A recommendation drifts slowly away from policy intent. A tool is invoked in a context that no one explicitly approved, yet no explicit rule was violated.

These failures are hard to detect because they emerge from behavior, not bugs. Traditional governance mechanisms don’t help much here. Predeployment reviews assume decision paths can be anticipated in advance. Static policies assume behavior is predictable. Post hoc audits assume intent can be reconstructed from outputs. None of those assumptions holds once systems reason dynamically, retrieve context opportunistically, and act continuously. At that point, governance isn’t missing—it’s simply in the wrong place.

The Scaling Problem No One Owns

Most organizations already feel this tension, even if they don’t describe it in architectural terms. Security teams tighten access controls. Compliance teams expand review checklists. Platform teams add more logging and dashboards. Product teams add additional prompt constraints. Each layer helps a little. None of them addresses the underlying issue.

What’s really happening is that governance responsibility is being fragmented across teams that don’t own system behavior end-to-end. No single layer can explain why the system acted—only that it acted. As autonomy increases, the gap between intent and execution widens, and accountability becomes diffuse. This is a classic scaling problem. And like many scaling problems before it, the solution isn’t more rules. It’s a different system architecture.

A Familiar Pattern from Infrastructure History

We’ve seen this before. In early networking systems, control logic was tightly coupled to packet handling. As networks grew, this became unmanageable. Separating the control plane from the data plane allowed policy to evolve independently of traffic and made failures diagnosable rather than mysterious.

Cloud platforms went through a similar transition. Resource scheduling, identity, quotas, and policy moved out of application code and into shared control systems. That separation is what made hyperscale cloud viable. Autonomous AI systems are approaching a comparable inflection point.

Right now, governance logic is scattered across prompts, application code, middleware, and organizational processes. None of those layers was designed to assert authority continuously while a system is reasoning and acting. What’s missing is a control plane for AI—not as a metaphor but as a real architectural boundary.

What “Governance Inside the System” Actually Means

When people hear “governance inside AI,” they often imagine stricter rules baked into prompts or more conservative model constraints. That’s not what this is about.

Embedding governance inside the system means separating decision execution from decision authority. Execution includes inference, retrieval, memory updates, and tool invocation. Authority includes policy evaluation, risk assessment, permissioning, and intervention. In most AI applications today, those concerns are entangled—or worse, implicit.

A control-plane-based design makes that separation explicit. Execution proceeds but under continuous supervision. Decisions are observed as they form, not inferred after the fact. Constraints are evaluated dynamically, not assumed ahead of time. Governance stops being a checklist and starts behaving like infrastructure.

Figure 1. Separating execution from governance in autonomous AI systems

Reasoning, retrieval, memory, and tool invocation operate in the execution plane, while a runtime control plane continuously evaluates policy, risk, and authority—observing and intervening without being embedded in application logic.
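To make the boundary concrete, here is a minimal hypothetical sketch in JavaScript. Every name in it, from ControlPlane to the refund policy, is invented for illustration and is not a real framework: the execution plane may invoke a tool only after the control plane, acting as the authority, has evaluated policy against the live decision context.

```javascript
// Hypothetical sketch: a control plane holding authority over a separate
// execution plane. All names here are illustrative, not a real library.
class ControlPlane {
  constructor(policies) {
    this.policies = policies; // array of (action, context) => { allow, reason }
    this.log = [];            // decision trail: why each action was allowed or blocked
  }
  authorize(action, context) {
    for (const policy of this.policies) {
      const verdict = policy(action, context);
      this.log.push({ action: action.name, ...verdict });
      if (!verdict.allow) return verdict; // intervene before execution, not after
    }
    return { allow: true, reason: "all policies passed" };
  }
}

// Execution plane: tools run only if the control plane grants authority.
function invokeTool(controlPlane, action, context) {
  const verdict = controlPlane.authorize(action, context);
  if (!verdict.allow) return { status: "blocked", reason: verdict.reason };
  return { status: "executed", result: action.run(context) };
}

// Example policy: refunds above a threshold require human escalation.
const refundCap = (action, ctx) =>
  action.name === "refund" && ctx.amount > 500
    ? { allow: false, reason: "refund exceeds cap; escalate to human" }
    : { allow: true, reason: "within limits" };

const cp = new ControlPlane([refundCap]);
const refund = { name: "refund", run: (ctx) => `refunded $${ctx.amount}` };

console.log(invokeTool(cp, refund, { amount: 200 }).status); // "executed"
console.log(invokeTool(cp, refund, { amount: 900 }).status); // "blocked"
```

The point of the sketch is the shape, not the policy: the tool never decides its own authority, and every verdict leaves a trace the team can inspect later.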

Where Governance Breaks First

In practice, governance failures in autonomous AI systems tend to cluster around three surfaces.

Reasoning. Systems form intermediate goals, weigh options, and branch decisions internally. Without visibility into those pathways, teams can’t distinguish acceptable variance from systemic drift.

Retrieval. Autonomous systems pull in context opportunistically. That context may be outdated, inappropriate, or out of scope—and once it enters the reasoning process, it’s effectively invisible unless explicitly tracked.

Action. Tool use is where intent becomes impact. Systems increasingly invoke APIs, modify records, trigger workflows, or escalate issues without human review. Static authorization models don’t map cleanly onto dynamic decision contexts.

These surfaces are interconnected, but they fail independently. Treating governance as a single monolithic concern leads to brittle designs and false confidence.

Control Planes as Runtime Feedback Systems

A useful way to think about AI control planes is not as gatekeepers but as feedback systems. Signals flow continuously from execution into governance: confidence degradation, policy boundary crossings, retrieval drift, and action escalation patterns. Those signals are evaluated in real time, not weeks later during audits. Responses flow back: throttling, intervention, escalation, or constraint adjustment.

This is fundamentally different from monitoring outputs. Output monitoring tells you what happened. Control plane telemetry tells you why it was allowed to happen. That distinction matters when systems operate continuously, and consequences compound over time.

Figure 2. Runtime governance as a feedback loop

Behavioral telemetry flows from execution into the control plane, where policy and risk are evaluated continuously. Enforcement and intervention feed back into execution before failures become irreversible.
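As a toy model of that feedback loop (the signal names and thresholds below are assumptions, not a standard telemetry schema), a single governance tick might look like this:

```javascript
// Illustrative sketch of a runtime governance feedback loop. Signal names
// and thresholds are invented for the demo.
function evaluateSignals(signals) {
  const responses = [];
  if (signals.confidence < 0.6) responses.push("throttle");
  if (signals.policyBoundaryCrossings > 0) responses.push("escalate");
  if (signals.retrievalDriftScore > 0.3) responses.push("constrain-retrieval");
  return responses.length ? responses : ["continue"];
}

// One tick of the loop: execution emits telemetry, governance evaluates it
// in real time, and responses feed back before the next step runs.
function governanceTick(executionStep, state) {
  const { output, signals } = executionStep(state);
  const responses = evaluateSignals(signals);
  return { output, responses };
}

const step = () => ({
  output: "drafted refund email",
  signals: { confidence: 0.45, policyBoundaryCrossings: 0, retrievalDriftScore: 0.1 },
});
console.log(governanceTick(step, {}).responses); // ["throttle"]
```

Notice that the response arrives between execution steps, which is what distinguishes a control plane from an after-the-fact audit.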


A Failure Story That Should Sound Familiar

Consider a customer-support agent operating across billing, policy, and CRM systems.

Over several months, policy documents are updated. Some are reindexed quickly. Others lag. The agent continues to retrieve context and reason coherently, but its decisions increasingly reflect outdated rules. No single action violates policy outright. Metrics remain stable. Customer satisfaction erodes slowly.

Eventually, an audit flags noncompliant action. At that point, teams scramble. Logs show what the agent did but not why. They can’t reconstruct which documents influenced which decisions, when those documents were last updated, or why the agent believed its actions were valid at the time.

This isn’t a logging failure. It’s the absence of a governance feedback loop. A control plane wouldn’t prevent every mistake, but it would surface drift early—when intervention is still cheap.
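A control plane could surface exactly this kind of retrieval drift with a cheap runtime check. The sketch below is hypothetical and the document fields are invented: it flags any retrieved document whose source changed after the index last ingested it.

```javascript
// Hypothetical drift check: flag retrieved policy documents whose source
// was updated after the index last ingested them. Field names are invented.
function findStaleRetrievals(retrievals) {
  return retrievals.filter(
    (doc) => doc.sourceUpdatedAt > doc.indexedAt // index lags the source
  );
}

const retrievals = [
  { id: "refund-policy", indexedAt: 100, sourceUpdatedAt: 250 }, // stale
  { id: "billing-faq", indexedAt: 300, sourceUpdatedAt: 200 },   // fresh
];
console.log(findStaleRetrievals(retrievals).map((d) => d.id)); // ["refund-policy"]
```

Run per retrieval rather than per audit cycle, a check like this turns months of silent drift into a same-day signal.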

Why External Governance Can’t Catch Up

It’s tempting to believe better tooling, stricter reviews, or more frequent audits will solve this problem. They won’t.

External governance operates on snapshots. Autonomous AI operates on streams. The mismatch is structural. By the time an external process observes a problem, the system has already moved on—often repeatedly. That doesn’t mean governance teams are failing. It means they’re being asked to regulate systems whose operating model has outgrown their tools. The only viable alternative is governance that runs at the same cadence as execution.

Authority, Not Just Observability

One subtle but important point: Control planes aren’t just about visibility. They’re about authority.

Observability without enforcement creates a false sense of safety. Seeing a problem after it occurs doesn’t prevent it from recurring. Control planes must be able to act—to pause, redirect, constrain, or escalate behavior in real time.

That raises uncomfortable questions. How much autonomy should systems retain? When should humans intervene? How much latency is acceptable for policy evaluation? There are no universal answers. But those trade-offs can only be managed if governance is designed as a first-class runtime concern, not an afterthought.

The Architectural Shift Ahead

The move from guardrails to control loops mirrors earlier transitions in infrastructure. Each time, the lesson was the same: Static rules don’t scale under dynamic behavior. Feedback does.

AI is entering that phase now. Governance won’t disappear. But it will change shape. It will move inside systems, operate continuously, and assert authority at runtime. Organizations that treat this as an architectural problem—not a compliance exercise—will adapt faster and fail more gracefully. Those who don’t will spend the next few years chasing incidents they can see, but never quite explain.

Closing Thought

Autonomous AI doesn’t require less governance. It requires governance that understands autonomy.

That means moving beyond policies as documents and audits as events. It means designing systems where authority is explicit, observable, and enforceable while decisions are being made. In other words, governance must become part of the system—not something applied to it.


Apple accelerates U.S. manufacturing with Mac mini production

Apple today announced a significant expansion of factory operations in Houston, bringing the future production of Mac mini to the U.S. for the first time.


Grok 4.2 vs. Sonnet 4.6: Early Impressions From Hands-On Testing


We got new model releases from xAI and Anthropic last week, and I wanted to give my quick impressions to help you know if/when you should care.

This is just after a half day of testing, so my impressions may change, but… we’re usually locked in on the vibe pretty quickly.

By the way, even if you aren’t interested in Grok, take a read of the analysis below — we’ll talk about subagent systems in a way that will probably be broadly useful as more AI products use multi-agent systems.

Let’s dive in.


xAI’s Grok 4.2

Elon has been hyping this one for months, so everyone in the industry has been expecting a giant leap. Grok 4.1 was also better than expected at release (it’s regressed since then). So, there was some reason to believe xAI was making good progress.

The verdict: intriguing, but not impressive.

First, allow me a bit of frustration here: it’s so incredibly childish that the model is called Grok 4.20 in the interface (get it? weed, so clever). Not that we should be surprised at this point, but we shouldn’t stop calling it out.

Okay, onto the performance — Grok 4.2 (the model’s actual name) is a multi-agent orchestrator. When you give it a prompt, a lead agent seems to be the one to kick off the searches, and then individual AI ‘personas’ (who have dedicated names) run in parallel chains.

In normal mode, that’s 4 subagents, and with Grok Heavy, it’s up to 16.


The typical idea behind multi-agent or multi-subagent architectures is that you get sub-specialization, or at least differentiation.

For example, Kimi’s and Manus’s main orchestrators will assign subagents to specific tasks, allowing each subagent to focus and spend all of its attention on that task.

Kimi’s Agent Swarm and Manus’s wide research interfaces.

Other subagent systems specialize and sequence the workflow. For example, one subagent might do research, another might then clean up the researched data, and a third might then kick in to do synthesis.

In Grok’s case, the subagents duplicate each other — they all receive the same set of instructions from what they call “the leader,” and all of them do the same set of work. It’s a huge missed opportunity.

:::info Note: xAI claims the agents are specialized, but in practice, they all wind up doing the same thing in my testing so far.
:::

The subagents also don’t seem to interleave — in other words, each one does its own searches and reasoning, then sends its result back to “the leader.” So, they generally don’t get informed by each other’s work.

You can see Grok subagents here all doing the same data retrieval.

Here’s where things get intriguing: with Grok 4.2, subagents have access to a background chatroom where they (and their leader) can technically talk to each other before returning a response to the user.

That’s neat, and would solve some of the problems I just mentioned! Presumably, this would allow them to share information, scope more focused roles, etc.

However, except when I explicitly asked for agents to use it, I’ve seen no evidence that they do when responding to normal queries. Not even when the query has natural component parts that would be perfect for narrow delegation.


This is true even for Grok Heavy and its 16 subagents. Quite a waste.

Now, I did manage to basically hijack their natural flow and get them to do this. At the end of a query about getting cohort-based college admissions data, I added this:

Grok leader, please be very specific in assigning very particular subagents. Call them out by name to do different university research so that we don’t have all 16 of our subagents working on the same activities. Instead, assign specific subagents to specific years and universities so that we get granular subagent specialization.

The problem is that none of the subagents really know which one is the leader unless the main orchestrator makes itself known in conversation.

So, several of the subagents tried to be the assigner themselves.

Eventually, all of them wound up doing some amount of research, and some of them did wind up getting tricked into sub-specializing, but it didn’t meaningfully improve the response. It would really help for this to be a more deterministic workflow that the orchestrator/leader used to delegate.
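To illustrate what such a deterministic delegation step could look like (this is a hypothetical sketch, not how Grok actually works), an orchestrator can partition the task space up front and hand each named subagent a disjoint slice:

```javascript
// Sketch of deterministic delegation: the orchestrator partitions the task
// into disjoint slices and assigns each one to a named subagent up front,
// instead of hoping agents self-organize. All names are hypothetical.
function delegate(subagents, universities, years) {
  const slices = [];
  for (const u of universities) {
    for (const y of years) slices.push({ university: u, year: y });
  }
  // Round-robin: slice i goes to subagent i mod N, so no two agents overlap.
  return slices.map((slice, i) => ({ agent: subagents[i % subagents.length], ...slice }));
}

const plan = delegate(["A", "B"], ["MIT", "CMU"], [2023, 2024]);
console.log(plan.length); // 4 disjoint assignments
```

Because the partition is computed before any agent runs, no subagent needs to know which one is the leader; the plan itself carries the authority.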

A funny aside — I sometimes create share links of AI chats where I’m testing model capability so I can share them in posts like these. Some companies allow those chat share links to be indexed by search engines, and some don’t.

Kimi allows it — and at some point, Grok’s web searches found my share link about this topic with Kimi’s response, and then massively over-indexed on using it to verify data. I’m not sure Grok should treat another AI’s response this way.

Overall — Grok 4.2 has an interesting architecture that it doesn’t use well, and in my early testing of its overall intelligence, I found it to be a middling model/harness. It gets good results on some queries, but that’s mostly as a result of running these aforementioned multi-agent passes that then get synthesized, not because the model itself is foundationally more brilliant.

xAI continues to stay in the race with this one, but unless you need fresh X posts and context for whatever you’re prompting about, Grok continues to be a back-of-the-pack option amongst the AI chat apps.

Sample Grok 4.2 conversations:




Anthropic’s Sonnet 4.6

Let me start with the conclusion here: Sonnet 4.6 is almost as smart as Anthropic’s recently released Opus 4.6, but it’s faster and much cheaper. That’s the headline.

:::tip More details from Anthropic here.
:::

Costs are per million tokens.

On a practical basis, that means:

  • If you’re building a product, you might prefer to integrate Sonnet instead of Opus to save on your API costs with Anthropic.
  • If you’re using Claude Code or Cowork and constantly running into weekly limits, you might want to switch to Sonnet to get more bang for your buck.
  • If you’re trying to get every ounce of intelligence out of Anthropic, though, Opus 4.6 is still where it’s at for most use cases.

There are some benchmarks (below) where Sonnet 4.6 beats Opus 4.6, like GDPval-AA (which measures real-world economically valuable tasks), but that’s usually a result of its speed helping it in certain environments (e.g., because it’s faster, it can iterate through an Excel file more times within a time constraint).


In my general use so far in chat contexts, I don’t find a major difference between Sonnet 4.6 and Opus 4.6, and I don’t plan to use Sonnet in coding contexts because I like to use the smartest coding models available to me.

So, there you have it — that’s Sonnet 4.6.


Superbench

Some of you might know that I run a personal model benchmark. I send 60%+ of my prompts to multiple LLMs in their chat applications, and then stack rank the responses. I’m biased, but I think it’s the best AI benchmark on earth.

We don’t have enough data yet for Grok 4.2 or Sonnet 4.6, but I don’t expect either model to disrupt the current status quo as of February 17:

For more from me on all things AI (everyday shortcuts, breakthrough tactics, and deep dive analysis), check out AI Muscle.


Goodbye innerHTML, Hello setHTML: Stronger XSS Protection in Firefox 148


Cross-site scripting (XSS) remains one of the most prevalent vulnerabilities on the web. The new standardized Sanitizer API provides a straightforward way for web developers to sanitize untrusted HTML before inserting it into the DOM. Firefox 148 is the first browser to ship this standardized, security-enhancing API, advancing a safer web for everyone. We expect other browsers to follow soon.

An XSS vulnerability arises when a website inadvertently lets attackers inject arbitrary HTML or JavaScript through user-generated content. With this attack, an attacker could monitor and manipulate user interactions and continually steal user data for as long as the vulnerability remains exploitable. XSS has a long history of being notoriously difficult to prevent and has ranked among the top three web vulnerabilities (CWE-79) for nearly a decade.

Firefox has been deeply involved in solutions for XSS from the beginning, starting with spearheading the Content-Security-Policy (CSP) standard in 2009. CSP allows websites to restrict which resources (scripts, styles, images, etc.) the browser can load and execute, providing a strong line of defense against XSS. Despite a steady stream of improvements and ongoing maintenance, CSP did not gain sufficient adoption to protect the long tail of the web as it requires significant architectural changes for existing web sites and continuous review by security experts.

The Sanitizer API is designed to help fill that gap by providing a standardized way to turn malicious HTML into harmless HTML — in other words, to sanitize it. The setHTML() method integrates sanitization directly into HTML insertion, providing safety by default. Here is an example of sanitizing a simple piece of unsafe HTML:

document.body.setHTML(`<h1>Hello my name is <img src="x" onclick="alert('XSS')">`);

This sanitization keeps the HTML <h1> element while removing the embedded <img> element and its onclick attribute, eliminating the XSS attack and producing the following safe HTML:

<h1>Hello my name is</h1>

Developers can opt into stronger XSS protections with minimal code changes by replacing error-prone innerHTML assignments with setHTML(). If the default configuration of setHTML() is too strict (or not strict enough) for a given use case, developers can provide a custom configuration that defines which HTML elements and attributes should be kept or removed. To experiment with the Sanitizer API before introducing it on a web page, we recommend exploring the Sanitizer API playground.
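To illustrate the allowlist idea behind such a configuration, here is a toy, DOM-free model in plain JavaScript. This is not the Sanitizer API itself (setHTML() operates on real DOM nodes); it simply filters a pre-parsed token list under a config of the same spirit, and both the token shape and the config keys here are invented for the demo.

```javascript
// Toy allowlist filter: keep only configured elements, and on survivors
// keep only configured attributes. NOT the real Sanitizer API; the token
// and config shapes are invented to illustrate the allowlist concept.
function sanitizeTokens(tokens, config) {
  return tokens
    .filter((t) => config.elements.includes(t.tag)) // drop disallowed elements
    .map((t) => ({
      tag: t.tag,
      attributes: Object.fromEntries(
        Object.entries(t.attributes).filter(([name]) => config.attributes.includes(name))
      ),
    }));
}

const tokens = [
  { tag: "h1", attributes: {} },
  { tag: "img", attributes: { src: "x", onclick: "alert('XSS')" } },
];
const config = { elements: ["h1", "p", "img"], attributes: ["src", "href"] };
console.log(sanitizeTokens(tokens, config));
// The img survives because it is allowlisted, but its onclick handler is stripped.
```

The real API makes the same kind of decision per DOM node, which is why a default-safe configuration removes event-handler attributes like onclick even from otherwise allowed elements.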

For even stronger protections, the Sanitizer API can be combined with Trusted Types, which centralize control over HTML parsing and injection. Once setHTML() is adopted, sites can enable Trusted Types enforcement more easily, often without requiring complex custom policies. A strict policy can allow setHTML() while blocking other unsafe HTML insertion methods, helping prevent future XSS regressions.

The Sanitizer API enables an easy replacement of innerHTML assignments with setHTML() in existing code, introducing a safer default that protects users from XSS attacks on the web. Firefox 148 supports the Sanitizer API as well as Trusted Types, creating a safer web experience. Adopting these standards will allow all developers to prevent XSS without the need for a dedicated security team or significant implementation changes.

 


Image credits for the illustration above: Website, by Desi Ratna; Person, by Made by Made; Hacker by Andy Horvath.

 

The post Goodbye innerHTML, Hello setHTML: Stronger XSS Protection in Firefox 148 appeared first on Mozilla Hacks - the Web developer blog.


First Public Working Draft and supporting notes: EPUB Annotations 1.0

1 Share

The Publishing Maintenance Working Group today published three documents around EPUB Annotations 1.0, a specification that will define how to create, manage, export, and import annotations in EPUB publications.

  • A First Public Working Draft of EPUB Annotations 1.0. This document defines a profile of the Web Annotation Data Model by specifying a subset of the terms, and adding terms deemed useful to satisfy the entries in the EPUB Annotations Use Cases and Requirements document.
  • A W3C Group Note Draft of EPUB Annotations Vocabulary 1.0. This document defines the vocabulary for the EPUB Annotations 1.0 specification.
  • A W3C Group Note of EPUB Annotations Use Cases and Requirements. This document defines the use cases and requirements relative to EPUB Annotations.

Why 'Tonka' sounds big and 'bitty' sounds small. Why you CAN start a sentence with 'because.'


1162. This week, we look at why some names just "feel right" while others don't, and how vowels like "ee" create associations with smallness and sweetness while back vowels like "ah" sound bigger and more serious. Then, we look at dependent clauses and when it's OK to start a sentence with "because."

The baby names segment was written by Valerie Fridland.

🔗 Join the Grammar Girl Patreon.

🔗 Share your familect recording in Speakpipe or by leaving a voicemail at 833-214-GIRL (833-214-4475)

🔗 Watch my LinkedIn Learning writing courses.

🔗 Subscribe to the newsletter.

🔗 Take our advertising survey

🔗 Get the edited transcript.

🔗 Get Grammar Girl books

| HOST: Mignon Fogarty

| Grammar Girl is part of the Quick and Dirty Tips podcast network.

  • Audio Engineer: Dan Feierabend, Maram Elnagheeb
  • Director of Podcast: Holly Hutchings
  • Advertising Operations Specialist: Morgan Christianson
  • Marketing and Video: Nat Hoopes, Rebekah Sebastian
  • Podcast Associate: Maram Elnagheeb

| Theme music by Catherine Rannus.

| Grammar Girl Social Media: YouTube, TikTok, Facebook, Threads, Instagram, LinkedIn, Mastodon, Bluesky.


Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.





Download audio: https://dts.podtrac.com/redirect.mp3/media.blubrry.com/grammargirl/stitcher.simplecastaudio.com/e7b2fc84-d82d-4b4d-980c-6414facd80c3/episodes/121749cc-5f2c-4dd8-a711-b89236f46b6f/audio/128/default.mp3?aid=rss_feed&awCollectionId=e7b2fc84-d82d-4b4d-980c-6414facd80c3&awEpisodeId=121749cc-5f2c-4dd8-a711-b89236f46b6f&feed=XcH2p3Ah