Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Guide to Architecting Secure AI Agents: Best Practices for Safety


Imagine you’ve just bought a fancy new high-performance sports car. It’s shiny, it’s fast, and it’s packed with the latest technology to make driving a more efficient, pleasurable experience. But what if I told you that this car, as high-tech as it is, could potentially decide to make a detour all on its own? Or share your destination details with unwelcome parties? It sounds like a plot from a science fiction movie, right? But this isn’t far from the reality we are beginning to navigate with the rise of AI agents.

AI agents are like these intelligent cars, but instead of transporting people, they navigate the vast and complex roads of data and decision-making. They perceive context, reason through goals and constraints, and perform actions autonomously through various tools and services. This autonomy allows us to delegate tasks, big and small, without needing to micromanage each step of the process. Tell an AI agent the goals, and it crafts its path to achieving them.

This video is from IBM Technology.

However, as with any powerful technology, AI agents present not just opportunities but also considerable risks. If not properly managed and secured, these agents could act unpredictably or be exploited maliciously, leading to potentially severe consequences. Security and control are paramount to ensure that AI agents work within designated boundaries and don’t go rogue.

Take, for instance, the collaborative guide released by IBM and Anthropic, which details how to secure enterprise AI agents that use MCP (the Model Context Protocol). This approach integrates security planning at each stage of the AI lifecycle, from development through deployment and maintenance, ensuring agents remain safe, reliable, and aligned with organizational goals.

The New Paradigm: From Static to Adaptive

The shift towards using AI agents marks a paradigm change from traditional static systems to dynamic, adaptive systems. Historically, logic systems were deterministic: identical inputs always yielded identical outputs. AI agents, by contrast, operate on probabilistic logic where the same inputs can lead to different outcomes based on the agent’s learning and adaptation over time. This adaptability enhances the agent’s effectiveness but also introduces variability that must be managed.

Continuous Cycle of Development and Evaluation

AI development is no longer just about coding; it’s about continuous evaluation and realignment. Teams must adopt a cyclic approach that includes planning, coding, testing, deploying, monitoring, and then looping back to planning. Each step is crucial and must integrate security—hence the concept of DevSecOps, where security measures are embedded throughout the development and operational processes of the AI lifecycle.

Mitigating Architectural Risks and Building Secure Frameworks

A significant concern with AI agents is their potential to extend an organization’s attack surface. Each new technology implementation opens up new avenues for attack, and AI agents are no different. They can become points of vulnerability, avenues for data leakage, or even tools for escalating network privileges if not adequately controlled.

To counter these risks, organizations must enforce strict system controls, define clear roles and permissions, and ensure agents operate within a secure, constrained environment—much like sandboxing in software development. It’s crucial to establish what an agent can and cannot do, ensuring it operates both effectively and safely within its defined limits.
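As an illustration, a strict, deny-by-default permission model can be sketched in a few lines of Python. The roles and tool names below are hypothetical, not drawn from any particular framework:

```python
# Minimal sketch: enforce an explicit allowlist of tools per agent role,
# so an agent can only invoke actions it has been granted.

ROLE_PERMISSIONS = {
    "support_agent": {"search_kb", "draft_reply"},
    "billing_agent": {"search_kb", "read_invoice"},
}

# Hypothetical tool implementations, stand-ins for real integrations.
TOOLS = {
    "search_kb": lambda q: f"results for {q!r}",
    "draft_reply": lambda text: f"draft: {text}",
    "read_invoice": lambda cid: f"invoice for {cid}",
}

def invoke_tool(role: str, tool: str, *args):
    allowed = ROLE_PERMISSIONS.get(role, set())
    if tool not in allowed:
        # Deny by default: anything not explicitly granted is out of bounds.
        raise PermissionError(f"{role} is not permitted to call {tool}")
    return TOOLS[tool](*args)

print(invoke_tool("support_agent", "search_kb", "refund policy"))
```

The key design choice is that the check runs at invocation time and fails closed: a tool absent from the allowlist is refused, rather than relying on the agent to self-restrict.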

Human in the Loop: Oversight and Continuous Assessment

Despite their autonomy, AI agents require human oversight. The ‘human in the loop’ approach ensures that there is always a way for human intervention, especially in critical decision-making processes or when an anomaly is detected. Continuous monitoring and threat detection are essential to respond to and mitigate risks in real time. Proactive threat hunting and regular risk assessments further fortify the security framework, ensuring the agents do not act beyond their intended scope.
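A minimal sketch of such an intervention point, assuming an illustrative confidence threshold and a hypothetical review queue:

```python
# Sketch of a human-in-the-loop gate: routine decisions proceed
# automatically, while low-confidence or high-impact ones are held
# for human review. Thresholds and field names are illustrative.

REVIEW_QUEUE = []

def gate(decision: dict, confidence: float, high_impact: bool) -> str:
    if confidence < 0.8 or high_impact:
        REVIEW_QUEUE.append(decision)
        return "escalated"   # a human must approve before execution
    return "executed"

status = gate({"action": "issue_refund", "amount": 500},
              confidence=0.62, high_impact=True)
print(status)  # escalated
```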

The Future with Secure AI Agents

AI agents represent a technological evolution with the potential to redefine productivity and efficiency across many industries. The guidelines from IBM and Anthropic provide a robust framework for safely integrating these powerful tools into business processes. For companies that master secure AI agent implementation, the payoff will be a significant competitive edge. Conversely, those who neglect these security considerations risk more than just operational failure; they risk the very integrity of their data and systems.

As we stand on the brink of an era dominated by intelligent, autonomous agents, the balance of innovation and security has never been more critical. By embracing best practices for AI security, we ensure that these agents serve us, not the other way around.

Read the whole story
alvinashcraft
7 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Cowork and plugins for teams across the enterprise


Control Planes for Autonomous AI: Why Governance Has to Move Inside the System


For most of the past decade, AI governance lived comfortably outside the systems it was meant to regulate. Policies were written. Reviews were conducted. Models were approved. Audits happened after the fact. As long as AI behaved like a tool—producing predictions or recommendations on demand—that separation mostly worked. That assumption is breaking down.

As AI systems move from assistive components to autonomous actors, governance imposed from the outside no longer scales. The problem isn’t that organizations lack policies or oversight frameworks. It’s that those controls are detached from where decisions are actually formed. Increasingly, the only place governance can operate effectively is inside the AI application itself, at runtime, while decisions are being made. This isn’t a philosophical shift. It’s an architectural one.

When AI Fails Quietly

One of the more unsettling aspects of autonomous AI systems is that their most consequential failures rarely look like failures at all. Nothing crashes. Latency stays within bounds. Logs look clean. The system behaves coherently—just not correctly. An agent escalates a workflow that should have been contained. A recommendation drifts slowly away from policy intent. A tool is invoked in a context that no one explicitly approved, yet no explicit rule was violated.

These failures are hard to detect because they emerge from behavior, not bugs. Traditional governance mechanisms don’t help much here. Predeployment reviews assume decision paths can be anticipated in advance. Static policies assume behavior is predictable. Post hoc audits assume intent can be reconstructed from outputs. None of those assumptions holds once systems reason dynamically, retrieve context opportunistically, and act continuously. At that point, governance isn’t missing—it’s simply in the wrong place.

The Scaling Problem No One Owns

Most organizations already feel this tension, even if they don’t describe it in architectural terms. Security teams tighten access controls. Compliance teams expand review checklists. Platform teams add more logging and dashboards. Product teams add additional prompt constraints. Each layer helps a little. None of them addresses the underlying issue.

What’s really happening is that governance responsibility is being fragmented across teams that don’t own system behavior end-to-end. No single layer can explain why the system acted—only that it acted. As autonomy increases, the gap between intent and execution widens, and accountability becomes diffuse. This is a classic scaling problem. And like many scaling problems before it, the solution isn’t more rules. It’s a different system architecture.

A Familiar Pattern from Infrastructure History

We’ve seen this before. In early networking systems, control logic was tightly coupled to packet handling. As networks grew, this became unmanageable. Separating the control plane from the data plane allowed policy to evolve independently of traffic and made failures diagnosable rather than mysterious.

Cloud platforms went through a similar transition. Resource scheduling, identity, quotas, and policy moved out of application code and into shared control systems. That separation is what made hyperscale cloud viable. Autonomous AI systems are approaching a comparable inflection point.

Right now, governance logic is scattered across prompts, application code, middleware, and organizational processes. None of those layers was designed to assert authority continuously while a system is reasoning and acting. What’s missing is a control plane for AI—not as a metaphor but as a real architectural boundary.

What “Governance Inside the System” Actually Means

When people hear “governance inside AI,” they often imagine stricter rules baked into prompts or more conservative model constraints. That’s not what this is about.

Embedding governance inside the system means separating decision execution from decision authority. Execution includes inference, retrieval, memory updates, and tool invocation. Authority includes policy evaluation, risk assessment, permissioning, and intervention. In most AI applications today, those concerns are entangled—or worse, implicit.

A control-plane-based design makes that separation explicit. Execution proceeds but under continuous supervision. Decisions are observed as they form, not inferred after the fact. Constraints are evaluated dynamically, not assumed ahead of time. Governance stops being a checklist and starts behaving like infrastructure.
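A minimal sketch of that separation, with hypothetical policy rules and tool names, might look like this:

```python
# Sketch: separating decision execution from decision authority.
# The control plane evaluates each proposed step before the
# execution plane carries it out. Policy rules are hypothetical.

class ControlPlane:
    def authorize(self, step: dict) -> bool:
        # Dynamic policy evaluation at runtime, not a predeployment checklist.
        if step["kind"] == "tool_call" and step["tool"] in {"delete_record"}:
            return False
        return True

class ExecutionPlane:
    def __init__(self, control: ControlPlane):
        self.control = control

    def run(self, step: dict) -> str:
        # Every step passes through the authority boundary first.
        if not self.control.authorize(step):
            return f"blocked: {step['tool']}"
        return f"ran: {step['tool']}"

engine = ExecutionPlane(ControlPlane())
print(engine.run({"kind": "tool_call", "tool": "send_email"}))     # ran: send_email
print(engine.run({"kind": "tool_call", "tool": "delete_record"}))  # blocked: delete_record
```

The point of the sketch is the boundary itself: policy lives in one component, execution in another, and the two talk on every step rather than only at deployment time.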

Figure 1. Separating execution from governance in autonomous AI systems

Reasoning, retrieval, memory, and tool invocation operate in the execution plane, while a runtime control plane continuously evaluates policy, risk, and authority—observing and intervening without being embedded in application logic.

Where Governance Breaks First

In practice, governance failures in autonomous AI systems tend to cluster around three surfaces.

Reasoning. Systems form intermediate goals, weigh options, and branch decisions internally. Without visibility into those pathways, teams can’t distinguish acceptable variance from systemic drift.

Retrieval. Autonomous systems pull in context opportunistically. That context may be outdated, inappropriate, or out of scope—and once it enters the reasoning process, it’s effectively invisible unless explicitly tracked.

Action. Tool use is where intent becomes impact. Systems increasingly invoke APIs, modify records, trigger workflows, or escalate issues without human review. Static authorization models don’t map cleanly onto dynamic decision contexts.

These surfaces are interconnected, but they fail independently. Treating governance as a single monolithic concern leads to brittle designs and false confidence.

Control Planes as Runtime Feedback Systems

A useful way to think about AI control planes is not as gatekeepers but as feedback systems. Signals flow continuously from execution into governance: confidence degradation, policy boundary crossings, retrieval drift, and action escalation patterns. Those signals are evaluated in real time, not weeks later during audits. Responses flow back: throttling, intervention, escalation, or constraint adjustment.
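That loop can be sketched as a simple mapping from behavioral signals to responses; the signal names and thresholds here are illustrative assumptions, not a real telemetry schema:

```python
# Sketch of the governance feedback loop: execution emits behavioral
# signals, and the control plane maps them to responses in real time.

def evaluate(signal: str, value: float) -> str:
    # (response, threshold) pairs: act only when the signal exceeds
    # its threshold; unknown signals default to "allow".
    responses = {
        "confidence_degradation": ("throttle", 0.3),
        "policy_boundary_crossing": ("intervene", 0.0),
        "retrieval_drift": ("escalate", 0.5),
    }
    action, threshold = responses.get(signal, ("allow", 1.0))
    return action if value > threshold else "allow"

print(evaluate("confidence_degradation", 0.6))  # throttle
print(evaluate("retrieval_drift", 0.2))         # allow
```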

This is fundamentally different from monitoring outputs. Output monitoring tells you what happened. Control plane telemetry tells you why it was allowed to happen. That distinction matters when systems operate continuously, and consequences compound over time.

Figure 2. Runtime governance as a feedback loop

Behavioral telemetry flows from execution into the control plane, where policy and risk are evaluated continuously. Enforcement and intervention feed back into execution before failures become irreversible.


A Failure Story That Should Sound Familiar

Consider a customer-support agent operating across billing, policy, and CRM systems.

Over several months, policy documents are updated. Some are reindexed quickly. Others lag. The agent continues to retrieve context and reason coherently, but its decisions increasingly reflect outdated rules. No single action violates policy outright. Metrics remain stable. Customer satisfaction erodes slowly.

Eventually, an audit flags a noncompliant action. At that point, teams scramble. Logs show what the agent did but not why. They can’t reconstruct which documents influenced which decisions, when those documents were last updated, or why the agent believed its actions were valid at the time.

This isn’t a logging failure. It’s the absence of a governance feedback loop. A control plane wouldn’t prevent every mistake, but it would surface drift early—when intervention is still cheap.

Why External Governance Can’t Catch Up

It’s tempting to believe better tooling, stricter reviews, or more frequent audits will solve this problem. They won’t.

External governance operates on snapshots. Autonomous AI operates on streams. The mismatch is structural. By the time an external process observes a problem, the system has already moved on—often repeatedly. That doesn’t mean governance teams are failing. It means they’re being asked to regulate systems whose operating model has outgrown their tools. The only viable alternative is governance that runs at the same cadence as execution.

Authority, Not Just Observability

One subtle but important point: Control planes aren’t just about visibility. They’re about authority.

Observability without enforcement creates a false sense of safety. Seeing a problem after it occurs doesn’t prevent it from recurring. Control planes must be able to act—to pause, redirect, constrain, or escalate behavior in real time.

That raises uncomfortable questions. How much autonomy should systems retain? When should humans intervene? How much latency is acceptable for policy evaluation? There are no universal answers. But those trade-offs can only be managed if governance is designed as a first-class runtime concern, not an afterthought.

The Architectural Shift Ahead

The move from guardrails to control loops mirrors earlier transitions in infrastructure. Each time, the lesson was the same: Static rules don’t scale under dynamic behavior. Feedback does.

AI is entering that phase now. Governance won’t disappear. But it will change shape. It will move inside systems, operate continuously, and assert authority at runtime. Organizations that treat this as an architectural problem—not a compliance exercise—will adapt faster and fail more gracefully. Those who don’t will spend the next few years chasing incidents they can see, but never quite explain.

Closing Thought

Autonomous AI doesn’t require less governance. It requires governance that understands autonomy.

That means moving beyond policies as documents and audits as events. It means designing systems where authority is explicit, observable, and enforceable while decisions are being made. In other words, governance must become part of the system—not something applied to it.


Apple accelerates U.S. manufacturing with Mac mini production

Apple today announced a significant expansion of factory operations in Houston, bringing the future production of Mac mini to the U.S. for the first time.


Grok 4.2 vs. Sonnet 4.6: Early Impressions From Hands-On Testing


We got new model releases from xAI and Anthropic last week, and I wanted to give my quick impressions to help you know if/when you should care.

This is just after a half day of testing, so my impressions may change, but… we’re usually locked in on the vibe pretty quickly.

By the way, even if you aren’t interested in Grok, take a read of the analysis below — we’ll talk about subagent systems in a way that will probably be broadly useful as more AI products use multi-agent systems.

Let’s dive in.


xAI’s Grok 4.2

Elon has been hyping this one for months, so everyone in the industry has been expecting a giant leap. Grok 4.1 was also better than expected at release (it’s regressed since then). So, there was some reason to believe xAI was making good progress.

The verdict: intriguing, but not impressive.

First, allow me a bit of frustration here: it’s so incredibly childish that the model is called Grok 4.20 in the interface (get it? weed, so clever). Not that we should be surprised at this point, but we shouldn’t stop calling it out.

Okay, onto the performance — Grok 4.2 (the model’s actual name) is a multi-agent orchestrator. When you give it a prompt, a lead agent seems to be the one to kick off the searches, and then individual AI ‘personas’ (who have dedicated names) run in parallel chains.

In normal mode, that’s 4 subagents, and with Grok Heavy, it’s up to 16.


The typical idea behind multi-agent or multi-subagent architectures is that you get sub-specialty or at least differentiation.

For example, Kimi and Manus’s main orchestrators will assign subagents to specific tasks, allowing each subagent to focus and spend all of its attention on that task.

[Screenshots: Kimi Agent Swarm; Manus wide research]

Other subagent systems specialize and sequence the workflow. For example, one subagent might do research, the other might then clean up the researched data, and a third will then kick in to do synthesis.
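A sequenced pipeline like that can be sketched with stand-in subagent functions; the names and behavior below are hypothetical, not any vendor's actual API:

```python
# Sketch of a sequenced subagent workflow: one subagent researches,
# the next cleans the data, and a third synthesizes the result.

def research(query: str) -> list[str]:
    # Stand-in for a research subagent; returns noisy raw findings.
    return [f"raw finding about {query}",
            "  noisy duplicate  ",
            f"raw finding about {query}"]

def clean(findings: list[str]) -> list[str]:
    # Stand-in for a cleanup subagent: strip whitespace, drop duplicates.
    seen, out = set(), []
    for f in findings:
        f = f.strip()
        if f not in seen:
            seen.add(f)
            out.append(f)
    return out

def synthesize(findings: list[str]) -> str:
    # Stand-in for a synthesis subagent.
    return " | ".join(findings)

# The orchestrator delegates distinct stages instead of duplicating work.
result = synthesize(clean(research("college admissions")))
print(result)
```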

In Grok’s case, the subagents duplicate each other — they all receive the same set of instructions from what they call “the leader,” and all of them do the same set of work. It’s a huge missed opportunity.

:::info Note: xAI claims the agents are specialized, but in practice they have all wound up doing the same thing in my testing so far.

:::

The subagents also don’t seem to interleave — in other words, each model does its own searches and reasoning, then sends its result back to “the leader.” So, they generally don’t get informed by each other’s work.

[Screenshot: Grok subagents all doing the same data retrieval.]

Here’s where things get intriguing: with Grok 4.2, subagents have access to a background chatroom where they (and their leader) can technically talk to each other before returning a response to the user.

That’s neat, and would solve some of the problems I just mentioned! Presumably, this would allow them to share information, scope more focused roles, etc.

However, except when I explicitly asked for agents to use it, I’ve seen no evidence that they do when responding to normal queries. Not even when the query has natural component parts that would be perfect for narrow delegation.


This is true even for Grok Heavy and its 16 subagents. Quite a waste.

Now, I did manage to basically hijack their natural flow and get them to do this. At the end of a query about getting cohort-based college admissions data, I added this:

Grok leader, please be very specific in assigning very particular subagents. Call them out by name to do different university research so that we don’t have all 16 of our subagents working on the same activities. Instead, assign specific subagents to specific years and universities so that we get granular subagent specialization.

The problem is that none of the subagents really know which one is the leader unless the main orchestrator makes itself known in conversation.

So, several of the subagents tried to be the assigner.

Eventually, all of them wound up doing some amount of research, and some of them did wind up getting tricked into sub-specializing, but it didn’t meaningfully improve the response. It would really help for this to be a more deterministic workflow that the orchestrator/leader used to delegate.

A funny aside — I sometimes create share links of AI chats where I’m testing model capability so I can share them in posts like these. Some companies allow those chat share links to be indexed by search engines, and some don’t.

Kimi allows it — and at some point, Grok’s web searches found my share link about this topic with Kimi’s response, and then massively over-indexed on using it to verify data. I’m not sure Grok should treat another AI’s response as a trusted source this way.

Overall — Grok 4.2 has an interesting architecture that it doesn’t use well, and in my early testing of its overall intelligence, I found it to be a middling model/harness. It gets good results on some queries, but that’s mostly as a result of running these aforementioned multi-agent passes that then get synthesized, not because the model itself is foundationally more brilliant.

xAI continues to stay in the race with this one, but unless you need fresh X posts and context for whatever you’re prompting about, Grok continues to be a back-of-the-pack option amongst the AI chat apps.

Sample Grok 4.2 conversations:




Anthropic’s Sonnet 4.6

Let me start with the conclusion here: Sonnet 4.6 is almost as smart as Anthropic’s recently released Opus 4.6, but it’s faster and much cheaper. That’s the headline.

:::tip more details from Anthropic here

:::

Costs are per million tokens.

On a practical basis, that means:

  • If you’re building a product, you might prefer to integrate Sonnet instead of Opus to save on your API costs with Anthropic.
  • If you’re using Claude Code or Cowork and constantly running into weekly limits, you might want to switch to Sonnet to get more bang for your buck.
  • If you’re trying to get every ounce of intelligence out of Anthropic, though, Opus 4.6 is still where it’s at for most use cases.

There are some benchmarks (below) where Sonnet 4.6 beats Opus 4.6, like GDPval-AA (which measures real-world economically valuable tasks), but that’s usually going to be as a result of its speed somehow helping it when it’s being used in certain environments (e.g., because it’s faster, it’s better at iterating through an Excel file within a time constraint).


In my general use so far in chat contexts, I don’t find a major difference between Sonnet 4.6 and Opus 4.6, and I don’t plan to use it in coding contexts because I like to use the smartest coding models available to me.

So, there you have it — that’s Sonnet 4.6.


Superbench

Some of you might know that I run a personal model benchmark. I send 60%+ of my prompts to multiple LLMs in their chat applications, and then stack rank the responses. I’m biased, but I think it’s the best AI benchmark on earth.

We don’t have enough data yet for Grok 4.2 or Sonnet 4.6, but I don’t expect either model to disrupt the current status quo as of February 17:

For more from me on all things AI (everyday shortcuts, breakthrough tactics, and deep dive analysis), check out AI Muscle.


Goodbye innerHTML, Hello setHTML: Stronger XSS Protection in Firefox 148

1 Share

Cross-site scripting (XSS) remains one of the most prevalent vulnerabilities on the web. The new standardized Sanitizer API provides a straightforward way for web developers to sanitize untrusted HTML before inserting it into the DOM. Firefox 148 is the first browser to ship this standardized security enhancing API, advancing a safer web for everyone. We expect other browsers to follow soon.

An XSS vulnerability arises when a website inadvertently lets attackers inject arbitrary HTML or JavaScript through user-generated content. With this attack, an attacker could monitor and manipulate user interactions and continually steal user data for as long as the vulnerability remains exploitable. XSS has a long history of being notoriously difficult to prevent and has ranked among the top three web vulnerabilities (CWE-79) for nearly a decade.

Firefox has been deeply involved in solutions for XSS from the beginning, starting with spearheading the Content-Security-Policy (CSP) standard in 2009. CSP allows websites to restrict which resources (scripts, styles, images, etc.) the browser can load and execute, providing a strong line of defense against XSS. Despite a steady stream of improvements and ongoing maintenance, CSP did not gain sufficient adoption to protect the long tail of the web as it requires significant architectural changes for existing web sites and continuous review by security experts.

The Sanitizer API is designed to help fill that gap by providing a standardized way to turn malicious HTML into harmless HTML — in other words, to sanitize it. The setHTML() method integrates sanitization directly into HTML insertion, providing safety by default. Here is an example of sanitizing a simple piece of unsafe HTML:

document.body.setHTML(`<h1>Hello my name is <img src="x" onclick="alert('XSS')">`);

This sanitization will allow the HTML <h1> element while removing the embedded <img> element and its onclick attribute, thereby eliminating the XSS attack and producing the following safe HTML:

<h1>Hello my name is</h1>

Developers can opt into stronger XSS protections with minimal code changes by replacing error-prone innerHTML assignments with setHTML(). If the default configuration of setHTML() is too strict (or not strict enough) for a given use case, developers can provide a custom configuration that defines which HTML elements and attributes should be kept or removed. To experiment with the Sanitizer API before introducing it on a web page, we recommend exploring the Sanitizer API playground.

For even stronger protections, the Sanitizer API can be combined with Trusted Types, which centralize control over HTML parsing and injection. Once setHTML() is adopted, sites can enable Trusted Types enforcement more easily, often without requiring complex custom policies. A strict policy can allow setHTML() while blocking other unsafe HTML insertion methods, helping prevent future XSS regressions.

The Sanitizer API enables an easy replacement of innerHTML assignments with setHTML() in existing code, introducing a new, safer default that protects users from XSS attacks on the web. Firefox 148 supports the Sanitizer API as well as Trusted Types, creating a safer web experience. Adopting these standards will allow all developers to prevent XSS without the need for a dedicated security team or significant implementation changes.


Image credits for the illustration above: Website, by Desi Ratna; Person, by Made by Made; Hacker by Andy Horvath.

 

The post Goodbye innerHTML, Hello setHTML: Stronger XSS Protection in Firefox 148 appeared first on Mozilla Hacks - the Web developer blog.
