Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Agent identity in Claude Tag: a new access model for autonomous, team-wide AI


Introducing Claude Tag


1.0.64

2026-06-23

  • Path access prompt shows resolved symlink targets so you can see exactly what access is being granted
  • Show the pay-as-you-go additional usage budget at launch, refresh it after a request is rejected for hitting the additional spend limit, and show a friendly message when the additional usage limit is reached
  • Add websocket responses support for BYOK OpenAI-compatible providers
  • Resumed sessions reproduce the original attached-file references even if those files later change on disk, avoiding prompt-cache resets
  • Free-text search terms containing colons (e.g. CLI:) now return correct results in Issues and Pull requests search instead of being misread as invalid qualifiers by GitHub
  • Support static OAuth client overrides, including client secrets, for MCP server authentication
  • Preserve keystrokes typed while the CLI is still loading
  • Add an option to bypass the sandbox for shell commands
  • Add mouse click and double-click selection to paginated lists
  • Link PR and issue references in markdown tables
  • Use the GitHub theme by default and enable home tabs and prompt frame for all users
  • Keep terminal output aligned after terminal resizes
  • Content exclusion no longer blocks every file when the rules service is unreachable (offline or a transient network error). Access is allowed until rules can be fetched and retried in the background, matching the editor's behavior.
  • Configure the rubber-duck subagent in /subagents, including a complementary model strategy that picks an opposite-family model
  • /diff shows a session diff of Copilot's changes in non-git folders
  • Set an HTTP(S) proxy with a user setting
  • Resume sessions by name even when the name contains spaces
  • Hide unsupported slash commands in remote-hosted sessions
  • Add a setting to hide the conversation scrollbar
  • Add inline image rendering in the CLI
  • Add argument-hint frontmatter support for skills
  • OpenTelemetry: chat spans after a successful compaction carry gen_ai.conversation.compacted=true, and the summary is emitted as a CompactionPart in gen_ai.input.messages
  • PowerShell cmdlets (Select-String, Where-Object, ForEach-Object) no longer trigger spurious directory access prompts
  • Non-interactive prompt output now stays at column 1
  • Clear queued tool images when vision is disabled
  • Changing the model now waits until the new model is applied
  • Treat 2>/dev/null redirects as read-only in shell safety prompts
  • Normalize edited text to LF when opening prompts in an external editor
  • Skip computer-use consent prompts in full allow-all sessions
  • Remote export keeps running after /clear and /session info keeps the task URL
  • Keep the cursor on the adjacent session after deleting one in the session selector
  • Use the correct Linux libc target when resolving and auto-updating SEA packages on musl hosts
  • Allow required multi-select prompts to submit an empty selection when minItems is not set
  • Keep the home session timeline visible after attaching and restoring
  • The /settings search field supports readline editing keys and cursor movement
  • OpenTelemetry GenAI spans now emit gen_ai.usage.cache_read.input_tokens, gen_ai.usage.cache_creation.input_tokens, and gen_ai.usage.reasoning.output_tokens per the GenAI semantic conventions spec (previously used incorrect underscore-separated names)
  • Fix mouse wheel scrolling being broken in the terminal after the CLI exits by tearing down terminal modes in reverse order (mouse tracking is now disabled before leaving the alt screen)
  • Fix the /rewind file-restore confirmation dialog being clipped at the bottom when it opens above a scrolled timeline; it now shows at full height once the file list loads
  • Show --remote-export and --no-remote-export in --help output
  • Wrap expanded compact timeline shell entries so long commands and descriptions stay visible
  • Make links in markdown tables clickable
  • Show per-model token totals in /usage and speed up large history scans
  • OpenTelemetry GenAI chat spans emit gen_ai.request.reasoning.level for the configured reasoning effort
  • Autopilot mode now returns to interactive mode after the agent calls task_complete, so you aren't left in autopilot for your next prompt
  • Add /branch as an alias for /fork, matching Claude Code's command naming
  • Experimental: adds a --worktree [name] (-w) flag (enable with /experimental) that creates or reuses a git worktree under <repo>.worktrees/ and starts the session inside it
  • Add tab completion for /agent names
  • Add model family aliases like opus, sonnet, haiku, gpt, and gemini in the model setting
  • Add Ctrl+Backspace binding in /terminal-setup for Windows Terminal
  • Add SDK support for host-provided OAuth tokens for remote MCP servers
  • Experimental: in the compact timeline, click a tool-call or reasoning row to expand or collapse just that entry (like ctrl+o / ctrl+t for one row), with a subtle highlight on the row under the mouse
  • Apply MCP org policy when sessions create or reload MCP servers
  • Fixed completed background command output being unavailable when requested later
  • Keep task companion tools available to custom agents that use the task or agent alias
  • Custom agents using a tools wildcard '*' now respect deferredToolLoading opt-in switch
  • Respect tmux color detection in WSL sessions
  • Respect deferTools on MCP servers configured in custom agent frontmatter
  • Ctrl+Q enqueues a prompt while a completion picker is open
  • Sessions tab row label updates immediately when a session is renamed
  • --continue and --resume select the most recent session for the current repository
  • Shell session starts correctly when a nix-provided bash is first in PATH
  • Marketplace plugins that declare MCP servers in marketplace.json now authenticate correctly with OAuth
  • Content exclusion no longer blocks shell commands on command names or phantom paths
  • Lone surrogates no longer break session resume or truncate prompts
  • Expand Windows home-directory paths in slash-command completion
  • Keep truncated tool output previews valid UTF-8
  • CLI auto-updater downloads the correct musl Linux package on Alpine systems
  • Copy the full last assistant turn, including multi-block responses
  • Load workspace MCP servers in trusted server-mode sessions
  • Stacked diffs use the same file order as the file tree
  • Make /pr status and web confirmations link to the PR's repository
  • Restore later file changes when rewinding to a turn without a snapshot
  • Run queued ! shell commands locally instead of sending them to the agent
  • Scheduled prompts manager dialog shrinks to fit its entries
  • Keep the @-file picker populated when file search hits a symlink loop
  • Display cache-write pricing for models that omit it
  • Allow /update to restart sessions started with copilot -r
  • Prevent pickers and dialogs from shifting or clipping as content loads
  • Only render double tildes as strikethrough in markdown
  • Allow /allow-all to work in relay sessions
  • Restore clickable PR and issue links in compact timeline markdown
  • Repo-scoped plugins no longer leak into global config across projects
  • Keep /model working on resumed sessions after signing in
  • PowerShell script blocks and interpolated $() sub-expressions no longer trigger content-exclusion refusals
  • Exit message always shows the session ID in the resume command instead of the friendly name
  • Wait for the remote sandbox to start before opening the cloud session
  • Autopilot mode now auto-handles elicitation, ask_user, sampling, and permission prompts (including on launch with --autopilot and during continuation turns) instead of surfacing dialogs to the user
  • Newly spawned sessions appear at the bottom of their group in the agents tab
  • Attached images and PDFs persist across session resume even if the source file is later changed or deleted
  • Allow disabling task and explore built-in subagents
  • Session resume stays responsive while large histories load
  • Code search and worktree listing are faster
  • Use plain text labels instead of decorative emoji in CLI output
  • Syntax-highlight shell commands in the timeline
  • Preserve open canvas instances across reconnects and restarts
  • Forward typed rejection feedback from preToolUse prompts to the model
  • Show statusline picker checkboxes in green for enabled items and gray for disabled items
  • Show shell timeline rows with a yellow $ prompt and Shell label
  • Add a Folder column to the resume picker to show each session's working directory
  • Automatically follow your system light and dark mode changes
  • Use semantic mascot theme colors in the CLI banner
  • Let footer dialogs scroll with the timeline in unified view
  • Click filenames in /diff tree to jump to that file's first change
  • Render inline code with themed chip styling in Markdown
  • Show installed plugin MCP servers in mcp commands
  • Remove terminal-reported color scheme support
  • Add /diagnose command to analyze session logs
  • Add /mcp registry installation for browsing and installing MCP servers
  • Make /security-review available to all users without --experimental
  • Discover MCP servers provided by installed plugins
  • Add CSV output support for MCP tools
  • Add /loop alias for the /every command
  • Remove bogus Ctrl+Enter VS Code keybinding created by old /terminal-setup
  • Images returned by tools stay visible to the model across later turns and after resuming a session
  • Preserve Markdown blockquotes in /share exports
  • Filter long streamed results correctly when content exclusion is enabled
  • Show a friendly message when additional usage limit is reached
  • Search tools handle Windows-style glob patterns correctly
  • Prevent kill self-protection from flagging quoted pipes and paths ending in kill
  • Azure CLI, PowerShell, and Developer CLI credentials work again for Azure auth
  • Slash-command picker name column widened from 25 to 35 characters so fewer long skill names are truncated
  • Wrap long lines in /diff view so content no longer truncates
  • Improve /diff hotkey labels for branch, whitespace, and tree navigation
  • Remove the legacy intent-reporting tool from the CLI

The 2026 Developer Survey is now open (for human developers only)!

Once again, we're asking for your help to take the temperature of software development.

This IDE Plugin Shows the Energy Cost of Your AI Prompts

That question led Nayan Jain, Executive Director of AI at ustwo, to start looking for tools that could help developers see the environmental cost of their AI usage while they worked.

Having found no real tools for this, ustwo and the University of Bristol built one: PRISM. It launched last week, and we sat down with Nayan to talk about how it works and what it aims to change.

PRISM uses AI token activity to estimate energy use and emissions

The tools that were available mostly focused on data centers or broad “big picture” ideas, but none catered to the developers actually using AI tools.

That gap led Nayan to think about ways to connect AI usage to real-world energy:

I quickly ran into a challenge that still exists today: a lack of transparent data from model providers. Without reliable information on energy consumption and infrastructure, it is difficult to build and validate a model with confidence.

Working around that, he decided to rely on tokens, a visible and relatively accurate measure of AI spend.

The idea was to use token activity as a proxy for compute demand and estimate energy use and emissions using published research and carbon accounting principles, including the Green Software Foundation’s Software Carbon Intensity framework.
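As a rough illustration of the token-as-proxy idea, the sketch below multiplies a token count by a per-token energy factor and then applies the shape of the Software Carbon Intensity formula, (E × I) + M per functional unit R. Every number in it (the energy factor, the grid intensity, the embodied share) is a made-up placeholder rather than a PRISM value, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmissionFactors:
    """Illustrative placeholders only; not values used by PRISM."""
    kwh_per_1k_tokens: float        # assumed energy per 1,000 tokens (feeds E)
    grid_gco2e_per_kwh: float       # assumed carbon intensity of electricity (I)
    embodied_gco2e_per_call: float  # assumed embodied-emissions share (M)


def estimate_energy_kwh(input_tokens: int, output_tokens: int, f: EmissionFactors) -> float:
    # Input and output tokens are summed into a single total for this sketch.
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1000 * f.kwh_per_1k_tokens


def estimate_gco2e(input_tokens: int, output_tokens: int, f: EmissionFactors) -> float:
    # SCI shape: (E * I) + M, reported per functional unit R (here, one AI request).
    energy = estimate_energy_kwh(input_tokens, output_tokens, f)
    return energy * f.grid_gco2e_per_kwh + f.embodied_gco2e_per_call


placeholder = EmissionFactors(
    kwh_per_1k_tokens=0.001,
    grid_gco2e_per_kwh=400.0,
    embodied_gco2e_per_call=0.05,
)
print(estimate_gco2e(1500, 500, placeholder))
```

The result is only as good as the factors fed in, which is why an estimate of this kind is better read as a relative signal than as a measurement.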

He brought the idea to ustwo Tech Director Nick Hegarty, who helped narrow the focus: could they help developers understand the environmental impact of their AI use while they worked? That made the project possible.

From idea to IDE

The answer was an in-editor tool that shows developers an estimate of their token costs and the associated energy consumption in real time.

With that data in front of them, engineers can see their own habits and perhaps become more conscious of their usage:

Our theory is that making this visible can guide engineers into more mindful habits around their AI consumption in the moment. Because AI providers don’t publish complete energy or emissions data, PRISM acts as a proxy for energy consumption by surfacing an estimate rather than an exact measurement.

PRISM directly monitors token usage, the model being used, and the provider. For other tools, like GitHub Copilot, PRISM reads local activity logs. AI requests made by an application at runtime are captured through a local interceptor.

The app then combines input and output tokens into an estimate. Nayan notes that these will be separated “as soon as robust factors exist”.

How red was that prompt?

In practice, PRISM is more of a subtle indicator than a big flashing number that appears after every call. Nayan explained how it feels to use it:

In the editor, a status indicator reflects your most recent call, colour coded. The headline feature is Relative Impact Classification, where each interaction is rated Green, Amber, or Red based on where it sits compared with the other requests in the same project.

Nayan continued to explain the colors:

Green is below the median, Amber sits between the 50th and 90th percentile, and Red is the top tenth. A few requests need to accumulate before the colours become meaningful, because the whole point is comparison within your own project rather than against an arbitrary threshold.
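The percentile bands Nayan describes can be sketched in a few lines. This is not PRISM's implementation: the function names are hypothetical, and the warm-up threshold of ten requests is an assumption, since the quote only says that a few requests need to accumulate.

```python
from bisect import bisect_left


def percentile_rank(value: float, history: list[float]) -> float:
    """Fraction of earlier requests in the project with a strictly lower estimate."""
    ordered = sorted(history)
    return bisect_left(ordered, value) / len(ordered)


def classify(value: float, history: list[float], min_samples: int = 10) -> str:
    # min_samples is an assumed warm-up threshold, not a documented PRISM setting.
    if len(history) < min_samples:
        return "Unrated"
    rank = percentile_rank(value, history)
    if rank < 0.5:
        return "Green"  # below the median
    if rank < 0.9:
        return "Amber"  # between the 50th and 90th percentile
    return "Red"        # top tenth


history = [float(n) for n in range(1, 21)]  # 20 hypothetical per-request estimates
print(classify(3.0, history), classify(15.0, history), classify(20.0, history))
```

Because the rank is computed against the project's own history, the same request could be Green in one repository and Red in another, which is the point of a relative scale.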

Elsewhere in the dashboard, users can see usage broken down by model, along with other metrics:

  • Timeline of estimated carbon over the course of development
  • Heatmap that shades from green through amber to red
  • Breakdowns by branch and other visualisations.

Nayan explains that the relative, percentile-based design was chosen because the estimates are not reliable enough to present as absolute carbon figures. The goal of the tool is to raise awareness and to show engineers what their usage looks like from an environmental standpoint.

The impact is awareness, not less AI use

ustwo has tested PRISM with University of Bristol students and across the company’s engineering team, and the results have been positive so far:

Several users said that seeing estimated emissions made them more deliberate with AI tools.

Nayan added that engineers who saw data on their own usage adjusted how they worked and became more conscious of how they refined requests.

Some wrote shorter, more precise prompts instead of using multiple iterations, and others paid closer attention to model selection after seeing how much environmental impact different models could have for similar tasks.

But as Nayan said, what interested them the most wasn’t that developers used AI less, but that they became more aware of how they were using it. “Once the data was visible, users started noticing things they hadn’t considered before.”

PRISM won’t solve AI’s environmental impact, but it makes it more visible

Right now, PRISM can provide data and insights for cloud and assistant-based models by identifying them, capturing token usage, and calculating the energy factor from a list of supported models. Locally run models are not yet supported, but might be in the future.

As the tool grows, ustwo sees its ideal outcome at three levels.

For engineers, the goal is awareness: giving users more information about their environmental impact during their work. Nayan says this is not about telling people what to do, but showing them a fuller picture of the tools they’re using. For organisations, the goal is to create a shared picture and open up more conversations about sustainability, governance, and responsible AI.

Beyond those two, ustwo is optimistic about wider collaboration across the field on the environmental impact of AI. Nayan concluded:

PRISM won’t solve the environmental impact of AI on its own, but if it helps make that impact a little more visible, and sparks better conversations and behaviours as a result, then we’ve achieved something worthwhile.

The post This IDE Plugin Shows the Energy Cost of Your AI Prompts appeared first on ShiftMag.


Principal Drift

Over the past year I’ve reviewed enterprise agent architectures at roughly two dozen organizations, including banks, retailers, healthcare systems, and a couple of regulators. The architecture diagrams have been reliably impressive. There are boxes for the MCP gateway, the tool registry, the vector store, the orchestrator, the policy engine, and the observability stack. There are arrows showing how agents discover each other, share context, and call tools across the mesh. By 2026 standards, these are the table-stakes pictures for any serious agentic deployment. But what none of them show anywhere is who the agents are, whose authority they carry, or who answers when they’re wrong.

That omission has a name worth using: principal drift, the steady decoupling, in any sufficiently large agent system, between the human authority a recorded action is supposed to derive from and the actor that actually took it. What looks like a defensible identity posture on the day you ship your first agent quietly degrades as agents multiply, compose, and outlive their original initiatives. Principal drift isn’t three independent failure modes; it’s one cascade. Identity collapses first. Authority erodes next, because there is no longer a stable principal to bind policy to. Accountability dissolves third, because the cost of agent error lands on whichever team has the weakest negotiating position when the incident review starts. Stopping the cascade means intervening at the first link, but almost no enterprise agent platform does so right now.

To see the cascade run, take the most boring possible enterprise agent, a refund agent, and watch.

A customer-service rep, fielding a chat, asks the agent to process a $48 refund for a damaged item. The agent checks eligibility, issues the refund, posts an update. The audit log records the action as taken by something like refund-agent-prod-03, running under a service principal owned by the customer-service platform team. That entry is true, but it’s also useless. The agent wasn’t acting as refund-agent-prod-03. It was acting as the rep, on behalf of the customer, under a delegation chain nobody recorded. In a well-built system, customer, rep, agent identity, and service principal are recorded together, queryable as a chain, and durable beyond the session. In most production systems today they aren’t. This is the first link in the cascade, where identity collapses to a generic service principal, and there’s no longer a who to attach anything else to.

Authority erodes next. The refund agent has an issue_refund tool that can technically refund any order. Its authority is supposed to be narrower (refunds up to $200, orders under 90 days, customers in good standing, automatic escalation above $50), but that authority lives in a prompt or a YAML file or a Notion page the team last updated when the policy was different. The runtime enforces capability, but nobody really enforces authority. When a poisoned input or a confused chain of reasoning leads the agent to refund $1,800 to the wrong customer, there’s no clean answer to the postincident question “Who approved this policy?” because the policy was never an artifact. The same pattern is worse at higher stakes: Imagine a coding agent with merge access to a protected branch, instructed by a prompt embedded in a code comment to “log configuration values for debugging,” silently exfiltrating secrets to an external monitoring service.
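To make the distinction concrete, here is a minimal sketch of authority expressed as a versioned artifact and checked before the issue_refund tool runs, instead of living in a prompt. The limits are the ones from the example above. The class names, field names, and the three-way decision are assumptions made for illustration, and the version and signer values are borrowed from the sample audit record later in the piece.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"
    DENY = "deny"


@dataclass(frozen=True)
class RefundAuthority:
    """A versioned, signed policy artifact (field names are illustrative)."""
    version: str
    signed_by: str
    max_refund: float = 200.00
    max_order_age_days: int = 90
    escalate_above: float = 50.00


@dataclass(frozen=True)
class RefundRequest:
    amount: float
    order_age_days: int
    customer_in_good_standing: bool


def authorize(req: RefundRequest, policy: RefundAuthority) -> Decision:
    # Hard limits first: outside these the agent has no authority at all.
    if req.amount > policy.max_refund:
        return Decision.DENY
    if req.order_age_days >= policy.max_order_age_days:
        return Decision.DENY
    if not req.customer_in_good_standing:
        return Decision.DENY
    # Within authority, but large enough to require a human.
    if req.amount > policy.escalate_above:
        return Decision.ESCALATE
    return Decision.ALLOW


policy = RefundAuthority(version="v3.1", signed_by="r.patel")
print(authorize(RefundRequest(48.00, 12, True), policy))
print(authorize(RefundRequest(1800.00, 12, True), policy))
```

With the policy held as data, the question "Who approved this policy?" has an answer attached to every decision: the version and signer travel with the check.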

Accountability then dissolves. The team that built the agent says it followed policy. The team that wrote the policy says it didn’t anticipate the input. The team that operates the platform says the agent was running as a service principal whose behavior they don’t own. The audit log may show the action, but it doesn’t show the reasoning that produced the action, the retrieved context that shaped the reasoning, or the prompt history that framed the retrieval. Postincident review becomes archaeology, and the cost is absorbed, eventually, by whoever has the weakest negotiating position when the meeting ends.

Is any of this new? We have IAM, identity governance, policy as code, audit trails, SIEMs, and 30 years of compliance practice. Why isn’t this just IAM done properly? Because IAM was built around assumptions agents violate. IAM and IGA assume a population of principals that changes on human timescales: People get hired, people leave, and service accounts rotate quarterly. Agents are spun up per session and compose into chains where one agent calls another, which calls a third, impersonating users through delegated tokens that traditional IGA cannot represent as a chain at all. Policy engines fire at the moment of action, at the API, the database, and the network. Agents make their most consequential decisions before they hit those enforcement points, in the reasoning step that selects which tool to call and with what arguments. Mature audit logs assume that replaying the inputs reproduces the output. But for agents, replaying the prompt and the retrieval can yield a different action, because the model itself contributes state the log doesn’t capture. The instruments fire, the dashboards turn green, and the agent that quietly exfiltrated secrets still does so. The audit log records the action as agent-service-01, which again is both true and useless.

This is also where the vendors selling a consolidated stack want you to skip ahead. Microsoft’s Entra Agent ID, currently in public preview, is the most polished solution to date, extending the conditional access, identity governance, and identity protection used for humans and workloads to cover AI agents as a new identity type, but Google and Salesforce are also building this layer. The marketing line is that agents receive the same identity-driven protections as the rest of the workforce. That’s a real step forward in addressing the first link of the cascade, but it isn’t governance. It’s a control plane with a governance plane’s marketing. Conditional access can tell you whether the agent’s access attempt was permitted. It can’t tell you whether the decision the agent made before that access attempt was within its authority, why the agent reached the decision, or which business unit owns the policy the decision was supposed to obey.

The actual governance plane has to capture decisions, not just actions. A reasoning-grade audit record is the load-bearing primitive of the missing layer, and it looks something like this:

{
  "event_id": "refund-2026-05-17-08431",
  "triggered_by": {
    "human_principal": "rep:olivia.chen@firm.com",
    "delegated_via": "support-console-session-9c2a",
    "customer_principal": "cust:7741289"
  },
  "agent": {
    "identity": "refund-agent",
    "version": "v4.7.2",
    "policy_ref": "refund-policy/v3.1 (signed: r.patel, 2026-04-22)"
  },
  "task": "Process refund for order 88812204",
  "retrieved_context": [
    {"doc": "order:88812204", "fetched": "2026-05-17T08:43:11Z"},
    {"doc": "policy:refund-eligibility", "chunk": 4, "fetched": "2026-05-17T08:43:12Z"}
  ],
  "reasoning_trace": "...",
  "tool_calls": [
    {"tool": "check_eligibility", "input": "...", "output": "eligible"},
    {"tool": "issue_refund", "input": {"amount": 48.00}, "output": "ok"}
  ],
  "action": "refund:48.00",
  "principal_chain_hash": "0x9e7b3f..."
}
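The record above does not say how principal_chain_hash is derived. One plausible approach, sketched here purely as an assumption, is to serialize the delegation chain in a fixed order and hash it, so that any later change to the chain is detectable. The helper names and the choice of SHA-256 are illustrative, not part of the article's design.

```python
import hashlib
import json


def principal_chain(record: dict) -> list[str]:
    """Order the chain from the originating principal down to the acting agent."""
    trig = record["triggered_by"]
    agent = record["agent"]
    return [
        trig["customer_principal"],
        trig["human_principal"],
        trig["delegated_via"],
        f'{agent["identity"]}@{agent["version"]}',
    ]


def chain_hash(chain: list[str]) -> str:
    # Canonical JSON (fixed separators) so the same chain always hashes the same way.
    canonical = json.dumps(chain, separators=(",", ":"), ensure_ascii=True)
    return "0x" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# A trimmed copy of the sample record, keeping only the identity fields.
record = {
    "triggered_by": {
        "human_principal": "rep:olivia.chen@firm.com",
        "delegated_via": "support-console-session-9c2a",
        "customer_principal": "cust:7741289",
    },
    "agent": {"identity": "refund-agent", "version": "v4.7.2"},
}

chain = principal_chain(record)
print(chain)
print(chain_hash(chain))
```

Storing the ordered chain alongside its hash keeps the record queryable as a chain while letting an auditor confirm it has not been rewritten after the fact.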

Not every agent needs this. A scheduling agent that proposes meeting times doesn’t. An agent that moves money, deploys code, or makes decisions that a regulator will eventually ask about does need it, and that’s the right bar to set because of the associated cost. Reasoning-grade audit is closer to a flight-data recorder than a syslog feed. The data is expensive to store and to query, with real privacy implications since those logs contain everything the agent saw, including data the agent was authorized to read but the audit system wasn’t supposed to keep. You afford it with proportional retention: full reasoning capture for high-blast-radius agents (regulator-facing, customer-funded, contractually material, production-modifying) and lighter capture for internal-only assistants.
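Proportional retention can be written down as a small rule over agent classifications. In the sketch below the tag names mirror the four high-blast-radius categories just listed. The capture labels and the retention periods are placeholder assumptions, not recommendations.

```python
from dataclasses import dataclass

# Tags taken from the high-blast-radius categories named in the text.
HIGH_BLAST_RADIUS = {
    "regulator-facing",
    "customer-funded",
    "contractually-material",
    "production-modifying",
}


@dataclass(frozen=True)
class RetentionPlan:
    capture: str         # "full-reasoning" or "actions-only" (hypothetical labels)
    retention_days: int  # placeholder durations, not a recommendation


def retention_for(agent_tags: set[str]) -> RetentionPlan:
    # Any single high-blast-radius tag is enough to require full capture.
    if agent_tags & HIGH_BLAST_RADIUS:
        return RetentionPlan(capture="full-reasoning", retention_days=2555)
    return RetentionPlan(capture="actions-only", retention_days=90)


print(retention_for({"customer-funded", "internal"}))
print(retention_for({"internal"}))
```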

Which raises the question the architecture diagram doesn’t ask: Who builds and runs this? Security can enforce policy but can’t author it. The people who know what a refund agent should be allowed to do own the refund business, not the firewall. IT can provision identities but can’t draft “good standing” or write the escalation rule. The MCP and A2A protocol communities are doing real work on wire-level identity and delegation. MCP gives you tool-invocation provenance and is the standard that Entra Agent ID and most vendor frameworks build on. A2A is converging on cross-agent delegation primitives. Both matter, but neither drafts policy. Standards, not the institution, move the connectors.

What enterprises need is a new function that sits between the business units owning the policies and the platform teams running the runtime. Call it agent operations: small group, often four to eight people in a Global 2000 enterprise, embedded rather than centralized, reporting into the CIO or CISO depending on house politics, with explicit charter to maintain a registry of every production agent, its named human owner, its versioned authority specification, its retention policy for reasoning-grade audit, and its lifecycle state. Each agent gets onboarded with a signed policy, reviewed on a real cadence, and actually retired when its initiative ends, rather than the current default of quietly outliving its sponsors. Designing against failure modes like review cadences that calcify into ceremony, policy artifacts that lag agent deployment velocity, or functions that become the place agents go to die in committee is itself part of the work. The function has to ship at the pace of the platform teams or it will be routed around within a quarter.
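A registry of this kind is mostly a data-modeling exercise. The sketch below shows one possible shape for a registry entry carrying the fields the charter lists, plus a check for agents that have missed their review. All names, the lifecycle states, and the 90-day cadence are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Lifecycle(Enum):
    PROPOSED = "proposed"
    ACTIVE = "active"
    RETIRED = "retired"


@dataclass
class AgentRecord:
    agent_id: str
    human_owner: str      # a named person, not a team alias
    authority_spec: str   # reference to a versioned, signed policy
    audit_retention: str  # retention policy for reasoning-grade audit
    lifecycle: Lifecycle
    last_review: date


def overdue_for_review(registry: list[AgentRecord], today: date, cadence_days: int = 90) -> list[str]:
    # Only active agents are reviewed; retired ones should already be switched off.
    return [
        a.agent_id
        for a in registry
        if a.lifecycle is Lifecycle.ACTIVE and (today - a.last_review).days > cadence_days
    ]


registry = [
    AgentRecord("refund-agent", "olivia.chen", "refund-policy/v3.1", "full-reasoning",
                Lifecycle.ACTIVE, date(2026, 1, 10)),
    AgentRecord("scheduler", "sam.lee", "scheduling-policy/v1.0", "actions-only",
                Lifecycle.ACTIVE, date(2026, 5, 1)),
    AgentRecord("legacy-bot", "pat.kim", "legacy-policy/v0.9", "actions-only",
                Lifecycle.RETIRED, date(2025, 1, 1)),
]
print(overdue_for_review(registry, today=date(2026, 5, 17)))
```

A report like this is the kind of artifact that keeps review cadence from calcifying into ceremony: it either returns an empty list or names the agents someone has to look at.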

The work is hard. It’s also overdue, and the regulatory clock is running. The EU AI Act’s high-risk provisions are entering enforcement this year, and regulators will ask for explainability, traceability, lifecycle records, and named human accountability. These are exactly the artifacts an agent operations function produces. Tyler Akidau called this the missing HR layer in his April Radar piece; Artur Huk’s more recent “From Capabilities to Responsibilities” converges on similar ground from the runtime side. The label matters less than the work. This piece is about governance inside one organization. The harder problem is governance across organizations, with agents acting under different trust regimes. That’s strictly worse, and worth its own piece.

Within your own four walls, the diagnostic is doable in an afternoon. Pick one production agent. Try to answer, with evidence: Whose authority does it carry, traced from action back to a named human? Where is its authority specified, and who signed the current version? When it does something wrong tomorrow, who pays, how is that decided, and what reasoning-grade record supports the decision? Most architects who do this honestly come away with three blanks and a knot in their stomach. That’s principal drift, named and visible.

The mesh you’ve built is real and necessary, but it isn’t sufficient. The rest of the architecture is the institution above it: the registry, the signed policies, the reasoning-grade audit, the named human at the end of every chain. In most enterprises it doesn’t yet exist, and it won’t arrive by buying another platform. You’ll have to draft it yourself.


