Autonomous AI systems force architects into an uncomfortable question that cannot be avoided much longer: Does every decision need to be governed synchronously to be safe?
At first glance, the answer appears obvious. If AI systems reason, retrieve information, and act autonomously, then surely every step should pass through a control plane to ensure correctness, compliance, and safety. Anything less feels irresponsible. But that intuition leads directly to architectures that collapse under their own weight.
As AI systems scale beyond isolated pilots into continuously operating multi-agent environments, universal mediation becomes not just expensive but structurally incompatible with autonomy itself. The challenge is not choosing between control and freedom. It is learning how to apply control selectively, without destroying the very properties that make autonomous systems useful.
This article examines how that balance is actually achieved in production systems—not by governing every step but by distinguishing fast paths from slow paths and by treating governance as a feedback problem rather than an approval workflow.
The first generation of enterprise AI systems was largely advisory. Models produced recommendations, summaries, or classifications that humans reviewed before acting. In that context, governance could remain slow, manual, and episodic.
That assumption no longer holds. Modern agentic systems decompose tasks, invoke tools, retrieve data, and coordinate actions continuously. Decisions are no longer discrete events; they are part of an ongoing execution loop. When governance is framed as something that must approve every step, architectures quickly drift toward brittle designs where autonomy exists in theory but is throttled in practice.
The critical mistake is treating governance as a synchronous gate rather than a regulatory mechanism. Once every reasoning step must be approved, the system either becomes unusably slow or teams quietly bypass controls to keep things running. Neither outcome produces safety.
The real question is not whether systems should be governed but which decisions actually require synchronous control—and which do not.
Routing every decision through a control plane seems safer until engineers attempt to build it.
The costs surface immediately: every governed step adds latency, the control plane becomes a coordination bottleneck, and a single slow or failed policy check can stall an entire execution loop.
This is not a new lesson. Early distributed transaction systems attempted global coordination for every operation and failed under real-world load. Early networks embedded policy directly into packet handling and collapsed under complexity before separating control and data planes.
Autonomous AI systems repeat this pattern when governance is embedded directly into execution paths. Every retrieval, inference, or tool call becomes a potential bottleneck. Worse, failures propagate outward: When control slows, execution queues; when execution stalls, downstream systems misbehave. Universal mediation does not create safety. It creates fragility.
Production systems survive by allowing most execution to proceed without synchronous governance. These execution flows—fast paths—operate within preauthorized envelopes of behavior. They are not ungoverned. They are bound.
A fast path might include routine retrievals, summarization and classification steps, inference within an approved model and data scope, or low-risk tool calls that stay inside preauthorized limits.
Fast paths assume that not every decision is equally risky. They rely on prior authorization, contextual constraints, and continuous observation rather than per-step approval. Crucially, fast paths are revocable. The authority that enables them is not permanent; it is conditional and can be tightened, redirected, or withdrawn based on observed behavior. This is how autonomy survives at scale—not by escaping governance but by operating within dynamically enforced bounds.
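The fast-path/slow-path split can be sketched as a small policy router. Everything here is illustrative, not a reference to any real framework: the `Envelope` class, the action names, and the routing strings are assumptions made for the sketch. The key property it demonstrates is the one the article names: fast-path authority is conditional and revocable.

```python
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """A revocable, preauthorized envelope of fast-path behavior."""
    allowed_actions: set = field(default_factory=set)
    active: bool = True

    def permits(self, action: str) -> bool:
        return self.active and action in self.allowed_actions

    def revoke(self) -> None:
        # Authority is conditional: it can be withdrawn at any time.
        self.active = False


def route(action: str, envelope: Envelope) -> str:
    """Fast path: execute within the envelope, no synchronous approval.
    Slow path: anything outside the envelope waits for mediation."""
    if envelope.permits(action):
        return "fast-path: execute, observe asynchronously"
    return "slow-path: hold for synchronous mediation"


envelope = Envelope(allowed_actions={"retrieve", "summarize", "low_risk_tool_call"})
print(route("retrieve", envelope))        # within the envelope: fast path
print(route("transfer_funds", envelope))  # crosses a boundary: slow path
envelope.revoke()
print(route("retrieve", envelope))        # once revoked, everything slows down
```

The point of the sketch is that revocation changes future routing without touching the executing agent's code: governance lives beside the execution path, not inside it.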
Not all decisions belong on fast paths. Certain moments require synchronous mediation because their consequences are irreversible or cross trust boundaries. These are slow paths.
Examples include irreversible external actions, high-impact changes to shared state, and operations that cross trust or security boundaries.
Slow paths are not common. They are intentionally rare. Their purpose is not to supervise routine behavior but to intervene when the stakes change. Designing slow paths well requires restraint. When everything becomes a slow path, systems stall. When slow paths are absent, systems drift. The balance lies in identifying decision points where delay is acceptable because the cost of error is higher than the cost of waiting.
A common misconception is that selective control implies limited visibility. In practice, the opposite is true. Control planes observe continuously. They collect behavioral telemetry, track decision sequences, and evaluate outcomes over time. What they do not do is intervene synchronously unless thresholds are crossed.
This separation—continuous observation, selective intervention—allows systems to learn from patterns rather than react to individual steps. Drift is detected not because a single action violated a rule, but because trajectories begin to diverge from expected behavior. Intervention becomes informed rather than reflexive.
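Detecting drift from trajectories rather than single steps might look like the following sketch: a rolling window of outcome scores is compared against an expected baseline, and intervention is flagged only when the whole window diverges. The window size, baseline, and threshold are arbitrary illustrations, not values from any real system.

```python
from collections import deque


class DriftMonitor:
    """Observe continuously; flag intervention only when a trajectory
    of outcomes diverges from the expected baseline."""

    def __init__(self, baseline: float, threshold: float, window: int = 20):
        self.baseline = baseline
        self.threshold = threshold
        self.scores = deque(maxlen=window)

    def observe(self, score: float) -> bool:
        """Record one outcome; return True if intervention is warranted."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough trajectory yet: keep watching
        mean = sum(self.scores) / len(self.scores)
        return abs(mean - self.baseline) > self.threshold


monitor = DriftMonitor(baseline=0.9, threshold=0.1, window=5)
# A single bad outcome does not trigger intervention...
print(monitor.observe(0.2))
# ...but a sustained divergence does.
for s in (0.3, 0.3, 0.3, 0.3):
    triggered = monitor.observe(s)
print(triggered)
```

A production monitor would compare richer behavioral telemetry than a scalar score, but the structure is the same: observation is continuous, intervention is threshold-gated.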

AI-native cloud architecture introduces new execution layers for context, orchestration, and agents, alongside a control plane that governs cost, security, and behavior without embedding policy directly into application logic. Figure 1 illustrates that most agent execution proceeds along fast paths operating within preauthorized envelopes and continuous observation. Only specific boundary crossings route through a slow-path control plane for synchronous mediation, after which execution resumes—preserving autonomy while enforcing authority.
When intervention is required, effective systems favor feedback over interruption. Rather than halting execution outright, control planes adjust conditions by narrowing authorization envelopes, tightening rate or scope limits, redirecting execution toward safer alternatives, or withdrawing specific capabilities.
These interventions are proportional and often reversible. They shape future behavior without invalidating past work. The system continues operating, but within a narrower envelope. This approach mirrors mature control systems in other domains. Stability is achieved not through constant blocking but through measured correction. Direct interruption remains necessary in rare cases where consequences are immediate or irreversible, but it operates as an explicit override rather than the default mode of control.
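A minimal sketch of this feedback style of intervention, with hypothetical names and limits: the envelope is tightened proportionally and capabilities are withdrawn, while a hard stop exists only as an explicit override.

```python
class AdaptiveEnvelope:
    """Shape future behavior by tightening limits proportionally,
    reserving a hard stop as an explicit override."""

    def __init__(self, rate_limit: int, allowed_scopes: set):
        self.rate_limit = rate_limit
        self.allowed_scopes = set(allowed_scopes)
        self.halted = False

    def tighten(self, factor: float = 0.5) -> None:
        # Proportional, reversible correction: narrow without stopping.
        self.rate_limit = max(1, int(self.rate_limit * factor))

    def restrict_scope(self, scope: str) -> None:
        self.allowed_scopes.discard(scope)

    def halt(self) -> None:
        # Last resort, for immediate or irreversible consequences only.
        self.halted = True


env = AdaptiveEnvelope(rate_limit=100, allowed_scopes={"read", "write", "external_api"})
env.tighten()                        # observed drift: halve throughput
env.restrict_scope("external_api")   # withdraw the riskiest capability
print(env.rate_limit, sorted(env.allowed_scopes), env.halted)
# prints: 50 ['read', 'write'] False -- the agent keeps running, in a narrower envelope
```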
Governance has a cost curve, and it matters. Synchronous control scales poorly. Every additional governed step adds latency, coordination overhead, and operational risk. As systems grow more autonomous, universal mediation becomes exponentially expensive.
Selective control flattens that curve. By allowing fast paths to dominate and reserving slow paths for high-impact decisions, systems retain both responsiveness and authority. Governance cost grows sublinearly with autonomy, making scale feasible rather than fragile. This is the difference between control that looks good on paper and control that survives production.
Architects designing autonomous systems must rethink several assumptions: that safety requires approving every step, that observation implies synchronous intervention, that governance belongs inside execution paths, and that authority, once granted, is permanent.
These shifts are architectural, not procedural. They cannot be retrofitted through policy alone.

AI agents operate over a shared context fabric that manages short-term memory, long-term embeddings, and event history. Centralizing the state enables reasoning continuity, auditability, and governance without embedding memory logic inside individual agents. Figure 2 shows how control operates as a feedback system: Continuous observation informs constraint updates that shape future execution. Direct interruption exists but as a last resort—reserved for irreversible harm rather than routine governance.
The temptation to govern every decision is understandable. It feels safer. But safety at scale does not come from seeing everything—it comes from being able to intervene when it matters.
Autonomous AI systems remain viable only if governance evolves from step-by-step approval to outcome-oriented regulation. Fast paths preserve autonomy. Slow paths preserve trust. Feedback preserves stability. The future of AI governance is not more gates. It is better control. And control, done right, does not stop systems from acting. It ensures they can keep acting safely, even as autonomy grows.

As AI technology continues to mature, its application grows wider too. Code review tools are one of the fastest growing use cases for AI in software development. They facilitate faster checks, better consistency, and the ability to catch critical security issues humans might miss.
The 2025 Stack Overflow Developer Survey reveals that 84% of developers are now using or planning to use AI tools in their development process, including as part of code reviews. This is up from 76% in 2024. But as these tools grow more sophisticated, the question of accountability becomes more important.
When an AI code review tool suggests a change and a developer accepts it, who’s responsible if that change introduces a bug? It’s not just a theoretical question. Development teams face this issue every time they integrate an AI code review process into their workflow.
The conundrum isn’t just about whether the quality of AI code review is good enough. It’s about understanding the ethical questions that need to be considered when AI tools make recommendations that humans implement.
So, just how ethical is code review carried out by AI, and what steps should developers take to ensure that, where it’s utilized, this form of review is integrated ethically? Let’s take a closer look.
Code review automation has come a long way over the past decade, as machine review has grown to work alongside traditional peer reviews through methods including static code analysis. And now, AI-powered systems that learn from millions of code examples have joined the party, streamlining processes and providing further automation.
Code review automation falls into two distinct categories. Rule-based static code analysis checks your code against predefined standards, while AI-powered systems learn patterns from large code repositories.
It’s the ethical questions raised by the latter that make for interesting conversations.
Understanding the differences between these approaches helps your team make informed decisions about which to choose. Here’s a brief breakdown of the key differences between the two analysis methods:
| | Rule-Based Static Analysis | AI-Powered Analysis |
|---|---|---|
| How it works | Checks code against predefined rules and standards | Learns patterns from large code repositories |
| Transparency | Shows the exact rule violated | Makes recommendations based on learned patterns |
| Consistency | Provides the same results every time for the same code | Can vary based on model training and updates |
| Context understanding | Limited to codified rules | Can recognize complex patterns across codebases |
| Training required | None – rules are predetermined | Requires large datasets of code examples |
| Best for | Enforcing team standards, catching known issues | Identifying subtle patterns, style suggestions |
Of course, this technology is advancing quickly and various tools are incorporating new functionality.
AI-powered code review represents a genuine advancement in development workflows. What were experimental tools just a few years ago are now production-ready systems that many development teams rely on daily. The benefits are undeniable for organizations of all sizes.
AI code review allows you to process thousands of lines of code in seconds without the fatigue or variable attention that can affect human reviewers. AI tools maintain the same level of scrutiny on the 500th pull request as they did on the first, eliminating inconsistency and often helping to overcome issues such as deadline pressure that can lead to missed problems.
AI tools can identify vulnerability patterns across different languages and frameworks, often catching security vulnerabilities like insecure deserialization, XML external entity (XXE) attacks, and improper authentication handling before they reach production. That said, it's important to note that AI tools can also introduce security issues of their own, so their suggestions still warrant scrutiny.
With AI code review, teams can apply identical standards to every code submission, no matter who wrote it, when it was submitted, or how much political capital the author has in the organization. This removes the subtle (and not-so-subtle) biases that can creep into human code review, such as senior developers’ code receiving lighter scrutiny.
Rather than having to wait days for review feedback, AI code review means developers can get input while the context is still fresh – often within minutes.
This tight feedback loop means issues get fixed while the developer still has the mental model loaded, reducing the cognitive cost of having to switch back to yesterday’s or last week’s code after moving on to something new.
AI code review tools are powerful, but they’re not magic, and treating them as infallible creates its own problems. Understanding where these tools have limitations helps your team use them effectively rather than either over-trusting their recommendations or dismissing them entirely.
Tools can miss project-specific intent, architectural decisions, or business requirements not reflected in the code itself. A technically correct suggestion might break an undocumented but critical assumption.
There's always a risk that developers will over-trust a tool, and automated code review is no different: team members may accept AI suggestions without properly evaluating them. When a tool has been right 95% of the time, it's easy to skip careful review on the problematic 5%.
Models trained on narrow datasets can reinforce certain coding styles while missing framework-specific best practices. An AI tool trained mostly on open-source JavaScript, for example, might be less reliable when reviewing enterprise Java or Go microservices.
The big question when it comes to AI code review tools is all about who is responsible for the output.
As an example, let’s say an AI code review tool flags a function as inefficient and suggests optimizing it. When a developer reviews this, they may think it looks reasonable and simply accept the change.
The code then ships to production. Under high load, however, the “optimization” causes a race condition that briefly exposes customer data. The team then has to spend time diagnosing and fixing the problem, and productivity drops.
Who’s accountable in cases like this? Is the developer responsible for accepting the recommendation without fully understanding it? Is the code reviewer accountable for not catching what the AI missed? Does responsibility fall on the organization for deploying these tools without proper governance? Should the vendor share liability for providing recommendations without sufficient context? Or is it the responsibility of everyone involved?
These questions mirror larger debates about AI accountability across all sectors. Kate Crawford’s research examines how AI systems often serve and intensify existing power structures, with design choices made by a small group affecting many. Her book Atlas of AI shows these systems aren’t neutral tools, but reflections of specific values and priorities.
Timnit Gebru’s work on algorithmic bias shows how limitations in training data can create measurable harm. Her groundbreaking Gender Shades study showed facial recognition systems were significantly less accurate at identifying certain demographic groups because those groups were under-represented in the training data. The same principle applies to code review – if AI models are trained on narrow slices of the programming world, they’ll be less effective when applied to different and wider contexts.
The Center for Human-Compatible AI, led by Stuart Russell, emphasizes that AI systems should maintain uncertainty about objectives rather than rigidly chasing goals. This applies directly to AI code review. Tools that are absolutely “certain” about their recommendations, without acknowledging where the training or reasoning might be limited, are more dangerous than those expressing appropriate uncertainty.
As AI code review tools become more widely adopted, vendors face growing ethical obligations to disclose model limitations and explain decision rationale.

Many AI code review systems offer limited visibility into how they prioritize issues or generate suggestions. Unlike rule-based static analysis tools that cite the specific standards they’re checking against, AI models often provide recommendations based on learned patterns without clear explanation. A developer who sees “this function could be refactored” won’t necessarily know whether that’s based on performance patterns, readability heuristics, or something else entirely.
This opacity makes it difficult to judge whether a suggestion is genuinely valuable or reflects a misunderstanding of context. A system whose internal workings users cannot see or understand is known as a “black box”. Without transparency in AI code review systems, development teams are essentially asked to trust that black box, and trust without information is hard to justify.
AI models trained on large code repositories can inherit biases from their training data, reinforcing certain programming conventions while missing framework-specific best practices.
If an AI code review tool is trained primarily on Python data science code, for example, it might suggest patterns optimized for notebook environments when reviewing production backend services, or recommend approaches that work for single-threaded scripts but cause problems in concurrent systems. This creates a hidden quality gap that teams may not recognize until after adoption.
Ethical AI code review requires action from both developers and businesses that make their tools. Teams need governance structures that ensure human oversight remains meaningful, and vendors need to commit to transparency to help teams make informed decisions.
Teams adopting AI code review tools need to build governance around them from day one. Waiting until something goes wrong to establish accountability is too late. The most effective teams treat AI recommendations as input that informs human decision-making. Core practices include:
Establishing ownership: Every AI recommendation needs a human reviewer accountable for the decision to merge. No code should ship based solely on automated approval.
Documenting decision trails: Maintain audit logs distinguishing AI suggestions from human approvals. When problems emerge, you need to understand what the AI recommended and why a human reviewer chose to accept it.
Setting clear policies: Clearly define when to use AI recommendations. Should they be used for routine style checks or are they trusted with critical security reviews? Establish guidelines for testing suggestions locally and handling conflicts between AI and team knowledge.
Encouraging critical evaluation: Train developers to question AI outputs rather than blindly accepting them. Create a culture where challenging tool recommendations is seen as good engineering practice, not as something that slows delivery.
Promoting ongoing dialogue: Use retrospectives to discuss tool limitations and effectiveness. What patterns has the AI missed? Where has it been particularly helpful? This calibrates trust and identifies gaps that others can look out for.
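The "documenting decision trails" practice above can be sketched as a simple append-only log that keeps the AI suggestion and the human decision as separate, attributable records. The field names and the `source` values here are illustrative assumptions, not any tool's actual schema.

```python
import json
from datetime import datetime, timezone


class ReviewAuditLog:
    """Append-only trail distinguishing AI suggestions from human approvals."""

    def __init__(self):
        self.entries = []

    def record(self, pr_id: str, source: str, actor: str, detail: str) -> None:
        # 'source' separates machine suggestions from human decisions,
        # so accountability can be reconstructed after the fact.
        self.entries.append({
            "pr": pr_id,
            "source": source,   # "ai_suggestion" | "human_decision"
            "actor": actor,     # tool name or reviewer username
            "detail": detail,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def export(self) -> str:
        return json.dumps(self.entries, indent=2)


log = ReviewAuditLog()
log.record("PR-101", "ai_suggestion", "review-bot", "Refactor: reduce complexity")
log.record("PR-101", "human_decision", "alice", "Accepted after local testing")
print([e["source"] for e in log.entries])  # the trail shows who suggested and who decided
```

When a problem surfaces later, the pairing of an `ai_suggestion` entry with the `human_decision` that accepted it is exactly the evidence the accountability questions above require.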
Tool vendors building AI code review systems carry ethical obligations. Vendors need to be transparent about how models make decisions, honest about limitations, and facilitate support for meaningful human oversight. Specifically, vendors should:
Provide explainable recommendations. Clarify why a change was suggested, not just what to change. Instead of “consider refactoring this function,” explain “this function has high cyclomatic complexity (17), which typically correlates with more defects” to give users more context on which to base their decision to reject or accept.
Offer contextual confidence scores. Help developers understand which recommendations need more scrutiny. Context like “high confidence based on 10,000+ similar contexts” versus “low confidence – limited training data for this framework” can make all the difference to users.
Enable customizable alignment. Let teams adapt tools to their priorities. Security-focused teams might prioritize vulnerability detection over style, whereas performance-critical applications can put efficiency above readability.
Adopt open standards. Support regulatory frameworks like the EU AI Act. Commit to third-party auditing of models and transparency about training data sources and limitations.
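What an explainable, confidence-scored recommendation could look like as a data shape, following the first two vendor obligations above. This is purely illustrative: no vendor exposes exactly these fields, and the thresholds in `scrutiny_level` are invented for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """A vendor recommendation carrying rationale and calibrated confidence,
    so reviewers know how much scrutiny it deserves."""
    message: str
    rationale: str       # why the change is suggested, not just what to change
    confidence: float    # 0.0-1.0, ideally calibrated against similar contexts
    evidence_count: int  # how many similar training contexts support it

    def scrutiny_level(self) -> str:
        # Hypothetical policy: only well-supported, high-confidence
        # suggestions get lighter-touch review.
        if self.confidence >= 0.8 and self.evidence_count >= 1000:
            return "routine review"
        return "careful human review required"


rec = Recommendation(
    message="Consider refactoring this function",
    rationale="Cyclomatic complexity 17 typically correlates with more defects",
    confidence=0.35,
    evidence_count=12,  # limited training data for this framework
)
print(rec.scrutiny_level())  # prints: careful human review required
```

Surfacing `rationale` and `confidence` together is what turns "consider refactoring" into something a developer can actually evaluate.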

Automation (or a hybrid approach) doesn’t absolve humans of responsibility. It just shifts how that responsibility is managed. As AI code review tools become more capable, the need for clear accountability frameworks becomes more urgent and code provenance will gain traction.
Teams must establish ownership structures, document decisions, and maintain healthy skepticism toward automated recommendations. At the same time, vendors will also need to prioritize transparency, disclose limitations honestly, and support meaningful oversight.
Different approaches to code review offer different trade-offs. Rule-based static analysis tools like Qodana give you transparent, deterministic inspections where every finding cites a specific rule. AI-powered tools offer pattern recognition across vast repositories. Many teams use both approaches, taking advantage of the strengths of each. And, no doubt, we will incorporate more AI technologies going forward, especially as Qodana becomes part of the new JetBrains agentic platform and we develop our code provenance features.
But today, the question isn’t whether to use automation in code review. It’s about how we build systems of accountability that ensure automated tools enhance rather than undermine code quality. Ethical automation isn’t just about compliance. It’s about building trust in the systems that shape our code and, ultimately, the software that shapes our world.
Every day in the hallways at Microsoft, I hear product teams discussing where agents are headed and how software is forever changed. Many of us come into the office more now, and I didn’t realize how much I missed the in-between moments where natural chat gives us energy—coffee and hot takes on the way to meetings and debating at a lunch no one scheduled, but somehow nobody wants to leave. The people who work on Microsoft Azure, Microsoft Foundry, and Microsoft Fabric care deeply about what they’re building—about how cloud and AI platforms can be better for those with hands on keyboards—it’s when we’re unscripted that some of our best insights surface. How could we bottle up this passion?
Today we’re introducing “The Shift” podcast, an evolution of “Leading the Shift,” to share more dialogue. Grounded in questions we heard from you after announcements at Ignite, we’re releasing eight episodes this spring—one each week—that bring engineering, product, and strategy perspectives together. Across levels and backgrounds, this season’s agentic theme explores agents up and down the stack. Knowing change is the only constant, “The Shift” creates space for us all to think out loud.
Agents don’t succeed in isolation. They depend on how your data is unified, how your cloud handles scale, how your applications orchestrate across systems, and ultimately, how this serves people. At Microsoft, we see agents as catalysts for innovation across your entire environment, performing best when layers of the stack work together. That’s where the toughest challenges for technical teams emerge: observability, governance, security, optimization, and quality. It’s a team sport.
Your data strategy determines what your agents can reason over. Your cloud foundation determines what you can do reliably. Your agents and AI app experiences deliver business outcomes. Our colleagues and friends featured on The Shift are solving for these interdependencies. And what they all have in common is conviction that none of this works in pieces.
Our first episode, “Are my agents hunting for data?” drops tomorrow. We’ll sit with Ronald Chang, Dipti Borkar, Josh Caplan, and Cillian Mitchell from the Microsoft Fabric and Microsoft OneLake teams to cover why data preparation is essential to fueling agents with knowledge. And it’s perfect timing with Microsoft Fabric Community Conference next week in Atlanta. I hope you’ll join us to keep this conversation going.
Subscribe today on YouTube, Spotify, Apple Podcasts, Amazon Music, RSS.com, or wherever you listen and learn.
The post Unpacking your top questions on agentic AI: The Shift podcast appeared first on Microsoft Azure Blog.
For this week's coaching conversation, Junaid brings a challenge that resonates well beyond any single team: dealing with uncertainty. He references the World Uncertainty Index report from February 2026, which showed the highest levels of global uncertainty ever recorded — surpassing both the COVID pandemic and the 2008 financial crisis.
This uncertainty doesn't stay at the geopolitical level. It seeps into teams. People show up stressed, unsure about what the next month or three months will bring. As Scrum Masters, we need to be cognizant of where our team members are coming from.
Vasco adds an important layer: uncertainty operates at multiple levels within organizations. A colleague you depend on might be out sick for two weeks. A supplier might not deliver on time. Every dependency is a source of uncertainty. The question becomes: what in our processes is designed to accept and adapt to that uncertainty?
Junaid's answer is powerful in its simplicity: Scrum's rhythm. The sprint, the planning, the daily, the retrospective — these events at a defined cadence create internal predictability. "When you have a rhythm, when you have a known sequence of events in front of you, that takes away a lot of uncertainty."
Vasco builds on this: Scrum creates a boundary — the sprint — that accepts uncertainty outside while reducing it inside. Internal versus external predictability. Inside the sprint, the team can fail in small ways without exposing every failure to the outside. Compare that with traditional project planning, where every task on the critical path has external visibility and impact.
For practical tools, Junaid shares how he used the Eisenhower matrix with a team to convert uncertainty into actionable priorities. They listed all activities from recent sprints, plotted them on the matrix, and found they could delegate or deprioritize 20-25% of their work. That freed them to focus with certainty on the remaining 75-80%. Combined with timeboxing as an uncertainty management mechanism, teams can create pockets of predictability even in turbulent times.
[The Scrum Master Toolbox Podcast Recommends]
Angela thought she was just there to coach a team. But now, she's caught in the middle of a corporate espionage drama that could make or break the future of digital banking. Can she help the team regain their mojo and outwit their rivals, or will the competition crush their ambitions? As alliances shift and the pressure builds, one thing becomes clear: this isn't just about the product—it's about the people.
🚨 Will Angela's coaching be enough? Find out in Shift: From Product to People—the gripping story of high-stakes innovation and corporate intrigue.
[The Scrum Master Toolbox Podcast Recommends]
About Junaid Shaikh
Junaid Shaikh is an energetic Agile Coach with a natural flair for Agile and Scrum, shaped by recent experiences at software giants like Ericsson and hardware leaders like ABB. In his work, he champions collaboration, curiosity, and continuous improvement. Beyond coaching, he brings the same passion to cricket, table tennis, carrom, and his newest sporting obsession — padel. You can link with Junaid Shaikh on LinkedIn.