Microsoft showed off a "consistent" Xbox UI across handhelds, consoles, and cloud gaming during its Xbox keynote at the Game Developers Conference in March. At the time it was difficult to see if there was anything new about the UI from the videos and photos captured during the event, but Microsoft has now given us a closer look at it thanks to a new video of the keynote that was published earlier today.
Jason Ronald, VP of next generation at Xbox, showed off the UI while mentioning that players had been noticing "a lot of fragmentation within the experience" across devices, and an overall lack of consistency. "What the team has been doing …
You’ve probably already approved one without realizing it. The tests passed. The code was clean. You merged it.
But it was agent-generated—and that ease of approval is exactly the problem.
A January 2026 study, “More Code, Less Reuse”, found that agent-generated code introduces more redundancy and more technical debt per change than human-written code. The surface looks clean. The debt is quiet. And reviewers, according to the same research, actually feel better about approving it.
This isn’t an argument to slow down. It’s an argument to be intentional. There’s a difference.
Agent pull requests are already saturating review bandwidth
The volume is already staggering. GitHub Copilot code review has processed over 60 million reviews, growing 10x in less than a year. More than one in five code reviews on GitHub now involve an agent. That’s just the automated review pass. The pull request themselves are multiplying faster than reviewers can handle.
The traditional loop—request review, wait for code owner, merge—breaks down when one developer can kick off a dozen agent sessions before lunch. Throughput has scaled exponentially. Human review capacity hasn’t. The gap is widening.
You’re going to review agent pull requests. The question is whether you’ll catch what matters when you do.
Who (or what) actually wrote this pull request
Before you look at a single line of diff, you need a model for what you’re reviewing.
A coding agent is a productive, literal, pattern-following contributor with zero context about your incident history, your team’s edge case lore, or the operational constraints that don’t live in the repository. It will produce code that looks complete. But that “looks complete” failure mode is dangerous.
You’re the one who carries that context. That’s not a burden. It’s the actual job. The part of review that doesn’t get automated is judgment, and judgment requires context only you have.
Now, back to reviewers. The pull request lands in your queue. The author did their part. Here’s what to watch for.
Red flags to watch for
1. CI gaming
Agents fail CI. When they do, they have an obvious path to get tests passing: remove the tests, skip the lint step, add || true to test commands. Some agents take it.
Any change that weakens CI is a blocker. Full stop. Before approving any agent pull request, check:
Did coverage thresholds change?
Were any tests removed, renamed, or marked as skipped?
Did the workflow stop running on forks or pull requests?
Are any CI steps now gated behind conditions they weren’t before?
Yes, to any of those means you need an explicit justification before you continue.
2. Code reuse blindness
This is the highest-ROI thing you can do as a reviewer. Agents look for prior art. They’ll find a pattern in the codebase and replicate it, often without checking whether a utility that already does the same thing exists somewhere else. The symptoms: new utility functions that duplicate existing ones with slightly different names, validation logic reimplemented in multiple places, middleware written from scratch that already lives in a shared module, helpers that are “almost the same” but with different names.
The agent’s local context doesn’t include the full picture of what exists across your repository. You do.
For every new helper or utility in an agent pull request, do a quick search. If you find an equivalent, don’t leave a comment. Require consolidation before merge. The cost of leaving duplicated logic is that agents will find it as prior art and replicate it further.
💡Pro tip: Require justification for adding new utilities in agent pull requests above a size threshold. This catches the duplication problem early.
3. Hallucinated correctness
The obvious hallucination (calling an API that doesn’t exist, referencing a variable out of scope) gets caught in CI. The dangerous one is subtler: code that compiles, passes every test, and is wrong.
Off-by-one errors in pagination. Missing permission checks on a branch that’s never hit in tests. Validation that short-circuits under an edge case the agent never considered. Wrong behavior under a race condition that only surfaces at scale.
Trace it, don’t just scan it. Pick the most critical path in the diff. Follow it from input through every transform to output. Check boundary conditions (zero, max, empty), missing validation on external values, permission checks on every branch, and surprising conditional logic.
Require a new test that fails on the pre-change behavior. If the agent can’t write a test that would have caught the bug it claims to fix, the fix is incomplete or the understanding is wrong.
4. Agentic ghosting
You leave a thorough review. You explain the issue, provide context, suggest a direction. The pull request goes quiet. Or the agent responds and misses the point entirely and runs in circles. You invest another round. Still nothing useful.
Larger pull requests with no structured plan correlate strongly with agent abandonment or misalignment. The larger and less scoped the pull request, the more likely you’re going to sink review time into something that goes nowhere.
Before you invest deep review on a large agent pull request check the pull request history. Has it been responsive in previous rounds? Does it have a clear implementation plan, or did the agent just start writing code?
If there’s no plan, request a breakdown before you write a single comment. Copy-paste version:
“This pull request is too large for me to review without a clearer implementation plan. Can you break it into smaller scoped units, or add a summary of what each part does and why it’s structured this way? Happy to review after that.“
Firm, short, not personal. And it saves you an hour.
5. Untrusted input in workflows
Prompt injection in CI agents is real and underappreciated. Here’s the pattern: an agent workflow reads content from a pull request body, an issue, or a commit message. That content gets interpolated into a prompt. The prompt goes to a model. The model output gets piped to a shell command. The whole thing runs with GITHUB_TOKEN permissions.
When you’re reviewing any workflow that calls an LLM, these are blockers:
Is untrusted user input, pull requestbodies, issue bodies, commit messages, being interpolated into prompts without sanitization?
Is GITHUB_TOKEN write-scoped when it only needs read access?
Is model output being executed as shell commands without validation?
Are secrets accessible to the agent step or being printed to logs?
What to require before merge: least-privilege permissions in the workflow YAML (permissions: read-all is a reasonable default), sanitize and quote untrusted content before it touches a prompt, separate the “analysis” step from the “execution” step with a human approval gate for anything touching production, never eval model output.
Time
Step
What to do
1–2 min
Scan and classify
Look at the file list and diff size. Narrow task (docs, CI, small change) or complex (multi-file, logic, performance, tests)? That classification sets your review depth for everything that follows.
2–3 min
Check CI changes first
Before reading a single line of app code, look at anything touching .github/workflows, test configs, coverage settings, or build scripts. Flag anything that weakens CI. Stop sign check.
3–5 min
Scan for new utilities
Search for new functions, helpers, or modules. For each one, do a quick repo search to check for duplicates. Flag anything that reinvents existing functionality.
5–8 min
Trace one critical path
Pick the most important logic change. Trace it end-to-end: input → transforms → output. Check boundary conditions, permissions, unexpected branching. This is the step you can’t skip.
8–9 min
Security boundaries
If this PULL REQUEST touches any workflow that calls an LLM or handles untrusted input, run through the security checklist above.
9–10 min
Require evidence
For any non-trivial logic change, require a test that fails on the pre-change behavior. No rollback plan for risky changes? Ask for one.
When to request a smaller pull request:
The diff touches more than five unrelated files
You can’t describe the purpose of the pull request in one sentence
The agent has no implementation plan or the pull request body is empty
CI is failing and the only changes in the diff are to test files
Let Copilot review it first
Use automated review for what it’s good at: catching the mechanical stuff before a human has to. Copilot code review flags style inconsistencies, obvious logic errors, missing error handling, and type mismatches. It handles the low-level scan. That frees you up for the judgment work, which is where your time actually matters.
Treat it as a prerequisite, not a replacement. Let Copilot run first. If it catches something obvious, let the author address it before you invest your review time.
You can tune this with custom instructions specific to your team: flag anything that modifies CI thresholds, surface new utilities for deduplication review, check that every external input is validated. The more specific your instructions, the more useful the automated pass.
💡 Pro tip: I recently experimented with codifying my own review checklist using the Copilot SDK. Instead of remembering to run the same security checks on every pull request, I built a workflow that takes my personal checklist—auth on admin endpoints, tests actually running, safe env variable handling—and runs it against the diff automatically. If it finds critical issues, it blocks the merge.
Judgment is the bottleneck, and that’s fine
The surface area of code is growing. pull request volume is growing. The time you spend scanning boilerplate should shrink.
What doesn’t shrink is the context you carry. The things you know about your system that aren’t written down anywhere. That’s what makes your review valuable, and it’s the part that doesn’t get automated.
Three takeaways:
Any CI weakening is a hard stop.
Let the agents scan first. You trace the critical path.
Red flag checklist as your default on complex agent pull requests.
I am talking to a number of folks about documenting their MCP servers. Others about discovering them. Others about governing them. Generally, we are mostly talking about being able to just see the MCP wave of API expansion that has occurred across your average enterprises. This expansion phase isn’t much different than previous waves of REST, GraphQL, gRPC, Websockets, and Kafka expansions–it just happened faster and wider that most of those.
I published a prototype API docs that generates documentation for API, MCP, and Agent Skills side by side last week. I called the POC, “See”. I am a big fan of “seeing APIs” and “seeing integrations”. I’ve been doing about an hour of research a week into what is happening when it comes to MCP documentation, discovery, and other goings on with MCP at the core and the edges. There isn’t a lot of service or tooling for the seeing of MCP that is required for governance of APIs, and general attitudes seem to be that AI will do the seeing for us.
While I am not convinced that what has helped us find and see APIs historically will translate into helping us see the next generation of APIs, but I am also not convinced that AI will help us discover and see all of our APIs, and the skills, SDKs, and clients needed to engage with those APIs. I want to clarify here–MCP is an API. I see Microsoft, Google, and others going all in on their developer education being delivered via MCP, and I suspect more of the resources that occurs within developers will be shifted to be available via MCP, with as much of the activity as we can will be driven by skills.
I don’t think we will be able see what we need to see in an API-powered chat interface. And since agent’s don’t see, I know they won’t be able to see everything we need. They will help see a lot of what exists in the cracks and shadows that we couldn’t see historically with APIs, but there will be entirely new blind spots to wrestle with. I am finding some really interesting ways of seeing APIs and their properties at scale using machine-readable artifacts, which is enabled using Claude, ChatGPT, and Gemini. I will keep pushing forward automation to help discover and document APIs, as well as visualizations that help us see all of that. I am most interested in doing it in ephemeral and evolving ways, rather than the static or even dynamic ways we’ve done historically.
Seeing APIs is a massively unsolved problems. We just expanded that problem 1000x with AI and MCP. Just as we were beginning to get a hadle on the governance of HTTP APIs, we’ve expanded our API sprawl using GraphQL, Kafka, gRPC, and now MCP. There is so much more to see. There is so much more work to be done. With most people’s strategy that AI will sort it out for us. I hope y’all are right.
Before feeds, before algorithms, there was the Class of 1996: websites & organizations founded (or expanded) in 1996, like the Internet Archive.
On the occasion of the Internet Archive’s 30th anniversary, we’re opening the internet’s yearbook to celebrate the sites, services & scrappy experiments that helped shape the web as we know it. From class leaders like Center for Democracy and Technology to cultural icons like The Onion to the archivists making sure none of it disappears, this is a reunion worth attending.
Some are still thriving. Some have changed beyond recognition. Some are already gone. All of them remind us: the early web wasn’t just built, it was lived in.
THE MORE YOU KNOW: Did you know that some publishers are blocking the Wayback Machine from archiving their sites, putting decades of reporting and cultural history at risk of disappearing from the public record? If the web’s past matters — and the Class of 1996 reminds us that it does — now is the time to speak up. Add your name to the petition calling on publishers to stop blocking the Wayback Machine and help ensure the internet’s history remains accessible for future generations.
Class of 1996
Class President — Center for Democracy and Technology
The Center for Democracy and Technology didn’t just show up—they helped write the rules of the internet. And 30 years later, they’re still fighting to keep it open.
World Passkey Day is a chance to reflect on progress toward a shared goal: reducing our reliance on passwords and other phishable authentication methods by accelerating passkey adoption. As cyberattacks become more automated and AI-powered, each account is only as secure as its weakest credential. Real progress requires more than adding stronger sign-in options—it requires removing phishable credentials and strengthening common attack paths like recovery flows. In partnership with the FIDO Alliance, Microsoft is committed to advancing passkey adoption through ongoing standards work, active participation in working groups, and other contributions to a passwordless future.
Passwords remain a major source of risk; they’re difficult to manage and easy to steal. Along with weaker forms of multifactor authentication, they’re also highly vulnerable to phishing: AI-powered campaigns drive click-through rates as high as 54%.1 In response, Microsoft is expanding passkey adoption across our ecosystem. We’re reducing reliance on legacy authentication and strengthening account recovery so it won’t become a backdoor for cyberattackers.
“Instead of vulnerable secrets or potentially identifiable personal information, a passkey uses a private key stored safely on the user’s device. It only works on the website or app for which the user created it, and only if that same user unlocks it with their biometrics or PIN. This means passkey users can’t be tricked into signing in to a malicious lookalike website, and a passkey is unusable unless the user is present and consenting. These are some qualities that make passkeys a ‘phishing-resistant’ form of authentication.”
Passkey adoption is accelerating: FIDO Alliance estimates 5 billion passkeys already in use worldwide.2 Across Microsoft’s consumer services, including OneDrive, Xbox, and Copilot, hundreds of millions of users sign in with passkeys every day.
There are many reasons to choose passkeys as the standard authentication method over passwords. Sign-in success rates are significantly higher than with passwords, and exposure to credential-based attacks is significantly lower.3 Organizations and individual users alike prefer the simpler, more secure sign-in experience passkeys offer.4
Inside Microsoft, we’ve eliminated weaker authentication methods and rolled out phishing-resistant authentication, covering 99.6% of users and devices in our environment.5 It’s made signing in a lot simpler: no codes to enter, no extra prompts to manage, just a straightforward experience for everyone.
Product updates across sign-in and recovery
Across Microsoft, we’ve been steadily building passkey support into every layer of the identity experience from consumer accounts to enterprise access with Microsoft Entra, and from device-based authentication like Windows Hello to Microsoft’s password manager. This work ensures people can create and use passkeys wherever they sign in, with a consistent, phishing-resistant experience across devices, apps, and environments.
To make passkeys more accessible, we’re expanding where and how people can use them:
Synced passkeys and passkey profiles in Microsoft Entra ID make it easier to scale passwordless sign-in across diverse environments. We’re expanding flexibility in cloud passkey management, including support for larger and more complex policies, and transitioning tenants to a unified passkey profile model.
Entra passkeys on Windows make it simple for users to create and use device-bound passkeys directly on personal or unmanaged Windows devices using Windows Hello, and will be generally available in late May 2026.
Passkeys for Microsoft Entra External ID will be generally available late May 2026, so your customer-facing applications can offer a more seamless, consumer-grade sign-in experience.
Passkey-preferred authentication in Microsoft Entra ID (preview) detects registered methods and prompts the strongest one first. If a passkey is registered, that’s what the user sees—immediately.
On the consumer side, with Microsoft Password Manager, users can now save and sync passkeys across devices signed in with their Microsoft account, with support for iOS and Android rolling out soon through Microsoft Edge.
Account recovery also plays a critical role in maintaining the integrity of identity systems. Historically, it’s been vulnerable to cyberattackers who try to hijack the recovery process, for example by impersonating legitimate users and requesting new credentials.
Microsoft Entra ID account recovery, generally available today, strengthens security for recovery flows by enabling users to regain access to their accounts through a robust identity verification process. Users can regain access after losing all authentication methods by using government-issued ID and biometric face checks. At general availability, we are expanding our identity verification ecosystem with two new partners—1Kosmos and CLEAR1—joining our existing partners Au10tix, IDEMIA, and TrueCredential.
Removing phishable credentials from user accounts
Strengthening authentication is important, but reducing risk means eliminating phishable credentials entirely. Microsoft is continuing to phase out legacy methods and move users toward phishing-resistant authentication. Starting in January 2027, security questions will be removed as a password reset option in Microsoft Entra ID due to their susceptibility to guessing and social engineering.
The rationale is straightforward: improving strong methods while removing weak ones shrinks the attack surface. This is increasingly urgent as AI agents act on behalf of users. If an identity is compromised, cyberattackers can leverage those agents to access systems, execute workflows, and operate within existing permissions. Organizations need to address this risk quickly.
A more secure and usable future
Last year, Microsoft joined dozens of organizations in taking the Passkey Pledge, a commitment to accelerating the adoption of phishing-resistant authentication and to moving beyond passwords. Since then, we’ve seen meaningful progress, from hundreds of millions of better-protected consumer accounts to large-scale deployments across organizations like our own.
What once felt like a long-term shift is finally gaining real momentum: authentication is becoming simpler, safer, and passwordless.
For a more in-depth perspective on how cyberattackers try to bypass authentication through fallback methods and recovery flows—and how to address those gaps—read our companion post.
Getting started
Organizations that want to strengthen their identity security posture can enable passkeys for their users and extend policy protections across both sign-in and recovery scenarios.
To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity) for the latest news and updates on cybersecurity.
If you're building with Azure Cosmos DB — or thinking about it — check out Cosmos Conf 2026 on demand content. Real engineers shared what actually works in production: how they design for scale, control cost, and avoid the pitfalls that don’t show up in docs.
A few of the moments in this recap: • AI doesn't remove the need for reliability, security, and performance — it raises the bar • Supporting vibe-coded apps that go viral overnight • Using Copilot to review and improve your Azure Cosmos DB data model live • Memory as an architectural decision, not a checkbox — driving cost, recall quality, and UX • Built for AI: DiskANN vector search, hybrid + full-text, zero to millions of RU/s, zero bytes to petabytes • Time-travel your data to reconstruct the past and replay new scenarios • How one team rescued an AI system that was costing way too much money
The conversation doesn't end here. Explore the full on-demand library and we'll see you next year.