Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Accelerating .NET MAUI Development with AI Agents


This is a guest blog from Syncfusion. Learn more about the free, open-source Syncfusion Toolkit for .NET MAUI.

As a proud partner with the .NET MAUI team, Syncfusion is excited to share how custom-built AI agents are dramatically improving the development workflow and contributor experience for the entire .NET MAUI community.

The Traditional Contributor Challenge

Contributing to .NET MAUI has historically required significant time investment for even straightforward bug fixes. Our team identified key bottlenecks in the contribution workflow:

  • Issue Reproduction – Setting up the Sandbox app and reproducing platform-specific issues (30-60 minutes)
  • Root Cause Analysis – Debugging across multiple platforms and handlers (1-3 hours)
  • Fix Implementation – Writing and testing the fix (30-120 minutes)
  • Test Creation – Developing comprehensive test coverage (1-2 hours)

For community contributors new to the repository, this could easily extend to days of effort, creating a significant barrier to entry.

Our Solution: Custom-Built AI Agents and Skills for .NET MAUI

The .NET MAUI team has developed a suite of specialized agents and skills that work together to streamline the entire contribution lifecycle. Syncfusion’s team has been leveraging these to dramatically accelerate our .NET MAUI contributions.

pr-review skill: Intelligent Issue Resolution with Built-In Quality Assurance

The pr-review skill implements a systematic 4-phase workflow that handles the complete pull request lifecycle:

Phase 1: Pre-Flight Analysis

The skill begins by conducting a comprehensive issue analysis:

  • Reads the GitHub issue and extracts reproduction steps
  • Analyzes the codebase to understand affected components
  • Identifies platform-specific considerations (Android, iOS, Windows, Mac Catalyst)

Phase 2: Gate – Test Verification

Before any fix is attempted, the skill verifies that tests exist and correctly catch the issue:

  • Checks if tests exist for the issue/PR
  • If tests are missing, notifies the user to create them first using write-tests-agent
  • Validates that existing tests actually fail without a fix (proving they catch the bug)
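
The Gate decision described above can be sketched as a small decision function. This is an illustrative simulation only, assuming the skill's actual implementation is not shown here; the function and return values are hypothetical names, not part of the real skill.

```python
# Hypothetical sketch of the Gate phase decision. Names and return
# values are illustrative, not the skill's actual implementation.

def gate(tests_exist: bool, tests_fail_without_fix: bool) -> str:
    """Decide whether the fix phase may proceed."""
    if not tests_exist:
        # Tests must be created first (e.g. via write-tests-agent).
        return "create-tests-first"
    if not tests_fail_without_fix:
        # A test that passes even without a fix proves nothing.
        return "tests-do-not-reproduce-bug"
    return "proceed-to-try-fix"

print(gate(True, True))  # proceed-to-try-fix
```

The key design point is the middle branch: a test suite that never fails on the buggy code cannot validate any fix, so the gate rejects it before fix attempts begin.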

Note

The recommended workflow is to use write-tests-agent first to create tests, then use the pr-review skill to verify and work on the fix.

Phase 3: Try-Fix – Multi-Attempt Problem Solving

This is where the skill’s intelligence shines. Using the try-fix skill with four AI models, it:

  • Proposes independent fix approaches – Up to 4 different strategies, each taking a unique angle
  • Applies and tests empirically – Runs the test suite after each fix attempt
  • Records detailed results for comparison

Example try-fix workflow:

Attempt 1: Handler-level fix in CollectionViewHandler → Tests pass on iOS, fail on Android

Attempt 2: Platform-specific fix in Items2 → Tests pass on all platforms, but causes regression

Attempt 3: Core control fix with platform guards → All tests pass, no regressions

Attempt 3 selected as optimal solution
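
The selection logic in the example above can be sketched as a simple loop over recorded attempts. This is a toy simulation under the assumption that each attempt records pass/regression results; the attempt data mirrors the example and the field names are hypothetical.

```python
# Illustrative simulation of try-fix result selection. The attempt
# records below mirror the worked example; all names are hypothetical.

def select_fix(attempts):
    """Pick the first attempt whose tests pass everywhere with no regressions."""
    for attempt in attempts:
        if attempt["all_tests_pass"] and not attempt["regressions"]:
            return attempt["name"]
    return None  # no acceptable fix; report the failures instead

attempts = [
    {"name": "handler-level fix", "all_tests_pass": False, "regressions": False},
    {"name": "platform-specific fix", "all_tests_pass": True, "regressions": True},
    {"name": "core control fix with platform guards", "all_tests_pass": True, "regressions": False},
]
print(select_fix(attempts))  # core control fix with platform guards
```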

Phase 4: Report Generation

The skill produces a comprehensive summary including:

  • Fix description and approach rationale
  • Test results (before/after comparison)
  • Alternative approaches attempted and why they weren’t selected
  • Recommendation (approve PR or request changes)

write-tests-agent: Intelligent Test Strategy Selection

The write-tests-agent acts as a test strategist that determines the optimal testing approach for each scenario.

Multi-Strategy Test Creation

The agent analyzes the issue and selects appropriate test types:

For UI Interaction Bugs:

  • Invokes write-ui-tests skill
  • Creates Appium-based tests in TestCases.HostApp and TestCases.Shared.Tests
  • Adds proper AutomationId attributes for element location
  • Implements platform-appropriate assertions

For XAML Parsing and Compilation Bugs:

  • Invokes write-xaml-tests skill
  • Creates tests in Controls.Xaml.UnitTests project
  • Tests XAML parsing, XamlC compilation, and source generation
  • Validates markup extensions and binding syntax
  • Tests across all three XAML inflators (Runtime, XamlC, SourceGen)

Future Test Types:

  • Unit Tests (API behavior, logic, calculations)
  • Device Tests (platform-specific API testing)
  • Integration Tests (end-to-end scenarios)

Test Verification: Fail → Pass Validation

A critical feature of write-tests-agent is its use of the verify-tests-fail-without-fix skill:

Mode 1: Verify Failure Only (Test Creation — no fix yet)

Use when writing tests before a fix exists:

  1. Run tests against the current codebase (which still has the bug)
  2. Verify tests FAIL (proving they correctly detect the bug)
  3. ✓ Tests confirmed to reproduce the issue

No files are reverted or modified. This is a single test run that validates your tests actually catch the bug.

Mode 2: Full Verification (Fix + Test Validation)

Use when a PR contains both a fix and tests:

  1. Revert fix files to pre-fix state (test files remain unchanged throughout)
  2. Run tests → Should FAIL (bug is present without fix)
  3. Restore fix files
  4. Run tests → Should PASS (fix resolves bug)
  5. ✓ Both tests and fix verified

This verification step ensures test quality — we avoid the common problem of tests that pass regardless of whether the bug is fixed.
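
The two modes can be sketched with a toy model in which a single boolean stands in for whether the fix files are applied. This is a hedged illustration of the logic, not the skill's implementation; all function names are made up for the sketch.

```python
# Toy model of verify-tests-fail-without-fix. A boolean stands in for
# the working-tree state; all names are illustrative.

def run_tests(fix_applied: bool) -> bool:
    """Toy test run: tests pass only when the fix is in place."""
    return fix_applied

def verify_failure_only() -> bool:
    """Mode 1: no fix exists yet; good tests must FAIL on the buggy code."""
    return run_tests(fix_applied=False) is False

def full_verification() -> bool:
    """Mode 2: revert fix -> tests FAIL, restore fix -> tests PASS."""
    fails_without_fix = run_tests(fix_applied=False) is False  # fix reverted
    passes_with_fix = run_tests(fix_applied=True) is True      # fix restored
    return fails_without_fix and passes_with_fix

print(verify_failure_only(), full_verification())  # True True
```

Note that in both modes only the fix files change state; the tests themselves are held constant, which is what makes the fail-then-pass transition meaningful.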

Comprehensive Coverage Through Multiple Test Types

When appropriate, write-tests-agent creates layered test coverage:

Example: CollectionView Scrolling Bug

  • UI Test – Appium test that scrolls and verifies visual positioning
  • XAML Test – Validates that the ItemTemplate XAML compiles correctly across all inflators (Runtime, XamlC, SourceGen)

This dual-layer approach provides both behavioral validation (does the scrolling work?) and structural validation (does the XAML compile correctly?).

Note

As more test type skills are added (device tests, unit tests), the agent will be able to provide even more comprehensive coverage across different levels of the stack.

sandbox-agent: Manual Testing and Validation

The sandbox-agent complements automated testing with manual validation capabilities:

  • Creates test scenarios in the Controls.Sample.Sandbox app
  • Builds and deploys to iOS simulators, Android emulators, or Mac Catalyst
  • Generates Appium test scripts for automated interaction

When to use sandbox-agent:

  • Functional validation of PR fixes before merge
  • Reproducing complex user-reported issues
  • Visual verification of layout and rendering bugs
  • Testing scenarios that are difficult to automate

learn-from-pr agent: Continuous Improvement

The learn-from-pr agent analyzes completed PRs to extract lessons learned and applies improvements to instruction files, skills, and documentation — creating a feedback loop that makes the entire system smarter over time.

How to Use These Tools: Prompt Examples

Using the pr-review skill to Fix an Issue

When you want to create a fix for a GitHub issue, use the pr-review skill to guide you through the entire workflow.

Tip

These prompts are typed directly in the GitHub Copilot CLI terminal while inside the cloned .NET MAUI repository. The skill reads your local repository files, runs builds and tests on your machine, and interacts with GitHub APIs.

Basic fix invocation:

Fix issue #67890

With additional context for complex scenarios:

Fix issue #67890. The issue appears related to async lifecycle events
during CollectionView item recycling. Previous attempts may have failed
because they didn't account for view recycling on Android.

For alternative fix exploration:

Fix issue #67890. Try a handler-level approach first. If that doesn't work,
consider modifying the core control with platform guards.

The skill will:

  • Analyze the issue and codebase (Pre-Flight)
  • Check if tests exist; if not, notify you to create them with write-tests-agent (Gate)
  • Verify tests fail without fix and pass with fix (Validation)
  • Try up to 4 different fix approaches across 4 AI models (Try-Fix)

Using the pr-review skill to Review a Pull Request

When reviewing an existing PR (yours or someone else’s), use the pr-review skill to validate the fix and ensure quality:

Basic PR review:

Review PR #12345

With focus areas:

Review PR #12345. Focus on thread safety in the async handlers
and ensure Android platform-specific code follows conventions.

For test coverage validation:

Review PR #12345. Verify that the tests actually reproduce the bug
and cover all affected platforms (iOS and Mac Catalyst).

The skill will:

  • Analyze the PR changes and linked issue
  • Check if tests exist; if not, notify you to create them with write-tests-agent (Gate)
  • Verify tests fail without the fix and pass with it (Validation)
  • Provide a detailed review report

Important

Fix issue #XXXXX creates a new fix from scratch. Review PR #XXXXX validates and improves an existing PR. The skill adapts its workflow based on whether you’re creating or reviewing.

Writing Tests with write-tests-agent

Simple invocation:

Write tests for issue #12345

Specifying test type:

Write UI tests for issue #12345 that reproduce the button click behavior.

The agent analyzes the issue, selects appropriate test types, and creates comprehensive coverage. If you provide hints about reproduction steps or failure conditions, it incorporates them into the test strategy.

Testing with sandbox-agent

Basic testing:

Test PR #12345 in Sandbox

Platform-specific testing:

Test PR #12345 on iOS 18.5. Focus on the layout changes in SafeArea handling.

Reproducing user-reported issues:

Reproduce issue #12345 in Sandbox on Android. The user reported it happens
when rotating the device while a dialog is open.

Multi-Model Architecture for Quality Assurance

The pr-review skill leverages 4 AI models sequentially in Phase 3 (Try-Fix) to provide comprehensive solution exploration:

| Order | Model | Purpose |
| --- | --- | --- |
| 1 | Claude Opus 4.6 | First fix attempt – deep analysis and reasoning |
| 2 | Claude Sonnet 4.6 | Second attempt – balanced speed and quality |
| 3 | GPT-5.3-Codex | Third attempt – code-specialized model |
| 4 | Gemini 3 Pro Preview | Fourth attempt – different model family perspective |

Why sequential, not parallel?

  • Only one Appium session can control a device or emulator at a time; parallel runs would interfere with each other’s test execution
  • All try-fix runs modify the same source files — simultaneous changes would overwrite each other’s code and corrupt the working tree
  • Each model runs in a completely separate context with zero visibility into what other models are doing, ensuring every fix attempt is genuinely independent and uninfluenced
  • Before each new model starts, a mandatory cleanup restores the working tree to a clean state — reverting any files the previous attempt modified, ensuring every model begins from an identical baseline

Cross-Pollination Rounds:

The 4 models don’t just run once — they participate in multiple rounds of cross-pollination:

  1. Round 1: Each model independently proposes and tests one fix approach (4 attempts total)
  2. Round 2: Each model reviews all Round 1 results and decides:
    • “NO NEW IDEAS” — Confirms exploration is exhausted for this model
    • “NEW IDEA: [description]” — Proposes a new approach that hasn’t been tried
  3. Round 3 (if needed): Repeat until all 4 models confirm “NO NEW IDEAS” (max 3 rounds)

This ensures comprehensive exploration — models see what failed, why it failed, and what succeeded, allowing them to propose fundamentally different approaches.
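
The round protocol can be sketched as a loop over models that share a visible history. This is a hypothetical simulation: the models here are plain functions, and the "NO NEW IDEAS" sentinel and round cap follow the description above.

```python
# Hypothetical sketch of the cross-pollination loop. Models are plain
# functions that see the shared history and either propose a new idea
# or return the "NO NEW IDEAS" sentinel; max three rounds, as above.

def cross_pollinate(models, max_rounds=3):
    history = []  # results every model can review in later rounds
    for _round in range(max_rounds):
        proposals = [model(history) for model in models]
        new_ideas = [p for p in proposals if p != "NO NEW IDEAS"]
        history.extend(new_ideas)
        if not new_ideas:  # every model confirmed exhaustion
            break
    return history

# Toy models: each proposes one idea, then has nothing new to add.
def make_model(idea):
    def model(history):
        return idea if idea not in history else "NO NEW IDEAS"
    return model

models = [make_model(f"approach-{i}") for i in range(1, 5)]
print(cross_pollinate(models))  # ['approach-1', 'approach-2', 'approach-3', 'approach-4']
```

In this toy run the loop terminates in round two, when all four models confirm "NO NEW IDEAS"; in practice the shared history is what lets later rounds propose approaches that earlier, independent attempts missed.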

This multi-model approach ensures:

  • Diverse solution exploration — Each model brings different problem-solving patterns
  • Comprehensive fix coverage — 4 independent attempts with different AI architectures
  • Learning from failures — Later models see why earlier attempts failed
  • Reduced hallucination — Multiple models must independently solve the problem
  • Best fix selection — Data-driven comparison across 4 different approaches

The try-fix skill benefits most from this architecture — each model proposes an independent fix, tests it empirically, and records detailed results for comparison.

Measurable Impact on Team Productivity

Since implementing these agents, we’ve observed significant improvements across our team:

| Task | Before (Manual) | After (Agents) | Time Saved |
| --- | --- | --- | --- |
| Issue reproduction | 30-60 min | 5-10 min | ~50 min |
| Root cause analysis | 1-3 hours | 20-40 min | ~1.5 hours |
| Implementing fix | 30-120 min | Automated | ~1 hour |
| Writing tests | 1-2 hours | 10-20 min | ~1.5 hours |
| Exploring alternatives | ❌ Not feasible | ✅ Built-in | Priceless |
| Total per issue | 4-8 hours | 45 min – 2.5 hours | ~4-5 hours |

That’s a 50-70% time reduction per issue. Our team can now address 2-3x more issues per week while maintaining higher quality standards.

Quality Improvements

Beyond time savings, we’ve seen measurable quality improvements:

  • Test Coverage: 95%+ of PRs now include comprehensive test coverage (up from ~60%)
  • First-Time Fix Rate: 80% of fixes work correctly on first attempt (up from ~50%)
  • Code Review Cycles: Reduced back-and-forth during review

The Skills Ecosystem: Composable Capabilities

These agents are built on a foundation of reusable skills — modular capabilities that can be composed together for different workflows.

Core Skills

try-fix

  • Proposes ONE independent fix approach per invocation
  • Applies fix, runs tests, captures results
  • Records failure analysis for learning
  • Iterated up to 3 times per model if errors occur

write-ui-tests

  • Creates test pages in TestCases.HostApp/Issues/
  • Generates Appium tests in TestCases.Shared.Tests/Tests/Issues/
  • Adds AutomationIds for element location
  • Implements platform-appropriate assertions

write-xaml-tests

  • Creates XAML test files in Controls.Xaml.UnitTests/Issues/
  • Tests across Runtime, XamlC, and SourceGen inflators
  • Validates XAML parsing, compilation, and code generation
  • Handles special file extensions (.rt.xaml, .rtsg.xaml) for invalid code generation cases

verify-tests-fail-without-fix

  • Mode 1 (Failure Only): Run tests once to verify they FAIL, proving they catch the bug
  • Mode 2 (Full Verification): Revert fix files → tests FAIL → restore fix → tests PASS
  • Test files are never reverted — only fix files are manipulated
  • Ensures test quality by proving tests detect the bug

Supporting Skills

azdo-build-investigator

  • Queries Azure DevOps for PR build information
  • Retrieves failed job details
  • Downloads Helix test logs for investigation
  • Identifies build failures and test failures

run-device-tests

  • Executes device tests locally on iOS/Android/Mac Catalyst
  • Supports test filtering by category
  • Manages device/simulator lifecycle
  • Captures test results and logs

pr-finalize

  • Verifies PR title and description match implementation
  • Performs final code review for best practices
  • Used before merging to ensure quality and documentation

Why Skills Matter

Skills provide:

  • Reusability — Same skill used across multiple agents and workflows
  • Testability — Each skill can be tested and improved independently
  • Composability — Agents combine skills to create complex workflows

Impact on Open Source Community

These agents aren’t just improving our internal team productivity — they’re transforming the contributor experience for the entire .NET MAUI community.

Lowering the Barrier to Entry

Before: New contributors faced a steep learning curve:

  • Understanding multi-platform handler architecture
  • Knowing which test type is appropriate
  • Following undocumented platform-specific conventions
  • Navigating complex build and test infrastructure

Now: Agents automatically:

  • Generate platform-appropriate code patterns
  • Select and create correct test types
  • Follow repository conventions automatically
  • Handle build and test infrastructure complexity

Improving Contribution Quality

Every PR now benefits from:

  • ✅ Comprehensive test coverage — Multiple test types covering different scenarios
  • ✅ Alternative fix exploration — Data-driven comparison of approaches
  • ✅ Automated code review — Catches common issues before human review

Accelerating the Contribution Cycle

Maintainer perspective:

  • Fewer back-and-forth review cycles
  • Less time requesting test coverage
  • Reduced need to explain platform-specific conventions
  • Higher confidence in community PRs

Contributor perspective:

  • Faster feedback through automated validation
  • Clear guidance when fixes don’t work
  • Learning repository best practices through agent interactions
  • Greater confidence in submitting PRs

Getting Started as a Contributor

We encourage the community to leverage these agents when contributing to .NET MAUI.

Step 1: Set Up GitHub Copilot CLI

Install GitHub Copilot CLI and authenticate it. See the GitHub Copilot CLI Documentation for setup instructions.

Step 2: Find an Issue to Work On

Browse our issue tracker for contribution opportunities.

Step 3: Use the Agents and Skills

For issue fixes:

First, write tests:

Write tests for issue #12345

Then, implement the fix:

Fix issue #12345

For PR review and improvement:

Review PR #12345

The workflow:

  1. write-tests-agent creates tests (UI, XAML) and verifies they catch the bug
  2. pr-review skill verifies tests exist, explores fix alternatives, compares approaches
  3. Human reviews and refines the output

Note

If you run “Fix issue #12345” without tests, the pr-review skill will notify you to create them first using write-tests-agent.

Step 4: Review and Refine

The agents produce high-quality output, but human review is essential:

  • Verify the fix addresses the root cause
  • Check that tests cover edge cases
  • Ensure code follows .NET MAUI conventions
  • Add additional context where needed

Step 5: Submit Your PR

With agent assistance, your PR will typically include:

  • Working fix with clear rationale
  • Comprehensive test coverage
  • Proper commit messages and PR description
  • Validation that tests prove the fix works

This significantly increases merge rates and reduces review cycles.

Hypothetical Example: From Issue to Merged PR

Let’s walk through a typical contribution workflow with agents:

Issue #12345: CollectionView items disappear on iOS when scrolling rapidly

Traditional Workflow (4-6 hours):

  1. Set up Sandbox app with CollectionView (30 min)
  2. Try to reproduce on iOS simulator (45 min)
  3. Debug handler code to find root cause (2 hours)
  4. Implement fix in Items2/iOS/ (1 hour)

Agent-Assisted Workflow:

Step 1: Create tests first

Write tests for issue #12345

Step 2: Fix the issue

Fix issue #12345

The pr-review skill executes:

  • Pre-Flight (5-10 min) — Reads issue, identifies iOS-specific CollectionView scrolling bug, analyzes Items2/iOS/ handler code, identifies potential causes: view recycling, async loading
  • Gate — Verifies tests from Step 1 catch the bug
  • Try-Fix (20-40 min) — Tries up to 4 fix approaches across 4 models, tests each empirically
  • Report — Compares all approaches, selects optimal solution

Result: High-quality PR ready in under an hour instead of half a day.

Conclusion

The introduction of custom-built AI agents has fundamentally changed how our team approaches .NET MAUI development. By automating the mechanical aspects of issue resolution — reproduction, testing, fix exploration — we can focus on what matters most: understanding the problem, reviewing solutions, and ensuring quality.

Key Takeaways

  • 50-70% reduction in time per issue
  • 2-3x increase in issues addressed per week
  • 95%+ test coverage on new PRs (up from ~60%)
  • Lower barrier to community contribution
  • Higher quality through multi-model fix exploration

We invite the .NET community to experience this new workflow. Your contributions make .NET MAUI better for millions of developers worldwide, and our agents are here to make that contribution process as smooth as possible.

Resources

Documentation

Getting Started

We’d love to hear about your experience using these agents. Share your success stories, challenges, and suggestions in the dotnet/maui discussions or on social media with #dotnetMAUI.

Happy coding! 🚀

The post Accelerating .NET MAUI Development with AI Agents appeared first on .NET Blog.


Rogue AI Triggers Serious Security Incident At Meta

For the second time in the past month, an AI agent went rogue at Meta -- this time giving an engineer incorrect advice that briefly exposed sensitive data. The Verge reports:

A Meta engineer was using an internal AI agent, which Clayton described as "similar in nature to OpenClaw within a secure development environment," to analyze a technical question another employee posted on an internal company forum. But the agent also independently publicly replied to the question after analyzing it, without getting approval first. The reply was only meant to be shown to the employee who requested it, not posted publicly.

An employee then acted on the AI's advice, which "provided inaccurate information" that led to a "SEV1" level security incident, the second-highest severity rating Meta uses. The incident temporarily allowed employees to access sensitive data they were not authorized to view, but the issue has since been resolved.

According to Clayton, the AI agent involved didn't take any technical action itself, beyond posting inaccurate technical advice, something a human could have also done. A human, however, might have done further testing and made a more complete judgment call before sharing the information -- and it's not clear whether the employee who originally prompted the answer planned to post it publicly.

"The employee interacting with the system was fully aware that they were communicating with an automated bot. This was indicated by a disclaimer noted in the footer and by the employee's own reply on that thread," Clayton commented to The Verge. "The agent took no action aside from providing a response to a question. Had the engineer that acted on that known better, or did other checks, this would have been avoided."

Read more of this story at Slashdot.


Now anyone can host a global AI challenge

Kaggle Community Hackathons enable people to create their own hackathons with prizes up to $10,000.

New tools and guidance: Announcing Zero Trust for AI


Over the past year, I have had conversations with security leaders across a variety of disciplines, and the energy around AI is undeniable. Organizations are moving fast, and security teams are rising to meet the moment. Time and again, the question comes back to the same thing: “We’re adopting AI fast, how do we make sure our security keeps pace?”

It’s the right question, and it’s the one we’ve been working to answer by updating the tools and guidance you already rely on. We’re announcing Microsoft’s approach to Zero Trust for AI (ZT4AI). Zero Trust for AI extends proven Zero Trust principles to the full AI lifecycle—from data ingestion and model training to deployment and agent behavior. Today, we’re releasing a new set of tools and guidance to help you move forward with confidence:

  • A new AI pillar in the Zero Trust Workshop.
  • Updated Data and Networking pillars in the Zero Trust Assessment tool.
  • A new Zero Trust reference architecture for AI.
  • Practical patterns and practices for securing AI at scale.

Here’s what’s new and how to use it.

Why Zero Trust principles must extend to AI

AI systems don’t fit neatly into traditional security models. They introduce new trust boundaries—between users and agents, models and data, and humans and automated decision-making. As organizations adopt autonomous and semi-autonomous AI agents, a new class of risk emerges: agents that are overprivileged, manipulated, or misaligned can act like “double agents,” working against the very outcomes they were built to support.

Zero Trust rests on three foundational principles, and each extends directly to AI:

  • Verify explicitly—Continuously evaluate the identity and behavior of AI agents, workloads, and users.
  • Apply least privilege—Restrict access to models, prompts, plugins, and data sources to only what’s needed.
  • Assume breach—Design AI systems to be resilient to prompt injection, data poisoning, and lateral movement.

These aren’t new principles. What’s new is how we apply them systematically to AI environments.

A unified journey: Strategy → assessment → implementation

The most common challenge we hear from security leaders and practitioners is a lack of a clear, structured path from knowing what to do to doing it. That’s what Microsoft’s approach to Zero Trust for AI is designed to solve—to help you get to next steps and actions, quickly.

Zero Trust Workshop—now with an AI pillar

Building on last year’s announcement, the Zero Trust Workshop has been updated with a dedicated AI pillar, now covering 700 security controls across 116 logical groups and 33 functional swim lanes. It is scenario-based and prescriptive, designed to move teams from assessment to execution with clarity and speed.

The workshop helps organizations:

  • Align security, IT, and business stakeholders on shared outcomes.
  • Apply Zero Trust principles across all pillars, including AI.
  • Explore real-world AI scenarios and the specific risks they introduce.
  • Identify cross-product integrations that break down silos and drive measurable progress.

The new AI pillar specifically evaluates how organizations secure AI access and agent identities, protect sensitive data used by and generated through AI, monitor AI usage and behavior across the enterprise, and govern AI responsibly in alignment with risk and compliance objectives.

Zero Trust Assessment—expanded to Data and Networking

As AI agents become more capable, the stakes around data and network security have never been higher. Agents that are insufficiently governed can expose sensitive data, act on malicious prompts, or leak information in ways that are difficult to detect and costly to remediate. Data classification, labeling, governance, and loss prevention are essential controls. So are network-layer defenses that inspect agent behavior, block prompt injections, and prevent unauthorized data exposure.

Yet, manually evaluating security configurations across identity, endpoints, data, and network controls is time consuming and error prone. That is why we built the Zero Trust Assessment to automate it. The Zero Trust Assessment evaluates hundreds of controls aligned to Zero Trust principles, informed by learnings from Microsoft’s Secure Future Initiative (SFI). Today, we are adding Data and Network as new pillars alongside the existing Identity and Devices coverage.

Zero Trust Assessment tests are derived from trusted industry sources including:

  • Industry standards such as the National Institute of Standards and Technology (NIST), the Cybersecurity and Infrastructure Security Agency (CISA), and the Center for Internet Security (CIS).
  • Microsoft’s own learnings from SFI.
  • Real-world customer insights from thousands of security implementations.

And we are not stopping here. A Zero Trust Assessment for AI pillar is currently in development and will be available in summer 2026, extending automated evaluation to AI-specific scenarios and controls.

Overall, the redesigned experience delivers:

  • Clearer insights—Simplified views that help teams quickly identify strengths, gaps, and next steps.
  • Deep(er) alignment with the Workshop—Assessment insights directly inform workshop discussions, exercises, and deployment paths.
  • Actionable, prioritized recommendations—Concrete implementation steps mapped to maturity levels, so you can sequence improvements over time.

Zero Trust for AI reference architecture

Our new Zero Trust for AI reference architecture, which extends our existing Zero Trust reference architecture, shows how policy-driven access controls, continuous verification, monitoring, and governance work together to secure AI systems, while increasing resilience when incidents occur.

The architecture gives security, IT, and engineering teams a shared mental model by clarifying where controls apply, how trust boundaries shift with AI, and why defense-in-depth remains essential for agentic workloads.

Practical patterns and practices for AI security

Knowing what to do is one thing. Knowing how to operationalize it at scale is another. Our patterns and practices provide repeatable, proven approaches to the most complex AI security challenges, much like software design patterns offer reusable solutions to common engineering problems.

| Pattern | What it helps you do |
| --- | --- |
| Threat modeling for AI | Why traditional threat modeling breaks down for AI—and how to redesign it for real-world risk at AI scale. |
| AI observability | End-to-end logging, traceability, and monitoring to enable oversight, incident response, and trust at scale. |

See it live at RSAC 2026

If you’re attending RSAC™ 2026 Conference, join us for three sessions focused on Zero Trust for AI—from expanding attack surfaces to hands-on, actionable guidance.

| When | Session | Title |
| --- | --- | --- |
| Wednesday, March 25, 2026, 11:00 AM-11:20 AM PT | Zero Trust Theatre Session, by Tarek Dawoud (Principal Group Product Manager, Microsoft Security) and Hammad Rajjoub (Director, Microsoft Secure Future Initiative and Zero Trust) | Zero Trust for AI: Securing the Expanding Attack Surface |
| Wednesday, March 25, 2026, 12:00 PM-1:00 PM PT | Ancillary Executive Session, by Travis Gross (Principal Group Product Manager, Microsoft Security), Eric Sachs (Corporate Vice President, Microsoft Security), and Marco Pietro (Executive Vice President, Global Head of Cybersecurity, Capgemini), moderated by Mia Reyes (Director of Security, Microsoft) | Building Trust for a Secure Future: From Zero Trust to AI Confidence |
| Thursday, March 26, 2026, 11:00 AM-12:00 PM PT | RSAC Post-Day Workshop, by Travis Gross, Tarek Dawoud, and Hammad Rajjoub | Zero Trust, SFI, and ZT4AI: Practical, actionable guidance for CISOs |

Get started with Zero Trust for AI

Zero Trust for AI brings proven security principles to the realities of modern AI. Whether you’re governing agents, protecting models and data, or scaling AI without introducing new risk, the tools, architecture, and guidance are ready for you today.

Get started:

To continue the conversation, join the Microsoft Security Community, where security practitioners and Microsoft experts share insights, guidance, and real world experiences across Zero Trust and AI security.

Learn more about Microsoft Security solutions on our website and bookmark the Microsoft Security blog for expert insights on security matters. Follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity) for the latest cybersecurity news and updates.

The post New tools and guidance: Announcing Zero Trust for AI appeared first on Microsoft Security Blog.


Rethinking open source mentorship in the AI era

1 Share

Let me paint a picture for you.

A polished pull request lands in your inbox. It looks amazing at first glance, but then you start digging in, and a few things seem off. Forty-five minutes later, you’ve crafted a thoughtful, encouraging response with a few clarifying questions. Who knows: maybe this person could become a great new contributor to mentor, so it’s worth your time if they put in theirs.

And then…nothing. Or the follow-up makes it clear the contributor doesn’t have the context needed to explain the change, often because AI made it easy to submit something plausible before they were ready to maintain it. Or you realize you’ve just spent your afternoon debugging someone’s LLM chat session.

This is becoming more common. Not because contributors are acting in bad faith, but because it’s never been easier to generate something that looks plausible. The cost to create has dropped. The cost to review hasn’t.

Open source is experiencing its own “Eternal September”: a constant influx of contributions that strains the social systems we rely on to build trust and mentor newcomers.

The signals have changed

Projects across the ecosystem are seeing the same pattern. tldraw closed their pull requests. Fastify shut down their HackerOne program after inbound reports became unmanageable at scale.

The overall volume keeps climbing. The Octoverse 2025 report notes that developers merged nearly 45 million pull requests per month in 2025 (up 23% year over year). More pull requests, same maintainer hours.

The old signals, like clean code, fast turnaround, and handling complexity, used to mean someone had invested time into understanding the codebase. Now AI can help users generate all of that in seconds, so these signals aren’t as telling.

To reduce noise and bring more trust back into open source contributions, platforms, including GitHub, are building longer-term solutions. In fact, our product team just published an RFC for community feedback. If you have thoughts on what we can do, we’d love to hear from you.

But platform changes take time. And even when they arrive, you’ll still need strategies for figuring out how mentorship looks today when signals aren’t as easy to read. Here’s what’s working.

Why this is urgent

Mentorship is how open source communities scale.

If I asked a room of open source contributors how they got started, they’d all say it began with a good mentor.

When you mentor someone well, you’re not just adding one contributor. You’re multiplying yourself. They learn to onboard others who do the same. That’s the multiplier effect.

| Year | Broadcast (1,000/year) | Mentorship (2 every 6 months, they do the same) |
|------|------------------------|-------------------------------------------------|
| 1    | 1,000                  | 9                                               |
| 3    | 3,000                  | 729                                             |
| 5    | 5,000                  | 59,049                                          |
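The mentorship column assumes each person mentors two newcomers every six months and those newcomers do the same, so the community triples twice a year. A quick sketch of both growth curves (the function names are just for illustration):

```python
# Broadcast reaches a fixed 1,000 new people per year.
def broadcast(years: int) -> int:
    return 1_000 * years

# Mentorship triples the community every six months:
# you plus the two people you mentor, who then do the same.
def mentorship(years: int) -> int:
    return 3 ** (2 * years)  # two six-month periods per year

for y in (1, 3, 5):
    print(y, broadcast(y), mentorship(y))
```

By year 5 the mentored community (59,049) dwarfs the broadcast one (5,000); that gap is the multiplier effect.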

But maintainers are burning out trying to mentor everyone who sends a pull request. If we stop mentoring newcomers, we lose the multiplier entirely.

We can’t abandon mentorship, especially as many long-time maintainers step back from active contribution. (I wrote more about this generational challenge in Who will maintain the future?) So, we need to be strategic about who we invest in.

The 3 Cs: A framework for strategic mentorship at scale

So how do you decide where to invest your mentorship energy when contribution signals are harder to read? Looking at what’s working across projects, I see three filters maintainers are using. I call them the 3 Cs: Comprehension, Context, and Continuity.

1. Comprehension

Do they understand the problem well enough to propose this change?

Some projects now test comprehension before code is submitted. Codex and Gemini CLI, for example, both recently added guidelines: contributors must open an issue and get approval before submitting a pull request. The comprehension check happens in that conversation.

I’m also seeing in-person code sprints and hackathons thriving in this area, where maintainers can have real-time conversations with potential contributors to check both interest and comprehension.

I’m not expecting contributors to understand the whole project. That’s unrealistic. But you want to make sure they’re not committing code above their own comprehension level. As they grow, they can always take on more.

2. Context

Do they give me what I need to review this well?

Comprehension is about their understanding. Context is about your ability to do your job as a reviewer.

Did they link to the issue? Explain trade-offs? Disclose AI use?

The last one is becoming more common. ROOST has a simple three-principle policy. The Processing Foundation added a checkbox. Fedora landed a lightweight disclosure policy after months of discussion.

Disclosing AI is about giving reviewers context. When I know a pull request was AI-assisted, I can calibrate my review. This might mean asking more clarifying questions or focusing on whether the contributor understands the trade-offs, not just whether the code runs.

There’s also AGENTS.md, which provides instructions for AI coding agents, like robots.txt for Copilot. Projects like scikit-learn, Goose, and Processing use AGENTS.md to give agents instructions, like follow our guidelines, check whether an issue is assigned, or respect our norms. This shifts the burden of gathering the context needed for a review onto the contributor (or their tools).
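A minimal AGENTS.md might look like the fragment below; the contents are illustrative, not taken from any of the projects mentioned above:

```markdown
# AGENTS.md

## Before you open a pull request
- Open an issue first and wait for a maintainer to approve the approach.
- Check that the issue is not already assigned to someone else.

## When you write code
- Follow CONTRIBUTING.md and the existing naming conventions.
- Link the pull request to its issue and disclose any AI assistance.
```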

3. Continuity

Do they keep coming back?

This is the mentorship filter.

Drive-by contributions can be helpful, but limit your deep mentorship investment to people who come back and engage thoughtfully.

Your mentorship can scale up over time:

  • Great first conversation in a pull request → make your review a teachable moment
  • They keep coming back → offer to pair on something, then start suggesting harder tasks
  • If they still keep coming back → invite them to an event, or consider commit access

The takeaway

Comprehension and Context get you reviewed. Continuity gets you mentored.

As a maintainer, this means: don’t invest deep mentorship energy until you see all three.

What this looks like:

PR Lands → Follows Guidelines?  
                NO  → Close. Guilt-free. 
                YES → Review → They Come Back?
                                    YES → Consider Mentorship 

Let’s compare this to our first example above. This time, a polished pull request lands without following the guidelines. Close it. Guilt-free. Protect your time for contributions that matter.

If someone comes back and is engaged in issues; if they submit a second pull request and respond thoughtfully to feedback, now you pay attention. That’s when you invest.

This is how you protect the multiplier effect. You’re not abandoning newcomers. You’re being strategic.

There’s another benefit, too: clear criteria reduce bias. When you rely on vibes, you tend to mentor people who look like you or share your cultural context. The 3 Cs give you a rubric instead of gut feelings, and that makes your mentorship more equitable.

Getting started

Pick a C to implement:

| C | Implementation |
|---|----------------|
| Comprehension | Require an issue before a pull request; host an in-person code sprint for live discussions |
| Context | Add AI disclosure or AGENTS.md |
| Continuity | Watch who comes back |

Start with one but look for all three when deciding who to mentor.

This isn’t about restricting AI-assisted contributions. It’s about building guardrails that protect human mentorship and keep communities healthy.

AI tools are here to stay. The question is whether we adapt our practices to maintain what makes open source work: human relationships, knowledge transfer, and the multiplier effect.

The 3 Cs give us a framework for exactly that.

Resources

Adapted from my FOSDEM 2026 talk. Thanks to Anne Bertucio, Ashley Wolf, Daniel Stenberg, Tim Head, Bruno Borges, Emma Irwin, Helen Hou-Sandí, Hugo van Kemenade, Jamie Tanna, John McBride, Juan Luis Cano Rodríguez, Justin Wheeler, Matteo Collina, Camilla Moraes, Raphaël de Courville, Rizel Scarlett, and everyone who shared examples online.

The post Rethinking open source mentorship in the AI era appeared first on The GitHub Blog.


How Squad runs coordinated AI agents inside your repository


If you’ve used AI coding tools before, you know the pattern. You write a prompt, the model misunderstands, you refine it, and you coax better output. Progress depends more on steering the model than on building the software.

As projects grow, the challenge stops being “how do I prompt?” and starts becoming “how do I coordinate design, implementation, testing, and review without losing context along the way?”

Multi-agent systems are a great way to move past this plateau, but usually require a massive amount of setup. People spend hours building orchestration layers, wiring up frameworks, and configuring vector databases before they can delegate a single task.

Squad, an open source project built on GitHub Copilot, initializes a preconfigured AI team directly inside your repository. It is a bet that multi-agent development can be accessible, legible, and useful without requiring heavy orchestration infrastructure or deep prompt engineering expertise.

Two commands—npm install -g @bradygaster/squad-cli once globally, squad init once per repo—and Squad drops a specialized AI team (a lead, a frontend developer, a backend developer, and a tester) directly into your repository.

Instead of a single chatbot switching roles, Squad demonstrates repository-native multi-agent orchestration without heavy centralized infrastructure.

How Squad coordinates work across agents

You describe the work you need done in natural language. From there, a coordinator agent inside Squad figures out the routing, loads repository context, and spawns specialists with task-specific instructions.

For example, you type: “Team, I need JWT auth—refresh tokens, bcrypt, the works.” Then you watch the team spin up in parallel. The backend specialist takes the implementation. The tester starts writing the accompanying test suite. A documentation specialist opens a pull request. Within minutes, files are written and branches are created. These specialists already know your naming conventions and what you decided about database connections last Tuesday—not because you put it in the prompt, but because agents load from shared team decisions and their own project history files committed to the repository.

Instead of forcing you to manually test the output and prompt the model through multiple rounds of fixes, Squad handles iteration internally. Once the backend specialist drafts the initial implementation, the tester runs their test suite against it. If those tests fail, the tester rejects the code. Crucially, Squad’s reviewer protocol prevents the original author from revising rejected work: a different agent must step in to fix it. This forces genuine independent review with a separate context window and a fresh perspective, rather than asking a single AI to review its own mistakes. In workflows where reviewer automation is enabled, you review the pull request that survives this internal loop rather than every intermediate attempt.
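In rough terms, that internal loop might look like the sketch below. The agent names and the `tests_pass`/`revise` callables are illustrative assumptions, not Squad’s actual API:

```python
# Sketch of a reviewer protocol in which rejected work must be revised
# by a different agent than the one who wrote it (illustrative only).
def review_loop(draft, author, agents, tests_pass, revise, max_rounds=3):
    current, last_author = draft, author
    for _ in range(max_rounds):
        if tests_pass(current):
            return current  # survives the loop; a human still reviews the PR
        # The original author may not revise its own rejected work:
        reviser = next(a for a in agents if a != last_author)
        current, last_author = revise(reviser, current), reviser
    raise RuntimeError("no passing revision; escalate to a human")
```

The point of the constraint is the fresh context window: the reviser starts from the rejected artifact, not from the author’s chain of reasoning.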

It’s not autopilot, and it’s not magic on session one. Agents will ask clarifying questions and sometimes make reasonable but wrong assumptions. You still review and merge every pull request. It is collaborative orchestration, not autonomous execution.

Architectural patterns behind repository-native orchestration

Whether you use Squad or build your own multi-agent workflows, there are a few architectural patterns we’ve learned from building repository-native orchestration. These patterns move the architecture away from “black box” behavior toward something inspectable and predictable at the repository level.

1. The “Drop-box” pattern for shared memory

Most AI orchestration relies on real-time chat or complex vector database lookups to keep agents in sync. We’ve found that this is often too fragile; synchronizing state across live agents is a fool’s errand.

Instead, Squad uses a “drop-box” pattern. Every architectural choice, like choosing a specific library or a naming convention, is appended as a structured block to a versioned decisions.md file in the repository. This is a bet that asynchronous knowledge sharing inside the repository scales better than real-time synchronization. By treating a markdown file as the team’s shared brain, you get persistence, legibility, and a perfect audit trail of every decision the team has made. Because this memory lives in project files rather than a live session, the team can also recover context after disconnects or restarts and continue from where it left off.
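A minimal sketch of the pattern, assuming a flat markdown entry format (the actual block schema Squad writes may differ):

```python
from pathlib import Path
from datetime import date

# Append-only "drop-box": every decision becomes a structured block
# in a versioned decisions.md file (entry format is illustrative).
def record_decision(repo: Path, agent: str, title: str, rationale: str) -> None:
    entry = (
        f"\n## {title}\n"
        f"- date: {date.today().isoformat()}\n"
        f"- decided-by: {agent}\n"
        f"- rationale: {rationale}\n"
    )
    with (repo / "decisions.md").open("a", encoding="utf-8") as f:
        f.write(entry)

# Any agent (or a fresh session after a restart) can reload shared memory:
def load_decisions(repo: Path) -> str:
    return (repo / "decisions.md").read_text(encoding="utf-8")
```

Because the file is committed, `git log decisions.md` doubles as the team’s decision audit trail.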

2. Context replication over context splitting

One of the biggest hurdles in AI development is the context window limit. When a single agent tries to do everything, the “working memory” gets crowded with meta-management, leading to hallucinations.

Squad solves this by ensuring the coordinator agent remains a thin router. It doesn’t do the work; it spawns specialists. Because each specialist runs as a separate inference call with its own large context window (e.g., up to 200K tokens on supported models), you aren’t splitting one context among four agents; you’re replicating repository context across them.

Running multiple specialists in parallel gives you multiple independent reasoning contexts operating simultaneously. This allows each agent to “see” the relevant parts of the repository without competing for space with the other agents’ thoughts.
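A sketch of that routing, where `spawn` stands in for one independent model call per specialist (an assumption for illustration; Squad’s internals are more involved):

```python
from concurrent.futures import ThreadPoolExecutor

# The coordinator stays a thin router: it replicates the full repository
# context to every specialist and runs them as parallel, independent calls.
def coordinate(task, repo_context, specialists, spawn):
    with ThreadPoolExecutor(max_workers=len(specialists)) as pool:
        futures = {
            role: pool.submit(spawn, role, repo_context, task)
            for role in specialists
        }
        return {role: f.result() for role, f in futures.items()}
```

Note that every `spawn` call receives the same `repo_context`, which is the replication-over-splitting idea in miniature.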

3. Explicit memory in the prompt vs. implicit memory in the weights

We believe an AI team’s memory should be legible and versioned. You shouldn’t have to wonder what an agent “knows” about your project.

In Squad, an agent’s identity is built primarily on two repository files: a charter (who they are) and a history (what they’ve done), alongside shared team decisions. These are plain text. Because they live in your .squad/ folder, the AI’s memory is versioned right alongside your code. When you clone a repo, you aren’t just getting the code; you’re getting an already onboarded AI team whose memory travels with it.
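Rehydrating an agent from those files could be as simple as the sketch below; the file layout under the .squad/ folder is an assumption for illustration:

```python
from pathlib import Path

# Rebuild an agent's identity from plain-text files committed to the repo:
# a charter (who they are) and a history (what they've done).
def load_agent(squad_dir: Path, name: str) -> str:
    charter = (squad_dir / name / "charter.md").read_text(encoding="utf-8")
    history = (squad_dir / name / "history.md").read_text(encoding="utf-8")
    # The concatenation becomes the agent's starting context.
    return f"{charter}\n\n{history}"
```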

Lowering the barrier to multi-agent workflows

Our biggest win with Squad is that it makes it easy for anyone to get started with agentic development in a low-touch, low-ceremony way. You shouldn’t have to spend hours wrestling with infrastructure, learning complex prompt engineering, or managing convoluted CLI interactions just to get an AI team to help you write code.

To see what repository-native orchestration feels like, check out the Squad repository and throw a squad at a problem to see how the workflow evolves.

The post How Squad runs coordinated AI agents inside your repository appeared first on The GitHub Blog.
