Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

How Drasi used GitHub Copilot to find documentation bugs


For early-stage open-source projects, the “Getting started” guide is often the first real interaction a developer has with the project. If a command fails, an output doesn’t match, or a step is unclear, most users won’t file a bug report; they will just move on.

Drasi, a CNCF sandbox project that detects changes in your data and triggers immediate reactions, is supported by our small team of four engineers in Microsoft Azure’s Office of the Chief Technology Officer. We have comprehensive tutorials, but we are shipping code faster than we can manually test them.

The team didn’t realize how big this gap was until late 2025, when GitHub updated its Dev Container infrastructure, bumping the minimum Docker version. The update broke the Docker daemon connection, and every single tutorial stopped working. Because we relied on manual testing, we didn’t immediately know the extent of the damage. Any developer trying Drasi during that window would have hit a wall.

This incident forced a realization: with advanced AI coding assistants, documentation testing can be converted to a monitoring problem.

The problem: Why does documentation break?

Documentation usually breaks for two reasons:

1. The curse of knowledge

Experienced developers write documentation with implicit context. When we write “wait for the query to bootstrap,” we know to run `drasi list query` and watch for the `Running` status, or even better to run the `drasi wait` command. A new user has no such context. Neither does an AI agent. They read the instructions literally and don’t know what to do. They get stuck on the “how,” while we only document the “what.”

2. Silent drift

Documentation doesn’t fail loudly like code does. When you rename a configuration file in your codebase, the build fails immediately. But when your documentation still references the old filename, nothing happens. The drift accumulates silently until a user reports confusion.

This is compounded for tutorials like ours, which spin up sandbox environments with Docker, k3d, and sample databases. When any upstream dependency changes—a deprecated flag, a bumped version, or a new default—our tutorials can break silently.

The solution: Agents as synthetic users

To solve this, we treated tutorial testing as a simulation problem. We built an AI agent that acts as a “synthetic new user.”

This agent has three critical characteristics:

  1. It is naïve: It has no prior knowledge of Drasi—it knows only what is explicitly written in the tutorial.
  2. It is literal: It executes every command exactly as written. If a step is missing, it fails.
  3. It is unforgiving: It verifies every expected output. If the doc says, “You should see ‘Success’”, and the command line interface (CLI) just returns silently, the agent flags it and fails fast.

The stack: GitHub Copilot CLI and Dev Containers

We built a solution using GitHub Actions, Dev Containers, Playwright, and the GitHub Copilot CLI.

Our tutorials require heavy infrastructure:

  • A full Kubernetes cluster (k3d)
  • Docker-in-Docker
  • Real databases (such as PostgreSQL and MySQL)

We needed an environment that exactly matches what our human users experience. If users run in a specific Dev Container on GitHub Codespaces, our test must run in that same Dev Container.

The architecture

Inside the container, we invoke the Copilot CLI with a specialized system prompt (view the full prompt here):

Invoking the CLI in prompt mode (`-p`) with this system prompt gives us an agent that can execute terminal commands, write files, and run browser scripts, just like a human developer sitting at their terminal. The agent needs these capabilities to simulate a real user.

To enable the agent to open webpages and interact with them as any human following the tutorial steps would, we also install Playwright in the Dev Container. The agent takes screenshots, which it then compares against those provided in the documentation.

Security model

Our security model is built around one principle: the container is the boundary.

Rather than trying to restrict individual commands (a losing game when the agent needs to run arbitrary node scripts for Playwright), we treat the entire Dev Container as an isolated sandbox and control what crosses its boundaries: no outbound network access beyond localhost, a Personal Access Token (PAT) with only “Copilot Requests” permission, ephemeral containers destroyed after each run, and a maintainer-approval gate for triggering workflows.

Dealing with non-determinism

One of the biggest challenges with AI-based testing is non-determinism. Large language models (LLMs) are probabilistic—sometimes the agent retries a command; other times it gives up.

We handled this with a three-stage retry with model escalation (start with Gemini-Pro; on failure, retry with Claude Opus), semantic comparison for screenshots instead of pixel-matching, and verification of core data fields rather than volatile values.
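The escalation step can be sketched in Python. The CLI name, flags, and model identifiers below are assumptions for illustration, not Drasi’s exact configuration:

```python
from typing import Optional
import subprocess

# Illustrative model-escalation order; the real model names may differ.
MODELS = ["gemini-pro", "claude-opus"]

def run_session(prompt: str, model: str, runner=subprocess.run) -> bool:
    """Run one synthetic-user session; True if the agent exits cleanly."""
    result = runner(["copilot", "-p", prompt, "--model", model],
                    capture_output=True)
    return result.returncode == 0

def evaluate_with_escalation(prompt: str, models=None,
                             runner=subprocess.run) -> Optional[str]:
    """Try each model in order; return the first model whose run passes."""
    for model in (models or MODELS):
        if run_session(prompt, model, runner=runner):
            return model
    return None
```

The `runner` parameter exists only so the loop can be exercised without the real CLI installed; in CI the default `subprocess.run` is used.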

Our prompts also contain a list of tight constraints that prevent the agent from going on a debugging journey, directives to control the structure of the final report, and skip directives that tell the agent to bypass optional tutorial sections like setting up external services.

Artifacts for debugging

When a run fails, we need to know why. Since the agent runs in a transient container, we can’t just connect over Secure Shell (SSH) and look around.

So, our agent preserves evidence of every run: screenshots of web UIs, terminal output of critical commands, and a final markdown report detailing its reasoning, as shown here:

These artifacts are uploaded to the GitHub Action run summary, allowing us to “time travel” back to the exact moment of failure and see what the agent saw.

Parsing the agent’s report

With LLMs, getting a definitive “Pass/Fail” signal that a machine can understand can be challenging. An agent might write a long, nuanced conclusion like:

To make this actionable in a CI/CD pipeline, we had to do some prompt engineering. We explicitly instructed the agent:

In our GitHub Action, we then simply grep for this specific string to set the exit code of the workflow.
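The same check can be sketched in Python. The sentinel string here is a hypothetical stand-in, since the real marker is whatever the prompt instructs the agent to emit:

```python
from pathlib import Path

# Hypothetical sentinel; the actual string is defined in the system prompt.
PASS_SENTINEL = "TUTORIAL RESULT: PASS"

def report_passed(report_path: str) -> bool:
    """True if the agent's final markdown report contains the pass sentinel."""
    return PASS_SENTINEL in Path(report_path).read_text(encoding="utf-8")
```

The workflow’s exit code then follows directly from this boolean, which is exactly the binary signal CI needs.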

Simple techniques like this bridge the gap between AI’s fuzzy, probabilistic outputs and CI’s binary pass/fail expectations.

Automation

We now have an automated version of the workflow that runs weekly. It evaluates all our tutorials in parallel—each tutorial gets its own sandbox container and a fresh perspective from the agent acting as a synthetic user. If any tutorial evaluation fails, the workflow is configured to file an issue on our GitHub repo.

This workflow can optionally run on pull requests, but to prevent attacks we have added a maintainer-approval requirement and use the `pull_request_target` trigger, which means that even for pull requests from external contributors, the workflow that executes is the one in our main branch.

Running the Copilot CLI requires a PAT, which is stored in our repo’s environment secrets. To make sure it does not leak, each run requires maintainer approval; the only exception is the automated weekly run, which runs only on the `main` branch of our repo.

What we found: Bugs that matter

Since implementing this system, we have run over 200 “synthetic user” sessions. The agent identified 18 distinct issues, ranging from serious environment problems to documentation issues like these. Fixing them improved the docs for everyone, not just the bot.

  • Implicit dependencies: In one tutorial, we instructed users to create a tunnel to a service. The agent ran the command, and then—following the next instruction—killed the process to run the next command.
    The fix: We realized we hadn’t told the user to keep that terminal open. We added a warning: “This command blocks. Open a new terminal for subsequent steps.”
  • Missing verification steps: We wrote: “Verify the query is running.” The agent got stuck: “How, exactly?”
    The fix: We replaced the vague instruction with an explicit command: `drasi wait -f query.yaml`.
  • Format drift: Our CLI output had evolved. New columns were added; older fields were deprecated. The documentation screenshots still showed the 2024 version of the interface. A human tester might gloss over this (“it looks mostly right”). The agent flagged every mismatch, forcing us to keep our examples up to date.

AI as a force multiplier

We often hear about AI replacing humans, but in this case, the AI is providing us with a workforce we never had.

To replicate what our system does—running six tutorials across fresh environments every week—we would need a dedicated QA resource or a significant budget for manual testing. For a four-person team, that is impossible. By deploying these synthetic users, we have effectively hired a tireless QA engineer who works nights, weekends, and holidays.

Our tutorials are now validated weekly by synthetic users. Try the Getting Started guide yourself and see the results firsthand. And if you’re facing the same documentation drift in your own project, consider GitHub Copilot CLI not just as a coding assistant, but as an agent—give it a prompt, a container, and a goal—and let it do the work a human doesn’t have time for.

The post How Drasi used GitHub Copilot to find documentation bugs appeared first on Microsoft Open Source Blog.

Read the whole story
alvinashcraft
20 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Intent redirection vulnerability in third-party SDK exposed millions of Android wallets to potential risk


During routine security research, we identified a severe intent redirection vulnerability in a widely used third-party Android SDK called EngageSDK. This flaw allows apps on the same device to bypass the Android security sandbox and gain unauthorized access to private data. With over 30 million installations of third-party crypto wallet applications alone, PII, user credentials, and financial data were exposed to risk. All of the detected apps using vulnerable versions have been removed from Google Play.

Following our Coordinated Vulnerability Disclosure practices (via Microsoft Security Vulnerability Research), we notified EngageLab and the Android Security Team. We collaborated with all parties to investigate and validate the issue, which was resolved as of November 3, 2025 in version 5.2.1 of the EngageSDK. This case shows how weaknesses in third‑party SDKs can have large‑scale security implications, especially in high‑value sectors like digital asset management. 

As of the time of writing, we are not aware of any evidence indicating that this vulnerability has been exploited in the wild. Nevertheless, we strongly recommend that developers who integrate the affected SDK upgrade to the latest available version. While this is a vulnerability introduced by a third-party SDK, Android’s existing layered security model is capable of providing additional mitigations against exploitation of vulnerabilities through intents. Android has updated these automatic user protections to provide additional mitigation against the specific EngageSDK risks described in this report while developers update to the non-vulnerable version of EngageSDK. Users who previously downloaded a vulnerable app are protected.

In this blog, we provide a technical analysis of a vulnerability that bypasses core Android security mechanisms. We also examine why this issue is significant in the current landscape: apps increasingly rely on third‑party SDKs, creating large and often opaque supply‑chain dependencies.  

As mobile wallets and other high‑value apps become more common, even small flaws in upstream libraries can impact millions of devices. These risks increase when integrations expose exported components or rely on trust assumptions that aren’t validated across app boundaries. 

Because Android apps frequently depend on external libraries, insecure integrations can introduce attack surfaces into otherwise secure applications. We provide resources for three key audiences: 

  • Developers: In addition to the best practices Android provides its developers, we provide practical guidance on identifying and preventing similar flaws, including how to review dependencies and validate exported components.  
  • Researchers: Insights into how we discovered the issue and the methodology we used to confirm its impact.  
  • General readers: An explanation of the implications of this vulnerability and why ecosystem‑wide vigilance is essential. 

This analysis reflects Microsoft’s visibility into cross‑platform security threats. We are committed to safeguarding users, even in environments and applications that Microsoft does not directly build or operate.  You can find a detailed set of recommendations, detection guidance and indicators at the end of this post to help you assess exposure and strengthen protections.

Technical details

The Android operating system integrates a variety of security mechanisms, such as memory isolation, filesystem discretionary and mandatory access controls (DAC/MAC), biometric authentication, and network traffic encryption. Each of these components functions according to its own security framework, which may not always align with the others[1].  

Unlike many other operating systems where applications run with the user’s privileges, Android assigns each app with a unique user ID and executes it within its own sandboxed environment. Each app has a private directory for storing data that is not meant to be shared. By default, other apps cannot access this private space unless the owning app explicitly exposes data through components known as content providers.  

To facilitate communication between applications, Android uses intents[2]. Beyond inter-app messaging, intents also enable interaction among components within the same application as well as data sharing between those components. 

It’s worth noting that while any application can send an intent to another app or component, whether that intent is actually delivered—and more broadly, whether the communication is permitted—depends on the identity and permissions of the sending application.  

Intent redirection vulnerability 

Intent Redirection occurs when a threat actor manipulates the contents of an intent that a vulnerable app sends using its own identity and permissions.  

In this scenario, the threat actor leverages the trusted context of the affected app to run a malicious payload with the app’s privileges. This can lead to: 

  • Unauthorized access to protected components  
  • Exposure of sensitive data 
  • Privilege escalation within the Android environment
Figure 1. Visual representation of an intent redirection.

The Android Security Team classifies this vulnerability as severe. Apps flagged as vulnerable are subject to enforcement actions, including potential removal from the platform[3].

EngageLab SDK intent redirection

Developers use the EngageLab SDK to manage messaging and push notifications in mobile apps. It functions as a library that developers integrate into Android apps as a dependency. Once included, the SDK provides APIs for handling communication tasks, making it a core component for apps that require real-time engagement.

The vulnerability was identified in an exported activity (MTCommonActivity) that is added to an application’s Android manifest once the library is imported into a project. This activity appears only in the merged manifest, which is generated at build time (see figure below), so developers sometimes miss it. Consequently, it often escapes detection during development but remains exploitable in the final APK.

Figure 2. The vulnerable MTCommonActivity activity is added to the merged manifest.

When an activity is declared as exported in the Android manifest, it becomes accessible to other applications installed on the same device. This configuration permits any other application to explicitly send an intent to this activity.   

The following section outlines the intent handling process from the moment the activity receives an intent to when it dispatches one under the affected application’s identity. 

Intent processing in the vulnerable activity 

When an activity receives an intent, its response depends on its current lifecycle state: 

  • If the activity is starting for the first time, the onCreate() method runs.  
  • If the activity is already active, the onNewIntent() method runs instead.  

In the vulnerable MTCommonActivity, both callbacks invoke the processIntent() method. 

Figure 3: Calling the processIntent() method.

This method (see figure below) begins by initializing the uri variable on line 10 using the data provided in the incoming intent. If the uri variable is not empty, it invokes processPlatformMessage() on line 16:

Figure 4: The processIntent() method.

The processPlatformMessage() method instantiates a JSON object using the uri string supplied as an argument to this method (see line 32 below):  

Figure 5: The processPlatformMessage() method.

Each branch of the if statement checks the JSON object for a field named n_intent_uri. If this field exists, the method performs the following actions: 

  • Creates a NotificationMessage object  
  • Initializes its intentUri field by using the appropriate setter (see line 52).  

An examination of the intentUri field in the NotificationMessage class identified the following method as a relevant point of reference:

Figure 6: intentUri usage overview.

On line 353, the method above obtains the intentUri value and attempts to create a new intent from it by calling the method a() on line 360. The returned intent is subsequently dispatched using the startActivity() method on line 365. The a() method is particularly noteworthy, as it serves as the primary mechanism responsible for intent redirection:

Figure 7: Overview of vulnerable code.

This method appears to construct an implicit intent by invoking setComponent(), which clears the target component of the parseUri intent by assigning a null value (line 379). Under normal circumstances, such behavior would result in a standard implicit intent, which poses minimal risk because it does not specify a concrete component and therefore relies on the system’s resolution logic.  

However, as observed on line 377, the method also instantiates a second intent variable (its purpose not immediately evident) that incorporates an explicit intent. Crucially, this explicitly targeted intent is the one returned at line 383, rather than the benign parseUri intent.

Another notable point is that the parseUri() method (at line 376) is called with the URI_ALLOW_UNSAFE flag (constant value 4), which can permit access to an application’s content providers[6] (see exploitation example below).

These substitutions fundamentally alter the method’s behavior: instead of returning a non‑directed, system‑resolved implicit intent, it returns an intent with a predefined component, enabling direct invocation of the targeted activity as well as access to the application’s content providers. As noted previously, this vulnerability can, among other consequences, permit access to the application’s private directory by gaining entry through any available content providers, even those that are not exported.

Figure 8: Getting READ/WRITE access to non-exported content providers.

Exploitation starts when a malicious app creates an intent object with a crafted URI in the extra field. The vulnerable app then processes this URI, creating and sending an intent using its own identity and permissions. 

Due to the URI_ALLOW_UNSAFE flag, the intent URI may include the following flags:

  • FLAG_GRANT_PERSISTABLE_URI_PERMISSION 
  • FLAG_GRANT_READ_URI_PERMISSION  
  • FLAG_GRANT_WRITE_URI_PERMISSION 

When combined, these flags grant persistent read and write access to the app’s private data.  
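These are ordinary bit flags, so their combination is easy to verify. The constant values below come from the Android Intent documentation; the serialized-URI comment is illustrative:

```python
# Documented android.content.Intent flag values:
FLAG_GRANT_READ_URI_PERMISSION = 0x00000001
FLAG_GRANT_WRITE_URI_PERMISSION = 0x00000002
FLAG_GRANT_PERSISTABLE_URI_PERMISSION = 0x00000040

combined = (FLAG_GRANT_READ_URI_PERMISSION
            | FLAG_GRANT_WRITE_URI_PERMISSION
            | FLAG_GRANT_PERSISTABLE_URI_PERMISSION)
# combined == 0x43; in a serialized intent: URI this value would appear in
# the launchFlags field (exact URI shape illustrative).
```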

After the vulnerable app processes the intent and applies these flags, the malicious app is authorized to interact with the target app’s content provider. This authorization remains active until the target app explicitly revokes it [5]. As a result, the internal directories of the vulnerable app are exposed, which allows unauthorized access to sensitive data in its private storage space.  The following image illustrates an example of an exploitation intent:

Figure 9: Attacking the MTCommonActivity.

Affected applications  

A significant number of apps using this SDK are part of the cryptocurrency and digital‑wallet ecosystem. Because of this, the consequences of this vulnerability are especially serious. Before notifying the vendor, Microsoft confirmed the flaw in multiple apps on the Google Play Store.

The affected wallet applications alone accounted for more than 30 million installations, and when including additional non‑wallet apps built on the same SDK, the total exposure climbed to over 50 million installations.  

Disclosure timeline

Microsoft initially identified the vulnerability in version 4.5.4 of the EngageLab SDK. Following Coordinated Vulnerability Disclosure (CVD) practices through Microsoft Security Vulnerability Research (MSVR), the issue was reported to EngageLab in April 2025. Additionally, Microsoft notified the Android Security Team because the affected apps were distributed through the Google Play Store.  

EngageLab addressed the vulnerability in version 5.2.1, released on November 3, 2025. In the fixed version, the vulnerable activity is set to non-exported, which prevents it from being invoked by other apps. 

  • April 2025: Vulnerability identified in EngageLab SDK v4.5.4; issue reported to EngageLab.
  • May 2025: Issue escalated to the Android Security Team for affected applications distributed through the Google Play Store.
  • November 3, 2025: EngageLab released v5.2.1, addressing the vulnerability.

Mitigation and protection guidance

Android developers utilizing the EngageLab SDK are strongly advised to upgrade to the latest version promptly. 

Our research indicates that integrating external libraries can inadvertently introduce features or components that may compromise application security. Specifically, adding an exported component to the merged Android manifest could be unintentionally overlooked, resulting in potential attack surfaces. To keep your apps secure, always review the merged Android manifest, especially when you incorporate third‑party SDKs. This helps you identify any components or permissions that might affect your app’s security or behavior.
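As a rough sketch of such a review, a script can parse the merged manifest and list exported components. This is a minimal illustration (single file, no intent-filter analysis), not a full audit:

```python
import xml.etree.ElementTree as ET

# Standard Android resource namespace used in manifest attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def exported_components(manifest_path: str):
    """Return (tag, name) pairs for components marked android:exported="true"
    in a merged AndroidManifest.xml."""
    root = ET.parse(manifest_path).getroot()
    found = []
    for tag in ("activity", "service", "receiver", "provider"):
        for elem in root.iter(tag):
            if elem.get(ANDROID_NS + "exported") == "true":
                found.append((tag, elem.get(ANDROID_NS + "name")))
    return found
```

Running this against the merged (not source) manifest surfaces components, like MTCommonActivity, that a library adds at build time.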

Keep your users and applications secure

Strengthening mobile‑app defenses doesn’t end with understanding this vulnerability.

Take the next step: 

Learn more about Microsoft’s Security Vulnerability Research (MSVR) program at https://www.microsoft.com/en-us/msrc/msvr

References

[1] Mayrhofer, René, Jeffrey Vander Stoep, Chad Brubaker, Dianne Hackborn, Bram Bonné, Güliz Seray Tuncay, Roger Piqueras Jover, and Michael A. Specter. “The Android Platform Security Model (2023).” arXiv:1904.05572. https://doi.org/10.48550/arXiv.1904.05572.

[2] https://developer.android.com/guide/components/intents-filters  

[3] https://support.google.com/faqs/answer/9267555?hl=en  

[4] https://www.engagelab.com/docs/  

[5] https://developer.android.com/reference/android/content/Intent#FLAG_GRANT_PERSISTABLE_URI_PERMISSION 

[6] https://developer.android.com/reference/android/content/Intent#URI_ALLOW_UNSAFE

This research is provided by Microsoft Defender Security Research with contributions from Dimitrios Valsamaras and other members of Microsoft Threat Intelligence.

Learn more

Review our documentation to learn more about our real-time protection capabilities and see how to enable them within your organization.   

The post Intent redirection vulnerability in third-party SDK exposed millions of Android wallets to potential risk appeared first on Microsoft Security Blog.


Gen Z needs a knowledge base (and so do you)

AI tool use is inescapable, especially if you're a young person trying to get an edge in an increasingly difficult job market. But cognitive offloading is dangerous, no matter what age you are. Building a knowledge base can save your brain and skills from atrophy.

Ex-Meta worker investigated for downloading 30,000 private Facebook photos

Laura Cress reports: A former Meta employee suspected of downloading around 30,000 private images of Facebook users is being investigated by the Metropolitan Police. The engineer, who lives in London, is believed to have designed a program to be able to access personal pictures on the site while avoiding security checks. A Meta spokesperson told...

Source


A hacker has allegedly breached one of China’s supercomputers and is attempting to sell a trove of stolen data

Isaac Yee reports: A hacker has allegedly stolen a massive trove of sensitive data – including highly classified defense documents and missile schematics – from a state-run Chinese supercomputer in what could potentially constitute the largest known heist of data from China. The dataset, which allegedly contains more than 10 petabytes of sensitive information, is believed...

Source


Architecture as Code to Teach Humans and Agents About Architecture


A funny thing happened on the way to writing our book Architecture as Code—the entire industry shifted. Generally, we write books iteratively—starting with a seed of an idea, then developing it through workshops, conference presentations, online classes, and so on. That’s exactly what we did about a year ago with our Architecture as Code book. We started with the concept of describing all the ways that software architecture intersects with other parts of the software development ecosystem: data, engineering practices, team topologies, and more—nine in total—in code, as a way of creating a fast feedback loop for architects to react to changes in architecture. In other words, we’re documenting the architecture through code, defining the structure and constraints we want to guide the implementation through.

For example, an architect can define a set of components via a diagram, along with their dependencies and relationships. That design reflects careful thought about coupling, cohesion, and a host of other structural concerns. However, when they turn that diagram over to a team to develop it, how can they be sure the team will implement it correctly? By defining the components in code (with verifications), the architect can both illustrate and get feedback on the design. However, we recognize that architects don’t have a crystal ball, and design should sometimes change to reflect implementation. When a developer adds a new component, it isn’t necessarily an error but rather feedback that an architect needs to know about. This isn’t a testing framework; it’s a feedback framework. When a new component appears, the architect should know so that they can assess: Should the component be there? Perhaps it was missed in the design. If so, how does that affect other components? Having the structure of your architecture defined as code allows deterministic feedback on structural integrity.
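As a minimal illustration of that feedback loop (the component names and rules are invented here; this is not the book’s framework), a declared dependency map can be checked against what is observed in the codebase:

```python
# The architect's declaration: each component and the dependencies it may have.
ALLOWED = {
    "web": {"service"},
    "service": {"repository"},
    "repository": set(),
}

def check_dependencies(observed: dict) -> list:
    """Compare observed component dependencies against the declaration and
    return feedback items: unknown components and undeclared edges."""
    feedback = []
    for component, deps in observed.items():
        if component not in ALLOWED:
            feedback.append(f"new component: {component}")
            continue
        for dep in sorted(deps - ALLOWED[component]):
            feedback.append(f"undeclared dependency: {component} -> {dep}")
    return feedback
```

Note that the output is framed as feedback to review, not as test failures: a new component may be a design omission rather than a developer error.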

This capability is useful for developers. We defined these intersections as a way of describing all the different aspects of architecture in a deterministic way. Then agents arrived.

Agentic AI shows new capabilities in software architecture, including the ability to work toward a solution as long as deterministic constraints exist. Suddenly, developers and architects are trying to build ways for agents to determine success, which requires a deterministic method of defining these important constraints: Architecture as Code.

An increasingly common practice in agentic AI is separating foundational constraints from desired behavior. For example, part of the context or guardrails developers use for code generation can include concrete architectural constraints around code structure, complexity, coupling, cohesion, and a host of other measurable things. Because architects can objectively define what an acceptable architecture is, they can build inviolable rules for agents to adhere to. For example, a problem that is gradually improving is the tendency of LLMs to use brute force to solve problems. If you ask for an algorithm that touches all 50 US states, it might build a 50-case switch statement. However, if one of the architect’s foundational rules for code generation puts a limit on cyclomatic complexity, then the agent will have to find a way to generate code within that constraint.

This capability exists for all nine of the intersections we cover in Architecture as Code: implementation, engineering practices, infrastructure, generative AI, team topologies, business concerns, enterprise architecture, data, and integration architecture.

Increasingly, we see the job of developers, and especially architects, as precisely and objectively defining architecture, and we built a framework for both how and where to do it in Architecture as Code, coming soon!


