
Bob. Clippy. Cortana. Copilot. Microsoft has been trying to unlock the personal-assistant puzzle for decades. Now a fledgling team inside the company that’s been experimenting with OpenClaw — an open-source framework that acts as both a virtual assistant and a platform for building and managing proactive agents — is taking a stab at the problem.
That team, headed by Corporate Vice President Omar Shahine, already has a working agent prototype and, as of May 1, more than 3,000 daily users inside Microsoft testing “Project Lobster,” the team’s OpenClaw-based desktop environment, up from 100 the previous week.
Not bad for a technology that CEO Satya Nadella dismissed as a security risk akin to “a virus” just a few months ago. A number of other companies, including OpenAI and NVIDIA, are also rushing to integrate the technology with their own.

The vision of Shahine’s team is to create “an always-on agent team (a Chief of Staff agent, an Executive Assistant agent, and a roster of specialist agents) that works 24/7 on your behalf within the Microsoft 365 ecosystem,” as he described it in a blog post.
It’s a “persistent runtime that monitors your signals continuously, prepares your day before you wake up, triages your inbox while you’re in meetings, and follows up on action items without being asked,” he explained.
OpenClaw, developed by Peter Steinberger (who, as of Feb. 2026, works for OpenAI) has only been publicly available since Nov. 2025, originally under the name Clawdbot.
Shahine had been dabbling with OpenClaw since earlier this year to automate tasks at home, such as drafting an email or investigating concert-ticket prices. He demonstrated how Lobster works during a presentation to Microsoft’s AI Accelerator group on Feb. 26. And by March 31, he had a new role at Microsoft: to bring OpenClaw and personal agents to Microsoft 365.
Microsoft has recently made forays into the autonomous-agent space with Copilot Tasks, an agent in preview for consumers that is designed to help with chores like triaging email and booking travel. On the business side, Microsoft is integrating Anthropic’s Cowork technology with Microsoft 365 Copilot in the form of “Claude Cowork,” which takes action inside the various Microsoft Office apps.
But neither of these approaches provides a virtual assistant working on users’ behalf 24/7 with access to people’s full, real lives, Shahine maintains. They can’t do things like order from DoorDash if a user is in back-to-back meetings or reschedule a call if it interferes with a family dinner. That gap is why he decided to target knowledge workers, he says.
Shahine’s team, known as Ocean 11, includes a handful of people, each running their own Lobster agent. The team is building out the runtime and supporting infrastructure needed to make Lobster work in an enterprise environment.
As Lobster is currently envisioned, it will work across Microsoft 365 and a wide range of other apps and data sources. It won’t need constant prompting; instead, it will suggest courses of action it can take, pending user approval.
And this is why Nadella and other security-minded professionals have qualms about OpenClaw: It works autonomously, can ingest untrusted inputs, maintains persistent credentials, and could turn prompt-injection attacks into action-injection attacks.
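The approval-gated design described above (agents suggesting actions, pending user approval) is one common mitigation for action injection. Below is a minimal sketch of that pattern — not Microsoft’s implementation; the action names and the `approve` callback are invented for illustration:

```python
# Hypothetical human-approval gate between an agent's proposed action and
# its execution. RISKY_ACTIONS and the action names are invented examples.

RISKY_ACTIONS = {"send_email", "delete_file", "make_purchase"}

def execute(action, args, approve):
    """Run an action only if it is low-risk or explicitly approved.

    `approve` is a callback (standing in for a UI prompt) that returns
    True or False; risky actions are blocked unless it says yes.
    """
    if action in RISKY_ACTIONS and not approve(action, args):
        return ("blocked", action)
    return ("executed", action)

# Example: an injected instruction asks the agent to exfiltrate mail,
# and the user (here, a callback that always declines) blocks it.
result = execute("send_email", {"to": "attacker@example.com"},
                 approve=lambda action, args: False)
```

The point of the gate is that a prompt injection can still *propose* a harmful action, but it cannot *execute* one without a human in the loop.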
Microsoft’s own Defender security team’s current guidance states: “OpenClaw should be treated as untrusted code execution with persistent credentials. It is not appropriate to run on a standard personal or enterprise workstation.”
In an interview, Shahine acknowledged that enterprise-hardening Microsoft’s OpenClaw-based offerings needs to be job No. 1. His team is designing prototype agents to have their own Microsoft 365 identities, meaning their own Entra IDs for governance, their own Exchange mailbox, their own Teams presence, and integration with the Microsoft Graph.
“My goal is to contribute to make OpenClaw better but also consume it and run it so that it’s also a reference design, reference pattern that people can look to and say, ‘Well, you know, it’s great. Microsoft figured out how to make this thing enterprise great,’” he said.
Shahine wasn’t ready to talk timetables or deliverables, beyond the Teams plug-in available for OpenClaw. But the team already has developed a Mac and Windows desktop environment called ClawPilot (no relation to clawpilot.ai) that it’s using internally to work with “claw-like agentic workflows.” Shahine said ClawPilot is acting as his personal assistant and goes by “Sebastien” (a nod to “The Little Mermaid”).
Microsoft Vice President Scott Hanselman has built a Windows node for OpenClaw which could get some airtime at Microsoft’s upcoming Build developer conference in San Francisco in June. Shahine said “there will be some concrete information about how we’re working to make Windows a fantastic environment for OpenClaw and other agentic systems to operate.”
Welcome to Hit Subscribe’s Monthly Digest! In this edition, we’re excited to share a collection of recent blog posts we’ve written for our clients. Plus, stick around till the end—we’ve included a meme of the month to keep things fun!
If you’ve been using LLMs lately, you’ve probably seen how easy it is to connect them to your stack—cloud tools, ticketing systems, and internal APIs—with just a bit of setup. The issue is that this also makes it easy to accidentally expose sensitive systems. We’re already seeing failures like OpenClaw-style incidents where overly permissive agent connections led to serious data loss, including things like wiped email inboxes.
AI is a double-edged sword: it can dramatically improve workflows or just as easily scale mistakes when agents start calling tools without clear boundaries.
In this post, I’ll break down MCP Server Security, why it matters, and how to run MCP servers more safely in real-world environments.
AI is getting better at writing code and analyzing systems, but without a structured way to connect it to your tools, integrations can become fragile, insecure, and hard to scale.
The Model Context Protocol (MCP) solves this by providing a standardized bridge between AI and external systems. At the core of this architecture are two components: the MCP server and the MCP client.
This article explains how MCP servers and clients work together to enable secure, efficient AI workflows—and how Tricentis uses MCP to make integrations more reliable at enterprise scale.
If you want AI to speed up your testing without causing mistakes or breaking things, connecting to an MCP server is the way to go.
This guide shows how to safely link AI clients to your tools, manage context, and get work done—like creating test cases, updating test plans, and organizing test data—without risking errors or lost information.
Agentic Quality Assurance: A Guide to AI-Driven QA
Quality assurance is changing. As software systems become more complicated and companies release updates faster, the old ways of testing software aren’t keeping up.
Agentic AI is helping quality assurance keep pace by allowing testing systems to make smart decisions on their own instead of relying only on fixed instructions.
In this guide, you’ll learn what agentic quality assurance means, why it is important, best practices, and how teams are using it in real-world situations.
Test management becomes harder as teams scale. Backlogs grow, release cycles speed up, and test data becomes scattered across tools. Test managers often spend more time triaging and prioritizing than focusing on quality outcomes, and manual decision-making starts to break down at scale.
Agentic AI offers a new approach. Instead of relying on humans for every decision, agentic test management lets AI agents analyze context, adapt test plans, and act in real time.
This guide explains what agentic test management is, how it differs from traditional QA, how it works, and how to apply it to your workflows.
Platforms like Replit, Lovable, and Emergent are making it easier for vibe coders to build and debug code, and the output is increasingly moving beyond the “vibe test.” This is where agentic testing comes in.
Enterprise adoption is also accelerating. A 2025 KPMG survey found that 65% of companies over $1B in revenue have moved from AI agent experimentation into active pilots.
Anthropic’s research suggests agents will evolve from handling short tasks to autonomously building and testing full systems with only periodic human oversight.
This guide explores agentic performance testing, key use cases, and how it works in practice.
MCP servers are the backbone of the Model Context Protocol, acting as the bridge between AI systems and the tools, data, and services they need to interact with. They make it possible for LLMs to securely call APIs, run actions, and access context in a structured, standardized way—without custom integrations for every tool.
In this guide, we break down what an MCP server is, how it fits into the Model Context Protocol, and why it’s becoming a key building block for connecting AI systems to tools and data in a secure, structured way.
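The “structured, standardized way” an MCP server exposes tools can be illustrated with a simplified dispatcher. This is not the official MCP SDK — real servers speak JSON-RPC 2.0 over stdio or HTTP through an SDK — and the `get_weather` tool is invented; the sketch only shows the list-then-call pattern the protocol standardizes:

```python
import json

# Simplified illustration of the two core MCP requests: "tools/list"
# (discover available tools) and "tools/call" (invoke one by name).
# The tool registry and message shapes are reduced for clarity.

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",  # hypothetical tool
}

def handle(request: str) -> str:
    msg = json.loads(request)
    if msg["method"] == "tools/list":
        result = {"tools": sorted(TOOLS)}
    elif msg["method"] == "tools/call":
        fn = TOOLS[msg["params"]["name"]]
        result = {"content": fn(**msg["params"]["arguments"])}
    else:
        result = {"error": "unknown method"}
    return json.dumps({"id": msg["id"], "result": result})

# A client discovers tools, then calls one — no custom integration needed.
reply = handle(json.dumps({"id": 1, "method": "tools/call",
                           "params": {"name": "get_weather",
                                      "arguments": {"city": "Oslo"}}}))
```

Because every tool is reached through the same two methods, the AI client needs one integration (the protocol) rather than one per tool.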
Web applications today cater to a variety of platforms, browser engines, and devices. Users across the globe may use any combination of browser, device, and platform to access an application.
Cross-browser testing helps to create a consistent behavior across different browsers on different platforms and devices for an application.
In this post, we’ll learn what cross-browser testing is, why it’s important, how to perform it, how to navigate any challenges one might run across, and explore some of the best practices.
Automated Acceptance Testing: A Guide to Get Started
There are so many types of automated software testing that learning about all of them and keeping them straight in your head is a challenge.
In this post, we’ll provide a guide on automated acceptance testing, which is essential if you want your applications to meet users’ requirements and stay that way.
Software development often feels like building a bridge while the landscape keeps shifting. Even if the code is correct, it still needs to meet the right user needs.
That’s where validation testing comes in. It ensures the software we build actually solves the problems users care about, not just that it works technically.
While developers focus on making code run, quality teams focus on making sure it delivers real value.
In this post, we explore validation testing—what it is, why it matters, and how it helps ensure the software you build not only works, but actually meets user needs.
Digital transformation is changing how businesses interact with customers. Today, software must do more than work correctly—it needs to deliver a seamless, intuitive experience.
Think of functionality as the engine of an application, and usability as the steering wheel and dashboard that guide the user. If those don’t work, the product fails to deliver value, even if the engine runs perfectly.
That’s where usability testing comes in.
In this post, we’ll explore what automated usability testing is, why it matters, and how agentic AI can enhance your usability testing framework.
If there’s one place you don’t want to rely on last-minute heroics, it’s security. Teams ship faster, attackers get more sophisticated, and the old “pen test right before release” approach no longer holds up.
Automated security testing builds guardrails directly into your delivery pipeline, giving you early and continuous feedback instead of last-minute surprises.
Done well, it helps you catch issues earlier, reduce production risk, and avoid turning every release into a security fire drill.
In this post, we explore automated security testing—what it is, why it matters in modern delivery pipelines, and how it helps teams catch issues earlier without slowing down releases.
Digital inclusion is essential in modern software. Over 1 billion people, about 17 percent of the global population, live with a disability, and digital barriers can limit access to key services like healthcare, jobs, and education.
Accessibility is often treated as a final QA step, which leads to delays and higher costs. Automated Accessibility Testing, especially when enhanced with AI, helps teams catch issues early and build more inclusive products.
This guide explores how to build a modern accessibility program with automation and AI, and why accessibility is about universal design, not just fixes.
Software teams have always cared about quality. But the way they pursue it has changed dramatically.
What started as a final-stage safety net (a QA team that caught bugs before release) has evolved into something far more strategic. Today, quality engineering (QE) sits at the heart of how modern software organizations build, ship, and scale products.
This post will break down everything you need to know: what quality engineering actually is, how it differs from traditional testing, and how teams use automation, data, and modern practices to achieve continuous quality.
User testing helps teams ensure that their digital products meet user expectations and deliver a rich and intuitive experience. However, it’s also a very time-consuming process.
To streamline usability testing workflows, ChatGPT has emerged as a valuable assistant for generating test cases, crafting research questions, and analyzing user feedback.
In this guide, you’ll learn how to use ChatGPT for usability testing, as well as follow along with a step-by-step tutorial.

Automated tests break often. A developer renames a button, a field moves, or a locator changes, and suddenly tests fail even though the app still works.
This leads to constant maintenance instead of new test coverage. Self-healing test automation solves this by automatically adapting to UI changes.
This post explains what self-healing test automation is, how it works, why teams use it, and how to start implementing it effectively.
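The core healing move — try the recorded locator, then fall back to other stable attributes — can be sketched in a few lines. The page model and attribute names below are hypothetical, standing in for whatever a real framework records:

```python
# Toy page model: a list of elements with a few attributes each. In the
# "app changed" scenario, the button's id was renamed from "btn-submit"
# to "btn-submit-v2" while its text and role stayed the same.

page = [
    {"id": "btn-submit-v2", "text": "Submit", "role": "button"},
    {"id": "nav-home", "text": "Home", "role": "link"},
]

def find(page, locator):
    """Try the recorded locator; if it fails, heal via stable attributes."""
    for el in page:
        if el["id"] == locator["id"]:
            return el, False           # found directly, no healing needed
    for el in page:                     # heal: match on text + role instead
        if el["text"] == locator["text"] and el["role"] == locator["role"]:
            return el, True            # healed: flag so the suite can update
    return None, False

el, healed = find(page, {"id": "btn-submit", "text": "Submit", "role": "button"})
```

Reporting the `healed` flag matters: the test passes today, and the suite also learns which locators need updating before they drift further.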
You ask an AI tool a complex question like, “Which test cases should I prioritize for this release?” and it returns an answer that sounds confident but with no logic that you can trace.
There’s no problem with how you chose to prompt the AI, but there can often be a problem with how the AI was asked to think. Chain-of-thought (CoT) prompting is a technique that fixes that.
This guide breaks down what CoT is, how it works, and where it shines, with real examples built for engineers and QA professionals who work with code and testing every day.
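The difference CoT makes is easiest to see in the prompt itself. Here is one way to wrap a bare question in explicit reasoning steps — the wording is illustrative, not a prescribed template:

```python
# Sketch of a chain-of-thought wrapper: instead of asking for an answer
# directly, the prompt instructs the model to show its reasoning first.

def cot_prompt(question: str) -> str:
    return (
        f"{question}\n\n"
        "Think step by step before answering:\n"
        "1. List the factors that matter for this decision.\n"
        "2. Weigh each factor against the release risk.\n"
        "3. Only then state your prioritized answer, with the reasoning shown."
    )

prompt = cot_prompt("Which test cases should I prioritize for this release?")
```

The payoff is traceability: when the model must enumerate factors before concluding, a QA engineer can check each step instead of trusting a confident-sounding one-liner.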
Modern software delivery needs automation that can adapt as systems change, but many testing workflows are still fragmented and slow.
AI-powered Model Context Protocol (MCP) servers solve this by giving AI agents a standard way to connect and coordinate testing tools and data across the pipeline.
This post explains what AI MCP servers are, what they enable in testing, and best practices for adopting them.
Modern web applications are complex, handling authentication, payments, third-party integrations, and dynamic content across devices.
As applications grow, manual testing becomes impractical. Automated web app testing helps teams verify functionality across the full stack, not just the UI.
It goes beyond checking buttons, covering workflows, APIs, backend logic, and performance.
This post explains what automated web app testing is and how agentic AI is changing how teams approach it.
Agentic AI is transforming software testing by enabling systems that can plan, execute, and analyze workflows autonomously.
A 2026 survey of 500 executives at $100M+ companies shows strong adoption plans, but scaling remains a challenge due to inconsistent outputs. MCP prompts help QA and DevOps teams bring structure and repeatability to AI-driven workflows, making results more reliable and reusable.
This guide explains MCP prompts and how to apply them in testing, including test generation, defect analysis, and regression evaluation.
Software teams all want to ship high-quality products, but “quality” often means different things to different people, from bug-free code to performance, security, or user experience.
The challenge is that quality is difficult to manage without clear definitions and measurable standards. As systems grow and release cycles accelerate, relying on intuition alone can lead to defects reaching production and increased rework.
This guide explains what software quality means, how to measure it, and how modern teams maintain it at scale.

Every software team has seen it. A small code change gets merged, tests pass, the build is green, and everything looks fine.
Then days later, a critical workflow breaks in production with no obvious connection to the change.
Change impact assessment helps prevent these failures by identifying what a change might affect before it reaches users.
This post explains what change impact assessment is, why it matters, how to do it, and how agentic technology is improving the process.
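At its core, the static half of change impact assessment is graph reachability: invert the dependency graph, then walk upward from the changed module. A minimal sketch, with made-up module names:

```python
from collections import deque

# Hypothetical module dependency graph: module -> modules it imports.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "payments": ["currency"],
    "reports": ["payments"],
    "cart": [],
    "currency": [],
}

def impacted(changed: str) -> set:
    """Everything that transitively depends on `changed` (BFS upward)."""
    # Invert the graph: module -> modules that import it.
    rdeps = {m: set() for m in DEPENDS_ON}
    for mod, deps in DEPENDS_ON.items():
        for d in deps:
            rdeps[d].add(mod)
    seen, queue = set(), deque([changed])
    while queue:
        for parent in rdeps[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# A change to "currency" ripples up through payments to checkout and reports,
# even though neither imports currency directly.
affected = impacted("currency")
```

That indirect ripple ("currency" to "checkout" via "payments") is exactly the kind of non-obvious connection that breaks in production days after a green build.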
AI coding tools are accelerating development, but QA teams are struggling to keep up.
Traditional QA relies on writing and maintaining scripts across multiple frameworks, often duplicating effort across web, mobile, and native platforms. This leads to brittle tests, frequent failures, and high maintenance overhead.
In fast-moving Agile teams, this creates too much noise and slows down real defect detection.
In this post, we’ll explore why legacy QA approaches don’t scale and how agentic functional testing helps teams keep up.
Most teams struggle with AI because their tools don’t share context, leading to duplicated effort and missed signals.
Agent orchestration solves this by coordinating AI tools so they work together as a system. Gartner projects that by 2027, 80% of enterprises will use AI-augmented testing tools, up from 10% in 2022.
In this post, we’ll explain what agent orchestration is, how it works, and why it matters for software testing.
You wrote the tests and everything passed, but the real question is whether your tests actually cover the code that matters.
Code coverage tools help answer that by showing how much of your codebase is exercised by tests, not just whether they pass. This is critical for avoiding missed edge cases that can lead to outages, security issues, or audit failures.
In this post, we break down the top code coverage tools, what they do well, where they fall short, and how to choose the right one for your stack.
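The mechanism behind line coverage is worth seeing once. Production tools (coverage.py, JaCoCo, and the like) are far more robust, but the idea — record which lines actually execute — fits in a toy tracer built on Python’s standard `sys.settrace` hook:

```python
import sys

def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

executed = set()  # line offsets within classify() that actually ran

def tracer(frame, event, arg):
    # Record each executed line of classify(), as an offset from its def line.
    if event == "line" and frame.f_code.co_name == "classify":
        executed.add(frame.f_lineno - classify.__code__.co_firstlineno)
    return tracer

sys.settrace(tracer)
classify(5)          # never takes the n < 0 branch
sys.settrace(None)

# `executed` now shows the `return "negative"` line was never reached:
# a missed edge case that "all tests pass" would hide.
```

This is the gap coverage tools expose: the tests pass, yet a whole branch went untested.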
Traditional automation works well for repeatable tasks in stable environments, but modern software delivery has outpaced static scripts and fixed workflows.
Agentic workflows offer a more flexible approach by focusing on goals instead of hardcoded steps. They can decide what to do next, call the right tools, and adapt when things change.
Gartner, as cited by Slack, expects agentic AI to enable a goal-driven digital workforce that works alongside humans.
In this post, we’ll explain what agentic workflows are, how they work, and how to implement them in a reliable, production-ready way.
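The goal-driven loop that distinguishes an agentic workflow from a fixed script can be sketched abstractly. In this toy version the “planning” step is a precondition check standing in for an LLM decision, and the tools and state keys are invented:

```python
# Minimal agentic loop: check the goal, pick an applicable tool, apply it,
# repeat. There is no hardcoded sequence of steps — order emerges from state.

def run_agent(goal, state, tools, max_steps=10):
    for _ in range(max_steps):
        if goal(state):
            return state
        # "Planning" stand-in: pick the first tool whose precondition holds.
        for precond, tool in tools:
            if precond(state):
                state = tool(state)
                break
    return state

state = run_agent(
    goal=lambda s: s.get("deployed"),
    state={"tests_passed": False},
    tools=[
        (lambda s: not s["tests_passed"],
         lambda s: {**s, "tests_passed": True}),           # "run tests"
        (lambda s: s["tests_passed"] and not s.get("deployed"),
         lambda s: {**s, "deployed": True}),               # "deploy"
    ],
)
```

Because each tool declares when it applies, the agent adapts if the state changes (say, tests start failing mid-run) instead of marching through a brittle script.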
It’s increasingly common for employees using generative AI tools to paste sensitive company data into chatbot prompts, often without realizing the risk.
This can include proprietary code, customer PII, and payment card information, sometimes even from unmanaged personal accounts outside corporate visibility.
The issue is rarely malicious intent. It’s usually convenience, like pasting a spreadsheet into ChatGPT to quickly summarize data. But it creates a new, often invisible data loss risk.
This is the problem AI-driven data loss prevention is designed to address.
Microsoft Entra ID and Okta are two leading identity and access management platforms, each taking a different approach to securing enterprise environments. Entra ID is tightly integrated with Microsoft’s ecosystem, while Okta is designed to work across a wide range of tools and cloud applications.
As companies scale their cloud and SaaS usage, choosing the right identity platform has become a key security and architecture decision.
In this post, we compare Microsoft Entra ID vs Okta and break down their strengths, trade-offs, and use cases.

Your app may run perfectly on one device but break on another, frustrating users and risking lost trust and revenue.
In today’s world of diverse platforms, users expect a flawless experience regardless of the device or platform they use to access your app. That’s why cross-platform testing is no longer optional; it’s critical.
In this guide, we’ll cover what cross-platform testing entails, why it matters, common challenges, and how AI is shaping the future of cross-platform testing.
Software companies constantly look for ways to reduce costs, and test automation is often where budgets quietly spiral out of control.
Many teams invest heavily in automation only to find that maintaining scripts consumes more time than shipping features, with regression testing taking up 40–50% of QA effort.
The good news is that most of this cost comes from predictable patterns with practical fixes.
In this guide, we’ll cover how to reduce test automation costs, including the biggest maintenance traps, what drives inefficiency, and how to improve ROI with better tools and practices.
QA engineers often spend hours rewriting scripts because a code change renamed a variable and broke the tests.
Everything still works functionally, but the tests report a failure that now needs to be investigated and fixed. This is the reality most testing teams live in. And it’s exactly the problem agentic testing is designed to solve.
In this article, we’ll take a deep dive into the world of agentic testing.
Mobile test automation has become essential as apps grow more complex and release cycles get faster. But traditional approaches often struggle with flaky tests, high maintenance, and the challenge of keeping coverage consistent across devices, platforms, and frequent UI changes.
In this guide, we’ll explore mobile test automation, how it works, why it matters, and the key strategies teams use to build reliable, scalable testing for modern mobile applications.
A recent study published in JAMA Internal Medicine shows that the number of skilled nursing facility beds decreased by 2.5% between 2019 and 2024 while operating capacity shrank by 5%.
As capacity tightens, effective bed management has taken center stage for facilities looking to optimize occupancy and protect revenue.
To understand how your facility can make the most of limited space, you need to have a clear understanding of what effective bed management looks like in practice, why it matters, and how to make it happen.
Low census is one of the most persistent operational challenges in post-acute care, directly impacting staffing, revenue, and overall facility performance. When bed occupancy drops, even slightly, it can create ripple effects across admissions, scheduling, and financial stability.
In this post, we’ll explain what low census means, why it happens, and how skilled nursing facilities can address it to improve occupancy and strengthen long-term performance.
Straight from our internal Slack channel—because memes are fun, and so are we.

Thanks for catching up with us and we’ll see you next month. In the meantime, feel free to reach out if you have any questions, want to share your thoughts, or want to talk shop!
Here are Google’s latest AI updates from April 2026
Put a capable coding model inside a developer’s primary workspace, and the IDE stops being a place where you write code. It becomes a place where you direct an agent, watch how it reasons, manage what it pays attention to, and decide when its output is worth shipping. That was the defining theme of the inaugural JetBrains x Codex Hackathon: across roughly 40 submissions over a single weekend, teams explored what it actually means to build with AI natively inside the IDE – not bolted on top of it. The six finalists came up with some of the most compelling answers.
Most coding agents call the model once and hope for the best. As Aditya puts it: “LLMs spend a lot of time thinking in circles.” Hyperreasoning replaces the single shot with something closer to a search: the system drafts several possible approaches to a task, then a learned controller decides which to expand, which to cut, and which to verify against tests. Compiler errors and failing tests feed back into how the controller weighs its options.
Inside the IDE, a tool window renders the search live, so you can watch which paths the controller explored before settling on one. The argument the project makes is that a smaller local model wrapped in this kind of verified search loop can hold its own against much larger frontier models at meaningfully lower cost — with the IDE serving as the place where reasoning becomes visible and directable, rather than a black box that returns code.
Hardware bring-up is a tool-juggling exercise: schematic viewer in one window, vendor apps for the oscilloscope and power supply in others, a terminal talking to the device, a spreadsheet collecting results. Scopecreep collapses that into a single JetBrains tool window. Hand it a circuit schematic and an agent works through testing the board – picking signals worth measuring, capturing the readings, and producing a report.
The design choice worth noticing: when the agent decides a probe needs to be placed, the session pauses and shows the engineer exactly where to put it. The engineer places the probe physically and clicks Resume. It’s the right call for real instruments on a real bench – autonomous, where a computer can be trusted, human-in-the-loop, where the work touches the physical world.
Switch machines mid-task, and your coding agent starts over. mesh-code fixes that by giving agents shared memory of an in-progress project – what’s been tried, what’s been decided, what’s still pending – so a session that begins on one laptop can continue from another, with whichever agent happens to be available. Codex is one of the agents that can plug in.
Long agent sessions accumulate dead weight: tool outputs nobody needs anymore, dead ends, context that was useful ten turns ago and isn’t now. Periscope, built on Wes McKinney’s open-source agentsview, is a JetBrains plugin that shows what’s actually filling up an agent’s working memory turn by turn – and recommends what to do about it, whether that’s continuing, rewinding to a better branching point, compacting, forking, or handing off entirely. It works with Codex and most other coding agents, and everything stays local.
Security incident response is still mostly copy-paste: stack trace into a chat window, repo context explained by hand, a fix written and committed in the hope it’s safe. SecureLoop turns that into a controlled loop inside JetBrains. When something breaks in production, the agent gathers the relevant code, the project’s security rules, and the state of its dependencies, then asks Codex for a structured diagnosis and a proposed fix. That fix runs through automated checks before any pull request opens.
The PR opens automatically. The merge does not. SecureLoop surfaces everything that informed the decision – the diff, the policy it bumped into, the test that proved the patch – inside the IDE for the developer to approve or reject. As the team put it: “Codex fully makes the PR ready for you, and it remains human-in-the-loop where you have to approve or deny.”
The team’s bigger thesis is a security-policy.md file that lives in the repo alongside README.md, spelling out a project’s specific rules for handling secrets, errors, and risky patterns. Coding agents read it before suggesting changes, so the question stops being “what’s a good fix?” and becomes “what’s an acceptable fix under this codebase’s rules?”
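A policy file only helps if it is machine-checkable. The article doesn’t specify a rule format, so the `- ban:` syntax below is invented for illustration — a sketch of how an agent pre-check might parse `security-policy.md` and gate a proposed diff on it:

```python
# Hypothetical rule syntax: lines beginning "- ban:" name substrings that
# must not appear in an agent-proposed diff. The policy content is invented.

POLICY_MD = """\
# security-policy.md
Rules for agent-proposed changes:
- ban: eval(
- ban: verify=False
"""

def parse_rules(policy_text):
    return [line.split("ban:", 1)[1].strip()
            for line in policy_text.splitlines()
            if line.strip().startswith("- ban:")]

def acceptable(diff, rules):
    """An acceptable fix under this codebase's rules violates none of them."""
    return not any(rule in diff for rule in rules)

rules = parse_rules(POLICY_MD)
ok = acceptable("result = eval(expr)", rules)   # violates the eval( rule
```

The shift is exactly the one the team describes: the check no longer asks "is this a good fix?" but "does this fix violate any rule this repo declared?"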
Frontend feedback delivered through a chat window is unavoidably vague. “Move that element” or “change that color” leaves the agent guessing which element you actually mean. Pinpoint takes that piece of the ambiguity off the table: developers drop pins directly on a live page, attach a comment to each, and send the whole batch to the agent with precise on-page context attached. The agent now knows exactly which element you meant – even if it still has to figure out what change you want.
The project ships in two pieces: one for annotating web pages in a browser, and a desktop companion for marking up anything visible on screen – useful when the interface in question isn’t a web page.
Looking across these six projects, a clear pattern emerges. Codex embedded in the IDE isn’t just a faster way to write code – it’s a reasoning layer you can watch think, a structured output engine you can direct, a participant in workflows that span hardware instruments, production alerts, shared session state, and context windows. And the IDE becomes the place where all of that comes together: visible, controllable, and version-controlled.
That’s the possibility these teams spent a weekend proving out, and it’s only the beginning.
Phishing campaigns continue to grow in sophistication, blending social engineering, delivery and hosting infrastructure, and authentication abuse to remain effective against evolving security controls. A large-scale credential theft campaign observed by Microsoft Defender Research exemplifies this trend, using code of conduct-themed lures, a multi-step attack chain, and legitimate email services to distribute fully authenticated messages from attacker-controlled domains.
The campaign targeted tens of thousands of users, primarily in the United States, and directed them through several stages of CAPTCHA and intermediate staging pages designed to reinforce legitimacy while filtering out automated defenses. The lures in this campaign used polished, enterprise-style HTML templates with structured layouts and preemptive authenticity statements, making them appear more credible than typical phishing emails and increasing their plausibility as legitimate internal communications. Because the messages contained concerning accusations and repeated time-bound action prompts, the campaign created a sense of urgency and pressure to act.
The attack chain ultimately led to a legitimate sign-in experience that was part of an adversary‑in‑the‑middle (AiTM) phishing flow, which allowed the attackers to proxy the authentication session and capture authentication tokens that could provide immediate account access. Unlike traditional credential harvesting, AiTM attacks intercept authentication traffic in real time, bypassing non-phishing-resistant multifactor authentication (MFA).
In this blog, we’re sharing our analysis of this campaign’s lures, infrastructure, and techniques. Organizations can defend against financial fraud initiated through phishing emails by educating users about phishing lures, investing in advanced anti-phishing solutions like Microsoft Defender for Office 365 and configuring essential email security settings, and encouraging users to employ web browsers that support SmartScreen. Organizations can also enable network protection, which lets Windows use SmartScreen as a host-based web proxy.
Between April 14 and 16, 2026, the Microsoft Defender Research team observed a series of sophisticated phishing campaigns targeting more than 35,000 users across over 13,000 organizations in 26 countries, with the majority of targets located in the United States (92%). The campaign did not focus on a single vertical but instead impacted a broad range of industries, most notably Healthcare & life sciences (19%), Financial services (18%), Professional services (11%), and Technology & software (11%). Messages were distributed in multiple distinct waves between 06:51 UTC on April 14 and 03:54 UTC on April 16.


Emails in this campaign posed as internal compliance or regulatory communications, using display names such as “Internal Regulatory COC”, “Workforce Communications”, and “Team Conduct Report”. Subject lines included “Internal case log issued under conduct policy” and “Reminder: employer opened a non-compliance case log”.
Message bodies claimed that a “code of conduct review” had been initiated, referenced organization-specific names embedded within the text, and instructed recipients to “open the personalized attachment” to review case materials. At the top of each message, a notice stated that the message had been “issued through an authorized internal channel” and that links and attachments had been “reviewed and approved for secure access”, reinforcing the email’s purported legitimacy. To further support the confidentiality of the supposed review, the end of each message contained a green banner stating that the contents had been encrypted using Paubox, a legitimate service associated with HIPAA-compliant communications.

Analysis of the sending infrastructure indicated that the campaign emails were sent using a legitimate email delivery service, likely originating from a cloud-hosted Windows virtual machine. The messages were sent from multiple sender addresses using domains that are likely attacker-controlled.
Each campaign email included a PDF attachment with filenames such as Awareness Case Log File – Tuesday 14th, April 2026.pdf and Disciplinary Action – Employee Device Handling Case.pdf. The attachment provided additional context about the supposed conduct review, including a summary of the review process and instructions for accessing supporting documentation. Recipients were directed to click a “Review Case Materials” link within the PDF, which initiated the credential harvesting flow.

When clicked, users were initially directed to one of two attacker-controlled domains (for example, acceptable-use-policy-calendly[.]de or compliance-protectionoutlook[.]de). These landing pages displayed a Cloudflare CAPTCHA, presented as a mechanism to validate that the user was coming “from a valid session”. This CAPTCHA likely served as a gating mechanism to impede automated analysis and sandbox detonation.

After completing the CAPTCHA, users were redirected to an intermediate site designed to prepare them for the final stage of the attack. This page informed users that the requested documentation was encrypted and required account authentication. While this stage of the attack has several hallmarks of device code phishing, we were only able to confirm the AiTM portion of the attack chain.

After clicking the provided “Review & Sign” button, users were presented with a sign-in prompt requesting their email address.

After submission, users were required to complete a second CAPTCHA involving image selection.

Once these steps were completed, users were shown a message indicating that verification was successful and that their “case” was being prepared.

Following these steps, users were redirected to a third site hosting the final stage of the attack. Analysis of the underlying code indicates that the final destination varied depending on whether the user accessed the workflow from a mobile device or a desktop system.

On the final page, users were informed that all materials related to their code of conduct review had been “securely logged”, “time-stamped”, and “maintained within the organization’s centralized compliance tracking system”. They were then prompted to schedule a time to discuss the case, which required signing in to their account.

Selecting the “Sign in with Microsoft” option redirected users to a Microsoft authentication page, initiating an AiTM session hijacking flow designed to capture authentication tokens and compromise user accounts.
Microsoft recommends mitigations to reduce the impact of this threat. Check the recommendations card for the deployment status of monitored mitigations.
Microsoft Defender customers can refer to the list of applicable detections below. Microsoft Defender coordinates detection, prevention, investigation, and response across endpoints, identities, email, and apps to provide integrated protection against attacks like the threat discussed in this blog.
| Tactic | Observed activity | Microsoft Defender coverage |
| --- | --- | --- |
| Initial access | Phishing emails | Microsoft Defender for Office 365 – A potentially malicious URL click was detected – A user clicked through to a potentially malicious URL – Suspicious email sending patterns detected – Email messages containing malicious URL removed after delivery – Email messages removed after delivery – Email reported by user as malware or phish |
| Persistence | Threat actors sign in with stolen valid entities | Microsoft Entra ID Protection – Anomalous Token – Unfamiliar sign-in properties – Unfamiliar sign-in properties for session cookies Microsoft Defender for Cloud Apps – Impossible travel activity |
Microsoft Security Copilot is embedded in Microsoft Defender and provides security teams with AI-powered capabilities to summarize incidents, analyze files and scripts, summarize identities, use guided responses, and generate device summaries, hunting queries, and incident reports.
Customers can also deploy AI agents, including Microsoft Security Copilot agents, to perform security tasks efficiently.
Security Copilot is also available as a standalone experience where customers can perform specific security-related tasks, such as incident investigation, user analysis, and vulnerability impact assessment. In addition, Security Copilot offers developer scenarios that allow customers to build, test, publish, and integrate AI agents and plugins to meet unique security needs.
Microsoft Defender XDR customers can use threat analytics reports in the Defender portal (requires a license for at least one Defender XDR product) to get the most up-to-date information about the threat actor, malicious activity, and techniques discussed in this blog. These reports provide the intelligence, protection information, and recommended actions to prevent, mitigate, or respond to associated threats found in customer environments.
Microsoft Security Copilot customers can also use the Microsoft Security Copilot integration in Microsoft Defender Threat Intelligence, either in the Security Copilot standalone portal or in the embedded experience in the Microsoft Defender portal to get more information about this threat actor.
Microsoft Defender XDR customers can run the following advanced hunting queries to find related activity in their networks:
Campaign emails by sender address
The following query identifies emails associated with this campaign using a message’s sending email address.
```kusto
EmailEvents
| where SenderMailFromAddress in ("cocpostmaster@cocinternal.com", "nationaladmin@gadellinet.com", "nationalintegrity@harteprn.com", "m365premiumcommunications@cocinternal.com", "documentviewer@na.businesshellosign.de")
```
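The same campaign can also be hunted through URL click telemetry. The following query is an illustrative addition (not part of the original report) that surfaces clicks on the attacker-controlled landing-page domains listed in the indicators below; it assumes the UrlClickEvents table is available in your Defender XDR tenant:

```kusto
// Illustrative query (our addition): find clicks on the campaign's
// landing-page domains in URL click telemetry.
UrlClickEvents
| where Url has_any ("compliance-protectionoutlook.de", "acceptable-use-policy-calendly.de")
| project Timestamp, AccountUpn, Url, ActionType, NetworkMessageId
```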
| Indicator | Type | Description | First seen | Last seen |
| --- | --- | --- | --- | --- |
| compliance-protectionoutlook[.]de | Domain | Domain hosting malicious campaign content | 2026-04-14 | 2026-04-16 |
| acceptable-use-policy-calendly[.]de | Domain | Domain hosting malicious campaign content | 2026-04-14 | 2026-04-16 |
| cocinternal[.]com | Domain | Domain hosting sender email address | 2026-04-14 | 2026-04-16 |
| gadellinet[.]com | Domain | Domain hosting sender email address | 2026-04-14 | 2026-04-16 |
| harteprn[.]com | Domain | Domain hosting sender email address | 2026-04-14 | 2026-04-16 |
| cocpostmaster[@]cocinternal.com | Email address | Email address used to send campaign emails | 2026-04-14 | 2026-04-16 |
| nationaladmin[@]gadellinet.com | Email address | Email address used to send campaign emails | 2026-04-14 | 2026-04-16 |
| nationalintegrity[@]harteprn.com | Email address | Email address used to send campaign emails | 2026-04-14 | 2026-04-16 |
| m365premiumcommunications[@]cocinternal.com | Email address | Email address used to send campaign emails | 2026-04-14 | 2026-04-16 |
| documentviewer[@]na.businesshellosign.de | Email address | Email address used to send campaign emails | 2026-04-14 | 2026-04-16 |
| Awareness Case Log File – Monday 13th, April 2026.pdf | Filename | Name of PDF attachment containing phishing link | 2026-04-14 | 2026-04-14 |
| Awareness Case Log File – Tuesday 14th, April 2026.pdf | Filename | Name of PDF attachment containing phishing link | 2026-04-15 | 2026-04-15 |
| Awareness Case Log File – Wednesday 15th, April 2026.pdf | Filename | Name of PDF attachment containing phishing link | 2026-04-16 | 2026-04-16 |
| 5DB1ECBBB2C90C51D81BDA138D4300B90EA5EB2885CCE1BD921D692214AECBC6 | SHA-256 | File hash of campaign PDF attachment | 2026-04-14 | 2026-04-16 |
| B5A3346082AC566B4494E6175F1CD9873B64ABE6C902DB49BD4E8088876C9EAD | SHA-256 | File hash of campaign PDF attachment | 2026-04-14 | 2026-04-16 |
| 11420D6D693BF8B19195E6B98FEDD03B9BCBC770B6988BC64CB788BFABE1A49D | SHA-256 | File hash of campaign PDF attachment | 2026-04-14 | 2026-04-16 |
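The file hashes in the table above can also be hunted directly on endpoints. The following query is an illustrative sketch (our addition, not from the report) that looks for the campaign PDF attachments in device file telemetry, assuming the DeviceFileEvents table is populated in your tenant:

```kusto
// Illustrative query (our addition): look for the campaign PDF attachments
// on endpoints. in~ is used because recorded hash casing may vary.
DeviceFileEvents
| where SHA256 in~ (
    "5DB1ECBBB2C90C51D81BDA138D4300B90EA5EB2885CCE1BD921D692214AECBC6",
    "B5A3346082AC566B4494E6175F1CD9873B64ABE6C902DB49BD4E8088876C9EAD",
    "11420D6D693BF8B19195E6B98FEDD03B9BCBC770B6988BC64CB788BFABE1A49D")
| project Timestamp, DeviceName, FileName, FolderPath, InitiatingProcessFileName
```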
For the latest security research from the Microsoft Threat Intelligence community, check out the Microsoft Threat Intelligence Blog.
To get notified about new publications and to join discussions on social media, follow us on LinkedIn, X (formerly Twitter), and Bluesky.
To hear stories and insights from the Microsoft Threat Intelligence community about the ever-evolving threat landscape, listen to the Microsoft Threat Intelligence podcast.
The post Breaking the code: Multi-stage ‘code of conduct’ phishing campaign leads to AiTM token compromise appeared first on Microsoft Security Blog.