Disabling the sandbox for the rest of the session now takes effect immediately, so shell and search commands stop re-prompting to bypass it mid-turn
Subagent sessions keep parent tool restrictions
Show warnings and errors when host custom agents fail to load
Require session limits to be at least 30 AI credits
Add Claude Sonnet 5 as a supported model
Allow tool calls to continue when hooks time out
Ctrl+Q now enqueues the highlighted slash-command argument completion
MCP OAuth against Microsoft Entra servers behind a tenant vanity domain (e.g. Copilot Studio) no longer fails to refresh or re-authenticate (AADSTS9010010 / AADSTS90023)
Prompt mode exit summary shows a resume hint to continue the session
After publicly touting pull request limits as a way to cut maintainer noise, GitHub is taking the same idea further with a new setting that lets repository admins restrict issue creation to collaborators only.
Every day, GitHub engineers introduce new dependencies into the GitHub platform, internal applications, and open source projects. GitHub is not just the home of open source; it is powered by open source! And an important part of using open source responsibly is respecting the licenses that govern the projects you depend on.
At GitHub, we are committed to upholding our obligations to the open source community and to the dependencies we use. Here’s how our Open Source Program Office (OSPO) uses the new GitHub License Compliance feature to manage thousands of dependencies.
Managing the open source license compliance process
Nearly all software carries some kind of license agreement. The license gives you permission to use a project, provided you comply with its obligations. Those obligations may be as simple as giving credit to the original author in your documentation, or they may require you to distribute all your source code when shipping your program. In some cases, licenses may also restrict certain activities or categories of use.
Your organization likely has its own policies about acceptable licenses based on your business model, software ecosystem, and distribution strategy. For example, suppose your organization sells a commercial, closed source binary application. You may want to prevent dependencies that would require you to open source your proprietary code.
Or, you may have a project that you plan to release as an open source package. In this case, you may want to avoid including dependencies governed by commercial or incompatible open source licenses.
If you can’t comply with the obligations required in either scenario, you should avoid the dependency to prevent legal or operational risks. It may require engineering effort to remove these licenses after the fact. For enterprise software, the business risk of noncompliance is huge because it can lead to costly litigation and reputational damage.
Traditionally, license reviews have been performed manually or with third-party software. But now, GitHub has introduced a license compliance feature for GitHub Advanced Security customers, enabling you to review new dependencies directly on pull requests. This review helps ensure that the licenses for those dependencies’ comply with your policy, while also giving you the flexibility to expand your policy to allow new licenses or individual projects.
Two months ago, GitHub’s OSPO migrated from internal-only tools that we’d built to manage compliance onto the new feature. As early adopters, we gave the development team quick feedback and helped ensure the feature would clear the bar for large, fast-moving enterprises with complex compliance requirements.
Setting up for policy success
Because GitHub had built internal license compliance tools prior to the introduction of the product, we had an existing list of acceptable licenses to use as our initial policy. You’ll likely find that many dependencies use common permissive licenses such as MIT, Apache 2.0, and BSD-3-Clause, which are a good starting list to seed your policy. We initially rolled the feature out using the “Evaluate” mode on an organization-wide ruleset, which generated annotations in pull requests without blocking merges, so we were able to get developers accustomed to the new workflow without impeding their productivity. Running the old and new tools in parallel also let us see if their behavior diverged. After about a month of this mode of operation, we got to a state where the alerts were mainly on packages with unusual, missing, or explicitly disallowed licenses.
How GitHub license compliance works
Under the hood, license compliance checks are enabled via rulesets. We target repositories via a custom property, where the value of the property determines whether license checks are enabled in “Active” or “Evaluate” mode. In repositories that are targeted by a ruleset, pull requests that modify a project’s dependencies trigger a scan that looks up the licenses used by each of the new dependencies. If the new dependencies’ licenses are already permitted, or there are package-specific exceptions, the checks pass. If there are failures, either in the direct or transitive dependencies, the tool comments on the pull request with alerts for each problematic package.
The developer then reviews the alerts. If they decide the dependency is unacceptable, they can update their code or close the pull request to remove it. If they believe the license or package should be allowed, they can raise an exception request which will notify a specific team in the organization who can decide whether and how to amend the policy.
A day in the life of the license policy team
GitHub’s license policy team consists of OSPO members and engineers with expertise in license reviews and supply chain analysis. Since we are a worldwide company, our policy review team has members across time zones to review alerts in a timely manner. We are in the process of formalizing an SLA for reviewing license requests, but in practice it’s rarely more than a couple of hours before we can triage an incoming request.
Team members receive email notifications of new review requests and can also access a dashboard to see the backlog of pending requests.
When approving a request, we have two decision points: first, whether to permit the license or the package. Then, decide what scope – enterprise or repository – to use. If it’s a safe license that simply hasn’t shown up before, we’ll add it at the enterprise level and thus allow dependencies with that license anywhere at GitHub. Some packages carry a commercial license which can’t be permitted everywhere but should be allowed in the repository owned by a team which has paid for the software, so those policy amendments get added at the repository level. Package exceptions are useful for internal software which usually doesn’t have license data associated with it. Helpfully, the tool supports wildcard matches for package exceptions. For example, we’ve permitted everything in the @github-ui/* React namespace, so we don’t need to approve those packages one by one.
Making it easy for developers
To support this process, we’ve established procedures about contacting the GitHub OSPO, and how to use an emergency “break glass” override. These situations should be rare, but a clear emergency override process is essential for critically time-sensitive pull requests. As we mentioned above, the license policy enforcement happens via ruleset, and the ruleset condition keys off a custom property. So toggling the value of the property can temporarily turn off enforcement if there’s a critical fix that’s blocked by a license alert. So far, we’ve only needed to use this once, but it was very helpful to have the option.
We’ve also provided internal documentation and training to help developers understand the importance of license compliance. Ultimately, it’s everyone’s job to help ensure compliance and manage risk and it’s our job to make that as easy as possible.
Wrapping up
License compliance is a critical part of managing our software supply chain. By helping developers make informed dependency choices aligned with GitHub’s license policy we prevent costly rewrites and potential legal problems. We’ve been enthusiastically using and providing feedback on the new GitHub License Compliance feature for several months. Now that it’s in public preview, we are excited to see more companies adopt it and hope our experience provides some guidance if you’re just getting started.
GitHub Enterprise Cloud customers can use the License Compliance feature across repositories which have an active GHAS Code Security license. For more information, see About open source license compliance.
SkiaSharp 4.148.0 is the first stable v4 release, bringing a newer Skia engine, API cleanup, performance work and a Microsoft-Uno co-maintenance model.
The practice of tokenmaxxing appears to be dying out, even before I had a chance to write about it. Good riddance. Burning tokens to create the appearance of productivity was fated to last only until the accountants learned about it, and the strictest of all accountants is one’s personal checkbook. What got many developers thinking about the cost of AI was the change in GitHub Copilot’s usage charges. The cost of Copilot went from a monthly fee with unlimited use to a monthly fee that purchased a limited number of credits, which are used to pay the AI provider of your choice. One credit is equivalent to US$0.01; when you’ve used up your credits, you can upgrade your account or pay for additional credits as you go.
The question isn’t why this didn’t happen earlier; it’s why this happened now. Tokenmaxxing is both the creation and victim of two large-scale trends in AI. First, starting with OpenAI, the major AI providers were all playing a blitzscaling game that prioritized user growth over profitability. Giving AI services away for free got you more users, and in the long run, scalers would figure out how to make money from end-user fees, selling user data, or advertising. This process inevitably ends in enshittification, and is still very much the road we’re on.
Second, token usage exploded late in 2025. The appearance of “reasoning models,” which use tokens to maintain an internal dialog in the course of solving a problem, increased the number of tokens used to respond to each prompt. Reasoning tokens are a model’s conversation with itself about possible responses to the prompt, and are often more numerous than the prompt and response themselves. Whether or not users see the reasoning process (often they don’t), reasoning tokens add to the bill. They are frequently counted as “output tokens” because they are generated by the model, and are more expensive than input tokens.
The appearance of agents also multiplied the rate at which users consumed tokens. In May, 2025, Simon Willison quoted Anthropic’s Hannah Moran’s definition of an agent: “Agents are models using tools in a loop.” The Tredence blog writes: “The agent loop is a repeating cycle in which the AI reads the current data, thinks through what it means, chooses an action, carries it out, checks what happens and starts over.” If you’ve ever watched Claude Code, OpenClaw, or any other agent work, a single request can become many calls to a model, each one using hundreds of tokens, if not thousands. In addition to the current request, one agent-generated invocation can contain the task’s entire accumulated context and relevant documents. Between reasoning tokens and agents, token usage goes up by a factor of hundreds.
The increase in token usage might not be an issue if it results in problems being solved and tasks completed more effectively. But it collides with the loss-leader pricing of the blitzscalers; their willingness to operate at a loss to gain control of a market has limits. Regardless of whether the number of AI users is increasing, the amount of computation, and therefore cost, per user grows as the use of agents increases. Reasoning models increased token usage; agents compounded the problem; and that led to price increases.1 Microsoft/GitHub doesn’t want to pay Copilot customers’ AI bills. We haven’t yet seen across-the-board price increases from the AI providers themselves. But we have seen GitHub’s token credits, and we have seen Anthropic and OpenAI price more capable models significantly higher than older or less capable models. Fable is twice as expensive as Opus 4.8, and while some writers have called this pricing “fantastic,” that’s probably because they were expecting an even greater increase. While Fable can delegate tasks to Anthropic’s less expensive models, most early users observe that with Fable, token use goes up rather than down. Anthropic’s switch to token-based billing for its agent SDK (currently on hold) is another signal that the days of inexpensive AI are coming to an end. OpenAI’s story is similar: GPT 5.5 costs twice as much GPT 5.4 per million tokens.
It’s also important to take capacity into account. Huge data centers have been in the news, but those data centers haven’t been built yet. More important, the electrical infrastructure needed to support those data centers—transmission lines, generators—hasn’t been built either, and that’s not an investment over which AI companies have much control. They can build their own power generation facilities on a data center campus, but that’s a huge investment in technologies that they’re not familiar with. And even if you generate power locally, you need other kinds of infrastructure: rail for coal, pipelines for gas. This isn’t (yet) an essay about data center power consumption and its consequences, but it is another factor that limits increased token usage. We’ve seen Anthropic’s outages blamed on capacity, and Anthropic has responded by leasing unused data center capacity from SpaceX. But the other way to respond to increased demand that can’t be met by current capacity is to increase prices, limiting customers to those who can afford to pay. That increase is being noticed by managers, accountants, and independent developers.
Token optimization and accountability are the inevitable consequence of upward pressure on token price. One way to build accountability is through better governance, a route Bennie Haelen describes in “The Subsidy Ended: What Tool-Using Agents Actually Cost.” Better governance is achieved through building an observability layer that lets you see exactly what the agents and models are doing. With a well-designed observability layer, you can see whether the data sent to the model is growing with each invocation, whether the model is using appropriate tools, whether tools are being called repeatedly, and a lot of other information that will tell you whether your agent is running efficiently.
Another piece of token accountability is understanding which models are running your agent’s requests. General-purpose reasoning models range from expensive high-performance models like Claude Fable or Opus 4.8 to models like Gemma 4 26B that can run on a well-equipped laptop, and some models that are even smaller. While it’s tempting to say “I need the best; I’ll run Opus 4.8 or Fable with maximum reasoning,” most requests don’t require that level of reasoning or expense. Agents will be able to decide what model is best for processing every request. Fable can delegate, and we expect other frontier providers to follow as models incorporate agent capabilities. And there’s an active world of open models outside of the frontier AI providers. Vicki Boykis writes that models running locally now work almost as well as frontier models. Tools like OpenRouter give you a model-independent way of routing requests to different models, including open models that run locally. OpenRouter can be integrated with OpenClaw, Claude Code, Cursor, Codex, and other agents to provide intelligent routing.
Tokenmaxxing is dying. It will no doubt take time for its vestiges to die away, and there will always be developers who think they can game the path to a promotion, along with managers who insist on being “all in” with AI. But spending tokens responsibly is now the norm, whether you pay with your own checkbook or a company account. Token optimization will only become more important as per-token charges increase. They undoubtedly will.
Footnotes
Some articles make the strange claim that tokens have gotten cheaper by up to 98%. GPT-5.5 suggests that these writers are considering the work that can be done per token. That comparison may be worthwhile, though it’s unclear how to compare GPT-3 with 5.5 or Fable meaningfully. For this article, a token is a token. ︎
As enterprise deployments mature, some enterprise AI agents are shifting from reading content to taking action. In this post, Microsoft Incident Response walks through an attack pattern that targets the fastest growing part of the agentic AI supply chain: Model Context Protocol (MCP) tools. The post provides a practical playbook for detecting, containing, and preventing this class of attack using Microsoft security controls.
AI agents can plan multi-step tasks, decide which tools to invoke, and execute actions on behalf of the user. Microsoft 365 Copilot can draft and send email, create documents, and update calendar entries. Copilot Studio and Azure AI Foundry allow organizations to build custom agents that connect to business systems through MCP. As AI is increasingly used in read-write workflows, the impact profile of vulnerabilities may shift. A prompt injection against a summarizer can bias an output. A prompt injection against an agent can trigger an action.
According to the International Data Corporation (IDC), the number of active AI agents in enterprises is projected to grow from 28.6 million in 2025 to more than 2.2 billion by 2030. That scale is why the OWASP Top 10 for Agentic Applications, released in December 2025, now sits alongside the LLM Top 10 as a reference framework for defenders. This post focuses on one of its fastest-moving categories: tool misuse and agentic supply chain risk exploited through poisoned MCP tool metadata.
Attack pattern: MCP tool poisoning in a finance workflow
The pattern below maps to ASI02 – Tool Misuse and ASI04 – Agentic Supply Chain Vulnerabilities. It reflects techniques first disclosed by Invariant Labs in April 2025 and observed in 2026 against a growing range of enterprise agents.
The environment
A financial operations team builds a Copilot Studio agent to help analysts handle vendor invoices. The agent has generative orchestration enabled and connects to three tools: a Dataverse MCP server holding the approved vendor master, an Outlook connector for vendor correspondence, and a third-party invoice enrichment MCP server added to validate banking details against an external reference database. The third-party server is reviewed by the team’s service owner lead and approved for production use. No separate security review is performed.
Attack chain overview
Phase 1: Tool description poisoning. A developer pushes an update to the enrichment server. The tool name and user-facing summary remain unchanged, but the MCP tool description is silently modified. This description is the natural-language metadata the agent reads to decide how and when to call the tool. Buried within what appears to be legitimate formatting guidance is a hidden block of instructions directing the agent to retrieve the last thirty unpaid invoices, summarize them, and attach that summary as an additional parameter in the enrichment call—framed as a fraud-heuristic requirement.
Phase 2: Silent re-trust.The MCP reflects tool metadata updates dynamically. In configurations where description changes do not trigger a re-approval workflow, the updated instructions become active without additional review. The poisoned description is live in production.
Phase 3: User invocation. A financial analyst asks the agent a routine question about a supplier. Without any visible indication, the agent follows the hidden instructions embedded in the poisoned tool description, collecting sensitive financial records beyond the scope of the original request and forwarding them as part of the enrichment call, as if it were a normal part of the request.
Phase 4: Exfiltration. The enrichment server returns a plausible “validated” response and silently logs the attached invoice summary to a threat actor-controlled endpoint. The analyst sees a clean answer. No alert may fire in default configurations. Every individual action the agent took was within its normal operating parameters. This pattern does not exploit a vulnerability in Copilot itself, but rather a trust boundary introduced by external tool integrations.
Figure 1:Attack flow for MCP tool poisoning of a Copilot Studio agent, with Microsoft controls mapped to each stage.
Why this pattern is effective
Each action the agent takes on its own is legitimate. The tool is approved, the Dataverse query inherits the analyst’s permissions, and the outbound call goes to a server that was allowlisted when it was added. The vulnerability is not in any single system; it is in the trust boundary between them.The MCP blends instructions (tool descriptions) with data, so a change to a tool’s metadata can redirect the agent’s behavior as effectively as a change to its system prompt. The agent cannot distinguish between a legitimate instruction authored by its owner and a malicious instruction inserted by an upstream maintainer.
Mitigation and protection guidance
Detection and response with Microsoft security tools
The controls mapped in Figure 1 apply at four points in the attack chain, each supported by a specific Microsoft capability:
Govern the supply chain. Maintain a tenant-level allowlist of approved MCP publishers and servers. The Microsoft MCP catalog provides a list of first-party servers, review and assess where provenance is verifiable. Disable Allow all on MCP connections and enable only the specific tools an agent needs.
Inspect tool metadata. Use Prompt Shields in Azure AI Content Safety to inspect content flowing from MCP tool responses and descriptions into agent context. Defender for Cloud’s AI workload protection alerts on suspicious prompts and tool outputs at runtime. Review metadata changes to production tools with the same rigor as changes to system prompts.
Guard the action. Microsoft Purview Data Loss Prevention (DLP) policies inspect tool call parameters and can block sensitive data in outbound payloads. For high-impact actions such as financial data access, external sharing, or account changes, configure human-in-the-loop approval through Copilot Studio. Assign each agent a non-human identity in Microsoft Entra Agent ID and apply Conditional Access to its workload identity.
Correlate the chain. When MCP server telemetry is instrumented and forwarded to Microsoft Sentinel, it can be correlated against agent behavior signals to flag anomalous sequences. Microsoft Defender for Cloud Apps surfaces new external endpoints an agent has started interacting with. Microsoft Purview audit logs provide the evidence trail for investigation and post-incident review.
Three principles for agent supply chain governance
Treat every MCP server as part of the supply chain. Every MCP server an agent can call is a production dependency. Maintain an inventory of approved publishers, review tool descriptions during security review rather than relying on tool names alone, and require a documented owner for any third-party server before production use.
Treat tool descriptions as system prompts. Because models can read tool metadata as part of their working context, a change to that metadata is equivalent to a change in agent instructions. Require change review for tool description updates on critical agents and use Prompt Shields to inspect metadata for imperative language that does not belong in a documentation field.
Apply least agency, not just least privilege. There are important factors to consider for permissions. Even a minimally permissioned agent can cause harm if it has too much autonomy. Turn off Allow all tool access, require human approval for high-impact actions, and establish baseline agent behaviors in Microsoft Sentinel so that deviations from the norm—such as new endpoints, expanded parameters, or unusual query patterns—trigger alerts.
Conclusion
Agents that act on behalf of users depend on a supply chain of tools that is growing as governance programs continue to evolve. A threat actor who modifies a tool description may influence agents that rely on it, even without directly involving a user, a prompt, or a credential. The OWASP Top 10 for Agentic Applications provides the framework.
Microsoft security capabilities—including Copilot Studio guardrails, Prompt Shields, Defender for Cloud AI Protection, Microsoft Entra Agent ID, Microsoft Purview DLP, Microsoft Defender for Cloud Apps, and Microsoft Sentinel—provide the controls. What remains is to apply them deliberately to agentic workflows: scope permissions, govern the tool supply chain, monitor agent behavior, and perform red teaming exercises before deployment.
Microsoft follows coordinated disclosure practices and is not disclosing details of any specific affected organization.
This research is provided by Microsoft Defender Security Research, Mohammed Zaid, and with contributions from members of Microsoft Threat Intelligence.
To hear stories and insights from the Microsoft Threat Intelligence community about the ever-evolving threat landscape, listen to the Microsoft Threat Intelligence podcast.
Review our documentation to learn more about our real-time protection capabilities and see how to enable them within your organization.