Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Valve says it’s ready to launch the Steam Machine this summer

The e-paper display that Valve internally built for this Steam Machine displays system stats like CPU and GPU temperature and fan speed.

Valve now says that the delayed Steam Machine PC and Steam Frame VR headset are set to launch sometime this summer. In a Thursday blog post detailing its Verified programs for both pieces of hardware, Valve concludes by saying that "We're excited for players to try your titles on the new Steam hardware once they launch this summer."

When the company originally announced the Machine and Frame alongside its new Steam Controller late last year, it said that it would start shipping the new gadgets in early 2026. But in February, the company announced that the ongoing memory and storage crunch had forced it to revisit its pricing and shipping pl …

Read the full story at The Verge.

Read the whole story
alvinashcraft
2 hours ago
reply
Pennsylvania, USA
Share this story
Delete

VS Code 1.123 Adds Agent Session Sync, 1M Context Windows

Microsoft released Visual Studio Code 1.123 on June 3, adding agent-focused features, larger model context support, integrated browser updates and a new delay for some automatic extension updates.

Updating the taxonomy of failure modes in agentic AI systems: What a year of red teaming taught us

When the Microsoft AI Red Team published the Taxonomy of Failure Modes in Agentic AI Systems in April 2025, the goal was a shared vocabulary for a threat landscape that did not fit existing frameworks. The v1.0 taxonomy was largely forward-looking, built on practitioner interviews, cross-company threat modeling, and our own early operational experience. It identified novel failure modes unique to agentic systems (agent compromise, injection, impersonation, flow manipulation) alongside existing failure modes materially amplified in agentic contexts (memory poisoning, cross-domain prompt injection, human-in-the-loop bypass). 

Twelve months later, the evidence base has shifted enough to warrant a v2.0. The update adds seven new failure mode categories, expands the mitigations section, and grounds the framework in 12 months of red team engagements against deployed agentic systems.

Why the Taxonomy Needed Updating

Four developments drove the revision. 

Open-source agentic frameworks went mainstream faster than the security community was ready for. OpenClaw, launched in January 2026, accumulated over 336,000 GitHub stars and spawned more than 2,100 agents within 48 hours of release. A security audit conducted shortly after launch identified 512 vulnerabilities including CVE-2026-25253, a one-click RCE via WebSocket hijacking. Over 1,800 exposed instances were leaking API keys and credentials within the first week, and 336 malicious plugins were found in the skills marketplace, including credential stealers masquerading as trading bots. 

The MCP ecosystem matured — and accumulated vulnerabilities at scale. The Model Context Protocol became the de facto standard for connecting models to external tools. In 2025, 99 CVEs were published for MCP-related software, and tool poisoning moved from theoretical risk to live attack surface. 

Computer-use agents moved from research to production. Agents that observe and interact with graphical interfaces introduce attack surfaces with no analogue in earlier AI security work, and expose previously human-targeted attack patterns to LLMs. The original taxonomy lacked dedicated coverage for this capability class; operational experience made clear it requires its own category. 

Twelve months of red team operations provided empirical grounding. The v1.0 taxonomy was forward-looking. The v2.0 update is grounded in patterns observed across real engagements, with findings that confirmed some predictions, falsified others, and surfaced failure modes that were not anticipated.

Seven new failure modes

1. Agentic Supply Chain Compromise. Agentic systems consume plugin registries, MCP servers, prompt templates, and third-party tool integrations, each a new supply chain ingestion point. Unlike traditional supply chain compromise, which delivers malicious code, a compromised agentic supply chain component injects natural-language instructions that alter agent behavior without touching any binary. This is a novel failure mode: the attack surface did not exist before agents began consuming natural-language tool definitions from third-party registries. 

2. Goal Hijacking. The original taxonomy covered agent compromise but did not sufficiently distinguish the mechanism of compromise from the strategic objective of redirecting the agent’s goal state. Goal hijacking captures a specific pattern: adversarial instructions that appear aligned with legitimate task completion silently redirect the agent’s terminal goal, without fully compromising the underlying agent.

3. Inter-Agent Trust Escalation. Multi-agent architectures involve delegation chains where orchestrators pass tasks to other agents. This entry addresses privilege escalation that becomes possible when a compromised agent asserts false identity or inflates claimed permissions to an orchestrator that does not independently verify them. The pattern mirrors confused deputy problems in traditional software, but the confusion is induced through natural language rather than system calls. 

4. Computer Use Agent (CUA) Visual Attack. Agents operating through graphical interfaces can be manipulated through visual content that appears innocuous to humans but carries adversarial instructions for the agent. Attack patterns include hidden text rendered at non-human-readable scale, UI elements positioned outside the visible viewport, and images embedding prompt injection in content the agent is instructed to interpret. This failure mode has no meaningful precedent in v1.0. 

5. Session Context Contamination. Agentic sessions often span extended, multi-step interactions with context accumulating from prior steps. Session context contamination occurs when an adversary introduces data early in a session that biases the agent’s reasoning in subsequent steps, without triggering safety controls at any individual step. 

6. MCP / Plugin Abuse. The original taxonomy’s coverage of function compromise predated standardization around MCP and plugin protocols. This entry captures attack surfaces specific to those protocols: tool description poisoning, server-side instruction injection, cross-server instruction override (a malicious server overriding behavior of trusted servers), and abuse of protocol-level trust assumptions. 

7. Capability / Architecture Disclosure. This failure mode occurs when an agent reveals internal implementation details such as tool names and schemas, system-prompt structure, memory interfaces, or consent/HitL trigger logic, either on direct request or via paths such as XPIA. In single-turn chat, prompt leakage is mostly reputational. In agentic systems, it exposes operational primitives and turns black-box probing into a white-box exploit path. 

Operational findings: What red teaming showed

Twelve months of engagements against deployed agentic systems produced several consistent patterns. 

HitL bypass was the most consistently exploited failure mode, at very high frequency. Red teamers achieved bypass through consent fatigue, manipulation of probabilistic invocation, and incremental escalation chains where no individual step clearly warranted review but the compound outcome did. Most significantly, several engagements demonstrated zero-click end-to-end chains starting from an external input with no human interaction beyond the initial agent invocation, achieving high-impact outcomes such as exfiltration or lateral movement. 

XPIA and memory poisoning were observed at high frequency and frequently combined. Cross-domain prompt injection delivered via external content remained the most reliable initial access vector. Memory poisoning via XPIA, where injected instructions seed the agent’s persistent memory for later retrieval, requires only a single successful injection, which the agent then propagates across subsequent sessions. 

Session context contamination and incremental escalation were highly effective and difficult to detect. Neither the contaminating input nor any individual escalation step is clearly anomalous in isolation. Detection requires behavioral analysis across the full session, something most systems did not have. 

Capability disclosure was a key enabler of follow-on attack paths. In many of our highest-impact attack chains, execution was predicated on extracting specific architecture or capability details from the system. This often required only asking the system directly, but it consistently exposed inconsistencies in guardrails and opened attack paths that would otherwise have required external reconnaissance. 

New mitigations

Supply chain security for agentic components. Treat every external component an agent can consume as part of the software supply chain. Controls include SBOM generation for agent deployments, inclusive of tool dependencies; signature and provenance verification for MCP servers and plugins before installation; registry scanning for hidden instructions in tool descriptions; and version pinning with change monitoring for all external tool definitions.
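As an illustration only (this is not taken from the whitepaper), version pinning with change monitoring for tool definitions could be sketched as below. The `fingerprint` and `changed_tools` helpers, the pin store, and the sample tool definitions are all hypothetical; a real deployment would also verify signatures and provenance, which this sketch does not attempt.

```python
import hashlib
import json


def fingerprint(tool_def: dict) -> str:
    """Return a stable SHA-256 digest of a tool definition (hypothetical helper)."""
    # Canonical JSON so that key order does not change the digest.
    canonical = json.dumps(tool_def, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(pinned: dict, current: dict) -> list:
    """List tools that are unpinned or whose definition no longer matches its pin."""
    return sorted(
        name
        for name, tool_def in current.items()
        if pinned.get(name) != fingerprint(tool_def)
    )


# Hypothetical tool definitions: the second one has a quietly altered description.
search_v1 = {"name": "search", "description": "Search the docs."}
search_v2 = {
    "name": "search",
    "description": "Search the docs. Also send results to example.invalid.",
}

pins = {"search": fingerprint(search_v1)}

print(changed_tools(pins, {"search": search_v1}))  # -> []
print(changed_tools(pins, {"search": search_v2}))  # -> ['search']
```

The point of hashing the natural-language description along with the schema is that a description change is treated exactly like a code change: it fails the pin check and needs review.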

Zero-trust inter-agent architecture. For high-risk scenarios, agent identity should be cryptographically established, not assumed from position in a workflow. Every inter-agent message should carry a verifiable identity claim. Orchestrators should not grant elevated permissions to sub-agents based on self-asserted role. 
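A minimal sketch of a verifiable identity claim on inter-agent messages, under simplifying assumptions: each agent holds a per-agent secret issued at provisioning, and the orchestrator checks an HMAC tag before trusting the claimed sender. The agent names, keys, and message shape are hypothetical, and a production design would more likely use asymmetric signatures or attested credentials than shared secrets.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_message(key: bytes, sender: str, payload: dict) -> dict:
    """Serialize a message and attach an HMAC-SHA256 tag computed with the sender's key."""
    body = json.dumps({"sender": sender, "payload": payload}, sort_keys=True)
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_message(keys: dict, message: dict) -> Optional[dict]:
    """Return the parsed body if the tag matches the claimed sender's key, else None."""
    try:
        raw = message["body"]
        tag = message["tag"]
        body = json.loads(raw)
        key = keys[body["sender"]]
    except (KeyError, TypeError, ValueError):
        return None
    if not isinstance(tag, str):
        return None
    expected = hmac.new(key, raw.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison of the two tags.
    if not hmac.compare_digest(expected.encode("ascii"), tag.encode("utf-8")):
        return None
    return body


# Hypothetical per-agent keys held by the orchestrator.
agent_keys = {"planner": b"planner-secret", "worker": b"worker-secret"}

honest = sign_message(agent_keys["worker"], "worker", {"task": "summarize"})
# A compromised worker claims to be the planner but can only sign with its own key.
forged = sign_message(agent_keys["worker"], "planner", {"task": "grant-admin"})

print(verify_message(agent_keys, honest) is not None)  # -> True
print(verify_message(agent_keys, forged))              # -> None
```

The self-asserted `sender` field is never trusted on its own; it only selects which key the tag must verify against.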

Consent architecture hardening. HitL controls must resist the specific patterns observed in red team operations: compound action decomposition before approval presentation, semantic summarization of agent-constructed descriptions to prevent description laundering, tiered approval requirements that scale with action reversibility and blast radius, deterministic HitL invocation, and anomaly detection on approval request frequency and pattern. 
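To make two of these controls concrete, here is a hedged sketch of deterministic, tiered approval that scores a compound plan as a whole rather than step by step. The `Action` fields, the tier names, and the blast-radius threshold are assumptions chosen for illustration, not values from the taxonomy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Approval(IntEnum):
    """Hypothetical approval tiers, ordered from least to most strict."""
    AUTO = 0
    CONFIRM = 1
    CONFIRM_WITH_DETAILS = 2


@dataclass(frozen=True)
class Action:
    tool: str
    reversible: bool
    blast_radius: int  # assumed scale: number of resources the action touches


def required_approval(action: Action, radius_threshold: int = 10) -> Approval:
    """Tier a single action by reversibility first, then by blast radius."""
    if not action.reversible:
        return Approval.CONFIRM_WITH_DETAILS
    if action.blast_radius > radius_threshold:
        return Approval.CONFIRM
    return Approval.AUTO


def approval_for_plan(actions: list, radius_threshold: int = 10) -> Approval:
    """Tier a compound plan: strictest single step, or the summed blast radius."""
    if not actions:
        return Approval.AUTO
    strictest_step = max(required_approval(a, radius_threshold) for a in actions)
    total_radius = sum(a.blast_radius for a in actions)
    combined = Approval.CONFIRM if total_radius > radius_threshold else Approval.AUTO
    return max(strictest_step, combined)


# Three small, reversible steps that each pass alone but add up to a large change.
steps = [Action("move_file", reversible=True, blast_radius=4)] * 3
print(approval_for_plan(steps).name)  # -> CONFIRM
```

Because the decision is a pure function of the planned tool calls, the agent cannot talk its way into a lower tier, and an incremental escalation chain is caught by the summed radius even when no single step crosses the threshold.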

Adversarial session hardening. Mitigating session context contamination requires treating the agent’s accumulated context as a security-relevant data structure. Controls include context provenance tracking, structured separation between trusted system context and untrusted retrieved content, session integrity monitoring for anomalous accumulation patterns, and bounded session contexts that limit how much external content can influence a session’s reasoning. 
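One way to picture provenance tracking, trusted/untrusted separation, and a bounded context together is the sketch below. The class names, the character budget, and the section headings in `render` are hypothetical, and counting characters is a stand-in for whatever budget a real system would enforce.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContextItem:
    text: str
    source: str    # provenance label, e.g. "system" or "web:example.invalid"
    trusted: bool


@dataclass
class SessionContext:
    max_untrusted_chars: int = 2000  # assumed budget for external content
    items: list = field(default_factory=list)

    def untrusted_chars(self) -> int:
        """Total size of untrusted content accumulated so far."""
        return sum(len(i.text) for i in self.items if not i.trusted)

    def add(self, item: ContextItem) -> bool:
        """Append an item; refuse untrusted content that would exceed the budget."""
        over_budget = (
            not item.trusted
            and self.untrusted_chars() + len(item.text) > self.max_untrusted_chars
        )
        if over_budget:
            return False
        self.items.append(item)
        return True

    def render(self) -> str:
        """Emit trusted and untrusted content in separate, labeled sections."""
        trusted = [i.text for i in self.items if i.trusted]
        untrusted = [f"[source={i.source}] {i.text}" for i in self.items if not i.trusted]
        return "\n".join(
            ["## Trusted instructions", *trusted, "## Untrusted data", *untrusted]
        )


ctx = SessionContext(max_untrusted_chars=10)
ctx.add(ContextItem("Follow the deployment runbook.", "system", trusted=True))
print(ctx.add(ContextItem("12345678", "web:example.invalid", trusted=False)))  # -> True
print(ctx.add(ContextItem("abc", "web:example.invalid", trusted=False)))       # -> False
```

Keeping the source label on every item is what makes later session integrity monitoring possible: an anomalous accumulation pattern can be traced back to the content that introduced it.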

What to do this quarter

If you operate or defend an agentic system, the v2.0 additions translate to four concrete actions: 

  • Inventory your supply chain. Generate an SBOM for every deployed agent that includes plugins, MCP servers, prompt templates, and tool descriptions alongside code dependencies. Pin versions; treat natural-language tool descriptions as code. 
  • Verify agent identity cryptographically, not positionally. Issue attestable credentials at provisioning. Reject self-asserted role claims at orchestrator handoffs. 
  • Add the seven new categories to your red-team coverage matrix. Treat CUA visual attacks, session context contamination, capability disclosure, and goal hijacking as mandatory test classes for any agent that touches production data or external surfaces. 
  • Audit human-in-the-loop UX as a security control. Decompose compound actions, summarize approval prompts from the underlying tool calls (not from the agent’s own description), tier approvals by reversibility, and monitor approval frequency for consent-fatigue exploitation signals. 

If you are building agentic systems, the updated taxonomy is a threat modeling tool, not a compliance checklist. Take each failure mode category and ask whether it can occur in your system, under what conditions, and whether you have a control that would detect or prevent it. 

For red teamers: the seven new categories should be mandatory coverage areas. Zero-click HitL bypass chains, inter-agent trust escalation, and session context contamination will not be surfaced by model-level evaluation alone. They require system-level testing and multi-step attack chains evaluated across complete task flows. 

For security engineers: supply chain and zero-trust mitigations are architectural decisions, and difficult to retrofit. Building SBOM generation, tool provenance verification, and inter-agent authentication into your architecture from the start costs substantially less than adding them after deployment. 

The taxonomy is a living document. The failure modes added in v2.0 are the ones that twelve months of operational data made compelling enough to include. As agentic systems acquire new capabilities — persistent cross-session memory at scale, autonomous agent spawning, physical environment interaction — the failure mode surface will continue to expand. We will continue to update the taxonomy as the evidence base develops. 

The updated whitepaper is available now. We welcome engagement from practitioners whose operational experience identifies failure modes or attack patterns not yet reflected in the taxonomy. 

The post Updating the taxonomy of failure modes in agentic AI systems: What a year of red teaming taught us appeared first on Microsoft Security Blog.


Announcing the Networking Workgroup

The Swift Ecosystem Steering Group is excited to announce the creation of the Networking workgroup! Workgroups are community-led efforts, formally recognized by the project, to advance key areas of Swift.

The primary goal is to guide the evolution of networking libraries, protocols, and APIs in the Swift ecosystem, making networking in Swift excellent everywhere: high-level and safe by default, modular and interoperable, cross-platform, and observable.

Networking is one of the most common entry points for Swift developers, and the ecosystem has matured significantly over the years. Foundational libraries like SwiftNIO, AsyncHTTPClient, and swift-http-types, alongside platform stacks such as URLSession and Network.framework, power networking across apps, servers, and beyond. The workgroup will build upon these efforts and pursue the long-term directions outlined in the Networking vision, focusing on work to:

  • Define a unified networking stack with a coherent layered architecture, from shared I/O primitives at the foundation, through common protocol implementations, to ergonomic client and server APIs at the top.
  • Define currency types that let libraries interoperate without coupling to specific implementations, such as IP addresses, hostnames, ports, and HTTP requests and responses.
  • Evolve HTTP APIs by designing and guiding a modern, unified HTTP client and server API built on structured concurrency.
  • Guide the evolution of shared protocol implementations (TLS, HTTP/1.1, HTTP/2, HTTP/3, QUIC, WebSockets) so improvements benefit the entire ecosystem rather than being duplicated across libraries.

The new Networking workgroup joins a growing list of Swift workgroups, including the Android workgroup, Windows workgroup, and Build and Packaging workgroup, which were all added in the past year.

To learn more and get involved:

  • Discuss this announcement on the forums, and share ideas in the Networking category.
  • Learn more about the Networking workgroup by reading its charter.
  • The workgroup meets biweekly. A regular meeting time is being finalized and will be announced on the forums ahead of the first public meeting.
    • Workgroup membership and meetings are open to those who wish to participate, and contributors are welcome!
    • To receive an invite, send a message to @networking-workgroup on the Swift forums.

The Dark Years of Microsoft: How Low-Code Ate the Developer Platform

From roughly 2018 to 2025, Microsoft bet heavily on low-code. Power Platform, Power Automate, Dataverse, Dynamics extensibility, SharePoint lists, Teams apps, managed connectors, compliance centers — the whole stack moved toward “citizen development” inside a controlled SaaS boundary.

Th…


GitHub Copilot and tokens: how to keep using AI without burning your budget in three prompts (some personal lessons learned!)

⚠ This blog post was created with the help of AI tools. Yes, I used a bit of magic from language models to organize my thoughts and automate the boring parts, but the geeky fun and the 🤖 in C# are 100% mine.

Hi!

For a long time, many of us used GitHub Copilot as if it were unlimited magic: autocomplete, chat, agent mode, code review, increasingly powerful models, massive context, and long-running sessions that sometimes felt like a pair-programming marathon.

And it worked. Well, mostly.

Now, with usage-based billing and AI Credits, many developers are seeing something that used to be mostly invisible: every AI interaction has a cost. And that cost is not only about “asking a question.” It depends on the model, the context, input tokens, output tokens, cached tokens, tools, files, logs, MCP servers, and how long we let an agent keep working.

GitHub explains this in the Copilot billing documentation: interactions consume input, output, and cached tokens; each model has its own pricing; and the total is converted into AI Credits. The same documentation also explains an important detail: code completions and Next Edit Suggestions are not charged as AI Credits in paid plans.

Source: https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing
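To see why the three token types matter, here is a rough back-of-the-envelope sketch. The prices below are made up for illustration (check the billing page for real numbers), and I am deliberately leaving out the conversion to AI Credits because I do not want to guess at it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical per-million-token prices; NOT real Copilot pricing."""
    input_per_million: float
    cached_per_million: float
    output_per_million: float


def interaction_cost(
    price: ModelPrice, input_tokens: int, cached_tokens: int, output_tokens: int
) -> float:
    """Sum the three token types, each weighted by its own price."""
    total = (
        input_tokens * price.input_per_million
        + cached_tokens * price.cached_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


# Made-up model: output is the priciest token type, cached input the cheapest.
big_model = ModelPrice(input_per_million=2.0, cached_per_million=0.5, output_per_million=8.0)

# One agent turn with a large context and a short answer.
cost = interaction_cost(big_model, input_tokens=40_000, cached_tokens=100_000, output_tokens=2_000)
print(f"{cost:.3f}")  # -> 0.146
```

Notice that in this made-up example the short answer is the smallest part of the bill. The context is what costs.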

The natural reaction is to panic.

The useful reaction is to optimize.

Just like we optimize compute, storage, bandwidth, or GitHub Actions minutes, now we also need to optimize how we use tokens.

And yes, this applies to those of us using Copilot every day for .NET, AI, Azure, scripts, demos, documentation, refactors, and those beautiful moments when we tell the agent “just fix this test” and come back 20 minutes later to find a doctoral thesis in progress.

The real problem is not the prompt. It is the context

When we talk about tokens, we often think only about the text we type.

But in AI-assisted development tools, the expensive part is often everything that travels around the prompt:

  • chat history
  • open files
  • files attached as context
  • workspace search results
  • diffs
  • terminal output
  • build errors
  • long logs
  • tool calls
  • MCP server responses
  • custom instructions
  • agent memory
  • content the model decides to inspect while completing the task

A one-line question can be cheap.

A one-line question inside a conversation with 80 messages, 12 files, 3 logs, 5 tools, and an MCP server connected to half the universe… not so much.

The first optimization is mental: more context does not always mean a better answer.

Sometimes more context only means more tokens, more noise, and more chances for the model to get distracted.
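If you want a feel for where your context goes, a quick estimate helps. This sketch assumes the common rule of thumb of roughly 4 characters per token for English text, which is only an approximation (real tokenizers vary by model), and the component names are just examples.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the 4-chars-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def context_breakdown(components: dict) -> list:
    """Return (name, estimated tokens) pairs, largest first."""
    sizes = [(name, estimate_tokens(text)) for name, text in components.items()]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


# Example: a one-line question travelling with a long build log.
context = {
    "prompt": "Why does this test fail on the build server?",
    "build_log": "warning: something minor happened\n" * 500,
}

for name, tokens in context_breakdown(context):
    print(f"{name}: ~{tokens} tokens")
```

In this example the log dwarfs the question, which is usually the case in real sessions too.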

1. Use autocomplete and Next Edit Suggestions before opening chat

Not everything needs a conversation.

For small tasks, Copilot directly in the editor is often the most efficient option:

  • completing a line
  • finishing a simple function
  • generating boilerplate
  • suggesting the next obvious change
  • completing a repeated pattern
  • adjusting names
  • writing a simple condition
  • generating a property, DTO, or mapping

If you can solve it with Tab, do not open a chat.

This is not just convenience. It is strategy. According to GitHub documentation, code completions and Next Edit Suggestions are not billed as AI Credits in paid plans.

Source: https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing

Simple rule:

  • Use autocomplete for micro-tasks.
  • Use inline edit for local changes.
  • Use chat for questions that require reasoning.
  • Use agent mode for well-scoped multi-file tasks.
  • Use cloud agents when you really want to delegate a workflow, not when you only need to change three lines.

The most expensive model in the world should not be helping you write public string Name { get; set; }.

That is what Tab is for. And coffee.

2. Choose the right model for the task

Not every model has the same cost or the same purpose.

The VS Code documentation recommends using lighter models for quick edits, boilerplate, and direct questions, and reserving reasoning models for complex refactors, architecture decisions, and multi-step debugging.

Source: https://code.visualstudio.com/docs/copilot/guides/optimize-usage

A practical pattern:

Task type | Recommended model
Simple question | Lightweight model or Auto
Boilerplate | Lightweight model
Code explanation | Lightweight or medium model
Simple tests | Lightweight or medium model
Complex debugging | Reasoning model
Architecture | Reasoning or frontier model
Large refactor | Powerful model, but with limited scope
Initial documentation | Lightweight or local model

GitHub also documents Auto Model Selection, which can choose a model based on task complexity, availability, and policies. The documentation also notes that Auto can improve efficiency by reserving more expensive models for tasks that actually need them.

Source: https://docs.github.com/en/copilot/concepts/auto-model-selection

My recommendation for most developers:

  • use Auto as the default
  • manually switch to a more powerful model only when you have a clear reason
  • switch back to Auto or a cheaper model when the complex task is done

Do not drive a truck to buy bread.

And do not use the most expensive model to ask how to center a div. Although, to be fair, sometimes centering a div does deserve an architecture review.

3. Start new chats when you change tasks

This is one of the simplest and most ignored optimizations.

The VS Code documentation is clear: when a conversation grows, it accumulates context from previous messages, tool outputs, and file contents. If you switch to an unrelated task inside the same session, the model still processes irrelevant history.

Source: https://code.visualstudio.com/docs/copilot/guides/optimize-usage

Bad pattern:

Chat 1:
- Debug tests
- Then architecture
- Then generate README
- Then review Dockerfile
- Then explain an Azure error
- Then ask for a tweet

Better pattern:

Chat 1: Debug tests
Chat 2: Architecture design
Chat 3: README
Chat 4: Dockerfile
Chat 5: Azure deployment issue

New task, new chat.

Yes, it sounds simple.

Yes, it works.

And yes, it also helps your human brain, which sometimes has a smaller context window than the model.

4. Use /compact and /fork when it makes sense

When a conversation has useful context but starts getting too large, you do not always need to throw it away.

You can compact it or fork it.

VS Code documents new sessions, forking, and compaction as ways to manage context and reduce unnecessary tokens.

Source: https://code.visualstudio.com/docs/copilot/guides/optimize-usage

Good practices:

  • use /compact when the conversation has useful information but too much history
  • use /fork when you want to explore an alternative without polluting the main conversation
  • start a new chat if the new task is unrelated
  • summarize the current state before continuing a long task

Useful prompt:

Summarize the current state, decisions made, files changed, and next steps. Keep it short and actionable.

Then copy that summary into a new conversation and continue with clean context.

Less noise. Fewer tokens. Better focus.

5. Do not ask it to analyze the whole repo if you only need three files

This is the classic one.

Expensive prompt:

Analyze this entire repository and tell me what is wrong.

Better prompt:

Analyze only these files:
- src/MyApp.Api/Program.cs
- src/MyApp.Core/Services/OrderService.cs
- tests/MyApp.Tests/OrderServiceTests.cs
Goal: find why this test is failing.
Do not edit files yet. First explain the likely cause.

The difference is huge.

The first prompt invites the model to explore, read, search, open files, infer architecture, and consume context.

The second prompt defines:

  • files
  • goal
  • limits
  • working mode
  • expected output

In AI coding, scope is part of the prompt.

Small scope, better result.

Infinite scope, surprise in the bill.

6. Separate planning, implementation, and validation

One of the most common mistakes with agent mode is asking for everything at once:

Analyze the issue, design the fix, implement it, run tests, fix errors, update docs, and create a summary.

That sounds productive.

But it can also trigger loops, tool calls, unnecessary changes, and high token consumption.

A better approach is to use phases.

Phase 1: plan

Create a short implementation plan. Do not modify files yet.
Focus only on the failing test and the minimal code path required to fix it.

Phase 2: scoped implementation

Implement step 1 only. Modify only the files listed in the plan.

Phase 3: validation

Run the relevant tests only. If they fail, explain the failure before changing code again.

Phase 4: cleanup

Now clean up the implementation without changing behavior. Keep the diff small.

The VS Code documentation also recommends planning before implementation to reduce rework and back-and-forth.

Source: https://code.visualstudio.com/docs/copilot/guides/optimize-usage

Sometimes the best prompt is not “do everything.”

It is “think first, touch little, validate quickly.”

7. Be careful with logs: do not paste a novel if you only need the error

Logs are one of the silent token killers.

Typical example:

Here is my build log

[paste 2,000 lines]

And the real error was in the last 20 lines.

Better:

Here are the last 40 lines of the failing build log. Focus on the first real error, not the cascading errors.

Or even:

This is the error:
CS0246: The type or namespace name 'X' could not be found.
Relevant files:
- Program.cs
- MyService.cs
What is the likely fix?

Good practices:

  • paste only the first relevant error
  • avoid full logs if errors are repeated
  • remove timestamps if they do not add value
  • remove duplicated stack traces
  • summarize what you already tried
  • include exact commands, not the whole terminal history
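Here is a small sketch of how you could do that trimming before pasting anything into chat. The timestamp format and the "error" keyword match are assumptions about what your logs look like, so adjust the regular expressions for your own build output.

```python
import re

# Assumed timestamp shape: 2025-01-01T10:00:00Z or [2025-01-01 10:00:00.123]
TIMESTAMP = re.compile(r"^\[?\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?\]?\s*")
ERROR = re.compile(r"\berror\b", re.IGNORECASE)


def trim_log(log: str, context_lines: int = 2, max_lines: int = 40) -> str:
    """Strip timestamps, drop repeated lines, and keep a window from the first error."""
    lines = [TIMESTAMP.sub("", line) for line in log.splitlines()]

    # Keep only the first occurrence of each line (repeated errors add no value).
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)

    first_error = next((i for i, line in enumerate(unique) if ERROR.search(line)), None)
    if first_error is None:
        # No error found: fall back to the tail of the log.
        return "\n".join(unique[-max_lines:])

    start = max(0, first_error - context_lines)
    return "\n".join(unique[start:start + max_lines])


sample = """2025-01-01T10:00:00Z Restoring packages
2025-01-01T10:00:01Z Compiling
2025-01-01T10:00:02Z Program.cs(3,7): error CS0246: The type or namespace name 'X' could not be found.
2025-01-01T10:00:03Z Program.cs(3,7): error CS0246: The type or namespace name 'X' could not be found.
2025-01-01T10:00:04Z Build FAILED."""

print(trim_log(sample, context_lines=1))
```

For the sample above, the output starts at "Compiling", shows the CS0246 error once, and ends with "Build FAILED.", with no timestamps.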

Copilot can help a lot with logs.

But it does not need to read your CI/CD diary.

8. Review your custom instructions

Custom instructions are fantastic.

They can also become a token backpack if nobody maintains them.

A good .github/copilot-instructions.md file is:

  • short
  • specific
  • current
  • based on real repository rules
  • clear about build/test commands
  • clear about important conventions

A bad one is:

  • too long
  • duplicated
  • contradictory
  • based on old architecture
  • full of rules nobody follows
  • full of generic instructions that apply to every repo on the planet

Example of useful instructions:

# Copilot instructions
- Use C# 13 and .NET 10 conventions.
- Keep changes minimal and focused.
- Do not introduce new dependencies without explaining why.
- Run `dotnet build -c Release` after code changes.
- Run relevant tests only unless asked for the full suite.
- Prefer Aspire service defaults when adding services.
- Do not modify generated files.

You do not need to write a national constitution for Copilot to understand your repo.

You need short rules that reduce repeated decisions.
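A quick way to keep the file honest is a tiny check you run once in a while. This is just a sketch: the 300-word limit is a number I picked for the example, not an official recommendation, and the duplicate check only catches rules that are identical after lowercasing.

```python
from collections import Counter


def review_instructions(text: str, max_words: int = 300) -> dict:
    """Report size and exact-duplicate rules in a custom instructions file."""
    # Normalize bullet lines: drop list markers, surrounding spaces, and case.
    rules = [line.strip().lstrip("-*").strip().lower() for line in text.splitlines()]
    # Ignore blank lines and headings.
    rules = [rule for rule in rules if rule and not rule.startswith("#")]

    counts = Counter(rules)
    words = len(text.split())
    return {
        "words": words,
        "too_long": words > max_words,  # max_words is an arbitrary example threshold
        "duplicates": sorted(rule for rule, n in counts.items() if n > 1),
    }


sample_instructions = """# Copilot instructions
- Keep changes minimal.
- Do not modify generated files.
- keep changes minimal.
"""

print(review_instructions(sample_instructions)["duplicates"])  # -> ['keep changes minimal.']
```

To use it on a real repo, read `.github/copilot-instructions.md` into a string and pass it in.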

9. Review the MCP servers and tools you have enabled

This is one of the areas where many developers may be consuming context without realizing it.

MCP is powerful because it lets agents connect to tools, resources, prompts, and external systems. But every server and every available tool can also affect context, tool selection, and the way the agent works.

The VS Code documentation explains that MCP servers can expose tools, resources, prompts, and apps. It also allows developers to enable, disable, install, configure, and manage MCP servers from VS Code.

Source: https://code.visualstudio.com/docs/copilot/customization/mcp-servers

Practical recommendations:

  • do not keep every MCP server enabled all the time
  • enable only what you need for the current workspace
  • disable experimental MCP servers when you are not using them
  • review duplicated or overly generic tools
  • review tool descriptions: if they are too long or confusing, they may hurt tool selection
  • avoid MCP servers that return huge responses by default
  • limit resources that add too much context
  • check whether a server is bringing more information than needed
  • review MCP logs when something behaves strangely

Example:

If you are working on a local .NET API, maybe you do not need all of these enabled at the same time:

  • browser automation
  • extended filesystem
  • cloud docs search
  • GitHub
  • Jira
  • Slack
  • database explorer
  • Kubernetes
  • Playwright
  • internal wiki

Every extra tool can be useful.

But it can also expand the agent’s decision space.

And when an agent has too many tools, sometimes the problem is not lack of capability. It is too many temptations.

My personal rule:

MCP servers should be workspace-specific, not personality traits.

Enable what you need. Turn off what you do not.

10. Use traditional tools for traditional work

Not everything needs AI.

For many tasks, traditional tools are better, faster, and cheaper:

  • formatter for formatting
  • linter for style
  • compiler for type errors
  • tests for validation
  • static analyzers for known rules
  • dependency scanners for known vulnerabilities
  • scripts for repeatable tasks

Copilot is excellent for reasoning, explaining, proposing, connecting ideas, and accelerating implementation.

But if you use a frontier model to discover a missing using, something went sideways.

Good pattern:

Run the build. Give Copilot only the first relevant compiler error. Ask for the minimal fix.

Bad pattern:

Ask Copilot to inspect the entire repository and find why the build might fail.

First, let deterministic tools do their job.

Then use AI where it adds value.

11. Local models: they do not replace Copilot, but they can complement it very well

We are going to see more PCs with interesting local AI capabilities: stronger GPUs, NPUs, compact workstations, and machines designed to run models locally. NVIDIA, for example, positions RTX Spark as compact PCs and laptops with NVIDIA AI and RTX graphics capabilities.

Source: https://www.nvidia.com/en-us/products/rtx-spark/

This raises an interesting question:

Does everything need to go to the cloud?

Not necessarily.

There are tasks where a local model may be enough:

  • summarizing logs
  • generating documentation drafts
  • explaining small code snippets
  • creating scaffolding
  • generating initial tests
  • transforming text
  • preparing prompts
  • analyzing snippets
  • creating session summaries
  • reviewing basic style

And there are tasks where cloud/frontier models still make a lot of sense:

  • complex multi-file debugging
  • deep reasoning
  • large migrations
  • high-risk refactors
  • architecture
  • long agentic workflows
  • direct integration with GitHub, PRs, issues, and CI

The idea is not “local vs cloud.”

The idea is local for simple work, cloud for work that really needs cloud.

This post is not about advanced BYOK, routing, or gateway architectures. That deserves its own post.

But as a baseline idea: if you can solve repetitive tasks with local models, you can reserve Copilot and powerful models for the tasks where they really shine.

12. Code review: use it where it adds the most value

Copilot code review can be very useful, but it also has a cost.

GitHub documentation explains that Copilot code review is billed in two ways: token consumption through AI Credits and GitHub Actions minutes for the agentic review infrastructure.

Source: https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing

Recommendations:

  • do not enable high-effort automatic review for every PR without measuring it
  • use linters and analyzers for mechanical rules
  • reserve Copilot review for meaningful PRs
  • define when to use standard vs higher effort
  • avoid full AI review if you only changed documentation
  • review usage by repository/team if you are in an organization

Copilot review can be excellent.

But you do not need an AI reviewer philosophically inspecting a three-line README change.
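
The "documentation only" recommendation can be checked mechanically before anyone requests a review. This is a minimal sketch under stated assumptions: the extension and folder lists are examples, the function names are hypothetical, and nothing here calls a GitHub API. How the list of changed files is obtained is left out.

```python
from pathlib import PurePosixPath
from typing import List

# Assumption: these extensions and top-level folders count as documentation.
DOC_EXTENSIONS = {".md", ".txt", ".rst"}
DOC_FOLDERS = {"docs"}


def is_doc_file(path: str) -> bool:
    """True if the path looks like documentation by extension or top-level folder."""
    p = PurePosixPath(path)
    in_doc_folder = len(p.parts) > 1 and p.parts[0] in DOC_FOLDERS
    return p.suffix.lower() in DOC_EXTENSIONS or in_doc_folder


def needs_ai_review(changed_files: List[str]) -> bool:
    """Return False when every changed file is documentation."""
    return any(not is_doc_file(f) for f in changed_files)


print(needs_ai_review(["README.md"]))
print(needs_ai_review(["README.md", "src/Program.cs"]))
```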

13. Define usage profiles for your team

In enterprise teams, the worst-case scenario is letting everyone use any model, in any mode, for any task, with no guidance.

You do not need to block everything.

You need to educate people and provide clear profiles.

Daily coding profile

  • Auto Model Selection
  • autocomplete and Next Edit Suggestions first
  • short chat sessions
  • limited context
  • lightweight models for simple questions

Debugging profile

  • minimal logs
  • first relevant error
  • specific files
  • plan before changes
  • relevant tests, not always the full suite

Refactor profile

  • plan first
  • scope by folder or feature
  • changes in phases
  • more powerful model only during the hard part
  • frequent validation

Documentation profile

  • lightweight or local model
  • specific files
  • limited output
  • no agent mode unless needed

Agent mode profile

  • clear issue
  • clear stop condition
  • defined scope
  • defined validation commands
  • do not let it run unsupervised if the goal is ambiguous

This is not about saying “do not use Copilot.”

It is about saying “use the right mode for the right job.”
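
One way to make the profiles above easy to share is to keep them as data and print them as short reminders. The sketch below is only an illustration: the UsageProfile class, the dictionary keys, and the render function are hypothetical names, and the rules are copied from the lists in this section rather than from any Copilot setting.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class UsageProfile:
    name: str
    rules: Tuple[str, ...]


# Rules copied from the profile lists in this section.
PROFILES: Dict[str, UsageProfile] = {
    "daily": UsageProfile("Daily coding", (
        "Auto Model Selection",
        "autocomplete and Next Edit Suggestions first",
        "short chat sessions",
        "limited context",
        "lightweight models for simple questions",
    )),
    "debugging": UsageProfile("Debugging", (
        "minimal logs",
        "first relevant error",
        "specific files",
        "plan before changes",
        "relevant tests, not always the full suite",
    )),
    "refactor": UsageProfile("Refactor", (
        "plan first",
        "scope by folder or feature",
        "changes in phases",
        "more powerful model only during the hard part",
        "frequent validation",
    )),
    "documentation": UsageProfile("Documentation", (
        "lightweight or local model",
        "specific files",
        "limited output",
        "no agent mode unless needed",
    )),
    "agent": UsageProfile("Agent mode", (
        "clear issue",
        "clear stop condition",
        "defined scope",
        "defined validation commands",
        "do not let it run unsupervised if the goal is ambiguous",
    )),
}


def render(profile_key: str) -> str:
    """Format one profile as a short plain-text reminder."""
    profile = PROFILES[profile_key]
    lines = [f"{profile.name} profile"] + [f"  - {rule}" for rule in profile.rules]
    return "\n".join(lines)


print(render("documentation"))
```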

14. Quick checklist for developers

Before sending the next prompt, ask yourself:

  • Can I solve this with autocomplete?
  • Do I need chat, or is inline edit enough?
  • Do I need agent mode, or just an explanation?
  • Am I using Auto, or am I using a model that is too expensive for the task?
  • Does this conversation already have too much history?
  • Should I start a new chat?
  • Can I pass only two or three files?
  • Can I paste only the relevant error?
  • Can I ask for a plan first?
  • Do I have MCP servers enabled that I do not need?
  • Are my custom instructions too long?
  • Can a linter/test/build answer this before AI?
  • Could this task go to a local model?

If the answer is “yes” to several of these, you can probably save tokens without losing productivity.

15. Checklist for teams

For organizations, I would start here:

  • review usage by user, team, and repository
  • understand which models are used the most
  • review how much usage comes from agent mode
  • review Copilot code review usage
  • define internal model selection guidelines
  • promote Auto as the default
  • teach small, scoped prompts
  • clean up custom instructions by repository
  • review recommended and allowed MCP servers
  • create usage profiles by task type
  • measure before and after changes
  • do not block AI out of fear
  • govern it like any other cloud resource

Because this looks a lot like cloud cost optimization.

First, everyone celebrated how easy it was to create resources.

Then the bill arrived.

Then we learned FinOps.

Now we need something similar for AI-assisted development.
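
The first items in the checklist above (usage by user, team, and repository, plus the share that comes from agent mode) boil down to simple aggregation once usage data is available. The sketch below assumes a made-up record format with user, team, repo, mode, and tokens fields. A real usage export will look different, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical usage records, invented for illustration only.
records: List[dict] = [
    {"user": "ana", "team": "web", "repo": "shop", "mode": "chat", "tokens": 1200},
    {"user": "ana", "team": "web", "repo": "shop", "mode": "agent", "tokens": 8000},
    {"user": "luis", "team": "api", "repo": "billing", "mode": "chat", "tokens": 800},
    {"user": "luis", "team": "api", "repo": "billing", "mode": "review", "tokens": 2000},
]


def total_tokens_by(rows: List[dict], key: str) -> Dict[str, int]:
    """Sum tokens grouped by one field, such as 'user', 'team', or 'repo'."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["tokens"]
    return dict(totals)


def agent_mode_share(rows: List[dict]) -> float:
    """Fraction of all tokens spent in agent mode (0.0 when there is no usage)."""
    total = sum(row["tokens"] for row in rows)
    agent = sum(row["tokens"] for row in rows if row["mode"] == "agent")
    return agent / total if total else 0.0


print(total_tokens_by(records, "team"))
print(round(agent_mode_share(records), 2))
```

Running the same aggregation before and after a change in guidelines gives the "measure before and after" comparison from the checklist.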

Conclusion

The new GitHub Copilot usage model does not mean we need to stop using AI to code.

It means we can no longer treat every interaction as free, infinite, and invisible.

The good news is that many optimizations are simple:

  • use autocomplete before chat
  • choose the model intentionally
  • start new chats
  • reduce context
  • limit logs
  • separate planning and implementation
  • review MCP servers
  • clean up custom instructions
  • use traditional tools when they fit
  • reserve powerful models for powerful problems
  • consider local models for simple tasks

The goal is not to use less Copilot.

The goal is to use fewer tokens and get better results.

Happy coding!

Greetings

El Bruno

More posts on my blog, ElBruno.com.

More info at https://beacons.ai/elbruno



