In the first post of this series, we explored what MCP resources are and why they're the overlooked piece of the MCP puzzle. Now it's time to get practical: let's dive into using resources in Visual Studio Code.
By the end of this post, you'll know how to discover, browse, attach, and leverage MCP resources to supercharge your AI-powered development workflow.
Understanding resources in VS Code
When an MCP server exposes resources, VS Code makes them accessible in several ways:
Browse all resources across all installed servers
Browse resources per server to see what each provides
Attach resources to chat as context for your conversations with Copilot
View resources directly in the editor
Save resources from tool call results to your workspace
Think of resources as a context menu for your AI: a way to give Copilot exactly the information it needs without copy-pasting or explaining everything manually.
Setting up your first MCP server with resources
Let's start by installing an MCP server that provides resources. We'll use the GitHub MCP Server as our example because it's widely used and demonstrates several resource patterns.
Method 1: Install from the Extensions View
The easiest way to install an MCP server is through VS Code's built-in gallery:
Open the Extensions view (Ctrl+Shift+X or Cmd+Shift+X on Mac)
Type @mcp in the search field
Find "GitHub MCP Server" and click Install
VS Code will prompt you to trust the server; review the configuration and confirm
Method 2: Use the MCP: Add Server Command
Alternatively, you can use the command palette:
Press Ctrl+Shift+P (or Cmd+Shift+P on Mac)
Type "MCP: Add Server"
Select how the server is distributed (an NPM package, a PyPI package, or a Docker image)
Enter the server details
VS Code handles the rest!
Method 3: Manual Configuration
For team projects, you can share MCP server configurations using a workspace file:
Create .vscode/mcp.json in your workspace root
Add your server configuration (a minimal example is shown after these steps):
Save the file; VS Code will detect it and offer to start the server
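For reference, here's a minimal sketch of what that configuration might look like when pointing at the hosted GitHub MCP Server (a locally run server would use "command" and "args" entries instead); check your server's documentation for the exact settings it expects:

{
  "servers": {
    "github": {
      "type": "http",
      "url": "https://api.githubcopilot.com/mcp/"
    }
  }
}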
Browsing resources: Your first look
Once you have an MCP server installed and running, let's explore its resources.
Browse resources
Open the command palette and run MCP: Browse Resources. This shows resources from all your installed MCP servers in one unified view.
You'll see:
Resource names (human-readable descriptions)
Resource URIs (unique identifiers like github://repo/owner/name/readme)
Server attribution (which MCP server provides each resource)
Understanding resource templates
Some resources use templates: URIs with placeholders that let you provide parameters. For example:
github://repo/{owner}/{name}/file/{path}
database://query/{table_name}
When you select a templated resource, VS Code prompts you for each parameter. This makes resources dynamic and flexible: you're not limited to pre-defined values.
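Under the hood this is simple parameter substitution into the URI template. The tiny Python sketch below (the octocat/hello-world values are made up for illustration) shows the kind of expansion VS Code performs from your answers:

# Hypothetical parameter values; VS Code collects these via input prompts
# and substitutes them into the resource template.
template = "github://repo/{owner}/{name}/file/{path}"
uri = template.format(owner="octocat", name="hello-world", path="README.md")
print(uri)  # github://repo/octocat/hello-world/file/README.md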
Attaching resources to chat
Here's where resources become truly powerful. Instead of explaining context to Copilot, you attach it directly.
Using the Add Context Button
Open GitHub Copilot Chat
Click the Add Context button (the paperclip icon)
Select MCP Resources…
Choose the resource you want to attach
The resource content is now part of your conversation context. Copilot can reference it when answering questions or generating code.
What's next?
In the next post, we'll level up by building our own MCP resource server from scratch.
Android's security team published a blog post this week about their experience using Rust. Its title? "Move fast and fix things."
Last year, we wrote about why a memory safety strategy that focuses on vulnerability prevention in new code quickly yields durable and compounding gains. This year we look at how this approach isn't just fixing things, but helping us move faster.
The 2025 data continues to validate the approach, with memory safety vulnerabilities falling below 20% of total vulnerabilities for the first time. We adopted Rust for its security and are seeing a 1000x reduction in memory safety vulnerability density compared to Android's C and C++ code. But the biggest surprise was Rust's impact on software delivery. With Rust changes having a 4x lower rollback rate and spending 25% less time in code review, the safer path is now also the faster one... Data shows that Rust code requires fewer revisions. This trend has been consistent since 2023. Rust changes of a similar size need about 20% fewer revisions than their C++ counterparts... In a self-reported survey from 2022, Google software engineers reported that Rust is both easier to review and more likely to be correct. The hard data on rollback rates and review times validates those impressions.
Historically, security improvements often came at a cost. More security meant more process, slower performance, or delayed features, forcing trade-offs between security and other product goals. The shift to Rust is different: we are significantly improving security and key development efficiency and product stability metrics.
With Rust support now mature for building Android system services and libraries, we are focused on bringing its security and productivity advantages elsewhere. Android's 6.12 Linux kernel is our first kernel with Rust support enabled and our first production Rust driver. More exciting projects are underway, such as our ongoing collaboration with Arm and Collabora on a Rust-based kernel-mode GPU driver. [They've also been deploying Rust in firmware for years, and Rust "is ensuring memory safety from the ground up in several security-critical Google applications," including Chromium's parsers for PNG, JSON, and web fonts.]
2025 was the first year more lines of Rust code were added to Android than lines of C++ code...
Developers spend much of their time editing, refactoring, and debugging rather than producing entirely new code. Code creation tends to involve non-sequential back-and-forth refinement rather than typing out a complete function in one uninterrupted sequence. You might sketch a part, adjust parameters, skip ahead, then revisit earlier sections to refine them.
Diffusion models, and in particular diffusion large language models (d-LLMs), operate differently from current coding assistants. Unlike autoregressive models, which generate token by token in a strict left-to-right sequence, d-LLMs condition on both past and future context. This enables them to model edits and refinements more directly, reflecting how developers iteratively construct and adjust code. As Gong et al. (2025) put it: "the [d-LLM] model often plans token generation more globally, much like a programmer jumping back and forth through code to refine a code implementation."
Out-of-order generation feels more human
One of the most striking demos of diffusion-based models like DiffuCoder showed exactly this: the model skipped a parameter mid-function, continued writing later parts, then circled back to fill in what was missing.
(The prompt used here is: "Write a Python function to implement binary search together with docstrings and type hints." The example is generated using the apple/DiffuCoder-7B-Instruct model, configured to produce one token per forward pass with a limit of 256 new tokens. The blue slots illustrate positions that the model later revisits and refines during the diffusion process.)
This structure may mirror how many developers think. You may not know every detail upfront, but you can scaffold a function and refine as you go. A model that can generate out of order is better suited to support this process.
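If you want to experiment with a setup like the demo's, the rough sketch below loads the model with Hugging Face transformers (a GPU with enough memory for a 7B model is assumed). Note that diffusion_generate and its arguments come from the model's own remote code rather than the standard transformers API, so treat the call and its parameter names as assumptions and check the model card for the current interface:

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "apple/DiffuCoder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda").eval()

# The model card wraps queries in a chat template; a bare prompt keeps this sketch short.
prompt = "Write a Python function to implement binary search together with docstrings and type hints."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# steps == max_new_tokens approximates one unmasked token per forward pass,
# matching the demo configuration described above (assumed argument names).
output = model.diffusion_generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=256,
    steps=256,
    temperature=0.3,
    return_dict_in_generate=True,
)
completion = output.sequences[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))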
Bi-directional context improves reasoning
Autoregressive models can be prompted to consider bidirectional context by providing both prefix and suffix in the prompt, but this remains a workaround rather than a native capability. Diffusion models, particularly diffusion large language models (d-LLMs), are designed from the ground up to condition on both past and future context during generation.
This design proves valuable for tasks requiring reversal reasoning, where coherence must hold in both directions, and for code generation, where a variable's usage downstream should inform its earlier definition. As shown by Nie et al. (2025), d-LLMs exhibit "consistent zero-shot performance across both forward and reversal tasks."
For developers, this translates into improved handling of structured logic, long-range dependencies, and code constraints that depend on order-sensitive relationships.
Flexibility in editing and refactoring
Because diffusion models gradually mask and unmask tokens at arbitrary positions, they are naturally suited to infilling. If you ask a diffusion model to rewrite a block of code with a different parameter or to refactor a loop into a comprehension, it can operate directly on the masked sections.
The distinction from autoregressive LLMs is subtle here, since both require the relevant code region to appear in the prompt. Where diffusion models add value is in integration with deterministic tooling such as IDEs. An IDE could highlight several problematic or incomplete regions, mask them, and allow the diffusion model to unmask and regenerate all affected parts in a single coherent pass. This distinguishes diffusion models from FIM-enabled autoregressive models, which can handle isolated infilling but struggle to maintain global consistency across multiple edits.
Example: coordinated multi-region updates
Consider adding a field to a class that must be initialised in the constructor, used in a method, and serialised elsewhere. Rather than orchestrating multiple FIM calls or regenerating entire methods, a diffusion model can mask the relevant locations and generate all necessary updates at once.
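As a rough illustration (the Order class, field names, and <MASK> notation below are hypothetical, not any specific model's input format), the masked source for such a change might look like this, with all three regions generated together:

# Hypothetical multi-region infilling prompt: the three <MASK> regions
# (initialise, use, and serialise the new field) are filled in one pass.
masked_source = '''
class Order:
    def __init__(self, order_id: str):
        self.order_id = order_id
        <MASK>                                      # initialise the new field

    def summary(self) -> str:
        return f"Order {self.order_id}, <MASK>"     # use the new field

    def to_dict(self) -> dict:
        return {"order_id": self.order_id, <MASK>}  # serialise the new field
'''
print(masked_source)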
This makes diffusion models well-suited to refactoring tasks where changes must satisfy global constraints, such as ensuring a new parameter appears consistently in a function signature, its documentation, call sites, and test cases.
For example, an IDE might flag a type mismatch in a function signature. Instead of regenerating the entire function, a diffusion model could unmask just the problematic parameter declaration and rewrite it to match the expected type, leaving the rest of the code untouched. This localised editing process mirrors how developers typically fix errors and refactor code incrementally.
Potential speed improvements
Autoregressive models operate sequentially, generating one token per forward pass. Diffusion models, by contrast, can produce multiple tokens in a single forward pass. Benchmarks reveal a current practical shortcoming: increasing the number of tokens per step often reduces quality. The underlying mechanism, however, allows faster inference and is likely to improve in future.
Researchers have proposed semi-autoregressive approaches to bridge the gap between autoregressive and diffusion-based generation, most notably Block Diffusion (Arriola et al., 2025). This method generates blocks of tokens from left to right while allowing diffusion models to unmask flexibly within each block. In principle, this allows reuse of the KV cache, which plays a key role in the efficiency of autoregressive models. In practice, however, unmasking too many tokens in parallel creates a trade-off. Throughput increases, but quality often drops sharply, especially if the KV cache is not reused carefully.
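The throughput side of that trade-off is easy to see with a toy calculation (the block size and tokens-per-step values below are illustrative, not taken from the paper):

# Toy arithmetic: forward passes needed to emit total_tokens when blocks are
# generated left to right and k masked tokens are unmasked per diffusion step.
def forward_passes(total_tokens: int, block_size: int, k: int) -> int:
    blocks = -(-total_tokens // block_size)       # ceiling division
    steps_per_block = -(-block_size // k)
    return blocks * steps_per_block

print(forward_passes(256, block_size=32, k=1))    # 256 passes, same as pure AR decoding
print(forward_passes(256, block_size=32, k=8))    # 32 passes, but quality may suffer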
Semi-autoregressive generation represents an intermediate step between autoregressive and truly out-of-order inference. Diffusion-based language models work fundamentally out of sequence, yet current methods still borrow ideas from autoregressive design, such as KV cache reuse, because the optimisation tools for autoregressive generation remain highly developed and effective. Ironically, these mature autoregressive mechanisms improve generation speed even as research moves towards models that can generate fully out of order.
Current limitations
For now, developers should temper expectations. Our internal experiments with the latest open-source models show that:
The best quality comes from unmasking one token per step, which slows generation down and, in practice, makes these models behave much like AR models.
Diffusion models can repeat prefixes or suffixes, or even output incoherent text when pushed too far.
Repetition: model re-outputs entire prefix blocks multiple times.
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

def factorial(n):  # repeated
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

def factorial(n):  # repeated again
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
Early termination: incomplete function bodies or truncated expressions.
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(  # truncated, no argument
Malformed syntax: unmatched brackets, dangling commas, or gibberish tokens.
def factorial(n):
    if (n == 0:
        return 1,
    else:
        return n ** factorial[n - 1))
Benchmarking current state-of-the-art d-LLMs, both open-source (DiffuCoder, Seed-Diffusion) and closed-source (Mercury, Gemini-Diffusion), shows mixed performance when compared against strong autoregressive baselines such as Qwen2.5-Coder. See Gong et al. (2025) and Song et al. (2025).
Despite these issues, diffusion models still introduce valuable new possibilities for code generation and editing. At the same time, their ecosystem is very immature compared to autoregressive LLMs.
Training and inference techniques that help mitigate sequential bottlenecks in LLMs, such as chunked prefill, speculative decoding, or prefix caching, have no direct equivalents yet for diffusion models.
Diffusion generation also requires defining the output length in advance, which often leads to inefficiency compared to the <eos> termination signal in autoregressive LLMs.
Finally, the limited number of open-source diffusion models for code makes it harder for developers to experiment with and refine these methods.
Where they can be useful today
Code completion with context editing: filling in missing parts of a function rather than only extending text.
Refactoring support: restructuring code blocks where order is less rigid.
Structured text tasks: mathematical reasoning or reversal problems where bi-directional context matters.
These niches give developers a reason to experiment, even if production-ready tools remain a little way off. Beyond these early applications, the major promise of d-LLMs lies in their potential for much faster generation, since they can produce N tokens per forward pass rather than one.
This capability could eventually reshape performance expectations for coding assistants once the quality-efficiency trade-offs are better understood.
Looking ahead
Diffusion models won't replace autoregressive models overnight. But they represent a new paradigm that better reflects how developers think and work. Their ability to edit flexibly, consider context in both directions, and potentially accelerate inference sets them apart.
For developers, the practical benefit is clear: snappier generation and better support for the non-linear, iterative way you actually write code.
As research continues, diffusion models could become the backbone of coding assistants that feel less like next-token generators and more like principled, structure-aware programming collaborators.
Alec welcomes Sakari Nahi, CEO of Zure, for a fun and thoughtful discussion that spans 25 years of tech evolution. Sakari shares how a single C# book jump-started his career, why he left a job he didn't love to found a cloud-native consultancy, and what it's like building a people-first engineering culture across multiple countries.
The two dig into real AI use cases that actually work today (vector search, customer service automation, field-tech knowledge retrieval) and explore how spec-driven development and tools like GitHub Copilot are transforming the way teams build software. They also get honest about shadow IT, geopolitics affecting cloud decisions, the future of Power Platform, and why AI feels "magical" even without AGI.
Whether you're a developer, leader, or just AI-curious, this episode is packed with relatable stories and practical perspectives.
Anthropic forecasts explosive growth and large profits, targeting $70 billion in revenue and $17 billion in positive cash flow by 2028. Market sentiment is volatile as big short positions and bank hedging plans collide with massive debt needs for AI infrastructure. Snap's $400 million Perplexity deal and Amazon's lawsuit over agent scraping foreshadow legal and distribution battles over shopping agents and data access.
The AI Daily Brief helps you understand the most important news and discussions in AI.