Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
147673 stories
·
32 followers

The road to better completions: Building a faster, smarter GitHub Copilot with a new custom model

1 Share

Code completion remains the most widely used GitHub Copilot feature, helping millions of developers stay in the flow every day. Our team has continuously iterated on the custom models powering the completions experience in GitHub Copilot driven by developer feedback. That work has had a big impact on giving you faster, more relevant suggestions in the editor.  

We’re now delivering suggestions with 20% more accepted and retained characters, 12% higher acceptance rate, 3x higher token-per-second throughput, and a 35% reduction in latency. 

These updates now power GitHub Copilot across editors and environments. We’d like to share our journey on how we trained and evaluated our custom model for code completions. 

Why it matters 

When Copilot completions improve, you spend less time editing and more time building. The original Copilot was optimized for the highest acceptance rate possible. However, we realized that a heavy focus on acceptance rates could lead to incorrectly favoring a high volume of simple and short suggestions.  

We heard your feedback that this didn’t reflect real developer needs or deliver the highest quality experience. So, we pivoted to also optimize for accepted and retained characters, code flow, and other metrics. 

  • 20% higher accepted-and-retained characters results in more of each Copilot suggestion staying in your final code, not just ending up temporarily accepted and deleted later. In other words, suggestions provide more value with fewer keystrokes.
  • 12% higher acceptance rate means you find suggestions more useful more often, reflecting better immediate utility. 
  • 3x throughput with 35% lower latency makes Copilot feel faster. It handles more requests at once while keeping your coding flow unbroken (throughput describes how much work the system can handle overall, while latency describes how quickly each individual request completes).

How we evaluate custom models 

Copilot models are evaluated using combined signals from offline, pre-production, and production evaluations. Each layer helps us refine different aspects of the experience while ensuring better quality in real developer workflows. 

1) Offline evaluations  

Execution-based benchmark: As part of our offline evaluations, we first test against internal and public repositories with strong code by unit test and scenario coverage, spanning all major languages. Each test simulates real tasks, accepts suggestions, and measures build-and-test pass rates. This emphasizes functional correctness over surface fluency.  

Below is an example of a partial token completion error: the model produced dataet instead of dataset.

Screenshot of a Python code editor showing a function named resolve_file inside a file called dataset_utilities.py. The function takes two string arguments, dataset and filename, and returns a string. The purpose, according to the docstring, is to resolve a file from a dataset and assert that only one file is found. The code uses os.path and glob to find files. There’s a highlighted line path = os.path.join(dat... with an error under dat, suggesting a variable name typo (dat instead of dataset). Several red underlines indicate syntax or reference errors in the code.

LLM-judge scoring: While we start with execution-based evaluation, this has downsides: it only tells if the code will compile, but the results are not always aligned with developer preferences. To ensure the best possible outcomes, we run an independent LLM to score completions across three axes:  

  • Quality: Ensure syntax validity, duplication/overlap, format and style consistency. 
  • Relevance: Focus on relevant code, avoid hallucination and overreach. 
  • Helpfulness: Reduce manual effort, avoid outdated or deprecated APIs. 

2) Pre-production evaluations: Qualitative dogfooding 

Our next step includes working with internal developers and partners to test models side-by-side in real workflows (to do the latter, we exposed the preview model to developers through Copilot’s model picker). We collect structured feedback on readability, trust, and “taste.” Part of this process includes working with language experts to improve overall completion quality. This is unique: while execution-based testing, LLM-based evaluations, dogfood testing, and A/B testing are common, we find language-specific evaluations lead to better outcomes along quality and style preferences. 

3) Production-based evaluations: A/B testing 

Ultimately, the lived experience of developers like you is what matters most. We measure improvements using accepted-and-retained characters, acceptance rates, completion-shown rate, time-to-first token, latency, and many other metrics. We ship only when statistically significant improvements hold up under real developer workloads. 

How we trained our new Copilot completions model 

Mid-training 

Modern codebases use modern APIs. Before fine-tuning, we build a code-specific foundational model via mid-training using a curated, de-duplicated corpus of modern, idiomatic, public, and internal code with nearly 10M repositories and 600-plus programming languages. (Mid-training refers to the stage after the base model has been pretrained on a very large, diverse corpus, but before it undergoes final fine-tuning or instruction-tuning). 

This is a critical step to ensure behaviors, new language syntax, and recent API versions are utilized by the model. We then use supervised fine- tuning and reinforcement learning while mixing objectives beyond next-token prediction—span infillings and docstring/function pairs—so the model learns structure, naming, and intent, not just next-token prediction. This helps us make the foundational model code-fluent, style-consistent, and context-aware, ready for more targeted fine-tuning via supervised fine-tuning. 

Supervised fine-tuning 

Newer general-purpose chat models perform well in natural language to generate code, but underperform on fill-in-the-middle (FIM) code completion. In practice, chat models experience cursor-misaligned inserts, duplication of code before the cursor (prefix), and overwrites of code after the cursor (suffix).  

As we moved to fine-tuned behaviors, we trained models specialized in completions by way of synthetic fine-tuning to behave like a great FIM engine. In practice, this improves: 

  • Prefix/suffix awareness: Accurate inserts between tokens, mid-line continuations, full line completions, and multi-line block completions without trampling the suffix. 
  • Formatting fidelity: Respect local style (indentation, imports, docstrings) and avoid prefix duplication. 

The result is significantly improved FIM performance. For example, here is a benchmark comparing our latest completions model to GPT-4.1-mini on OpenAI’s HumanEval Infilling Benchmarks.  

A chart showing HumanEval Infilling Benchmarks for two different AI models. These include a custom model from GitHub named Copilot Completions and OpenAI's GPT-4o-mini. The evaluations show superior performance across single line, multi line, random span, and random span light tests for the Copilot Completions model.

Reinforcement learning 

Finally, we used a custom reinforcement learning algorithm, teaching the model through rewards and penalties to internalize what makes code suggestions useful in real developer scenarios along three axes:

  • Quality: Syntax-valid, compilable code that follows project style (indentations, imports, headers).  
  • Relevance: On-task suggestions that respect surrounding context and the file’s intent.  
  • Helpfulness: Suggestions that reduce manual effort and prefer modern APIs.  

Together, these create completions that are correct, relevant, and genuinely useful at the cursor instead of being verbose or superficially helpful. 

What we learned 

After talking with programming language experts and finding success in our prompt-based approach, one of our most important lessons was adding related files like C++ header files to our training data. Beyond this, we also came away with three key learnings: 

  • Reward carefully: Early reinforcement learning version over-optimized for longer completions, adding too many comments in the form of “reward hacking.” To mitigate this problem, we introduced comment guardrails to keep completions concise and focused on moving the task forward while penalizing unnecessary commentary. 
  • Metrics matter: Being hyper-focused on a metric like acceptance rate can lead to experiences that look good on paper, but do not result in happy developers. That makes it critical to evaluate performance by monitoring multiple metrics with real-world impact.
  • Train for real-world usage: We align our synthetic fine-tuning data with real-world usage and adapt our training accordingly. This helps us identify problematic patterns and remove them via training to improve real-world outcomes.  

What’s next 

We’re continuing to push the frontier of Copilot completions by: 

  • Expanding into domain-specific slices (e.g., game engines, financial, ERP). 
  • Refining reward functions for build/test success, semantic usefulness (edits that advance the user’s intent without bloat), and API modernity preference for up-to-date, idiomatic libraries and patterns. This is helping us shape completion behavior with greater precision. 
  • Driving faster, cheaper, higher-quality completions across all developer environments.  

Experience faster, smarter code completions yourself. Try GitHub Copilot in VS Code > 

Acknowledgments 

First, a big shoutout to our developer community for continuing to give us feedback and push us to deliver the best possible experiences with GitHub Copilot. Moreover, a huge thanks to the researchers, engineers, product managers, designers across GitHub and Microsoft who curated the training data, built the training pipeline, evaluation suites, client and serving stack and to the GitHub Copilot product and engineering teams for smooth model releases. 

The post The road to better completions: Building a faster, smarter GitHub Copilot with a new custom model appeared first on The GitHub Blog.

Read the whole story
alvinashcraft
4 hours ago
reply
Pennsylvania, USA
Share this story
Delete

Announcing Sponsorship on NuGet.org

1 Share


Read the whole story
alvinashcraft
4 hours ago
reply
Pennsylvania, USA
Share this story
Delete

Azure Managed Redis at Ignite 2025: pre-day, session, and booth

1 Share

Microsoft Ignite 2025 is almost here! Many practitioners are surprised by the powerful new capabilities in Azure Managed Redis—and now is your chance to see them in action. Whether you are modernizing applications, accelerating AI workloads, or building next-generation agent architectures, Azure Managed Redis is your key to speed and scale. Don’t miss the chance to connect with experts from Microsoft and Redis at our pre-day workshop and general session at Ignite and learn how to:

  • Unlock high-performance caching for demanding workloads
  • Build a powerful memory layer for agentic applications
  • Leverage vector storage for Retrieval-Augmented Generation (RAG)
  • Optimize LLM costs with semantic caching

All in one fully-managed service—Azure Managed Redis.

Connect with Azure Managed Redis team at Ignite 2025

1. Ignite pre-day workshop: Build Internet-Scale AI Apps with Azure Managed Redis — Caching to Agents (in-person only)

When: Ignite pre-day on Monday, November 17, 2025, 1pm-5pm PT 
Where: Moscone Center, San Francisco
Registration: Add this optional in-person workshop in the Ignite registration →

Building AI applications and agents at internet scale requires more than speed — it demands unified memory, context, and scalability.

You’ll see live demos and learn how to build and scale intelligent applications using Azure Managed Redis for caching and modern AI workloads, architect your applications for performance, reliability, and scale with geo-replication, and migrate to Azure Managed Redis.

Seats are limited, please sign up today.

2. Breakout Session: Smarter AI Agents with Azure Managed Redis - BRK129 (in-person and online)

View session details on the Ignite website and save to your event favorites

Azure Managed Redis with Azure AI Foundry and Microsoft Agent Framework let developers build adaptive, context-aware AI systems. Redis handles real-time collaboration, persistent learning, and semantic routing, while the Agent Framework supports advanced reasoning and planning.

Integrated short- and long-term memory lets agents access relevant data directly, simplifying development and operations. Azure Managed Redis supports MCP for control and data plane tasks, enabling easy management and scaling of workloads and knowledge stores.

Join us to discover how to build scalable, multi-agent systems backed by the performance, reliability, and unified memory of Redis on Azure.

3. Visit the Azure Managed Redis booth in the Expo Hall

Have questions? Looking to talk architecture, migration, and supercharging your AI apps? 

Visit us in the Expert Meetup Zone to connect with the Microsoft and Redis product teams, engineers, and architects behind Azure Managed Redis. 

Prepare for Ignite
  1. Learn more about Microsoft Ignite
  2. Explore the Azure Managed Redis documentation.
  3. Try the hands-on workshop for Azure Managed Redis.
Read the whole story
alvinashcraft
4 hours ago
reply
Pennsylvania, USA
Share this story
Delete

Astro 5.15

1 Share
Astro 5.15 introduces Netlify skew protection, granular font preload filtering, and new adapter APIs for customizing fetch headers and assets.
Read the whole story
alvinashcraft
4 hours ago
reply
Pennsylvania, USA
Share this story
Delete

Considerations for Safe Agentic Browsing

1 Share
Earlier today, we launched a Preview of Actions in Edge, an experimental, opt-in agentic browser feature, available for testing and research purposes. Actions in Edge uses modern CUA (Computer-Using Agent) models to complete tasks for users in their browsers. We are excited about the many exciting and emergent possibilities this feature brings, but as a new technology, it introduces new potential attack vectors that we and the rest of the industry are taking on. We take very seriously our responsibility to keep our users safe on the web. This space is so new and uncharted that we cannot do that in isolation. Our goal with this preview is to explore these new waters with a small set of engaged users and researchers who have a clear understanding of the possibilities and the potential risks of agentic browsers. We have built a number of mitigations and are working closely with the AI and security research community to develop and test new approaches which we will be testing with our active community over the next few months. We welcome all input and feedback and will be actively engaged on our Discord channel here. Users of Actions in Edge feature should carefully review the risks and warnings in Edge before enabling the feature and be vigilant when browsing the web with it enabled.

“Ignore All Previous Instructions”: Prompt injection attacks

AI chatbots have been dealing with prompt injection attacks since their inception, with early attacks being more annoying than outright dangerous. But as AI Assistants have become more capable of doing things (with connectors, code generation, etc.), the risks have risen. Agentic browsers, by virtue of the additional power they bring and their access to the breadth of the open web, add more opportunities for attackers to take advantage of gaps and holes. This is not a theoretical concern: Researchers, including our own security teams, have already published proof-of-concept exploits that use prompt injection to take control of early agentic browsers. These concepts demonstrate that, without protections, attackers can easily craft content that steals users’ data or performs unintended transactions on their behalf.

Our approach to Prompt Injection attacks

The key to any protection strategy is defense-in-depth:
  • Untrusted input: We start by assuming that any input from an untrusted source may contain unsafe instructions.
  • Detect deviations: Prompt injections generally cause the model to do something different from what the user asked of it. Mitigations can be created to detect and block those deviations.
  • Limit access to sensitive data or dangerous actions: Simply put, if the model can’t get to something or do something bad, then the risks are lower.

Protecting from untrusted Input

This phase includes the most basic protection: limit where Copilot gets data from. In this preview we have implemented the following top-level site blocks to avoid known or risky sites.
  • Scoped to known sites by default – In the default “Balanced Mode” setting, Actions in Edge only allow access to a curated list of sites. Users can allow Actions to interact with other sites by approving them. Users can also configure “Strict Mode” in settings which overrides the curated allow list and gives them full control to approve every site Actions interacts with.
  • SmartScreen protection – Microsoft Defender SmartScreen detects and protects millions of Edge users every day from sites confirmed as scams, phishing, or malware. While Copilot is controlling the browser, suspicious or bad sites are blocked automatically by SmartScreen, and the agent is prevented from bypassing the block page.
For any site that Actions in Edge can access, the data from those sites is checked carefully at multiple stages, and marked as untrusted. The following mitigations are currently live or in testing.
  • Azure Prompt Shields mitigate attacks by analyzing whether data is malicious.
  • Built-in safety stack – Copilot is trained specifically to detect and report safety violations if malicious content tries to encourage violence or harmful behavior.
  • Spotlighting (in testing) Kiciman, et al, of Microsoft Research, described a technique called Spotlighting to better separate user instructions from grounding content (documents and web pages) so the model can better ignore injected commands without impacting efficacy. We will be testing Spotlighting with Actions in Edge and will report on its effectiveness.
Experienced security professionals will know that the ability to be responsive to novel attacks is as important as the security blocks themselves.
  • Real-time SmartScreen blocks – when new sites are confirmed as scams, phishing, or malware, the SmartScreen service can block them for all Edge users worldwide within minutes.
  • Global blocklist updates within hours – When new sites are identified as unsafe for Copilot to read, we can update the global blocklist within hours.

Detecting and blocking deviations from the task

Modern AI models, by design, take somewhat unpredictable paths to accomplish the tasks they are set. This can make it challenging to determine whether or not the model is doing what it was asked to do. In Actions for Edge, we add checks to detect hidden instructionstask drift, and suspicious context and to ask for confirmation when risk is higher. Examples include: [caption id="attachment_26077" align="aligncenter" width="422"]Screenshot of Edge Actions UI. A task is paused and the agent asks: "It looks like "en.wikipedia.org" might not be related to this action. Should I continue?" With options for the user to continue or cancel the action. Relevance checks confirm with the user when a site seems unrelated to the original task[/caption]
  • Relevance checks, shown above, give the user a chance to stop an action if a secondary model detects possible task drift.
  • High risk site prompts – when a context is detected to be sensitive (e.g. email, banking, health, sensitive topics) the model will stop and request permission to continue.
  • Task Tracker (in testing) - Paverd et al described a novel technique known as Task Tracker, which monitors activation deltas to detect when the model drifts from the user’s original intent after processing external data. We are integrating these techniques into our orchestration layer, validating their precision, and reducing false positives with the MAI Security team. We will report on progress here as well.

Limit access to sensitive data or dangerous actions

Finally, to mitigate the impact of any bypasses, when the model is running, the browser limits its access to sensitive data or dangerous actions. In this preview, we have disabled the ability for the model to use form fill data, including passwords. Other restrictions include (but are not limited to):
  • No interaction with edge:// pages (e.g., Settings) or UI outside a tab’s web content.
  • External app launches are blocked (protocol handlers).
  • Downloads are disabled.
  • No ability to open the file or directory selection dialog.
  • No access to data or apps outside of Edge
  • Context menus are disabled.
  • Tab audio is muted by default.
  • Site Permission changes are blocked (For example, if a site requests camera access permissions, the agent cannot grant that permission).
As we test and evaluate both the use cases that the community discovers and find valuable, and the security concerns, we will work to close off additional avenues of potential risk.

Closing

We’re keen to learn from your testing—what tasks you try, how Copilot performs, and what new risks you encounter—so we can make the experience safer and more useful. If you have feedback or questions, please share them in the preview channels.
Read the whole story
alvinashcraft
4 hours ago
reply
Pennsylvania, USA
Share this story
Delete

Announcing Windows 11 Insider Preview Build 27975 (Canary Channel)

1 Share
Hello Windows Insiders, today we are releasing Windows 11 Insider Preview Build 27975 to the Canary Channel.

What’s new in Build 27975

[General] 

  • This update includes a small set of general improvements and fixes that improve the overall experience for Insiders running this build on their PCs.

Fixes

[Input]

  • Fixed an issue impacting touch keyboard launch reliability in the latest Canary builds.

[Windows Hello]

  • Fixed an issue impacting certain devices which was causing the Windows Hello PIN to not work after upgrading to the latest Canary builds, until you set it up again.

[Settings]

  • Fixed an issue causing Settings to crash when accessing drive information under Settings > System > Storage. This also impacted accessing the drive information from the properties when you right clicked a drive in File Explorer.

Known issues

[Start menu]

  • Insiders with the new Start menu may experience it unexpectedly scrolling to the top.

[Power and Battery]

  • We’re investigating reports that sleep and shutdown aren’t working correctly for some Insiders after the latest Canary builds.

Reminders for Windows Insiders in the Canary Channel

  • The builds we release to the Canary Channel represent the latest platform changes early in the development cycle and should not be seen as matched to any specific release of Windows and features and experiences included in these builds may never get released as we try out different concepts and get feedback. Features may change over time, be removed, or replaced and never get released beyond Windows Insiders. Some of these features and experiences could show up in future Windows releases when they’re ready.
  • Many features in the Canary Channel are rolled out using Control Feature Rollout technology, starting with a subset of Insiders and ramping up over time as we monitor feedback to see how they land before pushing them out to everyone in this channel.
  • Some features may show up in the Dev and Beta Channels first before showing up in the Canary Channel.
  • Some features in active development we preview with Windows Insiders may not be fully localized and localization will happen over time as features are finalized. As you see issues with localization in your language, please report those issues to us via Feedback Hub.
  • To get off the Canary Channel, a clean install of Windows 11 will be required. As a reminder - Insiders can’t switch to a channel that is receiving builds with lower build numbers without doing a clean installation of Windows 11 due to technical setup requirements.
  • The desktop watermark shown at the lower right corner of the desktop is normal for these pre-release builds.
  • Check out Flight Hub for a complete look at what build is in which Insider channel.
Thanks, Amanda
Read the whole story
alvinashcraft
4 hours ago
reply
Pennsylvania, USA
Share this story
Delete
Next Page of Stories