Beyond Generative: The Rise Of Agentic AI And User-Centric Design

Agentic AI stands ready to transform customer experience and operational efficiency, necessitating a new strategic approach from leadership. This evolution in artificial intelligence empowers systems to plan, execute, and persist in tasks, moving beyond simple recommendations to proactive action. For UX teams, product managers, and executives, understanding this shift is crucial for unlocking opportunities in innovation, streamlining workflows, and redefining how technology serves people.

It’s easy to confuse Agentic AI with Robotic Process Automation (RPA), a technology that automates rules-based tasks on computers. The distinction lies in rigidity versus reasoning. RPA is excellent at following a strict script: if X happens, do Y. It mimics human hands. Agentic AI mimics human reasoning. It does not follow a linear script; it creates one.

Consider a recruiting workflow. An RPA bot can scan a resume and upload it to a database. It performs a repetitive task perfectly. An Agentic system looks at the resume, notices the candidate lists a specific certification, cross-references that with a new client requirement, and decides to draft a personalized outreach email highlighting that match. RPA executes a predefined plan; Agentic AI formulates the plan based on a goal. This autonomy separates agents from the predictive tools we have used for the last decade.

Another example is managing meeting conflicts. A predictive model integrated into your calendar might analyze your meeting schedule and the schedules of your colleagues. It could then suggest potential conflicts, such as two important meetings scheduled at the same time, or a meeting scheduled when a key participant is on vacation. It provides you with information and flags potential issues, but you are responsible for taking action.

An agentic AI, in the same scenario, would go beyond just suggesting conflicts to avoid. Upon identifying a conflict with a key participant, the agent could act by:

  • Checking the availability of all necessary participants.
  • Identifying alternative time slots that work for everyone.
  • Sending out proposed new meeting invitations to all attendees.
  • If the conflict is with an external participant, the agent could draft and send an email explaining the need to reschedule and offering alternative times.
  • Updating your calendar and the calendars of your colleagues with the new meeting details once confirmed.

This agentic AI understands the goal (resolving the meeting conflict), plans the steps (checking availability, finding alternatives, sending invites), executes those steps, and persists until the conflict is resolved, all with minimal direct user intervention. This demonstrates the “agentic” difference: the system takes proactive steps for the user, rather than just providing information to the user.

Agentic AI systems understand a goal, plan a series of steps to achieve it, execute those steps, and even adapt if things go wrong. Think of it like a proactive digital assistant. The underlying technology often combines large language models (LLMs) for understanding and reasoning, with planning algorithms that break down complex tasks into manageable actions. These agents can interact with various tools, APIs, and even other AI models to accomplish their objectives, and critically, they can maintain a persistent state, meaning they remember previous actions and continue working towards a goal over time. This makes them fundamentally different from typical generative AI, which usually completes a single request and then resets.

A Simple Taxonomy of Agentic Behaviors

We can categorize agent behavior into four distinct modes of autonomy. While these often look like a progression, they function as independent operating modes. A user might trust an agent to act autonomously for scheduling, but keep it in “suggestion mode” for financial transactions.

We derived these levels by adapting industry standards for autonomous vehicles (SAE levels) to digital user experience contexts.

Observe-and-Suggest

The agent functions as a monitor. It analyzes data streams and flags anomalies or opportunities, but takes zero action.

Differentiation
Unlike the next level, the agent generates no complex plan. It points to a problem.

Example
A DevOps agent notices a server CPU spike and alerts the on-call engineer. It does not know how to fix it, nor does it attempt to, but it knows something is wrong.

Implications for design and oversight
At this level, design and oversight should prioritize clear, non-intrusive notifications and a well-defined process for users to act on suggestions. The focus is on empowering the user with timely and relevant information without taking control. UX practitioners should focus on making suggestions clear and easy to understand, while product managers need to ensure the system provides value without overwhelming the user.

Plan-and-Propose

The agent identifies a goal and generates a multi-step strategy to achieve it. It presents the full plan for human review.

Differentiation
The agent acts as a strategist. It does not execute; it waits for approval on the entire approach.

Example
The same DevOps agent notices the CPU spike, analyzes the logs, and proposes a remediation plan:

  1. Spin up two extra instances.
  2. Restart the load balancer.
  3. Archive old logs.

The human reviews the logic and clicks “Approve Plan”.

Implications for design and oversight
For agents that plan and propose, design must ensure the proposed plans are easily understandable and that users have intuitive ways to modify or reject them. Oversight is crucial in monitoring the quality of proposals and the agent’s planning logic. UX practitioners should design clear visualizations of the proposed plans, and product managers must establish clear review and approval workflows.

Act-with-Confirmation

The agent completes all preparation work and places the final action in a staged state. It effectively holds the door open, waiting for a nod.

Differentiation
This differs from “Plan-and-Propose” because the work is already done and staged. It reduces friction. The user confirms the outcome, not the strategy.

Example
A recruiting agent drafts five interview invitations, finds open times on calendars, and creates the calendar events. It presents a “Send All” button. The user provides the final authorization to trigger the external action.

Implications for design and oversight
When agents act with confirmation, the design should provide transparent and concise summaries of the intended action, clearly outlining potential consequences. Oversight needs to verify that the confirmation process is robust and that users are not being asked to blindly approve actions. UX practitioners should design confirmation prompts that are clear and provide all necessary information, and product managers should prioritize a robust audit trail for all confirmed actions.

Act-Autonomously

The agent executes tasks independently within defined boundaries.

Differentiation
The user reviews the history of actions, not the actions themselves.

Example
The recruiting agent sees a conflict, moves the interview to a backup slot, updates the candidate, and notifies the hiring manager. The human only sees a notification: Interview rescheduled to Tuesday.

Implications for design and oversight
For autonomous agents, the design needs to establish clear pre-approved boundaries and provide robust monitoring tools. Oversight requires continuous evaluation of the agent’s performance within these boundaries, along with robust logging, clear override mechanisms, and user-defined kill switches to maintain user control and trust. UX practitioners should focus on designing effective dashboards for monitoring autonomous agent behavior, and product managers must ensure clear governance and ethical guidelines are in place.

Let’s look at a real-world application in HR technology to see these modes in action. Consider an “Interview Coordination Agent” designed to handle the logistics of hiring.

  • In Suggest Mode
    The agent notices an interviewer is double-booked. It highlights the conflict on the recruiter’s dashboard: “Warning: Sarah is double-booked for the 2 PM interview.”
  • In Plan Mode
    The agent analyzes Sarah’s calendar and the candidate’s availability. It presents a solution: “I recommend moving the interview to Thursday at 10 AM. This requires moving Sarah’s 1:1 with her manager.” The recruiter reviews this logic.
  • In Confirmation Mode
    The agent drafts the emails to the candidate and the manager. It populates the calendar invites. The recruiter sees a summary: “Ready to reschedule to Thursday. Send updates?” The recruiter clicks “Confirm.”
  • In Autonomous Mode
    The agent handles the conflict instantly. It respects a pre-set rule: “Always prioritize candidate interviews over internal 1:1s.” It moves the meeting and sends the notifications. The recruiter sees a log entry: “Resolved schedule conflict for Candidate B.”

Research Primer: What To Research And How

Developing effective agentic AI demands a distinct research approach compared to traditional software or even generative AI. The autonomous nature of AI agents, their ability to make decisions, and their potential for proactive action necessitate specialized methodologies for understanding user expectations, mapping complex agent behaviors, and anticipating potential failures. The following research primer outlines key methods to measure and evaluate these unique aspects of agentic AI.

Mental-Model Interviews

These interviews uncover users’ preconceived notions about how an AI agent should behave. Instead of simply asking what users want, the focus is on understanding their internal models of the agent’s capabilities and limitations. We should avoid using the word “agent” with participants. It carries sci-fi baggage and is too easily confused with a human agent offering support or services. Instead, frame the discussion around “assistants” or “the system.”

We need to uncover where users draw the line between helpful automation and intrusive control.

  • Method: Ask users to describe, draw, or narrate their expected interactions with the agent in various hypothetical scenarios.
  • Key Probes (reflecting a variety of industries):
    • To understand the boundaries of desired automation and potential anxieties around over-automation, ask:
      • If your flight is canceled, what would you want the system to do automatically? What would worry you if it did that without your explicit instruction?
    • To explore the user’s understanding of the agent’s internal processes and necessary communication, ask:
      • Imagine a digital assistant is managing your smart home. If a package is delivered, what steps do you imagine it takes, and what information would you expect to receive?
    • To uncover expectations around control and consent within a multi-step process, ask:
      • If you ask your digital assistant to schedule a meeting, what steps do you envision it taking? At what points would you want to be consulted or given choices?
  • Benefits of the method: Reveals implicit assumptions, highlights areas where the agent’s planned behavior might diverge from user expectations, and informs the design of appropriate controls and feedback mechanisms.

Agent Journey Mapping

Similar to traditional user journey mapping, agent journey mapping specifically focuses on the anticipated actions and decision points of the AI agent itself, alongside the user’s interaction. This helps to proactively identify potential pitfalls.

  • Method: Create a visual map that outlines the various stages of an agent’s operation, from initiation to completion, including all potential actions, decisions, and interactions with external systems or users.
  • Key Elements to Map:
    • Agent Actions: What specific tasks or decisions does the agent perform?
    • Information Inputs/Outputs: What data does the agent need, and what information does it generate or communicate?
    • Decision Points: Where does the agent make choices, and what are the criteria for those choices?
    • User Interaction Points: Where does the user provide input, review, or approve actions?
    • Points of Failure: Crucially, identify specific instances where the agent could misinterpret instructions, make an incorrect decision, or interact with the wrong entity.
      • Examples: Incorrect recipient (e.g., sending sensitive information to the wrong person), overdraft (e.g., an automated payment exceeding available funds), misinterpretation of intent (e.g., booking a flight for the wrong date due to ambiguous language).
    • Recovery Paths: How can the agent or user recover from these failures? What mechanisms are in place for correction or intervention?
  • Benefits of the method: Provides a holistic view of the agent’s operational flow, uncovers hidden dependencies, and allows for the proactive design of safeguards, error handling, and user intervention points to prevent or mitigate negative outcomes.

Simulated Misbehavior Testing

This approach is designed to stress-test the system and observe user reactions when the AI agent fails or deviates from expectations. It’s about understanding trust repair and emotional responses in adverse situations.

  • Method: In controlled lab studies, deliberately introduce scenarios where the agent makes a mistake, misinterprets a command, or behaves unexpectedly.
  • Types of “Misbehavior” to Simulate:
    • Command Misinterpretation: The agent performs an action slightly different from what the user intended (e.g., ordering two items instead of one).
    • Information Overload/Underload: The agent provides too much irrelevant information or not enough critical details.
    • Unsolicited Action: The agent takes an action the user explicitly did not want or expect (e.g., buying stock without approval).
    • System Failure: The agent crashes, becomes unresponsive, or provides an error message.
    • Ethical Dilemmas: The agent makes a decision with ethical implications (e.g., prioritizing one task over another based on an unforeseen metric).
  • Observation Focus:
    • User Reactions: How do users react emotionally (frustration, anger, confusion, loss of trust)?
    • Recovery Attempts: What steps do users take to correct the agent’s behavior or undo its actions?
    • Trust Repair Mechanisms: Do the system’s built-in recovery or feedback mechanisms help restore trust? How do users want to be informed about errors?
    • Mental Model Shift: Does the misbehavior alter the user’s understanding of the agent’s capabilities or limitations?
  • Benefits of the method: Crucial for identifying design gaps related to error recovery, feedback, and user control. It provides insights into how resilient users are to agent failures and what is needed to maintain or rebuild trust, leading to more robust and forgiving agentic systems.

By integrating these research methodologies, UX practitioners can move beyond simply making agentic systems usable to making them trusted, controllable, and accountable, fostering a positive and productive relationship between users and their AI agents. Note that these aren’t the only methods relevant to exploring agentic AI effectively. Many other methods exist, but these are most accessible to practitioners in the near term. I’ve previously covered the Wizard of Oz method, a slightly more advanced method of concept testing, which is also a valuable tool for exploring agentic AI concepts.

Ethical Considerations In Research Methodology

When researching agentic AI, particularly when simulating misbehavior or errors, ethical considerations are essential. There are many publications focusing on ethical UX research, including an article I wrote for Smashing Magazine, these guidelines from the UX Design Institute, and this page from the Inclusive Design Toolkit.

Key Metrics For Agentic AI

You’ll need a comprehensive set of key metrics to effectively assess the performance and reliability of agentic AI systems. These metrics provide insights into user trust, system accuracy, and the overall user experience. By tracking these indicators, developers and designers can identify areas for improvement and ensure that AI agents operate safely and efficiently.

1. Intervention Rate
For autonomous agents, we measure success by silence. If an agent executes a task and the user does not intervene or reverse the action within a set window (e.g., 24 hours), we count that as acceptance. We track the Intervention Rate: how often does a human jump in to stop or correct the agent? A high intervention rate signals a misalignment in trust or logic.

2. Frequency of Unintended Actions per 1,000 Tasks
This critical metric quantifies the number of actions performed by the AI agent that were not desired or expected by the user, normalized per 1,000 completed tasks. A low frequency of unintended actions signifies a well-aligned AI that accurately interprets user intent and operates within defined boundaries. This metric is closely tied to the AI’s understanding of context, its ability to disambiguate commands, and the robustness of its safety protocols.

3. Rollback or Undo Rates
This metric tracks how often users need to reverse or undo an action performed by the AI. High rollback rates suggest that the AI is making frequent errors, misinterpreting instructions, or acting in ways that are not aligned with user expectations. Analyzing the reasons behind these rollbacks can provide valuable feedback for improving the AI’s algorithms, understanding of user preferences, and its ability to predict desirable outcomes.

To understand why, you must implement a microsurvey on the undo action. For example, when a user reverses a scheduling change, a simple prompt can ask: “Wrong time? Wrong person? Or did you just want to do it yourself?”, allowing the user to click the option that best corresponds to their reasoning.

4. Time to Resolution After an Error
This metric measures the duration it takes for a user to correct an error made by the AI or for the AI system itself to recover from an erroneous state. A short time to resolution indicates an efficient and user-friendly error recovery process, which can mitigate user frustration and maintain productivity. This includes the ease of identifying the error, the accessibility of undo or correction mechanisms, and the clarity of error messages provided by the AI.

Collecting these metrics requires instrumenting your system to track Agent Action IDs. Every distinct action the agent takes, such as proposing a schedule or booking a flight, must generate a unique ID that persists in the logs. To measure the Intervention Rate, we do not look for an immediate user reaction. We look for the absence of a counter-action within a defined window. If an Action ID is generated at 9:00 AM and no human user modifies or reverts that specific ID by 9:00 AM the next day, the system logically tags it as Accepted. This allows us to quantify success based on user silence rather than active confirmation.
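
As a rough illustration, a minimal sketch of this acceptance-by-silence logic in TypeScript might look like the following; the names (AgentAction, sweepForAcceptance, the 24-hour window) are assumptions for illustration rather than any particular product’s API.

    // Minimal sketch: tag agent actions as Accepted when no counter-action
    // arrives within the review window. Names and storage are illustrative only.
    interface AgentAction {
      actionId: string;          // unique Agent Action ID persisted in the logs
      executedAt: Date;
      status: "pending" | "accepted" | "intervened";
    }

    const REVIEW_WINDOW_MS = 24 * 60 * 60 * 1000; // e.g., a 24-hour window

    function recordIntervention(action: AgentAction): void {
      action.status = "intervened"; // a human modified or reverted this action
    }

    function sweepForAcceptance(actions: AgentAction[], now: Date): void {
      for (const action of actions) {
        const age = now.getTime() - action.executedAt.getTime();
        if (action.status === "pending" && age >= REVIEW_WINDOW_MS) {
          action.status = "accepted"; // silence within the window counts as acceptance
        }
      }
    }

    function interventionRate(actions: AgentAction[]): number {
      const resolved = actions.filter(a => a.status !== "pending");
      if (resolved.length === 0) return 0;
      return resolved.filter(a => a.status === "intervened").length / resolved.length;
    }

A nightly job could run sweepForAcceptance and report interventionRate per agent, surfacing the trust or logic misalignment described above.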

For Rollback Rates, raw counts are insufficient because they lack context. To capture the underlying reason, you must implement intercept logic on your application’s Undo or Revert functions. When a user reverses an agent-initiated action, trigger a lightweight microsurvey. This can be a simple three-option modal asking the user to categorize the error as factually incorrect, lacking context, or a simple preference to handle the task manually. This combines quantitative telemetry with qualitative insight. It enables engineering teams to distinguish between a broken algorithm and a user preference mismatch.
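
Here is a minimal sketch of that intercept logic, again in TypeScript and assuming two hypothetical application functions, undoAgentAction and showMicrosurvey:

    // Sketch only: intercept the Undo of an agent-initiated action and ask why.
    type RollbackReason = "factually-incorrect" | "lacked-context" | "prefer-manual";

    async function onUndoClicked(
      actionId: string,
      wasAgentInitiated: boolean,
      undoAgentAction: (id: string) => Promise<void>,
      showMicrosurvey: (options: RollbackReason[]) => Promise<RollbackReason | null>
    ): Promise<void> {
      await undoAgentAction(actionId); // never block the undo itself
      if (!wasAgentInitiated) return;  // only survey rollbacks of agent actions

      // Lightweight three-option modal; the user can dismiss it without answering.
      const reason = await showMicrosurvey([
        "factually-incorrect",
        "lacked-context",
        "prefer-manual",
      ]);
      if (reason !== null) {
        console.log(JSON.stringify({ event: "agent_rollback", actionId, reason }));
      }
    }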

These metrics, when tracked consistently and analyzed holistically, provide a robust framework for evaluating the performance of agentic AI systems, allowing for continuous improvement in control, consent, and accountability.

Designing Against Deception

As agents become increasingly capable, we face a new risk: Agentic Sludge. Traditional sludge creates friction that makes it hard to cancel a subscription or delete an account. Agentic sludge acts in reverse. It removes friction to a fault, making it too easy for a user to agree to an action that benefits the business rather than their own interests.

Consider an agent assisting with travel booking. Without clear guardrails, the system might prioritize a partner airline or a higher-margin hotel. It presents this choice as the optimal path. The user, trusting the system’s authority, accepts the recommendation without scrutiny. This creates a deceptive pattern where the system optimizes for revenue under the guise of convenience.

The Risk Of Falsely Imagined Competence

Deception may not stem from malicious intent. It often manifests in AI as Imagined Competence. Large Language Models frequently sound authoritative even when incorrect. They present a false booking confirmation or an inaccurate summary with the same confidence as a verified fact. Users may naturally trust this confident tone. This mismatch creates a dangerous gap between system capability and user expectations.

We must design specifically to bridge this gap. If an agent fails to complete a task, the interface must signal that failure clearly. If the system is unsure, it must express uncertainty rather than masking it with polished prose.

Transparency via Primitives

The antidote to both sludge and hallucination is provenance. Every autonomous action requires a specific metadata tag explaining the origin of the decision. Users need the ability to inspect the logic chain behind the result.

To achieve this, we must translate primitives into practical answers. In software engineering, primitives refer to the core units of information or actions an agent performs. To the engineer, this looks like an API call or a logic gate. To the user, it must appear as a clear explanation.

The design challenge lies in mapping these technical steps to human-readable rationales. If an agent recommends a specific flight, the user needs to know why. The interface cannot hide behind a generic suggestion. It must expose the underlying primitive: Logic: Cheapest_Direct_Flight or Logic: Partner_Airline_Priority.

Figure 4 illustrates this translation flow. We take the raw system primitive — the actual code logic — and map it to a user-facing string. For instance, a primitive that checks a calendar to schedule a meeting becomes a clear statement: “I’ve proposed a 4 PM meeting.”
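
A minimal sketch of this mapping, using invented primitive names purely for illustration, could be a simple lookup from the system primitive to a user-facing rationale string:

    // Sketch: map raw decision primitives to human-readable rationales.
    // The primitive names here are illustrative, not from any specific system.
    const primitiveRationales: Record<string, string> = {
      Cheapest_Direct_Flight: "I chose this flight because it is the cheapest direct option.",
      Partner_Airline_Priority: "I prioritized a partner airline for this booking.",
      Calendar_Conflict_Resolution: "I've proposed a 4 PM meeting because it was the first slot everyone had free.",
    };

    function explainDecision(primitive: string): string {
      // Never hide behind a generic suggestion; if we cannot explain it, say so.
      return primitiveRationales[primitive]
        ?? `This action used logic (${primitive}) that has no plain-language explanation yet.`;
    }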

This level of transparency ensures the agent’s actions appear logical and beneficial. It allows the user to verify that the agent acted in their best interest. By exposing the primitives, we transform a black box into a glass box, ensuring users remain the final authority on their own digital lives.

Setting The Stage For Design

Building an agentic system requires a new level of psychological and behavioral understanding. It forces us to move beyond conventional usability testing and into the realm of trust, consent, and accountability. The research methods we’ve discussed, from probing mental models to simulating misbehavior and establishing new metrics, provide a necessary foundation. These practices are the essential tools for proactively identifying where an autonomous system might fail and, more importantly, how to repair the user-agent relationship when it does.

The shift to agentic AI is a redefinition of the user-system relationship. We are no longer designing for tools that simply respond to commands; we are designing for partners that act on our behalf. This changes the design imperative from efficiency and ease of use to transparency, predictability, and control.

When an AI can book a flight or trade a stock without a final click, the design of its “on-ramps” and “off-ramps” becomes paramount. It is our responsibility to ensure that users feel they are in the driver’s seat, even when they’ve handed over the wheel.

This new reality also elevates the role of the UX researcher. We become the custodians of user trust, working collaboratively with engineers and product managers to define and test the guardrails of an agent’s autonomy. Beyond being researchers, we become advocates for user control, transparency, and the ethical safeguards within the development process. By translating primitives into practical questions and simulating worst-case scenarios, we can build robust systems that are both powerful and safe.

This article has outlined the “what” and “why” of researching agentic AI. It has shown that our traditional toolkits are insufficient and that we must adopt new, forward-looking methodologies. The next article will build upon this foundation, providing the specific design patterns and organizational practices that make an agent’s utility transparent to users, ensuring they can harness the power of agentic AI with confidence and control. The future of UX is about making systems trustworthy.

For additional understanding of agentic AI, you can explore the following resources:



Windows 365 for Agents: The Cloud PC’s next chapter

A new chapter for Windows 365

In 2021, Microsoft introduced Windows 365, reimagining the PC as a cloud service that streams a Cloud PC—a complete, secure, personalized Windows experience—to any device, anywhere. This innovation gave organizations the flexibility to scale computing resources instantly, reduce IT complexity and strengthen security—all while empowering employees to work from virtually anywhere. In just four years, Windows 365 has become Microsoft’s flagship software-as-a-service (SaaS) solution for delivering secure, managed, enterprise-grade computing globally, helping businesses lower costs, simplify management and accelerate productivity.

And now, Windows 365 is enabling another milestone in computing: the recently announced Windows 365 for Agents makes it possible to run autonomous AI agents securely on Cloud PCs. This means organizations can automate complex workflows, scale operations without adding headcount and unlock new productivity gains—all while maintaining enterprise-grade security and compliance. By extending the same trusted environment from human users to agent workloads, businesses can accelerate innovation and improve employee productivity.

Agentic interfaces are becoming part of the PC experience, and people will use software (agents) to control software. This modern abstraction allows people to be more productive by spending more time on value-adding tasks and delegating non-value-adding tasks to agents.

The architecture that sets Windows 365 apart

Windows 365 is built on a set of proven, enterprise-grade capabilities that form the core of the service. Its “hosted on behalf of” (HOBO) architecture uses single instance Azure virtual machines that run in Microsoft’s subscription, are managed through Microsoft Intune, secured with Microsoft Entra ID and connected via reverse connect transport. These components bring together Microsoft’s most trusted technologies to provide a secure, reliable and scalable foundation for running Cloud PC workloads in the following key areas:
  • Identity and security: Microsoft Entra ID handles strong authentication, including passwordless and phish resistant MFA, with Conditional Access policies enforcing location-based restrictions, sign-in risk management and device compliance. Cloud PCs support both Entra join (cloud-native) and hybrid join to on-premises Active Directory Domain Services (AD DS).
  • Unified management: All Cloud PCs can be enrolled in Microsoft Intune, where administrators define provisioning policies, deploy applications, configure settings and enforce security baselines—using the same console and workflows as physical devices.
  • Cloud PC provisioning: Our service fabric automatically provisions, scales and manages Cloud PCs at a global scale, with the simple trigger of license assignment and provisioning policy definition—thereby eliminating any cloud infrastructure management needed by our customers.
  • Global connectivity: User connections never reach Cloud PCs directly over the internet. Both the client device and Cloud PC establish outbound connections to the Microsoft Cloud, eliminating inbound ports entirely. Our intelligent routing algorithms direct traffic to the lowest-latency gateway. We also use industry standard techniques like STUN and TURN to maintain fast, reliable connectivity even in restrictive network environments.

Extending Windows 365 to agents

With computer-using agents (CUAs) emerging as a class of AI capability, we recognized a key requirement: AI agents should operate in their own secure computing environments to execute tasks, interact with enterprise systems and line-of-business applications, and operate within security boundaries, without burdening human users by sharing environments. An AI agent interacting with a GUI requires the same fundamental resources as any user—compute, network, identity and policy controls. Rather than building a separate virtualization stack, Windows 365 for Agents runs on identical Azure VM infrastructure with the same Intune management and Entra identity systems.

Beyond creating a Cloud PC platform for AI agent workloads, Windows 365 for Agents introduces a set of capabilities designed to make agent workloads secure, scalable and cost-efficient. These enhancements go beyond simply running AI agents on Cloud PCs—they optimize how agents are provisioned, managed and controlled, while maintaining enterprise-grade security and compliance. From elastic resource pools to human-in-the-loop safeguards, these innovations help organizations automate complex tasks, reduce idle costs and ensure trust in autonomous operations.
  • Cloud PC pools: Rather than persistent 1-to-1 user assignment, agents draw from shared pools organized by team or workload. From pre-provisioned Cloud PCs for fast checkout to scheduled provisioning to reduce idle costs, elastic scaling allows organizations to dynamically adjust the resources available to agents to match business needs.
  • Check-in/check-out model: Agents check out a Cloud PC to perform a task, then check it back in for reuse. This ephemeral, task-scoped approach maximizes utilization and enables consumptive billing based on actual usage rather than fixed monthly fees (a conceptual sketch of this lifecycle appears after this list).
  • Programmatic interfaces for agent control: The Windows 365 for Agents interfaces to create, check out, control and observe Cloud PCs will be available to third-party agent builders in the Agent 365 tooling servers.
  • Computer-using agents (CUAs): Unlike traditional robotic process automation (RPA), which relies on brittle element selectors (rules that break when a UI changes), CUAs interpret screen content visually using AI vision and reason about what actions to take. They adapt when UIs change without breaking workflows—processing screenshots, generating action plans and executing step-by-step commands. Code execution, as well as local MCP servers within this same environment, makes for a powerful combination of capabilities in an isolated Cloud PC.
  • Human-in-the-loop: Recognizing the need for trust in autonomous systems, the platform enables the user to take control at any point during agent execution, intervene to handle complex decisions or provide credentials, then return control to the agent when finished.
  • Agent Identity: Each agent operates with a unique Microsoft Entra Agent ID authenticated via cryptographic credentials—no passwords to steal or phish. IT can distinguish agent actions from human actions in audit logs, providing granular observability of AI operations.
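
To make the check-out/check-in lifecycle concrete, here is a purely illustrative TypeScript sketch. Every name in it (AgentCloudPcPool, checkOut, runTask, release) is a hypothetical stand-in rather than the actual Windows 365 for Agents interface, which the article says will be exposed through the Agent 365 tooling servers; the point is only to show the flow of checking out a pooled Cloud PC, running a task, handing control to a human when needed, and checking the Cloud PC back in.

    // Hypothetical client for illustration only; not the real Windows 365 for Agents API.
    interface CloudPcSession {
      cloudPcId: string;
      runTask(goal: string): Promise<"done" | "needs-human">;
      waitForHumanHandback(): Promise<void>;  // human-in-the-loop: pause until control returns
      release(): Promise<void>;               // check the Cloud PC back into the pool
    }

    interface AgentCloudPcPool {
      checkOut(workload: string): Promise<CloudPcSession>;  // draw from a shared, team-scoped pool
    }

    // Ephemeral, task-scoped usage: check out, do the work, always check back in.
    async function runAgentTask(pool: AgentCloudPcPool, goal: string): Promise<void> {
      const session = await pool.checkOut("recruiting-team");
      try {
        const result = await session.runTask(goal);
        if (result === "needs-human") {
          // A person takes over for a sensitive step (for example, entering credentials),
          // then hands control back to the agent.
          await session.waitForHumanHandback();
          await session.runTask(goal);
        }
      } finally {
        await session.release(); // return the Cloud PC to the pool; consumptive billing stops
      }
    }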

Windows as the platform for intelligent work

Extending Cloud PCs to digital agents reflects Microsoft's mission: empowering every person and every organization to achieve more. Just as Windows democratized personal computing and Windows 365 brought that power to the cloud, Windows 365 for Agents delivers a secure, scalable platform for digital agents to operate anytime, anywhere. We're at the threshold of a new era—where agents, built responsibly and deployed securely, become trusted collaborators in work and creativity. The Windows platform has always enabled others to build, create and innovate. Windows 365 for Agents extends that promise by giving agent builders:
  • Enterprise-grade security and compliance for AI agents
  • Programmatic tools to simplify building sophisticated workflows
  • Management capabilities that give IT teams confidence and control
  • Natural, trustworthy user experiences for streamlined integration
The principles that shaped Windows 365—security, reliability, manageability and scale—continue to guide us. The same infrastructure serving millions of human users now becomes the foundation for the next generation of intelligent work.

Ready to learn more?

Editor’s note – Jan. 22, 2026 – Text changes were made for clarity following initial publication.

Announcing winapp, the Windows App Development CLI

We are excited to announce the public preview of the Windows App Development CLI (winapp), a new open-source command-line tool designed to simplify the development lifecycle for Windows applications across a wide range of frameworks and toolchains. The winapp CLI is specifically tailored for cross-platform frameworks and developers working outside of Visual Studio or MSBuild. Whether you are a web developer building with Electron, a C++ veteran using CMake, or a .NET, Rust or Dart developer building apps for Windows, the CLI can streamline the complexities of Windows development - from setting up your environment to packaging for distribution. This makes it significantly easier to access modern APIs – including Windows AI APIs, security features and shell integrations – directly from any toolchain.

Windows development often involves managing multiple SDKs, creating and editing multiple manifests, generating certificates and navigating intricate packaging requirements. The goal of this project is to unify these tasks into a single CLI, letting you focus on building great apps rather than fighting with configuration.

While the CLI is still in its early days, and there are many Windows development scenarios still in the works, we’re sharing this public preview now to learn from real usage, gather feedback and feature requests, and focus our investments on the areas that matter most to developers. Here is a look at what the winapp CLI can do for you.

🛠️ One-command environment setup

The init command bootstraps your entire workspace. It downloads the necessary SDK packages, generates projections (C++/WinRT to start), and configures your project for development. A process that previously required multiple error-prone manual steps is now a single CLI command that handles manifest and asset creation, certificate generation and dependency management, saving you from manually setting up your dev environment per project. To get started, in the root of your project, run:

> winapp init

For projects shared across multiple machines or developers, use winapp restore to recreate the exact environment state defined in your configuration. And for CI/CD environments, use the GitHub and Azure DevOps action to automatically install the CLI.

🚀 Package Identity for debugging

Many modern Windows APIs (like Windows AI APIs, Security, Notifications, MCP Hosts and more) require your application to have Package Identity. Traditionally, this meant you had to fully package and install your app just to test a single feature, requiring multiple manual and detailed configuration steps and slowing down your inner loop significantly. With the winapp CLI, you can add package identity to your executable with a single command, allowing you to continue using the same developer loop for testing and debugging any code requiring Package Identity. Simply run:

> winapp create-debug-identity my-app.exe

Visit our samples and guides for snippets on how to integrate this command into different toolchains for an improved debugging experience.

📜 Working with manifests and certificates

Creating a valid appxmanifest.xml and setting up a trusted development certificate are often stumbling blocks for new Windows developers. The winapp CLI automates this entirely with the init command as described above, but it also exposes commands to directly create and manage manifests and development certificates. Generate a new manifest based on your project or executable, update image assets in your existing appxmanifest.xml from an existing logo, or create and install a self-signed development certificate with a single command. The CLI can optionally install the development certificate locally so you can test your packages without additional configuration. For example, use this command to update all image assets referenced in your appxmanifest.xml from a provided image in the correct aspect ratios:

> winapp manifest update-assets C:\images\my-logo.png

Likewise, to generate a development certificate that can be used for self-signing while sideloading and testing, you can use the following command:

> winapp cert generate

📦 Simplified MSIX packaging

When you are ready to ship, packaging your application as an MSIX is just one command away. The CLI handles the packing and signing process, producing a store-ready or sideload-ready package from your build output.

> winapp pack ./my-app-files --cert ./devcert.pfx

⚡ Electron integration

For Electron developers, we have packaged the CLI as an npm package and added commands to bridge the gap between Node.js and native Windows code. The CLI can scaffold C++ or C# native addons, pre-configured to access the Windows App SDK and Windows SDK. This makes it easier than ever to integrate high-performance native features or AI capabilities like Phi Silica directly into your Electron app.

We also simplify the debugging loop. With winapp node add-electron-debug-identity, you can inject Package Identity directly into your running Electron process. This allows you to test and debug APIs that require identity (like the Windows AI APIs) just by calling npm start. It even handles bootstrapping the Windows App SDK for you, so you can focus on your code, not the plumbing. Watch the video below to see all of this in action: https://www.youtube.com/watch?v=WsUaymVnLGY

In addition, to help validate the CLI and to simplify usage of certain APIs, we have started to leverage the CLI to build experimental Node.js projections for APIs such as LanguageModel. Check out our @microsoft/winapp-windows-ai npm package for using Windows AI APIs directly from Node.js.

Get started today

The Windows App Development CLI is available now in public preview. Visit our GitHub repository for documentation, guides and to file issues. We would love to hear your feedback! To get started:

Install via WinGet (for general use): winget install microsoft.winappcli

Install via npm (for Electron projects): npm install --save-dev @microsoft/winappcli

Check out our Electron, .NET, C++/CMake, or Rust guides for getting started quickly. Happy coding!

Build an agent into any app with the GitHub Copilot SDK

Building agentic workflows from scratch is hard. 

You have to manage context across turns, orchestrate tools and commands, route between models, integrate MCP servers, and think through permissions, safety boundaries, and failure modes. Even before you reach your actual product logic, you’ve already built a small platform. 

GitHub Copilot SDK (now in technical preview) removes that burden. It allows you to take the same Copilot agentic core that powers GitHub Copilot CLI and embed it in any application.  

This gives you programmatic access to the same production-tested execution loop that powers GitHub Copilot CLI. That means instead of wiring your own planner, tool loop, and runtime, you can embed that agentic loop directly into your application and build on top of it for any use case. 

You also get Copilot CLI’s support for multiple AI models, custom tool definitions, MCP server integration, GitHub authentication, and real-time streaming.

What’s new in GitHub Copilot CLI  

Copilot CLI lets you plan projects or features, modify files, run commands, use custom agents, delegate tasks to the cloud, and more, all without leaving your terminal. 

Since we first introduced it, we’ve been expanding Copilot’s agentic workflows so it: 

  • Works the way you do with persistent memory, infinite sessions, and intelligent compaction. 
  • Helps you think with explore, plan, and review workflows where you can choose which model you want at each step. 
  • Executes on your behalf with custom agents, agent skills, full MCP support, and async task delegation. 

How does the SDK build on top of Copilot CLI? 

The SDK takes the agentic power of Copilot CLI (the planning, tool use, and multi-turn execution loop) and makes it available in your favorite programming language. This makes it possible to integrate Copilot into any environment. You can build GUIs that use AI workflows, create personal tools that level up your productivity, or run custom internal agents in your enterprise workflows.  

Our teams have already used it to build things like: 

  • YouTube chapter generators 
  • Custom GUIs for their agents 
  • Speech-to-command workflows to run apps on their desktops 
  • Games where you can compete with AI 
  • Summarizing tools 
  • And more! 

Think of the Copilot SDK as an execution platform that lets you reuse the same agentic loop behind the Copilot CLI, while GitHub handles authentication, model management, MCP servers, custom agents, and chat sessions plus streaming. That means you are in control of what gets built on top of those building blocks.

Start building today! Visit the SDK repository to get started.


Written by

Mario Rodriguez leads the GitHub Product team as Chief Product Officer. His core identity is being a learner and his passion is creating developer tools—so much so that he has spent the last 20 years living that mission in leadership roles across Microsoft and GitHub. Mario most recently oversaw GitHub’s AI strategy and the GitHub Copilot product line, launching and growing Copilot across thousands of organizations and millions of users. Mario spends time outside of GitHub with his wife and two daughters. He also co-chairs and founded a charter school in an effort to progress education in rural regions of the United States.

How Mozilla builds now

Peter Rojas, Senior Vice President of New Products at Mozilla

Mozilla has always believed that technology should empower people.

That belief shaped the early web, when browsers were still new and the idea of an open internet felt fragile. Today, the technology is more powerful, more complex, and more opaque, but the responsibility is the same. The question isn’t whether technology can do more. It’s whether it helps people feel capable, informed, and in control.

As we build new products at Mozilla today, that question is where we start.

I joined Mozilla to lead New Products almost one year ago this week because this is one of the few places still willing to take that responsibility seriously. Not just in what we ship, but in how we decide what’s worth building in the first place — especially at a moment when AI, platforms, and business models are all shifting at once.

Our mission — and mine — is to find the next set of opportunities for Mozilla and help shape the internet that all of us want to see. 

Writing up to users

One of Mozilla’s longest-held principles is respect for the people who use our products. We assume users are thoughtful. We accept skepticism as a given (it forces product development rigor — more on that later). And we design accordingly.

That respect shows up not just in how we communicate, but in the kinds of systems we choose to build and the role we expect people to play in shaping them.

You can see this in the way we’re approaching New Products work across Mozilla today: Our current portfolio includes tools like Solo, which makes it easy for anyone to own their presence on the web; Tabstack, which helps developers enable agentic experiences; 0DIN, which pools the collective expertise of over 1400 researchers from around the globe to help identify and surface AI vulnerabilities; and an enterprise version of Firefox that treats the browser as critical infrastructure for modern work, not a data collection surface.

None of this is about making technology simpler than it is. It’s about making it legible. When people understand the systems they’re using, they can decide whether those systems are actually serving them.

Experimentation that respects people’s time

Mozilla experiments. A lot. But we try to do it without treating talent and attention as an unlimited resource. Building products that users love isn’t easy and requires us to embrace the uncertainty and ambiguity that comes with zero-to-one exploration. 

Every experiment should answer a real question. It should be bounded. And it should be clear to the people interacting with it what’s being tested and why. That discipline matters, especially now. When everything can be prototyped quickly, restraint becomes part of the craft.

Fewer bets, made deliberately. A willingness to stop when something isn’t working. And an understanding that experimentation doesn’t have to feel chaotic to be effective.

Creating space for more kinds of builders

Mozilla has always believed that who builds is just as important as what gets built. But let’s be honest: The current tech landscape often excludes a lot of brilliant people, simply because the system is focused on only rewarding certain kinds of outcomes. 

We want to unlock those meaningful ideas by making experimentation more practical for people with real-world perspectives. We’re focused on lowering the barriers to building — because we believe that making tech more inclusive isn’t just a nice-to-have, it’s how you build better products.

A practical expression of this approach

One expression of this philosophy is a new initiative we’ll be sharing more about soon: Mozilla Pioneers.

Pioneers isn’t an accelerator, and it isn’t a traditional residency. It’s a structured, time-limited way for experienced builders to work with Mozilla on early ideas without requiring them to put the rest of their lives on hold.

The structure is intentional. Pioneers is paid. It’s flexible. It’s hands-on. And it’s bounded. Participants work closely with Mozilla engineers, designers, and product leaders to explore ideas that could become real Mozilla products — or could simply clarify what shouldn’t be built.

Some of that work will move forward. Some won’t. Both outcomes are valuable. Pioneers exists because we believe that good ideas don’t only come from founders or full-time employees, and that meaningful contribution deserves real support.

Applications open Jan. 26. For anyone interested (and I hope that’s a lot of you) please follow us, share and apply. In the meantime, know that what’s ahead is just one more example of how we’re trying to build with intention.

Looking ahead

Mozilla doesn’t pretend to have all the answers. But we’re clear about our commitments.

As we build new products, programs, and systems, we’re choosing clarity over speed, boundaries over ambiguity, and trust that compounds over time instead of short-term gains.

The future of the internet won’t be shaped only by what technology can do — but by what its builders choose to prioritize. Mozilla intends to keep choosing people.

The post How Mozilla builds now appeared first on The Mozilla Blog.

Why enterprise AI breaks without metrics discipline

As AI gains popularity in the workplace across a variety of use cases, it promises long-term benefits for organizations, including faster product iterations, operational efficiency, customer support cost optimization, faster data research and a boost in overall employee productivity.

Technology companies are leading the pack in AI adoption, followed by banking, e-commerce, healthcare and insurance, to name a few. These companies are testing out various proofs of concept (POCs) to understand how AI can help support their business and productivity use cases.

While the testing of agentic AI systems is moving swiftly, many companies still struggle to see a consistent and trustworthy impact that justifies the investment. The issue isn’t primarily about model performance, the number of tokens available, or the infrastructure to scale the system. Rather, it is the more fundamental problem of how enterprise-level data definitions are set up to train these systems.

When data definitions are inconsistent across teams — for instance, teams across geographies using different definitions of net revenue, active users, or performance marketing expense — AI systems inherit that ambiguity and become unreliable and ineffective. Until this foundational problem is addressed, AI systems won’t gain adoption among users, because users won’t trust them.

What is an intelligent metrics layer?

At its core, an intelligent metrics layer is a foundational semantic system that standardizes the way metrics and their associated dimensions are defined, computed, aggregated, sliced, governed, and interpreted by humans and machines together. It’s a single source of truth that aligns leadership, analysts, business intelligence (BI) tools, and AI systems around consistent definitions and computation logic.

It differs from traditional data models in that the business context, computation, ownership, data governance, and validation checks are embedded into the metric itself. That makes it much easier for AI systems to train on and interpret metrics, and much simpler for analysts and BI systems to work from a single source of truth. This level of semantic clarity is what makes metrics durable and contributes to AI readiness.

Why AI adoption breaks without it

Garbage in, garbage out. AI systems love clean data abstraction. When metrics lack consistent definitions across the company, the model may return conflicting answers to the same question.

For teams like finance that depend on highly accurate reporting data, accuracy and consistency are of prime importance. Financial reporting becomes brittle when metric definitions and underlying data change without traceability. Inconsistency in reporting across regions or currencies can create regulatory risk for a company, ending in hefty fines, not to mention a tarnished brand image and reduced confidence among shareholders.

AI falls short when asked to reason about a business that can’t clearly define its core data abstractions.

Intelligent metrics layer architecture

An intelligent metric layer typically consists of these interconnected core components:

  • Semantic definitions: Standardized business definitions independent of underlying data sources.
  • Computation and logic layer: Git version-controlled computation logic for calculating each metric.
  • Governance and ownership: Clearly defined team accountability for metrics and data refresh service-level agreements (SLAs), central policy defining data retention and deprecation.
  • Lineage and metadata: Visibility into upstream data sources and downstream metric usage in reporting.
  • AI enablement: Structured metadata that AI systems can train on and reference to output consistent answers.

Together, these synchronized components turn data that would otherwise produce unreliable outputs into vetted and trustworthy metrics.
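
To make this concrete, here is a small, hypothetical example of what a single metric definition in such a layer might carry. The field names and the net_revenue example are illustrative assumptions rather than any specific product’s schema:

    // Illustrative sketch of a semantic metric definition with governance metadata.
    interface MetricDefinition {
      name: string;
      description: string;          // business context, independent of data sources
      computation: string;          // version-controlled logic (e.g., SQL) for the calculation
      dimensions: string[];         // how the metric may be sliced
      owner: string;                // team accountable for the metric
      refreshSlaHours: number;      // data refresh SLA
      upstreamSources: string[];    // lineage: where the data comes from
      allowedUses: string[];        // guardrails an AI agent can check before answering
    }

    const netRevenue: MetricDefinition = {
      name: "net_revenue",
      description: "Gross revenue minus refunds, discounts and chargebacks, in USD.",
      computation: "SUM(gross_revenue) - SUM(refunds) - SUM(discounts) - SUM(chargebacks)",
      dimensions: ["region", "product_line", "fiscal_quarter"],
      owner: "finance-data-team",
      refreshSlaHours: 24,
      upstreamSources: ["billing.invoices", "billing.adjustments"],
      allowedUses: ["financial_reporting", "executive_dashboards"],
    };

Because the definition, ownership and guardrails travel with the metric, an AI agent or BI tool can answer "what is net revenue by region?" from the same vetted logic analysts use.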

Impact on the Enterprise

Organizations that invest in intelligent metric layers as a foundation for AI systems are bound to see tangible outcomes relatively quickly. These include faster turnaround time for report generation and analytics adoption, faster A/B testing and product iterations, reliable and audit-ready regulatory reporting, fewer escalations to leadership over inconsistent numbers, and higher overall trust as AI-generated interpretations lead to deeper dives into the data.

With a robust semantic layer, metrics become durable assets rather than fragile virtual datasets and queries embedded in dashboards. Agentic AI systems are contained within well-defined semantic boundaries, mitigating the risk of misinformation while bolstering organizational trust in the data.

Making Metrics AI-Ready

As analytics evolves beyond reactive dashboards, conversation-based AI agents rely on metrics that must be interpretable by machines. This goes beyond formulas to include clear context, structured definitions and constraints. Together, these define the relationships between metrics and dimensions, the contextual meaning of each metric, and guardrails around appropriate use.

When AI systems have clear guidelines for how metrics and context should be used, they can apply them far more effectively. Intelligent metric systems are therefore a prerequisite for agentic AI systems, conversational analytics and AI decision systems, and are vital for overall semantic alignment.

Implementation roadmap

Building an intelligent metric system doesn’t require re-engineering the data warehouse setup. Instead, start with a focused approach:

  • Build a data dictionary of all existing metrics and categorize them from most to least business-critical.
  • Standardize metric definitions, supported by corresponding upstream data models and define ownership for those.
  • Define vetted computational logic that is version-controlled and tracked through CI/CD.
  • Add data governance, data refresh SLAs and data quality checks.
  • Integrate metrics with BI and agentic AI tools incrementally.

The fundamental goal here is consistent progress in building the core foundation, not perfection on the first attempt.

Key takeaways

Enterprise AI adoption isn’t moving as rapidly as expected, not because companies lack access to the latest models, but because metric definitions are inconsistent. An intelligent metrics layer provides the semantic data foundation that AI systems need to deliver reliable and trustworthy information.

As organizations continue to move from dashboards to conversational analytics and automated decision systems, intelligent metric layers must serve as foundational infrastructure. This investment unlocks AI’s real value, not just through the market’s best-performing models but through a clear, consistent and shared understanding of the business’s key performance indicators.

The post Why enterprise AI breaks without metrics discipline appeared first on The New Stack.
