Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

New California law requires AI to tell you it’s AI


A bill attempting to regulate the ever-growing industry of companion AI chatbots is now law in California, as of October 13th.

California Gov. Gavin Newsom signed into law Senate Bill 243, billed as “first-in-the-nation AI chatbot safeguards” by state senator Steve Padilla. The new law requires that companion chatbot developers implement new safeguards — for instance, “if a reasonable person interacting with a companion chatbot would be misled to believe that the person is interacting with a human,” then the new law requires the chatbot maker to “issue a clear and conspicuous notification” that the product is strictly AI and not human.

Starting next year, the legislation would require some companion chatbot operators to make annual reports to the Office of Suicide Prevention about safeguards they’ve put in place “to detect, remove, and respond to instances of suicidal ideation by users,” and the Office would need to post such data on its website. 

“Emerging technology like chatbots and social media can inspire, educate, and connect – but without real guardrails, technology can also exploit, mislead, and endanger our kids,” Newsom said in a statement on signing the bill, along with several other pieces of legislation aimed at improving online safety for children, including new age-gating requirements for hardware. “We can continue to lead in AI and technology, but we must do it responsibly — protecting our children every step of the way. Our children’s safety is not for sale.”

The news comes after Governor Newsom officially signed Senate Bill 53, the landmark AI transparency bill that divided AI companies and made headlines for months, into law in California. 


Building a lasting security culture at Microsoft


At Microsoft, building a lasting security culture is more than a strategic priority—it is a call to action. Security begins and ends with people, which is why every employee plays a critical role in protecting both Microsoft and our customers. When secure practices are woven into how we think, work, and collaborate, individual actions come together to form a unified, proactive, and resilient defense.

Over the past year, we’ve made significant strides through the Secure Future Initiative (SFI), embedding security into every layer of our engineering practices. But just as critical has been our transformation in how we educate and engage our employees. We revamped our employee security training program to tackle advanced cyberthreats like AI-enabled attacks and deepfakes. We launched the Microsoft Security Academy to empower our employees with personalized learning paths that create a relevant experience. We’ve made security culture a company-wide imperative, reinforcing vigilance, embedding secure habits into everyday work, and achieving what technology alone cannot. It is more than a mindset shift; it’s a company-wide movement, led from the top and setting a new standard for the industry.

To help other organizations take similar steps, we are introducing two new guides—focused on identity protection and defending against AI-enabled attacks—that offer actionable insights and practical tools. These resources are designed to help organizations rethink their approach in order to move beyond 101-level content and build a culture of security that is resilient, adaptive, and people-powered. Because in cybersecurity, culture is more than a defense—it is the difference between reacting to cyberthreats and staying ahead of them.

Training for proactive security: Empowering employees in a new era of advanced threats

Security is the responsibility of every Microsoft employee, and we’ve taken deliberate steps to make that responsibility tangible and actionable. Over the past year, we’ve worked hard to reinforce a security-first mindset throughout every part of the company—from engineering and operations to customer support—ensuring that security is a shared responsibility at every level. Through redesigned training, personalized guidance, regular feedback loops, and role-specific expectations, we are fostering a culture where security awareness is both instinctive and mandatory.

As cyberattackers become increasingly sophisticated, using AI, deepfakes, and social engineering, so must the way we educate and empower employees. The security training team at Microsoft has overhauled its annual learning program to reflect this urgency. Our training is thoughtfully designed to be even more accessible and inclusive, built from empathy for all job roles and the work they do. This helps ensure that all employees, regardless of background or technical expertise, can fully engage with the content and apply it in meaningful ways. The result is a lasting security culture that employees not only embrace in their work but also carry into their personal lives.

To ensure our lasting security culture is rooted in real-world cyberthreats and tactics, we’ve continued to push our Security Foundations series to feature dynamic, threat-informed content and real-world scenarios. We’ve also updated training content on traditional topics like phishing and identity spoofing, as well as AI-enabled cyberattacks like deepfakes. All full-time employees and interns are required to complete three sessions annually (90 minutes total), with newly created content every year.

Security training must resonate both in the workplace and at home to create a lasting impact. That is why we equip employees with a self-assessment tool that delivers personalized, risk-based feedback on identity protection, along with tailored guidance to help safeguard their identities—both on the job and in their personal lives.

The ingredients for successful security training

At Microsoft, the success of our security training programs hinges on several crucial ingredients: fresh, risk-based content; collaboration with internal experts; and a relentless focus on relevance and employee satisfaction. Rather than recycling old material, we rebuild our training from the ground up each year, driven by the changing cyberthreat landscape—not just compliance requirements. Each annual program begins with a risk-based approach informed by an extensive listening network that includes internal experts in threat intelligence, incident response, enterprise risk, security risk, and more. Together, we identify the top cyberthreats where employee judgment and decision-making are essential to keeping Microsoft secure—and how those cyberthreats are evolving.

Take social engineering, for instance. This topic is a consistent inclusion in our training because around 80% of security incidents start with phishing or identity compromise. But we are not teaching phishing 101, as we expect our employees already have foundational awareness of this cyberthreat. Instead, we dive into emerging identity threats, real-world cyberattack scenarios, and examples of how cyberattackers are becoming more sophisticated and scaling faster than ever.

The impact we are making on the security culture at Microsoft is not by chance, nor is it anecdotal. The Education and Awareness team within the Office of the Chief Information Security Officer (OCISO) applies behavioral science, adult learning theory, and human-centered design to the development of every Security Foundations course. This ensures that training resonates, sticks, and empowers behavioral change. We also continually measure learner satisfaction and content relevancy, both of which have climbed significantly in recent years. We attribute this positive change to the continual innovation and evolution of our content and the increased attention we pay to the learning and cultural needs of our employees.

For example, the Security Foundations training series is consistently one of the highest-rated required employee training courses at Microsoft. Our post-training surveys tell a clear story: employees see themselves as active participants in keeping Microsoft secure. They feel confident identifying threats, know how to escalate issues, and consistently reinforce that security is a top priority across roles, regions, and teams.

This was one of the best Security Foundations that I’ve taken, well done! The emphasis on deepfake possible attacks was enlightening and surprising, I thought it was a great choice to actually deepfake [our actor] to show how real it sounds and show in real time what is possible to get that emphasis. The self-assessment was also great in terms of showing the areas that I need to work on and use more caution.

—Microsoft employee

Today, engagement with the Security Foundations training is strong, with 99% of employees completing each course. Learner satisfaction continues to climb, with the net satisfaction score rising from 144 in fiscal year (FY) 2023 to 170 today. Relevancy scores have followed a similar trend, increasing from 144 in FY 2023 to 169 today.1 These scores reflect that our employees view the security training content as timely, applicable, and actionable.

Microsoft leadership sets the tone

Our security culture change started at the top, with Chief Executive Officer (CEO) Satya Nadella mandating that security be the company’s top priority. His directive to employees is clear: when security and other priorities conflict, security must always take precedence. Chief People Officer (CPO) Kathleen Hogan reinforced this commitment in a company-wide memo, stating, “Everyone at Microsoft will have security as a Core Priority. When faced with a tradeoff, the answer is clear and simple: security above all else.”

The Security Core Priority continues to enhance employee training around security at Microsoft. As of December 2024, every employee had a defined Security Core Priority and discussed their individual impact during performance check-ins with their manager. Hogan explains that this isn’t a one-time pledge, but a non-negotiable, ongoing responsibility shared by every employee. “The Security Core Priority is not a check-the-box compliance exercise; it is a way for every employee and manager to commit to—and be accountable for—prioritizing security, and a way for us to codify your contributions and to recognize you for your impact,” she said. “We all must act with a security-first mindset, speak up, and proactively look for opportunities to ensure security in everything we do.”

This commitment is embedded in how Microsoft governs and operates at the highest levels. Over the past year, the senior leadership team at Microsoft has focused on evaluating the state of our security culture and identifying ways to strengthen it. Security performance is reviewed at weekly executive meetings with deep dives into each of the six pillars of our Secure Future Initiative. The Board of Directors receives regular updates, reinforcing the message that security is a board-level concern. We’ve also reinforced our commitment to security by directly linking leadership compensation to security outcomes—elevating security to the same level of importance as growth, innovation, and financial performance. By using executive compensation as an accountability mechanism tied to specific security performance metrics, we’ve driven measurable improvements, especially in areas like secret hygiene across our code repositories.

Reinforcing security culture through engagement and hiring

Security culture is not built in a single training session; it is sustained through continuous engagement and visible reinforcement. To keep security top-of-mind, Microsoft runs regular awareness campaigns that revisit core training concepts and share timely updates across the company. These campaigns span internal platforms like Microsoft SharePoint, Teams, Viva Engage, and global digital signage in offices. This creates a consistent drumbeat that embeds security into daily workflows through reminders that reinforce key behaviors.

Launching fall 2025, the global security ambassador program will activate a grassroots network of trusted advocates within teams and departments across organizations and geographies. With a goal of reaching at least 5% employee participation, these ambassadors will serve as local champions, helping amplify initiatives, offering peer-to-peer guidance, and relaying valuable feedback from the front lines. This approach not only sustains engagement but ensures Microsoft’s security strategy is informed by real-world insights from across the organization. As cyberattackers continue to grow more advanced, our employees must constantly learn and adapt. For this reason, security is a continuous journey that requires a culture of continuous improvement, where lessons from incidents are used to update policies and standards, and where employee feedback helps shape future training and engagement strategies.

Security culture is only as strong as the people who live it. That is why Microsoft is investing heavily in talent to scale its defenses through upskilling and hiring. Through the resulting increase in security engineers, we are making sure that every team, product, and customer benefits from the latest in security thinking and expertise.

Embedding security into engineering

The company leadership sets the vision, but real transformation happens when security is woven into our engineering. We are moving beyond simply applying security frameworks—reengineering how we design, test, and operate technology at scale. To drive this shift, we’ve aligned our engineering practices with the Protect Engineering Systems pillar of SFI, embedding security into every layer of development, from identity protection to threat detection. Our Microsoft Security Development Lifecycle (SDL), once published as a standalone methodology, is now deeply integrated into the Secure by Design pillar of SFI, ensuring security is part of the process, from the first line of code to final deployment.

We’ve embedded DevSecOps and shift-left strategies throughout our development lifecycle, backed by new governance models and accountability structures. Every engineering division now has a Deputy Chief Information Security Officer (CISO) responsible for embedding security into their workflows. These practices reduce costs, minimize disruption, and ultimately lead to more resilient products.

Under SFI, security is treated as a core attribute of product quality, innovation, and trust. And as Microsoft redefines how security is built into engineering, we are also transforming how it is lived. This means providing every employee with the awareness and agility needed to counter the most advanced cyberthreats.

Security culture as a matter of business trust

For Microsoft, a strong security culture helps us protect internal systems and uphold customer and partner trust. With a global presence, broad product footprint, and a customer base that spans nearly all industries, even a single security lapse can have wide-reaching implications. Embedding security into every layer of the company is both complex and essential—and involves more than just cutting-edge tools or isolated policies. Our security-first employee mindset views security not as a discrete function, but as something that informs every role, decision, and workflow. And while tools are indispensable in addressing technical cyberthreats, it is culture that ensures those tools are consistently applied, refined, and scaled across the organization.

Paving the road ahead for lasting security culture

The famous quote attributed to renowned management consultant Peter Drucker that “culture eats strategy for breakfast” holds especially true in cybersecurity. No matter how well-designed a security strategy may be, it can’t succeed without a culture that supports and sustains it. Ultimately, the formula for proactive security at Microsoft is built on three connected elements: people, process, and culture. And although we’ve made meaningful progress on all three fronts, the work is never finished. The cybersecurity landscape is constantly shifting, and with each new challenge comes an opportunity to adapt, improve, and lead.

The decision by Microsoft to treat security not as an isolated discipline, but as a foundational value—something that informs how products are built, how leaders are evaluated, and how employees across the company show up every day—is a core aspect of SFI. This initiative has already led to measurable improvements, including the appointment of Deputy CISOs across engineering divisions, the redesign of employee training to reflect AI-enabled threats, and the coming launch of grassroots programs like the global Security Ambassador program.

The Microsoft Secure Future Initiative is our commitment to building a lasting culture that embeds security into every decision, every product, and every employee mindset. We invite others to join us and transform how security is lived. Because in the current threat landscape, culture is not just a defense—it makes the difference.

Culture in practice: Tools to build a security-first mindset

To reinforce a security-first mindset across work and home, we’ve developed the following resources for our internal employees. We are also making them available for you to help drive the same commitment in your organization.

Microsoft Deputy CISOs

To hear more from Microsoft Deputy CISOs, check out the OCISO blog series.

To stay on top of important security industry updates, explore resources specifically designed for CISOs, and learn best practices for improving your organization’s security posture, join the Microsoft CISO Digest distribution list.


To learn more about Microsoft Security solutions, go to our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity) for the latest news and updates on cybersecurity.


1Microsoft internal data

The post Building a lasting security culture at Microsoft appeared first on Microsoft Security Blog.


How to build reliable AI workflows with agentic primitives and context engineering


Many developers begin their AI explorations with a prompt. Perhaps you started the same way: You opened GitHub Copilot, started asking questions in natural language, and hoped for a usable output. This approach can work for simple fixes and code suggestions, but as your needs get more complex—or as your work gets more collaborative—you’re going to need a more foolproof strategy. 

This guide will introduce you to a three-part framework that transforms this ad-hoc style of AI experimentation into a repeatable and reliable engineering practice. At its core are two concepts: agentic primitives, which are reusable, configurable building blocks that enable AI agents to work systematically; and context engineering, which ensures your AI agents always focus on the right information. By familiarizing yourself with these concepts, you’ll be able to build AI systems that can not only code independently, but do so reliably, predictably, and consistently.

[Figure: The AI-native development framework — spec-driven development and agent workflows at the top, context engineering (roles, rules, context, and memory) in the middle, and prompt engineering (role activation, context loading, tool invocation, and validation gates) at the base.]
Markdown prompt engineering + agent primitives + context engineering = reliability

Whether you’re new to AI-native development or looking to bring deeper reliability to your agent workflows, this guide will give you the foundation you need to build, scale, and share intelligent systems that learn and improve with every use.

What are agent primitives? 

The three-layer framework below turns ad-hoc AI experimentation into a reliable, repeatable process. It does this by combining the structure of Markdown; the power of agent primitives, simple building blocks that give your AI agents clear instructions and capabilities; and smart context management, so your agents always get the right information (not just more information). 

Layer 1: Use Markdown for more strategic prompt engineering

We’ve written about the importance of prompt engineering. But here’s what you need to know: The clearer, more precise, and more context-rich your prompt, the better and more accurate your outcome. This is where Markdown comes in. With Markdown’s structure (its headers, lists, and links), you can naturally guide AI’s reasoning, making outputs more predictable and consistent. 

To provide a strong foundation for your prompt engineering, try these techniques with Markdown as your guide: 

  • Context loading: [Review existing patterns](./src/patterns/). In this case, links become context injection points that pull in relevant information, either from files or websites.
  • Structured thinking: Use headers and bullets to create clear reasoning pathways for the AI to follow.
  • Role activation: Use phrases like “You are an expert [in this role].” This triggers specialized knowledge domains and will focus the AI’s responses.
  • Tool integration: Reference MCP tools by name (for example, “Use MCP tool `tool-name`”). This lets your AI agent run code in a controlled, repeatable, and predictable way on MCP servers.
  • Precise language: Eliminate ambiguity through specific instructions.
  • Validation gates: “Stop and get user approval.” Make sure there is always human oversight at critical decision points.

For example, instead of saying, Find and fix the bug, use the following:

You are an expert debugger, specialized in debugging complex programming issues.

You are particularly great at debugging this project, whose architecture and quirks can be consulted in the [architecture document](./docs/architecture.md). 

Follow these steps:

1. Review the [error logs](./logs/error.log) and identify the root cause. 

2. Use the `azmcp-monitor-log-query` MCP tool to retrieve infrastructure logs from Azure.  

3. Once you find the root cause, think about 3 potential solutions with trade-offs

4. Present your root cause analysis and suggested solutions with trade-offs to the user and seek validation before proceeding with fixes - do not change any files.

Once you’re comfortable with structured prompting, you’ll quickly realize that manually crafting perfect prompts for every task is unsustainable. (Who has the time?) This is where the second step comes in: turning your prompt engineering insights into reusable, configurable systems.

Layer 2: Agentic primitives: Deploying your new prompt engineering techniques

Now it’s time to implement all of your new strategies more systematically, instead of prompting ad hoc. These configurable tools will help you do just that.

Core agent primitives

When it comes to AI-native development, a core agent primitive refers to a simple, reusable file or module that provides a specific capability or rule for an agent. 

Here are some examples:

  • Instructions files: Deploy structured guidance through modular .instructions.md files with targeted scope. At GitHub, we offer custom instructions to give Copilot repository-specific guidance and preferences. 
  • Chat modes: Deploy role-based expertise through .chatmode.md files with MCP tool boundaries that prevent security breaches and cross-domain interference—think of professional licenses that keep architects from building and engineers from planning.
  • Agentic workflows: Deploy reusable prompts through .prompt.md files with built-in validation.
  • Specification files: Create implementation-ready blueprints through .spec.md files that ensure repeatable results, whether the work is done by a person or by AI.
  • Agent memory files: Preserve knowledge across sessions through .memory.md files (a minimal sketch appears just after this list).
  • Context helper files: Optimize information retrieval through .context.md files.
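
Because .memory.md files never get a full example later in this guide, here is a minimal, hypothetical sketch of one. The path, headings, and entries are assumptions about one reasonable shape, not a prescribed format:

<!-- .github/memory/project.memory.md — hypothetical path and structure -->
# Project Memory

## Decisions
- 2025-06-12: Adopted PostgreSQL over MongoDB for transactional consistency.
- 2025-07-03: All public APIs require JSDoc comments and unit tests.

## Known pitfalls
- The payments integration test is flaky; re-run it before assuming a regression.

## Open questions
- Revisit moving the reporting service to the event bus after Q3 load testing.

Pointing the agent at a file like this at the start of a session lets decisions made weeks ago survive the context resets that normally wipe them out.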

This transformation might seem complex, but notice the pattern: What started as an ad-hoc request became a systematic workflow with clear handoff points, automatic context loading, and built-in validation. 

When you use these files and modules, you can keep adjusting and improving how your AI agent works at every step. Every time you iterate, you make your agent a little more reliable and consistent. And this isn’t just random trial and error — you’re following a structured, repeatable approach that helps you get better and more predictable results every time you use the AI.

💡 Native VS Code support: While VS Code natively supports .instructions.md, .prompt.md, and .chatmode.md files, this framework takes things further with .spec.md, .memory.md, and .context.md patterns that unlock even more exciting possibilities for AI-powered software development.

With your prompts structured and your agentic primitives set up, you may encounter a new challenge: Even the best prompts and primitives can fail when they’re faced with irrelevant context or they’re competing for limited AI attention. The third layer, which we’ll get to next, addresses this through strategic context management.

Layer 3: Context engineering: Helping your AI agents focus on what matters

Just like people, LLMs have finite memory (context windows) and can sometimes be forgetful. If you can be strategic about the context you give them, you can help them focus on what’s relevant and enable them to get started and work more quickly. This helps them preserve valuable context window space and improve their reliability and effectiveness.

Here are some techniques to make sure they get the right context—this is called context engineering: 

  • Session splitting: Use distinct agent sessions for different development phases and tasks. For example, use one session for planning, one for implementation, and one for testing. If an agent has fresh context, it’ll have better focus. It’s always better to have a fresh context window for complex tasks. 
  • Modular and custom rules and instructions: Apply only relevant instructions through targeted .instructions.md files using applyTo YAML frontmatter syntax. This preserves context space for actual work and reduces irrelevant suggestions. 
  • Memory-driven development: Leverage agent memory through .memory.md files to maintain project knowledge and decisions across sessions and time.
  • Context optimization: Use .context.md context helper files strategically to accelerate information retrieval and reduce cognitive load (see the sketch just after this list). 
  • Cognitive focus optimization: Use chat modes in .chatmode.md files to keep the AI’s attention on relevant domains and prevent cross-domain interference. Less context pollution means you’ll have more consistent and accurate outputs. 
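
Context helper files are easy to picture with a small example. Here is a hedged sketch of what a .context.md file might contain—the file name, paths, and entries below are illustrative assumptions for a typical web project, not part of any framework spec:

<!-- docs/context/api-overview.context.md — hypothetical path and contents -->
# API Layer Overview (context helper)

Read this before touching the API layer; it replaces a long exploratory search.

- Entry point: src/api/server.ts (routes are registered in src/api/routes/)
- Auth: JWT middleware in src/api/middleware/auth.ts, issued by the identity service
- Errors: handlers throw ApiError; a global handler maps it to JSON responses
- Deeper reading: [architecture](../architecture.md), [API reference](../api.md)

A short, curated file like this gives the agent the orientation it needs in a few hundred tokens instead of forcing it to rediscover the project layout on every run.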

Agentic workflows: The complete system in action

Now that you understand all three layers, you can see how they combine into agentic workflows—complete, systematic processes where all of your agentic primitives are working together, understanding your prompts, and using only the context they need.  

These agentic workflows can be implemented as .prompt.md files that coordinate multiple agentic primitives into processes, designed to work whether executed locally in your IDE, in your terminal, or in your CI pipelines.

Tooling: how to scale agent primitives

Now that you understand the three-layer framework and that the agentic primitives are essentially executable software written in natural language, the question is: How can you scale these Markdown files beyond your individual development workflow?

Natural language as code

The answer mirrors every programming ecosystem’s evolution. Just like JavaScript evolved from browser scripts to using Node.js runtimes, package managers, and deployment tooling, agent primitives need similar infrastructure to reach their full potential.

This isn’t just a metaphor: These .prompt.md and .instructions.md files represent a genuine new form of software development that requires proper tooling infrastructure.

Here’s what we mean: Think of your agent primitives as real pieces of software, just written in natural language instead of code. They have all the same qualities: You can break complex tasks into smaller pieces (modularity), use the same instructions in multiple places (reusability), rely on other tools or files (dependencies), keep improving and updating them (evolution), and share them across teams (distribution).

That said, your natural language programs are going to need the same infrastructure support as any other software.  

Agent CLI runtimes

Most developers start by creating and running agent primitives directly in VS Code with GitHub Copilot, which is ideal for interactive development, debugging, and refining daily workflows. However, when you want to move beyond the editor—to automate your workflows, schedule them, or integrate them into larger systems—you need agent CLI runtimes like Copilot CLI.

These runtimes let you execute your agent primitives from the command line and tap into advanced model capabilities. This shift unlocks automation, scaling, and seamless integration into production environments, taking your natural language programs from personal tools to powerful, shareable solutions. 

Runtime management

While VS Code and GitHub Copilot handle individual development, some teams may want additional infrastructure for sharing, versioning, and productizing their agent primitives. Managing multiple Agent CLI runtimes can become complex quickly, with different installation procedures, configuration requirements, and compatibility matrices.

APM (Agent Package Manager) solves this by providing unified runtime management and package distribution. Instead of manually installing and configuring each vendor CLI, APM handles the complexity while preserving your existing VS Code workflow.

Here’s how runtime management works in practice:

# Install APM once
curl -sSL https://raw.githubusercontent.com/danielmeppiel/apm/main/install.sh | sh

# Optional: setup your GitHub PAT to use GitHub Copilot CLI
export GITHUB_COPILOT_PAT=your_token_here

# APM manages runtime installation for you
apm runtime setup copilot          # Installs GitHub Copilot CLI
apm runtime setup codex            # Installs OpenAI Codex CLI

# Install MCP dependencies (like npm install)
apm install

# Compile Agent Primitive files to Agents.md files
apm compile

# Run workflows against your chosen runtime
# This will trigger 'copilot -p security-review.prompt.md' command 
# Check the example apm.yml file a bit below in this guide
apm run copilot-sec-review --param pr_id=123

As you can see, your daily development stays exactly the same in VS Code, APM installs and configures runtimes automatically, your workflows run regardless of which runtime is installed, and the same apm run command works consistently across all runtimes.

Distribution and packaging

Agent primitives’ similarities to traditional software become most apparent when you get to the point of wanting to share them with your team or deploying them into production—when you start to require things like package management, dependency resolution, version control, and distribution mechanisms.

Here’s the challenge: You’ve built powerful agent primitives in VS Code and your team wants to use them, but distributing Markdown files and ensuring consistent MCP dependencies across different environments becomes unwieldy. You need the equivalent of npm for natural language programs.

APM provides this missing layer. It doesn’t replace your VS Code workflow—it extends it by creating distributable packages of agent primitives complete with dependencies, configuration, and runtime compatibility that teams can share, just like npm packages.

Package management in practice

# Initialize new APM project (like npm init)
apm init security-review-workflow

# Develop and test your workflow locally
cd security-review-workflow 
apm compile && apm install
apm run copilot-sec-review --param pr_id=123

# Package for distribution (future: apm publish)
# Share apm.yml and Agent Primitive files with team
# Team members can install and use your primitives
git clone your-workflow-repo
cd your-workflow-repo && apm compile && apm install
apm run copilot-sec-review --param pr_id=456

The benefits compound quickly: You can distribute tested workflows as versioned packages with dependencies, automatically resolve and install required MCP servers, track workflow evolution and maintain compatibility across updates, build on (and contribute to) shared libraries from the community, and ensure everyone’s running the same thing.

Project configuration

The following apm.yml configuration file serves as the package.json equivalent for agent primitives, defining scripts, dependencies, and input parameters:

# apm.yml - Project configuration (like package.json)
name: security-review-workflow
version: 1.2.0
description: Comprehensive security review process with GitHub integration

scripts:
  copilot-sec-review: "copilot --log-level all --log-dir copilot-logs --allow-all-tools -p security-review.prompt.md"
  codex-sec-review: "codex security-review.prompt.md"
  copilot-debug: "copilot --log-level all --log-dir copilot-logs --allow-all-tools -p security-review.prompt.md"
  
dependencies:
  mcp:
    - ghcr.io/github/github-mcp-server

With this, your agent primitives can now be packaged as distributable software with managed dependencies.

Production deployment

The final piece of the tooling ecosystem enables continuous AI: packaged agent primitives can now run automatically in the same CI/CD pipelines you use every day, bringing your carefully developed workflows into your production environment.

Using APM GitHub Action, and building on the security-review-workflow package example above, here’s how the same APM project deploys to production with multi-runtime flexibility:

# .github/workflows/security-review.yml
name: AI Security Review Pipeline
on: 
  pull_request:
    types: [opened, synchronize]

jobs:
  security-analysis:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Maps to apm.yml scripts
        script: [copilot-sec-review, codex-sec-review, copilot-debug]  
    permissions:
      models: read
      pull-requests: write
      contents: read
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Run Security Review (${{ matrix.script }})
      uses: danielmeppiel/action-apm-cli@v1
      with:
        script: ${{ matrix.script }}
        parameters: |
          {
            "pr_id": "${{ github.event.pull_request.number }}"
          }
      env:
        GITHUB_COPILOT_PAT: ${{ secrets.COPILOT_CLI_PAT }}

Key connection: The matrix.script values (copilot-sec-review, codex-sec-review, copilot-debug) correspond exactly to the scripts defined in the apm.yml configuration above. APM automatically installs the MCP dependencies (ghcr.io/github/github-mcp-server) and passes the input parameters (pr_id) to your security-review.prompt.md workflow.

Here’s why this matters: 

  • Automation: Your AI workflows now run on their own, without anyone needing to manually trigger them.
  • Reliability: They run with the same consistency and reproducibility as traditional code deployments.
  • Flexibility: You can run different versions or types of analysis (mapped to different scripts) as needed.
  • Integration: These workflows become part of your organization’s standard CI/CD pipelines, just like regular software quality checks.

This setup ultimately means your agent primitives are no longer just local experiments—they are fully automated tools that you can rely on as part of your software delivery process, running in CI/CD whenever needed, with all dependencies and parameters managed for you.

Ecosystem evolution

This progression follows the same predictable pattern as every successful programming ecosystem. Understanding this pattern helps you see where AI-native development is heading and how to position your work strategically.

The evolution happens in four stages:

  1. Raw Code → agent primitives (.prompt.md, .instructions.md files)
  2. Runtime environments → Agent CLI runtimes 
  3. Package management → APM (distribution and orchestration layer)
  4. Thriving ecosystem → Shared libraries, tools, and community packages

Just as npm enabled JavaScript’s explosive growth by solving the package distribution problem, APM enables the agent primitive ecosystem to flourish by providing the missing infrastructure layer that makes sharing and scaling natural language programs practical.

The transformation is profound: what started as individual Markdown files in your editor becomes a systematic software development practice with proper tooling, distribution, and production deployment capabilities.

How to get started with building your first agent primitive

Now it’s time to build your first agent primitives. Here’s the plan: 

  1. Start with instructions: Write clear instructions that tell the AI exactly what you want it to do and how it should behave.
  2. Add chat modes: Set up special rules (chat modes) to create safe boundaries for the AI, making sure it interacts in the way you want and avoids unwanted behavior.
  3. Build reusable prompts: Create prompt templates for tasks you do often, so you don’t have to start from scratch each time. These templates help the AI handle common jobs quickly and consistently.
  4. Create specification templates: Make templates that help you plan out what you want your AI to accomplish, then turn those plans into actionable steps the AI can follow.

Instructions architecture

Instructions form the bedrock of reliable AI behavior: They’re the rules that guide the agent without cluttering your immediate context. Rather than repeating the same guidance in every conversation, instructions embed your team’s knowledge directly into the AI’s reasoning process.

The key insight is modularity: instead of one massive instruction file that applies everywhere, you can create targeted files that activate only when working with specific technologies or file types. This context engineering approach keeps your AI focused and your guidance relevant.

✅ Quick actions:

🔧 Tools and files:

.github/
├── copilot-instructions.md          # Global repository rules
└── instructions/
    ├── frontend.instructions.md     # applyTo: "**/*.{jsx,tsx,css}"
    ├── backend.instructions.md      # applyTo: "**/*.{py,go,java}"
    └── testing.instructions.md      # applyTo: "**/test/**"

Example: Markdown prompt engineering in Instructions with frontend.instructions.md:

---
applyTo: "**/*.{ts,tsx}"
description: "TypeScript development guidelines with context engineering"
---
# TypeScript Development Guidelines


## Context Loading
Review [project conventions](../docs/conventions.md) and 
[type definitions](../types/index.ts) before starting.


## Deterministic Requirements
- Use strict TypeScript configuration
- Implement error boundaries for React components
- Apply ESLint TypeScript rules consistently


## Structured Output
Generate code with:
- [ ] JSDoc comments for all public APIs
- [ ] Unit tests in `__tests__/` directory
- [ ] Type exports in appropriate index files

⚠️ Checkpoint: Instructions are context-efficient and non-conflicting.

Chat modes configuration

With your instruction architecture in place, you still need a way to enforce domain boundaries and prevent AI agents from overstepping their expertise. Chat modes solve this by creating professional boundaries similar to real-world licensing. For example, you’d want your architect to plan a bridge and not build it themselves. 

Here’s how to set those boundaries: 

  • Define domain-specific custom chat modes with MCP tool boundaries.
  • Encapsulate tech stack knowledge and guidelines per mode.
  • Define the most appropriate LLM model for your chat mode.
  • Configure secure MCP tool access to prevent cross-domain security breaches.
💡 Security through MCP tool boundaries: Each chat mode receives only the specific MCP tools needed for their domain. Giving each chat mode only the tools it needs keeps your AI workflows safe, organized, and professionally separated—just like real-world roles and permissions.

🔧 Tools and files:

.github/
└── chatmodes/
    ├── architect.chatmode.md             # Planning specialist - designs, cannot execute
    ├── frontend-engineer.chatmode.md     # UI specialist - builds interfaces, no backend access
    ├── backend-engineer.chatmode.md      # API specialist - builds services, no UI modification
    └── technical-writer.chatmode.md      # Documentation specialist - writes docs, cannot run code

Example: Creating MCP tool boundaries with backend-engineer.chatmode.md:

---
description: 'Backend development specialist with security focus'
tools: ['changes', 'codebase', 'editFiles', 'runCommands', 'runTasks', 
        'search', 'problems', 'testFailure', 'terminalLastCommand']
model: Claude Sonnet 4
---
You are a backend development specialist focused on secure API development, database design, and server-side architecture. You prioritize security-first design patterns and comprehensive testing strategies.


## Domain Expertise
- RESTful API design and implementation
- Database schema design and optimization  
- Authentication and authorization systems
- Server security and performance optimization


You master the backend of this project thanks to you having read all [the backend docs](../../docs/backend).


## Tool Boundaries
- **CAN**: Modify backend code, run server commands, execute tests
- **CANNOT**: Modify client-side assets

You can also create security and professional boundaries, including:

  • Architect mode: Allow access to research tools only, so they can’t execute destructive commands or modify production code.
  • Frontend engineer mode: Allow access to UI development tools only, so they can’t access databases or backend services.
  • Backend engineer mode: Allow access to API and database tools only, so they can’t modify user interfaces or frontend assets.
  • Technical writer mode: Allow access to documentation tools only, so they can’t run code, deploy, or access sensitive systems.

⚠️ Checkpoint: Each mode has clear boundaries and tool restrictions.

Agentic workflows

Agentic workflows can be implemented as reusable .prompt.md files that orchestrate all your primitives into systematic, repeatable end-to-end processes. These can be executed locally or delegated to independent agents. Here’s how to get started: 

  • Create .prompt.md files for complete development processes.
  • Build in mandatory human reviews.
  • Design workflows for both local execution and independent delegation.

🔧 Tools and files:

.github/prompts/
├── code-review.prompt.md           # With validation checkpoints
├── feature-spec.prompt.md          # Spec-first methodology
└── async-implementation.prompt.md  # GitHub Coding Agent delegation

Example: Complete agentic workflow with feature-spec.prompt.md:

---
mode: agent
model: gpt-4
tools: ['file-search', 'semantic-search', 'github']
description: 'Feature implementation workflow with validation gates'
---
# Feature Implementation from Specification


## Context Loading Phase
1. Review [project specification](${specFile})
2. Analyze [existing codebase patterns](./src/patterns/)
3. Check [API documentation](./docs/api.md)


## Deterministic Execution
Use semantic search to find similar implementations
Use file search to locate test patterns: `**/*.test.{js,ts}`


## Structured Output Requirements
Create implementation with:
- [ ] Feature code in appropriate module
- [ ] Comprehensive unit tests (>90% coverage)
- [ ] Integration tests for API endpoints
- [ ] Documentation updates


## Human Validation Gate
🚨 **STOP**: Review implementation plan before proceeding to code generation.
Confirm: Architecture alignment, test strategy, and breaking change impact.

⚠️ Checkpoint: As you can see, these prompts include explicit validation gates.

Specification templates

There’s often a gap between planning (coming up with what needs to be built) and implementation (actually building it). Without a clear, consistent way to document requirements, things can get lost in translation, leading to mistakes, misunderstandings, or missed steps. This is where specification templates come in. These templates ensure that both people and AI agents can take a concept (like a new feature or API) and reliably implement it. 

Here’s what these templates help you accomplish: 

  • Standardize the process: You create a new specification for each feature, API endpoint, or component.
  • Provide blueprints for implementation: These specs include everything a developer (or an AI agent) needs to know to start building: the problem, the approach, required components, validation criteria, and a checklist for handoff.
  • Make handoff deterministic: By following a standard, the transition from planning to doing is clear and predictable.

🔧 Tools and files: 

Spec-kit is a neat tool that fully implements a specification-driven approach to agentic coding. It allows you to easily get started with creating specs (spec.md), an implementation plan (plan.md) and splitting that into actual tasks (tasks.md) ready for developers or coding agents to work on.
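
If you prefer to hand-roll specifications before adopting spec-kit, here is a minimal, hypothetical .spec.md skeleton. The headings simply mirror the blueprint qualities described above (problem, approach, components, validation, handoff); they are an assumption for illustration, not spec-kit’s own format:

---
title: "Password reset via email"
status: draft
owner: platform-team
---
# Problem
Locked-out users currently have to contact support to regain access.

# Approach
Add a self-service flow: request token → email link → set new password.

# Required components
- POST /auth/reset-request endpoint (rate-limited)
- Signed, single-use reset token that expires after 30 minutes
- Email template delivered through the existing notification service

# Validation criteria
- [ ] Token cannot be reused or used after expiry
- [ ] Unit and integration tests cover the happy path and abuse cases

# Handoff checklist
- [ ] Work is split into tasks small enough for a single agent session
- [ ] Each task names the files it is expected to touch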

⚠️ Checkpoint: Specifications are split into tasks that are implementation-ready before delegation.

Ready to go? Here’s a quickstart checklist

You now have a complete foundation for systematic AI development. The checklist below walks through the implementation sequence, building toward creating complete agentic workflows.

Conceptual foundation

  1. Understand Markdown prompt engineering principles (semantic structure, precision, and tools).
  2. Grasp context engineering fundamentals (context window optimization and session strategy).

Implementation steps

  1. Create .github/copilot-instructions.md with basic project guidelines (context engineering: global rules).
  2. Set up domain-specific .instructions.md files with applyTo patterns (context engineering: selective loading).
  3. Configure chat modes for your tech stack domains (context engineering: domain boundaries).
  4. Create your first .prompt.md agentic workflow.
  5. Build your first .spec.md template for feature specifications (you can use spec-kit for this).
  6. Practice a spec-driven approach with session splitting: plan first, split into tasks, and lastly, implement.

Take this with you

Working with AI agents shouldn’t have to be unpredictable. With the right planning and tools, these agents can quickly become a reliable part of your workflow and processes—boosting not only your own productivity, but your team’s too. 

Ready for the next phase of multi-agent coordination and delegation? Try GitHub Copilot CLI to get started.

The post How to build reliable AI workflows with agentic primitives and context engineering appeared first on The GitHub Blog.


BETA: Microsoft 365 Copilot Frontier – Researcher Agent, Analyst Agent & more


This is apparently still news to some, so to highlight the preview/beta program for Copilot, here’s a recap of the “Microsoft 365 Copilot Frontier” program.

(Note: Before you read too far, the “Frontier” agents in this program are not yet available for Microsoft 365 GCC customers; however, they are roadmapped for release in the near future.)

[taken from the Frontier program announcement in May 2025]
Beginning to rollout in phases in May, Frontier offers our customers the ability to get hands on with the latest model innovation and provide feedback before experiences are made generally available. Researcher and Analyst are the first Frontier experiences with more experiences to come to Frontier over time. These experiences are being made available under the existing preview terms of your enterprise product terms. Frontier experiences allow for personal data processing, adhering to the Data Processing Agreement (DPA).

Our first Frontier experiences – Researcher and Analyst – will be available in the App (agent) store. Both agents will be labeled with names ending in “(Frontier)”. All Microsoft 365 Copilot licensed users will be able to discover the agents by opening their Microsoft 365 Copilot Chat experience on the web, clicking “All Agents” and navigating to “Built by Microsoft,” the “agents” or “productivity” categories of the store. There are no limits on usage of Researcher or Analyst for end-users, subject to change. As these features are still in development, they will only be made available in English.  

For more information, visit the Microsoft 365 Copilot Frontier Program adoption page at:
https://adoption.microsoft.com/en-us/copilot/frontier-program/

For more recent information about the Frontier program:




The Architect’s Dilemma

1 Share

The agentic AI landscape is exploding. Every new framework, demo, and announcement promises to let your AI assistant book flights, query databases, and manage calendars. This rapid advancement of capabilities is thrilling for users, but for the architects and engineers building these systems, it poses a fundamental question: When should a new capability be a simple, predictable tool (exposed via the Model Context Protocol, MCP) and when should it be a sophisticated, collaborative agent (exposed via the Agent2Agent Protocol, A2A)?

The common advice is often circular and unhelpful: “Use MCP for tools and A2A for agents.” This is like telling a traveler that cars use motorways and trains use tracks, without offering any guidance on which is better for a specific journey. This lack of a clear mental model leads to architectural guesswork. Teams build complex conversational interfaces for tasks that demand rigid predictability, or they expose rigid APIs to users who desperately need guidance. The outcome is often the same: a system that looks great in demos but falls apart in the real world.

In this article, I argue that the answer isn’t found by analyzing your service’s internal logic or technology stack. It’s found by looking outward and asking a single, fundamental question: Who is calling your product/service? By reframing the problem this way—as a user experience challenge first and a technical one second—the architect’s dilemma evaporates.

This essay draws a line where it matters for architects: the line between MCP tools and A2A agents. I will introduce a clear framework, built around the “Vending Machine Versus Concierge” model, to help you choose the right interface based on your consumer’s needs. I will also explore failure modes, testing, and the powerful Gatekeeper Pattern that shows how these two interfaces can work together to create systems that are not just clever but truly reliable.

Two Very Different Interfaces

MCP presents tools—named operations with declared inputs and outputs. The caller (a person, program, or agent) must already know what it wants, and provide a complete payload. The tool validates, executes once, and returns a result. If your mental image is a vending machine—insert a well-formed request, get a deterministic response—you’re close enough.

A2A presents agents—goal-first collaborators that converse, plan, and act across turns. The caller expresses an outcome (“book a refundable flight under $450”), not an argument list. The agent asks clarifying questions, calls tools as needed, and holds onto session state until the job is done. If you picture a concierge—interacting, negotiating trade-offs, and occasionally escalating—you’re in the right neighborhood.

Neither interface is “better.” They are optimized for different situations:

  • MCP is fast to reason about, easy to test, and strong on determinism and auditability.
  • A2A is built for ambiguity, long-running processes, and preference capture.

Bringing the Interfaces to Life: A Booking Example

To see the difference in practice, let’s imagine a simple task: booking a specific meeting room in an office.

The MCP “vending machine” expects a perfectly structured, machine-readable request for its book_room_tool. The caller must provide all necessary information in a single, valid payload:

{
  "jsonrpc": "2.0",
  "id": 42,
  "method": "tools/call",
  "params": {
    "name": "book_room_tool",
    "arguments": {
      "room_id": "CR-104B",
      "start_time": "2025-11-05T14:00:00Z",
      "end_time": "2025-11-05T15:00:00Z",
      "organizer": "user@example.com"
    }
  }
}

Any deviation—a missing field or incorrect data type—results in an immediate error. This is the vending machine: You provide the exact code of the item you want (e.g., “D4”) or you get nothing.

The A2A “concierge,” an “office assistant” agent, is approached with a high-level, ambiguous goal. It uses conversation to resolve ambiguity:

User: “Hey, can you book a room for my 1-on-1 with Alex tomorrow afternoon?”
Agent: “Of course. To make sure I get the right one, what time works best, and how long will you need it for?”

The agent’s job is to take the ambiguous goal, gather the necessary details, and then likely call the MCP tool behind the scenes once it has a complete, valid set of arguments.

With this clear dichotomy established—the predictable vending machine (MCP) versus the stateful concierge (A2A)—how do we choose? As I argued in the introduction, the answer isn’t found in your tech stack. It’s found by asking the most important architectural question of all: Who is calling your service?

Step 1: Identify your consumer

  1. The machine consumer: A need for predictability
    Is your service going to be called by another automated system, a script, or another agent acting in a purely deterministic capacity? This consumer requires absolute predictability. It needs a rigid, unambiguous contract that can be scripted and relied upon to behave the same way every single time. It cannot handle a clarifying question or an unexpected update; any deviation from the strict contract is a failure. This consumer doesn’t want a conversation; it needs a vending machine. This nonnegotiable requirement for a predictable, stateless, and transactional interface points directly to designing your service as a tool (MCP).
  2. The human (or agentic) consumer: A need for convenience
    Is your service being built for a human end user or for a sophisticated AI that’s trying to fulfill a complex, high-level goal? This consumer values convenience and the offloading of cognitive load. They don’t want to specify every step of a process; they want to delegate ownership of a goal and trust that it will be handled. They’re comfortable with ambiguity because they expect the service—the agent—to resolve it on their behalf. This consumer doesn’t want to follow a rigid script; they need a concierge. This requirement for a stateful, goal-oriented, and conversational interface points directly to designing your service as an agent (A2A).

By starting with the consumer, the architect’s dilemma often evaporates. Before you ever debate statefulness or determinism, you first define the user experience you are obligated to provide. In most cases, identifying your customer will give you your definitive answer.

Step 2: Validate with the four factors

Once you have identified who calls your service, you have a strong hypothesis for your design. A machine consumer points to a tool; a human or agentic consumer points to an agent. The next step is to validate this hypothesis with a technical litmus test. This framework gives you the vocabulary to justify your choice and ensure the underlying architecture matches the user experience you intend to create.

  1. Determinism versus ambiguity
    Does your service require a precise, unambiguous input, or is it designed to interpret and resolve ambiguous goals? A vending machine is deterministic. Its API is rigid: GET /item/D4. Any other request is an error. This is the world of MCP, where a strict schema ensures predictable interactions. A concierge handles ambiguity. “Find me a nice place for dinner” is a valid request that the agent is expected to clarify and execute. This is the world of A2A, where a conversational flow allows for clarification and negotiation.
  2. Simple execution versus complex process
    Is the interaction a single, one-shot execution, or a long-running, multistep process? A vending machine performs a short-lived execution. The entire operation—from payment to dispensing—is an atomic transaction that is over in seconds. This aligns with the synchronous-style, one-shot model of MCP. A concierge manages a process. Booking a full travel itinerary might take hours or even days, with multiple updates along the way. This requires the asynchronous, stateful nature of A2A, which can handle long-running tasks gracefully.
  3. Stateless versus stateful
    Does each request stand alone or does the service need to remember the context of previous interactions? A vending machine is stateless. It doesn’t remember that you bought a candy bar five minutes ago. Each transaction is a blank slate. MCP is designed for these self-contained, stateless calls. A concierge is stateful. It remembers your preferences, the details of your ongoing request, and the history of your conversation. A2A is built for this, using concepts like a session or thread ID to maintain context.
  4. Direct control versus delegated ownership
    Is the consumer orchestrating every step, or are they delegating the entire goal? When using a vending machine, the consumer is in direct control. You are the orchestrator, deciding which button to press and when. With MCP, the calling application retains full control, making a series of precise function calls to achieve its own goal. With a concierge, you delegate ownership. You hand over the high-level goal and trust the agent to manage the details. This is the core model of A2A, where the consumer offloads the cognitive load and trusts the agent to deliver the outcome.
Factor | Tool (MCP) | Agent (A2A) | Key question
Determinism | Strict schema; errors on deviation | Clarifies ambiguity via dialogue | Can inputs be fully specified up front?
Process | One-shot | Multi-step / long-running | Is this atomic or a workflow?
State | Stateless | Stateful / sessionful | Must we remember context/preferences?
Control | Caller orchestrates | Ownership delegated | Who drives: the caller or callee?

Table 1: Four-question framework

These factors are not independent checkboxes; they are four facets of the same core principle. A service that is deterministic, transactional, stateless, and directly controlled is a tool. A service that handles ambiguity, manages a process, maintains state, and takes ownership is an agent. By using this framework, you can confidently validate that the technical architecture of your service aligns perfectly with the needs of your customer.
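For the state factor in particular, the contrast shows up directly in the payloads. The sketch below is illustrative Python rather than either protocol's actual wire format, and the A2A field names are approximations: an MCP call carries everything it needs, while an A2A exchange carries task and context identifiers so later turns can build on earlier ones.

```python
# Stateless tool call (MCP-style): everything the service needs is in this one request.
tool_call = {
    "name": "currency_converter_tool",
    "arguments": {"amount": 1250.00, "from": "GBP", "to": "USD"},
}

# Stateful agent exchange (A2A-style, field names approximate): the task and
# context identifiers let later messages build on earlier ones.
first_turn = {
    "message": {"role": "user", "parts": [{"text": "Plan a business trip to London next week."}]},
}
agent_reply = {
    "taskId": "task-42",
    "contextId": "ctx-7",
    "status": "input-required",  # the agent needs clarification before it can proceed
    "message": {"role": "agent", "parts": [{"text": "Which days, and what budget?"}]},
}
second_turn = {
    "taskId": "task-42",  # same task: the earlier context is still in play
    "contextId": "ctx-7",
    "message": {"role": "user", "parts": [{"text": "Tuesday to Thursday, under 1,500 GBP."}]},
}
```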

No framework, no matter how clear…

…can perfectly capture the messiness of the real world. While the “Vending Machine Versus Concierge” model provides a robust guide, architects will eventually encounter services that seem to blur the lines. The key is to remember the core principle we’ve established: The choice is dictated by the consumer’s experience, not the service’s internal complexity.

Let’s explore two common edge cases.

The complex tool: The iceberg
Consider a service that performs a highly complex, multistep internal process, like a video transcoding API. A consumer sends a video file and a desired output format. This is a simple, predictable request. But internally, this one call might kick off a massive, long-running workflow involving multiple machines, quality checks, and encoding steps. It’s a hugely complex process.

However, from the consumer’s perspective, none of that matters. They made a single, stateless, fire-and-forget call. They don’t need to manage the process; they just need a predictable result. This service is like an iceberg: 90% of its complexity is hidden beneath the surface. But because its external contract is that of a vending machine—a simple, deterministic, one-shot transaction—it is, and should be, implemented as a tool (MCP).
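A minimal sketch of the iceberg, with a made-up pipeline standing in for the real transcoding work: the external contract is one deterministic call, and all of the internal staging stays invisible to the consumer.

```python
# The iceberg: a simple, deterministic external contract hiding a multistep
# internal pipeline. Every _helper below is a placeholder, not a real encoder.
def _validate(source: bytes) -> bytes:
    return source

def _split_into_chunks(source: bytes) -> list[bytes]:
    return [source]

def _encode(chunk: bytes, fmt: str) -> bytes:
    return chunk  # stand-in for the heavy lifting

def _quality_check(chunk: bytes) -> bytes:
    return chunk

def _assemble(chunks: list[bytes]) -> bytes:
    return b"".join(chunks)

def transcode_video(source: bytes, output_format: str) -> bytes:
    """The entire external contract: one call in, one result out (a tool, not an agent)."""
    chunks = _split_into_chunks(_validate(source))
    encoded = [_quality_check(_encode(c, output_format)) for c in chunks]
    return _assemble(encoded)

result = transcode_video(b"fake-video-bytes", "mp4")
```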

The simple agent: The scripted conversation
Now consider the opposite: a service with very simple internal logic that still requires a conversational interface. Imagine a chatbot for booking a dentist appointment. The internal logic might be a simple state machine: ask for a date, then a time, then a patient name. It’s not “intelligent” or particularly flexible.

However, it must remember the user’s previous answers to complete the booking. It’s an inherently stateful, multiturn interaction. The consumer cannot provide all the required information in a single, prevalidated call. They need to be guided through the process. Despite its internal simplicity, the need for a stateful dialogue makes it a concierge. It must be implemented as an agent (A2A) because its consumer-facing experience is that of a conversation, however scripted.
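Here is a bare-bones sketch of that scripted concierge, with the booking logic reduced to a three-step script; the class and prompts are invented for illustration, but they show why even trivial internal logic can force a stateful, multiturn interface.

```python
# A deliberately simple booking agent: the logic is a three-step script, but it
# still has to remember earlier answers, which makes it an agent, not a tool.
class DentistBookingAgent:
    STEPS = ["date", "time", "patient_name"]

    def __init__(self):
        self.answers: dict[str, str] = {}

    def next_message(self, user_reply: str | None = None) -> str:
        if user_reply is not None:
            self.answers[self.STEPS[len(self.answers)]] = user_reply
        if len(self.answers) < len(self.STEPS):
            return f"What {self.STEPS[len(self.answers)].replace('_', ' ')} would you like?"
        return f"Booked {self.answers['patient_name']} for {self.answers['date']} at {self.answers['time']}."

agent = DentistBookingAgent()
print(agent.next_message())           # asks for a date
print(agent.next_message("June 12"))  # asks for a time
print(agent.next_message("10:00"))    # asks for the patient name
print(agent.next_message("Alex Doe")) # confirms the booking using all remembered answers
```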

These gray areas reinforce the framework’s central lesson. Don’t get distracted by what your service does internally. Focus on the experience it provides externally. That contract with your customer is the ultimate arbiter in the architect’s dilemma.

Testing what matters: Different strategies for different interfaces

A service’s interface doesn’t just dictate its design; it dictates how you validate its correctness. Vending machines and concierges have fundamentally different failure modes and require different testing strategies.

Testing MCP tools (vending machines):

  • Contract testing: Validate that inputs and outputs strictly adhere to the defined schema (this check and the idempotency check are sketched after this list).
  • Idempotency tests: Ensure that calling the tool multiple times with the same inputs produces the same result without side effects.
  • Deterministic logic tests: Use standard unit and integration tests with fixed inputs and expected outputs.
  • Adversarial fuzzing: Test for security vulnerabilities by providing malformed or unexpected arguments.
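A minimal sketch of the first two checks, written against a hypothetical get_exchange_rate tool; the tool, assertions, and test harness are illustrative rather than a real suite.

```python
# Illustrative contract and idempotency tests for a hypothetical MCP tool.
RATES = {("GBP", "USD"): 1.27}

def get_exchange_rate(args: dict) -> dict:
    for field in ("from", "to"):
        if field not in args:
            raise ValueError(f"Missing required field: {field}")
    return {"rate": RATES[(args["from"], args["to"])]}

def test_contract_rejects_missing_field():
    try:
        get_exchange_rate({"from": "GBP"})  # "to" is missing: the tool must fail loudly
    except ValueError:
        return
    raise AssertionError("schema violation was not rejected")

def test_idempotent_for_same_inputs():
    first = get_exchange_rate({"from": "GBP", "to": "USD"})
    second = get_exchange_rate({"from": "GBP", "to": "USD"})
    assert first == second  # same inputs, same result, no hidden side effects

test_contract_rejects_missing_field()
test_idempotent_for_same_inputs()
```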

Testing A2A agents (concierges):

  • Goal completion rate (GCR): Measure the percentage of conversations where the agent successfully achieved the user’s high-level goal (a small metrics sketch follows this list).
  • Conversational efficiency: Track the number of turns or clarifications required to complete a task.
  • Tool selection accuracy: For complex agents, verify that the right MCP tool was chosen for a given user request.
  • Conversation replay testing: Use logs of real user interactions as a regression suite to ensure updates don’t break existing conversational flows.
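A small sketch of how the first two metrics might be computed from logged conversations; the log format and fields are assumptions made for illustration.

```python
# Illustrative goal completion rate (GCR) and conversational efficiency over logs.
# The log format is an assumption for this sketch: each entry records whether the
# user's high-level goal was ultimately met and how many turns it took.
conversation_logs = [
    {"goal_met": True, "turns": 4},
    {"goal_met": True, "turns": 7},
    {"goal_met": False, "turns": 12},
]

def goal_completion_rate(logs: list[dict]) -> float:
    return sum(1 for log in logs if log["goal_met"]) / len(logs)

def average_turns(logs: list[dict]) -> float:
    return sum(log["turns"] for log in logs) / len(logs)

print(f"GCR: {goal_completion_rate(conversation_logs):.0%}")
print(f"Average turns per conversation: {average_turns(conversation_logs):.1f}")
```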

The Gatekeeper Pattern

Our journey so far has focused on a dichotomy: MCP or A2A, vending machine or concierge. But the most sophisticated and robust agentic systems do not force a choice. Instead, they recognize that these two protocols don’t compete with each other; they complement each other. The ultimate power lies in using them together, with each playing to its strengths.

The most effective way to achieve this is through a powerful architectural choice we can call the Gatekeeper Pattern.

In this pattern, a single, stateful A2A agent acts as the primary, user-facing entry point—the concierge. Behind this gatekeeper sits a collection of discrete, stateless MCP tools—the vending machines. The A2A agent takes on the complex, messy work of understanding a high-level goal, managing the conversation, and maintaining state. It then acts as an intelligent orchestrator, making precise, one-shot calls to the appropriate MCP tools to execute specific tasks.

Consider a travel agent. A user interacts with it via A2A, giving it a high-level goal: “Plan a business trip to London for next week.”

  • The travel agent (A2A) accepts this ambiguous request and starts a conversation to gather details (exact dates, budget, etc.).
  • Once it has the necessary information, it calls a flight_search_tool (MCP) with precise arguments like origin, destination, and date.
  • It then calls a hotel_booking_tool (MCP) with the required city, check_in_date, and room_type.
  • Finally, it might call a currency_converter_tool (MCP) to provide expense estimates (this orchestration is sketched below).
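A minimal sketch of that orchestration in plain Python. The three tool functions, the hard-coded clarification step, and the prices are all stand-ins; in a real system the concierge would speak A2A to the user and invoke the tools over MCP.

```python
# Illustrative Gatekeeper orchestration: one stateful concierge, three stateless tools.
# The tool functions and prices are placeholders; a real system would invoke them over MCP.
def flight_search_tool(origin: str, destination: str, date: str) -> dict:
    return {"flight": f"{origin}->{destination} on {date}", "price_gbp": 210}

def hotel_booking_tool(city: str, check_in_date: str, room_type: str) -> dict:
    return {"hotel": f"{room_type} room in {city} from {check_in_date}", "price_gbp": 540}

def currency_converter_tool(amount: float, from_ccy: str, to_ccy: str) -> float:
    return round(amount * 1.27, 2)  # fixed rate, illustration only

class TravelAgent:
    """The A2A concierge: gathers details, then presses the right vending-machine buttons."""

    def __init__(self):
        self.details: dict[str, str] = {}

    def handle(self, message: str) -> str:
        if not self.details:
            # First turn: the goal is ambiguous, so ask for what is missing.
            self.details["goal"] = message
            return "Which dates, and what is your budget?"
        # Later turn: enough context to orchestrate the tools in order.
        self.details["constraints"] = message
        flight = flight_search_tool("MAN", "LHR", "2025-06-10")
        hotel = hotel_booking_tool("London", "2025-06-10", "single")
        total_usd = currency_converter_tool(flight["price_gbp"] + hotel["price_gbp"], "GBP", "USD")
        return f"Booked {flight['flight']} and {hotel['hotel']}; roughly ${total_usd} in expenses."

agent = TravelAgent()
print(agent.handle("Plan a business trip to London for next week."))
print(agent.handle("June 10 to 12, keep it under 1,000 GBP."))
```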

Each tool is a simple, reliable, and stateless vending machine. The A2A agent is the smart concierge that knows which buttons to press and in what order. This pattern provides several significant architectural benefits:

  • Decoupling: It separates the complex, conversational logic (the “how”) from the simple, reusable business logic (the “what”). The tools can be developed, tested, and maintained independently.
  • Centralized governance: The A2A gatekeeper is the perfect place to implement cross-cutting concerns. It can handle authentication, enforce rate limits, manage user quotas, and log all activity before a single tool is ever invoked.
  • Simplified tool design: Because the tools are just simple MCP functions, they don’t need to worry about state or conversational context. Their job is to do one thing and do it well, making them incredibly robust.

Making the Gatekeeper production-ready

Beyond its design benefits, the Gatekeeper Pattern is the ideal place to implement the operational guardrails required to run a reliable agentic system in production.

  • Observability: Each A2A conversation generates a unique trace ID. This ID must be propagated to every downstream MCP tool call, allowing you to trace a single user request across the entire system. Structured logs for tool inputs and outputs (with PII redacted) are critical for debugging.
  • Guardrails and security: The A2A Gatekeeper acts as a single point of enforcement for critical policies. It handles authentication and authorization for the user, enforces rate limits and usage quotas, and can maintain a list of which tools a particular user or group is allowed to call.
  • Resilience and fallbacks: The Gatekeeper must gracefully manage failure. When it calls an MCP tool, it should implement patterns like timeouts, retries with exponential backoff, and circuit breakers. Critically, it is responsible for the final failure state—escalating to a human in the loop for review or clearly communicating the issue to the end user. (A retry-and-tracing sketch follows this list.)
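A sketch of the observability and resilience pieces, assuming nothing about any particular MCP client library: a single trace ID travels with every downstream call, and each call is retried with exponential backoff before the failure is escalated.

```python
# Illustrative Gatekeeper guardrails: trace propagation plus retry with exponential
# backoff. call_tool is a stand-in for a real MCP client call, failing at random
# so the retry path gets exercised.
import random
import time
import uuid

def call_tool(tool_name: str, arguments: dict, trace_id: str) -> dict:
    if random.random() < 0.3:
        raise TimeoutError(f"{tool_name} timed out (trace {trace_id})")
    return {"ok": True, "tool": tool_name, "trace_id": trace_id}

def call_with_retries(tool_name: str, arguments: dict, trace_id: str,
                      max_attempts: int = 3, base_delay: float = 0.5) -> dict:
    for attempt in range(1, max_attempts + 1):
        try:
            return call_tool(tool_name, arguments, trace_id)
        except TimeoutError as exc:
            if attempt == max_attempts:
                # Final failure state: escalate for human review or user messaging.
                raise RuntimeError(f"Escalating after {attempt} attempts: {exc}") from exc
            time.sleep(base_delay * (2 ** (attempt - 1)))  # exponential backoff

trace_id = str(uuid.uuid4())  # one trace ID per A2A conversation, passed to every tool call
result = call_with_retries("flight_search_tool", {"origin": "MAN"}, trace_id)
```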

The Gatekeeper Pattern is the ultimate synthesis of our framework. It uses A2A for what it does best—managing a stateful, goal-oriented process—and MCP for what it was designed for—the reliable, deterministic execution of a task.

Conclusion

We began this journey with a simple but frustrating problem: the architect’s dilemma. Faced with the circular advice that “MCP is for tools and A2A is for agents,” we were left in the same position as a traveler trying to get to Edinburgh—knowing that cars use motorways and trains use tracks but with no intuition on which to choose for our specific journey.

The goal was to build that intuition. We did this not by accepting abstract labels, but by reasoning from first principles. We dissected the protocols themselves, revealing how their core mechanics inevitably lead to two distinct service profiles: the predictable, one-shot “vending machine” and the stateful, conversational “concierge.”

With that foundation, we established a clear, two-step framework for a confident design choice:

  1. Start with your customer. The most critical question is not a technical one but an experiential one. A machine consumer needs the predictability of a vending machine (MCP). A human or agentic consumer needs the convenience of a concierge (A2A).
  2. Validate with the four factors. Use the litmus test of determinism, process, state, and ownership to technically justify and solidify your choice.

Ultimately, the most robust systems will synthesize both, using the Gatekeeper Pattern to combine the strengths of a user-facing A2A agent with a suite of reliable MCP tools.

The choice is no longer a dilemma. By focusing on the consumer’s needs and understanding the fundamental nature of the protocols, architects can move from confusion to confidence, designing agentic ecosystems that are not just functional but also intuitive, scalable, and maintainable.



OpenAI Killed Off Cheap ChatGPT Wrappers… Or Did It?

In one of the major announcements at their Dev Day conference last week, OpenAI unveiled AgentKit, a new suite of tools designed to make it easier to build agentic workflows.

What does this mean for anyone building products on top of the OpenAI platform?

Is OpenAI competing with us?

Should we be excited, worried, or just ignore the hype?

Let’s dive in.

What tools are in AgentKit?

AgentKit isn’t a single product – it’s a set of tools designed to work together seamlessly.

It builds on OpenAI’s existing Agents SDK, adding a visual no-code Agent Builder, out-of-the-box UI support with ChatKit, and simple integration for file search, web search, and external MCP servers.

Agent Builder is a visual workflow orchestration tool, similar to n8n, Langflow, and others.

Starting from an initial user input, you add nodes to a graph, each node representing an action or workflow step. The key one is the Agent node, which invokes the OpenAI model of your choice. Alongside LLM instructions and input data, the Agent node can access external data from file storage, vector databases, MCP connections, or web search.

If you’ve used the OpenAI Assistants API or Agents SDK, this will sound familiar. The Agent Builder is simply a more user-friendly interface for building the same functionality. You can even download your workflow as Python or TypeScript source code using the Agents SDK and run it locally.

This makes it great for rapid prototyping, but you can also publish (deploy) your workflow and invoke it from the client via – you guessed it – the Agents SDK.
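For a sense of what that exported code looks like, here is a minimal Python sketch using the Agents SDK's Agent and Runner interface. The tool is invented for illustration, and the exact imports and method names should be treated as approximate, since they can shift between SDK versions.

```python
# Minimal Agents SDK sketch (Python). The names follow the openai-agents package's
# documented Agent/Runner interface, but treat them as approximate across versions;
# the tool below is a made-up example.
from agents import Agent, Runner, function_tool

@function_tool
def lookup_order_status(order_id: str) -> str:
    """Made-up example tool; a real workflow would call your own backend."""
    return f"Order {order_id} shipped yesterday."

support_agent = Agent(
    name="Support assistant",
    instructions="Answer order questions. Use the lookup tool when an order ID is given.",
    tools=[lookup_order_status],
)

result = Runner.run_sync(support_agent, "Where is order 8812?")
print(result.final_output)
```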

Compared to tools like n8n, the Agent Builder has fewer options and focuses exclusively on AI workflows. However, it’s tightly integrated with the rest of the OpenAI platform and free to use – you only pay for LLM tokens.

ChatKit, a React-based UI component framework, is another new addition. It makes it easy to create chatbot-style UIs for agentic workflows without needing a dedicated frontend team. ChatKit provides a basic chat interface and supports custom widgets, which can even be uploaded to Agent Builder in a low-code fashion.

The good, the bad, and the (not-so) ugly news

AgentKit is great news for teams building in-house AI tools, especially for non-devs.

While it still needs some developer setup for production, iterating on workflows, prompts, and agent behavior is completely no-code. It’s also a powerful prototyping tool for product owners exploring new AI features or creating quick proofs of concept.

For AI solution builders, AgentKit will likely make a lot of existing chatbot code obsolete. Does that make you obsolete? Only if your product is a simple “chat with your documents” wrapper. If that’s the case, the writing’s been on the wall for a while.

But if your product has complex domain logic, your workflow design and instructions are your real value. That’s the hard part – the code is just an implementation detail. In that case, AgentKit frees you from boilerplate and lets you focus on the high-value work. That’s good news!

The main caveat: building on AgentKit ties you to the OpenAI platform. With the upgraded API, Agents SDK, and now AgentKit, OpenAI is clearly moving up the API value chain.

The original LLM API has become a de facto standard, making it easy to swap in other LLMs like Claude and turning the models themselves into something of a commodity. Using AgentKit, by contrast, makes it harder to switch later, since you’d have to reimplement many of its components. Not necessarily a problem, but something to keep in mind.
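As a concrete illustration of that portability at the plain-API level, the standard OpenAI Python client can be pointed at any compatible endpoint just by changing base_url; the endpoint, key, and model name below are placeholders, and AgentKit workflows offer no equivalent one-line switch.

```python
# The Chat Completions-style API as a de facto standard: the same client code can
# target another provider's compatible endpoint by changing base_url. The URL, key,
# and model name are placeholders, not real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.other-provider.example/v1",  # swap providers here
    api_key="YOUR_KEY",
)

response = client.chat.completions.create(
    model="some-compatible-model",
    messages=[{"role": "user", "content": "Summarize this release note in one line."}],
)
print(response.choices[0].message.content)
```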

Hot or not?

Does AgentKit spell doom for developers? No.

Like other workflow automation and low-code tools, it’s not replacing devs anytime soon. If anything, it’ll save you from writing repetitive boilerplate or endless tweaks requested by product owners and non-tech teammates. Your job is safe – maybe even less tedious.

It will probably kill off a few cheap ChatGPT wrappers. But the more interesting ones – those with domain expertise, specialized logic, and proprietary prompts – will be fine and could benefit.

AgentKit is an incremental but important update. If you’re building any kind of AI-enabled product – whether a quick prototype, an internal tool, or a new product – it’s worth checking out.

The post OpenAI Killed Off Cheap ChatGPT Wrappers… Or Did It? appeared first on ShiftMag.
