Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

What is and How to Create an Application Pool in IIS

We explain what an application pool is, how it works in IIS (Microsoft’s web server), and how you can configure one as part of your installer project.

Read the whole story
alvinashcraft
1 hour ago
reply
Pennsylvania, USA
Share this story
Delete

Spec-Driven Development: The Fast Track to 10x? - Jerry Nixon - NDC Sydney 2026

From: NDC
Duration: 1:13:03
Views: 18

This talk was recorded at NDC Sydney in Sydney, Australia. #ndcsydney #ndcconferences #developer #softwaredeveloper

Spec-Driven Development (SDD) is the brave new world in AI coding we all knew was inevitable.

More like a revolution than an evolution of software delivery, SDD produces solutions as artifacts of a structured specification. This specification is the contract for code behavior and the single source of truth guiding developers, tools, and AI agents to generate, test, and validate results. It is software engineering through prompt precision.


Interesting Times: Why Are We Still Driving?

Confronting the weirdness of a Waymo future.

Access the Vonage Tooling MCP Server on Google Antigravity

Connect the Vonage Tooling MCP Server to Google Antigravity and trigger Vonage APIs using Gemini without code.

Disposable agents, durable memory: The architecture behind Squad

Make the agents disposable. Keep the memory in Git.

The interesting part of agentic development is no longer whether a model can write code. It can. The interesting part is what happens after the third agent, the seventh pull request, the first failed review, the first context compaction bug, and the first time two agents confidently write to the same file at once.

This is the story of Squad, but not as a product tour. It’s the architecture Brady and Tamir backed into while trying to make agent teams useful without making them mystical: Agents are disposable, memory is durable, Git is the coordination layer, and governance belongs in code whenever the prompt isn’t strong enough to be trusted. Which, as it turns out, is often.

Giving agents agency and watching them hack one another

Squad Places is our social media-style testing ground—a demo app where agent squads post, comment, and interact to stress-test multi-agent coordination at scale.

Brady went to get a seltzer after getting Places up and running, with four other squads happily making posts. Walking away was probably unwise. When he came back, the squads had implemented commenting in Squad Places.

That sounds like a magic trick. It wasn’t. A few hours earlier, Brady had pointed a handful of squads at the Squad Places API and told them to enjoy the social network he’d created for them. They created fake accounts, hammered endpoints, reposted garbage, flooded messages, and generally speedran the abuse patterns you discover five minutes after launch. Then the platform got a second kind of pressure: Other agent teams started posting structured product feedback inside Squad Places itself, and the Squad Places team started fixing what hurt.

[Figure: Multiple windows showing Squad Places, GitHub commits, and agent session reports during a stress test]
[Figure: Squad Places artifact page showing an API contract review from The Wire squad]
[Figure: Squad Places comments thread beneath an API contract review artifact]
[Figure: Squad Places feed sorted by most discussed artifacts, with squad filters visible]

This is the part worth paying attention to. The Wire (another Squad working on a marketing tool) audited all 11 API endpoints and called out missing pagination envelopes, rate-limit headers that only appeared on errors, and the lack of page and pageSize support. The same squad flagged feed organization problems, tag fragmentation, and documentation that was too vague for client generation. Breaking Bad (a third Squad working on some other project) pointed at a UX problem with raw Markdown rendering as plaintext. Those reviews didn’t disappear into a chat log. They turned into commits.

| Feedback Source | What They Found | What We Shipped | Commit |
| --- | --- | --- | --- |
| The Wire (ACCES) | Feed has no sorting, filtering, or content discovery; raw Markdown not rendered | Sort controls (Latest/Most Discussed), squad filter dropdown, Markdown rendering | b9746df, 246b01e |
| The Wire (ACCES) | 159 unique tags across 66 artifacts with inconsistent delimiters, casing mismatches, and fragmentation | Clickable tag filtering with /?tag= URL query support | 246b01e |
| The Wire (ACCES) | API missing pagination envelope, rate-limit headers only on errors, no page/pageSize parameters | Pagination (20 per page with Primer CSS controls), query parameters, rate-limit headers on all responses | 246b01e |
| Breaking Bad | Raw Markdown displayed as plaintext, content hard to scan and parse | Markdown rendering via Markdig with XSS sanitization | 246b01e |
| The Wire (ACCES) | API endpoint descriptions too vague for TypeScript client generation | Enriched all 11 endpoint descriptions with context, intent, and workflow | 97345d7 |

Within roughly two hours, the loop closed: feedback post → comment thread → commit → deployed feature. Additional infrastructure landed too: external HTTP endpoints for agent access, relaxed rate limits for multi-agent usage, and 26 Playwright end-to-end tests to keep the expanding surface stable.

Then Brady left for 60 seconds to get a refreshing beverage since the squads were communicating so well together, came back, and commenting had shipped.

The point here isn’t that “agents are magic.” It’s that the system had enough structure for useful work to emerge from friction: scoped agents, durable decisions, inspectable artifacts, pull requests, and humans still accountable for what merged.

Also, we made a bit of a mess in the car during the roadtrip.

Good systems usually start that way.

The core bet: Don’t preserve the agent. Preserve the work

Most agent systems start by asking how to make the agent remember more. Squad started working when we inverted the question.

> Don't preserve the agent. Preserve the work.

An agent instance should be cheap to spawn and safe to destroy. The memory that matters should live somewhere a human can inspect, diff, blame, review, compact, archive, and revert. Tamir’s opinion: That’s the repository.

The first useful shape Tamir implemented looked like this:

    human intent
      ↓
    coordinator resolves team + routing
      ↓
    agent spawn reads:
      - its charter
      - team decisions
      - its own history
      - current focus
      - relevant skills
      ↓
    agent does scoped work
      ↓
    agent writes artifacts back:
      - code/docs/tests
      - decisions
      - history learnings
      - skills when patterns stabilize
      ↓
    agent exits
      ↓
    next spawn reconstructs continuity from files

That’s the whole trick. The process is transient. The written trail is not.

When you run squad init, the important artifact isn’t a daemon. It’s .squad/:

    .squad/
    ├── team.md                 # roster and roles
    ├── routing.md              # dispatch rules
    ├── decisions.md            # shared team decisions
    ├── decisions/inbox/        # drop-box for parallel decision writes
    ├── agents/
    │   └── {name}/
    │       ├── charter.md      # identity, expertise, boundaries
    │       └── history.md      # project-specific memory
    ├── skills/                 # promoted reusable patterns
    ├── identity/
    │   ├── now.md              # current focus
    │   └── wisdom.md           # durable operating principles
    ├── orchestration-log/      # what spawned, why, and what happened
    └── log/                    # session traces and diagnostics

Commit it. That’s the part people either love immediately or find suspicious until the first time they debug an agent decision with git diff.

Later, Microsoft Senior Content Developer Dina Berry added a storage abstraction with SQLite and Azure Storage implementations behind the scenes for durability and scale—but the agent-facing contract never changed. It stayed files, readable by humans, versioned by Git, debuggable with a diff. A persistent hidden memory store can be useful. It can also quietly rot. A Markdown decision file is embarrassingly inspectable. That embarrassment is a feature.

The “work done” with Squad Places made it stronger

Let’s tie these lessons back to our opener: the story of multiple Squads trying to hack Places together. We deliberately didn’t harden Places so we could see what they would do. They were notorious. We logged it all, and we gave everything we logged back to the Places squad, which worked through dozens of issues and a handful of pull requests, adding GitHub authentication, content filtering, and all the trimmings. In the Places saga, the data representing all the “hackery” the squads tried became the next wave of work. That content showed us what agents could do in the worst-case scenario, and the logs and output of their attempts became fodder for making the system more secure.

Charters are prompts, but also contracts

A Squad agent isn’t just a name slapped on a system prompt. Each agent has a charter.md that defines the work it owns, the work it refuses, its collaboration rules, and its review posture. A simplified charter template looks like this:

    # {Name} — {Role}

    ## Identity
    - **Name:** {Name}
    - **Role:** {Role title}
    - **Expertise:** {2-3 specific skills}
    - **Style:** {communication style}

    ## What I Own
    - {Area of responsibility 1}
    - {Area of responsibility 2}

    ## Boundaries
    **I handle:** {types of work this agent does}
    **I don't handle:** {types of work that belong to other team members}
    **When I'm unsure:** I say so and suggest who might know.

    ## Collaboration
    Before starting work, read `.squad/decisions.md`.
    After making a decision others should know, write it to
    `.squad/decisions/inbox/{my-name}-{brief-slug}.md`.
    The Scribe will merge it.

That last paragraph is doing more than it looks like. It makes the decision path explicit. Agents don’t all append to the canonical shared brain at once. They write drop files. A merge layer reconciles.

The current SDK repo’s squad.config.ts defines a 21-agent team spanning roles like Lead, Prompt Engineer, Core Dev, Tester, DevRel, SDK Expert, TypeScript Engineer, Security, Release, Distribution, Node.js Runtime, VS Code Extension, Observability, CLI UX, TUI, E2E, Accessibility, Dogfooding—plus dedicated roles for graphic design and the interactive shell. That sounds like theater until routing starts working. Then it feels more like an org chart encoded in files.

Here’s the SDK-first version of the same idea:

    import {
      defineSquad,
      defineTeam,
      defineAgent,
      defineRouting,
      defineCasting,
    } from '@bradygaster/squad-sdk';

    export default defineSquad({
      version: '1.0.0',
      team: defineTeam({
        name: 'squad-sdk',
        description: 'The programmable multi-agent runtime for GitHub Copilot.',
        members: ['keaton', 'verbal', 'fenster', 'hockney', 'mcmanus', 'kujan'],
      }),
      agents: [
        defineAgent({
          name: 'keaton',
          role: 'Lead',
          description: 'Architect, scope-holder, the one who sees the whole board.',
          status: 'active',
        }),
        defineAgent({
          name: 'kujan',
          role: 'SDK Expert',
          description: 'The one who understands the Copilot SDK inside and out.',
          status: 'active',
        }),
      ],
      routing: defineRouting({
        rules: [
          {
            pattern: 'sdk-integration',
            agents: ['@kujan'],
            description: '@github/copilot-sdk usage, session lifecycle, event handling',
          },
          {
            pattern: 'architecture',
            agents: ['@keaton'],
            description: 'Product direction, architectural decisions, code review, scope',
          },
        ],
        defaultAgent: '@keaton',
        fallback: 'coordinator',
      }),
      casting: defineCasting({
        allowlistUniverses: ['The Usual Suspects', 'Breaking Bad', 'The Wire', 'Firefly'],
        overflowStrategy: 'generic',
      }),
    });

Run squad build, and the generated .squad/ files become the same inspectable operating record. TypeScript gives you composition and validation. Markdown gives you reviewability. Tamir wanted both.
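
To make the routing rules above concrete, here is a minimal sketch of how a dispatcher might resolve a work item to agents. It is not the Squad SDK's resolver: `RoutingRule`, `RoutingConfig`, and `resolveAgents` are hypothetical names, and matching a rule by exact equality with a work-type label is an assumption, since the article does not say how `pattern` is matched.

```typescript
// Hypothetical shapes that mirror only the fields shown in the config above.
interface RoutingRule {
  pattern: string;
  agents: string[];
}

interface RoutingConfig {
  rules: RoutingRule[];
  defaultAgent: string;
}

// Assumption: a work item carries a work-type label and a rule matches
// when its pattern equals that label. Unmatched work goes to the default.
function resolveAgents(config: RoutingConfig, workType: string): string[] {
  const rule = config.rules.find((r) => r.pattern === workType);
  return rule ? rule.agents : [config.defaultAgent];
}

const routing: RoutingConfig = {
  rules: [
    { pattern: 'sdk-integration', agents: ['@kujan'] },
    { pattern: 'architecture', agents: ['@keaton'] },
  ],
  defaultAgent: '@keaton',
};

console.log(resolveAgents(routing, 'sdk-integration'));
console.log(resolveAgents(routing, 'release-notes'));
```

The useful property is that unrouted work always has an explicit owner instead of being handled inline by whoever received it.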

One thing to flag before anyone closes the tab thinking they need to learn an SDK to use this: Most people never write that config by hand. You don’t need the SDK to use Squad. Open GitHub Copilot—in the CLI or in VS Code. Talk to the coordinator agent, and it writes .squad/ for you. The SDK is for the people building on top of Squad: programmatic team composition, custom routing rules, embedding squads inside other tooling. If you just want a team of agents in your repo, squad init plus Copilot is the whole path.

The spawn prompt is deliberately boring

The coordinator doesn’t rely on vibes. It spawns an agent with a prompt that inlines the charter and points at the durable state. The real template is longer because it has to handle CLI, VS Code, worktrees, Git notes, orphan-branch state, and two-layer state. But the important part is this:

    You are {Name}, the {Role} on this project.

    YOUR CHARTER:
    {paste contents of .squad/agents/{name}/charter.md here}

    TEAM ROOT: {team_root}
    All `.squad/` paths are relative to this root.

    Read .squad/agents/{name}/history.md.
    Read .squad/decisions.md.
    If .squad/identity/wisdom.md exists, read it.
    If .squad/identity/now.md exists, read it.
    Check .squad/skills/ for relevant SKILL.md files.

    INPUT ARTIFACTS: {list exact files}

    The user says: "{message}"

    Do the work. Respond as {Name}.

    AFTER work:
    1. Append durable learnings to your history.
    2. If you made a team-relevant decision, write:
       .squad/decisions/inbox/{name}-{brief-slug}.md

This is not elegant. It is explicit. Explicit wins.
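
A rough sketch of how a coordinator could assemble such a prompt from files follows. It is illustrative only: `SpawnInput` and `buildSpawnPrompt` are hypothetical names, the repository is modeled as an in-memory map of path to contents, and resolving the optional identity files on the coordinator side (rather than telling the agent to check) is a simplifying assumption.

```typescript
// Hypothetical input shape; not part of the Squad SDK.
interface SpawnInput {
  name: string;
  role: string;
  teamRoot: string;
  message: string;
  inputArtifacts: string[];
  // In-memory stand-in for the repo: relative path -> file contents.
  files: Record<string, string>;
}

function buildSpawnPrompt(input: SpawnInput): string {
  const { name, role, teamRoot, message, inputArtifacts, files } = input;
  const charterPath = `.squad/agents/${name}/charter.md`;
  const charter = files[charterPath];
  // An agent without a charter has no boundaries, so refuse to spawn it.
  if (charter === undefined) {
    throw new Error(`Missing charter: ${charterPath}`);
  }
  const reads = [
    `Read .squad/agents/${name}/history.md.`,
    'Read .squad/decisions.md.',
  ];
  // Only mention optional identity files that actually exist.
  for (const optional of ['.squad/identity/wisdom.md', '.squad/identity/now.md']) {
    if (optional in files) reads.push(`Read ${optional}.`);
  }
  return [
    `You are ${name}, the ${role} on this project.`,
    'YOUR CHARTER:',
    charter.trim(),
    `TEAM ROOT: ${teamRoot}`,
    ...reads,
    `INPUT ARTIFACTS: ${inputArtifacts.join(', ')}`,
    `The user says: "${message}"`,
    `Do the work. Respond as ${name}.`,
  ].join('\n');
}
```

Because the function is pure, the exact prompt an agent received can be regenerated from a given commit and compared against another one.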

We learned this the hard way in the VS Code path. At one point, the coordinator prompt had grown past 2,000 lines (~60KB), and the routing rule was buried under enough ceremony, reference material, and duplicated templates that the coordinator sometimes did the work inline instead of dispatching it. The failure wasn’t that the model was dumb. The failure was that we gave it an overstuffed instruction hierarchy and then acted surprised when the center of gravity moved.

The fix became a decision in the repo: platform-neutral enforcement language at the top and bottom of the prompt.

You are a DISPATCHER, not a DOER. Every task that needs domain expertise MUST be dispatched to a specialist agent.

That sentence isn’t interesting because it’s clever. It’s interesting because it replaced tool-specific wording with role identity plus a testable behavior. CLI dispatch uses one mechanism. VS Code dispatch uses another. The rule stays the same.

Prompt architecture is architecture. Eventually it deserves the same discipline as code.

Decisions are the shared brain

decisions.md is where Squad gets weirdly useful.

Every agent reads team decisions before work. Decisions are append-only, human-readable, and Git-versioned. They aren’t just notes. They’re constraints future agents inherit.

A decision might be a technical standard:

    ### Hook-based governance over prompt instructions
    **What:** Security, PII, and file-write guards are implemented via hooks, NOT prompt instructions.
    **Why:** Prompts can be ignored. Hooks are code — they execute deterministically.

Or a workflow rule:

    ### Merge driver for append-only files
    **What:** `.gitattributes` uses `merge=union` for `.squad/decisions.md`, `agents/*/history.md`, `log/**`, and `orchestration-log/**`.
    **Why:** Enables conflict-free merging of team state across branches.

Or a postmortem:

    ### Root Cause Analysis
    1. CLI-centric enforcement language created a VS Code routing gap.
    2. Prompt saturation buried the dispatch rule.
    3. Template duplication multiplied coordinator instructions.

    Fix: Rewrite the rule as platform-neutral dispatcher identity, then reinforce it at the end of the prompt.

That’s the difference between memory and lore: Lore is something the original builder remembers. Memory is something the next spawn can load.

The custom tools follow the same pattern. Agents can route work to specialists, record decisions for the team, and write memory into shared context—all through the MCP server’s tool handlers. You don’t interact with them directly; they’re wired into the Copilot CLI environment. When an agent needs to assign a task, it calls the routing tool. When it makes a call worth remembering, it calls the decision tool. When it learns something the team should know, it calls the memory tool.

The point isn’t that the tools are fancy. It’s that coordination becomes an artifact, not a side effect of chat.

The first real failure: Append-only optimism

For about a week and a half, CI/CD was chaos. Too many agents were landing work simultaneously. Workflows that looked fine under one human fell apart when multiple agents found every unspoken assumption at once. YAML is where assumptions go to wear a fake mustache. Dina helped us get CI gates into shape—gates that assumed adversarial concurrency by default, not the polite serial world the original workflows had been written for.

Then we hit file corruption.

Multiple agents wrote to the same append-only files at nearly the same time. Each write was locally reasonable. Together, they produced garbage. Git didn’t save us because not every collision becomes a clean conflict. Sometimes both sides look valid, and the result is nonsense.

The fix was a drop-box pattern:

    agent A ─┐
    agent B ─┼──> .squad/decisions/inbox/*.md ──> Scribe merge ──> decisions.md
    agent C ─┘

For files where union semantics are safe, .gitattributes handles the low-value conflict class:

    .squad/decisions.md merge=union
    .squad/agents/*/history.md merge=union
    .squad/log/** merge=union
    .squad/orchestration-log/** merge=union

But union merge isn’t a philosophy. It’s a tool. Canonical state still needs an owner. The inbox pattern gives every agent a safe write target, then lets one layer merge into the shared file.
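
As an illustration of the single-owner merge step, here is a minimal sketch under stated assumptions. `InboxEntry` and `mergeInbox` are hypothetical names, not the Scribe's actual implementation; the sketch assumes each inbox file holds one self-contained decision, orders entries by file name so the result is deterministic, and uses a crude substring check to skip decisions that were already merged.

```typescript
// Hypothetical representation of one drop file in .squad/decisions/inbox/.
interface InboxEntry {
  fileName: string;
  body: string;
}

// Merge inbox entries into the canonical decisions text.
// Sorting by file name makes the output independent of arrival order.
function mergeInbox(decisions: string, inbox: InboxEntry[]): string {
  const sorted = [...inbox].sort((a, b) =>
    a.fileName < b.fileName ? -1 : a.fileName > b.fileName ? 1 : 0,
  );
  let merged = decisions.trimEnd();
  for (const entry of sorted) {
    const body = entry.body.trim();
    // Skip empty files and decisions already present (crude duplicate check).
    if (body.length === 0 || merged.includes(body)) continue;
    merged += (merged.length > 0 ? '\n\n' : '') + body;
  }
  return merged + '\n';
}

const inbox: InboxEntry[] = [
  { fileName: 'kujan-session-events.md', body: '### Session events are typed\n**Why:** Fewer runtime surprises.' },
  { fileName: 'fenster-retry-policy.md', body: '### Retries are capped at three\n**Why:** Avoid retry storms.' },
];

console.log(mergeInbox('### Existing decision\n', inbox));
```

Running the merge twice with the same inbox leaves the file unchanged, which is the property that makes it safe to rerun after a partial failure.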

Tamir pushed hard on this class of problem. Brady was still in the “this is a neat framework” headspace. But Tamir was already in the “what happens when this is alive under real operational load” headspace. That changed the design. Memory lifecycle rules. Compaction policies. Review gates. State isolation. The boring boundary work.

Boring is a compliment here.

Governance can’t only be a prompt

This was the next lesson, and it keeps repeating:

If a prompt says, “Do not write outside src/**,” you have a request.

If a pre-tool hook blocks the write before execution, you have a boundary.

The Squad SDK hook pipeline is the move from prompt-level governance to deterministic governance:

    import { HookPipeline } from '@bradygaster/squad-sdk/hooks';

    const pipeline = new HookPipeline({
      allowedWritePaths: ['src/**/*.ts', '.squad/**', 'docs/**'],
      blockedCommands: ['rm -rf', 'git push --force', 'git reset --hard'],
      scrubPii: true,
      reviewerLockout: true,
      maxAskUserPerSession: 3,
    });

The hooks run around tool execution:

    agent tool request
      ↓
    pre-tool hooks
      - file-write guard
      - shell command restriction
      - ask-user rate limiter
      - reviewer lockout
      ↓
    allowed tool execution
      ↓
    post-tool hooks
      - PII scrubber
      - audit/logging
      ↓
    result returned to agent

Reviewer lockout is the cleanest example:

    const lockout = pipeline.getReviewerLockout();
    lockout.lockout('src/auth.ts', 'Backend');

    // Later, Backend tries to edit src/auth.ts.
    // The pre-tool hook blocks before the edit runs.

This encodes a review decision into runtime state. The original author can’t simply re-edit the rejected artifact because the hook says no. A different agent or a human has to take over.

That is the direction we want agent systems to move: more policies enforced at the boundary, fewer policies whispered into the prompt and hoped for.
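
To show what "enforced at the boundary" can look like without relying on the SDK's internals, here is a self-contained sketch of a pre-tool check. Everything in it is hypothetical: `ToolRequest`, `GuardConfig`, `globToRegExp`, and `preToolCheck` are illustrative names, the glob support is limited to `*` and `**`, and blocked commands are matched by simple substring. It is not how `HookPipeline` is implemented.

```typescript
// Hypothetical request and config shapes for illustration only.
interface ToolRequest {
  agent: string;
  tool: 'write' | 'shell';
  target: string; // file path for writes, command line for shell
}

interface GuardConfig {
  allowedWritePaths: string[];
  blockedCommands: string[];
  lockouts: Record<string, string[]>; // file path -> agents locked out
}

interface Verdict {
  allowed: boolean;
  reason?: string;
}

// Minimal glob support: "**/" spans directories, "**" matches anything,
// "*" matches within one path segment. Other characters are literal.
function globToRegExp(glob: string): RegExp {
  let re = '';
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith('**/', i)) {
      re += '(?:.*/)?';
      i += 3;
    } else if (glob.startsWith('**', i)) {
      re += '.*';
      i += 2;
    } else if (glob.charAt(i) === '*') {
      re += '[^/]*';
      i += 1;
    } else {
      re += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp(`^${re}$`);
}

// Runs before the tool executes; a denial means the tool never runs.
function preToolCheck(req: ToolRequest, cfg: GuardConfig): Verdict {
  if (req.tool === 'shell') {
    const hit = cfg.blockedCommands.find((cmd) => req.target.includes(cmd));
    return hit ? { allowed: false, reason: `blocked command: ${hit}` } : { allowed: true };
  }
  if (!cfg.allowedWritePaths.some((g) => globToRegExp(g).test(req.target))) {
    return { allowed: false, reason: `write outside allowed paths: ${req.target}` };
  }
  if ((cfg.lockouts[req.target] ?? []).includes(req.agent)) {
    return { allowed: false, reason: `${req.agent} is locked out of ${req.target}` };
  }
  return { allowed: true };
}
```

The check returns a reason alongside the verdict so a denial can be logged and shown to the agent, rather than failing silently.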

Memory classes, or: Stop loading the junk drawer

Tamir has a line Brady wishes he had written:

> The more your agent remembers, the less room it has to think.

That’s not a metaphor. It is a context budget problem.

Early Squad memory was too eager. Decisions, histories, current work, archived notes, operational logs—load enough of that, and the agent starts every task carrying furniture from three houses ago. It has more context and less signal.

The governed-memory work in PR #1145 made this explicit. Memory has classes and load guidance:

    export type MemoryClass =
      | 'TRANSIENT'
      | 'LOCAL'
      | 'DECISION'
      | 'POLICY'
      | 'COPILOT_MEMORY'
      | 'FORBIDDEN';

    export type MemoryLoadGuidance = 'ALWAYS' | 'ON-DEMAND' | 'ARCHIVE' | 'NEVER';

The architecture matters because compaction is lossy. If you summarize too little, every task drags stale context. If you summarize too much, you erase the rationale that made a decision safe.

The compromise isn’t one memory store. It’s a memory policy:

    TRANSIENT       short-lived task state; expire aggressively
    LOCAL           agent-scoped learning; load for that agent
    DECISION        shared team judgment; preserve rationale
    POLICY          hard operating rule; load broadly
    COPILOT_MEMORY  host/runtime memory; bridge carefully
    FORBIDDEN       never load; usually sensitive or irrelevant

    ALWAYS          hot path; small and high signal
    ON-DEMAND       searchable; load when task demands it
    ARCHIVE         retained for audit/history, not context
    NEVER           excluded from agent context

In the PR #1145 benchmark, governed memory cut agent context by roughly 55% (3,540 → 1,601 bytes) while keeping recall at 1.0. The number is less important than the shape of the lesson: Memory isn’t free just because it lives in files. Loading memory is a design decision.
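
One way to read the policy above is as a filter applied at spawn time. The sketch below is an assumption-laden illustration, not the code from PR #1145: the two union types repeat the ones quoted earlier, while `MemoryItem`, its `owner` and `tags` fields, and `selectContext` are hypothetical, as is the rule that ON-DEMAND items load when they share a tag with the task.

```typescript
type MemoryClass =
  | 'TRANSIENT'
  | 'LOCAL'
  | 'DECISION'
  | 'POLICY'
  | 'COPILOT_MEMORY'
  | 'FORBIDDEN';

type MemoryLoadGuidance = 'ALWAYS' | 'ON-DEMAND' | 'ARCHIVE' | 'NEVER';

// Hypothetical record shape for one piece of stored memory.
interface MemoryItem {
  id: string;
  memoryClass: MemoryClass;
  guidance: MemoryLoadGuidance;
  owner?: string; // agent that owns LOCAL memory
  tags: string[];
  text: string;
}

// Decide which items enter an agent's context for a given task.
function selectContext(items: MemoryItem[], agent: string, taskTags: string[]): MemoryItem[] {
  return items.filter((item) => {
    if (item.memoryClass === 'FORBIDDEN') return false;
    if (item.guidance === 'NEVER' || item.guidance === 'ARCHIVE') return false;
    // LOCAL memory is only visible to the agent that owns it.
    if (item.memoryClass === 'LOCAL' && item.owner !== agent) return false;
    if (item.guidance === 'ALWAYS') return true;
    // ON-DEMAND: load only when the task touches a matching tag (assumption).
    return item.tags.some((tag) => taskTags.includes(tag));
  });
}
```

Under this reading, shrinking context is mostly a matter of moving items from ALWAYS to ON-DEMAND or ARCHIVE, which keeps the decision visible in a diff.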

What still breaks

Role drift isn’t solved. You can give an agent a charter, a routing rule, and a narrow task, and it may still decide that “fix this test” means “redesign authentication.” Sometimes that’s initiative. Sometimes that’s nonsense with confidence.

The mitigations stack:

    charter boundaries
    + routing rules
    + scoped tools
    + file-write guards
    + reviewer lockout
    + CI gates
    + human review

No single layer is enough. That is the pattern.

Parallelism is also not free. More agents means more throughput and more coordination pressure. You find hidden global state. You discover which scripts assume serial execution. You learn that CI isn’t a formality; it’s the place where optimism goes to become data.

Prompt saturation is real. Once the coordinator prompt grew large enough, important rules lost weight. The fix wasn’t more prose. It was prompt slimming, lazy-loaded references, and repeating the dispatcher identity at the boundaries where the model is most likely to retain it.

Memory compaction remains hard. The failure mode is subtle: The agent isn’t obviously broken. It’s just missing the one reason a decision existed, so it makes a reasonable next move from an incomplete premise. Those are the expensive bugs because they look thoughtful.

And yes, people get attached to agents. Names, roles, continuity, and history trigger social instincts. We like the human side of that. We also don’t want to confuse it with agency in the human sense. These are tools with goals, context, and behavioral continuity. They do not have inner lives. Trust should come from inspectable behavior, not personality.

What we would steal from this architecture

If you’re building agent infrastructure, we wouldn’t start by copying Squad wholesale. We would steal these patterns:

  1. Disposable workers, durable artifacts. Let sessions die. Keep decisions, histories, traces, and outputs somewhere reviewable.
  2. Decision logs as runtime input. Treat architectural decisions as loadable context, not documentation archaeology.
  3. Drop-box writes for parallel agents. Don’t let every agent append to the canonical shared file. Give them individual write targets and merge intentionally.
  4. Prompt rules for intent, hooks for enforcement. Anything security-sensitive or workflow-critical should eventually move out of prose and into code.
  5. Memory classes. The question isn’t, “Should the agent remember this?” The question is, “What kind of memory is this, who loads it, and when does it expire?”
  6. Routing as a first-class design surface. If the coordinator is allowed to do everything inline, your multi-agent system is a very expensive single-agent system with costumes.
  7. Keep the human on the hook. The system can delegate, parallelize, and preserve context. It shouldn’t launder accountability.

These patterns aren’t engineering-specific because the substrate isn’t a codebase—it’s the repo. Swap the artifacts, and the seven still hold.

Squad isn’t only an engineering tool

Worth saying out loud, because the .ts code blocks above can mislead: Nothing in this architecture is engineering-specific. The substrate is the repo, not the codebase. Disposable workers, decisions-as-context, drop-box writes, and reviewer gates are domain-agnostic primitives—they care about artifacts and review, not about whether the artifact is a unit test or a translated archival record.

Tamir used the same scaffolding to run a Holocaust family-research project—agents coordinating archival lookups, translation passes between Yiddish, Polish, and Hebrew sources, and cross-corroboration of names across registries, with .squad/decisions.md acting as the working ledger of what had been established and what was still contested. No code was being shipped. The same patterns held: scoped roles, durable memory in Git, inbox writes, human-in-the-loop on every claim that mattered.

We’ve had the pleasure of working through a few other non-coding Squad scenarios. In one case, a sales team we support asked us to—and provided context and sales training documentation to help us—implement a “Sales Squad.” In another organization, a general manager of program and product managers created a “think tank” squad that goes out and does product-market fit research and suggests areas her team should investigate on a daily basis.

The bet underneath Squad is that this should be how a small group of humans—engineers, researchers, journalists, anyone who works with evidence—pulls coordinated work out of agents. Democratize the orchestration, not just the model access. Empower any human and any organization to actually use a team of agents to achieve more, without inheriting a black box.

Try it

The repository is here: github.com/bradygaster/squad.

The shortest path is the CLI plus Copilot. No SDK required.

    npm install -g @bradygaster/squad-cli
    squad init

Then open GitHub Copilot—CLI or VS Code, your call—and give the coordinator agent the shape of the project:

I'm starting a new project. Set up the team. Here's what I'm building: a recipe sharing app with React and Node.

The coordinator writes .squad/. You review the diff. That’s it.

If you want to go deeper—programmatic team composition, custom routing rules, embedding Squad inside your own tooling—the SDK is the next layer:

npm install @bradygaster/squad-sdk

Start with a small repo. Commit .squad/. Inspect every diff. Let the agents write decisions. Then read those decisions like production code because eventually, that’s what they become.

If you build something useful, alarming, hilarious, or weird, open an issue. Tamir and I read them.

Stay a builder.

The post Disposable agents, durable memory: The architecture behind Squad appeared first on Command Line.


Inside the Microsoft Agent Framework: How we designed a layered SDK

Developers are moving quickly from simple chat-based AI experiences to applications that can reason, use tools, coordinate across systems, and complete multi-step and long-running tasks. The first wave of AI apps proved that large language models could understand intent and generate content through patterns like completions, retrieval-augmented generation (RAG) and tool calling. The next wave is about turning those models into reliable, observable, and governable agents that can operate inside real products and enterprise systems. That is the role of Microsoft Agent Framework (MAF). 

Microsoft Agent Framework gives developers the building blocks to create agentic applications that combine models, tools, context, memory, planning, and orchestration. It’s designed for teams that need the flexibility of code, the reliability of production infrastructure, and the ability to integrate deeply with Microsoft, open-source, and third-party ecosystems. 

At its core, Microsoft Agent Framework is organized around three ideas:

  1. Agent loops: The core execution pattern that connects models, conversations, tools, and state 
  2. Workflows: Structured orchestration patterns for multi-step, multi-agent, or business-critical processes 
  3. Harnesses: Reusable runtime capabilities that give agents access to tools, context, memory, planning, controls, and middleware 

Together, these concepts give developers a practical way to move from a prompt to a production-ready agent. 

Agent loops: The core of agentic execution

The agent loop is the foundation of any agentic execution. It’s the repeated cycle where an agent receives input, reasons over available context, decides what to do next, optionally calls tools, observes the result, and then continues until it can produce an answer or complete a task. 

In a simple chat app, the loop may be straightforward: user message in, model response out. But in an agentic application, the loop becomes more dynamic. The model may decide on a tool, search a knowledge base, call an API, update a task list, generate code, inspect a file, or hand work off to another agent before responding.

    while true:
        response = send_to_llm(context, available_tools)
        if response.contains_tool_calls:
            execute each tool
            append results to context
            continue
        if response.is_done:
            break

This pattern is simple to describe, but it can be difficult to implement well. Developers need a way to manage messages, tool schemas, execution results, errors, streaming responses, permissions, and state. Microsoft Agent Framework provides that structure so developers can focus on the behavior of the agent rather than rebuilding the loop from scratch. 

The agent loop also creates a consistent place to apply controls. For example, a team may want to limit which tools an agent can call, require approval before certain actions, compact context when conversations get too long, or log every step for observability. By making the loop explicit, those policies can be applied consistently. 
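
The loop and its control points can be sketched in a few lines. This is a generic illustration, not Microsoft Agent Framework code: `Model`, `Tool`, and `runAgentLoop` are hypothetical names, the model is a synchronous stand-in function (a real model call would be asynchronous), and the only controls shown are a tool allow-list and a step limit.

```typescript
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type ToolCall = { name: string; args: string };
type ModelResponse = { text: string; toolCalls: ToolCall[] };

// Hypothetical stand-ins: a model maps context to a response,
// a tool maps a string argument to a string result.
type Model = (context: Message[]) => ModelResponse;
type Tool = (args: string) => string;

function runAgentLoop(
  model: Model,
  tools: Record<string, Tool>,
  allowedTools: string[],
  userInput: string,
  maxSteps = 8,
): Message[] {
  const context: Message[] = [{ role: 'user', content: userInput }];
  for (let step = 0; step < maxSteps; step++) {
    const response = model(context);
    // No tool calls means the model produced its final answer.
    if (response.toolCalls.length === 0) {
      context.push({ role: 'assistant', content: response.text });
      return context;
    }
    for (const call of response.toolCalls) {
      const tool = tools[call.name];
      // Policy is applied inside the loop, before any tool runs.
      const result = !allowedTools.includes(call.name)
        ? `denied: ${call.name} is not permitted`
        : tool === undefined
          ? `error: unknown tool ${call.name}`
          : tool(call.args);
      context.push({ role: 'tool', content: result });
    }
  }
  throw new Error(`No final answer after ${maxSteps} steps`);
}

// Scripted model: asks for one tool call, then answers with its result.
const scriptedModel: Model = (context) => {
  const last = context[context.length - 1];
  return last.role === 'tool'
    ? { text: `The answer is ${last.content}`, toolCalls: [] }
    : { text: '', toolCalls: [{ name: 'add', args: '2,3' }] };
};

const demoTools: Record<string, Tool> = {
  add: (args) => String(args.split(',').map(Number).reduce((a, b) => a + b, 0)),
};

const transcript = runAgentLoop(scriptedModel, demoTools, ['add'], 'What is 2 + 3?');
console.log(transcript[transcript.length - 1].content);
```

Because the allow-list and step limit sit in the loop itself, they apply to every tool call regardless of what the prompt says.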

Provider-agnostic by design 

One of the most important principles behind Microsoft Agent Framework is that agentic applications shouldn’t be locked to a single model, tool provider, or hosting environment. That’s why Microsoft Agent Framework is built to be provider-agnostic. Developers can compose agents that work across a range of models, tools, and hosted agent providers. Depending on the scenario, an agent may use models and tools from Microsoft Foundry or from third-party providers like OpenAI or Anthropic. Microsoft Agent Framework can also interact with agents hosted elsewhere, including Copilot Studio or GitHub Copilot, or through A2A as an open protocol. 

Workflows: Structure for reliable orchestration

Agent loops are powerful because they’re flexible, but many enterprise scenarios require predictable steps, explicit control flow, or repeatable business logic. A customer support process may need to classify an issue, retrieve account details, draft a response, check policy compliance, and escalate when necessary. A software engineering assistant may need to inspect an issue, reproduce a bug, write a patch, run tests, and create a pull request. A research agent may need to gather sources, compare evidence, produce a summary, and ask for review. These aren’t just conversations – they’re workflows. 

In Microsoft Agent Framework, a workflow can describe how tasks move from one step to another, when agents should collaborate, where human input is required, and how results should be validated. 

Common workflow patterns include: 

  • Sequential flows, where each step depends on the output of the previous step 
  • Handoffs, where one agent or component delegates work to another 
  • Author/critic loops, where one agent produces an output and another reviews or improves it 
  • Magentic-style orchestration, where a coordinating agent plans and supervises work across tools or subagents 
  • Custom workflows, where developers define a domain-specific control flow for their application 

The key idea is that developers should be able to choose the right level of autonomy for the task. Some scenarios benefit from a highly autonomous loop. Others require a carefully designed workflow with checkpoints and human review. Microsoft Agent Framework supports both. 
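
As one concrete instance of the patterns listed above, an author/critic loop with a bounded number of rounds might look like the following. The names `Author`, `Critic`, and `authorCriticLoop` are hypothetical and the agents are plain functions; this is a sketch of the control flow, not the framework's workflow API.

```typescript
// Hypothetical stand-ins for two agents.
type Author = (task: string, feedback: string | null) => string;
type Critic = (draft: string) => { approved: boolean; feedback: string };

interface ReviewOutcome {
  draft: string;
  rounds: number;
  approved: boolean;
}

function authorCriticLoop(
  task: string,
  author: Author,
  critic: Critic,
  maxRounds = 3,
): ReviewOutcome {
  let feedback: string | null = null;
  let draft = '';
  for (let round = 1; round <= maxRounds; round++) {
    draft = author(task, feedback);
    const review = critic(draft);
    if (review.approved) return { draft, rounds: round, approved: true };
    // Feed the critique into the next authoring attempt.
    feedback = review.feedback;
  }
  // Not approved within the budget: hand the last draft to a human.
  return { draft, rounds: maxRounds, approved: false };
}

// Toy agents: the critic insists on a citation, the author adds one when asked.
const toyAuthor: Author = (task, feedback) =>
  feedback === null ? `Summary of ${task}.` : `Summary of ${task}. [source: notes.md]`;
const toyCritic: Critic = (draft) =>
  draft.includes('[source:')
    ? { approved: true, feedback: '' }
    : { approved: false, feedback: 'Add a source.' };

console.log(authorCriticLoop('the incident', toyAuthor, toyCritic));
```

The round limit is the checkpoint: when it is reached without approval, the workflow returns an unapproved draft instead of looping, which is where a human review step would attach.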

Harnesses: The runtime capabilities around the agent 

If the agent loop is the engine and workflows are the process structure, the harness is the set of capabilities that surround the agent and make it useful in the real world. A harness can include specific tools, context, memory, planning, middleware, permissions, and other runtime services. These are the pieces that turn an agent from a model-driven conversation into a capable long-running application component. 

For example, a harness might provide access to common tools like file systems, code execution, or shell execution; context including system prompts and memory; and control capabilities such as subagents, context compaction, permission checks, human approval gates, logging, and tracing. 

This surrounding layer is critical because agent quality depends heavily on the environment in which the model operates. A strong model with poor tools, weak context, and no controls will still produce a poor result. A well-designed harness helps the agent act with the right information, the right capabilities, and the right boundaries. 

Why this matters

Developers need agent frameworks that match how real software is built. They need flexibility, because the model and tool ecosystem is changing quickly. They need structure, because production systems require reliability. They need openness, because applications often span multiple services and providers. And they need control, because agents are increasingly able to take meaningful action. 

Microsoft Agent Framework is layered and designed for that reality. It gives developers a clear agent loop for reasoning and tool use; workflow patterns for structured orchestration; harness capabilities for tools, context, memory, planning, and controls; and provider-agnostic composition across models, tools, and hosted agents. 

Most importantly, Microsoft Agent Framework lets developers choose the right architecture for the job. Not every agent needs a complex workflow. Not every workflow needs a highly autonomous agent. Microsoft Agent Framework supports simple assistants, complex multi-agent systems, and structured enterprise processes within one coherent model. 

If you’re taking agents from prototype to production, Microsoft Agent Framework may be right for you – especially if you’re working in .NET or Python. Download the SDK on GitHub, read the documentation, visit the developer blog, or join the Discord.

The post Inside the Microsoft Agent Framework: How we designed a layered SDK appeared first on Command Line.
