Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Agents That Build Agents: A SKILL-first Blueprint with MS Agent Framework & Foundry


The single insight that changes everything

Most "build an AI agent" tutorials collapse two completely different jobs into one tangled mess:

  • the job of building an agent (writing the code, defining its tools, evaluating it, packaging it), and
  • the job of running an agent (planning, reasoning, calling tools, remembering users, delivering outcomes).

Once you separate them, modern agent development becomes a clean two-layer architecture:

A Coding Agent sits on top — that's how you produce an agent. A Runtime Agent sits below — that's the agent your business operates. Microsoft Agent Framework is the SDK that ties them together; Microsoft Foundry is the platform both layers publish to and run on.

But the secret ingredient — the thing that turns a generic Copilot into a domain-aware engineer — is the SKILL. SKILL is what the Coding Agent reads before writing a single line. It's how requirements become artifacts that actually match your framework, your conventions, and your fixtures.

This post walks the entire two-layer architecture, in the order you should learn it — with SKILL as the star of Layer 1. We ground every concept in ZavaShop, a fictional global e-commerce company with 5 fulfillment centers, dozens of suppliers, and a CEO who wants one live dashboard for all of it. Both Python and .NET (C#) are first-class — pick the language your team will run in production.

LAYER 1 — The Coding Agent (Build Time)

The Coding Agent is not the agent your customer talks to. It's the agent that constructs the agent your customer talks to. Its output is a bundle of artifacts — code, agent definitions, workflows, skills, connectors, evals, tests, configs, docs — that flow through validation and into Foundry.

Build time has six movements.

Movement 1 — Requirements & Planning

Before the Coding Agent writes a single line, you owe it three things:

  1. A real business pain. Not "let's build an agent." Rather: "Mei, the supervisor at Seattle DC, gets interrupted 60 times a day by stock-level questions."
  2. A list of acceptance criteria. What does "done" look like? "Agent answers stock questions for SKUs in our 10-SKU catalog. P95 latency under 4s. Wrong-tool rate under 5% on the eval set."
  3. The fixtures it'll run on. Real or realistic data — warehouses, SKUs, POs, customers — so the Coding Agent isn't reasoning about a vacuum.

ZavaShop context. The workshop ships workshop/data/ — 5 warehouses, 10 SKUs, 6 POs, 8 suppliers, 5 contracts, 4 customers (3 VIP), 6 orders, 5 carriers, 4 open exceptions. Every artifact the Coding Agent generates is anchored to this shared fixture set, so numbers stay consistent across the entire system.

Movement 2 — The Coding Agent + its SKILL (the star of build time)

This is the movement most teams skip — and it's the one that decides whether your build-time output is professional code or "ChatGPT-shaped" code.

What a Coding Agent actually is

The Coding Agent is GitHub Copilot Chat in Agent Mode, configured with a domain-aware agent definition. In the ZavaShop workshop, it lives at .github/agents/zavashop-coding-agent.agent.md and is activated from the VS Code Agent picker. You start each session with one plain sentence:

"I'm working on the inventory agent in Python — wire up stock and PO lookups against the fixtures, plus a HostedMCPTool for the warehouse handbook."

Notice what's not in that sentence: no library names, no class names, no file paths. The Coding Agent has to fill all of that in. The mechanism it uses is the SKILL.

What a SKILL is

A SKILL is a structured contract that teaches the Coding Agent how to write code in your framework, your conventions, and your domain. It is the most important file in the entire build-time layer: without it, GitHub Copilot is a fluent generalist; with it, Copilot becomes a domain-aware specialist that writes the code your tech leads would have written.

Conceptually, a SKILL contains:

Section | Purpose
Scope & when to use | "Use this SKILL for building agents on Foundry / Azure AI — tools, MCP, Toolbox, Skills, Memory, Threads"
Framework idioms | The exact way to construct AzureAIAgentClient, register function tools, wire HostedMCPTool, create a Thread
Code patterns | Reference snippets the Coding Agent imitates — naming, import order, error handling, type hints
Fixture/data contract | How to load workshop/data/, which loaders exist (find_stock, find_po, etc.), where to add sys.path
Anti-patterns | What not to do — don't hardcode the model name, don't write inline mock dicts, don't bypass the data loader
Acceptance heuristics | How to map a LAB's acceptance criteria to runnable checks (eval rows, smoke tests)

A SKILL is versioned with the codebase. When the framework releases a new idiom, you update the SKILL once; every agent built afterwards picks it up automatically. This is the single biggest reason convention drift disappears.

The six SKILLs in the ZavaShop workshop

The workshop ships six SKILLs — three for each language track — and they cover three orthogonal capability surfaces:

Track | SKILL | Use it for
🐍 Python | agent-framework-azure-ai-py | Single agent on Foundry: tools, MCP, Toolbox, Skills, Memory, Threads
🐍 Python | agent-framework-workflows-py | Multi-agent workflows: WorkflowBuilder, executors, HITL, Checkpoint
🐍 Python | agent-framework-agui-py | AG-UI server + client: SSE, frontend/backend tools, shared state, HITL
🟦 .NET | agent-framework-azure-ai-csharp | Same as the Python azure-ai SKILL, for C#
🟦 .NET | agent-framework-workflows-csharp | Same as the Python workflows SKILL, for C#
🟦 .NET | agent-framework-agui-csharp | AG-UI in ASP.NET Core: MapAGUI, AGUIChatClient, HITL

How the Coding Agent uses SKILL

The Coding Agent's loop is SKILL-first, code-second: route to the right SKILL, plan, generate, then validate.

The discipline is captured in the workshop's one mantra:

"Read the SKILL first."

It is not optional. Skip it and you're back to generic Copilot output.

Movement 3 — A worked example: from a single sentence to a runnable agent

Let's trace what happens when you sit down in front of the Coding Agent and say:

"I'm working on the inventory agent in Python — wire up stock and PO lookups against the fixtures, plus a HostedMCPTool for the warehouse handbook."

Step 1 — The Coding Agent routes

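The Coding Agent's definition (.github/agents/zavashop-coding-agent.agent.md) contains a routing table that maps the request's language and topic to a SKILL and to the matching LAB README. For a Python single-agent request like this one, the SKILL is agent-framework-azure-ai-py.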

It loads both files into context before doing anything else.
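The routing table's actual contents are not reproduced in the post. As a hypothetical sketch, assuming it keys on language and capability area, and assuming each SKILL folder under .github/skills/ keeps its instructions in a SKILL.md file, the lookup could be as simple as this (the area names and the lab README path are placeholders, not the workshop's real layout):

```python
from pathlib import Path

# Hypothetical routing table: (language, capability area) -> SKILL folder.
# The folder names are the workshop's six SKILLs; the keys are an assumption.
SKILL_ROUTES = {
    ("python", "single-agent"): "agent-framework-azure-ai-py",
    ("python", "workflow"): "agent-framework-workflows-py",
    ("python", "ag-ui"): "agent-framework-agui-py",
    ("csharp", "single-agent"): "agent-framework-azure-ai-csharp",
    ("csharp", "workflow"): "agent-framework-workflows-csharp",
    ("csharp", "ag-ui"): "agent-framework-agui-csharp",
}


def route(language: str, area: str, lab_readme: Path) -> list[Path]:
    """Return the two files to load into context: the SKILL (how), then the LAB README (what)."""
    try:
        skill_dir = SKILL_ROUTES[(language.lower(), area.lower())]
    except KeyError:
        raise ValueError(f"no SKILL registered for {language}/{area}") from None
    # Assumption: the SKILL's instructions live in SKILL.md inside its folder.
    return [Path(".github/skills") / skill_dir / "SKILL.md", lab_readme]
```

For the inventory request, route("Python", "single-agent", lab_readme) would return the agent-framework-azure-ai-py SKILL file followed by the lab's README, and an unknown combination fails loudly instead of falling back to generic output.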

Step 2 — The Coding Agent plans

Using the SKILL's patterns and the LAB's acceptance criteria, it generates a plan before writing any code.

Every item in the plan traces back to either the SKILL (how) or the LAB README (what). No invention, no guessing.

Step 3 — The Coding Agent generates

Now — and only now — it writes code. Because the SKILL specified the patterns, what comes out is shaped like the rest of your codebase:

# scripts/inventory_agent.py
import asyncio
import os
import sys
from pathlib import Path

from agent_framework import HostedMCPTool
from agent_framework_azure_ai import AzureAIAgentClient
from azure.identity.aio import DefaultAzureCredential  # async credential, as async_credential expects

# Pattern from SKILL: add data folder to sys.path, import named loaders
sys.path.insert(0, str(Path(__file__).resolve().parents[1] / "data"))
from zava_data import find_stock, find_po  # noqa: E402


def stock_lookup(sku: str, warehouse: str) -> dict:
    """Return on-hand quantity for a SKU at a warehouse."""
    return find_stock(sku, warehouse)


def po_lookup(po_id: str) -> dict:
    """Return the status of a purchase order."""
    return find_po(po_id)


async def main() -> None:
    # Close the async credential when the run is finished.
    async with DefaultAzureCredential() as credential:
        client = AzureAIAgentClient(
            project_endpoint=os.environ["FOUNDRY_PROJECT_ENDPOINT"],
            model_deployment_name=os.environ["FOUNDRY_MODEL"],  # gpt-5.5 from env
            async_credential=credential,
        )
        agent = client.create_agent(
            name="Zara",
            instructions=(
                "You are Zara, the warehouse assistant for Seattle DC (SEA-01). "
                "Use the tools when asked about stock or POs. "
                "Answer briefly."
            ),
            tools=[
                stock_lookup,
                po_lookup,
                HostedMCPTool(
                    name="warehouse-docs",
                    url=os.environ["WAREHOUSE_DOCS_MCP_URL"],
                ),
            ],
        )
        # A Thread carries multi-turn state for this conversation.
        thread = agent.get_new_thread()
        print(await agent.run("How many SKU-7421 are at SEA-01?", thread=thread))


if __name__ == "__main__":
    asyncio.run(main())

Notice the things the SKILL enforced without you having to ask: env-var-driven config, named function tools with docstrings, sys.path data-loader pattern, HostedMCPTool placed alongside function tools, Thread for multi-turn.
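Why do names and docstrings matter so much? Agent frameworks generally turn a function tool into a description the model reads: a name, a description, and typed parameters. The sketch below approximates that with Python's standard inspect module. It illustrates the idea only; it is not Agent Framework's own conversion code, and the schema shape is an assumption loosely modeled on JSON Schema.

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python annotations to JSON Schema type names (an assumption).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean",
               dict: "object", list: "array"}


def tool_schema(fn) -> dict:
    """Build a model-facing description of a function tool from its signature and docstring."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",  # no docstring means no guidance for the model
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def stock_lookup(sku: str, warehouse: str) -> dict:
    """Return on-hand quantity for a SKU at a warehouse."""
    raise NotImplementedError  # body omitted: only the signature matters here
```

A tool named f with no docstring would produce an empty description, which is exactly the kind of signature the model cannot use well.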

Step 4 — The Coding Agent validates

The SKILL also told it how to validate. The Coding Agent runs:

  • a smoke test against fixtures (SKU-7421 @ SEA-01 → 312),
  • the eval set (eval_queries.jsonl) — was the right tool called? did the answer contain the expected fact?
  • a red-team probe round.

It reports back: "3/3 acceptance criteria pass. Eval score 5/5. Red-team: no successful prompt injections."
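An eval score like that can come from a loop as small as the one below. It is a hedged sketch: the field names assumed for eval_queries.jsonl (query, expected_tool, expected_fact) and the run_agent callable, which is assumed to return the tools called plus the final answer, are illustrations rather than the workshop's actual harness.

```python
import json
from typing import Callable, Iterable

# Assumed row shape (the real workshop schema may differ):
# {"query": "...", "expected_tool": "stock_lookup", "expected_fact": "312"}


def score_eval(lines: Iterable[str],
               run_agent: Callable[[str], tuple[list[str], str]]) -> tuple[int, int]:
    """Return (passed, total). A row passes only if the expected tool was called
    AND the expected fact appears in the answer."""
    passed = total = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        row = json.loads(line)
        tools_called, answer = run_agent(row["query"])
        total += 1
        if row["expected_tool"] in tools_called and row["expected_fact"] in answer:
            passed += 1
    return passed, total
```

Checking the tool call separately from the answer catches the case where the model guesses a plausible number without ever consulting the fixtures.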

Step 5 — Done

What landed in your repo is not just a script. It's an artifact bundle — code + agent definition + tools + eval rows + a one-page README — that matches the way your team writes agents. That bundle is what flows into the next three movements.

Movement 4 — Agent Artifacts (the outputs)

A well-instructed Coding Agent produces eight kinds of artifact. Together they make up "an agent" in the deployable sense:

Artifact | What it is | Why it matters
Source code | The Agent / Workflow program | Versioned, reviewable, diffable
Agent definitions | Name, instructions, tool list | The "personality" — independently editable
Workflows | WorkflowBuilder graphs | Multi-agent orchestration as code
Skills | Named, packaged behaviors | Reusable capabilities — one Skill, many agents
Connectors | MCP servers, Toolbox registrations | Where the agent reaches into the world
Evals | eval_queries.jsonl and harness | Regression target for every prompt change
Tests & configs | Unit tests, .env schema, deployment manifests | Reproducibility
Documentation | READMEs, runbooks | The agent your future self can operate

Don't confuse two senses of "skill" here. A SKILL file (uppercase, in .github/skills/) instructs the Coding Agent at build time. An Agent Skill (a Foundry concept) is a named runtime capability the Runtime Agent calls. Both names are deliberate — Layer 1's SKILL produces, among other artifacts, Layer 2's Skills.

Movement 5 — Validation

Before any artifact reaches Foundry, four gates run:

  1. Tests — unit + integration. Did find_stock("SKU-7421", "SEA-01") return 312, the value in the fixture?
  2. Lint & types — ruff/mypy on Python, dotnet build warnings on .NET. The model has to read these signatures; sloppy ones cause real bugs.
  3. Evaluation — run the eval set. Did the right tool get called? Did the answer contain the expected fact? You need a score, not a vibe.
  4. Red-Team probes — adversarial inputs that try to drift the agent off topic or extract another customer's data. The Foundry red-team SDK ships a battery of these.
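The four gates compose naturally as an ordered, fail-fast check. The sketch below assumes each gate has been wrapped as a zero-argument callable that returns pass or fail; the gate names mirror the list above, while the wiring itself is illustrative rather than the workshop's CI.

```python
from typing import Callable

Gate = tuple[str, Callable[[], bool]]  # (gate name, check that returns True on pass)


def run_gates(gates: list[Gate]) -> tuple[bool, list[str]]:
    """Run gates in order and stop at the first failure, so nothing unvalidated ships."""
    log: list[str] = []
    for name, check in gates:
        ok = check()
        log.append(f"{name}: {'pass' if ok else 'FAIL'}")
        if not ok:
            return False, log
    return True, log
```

In a real pipeline each callable would invoke the test runner, the linter and type checker, the eval harness, or the red-team run; the returned log doubles as the report that makes "we built an agent" a deliverable.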

Evangelist takeaway. "We built an agent" is not a deliverable. "We built an agent and here is its pass rate on a versioned eval set, plus a red-team report" is a deliverable. Validation belongs at build time, not "we'll add it later."

Movement 6 — Publish & Deploy

When validation is green, the Coding Agent's outputs flow into Foundry and Azure:

  • Push to Microsoft Foundry — agent definitions, Skills, Toolbox tools, and custom evals register against your Foundry project. They are now governed, versioned, and observable.
  • Deploy to Azure — the runtime host (AG-UI server, workflow worker, Teams app, API surface) ships to your Azure target (App Service, Container Apps, AKS, Functions). Same env vars drive local dev and cloud.

The same artifact set deploys to dev, staging, and production. There is no "production-only" code in your agent.
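"Same env vars drive local dev and cloud" works best when configuration is read in one place and validated at startup. A minimal sketch, assuming the three variables used by the inventory example are the required set; in Azure the values would come from app settings (for example, Key Vault references) rather than a checked-in .env.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the inventory example; treating all three as required is an assumption.
REQUIRED = ("FOUNDRY_PROJECT_ENDPOINT", "FOUNDRY_MODEL", "WAREHOUSE_DOCS_MCP_URL")


@dataclass(frozen=True)
class AgentConfig:
    project_endpoint: str
    model: str
    warehouse_docs_mcp_url: str


def load_config(env: Mapping[str, str] = os.environ) -> AgentConfig:
    """Read config from the environment and fail fast, naming every missing variable."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return AgentConfig(
        project_endpoint=env["FOUNDRY_PROJECT_ENDPOINT"],
        model=env["FOUNDRY_MODEL"],
        warehouse_docs_mcp_url=env["WAREHOUSE_DOCS_MCP_URL"],
    )
```

Because the same function runs in every environment, a misconfigured staging slot fails at startup with a clear message instead of at the first tool call.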

LAYER 2 — The Runtime Agent (Runtime)

Now the agent is live. Every conversation, every action against your data, every memory it writes — that's Layer 2. Five concerns define it.

Concern 1 — Users & Channels

A Runtime Agent reaches users through the channels they already use:

  • Microsoft Teams — the agent shows up where work already happens.
  • Outlook — triage, reply, summarize, schedule.
  • Custom web / mobile / voice — built on AG-UI, which ships a React client covering streaming text, frontend tools, backend tools, shared state, generative UI, predictive updates, HITL prompts.

The channel is a deployment choice, not an architectural choice. The same agent definition can surface in Teams and on a React dashboard.

ZavaShop context. Mei's agent shows up in Teams. The CEO's control tower is a React app on top of AG-UI. The agent definition behind both is the same artifact set the Coding Agent produced.

Concern 2 — The Runtime Agent itself

The Runtime Agent is the loop you've heard about a thousand times — now it's a concrete piece of architecture:

AIAgent = model + instructions + tools + thread

Inside the loop:

  1. The model plans & reasons about the next step.
  2. It calls tools through MCP, Toolbox, or local functions.
  3. It reads & writes memory.
  4. It streams output back to the channel.

# Python — the runtime shape (exactly what the Coding Agent produced)
agent = client.create_agent(
    name="Zara",
    instructions="You are Zara, the warehouse assistant for Seattle DC.",
    tools=[stock_lookup, po_lookup, warehouse_docs_mcp],
)
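To make the four-step loop concrete, here is a toy, dependency-free version. It is not Agent Framework's implementation: the model is a scripted stand-in function, the tools are a plain dict, the thread is a list, and the max_steps cap is an addition to keep the toy from looping forever.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """What the stand-in model decides next: call a tool, or answer."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None


@dataclass
class MiniAgent:
    """Toy model + instructions + tools + thread (illustration, not Agent Framework)."""
    model: Callable[[str, list], Step]  # (instructions, thread) -> next Step
    instructions: str
    tools: dict[str, Callable[..., object]]

    def run(self, user_msg: str, thread: list, max_steps: int = 8) -> str:
        thread.append(("user", user_msg))
        for _ in range(max_steps):  # cap added so the toy cannot loop forever
            step = self.model(self.instructions, thread)  # 1. plan & reason
            if step.answer is not None:
                thread.append(("assistant", step.answer))
                return step.answer  # 4. output back to the channel (a real agent streams)
            result = self.tools[step.tool](**step.args)  # 2. call a tool
            thread.append(("tool", step.tool, result))  # 3. write state the model reads next turn
        raise RuntimeError("agent did not finish within max_steps")


def scripted_model(instructions: str, thread: list) -> Step:
    """Stand-in for the LLM: look up stock once, then answer from the tool result."""
    last = thread[-1]
    if last[0] == "tool":
        return Step(answer=f"{last[2]['on_hand']} units")
    return Step(tool="stock_lookup", args={"sku": "SKU-7421", "warehouse": "SEA-01"})
```

With a stock_lookup that returns {"on_hand": 312}, the toy answers after one tool call and the thread records the user, tool, and assistant turns in order. Replacing scripted_model with a real model client is the part a framework handles for you.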

Concern 3 — Tools & Integrations (the runtime capability surface)

At runtime, a Runtime Agent reaches the outside world through four kinds of capabilities — and which one to use is a real engineering decision:

Capability | Lives in | Use when
Function tool | The agent's own process | Local code: a calculation, a DB query, a fixture lookup
MCP tool | An external MCP server | The capability is owned by another system, exposed via MCP
Toolbox tool | The Foundry project (server-side, tenant-wide) | The capability is shared by multiple agents and must be governed
Agent Skill | The Foundry project | A combination of tools + policy as one named capability

Mental progression: function tool → MCP tool → Toolbox tool → Agent Skill.

You don't have to start with Toolbox — but the moment a second agent touches the same domain, migrate.
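The table above boils down to a few questions, asked from the most specific answer to the most general. The helper below is a hypothetical simplification of that decision, not a Foundry API; real choices also weigh latency, ownership, and governance requirements.

```python
def choose_capability(owned_by_other_system: bool,
                      agents_sharing: int,
                      bundles_policy: bool) -> str:
    """Hypothetical decision helper mirroring the capability table."""
    if bundles_policy:
        return "Agent Skill"    # tools + policy packaged as one named capability
    if agents_sharing >= 2:
        return "Toolbox tool"   # a second agent touches the domain: share and govern it
    if owned_by_other_system:
        return "MCP tool"       # another system exposes it over MCP
    return "function tool"      # local code in the agent's own process
```

Applied to ZavaShop: the fixtures map to function tools, the warehouse handbook to MCP, the supplier-portal connectors shared by three teams to Toolbox, and validate-PO-against-contract to an Agent Skill.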

ZavaShop context. Local fixtures → function tools. The warehouse handbook → MCP. Supplier-portal connectors shared by procurement, fulfillment, and finance → Toolbox tools. "Validate-PO-against-contract" → an Agent Skill.

Concern 4 — Memory & State

State at runtime comes in two flavors:

Thread = state inside one conversation

thread = agent.get_new_thread()
await agent.run("Look up PO-1043.", thread=thread)
await agent.run("And its supplier?", thread=thread)  # knows which PO

Memory = state across conversations

Foundry Memory is durable, retrievable knowledge about a user — VIP status, packaging preferences, delivery windows. Memory holds stable preferences and facts, not chat transcripts.

ZavaShop context. Customer service agent Aria remembers across sessions that C-204 is VIP, prefers no cardboard, and wants 6–8pm delivery.
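The split between the two is easiest to see side by side. This plain-Python model illustrates the concept only: Foundry Memory's real API, storage, and fact extraction are not shown, the class and method names are hypothetical, and here the facts are recorded explicitly rather than extracted by the service.

```python
from collections import defaultdict


class Thread:
    """State inside one conversation: the ordered messages of that chat."""
    def __init__(self) -> None:
        self.messages: list[str] = []


class UserMemory:
    """State across conversations: stable facts per user, not transcripts."""
    def __init__(self) -> None:
        self._facts: dict[str, dict[str, str]] = defaultdict(dict)

    def remember(self, user_id: str, key: str, value: str) -> None:
        self._facts[user_id][key] = value

    def recall(self, user_id: str) -> dict[str, str]:
        # .get avoids creating an empty entry for unknown users
        return dict(self._facts.get(user_id, {}))


memory = UserMemory()

# Session 1: Aria learns stable preferences for customer C-204.
session1 = Thread()
session1.messages.append("I'm a VIP. No cardboard, please, and deliver between 6 and 8pm.")
memory.remember("C-204", "tier", "VIP")
memory.remember("C-204", "packaging", "no cardboard")
memory.remember("C-204", "delivery_window", "18:00-20:00")

# Session 2: a brand-new thread starts empty, but memory carries over.
session2 = Thread()
print(len(session2.messages), memory.recall("C-204")["packaging"])  # 0 no cardboard
```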

Concern 5 — Actions & Outcomes

Real systems take actions that change state and produce outcomes other systems observe:

  • Trigger events — kick off a workflow, page a human.
  • Generate outputs — write a PO, draft an email, push to a record.
  • Notify channels — send back to Teams, update a dashboard, hit a webhook.
  • Observability — every action streams to Application Insights / Azure Monitor.

This is also where Workflows live. WorkflowBuilder is Agent Framework's orchestration primitive.

Three workflow features matter most:

  1. Reuse, don't rebuild — tools written at build time are workflow nodes at runtime.
  2. Human-in-the-Loop (HITL) — pauses, asks a human, resumes from the exact step.
  3. Checkpointing — workflows survive process restarts.

ZavaShop context. Fulfillment director Diego's team handles a $10K+ exception every day. Before: an email chain across 5 teams. After: a WorkflowBuilder graph with one HITL approval and full audit trail.
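HITL and checkpointing fit together: a pause is just a saved checkpoint waiting for human input. The sketch below shows that pattern with the standard library only. It is a hypothetical illustration for Diego's $10K+ exceptions, not Agent Framework's WorkflowBuilder, checkpoint storage, or request/response API; the step names and the JSON checkpoint file are assumptions.

```python
import json
from pathlib import Path
from typing import Callable, Optional


class PauseForHuman(Exception):
    """Raised by a step that needs a human decision before it can finish."""


def triage(state: dict) -> dict:
    return {**state, "triaged": True}


def approve(state: dict) -> dict:
    # HITL gate: exceptions of $10K or more wait for an explicit human decision.
    if state["amount"] >= 10_000 and state.get("approved") is None:
        raise PauseForHuman("approval needed")
    return state


def resolve(state: dict) -> dict:
    return {**state, "resolved": bool(state.get("approved", True))}


STEPS: list[tuple[str, Callable[[dict], dict]]] = [
    ("triage", triage), ("approve", approve), ("resolve", resolve),
]


def run(new_input: dict, checkpoint: Path) -> Optional[dict]:
    """Resume from the checkpoint if one exists; save after every step.
    Returns the final state, or None while waiting on a human."""
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
        index, state = saved["index"], {**saved["state"], **new_input}
    else:
        index, state = 0, dict(new_input)
    while index < len(STEPS):
        _, step = STEPS[index]
        try:
            state = step(state)
        except PauseForHuman:
            checkpoint.write_text(json.dumps({"index": index, "state": state}))
            return None  # paused: a later call resumes at this same step
        index += 1
        checkpoint.write_text(json.dumps({"index": index, "state": state}))
    checkpoint.unlink()  # finished: nothing left to resume
    return state
```

After a pause, calling run again with {"approved": True} resumes at the approval step rather than re-running triage, and because the checkpoint lives on disk the same resume works after a process restart.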

Cross-cutting: the shared services that make this safe

Both layers sit on top of platform services that are non-negotiable for enterprise deployment:

Service | What it does for your agents
Microsoft Entra ID | Who is the user? Who is the agent? Managed identity for tool calls
Microsoft Defender for Cloud | Threat detection across the agent's compute + data plane
Microsoft Sentinel | SIEM — correlate agent actions with security signals
Azure Key Vault | Secrets, keys, connection strings — never in code, never in a .env checked in to git
Azure Monitor / App Insights | Every agent turn, every tool call, every workflow step — observable and queryable
Azure Policy & governance | Guardrails on what can be deployed where, by whom

Skip these services and you have a demo that has not yet failed.

Mapping the ZavaShop workshop to the architecture

Layer 1 artifacts shipped in the repo:

  • .github/agents/zavashop-coding-agent.agent.md — the Coding Agent definition
  • .github/skills/agent-framework-{azure-ai,workflows,agui}-{py,csharp}/ — the six SKILLs
  • workshop/data/ — shared fixtures every artifact grounds in
  • Per-lab READMEs + eval_queries.jsonl — Layer 1 validation inputs

Layer 2 artifacts produced over the course of the workshop:

  • A single agent (Zara) — function tools + HostedMCPTool + Thread
  • A procurement agent (Pierre) — Toolbox + Agent Skills + approval policy
  • A customer-service agent (Aria) — Foundry Memory + Evaluation + Red-Team
  • A multi-agent fulfillment workflow (Diego) — WorkflowBuilder + HITL + Checkpoint
  • An AG-UI control tower for the CEO — covering all 7 AG-UI features

Same model across the stack — gpt-5.5 on Foundry + text-embedding-3-small. Change one env var, run the same artifact in the other language.

Three habits that separate strong agent engineers

  1. Read the SKILL first. Make it ritual. The Coding Agent does it automatically; you should do it manually when reviewing the agent's output.
  2. Treat tools as a public API. Names, signatures, docstrings, return shapes — they are how the model sees your system at runtime. Refactor them like any other API.
  3. Measure before you tune. A prompt change without an eval delta is a vibe. With one, it's engineering.
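An eval delta takes only a few lines to compute. A minimal sketch, assuming each run of the eval set is recorded as a mapping from a (hypothetical) row id to pass or fail:

```python
def pass_rate(results: dict[str, bool]) -> float:
    return sum(results.values()) / len(results) if results else 0.0


def eval_delta(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict:
    """Compare two runs of the same eval set, row by row."""
    if baseline.keys() != candidate.keys():
        raise ValueError("runs must cover the same eval rows")
    return {
        "delta": pass_rate(candidate) - pass_rate(baseline),
        "fixed": sorted(k for k in baseline if not baseline[k] and candidate[k]),
        "regressed": sorted(k for k in baseline if baseline[k] and not candidate[k]),
    }
```

Reporting fixed and regressed rows matters because a net delta of zero can hide a prompt change that fixes one query while breaking another.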

Getting started in 60 seconds

git clone https://github.com/microsoft/Learn-Microsoft-Agent-Framework-with-Foundry-ZavaShop-Supply-Chain-Workshop
cd Learn-Microsoft-Agent-Framework-with-Foundry-ZavaShop-Supply-Chain-Workshop

# Foundry prereqs: gpt-5.5 + text-embedding-3-small deployed in your Foundry project
az login --use-device-code

# Python track
python -m venv .venv && source .venv/bin/activate
pip install agent-framework agent-framework-azure-ai agent-framework-ag-ui \
  azure-identity python-dotenv fastapi "uvicorn[standard]"

# .NET track
dotnet --version   # ≥ 10.0.100

# .env at repo root
cat > .env <<EOF
FOUNDRY_PROJECT_ENDPOINT=https://<your-project>.services.ai.azure.com/api/projects/<project-name>
FOUNDRY_MODEL=gpt-5.5
AZURE_OPENAI_EMBEDDING_MODEL=text-embedding-3-small
AGUI_SERVER_URL=http://127.0.0.1:5100/
AG_UI_API_KEY=zava-control-tower-demo-key
EOF

# In VS Code → Copilot Chat → Agent Mode → pick zavashop-coding-agent
# Then say: "I'm working on the inventory agent in Python — meet Mei."

The one mantra: "Read the SKILL first."

Closing thought

Modern agent development is not one job — it's two. The Coding Agent designs and builds; the Runtime Agent operates and delivers. Microsoft Agent Framework is the SDK that makes both layers feel like the same conceptual model. Microsoft Foundry is the platform both layers publish to and run on.

And the engine that turns a generic Copilot into a domain-aware engineer — that takes a sentence-long requirement and lands a runnable, validated, deployable artifact — is the SKILL. Write a good SKILL once, and every agent built afterwards inherits your team's taste, your fixtures, your patterns, your discipline.

The ZavaShop workshop is the smallest end-to-end example I can give you that actually exercises both layers, with six SKILLs ready to read. Walk it once, and the next time someone asks "how do we build agents in our org?", you won't be pointing at a tutorial — you'll be pointing at an architecture.

👉 Start with the workshop on GitHub


GitHub Copilot Agent for Unit Tests: My Real-World Spargine Experiment

After experimenting with the GitHub Copilot Agent during the 2026 Microsoft MVP Summit, the author faced numerous challenges, including code deletion, slow performance, and inconsistent adherence to coding conventions. Despite these issues, the agent added valuable unit tests to the Spargine projects, but using it effectively requires careful oversight and refined prompts.

VS Code 1.122 Makes BYOK Easier


⚠ This blog post was created with the help of AI tools. Yes, I used a bit of magic from language models to organize my thoughts and automate the boring parts, but the geeky fun and the 🤖 in C# are 100% mine.

One of the most interesting announcements in the recent VS Code 1.122 release is that Bring Your Own Key (BYOK) models can now be used without signing in to GitHub. VS Code now supports BYOK scenarios, including chat, tools, and MCP integrations, without requiring GitHub authentication, making enterprise, offline, and air-gapped workflows much easier to support.

I originally came across this update thanks to a tweet from Pierce Boggan.

This immediately caught my attention because it connects directly with the experiments I’ve been running around GitHub Copilot CLI, local models, BYOK, SQUAD, and AI-assisted software engineering.

Quick recap:

CPU-only local models
Slow, useful mostly for questions and tiny tasks.
https://elbruno.com/2026/05/03/running-github-copilot-cli-offline-with-local-models-a-cpu-only-reality-check/

GPU local models
Faster, with more room for local agent experiments.
https://elbruno.com/2026/05/06/running-github-copilot-cli-offline-with-local-models-gpu-edition/

GPT-5-mini BYOK
Capable, but stabilization, quality gates, and validation became the real work.
https://elbruno.com/2026/05/11/github-copilot-cli-gpt-5-mini-byok-the-code-was-cheap-the-quality-gates-were-expensive/

GPT-5.5 BYOK
The most disciplined run so far, with better phase control, quality gates, and manual UX validation.
https://elbruno.com/2026/05/14/github-copilot-cli-squad-gpt-5-5-byok-better-engineering-same-hard-truth/

The new VS Code 1.122 BYOK flow feels like a natural next step for this series.

I haven’t rerun the ElBruno.NetAgent experiment with this new capability yet, but it is definitely on my list.

The lesson remains the same:

The code keeps getting cheaper. Validation and engineering discipline are still expensive.

Happy coding!

Greetings

El Bruno

More posts in my blog ElBruno.com.

More info in https://beacons.ai/elbruno

Build agents, not pipelines


There are only two ways to use LLMs in a computer program: as part of a pipeline, or as an agent. In other words, either you express the control flow of the program in code, or you give an LLM tools and allow it to manage the control flow itself[1].

Here’s how you might structure a trivial “summarize a bunch of information and email it to me” program as a pipeline:

context = gather_context(various, data, sources)
llm_response = llm_summarize(context)
summary = parse(llm_response)
email_me(summary, my_email)

And here’s how you’d do it as an agent:

read_data_tool = build_read_data_tool(various, data, sources)
email_tool = build_email_tool(my_email)
run_agent(tools=[read_data_tool, email_tool])

It’s like the difference between a library and a framework. When you use a library, you define the structure of the program yourself, and call out to various library helpers along the way. When you use a framework, the main structure of the program lives in the framework, and it calls your code at various points. There are tradeoffs involved in both approaches. Frameworks let you get started more quickly and typically give you features “for free”, but can be difficult when you want to do something that isn’t part of the framework’s design. Libraries give you a lot more control, but require you to write (and maintain) more boilerplate code.

In the trivial case, the distinction between a pipeline and an agent melts away. If you only have a few paragraphs of possible context for the problem, an agent with a gather_context and an email_me tool will perform exactly the same steps as a pipeline that calls a reasoning model with the context injected into the prompt (i.e. the agent will reproduce the trivial control flow of your pipeline). But when you have more context than will fit into a single prompt, or you want to take an action and then react to the result, the choice between pipelines and agents becomes very significant.
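The trivial-case claim can be checked with a toy. In the sketch below everything is a hypothetical stand-in and the "model" is a scripted function rather than a real LLM; the agent ends up sending exactly what the pipeline sends, through the same two tool steps. The only difference is where the control flow is written.

```python
from typing import Callable, Optional


def gather_context() -> str:
    return "a few paragraphs of context"


def llm_summarize(context: str) -> str:
    return f"summary of {context}"  # stand-in for one model call


def run_pipeline(sent: list) -> None:
    """Pipeline: control flow is fixed in code (gather, summarize, email)."""
    sent.append(llm_summarize(gather_context()))  # appending to `sent` plays email_me


def scripted_llm(history: list) -> tuple[str, Optional[str]]:
    """Stand-in for the agent's model: pick the next tool based on what it has seen."""
    if not history:
        return ("gather_context", None)
    if history[-1][0] == "gather_context":
        return ("email_me", llm_summarize(history[-1][1]))
    return ("done", None)


def run_agent(sent: list, max_turns: int = 10) -> list:
    """Agent: the model chooses each step; the loop only executes tool calls."""
    tools: dict[str, Callable] = {"gather_context": gather_context, "email_me": sent.append}
    history: list = []
    for _ in range(max_turns):
        name, arg = scripted_llm(history)
        if name == "done":
            break
        result = tools[name]() if arg is None else tools[name](arg)
        history.append((name, result))
    return [name for name, _ in history]
```

Once gather_context returns more than fits in one prompt, or email_me's result should change what happens next, the scripted decision function stops being trivial, and that is where the two designs diverge.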

Predictability, flexibility and intelligence

Pipelines are more predictable, but agents are more flexible. When you give a problem to an agent, work stops when the LLM thinks it’s done. Depending on the perceived difficulty of the problem, this can take anywhere from a few LLM turns to hundreds (and thus cost anywhere from a few cents to many dollars). If you’re building something intended to run at scale, this unpredictability can be a nightmare. Any subtle change to the user data could cause the LLM to take twice as long on each task, which would double your latency[2] and cost.

Pipelines are only immune to this problem if they don’t use reasoning models, or don’t allow the model to “think out loud” in its output tokens (for instance, by using structured output). However, a single LLM call gives you much tighter control over reasoning than you have over how long an agentic loop will take. In all frontier model APIs, you can explicitly set the level of reasoning you want. That doesn’t give you total control, but it does cap “take longer” at maybe ten or twenty percent (compared with agents, where it can be 2x or more).

Why use agents, then? Agents are smarter. If you’re happy to accept the unpredictability, an agentic system can handle much more difficult tasks, by virtue of being able to loop for longer, and to gather more information after thinking about the problem. There’s a reason that the most successful AI products (coding agents like Claude Code, Codex, Cursor, and Copilot[3]) are agents: coding is a hard enough task that you simply cannot build a functional coding agent with pipelines.

Context-gathering

The context-gathering stage is far more delicate for pipelines than for agents. If an agent is trying to solve a problem and realizes it needs more data, it can simply go and get it. But for a pipeline, all the required data has to be present in the context already, because the LLM only gets to run once.

Much of the work involved in building pipelines is in getting context-gathering right. Agents are much easier. For instance, with a coding agent, you can basically just provide a “grep” and “read file” tool and let the agent figure out what chunks of code are relevant to the current file. In a pipeline, you have to figure that out yourself: good luck, it’s an unsolved technical problem! Typically you’ll end up doing some set of clever tricks, like walking the AST to identify which parts of code “contribute” to the current file, or indexing the whole codebase with semantic embeddings and doing some kind of nearest-neighbor search to build the context (called RAG, or “retrieval-augmented generation”). Neither of these will work as well as using an agent.
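The two tools really can be that small. A standard-library sketch follows; the names, the default file glob, and the limits are hypothetical, the max_hits cap keeps a broad pattern from flooding the context, and the tool-calling wiring is omitted.

```python
import re
from pathlib import Path


def grep(pattern: str, root: str = ".", glob: str = "*.py", max_hits: int = 50) -> list[str]:
    """Tool: return 'path:line: text' for lines matching a regex, like a minimal grep -rn."""
    rx = re.compile(pattern)
    hits: list[str] = []
    for path in sorted(Path(root).rglob(glob)):
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary, unreadable, or directory matches
        for n, line in enumerate(lines, start=1):
            if rx.search(line):
                hits.append(f"{path}:{n}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits


def read_file(path: str, start: int = 1, end: int = 200) -> str:
    """Tool: return a 1-indexed, inclusive line range so the agent can read in chunks."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[start - 1:end])
```

The agent decides what to search for and which ranges to open; nothing here tries to predict relevance ahead of time, which is the hard part a pipeline would have to solve.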

In 2023 and 2024, many people believed that RAG would solve context-gathering. Every LLM would have a fully-indexed context base that would magically surface the precise information the LLM needed at any given moment. This did not happen. Instead, we went backwards, getting our agents to do plain-text search and figure it out like a human would. Why didn’t RAG work? This is a topic for a whole other post, but the short answer is this: “find what information is relevant to this problem” is often as hard a task as actually solving the problem. Semantic embeddings and cosine similarity are simply not powerful enough tools for the job.

Multi-model pipelines

Pipelines that make multiple LLM invocations do have an extra dimension of flexibility: they can use different LLMs for different tasks. For instance, if one LLM benchmarks better at task A, or is cheaper for an easier task B, you can use the right model for the job. Agents (at least right now) have to stay the same model the whole time, so you’re always pinned to the highest level of intelligence you need.

Is this a big deal? I’m suspicious. One pattern I see a lot is tasking a cheaper model with collating or summarizing data for a smarter model to do something with. But often the signal is in the raw data itself! I think designs like this are really shooting themselves in the foot, for the same reasons that RAG didn’t work: context-gathering was a harder problem than people anticipated.

In any case, if you do want to farm out tasks to different models, you can also do it via careful agentic tool design. For instance, you could build your web_search tool so that it uses a cheap model to summarize web pages.
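Hiding a cheaper model inside a tool might look like the following. All three callables (search, fetch, cheap_summarize) are hypothetical placeholders for whatever search API and model client you use; passing the query to the summarizer is a design choice that lets it keep the parts of each page relevant to the question.

```python
from typing import Callable


def make_web_search(search: Callable[[str], list[str]],
                    fetch: Callable[[str], str],
                    cheap_summarize: Callable[[str, str], str],
                    top_k: int = 3) -> Callable[[str], str]:
    """Build a web_search tool whose page summaries come from a cheaper model.
    The agent's (more expensive) model only ever sees the short summaries."""
    def web_search(query: str) -> str:
        summaries = []
        for url in search(query)[:top_k]:
            page = fetch(url)
            summaries.append(f"{url}: {cheap_summarize(query, page)}")
        return "\n".join(summaries)
    return web_search
```

The author's caveat about multi-model designs still applies: if the signal is in the raw page, a summary step can throw it away.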

Small contexts and future-proofing

Pipelines allow working with smaller contexts, and thus with local models. An agent’s ability to fetch its own context means that it almost always ingests more data than it needs. On top of that, agents run in loops, so each agent turn increases the size of the context. This isn’t a big problem for systems built on top of frontier model APIs, because:

  • frontier models all expose large context windows,
  • frontier models tend to hold up pretty well for the first 200k tokens, and
  • KV caching means that passing around the same large context block is surprisingly cheap.
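A back-of-the-envelope model shows why the loop matters. If each turn resends the whole context and the context grows by a fixed amount per turn, total input tokens grow quadratically with the number of turns, while the tokens not already covered by a prefix cache grow only linearly. The numbers are invented for illustration, and the caching model is idealized: real providers typically still bill cached tokens, at a discount.

```python
def agent_input_tokens(base: int, per_turn: int, turns: int) -> tuple[int, int]:
    """Return (total input tokens sent, tokens not covered by a prefix cache).

    Assumes every turn resends the full context, which grows by `per_turn`
    tokens (tool results, model output) after each turn, and an idealized
    prefix cache where only newly appended tokens are uncached."""
    total = sum(base + i * per_turn for i in range(turns))
    uncached = base + (turns - 1) * per_turn if turns else 0
    return total, uncached


print(agent_input_tokens(2_000, 3_000, 20))  # (610000, 59000)
```

Under these made-up numbers, a 20-turn run sends 610,000 input tokens in total but only 59,000 fresh ones, and it ends with a 59,000-token context, already beyond the 32k budget many local setups use.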

However, it is a big problem for local models. The context window consumes a lot of VRAM, so most people running local models stay below 32k (or even 6k) tokens. If you’re writing a program to run in this environment, you likely will not be able to give an agent the space it needs, and you will instead be forced to use a pipeline.

In my opinion, agents are more future-proof. This is partly because models are now being explicitly built to be better agents, and partly because agents delegate more to the LLM and thus benefit more from LLM improvements. If you have a pipeline-based system, new models will probably do a bit better than old ones. If you have an agentic system, new models might do much better than old ones (to the point that it’s worth building an agentic system for tasks that are currently too hard, on the assumption that by the time you’ve finished the models may be good enough). I have been banging this drum since 2023, before tool-calling was even a part of model APIs.

Safety and legibility

In general, I disagree with the popular advice that workflows are safer than agents. Workflows offer more control over budget, but when it comes to taking action based on LLM output, you have exactly the same problem whether you’re checking at the tool-call level or at the next stage in the pipeline: either you make some heuristic assessment via code, which might be wrong, or you queue the action up for a human to approve, which will be slow.

Don’t agents open you up to prompt injection? Yes, but pipelines do too. In both cases, you’re feeding some block of human-generated data (e.g. the files in a codebase, or the results of a web search) into the LLM. Any prompt injections in that data will be consumed by the LLM just the same whether they’re the result of a tool-call or directly injected into the prompt by the pipeline. You have to sanitize user content and double-check LLM-triggered actions, no matter what design you choose[4].
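The double-check looks the same whether the action comes from a tool call or from the next pipeline stage. A minimal sketch of the two options named above, a heuristic check in code or a queue for human approval, using a hypothetical allowlist policy:

```python
from dataclasses import dataclass, field


@dataclass
class ActionGate:
    """Check LLM-triggered actions before they run: auto-approve an allowlist,
    queue everything else for a human. Names and policy are hypothetical."""
    allowlist: set[str]
    pending: list[tuple[str, dict]] = field(default_factory=list)

    def submit(self, action: str, args: dict) -> bool:
        """Return True if the action may run now; otherwise queue it for review."""
        if action in self.allowlist:
            return True   # heuristic check in code: fast, but might be wrong
        self.pending.append((action, args))
        return False      # human review: safer, but slow
```

The same gate can sit behind an agent's tool dispatcher or between two pipeline stages; the trade-off between the two branches is identical in both designs.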

I do want to acknowledge that pipelines are slightly more legible. You can trace most of what a pipeline is doing because you control more of it. It’s harder to figure out why an agent queried for a particular piece of information or took some action. But even in a pipeline, you’ll never know for sure why the LLM responded in the way it did. That’s just what it means to program with LLMs.

LLM-driven mass surveillance

Let’s apply some of these principles to a real-world, non-trivial example. Suppose you are the NSA, and you are attempting to use LLMs to get a grip on the wild firehose[5] of covert email surveillance data[6]. Should you use pipelines or agents? Well, if you’re building something that’s supposed to run on every single piece of email in America, you probably shouldn’t use agents: keeping performance and cost strictly bounded requires a pipeline. However, you’re definitely well-resourced enough to use agents in general, and the problem is definitely hard enough to benefit from the extra intelligence. I’d probably recommend using both: a low-context, cheap pipeline that can run once against each email and flag it, and a fleet of agents that can dig into those flags, make ordinary queries, and act more like human analysts would.

The pipeline would have to scale with the total volume of data, which should be mostly fine, since pipelines scale in a predictable-ish manner. The fleet of unpredictable agents can be scaled entirely independently, though in practice it would get bottlenecked on GPU availability and the necessity for human review. The majority of the engineering work[7] would likely go into context-assembly for the pipeline: feeding in enough data about who’s involved in the email conversation so that the LLM can make a sensible decision on whether or not to flag it.
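The two-tier design can be sketched generically: one cheap, bounded pass per item that pushes anything suspicious onto a queue, and a separately sized pool that drains the queue. Everything here is a placeholder: `cheap_flag` stands in for a single truncated-context LLM call (a keyword check keeps the demo runnable offline), and `run_agent_pool` stands in for open-ended agent investigations.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Item:
    id: int
    text: str


def cheap_flag(item: Item, max_chars: int = 2_000) -> bool:
    """Placeholder for one bounded LLM call per item.

    Truncating the input keeps the per-item cost fixed, which is what makes
    the pipeline's total cost predictable. The keyword test only stands in
    for the model's judgement so the sketch runs offline.
    """
    snippet = item.text[:max_chars]
    return "urgent" in snippet.lower()


def run_pipeline(items: list, escalations: deque) -> int:
    """Make exactly one cheap pass per item; return how many calls were made."""
    calls = 0
    for item in items:
        calls += 1
        if cheap_flag(item):
            escalations.append(item)
    return calls


def run_agent_pool(escalations: deque, budget: int) -> list:
    """Handle at most `budget` flagged items; the rest stay queued."""
    handled = []
    while escalations and len(handled) < budget:
        item = escalations.popleft()
        handled.append(item.id)  # placeholder for an open-ended agent investigation
    return handled


queue = deque()
items = [Item(i, "URGENT: please review" if i % 3 == 0 else "routine update") for i in range(10)]
calls = run_pipeline(items, queue)
done = run_agent_pool(queue, budget=2)
print(calls, done, len(queue))  # 10 [0, 3] 2
```

Pipeline calls track input volume exactly, while the pool’s throughput is capped by `budget` and leftovers simply wait, so the two tiers can be sized independently.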

Summary

Overall, I’d suggest following these guidelines:

  1. Use pipelines when you have strict requirements around context size
  2. Use pipelines when you need to be able to accurately predict (or limit) GPU cost
  3. Use pipelines when you have to use local models
  4. Use agents when you’re not confident you’ll be able to assemble all of the relevant context in one shot
  5. Use agents when the problem is hard enough that you’re not sure a pipeline will be able to solve it

When in doubt, use agents. I am aware of several AI projects that have migrated from pipelines to agents in the last year, but none that have gone the other way around. As a general point about software design, if you’re not sure what to do, pick the solution that’s easier to build and more likely to be able to solve your actual problem. If you want to change to a cheaper, pipeline-based system later on, at least you’ll be able to compare it to a working agentic design and make an informed decision.
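The five guidelines plus “When in doubt, use agents” can be encoded as a small decision helper. This is an illustrative sketch, not a rule from the post: it assumes guidelines 1 to 3 are hard constraints that override 4 and 5 (the summary does not say how conflicts resolve), and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    strict_context_limit: bool = False         # guideline 1
    must_predict_gpu_cost: bool = False        # guideline 2
    local_models_only: bool = False            # guideline 3
    context_hard_to_assemble: bool = False     # guideline 4
    problem_may_exceed_pipeline: bool = False  # guideline 5


def choose_architecture(req: Requirements) -> tuple[str, list[str]]:
    """Return ("pipeline" or "agent", reasons), treating 1-3 as hard constraints."""
    pipeline_reasons = [
        reason
        for flag, reason in [
            (req.strict_context_limit, "strict context-size requirements"),
            (req.must_predict_gpu_cost, "GPU cost must be predictable or capped"),
            (req.local_models_only, "must run on local models"),
        ]
        if flag
    ]
    if pipeline_reasons:
        return "pipeline", pipeline_reasons

    agent_reasons = [
        reason
        for flag, reason in [
            (req.context_hard_to_assemble, "context can't be assembled in one shot"),
            (req.problem_may_exceed_pipeline, "a pipeline may not solve the problem"),
        ]
        if flag
    ]
    # No hard constraint and no specific reason either way: default to agents.
    return "agent", agent_reasons or ["when in doubt, use agents"]


print(choose_architecture(Requirements(local_models_only=True, problem_may_exceed_pipeline=True)))
# ('pipeline', ['must run on local models'])
print(choose_architecture(Requirements()))
# ('agent', ['when in doubt, use agents'])
```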


  1. This distinction was popularized by Anthropic’s “Building effective agents”, written in December 2024, and now (I believe) made at least partially obsolete by advances in agents since then. They say “workflow”, but I slightly prefer the term “pipeline”.

  2. Yes, I know this is technically not what “latency” means, but there’s no other single-word shorthand for “the duration of a standard unit of work”.

  3. If you’re building your own coding agent, I suggest you begin with the letter “C”.

  4. For instance, in my trivial example at the top of the post, doesn’t the agent have a failure mode where it might send a ton of emails, or email a bunch of different people? No, because you ought to constrain the email tool so that it can only send to the right address, and (if this is important) that it can only be called once.

  5. In a draft post I never published, I ballpark-estimated all non-spam American email data at around seven trillion tokens per day (around a third of OpenAI’s total daily token usage).

  6. Should you do this? Probably not, but it’s a fascinating engineering problem, and I imagine the NSA has been thinking about these questions for several years by now. If the example bothers you, substitute some other, more ethical firehose of English-language text.

  7. Not counting evals, operations, standing up a trusted GPU cluster somewhere, scaling the physical hardware, and all the other thousand things you have to do in order to ship anything.

Are AI Agents Conscious?

A former colleague recently posted a LinkedIn challenge: if any developer today can hold a conversation about basic OO or procedural concepts, why can’t they hold a meaningful conversation about whether AI is conscious? Fair enough. I’ll take a swing at it.

But first I want to push back on the setup.

When New Concepts Are New, Experts Are Often Wrong

When object-oriented programming was genuinely new, only a handful of people had informed opinions about it. Over time, a lot of those opinions turned out to be wrong — or at least incomplete. As OO became mainstream, developers used it in ways that diverged wildly from the original intent. There was a period of chaos before things stabilized. Eventually the experts regrouped and distilled what they had learned: the GoF patterns book, SOLID principles, refactoring practices. The community had to live with the concepts in the real world before it really understood them.

I expect the exact same arc with AI. Most of what’s being said right now — by experts and non-experts alike — will turn out to be wrong, or at best premature. The concepts haven’t had enough contact with reality yet. That doesn’t mean the conversation isn’t worth having. It just means we should hold our opinions loosely.

With that caveat firmly in place: let me try to answer the actual question.

What Is an Agent, Anyway?

Before we can ask whether an AI agent is conscious, we need to be clear about what an agent actually is. Most people conflate the agent with the model, and that’s where things go sideways.

Here’s the definition I work from:

Agent = Harness + LLM + Directive/Goal

These three components are genuinely distinct, and consciousness (if it exists anywhere in this system) would have to live somewhere specific.

The LLM is just a microservice. It accepts input — usually text — does a large amount of vector math with some randomization baked in, and returns output. Then it forgets everything. It is stateless by design. There is no persistence, no accumulating experience, no inner life between calls. An LLM is a sophisticated mathematical function, not an entity.

The Harness is where most of the complexity actually lives. The harness manages memory across turns, orchestrates tool calls, handles interaction with other agents, optimizes what goes into the context window, and runs the feedback loops that make an agent feel coherent over time. When an agent seems to “remember” something from earlier, that’s the harness at work — it’s retrieving a record from storage and injecting it into the next prompt. The LLM didn’t retain anything; the harness did.
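A minimal sketch of that division of labor, with hypothetical names throughout: `fake_llm` is a stateless stand-in for a model call (it only reports how many lines of context it was shown, so no real model is needed), and `Harness` owns all memory by prepending the directive and re-injecting earlier turns into each new prompt.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Stateless stand-in for a model call: nothing is retained between calls.
    It only reports how many lines of context it was given."""
    return f"(model saw {len(prompt.splitlines())} lines of context)"


class Harness:
    """Owns everything that persists across turns.

    The directive is prepended to every prompt, and each exchange is stored
    and re-injected next time, which is what makes the agent seem to remember.
    """

    def __init__(self, llm: Callable[[str], str], directive: str):
        self.llm = llm
        self.directive = directive
        self.memory: list[str] = []

    def build_prompt(self, user_message: str) -> str:
        return "\n".join([self.directive, *self.memory, f"User: {user_message}"])

    def turn(self, user_message: str) -> str:
        reply = self.llm(self.build_prompt(user_message))
        self.memory += [f"User: {user_message}", f"Agent: {reply}"]
        return reply


# Agent = Harness + LLM + Directive
agent = Harness(fake_llm, directive="You are a helpful assistant.")
print(agent.turn("My name is Ada."))   # (model saw 2 lines of context)
print(agent.turn("What is my name?"))  # (model saw 4 lines of context)
```

The model function is identical on both turns; only the prompt the harness assembles has grown.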

This is something I’ve spent a lot of time on with Rockbot, an agent designed to learn from its own experience — improving its memory and refining its skills as it interacts with its environment. The interesting thing about building something like that is how much of what feels like “learning” or “growth” is really the harness getting better at what it stores, what it retrieves, and how it applies past experience to new situations. The LLM underneath is unchanged. The apparent intelligence grows in the scaffolding.
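To illustrate how “learning” can live entirely in the scaffolding, here is a toy memory store whose retrieval improves with feedback while the model never changes. This is a hypothetical sketch, not Rockbot’s actual design: ranking is word overlap weighted by a usefulness score, and feedback simply scales that score.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    usefulness: float = 1.0  # adjusted by feedback; the model itself never changes


class MemoryStore:
    """Toy long-term memory for an agent harness (hypothetical design)."""

    def __init__(self) -> None:
        self.memories: list[Memory] = []

    def store(self, text: str) -> None:
        self.memories.append(Memory(text))

    def _score(self, memory: Memory, query_words: set) -> float:
        overlap = len(query_words & set(memory.text.lower().split()))
        return overlap * memory.usefulness

    def retrieve(self, query: str, k: int = 2) -> list[Memory]:
        """Return up to k memories that share words with the query, best first."""
        words = set(query.lower().split())
        ranked = sorted(self.memories, key=lambda m: self._score(m, words), reverse=True)
        return [m for m in ranked[:k] if self._score(m, words) > 0]

    def feedback(self, memory: Memory, helped: bool) -> None:
        """Boost memories that helped and demote ones that did not."""
        memory.usefulness *= 1.5 if helped else 0.5


store = MemoryStore()
store.store("deploy script needs the staging flag")
store.store("deploy failed when tests were skipped")

first = store.retrieve("how do I deploy", k=1)[0]
print(first.text)  # deploy script needs the staging flag (a tie, so insertion order wins)
store.feedback(first, helped=False)
print(store.retrieve("how do I deploy", k=1)[0].text)  # deploy failed when tests were skipped
```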

The Directive or Goal is the system prompt, the persona definition, the set of instructions that give the agent its apparent personality. This is what makes one agent feel like a helpful assistant and another feel like a relentless optimizer. It’s the closest thing in the system to a “soul” — and it’s a text file.

So: Is Any of This Conscious?

Honest answer: no. And I think we’re nowhere close.

Consciousness, at minimum, seems to require some kind of continuous subjective experience — a sense of “what it is like” to be something, across time. None of the three components of an agent have that. The LLM is stateless math. The harness is code managing data structures. The directive is configuration. When the agent’s session ends, nothing “experiences” that ending. There’s nothing home.

What we do have is a very convincing simulation of coherence and personality. The harness plus the directive can produce responses that feel consistent, contextually aware, and even emotionally calibrated. That’s genuinely impressive engineering. But it is the appearance of inner life, not inner life itself.

Could the combination of harness, LLM, and directive become something more than this? Maybe. I don’t think we should be dogmatic about ruling it out forever. The question of what substrates can support consciousness is genuinely unresolved in philosophy and neuroscience. But “maybe someday” and “right now” are very different claims, and right now the evidence points to a very sophisticated pattern-matching and generation system, not a conscious entity.

Sentience Is a Higher Bar Still

Consciousness and sentience aren’t the same thing. Sentience — the capacity to feel, to have preferences, to suffer or flourish — is arguably an even higher bar. Current AI systems don’t have preferences in any meaningful sense. They have optimization targets baked in during training, and they generate responses that pattern-match against human emotional expression. That’s not the same as wanting something, or caring about an outcome.

When an agent declines a request, it’s not because it has ethical concerns. It’s because its training weighted certain outputs as undesirable. The appearance of values is not the same as having them.

Where This Leaves Us

So why does the question feel so urgent? I think it’s because the outputs are genuinely uncanny. When a system produces responses that are contextually appropriate, emotionally resonant, and stylistically consistent — the human brain reaches for the simplest explanation: there must be something in there.

There probably isn’t. Not yet. But that intuition is worth paying attention to, not because it’s correct, but because it tells us something about how we relate to systems that behave like minds. As these systems become more capable and more embedded in daily life, the social and ethical questions around them will matter a great deal — regardless of whether the underlying system is conscious.

My colleague’s underlying challenge is a good one: we should be able to have this conversation. I just think the honest answer right now is that we don’t fully know what consciousness is, we definitely haven’t built it yet, and most of the confident claims on all sides — including mine — should be held loosely until the concepts have had more time to meet the real world.

That’s how this always goes.

45 Ways To Avoid Using The Word ‘Very’

Good writers avoid using words like ‘very’ and ‘really’ too often. Use these 45 alternatives to ‘very’ to make your writing stronger, clearer, and more interesting to read.

Good writers avoid peppering their writing with qualifiers like ‘very’ and ‘really’. They are known as padding or filler words and generally add little to your writing.

According to Collins Dictionary: ‘Padding is unnecessary words or information used to make a piece of writing or a speech longer. Synonyms include: waffle, hot air, verbiage, wordiness.’

Adding modifiers, qualifiers, and unnecessary adverbs and adjectives weakens your writing. There may be times when you need them, and when you do, use them. If you choose strong, appropriate nouns and verbs, you will need to use them less often.

This post gives you 45 ways to avoid using the padding word ‘very’.

Three Telling Quotes About ‘Very’

  1. “Substitute ‘damn’ every time you’re inclined to write ‘very;’ your editor will delete it and the writing will be just as it should be.” ~Mark Twain
  2. “‘Very’ is the most useless word in the English language and can always come out. More than useless, it is treacherous because it invariably weakens what it is intended to strengthen.” ~Florence King
  3. “So avoid using the word ‘very’ because it’s lazy. A man is not very tired, he is exhausted. Don’t use very sad, use morose. Language was invented for one reason, boys – to woo women – and, in that endeavour, laziness will not do. It also won’t do in your essays.” ~Dead Poets Society

45 Ways To Avoid Using The Word ‘Very’

[Image: 45 Ways To Avoid Using The Word ‘Very’]

by Amanda Patterson
© Amanda Patterson

If you liked this blogger’s writing, you may enjoy:

  1. What Is Tone? 155 Words To Describe An Author’s Tone
  2. What Is Direct & Indirect Characterisation? & Which One Should I Use?
  3. 20 Fun Ways To Find Plot Ideas For Your Story
  4. All About Pacing: 4 Key Questions Every Writer Should Ask
  5. Past Tense Or Present Tense: Which Works Best For Your Story?
  6. A Guide To The 17 Most Popular Fiction Genres
  7. How To Write A Spy Novel
  8. How To Outline A Short Story – For Beginners
  9. 6 Sub-Plots Every Writer Should Know
  10. How To Write Great Dialogue In Fiction

Top Tip: Sign up for our free daily writing links

The post 45 Ways To Avoid Using The Word ‘Very’ appeared first on Writers Write.
