When an incident fires at 3 AM, every second the on-call engineer spends piecing together alerts, logs, and metrics is a second not spent fixing the problem. What if an AI system could ingest the raw incident signals and hand you a structured triage, a Slack update, a stakeholder brief, and a draft post-incident report, all in under 10 seconds?
That’s exactly what On-Call Copilot does. In this post, we’ll walk through how we built it using the Microsoft Agent Framework, deployed it as a Foundry Hosted Agent, and discuss the key design decisions that make multi-agent orchestration practical for production workloads.
The full source code is open-source on GitHub. You can deploy your own instance with a single azd up.
Why Multi-Agent? The Problem with Single-Prompt Triage
Early AI incident assistants used a single large prompt: “Here is the incident. Give me root causes, actions, a Slack message, and a post-incident report.” This approach has two fundamental problems:
- Context overload. A real incident may have 800 lines of logs, 10 alert lines, and dense metrics. Asking one model to process everything and produce four distinct output formats in a single turn pushes token limits and degrades quality.
- Conflicting concerns. Triage reasoning and communication drafting are cognitively different tasks. A model optimised for structured JSON analysis often produces stilted Slack messages—and vice versa.
The fix is specialisation: decompose the task into focused agents, give each agent a narrow instruction set, and run them in parallel. This is the core pattern that the Microsoft Agent Framework makes easy.
Architecture: Four Agents Running Concurrently
On-Call Copilot is deployed as a Foundry Hosted Agent—a containerised Python service running on Microsoft Foundry’s managed infrastructure. The core orchestrator uses ConcurrentBuilder from the Microsoft Agent Framework SDK to run four specialist agents in parallel via asyncio.gather().
Architecture: The orchestrator runs four specialist agents concurrently via asyncio.gather(), then merges their JSON fragments into a single response.
Instead of hardcoding gpt-4o or gpt-4o-mini, the project relies on Model Router, which analyses request complexity and routes automatically. A simple triage prompt costs less; a long post-incident synthesis uses a more capable model. One deployment name, zero model-selection code.
Meet the Four Agents
🔍 Triage Agent
Root cause analysis, immediate actions, missing data identification, and runbook alignment.
📋 Summary Agent
Concise incident narrative: what happened and current status (ONGOING / MITIGATED / RESOLVED).
📢 Comms Agent
Audience-appropriate communications: Slack channel update with emoji conventions, plus a non-technical stakeholder brief.
📝 PIR Agent
Post-incident report: chronological timeline, quantified customer impact, and specific prevention actions.
The Code: Building the Orchestrator
The entry point is remarkably concise. ConcurrentBuilder handles all the async wiring—you just declare the agents and let the framework handle parallelism, error propagation, and response merging.
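To make the fan-out-and-merge idea concrete, here is a minimal, framework-free sketch using plain asyncio. It is not the project's entry point and does not use ConcurrentBuilder: the four stub agents, the run_all function, and the summary and post_incident_report key names are hypothetical stand-ins, and real agents would call a model instead of returning canned JSON.

```python
import asyncio
import json

# Hypothetical stand-ins for the four specialist agents. Each returns a JSON
# string holding only its own top-level keys.
async def triage_agent(incident: dict) -> str:
    return json.dumps({"suspected_root_causes": [], "immediate_actions": []})

async def summary_agent(incident: dict) -> str:
    return json.dumps(
        {"summary": {"what_happened": incident["title"], "current_status": "ONGOING"}}
    )

async def comms_agent(incident: dict) -> str:
    return json.dumps({"comms": {"slack_update": "", "stakeholder_update": ""}})

async def pir_agent(incident: dict) -> str:
    return json.dumps({"post_incident_report": {"timeline": []}})

AGENTS = [triage_agent, summary_agent, comms_agent, pir_agent]

async def run_all(incident: dict) -> dict:
    # Fan out: every agent receives the same full payload and runs concurrently.
    raw_outputs = await asyncio.gather(*(agent(incident) for agent in AGENTS))
    # Fan in: parse each JSON fragment and fold it into one response.
    merged: dict = {}
    for raw in raw_outputs:
        merged.update(json.loads(raw))
    return merged

result = asyncio.run(run_all({"title": "DB connection pool exhausted"}))
print(sorted(result))
```

Because the agents share no state, the total wall-clock time in this pattern is governed by the slowest agent rather than by the sum of all four.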
DefaultAzureCredential means there are no API keys anywhere in the codebase. The container uses managed identity in production; local development uses your az login session. The same code runs in both environments without modification.
Agent Instructions: Prompts as Configuration
Each agent receives a tightly scoped system prompt that defines its output schema and guardrails. Here’s the Triage Agent—the most complex of the four:
app/agents/triage.py
TRIAGE_INSTRUCTIONS = """\
You are the **Triage Agent**, an expert Site Reliability Engineer specialising in root cause analysis and incident response.

## Task
Analyse the incident data and return a single JSON object with ONLY these keys:

{
  "suspected_root_causes": [
    {
      "hypothesis": "string – concise root cause hypothesis",
      "evidence": ["string – supporting evidence from the input"],
      "confidence": 0.0  // 0-1, how confident you are
    }
  ],
  "immediate_actions": [
    {
      "step": "string – concrete action with runnable command if applicable",
      "owner_role": "oncall-eng | dba | infra-eng | platform-eng",
      "priority": "P0 | P1 | P2 | P3"
    }
  ],
  "missing_information": [
    {
      "question": "string – what data is missing",
      "why_it_matters": "string – why this data would help"
    }
  ],
  "runbook_alignment": {
    "matched_steps": ["string – runbook steps that match the situation"],
    "gaps": ["string – gaps or missing runbook coverage"]
  }
}

## Guardrails
1. **No secrets** – redact any credential-like material as [REDACTED].
2. **No hallucination** – if data is insufficient, set confidence to 0 and add entries to missing_information.
3. **Diagnostic suggestions** – when data is sparse, include diagnostic steps in immediate_actions.
4. **Structured output only** – return ONLY valid JSON, no prose.
"""
The Comms Agent follows the same pattern but targets a different audience:
app/agents/comms.py
COMMS_INSTRUCTIONS = """\
You are the **Comms Agent**, an expert incident communications writer.

## Task
Return a single JSON object with ONLY this key:

{
  "comms": {
    "slack_update": "Slack-formatted message with emoji, severity, status, impact, next steps, and ETA",
    "stakeholder_update": "Non-technical summary for executives. Focus on business impact and resolution."
  }
}

## Guidelines
- Slack: Use :rotating_light: for active SEV1/2, :warning: for degraded, :white_check_mark: for resolved.
- Stakeholder: No jargon. Translate to business impact.
- Tone: Calm, factual, action-oriented. Never blame individuals.
- Structured output only – return ONLY valid JSON, no prose.
"""
The Incident Envelope: What Goes In
The agent accepts a single JSON envelope. It can come from a monitoring alert webhook, a PagerDuty payload, or a manual CLI invocation:
Incident Input (JSON)
{
  "incident_id": "INC-20260217-002",
  "title": "DB connection pool exhausted — checkout-api degraded",
  "severity": "SEV1",
  "timeframe": { "start": "2026-02-17T14:02:00Z", "end": null },
  "alerts": [
    {
      "name": "DatabaseConnectionPoolNearLimit",
      "description": "Connection pool at 99.7% on orders-db-primary",
      "timestamp": "2026-02-17T14:03:00Z"
    }
  ],
  "logs": [
    {
      "source": "order-worker",
      "lines": [
        "ERROR: connection timeout after 30s (attempt 3/3)",
        "WARN: pool exhausted, queueing request (queue_depth=847)"
      ]
    }
  ],
  "metrics": [
    {
      "name": "db_connection_pool_utilization_pct",
      "window": "5m",
      "values_summary": "Jumped from 22% to 99.7% at 14:03Z"
    }
  ],
  "runbook_excerpt": "Step 1: Check DB connection dashboard...",
  "constraints": {
    "max_time_minutes": 15,
    "environment": "production",
    "region": "swedencentral"
  }
}
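Since the envelope can arrive from several sources, it can help to sanity-check it before handing it to the agents. The following is a rough sketch of such a check, not the project's schema code: the choice of required fields (incident_id, title, severity) and the validate_envelope helper are assumptions inferred from the sample above, and the real definitions in app/schemas.py may differ.

```python
import json

# Hypothetical required fields, inferred from the sample envelope;
# the project's real schema may be stricter or looser.
REQUIRED_FIELDS = {"incident_id": str, "title": str, "severity": str}
LIST_FIELDS = ("alerts", "logs", "metrics")

def validate_envelope(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the envelope looks usable."""
    try:
        envelope = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(envelope, dict):
        return ["top level must be a JSON object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(envelope.get(field), expected):
            problems.append(f"{field}: expected {expected.__name__}")
    for field in LIST_FIELDS:
        # Signal lists are treated as optional, but must be lists when present.
        if field in envelope and not isinstance(envelope[field], list):
            problems.append(f"{field}: expected list")
    return problems

sample = json.dumps({
    "incident_id": "INC-20260217-002",
    "title": "DB connection pool exhausted",
    "severity": "SEV1",
    "alerts": [],
})
print(validate_envelope(sample))
print(validate_envelope('{"title": 42}'))
```

Returning a list of problems rather than raising on the first one lets a webhook adapter report everything wrong with a payload in a single response.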
Declaring the Hosted Agent
The agent is registered with Microsoft Foundry via a declarative agent.yaml file. This tells Foundry how to discover and route requests to the container:
The protocols: [responses] declaration exposes the agent via the Foundry Responses API on port 8088. Clients can invoke it with a standard HTTP POST; no custom API is needed.
Invoking the Agent
Once deployed, you can invoke the agent with the project’s built-in scripts or directly via curl:
The Browser UI
The project includes a zero-dependency browser UI built with plain HTML, CSS, and vanilla JavaScript—no React, no bundler. A Python http.server backend proxies requests to the Foundry endpoint.
Performance: Parallel Execution Matters
| Incident Type | Complexity | Parallel Latency | Sequential (est.) |
|---|---|---|---|
| Single alert, minimal context (SEV4) | Low | 4–6 s | ~16 s |
| Multi-signal, logs + metrics (SEV2) | Medium | 7–10 s | ~28 s |
| Full SEV1 with long log lines | High | 10–15 s | ~40 s |
| Post-incident synthesis (resolved) | High | 10–14 s | ~38 s |
Running four independent agents under asyncio.gather() cuts total latency by roughly 3–4× compared with the estimated sequential execution (about 2.7–4× across the rows above). For a SEV1 at 3 AM, that’s the difference between a 10-to-15-second AI-powered head start and a 40-second wait.
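The speedup figure can be checked directly from the table. With independent agents, sequential time is roughly the sum of the individual agent latencies while parallel time is roughly that of the slowest agent, so the ratio approaches the number of agents when they take similar time. The short calculation below uses only the numbers from the table; the row labels and the speedup_range helper are ours.

```python
# (label, parallel_low_s, parallel_high_s, sequential_est_s), copied from the table.
ROWS = [
    ("SEV4 single alert", 4, 6, 16),
    ("SEV2 multi-signal", 7, 10, 28),
    ("SEV1 long logs", 10, 15, 40),
    ("Post-incident synthesis", 10, 14, 38),
]

def speedup_range(par_low: float, par_high: float, seq: float) -> tuple[float, float]:
    # The slowest parallel run gives the smallest speedup, the fastest the largest.
    return seq / par_high, seq / par_low

for label, low, high, seq in ROWS:
    worst, best = speedup_range(low, high, seq)
    print(f"{label}: {worst:.1f}x to {best:.1f}x")
```

The upper end of each range sits at or just below 4×, which is what one would expect from four agents of comparable latency running at once.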
Five Key Design Decisions
- Parallel over sequential. Each agent is independent and processes the full incident payload in isolation. ConcurrentBuilder with asyncio.gather() is the right primitive: no inter-agent dependencies, no shared state.
- JSON-only agent instructions. Every agent returns only valid JSON with a defined schema. The orchestrator merges fragments with merged.update(agent_output). No parsing, no extraction, no post-processing.
- No hardcoded model names. AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=model-router is the only model reference. Model Router selects the best model at runtime based on prompt complexity. When new models ship, the agent gets better for free.
- DefaultAzureCredential everywhere. No API keys. No token management code. Managed identity in production, az login in development. Same code, both environments.
- Instructions as configuration. Each agent’s system prompt is a plain Python string. Behaviour changes are text edits, not code logic. A non-developer can refine prompts and redeploy.
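One property of merging with dict.update is worth spelling out: if two agents ever emit the same top-level key, the later fragment silently overwrites the earlier one. The sketch below shows one possible hardening, assuming each agent's output arrives as a JSON string; merge_fragments is a hypothetical helper, not code from the project.

```python
import json

def merge_fragments(fragments: list[str]) -> dict:
    """Merge JSON-object fragments, refusing to overwrite duplicate keys silently."""
    merged: dict = {}
    for raw in fragments:
        fragment = json.loads(raw)
        if not isinstance(fragment, dict):
            raise ValueError("each agent must return a JSON object")
        # Any key already present would be clobbered by a plain update().
        clash = merged.keys() & fragment.keys()
        if clash:
            raise ValueError(f"duplicate keys across agents: {sorted(clash)}")
        merged.update(fragment)
    return merged

fragments = [
    '{"comms": {"slack_update": "..."}}',
    '{"suspected_root_causes": []}',
]
merged = merge_fragments(fragments)
print(sorted(merged))
```

As long as every agent's schema owns a disjoint set of keys, as the instructions shown earlier suggest, the check never fires and the merge behaves exactly like a plain update.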
Guardrails: Built into the Prompts
The agent instructions include explicit guardrails that don’t require external filtering:
- No hallucination: When data is insufficient, the agent sets confidence: 0 and populates missing_information rather than inventing facts.
- Secret redaction: Each agent is instructed to redact credential-like patterns as [REDACTED] in its output.
- Mark unknowns: Undeterminable fields use the literal string "UNKNOWN" rather than plausible-sounding guesses.
- Diagnostic suggestions: When signal is sparse, immediate_actions includes diagnostic steps that gather missing information before prescribing a fix.
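Prompt-level guardrails are instructions, not guarantees, so it can be useful to assert them in a local test suite. The sketch below checks two of the rules against a parsed Triage Agent output, using the field names from the triage schema shown earlier. The check_triage_guardrails helper is hypothetical and is meant for tests, not as a runtime filter.

```python
def check_triage_guardrails(triage: dict) -> list[str]:
    """Return guardrail violations found in a parsed Triage Agent output."""
    violations = []
    causes = triage.get("suspected_root_causes", [])
    for i, cause in enumerate(causes):
        confidence = cause.get("confidence")
        if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
            violations.append(
                f"suspected_root_causes[{i}].confidence must be in [0, 1]"
            )
    # When no hypothesis carries any confidence (including the case of no
    # hypotheses at all), the prompt asks for missing_information entries.
    all_zero = all(cause.get("confidence") == 0 for cause in causes)
    if all_zero and not triage.get("missing_information"):
        violations.append("confidence is 0 but missing_information is empty")
    return violations

good = {
    "suspected_root_causes": [
        {"hypothesis": "pool exhaustion", "evidence": [], "confidence": 0.8}
    ],
    "missing_information": [],
}
print(check_triage_guardrails(good))
```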
Model Router: Automatic Model Selection
One of the most powerful aspects of this architecture is Model Router. Instead of choosing between gpt-4o, gpt-4o-mini, or o3-mini per agent, you deploy a single model-router endpoint. Model Router analyses each request’s complexity and routes it to the most cost-effective model that can handle it.
This means you get optimal cost-performance without writing any model-selection logic. A simple Summary Agent prompt may route to gpt-4o-mini, while a complex Triage Agent prompt with 800 lines of logs routes to gpt-4o, all automatically.
Deployment: One Command
The repo includes both azure.yaml and agent.yaml, so deployment is a single command:
# Deploy everything: infra + container + Model Router + Hosted Agent
azd up
This provisions the Foundry project resources, builds the Docker image, pushes to Azure Container Registry, deploys a Model Router instance, and creates the Hosted Agent. For more control, you can use the SDK deploy script:
Manual Docker + SDK deploy
# Build and push (must be linux/amd64)
docker build --platform linux/amd64 -t oncall-copilot:v1 .
docker tag oncall-copilot:v1 $ACR_IMAGE
docker push $ACR_IMAGE
# Create the hosted agent
python scripts/deploy_sdk.py
Getting Started
Quickstart
# Clone
git clone https://github.com/microsoft-foundry/oncall-copilot
cd oncall-copilot

# Install
python -m venv .venv
source .venv/bin/activate  # .venv\Scripts\activate on Windows
pip install -r requirements.txt

# Set environment variables
export AZURE_OPENAI_ENDPOINT="https://<account>.openai.azure.com/"
export AZURE_OPENAI_CHAT_DEPLOYMENT_NAME="model-router"
export AZURE_AI_PROJECT_ENDPOINT="https://<account>.services.ai.azure.com/api/projects/<project>"

# Validate schemas locally (no Azure needed)
MOCK_MODE=true python scripts/validate.py

# Deploy to Foundry
azd up

# Invoke the deployed agent
python scripts/invoke.py --demo 1

# Start the browser UI
python ui/server.py  # → http://localhost:7860
Extending: Add Your Own Agent
Adding a fifth agent is straightforward. Follow this pattern:
- Create app/agents/<name>.py with a *_INSTRUCTIONS constant following the existing pattern.
- Add the agent’s output keys to app/schemas.py.
- Register it in main.py:
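As a purely illustrative sketch of these three steps, the snippet below defines a prompt constant for an imagined ticket-drafting agent, records its new output key, and adds it to a simple name-to-instructions registry. The Ticket Agent, the tickets key, EXPECTED_KEYS, and AGENT_INSTRUCTIONS are all hypothetical; the project's actual app/schemas.py and main.py wiring may look different.

```python
# Step 1: a prompt constant in the same style as the existing agents
# (hypothetical content for an imagined ticket-drafting agent).
TICKET_INSTRUCTIONS = """\
You are the **Ticket Agent**, an expert at turning incident findings into work items.

## Task
Return a single JSON object with ONLY this key:
{
  "tickets": [
    {"title": "string", "description": "string", "priority": "P0 | P1 | P2 | P3"}
  ]
}

## Guardrails
1. **Structured output only**: return ONLY valid JSON, no prose.
"""

# Step 2: record the new top-level key the merged response may now contain.
# The starting set lists only keys that appear in the prompts shown earlier.
EXPECTED_KEYS = {
    "suspected_root_causes",
    "immediate_actions",
    "missing_information",
    "runbook_alignment",
    "comms",
}
EXPECTED_KEYS.add("tickets")

# Step 3: register the agent in a stand-in registry mapping name to instructions.
AGENT_INSTRUCTIONS = {"ticket-agent": TICKET_INSTRUCTIONS}

print(sorted(AGENT_INSTRUCTIONS))
```

Keeping the new agent's output under a single, unique top-level key means it can be merged alongside the existing fragments without touching the other agents.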
Ideas for extensions: a ticket auto-creation agent that creates Jira or Azure DevOps items from the PIR output, a webhook adapter agent that normalises PagerDuty or Datadog payloads, or a human-in-the-loop agent that surfaces missing_information as an interactive form.
Key Takeaways for AI Engineers
- Microsoft Agent Framework gives you ConcurrentBuilder for parallel execution and AzureOpenAIChatClient for Azure-native auth: you write the prompts, the framework handles the plumbing.
- Foundry Hosted Agents let you deploy containerised agents with managed infrastructure, automatic scaling, and built-in telemetry. No Kubernetes, no custom API gateway.
- Model Router eliminates the model selection problem. One deployment name handles all scenarios with optimal cost-performance tradeoffs.
- Prompt-as-config means your agents are iterable by anyone who can edit text. The feedback loop from “this output could be better” to “deployed improvement” is minutes, not sprints.
Resources
- Model Router: Automatic model selection based on prompt complexity
- Foundry Hosted Agents: Deploy containerised agents on managed infrastructure
- ConcurrentBuilder Samples: Official agents-in-workflow sample this project follows
- DefaultAzureCredential: Zero-config auth chain used throughout
- Hosted Agents Concepts: Architecture overview of Foundry Hosted Agents

