Introduction
AI-powered coding assistants have transformed how developers write and review code. But most of these tools require sending your source code to cloud services, a non-starter for teams working with proprietary codebases, air-gapped environments, or strict compliance requirements. What if you could have an intelligent coding agent that finds bugs, fixes them, runs your tests, and produces PR-ready summaries, all without a single byte leaving your machine?
The Local Repo Patch Agent demonstrates exactly this. By combining the GitHub Copilot SDK for agent orchestration with Foundry Local for on-device inference, this project creates a fully autonomous coding workflow that operates entirely on your hardware. The agent scans your repository, identifies bugs and code smells, applies fixes, verifies them through your test suite, and generates a comprehensive summary of all changes, completely offline and secure.
This article explores the architecture behind this integration, walks through the key implementation patterns, and shows you how to run the agent yourself. Whether you're building internal developer tools, exploring agentic workflows, or simply curious about what's possible when you combine GitHub's SDK with local AI, this project provides a production-ready foundation to build upon.
Why Local AI Matters for Code Analysis
Cloud-based AI coding tools have proven their value: GitHub Copilot has fundamentally changed how millions of developers work. But certain scenarios demand local-first approaches where code never leaves the organisation's network.
Consider these real-world constraints that teams face daily:
- Regulatory compliance: Financial services, healthcare, and government projects often prohibit sending source code to external services, even for analysis
- Intellectual property protection: Proprietary algorithms and trade secrets can't risk exposure through cloud API calls
- Air-gapped environments: Secure facilities and classified projects have no internet connectivity whatsoever
- Latency requirements: Real-time code analysis in IDEs benefits from zero network roundtrip
- Cost control: High-volume code analysis without per-token API charges
The Local Repo Patch Agent addresses all these scenarios. By running the AI model on-device through Foundry Local and using the GitHub Copilot SDK for orchestration, you get the intelligence of agentic coding workflows with complete data sovereignty. The architecture proves that "local-first" doesn't mean "capability-limited."
The Technology Stack
Two core technologies make this architecture possible, working together through a clever integration called BYOK (Bring Your Own Key). Understanding how they complement each other reveals the elegance of the design.
GitHub Copilot SDK
The GitHub Copilot SDK provides the agent runtime, the scaffolding that handles planning, tool invocation, streaming responses, and the orchestration loop that makes agentic behaviour possible. Rather than managing raw LLM API calls, developers define tools (functions the agent can call) and system prompts, and the SDK handles everything else.
Key capabilities the SDK brings to this project:
- Session management: Maintains conversation context across multiple agent interactions
- Tool orchestration: Automatically invokes defined tools when the model requests them
- Streaming support: Real-time response streaming for responsive user interfaces
- Provider abstraction: Works with any OpenAI-compatible API through the BYOK configuration
Foundry Local
Foundry Local brings Azure AI Foundry's model catalog to your local machine. It automatically selects the best available hardware acceleration (GPU, NPU, or CPU) and exposes models through an OpenAI-compatible API on localhost. Models run entirely on-device with no telemetry or data transmission.
For this project, Foundry Local provides:
- On-device inference: All AI processing happens locally, ensuring complete data privacy
- Dynamic port allocation: The SDK auto-detects the Foundry Local endpoint, eliminating configuration hassle
- Model flexibility: Swap between models like qwen2.5-coder-1.5b, phi-3-mini, or larger variants based on your hardware
- OpenAI API compatibility: Standard API format means the GitHub Copilot SDK works without modification
The BYOK Integration
The entire connection between the GitHub Copilot SDK and Foundry Local happens through a single configuration object. This BYOK (Bring Your Own Key) pattern tells the SDK to route all inference requests to your local model instead of cloud services:
const session = await client.createSession({
  model: modelId,
  provider: {
    type: "openai",          // Foundry Local speaks OpenAI's API format
    baseUrl: proxyBaseUrl,   // Streaming proxy → Foundry Local
    apiKey: manager.apiKey,
    wireApi: "completions",  // Chat Completions API
  },
  streaming: true,
  tools: [ /* your defined tools */ ],
});
This configuration is the key insight: with one config object, you've redirected an entire agent framework to run on local hardware. No code changes to the SDK, no special adaptersβjust standard OpenAI-compatible API communication.
Architecture Overview
The Local Repo Patch Agent implements a layered architecture where each component has a clear responsibility. Understanding this flow helps when extending or debugging the system.
┌──────────────────────────────────────────────────────────┐
│                  Your Terminal / Web UI                  │
│                npm run demo / npm run ui                 │
└──────────────┬───────────────────────────────────────────┘
               │
┌──────────────▼───────────────────────────────────────────┐
│              src/agent.ts (this project)                 │
│                                                          │
│  ┌────────────────────────────┐   ┌──────────────────┐   │
│  │  GitHub Copilot SDK        │   │  Agent Tools     │   │
│  │  (CopilotClient)           │   │  list_files      │   │
│  │  BYOK → Foundry            │   │  read_file       │   │
│  └─────────┬──────────────────┘   │  write_file      │   │
│            │                      │  run_command     │   │
│            │                      └──────────────────┘   │
└────────────┼─────────────────────────────────────────────┘
             │
             │ JSON-RPC
┌────────────▼─────────────────────────────────────────────┐
│            GitHub Copilot CLI (server mode)              │
│              Agent orchestration layer                   │
└────────────┬─────────────────────────────────────────────┘
             │ POST /v1/chat/completions (BYOK)
┌────────────▼─────────────────────────────────────────────┐
│           Foundry Local (on-device inference)            │
│       Model: qwen2.5-coder-1.5b via ONNX Runtime         │
│        Endpoint: auto-detected (dynamic port)            │
└──────────────────────────────────────────────────────────┘
The data flow works as follows: your terminal or web browser sends a request to the agent application. The agent uses the GitHub Copilot SDK to manage the conversation, which communicates with the Copilot CLI running in server mode. The CLI, configured with BYOK, sends inference requests to Foundry Local running on localhost. Responses flow back up the same path, with tool invocations happening in the agent.ts layer.
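The manager object and proxyBaseUrl referenced in the BYOK snippet earlier are produced during startup. Here is a minimal sketch of that bootstrap, assuming the foundry-local-sdk npm package and the proxy's default port; the actual startup code in src/agent.ts may differ:
import { FoundryLocalManager } from "foundry-local-sdk";

// Start (or attach to) the Foundry Local service and load the model;
// init() resolves the dynamically allocated endpoint automatically.
const manager = new FoundryLocalManager();
const modelInfo = await manager.init("qwen2.5-coder-1.5b");
console.log(`Model ${modelInfo.id} ready at ${manager.endpoint}`);

// The SDK talks to the streaming proxy (default port 8765, described below),
// which forwards inference requests to manager.endpoint.
const proxyBaseUrl = "http://localhost:8765/v1";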
The Four-Phase Workflow
The agent operates through a structured four-phase loop, each phase building on the previous one's output. This decomposition transforms what would be an overwhelming single prompt into manageable, verifiable steps.
Phase 1: PLAN
The planning phase scans the repository and produces a numbered fix plan. The agent reads every source and test file, identifies potential issues, and outputs specific tasks to address:
// Phase 1 system prompt excerpt
const planPrompt = `
You are a code analysis agent. Scan the repository and identify:
1. Bugs that cause test failures
2. Code smells and duplication
3. Style inconsistencies
Output a numbered list of fixes, ordered by priority.
Each item should specify: file path, line numbers, issue type, and proposed fix.
`;
The tools available during this phase are list_files and read_file: the agent explores the codebase without modifying anything. This read-only constraint prevents accidental changes before the plan is established.
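For illustration, a read_file tool definition in the same shape as the write_file tool shown in the next phase might look like this (a sketch; the exact schema in src/agent.ts may differ):
const readFileTool = {
  name: "read_file",
  description: "Read the contents of a file without modifying it",
  parameters: {
    type: "object",
    properties: {
      path: { type: "string", description: "File path relative to repo root" }
    },
    required: ["path"]
  }
};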
Phase 2: EDIT
With a plan in hand, the edit phase applies each fix by rewriting affected files. The agent receives the plan from Phase 1 and systematically addresses each item:
// Phase 2 adds the write_file tool
const editTools = [
  {
    name: "write_file",
    description: "Write content to a file, creating or overwriting it",
    parameters: {
      type: "object",
      properties: {
        path: { type: "string", description: "File path relative to repo root" },
        content: { type: "string", description: "Complete file contents" }
      },
      required: ["path", "content"]
    }
  }
];
The write_file tool is sandboxed to the demo-repo directory; path traversal attempts are blocked, preventing the agent from modifying files outside the designated workspace.
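A minimal sketch of such a sandbox check, using Node's path module (the project's actual validation may differ):
import * as path from "node:path";

const REPO_ROOT = path.resolve("demo-repo");

// Resolve the requested path and reject anything that escapes the sandbox,
// including traversal attempts like ../../../etc/passwd.
function resolveSandboxedPath(relativePath: string): string {
  const resolved = path.resolve(REPO_ROOT, relativePath);
  if (resolved !== REPO_ROOT && !resolved.startsWith(REPO_ROOT + path.sep)) {
    throw new Error(`Path escapes sandbox: ${relativePath}`);
  }
  return resolved;
}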
Phase 3: VERIFY
After making changes, the verification phase runs the project's test suite to confirm fixes work correctly. If tests fail, the agent attempts to diagnose and repair the issue:
// Phase 3 adds run_command with an allowlist
import { exec } from "node:child_process";
import { promisify } from "node:util";

const execAsync = promisify(exec);

const allowedCommands = ["npm test", "npm run lint", "npm run build"];
const runCommandTool = {
  name: "run_command",
  description: "Execute a shell command (npm test, npm run lint, npm run build only)",
  execute: async (command: string) => {
    if (!allowedCommands.includes(command)) {
      throw new Error(`Command not allowed: ${command}`);
    }
    // Execute in the demo repo and return stdout/stderr; a non-zero exit
    // (e.g. failing tests) rejects and is surfaced back to the agent
    const { stdout, stderr } = await execAsync(command, { cwd: "demo-repo" });
    return { stdout, stderr };
  }
};
The command allowlist is a critical security measure. The agent can only run explicitly permitted commands: no arbitrary shell execution, no data exfiltration, no system modification.
Phase 4: SUMMARY
The final phase produces a PR-style Markdown report documenting all changes. This summary includes what was changed, why each change was necessary, test results, and recommended follow-up actions:
## Summary of Changes
### Bug Fix: calculateInterest() in account.js
- **Issue**: Division instead of multiplication caused incorrect interest calculations
- **Fix**: Changed `principal / annualRate` to `principal * (annualRate / 100)`
- **Tests**: 3 previously failing tests now pass
### Refactor: Duplicate formatCurrency() removed
- **Issue**: Identical function existed in account.js and transaction.js
- **Fix**: Both files now import from utils.js
- **Impact**: Reduced code duplication, single source of truth
### Test Results
- **Before**: 6/9 passing
- **After**: 9/9 passing
This structured output makes code review straightforward: reviewers can quickly understand what changed and why without digging through diffs.
The Demo Repository: Intentional Bugs
The project includes a demo-repo directory containing a small banking utility library with intentional problems for the agent to find and fix. This provides a controlled environment to demonstrate the agent's capabilities.
Bug 1: Calculation Error in calculateInterest()
The account.js file contains a calculation bug that causes test failures:
// BUG: should be principal * (annualRate / 100)
function calculateInterest(principal, annualRate) {
  return principal / annualRate; // Division instead of multiplication!
}
This bug causes 3 of 9 tests to fail. The agent identifies it during the PLAN phase by correlating test failures with the implementation, then fixes it during EDIT.
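The corrected implementation the agent produces, per the summary shown earlier:
// Fixed: interest is principal times the rate expressed as a percentage
function calculateInterest(principal, annualRate) {
  return principal * (annualRate / 100);
}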
Bug 2: Code Duplication
The formatCurrency() function is copy-pasted in both account.js and transaction.js, even though a canonical version exists in utils.js. This duplication creates maintenance burden and potential inconsistency:
// In account.js (duplicated)
function formatCurrency(amount) {
  return '$' + amount.toFixed(2);
}

// In transaction.js (also duplicated)
function formatCurrency(amount) {
  return '$' + amount.toFixed(2);
}

// In utils.js (canonical, but unused)
export function formatCurrency(amount) {
  return '$' + amount.toFixed(2);
}
The agent identifies this duplication during planning and refactors both files to import from utils.js, eliminating redundancy.
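After the EDIT phase, both files drop their local copies and import the canonical helper, roughly:
// In account.js and transaction.js after the refactor
import { formatCurrency } from './utils.js';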
Handling Foundry Local Streaming Quirks
One technical challenge the project solves is Foundry Local's behaviour with streaming requests. As of version 0.5, Foundry Local can hang on stream: true requests. The project includes a streaming proxy that works around this limitation transparently.
The Streaming Proxy
The streaming-proxy.ts file implements a lightweight HTTP proxy that converts streaming requests to non-streaming, then re-encodes the single response as SSE (Server-Sent Events) chunks, the format the OpenAI SDK expects:
// streaming-proxy.ts simplified logic
async function handleRequest(req: Request): Promise<Response> {
  const body = await req.json();

  // If it's a streaming chat completion, convert to non-streaming
  if (body.stream === true && req.url.includes('/chat/completions')) {
    body.stream = false;
    const response = await fetch(foundryEndpoint, {
      method: 'POST',
      body: JSON.stringify(body),
      headers: { 'Content-Type': 'application/json' }
    });
    const data = await response.json();
    // Re-encode as SSE stream for the SDK
    return createSSEResponse(data);
  }

  // Non-streaming and non-chat requests pass through unchanged
  return fetch(foundryEndpoint, {
    method: req.method,
    body: JSON.stringify(body),
    headers: { 'Content-Type': 'application/json' }
  });
}
This proxy runs on port 8765 by default and sits between the GitHub Copilot SDK and Foundry Local. The SDK thinks it's talking to a streaming-capable endpoint, while the actual inference happens non-streaming. The conversion is transparent; no changes to the SDK configuration are needed.
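The createSSEResponse helper is only referenced above; a minimal sketch of that re-encoding step, assuming the OpenAI chat-completion chunk shape (the project's implementation may differ):
// Wrap a complete chat completion as a single SSE chunk followed by [DONE]
function createSSEResponse(completion: any): Response {
  const chunk = {
    ...completion,
    object: "chat.completion.chunk",
    choices: completion.choices.map((c: any) => ({
      index: c.index,
      delta: c.message, // the whole message arrives as one delta
      finish_reason: c.finish_reason,
    })),
  };
  const body = `data: ${JSON.stringify(chunk)}\n\ndata: [DONE]\n\n`;
  return new Response(body, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}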
Text-Based Tool Call Detection
Small on-device models like qwen2.5-coder-1.5b sometimes output tool calls as JSON text rather than using OpenAI-style function calling. The SDK won't fire tool.execution_start events for these text-based calls, so the agent includes a regex-based detector:
// Pattern to detect tool calls in model output
const toolCallPattern = /\{[\s\S]*"name":\s*"(list_files|read_file|write_file|run_command)"[\s\S]*\}/;

type ToolCall = { name: string; arguments?: Record<string, unknown> };

function detectToolCall(text: string): ToolCall | null {
  const match = text.match(toolCallPattern);
  if (match) {
    try {
      return JSON.parse(match[0]);
    } catch {
      return null;
    }
  }
  return null;
}
This fallback ensures tool calls are captured regardless of whether the model uses native function calling or text output, keeping the dashboard's tool call counter and CLI log accurate.
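For example, given a hypothetical model response that narrates the call as plain text:
const output = 'Let me check the files first: {"name": "read_file", "arguments": {"path": "account.js"}}';
const call = detectToolCall(output);
if (call) {
  console.log(call.name); // "read_file", dispatched as if it were a native tool call
}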
Security Considerations
Running an AI agent that can read and write files and execute commands requires careful security design. The Local Repo Patch Agent implements multiple layers of protection:
- 100% local execution: No code, prompts, or responses leave your machine; complete data sovereignty
- Command allowlist: The agent can only run npm test, npm run lint, and npm run build; no arbitrary shell commands
- Path sandboxing: File tools are locked to the demo-repo/ directory; path traversal attempts like ../../../etc/passwd are rejected
- File size limits: The read_file tool rejects files over 256 KB, preventing memory exhaustion attacks
- Recursion limits: Directory listing caps at 20 levels deep, preventing infinite traversal
These constraints demonstrate responsible AI agent design. The agent has enough capability to do useful work but not enough to cause harm. When extending this project for your own use cases, maintain similar principles: grant minimum necessary permissions, validate all inputs, and fail closed on unexpected conditions.
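As a concrete example of failing closed, the 256 KB read limit can be enforced before any bytes enter memory (a sketch building on the sandbox helper shown earlier; the project's actual implementation may differ):
import { promises as fs } from "node:fs";

const MAX_FILE_BYTES = 256 * 1024;

// Check the size first so oversized files never enter memory (fail closed)
async function readFileSafely(relativePath: string): Promise<string> {
  const fullPath = resolveSandboxedPath(relativePath); // sandbox check from earlier sketch
  const stats = await fs.stat(fullPath);
  if (stats.size > MAX_FILE_BYTES) {
    throw new Error(`File too large: ${relativePath} (${stats.size} bytes)`);
  }
  return fs.readFile(fullPath, "utf8");
}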
Running the Agent
Getting the Local Repo Patch Agent running on your machine takes about five minutes. The project includes setup scripts that handle prerequisites automatically.
Prerequisites
Before running the setup, ensure you have:
- Node.js 18 or higher: Download from nodejs.org (LTS version recommended)
- Foundry Local: Install via winget install Microsoft.FoundryLocal (Windows) or brew install foundrylocal (macOS)
- GitHub Copilot CLI: Follow the GitHub Copilot CLI install guide
Verify your installations:
node --version # Should print v18.x.x or higher
foundry --version
copilot --version
One-Command Setup
The easiest path uses the provided setup scripts that install dependencies, start Foundry Local, and download the AI model:
# Clone the repository
git clone https://github.com/leestott/copilotsdk_foundrylocal.git
cd copilotsdk_foundrylocal
# Windows (PowerShell)
.\setup.ps1
# macOS / Linux
chmod +x setup.sh
./setup.sh
When setup completes, you'll see:
═══ Setup complete! ═══

You're ready to go. Run one of these commands:

  npm run demo     CLI agent (terminal output)
  npm run ui       Web dashboard (http://localhost:3000)
Manual Setup
If you prefer step-by-step control:
# Install npm packages
npm install
cd demo-repo && npm install --ignore-scripts && cd ..
# Start Foundry Local and download the model
foundry service start
foundry model run qwen2.5-coder-1.5b
# Copy environment configuration
cp .env.example .env
# Run the agent
npm run demo
The first model download takes a few minutes depending on your connection. After that, the model runs from cache with no internet required.
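To confirm the model is cached for offline use, you can list the local cache (assuming the Foundry Local CLI's cache command; check foundry --help if it differs):
foundry cache list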
Using the Web Dashboard
For a visual experience with real-time streaming, launch the web UI:
npm run ui
Open http://localhost:3000 in your browser. The dashboard provides:
- Phase progress sidebar: Visual indication of which phase is running, completed, or errored
- Live streaming output: Model responses appear in real-time via WebSocket
- Tool call log: Every tool invocation logged with phase context
- Phase timing table: Performance metrics showing how long each phase took
- Environment info: Current model, endpoint, and repository path at a glance
Configuration Options
The agent supports several environment variables for customisation. Edit the .env file or set them directly:
| Variable | Default | Description |
|---|---|---|
| FOUNDRY_LOCAL_ENDPOINT | auto-detected | Override the Foundry Local API endpoint |
| FOUNDRY_LOCAL_API_KEY | auto-detected | Override the API key |
| FOUNDRY_MODEL | qwen2.5-coder-1.5b | Which model to use from the Foundry Local catalog |
| FOUNDRY_TIMEOUT_MS | 180000 (3 min) | How long each agent phase can run before timing out |
| FOUNDRY_NO_PROXY | (unset) | Set to 1 to disable the streaming proxy |
| PORT | 3000 | Port for the web dashboard |
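For example, a .env tuned for slower hardware might look like this, using values from the table above:
# .env: local overrides (a sketch; keep only what you need)
FOUNDRY_MODEL=qwen2.5-coder-1.5b
FOUNDRY_TIMEOUT_MS=300000
PORT=3000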
Using Different Models
To try a different model from the Foundry Local catalog:
# Use phi-3-mini instead
FOUNDRY_MODEL=phi-3-mini npm run demo
# Use a larger model for higher quality (requires more RAM/VRAM)
FOUNDRY_MODEL=qwen2.5-7b npm run demo
Adjusting for Slower Hardware
If you're running on CPU-only or limited hardware, increase the timeout to give the model more time per phase:
# 5 minutes per phase instead of 3
FOUNDRY_TIMEOUT_MS=300000 npm run demo
Troubleshooting Common Issues
When things don't work as expected, these solutions address the most common problems:
| Problem | Solution |
|---|---|
| foundry: command not found | Install Foundry Local; see Prerequisites section |
| copilot: command not found | Install GitHub Copilot CLI; see Prerequisites section |
| Agent times out on every phase | Increase FOUNDRY_TIMEOUT_MS (e.g., 300000 for 5 min). CPU-only machines are slower. |
| Port 3000 already in use | Set PORT=3001 npm run ui |
| Model download is slow | First download can take 5-10 min. Subsequent runs use the cache. |
| Cannot find module errors | Run npm install again, then cd demo-repo && npm install --ignore-scripts |
| Tests still fail after agent runs | The agent edits files in demo-repo/. Reset with git checkout demo-repo/ and run again. |
| PowerShell blocks setup.ps1 | Run Set-ExecutionPolicy -Scope Process Bypass first, then .\setup.ps1 |
Diagnostic Test Scripts
The src/tests/ folder contains standalone scripts for debugging SDK and Foundry Local integration issues. These are invaluable when things go wrong:
# Debug-level SDK event logging
npx tsx src/tests/test-debug.ts
# Test non-streaming inference (bypasses streaming proxy)
npx tsx src/tests/test-nostream.ts
# Raw fetch to Foundry Local (bypasses SDK entirely)
npx tsx src/tests/test-stream-direct.ts
# Start the traffic-inspection proxy
npx tsx src/tests/test-proxy.ts
These scripts isolate different layers of the stack, helping identify whether issues lie in Foundry Local, the streaming proxy, the SDK, or your application code.
Key Takeaways
- BYOK enables local-first AI: A single configuration object redirects the entire GitHub Copilot SDK to use on-device inference through Foundry Local
- Phased workflows improve reliability: Breaking complex tasks into PLAN → EDIT → VERIFY → SUMMARY phases makes agent behaviour predictable and debuggable
- Security requires intentional design: Allowlists, sandboxing, and size limits constrain agent capabilities to safe operations
- Local models have quirks: The streaming proxy and text-based tool detection demonstrate how to work around on-device model limitations
- Real-time feedback matters: The web dashboard with WebSocket streaming makes agent progress visible and builds trust in the system
- The architecture is extensible: Add new tools, change models, or modify phases to adapt the agent to your specific needs
Conclusion and Next Steps
The Local Repo Patch Agent proves that sophisticated agentic coding workflows don't require cloud infrastructure. By combining the GitHub Copilot SDK's orchestration capabilities with Foundry Local's on-device inference, you get intelligent code analysis that respects data sovereignty completely.
The patterns demonstrated here (BYOK integration, phased execution, security sandboxing, and streaming workarounds) transfer directly to production systems. Consider extending this foundation with:
- Custom tool sets: Add database queries, API calls to internal services, or integration with your CI/CD pipeline
- Multiple repository support: Scan and fix issues across an entire codebase or monorepo
- Different model sizes: Use smaller models for quick scans, larger ones for complex refactoring
- Human-in-the-loop approval: Add review steps before applying fixes to production code
- Integration with Git workflows: Automatically create branches and PRs from agent-generated fixes
Clone the repository, run through the demo, and start building your own local-first AI coding tools. The future of developer AI isn't just cloud: it's intelligent systems that run wherever your code lives.
Resources