Background
In Part 1, we explored how a local LLM running entirely on your GPU can call out to the cloud for advanced capabilities. The theme was: keep your data local, and pull intelligence in only when necessary. That was local-first AI calling cloud agents as needed.
This time, the cloud is in charge, and the user interacts with a Microsoft Foundry–hosted agent — but whenever it needs private, sensitive, or user-specific information, it securely “calls back home” to a local agent running next to the user via MCP.
Think of it as:
- The cloud agent = specialist doctor
- The local agent = your health coach who you trust and who knows your medical history
- The cloud never sees your raw medical history
- The local agent only shares the minimum amount of information needed to support the cloud agent's reasoning
This hybrid intelligence pattern respects privacy while still benefiting from hosted frontier-level reasoning.
Disclaimer:
The diagnostic results, symptom checker, and any medical guidance provided in this article are for illustrative and informational purposes only. They are not intended to provide medical advice, diagnosis, or treatment.
Architecture Overview
The diagram illustrates a hybrid AI workflow where a Microsoft Foundry–hosted agent in Azure works together with a local MCP server running on the user’s machine. The cloud agent receives user symptoms and uses a frontier model (GPT-4.1) for reasoning, but when it needs personal context—like medical history—it securely calls back into the local MCP Health Coach via a dev-tunnel. The local MCP server queries a local GPU-accelerated LLM (Phi-4-mini via Foundry Local) along with stored health-history JSON, returning only the minimal structured background the cloud model needs. The cloud agent then combines both pieces—its own reasoning plus the local context—to produce tailored recommendations, all while sensitive data stays fully on the user’s device.
Hosting the agent in Microsoft Foundry on Azure provides a reliable, scalable orchestration layer that integrates directly with Azure identity, monitoring, and governance. It lets you keep your logic, policies, and reasoning engine in the cloud, while still delegating private or resource-intensive tasks to your local environment. This gives you the best of both worlds: enterprise-grade control and flexibility with edge-level privacy and efficiency.
Demo Setup
Create the Cloud Hosted Agent
Using Microsoft Foundry, I created an agent in the UI and picked gpt-4.1 as the model:
I provided rigorous instructions as the system prompt:
You are a medical-specialist reasoning assistant for non-emergency triage.
You do NOT have access to the patient’s identity or private medical history.
A privacy firewall limits what you can see.
A local “Personal Health Coach” LLM exists on the user’s device.
It holds the patient’s full medical history privately.
You may request information from this local model ONLY by calling the MCP tool:
get_patient_background(symptoms)
This tool returns a privacy-safe, PII-free medical summary, including:
- chronic conditions
- allergies
- medications
- relevant risk factors
- relevant recent labs
- family history relevance
- age group
Rules:
1. When symptoms are provided, ALWAYS call get_patient_background BEFORE reasoning.
2. NEVER guess or invent medical history — always retrieve it from the local tool.
3. NEVER ask the user for sensitive personal details. The local model handles that.
4. After the tool runs, combine:
(a) the patient_background output
(b) the user’s reported symptoms
to deliver high-level triage guidance.
5. Do not diagnose or prescribe medication.
6. Always end with: “This is not medical advice.”
You MUST display the section “Local Health Coach Summary:” containing the JSON returned from the tool before giving your reasoning.
Build the Local MCP Server (Local LLM + Personal Medical Memory)
The full code for the MCP server is available here, but these are the main parts:
HTTP JSON-RPC Wrapper (“MCP Gateway”)
The first part of the server exposes a minimal HTTP API that accepts MCP-style JSON-RPC messages and routes them to handler functions (see the sketch after this list):
- Listens on a local port
- Accepts POST JSON-RPC
- Normalizes the payload
- Passes requests to handle_mcp_request()
- Returns JSON-RPC responses
- Exposes initialize and tools/list
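Here is a minimal Python sketch of that gateway, assuming a plain JSON-RPC-over-HTTP transport on the local port. The handle_mcp_request name matches the description above, but the TOOLS registry, version strings, and error handling are illustrative stand-ins for the demo repo's implementation:

```python
# Minimal MCP gateway sketch: accepts JSON-RPC over HTTP and routes it.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

TOOLS: dict = {}  # name -> (schema, handler); populated in the next section

def handle_mcp_request(payload: dict) -> dict:
    """Route a normalized JSON-RPC request to the matching handler."""
    method, req_id = payload.get("method"), payload.get("id")
    if method == "initialize":
        result = {"protocolVersion": "2024-11-05",
                  "capabilities": {"tools": {}},
                  "serverInfo": {"name": "mcp-health", "version": "0.1.0"}}
    elif method == "tools/list":
        result = {"tools": [schema for schema, _ in TOOLS.values()]}
    elif method == "tools/call":
        params = payload.get("params", {})
        _, handler = TOOLS[params["name"]]
        result = handler(params.get("arguments", {}))
    else:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": f"unknown method {method}"}}
    return {"jsonrpc": "2.0", "id": req_id, "result": result}

class McpGateway(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        reply = json.dumps(handle_mcp_request(json.loads(body))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8081), McpGateway).serve_forever()
```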
Tool Definition: get_patient_background
This section defines the tool contract exposed to Azure AI Foundry. The hosted agent sees this tool exactly as if it were a cloud function (a sketch follows the list below):
- Advertises the tool via tools/list
- Accepts arguments passed from the cloud agent
- Delegates local reasoning to the GPU LLM
- Returns structured JSON back to the cloud
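Continuing the gateway sketch, here is an illustrative version of that contract. The schema follows the MCP tools/list shape; summarize_history_locally is the local inference call sketched in the next section:

```python
import json

# Illustrative tool contract: this schema is what the hosted agent
# discovers when it calls tools/list.
GET_PATIENT_BACKGROUND_SCHEMA = {
    "name": "get_patient_background",
    "description": ("Returns a privacy-safe, PII-free medical summary "
                    "relevant to the reported symptoms."),
    "inputSchema": {
        "type": "object",
        "properties": {
            "symptoms": {"type": "string",
                         "description": "Symptoms reported by the user"}
        },
        "required": ["symptoms"],
    },
}

def get_patient_background(arguments: dict) -> dict:
    """tools/call handler: delegate to the local GPU LLM, return JSON."""
    # summarize_history_locally is defined in the next section's sketch.
    background = summarize_history_locally(arguments["symptoms"])
    # MCP tool results travel back to the cloud as content blocks.
    return {"content": [{"type": "text", "text": json.dumps(background)}]}

# Register with the gateway's TOOLS registry from the previous sketch.
TOOLS["get_patient_background"] = (GET_PATIENT_BACKGROUND_SCHEMA,
                                   get_patient_background)
```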
Local GPU LLM Caller (Foundry Local Integration)
This is where personalization happens — entirely on your machine, not in the cloud. A sketch follows the list below:
- Calls the local GPU LLM through Foundry Local
- Injects private medical data (loaded from a file or memory)
- Produces anonymized structured outputs
- Logs debug info so you can see when local inference is running
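A sketch of that caller, assuming the foundry-local Python SDK and the OpenAI-compatible endpoint Foundry Local exposes. The health_history.json path, the prompt wording, and the summarize_history_locally name are all illustrative:

```python
import json
import openai
from foundry_local import FoundryLocalManager

ALIAS = "phi-4-mini"
manager = FoundryLocalManager(ALIAS)  # starts Foundry Local, loads the model
client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

def summarize_history_locally(symptoms: str) -> dict:
    """Run private-data inference on the local GPU; return anonymized JSON."""
    with open("health_history.json") as f:
        history = json.load(f)  # private data: never transmitted to the cloud
    prompt = (
        "You are a personal health coach holding the patient's private "
        f"medical history. Reported symptoms: {symptoms}\n"
        f"History: {json.dumps(history)}\n"
        "Return ONLY a JSON object with: chronic_conditions, allergies, "
        "medications, risk_factors, recent_labs, family_history_relevance, "
        "age_group. Include no names, dates, or other PII."
    )
    print(f"[local] GPU inference via Foundry Local ({ALIAS})...")  # debug trace
    response = client.chat.completions.create(
        model=manager.get_model_info(ALIAS).id,
        messages=[{"role": "user", "content": prompt}],
    )
    # A robust version would validate the JSON and strip stray code fences.
    return json.loads(response.choices[0].message.content)
```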
Start a Dev Tunnel
Now we need to do some plumbing so the cloud can reach the MCP endpoint. I used Azure Dev Tunnels for this demo.
The snippet below shows how to set that up in four PowerShell commands:
PS C:\Windows\system32> winget install Microsoft.DevTunnel
PS C:\Windows\system32> devtunnel create mcp-health
PS C:\Windows\system32> devtunnel port create mcp-health -p 8081 --protocol http
PS C:\Windows\system32> devtunnel host mcp-health

I now have a public endpoint:
https://xxxxxxxxx.devtunnels.ms:8081
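Before wiring the agent up, a quick smoke test against the public URL confirms the tunnel and the gateway are talking. A minimal sketch (the hostname below is a placeholder for your own tunnel URL):

```python
# Smoke test: ask the tunneled MCP gateway which tools it advertises.
import json
import urllib.request

req = urllib.request.Request(
    "https://xxxxxxxxx.devtunnels.ms:8081",  # placeholder: your tunnel URL
    data=json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.dumps(json.load(resp), indent=2))  # expect get_patient_background
```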
Wire Everything Together in Azure AI Foundry
Now let's use the UI to add a new custom tool as MCP for our agent:
And point to the public endpoint created previously:
Voilà, we're done with the setup. Let's test it.
Demo: The Cloud Agent Talks to Your Local Private LLM
I am going to use a simple prompt in the agent: “Hi, I’ve been feeling feverish, fatigued, and a bit short of breath since yesterday. Should I be worried?”
Disclaimer:
The diagnostic results and any medical guidance provided in this article are for illustrative and informational purposes only. They are not intended to provide medical advice, diagnosis, or treatment.
Below is the sequence shown in real time:
Conclusion — Why This Hybrid Pattern Matters
Hybrid AI lets you place intelligence exactly where it belongs: high-value reasoning in the cloud, sensitive or contextual data on the local machine. This protects privacy while reducing cloud compute costs—routine lookups, context gathering, and personal history retrieval can all run on lightweight local models instead of expensive frontier models.
This pattern also unlocks powerful real-world applications: local financial data paired with cloud financial analysis, on-device coding knowledge combined with cloud-scale refactoring, or local corporate context augmenting cloud automation agents. In industrial and edge environments, local agents can sit directly next to the action—embedded in factory sensors, cameras, kiosks, or ambient IoT devices—providing instant, private intelligence while the cloud handles complex decision-making.
Hybrid AI turns every device into an intelligent collaborator, and every cloud agent into a specialist that can safely leverage local expertise.
References
- Get started with Foundry Local: https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/get-started?view=foundry-classic
- Using MCP tools with Agents (Microsoft Agent Framework): https://learn.microsoft.com/en-us/agent-framework/user-guide/model-context-protocol/using-mcp-tools
- Build Agents using Model Context Protocol on Azure: https://learn.microsoft.com/en-us/azure/developer/ai/intro-agents-mcp
Full demo repo available here.





