Key capabilities in this preview
Foundry Local exposes standard REST and OpenAI‑compatible APIs, enabling IT and AI teams to deploy and operate local AI workloads using familiar, cloud‑aligned patterns across edge and on‑prem environments.
In this public preview, we deliver the following capabilities:
- Azure Arc extension for Foundry Local
Deploy and manage Foundry Local via an Azure Arc extension, enabling consistent install, configure, update, and governance workflows across Arc‑enabled Kubernetes clusters, in addition to Helm‑based installation.
- Built‑in generative models from the Foundry Local catalog
Deploy pre‑built generative models directly from the Foundry Local model catalog using a simple control‑plane API request.
- Bring‑your‑own predictive models (ONNX) from OCI registries
Deploy custom predictive models (such as ONNX models) securely pulled from customer‑managed OCI registries and run locally.
- REST and OpenAI‑compatible inference endpoints
Consume both generative and predictive models through standard HTTP endpoints.
- Multi‑model orchestration for agent‑style applications
Enable applications that coordinate multiple local models—for example, generative models guiding calls to predictive models—within a single Kubernetes cluster.
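The multi-model orchestration pattern above can be sketched in a few lines. The sketch below is illustrative only: the two callables stand in for HTTP calls to Foundry Local's inference endpoints, and the field names, threshold, and model choices are assumptions, not documented APIs.

```python
from typing import Callable

def orchestrate(
    telemetry: dict,
    predict: Callable[[dict], float],
    generate: Callable[[str], str],
) -> str:
    """Agent-style flow: a predictive model scores the telemetry first,
    then a generative model explains the score for an operator.
    In a real deployment both callables would wrap HTTP calls to the
    predictive (e.g. ONNX) and generative (OpenAI-compatible) endpoints
    served from the same cluster."""
    anomaly_score = predict(telemetry)  # e.g. a custom ONNX anomaly model
    if anomaly_score < 0.5:             # threshold is an example value
        return "Normal operation; no action needed."
    prompt = (
        f"Telemetry {telemetry} was scored {anomaly_score:.2f} as anomalous. "
        "Explain the likely cause in one sentence for the operator."
    )
    return generate(prompt)             # e.g. a catalog model like Phi-4-mini

if __name__ == "__main__":
    # Local stubs stand in for the two model endpoints:
    print(orchestrate(
        {"spindle_temp_c": 92},
        predict=lambda t: 0.9 if t["spindle_temp_c"] > 80 else 0.1,
        generate=lambda p: "High spindle temperature suggests coolant loss.",
    ))
```

The design point is that the coordination logic lives in the application; both models are consumed as local network services on the same cluster.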
Running Foundry Local on Azure Local single-node gives you:
- A validated, supported hardware foundation for running AI inference at the edge, from compact 1U nodes on the factory floor to rugged form factors in remote sites, using hardware from the Azure Local catalog
- AKS on Azure Local as the deployment target, so Foundry Local runs as a containerized workload managed by Kubernetes - the same operational model you use for any other workload on the cluster
- GPU access through the NVIDIA device plugin on AKS, giving Foundry Local's ONNX Runtime direct access to the node's discrete GPU without requiring Windows or host-OS-level configuration
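For context on the last point: with the NVIDIA device plugin installed, a pod claims the node's GPU through a standard Kubernetes resource limit. The manifest below is a generic illustration of that mechanism; the names and image are placeholders, not the actual Foundry Local manifest.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: foundry-local-inference            # example name
spec:
  containers:
    - name: inference
      image: example.azurecr.io/foundry-local:preview  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1   # resource exposed by the NVIDIA device plugin
```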
Two installation options for single-node deployment:
The preview includes the Foundry Local Azure Arc extension, providing a consistent installation, deployment, and lifecycle-management experience through Azure Arc, while also supporting Helm-based installation.
Choose one of two installation paths:
Option 1 - Arc-enabled Kubernetes Extension
Recommended when: your organization manages multiple Azure Local instances and wants Microsoft to handle the deployment lifecycle — version updates, configuration drift detection, health monitoring — through the Azure portal without the team needing to manage Helm releases manually.
Arc-enabled Kubernetes extensions deploy and manage workloads on AKS clusters registered with Azure. The extension operator runs in the cluster and reconciles the desired state declared in Azure, which means you don't need direct kubectl or helm access to the node to push updates. This is the lower-operational-overhead path for OT teams who are not Kubernetes specialists.
Once installed, the extension appears in the Azure portal under your AKS cluster's Extensions blade. Model updates and configuration changes are pushed by modifying the extension configuration in Azure — no shell access to the node required. For disconnected or intermittently connected deployments, the extension operator caches its desired state and continues operating; it reconciles with Azure when connectivity resumes.
Option 2 - Helm Chart
Recommended when: your team manages AKS workloads with Helm or GitOps (Flux), and you need precise control over GPU resource allocation, node affinity, model pre-loading, or persistent volume configuration.
The Helm chart gives you full control over the deployment manifest. You decide exactly how much GPU memory is requested per pod, which node the inference pod is pinned to, and what StorageClass backs the model cache. This matters on a single-node Azure Local deployment where you're sharing one physical GPU between the inference workload and potentially other AKS workloads.
With Helm you can also integrate with Flux for GitOps-managed deployment — useful when you manage multiple Azure Local single-node instances across plant sites and want to push model or configuration updates from a central Git repository.
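As a sketch of that GitOps setup, a Flux `HelmRelease` committed to the central repository could look like the following. The chart name, repository reference, and values schema here are assumptions; verify them against the official Foundry Local documentation before use.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2     # use v2beta1 on older Flux versions
kind: HelmRelease
metadata:
  name: foundry-local
  namespace: foundry
spec:
  interval: 10m                            # how often Flux reconciles
  chart:
    spec:
      chart: foundry-local                 # chart name: verify against docs
      sourceRef:
        kind: HelmRepository
        name: foundry-local-repo           # assumed repo registered separately
  values:
    gpu:
      enabled: true                        # illustrative, not the real schema
```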
Example of a model deployment YAML file
Note: Verify the chart repository URL, chart name, and exact values.yaml parameters from the official Foundry Local documentation before deploying to production.
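In that spirit, a hedged sketch of what such a values file might cover on a single-node deployment (GPU allocation, node pinning, and the model-cache StorageClass mentioned above). Every key below is illustrative; the real chart schema may differ.

```yaml
# Illustrative values.yaml for a single-node Azure Local deployment.
resources:
  limits:
    nvidia.com/gpu: 1              # whole-GPU allocation via the device plugin
nodeSelector:
  kubernetes.io/hostname: azloc-node-01   # pin to the single physical node
persistence:
  storageClass: local-path         # StorageClass backing the model cache
  size: 100Gi                      # sized for cached generative/ONNX models
```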
Choosing Between the Two
| | Helm Chart | Arc Extension |
| --- | --- | --- |
| Authentication | API key | Entra ID |
| Version upgrades | Manual `helm upgrade` or Flux | Automatic, managed by Microsoft |
| GitOps compatible | Yes (Flux HelmRelease) | Yes (via Azure Policy / desired state) |
| Requires cluster access | Yes | No (after initial registration) |
| Best for | Platform engineers, custom configs | OT-managed sites, multi-site fleets |
| Disconnected operation | Works after initial deploy | Works; reconciles on reconnect |
| Control plane | Kubernetes-native management (kubectl) | Kubernetes-native management + REST API control plane |
Early Customer Validation and Key Scenarios
Early customer validation is shaping the preview, helping ensure Foundry Local meets real-world requirements for latency, data control, and operation in constrained or disconnected environments across industries such as energy, manufacturing, government, financial services, and retail.
Based on this early feedback, customers are prioritizing scenarios such as:
- On-site inference with data, models, and processing under customer control
- Decision support in disconnected or restricted-network environments
- In-jurisdiction processing for sensitive records and casework
- Real-time detection and situational awareness within secure facilities
- Industrial and critical infrastructure:
  - Edge operations assistants combining sensor telemetry with conversational AI
  - Low-latency quality inspection and process verification on factory floors
  - Predictive maintenance for remote or intermittently connected equipment
  - Local safety monitoring and operational oversight close to systems
This input is guiding improvements across deployment flows, model catalog experience, hardware coverage, telemetry visibility, and documentation, so teams can evaluate and adopt Foundry Local more quickly and confidently in the environments above.
Examples:
CNC Anomaly Explanation: A machine vision system on a CNC line classifies a surface defect and passes the classification JSON to the Foundry Local endpoint. Phi-4-mini generates a plain-language root-cause hypothesis for the operator, referencing the specific machining parameters.
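A minimal sketch of the hand-off in that scenario: turning the vision system's classification JSON into a chat request for the local generative model. The field names in the payload are illustrative, not a real vision-system schema.

```python
import json

def defect_to_messages(classification_json: str) -> list[dict]:
    """Build the chat-completion messages for a root-cause explanation
    from a defect classification. The resulting payload would be POSTed
    to Foundry Local's OpenAI-compatible /v1/chat/completions endpoint."""
    c = json.loads(classification_json)
    user = (
        f"A {c['defect']} defect was detected on part {c['part_id']} "
        f"with confidence {c['confidence']:.0%}. Machining parameters: "
        f"{c['parameters']}. Give a one-paragraph root-cause hypothesis "
        "an operator can act on, referencing these parameters."
    )
    return [
        {"role": "system", "content": "You are a CNC process engineer."},
        {"role": "user", "content": user},
    ]
```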
Disconnected Safety Procedure Lookup: An offshore platform or remote mine site loses WAN connectivity. The Foundry Local pods continue serving requests from the AKS cluster on the Azure Local node - Kubernetes keeps the pods running, the model is already on the local PersistentVolume, and no external dependency is required. Workers query safety procedures (LOTO sequences, chemical handling) from an intranet application backed by the same inference endpoint. Qwen2.5-7B fits within 8–12 GB VRAM and supports a 32K token context window, making it viable for inline procedure retrieval without a separate vector database - useful when plant-floor infrastructure is minimal.
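The "inline retrieval without a vector database" idea amounts to selecting relevant procedure documents and packing them directly into the model's long context. A toy sketch, using naive keyword matching and a rough characters-per-token heuristic (both simplifications; real systems would use better matching and a proper tokenizer):

```python
def pack_procedures(query: str, procedures: dict[str, str],
                    budget_tokens: int = 24_000) -> str:
    """Select procedure documents whose titles share words with the query
    and concatenate them into the prompt, stopping before a rough token
    budget (~4 characters per token) is exceeded. The default budget
    leaves headroom inside a 32K-token context window for the question
    and the model's answer."""
    words = set(query.lower().split())
    picked, used = [], 0
    for title, text in procedures.items():
        if words & set(title.lower().split()):
            cost = (len(title) + len(text)) // 4
            if used + cost > budget_tokens:
                break
            picked.append(f"## {title}\n{text}")
            used += cost
    return "\n\n".join(picked)
```

The returned string is prepended to the worker's question in a single chat request, which is why no separate retrieval service is needed on minimal plant-floor infrastructure.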
Foundry Local for Devices and Foundry Local on Azure Local: What's Different
Foundry Local for devices reached general availability for developer devices: Windows 10/11, macOS (Apple Silicon), and Android. That release targets a specific scenario: a developer or end user running AI inference on their own machine, with the model executing locally on their CPU, GPU, or NPU. The install is a single command (winget or brew), the service runs directly on the host OS, and there is no Azure subscription or infrastructure required. It is a developer tool and an application-embedded runtime.
A general overview of Foundry Local is available here: What is Foundry Local? - Foundry Local | Microsoft Learn
The public preview for Azure Local single node is a different deployment target built for a different operational context. The runtime is the same - ONNX Runtime, the same model catalog, the same OpenAI-compatible API - but where it runs, how it is deployed, and how it is managed are entirely different.
| | Foundry Local for Devices (GA) | Foundry Local on Azure Local Single Node (Preview) |
| --- | --- | --- |
| Target | Developer machines, end-user devices | Enterprise edge servers on the factory floor or remote site |
| OS | Windows 10/11, macOS, Android | Linux container on AKS on Azure Local |
| Hardware | Laptops, workstations, NPU-equipped devices | Validated server hardware from the Azure Local catalog |
| GPU access | Direct host GPU (CUDA, DirectML, Apple Neural Engine) | NVIDIA device plugin on Kubernetes |
| Installation | `winget install` or `brew install` | Arc-enabled Kubernetes extension or Helm chart |
| Lifecycle management | Manual update via `winget upgrade` | Managed via Helm/Flux or Arc extension operator |
| Intended consumers | One developer or one application on one machine | Multiple applications sharing one inference endpoint on the plant network |
| Disconnected operation | Supported after model download; primarily online | Designed for persistent disconnected operation with NVMe-cached models |
| Model persistence | Local device cache | Kubernetes PersistentVolume on local storage |
| Operational model | Developer installs and manages it | Platform team deploys it; applications consume it as a service |
The short version: the GA device release is for building and running AI-enabled applications on a single machine. The Azure Local single-node preview is for deploying Foundry Local as a shared, production inference service that runs continuously on validated industrial hardware, survives WAN outages, and is consumed by multiple workloads running on the same edge cluster.
If you are prototyping an application on your laptop using the GA release, the same application code - specifically the OpenAI-compatible API calls - runs unchanged against the Azure Local deployment. You change only the `base_url` from `localhost` to the Kubernetes Service address.
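Assuming the Python `openai` client, that switch can be isolated in one helper. The default port and the in-cluster Service DNS name used in the comments are hypothetical examples, not documented values.

```python
import os

def foundry_base_url() -> str:
    """Pick the Foundry Local endpoint for this environment.
    Unset: the local device release (port here is only an example;
    the device release assigns its own). On the plant network, set
    FOUNDRY_LOCAL_URL to the cluster Service, e.g.
    http://foundry-local.foundry.svc.cluster.local:8080/v1 (hypothetical)."""
    return os.environ.get("FOUNDRY_LOCAL_URL", "http://localhost:5273/v1")

def make_client():
    # Requires the `openai` package; imported lazily so the URL helper
    # stays usable without it. A local endpoint typically ignores the key.
    from openai import OpenAI
    return OpenAI(base_url=foundry_base_url(), api_key="not-needed-locally")
```

The rest of the application code, including every `chat.completions.create` call, stays identical across both deployments.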
Built for Secure Industrial and Sovereign Operations
Foundry Local supports Microsoft’s sovereign cloud principles—allowing AI workloads to operate fully locally, with customer‑controlled data boundaries and governance.
Foundry Local on Arc: high-level service diagram
Integration with Azure Arc provides unified management, configuration, and monitoring across hybrid and disconnected landscapes, enabling organizations to meet stringent compliance and operational requirements while adopting advanced AI capabilities.
Learn more about Foundry Local on Azure Local
- RECOMMENDED: Participate in the Foundry Local on Azure Local preview - form link
- Foundry Local on Azure Local Documentation link
- Reach out to the team for support requests, feedback or suggestions here: FoundryLocal_Support@microsoft.com
- Foundry Local on Azure Local: Helm deployment demo - link
- Foundry Local is now Generally Available link