Google is bringing the Gemini app to macOS as a native desktop experience.
Every prompt you send to a hosted AI service leaves your tenant. Your code, your architecture decisions, your proprietary logic — all of it crosses a network boundary you don't control. For teams building in regulated industries or handling sensitive IP, that's not a philosophical concern. It's a compliance blocker.
What if you could spin up a fully private AI coding agent — running on your own GPU, in your own Azure subscription — with a single command?
That's exactly what this template does. One azd up, 15 minutes, and you have Google's Gemma 4 running on Azure Container Apps serverless GPU with an OpenAI-compatible API, protected by auth, and ready to power OpenCode as your terminal-based coding agent. No data leaves your environment. No third-party model provider sees your code. Full control.
Azure Container Apps serverless GPU gives you on-demand GPU compute without managing VMs, Kubernetes clusters, or GPU drivers. You get a container, a GPU, and an HTTPS endpoint — Azure handles the rest.
Here's what makes this approach different from calling a hosted model API:
This isn't a tradeoff between convenience and privacy. ACA serverless GPU makes self-hosted AI as easy to deploy as any SaaS endpoint — but the data stays yours.
The template deploys two containers into an Azure Container Apps environment:
The Ollama container pulls the Gemma 4 model on first start, so there's nothing to pre-build or upload. The nginx proxy runs on the free Consumption profile — only the Ollama container needs GPU.
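To make the proxy's role concrete: it terminates incoming traffic from ACA ingress, enforces basic auth, and forwards everything to the Ollama container. A hypothetical sketch of what such an nginx config could look like — the template ships its own file, and the upstream name, realm string, and timeout here are illustrative, not the template's actual values:

```nginx
# Illustrative sketch only — not the template's shipped config.
server {
    listen 8080;
    location / {
        auth_basic           "Gemma 4 on ACA";          # basic auth challenge
        auth_basic_user_file /etc/nginx/.htpasswd;      # admin:<password> hash
        proxy_pass           http://ollama:11434;       # internal Ollama container
        proxy_read_timeout   600s;                      # allow long generations
    }
}
```

The important design point is that only the proxy is exposed via ingress; the Ollama container is reachable solely over the environment's internal network.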
After deployment, you get a single HTTPS endpoint that works with curl, any OpenAI-compatible SDK, or OpenCode — a terminal-based AI coding agent that turns the whole thing into a private GitHub Copilot alternative.
azd up

You need the Azure CLI and Azure Developer CLI (azd) installed.
git clone https://github.com/simonjj/gemma4-on-aca.git
cd gemma4-on-aca
azd up
The setup walks you through three choices:
GPU selection — T4 (16 GB VRAM) for smaller models, or A100 (80 GB VRAM) for the full Gemma 4 lineup.
Model selection — depends on your GPU choice. The defaults are tuned for the best quality-to-speed ratio on each GPU tier.
Proxy password — protects your endpoint with basic auth.
Region availability: Serverless GPUs are available in a limited set of regions, including australiaeast, brazilsouth, canadacentral, eastus, italynorth, swedencentral, uksouth, westus, and westus3. Pick one of these when prompted for a location.
That's it. Provisioning takes about 10 minutes — mostly waiting for the ACA environment to create and the model to download.
Gemma 4 ships in four sizes. The right choice depends on your GPU and workload:
| Model | Params | Architecture | Context | Modalities | Disk Size |
|---|---|---|---|---|---|
| gemma4:e2b | ~2B | Dense | 128K | Text, Image, Audio | ~7 GB |
| gemma4:e4b | ~4B | Dense | 128K | Text, Image, Audio | ~10 GB |
| gemma4:26b | 26B | MoE (4B active) | 256K | Text, Image | ~18 GB |
| gemma4:31b | 31B | Dense | 256K | Text, Image | ~20 GB |
We benchmarked each model on the GPU tiers it fits on, using Ollama v0.20 with Q4_K_M quantization and a 32K context window in Sweden Central:
| Model | GPU | Tokens/sec | TTFT | Notes |
|---|---|---|---|---|
| gemma4:e2b | T4 | ~81 | ~15ms | Fastest on T4 |
| gemma4:e4b | T4 | ~51 | ~17ms | Default T4 choice — best quality/speed |
| gemma4:e2b | A100 | ~184 | ~9ms | Ultra-fast |
| gemma4:e4b | A100 | ~129 | ~12ms | Great for lighter workloads |
| gemma4:26b | A100 | ~113 | ~14ms | Default A100 choice — strong reasoning |
| gemma4:31b | A100 | ~40 | ~30ms | Highest quality, slower |
51 tokens/second on a T4 with the 4B model is fast enough for interactive coding assistance. The 26B model on A100 delivers 113 tokens/second with noticeably better reasoning — ideal for complex refactoring, architecture questions, and multi-file changes.
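To turn those throughput numbers into perceived latency, total response time is roughly the time to first token plus the token count divided by throughput. A quick back-of-the-envelope helper, using figures from the table above:

```python
def est_latency_s(completion_tokens: int, tokens_per_sec: float, ttft_ms: float) -> float:
    """Rough end-to-end latency: time to first token plus steady-state decode time."""
    return ttft_ms / 1000 + completion_tokens / tokens_per_sec

# A 500-token answer from gemma4:e4b on T4 (~51 tok/s, ~17 ms TTFT):
print(round(est_latency_s(500, 51, 17), 1))  # ≈ 9.8 seconds
```

Because answers stream token by token, the interactive feel is dominated by TTFT, which stays well under human perception thresholds on both GPU tiers.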
The 26B and 31B models require A100 — they don't fit in T4's 16 GB VRAM.
After azd up completes, the post-provision hook prints your endpoint URL. Test it:
curl -u admin:<YOUR_PASSWORD> \
https://<YOUR_PROXY_ENDPOINT>/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemma4:e4b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
You should get a JSON response with Gemma 4's reply. The endpoint is fully OpenAI-compatible — it works with any tool or SDK that speaks the OpenAI API format.
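If you'd rather not shell out to curl, the same call works from any HTTP client. Here is a minimal Python sketch using only the standard library — the endpoint and password placeholders are yours to fill in, and the `build_request` helper name is mine, not part of the template:

```python
import base64
import json
import urllib.request

# Placeholders: fill in the real values printed by azd up.
ENDPOINT = "https://<YOUR_PROXY_ENDPOINT>/v1/chat/completions"
PASSWORD = "YOUR_PASSWORD"

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request with basic auth."""
    token = base64.b64encode(f"admin:{PASSWORD}".encode()).decode()
    body = json.dumps({
        "model": "gemma4:e4b",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )

# To actually send (requires a live endpoint):
# with urllib.request.urlopen(build_request("Hello!")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The same shape works with any OpenAI-compatible SDK: point the base URL at `/v1` on your proxy endpoint and supply the basic-auth header.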
Here's where it gets powerful. OpenCode is a terminal-based AI coding agent — think GitHub Copilot, but running in your terminal and pointing at whatever model backend you choose.
The azd up post-provision hook automatically generates an opencode.json in your project directory with the correct endpoint and credentials. If you need to create it manually:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"gemma4-aca": {
"npm": "@ai-sdk/openai-compatible",
"name": "Gemma 4 on ACA",
"options": {
"baseURL": "https://<YOUR_PROXY_ENDPOINT>/v1",
"headers": {
"Authorization": "Basic <BASE64_OF_admin:YOUR_PASSWORD>"
}
},
"models": {
"gemma4:e4b": {
"name": "Gemma 4 e4b (4B)"
}
}
}
}
}
Generate the Base64 value:
echo -n "admin:YOUR_PASSWORD" | base64
Now run it:
opencode run -m "gemma4-aca/gemma4:e4b" "Write a binary search in Rust"
That command sends your prompt to Gemma 4 running on your ACA GPU, and streams the response back to your terminal. Every token is generated on your infrastructure. Nothing leaves your subscription.
For interactive sessions, launch the TUI:
opencode
Select your model with /models, pick Gemma 4, and start coding. OpenCode supports file editing, code generation, refactoring, and multi-turn conversations — all powered by your private Gemma 4 instance.
This matters most for teams that can't send code to external APIs:
With ACA serverless GPU, you're not running a VM or managing a Kubernetes cluster to get this privacy. It's a managed container with a GPU attached. Azure handles the infrastructure, you own the data boundary.
When you're done:
azd down
This tears down all Azure resources. Since ACA serverless GPU bills only while your containers are running, you can also scale to zero replicas to pause costs without destroying the environment.
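Pausing without destroying can also be done from the CLI. A sketch using the Azure CLI — the app and resource group names are placeholders, since the actual names depend on what azd provisioned in your environment:

```shell
# Allow the GPU container to scale to zero, pausing compute billing
az containerapp update \
  --name <OLLAMA_APP_NAME> \
  --resource-group <RESOURCE_GROUP> \
  --min-replicas 0
```

The next request scales it back up, with some cold-start latency while the model reloads into GPU memory.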
azd up, and you're live

From January through March 2026, Microsoft Entra introduced key updates to help organizations strengthen identity security, simplify governance, and improve user experience. This Q1 roundup highlights the latest feature releases and important changes, organized by product, so you can quickly see what's new, what's changing, and what actions you may need to take.
[Action may be required]
If your organization has Conditional Access (CA) policies scoped to Register security information, those policies will now apply when users set up Windows Hello for Business (WHfB) or register macOS Platform SSO credentials. Organizations without these policies aren't affected.
When this will happen
How this affects your organization
Users registering WHfB or macOS PSSO credentials will need to satisfy your register-security-info CA policies and may see a CA prompt during device setup. Important: WHfB uses the Device Registration Client, classified as "Other clients" in CA. If your policy blocks "Other clients," WHfB and PSSO provisioning will be blocked. Add a trusted location exclusion to avoid this.
Action recommended
[Action may be required]
In February 2026, Microsoft Authenticator introduced jailbreak/root detection for Microsoft Entra credentials in the Android app. The rollout progresses from warning mode to blocking mode to wipe mode. Users must move to compliant devices to continue using Microsoft Entra accounts in Authenticator.
[Action may be required]
We’re consolidating agent management experiences to make it easier to observe, govern, and secure all agents in your tenant. Agent 365 will be the single source of truth, offering a unified catalog, consistent visibility, and simplified management.
What’s changing
With this change:
[Action may be required]
What is hard matching in Microsoft Entra Connect Sync and Cloud Sync?
When Microsoft Entra Connect or Cloud Sync adds new objects from Active Directory, the Microsoft Entra ID service tries to match each incoming object with an existing Microsoft Entra object by looking up the incoming object's sourceAnchor value against the OnPremisesImmutableId attribute of existing cloud-managed objects in Microsoft Entra ID. If there's a match, Microsoft Entra Connect or Cloud Sync takes over the source of authority (SoA) for that object and updates it with the properties of the incoming Active Directory object, in what is known as a "hard match."
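For context on what's being compared: with the default configuration, the sourceAnchor is derived from the AD objectGUID (via ms-DS-ConsistencyGuid), and the cloud-side OnPremisesImmutableId is the Base64 encoding of those GUID bytes in their little-endian on-disk layout. A Python sketch of that conversion — the helper names are mine, for illustration only:

```python
import base64
import uuid

def guid_to_immutable_id(object_guid: str) -> str:
    """Base64-encode the objectGUID bytes in their little-endian (on-disk) layout,
    which is how the default sourceAnchor / ImmutableId value is produced."""
    return base64.b64encode(uuid.UUID(object_guid).bytes_le).decode()

def immutable_id_to_guid(immutable_id: str) -> str:
    """Reverse the conversion: an ImmutableId back to the canonical GUID string."""
    return str(uuid.UUID(bytes_le=base64.b64decode(immutable_id)))
```

A hard match succeeds when the incoming object's encoded value equals an existing object's OnPremisesImmutableId; the change described below restricts when that takeover is allowed.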
To strengthen the security posture of your Microsoft Entra ID environment, we are introducing a change that will restrict certain types of hard match operations by default.
What’s changing
Beginning June 1, 2026, Microsoft Entra ID will block any attempt by Microsoft Entra Connect Sync or Cloud Sync to hard-match a new user object from Active Directory to an existing cloud-managed Microsoft Entra ID user object that holds Microsoft Entra roles.
This means:
What’s not changing
Customer action required
If you encounter a hard match error after June 1, 2026, see our documentation for mitigation steps.
-Shobhit Sahay
Learn more about Microsoft Entra
Prevent identity attacks, ensure least privilege access, unify access controls, and improve the experience for users with comprehensive identity and network access solutions across on-premises and clouds.
Meet Brittany Ellich, a Staff Software Engineer here at GitHub, and explore the custom productivity tool she built using the GitHub Copilot CLI. Because she prefers visual interfaces over the command line, she used GitHub Copilot to vibe code a personalized command center. Her app features an AI chat agent named Marvin, unified task lists, and calendar integrations. Watch to see how she built it and learn why creating your own AI tools is the best way to learn.
Her project is open source; check it out: https://github.com/features/copilot/cli?utm_source=social-youtube-build-a-game-cli-features-cta&utm_medium=social&utm_campaign=dev-pod-copilot-cli-2026