Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Building a Dual Sidecar Pod: Combining GitHub Copilot SDK with Skill Server on Kubernetes


Why the Sidecar Pattern?

In Kubernetes, a Pod is the smallest deployable unit — a single Pod can contain multiple containers that share the same network namespace and storage volumes. The Sidecar pattern places auxiliary containers alongside the main application container within the same Pod. These Sidecar containers extend or enhance the main container's functionality without modifying it.

💡 Beginner Tip: If you're new to Kubernetes, think of a Pod as a shared office — everyone in the room (containers) has their own desk (process), but they share the same network (IP address), the same file cabinet (storage volumes), and can communicate without leaving the room (localhost communication).

The Sidecar pattern is not a new concept. As early as 2015, the official Kubernetes blog described this pattern in a post about Composite Containers. Service mesh projects like Envoy, Istio, and Linkerd extensively use Sidecar containers for traffic management, observability, and security policies. In the AI application space, we are now exploring how to apply this proven pattern to new scenarios.

Why does this matter? There are three fundamental reasons:

1. Separation of Concerns

Each container in a Pod has a single, well-defined responsibility. The main application container doesn't need to know how AI content is generated or how skills are managed — it only serves the results. This separation allows each component to be independently tested, debugged, and replaced, aligning with the Unix philosophy of "do one thing well."

In practice, this means: the frontend team can iterate on Nginx configuration without affecting AI logic; AI engineers can upgrade the Copilot SDK version without touching skill management code; and operations staff can adjust skill configurations without notifying the development team.

2. Shared Localhost Network

All containers in a Pod share the same network namespace, with the same 127.0.0.1. This means communication between Sidecars is just a simple localhost HTTP call — no service discovery, no DNS resolution, no cross-node network hops.

From a performance perspective, localhost communication traverses the kernel's loopback interface, with latency typically in the microsecond range. In contrast, cross-Pod ClusterIP Service calls require routing through kube-proxy's iptables/IPVS rules, with latency typically in the millisecond range. For AI agent scenarios that require frequent interaction, this difference is meaningful.
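You can observe loopback round-trip cost yourself with a minimal TCP echo exchange. This is only an order-of-magnitude illustration — absolute numbers vary by machine and kernel, and a real sidecar call adds HTTP parsing on top:

```python
import socket
import threading
import time

def echo_server(srv: socket.socket) -> None:
    """Accept one connection and echo everything back."""
    conn, _ = srv.accept()
    with conn:
        while data := conn.recv(1024):
            conn.sendall(data)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))  # ephemeral port on the loopback interface
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

cli = socket.create_connection(("127.0.0.1", port))
cli.sendall(b"warmup")
cli.recv(1024)

rounds = 1000
start = time.perf_counter()
for _ in range(rounds):
    cli.sendall(b"ping")
    cli.recv(1024)
elapsed = time.perf_counter() - start
print(f"mean loopback round trip: {elapsed / rounds * 1e6:.1f} µs")
```

On typical hardware this prints a double-digit microsecond figure, consistent with the latency gap described above.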

From a security perspective, localhost traffic never leaves the node's kernel — it stays on the loopback interface and never touches a physical network interface, making it inherently immune to eavesdropping by other Pods in the cluster. Unless a Service is explicitly configured, Sidecar ports are not exposed outside the Pod.

3. Efficient Data Transfer via Shared Volumes

Kubernetes emptyDir volumes allow containers within the same Pod to share files on disk. Once a Sidecar writes a file, the main container can immediately read and serve it — no message queues, no additional API calls, no databases. This is ideal for workflows where one container produces artifacts (such as generated blog posts) and another consumes them.

⚠️ Technical Precision Note: "Efficient" here means eliminating the overhead of network serialization/deserialization and message middleware. However, emptyDir fundamentally relies on standard file system I/O (disk read/write or tmpfs) and is not equivalent to OS-level "Zero-Copy" (such as the sendfile() system call or DMA direct memory access). For blog content generation — a file-level data transfer use case — filesystem sharing is already highly efficient and sufficiently simple.
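The producer/consumer handoff described above can be sketched in a few lines of Python. Here `shared` stands in for the emptyDir mount path; in the real Pod each container sees the same volume at its own mount point (the path and file name are illustrative):

```python
import tempfile
from pathlib import Path

# Stand-in for the emptyDir mount; in the real Pod both containers
# mount the same volume (e.g. /usr/share/nginx/html/blog).
shared = Path(tempfile.mkdtemp())

# Producer (copilot-agent side): write the generated artifact.
(shared / "blog-2026-01-01.md").write_text("# Generated post\n")

# Consumer (nginx side): the file is immediately visible — no queue,
# no API call, just the shared filesystem.
print(sorted(p.name for p in shared.iterdir()))
# → ['blog-2026-01-01.md']
```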

In the gh-cli-blog-agent project, we take this pattern to its fullest extent by using two Sidecars within a single Pod:

A Note on Kubernetes Native Sidecar Containers

It is worth noting that Kubernetes 1.28 (August 2023) introduced native Sidecar container support via KEP-753, which reached GA (General Availability) in Kubernetes 1.33 (April 2025). Native Sidecars are implemented by setting restartPolicy: Always on initContainers, providing capabilities that the traditional approach lacks:

  • Deterministic startup order: init containers start in declaration order; main containers only start after Sidecar containers are ready
  • Non-blocking Pod termination: Sidecars are automatically cleaned up after main containers exit, preventing Jobs/CronJobs from being stuck
  • Probe support: Sidecars can be configured with startup, readiness, and liveness probes to signal their operational state

This project currently uses the traditional approach of deploying Sidecars as regular containers, with application-level health check polling (wait_for_skill_server) to handle startup dependencies. This approach is compatible with all Kubernetes versions (1.24+), making it suitable for scenarios requiring broad compatibility.

If your cluster version is ≥ 1.29 (or ≥ 1.33 for GA stability), we strongly recommend migrating to native Sidecars for platform-level startup order guarantees and more graceful lifecycle management. Migration example:

```yaml
# Native Sidecar syntax (Kubernetes 1.29+)
initContainers:
  - name: skill-server
    image: blog-agent-skill
    restartPolicy: Always      # Key: marks this as a Sidecar
    ports:
      - containerPort: 8002
    startupProbe:              # Platform-level startup readiness signal
      httpGet:
        path: /health
        port: 8002
      periodSeconds: 2
      failureThreshold: 30
  - name: copilot-agent
    image: blog-agent-copilot
    restartPolicy: Always
    ports:
      - containerPort: 8001
containers:
  - name: blog-app             # Main container starts last; Sidecars are ready
    image: blog-agent-main
    ports:
      - containerPort: 80
```

Architecture Overview

The deployment defines three containers and three volumes:

| Container | Image | Port | Role |
|---|---|---|---|
| blog-app | blog-agent-main | 80 | Nginx — serves Web UI and reverse proxies to Sidecars |
| copilot-agent | blog-agent-copilot | 8001 | FastAPI — AI blog generation powered by GitHub Copilot SDK |
| skill-server | blog-agent-skill | 8002 | FastAPI — skill file management and synchronization |

| Volume | Type | Purpose |
|---|---|---|
| blog-data | emptyDir | Copilot agent writes generated blogs; Nginx serves them |
| skills-shared | emptyDir | Skill server writes skill files; Copilot agent reads them |
| skills-source | ConfigMap | Kubernetes-managed skill definition files (read-only) |

💡 Design Insight: The three-volume design embodies the "least privilege" principle — blog-data is shared only between the Copilot agent (write) and Nginx (read); skills-shared is shared only between the skill server (write) and the Copilot agent (read). skills-source provides read-only skill definition sources via ConfigMap, forming a unidirectional data flow: ConfigMap → skill-server → shared volume → copilot-agent.

The Kubernetes deployment YAML clearly describes this structure:

```yaml
volumes:
  - name: blog-data
    emptyDir:
      sizeLimit: 256Mi  # Production best practice: always set sizeLimit to prevent disk exhaustion
  - name: skills-shared
    emptyDir:
      sizeLimit: 64Mi   # Skill files are typically small
  - name: skills-source
    configMap:
      name: blog-agent-skill
```

⚠️ Production Recommendation: The original configuration used emptyDir: {} without a sizeLimit. In production, an unrestricted emptyDir can grow indefinitely until it exhausts the node's disk space, triggering a node-level DiskPressure condition and causing other Pods to be evicted. Always setting a reasonable sizeLimit for emptyDir is part of the Kubernetes security baseline. Community tools like Kyverno can enforce this practice at the cluster level.

Nginx reverse proxies route requests to Sidecars via localhost:

```nginx
# Reverse proxy to copilot-agent sidecar (localhost:8001 within the same Pod)
location /agent/ {
    proxy_pass http://127.0.0.1:8001/;
    proxy_set_header Host $host;
    proxy_set_header X-Request-ID $request_id;  # Enables cross-container request tracing
    proxy_read_timeout 600s;                    # AI generation may take a while
}

# Reverse proxy to skill-server sidecar (localhost:8002 within the same Pod)
location /skill/ {
    proxy_pass http://127.0.0.1:8002/;
    proxy_set_header Host $host;
}
```

Since all three containers share the same network namespace, 127.0.0.1:8001 and 127.0.0.1:8002 are directly accessible — no ClusterIP Service is needed for intra-Pod communication. This is a core feature of the Kubernetes Pod networking model: all containers within the same Pod share a single network namespace, including IP address and port space.

Advantage 1: GitHub Copilot SDK as a Sidecar

Encapsulating the GitHub Copilot SDK as a Sidecar, rather than embedding it in the main application, provides several architectural advantages.

Understanding the GitHub Copilot SDK Architecture

Before diving deeper, let's understand how the GitHub Copilot SDK works. The SDK entered technical preview in January 2026, exposing the production-grade agent runtime behind GitHub Copilot CLI as a programmable SDK supporting Python, TypeScript, Go, and .NET.

The SDK's communication architecture is as follows:

The SDK client communicates with a locally running Copilot CLI process via the JSON-RPC protocol. The CLI handles model routing, authentication management, MCP server integration, and other low-level details. This means you don't need to build your own planner, tool loop, and runtime — these are all provided by an engine that has been battle-tested in production at GitHub's scale.

The benefit of encapsulating this SDK in a Sidecar container is: containerization isolates the CLI process's dependencies and runtime environment, preventing dependency conflicts with the main application or other components.

Cross-Platform Node.js Installation in the Container

A notable implementation detail is how Node.js (required by the Copilot CLI) is installed inside the container. Rather than relying on third-party APT repositories like NodeSource — which can introduce DNS resolution failures and GPG key management issues in restricted network environments — the Dockerfile downloads the official Node.js binary directly from nodejs.org with automatic architecture detection:

```dockerfile
# Install Node.js 20+ (official binary, no NodeSource APT repo needed)
ARG NODE_VERSION=20.20.0
RUN DPKG_ARCH=$(dpkg --print-architecture) \
    && case "${DPKG_ARCH}" in \
         amd64) ARCH=x64;; \
         arm64) ARCH=arm64;; \
         armhf) ARCH=armv7l;; \
         *) ARCH=${DPKG_ARCH};; \
       esac \
    && curl -fsSL "https://nodejs.org/dist/v${NODE_VERSION}/node-v${NODE_VERSION}-linux-${ARCH}.tar.xz" -o node.tar.xz \
    && tar -xJf node.tar.xz -C /usr/local --strip-components=1 --no-same-owner \
    && rm -f node.tar.xz
```

The case statement maps Debian's architecture identifiers (amd64, arm64, armhf) to Node.js's naming convention (x64, arm64, armv7l). This ensures the same Dockerfile works seamlessly on both linux/amd64 (Intel/AMD) and linux/arm64 (Apple Silicon, AWS Graviton) build platforms — an important consideration given the growing adoption of ARM-based infrastructure.
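The same mapping the Dockerfile's case statement performs can be expressed as a small lookup table — a sketch for clarity, not part of the project's code:

```python
# Map Debian dpkg architecture names to Node.js release architecture names,
# mirroring the Dockerfile's case statement.
DPKG_TO_NODE = {"amd64": "x64", "arm64": "arm64", "armhf": "armv7l"}

def node_arch(dpkg_arch: str) -> str:
    # Fall through to the dpkg name itself, as the case statement's * branch does.
    return DPKG_TO_NODE.get(dpkg_arch, dpkg_arch)

def node_tarball_url(version: str, dpkg_arch: str) -> str:
    """Build the official nodejs.org download URL for a given platform."""
    arch = node_arch(dpkg_arch)
    return f"https://nodejs.org/dist/v{version}/node-v{version}-linux-{arch}.tar.xz"

print(node_tarball_url("20.20.0", "amd64"))
# → https://nodejs.org/dist/v20.20.0/node-v20.20.0-linux-x64.tar.xz
```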

Independent Lifecycle and Resource Management

The Copilot agent is the most resource-intensive component — it needs to run the Copilot CLI process, manage JSON-RPC communication, and handle streaming responses. By isolating it in its own container, we can assign dedicated CPU and memory limits without affecting the lightweight Nginx container:

```yaml
# copilot-agent: needs more resources for AI inference coordination
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 2Gi

# blog-app: lightweight Nginx with minimal resource needs
resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 128Mi
```

This resource isolation delivers two key benefits:

  1. Fault isolation: If the Copilot agent crashes due to a timeout or memory spike (OOMKilled), Kubernetes only restarts that container — the Nginx frontend continues running and serving previously generated content. Users see "generation feature temporarily unavailable" rather than "entire site is down."
  2. Fine-grained resource scheduling: The Kubernetes scheduler selects nodes based on the sum of Pod-level resource requests. Distributing resource requests across containers allows kubelet to more precisely track each component's actual resource consumption, helping HPA (Horizontal Pod Autoscaler) make better scaling decisions.

Graceful Startup Coordination

In a multi-Sidecar Pod, regular containers start concurrently (note: this is precisely one of the issues that native Sidecars, discussed earlier, can solve). The Copilot agent handles this through application-level startup dependency checks — it waits for the skill server to become healthy before initializing the CopilotClient:

```python
async def wait_for_skill_server(url: str, retries: int = 30, delay: float = 2.0):
    """Wait for the skill-server sidecar to become healthy.

    In traditional Sidecar deployments (regular containers), containers start
    concurrently with no guaranteed startup order. This function implements
    application-level readiness waiting.

    If using Kubernetes native Sidecars (initContainers + restartPolicy: Always),
    the platform guarantees Sidecars start before main containers, which can
    simplify this logic.
    """
    async with httpx.AsyncClient() as client:
        for i in range(retries):
            try:
                resp = await client.get(f"{url}/health", timeout=5.0)
                if resp.status_code == 200:
                    logger.info(f"Skill server is healthy at {url}")
                    return True
            except Exception:
                pass
            logger.info(f"Waiting for skill server... ({i + 1}/{retries})")
            await asyncio.sleep(delay)
    raise RuntimeError(f"Skill server at {url} did not become healthy")
```

This pattern is critical in traditional Sidecar architectures: you cannot assume startup order, so explicit readiness checks are necessary. The wait_for_skill_server function polls http://127.0.0.1:8002/health at 2-second intervals up to 30 times (maximum total wait of 60 seconds) — simple, effective, and resilient.

💡 Comparison: With native Sidecars, the skill-server would be declared as an initContainer with a startupProbe. Kubernetes would ensure the skill-server is ready before starting the copilot-agent. In that case, wait_for_skill_server could be simplified to a single health check confirmation rather than a retry loop.

SDK Configuration via Environment Variables

All Copilot SDK configuration is passed through Kubernetes-native primitives, reflecting the 12-Factor App principle of externalized configuration:

```yaml
env:
  - name: SKILL_SERVER_URL
    value: "http://127.0.0.1:8002"
  - name: SKILLS_DIR
    value: "/skills-shared/blog/SKILL.md"
  - name: COPILOT_GITHUB_TOKEN
    valueFrom:
      secretKeyRef:
        name: blog-agent-secret
        key: copilot-github-token
```

Key design decisions explained:

  • COPILOT_GITHUB_TOKEN is stored in a Kubernetes Secret — never baked into images or passed as build arguments. Using the GitHub Copilot SDK requires a valid GitHub Copilot subscription (unless using BYOK mode, i.e., Bring Your Own Key), making secure management of this token critical.
  • SKILLS_DIR points to skill files synchronized to a shared volume by the other Sidecar. This means the Copilot agent container image is completely stateless and can be reused across different skill configurations.
  • SKILL_SERVER_URL uses 127.0.0.1 instead of a service name — since this is intra-Pod communication, DNS resolution is unnecessary.

🔐 Production Security Tip: For stricter security requirements, consider using External Secrets Operator to sync Secrets from AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault, rather than managing them directly in Kubernetes. Native Kubernetes Secrets are only Base64-encoded by default, not encrypted at rest (unless Encryption at Rest is enabled).

CopilotClient Sessions and Skill Integration

The core of the Copilot Sidecar lies in how it creates sessions with skill directories. When a blog generation request is received, it creates a session with access to skill definitions:

```python
session = await copilot_client.create_session({
    "model": "claude-sonnet-4-5-20250929",
    "streaming": True,
    "skill_directories": [SKILLS_DIR],
})
```

The skill_directories parameter points to files on the shared volume — files placed there by the skill-server sidecar. This is the handoff point: the skill server manages which skills are available, and the Copilot agent consumes them. Neither container needs to know about the other's internal implementation — they are coupled only through the filesystem as an implicit contract.

💡 About Copilot SDK Skills: The GitHub Copilot SDK allows you to define custom Agents, Skills, and Tools. Skills are essentially instruction sets written in Markdown format (typically named SKILL.md) that define the agent's behavior, constraints, and workflows in a specific domain. This is consistent with the .copilot_skills/ directory mechanism in GitHub Copilot CLI.

File-Based Output to Shared Volumes

Generated blog posts are written to the blog-data shared volume, which is simultaneously mounted in the Nginx container:

```python
BLOG_DIR = os.path.join(WORK_DIR, "blog")
# ...
# Blog saved as blog-YYYY-MM-DD.md
# Nginx can serve it immediately from /blog/ without any restart
```

The Nginx configuration auto-indexes this directory:

```nginx
location /blog/ {
    alias /usr/share/nginx/html/blog/;
    autoindex on;
}
```

The moment the Copilot agent writes a file, it's immediately accessible through the Nginx Web UI. No API calls, no database writes, no cache invalidation — just a shared filesystem.

This file-based data transfer has an additional benefit: natural persistence and auditability. Each blog exists as an independent Markdown file with a date-timestamp in its name, making it easy to trace generation history. (Note, however, that emptyDir lifecycle is tied to the Pod — data is lost when the Pod is recreated. For persistence needs, see the "Production Recommendations" section below.)
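The date-stamped naming convention described above could be implemented along these lines. `save_blog` is a hypothetical helper for illustration, not the project's actual function:

```python
from datetime import date
from pathlib import Path

def save_blog(markdown: str, work_dir: str) -> Path:
    """Write a generated post as blog-YYYY-MM-DD.md under <work_dir>/blog.

    Hypothetical helper illustrating the naming convention; the agent's
    real implementation may differ.
    """
    blog_dir = Path(work_dir) / "blog"
    blog_dir.mkdir(parents=True, exist_ok=True)
    target = blog_dir / f"blog-{date.today().isoformat()}.md"
    target.write_text(markdown, encoding="utf-8")
    return target
```

Because the file name encodes the generation date, listing the directory doubles as an audit trail of what was produced when.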

Advantage 2: Skill Server as a Sidecar

The skill server is the second Sidecar — a lightweight FastAPI service responsible for managing the skill definitions used by the Copilot agent. Separating skill management into its own container offers clear advantages.

Decoupled Skill Lifecycle

Skill definitions are stored in a Kubernetes ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: blog-agent-skill
data:
  SKILL.md: |
    # Blog Generator Skill Instructions

    You are a professional technical evangelist...

    ## Key Requirements
    1. Outline generation
    2. Mandatory online research (DeepSearch)
    3. Technical evangelist perspective
    ...
```

ConfigMaps can be updated independently of any container image. When you run kubectl apply to update a ConfigMap, Kubernetes synchronizes the change to the volumes mounted in the Pod.

⚠️ Important Detail: ConfigMap volume updates do not take effect immediately. The kubelet detects ConfigMap changes through periodic synchronization, with a default sync period controlled by --sync-frequency (default: 1 minute), plus the ConfigMap cache TTL. The actual propagation delay can be 1–2 minutes. If immediate effect is needed, you must actively call the /sync endpoint to trigger a file synchronization:

```python
def sync_skills():
    """Copy skill files from ConfigMap source to the shared volume."""
    source = Path(SKILLS_SOURCE_DIR)
    dest = Path(SKILLS_SHARED_DIR) / "blog"
    dest.mkdir(parents=True, exist_ok=True)
    synced = 0
    for skill_file in source.iterdir():
        if skill_file.is_file():
            target = dest / skill_file.name
            shutil.copy2(str(skill_file), str(target))
            synced += 1
    return synced
```

This design means: updating AI behavior requires no container image rebuilds or redeployments. You simply update the ConfigMap, trigger a sync, and the agent's behavior changes. This is a tremendous operational advantage for iterating on prompts and skills in production.

💡 Advanced Thought: Why not mount the ConfigMap directly to the copilot-agent's SKILLS_DIR path? While technically feasible, introducing the skill-server as an intermediary provides the triple value of validation, API access, and extensibility (see "Why Not Embed Skills in the Copilot Agent" below).

Minimal Resource Footprint

The skill server does one thing — serve and sync files. Its resource requirements reflect this:

```yaml
resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 256Mi
```

Compared to the Copilot agent's 2Gi memory limit, the skill server costs a fraction of the resources. This is the beauty of the Sidecar pattern — you can add lightweight containers for auxiliary functionality without significantly increasing the Pod's total resource consumption.

REST API for Skill Introspection

The skill server provides a simple REST API that allows external systems or operators to query available skills:

```python
@app.get("/skills")
async def list_skills():
    """List all available skills."""
    source = Path(SKILLS_SOURCE_DIR)
    skills = []
    for f in sorted(source.iterdir()):
        if f.is_file():
            skills.append({
                "name": f.stem,
                "filename": f.name,
                "size": f.stat().st_size,
                "url": f"/skill/{f.name}",
            })
    return {"skills": skills, "total": len(skills)}

@app.get("/skill/{filename}")
async def get_skill(filename: str):
    """Get skill content by filename."""
    file_path = Path(SKILLS_SOURCE_DIR) / filename
    if not file_path.exists() or not file_path.is_file():
        raise HTTPException(status_code=404, detail=f"Skill '{filename}' not found")
    return {"filename": filename, "content": file_path.read_text(encoding="utf-8")}
```

This API serves multiple purposes:

  • Debugging: Verify which skills are currently loaded without needing to kubectl exec into the container, significantly lowering the troubleshooting barrier.
  • Monitoring: External tools can poll /skills to ensure the expected skill set is deployed. Combined with Prometheus Blackbox Exporter, you can implement configuration drift detection.
  • Extensibility: Future systems can dynamically register or update skills via the API, providing a foundation for A/B testing different prompt strategies.
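The drift check hinted at in the Monitoring bullet reduces to a set comparison once the `/skills` response is parsed. A sketch, with the expected skill set as an assumed input:

```python
def skill_drift(skills_payload: dict, expected: set[str]) -> set[str]:
    """Compare a /skills response against an expected skill set.

    Returns names present on one side but not the other; an empty set
    means no drift. The payload shape matches the list_skills response
    shown above.
    """
    deployed = {s["name"] for s in skills_payload["skills"]}
    return deployed ^ expected

payload = {"skills": [{"name": "SKILL"}], "total": 1}
print(skill_drift(payload, {"SKILL"}))  # → set()
```

An external poller would fetch `/skills` over HTTP and alert whenever this function returns a non-empty set.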

Why Not Embed Skills in the Copilot Agent?

Mounting the ConfigMap directly into the Copilot agent container seems simpler. But separating it into a dedicated Sidecar has the following advantages:

  1. Validation layer: The skill server can validate skill file format and content before synchronization, preventing invalid skill definitions from causing Copilot SDK runtime errors.
  2. API access: Skills become queryable and manageable through a REST interface, supporting operational automation.
  3. Independent evolution of logic: If skill management becomes more complex (e.g., dynamic skill registration, version management, prompt A/B testing, role-based skill distribution), the skill server can evolve independently without affecting the Copilot agent.
  4. Clear data flow: ConfigMap → skill-server → shared volume → copilot-agent. Each arrow is an explicit, observable step. When something goes wrong, you can pinpoint exactly which stage failed.
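Point 1, the validation layer, could start as a pre-sync check that rejects obviously broken skill files. The specific checks here are hypothetical — the real skill-server could enforce any policy:

```python
def validate_skill(content: str) -> list[str]:
    """Return a list of problems; an empty list means the skill file is acceptable.

    Hypothetical checks for illustration: non-empty, starts with a Markdown
    heading, and stays under a size cap.
    """
    problems = []
    text = content.strip()
    if not text:
        problems.append("skill file is empty")
    elif not text.startswith("#"):
        problems.append("skill file should start with a Markdown heading")
    if len(text) > 64 * 1024:
        problems.append("skill file exceeds 64 KiB")
    return problems

print(validate_skill("# Blog Generator Skill\nYou are a professional..."))  # → []
```

Running such a check before `sync_skills` copies files means a malformed ConfigMap update never reaches the Copilot agent.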

💡 Architectural Trade-off: For small-scale deployments or PoC (Proof of Concept) work, directly mounting the ConfigMap to the Copilot agent is a perfectly reasonable choice — fewer components means lower operational overhead. The Sidecar approach's value becomes fully apparent in medium-to-large-scale production environments. Architectural decisions should always align with team size, operational maturity, and business requirements.

End-to-End Workflow

Here is the complete data flow when a user requests a blog post generation:

Every step uses intra-Pod communication — localhost HTTP calls or shared filesystem reads. No external network calls are needed between components. The only external dependency is the Copilot SDK's connection to GitHub authentication services and AI model endpoints via the Copilot CLI.

The Kubernetes Service exposes three ports for external access:

```yaml
ports:
  - name: http        # Nginx UI + reverse proxy
    port: 80
    nodePort: 30081
  - name: agent-api   # Direct access to Copilot Agent
    port: 8001
    nodePort: 30082
  - name: skill-api   # Direct access to Skill Server
    port: 8002
    nodePort: 30083
```

⚠️ Security Warning: In production, it is not recommended to directly expose the agent-api and skill-api ports via NodePort. These two APIs should only be accessible through the Nginx reverse proxy (/agent/ and /skill/ paths), with authentication and rate limiting configured at the Nginx layer. Directly exposing Sidecar ports bypasses the reverse proxy's security controls. Recommended configuration:

```yaml
# Production recommended: only expose the Nginx port
ports:
  - name: http
    port: 80
    targetPort: 80
# Combine with NetworkPolicy to restrict inter-Pod communication
```

Production Recommendations and Architecture Extensions

When moving this architecture from a development/demo environment to production, the following areas deserve attention:

Cross-Platform Build and Deployment

The project's Makefile auto-detects the host architecture to select the appropriate Docker build platform, eliminating the need for manual configuration:

```makefile
ARCH := $(shell uname -m)
ifeq ($(ARCH),x86_64)
  DOCKER_PLATFORM ?= linux/amd64
else ifeq ($(ARCH),aarch64)
  DOCKER_PLATFORM ?= linux/arm64
else ifeq ($(ARCH),arm64)
  DOCKER_PLATFORM ?= linux/arm64
else
  DOCKER_PLATFORM ?= linux/amd64
endif
```

Both macOS and Linux are supported as development environments with dedicated tool installation targets:

```shell
# macOS (via Homebrew)
make install-tools-macos

# Linux (downloads official binaries to /usr/local/bin)
make install-tools-linux
```

The Linux installation target downloads kubectl and kind binaries directly from upstream release URLs with architecture-aware selection, avoiding dependency on any package manager beyond curl and sudo. This makes the setup portable across different Linux distributions (Ubuntu, Debian, Fedora, etc.).

Health Checks and Probe Configuration

Configure complete probes for each container to ensure Kubernetes can properly manage container lifecycles:

```yaml
# copilot-agent probe example
livenessProbe:
  httpGet:
    path: /health
    port: 8001
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /health
    port: 8001
  periodSeconds: 10
startupProbe:           # AI agent startup may be slow
  httpGet:
    path: /health
    port: 8001
  periodSeconds: 5
  failureThreshold: 30  # Allow up to 150 seconds for startup
```

Data Persistence

The emptyDir lifecycle is tied to the Pod. If generated blogs need to survive Pod recreation, consider these approaches:

  • PersistentVolumeClaim (PVC): Replace the blog-data volume with a PVC; data persists independently of Pod lifecycle
  • Object storage upload: After the Copilot agent generates a blog, asynchronously upload to S3/Azure Blob/GCS
  • Git repository push: Automatically commit and push generated Markdown files to a Git repository for versioned management

Security Hardening

```yaml
# Set security context for each container
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true   # Only write through emptyDir
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
```

Observability Extensions

The Sidecar pattern is naturally suited for adding observability components. You can add a third (or fourth) Sidecar to the same Pod for log collection, metrics export, or distributed tracing:

Horizontal Scaling Strategy

Since containers within a Pod scale together, HPA scaling granularity is at the Pod level. This means:

  • If the Copilot agent is the bottleneck, scaling Pod replicas also scales Nginx and skill-server (minimal waste since they are lightweight)
  • If skill management becomes compute-intensive in the future, consider splitting the skill-server from a Sidecar into an independent Deployment + ClusterIP Service for independent scaling

Evolution Path from Sidecar to Microservices

The dual Sidecar architecture provides a clear path for future migration to microservices:

Each migration step only requires changing the communication method (localhost → Service DNS); business logic remains unchanged. This is the architectural flexibility that good separation of concerns provides.

sample code - https://github.com/kinfey/Multi-AI-Agents-Cloud-Native/tree/main/code/GitHubCopilotSideCar

Summary

The dual Sidecar pattern in this project demonstrates a clean cloud-native AI application architecture:

  • Main container (Nginx) stays lean and focused — it only serves HTML and proxies requests. It knows nothing about AI or skills.
  • Sidecar 1 (Copilot Agent) encapsulates all AI logic. It uses the GitHub Copilot SDK, manages sessions, and generates content. Its only coupling to the rest of the Pod is through environment variables and shared volumes. The container image is built with cross-platform support — Node.js is installed from official binaries with automatic architecture detection, ensuring the same Dockerfile works on both amd64 and arm64 platforms.
  • Sidecar 2 (Skill Server) provides a dedicated management layer for AI skill definitions. It bridges Kubernetes-native configuration (ConfigMap) with the Copilot SDK's runtime needs.

This separation gives you independent deployability, isolated failure domains, and — most importantly — the ability to change AI behavior (skills, prompts, models) without rebuilding any container images.

The Sidecar pattern is more than an architectural curiosity; it is a practical approach to composing AI services in Kubernetes, allowing each component to evolve at its own pace. With cross-platform build support (macOS and Linux, amd64 and arm64), Kubernetes native Sidecars reaching GA in 1.33, and AI development tools like the GitHub Copilot SDK maturing, we anticipate this "AI agent + Sidecar" combination pattern will see validation and adoption in more production environments.



Stop Drawing Architecture Diagrams Manually: Meet the Open-Source AI Architecture Review Agents

1 Share

Hey everyone! I am Shivam Goyal, a Microsoft MVP, and I am super excited to share a project that is going to save you a massive amount of time.

Designing software architecture is arguably one of the most creative and enjoyable parts of engineering. Documenting it, reviewing it for security flaws, and keeping the diagrams updated as the system evolves? Not so much.

We have all been there. You sketch out a brilliant microservices architecture on a whiteboard, take a blurry photo of it, and spend the next three hours wrestling with boxes, arrows, and alignment tools. By the time you finally get to the actual security and risk review, the architecture has already changed.

What if you could just explain your system in plain English, or point a tool to a messy README, and instantly get a prioritized risk assessment, actionable recommendations, and an editable architecture diagram?

Enter the Architecture Review Agent, an open-source AI sample my team and I built with the Microsoft Agent Framework, Azure OpenAI, and Excalidraw MCP.

What is the Architecture Review Agent?

At its core, the Architecture Review Agent is an automated pipeline that takes architectural descriptions in almost any format and transforms them into structured insights and visual maps.

Whether you feed it a strictly formatted YAML file, a Markdown design doc, or just a brain dump like: "We have a React frontend hitting a Kong gateway, which routes to three microservices, each with its own Postgres DB," the agent processes it in seconds.

Here is what you get back:

  • An Interactive Excalidraw Diagram: No more static, uneditable images. The agent renders a fully interactive diagram via Excalidraw MCP that you can immediately tweak right in your browser.
  • Prioritized Risk Analysis: An automated assessment of Single Points of Failure (SPOFs), scalability bottlenecks, security gaps, and architectural anti-patterns.
  • Component Dependency Mapping: A detailed breakdown of fan-in and fan-out metrics, plus detection of orphaned components. 

See it in action: Check out this end-to-end review of an architecture, from file upload to risk detection and interactive diagram generation.

Why You Should Add It to Your Workflow

I wanted this agent to adapt to how developers actually work, rather than forcing you to learn a new proprietary diagramming language.

1. Smart Input Intelligence

The agent works with what you already have. If you pass it structured YAML or Markdown, it uses a lightning-fast rule-based parser. If you pass it unstructured text, code files, or meeting notes, it automatically falls back to Azure OpenAI (we highly recommend GPT-4.1) to intelligently infer the components, their types, and how they connect.
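Conceptually, that fallback is a small dispatcher. The sketch below is purely illustrative — the detection heuristic and the function names (`parse_structured`, `infer_with_llm`) are placeholders, not the sample's real implementation.

```python
def looks_structured(text: str) -> bool:
    """Cheap heuristic: YAML and Markdown inputs usually open with a
    document marker, a heading, or a `key:` line."""
    stripped = text.lstrip()
    if not stripped:
        return False
    first_line = stripped.splitlines()[0]
    return first_line.startswith(("---", "#")) or ":" in first_line

def parse_structured(text: str) -> dict:
    # Placeholder for the fast, rule-based parser.
    return {"parser": "rules", "raw": text}

def infer_with_llm(text: str) -> dict:
    # Placeholder for the Azure OpenAI fallback.
    return {"parser": "llm", "raw": text}

def parse_architecture(text: str) -> dict:
    """Route structured input to the rule-based path and
    free-form text to the LLM fallback."""
    if looks_structured(text):
        return parse_structured(text)
    return infer_with_llm(text)
```

A brain dump like "We have a React frontend hitting a Kong gateway" would take the LLM path, while a YAML design doc would take the rules path.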

2. Actionable, Context-Aware Reviews

This isn't just about drawing boxes. The AI analyzes your data flow to flag real-world issues. It will warn you about shared database anti-patterns, highlight missing API gateways, or point out infrastructure components that lack redundancy. The risks are bucketed by severity (Critical to Low) so you know exactly what to tackle first.
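That severity bucketing amounts to a simple sort. A minimal sketch — the severity labels come from the post, but the findings and data shapes are invented for illustration:

```python
# Lower number = higher priority; labels match the post's buckets.
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def prioritize(risks):
    """Sort risk findings so Critical items surface first."""
    return sorted(risks, key=lambda r: SEVERITY_ORDER[r["severity"]])

risks = [
    {"severity": "Low", "finding": "No staging environment documented"},
    {"severity": "Critical", "finding": "Single Postgres instance is a SPOF"},
    {"severity": "Medium", "finding": "Services share one database"},
]
ordered = prioritize(risks)
# ordered[0] is the Critical SPOF finding.
```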

A Quick Note on AI Recommendations: While the agent is incredibly powerful, it is designed to be a co-pilot for your architecture team, not a replacement for human expertise. Always treat the AI-generated risk assessments and recommendations as a starting point. They are an amazing tool to accelerate your review process, but you should always verify the findings and conduct formal security audits with your human experts!

3. Exports That Actually Matter

Need a slide for your next architecture review board? Grab the high-res PNG export. Need your team to collaborate and refine the design? Download the .excalidraw JSON file or edit it directly in the React web UI.

Deploy It Your Way: Featuring Microsoft Foundry Hosted Agents

The repository ships with scripts to get you up and running immediately. You have two production-ready deployment paths: a traditional full-stack web app, or my absolute favourite approach, a Hosted Agent via Microsoft Foundry.

Option A: Full-Stack Web App (Azure App Service)

This is perfect if your team wants a custom, drag-and-drop React web interface. This path deploys a FastAPI backend and a React frontend to Azure App Service, giving you full ownership over the API surface and the UI.

Option B: The Future of Zero-Ops AI (Microsoft Foundry Hosted Agents)

If you want to build a scalable, enterprise-grade API without wrestling with infrastructure, Hosted Agents in the Microsoft Foundry Agent Service (currently in preview) are the way to go.

Recently introduced in preview, Hosted Agents allow you to bring your own agent code (built with the Microsoft Agent Framework) and run it as a fully managed containerized service. Microsoft Foundry handles the heavy lifting so you can focus purely on your agent's logic.

Here is why deploying the Architecture Review Agent on Microsoft Foundry is a complete game changer:

  • Zero-Ops Infrastructure: The platform automatically builds your container via ACR Tasks and manages the compute. It scales seamlessly from 0 to 5 replicas, including scaling to 0 to save costs when idle.
  • Built-in Conversation Persistence: You do not need to build your own database to remember chat history. The Foundry Agent Service natively manages conversation state across requests.
  • Enterprise Security Out-of-the-Box: Say goodbye to hardcoding API keys. Hosted Agents use system-assigned Managed Identities (Entra ID) with Role-Based Access Control (RBAC).
  • Publish Anywhere: Once deployed to Foundry, you can publish your agent directly to Microsoft Teams or Microsoft 365 Copilot with no extra code required. Your team can literally ask Copilot in Teams to review an architecture spec!
  • Seamless VS Code Deployment: We have integrated this sample with the Microsoft Foundry for VS Code extension. Deploying to the cloud is as simple as opening the Command Palette, running Microsoft Foundry: Deploy Hosted Agent, and following the prompts.

Get Started in 5 Minutes

The project is completely open-source and waiting for you to test it out. If you have Python 3.11+ and access to Azure OpenAI or a Microsoft Foundry project, you can generate your first architecture review right now.

Just clone the repository, run the setup script, and try feeding it your messiest system architecture description.

GitHub Repo: Azure-Samples/agent-architecture-review-sample

Learn More & Let's Connect!

Building this agent has been an incredible journey, and I truly believe tools like this are the future of how we design and review software. But this is just the beginning, and I would love for you to be a part of it.

If you want to dive deeper into the technology stack powering the Architecture Review Agent, here are some fantastic resources to get you started:

I want to hear from you. Whether you are deploying this for your enterprise, hacking on it over the weekend, or have a cool idea for a new feature, I would love to connect.

Let me know what you think in the comments below, and happy architecting!

Read the whole story
alvinashcraft
8 hours ago
reply
Pennsylvania, USA
Share this story
Delete

How to Access a Shared OneDrive Folder in Azure Logic Apps


What is the problem?

A common enterprise automation scenario involves copying files from a OneDrive folder shared by a colleague into another storage service such as SharePoint or Azure Blob Storage using Azure Logic Apps.

However, when you configure the OneDrive for Business – “List files in folder” action in a Logic App, you quickly run into a limitation:

  • The folder picker only shows:
    • Root directory
    • Subfolders of the authenticated user’s OneDrive
  • Shared folders do not appear at all, even though you can access them in the OneDrive UI

This makes it seem like Logic Apps cannot work with shared OneDrive folders—but that’s not entirely true.

(Screenshot: the Logic App "List files in folder" action from the OneDrive for Business connector, showing only the user's own folders when the file picker is opened.)

Why this happens

The OneDrive for Business connector is user‑context scoped. It only enumerates folders that belong to the signed-in user’s drive and does not automatically surface folders that are shared with the user.

Even though shared folders are visible under “Shared with me” in the OneDrive UI, they:

  • Live in a different drive
  • Have a different driveId
  • Require explicit identification before Logic Apps can access them

How to access a shared OneDrive folder

There are two supported ways to access a shared OneDrive directory from Logic Apps.

Option 1: Use Microsoft Graph APIs (Delegated permissions)

You can invoke Microsoft Graph APIs directly using the HTTP with Microsoft Entra ID (preauthorized) connector.

This requires:

  • Admin consent or delegated consent workflows
  • Additional Entra ID configuration

📘 Reference: HTTP with Microsoft Entra ID (preauthorized) - Connectors | Microsoft Learn

While powerful, this approach adds setup complexity.

 

Option 2: Use Graph Explorer to configure the OneDrive connector

Instead of calling Graph from Logic Apps directly, you can:

  1. Use Graph Explorer to discover the shared folder metadata
  2. Manually configure the OneDrive action using that metadata

 

Step-by-step: Using Graph Explorer to access a shared folder

Scenario

A colleague has shared a OneDrive folder named “Test” with me, and I need to process files inside it using a Logic App.

Step 1: List shared folders using Microsoft Graph

In Graph Explorer, run the following request:

GET https://graph.microsoft.com/v1.0/users/{OneDrive shared folder owner username}/drive/root/children

📘Reference: List the contents of a folder - Microsoft Graph v1.0 | Microsoft Learn

 

✅ This returns the root-level folders in the owner's drive that are visible to the signed-in user, including the folder shared with you.

From the response, locate the shared folder. You only need two values:

  • parentReference.driveId
  • id (folder ID)
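If you prefer to script this step rather than eyeball the JSON, the extraction is a couple of dictionary lookups. The response below is a trimmed, fabricated example shaped like a Graph `driveItem` children listing (real IDs will differ), and the folder name "Test" matches the scenario above:

```python
import json

# Trimmed, fabricated example shaped like a Graph "children" response.
response = json.loads("""
{
  "value": [
    {
      "id": "01ABCDEF2GHIJKLMNOPQ",
      "name": "Test",
      "folder": {"childCount": 4},
      "parentReference": {"driveId": "b!A1b2C3d4E5f6"}
    }
  ]
}
""")

# Locate the shared folder by name and pull the two values we need.
shared = next(item for item in response["value"] if item["name"] == "Test")
drive_id = shared["parentReference"]["driveId"]
folder_id = shared["id"]

# The {driveId}.{folderId} value the Logic App action expects (Step 2):
folder_value = f"{drive_id}.{folder_id}"
```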

(Screenshot: Graph Explorer request listing the files and folders in the specified user's root drive.)

 

Step 2: Configure Logic App “List files in folder” action

In your Logic App:

  1. Add OneDrive for Business → List files in folder
  2. Do not use the folder picker
  3. Manually enter the folder value using this format: {driveId}.{folderId}

 

Once saved, the action successfully lists files from the shared OneDrive folder.

Step 3: Build the rest of your workflow

After the folder is resolved correctly:

  • You can loop through files
  • Copy them to SharePoint
  • Upload them to Azure Blob Storage
  • Apply filters, conditions, or transformations

All standard OneDrive actions now work as expected.

Troubleshooting: When Graph Explorer doesn’t help

If you’re unable to find the driveId or folderId via Graph Explorer, there’s a reliable fallback.

Use browser network tracing

  1. Open the shared folder in OneDrive (web)
  2. Open Browser Developer Tools → Network
  3. Look for the request that loads the folder contents (Screenshot: browser network trace showing where to find the required driveId and folderId)
  4. In the response payload, extract:
    • CurrentFolderUniqueId → folder ID
    • drives/{driveId} from the CurrentFolderSpItemUrl

This method is very effective when Graph results are incomplete or filtered.
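If you would rather script this extraction too, both fields fall out with a dictionary lookup and one regular expression. The payload below is a fabricated stand-in for the real trace values:

```python
import re

# Fabricated stand-ins for the two values captured from the network trace.
payload = {
    "CurrentFolderUniqueId": "{7f3c1a2b-0d4e-4f6a-9b8c-123456789abc}",
    "CurrentFolderSpItemUrl": (
        "https://contoso-my.sharepoint.com/_api/v2.0/"
        "drives/b!A1b2C3d4E5f6/items/01ABCDEF2GHIJKLMNOPQ"
    ),
}

# CurrentFolderUniqueId is the folder ID, wrapped in braces.
folder_id = payload["CurrentFolderUniqueId"].strip("{}")

# The driveId is the path segment after "drives/" in CurrentFolderSpItemUrl.
match = re.search(r"drives/([^/]+)", payload["CurrentFolderSpItemUrl"])
drive_id = match.group(1)
```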


How 'be like' took over the world, with Sali Tagliamonte


1163. This week, we look at what it’s like to be a "language detective" with Sali Tagliamonte and how she used her own teenagers as a research lab. We look at a 25-year study on how the phrase "be like" became a permanent fixture of English, why the word "very" is suddenly making a comeback with younger generations, and what happens to our language when we spend all day talking to AI.

Sali Tagliamonte, University of Toronto

🔗 Join the Grammar Girl Patreon.

Thank you to the members of the Order of the Aardvark at Patreon:

  • Linda Cox
  • Laurel Paul
  • Russ Skinner

🔗 Share your familect recording in Speakpipe or by leaving a voicemail at 833-214-GIRL (833-214-4475)

🔗 Watch my LinkedIn Learning writing courses.

🔗 Subscribe to the newsletter.

🔗 Take our advertising survey

🔗 Get the edited transcript.

🔗 Get Grammar Girl books

| HOST: Mignon Fogarty

| Grammar Girl is part of the Quick and Dirty Tips podcast network.

  • Audio Engineer: Dan Feierabend, Maram Elnagheeb
  • Director of Podcast: Holly Hutchings
  • Advertising Operations Specialist: Morgan Christianson
  • Marketing and Video: Nat Hoopes, Rebekah Sebastian
  • Podcast Associate: Maram Elnagheeb

| Theme music by Catherine Rannus.

| Grammar Girl Social Media: YouTube, TikTok, Facebook, Threads, Instagram, LinkedIn, Mastodon, Bluesky.


Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.





Download audio: https://dts.podtrac.com/redirect.mp3/media.blubrry.com/grammargirl/stitcher.simplecastaudio.com/e7b2fc84-d82d-4b4d-980c-6414facd80c3/episodes/6a7d13cd-d64c-439b-9705-4e38a6285c00/audio/128/default.mp3?aid=rss_feed&awCollectionId=e7b2fc84-d82d-4b4d-980c-6414facd80c3&awEpisodeId=6a7d13cd-d64c-439b-9705-4e38a6285c00&feed=XcH2p3Ah

#716 – Electronics Manufacturing History with David Ray


Thanks to our sponsor for this episode, SeaSats! Check out their open positions making autonomous ocean vehicles.

Welcome David Ray of Cyber City Circuits

• The “Retro Electro” Series: David explains his passion for writing historical articles for DigiKey, focusing on “giants” like Ørsted, whose contributions to electricity are often overlooked.
• Career Background: David details his path from Marine Corps radio repair to cash register and Motorola radio repair.
• Starting the Business: In late 2019, David cashed in his retirement to buy pick-and-place machines and start his own factory.
• Teaching the First Lady: David recounts the story of teaching First Lady Jill Biden how to solder during a summer camp.
• Growth via Twitter: For the first few years, 95% of his revenue came from relationships built on Twitter (X).
• The Kit Business: David discusses his “Soldering Kit of the Month” program, noting that while fun, the kit business is exhausting and low-margin.
• Equipment & Machines: A discussion on why he uses Charm High machines and his strong advice to buy new equipment rather than used industrial machines, which are often sold because they are “used up”.
• Stencils & Paste: David advocates for framed stencils and GC10 solder paste, which is shelf-stable and prevents cold solder joints.
• Soldering Physics: Insights into the thermodynamics of soldering, especially the difficulty of working with 2 oz copper boards.
• John Fluke History: David previews his research on John Fluke, explaining that Fluke meters became yellow because the Navy had trouble finding gray ones on the ground.
• Upcoming Articles: David mentions future work on the history of Op-amps and strain gauges.
• Business Services: Overview of Cyber City Circuits’ services, including reverse engineering, obsolescence engineering, and free DFM (Design for Manufacturing) consulting.
• Success Philosophy: David shares his “Monopoly mindset,” viewing business setbacks as “chance cards,” and stresses that persistence is the only way to avoid failure.





Download audio: https://traffic.libsyn.com/theamphour/TheAmpHour-716-DavidRay.mp3

AWS CloudWatch Finally Hits Snooze


Welcome to episode 343 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, and Matt are in the studio this week bringing you all the latest in Cloud and AI news, including some of the smaller clouds like Cloudflare and Crusoe Cloud, as well as announcements from the big guys like Google’s Gemini DeepThink, Anthropic’s big pay day, and Microsoft’s Notepad problem. We’ve got all this plus Matt screwing up his outro AGAIN, so let’s get started! 

Titles we almost went with this week

  • Chrome’s WebMCP Protocol: Teaching AI Agents to Stop Doom-Scrolling the DOM and Actually Get Work Done
  • Claude Enterprise Self-Service: Because Sometimes You Just Want to Buy AI Without Small Talk
  • AWS EC2 Goes Inception Mode: Now You Can Virtualize Your Virtualization Without Going Broke
  • Amazon EC2 Nested Virtualization: Because Your Virtual Machine Was Lonely and Needed Its Own Virtual Machine
  • CloudWatch Alarm Mute Rules: Because Your Deployment Doesn’t Need a Standing Ovation at 3 AM
  • Anthropic’s $380 Billion Valuation Proves AI Funding Has Gone Claude Nine
  • AWS EC2 Nested Virtualization Finally Escapes the Expensive Hardware Jail
  • Cloudflare Teaches AI Agents the Magic Words: Accept text/markdown and Save 13,000 Tokens
  • Crusoe Cloud’s MCP Server: Teaching AI Assistants to Stop Asking for the Manager and Just Fix Your Infrastructure
  • Azure’s New Agentic Copilot: Because Manually Clicking Through Dashboards Was So 2023
  • Chrome’s WebMCP Gives AI Agents a GPS for Websites Because Apparently They’ve Been Lost in the HTML This Whole Time 
  • Anthropic Cuts Out the Middleman: Claude Enterprise Now Available Without the Enterprise Sales Dance
  • AWS Gives CloudWatch the Silent Treatment: New Mute Rules Let Alarms Sleep Through Maintenance Windows
  • AWS CloudWatch Hits Snooze: Mute Rules End On-Call Nightmares
  • AWS Gives CloudWatch the Silent Treatment

General News 

00:45 Bloat Risk? Microsoft’s Notepad Upgrade Also Introduced a Vulnerability | PCMag

  • Microsoft’s recent Notepad modernization introduced CVE-2026-20841, a vulnerability in the new Markdown support feature that allows malicious links in files to execute remote code. 
  • The flaw has been patched in the February 2026 security updates, but it highlights the security trade-offs when adding features to historically simple applications.
  • The vulnerability exploits Notepad’s Markdown rendering capability, which Microsoft added in May to support lightweight markup language formatting. When Notepad opens a specially crafted Markdown file, embedded malicious links can trigger unverified protocols that load and execute remote files on the system.
  • This incident raises questions about feature bloat in core Windows utilities, particularly as Microsoft continues adding network-dependent capabilities like AI-powered text writing to Notepad. Security researchers are debating whether basic text editors should have network functionality at all, given the expanded attack surface.
  • The vulnerability demonstrates how modernization efforts can introduce security risks in previously low-risk applications. 
  • Organizations using Windows need to ensure their systems receive the February 2026 security updates to address this specific flaw in Notepad’s Markdown implementation.

02:04 Matt – “I’m just confused why they didn’t use Copilot on their pull request in order to identify this as a potential bug. I feel like it should have found it. Just sayin’…”  

03:13 WebMCP is available for early preview

  • Chrome is introducing WebMCP, a standardized protocol that lets websites expose structured tools and actions directly to AI agents, eliminating the need for agents to parse raw HTML and DOM elements. 
  • This addresses a key reliability problem in agentic workflows where AI agents currently struggle with inconsistent web interactions.
  • The protocol offers two interaction modes: a declarative API for simple HTML form-based actions and an imperative API for complex JavaScript-driven workflows. This dual approach lets websites define exactly how agents should interact with features like booking systems, support ticket forms, and checkout processes.
  • Early use cases focus on high-value transactional workflows, including e-commerce product configuration, travel booking with complex filtering requirements, and automated customer support ticket creation with technical details. These scenarios benefit most from structured interactions versus unreliable DOM manipulation.
  • The early preview program requires sign-up for access to documentation and demos, indicating this is still in experimental stages. 
  • Developers interested in making their sites agent-ready will need to implement these new APIs to participate in the agentic web ecosystem Chrome is building.
  • This represents Chrome’s attempt to standardize how AI agents interact with websites before the market fragments with competing approaches. Sites that adopt WebMCP early may gain advantages as browser-based AI agents become more prevalent.
  • Interested in signing up for the preview? You can do that here

04:41 Ryan – “It makes a lot of sense why they want to standardize on a specific protocol, but I can’t help but feel like this is the beginning of the end of human interaction; where you’re going to have an AI agent-to-agent protocol.” 

AI Is Going Great – Or How ML Makes Money 

07:27 Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation \ Anthropic

  • Anthropic closed a $30 billion Series G at a $380 billion post-money valuation, reaching $14 billion in run-rate revenue with 10x annual growth for three consecutive years. 
  • The company now serves eight of the Fortune 10, with over 500 customers spending more than $1 million annually.
  • Claude Code, made generally available in May 2025, has grown to $2.5 billion in run-rate revenue and now accounts for 4% of all public GitHub commits worldwide. Business subscriptions quadrupled since early 2026, with enterprise customers representing over half of Claude Code’s revenue.
  • Opus 4.6 launched last week as the latest model release, leading the GDPval-AA benchmark for economically valuable knowledge work in finance and legal domains. The model powers agents capable of generating professional documents, spreadsheets, and presentations autonomously.
  • Anthropic expanded its product portfolio in January with over thirty launches, including Cowork, which extends Claude Code capabilities to broader knowledge work with eleven open-source plugins for specialized roles. 
  • Claude for Enterprise is now HIPAA-compliant and available for healthcare and life sciences organizations.
  • Claude remains the only frontier AI model available across all three major cloud platforms through AWS Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry
  • The company trains on diversified hardware, including AWS Trainium, Google TPUs, and NVIDIA GPUs, to optimize workload performance and resilience.

08:10 Matt – “Those numbers are insane. I just want to make sure we’re all clear about that.” 

15:16 Introducing Sonnet 4.6 \ Anthropic

  • Claude Sonnet 4.6 is now generally available across all Claude plans, API, and major cloud platforms at the same pricing as Sonnet 4.5 ($3/$15 per million tokens), with a 1M token context window in beta. 
  • The model now serves as the default for Free and Pro plan users, bringing Opus-class performance to a mid-tier price point.
  • Computer use capabilities have improved substantially, with Sonnet 4.6 scoring 94% on insurance benchmarks and showing human-level performance on tasks like navigating complex spreadsheets and multi-step web forms. 
  • The model demonstrates better resistance to prompt injection attacks compared to Sonnet 4.5 and performs similarly to Opus 4.6 on safety evaluations.
  • Coding performance has advanced significantly, with early users preferring Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time and even choosing it over Opus 4.5 59% of the time. 
  • Users report better instruction following, less overengineering, fewer hallucinations, and more consistent follow-through on multi-step tasks, with one customer reporting an 80.2% score on SWE-bench Verified.
  • Several features have reached general availability on the API, including code execution, memory, programmatic tool calling, tool search, and tool use examples. 
  • Web search and fetch tools now automatically write and execute code to filter search results, improving response quality and token efficiency.
  • The model supports both adaptive thinking and extended thinking modes, with context compaction in beta that automatically summarizes older context as conversations approach limits. 
  • Claude in Excel now supports MCP connectors, allowing users to pull data from external sources like S&P Global, LSEG, and PitchBook directly within spreadsheets. 

17:42 Ryan – “I haven’t played with Sonnet because it’s just released, but playing around with Opus, you can see that it’s another major improvement in these steps, and it is pretty fantastic to use.”

19:44 Token Anxiety – by Nikunj Kothari – Balancing Act

  • This article describes a cultural shift in San Francisco’s tech scene where developers are prioritizing AI agent management over social activities, with people leaving parties early to check on overnight code generation and spending weekends running 12-hour build sessions with AI assistants like Claude and Codex.
  • The piece highlights how AI coding tools have created a new productivity anxiety where developers feel compelled to keep agents running continuously, even during sleep, to maximize output and stay competitive as new model capabilities and context windows are released weekly.
  • Developers are adopting new vocabulary around AI models, discussing them like sommeliers evaluate wine and using animal training metaphors like keeping Claude on a tight leash for code review while giving it more slack for creative work.
  • The constant stream of benchmark improvements and new AI capabilities is creating pressure to continuously optimize workflows, as each advancement makes previous methods feel outdated and multiplies the sense that competitors are already leveraging these improvements.
  • This represents a broader shift in developer culture where traditional leisure activities are being replaced by AI-assisted building, with the primary social metric changing from what you accomplished to how many agents you have running in parallel.

24:25 Ryan – “I still don’t know how everyone has these overnight workloads; I guess I don’t trust AI at all; I’m not going to let it run unsupervised.”  

31:48 Alibaba Launches New LLM as China’s AI Battle Heats Up 

  • Qwen 3.5 is out. No industry freakouts (like with DeepSeek) so far

33:06 Seed News – ByteDance Seed Team

  • ByteDance officially launched Seedance 2.0, a next-generation video creation model with a unified multimodal audio-video architecture supporting text, image, audio, and video inputs. 
  • The model can process up to 9 images, 3 video clips, 3 audio clips, and natural language instructions simultaneously for comprehensive content referencing and editing.
  • The model delivers substantial improvements in complex motion rendering and physical accuracy, particularly excelling at multi-subject interactions like competitive figure skating with synchronized movements, mid-air spins, and precise landings that follow real-world physics. 
  • Industry evaluations show Seedance 2.0 achieves leading performance in motion stability, instruction following, and visual aesthetics compared to competing models.
  • Seedance 2.0 introduces dual-channel stereo audio generation with multi-track parallel output for background music, ambient effects, and voiceovers synchronized to visual rhythm. 
  • The model supports 15-second high-quality multi-shot audio-video output suitable for commercial advertising, film VFX, game animations, and explainer videos.
  • New video editing capabilities allow targeted modifications to specific clips, characters, actions, and storylines, plus video extension functionality for generating continuous shots based on user prompts. 
  • The model demonstrates improved instruction-following for complex scripts and open-ended prompts while maintaining subject consistency across extended sequences.
  • The unified multimodal architecture enables professional-grade content creation workflows where users can reference composition, motion, camera movement, visual effects, and audio elements from input assets, significantly lowering barriers to industrial-level video production without requiring specialized technical expertise.
  • https://www.instagram.com/reel/DUm4zSvEn76/ – John Wick cat video as mentioned. 

34:53 Justin – “I’m surprised Hollywood stock didn’t crash today over this; very very impressive. Crazily so.” 

AWS 

36:47 Announcing new Amazon EC2 general purpose M8azn instances

  • AWS launches M8azn instances powered by fifth-generation AMD EPYC Turin processors running at 5GHz, the highest CPU frequency available in the cloud. These general-purpose instances deliver 2x compute performance over M5zn and 24% better performance than M8a instances, with 4.3x higher memory bandwidth and 10x larger L3 cache.
  • The instances target latency-sensitive workloads like high-frequency trading, real-time financial analytics, and simulation modeling for automotive and aerospace industries. 
  • Built on sixth-generation Nitro Cards, they provide 2x networking throughput and 3x EBS throughput compared to M5zn instances.
  • M8azn instances come in nine sizes from 2 to 96 vCPUs with up to 384 GiB memory at a 4:1 memory-to-vCPU ratio, including two bare metal variants. Available in US East Virginia, US West Oregon, Tokyo, and Frankfurt regions through On-Demand, Spot, and Savings Plans pricing models.
  • The high-frequency positioning fills a specific niche for workloads requiring maximum single-threaded performance rather than just core count.
  • This complements AWS’s broader M8a lineup by offering customers a choice between standard frequency instances and these premium high-frequency variants for specialized use cases.

37:03 Announcing Amazon EC2 C8i, M8i, and R8i instances on second-generation AWS Outposts racks

  • AWS is bringing C8i, M8i, and R8i instances to second-generation Outposts racks, delivering 20% better performance and 2.5x more memory bandwidth compared to the previous C7i, M7i, and R7i generation. These instances also provide 20% more compute capacity within the same physical rack space and power consumption, improving density for on-premises deployments.
  • The new instances run on custom Intel Xeon 6 processors exclusive to AWS and target workloads that need enhanced on-premises performance, including large databases, memory-intensive applications, real-time analytics, high-performance video encoding, and CPU-based ML inference. 
  • This addresses the gap for customers who need cloud-class compute but must keep workloads on-premises due to latency, data residency, or regulatory requirements.
  • Second-generation Outposts racks continue AWS’s hybrid cloud strategy by extending the latest EC2 instance types to customer data centers with the same APIs and tooling as the public cloud. 
  • The availability varies by region, so customers should check the Outposts rack FAQs page for current country and territory support before planning deployments.
  • The performance improvements come primarily from the memory bandwidth increase and processor generation upgrade, which should benefit database operations, in-memory caching, and data-intensive applications that previously hit memory bottlenecks on Outposts. 
  • The power and space efficiency gains matter for customers with constrained data center capacity or energy budgets.

37:08 Amazon EC2 Hpc8a Instances powered by 5th Gen AMD EPYC processors are now available

  • AWS launches Hpc8a instances powered by 5th Gen AMD EPYC processors, delivering 40% higher performance and 42% greater memory bandwidth than the previous Hpc7a generation, while offering up to 25% better price-performance for tightly coupled HPC workloads like computational fluid dynamics and weather modeling.
  • The instances come in a single 96xlarge size with 192 cores, 768 GiB memory, and 300 Gbps Elastic Fabric Adapter networking, featuring customizable core counts at launch and sixth-generation AWS Nitro cards for offloaded virtualization functions. Simultaneous Multithreading is disabled by default to optimize HPC performance.
  • Available now in US East Ohio and Europe Stockholm regions, with support for AWS ParallelCluster, AWS Parallel Computing Service, and Amazon FSx for Lustre integration to simplify cluster management and provide sub-millisecond storage latencies. Customers can purchase as On-Demand Instances or through Savings Plans, with specific pricing available on the EC2 pricing page.
  • The 1:4 core-to-memory ratio and high core density target compute-intensive simulation workloads requiring rapid time-to-results, including crash simulations and high-resolution weather modeling within tight operational windows. The customizable core count feature allows right-sizing based on specific HPC workload requirements without paying for unused capacity.

39:20 Ryan – “I’m sure they use a subcontractor for actual maintenance, things. But I’m sure that you have to give them access and manage them just like you would any other remote hands for your data center.”

39:37 MSK simplifies Kafka topic management with new APIs and console integration

  • Amazon MSK now provides native AWS APIs for Kafka topic management, eliminating the need to set up and maintain separate Kafka admin clients. The three new APIs (CreateTopic, UpdateTopic, and DeleteTopic) work alongside existing ListTopics and DescribeTopic APIs through AWS CLI, SDKs, and CloudFormation, letting teams manage topics using standard AWS tooling and IAM permissions.
  • The MSK console now consolidates all topic operations in one interface with guided defaults for creating and updating topics. Users can configure properties like replication factor, partition count, retention policies, and cleanup settings while viewing comprehensive partition-level metrics and configuration details directly in the console.
  • These capabilities are available at no additional cost for MSK provisioned clusters running Kafka version 3.6 and above across all regions where MSK is offered. Organizations need to configure appropriate IAM permissions to use the new APIs, with setup instructions available in the MSK Developer Guide.
  • The update addresses a common operational pain point where teams previously had to maintain separate Kafka admin tooling outside the AWS ecosystem. This integration brings Kafka topic management into standard AWS workflows, improving consistency with existing infrastructure-as-code practices and centralized access control through IAM.

40:47 Ryan – “I suspect this has more to do with Kafka than AWS because Kafka is notoriously hard to administer, so in a lot of cases there’s just not the ability…so I’m really happy to see this.”

42:40 Amazon Bedrock adds support for six fully-managed open weights models

  • Amazon Bedrock now supports six new open weights models, including DeepSeek V3.2, MiniMax M2.1, GLM 4.7, GLM 4.7 Flash, Kimi K2.5, and Qwen3 Coder Next, providing frontier-class performance at lower inference costs than proprietary alternatives. 
  • These models cover different enterprise needs from advanced reasoning and agentic tasks to autonomous coding with large output windows and lightweight production deployments.
  • The models run on Project Mantle, a new distributed inference engine that accelerates model onboarding to Bedrock while providing serverless inference with quality of service controls and automated capacity management. Project Mantle includes native OpenAI API compatibility, allowing customers to switch from OpenAI endpoints without code changes.
  • The addition of these open weights models gives AWS customers more flexibility in model selection based on specific workload requirements and cost constraints. 
  • DeepSeek V3.2 and Kimi K2.5 handle complex reasoning tasks, while GLM 4.7 and MiniMax M2.1 support coding workflows with extended context windows, and Qwen3 Coder Next and GLM 4.7 Flash offer cost-efficient options for high-volume production use.
  • Project Mantle’s unified capacity pools and higher default quotas address common scaling challenges customers face when deploying large language models. 
  • The serverless architecture eliminates infrastructure management overhead, while the automated capacity management helps prevent quota limitations during peak usage periods.

44:05 Matt – “I like how they made it all compatible with OpenAI. It’s kind of like S3 compatibility; I feel like we’re slowly kind of coming to a standard, which means you can go play with it and see which model makes sense.”

46:02 Amazon EKS Auto Mode Announces Enhanced Logging for its Managed Kubernetes Capabilities

  • EKS Auto Mode now integrates with CloudWatch Vended Logs to automatically collect logs from its managed Kubernetes capabilities, including compute autoscaling, block storage, load balancing, and pod networking. 
  • This gives customers centralized visibility into Auto Mode’s infrastructure management operations without manual configuration.
  • The integration uses CloudWatch Vended Logs, which provides lower pricing than standard CloudWatch Logs while maintaining built-in AWS authentication and authorization. 
  • Customers can route logs to CloudWatch Logs, S3, or Kinesis Data Firehose, depending on their retention and analysis requirements, with standard destination charges applying.
  • Each Auto Mode capability can be configured independently as a log delivery source through CloudWatch APIs or the AWS Console. 
  • This granular control allows teams to monitor specific components like the Karpenter-based autoscaler or VPC CNI networking without collecting unnecessary log data.
  • The feature addresses a common operational challenge where Auto Mode’s automated infrastructure management previously operated as a black box. DevOps teams can now troubleshoot issues like pod scheduling failures, storage provisioning problems, or load balancer configuration errors by examining the actual logs from Auto Mode’s control plane operations.
  • Available immediately in all regions where EKS Auto Mode operates, this logging capability helps bridge the observability gap between customer workloads and AWS-managed Kubernetes infrastructure components.

47:05 Justin – “All I have to say is, some lovely CloudWatch PM just made their bonus this year by turning this on, as this is a lot of logging context that you now need to parse and pay for.”

49:26 AWS CloudWatch Alarm Mute Rules eliminate alert fatigue

  • CloudWatch Alarm Mute Rules let you temporarily silence alarm notifications during planned maintenance windows, deployments, or off-hours without disabling the underlying monitoring. 
  • The feature supports up to 100 alarms per rule with one-time or recurring schedules, and automatically triggers any suppressed actions once the mute period ends if the alarm state persists.
  • This addresses a common operational pain point where teams either ignore alerts during maintenance windows or use risky script-based workarounds that can be forgotten and leave monitoring disabled. 
  • The native integration eliminates the need for custom automation to manage notification states during planned activities.
  • The feature is available today across all AWS regions that support CloudWatch alarms at no additional cost beyond standard CloudWatch pricing. 
  • Configuration is done through the CloudWatch console or API, with support for all alarm states, including OK, ALARM, and INSUFFICIENT_DATA.
  • Primary use cases include silencing non-critical alerts during scheduled deployments, muting development environment alarms outside business hours, and suppressing known issues during maintenance windows. 
  • This helps reduce alert fatigue while maintaining full visibility into system state and metrics collection.
  • The automatic re-triggering of muted actions ensures teams don’t miss persistent issues that started during a mute window, providing a safety mechanism that manual notification management typically lacks.

50:49 Ryan – “This is much nicer. Basically, set it for ignore for an hour and then have it kick back in. Glad to see this, but strange that it took this long.” 

52:48 Amazon EC2 supports nested virtualization on virtual Amazon EC2 instances

  • AWS now supports nested virtualization on standard EC2 instances, not just bare metal, allowing customers to run KVM or Hyper-V hypervisors inside virtual machines. This expands flexibility for development and testing scenarios that previously required more expensive bare metal instances.
  • The feature launches on the latest generation C8i, M8i, and R8i instance families across all commercial AWS regions. 
  • Customers can now run mobile app emulators, automotive hardware simulators, and Windows Subsystem for Linux on Windows workstations directly on virtual instances.
  • This capability addresses a long-standing limitation where nested virtualization required bare metal instances, which carry higher costs and longer provisioning times compared to standard virtual instances. 
  • The change makes nested environments more accessible for development teams and testing workflows.
  • Common use cases include software vendors who need to test their products across multiple operating systems, automotive companies simulating vehicle hardware environments, and mobile developers running Android or iOS emulators at scale. 
  • These workloads can now run on more cost-effective instance types with faster deployment.
  • The feature requires enabling hardware virtualization extensions in the instance configuration, with full documentation available in the EC2 user guide. Pricing follows standard EC2 rates for the C8i, M8i, and R8i instance families without additional charges for the nested virtualization capability itself.
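For anyone kicking the tires on nested virtualization, the usual first sanity check is confirming that the guest actually exposes the hardware virtualization flags. A minimal illustrative sketch (the sample string below is made up; on a real Linux guest you would read `/proc/cpuinfo` instead):

```python
def has_virt_extensions(cpuinfo: str) -> bool:
    """True if the CPU flags advertise Intel VT-x (vmx) or AMD-V (svm)."""
    flags = cpuinfo.split()
    return "vmx" in flags or "svm" in flags

# On a real instance: cpuinfo = open("/proc/cpuinfo").read()
sample = "flags : fpu vme de pse tsc msr pae vmx ssse3"
print(has_virt_extensions(sample))  # True
```

If neither flag shows up inside the guest, KVM or Hyper-V won't be able to use hardware acceleration there, regardless of what the host supports.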

54:13 Ryan – “These kinds of announcements are usually preceded or quickly followed with Nitro…and it’s neat. It’s neat how they isolate the hardware layer to match these workloads.” 

54:50 Announcing Amazon SageMaker Inference for custom Amazon Nova models

  • AWS now lets customers deploy custom-trained Amazon Nova models on SageMaker Inference with production-grade controls over instance types, auto-scaling, context length, and concurrency settings. 
  • This addresses customer requests for the same deployment flexibility they get with open-weight models, enabling full-rank customized Nova Micro, Nova Lite, and Nova 2 Lite models trained via SageMaker Training Jobs or HyperPod.
  • The service reduces inference costs by supporting more cost-effective EC2 G5 and G6 instances instead of requiring P5 instances, with auto-scaling based on 5-minute usage patterns and configurable inference parameters. 
  • Customers pay only for compute instances used with per-hour billing and no minimum commitments, following standard SageMaker pricing.
  • Deployment works through SageMaker Studio UI or SDK, supporting both real-time streaming and asynchronous batch inference modes. The service includes advanced configuration options for context length up to 8000 tokens, max concurrency settings, and inference parameters like temperature and top-p for optimizing latency-cost-accuracy tradeoffs.
  • Currently available in the US East N. Virginia and US West Oregon regions, with support for Nova models with reasoning capabilities. 
  • Instance type requirements vary by model size, with Nova Micro supporting g5.12xlarge and up, Nova Lite requiring g5.48xlarge minimum, and Nova 2 Lite needing p5.48xlarge instances.

56:47 Ryan – “It’s not an open-source model, and so it is kind of crazy that Nova offers that customization.” 

GCP

57:25  Gemini 3 Deep Think: AI model update designed for science

  • Google has released a major update to Gemini 3 Deep Think, a specialized reasoning mode designed for complex scientific and engineering problems where data is messy or incomplete, and solutions aren’t straightforward. 
  • The model achieved notable benchmark results, including 48.4% on Humanity’s Last Exam, 84.6% on ARC-AGI-2, and gold medal performance on the 2025 International Math, Physics, and Chemistry Olympiads.
  • Early adopters are using Deep Think for practical applications like identifying logical flaws in peer-reviewed mathematics papers, optimizing semiconductor crystal growth fabrication methods, and converting sketches into 3D-printable files with generated code. 
  • The model combines deep scientific knowledge with engineering utility to move beyond theoretical work into applied research.
  • The updated Deep Think is available now to Google AI Ultra subscribers through the Gemini app, with pricing following the existing Ultra subscription model. 
  • For the first time, Google is offering API access through an early access program for select researchers, engineers, and enterprises who can apply through a Google form.
  • The release targets scientific research institutions and engineering teams working on complex problems in physics, chemistry, materials science, and advanced mathematics, where traditional AI models struggle with ambiguous requirements. 
  • Deep Think’s ability to work with incomplete data and generate executable code for physical modeling makes it particularly relevant for R&D workflows.

1:00:19 New global queries in BigQuery span data from multiple regions

  • BigQuery global queries now allow users to run a single SQL statement across datasets stored in multiple geographic regions without requiring ETL pipelines or data replication. 
  • The feature automatically handles cross-region data movement in the background while respecting existing security controls like VPC Service Controls and requiring explicit opt-in at both the project and user level.
  • The primary use case targets multinational organizations that need to analyze distributed data for compliance or performance reasons, such as joining US customer data with European transaction logs and Asian operational data in one query. 
  • EssilorLuxottica is using this to perform cross-region aggregated analysis while maintaining data residency requirements for security and compliance. (DOES IT THOUGH?) 
  • Users maintain control over where queries execute and can specify the processing location to meet data residency requirements, though cross-region data transfers will incur additional egress costs that organizations need to factor into their analytics budgets. 
  • The feature is currently in preview, with documentation available in Google's BigQuery docs.
  • This addresses a longstanding limitation in cloud data warehousing, where geographic data distribution required complex engineering solutions, now replaced by standard SQL queries that any authorized analyst can run directly from the BigQuery console. The feature respects governance controls by default and prevents accidental data movement through required permissions and explicit enablement.

1:01:36 Matt – “I feel like it is compliant… if you’re running local and you’re not collecting anything that could be confidential. So it depends on how your lawyer at your company interprets it.”

Azure

1:03:47 Agentic cloud operations and Azure Copilot for AI‑driven workloads

  • Microsoft introduces agentic cloud operations through Azure Copilot, which uses AI agents to automate and coordinate cloud management tasks across the full infrastructure lifecycle. Instead of adding another dashboard, Azure Copilot provides a unified interface accessible through natural language, chat, console, or CLI that connects directly to a customer’s actual Azure environment, including subscriptions, resources, and policies.
  • Azure Copilot includes six specialized agents that handle migration discovery and dependency mapping, deployment with infrastructure-as-code generation, continuous observability across the full stack, cost and performance optimization with carbon impact analysis, resiliency management including ransomware protection, and troubleshooting with root cause diagnosis. 
  • These agents work as a connected system rather than isolated tools, correlating signals and taking action within existing RBAC and policy controls.
  • The service maintains governance through built-in oversight features, including Bring Your Own Storage for conversation history, which keeps operational data within the customer’s Azure environment for compliance and sovereignty requirements. 
  • All agent-initiated actions are reviewable, traceable, and auditable while respecting existing security policies and role-based access controls.
  • Target customers are organizations running modern applications and AI workloads at scale, where traditional manual operations cannot keep pace with rapid deployment cycles and infrastructure changes. 
  • The approach addresses environments where workloads move from experimentation to production in weeks and where telemetry streams continuously from every layer of the stack.
  • Pricing details were not disclosed in the announcement, though the service builds on existing Azure Copilot capabilities introduced at Microsoft Ignite. Organizations can access resources and get started at azure.microsoft.com/products/copilot.

1:05:39 Matt – “Also, a developer actually understanding what they want and telling you what they want and actually being useful? I would love to see too, because how many times have we built something, deployed it, day before the release – we actually need these 16 other things that we didn’t tell you about that we manually did in our dev environment, which is why it’s working… and the release is tomorrow. Good luck. Why is it not done yet?”

1:06:18 General Availability: Instant access support for incremental snapshots of Azure Premium SSD v2 and Ultra Disk

  • Azure now offers instant access to incremental snapshots for Premium SSD v2 and Ultra Disk storage, eliminating the previous wait time when restoring disks from snapshots. 
  • This addresses a significant operational pain point for customers running high-performance workloads that require rapid disaster recovery or quick environment provisioning.
  • The feature specifically targets enterprise customers using Azure’s highest-tier storage options, Premium SSD v2 and Ultra Disk, which are typically deployed for mission-critical databases, SAP HANA, and other latency-sensitive applications. 
  • Previously, customers had to wait for snapshot data to fully hydrate before using restored disks, creating delays in recovery scenarios.
  • Incremental snapshots only capture changes since the last snapshot, reducing storage costs and backup windows compared to full snapshots. 
  • With instant access now available, customers can immediately mount and use restored disks while background hydration completes, improving recovery time objectives for business continuity planning.
  • This capability brings Premium SSD v2 and Ultra Disk snapshot functionality closer to parity with standard Azure managed disk snapshots. 
  • The feature is now generally available across Azure regions where Premium SSD v2 and Ultra Disk are supported, though specific pricing for snapshot storage follows existing Azure snapshot pricing models based on stored data volume.

1:06:25 Justin – “Welcome to what Amazon and Google have been doing for quite a while, so thanks, Azure!”

Emerging Clouds 

1:08:16 Introducing the Crusoe Cloud MCP server

  • Crusoe Cloud released an MCP server that connects AI coding assistants like Claude Code and Cursor directly to cloud infrastructure, but unlike typical API wrappers, it returns filtered responses designed specifically for LLM consumption to avoid flooding context windows with unnecessary data. 
  • The server includes composite tools like get_resource_relationships that map entire infrastructure topologies in a single call by fetching 11 resource types in parallel and resolving cross-references, something that doesn’t exist in their CLI or any single API endpoint.
  • The cluster_health_check tool provides pre-analyzed node-level health metrics organized by InfiniBand pod placement, returning structured summaries with problem nodes flagged rather than raw metric time series that would require additional processing. 
  • This approach addresses a key limitation of AI agents working with cloud infrastructure: most MCP implementations just wrap CLI commands and return the same JSON a human would see, forcing the AI to parse through irrelevant metadata and empty fields.
  • The implementation reflects a broader trend of cloud providers releasing MCP servers, but Crusoe’s focus on response filtering and burst-heavy access patterns specific to AI agents suggests infrastructure management tools are being redesigned around LLM capabilities rather than human interaction patterns. For developers already using AI coding assistants, this enables natural language infrastructure queries and troubleshooting without manual scripting or console navigation.

1:10:16 Ryan – “This is gonna be chaos.” 

1:10:21 Introducing Markdown for Agents

  • Cloudflare now automatically converts HTML to markdown for AI agents using content negotiation headers, reducing token usage by up to 80 percent. 
  • When agents request pages with Accept: text/markdown, Cloudflare’s network performs real-time conversion at the edge, eliminating the need for downstream processing and reducing costs for AI systems.
  • The feature addresses a fundamental inefficiency where AI agents waste tokens parsing HTML markup, navigation elements, and styling that have no semantic value. 
  • A simple heading that costs 3 tokens in markdown can consume 12-15 tokens in HTML; as an example, Cloudflare’s own announcement post weighs in at 16,180 tokens in HTML versus 3,150 in markdown.
  • Cloudflare includes an x-markdown-tokens header with converted responses to help developers calculate context window sizes and chunking strategies. The service also automatically adds Content-Signal headers indicating the content can be used for AI training, search results, and agentic use, integrating with their Content Signals framework from Birthday Week.
  • The feature is available in beta at no cost for Pro, Business, and Enterprise plans, with Cloudflare already enabling it on their own blog and developer documentation. 
  • Popular coding agents like Claude Code and OpenCode already send the appropriate accept headers, positioning this as infrastructure for the shift from traditional SEO to AI-driven content discovery.
  • Cloudflare Radar now tracks content type distribution for AI bot traffic, allowing analysis of how different agents consume web content over time. This data is accessible through public APIs and shows early adoption patterns like OAI-Searchbot requesting markdown content.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod

Download audio: https://episodes.castos.com/5e2d2c4b117f29-10227663/2374104/c1e-60w0c7w00gfj21d4-47o3ko8zc52r-r7ltqs.mp3