Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

C# Evolved Updates and Geography Games

From: Fritz's Tech Tips and Chatter

Fritz is building websites and needs your help!


What’s new in Svelte: July 2026


This month brought a real shift in how SvelteKit projects are configured. You can now define your SvelteKit config directly inside vite.config.js and skip svelte.config.js entirely. We also got the first preview of explicit environment variables, which will eventually replace $env/* modules in SvelteKit 3.

On top of that, the language tools and the sv CLI both caught up with Svelte's new {const ...} declaration tags, so the whole toolchain is now in sync.

Let's dive in!

What's new in SvelteKit

  • You can now pass your SvelteKit config directly to the Vite plugin, so a separate svelte.config.js is no longer required. This previews Kit 3, which will require config to live in vite.config.js (2.62.0, Docs, #15944)
  • Experimental explicit environment variables let you declare and type your env vars in one place, as a preview of how $env/* will work in SvelteKit 3 (2.63.0, Docs, #15934)
  • Remote function commands can now receive File objects directly, so you can upload files without manually wrapping them in FormData (2.64.0, Docs, #15978)
  • Remote queries can now refresh other queries, making it easier to invalidate related data after a mutation (2.65.0, Docs, #16012)
  • Prerendered .md and .mdx files are now precompressed alongside HTML, JS and CSS for faster delivery (2.66.0, Docs, #15893)
  • SvelteKit now warns when boolean fields in remote form schemas are not marked optional, which is a common cause of silent submit failures (2.66.0, Docs, #15804)
  • The new prerender.handleInvalidUrl option lets you customize how invalid URLs found during crawling are reported (2.67.0, Docs, #16088)
  • RemoteFormEnhanceInstance and RemoteFormEnhanceCallback are now exported types, so you can type your custom enhance callbacks directly (2.68.0, Docs, #15816)
  • Submitted submit fields now keep their value in the form action payload, which makes multi-button forms easier to handle on the server (2.68.0, Docs, #15979)

For all the features and bugfixes that landed this month, check out the SvelteKit / Adapter CHANGELOGs.

What's new in the Svelte CLI and Language Tools

  • The Svelte CLI demo template now uses the new {const ...} declaration tag, so newly created projects show off the latest Svelte syntax (sv@0.16.0, #1110)
  • sv create now scaffolds projects against @sveltejs/kit ^2.62.0 and moves the Svelte config into the Vite plugin by default (sv@0.16.0, #1119)
  • A new experimental add-on lets you toggle experimental flags and opt into @next versions directly from the CLI (sv@0.16.0, #1121)
  • The drizzle and better-auth add-ons now support SvelteKit's new explicit environment variables (sv@0.16.0, #1122)
  • New defineEnv and svelteConfig helpers in @sveltejs/sv-utils make it easier to read and edit a project's Svelte config from add-ons (sv-utils@0.3.0)
  • The Svelte language server, svelte-check and svelte2tsx now understand Svelte 5's {const ...} declaration tags (svelte-language-server@0.18.1/svelte-check@4.5.0/svelte2tsx@0.7.56, #3033)
  • CSS completions now work inside nested <style> tags (svelte-language-server@0.18.1, #3022)
  • The language tools can now read Svelte config straight from vite.config.js/ts, matching SvelteKit's new Vite plugin configuration (svelte-language-server@0.18.2/svelte-check@4.6.0, #3031)
  • svelte-check now accepts a --config option to point at a custom config file location (svelte-check@4.7.0, #3066)
  • Experimental tsgo (TypeScript Go) support is now available in svelte-check for faster type checking on large codebases (svelte-check@4.7.0, #3036)

Want to dive deeper? Check out the Svelte CLI and language-tools releases. For all the minor changes and bugfixes that came out in the Svelte compiler this month, you can read the full Svelte CHANGELOG.


Community Showcase

Apps & Sites built with Svelte

  • COLOR LAB is a browser-based color science instrument that explores RGB gamuts as 3D solids and builds perceptual theme ramps with WCAG checks
  • Cometline is a local desktop AI companion built with SvelteKit, Electron and a Go agent core
  • Disc is a database for users, with a built-in Svelte dashboard
  • EZResumes is a fully client-side, local-first resume builder that lets you customize layouts with Typst
  • Graphgen is a node-based generative art tool for 2D line art
  • Lunarr is an open source self-hosted media server and Plex alternative
  • Pixel Snapper is a desktop tool that snaps near-pixel-art images onto perfect grids, built with SvelteKit, Tauri and Rust by the makers of the previously featured Sprite Fusion (GitHub)
  • SuperMCP is a native macOS app that gives Claude, Cursor, Windsurf and other AI tools access to Reddit, X, LinkedIn and more via the local Chrome cookies
  • darkly is an open source art program built with Rust and Svelte

Spotted in the Wild

  • Guild Wars 3 - the official ArenaNet site for the upcoming MMORPG
  • Obama Foundation - the new site for the recently opened Obama Presidential Center
  • On a more meta note, sveltekit.fyi scans Bluesky for sites built with SvelteKit and showcases them, in the spirit of nuxt.fyi

Libraries, Tools & Components

Frameworks and Tooling

  • Mochi is a performance-focused alternative to SvelteKit from Stanislav Khromov that uses islands architecture and programmatic routing on Bun
  • pottz bundles your SvelteKit app, the Bun runtime and a webview into a single executable so you can ship it as a native desktop app
  • Svelte TV renders Svelte components to WebGL instead of the DOM for use on smart TVs and low-memory webviews
  • TabSpot is a declarative keyboard navigation engine for hierarchical and grid interfaces

UI Components and Visual Effects

  • neobrutalism-svelte is a UI library inspired by neobrutalism, balsamiq and lo-fi aesthetics
  • @winkintel/bootstrap-svelte brings Bootstrap 5's grid, utilities and components to Svelte 5 with full TypeScript support
  • Tan Compose is a lightweight library for building declarative reusable web components
  • Svaul is a zero-dependency drawer component for Svelte 5, modernizing vaul-svelte
  • Svelte Animated Icon ships close to 10,000 animated icon variants across five icon libraries using the Web Animations API (GitHub)
  • FlareCharts is a runes-native charting library for Svelte 5 with built-in CSS theming and accessibility
  • Svelte Video Editor is a host-agnostic timeline-based video editor component for Svelte 5 with multi-track clips, scrubbing and ripple edits

Developer Tools and Plugins

  • tsv is a Rust-based formatter and parser for TypeScript, Svelte and CSS, with a linter on the roadmap
  • svelte-docinfo extracts JSON documentation from TypeScript and Svelte modules using the TypeScript compiler
  • VS Code Live Theme Editor is a VSIX extension for editing every token and color in any VS Code, Cursor or TRAE theme
  • sveltekit-cloudflare-do automatically exports Durable Objects to the Cloudflare Worker bundle generated by @sveltejs/adapter-cloudflare

That's it for this month! Let us know if we missed anything on Reddit or Discord.

Until next time 👋🏼!


Fully Automated AI Inference on AWS, Azure, and Google Cloud with Pulumi


Putting Ollama on a cloud GPU is something I keep coming back to. A while ago I wrote up running open-source LLMs on an AWS EC2 box with Ollama and Pulumi, and the shape never really changes: a GPU instance, a model server, and a firewall rule in front. Infrastructure as code earned its place by making that kind of setup predictable and repeatable, and AI infrastructure is no exception. A GPU box serving a model is still a VM, a disk, and a firewall rule, and it should be declared like one.

Thorsten Hans made exactly that case in his Akamai post, Fully Automated AI Infrastructures with Terraform and Akamai Cloud, which stands up a single GPU instance on Linode, installs the drivers, runs Ollama, and pulls a model, with no manual steps after terraform apply.

I liked the shape of it, so this post ports the same idea to Pulumi and runs it on three clouds (AWS, Azure, and Google Cloud) instead of one. The result is one program shape per cloud: a single pulumi up brings up a GPU box that installs its own driver, runs Ollama, and pulls a model with no manual steps, and a single pulumi destroy takes it back down. Along the way it drops the two imperative bits the Terraform version leans on: a static access token sitting in an environment variable, and a null_resource running a shell loop to wait for the model. The first becomes an OIDC login from a Pulumi ESC environment, so no long-lived key lives anywhere. The second turns out not to be a resource at all.

What you are building

Strip away the per-cloud naming and every version of this is the same three things: a GPU virtual machine, a firewall in front of it, and a cloud-init script that turns a bare Ubuntu box into a running inference server. The model serving runs on Ollama, which exposes an HTTP API on port 11434 and keeps the model resident in GPU memory between requests.

flowchart LR
Dev([Your machine / curl]) -->|"HTTP :11434"| FW["Firewall / security group<br/>(allow 11434, optional 22)"]
FW --> VM["GPU VM (Ubuntu 24.04)<br/>NVIDIA driver + Ollama"]
VM -->|"resident in GPU memory"| Model["qwen2.5:14b"]
ESC["Pulumi ESC<br/>pulumi-idp/auth"] -.->|"OIDC login (short-lived creds)"| Pulumi["pulumi up"]
Pulumi -.->|"AWS / Azure / GCP"| VM

| Component | Port | Description |
|---|---|---|
| GPU VM | - | Ubuntu 24.04 with an NVIDIA T4-class GPU. Installs the driver and Ollama from cloud-init, then pulls the model. |
| Ollama | 11434 | Serves the model over an HTTP API and keeps it in GPU memory between calls. |
| Firewall / security group | - | Allows inbound 11434 (and 22 when you ask for it). Open to the world for a demo; lock the CIDR down for anything real. |
| pulumi-idp/auth (ESC) | - | One environment that brokers an OIDC login into AWS, Azure, and GCP. No static keys in code or CI. |

One detail here is worth pausing on, and it explains why the Pulumi version comes out shorter than the Terraform one. The Akamai project ends with a null_resource that runs a curl loop in local-exec to block terraform apply until the model finishes downloading. That is not infrastructure but a runtime check wearing a resource costume. Pulumi has a Command provider that would let you reproduce it line for line, and this post deliberately does not. The program provisions the box, the box pulls the model on its own, and the program prints the endpoint. Whether the model has finished downloading yet is a question you answer with a curl, not a question your IaC tool should be holding a deployment open to ask.

Prerequisites

Before getting started, ensure you have:

  • Pulumi CLI installed and configured
  • A Pulumi Cloud account (ESC and OIDC live here)
  • An account on at least one of AWS, Azure, or Google Cloud, with an OIDC trust set up for Pulumi (covered below)
  • Enough GPU quota in your target region for one T4-class instance; a fresh account often starts at zero, which is the most common reason a first deploy fails, so check it before you run pulumi up
  • Node.js 18+ for the TypeScript program
  • curl to talk to the inference endpoint once it comes up

This post serves the qwen2.5:14b model, the same one the Akamai article uses, because the 4-bit quant fits comfortably on a single 16 GB T4. The model is one config value (model), so swap in anything Ollama supports; match the GPU to the model’s memory footprint, because a model that overflows VRAM still runs but spills onto the CPU and crawls.
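
To make "match the GPU to the model's memory footprint" concrete, here is a back-of-the-envelope sketch. The function names and the 1.2 overhead factor (KV cache and runtime buffers) are my assumptions for illustration, not measured values or anything Ollama reports, so treat the result as a rough sanity check rather than a guarantee.

```typescript
// Rough VRAM estimate: weights = parameters * bits / 8, plus headroom.
// ASSUMPTION: the 1.2 overhead factor is illustrative, not a measured number.
function estimateVramGiB(
    paramsBillions: number,
    bitsPerWeight: number,
    overhead = 1.2,
): number {
    const weightBytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
    return (weightBytes * overhead) / 1024 ** 3;
}

// True when the estimate fits inside the GPU's memory.
function fitsOnGpu(paramsBillions: number, bitsPerWeight: number, gpuGiB: number): boolean {
    return estimateVramGiB(paramsBillions, bitsPerWeight) <= gpuGiB;
}

// A 14B model at 4 bits per weight against a 16 GiB T4.
console.log(`estimated need: ${estimateVramGiB(14, 4).toFixed(1)} GiB`);
console.log(`fits on a 16 GiB T4: ${fitsOnGpu(14, 4, 16)}`);
```

The same model at 16 bits per weight fails the check by a wide margin, which is the "spills onto the CPU and crawls" case.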

Credentials without the copy-paste

The Terraform version authenticates the only way a single-cloud demo can: you mint a personal access token, export it as LINODE_TOKEN, and the provider reads it from the environment. It works, but that token is long-lived, it sits in your shell history and your CI secrets, and you mint one per cloud. Across three clouds that is three static keys to rotate and worry about.

Pulumi ESC (Environments, Secrets, and Configuration) replaces it all with an OIDC login. The idea: instead of storing a cloud key, you store a trust relationship. At deploy time, ESC presents a short-lived OIDC token to AWS, Azure, or Google Cloud, and each one hands back temporary credentials scoped to a role you control. Nothing long-lived is ever written down.

I keep that wiring in one environment, pulumi-idp/auth, and every stack imports it. Here is the whole thing:

# pulumi-idp/auth: one ESC environment that brokers an OIDC login into all three
# clouds. Every stack imports it. No static cloud key lives here or anywhere else.
values:
  aws:
    login:
      fn::open::aws-login:
        oidc:
          roleArn: arn:aws:iam::123456789012:role/pulumi-esc
          sessionName: pulumi-esc
  azure:
    login:
      fn::open::azure-login:
        clientId: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
        tenantId: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
        subscriptionId: /subscriptions/00000000-0000-0000-0000-000000000000
        oidc: true
  gcp:
    login:
      fn::open::gcp-login:
        project: 123456789012 # numeric project number, not the project ID
        oidc:
          workloadPoolId: pulumi-esc
          providerId: pulumi-esc
          serviceAccount: pulumi-esc@my-project.iam.gserviceaccount.com
  environmentVariables:
    # AWS: read by the Pulumi AWS provider, the AWS SDKs, and the aws CLI
    AWS_ACCESS_KEY_ID: ${aws.login.accessKeyId}
    AWS_SECRET_ACCESS_KEY: ${aws.login.secretAccessKey}
    AWS_SESSION_TOKEN: ${aws.login.sessionToken}
    # Azure: read by the azure-native provider
    ARM_USE_OIDC: "true"
    ARM_CLIENT_ID: ${azure.login.clientId}
    ARM_TENANT_ID: ${azure.login.tenantId}
    ARM_SUBSCRIPTION_ID: ${azure.login.subscriptionId}
    ARM_OIDC_TOKEN: ${azure.login.oidc.token}
    # Google Cloud: read by the Pulumi Google Cloud provider
    GOOGLE_CLOUD_PROJECT: ${gcp.login.project}
    GOOGLE_OAUTH_ACCESS_TOKEN: ${gcp.login.accessToken}

You only need the blocks for the clouds you actually deploy to. Opening this environment makes ESC perform a live OIDC login for each cloud listed, so trim it to the one you use, or keep all three if, like me, you bounce between them.

Each cloud needs a one-time trust setup so this login is allowed at all: an IAM OIDC identity provider and role on AWS, a federated credential on an Azure app registration, and a Workload Identity Pool on Google Cloud. You do that once; after that, pulumi-idp/auth is the only thing any stack references for credentials.

A stack opts into it with one block in its stack config. The AWS stack’s Pulumi.aws.yaml:

# Pulumi.aws.yaml
environment:
 - pulumi-idp/auth

That is the entire credential story. No keys in the program, no keys in CI, nothing to rotate. The program below never mentions a secret; it only creates resources, and the ambient credentials from pulumi-idp/auth carry the request.

One cloud-init, three clouds

The part that turns a bare Ubuntu box into an inference server is identical on every cloud, so it lives in one file, cloud-init.yaml, that all three programs read. It follows the Akamai cloud-config closely, with a couple of robustness tweaks for a multi-cloud run, and it runs in a deliberate order because the GPU driver needs a reboot before Ollama can see the card:

  1. Update packages and install the kernel headers and ubuntu-drivers.
  2. Install the NVIDIA driver, then reboot so the kernel loads it.
  3. On the next boot, a one-shot systemd service installs Ollama, binds it to 0.0.0.0:11434, and pulls the model.
  4. The service disables itself, so it never runs again.

#cloud-config
# Zero-touch Ollama GPU box. Cloud-agnostic: no provider metadata, no static creds.
# The Pulumi program substitutes the model name on the `ollama pull` line below
# before this is passed as user-data (AWS/GCP) or custom-data (Azure).

write_files:
  # Make Ollama listen on every interface and never unload the model from VRAM.
  - path: /etc/systemd/system/ollama.service.d/override.conf
    content: |
      [Service]
      Environment="OLLAMA_HOST=0.0.0.0:11434"
      Environment="OLLAMA_KEEP_ALIVE=-1"

  # Runs once on the post-reboot boot, after the GPU driver is loaded.
  - path: /usr/local/bin/ollama-setup.sh
    permissions: '0755'
    content: |
      #!/usr/bin/env bash
      set -euxo pipefail

      # Install Ollama; the installer creates and starts the ollama systemd unit.
      curl -fsSL https://ollama.com/install.sh | sh

      # Pick up the OLLAMA_HOST / OLLAMA_KEEP_ALIVE override written above.
      systemctl daemon-reload
      systemctl enable ollama.service
      systemctl restart ollama.service

      # Wait for the daemon to bind its socket before pulling. This is the box
      # waiting on its OWN local daemon, not an external readiness gate.
      until curl -fsS http://127.0.0.1:11434/api/tags >/dev/null 2>&1; do
        sleep 2
      done

      # Pre-pull the model so the endpoint answers on the very first request.
      ollama pull __MODEL__

      # One-shot: never run again on future boots.
      systemctl disable ollama-setup.service

  - path: /etc/systemd/system/ollama-setup.service
    content: |
      [Unit]
      Description=One-time Ollama install and model pull
      After=network-online.target
      Wants=network-online.target

      [Service]
      Type=oneshot
      ExecStart=/usr/local/bin/ollama-setup.sh
      RemainAfterExit=true
      # The model pull runs for several minutes; without this, systemd's default
      # 90s start timeout kills the service mid-download and the model never lands.
      TimeoutStartSec=0

      [Install]
      WantedBy=multi-user.target

runcmd:
  - apt-get update
  - DEBIAN_FRONTEND=noninteractive apt-get -y upgrade
  # linux-headers-generic also covers the kernel the upgrade above may have pulled
  # in, so DKMS builds the NVIDIA module for the kernel that boots next.
  - DEBIAN_FRONTEND=noninteractive apt-get install -y "linux-headers-$(uname -r)" linux-headers-generic ubuntu-drivers-common
  # --gpgpu selects the headless server driver branch, the right one for a compute
  # GPU like the T4; bare `ubuntu-drivers install` would pull the desktop stack.
  - ubuntu-drivers install --gpgpu
  - systemctl enable ollama-setup.service

power_state:
  mode: reboot
  message: Rebooting to load the NVIDIA driver before Ollama setup
  condition: true

The model name is the literal token __MODEL__. The Pulumi program reads this file at deploy time and substitutes your model config value before handing it to the instance. Two environment settings are doing quiet but important work: OLLAMA_HOST=0.0.0.0:11434 makes Ollama listen on every interface instead of localhost alone, and OLLAMA_KEEP_ALIVE=-1 keeps the model pinned in GPU memory so only the first request pays the load cost.
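
Because the substituted value lands on a shell command line inside the setup script, it can be worth validating it before templating. The sketch below is one way to do that. The renderCloudInit helper and the allow-list pattern are hypothetical (the programs in this post use a plain replace), and the pattern is my guess at a reasonably strict shape for Ollama model references rather than an official grammar.

```typescript
const PLACEHOLDER = "__MODEL__";

// ASSUMPTION: model references look like "name:tag" or "namespace/name:tag".
// Deliberately strict, since the value is pasted into a shell script.
const MODEL_PATTERN = /^[A-Za-z0-9][A-Za-z0-9._\/-]*(:[A-Za-z0-9._-]+)?$/;

// Hypothetical helper: validates the model name, then fills in the placeholder.
function renderCloudInit(template: string, model: string): string {
    if (!MODEL_PATTERN.test(model)) {
        throw new Error(`refusing to template suspicious model name: ${JSON.stringify(model)}`);
    }
    if (!template.includes(PLACEHOLDER)) {
        throw new Error(`template has no ${PLACEHOLDER} placeholder`);
    }
    // split/join replaces every occurrence and treats `model` as literal text.
    return template.split(PLACEHOLDER).join(model);
}

console.log(renderCloudInit("ollama pull __MODEL__", "qwen2.5:14b"));
```

In a program you would pass it the contents of cloud-init.yaml instead of the inline string. Failing loudly on a missing placeholder also catches the case where the template was edited and the token was accidentally removed.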

The programs

Every program has the same five beats: read the config, template the cloud-init.yaml, open the inbound ports, launch the GPU instance with that cloud-init as its user data, and export the endpoint. What differs is only the dialect each cloud speaks for “GPU instance” and “firewall rule.”

The shared contract keeps the three listings legible side by side: the same model and allowSsh config keys, the same cloud-init.yaml, and the same four exports (publicIp, ollamaEndpoint, generateEndpoint, tagsEndpoint). Pick your cloud:

AWS:

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
import * as fs from "fs";

const cfg = new pulumi.Config();
const model = cfg.get("model") ?? "qwen2.5:14b";
const allowSsh = cfg.getBoolean("allowSsh") ?? false;
const region = (cfg.get("region") ?? "us-east-1") as aws.Region;

// Inject the model name into cloud-init at deploy time (a file read, not a shell-out).
const userData = fs.readFileSync("cloud-init.yaml", "utf8").replace(/__MODEL__/g, model);

// Region comes from config; credentials arrive ambiently from pulumi-idp/auth.
const provider = new aws.Provider("aws", { region });

// Most recent Ubuntu 24.04 LTS (amd64, hvm, gp3) published by Canonical.
const ubuntu = aws.ec2.getAmi({
    mostRecent: true,
    owners: ["099720109477"],
    filters: [
        { name: "name", values: ["ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-*"] },
        { name: "virtualization-type", values: ["hvm"] },
    ],
}, { provider });

const ingress: aws.types.input.ec2.SecurityGroupIngress[] = [{
    description: "Ollama HTTP API. NOTE: restrict cidrBlocks to your own range in production.",
    fromPort: 11434,
    toPort: 11434,
    protocol: "tcp",
    cidrBlocks: ["0.0.0.0/0"],
}];

if (allowSsh) {
    ingress.push({
        description: "SSH",
        fromPort: 22,
        toPort: 22,
        protocol: "tcp",
        cidrBlocks: ["0.0.0.0/0"],
    });
}

const sg = new aws.ec2.SecurityGroup("ollama", {
    description: "Ollama inferencing access",
    ingress,
    egress: [{
        description: "Allow all outbound",
        fromPort: 0,
        toPort: 0,
        protocol: "-1",
        cidrBlocks: ["0.0.0.0/0"],
    }],
}, {
    provider,
});

const server = new aws.ec2.Instance("ollama", {
    ami: ubuntu.then(a => a.id),
    instanceType: "g4dn.xlarge", // 1x NVIDIA T4
    vpcSecurityGroupIds: [sg.id],
    associatePublicIpAddress: true,
    userData, // plain text; the AWS provider base64-encodes it for you
    rootBlockDevice: {
        volumeSize: 40,
        volumeType: "gp3",
    },
    tags: { Name: "ollama" },
}, {
    provider,
});

export const publicIp = server.publicIp;
export const ollamaEndpoint = pulumi.interpolate`http://${publicIp}:11434`;
export const generateEndpoint = pulumi.interpolate`http://${publicIp}:11434/api/generate`;
export const tagsEndpoint = pulumi.interpolate`http://${publicIp}:11434/api/tags`;

Google Cloud:

import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";
import * as fs from "fs";

const cfg = new pulumi.Config();
const model = cfg.get("model") ?? "qwen2.5:14b";
const allowSsh = cfg.getBoolean("allowSsh") ?? false;
const project = cfg.get("project");
const zone = cfg.get("zone") ?? "us-central1-a";

// Inject the model name into cloud-init at deploy time (a file read, not a shell-out).
const userData = fs.readFileSync("cloud-init.yaml", "utf8").replace(/__MODEL__/g, model);

// Latest Ubuntu 24.04 LTS amd64 image.
const ubuntu = gcp.compute.getImage({
    family: "ubuntu-2404-lts-amd64",
    project: "ubuntu-os-cloud",
});

// Network tag binds the firewall rule to this instance.
const networkTag = "ollama";

const firewall = new gcp.compute.Firewall("ollama-fw", {
    network: "default",
    project: project,
    // Inbound to Ollama from anywhere; restrict this CIDR for production.
    allows: allowSsh
        ? [{ protocol: "tcp", ports: ["11434"] }, { protocol: "tcp", ports: ["22"] }]
        : [{ protocol: "tcp", ports: ["11434"] }],
    sourceRanges: ["0.0.0.0/0"],
    targetTags: [networkTag],
});

const instance = new gcp.compute.Instance("ollama", {
    machineType: "n1-standard-4",
    zone: zone,
    project: project,
    tags: [networkTag],
    bootDisk: {
        initializeParams: {
            image: ubuntu.then(i => i.selfLink),
            size: 40,
        },
    },
    guestAccelerators: [{
        type: "nvidia-tesla-t4",
        count: 1,
    }],
    // GPUs cannot live-migrate, so host maintenance must terminate (and restart) the VM.
    scheduling: {
        onHostMaintenance: "TERMINATE",
        automaticRestart: true,
    },
    networkInterfaces: [{
        network: "default",
        accessConfigs: [{}], // an empty config requests an ephemeral public IP
    }],
    metadata: {
        "user-data": userData, // Ubuntu cloud-init reads this key, not startup-script
    },
}, {
    dependsOn: firewall,
});

export const publicIp = instance.networkInterfaces.apply(nics => nics[0].accessConfigs![0].natIp);
export const ollamaEndpoint = pulumi.interpolate`http://${publicIp}:11434`;
export const generateEndpoint = pulumi.interpolate`http://${publicIp}:11434/api/generate`;
export const tagsEndpoint = pulumi.interpolate`http://${publicIp}:11434/api/tags`;

Azure:

import * as pulumi from "@pulumi/pulumi";
import * as resources from "@pulumi/azure-native/resources";
import * as network from "@pulumi/azure-native/network";
import * as compute from "@pulumi/azure-native/compute";
import * as random from "@pulumi/random";
import * as fs from "fs";

const cfg = new pulumi.Config();
const model = cfg.get("model") ?? "qwen2.5:14b";
const allowSsh = cfg.getBoolean("allowSsh") ?? false;
const location = cfg.get("location") ?? "eastus";

// Inject the model name into cloud-init at deploy time (a file read, not a shell-out).
const userData = fs.readFileSync("cloud-init.yaml", "utf8").replace(/__MODEL__/g, model);

const resourceGroup = new resources.ResourceGroup("ollama-rg", {
    location,
});

const vnet = new network.VirtualNetwork("ollama-vnet", {
    resourceGroupName: resourceGroup.name,
    addressSpace: { addressPrefixes: ["10.0.0.0/16"] },
});

const subnet = new network.Subnet("ollama-subnet", {
    resourceGroupName: resourceGroup.name,
    virtualNetworkName: vnet.name,
    addressPrefix: "10.0.1.0/24",
});

// Standard SKU public IPs use static allocation.
const publicIpAddress = new network.PublicIPAddress("ollama-pip", {
    resourceGroupName: resourceGroup.name,
    sku: { name: network.PublicIPAddressSkuName.Standard },
    publicIPAllocationMethod: network.IPAllocationMethod.Static,
});

// Azure permits all outbound by default, so only inbound rules are needed.
const nsg = new network.NetworkSecurityGroup("ollama-nsg", {
    resourceGroupName: resourceGroup.name,
    securityRules: [
        {
            name: "allow-ollama",
            priority: 1000,
            direction: network.SecurityRuleDirection.Inbound,
            access: network.SecurityRuleAccess.Allow,
            protocol: network.SecurityRuleProtocol.Tcp,
            sourcePortRange: "*",
            destinationPortRange: "11434",
            sourceAddressPrefix: "0.0.0.0/0", // prod: restrict to your client CIDR
            destinationAddressPrefix: "*",
        },
        ...(allowSsh ? [{
            name: "allow-ssh",
            priority: 1001,
            direction: network.SecurityRuleDirection.Inbound,
            access: network.SecurityRuleAccess.Allow,
            protocol: network.SecurityRuleProtocol.Tcp,
            sourcePortRange: "*",
            destinationPortRange: "22",
            sourceAddressPrefix: "0.0.0.0/0",
            destinationAddressPrefix: "*",
        }] : []),
    ],
});

const nic = new network.NetworkInterface("ollama-nic", {
    resourceGroupName: resourceGroup.name,
    networkSecurityGroup: { id: nsg.id },
    ipConfigurations: [{
        name: "ipconfig1",
        subnet: { id: subnet.id },
        publicIPAddress: { id: publicIpAddress.id },
        privateIPAllocationMethod: network.IPAllocationMethod.Dynamic,
        primary: true,
    }],
});

// Azure requires an admin credential even though we never log in. Generate one
// instead of hard-coding it; it stays a Pulumi secret and is never exported.
const adminPassword = new random.RandomPassword("ollama-admin-password", {
    length: 24,
    special: true,
    overrideSpecial: "!#$%*",
    minLower: 1,
    minUpper: 1,
    minNumeric: 1,
    minSpecial: 1, // Azure requires 3 of 4 character classes
});

const vm = new compute.VirtualMachine("ollama-vm", {
    resourceGroupName: resourceGroup.name,
    hardwareProfile: { vmSize: "Standard_NC4as_T4_v3" }, // 1x NVIDIA T4
    networkProfile: {
        networkInterfaces: [{ id: nic.id, primary: true }],
    },
    osProfile: {
        computerName: "ollama",
        adminUsername: "azureuser",
        adminPassword: adminPassword.result,
        customData: Buffer.from(userData).toString("base64"), // Azure wants base64
        linuxConfiguration: {
            disablePasswordAuthentication: false,
        },
    },
    storageProfile: {
        imageReference: {
            publisher: "Canonical",
            offer: "ubuntu-24_04-lts",
            sku: "server", // Gen2; NC4as_T4_v3 is a Gen2 size
            version: "latest",
        },
        osDisk: {
            name: "ollama-osdisk",
            createOption: "FromImage",
            diskSizeGB: 40,
            managedDisk: { storageAccountType: "StandardSSD_LRS" },
        },
    },
});

// A Standard/Static public IP is reserved at creation, so its address is known
// once the resource exists; no separate lookup needed.
export const publicIp = publicIpAddress.ipAddress.apply(ip => ip!);
export const ollamaEndpoint = pulumi.interpolate`http://${publicIp}:11434`;
export const generateEndpoint = pulumi.interpolate`http://${publicIp}:11434/api/generate`;
export const tagsEndpoint = pulumi.interpolate`http://${publicIp}:11434/api/tags`;

A few per-cloud details are worth calling out, since they are the places the “same program” abstraction leaks:

  • AWS is the shortest program, because the GPU comes with the instance shape: a g4dn.xlarge is a T4 box, so there is no separate accelerator to attach. The one thing to do before you deploy is raise the Running On-Demand G and VT instances vCPU quota in your region; a fresh account starts at zero, and pulumi up fails with VcpuLimitExceeded until you do.
  • Google Cloud attaches the GPU explicitly with guestAccelerators, and that brings the rule that trips people up: a GPU instance cannot live-migrate, so scheduling.onHostMaintenance must be "TERMINATE" or the deployment is rejected. The cloud-init also has to ride on the user-data metadata key, not startup-script, and the empty accessConfigs: [{}] is what hands the box a public IP. T4 quota is also zero on a new project.
  • Azure is the longest listing, because the network is à la carte: the resource group, virtual network, subnet, public IP, security group, and NIC are each their own resource before you reach the VM. One detail surprises people: a Linux VM requires an admin credential even when you never log in, so the program generates a throwaway password with random.RandomPassword rather than committing one. The NC4as_T4_v3 is a compute GPU, so the standard server (CUDA) driver from the cloud-init is the right choice here; the GRID driver is only needed for GPU-accelerated visualization workloads.

The full programs, all three Pulumi.<cloud>.yaml files, and the shared cloud-init.yaml are in the companion repo:

GitHub repository: github.com/dirien/fully-automated-ai-inference-pulumi

Deploying

Create a project, install the provider for your cloud, point the stack at pulumi-idp/auth, and deploy. For AWS:

mkdir ai-inference && cd ai-inference
pulumi new typescript
npm install @pulumi/aws

Drop the AWS listing into index.ts, copy the cloud-init.yaml shown earlier into the same directory (or grab it from the companion repo), point the stack at the auth environment, and set your region:

pulumi stack init aws
# add `environment: [pulumi-idp/auth]` to Pulumi.aws.yaml (shown above)
pulumi config set region us-east-1
pulumi config set model qwen2.5:14b
pulumi up

The other two clouds are the same flow with a different provider package and a couple of config keys: npm install @pulumi/gcp with pulumi config set project <your-project-id> and pulumi config set zone us-central1-a, or npm install @pulumi/azure-native @pulumi/random with pulumi config set location eastus.

The slow part is the GPU box: it boots, installs the driver, reboots, installs Ollama, and pulls the model, all without you. Depending on the instance, the region, and the model size, plan on roughly 10 to 15 minutes before the endpoint answers. When pulumi up finishes you get the endpoints straight back as stack outputs:

Outputs:
    generateEndpoint: "http://<public-ip>:11434/api/generate"
    ollamaEndpoint  : "http://<public-ip>:11434"
    publicIp        : "<public-ip>"
    tagsEndpoint    : "http://<public-ip>:11434/api/tags"

Testing the inference server

pulumi up returns as soon as the infrastructure exists, which is before the model has finished downloading. This is exactly where Terraform reached for that null_resource loop, and where Pulumi hands you a URL instead. To check whether the model is ready, ask Ollama which models it has downloaded:

curl -s $(pulumi stack output tagsEndpoint) | grep -q "qwen2.5:14b" && echo "ready" || echo "still pulling"
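
If you would rather wait in a script than rerun curl by hand, the same check can be sketched in TypeScript. The isModelReady and waitForModel names, the attempt count, and the delay are my own choices for illustration. The only thing taken from Ollama is that /api/tags answers with a models array whose entries carry a name field. The fetcher is passed in so the loop stays independent of any particular HTTP client.

```typescript
// The one part of the /api/tags response this check relies on.
interface TagsResponse {
    models?: { name: string }[];
}

// True once the model shows up in the list of downloaded models.
function isModelReady(tags: TagsResponse, model: string): boolean {
    return (tags.models ?? []).some(m => m.name === model);
}

// Polls until the model appears or the attempts run out.
// `fetchTags` is injected; attempts and delayMs are illustrative defaults.
async function waitForModel(
    fetchTags: () => Promise<TagsResponse>,
    model: string,
    attempts = 60,
    delayMs = 15_000,
): Promise<boolean> {
    for (let i = 0; i < attempts; i++) {
        try {
            if (isModelReady(await fetchTags(), model)) return true;
        } catch {
            // Endpoint not answering yet (driver install, reboot): keep waiting.
        }
        if (i < attempts - 1) {
            await new Promise(resolve => setTimeout(resolve, delayMs));
        }
    }
    return false;
}

console.log(isModelReady({ models: [{ name: "qwen2.5:14b" }] }, "qwen2.5:14b"));
```

On Node.js 18+ the fetcher can be as small as () => fetch(url).then(r => r.json()), with url set to the tagsEndpoint stack output. This runs on your machine after the deployment, so it stays outside the Pulumi program, in keeping with the point above.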

Once it reports ready, send it a prompt. This is the same request the Akamai post makes, pointed at your generateEndpoint output:

curl -s $(pulumi stack output generateEndpoint) -d '{
 "model": "qwen2.5:14b",
 "system": "Answer every question with a three-line poem.",
 "prompt": "Why is the sky blue?",
 "stream": false
}'

The first call is slower, because Ollama loads the model into GPU memory before it answers. Every call after that is fast, since OLLAMA_KEEP_ALIVE=-1 keeps it resident:

{
 "model": "qwen2.5:14b",
 "response": "Sunlight scatters, short waves fly,\nBlue light paints the open sky,\nViolet fades as day drifts by.",
 "done": true
}

When you are done, take it all down with one command, so the GPU stops billing:

pulumi destroy

Cost

A GPU instance is the whole bill, and you pay by the hour whether the model is busy or idle. The numbers below are rough on-demand rates for a single T4-class box left running 24/7; in practice you spin it up, use it, and pulumi destroy it, so what you actually pay tracks the hours it stays up.

Cloud         Instance               GPU            ~Hourly  ~Monthly (24/7)
AWS           g4dn.xlarge            1× T4 (16 GB)  ~$0.53   ~$384
Azure         Standard_NC4as_T4_v3   1× T4 (16 GB)  ~$0.53   ~$384
Google Cloud  n1-standard-4 + 1× T4  1× T4 (16 GB)  ~$0.54   ~$394

These are on-demand Linux rates in a cheap US region (us-east-1, eastus, us-central1) as of mid-2026, and the GPU dominates the bill. Spot or preemptible capacity cuts it sharply if your workload tolerates interruption (often 60 to 70 percent off), Google Cloud’s sustained-use discount trims an always-on instance on its own, and the 40 GB disk adds only a few dollars a month.
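Because billing is hourly, the number that matters is hours up, not the monthly column. A quick way to estimate a session is to multiply the rate by the hours, then apply any spot discount. The helper below is an illustrative sketch: gpu_cost is a hypothetical name, the rates are the rounded figures from the table, and the 730-hour month and the 60 percent discount are assumptions, so results can differ from the table by a few dollars.

```shell
# gpu_cost HOURLY_RATE HOURS [DISCOUNT_PERCENT]
# Hypothetical helper: prints the estimated cost in dollars, two decimals.
gpu_cost() {
  awk -v rate="$1" -v hours="$2" -v discount="${3:-0}" \
    'BEGIN { printf "%.2f\n", rate * hours * (1 - discount / 100) }'
}

gpu_cost 0.53 8        # an 8-hour working session on AWS or Azure
gpu_cost 0.54 730      # Google Cloud left on for an assumed 730-hour month
gpu_cost 0.53 730 60   # the same AWS month at an assumed 60 percent spot discount
```

These three calls print 4.24, 394.20, and 154.76, which makes the gap between a working session and a forgotten instance easy to see.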

A GPU box left running overnight is the expensive mistake here. Because the whole stack is declarative, the safe habit is cheap: pulumi destroy when you stop using it, pulumi up when you need it back. The state and config are version-controlled, so standing it up again is one command, not a rebuild.

Security considerations

The architecture in this post matches the Akamai original on purpose, which means it carries the same caveat: Ollama is exposed over plain HTTP on a port open to the entire internet. That is fine for a demo on a box you tear down the same day. It is not fine for anything that outlives the afternoon, and self-hosted AI endpoints get found fast: scanners like Shodan enumerate a freshly exposed one within hours, and an open Ollama instance is free compute for whoever finds it first.

The credential side, though, is genuinely better than a static-token setup, and that is the part worth keeping. There is no long-lived cloud key in the program, in your shell, or in CI; every deploy gets short-lived credentials minted through pulumi-idp/auth and thrown away when it finishes.

Concern            Akamai/Terraform demo                  This deployment
Cloud credentials  Static PAT in an env var               OIDC login, short-lived, via pulumi-idp/auth
Inbound exposure   Port 11434 open to 0.0.0.0/0           Same by default; one config flag scopes the CIDR
Transport          Plain HTTP                             Plain HTTP (front with TLS for real use)
Readiness wait     null_resource + local-exec shell loop  A runtime curl, no resource

My recommendations for taking this past a demo:

  • Scope the inbound rule to your own CIDR instead of 0.0.0.0/0; the allowSsh flag in the program is the pattern to copy for the Ollama port
  • Put a reverse proxy with TLS in front of Ollama, or keep the box off the public internet entirely and reach it over a private network or a Tailscale tailnet, the way I did for the Hermes agent
  • Keep credentials on OIDC through ESC; never fall back to a static key to “just get it working”
  • Treat the box as disposable, and pulumi destroy it when you are not using it, which shrinks both the bill and the attack window
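For the first recommendation, the inbound rule needs your address in CIDR form. As a small sketch, the helper below validates a dotted-quad IPv4 address and prints it as a single-host /32 range. host_cidr is a hypothetical name, and the allowedCidr config key in the usage line is an assumption for illustration; use whatever key your copy of the program actually reads.

```shell
# host_cidr IPV4_ADDRESS
# Hypothetical helper: prints ADDRESS/32, or exits non-zero with no output
# if the input is not four dot-separated decimal octets in the range 0-255.
host_cidr() {
  printf '%s\n' "$1" | awk -F. '
    NF != 4 { exit 1 }
    {
      for (i = 1; i <= 4; i++) {
        if ($i !~ /^[0-9]+$/ || length($i) > 3 || $i + 0 > 255) exit 1
      }
      print $0 "/32"
    }'
}
```

With a key like that in place, pulumi config set allowedCidr "$(host_cidr 203.0.113.7)" would store 203.0.113.7/32, so only that one address can reach the Ollama port.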

What’s next?

The program is a foundation, and the interesting work is what you layer on once a model is a pulumi up away:

  • Lock it down. Scope the firewall, add TLS, or move the box behind a private network so the endpoint is not on the public internet at all.
  • Scale the model to the GPU. qwen2.5:14b on a T4 is the entry point. Bigger models want more memory, which means an A10G, an L4, or an A100, and the only change in the program is the instance type and the model value.
  • Swap the serving engine. Ollama is the simplest thing that works. For higher throughput, the same shape holds with vLLM in place of Ollama in the cloud-init.
  • Generate the next one. Pulumi Neo can take a target like “a GPU inference box on Azure behind a private endpoint” and produce a first draft of the program, which you then review in a PR like any other change.

Conclusion

Declaring an AI inference server as code buys you the same thing it buys for any other infrastructure: you can reproduce it on any of the three clouds, version-control it, and tear it down with a single pulumi destroy. Porting the Akamai project made the contrast sharp. The credentials moved from a static token to an OIDC login that leaves nothing behind, and the readiness wait disappeared entirely once you stop treating a runtime check as a resource.

That is the line worth carrying to the next thing you deploy. Not every step in a runbook is a resource. A GPU box and a firewall rule are; waiting for a download to finish is not. Draw that line first, and the program gets shorter on its own.

If you run into issues or have questions, drop by the Pulumi Community Slack or GitHub Discussions. New to Pulumi? Get started here.

A compatibility note on the abuse of Windows window class extra bytes

During my discussion of the evolution of system-windows window and class extra bytes, I noted that even though IDs are typically small integers, people liked to stash pointers there, so we had to expand the ID field to a pointer-sized integer.

One thing I’ve learned is that anywhere it’s possible to hide a pointer, people will hide a pointer there. This is true even for small integers.

As I was digging up the history of the extra bytes, I saw a special note in the 16-bit code for SetClassWord: It says that there’s an app that expects to be able to modify the value of GWW_CBCLSEXTRA.

Now, modifying this value has no practical effect because the memory for the class was allocated when you called RegisterClass. You can’t go back in time and change the allocation size.

But one program realized that it could use this value as a place to store some private data, so they did. Sure, that’s not the purpose of the GWW_CBCLSEXTRA, but that never stopped them.

For compatibility, Windows lets 16-bit programs modify GWW_CBCLSEXTRA. But at least it blocks it for 32-bit and 64-bit programs. One loophole closed. Countless more to go.

The post A compatibility note on the abuse of Windows window class extra bytes appeared first on The Old New Thing.

Blue/green deployments on Kubernetes with Argo Rollouts

One of the harder questions to answer at scale is how to ship without a few seconds where your users are getting timeouts or your fleet is split across two image versions.

Kubernetes’s default rolling update strategy gradually deploys new pods and retires old ones, but during the swap, your service runs both versions side by side, and a regression in the new image affects every request that lands on a new pod.

Progressive delivery patterns like blue/green have long existed: you stand the new version up alongside the old, prove it’s healthy on a separate preview endpoint, then flip user traffic across. Blast radius shrinks to nothing in the bad case, rollback is a single command, and your release stops being a held-breath moment.

In this post, you’ll set that up with Argo Rollouts, the controller behind progressive delivery in the Argo ecosystem (Argo reached CNCF graduated status in 2022).

What is Argo Rollouts (and why you need it)

Argo Rollouts is a Kubernetes controller and a set of CRDs that bolt blue/green, canary, and other progressive delivery strategies onto your cluster. The primary CRD is Rollout, a drop-in replacement for the standard Deployment.

You convert an existing Deployment by changing the apiVersion to argoproj.io/v1alpha1 and the kind to Rollout, then adding a strategy.blueGreen or strategy.canary block that describes how a new revision should roll out.
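For a simple manifest, the header part of that conversion is mechanical. The filter below is a rough sketch that rewrites only the two top-level fields and leaves the strategy block for you to write by hand. deployment_to_rollout is a hypothetical name, and the sketch assumes the manifest uses apiVersion: apps/v1, has unindented top-level keys, and contains no other apps/v1 kinds such as StatefulSets.

```shell
# deployment_to_rollout < deployment.yaml > rollout.yaml
# Hypothetical helper: rewrites the apiVersion and kind of a Deployment read
# from stdin. Assumes Deployments are the only apps/v1 objects in the file;
# v1 objects such as Services pass through unchanged.
deployment_to_rollout() {
  sed -e 's|^apiVersion: apps/v1$|apiVersion: argoproj.io/v1alpha1|' \
      -e 's|^kind: Deployment$|kind: Rollout|'
}
```

Any existing spec.strategy in the Deployment (for example a RollingUpdate block) still has to be replaced by hand with a blueGreen or canary block.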

A major reason to reach for it is that Kubernetes has no built-in way to do this kind of traffic control: you can’t decide where requests go independently of which pods are Ready. There’s also no instant rollback once an update has started, because kubectl rollout undo simply runs another rolling update back to the previous revision.

Argo Rollouts fills in everything around that. It plugs into ingress controllers (Traefik, ALB) and service meshes (Istio, Linkerd, SMI) for real traffic shaping. It can query metrics providers (Prometheus, Datadog, CloudWatch, New Relic) to gate promotions on hard numbers, and it tracks every revision as its own ReplicaSet, so flipping back is instant.

Prerequisites

This tutorial assumes some familiarity with Kubernetes. You’ll also need:

  • A working Kubernetes cluster (EKS, GKE, AKS, or a local one such as Minikube or Kind)
  • kubectl installed and pointed at the cluster (kubectl get nodes should return at least one Ready node)
  • helm v3 installed
  • curl for hitting the demo app

Step 1: Installing the Argo Rollouts controller

Argo Rollouts ships as a controller that runs in its own namespace, along with a kubectl plugin you’ll use on your laptop to inspect and steer rollouts.

To install the controller, follow these steps:

  1. Add the Argo Helm repo and install the controller:

    helm repo add argo https://argoproj.github.io/argo-helm
    helm repo update
    helm install argo-rollouts argo/argo-rollouts \
      --namespace argo-rollouts \
      --create-namespace \
      --wait
  2. Confirm the controller pods are up:

    kubectl -n argo-rollouts get pods

    You should see something like:

    NAME                             READY   STATUS    RESTARTS   AGE  
    argo-rollouts-dcd465dfc-8m2ql    1/1     Running   0          79s  
    argo-rollouts-dcd465dfc-q92k4    1/1     Running   0          79s

Step 2: Installing the kubectl argo rollouts plugin

The controller is running, but the most ergonomic way to drive a rollout — inspecting state, setting images, promoting, rolling back — is the kubectl argo rollouts plugin. It’s a separate binary that drops onto your PATH, and kubectl picks it up automatically.

To install the plugin, follow these steps:

On macOS with Homebrew:

brew install argoproj/tap/kubectl-argo-rollouts

On Linux:

curl -sLO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
chmod +x kubectl-argo-rollouts-linux-amd64
sudo mv kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts

Verify it’s wired up:

kubectl argo rollouts version

You should see something like kubectl-argo-rollouts: v1.8.3+.... From here on, we’ll use kubectl argo rollouts … subcommands to drive the rollout.

Step 3: Defining the Rollout

Argo Rollouts’ core idea is that you swap your Deployment for a Rollout resource. The pod template inside it is identical to a Deployment’s.

What changes is the spec.strategy block, which describes how a new revision should roll out.

For blue/green, you need three things:

  1. A Rollout with spec.strategy.blueGreen configured
  2. An active Service, which always points to whichever ReplicaSet is currently serving production traffic
  3. A preview Service, which points at the new ReplicaSet before it gets promoted, so that you can test it in isolation

Argo Rollouts injects the rollouts-pod-template-hash label into each Service’s selector at runtime, which is how it switches traffic without you ever editing the Services.

Write the manifest:


cat > rollout.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: rollouts-demo-active
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  selector:
    app: rollouts-demo
---
apiVersion: v1
kind: Service
metadata:
  name: rollouts-demo-preview
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  selector:
    app: rollouts-demo
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
        - name: rollouts-demo
          image: argoproj/rollouts-demo:blue
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          resources:
            requests:
              cpu: 25m
              memory: 32Mi
  strategy:
    blueGreen:
      activeService: rollouts-demo-active
      previewService: rollouts-demo-preview
      autoPromotionEnabled: false
      scaleDownDelaySeconds: 30
EOF

A few fields are worth calling out in the blueGreen block:

  • activeService and previewService are the names of the two ClusterIP Services above. Argo Rollouts owns their selectors from here on; you don’t edit them by hand.
  • autoPromotionEnabled: false is what makes this a manual promotion. The new ReplicaSet comes up, you inspect it on the preview Service, and only then do you flip the active Service over. Set it to true (the default) and Argo will auto-promote the moment the new pods are Ready.
  • scaleDownDelaySeconds: 30 keeps the old (blue) ReplicaSet around for 30 seconds after promotion, so if something goes wrong in those first few seconds, you can flip back instantly without rescheduling pods.

Apply it:

kubectl apply -f rollout.yaml

We’re using argoproj/rollouts-demo, a tiny app published by the Argo team that serves an HTML dashboard and a /color endpoint that reports which tagged image is running (blue, green, yellow, etc.). It’s perfect for seeing the cutover happen in real time.

Step 4: Checking the initial state

Take a look at the rollout:

kubectl argo rollouts get rollout rollouts-demo

Output:

Name:            rollouts-demo  
Namespace:       default  
Status:          ✔ Healthy  
Strategy:        BlueGreen  
Images:          argoproj/rollouts-demo:blue (stable, active)  
Replicas:  
  Desired:       2  
  Current:       2  
  Updated:       2  
  Ready:         2  
  Available:     2  

NAME                                       KIND        STATUS     AGE  INFO  
⟳ rollouts-demo                            Rollout     ✔ Healthy  34s  
└──# revision:1
   └──⧉ rollouts-demo-86c957c6d6           ReplicaSet  ✔ Healthy  34s  stable,active  
      ├──□ rollouts-demo-86c957c6d6-72kjf  Pod         ✔ Running  34s  ready:1/1  
      └──□ rollouts-demo-86c957c6d6-nv3zg  Pod         ✔ Running  34s  ready:1/1

One revision, two pods, both stable and active. Both Services currently point at the same ReplicaSet hash. You can confirm with:

kubectl get svc rollouts-demo-active rollouts-demo-preview \
  -o jsonpath='{range .items[*]}{.metadata.name}{" -> hash="}{.spec.selector.rollouts-pod-template-hash}{"\n"}{end}'

Output:

rollouts-demo-active -> hash=86c957c6d6
rollouts-demo-preview -> hash=86c957c6d6

Step 5: Triggering a new version

Now let’s deploy a new revision. We’ll change the image tag from blue to yellow:

kubectl argo rollouts set image rollouts-demo \
  rollouts-demo=argoproj/rollouts-demo:yellow

Argo creates a new ReplicaSet (rev 2) for the yellow image and waits, because we set autoPromotionEnabled: false. The active Service still points at blue. The preview Service is re-pointed at yellow:

kubectl argo rollouts get rollout rollouts-demo

Output:

Status:          ॥ Paused
Message:         BlueGreenPause
Strategy:        BlueGreen
Images:          argoproj/rollouts-demo:blue (stable, active)
                 argoproj/rollouts-demo:yellow (preview)
Replicas:
  Desired:       2
  Current:       4
  Updated:       2
  Ready:         2
  Available:     2

NAME                                       KIND        STATUS     AGE  INFO
⟳ rollouts-demo                            Rollout     ॥ Paused
├──# revision:2
│  └──⧉ rollouts-demo-7cf9dff6bb           ReplicaSet  ✔ Healthy  38s  preview
│     ├──□ rollouts-demo-7cf9dff6bb-cbp2c  Pod         ✔ Running  38s  ready:1/1
│     └──□ rollouts-demo-7cf9dff6bb-fn2gt  Pod         ✔ Running  38s  ready:1/1
└──# revision:1
   └──⧉ rollouts-demo-86c957c6d6           ReplicaSet  ✔ Healthy  5m   stable,active
      ├──□ rollouts-demo-86c957c6d6-72kjf  Pod         ✔ Running  5m   ready:1/1
      └──□ rollouts-demo-86c957c6d6-nv3zg  Pod         ✔ Running  5m   ready:1/1

This is the heart of blue/green. The cluster is now running both versions, but only blue is serving real traffic.
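One way to make that state visible is to compare the hash each Service selects. This is a minimal sketch using the same jsonpath field as the earlier check; selector_hash and rollout_phase are hypothetical helper names, not part of the plugin.

```shell
# selector_hash SERVICE
# Prints the rollouts-pod-template-hash that Argo Rollouts injected into the
# Service's selector.
selector_hash() {
  kubectl get svc "$1" \
    -o jsonpath='{.spec.selector.rollouts-pod-template-hash}'
}

# rollout_phase ACTIVE_SERVICE PREVIEW_SERVICE
# Reports whether both Services currently select the same ReplicaSet.
rollout_phase() {
  local active preview
  active=$(selector_hash "$1") || return 1
  preview=$(selector_hash "$2") || return 1
  if [ "$active" = "$preview" ]; then
    echo "both Services select $active"
  else
    echo "active=$active preview=$preview (waiting for promotion)"
  fi
}
```

Run it as rollout_phase rollouts-demo-active rollouts-demo-preview. With the ReplicaSet hashes from the output above, it would report active=86c957c6d6 preview=7cf9dff6bb (waiting for promotion).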

Step 6: Proving the split with curl

Forward both Services to your laptop on different local ports:

kubectl port-forward svc/rollouts-demo-active 8080:80 >/dev/null 2>&1 &
kubectl port-forward svc/rollouts-demo-preview 8081:80 >/dev/null 2>&1 &
sleep 3

Hit each one using:

echo "active : $(curl -s http://127.0.0.1:8080/color)"
echo "preview: $(curl -s http://127.0.0.1:8081/color)"

Output:

active : "blue"  
preview: "yellow"

This is exactly the window where you’d run smoke tests, point a staging frontend at the preview hostname, or have Argo run an AnalysisTemplate against Prometheus.

Nothing about production traffic has changed yet.
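A smoke test in that window can be as small as repeating the request and insisting on the same answer every time. The function below is a sketch under assumptions: smoke_test is a hypothetical name, the request count is arbitrary, and it relies on /color returning the quoted color string shown in the output above.

```shell
# smoke_test EXPECTED ATTEMPTS FETCH_COMMAND...
# Hypothetical helper: runs FETCH_COMMAND ATTEMPTS times and fails on the
# first error or the first response that differs from EXPECTED.
smoke_test() {
  local expected="$1" attempts="$2" i=1 got
  shift 2
  while [ "$i" -le "$attempts" ]; do
    got=$("$@") || { echo "request $i failed" >&2; return 1; }
    if [ "$got" != "$expected" ]; then
      echo "request $i returned $got, expected $expected" >&2
      return 1
    fi
    i=$((i + 1))
  done
  echo "$attempts/$attempts responses matched $expected"
}
```

While the preview port-forward on 8081 is still running, smoke_test '"yellow"' 20 curl -s http://127.0.0.1:8081/color checks that every preview response comes from the new image before you promote.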

When you’re done with the port-forwards, run the following command:

kill %1 %2 2>/dev/null

Step 7: Promoting

When you’re happy, flip the active Service over with one command:

kubectl argo rollouts promote rollouts-demo

Output:

rollout 'rollouts-demo' promoted

Argo updates the active Service’s selector to the new ReplicaSet hash.

Subsequent requests to the active Service should now return yellow. The old blue pods stick around for scaleDownDelaySeconds (30 seconds here, which is also the default) before being torn down, which is what makes the next section possible.

Confirm the cutover by running:

kubectl argo rollouts status rollouts-demo --timeout 60s

You should see Healthy, and the Service selectors should now agree:

kubectl get svc rollouts-demo-active rollouts-demo-preview \
  -o jsonpath='{range .items[*]}{.metadata.name}{" -> hash="}{.spec.selector.rollouts-pod-template-hash}{"\n"}{end}'

Output:

rollouts-demo-active -> hash=7cf9dff6bb
rollouts-demo-preview -> hash=7cf9dff6bb

Step 8: Rolling back

If something goes wrong after the promotion (a metric tanks, you spot an error in the logs, a teammate flags a bug), undo it by running:

kubectl argo rollouts undo rollouts-demo

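That brings the previous ReplicaSet back as the new “preview” and pauses, waiting for you to confirm with promote once more, which flips the active Service back to it. If the old pods are still being kept warm by scaleDownDelaySeconds, this happens in seconds rather than taking however long your image pull does. Once that 30-second window has passed and the old ReplicaSet has been scaled down, the rollback has to start fresh pods first.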
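Promotion and rollback can be chained so that a failed check reverts the release while the old pods are still warm. The function below is a sketch under assumptions: promote_with_guard is a hypothetical name, the health check is whatever command you pass in, and the undo-then-promote sequence follows the behavior described in this step rather than anything verified against a specific Argo Rollouts version.

```shell
# promote_with_guard ROLLOUT CHECK_COMMAND...
# Hypothetical helper: promotes ROLLOUT, runs CHECK_COMMAND, and rolls back
# if the check fails. For a warm rollback the check has to finish inside
# scaleDownDelaySeconds (30 seconds in this tutorial's manifest).
promote_with_guard() {
  local rollout="$1"
  shift
  kubectl argo rollouts promote "$rollout" || return 1
  if "$@"; then
    echo "check passed, $rollout stays promoted"
    return 0
  fi
  echo "check failed, rolling $rollout back" >&2
  # Per the step above: undo stages the previous revision, promote confirms it.
  kubectl argo rollouts undo "$rollout" &&
    kubectl argo rollouts promote "$rollout"
  return 1
}
```

With a port-forward to the active Service on 8080, promote_with_guard rollouts-demo curl -fsS -o /dev/null http://127.0.0.1:8080/color would roll back if the endpoint stops answering right after the flip. A non-zero exit tells the caller that the release did not stick.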

Step 9: Cleaning up

Once you’re done, you can tear down the demo using:

kubectl delete -f rollout.yaml
helm uninstall argo-rollouts -n argo-rollouts
kubectl delete ns argo-rollouts

How this fits with Octopus Deploy and Argo CD

Everything we’ve done so far works on its own. You’ve got a Rollout, two Services, and a one-command promote/undo loop.

Argo Rollouts is happy to do its job at the cluster level. What it doesn’t have is an opinion on how dev becomes staging and then becomes production, who’s allowed to push the button, or what the deployment history looked like six weeks ago.

This is the layer Octopus Deploy is built for. A good mental model is:

  • Argo Rollouts owns the cluster-side mechanics: Which ReplicaSet is active, which is preview, when to flip, and when to scale down old pods.
  • Argo CD owns the GitOps sync: The Rollout (and its Services) live in a Git repo, and the cluster state is reconciled to match.
  • Octopus owns everything above that: Environments, approval gates, release lifecycles, audit trails, and the self-service UI that developers actually click on.

The promotion path between environments is described once in Octopus and reused across every service, instead of being re-encoded in each team’s CI script.

Ship green, sleep through the night

If you made it this far, you’ve got the cluster-side mechanics of progressive delivery sorted: a Rollout flipping between active and preview Services, a manual promotion gate, and instant rollback. That’s the hard, hands-on layer done.

What’s missing is the orchestration above it, including environments, approvals, audit trails, and the self-service flow your developers actually click. That’s where Octopus Deploy slots in, sitting on top of Argo CD and Argo Rollouts to give you a complete progressive delivery stack across every environment, not just one cluster.

Connect your Argo CD instance to Octopus and see how the whole pipeline comes together, or try Octopus free and wire it up against your own cluster.

Happy deployments!

Announcing Rust 1.96.1

The Rust team has published a new point release of Rust, 1.96.1. Rust is a programming language that is empowering everyone to build reliable and efficient software.

If you have a previous version of Rust installed via rustup, getting Rust 1.96.1 is as easy as:

rustup update stable

If you don't have it already, you can get rustup from the appropriate page on our website.

What's in 1.96.1

Rust 1.96.1 fixes:

It also fixes three CVEs affecting libssh2 (which is compiled into Cargo):

Contributors to 1.96.1

Many people came together to create Rust 1.96.1. We couldn't have done it without all of you. Thanks!
