
345: Damn It… my excuse is now gone for Disaster Recovery


Welcome to episode 345 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, and Matt are in the studio this week and are ready to bring you all the latest in cloud and AI news, including what’s going on between Anthropic, the DOD, and OpenAI, what the war means for Middle East data centers (Spoiler – I hope you have a good Disaster Recovery plan), and Transit Gateway pricing changes that are enough to make a grown man cry. And don’t bother waiting: Matt has completely forgotten almost two years of “bye everybody” and now claims full amnesia as to what his outro is. Oh well. Let’s get into today’s show.

Titles we almost went with this week

  • Claude Learned to Use a Computer Better Than Your Dad **OpenAI
  • Amazon and OpenAI’s $138 Billion AI Bromance
  • When Two AZs Go Dark the Cloud Gets Crispy
  • Fifty Billion Reasons AWS Loves OpenAI Now **Anthropic
  • Azure Still Wins Even When AWS Thinks It Did
  • Fire, Water, and a Multi-AZ Assumption Goes Up in Smoke
  • Claude Refuses to Go Full Skynet for the Pentagon
  • GPT-5.3 Instant Finally Stops Lecturing You
  • No Killer Robots Without Human Approval Please
  • Terraform Finally Sees Your Forgotten Cloud Resources
  • Stage Before You Rage Deploy Azure Firewall
  • CrowdStrike to Zscaler AWS Wants Your Security Tab
  • One Hub to Rule Your API Sprawl
  • Transit Gateway Attachments Just Got Surprisingly Expensive
  • Azure Container Registry Finally Has Room for Your AI Hoarding
  • Bedrock Gets a Roommate OpenAI Moves In
  • Azure Firewall Gets a Safety on the Trigger
  • Stop Writing Scripts, Just Import the Dang Infrastructure
  • Audit Your APIs Before March 2026 Bites You
  • Damn it… my excuse not to DR is gone
  • I’m Epically Furious about DR

AI Is Going Great – Or How ML Makes Money 

03:34 Anthropic acquires Vercept to advance Claude’s computer use capabilities 

  • Anthropic acquired Vercept, a team specializing in AI perception and interaction, to strengthen Claude’s computer use capabilities. 
  • The Vercept founders, including Ross Girshick, bring deep expertise in how AI systems visually interpret and interact with software interfaces.
  • Claude Sonnet 4.6 shows substantial improvement in computer use benchmarks, jumping from under 15% on the OSWorld evaluation in late 2024 to 72.5% today. 
  • The model is now approaching human-level performance on tasks like navigating spreadsheets and completing multi-tab web forms.
  • Computer use enables Claude to operate inside live applications the way a human would, handling multi-step workflows across tools that cannot be automated through code alone. 
  • This is relevant for enterprise use cases involving document processing, browser-based workflows, and cross-application task management.
  • This is Anthropic’s second acquisition in a short period, following the purchase of Bun, which was tied to the Claude Code milestone. The pattern suggests Anthropic is actively acquiring specialized engineering teams rather than just technology assets.
  • For developers and businesses building agentic workflows on Claude, the improved computer use performance means more reliable automation of complex, real-world software tasks without requiring custom integrations or APIs for every application involved.

05:18 Justin – “It seems like every day I have to update Claude Code because they released a new feature or a new capability.” 

12:34 Improving skill-creator: Test, measure, and refine Agent Skills 

  • Anthropic has updated its skill-creator tool for Claude Agent Skills, now available on Claude.ai and Cowork, and as a plugin for Claude Code.
  • The update brings software development practices like testing, benchmarking, and iterative refinement to skill authoring without requiring users to write code.
  • The core addition is an eval framework that lets skill authors define test prompts, describe expected outputs, and verify skill behavior across model updates. 
  • A practical example given is the PDF skill fix, where evals isolated a positioning failure on non-fillable forms and guided a targeted fix.
  • A new benchmark mode tracks eval pass rate, elapsed time, and token usage, and can be integrated into CI systems or local dashboards. Multi-agent parallel eval execution is also included to reduce test time and prevent context bleed between runs.
  • Comparator agents enable A/B testing between two skill versions or between a skill and no skill, with blind judging to reduce bias in assessing whether a change improves output quality.
  • Anthropic notes that as base-model capabilities improve, some capability-uptake skills may become unnecessary, and the eval framework is positioned as a step toward skills being defined by natural-language descriptions of desired outcomes rather than detailed implementation instructions.

13:54 Justin – “For things that are actually in pipelines or agentic capabilities where you want things to be specific, this is great.” 

14:35 Statement on the comments from Secretary of War Pete Hegseth

  • Anthropic has publicly refused to allow Claude to be used for mass domestic surveillance of Americans or fully autonomous weapons, citing concerns about current AI reliability and civil liberties. 
  • These two exceptions led to a breakdown in negotiations with the Department of War after months of discussions.
  • The Department of War is moving to designate Anthropic as a supply chain risk under 10 USC 3252, a designation Anthropic says has previously been reserved for adversaries of the United States. Anthropic has indicated it will challenge any such designation in court.
  • From a practical standpoint, the legal scope of a supply chain risk designation is narrow. It would only affect the use of Claude on Department of War contract work, leaving commercial API customers, Claude.ai users, and non-DoW contractor use cases completely unaffected.
  • This situation raises a broader question for cloud and AI vendors about the terms under which they can negotiate acceptable use policies with government customers. 
  • The outcome could set a precedent for how American companies handle government contracts that conflict with their own usage restrictions.
  • Anthropic notes it has been deployed in US government classified networks since June 2024, making this dispute notable for the AI industry as more frontier model providers pursue federal contracts through programs like FedRAMP and classified cloud environments.

Statement from Dario Amodei on our discussions with the Department of War 

  • Anthropic has publicly refused the Department of War’s requests to remove two specific safeguards from Claude: restrictions on mass domestic surveillance use cases and on fully autonomous weapons systems. 
  • This is notable because Anthropic was already the first frontier AI company to deploy models in US classified networks, National Laboratories, and custom national security configurations.
  • The Department of War has threatened to label Anthropic a “supply chain risk,” a designation previously reserved for US adversaries, and to invoke the Defense Production Act to force removal of these safeguards. Anthropic notes that these two threats are contradictory since one frames Claude as a security risk while the other frames it as essential to national security.
  • The autonomous weapons position has a specific technical basis: Anthropic states current frontier AI systems lack sufficient reliability for fully autonomous target selection and engagement, and they offered to collaborate with the Department on R&D to improve reliability, an offer that was not accepted.
  • For cloud and enterprise listeners, this situation establishes a precedent in which an AI provider publicly declines government contract terms on safety grounds rather than on commercial grounds, with direct implications for how AI vendors structure acceptable use policies in high-stakes government and defense cloud deployments.
  • Anthropic has indicated it will support a smooth transition to another provider if offboarded, signaling that continuity planning for AI-dependent military operations is now a real operational consideration for defense cloud infrastructure teams.

Our agreement with the Department of War 

  • OpenAI signed a classified AI deployment agreement with the Pentagon using a cloud-only architecture, meaning models run on OpenAI infrastructure rather than on edge devices or government-controlled hardware, which is central to how they enforce their safety constraints.
  • The agreement includes three stated red lines: no mass domestic surveillance, no directing autonomous weapons systems, and no automated high-stakes decisions without human approval. 
  • OpenAI retains full control of the safety stack and has cleared personnel embedded with the deployment.
  • The cloud-only deployment model is the key technical differentiator here. By keeping models off edge devices, OpenAI argues it can run and update classifiers independently to verify red lines are not crossed, which would not be possible with on-premise or edge deployments.
  • The contract language locks in current surveillance and autonomous weapons laws as the standard, meaning even if those laws or DoD policies change in the future, usage must still comply with the standards in place at signing. This is a notable contractual mechanism for maintaining guardrails over time.
  • OpenAI requested that the same contract terms be made available to all AI labs, including Anthropic, framing this as an attempt to establish a consistent baseline for how the government engages with frontier AI providers on classified work.

21:04 Justin – “The precedent that could be set, potentially, that the government can declare any vendor they want to a supply chain risk feels like it’s gonna violate several amendments to the Constitution…” 

New Model Section

21:38 Gemini 3.1 Flash Lite: Our most cost-effective AI model yet

  • Google launched Gemini 3.1 Flash-Lite in preview, available through the Gemini API in Google AI Studio and Vertex AI, priced at $0.25 per million input tokens and $1.50 per million output tokens, positioning it as a cost-focused option for high-volume workloads.
  • Compared to 2.5 Flash, the new model delivers 2.5x faster Time to First Answer Token and 45% higher output speed according to Artificial Analysis benchmarks, while scoring 86.9% on GPQA Diamond and 76.8% on MMMU Pro.
  • The model includes configurable thinking levels, letting developers dial reasoning depth up or down depending on task complexity, which is useful for balancing cost and quality across different workload types.
  • Practical use cases highlighted include high-volume content moderation, translation, UI generation, and real-time dashboard creation, with early adopters like Latitude, Cartwheel, and Whering already using it in production.
  • For GCP customers running inference at scale, the combination of low per-token pricing and higher throughput speed makes this a practical option to evaluate against existing model choices in Vertex AI pipelines.

22:09 Google reveals Nano Banana 2 AI image model, coming to Gemini today

  • Google has released Nano Banana 2, technically named Gemini 3.1 Flash Image, which replaces both the standard and Pro variants of the previous Nano Banana model across Gemini, AI Studio, Vertex AI, and Flow simultaneously.
  • The model draws on Gemini 3.1 LLM web knowledge to improve object fidelity and infographic accuracy, and Google claims it delivers text rendering quality comparable to the previous Pro tier at Flash-tier speeds.
  • For developers building multi-character or complex scene workflows, the model supports consistent rendering of up to five characters and up to 14 distinct objects per workflow, with expanded output options ranging from 512px square to 4K widescreen.
  • The full replacement of prior Nano Banana variants means GCP customers on Vertex AI have no migration choice here, so teams relying on the previous Pro model for production workloads should validate outputs against the new model promptly.
  • Pricing details were not disclosed in the announcement, so Vertex AI customers should check the Vertex AI pricing page directly for updated image generation costs tied to the Gemini 3.1 Flash Image model.

22:32 Justin – “I’m excited to plug this one into our show cover generator; I’ve been using Nano Banana 1, and if you’ve checked out our show covers lately, you’ve noticed they’ve become fun cartoons based on our show titles.”  

22:54 GPT-5.3 Instant: Smoother, more useful everyday conversations 

  • OpenAI released GPT-5.3 Instant as the new default model in ChatGPT, available to all users today and to developers via the API as gpt-5.3-chat-latest, with GPT-5.2 Instant remaining available for paid users until June 3, 2026.
  • The update targets conversational quality issues that benchmarks typically miss, specifically reducing unnecessary refusals, moralizing preambles, and overly cautious responses that users flagged as frustrating in GPT-5.2 Instant.
  • Hallucination rates show measurable improvement: 26.8% reduction in high-stakes domains like medicine, law, and finance when using web search, and 19.7% reduction using internal knowledge only, based on OpenAI’s internal evaluations.
  • Web search integration is notably improved, with the model now balancing retrieved results against its own reasoning rather than defaulting to link lists, producing more synthesized and immediately usable answers.
  • Developers should note this is a drop-in update to the existing model endpoint, meaning applications using gpt-5.3-chat-latest will automatically get the improved behavior, which could affect any downstream applications that relied on the previous refusal or response patterns.

25:07 Matt – “Testing the models before you roll them out into production. One of the things… how do you actually test these models and prove they’re working? And a lot of customers and questionnaires all require measurable statistics.” 

AWS 

27:58 Amazon DC Impacted in Operation Epic Fury 

  • Two simultaneous outages hit AWS Middle East regions on March 1-2, with ME-CENTRAL-1 (UAE) suffering physical fire damage to a data center that knocked out two of three availability zones, and ME-SOUTH-1 (Bahrain) experiencing a localized single-AZ power failure.
  • The UAE incident demonstrated a critical edge case where S3, normally resilient to single-AZ loss, began failing for ingest and egress once a second AZ went down, highlighting that multi-AZ redundancy assumptions break down when two zones are simultaneously unavailable.
  • Recovery timelines extended beyond 24 hours in both regions due to the need for physical facility repairs, cooling system restoration, and coordination with local authorities, underscoring that some failure modes fall outside software-level remediation.
  • AWS recommended customers fail over to EU regions for ME-CENTRAL-1 workloads, restore from EBS snapshots in unaffected regions, and use the allow-reassociation flag to migrate Elastic IPs to healthy AZs (a minimal sketch of that last step follows this list), which are standard DR playbook steps that many customers may not have pre-tested.
  • This incident is a practical reminder that multi-AZ deployments alone are insufficient for high-availability requirements in smaller regions with fewer AZs, and that cross-region DR plans with tested failover procedures are necessary for critical workloads.
  • Directly from Status Page: Due to the ongoing conflict in the Middle East, both affected regions have experienced physical impacts to infrastructure as a result of drone strikes. In the UAE, two of our facilities were directly struck, while in Bahrain, a drone strike in close proximity to one of our facilities caused physical impacts to our infrastructure. Finally, even as we work to restore these facilities, the ongoing conflict in the region means that the broader operating environment in the Middle East remains unpredictable. We recommend that customers with workloads running in the Middle East consider taking action now to back up data and potentially migrate your workloads to alternate AWS Regions
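
For the Elastic IP step, here is a minimal sketch of what the allow-reassociation call looks like with boto3. This is illustrative only; the region, instance ID, and allocation ID are placeholders, and your own DR runbook will differ.

```python
# Sketch: moving an Elastic IP to a replacement instance in a healthy AZ.
# The region, instance ID, and allocation ID below are placeholder values.
import boto3

ec2 = boto3.client("ec2", region_name="me-central-1")

response = ec2.associate_address(
    InstanceId="i-0123456789abcdef0",           # replacement instance in a healthy AZ
    AllocationId="eipalloc-0123456789abcdef0",  # the existing Elastic IP allocation
    AllowReassociation=True,                    # let the EIP move off the impaired instance
)
print(response["AssociationId"])
```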

29:38 Justin – “This is a real big deal because as our show title said tonight… DR is going to become a real big deal now. If you’re in the business where you need to host data for other customers across the globe, your job just got a lot harder.” 

37:26 Amazon invests $50B in OpenAI, deepens AWS partnership with expanded $100B cloud deal 

  • Amazon is making a $50 billion investment in OpenAI as part of a $110 billion funding round that also includes SoftBank and NVIDIA, valuing OpenAI at $730 billion pre-money. 
  • Separately, OpenAI and AWS are expanding their existing cloud agreement by $100 billion over eight years, which analysts estimate could add roughly $17 billion annually to AWS revenue.
  • A key technical component of the deal is OpenAI committing to consume 2 gigawatts of capacity on Amazon’s Trainium chips, giving AWS a high-profile validation of its in-house AI silicon at a scale that helps justify Amazon’s $200 billion capital expenditure plan for 2026.
  • AWS and OpenAI will co-create a Stateful Runtime Environment delivered through Amazon Bedrock, allowing enterprise customers to build AI agents that retain context and handle complex multi-step tasks, with AWS serving as the exclusive third-party cloud distribution provider for OpenAI Frontier.
  • Microsoft retains exclusivity over stateless OpenAI API calls, meaning simple one-and-done AI requests still route through Azure, while Amazon is positioning AWS as the infrastructure layer for stateful, context-aware, and agent-based workloads where the compute intensity and revenue potential are substantially higher.
  • Amazon also maintains its existing partnership with Anthropic, meaning AWS customers now have access to models from two of the leading AI labs, which broadens the options available through Bedrock without requiring customers to commit to a single model provider.

41:29 Justin – “I am more and more convinced every day that we are in an AI bubble. I do not see how they’re going to generate the revenues required to cover the capital investments that all of these cloud providers are making.” 

43:18 AWS Security Hub Extended offers full-stack enterprise security with curated partner solutions

  • AWS Security Hub Extended is a new plan that bundles curated third-party security tools from partners like CrowdStrike, Okta, Splunk, Zscaler, and Proofpoint directly into the Security Hub console, covering endpoint, identity, email, network, and cloud security in one place.
  • AWS acts as the seller of record for all partner solutions, meaning customers get a single consolidated bill, pre-negotiated pay-as-you-go pricing, and no long-term commitments, which removes the overhead of managing separate vendor contracts.
  • All security findings from both AWS native services and partner tools are normalized using the Open Cybersecurity Schema Framework (OCSF) and automatically aggregated in Security Hub, making cross-environment threat correlation more straightforward.
  • Enterprise Support customers get unified Level 1 support across all participating solutions, which reduces the friction of figuring out which vendor to contact when issues span multiple tools.
  • The Extended plan is generally available now across all commercial AWS regions where Security Hub is supported, with both consumption-based and flat-rate pricing options available at aws.amazon.com/security-hub/pricing.

44:11 Justin – “Thank you, Amazon. It’s only taken you 10 years to get to this point – because this is cool. Build partnerships with your security vendors, standardize the inputs, and make connections for those things so they all connect together, and if I can do all that through my cloud vendor, who I already have commitments with? I think that’s fantastic.” 

Quick Hits

45:41 AWS announces pricing for VPC Encryption Controls 

  • This is just a pricing announcement, but the pricing is crazy.
  • VPC Encryption Controls exits free preview on March 1, 2026, introducing a fixed hourly charge per non-empty VPC with the feature enabled in either monitor or enforce mode, with no charge for empty VPCs.
  • The feature offers two operational modes: monitor mode audits for unencrypted traffic flows, while enforce mode actively blocks resources that would allow unencrypted traffic within or across VPCs in a region.
  • A notable billing consideration is that enabling encryption support on a Transit Gateway triggers standard VPC Encryption Controls charges for all attached VPCs, regardless of their individual encryption mode setting, even if those VPCs are empty.
  • For compliance-focused organizations, this feature provides a centralized mechanism to audit and enforce encryption-in-transit across VPC traffic flows, which is a common requirement in regulated industries like finance and healthcare.
  • Customers should audit how many non-empty VPCs they plan to enable this on before March 1, 2026, and pay close attention to Transit Gateway attachment costs, as those charges can accumulate across a large number of attached VPCs. Detailed regional pricing is available on the VPC pricing page.

46:00 Matt – “Go cry a little bit.” 

48:03 Policy in Amazon Bedrock AgentCore is now generally available

  • Policy in Amazon Bedrock AgentCore is now generally available, giving security and compliance teams a way to define and enforce tool access rules for AI agents without touching agent code, which is a meaningful separation of concerns for enterprise governance.
  • The natural language to Cedar conversion is a practical feature, letting non-developers author policies that automatically translate to the AWS open-source policy language, lowering the barrier for ops and compliance teams to participate in agent governance.
  • The AgentCore Gateway acts as an inline policy enforcement point, intercepting agent-tool traffic and evaluating each request before allowing or denying access, which mirrors familiar patterns from API gateway and service mesh architectures.
  • The feature is available across 13 AWS regions at launch, including major US, European, and Asia Pacific regions, giving organizations with data residency requirements reasonable coverage from day one.
  • Pricing details are not specified in the announcement, so teams evaluating this for production workloads should review the AgentCore pricing page and documentation at docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy.html before planning deployments.

49:27 Ryan – “I like the Cedar natural language processing, but I wonder how practical it is to write policies that allow agent-to-agent and tool communication.” 

GCP

57:07 Combat API sprawl using Apigee API hub

  • Apigee API hub now integrates directly with API Gateway to automatically synchronize API definitions, OpenAPI specs, and gateway configurations in near real-time, giving platform teams a single control plane for APIs spread across multiple gateways and platforms.
  • The new specification boost add-on, currently in public preview, uses AI to scan API specs for gaps like missing usage examples or undefined error codes, then generates an enhanced parallel version labeled specboost-draft without overwriting the original, so teams can compare before adopting.
  • The core problem being addressed is that incomplete or undocumented APIs cause AI agents to fail at function calling or miss APIs entirely, so centralizing and enriching specs directly improves agent reliability in agentic workflows.
  • Both features are available now, with API Gateway users seeing onboarding prompts directly in the console. 
  • Pricing details for the spec boost add-on are not specified in the announcement, so teams should check the Add-on management section of the API hub for current cost information.
  • Organizations running legacy specless proxies with no documentation stand to benefit most immediately, as the spec boost add-on can generate documentation for APIs that currently have none, making them visible to both developers and automated tools.

52:08 Matt – “Any undocumented API is always a problem, whether you’re using it or one team uses something they don’t know, or a client finds that should be a dark API that is public, and that always becomes a problem. So, a way to centralize that and kind of help address API sprawl in general is a great thing and will make people’s lives so much better.”

52:41 Improve chatbot memory using Google Cloud 

  • Google Cloud’s polyglot storage approach for chatbot memory combines Memorystore for Redis, Cloud Bigtable, and BigQuery to handle short, mid, and long-term conversation history, respectively, addressing a common scaling challenge for conversational AI applications.
  • Memorystore for Redis handles the hot layer with sub-millisecond latency using Redis Lists and RPUSH commands, while Bigtable serves as the durable mid-term store using a user_id#session_id#reverse_timestamp key pattern to enable efficient range scans across millions of simultaneous sessions (a short sketch of both patterns follows this list).
  • Bigtable’s garbage collection policies allow teams to retain only recent data, such as the last 60 days, in the high-performance tier, while older data flows asynchronously to BigQuery via Pub/Sub and Dataflow for archival and analytics without impacting live application performance.
  • Cloud Storage handles unstructured multimedia artifacts using a URI pointer strategy with signed URLs, keeping the primary databases lean while maintaining secure, time-limited access to files generated or uploaded during conversations.
  • This architecture is relevant to any team building production-scale agentic applications on Vertex AI Agent Builder, particularly in industries like customer service, healthcare, and financial services, where maintaining accurate long-term conversation context is a compliance or user experience requirement. Pricing varies across each component based on storage volume and query usage.
  • Ryan loves this almost as much as he loves The Eagles.
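
As a rough illustration of the hot layer and the Bigtable row key pattern described above, here is a short Python sketch. The hostnames, IDs, and helper names are assumptions made for the example, not Google's reference implementation.

```python
# Sketch: short-term chat history in Redis plus a mid-term Bigtable row key.
# Hostnames, IDs, and naming conventions here are illustrative assumptions.
import json
import time

import redis

r = redis.Redis(host="memorystore-host", port=6379)

def append_turn(user_id: str, session_id: str, message: dict) -> None:
    # Hot layer: RPUSH the latest turn onto a per-session Redis list.
    r.rpush(f"chat:{user_id}:{session_id}", json.dumps(message))

def bigtable_row_key(user_id: str, session_id: str) -> str:
    # Mid-term layer: user_id#session_id#reverse_timestamp, so the newest
    # turns sort first and per-session range scans stay cheap.
    reverse_ts = (2**63 - 1) - int(time.time() * 1000)
    return f"{user_id}#{session_id}#{reverse_ts}"

append_turn("u42", "s7", {"role": "user", "content": "What is my order status?"})
print(bigtable_row_key("u42", "s7"))
```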

Quick Hits

55:42 Spanner columnar engine in preview 

  • Spanner columnar engine is now in preview, adding columnar storage alongside traditional row-based storage to enable analytical query acceleration of up to 200x on live operational data without impacting transactional workloads. 
  • This addresses the longstanding trade-off between OLTP and analytical performance in a single horizontally scalable system.
  • The engine uses vectorized execution to process data in batches rather than row-by-row, and Spanner automatically routes large-scan analytical queries to the columnar representation. 
  • A new major compaction API also lets users manually trigger the conversion of existing data into columnar format.
  • A key use case is reverse ETL from Iceberg lakehouses, where processed analytical data from BigQuery, Databricks, Snowflake, or Oracle Autonomous AI Lakehouse gets loaded into Spanner for sub-second, high-concurrency serving. This targets scenarios like real-time dashboards, AI agent features, and user-facing applications that need low-latency access to precomputed insights.
  • The BigQuery integration is notably bidirectional, supporting federated queries via external datasets, reverse ETL pushes from BigLake Iceberg tables into Spanner, and live CDC streaming from Spanner back into BigQuery and BigLake Iceberg via Datastream. Oracle GoldenGate 26ai also now supports direct replication into Spanner.
  • The feature is available in preview and can be enabled on existing Spanner tables via a DDL change, with benchmark queries available on GitHub. 
  • Pricing follows standard Spanner node pricing, with no separate cost structure announced for the columnar engine specifically.

55:52 Justin – “If you don’t know anything about columnar databases, you don’t know how cool that is.” 

Azure

57:31 Announcing new public preview capabilities in Azure Monitor pipeline

  • Azure Monitor pipeline now supports TLS and mutual TLS for TCP-based ingestion endpoints in public preview, allowing teams to encrypt data in transit and enforce mutual authentication without relying on external proxies or custom gateways. 
  • This is particularly relevant for regulated environments and edge deployments where plain TCP ingestion no longer meets security requirements.
  • The new execution placement configuration gives Kubernetes users direct control over how pipeline instances are scheduled across nodes, addressing practical problems like port exhaustion, multi-tenant isolation, and availability zone distribution. 
  • Notably, if the cluster cannot satisfy placement rules, the pipeline simply will not deploy, making failures predictable rather than silent.
  • Data transformations allow teams to filter, aggregate, and normalize telemetry before it reaches Azure Monitor, including converting raw syslog or CEF messages into standardized schemas using KQL templates. This addresses the cost and complexity of ingesting high-volume noisy data and cleaning it up after the fact.
  • All three capabilities are in public preview today and target organizations running Azure Monitor pipeline on on-premises infrastructure, edge locations, and large Kubernetes clusters. 
  • Pricing is not separately detailed for these features, so costs would follow existing Azure Monitor ingestion and data processing rates, which vary by volume.

58:38 Matt – “It’s their ETL pipeline service… that’s kind of why this is a big deal.” 

59:43 Microsoft Sovereign Cloud adds governance, productivity, and support for large AI models securely running even when completely disconnected

  • Microsoft has expanded its Sovereign Cloud offering with three new capabilities targeting organizations that need to operate in fully disconnected environments: Azure Local disconnected operations, Microsoft 365 Local disconnected, and large model support in Foundry Local. 
  • These are aimed at government, defense, and regulated industries where external connectivity may be intentionally restricted or prohibited.
  • Azure Local disconnected operations allow organizations to run infrastructure with Azure governance and policy controls without any cloud connectivity, meaning management and workload execution stay entirely within customer-operated environments. This is now generally available worldwide, though pricing is not publicly listed and would depend on hardware and licensing configurations.
  • Microsoft 365 Local disconnected brings Exchange Server, SharePoint Server, and Skype for Business Server into the sovereign private cloud boundary, with Microsoft committing support for these workloads through at least 2035. This extends productivity capabilities to teams operating in air-gapped or isolated environments without requiring a cloud connection.
  • Foundry Local now supports large multimodal AI models running on-premises using NVIDIA GPU infrastructure, enabling local inferencing entirely within customer-controlled data boundaries. This moves beyond the small model support Foundry Local previously offered and is currently available to qualified customers rather than broadly.
  • The overall architecture is designed to span connected, hybrid, and fully disconnected modes under a consistent governance model, which reduces the operational complexity of managing separate toolsets for different connectivity scenarios. 
  • Organizations considering this stack should evaluate hardware requirements carefully, given the GPU dependencies for AI inferencing workloads.

57:25 Best Practice: Using Self-Signed Certificates with Java on Azure Functions 

Winner of the dumbest feature of the week: 

  • Java developers on Azure Functions Linux who connect to services secured by self-signed certificates frequently encounter SSL handshake errors because the JVM only trusts well-known Certificate Authorities by default. The recommended fix is creating a custom truststore in the persistent /home directory and pointing the JVM to it via JAVA_OPTS application settings.
  • The core reason to use /home for the truststore rather than system JVM directories is that the Linux Functions file system is ephemeral, meaning any changes outside /home are wiped on restart, scaling, or platform updates. Storing the keystore at a path like /home/site/wwwroot/my-truststore.jks ensures it survives those events.
  • One practical deployment gotcha worth noting is that ZipDeploy or Run From Package configurations can overwrite /home/site/wwwroot contents during code deployments, so storing the .jks file in a separate directory like /home/my-certs/ is a safer long-term choice.
  • Azure Functions Linux behaves differently from Azure App Service Linux in a notable way: App Service startup scripts often auto-import platform-managed certificates into the JVM keystore, but Functions does not, meaning OS-level tools like curl may succeed while Java code still throws handshake errors.
  • For teams that prefer not to manage server-side keystore files, two code-based alternatives exist: loading an Azure-managed certificate from /var/ssl/certs via custom SSLContext code, or bundling a locally built JKS file inside the application JAR. Both require application code changes, which adds maintenance overhead compared to the JAVA_OPTS approach.

1:03:46 Justin – “This is just a way for you to troubleshoot certificates even worse than you were troubleshooting it before.” 

Quick Hits

1:05:19 Announcing general availability of Azure Intel® TDX confidential VMs 

  • Azure has moved its Intel TDX confidential VMs to general availability, using 5th Gen Intel Xeon processors to provide hardware-enforced isolation that protects data while in use, which addresses a longstanding barrier for organizations running sensitive workloads in the cloud. Notably, existing applications can be deployed without any code changes.
  • The new VM series (DCesv6, DCedsv6, ECesv6, ECedsv6) introduces NVMe local SSD support as a first for Azure confidential VMs, delivering roughly 5x more throughput and about 16% lower latency compared to the previous SCSI generation, with IO latency reduced by approximately 27 microseconds.
  • These VMs are the first in Azure confidential compute to use the open-source OpenHCL paravisor, which increases transparency and allows customers to cryptographically verify workload integrity rather than simply trusting the cloud operator. 
  • The open-source component is available at github.com/microsoft/openvmm.
  • Intel AMX acceleration is built in, making these VMs suited for confidential AI workloads such as protecting model weights and running cross-organization AI pipelines without exposing underlying data. 
  • Azure Boost support adds up to 205k IOPS, 4 GB/s remote storage throughput, and 40 Gbps network bandwidth.
  • General availability is currently limited to the West US and West US 3 regions, with support for Windows Server 2025 and Ubuntu 22.04 and 24.04. Pricing is not specified in the announcement, and customers can request preview access in additional regions at aka.ms/acc/v6preview.

1:10:10 Generally Available: Draft & Deploy on Azure Firewall

  • Azure Firewall Policy now supports a two-phase Draft and Deploy workflow, meaning teams can stage policy changes before committing them, which reduces the risk of unintended disruptions during updates.
  • Previously, any policy change triggered a full firewall deployment, which could cause delays and service interruptions. 
  • This feature separates the authoring phase from the deployment phase, giving teams more control over when changes go live.
  • The feature is particularly useful for organizations with strict change management processes, as it allows multiple edits to be batched and reviewed before a single deployment is executed, rather than deploying each change individually.
  • This is now generally available, so production workloads can rely on it. Azure Firewall Policy pricing remains consumption-based, and customers should check the Azure Firewall pricing page at azure.microsoft.com for current rates, as costs vary by policy tier and region.
  • Teams managing complex or high-traffic environments will benefit most, since reducing the frequency of full deployments directly translates to fewer maintenance windows and more predictable firewall behavior.

1:10:27 Azure Container Registry Premium SKU Now Supports 100 TiB Storage

  • Azure Container Registry Premium SKU now supports up to 100 TiB of storage, a 2.5x increase from the previous 40 TiB cap, with no configuration changes required for existing registries to benefit automatically.
  • The increase directly addresses a real operational pain point where enterprises were splitting workloads across multiple registries just to stay under limits, adding complexity to access control and networking that had nothing to do with actual business requirements.
  • AI and ML workloads are a clear driver here, as teams storing large model artifacts, training outputs, and inference containers were consuming registry capacity faster than anticipated, alongside normal container workload growth.
  • Microsoft also improved geo-replication data sync speeds for new replicas and added a storage consumption view in the Azure Portal Monitoring tab, two improvements that had been customer requests for some time.
  • The 100 TiB limit is exclusive to Premium SKU, so teams on Basic or Standard tiers will need to upgrade to access it, though Premium also includes geo-replication, private endpoints, and enhanced throughput. 
  • Pricing details for Premium SKU storage are available at the Azure Container Registry pricing page.

1:10:47 Ryan – “So now instead of two windows container images you can store FOUR.” 

1:13:37 New Azure API management service limits 

  • Azure API Management is rolling out updated resource limits starting March 2026, aligning classic tier limits with v2 tier limits across entities like API operations, tags, products, and subscriptions. This affects all service tiers in a phased rollout over several months.
  • Existing classic tier customers whose usage exceeds the new limits will be grandfathered in, with their limits set 10% above observed usage at the time the new limits take effect. 
  • New services and those under the new thresholds will be subject to the updated limits immediately.
  • Limit increase requests will only be considered for Standard, Standard v2, Premium, and Premium v2 tiers, with Premium customers receiving priority. Requests are evaluated case by case and are not guaranteed, so teams relying on high resource counts should audit their usage now.
  • Before requesting a limit increase, Microsoft recommends reviewing the Manage Resources Within Limits documentation at learn.microsoft.com, as some increases can introduce latency or affect service capacity. 
  • This is a practical reminder that limits exist to protect shared infrastructure performance, not just to restrict usage.
  • Pricing for API Management tiers varies, with the Developer tier starting around $0 for testing and the Premium tier running substantially higher for production workloads. Customers on lower tiers, like Consumption or Developer, cannot request limit increases, so production workload planning should account for tier selection early.

Closing

And that is the week in the cloud! Visit our website, theCloudPod.net, the home of The Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions. You can also tweet at us with the hashtag #theCloudPod.





Download audio: https://episodes.castos.com/5e2d2c4b117f29-10227663/2391099/c1e-9202f2oknqh421p0-25096n60cd5n-efnuoq.mp3

From Tailnet to platform (Interview)


Adam talks with Tailscale co-founder and Chief Strategy Officer David Carney about where Tailscale is headed next: TSIDP, TSNet, multiple tailnets, and Aperture. They get into clickless auth (via TSIDP), TSNet apps, multiple tailnets for isolation and control, and Aperture, Tailscale’s private AI gateway for API key management, observability, and agent security.

Join the discussion

Changelog++ members save 8 minutes on this episode because they made the ads disappear. Join today!

Sponsors:

  • Augment Code – Adam loves “Auggie” – Augment Code’s CLI that brings Augment’s context engine and powerful AI reasoning anywhere your code goes. From building alongside you in the terminal to any part of your development workflow.
  • NordLayer – Toggle-ready network security for modern businesses. Get an exclusive offer: up to 22% off NordLayer yearly plans plus 10% on top with the coupon code changelog-10-NORDLAYER. Try it risk-free with a 14-day money-back guarantee at nordlayer.com/thechangelog
  • Squarespace – Turn your expertise into a business with the all-in-one platform for websites, services, and getting paid. Use code CHANGELOG to save 10% on your first website purchase.
  • Fly.io – The home of Changelog.com — Deploy your apps close to your users — global Anycast load-balancing, zero-configuration private networking, hardware isolation, and instant WireGuard VPN connections. Push-button deployments that scale to thousands of instances. Check out the speedrun to get started in minutes.

Show Notes:

Send an email to David ~> aperture@tailscale.com

Something missing or broken? PRs welcome!





Download audio: https://op3.dev/e/https://pscrb.fm/rss/p/https://cdn.changelog.com/uploads/podcast/679/the-changelog-679.mp3

WW 974: DIY Crocs - Project Helix Details From GDC 2026


From bug-busting AI that's transforming Firefox to personal coding breakthroughs, the team breaks down how practical applications are cutting through skepticism and reshaping developer workflows. Plus, hear why lighter Patch Tuesdays are refreshing from time to time!

Windows 11

  • Patch Tuesday's familiar list of updates: Network speed test, Camera tilt and pan controls, sysmon, RSAT improvements, Quick Machine Recovery improvements, WEBP support for desktop wallpaper, Emoji 16.0, etc.
  • It's been a light year so far for Patch Tuesday features - that's a good thing
  • New builds for Canary, Dev, and Beta late last week. Canary gets nothing new; Dev and Beta get Administrator Protection, Drag Tray refinements, File Explorer improvements, and fixes
  • Android 16 QPR3 brings Desktop Mode to Android devices - and a hands-on with Pixel phones and tablets shows the way forward for Android-based laptops later this year
  • Intel has new gaming processors for creators and gamers, and they look excellent and are inexpensive

AI and dev

  • Copilot Cowork is literally Claude Cowork in Microsoft 365 - "Wave 3" for Microsoft 365 Copilot begins with a lot of agentic features, in private preview at first
  • Google Docs, Sheets, Slides, and Drive get big Gemini updates for consumers and Workspace customers
  • Mozilla partners with Anthropic to use AI to find bugs, and it's paying off nicely
  • Visual Studio Code moves to a weekly update schedule
  • .NET 11 Preview 2 is here

Xbox and gaming

  • Microsoft starts talking up the next Xbox console! It's called Project Helix and, yes, it will run Windows games
  • New Xbox Mode is on the way
  • Project Helix dev kits to game makers in 2027
  • Satya Nadella explains why he/Microsoft are "long" on gaming
  • Gaming is a core identity for Microsoft alongside platforms, developers, and knowledge workers

Tips and picks

  • Tip of the week: Nostalgia with a purpose
  • App pick of the week: Stardock Clairvoyance
  • RunAs Radio this week: SQL Server in 2026 with Bob Ward
  • Brown liquor pick of the week: Canadian Centennial Rye Whisky

Hosts: Leo Laporte, Paul Thurrott, and Richard Campbell

Download or subscribe to Windows Weekly at https://twit.tv/shows/windows-weekly

Check out Paul's blog at thurrott.com

The Windows Weekly theme music is courtesy of Carl Franklin.

Join Club TWiT for Ad-Free Podcasts!
Support what you love and get ad-free audio and video feeds, a members-only Discord, and exclusive content. Join today: https://twit.tv/clubtwit






Download audio: https://pdst.fm/e/pscrb.fm/rss/p/mgln.ai/e/294/cdn.twit.tv/megaphone/ww_974/ARML4920567314.mp3

Introducing Fireworks AI on Microsoft Foundry: Bringing high performance, low latency open model inference to Azure


Across industries, organizations are increasingly standardizing on open models to gain greater control over performance, cost, customization, and the security and compliance required for enterprise deployment. Open models give teams the flexibility to choose the right architecture for each workload and avoid lock‑in to a single model provider as their needs evolve.

As adoption grows, however, performance alone is no longer enough. Teams need a consistent way to evaluate models quickly, operate them safely in production, and improve them over time without rebuilding infrastructure or fragmenting their tooling. Too often, organizations are forced to assemble bespoke serving stacks, slowing innovation and making it harder to scale and compound progress.

Microsoft Foundry is designed to address this challenge. It serves as a unified system of record and enterprise control plane for AI, bringing together models, agents, evaluation, deployment, and governance into a single experience. With Microsoft Foundry, teams can move from experimentation to production with confidence, using the models and frameworks that best fit their requirements, while relying on a consistent operational foundation.

Today, we’re announcing the public preview of Fireworks AI on Microsoft Foundry, bringing high‑performance open model inference into Azure. This integration reflects Microsoft Foundry’s broader direction: providing a single place where developers can not only run open models efficiently but also customize and operationalize them as part of a complete enterprise‑ready AI lifecycle.

Fireworks AI models on Microsoft Foundry: A single place for open models

Fireworks AI delivers industry-leading inference for open models, and Microsoft Foundry is what makes that performance usable at enterprise scale. Accessing Fireworks AI through Microsoft Foundry gives teams a single, trusted control plane to evaluate, deploy, customize, and operate open models alongside the rest of their AI stack.

As open models mature, customization increasingly extends beyond training. Teams need consistent ways to configure, deploy, optimize, govern, and iterate on models in production without fragmenting tools or infrastructure. Microsoft Foundry provides the environment where these customization and operational workflows are standardized, while Fireworks AI supplies the performance and efficiency needed to run open models at scale. This means teams can move from experimentation to production using open models without stitching together separate tools, contracts, and deployment paths.

Together, Fireworks AI and Microsoft Foundry enable a more complete and sustainable approach to working with open models, combining fast, efficient inference with a platform designed to support enterprise open model operations over time.

With Fireworks AI on Foundry, developers can get access to best-in-class inferencing for open models, including optimized deployments for custom weight models. Fireworks AI is a market leader for high-performance inference for open models. Its engine already runs at internet scale, processing over 13T tokens daily, sustaining about 180 thousand requests per second, and generating over 1,000 tokens per second on large models, substantiated by leading benchmark performance on Artificial Analysis. This performance is now available on Foundry.

Developers can log into Foundry and access these open models with Fireworks AI today:

  • DeepSeek V3.2
  • OpenAI gpt-oss-120b
  • Kimi K2.5
  • MiniMax M2.5 (new)

This brings a new open model (MiniMax M2.5) to Foundry with serverless support and offers optimized inference for already popular open models.

With Fireworks AI in Microsoft Foundry, developers can:

  • Evaluate models faster with day‑zero access and support: Start building immediately with access to state-of-the-art open models from Fireworks AI through a single Azure endpoint via Foundry.
  • Optimize inference: Requests to open models are served by Fireworks’ high‑throughput inference stack for fast performance with Azure‑grade governance.
  • Run the models you already trust: With bring-your-own-weights (BYOW), you can upload and register quantized or fine‑tuned weights trained elsewhere without changing the serving stack.
  • Choose the right pricing model for your workload: Use serverless, pay-per‑token inference to experiment securely and quickly with Data Zone Standard or choose provisioned throughput units (PTUs) for predictable, steady-state performance with base or custom models. Whether you’re optimizing for agility or efficiency, you get flexibility without managing infrastructure.
  • Operate with enterprise trust and scale: We are committed to enabling customers to build production-ready AI applications quickly, while maintaining the highest levels of safety and security. Foundry provides an end-to-end workspace for agent development, evaluation, and deployment, including unified governance, observability, and agent-ready tooling.

The future of Fireworks and AI use cases

Microsoft Foundry is evolving to support the full lifecycle of open models—from early evaluation through production operation and ongoing optimization. As teams scale their use of open models, having a consistent, enterprise‑ready foundation becomes increasingly important.

By integrating Fireworks AI into Microsoft Foundry, developers gain access to high‑performance inference today while building on a platform designed to support deeper customization and enterprise operations over time. This approach gives teams the confidence to adopt open models not just for what they can do now, but for how they can grow, adapt, and operate reliably as their AI ambitions expand. We’re looking forward to seeing how developers and enterprises use Fireworks AI on Microsoft Foundry to power the next generation of intelligent applications.

To get started:

  1. Go to Microsoft Foundry models and select Fireworks AI open models in the model catalog collection.
  2. Select the open model hosted by Fireworks.
  3. View the model card.
  4. Select your deployment option—serverless or PTU—and deploy (a minimal call sketch follows).
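
For a rough sense of what a call to that deployment looks like from Python, here is a minimal sketch using the azure-ai-inference package. The endpoint, key, and deployment name are placeholders; use the values shown on the model card in Foundry.

```python
# Sketch: calling a serverless open-model deployment on Foundry from Python.
# The endpoint URL, API key, and deployment name below are placeholders.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-foundry-resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential("<your-api-key>"),
)

response = client.complete(
    model="<your-fireworks-model-deployment>",  # the deployment created in step 4
    messages=[
        SystemMessage(content="You are a concise assistant."),
        UserMessage(content="Summarize what an open model is in one sentence."),
    ],
)
print(response.choices[0].message.content)
```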

Learn more about Fireworks on Microsoft Foundry

The post Introducing Fireworks AI on Microsoft Foundry: Bringing high performance, low latency open model inference to Azure appeared first on Microsoft Azure Blog.


Tracking Agent Metrics


Agent Economics Dashboard

Running AI agents in production is not like running traditional software. Token costs accumulate continuously, latency spikes are unpredictable, and the messaging infrastructure that connects agents, subagents, and MCP servers all needs to perform reliably. Without observability, you are flying blind.

RockBot and its entire ecosystem — subagents, the research agent, MCP servers, and the internal messaging pipeline — are all instrumented with OpenTelemetry (OTel). This means every LLM request, every token consumed, every message dispatched, and every bit of latency is tracked and exported to the observability stack running in my Kubernetes cluster.

In my case, I host all the RockBot related services in my Kubernetes cluster, so this blog post focuses on what I’ve done. However, OTel is supported by all major cloud vendors and environments, and nearly any modern instrumentation or monitoring software works with it. As a result, the RockBot framework’s OTel support means that it plugs into Azure, AWS, or almost any other cloud seamlessly - in a manner similar to what I have set up in Kubernetes.

The OTel Stack in Kubernetes

The Kubernetes cluster runs a standard cloud-native observability stack:

  • OpenTelemetry Collector — receives metrics, traces, and logs from all instrumented services and routes them to the appropriate backends
  • Prometheus — scrapes and stores the time-series metrics data
  • Grafana — provides dashboards and alerting on top of the collected data

Every component in the RockBot ecosystem—the primary agent, any spawned subagents, the research agent, and the various MCP servers—emits OTel metrics. This gives a unified, aggregated view across the entire agentic system rather than having to piece together logs from individual services.

What Gets Instrumented

Instrumentation falls into broad categories: LLM economics, agent usage, messaging pipeline health, and operational health.

For example:

LLM Economics

The economics dashboard captures everything related to the cost and efficiency of LLM calls:

  • Cost rate ($/hr) and total cost (window) — how much is being spent on LLM inference right now and over a rolling window
  • Avg cost per turn — the average spend per agent conversation turn, a useful signal for understanding task complexity trends
  • Token consumption rate — input and output tokens per minute, broken down by model
  • Token efficiency — the output/input token ratio over time; a rising ratio can indicate the agent is generating increasingly verbose responses
  • LLM calls per turn — how many LLM invocations happen per agent turn, which helps identify whether subagent or tool orchestration is driving up call counts

At a glance, the current average cost per turn is $0.1461, with roughly 3.5 LLM calls per turn on average and a total of 6.21 agent turns tracked during the window.
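
To make the instrumentation concrete, here is a minimal OpenTelemetry Python sketch of how metrics like these can be emitted. The metric names, attributes, and values are my own illustrative assumptions rather than RockBot's actual instrument names, and the MeterProvider/Collector export wiring is configured separately.

```python
# Sketch: emitting token, latency, and cost metrics with the OTel metrics API.
# Metric names, attributes, and values are illustrative assumptions; a real
# deployment also configures a MeterProvider that exports to the Collector.
from opentelemetry import metrics

meter = metrics.get_meter("rockbot.llm")

input_tokens = meter.create_counter(
    "llm.tokens.input", unit="{token}", description="Input tokens consumed"
)
output_tokens = meter.create_counter(
    "llm.tokens.output", unit="{token}", description="Output tokens generated"
)
request_latency = meter.create_histogram(
    "llm.request.duration", unit="s", description="LLM request latency"
)
turn_cost = meter.create_histogram(
    "llm.turn.cost", unit="USD", description="Estimated cost per agent turn"
)

# Record once per LLM call / agent turn; Prometheus and Grafana do the rest.
attrs = {"model": "example-model", "agent": "rockbot"}
input_tokens.add(21_500, attributes=attrs)
output_tokens.add(900, attributes=attrs)
request_latency.record(2.3, attributes=attrs)
turn_cost.record(0.15, attributes=attrs)
```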

LLM Request Throughput and Latency

Agent Usage Dashboard

The usage dashboard digs into the raw mechanics of LLM calls:

  • LLM requests/min — the request rate across all agents and services, showing traffic spikes as agents become active
  • LLM request latency (p50/p95) — response times from the LLM backends, with percentile breakdowns to surface tail latency issues
  • Input/output tokens (window) — rolling token totals by model
  • Avg tokens/request — currently sitting around 22.4K tokens per request, reflecting the context window sizes in use

Latency and throughput together tell you whether the LLM routing layer is performing well. If p95 latency climbs while request rate is low, that’s a signal to investigate the upstream model providers or the routing-stats MCP server.

Messaging Pipeline

RockBot uses an internal messaging pipeline to coordinate between the primary agent, subagents, and the various background services. The messaging section of the usage dashboard tracks:

  • Message throughput (published/min) — how many messages are flowing through the pipeline
  • Pipeline dispatch latency — how long it takes from a message being enqueued to it being dispatched (p50/p95)
  • Active in-flight messages — the current backlog of unprocessed messages
  • Messaging publish latency — the time to write a message to the pipeline
  • Messaging process latency — the time from pickup to completion on the consumer side

This is particularly useful for spotting backpressure. If in-flight messages climb while throughput stays flat, something downstream is stalling—whether that’s a subagent blocked on a slow tool call, or an MCP server under load.

⚠️ I know people tend to favor using the HTTP protocol because it is well-understood and deterministically synchronous. In reality though, a queued messaging system is far more resilient, cheaper, and easier to manage. RockBot supports both models, but I almost always default to queued messaging when given a choice.

Alerting

Having dashboards is only half the value. The real payoff is alerts. Grafana alert rules fire on conditions like:

  • Cost rate exceeding a threshold (unexpected runaway agent behavior)
  • LLM request latency p95 spiking (model provider degradation)
  • Message pipeline backlog growing beyond a threshold (subagent stalls)
  • Token consumption rate anomalies (prompt injection or unexpected task expansion)

Alerts land in whatever notification channel you configure—Slack, PagerDuty, email, or any other Grafana-supported contact point.

Why This Matters

Observability for agentic systems isn’t optional—it’s a prerequisite for running them reliably at any scale. A single misconfigured tool or a prompt that causes an agent to loop can silently burn through token budget before anyone notices. An MCP server with degraded performance can cause cascading latency throughout the entire agent ecosystem.

By instrumenting everything with OTel and aggregating it in Grafana, you get:

  1. Cost visibility — know what you’re spending and catch runaway costs early
  2. Performance baselines — understand normal latency so anomalies stand out
  3. Pipeline health — ensure the messaging backbone connecting your agents is functioning correctly
  4. Audit trail — metrics data supports post-incident analysis when something goes wrong

The RockBot ecosystem is designed so that every new agent, every new MCP server, and every new subsystem emits OTel metrics by default. As the system grows—more agents, more MCP integrations, more automation—the observability grows with it automatically.

If you’re building production agentic systems, treat OTel instrumentation as a first-class requirement, not an afterthought.


Temporary rollback: build identities can access Advanced Security: read alerts again


If you use build service identities like Project Collection Build Service to call Advanced Security APIs, the Advanced Security permission changes in Sprint 269 broke that. We restricted API access for build identities as a security improvement, but we failed to give early notice to customers who relied on that access for various automations.

We’re rolling it back temporarily. The restriction will be re-enforced on April 15, 2026.

What you should do

Action is required. The recommended path is a service principal with Advanced Security: Read alerts permissions for your Advanced Security-enabled repositories. Scope it narrowly, and if the service principal isn’t committing code, it won’t consume an Advanced Security committer license.
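
For illustration only, here is a sketch of what read-only alert retrieval with a service principal can look like in Python. The organization, project, and repository names are placeholders, and the exact endpoint and api-version should be confirmed against the Advanced Security REST API documentation.

```python
# Sketch: listing Advanced Security alerts with a service principal token.
# Org/project/repo and the api-version are placeholder assumptions; confirm
# the exact endpoint and version against the Advanced Security REST API docs.
import requests
from azure.identity import ClientSecretCredential

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<service-principal-app-id>",
    client_secret="<client-secret>",
)
# 499b84ac-1321-427f-aa17-267ca6975798 is the Azure DevOps resource ID.
token = credential.get_token("499b84ac-1321-427f-aa17-267ca6975798/.default").token

url = (
    "https://advsec.dev.azure.com/<org>/<project>/_apis/alert/"
    "repositories/<repository>/alerts?api-version=7.2-preview.1"
)
resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
for alert in resp.json().get("value", []):
    print(alert.get("alertId"), alert.get("severity"), alert.get("title"))
```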

Status checks in Sprint 272

We’re also shipping status checks soon, which give teams a native way to gate on security posture without API-driven alert mutations from pipeline identities.

Screenshot: Azure DevOps status checks

This won’t replace every automation scenario, though it enables pull request-time blocking on the presence of high and critical alerts.

Have feedback or hitting gaps moving to a service principal? Let us know.


Action required by April 15: move API automation to a service principal with Advanced Security: Read alerts or watch for status checks in Sprint 272.

The post Temporary rollback: build identities can access Advanced Security: read alerts again appeared first on Azure DevOps Blog.
