The demand for AI data centers has reached a fever pitch - stressing the power grid and people's nerves. Does putting those data centers in space make any sense?
Join Richard Campbell as he explores the challenges and potential of putting data centers into orbit. The balance of cost, performance, latency, power, cooling, and reliability makes for a difficult mix - but the potential for the future is significant!
Ryan welcomes Cricket Liu, DNS expert and Chief Evangelist at Infoblox, to the show to talk all things DNS. They cover the evolution of one of the oldest DNS server implementations, BIND, and what the future holds for protected DNS configurations; the realities of security threats like DDoS and DNS spoofing; and why outages often trace back to a lack of understanding of DNS's fundamental role.
A Microsoft MVP Blog post on Visual Studio Live!'s longevity arrives as the 2026 conference series continues with upcoming stops at Microsoft HQ, San Diego and Orlando.
Join our next community call on June 23, 2026, to explore Microsoft 365 Copilot Cowork and learn how it can really help you get stuff done.
Reminder: Our community calls are in the Teams webinar format, so you must register to receive the link to join. The join link will be sent to you by email with your webinar registration confirmation.
The calls will still start at 5 minutes past the hour for both sessions (at 8:05 AM and 5:05 PM PT) and will still end at the top of the hour (9:00 AM and 6:00 PM PT, respectively).
While our calls are open to everyone, you must be a member of the Microsoft 365 Champion Program in order to access the presentation materials - the access link is in the initial welcome email and the monthly newsletter emails sent the week before the community calls.
If you have not yet joined our Champion community, sign up here to get access to the monthly newsletters, calendar invites, and program assets (e.g., the presentations).
This week, we discuss the Fable ban, SpaceX's $60B Cursor buy, and why Lovable wins when AI picks your stack. Plus, Europeans are at the World Cup and already drank Boston dry.
Welcome to episode 358 of The Cloud Pod, where the weather is always cloudy!
Justin, Matt, and Ryan (who, rumour has it, was working on an Eagles music podcast) are in the studio this week to bring you all the latest in AI and cloud news (and begging for an AI spend limit increase), including Anthropic wanting everyone (except themselves) to slow down AI development, GitHub's insane number of commits, and even an announcement from CoreWeave, plus so much more. Let's get started!
Titles we almost went with this week
Stop Configuring Domains One by One Like a Peasant
SSH Into Your AI Agent Like It's 1999
Your AWS Bill Finally Has an AI Babysitter
Stop Blaming Engineering, the AI Will Do It Now
GPU Queue Anxiety Meet Your Serverless Spark Therapist
One Wildcard Certificate to Rule All Subdomains
One PTU Reservation to Rule All Regions
Twelve Billion Parameters Walk Into a Laptop
Squeezing Gemma 4 Until the Bits Cry
Azure Cobalt 200 VMs Are Really Arm-ed and Dangerous
AI has gone all Fables and Myth
Arm-ed she blows: but probably not to a region near you
Dash to change your password as Dashlane gets owned
Siri AI shows just how slow Gemini is
AI Announces going public, and then spreads Myths about AI development
A big thanks to this week's sponsors:
There are many cloud cost management tools out there, but only Archera provides insured commitments. It sounds fancy, but it's really simple. Archera gives you the cost savings of a 1 or 3-year AWS Savings Plan with a commitment as short as 30 days. If you do not use all the cloud resources you have committed to, Archera will literally cover the difference. Other cost management tools may say they offer "insured commitments", but remember to ask: Will you actually give me my rebate? Because Archera will.
GitHub's scale challenge has grown substantially beyond earlier projections.
The platform processed 1 billion commits in all of 2025, but now handles 1.4 billion commits per month, with AI agents alone generating over 17 million pull requests monthly.
The technical remediation work has shifted from surface-level scaling to architectural rebuilding. GitHub has addressed MySQL contention, moved webhooks off MySQL entirely, rewritten the GitHub Actions job dispatch system, and is migrating performance-sensitive code from its Ruby monolith to Go.
GitHub's migration to Microsoft Azure, previously reported as a capacity move, is now described as a deeper infrastructure overhaul.
The goal is service isolation so that a degraded subsystem like Actions does not cascade failures to Git or other core services.
Microsoft is providing engineering support from teams with experience scaling systems at comparable load levels, which represents a more direct operational involvement than what was previously discussed.
New feature releases like the Copilot CLI app are being developed outside the core GitHub.com infrastructure, which GitHub says allows continued product work without adding risk to the systems currently under remediation.
03:0 Ryan - "I'd actually like to see AI coding take this up a little bit, because I think it is a ridiculous sort of growth that I don't think is sustainable, and so much of vibe-coded garbage is really bloated… But there are definitely functionality things that it can do a lot more efficiently, and doesn't."
The Iceberg v3 support adds VARIANT for semi-structured data, row lineage, deletion vectors, nanosecond timestamps, and geospatial types, closing gaps that previously made Iceberg impractical for many workloads.
Horizon Catalog now supports full bidirectional read and write access from external engines like Spark, Trino, and PyIceberg via vended credentials, meaning teams can define governance policies once in Snowflake and have them enforced across any compatible engine without data migration.
Zero-copy integrations with SAP (GA), Salesforce, and Workday (private preview) bring enterprise system data into Snowflake without ETL pipelines, preserving semantic context so AI agents reason over current, governed data rather than stale copies.
Managed Iceberg replication and failover are coming soon to GA, with an Optimized Refresh feature in public preview showing 1.6x to 22x faster replication performance in preview testing, which directly reduces Recovery Point Objectives for mission-critical workloads.
Horizon Context and Semantic View Autopilot (GA) address the semantic fragmentation problem by automatically generating and maintaining shared business definitions across databases, BI tools, and data pipelines, giving AI agents a consistent semantic layer rather than conflicting definitions across systems.
Snowflake announced CoCo at Summit 2026, expanding it from an AI coding agent into a full AI development platform with a native desktop app for Windows and macOS, Cloud Agents running inside Snowsight, an Agent SDK, and upcoming Slack and mobile integrations.
Each Cloud Agent session provisions an isolated container that can run Python, shell commands, dbt builds, and web searches with no local setup required.
On the ADE-Bench framework from dbt Labs, CoCo achieved a 72.1% pass rate on real-world data engineering tasks, outperforming both Claude Code and OpenAI Codex, which each scored 65.1%.
CoCo also used 51% fewer tokens and completed tasks 8% faster than Claude Code on Opus 4.7, attributed to targeted data exploration and native tool integrations with Snowflake, dbt, and Airflow instead of bash-based workflows.
The CoCo Agent SDK packages the same agentic engine as an installable library for JavaScript and Python, giving developers programmatic access to Snowflake querying, SQL execution, codebase search, and file editing without building that infrastructure themselves. This allows platform engineers to embed CoCo into CI/CD pipelines, backend services, and internal tools.
Governance is enforced at the infrastructure level, with every CoCo operation running under the user's existing Snowflake RBAC, LLM inference staying within Snowflake's security perimeter, and full prompt logging, query tagging, and audit trails available for admin oversight.
This addresses a common gap where generic coding agents generate plausible-looking code that fails in governed production environments.
Snowflake rebranded Snowflake Intelligence to CoWork, positioning it as a personal work agent for knowledge workers that combines proactive task automation, multi-agent orchestration, and persistent memory across sessions.
The system moves beyond reactive Q&A toward background monitoring, scheduled analysis, and direct action in tools like Slack, Gmail, Salesforce, and Jira via MCP connectors.
The upcoming Cortex Sense context layer is a notable technical addition, automatically learning business definitions from query history, dashboards, and metadata without manual configuration. Internal testing showed 83% accuracy on complex enterprise queries with Cortex Sense enabled, compared to 47% without it and 23% for frontier coding agents using Snowflake MCP.
Deep Research, moving to general availability soon, uses a multi-agent swarm orchestration system to analyze both structured and unstructured enterprise data in parallel, outperforming single-agent systems by over a third on Snowflake's Hybrid Deep Research Benchmark.
This allows users to get fully cited analytical reports in minutes on questions that previously required days of analyst work.
Several features are entering public or private preview, including Memory for persistent user preferences, User Skills for recording reusable multi-step workflows, Async Agent API for long-running background tasks, and an iOS mobile app for full CoWork access on the go.
The governance model is worth noting for enterprise buyers: every agent action is scoped by role-based access controls, admin-defined policies determine what agents can do autonomously versus what requires human approval, and a complete audit trail logs all actions with policy reasons.
10:29 Justin - "I assume Anthropic will be suing them any moment for trademark infringement, but nice to see that you're getting some smartness for the data friends who desperately need all the DevOps help they can get. So I appreciate they're getting these tools."
Anthropic published a blog post calling on major AI labs to consider slowing development, citing the risk of recursive self-improvement, where AI systems could enhance their own capabilities without human intervention. Co-author Jack Clark estimated this could occur within two years.
The proposal draws a direct parallel to nuclear arms control, suggesting a global agreement and verification regime. Anthropic noted a key challenge: training runs are far easier to conceal than missile silos, raising practical questions about enforcement.
The call for a slowdown comes as Anthropic reported an annualized revenue run rate on track for $50 billion by the end of June 2026, up from $9 billion at the end of 2025, and filed confidential IPO paperwork at a valuation near $1 trillion.
Critics, including David Sacks, characterized the move as regulatory capture, arguing that established players advocating for development restrictions could disadvantage newer or smaller competitors in the AI space.
For cloud practitioners, the broader implication is that compute governance and training run transparency may become compliance considerations, particularly if international frameworks modeled on arms treaties gain traction among governments.
16:41 Ryan - "This has been what people have been sort of warning for ages with AI development, and this isn't anything new. I'm surprised by the timing of it because it doesn't make sense to me that they're doing this now, but this is a huge concern. And I know just from trying to secure workloads in my day job, you try to put human-in-the-loop flows in place, but you know, people don't really want to be in the loop. The whole advantage of using AI is the velocity gains. So having a human that does all the approval is problematic."
Anthropic launched Claude Fable 5 for general availability and Claude Mythos 5 for restricted access, both priced at $10 per million input tokens and $50 per million output tokens, which is less than half the cost of the previous Mythos Preview model.
Fable 5 is the general-use version with safety classifiers active, while Mythos 5 is the same underlying model with certain safeguards lifted for vetted cybersecurity and biology partners.
The models introduce a tiered safety classifier system that automatically routes flagged requests in cybersecurity, biology/chemistry, and distillation categories to Claude Opus 4.8 instead of refusing outright.
Anthropic reports this fallback triggers in fewer than 5% of sessions, and external red-teaming found zero successful universal jailbreaks on harmful cyber queries across 30 different public jailbreak techniques.
On the software engineering side, Stripe reported Fable 5 completed a codebase-wide migration across a 50-million-line Ruby codebase in one day, a task estimated to take a full team over two months manually. The model also scores highest among frontier models on Cognition's FrontierCode evaluation for production-quality coding standards.
Mythos 5 demonstrated autonomous scientific research capabilities, including outperforming a recently published Science journal model on a genomics task despite being 100 times smaller, and accelerating drug design workflows roughly ten times in internal protein design testing.
Anthropic is requiring 30-day data retention for all Mythos-class model traffic, including on third-party surfaces, specifically to detect novel jailbreaks and cross-request attacks, with explicit commitments not to use this data for model training.
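For a rough sense of what the quoted per-token prices mean in practice, here is a minimal sketch. The only inputs taken from the notes are the two prices; the function name and the example workload are hypothetical, and real bills may include discounts or charges not covered here.

```python
# Prices quoted in the notes above, in USD per million tokens.
INPUT_PRICE_PER_MILLION = 10.0
OUTPUT_PRICE_PER_MILLION = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 200,000 tokens in, 20,000 tokens out.
    print(f"${estimate_cost_usd(200_000, 20_000):.2f}")
```

Output tokens cost five times as much as input tokens at these prices, so long generated responses tend to dominate the total.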
23:34 Matt - "I would also say you gotta get the foundation of your house set up. So if you are patching, it's not that you're patching, it's how you're patching… I don't want somebody, to use a very simple example, I have fifty EC2 instances or VMs, and to do patching, I can't have somebody log into fifty VMs. That's not sustainable, and that's not gonna work. Ryan in security here will check the box saying you are doing patching, but I've wasted three people's days on this. But if you build it out so that each thing is an auto scaling group and everything else, which is where you're going with the CI/CD stuff, and you build that proper workflow out, then patching is just release the new image."
Attackers exploited Dashlane's device enrollment API by brute-forcing six-digit one-time tokens sent to user email addresses, successfully registering new devices on fewer than 20 accounts and downloading encrypted vaults before automated lockouts stopped the campaign.
The attack highlights a known tradeoff in OTP-based authentication: six-digit numeric codes have only one million possible values, making them vulnerable to brute force if rate-limiting and lockout mechanisms are not sufficiently aggressive.
Downloaded vaults remain encrypted and unreadable without the user's master password, which Dashlane never stores, so the practical risk to affected users depends entirely on the strength of their master password.
This incident is a useful case study for developers building device enrollment or account linking flows, as it demonstrates how API endpoints handling authentication tokens need strict rate limiting, anomaly detection, and account lockout thresholds to prevent automated abuse.
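The arithmetic behind the six-digit code space, and why a hard attempt limit matters, can be sketched as follows. This is an illustration only: the `AttemptLimiter` class, its threshold of five failures, and the account names are assumptions, and nothing here reflects how Dashlane actually implements enrollment.

```python
import hmac

CODE_SPACE = 10 ** 6  # six decimal digits give 1,000,000 possible codes


def guess_probability(attempts: int, code_space: int = CODE_SPACE) -> float:
    """Chance that `attempts` distinct guesses include the single valid code."""
    return min(attempts, code_space) / code_space


class AttemptLimiter:
    """Locks an account's enrollment after too many wrong codes (hypothetical policy)."""

    def __init__(self, max_failures: int = 5) -> None:
        self.max_failures = max_failures
        self.failures = {}  # account name -> consecutive wrong guesses

    def is_locked(self, account: str) -> bool:
        return self.failures.get(account, 0) >= self.max_failures

    def check(self, account: str, submitted: str, expected: str) -> bool:
        """Return True only for a correct code on an account that is not locked."""
        if self.is_locked(account):
            return False
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(submitted.encode(), expected.encode()):
            self.failures.pop(account, None)
            return True
        self.failures[account] = self.failures.get(account, 0) + 1
        return False


if __name__ == "__main__":
    for attempts in (5, 1_000, 100_000):
        print(attempts, guess_probability(attempts))
```

With a cap of five failures per code, an attacker's odds per account stay at five in a million; without a cap, the odds grow linearly with the number of requests the endpoint will accept.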
30:55 Ryan - "And right now, it's the strength of that master password. But with quantum encryption, it's going to be able to break through the algorithm generally."
HashiCorp Boundary addresses a growing security gap where AI agents need infrastructure access, but traditional IAM models were designed for human users with predictable access patterns.
The core value is giving each AI agent a unique identity with just-in-time credentials rather than static long-lived secrets.
Boundary's credential injection feature means AI agents never directly handle or see credentials at any point during a session.
When paired with HashiCorp Vault, it generates short-lived dynamic credentials that expire after use, which limits the blast radius if an agent or orchestration layer is compromised.
The session-focused control plane enforces identity-aware authorization at the connection layer before infrastructure access is established, rather than relying on application-layer gateways. This means the entire network is abstracted away from agents, and all connections route through a Boundary proxy so only authorized identities can establish sessions.
The incident response use case in the article is worth noting because it shows each discrete action getting its own ephemeral session account that is deactivated once its purpose is fulfilled. This means standing privileges are continuously revoked rather than persisting across an agent's entire operational lifetime.
Complete session recording and audit logging give security teams the ability to replay and review every action an AI agent took, tied to a specific operator, intent, and timeframe.
This addresses the compliance challenge organizations face when they cannot see or verify what autonomous agents are doing across their infrastructure.
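The idea of short-lived, per-session credentials described above can be sketched in a few lines. This is not the Boundary or Vault API; `EphemeralCredential` and its fields are hypothetical names used only to show how an expiry and an explicit revoke limit how long a leaked secret is useful.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EphemeralCredential:
    """A credential tied to one agent and one short time window (hypothetical model)."""

    agent_id: str
    ttl_seconds: float
    issued_at: float = field(default_factory=time.monotonic)
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))
    revoked: bool = False

    def is_valid(self, now: Optional[float] = None) -> bool:
        """Valid only while unrevoked and inside the TTL window."""
        current = time.monotonic() if now is None else now
        return not self.revoked and (current - self.issued_at) < self.ttl_seconds

    def revoke(self) -> None:
        """Deactivate the credential as soon as its task is finished."""
        self.revoked = True


if __name__ == "__main__":
    cred = EphemeralCredential(agent_id="incident-agent-1", ttl_seconds=60.0)
    print(cred.is_valid())  # fresh credential inside its TTL
    cred.revoke()
    print(cred.is_valid())  # revoked once the action is done
```

In the setup the notes describe, the agent would never hold `token` at all, since the proxy injects it; the sketch only models the lifetime rules.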
37:52 Ryan - "I'm so annoyed by this because they're like, this is rethinking an age of agentic AI. No, this is what we should do for all authentication, not just AI. It doesn't treat anything about AI. It doesn't identify AI agents. And it's just setting up a user within HashiCorp Boundary and then assigning that user to an agentic AI, just like a human. So this doesn't actually address anything agentic. And these things should be patterns we need to be moving to in general."
Amazon Cognito now supports multi-region replication, automatically synchronizing user profiles, credentials, and pool configurations from a primary region to a secondary region of your choice.
This eliminates the need for custom-built replication solutions that previously created security risks and operational overhead.
The feature is read-only on the secondary side, meaning authentication continues during failover, but new user registrations and profile updates are unavailable. Teams should note that Lambda triggers, WAF configurations, and log streaming must be manually configured in the target Region separately.
A notable requirement is that customers must configure a multi-region customer-managed KMS key before enabling replication, and OIDC issuer endpoints must be updated across all client applications, including mobile app store resubmissions. This upfront migration work is a practical consideration before adoption.
Pricing is an add-on to the existing Essentials and Plus tiers, costing $0.0045 per monthly active user per replica Region on Essentials and $0.006 on Plus, with M2M authentication adding a 30% surcharge on standard token pricing. The feature is available across 20-plus Regions spanning North America, Europe, Asia Pacific, and South America.
For regulated industries like healthcare and financial services, the companion customer-managed keys feature provides encryption control that can help meet compliance requirements, and is available in a broader set of regions, including AWS GovCloud.
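To make the replication add-on pricing concrete, here is a minimal estimate using only the two per-user rates quoted above. The function name and the example user count are hypothetical, and the sketch ignores the M2M surcharge, base Cognito charges, and any free tier.

```python
# Add-on rates quoted in the notes, in USD per monthly active user per replica Region.
REPLICA_PRICE_PER_MAU = {"essentials": 0.0045, "plus": 0.006}


def replication_addon_usd(tier: str, monthly_active_users: int, replica_regions: int = 1) -> float:
    """Estimate the monthly replication add-on for a user pool (hypothetical helper)."""
    if tier not in REPLICA_PRICE_PER_MAU:
        raise ValueError(f"unknown tier: {tier}")
    return REPLICA_PRICE_PER_MAU[tier] * monthly_active_users * replica_regions


if __name__ == "__main__":
    # Hypothetical pool with 100,000 monthly active users and one replica Region.
    for tier in REPLICA_PRICE_PER_MAU:
        print(tier, round(replication_addon_usd(tier, 100_000), 2))
```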
43:54 Matt - "… it's just a nice quality of life improvement to actually get this out."
Amazon Cognito now supports an inbound federation Lambda trigger that intercepts federated authentication responses from external identity providers before user attributes are written to the user pool, giving developers programmatic control over attribute transformation, filtering, and enrichment.
For B2B and SaaS applications, the trigger solves a practical problem: enterprise SAML providers send hundreds of group memberships that exceed Cognito's 2,048-character attribute limit, enabling developers to filter and normalize groups without coordinating changes with customer IT departments.
For B2C applications, the trigger enables automatic account linking across multiple sign-in methods by matching federated email addresses to existing local Cognito accounts, preventing duplicate user records when customers forget they already registered with a different provider.
The trigger runs on every federated sign-in rather than only at initial account creation, which means linking logic and attribute transformations apply continuously, and developers always have access to the latest IdP attributes.
Key implementation constraints to note: the Lambda function must complete within 5 seconds, errors in the function can block authentication for legitimate users, and automatic email-based account linking will not work with Apple's Hide My Email feature.
The trigger is available now across all regions where Cognito is supported, with no separate pricing beyond standard Cognito and Lambda costs.
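The group-filtering use case described above comes down to a small piece of logic that could live inside such a trigger. The sketch below shows only that logic. The notes do not give the trigger's event shape, so no Lambda handler is shown, and the `fit_groups` helper, the prefix rule, and the comma separator are all assumptions.

```python
MAX_ATTRIBUTE_CHARS = 2048  # attribute limit cited in the notes


def fit_groups(groups, allowed_prefixes, separator=",", limit=MAX_ATTRIBUTE_CHARS):
    """Keep only relevant groups and stop before the joined string exceeds the limit."""
    prefixes = tuple(allowed_prefixes)
    kept = []
    length = 0
    for group in groups:
        name = group.strip()
        # Drop groups the application does not care about, and duplicates.
        if not name.startswith(prefixes) or name in kept:
            continue
        extra = len(name) + (len(separator) if kept else 0)
        if length + extra > limit:
            break
        kept.append(name)
        length += extra
    return separator.join(kept)


if __name__ == "__main__":
    idp_groups = ["app-admins", "hr-all-staff", " app-users ", "app-admins"]
    print(fit_groups(idp_groups, allowed_prefixes=["app-"]))
```

Because the notes say function errors can block sign-in and the function must finish within 5 seconds, logic like this is best kept free of network calls and wrapped so that a bad input degrades to an empty group list rather than an exception.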
AWS Step Functions now supports AI agent reasoning steps via an optimized integration with Amazon Bedrock AgentCore harness, currently in preview, allowing you to embed configurable AI agents directly into visual workflows without managing the underlying agent loop infrastructure.
Practical use cases include document classification, unstructured form extraction, and multi-agent pipelines where agents run in parallel or sequence with optional human approval gates at critical decision points.
Per-invocation overrides for model, system prompt, and tools let teams reuse a single harness configuration across different workflow contexts, and a session ID parameter enables agent context persistence within or across workflow executions.
Observability is built in through workflow execution history showing agent input, output, token usage, and duration, with links to detailed agent turn logs in Amazon CloudWatch for auditing every decision.
The integration is available in four regions (US East N. Virginia, US West Oregon, Europe Frankfurt, Asia Pacific Sydney) and follows standard Step Functions pricing with no additional integration charges, though standard Amazon Bedrock and AgentCore pricing still applies for model inference.
48:25 Ryan - "You know I lust over state machines, so I find it funny because this is all I think about when I'm putting an agent workflow together. This would be so much easier in a state machine. And so now they've done it. I will absolutely use this so much, because it's something I already kind of do with lambda functions. It's just now that I won't have to define the logic as specifically. It'll just be like four pages of markdown in my lab."
This is particularly useful for developers running coding agents like Claude Code or Amazon Kiro, allowing them to inspect files, run ad-hoc commands, and debug environment state as if working in a local terminal, with persistent state for environment variables and working directory across commands.
Each shell session is identified by a runtime session ID and shell ID, enabling manual reconnection after network drops, and a single agent runtime supports up to 10 concurrent shells for watching agents work across multiple branches simultaneously.
The CLI entry point is straightforward: agentcore exec --it --runtime followed by the runtime ARN, lowering the barrier for developers already familiar with standard terminal workflows to adopt the feature.
Pricing details are not specified in the announcement, so teams evaluating this feature should check the AgentCore Runtime pricing page directly before building workflows that depend on concurrent shell sessions at scale.
52:46 Matt - "Somebody needed it to debug some environment variable or working directory, and they were like, we could just quickly do this thing because it's running ECS under the hood. We'll just literally change the CLI call from AWS ECS exec to AWS Agent Core exec, and we've added a whole new feature, guys."
AWS Cost Explorer now includes an "Analyze with Amazon Q" button that generates automatic cost analysis covering trends, top drivers, and anomalies based on whatever filters and time period you have configured, eliminating the need to manually cross-reference multiple data points.
The feature adapts its output based on the date range selected, providing historical analysis for past periods, forecasts for future dates, or a combined view for mixed ranges, and maintains conversation context so you can ask follow-up questions to dig deeper.
This continues AWS's pattern of embedding Amazon Q capabilities directly into existing console tools rather than requiring users to switch contexts, similar to integrations seen in services like CloudWatch and Security Hub.
From a practical standpoint, this is available in all commercial AWS Regions at no additional charge, meaning customers already using Cost Explorer can access it without budget considerations, though standard Cost Explorer usage costs still apply.
The most immediate use case is for teams doing monthly or quarterly cost reviews who previously had to manually build narratives around their spend data, as Q can now generate that explanation automatically as a starting point for optimization conversations.
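The date-range behavior described above (historical, forecast, or combined) can be pictured as a simple classification. This is a sketch of the idea only, not AWS's logic: the `analysis_mode` function and the choice to treat any range that includes today as "combined" are assumptions.

```python
from datetime import date


def analysis_mode(start: date, end: date, today: date) -> str:
    """Classify a reporting range relative to today (hypothetical boundaries)."""
    if start > end:
        raise ValueError("start must not be after end")
    if end < today:
        return "historical"  # the whole range is in the past
    if start > today:
        return "forecast"    # the whole range is in the future
    return "combined"        # the range straddles today


if __name__ == "__main__":
    today = date(2026, 6, 15)
    print(analysis_mode(date(2026, 5, 1), date(2026, 5, 31), today))
    print(analysis_mode(date(2026, 7, 1), date(2026, 7, 31), today))
    print(analysis_mode(date(2026, 6, 1), date(2026, 6, 30), today))
```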
54:10 Matt - "That will forever be my goal in life - understand what's an EC2 other."
AWS FinOps Agent is now in preview at no additional charge, offering an AI-driven tool that answers cost questions, surfaces optimization recommendations, and runs scheduled FinOps workflows directly from the AWS Management Console.
The agent integrates with AWS Cost Optimization Hub and AWS Compute Optimizer to surface rightsizing, idle resource, and Savings Plans recommendations, and can automatically open Jira tickets to route action items to engineering teams.
Automated anomaly investigation is a notable capability here, where the agent detects cost spikes, investigates root cause, and posts findings to Slack without requiring manual triage from FinOps or engineering staff.
Preview availability is limited to US East (N. Virginia) for the agent itself, though cost and usage data cover all standard AWS Regions, excluding GovCloud and China regions.
Teams currently spending significant time on manual cost reporting and anomaly triage are the most likely to benefit, as the agent can generate reports for finance teams and handle recurring workflows on a user-defined schedule.
55:02 Justin - "This is kind of nice. I don't know if it's a full-featured solution for everybody, but it's definitely something that's gonna help you get started."
Google released Gemma 4 12B, a multimodal model that runs locally on consumer hardware with 16GB of VRAM, positioning it between the smaller E4B and the larger 26B MoE model in the Gemma 4 family.
The model uses an encoder-free architecture, meaning vision inputs are processed through a lightweight embedding module and audio is projected directly into the same dimensional space as text tokens, reducing memory usage and latency compared to traditional separate encoder approaches.
Gemma 4 12B is the first mid-sized Gemma model to support native audio input, and it includes Multi-Token Prediction drafters to reduce inference latency for agentic workloads.
For GCP customers, the model can be deployed through Model Garden, Cloud Run with GPU support, and GKE, giving teams flexibility in how they operationalize it in production environments.
Google released Quantization-Aware Training checkpoints for Gemma 4, which integrates quantization directly into the training process rather than applying it afterward, resulting in better quality preservation compared to standard Post-Training Quantization approaches.
The mobile-specialized quantization schema reduces the Gemma 4 E2B model to under 1GB of memory by combining static activations, channel-wise quantization, targeted 2-bit compression for token generation layers, and embedding plus KV cache optimization.
For desktop and server use cases, QAT checkpoints are available in Q4_0 format with GGUF files ready for llama.cpp and compressed tensors for vLLM, with weights downloadable directly from Hugging Face at no cost for the model weights themselves.
Developers can selectively deploy only the modalities they need, such as text-only without audio or vision encoders, which further reduces the memory footprint and makes the models practical for constrained edge environments using Google's LiteRT-LM runtime or Transformers.js for web deployment.
The release supports fine-tuning of QAT weights through Hugging Face Transformers and Unsloth, and also preserves the inference speedup from Multi-Token Prediction when using MTP QAT checkpoints, giving developers flexibility to optimize for both quality and throughput simultaneously.
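A back-of-the-envelope calculation shows why 4-bit checkpoints matter for the 16GB VRAM target mentioned above. The sketch assumes a nominal 12 billion parameters (taken from the model name, not a published count) and about 4.5 bits per weight for Q4_0, which is the storage cost of llama.cpp's block layout (32 four-bit weights plus one 16-bit scale). It counts weights only, so KV cache and activations come on top.

```python
def weight_memory_gb(parameters: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes), excluding runtime overhead."""
    return parameters * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    nominal_params = 12e9  # assumption based on the "12B" name
    for label, bits in (("16-bit", 16.0), ("Q4_0 (about 4.5 bits)", 4.5)):
        print(f"{label}: {weight_memory_gb(nominal_params, bits):.2f} GB")
```

Under these assumptions the 16-bit weights alone would not fit in 16GB, while the 4-bit version leaves room for the KV cache, which is the practical point of shipping QAT checkpoints in this format.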
58:17 Ryan - "These are things we need Jonathan for."
Google announced that Apple developers can now access cloud-hosted Gemini models through Apple's Foundation Models framework via the Firebase Apple SDK, starting with iOS 27, macOS 27, and related platforms.
The integration allows developers to swap between on-device Apple models and cloud-hosted Gemini models using the same API surface, which simplifies building agentic app experiences.
The integration is built on Firebase AI Logic, which removes the need to build and maintain a separate backend server for Gemini model access.
Firebase App Check is included to protect service APIs from abuse, addressing a common production security concern.
Gemini is also being integrated directly into Xcode as an agentic coding assistant for multi-step development tasks like code review, bug fixing, and feature building. Authentication supports both individual developers using a self-serve Gemini API key from Google AI Studio and enterprise teams using the Gemini Enterprise Agent Platform for dedicated quotas and data privacy controls.
Pricing has two tiers: individual developers can start with a free tier through Google AI Studio at ai.google.dev, while enterprise developers access dedicated corporate quotas through the Gemini Enterprise Agent Platform.
The preview release of the Foundation Models framework integration was set to begin the day after the WWDC announcement.
This is a practical option for iOS and macOS developers who want to add cloud AI capabilities without leaving the Apple development ecosystem or managing separate infrastructure.
The shared API surface between local and cloud inference is particularly useful for managing latency and cost tradeoffs in production apps.
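To make the latency and cost tradeoff concrete, here is a minimal sketch of routing between a local and a cloud model behind one interface. It is written in Python for illustration only. The class names, the `generate` method, and the character-count threshold are all hypothetical and are not the Foundation Models or Firebase API.

```python
from dataclasses import dataclass
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical shared interface that both backends satisfy."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class OnDeviceModel:
    # Made-up limit standing in for "what the local model handles well".
    max_prompt_chars: int = 200

    def generate(self, prompt: str) -> str:
        return f"[on-device] {prompt}"


@dataclass
class CloudModel:
    def generate(self, prompt: str) -> str:
        return f"[cloud] {prompt}"


def pick_model(prompt: str, local: OnDeviceModel, cloud: CloudModel, online: bool) -> TextModel:
    """Use the local model when offline or when the prompt is short; otherwise use the cloud."""
    if not online or len(prompt) <= local.max_prompt_chars:
        return local
    return cloud
```

Because both backends expose the same method, the calling code does not change when the routing rule does, which is the point of a shared API surface.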
1:00:01 Ryan – "I love the Apple Google partnership on this. You know, I'm really happy that Apple didn't decide to develop its own frontier model and just muddy that space."
Azure Cobalt 200 Arm-based VMs are now in early access preview, built on the Arm Neoverse V3 core and fabricated on TSMC's 3nm process, delivering up to 50% better CPU performance over Cobalt 100 with up to 128 vCPUs per VM.
Real workload benchmarks show up to 135% better performance for database workloads and up to 80% better performance for caching workloads compared to the previous generation.
The VMs are specifically designed for agentic AI workloads, where continuous reasoning and sequential decision-making require sustained per-core performance and low latency.
Each physical core gets dedicated 3 MB of L2 cache and a 192 MB system-level L3 cache, allowing more agent sandboxes per VM without sacrificing throughput.
Cobalt 200 expands the Arm VM portfolio with two new families beyond what Cobalt 100 offered: High-Memory Optimized Mpsv4 VMs and Dense Local Storage Lpsv5 VMs, with all series delivering up to 85 Gbps network bandwidth and 70 Gbps remote storage throughput.
Memory encryption is enabled by default through a custom memory controller with negligible performance impact.
Microsoft's own services, including Dataverse and Azure SQL Database, are already validating Cobalt 200, with Dataverse reporting up to 60% better performance over Cobalt 100.
Migration from Cobalt 100 is described as seamless, with full compatibility across existing workloads and support for AKS Arm nodes, GitHub Actions runners, and major languages including Python, Java, and .NET.
The preview is currently available in eight regions, including West US 3, East US 2, Central US, and Sweden Central, with additional regions to follow. Pricing is not yet publicly specified, so teams evaluating cost should sign up at aka.ms/Cobalt200VMs-signup for early access details.
1:04:44 Matt – "It's great that they added this; I feel like they're finally getting into the game of ARM. Getting capacity for them might require some twisting of your account team's arm, especially if you want them at any scale. But the other problem is, which I still find comical, is that you can't run Windows Server on ARM."
Foundry IQ is Microsoft's unified knowledge platform for AI agents, now generally available with full SLA coverage, stable APIs, and compliance certifications.
It lets developers connect multiple data sources like Azure Blob Storage, OneLake, and web content into a single knowledge base without building custom connectors for each system.
The new Serverless Developer tier, in public preview, scales to zero when idle and bills by Compute Units measured in 0.25 CU increments per minute. Billing is not expected to begin until late 2026, so developers can experiment at no cost for now, accessible through the Foundry portal at ai.azure.com.
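As a rough sketch of what per-minute billing in 0.25 CU increments could look like, the Python below rounds each minute's measured usage up to the next 0.25 CU step and bills idle minutes as zero. Rounding up is an assumption, since the notes only state the increment size, and the function names are made up for illustration.

```python
import math

CU_INCREMENT = 0.25  # billing granularity mentioned in the notes


def billable_cu(measured_cu: float) -> float:
    """Round one minute's CU usage up to the next 0.25 step (rounding up is assumed)."""
    if measured_cu <= 0:
        return 0.0  # scale-to-zero: an idle minute bills nothing
    return math.ceil(measured_cu / CU_INCREMENT) * CU_INCREMENT


def total_billable_cu_minutes(per_minute_cu: list[float]) -> float:
    """Sum the billable CU across a series of one-minute samples."""
    return sum(billable_cu(cu) for cu in per_minute_cu)


# Four sample minutes: idle, light, moderate, and exactly 1 CU.
print(total_billable_cu_minutes([0.0, 0.1, 0.6, 1.0]))  # 2.0
```

Multiplying the result by a per-CU-minute rate would give a cost, but no rate has been published, so none is assumed here.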
Agentic retrieval quality improvements show up to 20% better answer quality benchmarks and up to 54% improved recall compared to single-shot RAG, achieved through better query batching, semantic ranking, and server-side token caching to reduce redundant token consumption across multi-turn conversations.
The Foundry IQ MCP server exposes knowledge bases as a remote Model Context Protocol server, making them accessible from Claude, ChatGPT, LangChain, and the Microsoft Agent Framework without framework-specific integrations.
New security capabilities in preview include cross-tenant customer-managed keys using federated identity credentials, Purview sensitivity-label auditing, and incremental SharePoint permissions sync, keeping enterprise data governance intact as content flows into agent workflows.
Azure Database for PostgreSQL Flexible Server now supports the DuckDB extension in general availability, allowing users to run analytical workloads directly within their PostgreSQL environment without moving data to a separate system.
DuckDB is an in-process analytical database engine optimized for OLAP queries, so this extension lets PostgreSQL users run fast column-oriented analytics alongside their transactional workloads in the same managed service.
This is particularly useful for data engineers and developers who want to avoid the overhead of spinning up separate analytics infrastructure, since DuckDB can query large datasets efficiently using familiar SQL syntax.
The feature falls under the Databases and Hybrid plus multicloud categories, suggesting Microsoft sees this as relevant for customers running mixed workloads or integrating PostgreSQL with other data sources across environments.
Pricing for this extension was not specified in the announcement, so customers should check Azure Database for PostgreSQL Flexible Server pricing pages directly, as costs will likely depend on existing compute and storage tiers rather than a separate charge for the extension itself.
1:10:50 Justin – "I remember when there were companies that made nothing but columnar databases. Now you just get it as an extension on top of PostgreSQL. Kind of impressive. I bet those companies aren't doing well these days."
Azure's Global PTU (Provisioned Throughput Unit) reservations are now region-agnostic as of June 2026, meaning a single reservation can cover AI model deployments across multiple regions instead of requiring separate per-region commitments.
The practical benefit here is reduced stranded capacity. Previously, if you over-provisioned in one region and under-utilized in another, you were paying for unused reservations. Now a single pool covers wherever your workload actually runs.
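The stranded-capacity argument can be shown with some simple arithmetic. The Python sketch below uses made-up hourly PTU numbers for two regions and compares sizing a reservation per region (each region's own peak) against sizing a single pool (the highest combined usage in any hour). It is a simplified model, not a description of how Azure actually applies reservations.

```python
def peak_by_region(usage_by_hour: list[dict[str, int]]) -> dict[str, int]:
    """Highest hourly usage observed in each region."""
    peaks: dict[str, int] = {}
    for hour in usage_by_hour:
        for region, units in hour.items():
            peaks[region] = max(peaks.get(region, 0), units)
    return peaks


def per_region_commitment(usage_by_hour: list[dict[str, int]]) -> int:
    """Separate reservations: every region is sized for its own peak."""
    return sum(peak_by_region(usage_by_hour).values())


def pooled_commitment(usage_by_hour: list[dict[str, int]]) -> int:
    """One shared reservation: sized for the busiest combined hour."""
    return max(sum(hour.values()) for hour in usage_by_hour)


# Hypothetical usage: load shifts from one region to the other between hours.
usage = [
    {"eastus2": 100, "swedencentral": 20},
    {"eastus2": 30, "swedencentral": 90},
]
print(per_region_commitment(usage), pooled_commitment(usage))  # 190 120
```

In this example the per-region approach commits to 190 units while the pool needs only 120, so 70 units would have sat stranded.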
This is specifically tied to Microsoft Foundry (formerly Azure OpenAI Service infrastructure), so it targets customers running high-throughput AI inference workloads who need predictable performance and cost at scale.
From a cost management standpoint, consolidating reservations simplifies billing and procurement, which matters for enterprises managing AI spend across multiple geographic deployments. Specific pricing still depends on model type and throughput tier, so customers should check the Azure pricing calculator for their specific use case.
The flexibility to deploy where capacity is available without reservation constraints is a practical operational improvement, particularly useful during regional capacity crunches that have been a known pain point for provisioned throughput customers.
1:12:02 Justin – "Good! Glad you learned what the word 'global' means."
Azure API Management Premium v2 and Standard v2 now support wildcard custom hostnames, meaning a single entry like *.api.contoso.com and one wildcard certificate can cover all subdomains automatically instead of requiring separate configuration per subdomain.
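For a sense of what a wildcard entry covers, here is a small Python sketch of hostname matching. It assumes the usual TLS wildcard-certificate convention, where `*` stands for exactly one DNS label. The notes do not say how API Management itself evaluates wildcard hostnames, so treat the single-label rule and the `matches_wildcard` helper as assumptions.

```python
def matches_wildcard(pattern: str, hostname: str) -> bool:
    """Return True if hostname matches pattern; a leading '*.' stands for exactly one label."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname  # no wildcard: require an exact match
    suffix = pattern[1:]  # e.g. ".api.contoso.com"
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]  # the part the '*' has to cover
    return bool(label) and "." not in label


print(matches_wildcard("*.api.contoso.com", "orders.api.contoso.com"))  # True
print(matches_wildcard("*.api.contoso.com", "api.contoso.com"))         # False
print(matches_wildcard("*.api.contoso.com", "a.b.api.contoso.com"))     # False
```

Under this convention, the bare parent domain and deeper nested subdomains would still need their own entries.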
The practical benefit is reduced operational overhead at scale. A team onboarding ten new API surfaces previously needed ten separate domain and certificate management tasks, and wildcard support eliminates that repetitive work.
This capability is now available on both Standard v2 and Premium v2 tiers, which means organizations do not need to move to higher-tier deployments just to get flexible domain management. Pricing details are not specified in the announcement, so listeners should check the Azure API Management pricing page for tier comparisons.
Target use cases include rapidly growing API environments with dynamic subdomains, such as microservices architectures or multi-tenant platforms, where new API surfaces are frequently added, and consistent branded endpoints matter.
The feature reached general availability in June 2026 and was announced at Microsoft Build. Teams currently managing large API estates with manual per-subdomain hostname configurations would benefit most from evaluating this update.
CoreWeave Mission Control is an AI-native observability platform that provides end-to-end visibility across infrastructure, clusters, and workloads, addressing a gap that general-purpose monitoring tools often miss in GPU-heavy environments.
The platform combines real-time telemetry with GPU utilization analytics, which is particularly relevant as organizations struggle to justify and optimize the cost of large-scale GPU deployments.
Audit-ready logging and automated operational insights suggest the platform is targeting enterprise customers who need compliance documentation alongside performance monitoring, not just raw metrics.
The full-stack framing here is notable because AI workloads span multiple layers simultaneously, from bare metal GPU performance up through cluster orchestration and individual job execution, making siloed monitoring tools less effective.
For teams running inference or training at scale on CoreWeave, tighter observability tooling built into the platform could reduce the engineering overhead of stitching together third-party solutions like Prometheus, Grafana, and custom GPU exporters.
Closing
And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter or Slack team, send feedback, or ask questions at theCloudPod.net, or tweet at us with the hashtag #theCloudPod.