
Hugging Face and VirusTotal collaborate to strengthen AI security


326: Oracle Discovers the Dark Side (And Finally Has Cookies)


Welcome to episode 326 of The Cloud Pod, where the forecast is always cloudy! Justin and Ryan are your guides to all things cloud and AI this week! We’ve got news from SonicWall (and it’s not great), a host of goodbyes to say over at AWS, Oracle (finally) joins the dark side, and even Slurm – and you don’t even need to ride on a creepy river to experience it. Let’s get started! 

Titles we almost went with this week

  • SonicWall’s Cloud Backup Service: From 5% to Oh No, That’s Everyone
  • AWS Spring Cleaning: 19 Services Get the Boot
  • The Great AWS Service Purge of 2025
  • Maintenance Mode: Where Good Services Go to Die
  • GitHub Gets Assimilated: Resistance to Azure Migration is Futile
  • Salesforce to Ransomware Gang: You Can’t Always Get What You Want
  • Kansas City Gets the Need for Speed with 100G Direct Connect. Peter, what are you up to?
  • Gemini Takes the Wheel: Google’s AI Learns to Click and Type 
  • Oracle Discovers the Dark Side (Finally Has Cookies)
  • Azure Goes Full Blackwell: 4,600 Reasons to Upgrade Your GPU Game
  • DataStax to the Future: AWS Hires Database CEO for Security Role
  • The Clone Wars: EBS Strikes Back with Instant Volume Copies
  • Slurm Dunk: AWS Brings HPC Scheduling to Kubernetes
  • The Great Cluster Convergence: When Slurm Met EKS
  • Codex sent me a DM that I’ll ignore too on Slack

General News 

01:24 SonicWall: Firewall configs stolen for all cloud backup customers

  • SonicWall confirmed that all customers using their cloud backup service had firewall configuration files exposed in a breach, expanding from their initial estimate of 5% to 100% of cloud backup users. That’s a big difference…
  • The exposed backup files contain AES-256-encrypted credentials and configuration data, which could include MFA seeds for TOTP authentication, potentially explaining recent Akira ransomware attacks that bypassed MFA.
  • SonicWall requires affected customers to reset all credentials, including local user passwords, TOTP codes, VPN shared secrets, API keys, and authentication tokens across their entire infrastructure.
  • This incident highlights a fundamental security risk of cloud-based configuration backups where sensitive credentials are stored centrally, making them attractive targets for attackers.
  • The breach demonstrates why WebAuthn/passkeys offer superior security architecture since they don’t rely on shared secrets that can be stolen from backups or servers.
  • Interested in checking out their detailed remediation guidance? Find that here

02:36 Justin – “You know, providing your own encryption keys is also good; not allowing your SaaS vendor to have the encryption key is a positive thing to do. There’s all kinds of ways to protect your data in the cloud when you’re leveraging a SaaS service.”

04:43 Take this rob and shove it! Salesforce issues stern retort to ransomware extort

  • Salesforce is refusing to pay ransomware demands from criminals claiming to have stolen nearly 1 billion customer records, stating they will not engage, negotiate with, or pay any extortion demand. 
  • This firm stance sets a precedent for how major cloud providers handle ransomware attacks.
  • The stolen data appears to be from previous breaches rather than new intrusions, specifically from when ShinyHunters compromised Salesloft’s Drift application earlier this year. 
  • The attackers used stolen OAuth tokens to access multiple companies’ Salesforce instances.
  • The incident highlights the security risks of third-party integrations in cloud environments, as the breach originated through a compromised integration app rather than Salesforce’s core platform. 
  • This demonstrates how supply chain vulnerabilities can expose customer data across multiple organizations.
  • Scattered LAPSUS$ Hunters set an October 10 deadline for payment and offered $10 in Bitcoin to anyone willing to harass executives of affected companies. This unusual tactic shows evolving extortion methods beyond traditional ransomware encryption.
  • Salesforce maintains there’s no indication their platform has been compromised, and no known vulnerabilities in their technology were exploited. The company is working with external experts and authorities while supporting affected customers through the incident.

06:31 Ryan – “I do also really like Salesforce’s response, just because I feel like the ransomware has gotten a little out of hand, and I think a lot of companies are quietly sort of paying these ransoms, which has only made the attacks just skyrocket. So making a big public show of saying we’re not going to pay for this is a good idea.”

AI is Going Great – Or How ML Makes Money 

07:06 Introducing AgentKit

  • OpenAI’s AgentKit provides a framework for building and managing AI agents with simplified deployment and customization options, addressing the growing need for autonomous AI systems in cloud environments.
  • The tool integrates with existing OpenAI technologies and supports multiple programming languages, enabling developers to create agents that can interact with various cloud services and APIs without extensive infrastructure setup.
  • AgentKit’s architecture allows for efficient agent lifecycle management, including deployment, monitoring, and behavior customization, which could reduce operational overhead for businesses running AI workloads at scale.
  • Key use cases include automated customer service agents, data processing pipelines, and intelligent workflow automation that can adapt to changing conditions in cloud-native applications.
  • This development matters for cloud practitioners as it potentially lowers the barrier to entry for implementing sophisticated AI agents while providing the scalability and reliability expected in enterprise cloud deployments

09:03 Codex Now Generally Available

  • OpenAI’s Codex is now generally available, offering an AI coding agent powered by GPT-5-Codex and fine-tuned specifically for code generation and understanding across multiple programming languages. This represents a significant step toward AI-assisted development tools becoming mainstream.
  • Several new features arrive with GA. A new Slack integration lets you delegate tasks or ask questions to Codex directly from a team channel or thread, just like a coworker.
  • The Codex SDK lets you embed the same agent that powers the Codex CLI into your own workflows, tools, and apps for state-of-the-art performance on GPT-5-Codex without additional tuning.
  • New admin tools add environment controls, monitoring, and analytics dashboards, giving ChatGPT workspace admins more control.

09:48 Ryan – “I don’t know why, but something about having it available in Slack to boss it around sort of rubs me the wrong way. I feel like it’s the poor new college grad joining the team  – it’s just delegated all the crap jobs.” 

10:14 Introducing the Gemini 2.5 Computer Use model

  • Google released Gemini 2.5 Computer Use model via Gemini API, enabling AI agents to interact with graphical user interfaces through clicking, typing, and scrolling actions – available in Google AI Studio and Vertex AI for developers to build automation agents.
  • The model operates in a loop using screenshots and action history to navigate web pages and applications, outperforming competitors on web and mobile control benchmarks while maintaining the lowest latency among tested solutions.
  • Built-in safety features include per-step safety service validation and system instructions to prevent high-risk actions like bypassing CAPTCHA or compromising security, with developers able to require user confirmation for sensitive operations.
  • Early adopters, including Google teams, use it for UI testing and workflow automation, with the model already powering Project Mariner, Firebase Testing Agent, and AI Mode in Search – demonstrating practical enterprise applications.
  • This represents a shift from API-only interactions to visual UI control, enabling automation of tasks that previously required human interaction like form filling, dropdown navigation, and operating behind login screens.

11:48 Ryan – “I think this is the type of thing that really is going to get AI to be as big as the Agentic model in general; having it be able to understand click and UIs and operate on people’s behalf. It’s going to open up just a ton of use cases for it.”    

AWS

12:35 AWS Service Availability Change Announcement

  • AWS is moving 19 services to maintenance mode starting November 7, 2025, including Amazon Glacier, AWS CodeCatalyst, and Amazon Fraud Detector – existing customers can continue using these services but new customers will be blocked from adoption.
  • Several migration-focused services are being deprecated, including AWS Migration Hub, AWS Application Discovery Service, and AWS Mainframe Modernization Service, signaling AWS may be consolidating or rethinking its migration tooling strategy.
  • The deprecation of Amazon S3 Object Lambda and Amazon Cloud Directory suggests AWS is streamlining overlapping functionality – customers will need to evaluate alternatives like Lambda@Edge or AWS Directory Service for similar capabilities.
  • AWS Snowball Edge Compute Optimized and Storage Optimized entering maintenance indicates AWS is likely pushing customers toward newer edge computing solutions like AWS Outposts or Local Zones for hybrid deployments.
  • The sunset of specialized services like AWS HealthOmics Variant Store and AWS IoT SiteWise Monitor shows AWS pruning niche offerings that may have had limited adoption or overlapping functionality with other services.

13:53 Ryan – “It’s interesting, because I was a heavy user of CodeGuru and CodeCatalyst for a while, so the announcement I got as a customer was a lot less friendly than maintenance mode. It was like, your stuff’s going to end. So I don’t know if it’s true across all these services, but I know with at least those two. I did not get one for Glacier – because I also have a ton of stuff in Glacier, because I’m cheap.” 

17:01 AWS Direct Connect announces 100G expansion in Kansas City, MO

  • AWS Direct Connect now offers 100 Gbps dedicated connections with MACsec encryption at the Netrality KC1 data center in Kansas City, expanding high-bandwidth private connectivity options in the central US region.
  • The Kansas City location provides direct network access to all public AWS Regions (except China), AWS GovCloud Regions, and AWS Local Zones, making it a strategic connectivity hub for enterprises in the Midwest.
  • With 100G connections and MACsec encryption, organizations can achieve lower latency and enhanced security for workloads requiring high throughput, such as data analytics, media processing, or hybrid cloud architectures.
  • This expansion brings AWS Direct Connect to over 146 locations worldwide, reinforcing AWS’s commitment to providing enterprises with reliable alternatives to internet-based connectivity for mission-critical applications.
  • For businesses evaluating Direct Connect, the 100G option typically suits large-scale data transfers and enterprises with substantial bandwidth requirements, while the 10G option remains available for more moderate connectivity needs.

18:07 AWS IAM Identity Center now supports customer-managed KMS keys for encryption at rest | AWS News Blog

  • AWS IAM Identity Center now supports customer-managed KMS keys for encrypting identity data at rest, giving organizations in regulated industries full control over encryption key lifecycle, including creation, rotation, and deletion. This addresses compliance requirements for customers who previously could only use AWS-owned keys.
  • The feature requires symmetric KMS keys in the same AWS account and region as the Identity Center instance, with multi-region keys recommended for future flexibility. Implementation involves creating the key, configuring detailed permissions for Identity Center services and administrators, and updating IAM policies for cross-account access (a hedged code sketch of this setup follows the list below).
  • Not all AWS managed applications currently support Identity Center with customer-managed keys – administrators must verify compatibility before enabling to avoid service disruptions. The documentation provides specific policy templates for common use cases, including delegated administrators and application administrators.
  • Standard AWS KMS pricing applies for key storage and API usage while Identity Center remains free. The feature is available in all AWS commercial regions, GovCloud, and China regions.
  • Key considerations include the critical nature of proper permission configuration – incorrect setup can disrupt Identity Center operations and access to AWS accounts. Organizations should implement encryption context conditions to restrict key usage to specific Identity Center instances for enhanced security.
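
A rough boto3 sketch of that setup, assuming the Identity Center service principal is sso.amazonaws.com and using a placeholder account ID; treat the policy templates in the AWS documentation as authoritative and add the encryption context conditions it recommends:

import json
import boto3

kms = boto3.client("kms", region_name="us-east-1")  # must match the Identity Center instance region

# Create a symmetric customer-managed key in the same account and region
key_id = kms.create_key(
    Description="Customer-managed key for IAM Identity Center encryption at rest",
    KeySpec="SYMMETRIC_DEFAULT",
    KeyUsage="ENCRYPT_DECRYPT",
)["KeyMetadata"]["KeyId"]

# Key policy: account admins keep control, and the Identity Center service principal
# (assumed here to be sso.amazonaws.com) can use the key.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AccountAdmin",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # placeholder account ID
            "Action": "kms:*",
            "Resource": "*",
        },
        {
            "Sid": "AllowIdentityCenterUse",
            "Effect": "Allow",
            "Principal": {"Service": "sso.amazonaws.com"},
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey*"],
            "Resource": "*",
        },
    ],
}
kms.put_key_policy(KeyId=key_id, PolicyName="default", Policy=json.dumps(policy))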

18:52 Justin – “Incorrect setup can disrupt Identity Center operations; like, revoking your encryption key might be bad for your access to your cloud. So be careful with this one.”

19:28 New general-purpose Amazon EC2 M8a instances are now available | AWS News Blog

  • AWS launches M8a instances powered by 5th Gen AMD EPYC Turin processors, delivering up to 30% better performance and 19% better price-performance than M7a instances for general-purpose workloads.
  • The new instances feature 45% more memory bandwidth and 50% improvements in networking (75 Gbps) and EBS bandwidth (60 Gbps), making them suitable for financial applications, gaming, databases, and SAP-certified enterprise workloads.
  • M8a introduces instance bandwidth configuration (IBC), allowing customers to flexibly allocate resources between networking and EBS bandwidth by up to 25%, optimizing for specific workload requirements.
  • Each vCPU maps to a physical CPU core without SMT, resulting in up to 60% faster GroovyJVM performance and 39% faster Cassandra performance compared to M7a instances.
  • Available in 12 sizes from small to metal-48xl (192 vCPU, 768GiB RAM) across three regions initially, with standard pricing options including On-Demand, Savings Plans, and Spot instances.

20:01 Ryan – “That’s a big one! I still don’t have a use case for it.” 

 

20:09 Announcing Amazon Quick Suite: your agentic teammate for answering questions and taking action | AWS News Blog

  • Amazon Quick Suite combines AI-powered research, business intelligence, and automation into a single workspace, eliminating the need to switch between multiple applications for data gathering and analysis. 
  • The service includes Quick Research for comprehensive analysis across enterprise and external sources, Quick Sight for natural language BI queries, and Quick Flows/Automate for process automation.
  • Quick Index serves as the foundational knowledge layer, creating a unified searchable repository across databases, documents, and applications that powers AI responses throughout the suite. This addresses the common enterprise challenge of fragmented data sources by consolidating everything from S3, Snowflake, Google Drive, and SharePoint into one intelligent knowledge base.
  • The automation capabilities are split between Quick Flows for business users (natural language workflow creation) and Quick Automate for technical teams (complex multi-department processes with approval routing and system integrations). 
  • Both tools generate workflows from simple descriptions, but Quick Automate handles enterprise-scale processes like customer onboarding with advanced orchestration and monitoring.
  • Existing Amazon QuickSight customers will be automatically upgraded to Quick Suite with all current BI capabilities preserved under the “Quick Sight” branding, maintaining the same data connectivity, security controls, and user permissions. Pricing follows a per-user subscription model with consumption-based charges for Quick Index and optional features.
  • The service introduces “Spaces” for contextual data organization and custom chat agents that can be configured for specific departments or use cases, enabling teams to create tailored AI assistants connected to relevant datasets and workflows. This allows organizations to scale from personal productivity tools to enterprise-wide deployment while maintaining access controls.

22:13 Justin – “This is a confusing product. It’s doing a lot of things, probably kind of poorly.” 

23:13 AWS Strengthens AI Security by Hiring Ex-DataStax CEO As New VP – Business Insider

  • AWS hired Chet Kapoor, former DataStax CEO, as VP of Security Services and Observability, reporting directly to CEO Matt Garman, to strengthen security offerings as AWS expands its AI business.
  • Kapoor brings experience from DataStax, where he led Astra DB development and integrated real-time AI capabilities, positioning him to address the security challenges of increasingly complex cloud deployments.
  • The role consolidates leadership of security services, governance, and operations portfolios under one executive, with teams from Gee Rittenhouse, Nandini Ramani, Georgia Sitaras, and Brad Marshall now reporting to Kapoor.
  • This hire follows recent AWS leadership changes, including the departures of VP of AI Matt Wood and VP of generative AI Vasi Philomin, signaling AWS’s focus on strengthening AI security expertise.
  • Kapoor will work alongside AWS CISO Amy Herzog to develop security and observability services that address what Garman describes as changing requirements driven by AI adoption.

26:03 Justin – “Also, DataStax was bought by IBM – and everyone knows that anything bought by IBM will be killed mercilessly.” 

26:50 Amazon Bedrock AgentCore is now generally available

  • Amazon Bedrock AgentCore provides a managed platform for building and deploying AI agents that can execute for up to 8 hours with complete session isolation, supporting any framework like CrewAI, LangGraph, or LlamaIndex, and any model inside or outside Amazon Bedrock.
  • The service includes five core components: Runtime for execution, Memory for state management, Gateway for tool integration via Model Context Protocol, Identity for OAuth and IAM authorization, and Observability with CloudWatch dashboards and OTEL compatibility for monitoring agents in production.
  • AgentCore enables agents to communicate with each other through Agent-to-Agent protocol support and securely act on behalf of users with identity-aware authorization, making it suitable for enterprise automation scenarios that require extended execution times and complex tool interactions.
  • The platform eliminates infrastructure management while providing enterprise features like VPC support, AWS PrivateLink, and CloudFormation templates, with consumption-based pricing and no upfront costs across nine AWS regions.
  • Integration with existing observability tools like Datadog, Dynatrace, and LangSmith allows teams to monitor agent performance using their current toolchain, while the self-managed memory strategy gives developers control over how agents store and process information.

28:17 Ryan – “This really to me, seems like a full app, you know, like this is a core component instead of doing development; you’re just taking  AI agents, putting them together, and giving them tasks. Then, the eight-hour runtime is crazy. It feels like it’s getting warmer in here just reading that.”

28:49 AWS’ Custom Chip Now Powers Most of Its Key AI Cloud Service — The Information

  • AWS has transitioned the majority of its AI inference workloads to its custom Inferentia chips, marking a significant shift away from Nvidia GPUs for production AI services. 
  • The move demonstrates AWS’s commitment to vertical integration and cost optimization in the AI infrastructure space.
  • Inferentia chips now handle most inference tasks for services like Amazon Bedrock, SageMaker, and internal AI features across AWS products. 
  • This custom silicon strategy allows AWS to reduce dependency on expensive third-party GPUs while potentially offering customers lower-cost AI inference options.
  • The shift to Inferentia represents a broader industry trend where cloud providers develop custom chips to differentiate their services and control costs. AWS can now optimize the entire stack from silicon to software for specific AI workloads, similar to Apple’s approach with its M-series chips.
  • For AWS customers, this transition could mean more predictable pricing and better performance-per-dollar for inference workloads. The custom chips are specifically designed for inference rather than training, making them more efficient for production AI applications.
  • This development positions AWS to compete more effectively with other cloud providers on AI pricing while maintaining control over its technology roadmap. 
  • Customers running inference-heavy workloads may see cost benefits as AWS passes along savings from reduced reliance on Nvidia hardware

29:39 Ryan – “Explains all the Oracle and Azure Nvidia announcements.” 

30:16 Introducing Amazon EBS Volume Clones: Create instant copies of your EBS volumes | AWS News Blog

  • Amazon EBS Volume Clones enables instant point-in-time copies of encrypted EBS volumes within the same Availability Zone through a single API call, eliminating the previous multi-step process of creating snapshots in S3 and then new volumes.
  • Cloned volumes are available within seconds with single-digit millisecond latency, though performance during initialization is limited to the lowest of: 3,000 IOPS/125 MiB/s baseline, source volume performance, or target volume performance.
  • This feature targets development and testing workflows where teams need quick access to production data copies, but it complements rather than replaces EBS snapshots, which remain the recommended backup solution with 11 nines durability in S3.
  • Pricing includes a one-time fee per GiB of source volume data at initiation, plus standard EBS charges for the new volume, making cost governance important since cloned volumes persist independently until manually deleted.
  • The feature currently requires encrypted volumes and operates only within the same Availability Zone, supporting all EBS volume types across AWS commercial regions and select Local Zones.

32:06 Running Slurm on Amazon EKS with Slinky | Containers

  • AWS introduces Slinky, an open source project that lets you run Slurm workload manager inside Amazon EKS, enabling organizations to manage both traditional HPC batch jobs and modern Kubernetes workloads on the same infrastructure without maintaining separate clusters.
  • The solution deploys Slurm components as Kubernetes pods with slurmctld on general-purpose nodes and slurmd on GPU/accelerated nodes, supporting features like auto-scaling worker pods based on job queues and integration with Karpenter for dynamic EC2 provisioning.
  • Key benefit is resource optimization – AI inference workloads can scale during business hours while training jobs scale overnight using the same compute pool, with teams able to use familiar Slurm commands (sbatch, srun) alongside Kubernetes APIs.
  • Slinky provides an alternative to AWS ParallelCluster (self-managed), AWS PCS (managed Slurm), and SageMaker HyperPod (ML-optimized) for organizations already standardized on EKS who need deterministic scheduling for long-running jobs.
  • The architecture supports custom container images, allowing teams to package specific ML dependencies (CUDA, PyTorch versions) directly into worker pods, eliminating manual environment management while maintaining reproducibility across environments.

GCP

33:09 Introducing Gemini Enterprise | Google Cloud Blog

  • Google launches Gemini Enterprise as a unified AI platform that combines Gemini models, no-code agent building, pre-built agents, data connectors for Google Workspace and Microsoft 365, and centralized governance through a single chat interface. 
  • This positions Google as offering a complete AI stack, rather than just models or toolkits like competitors.
  • The platform includes notable integrations with Microsoft 365 and SharePoint environments while offering enhanced features when paired with Google Workspace, including new multimodal agents for video creation (Google Vids with 2.5M monthly users) and real-time speech translation in Google Meet. This cross-platform approach differentiates it from more siloed offerings.
  • Google introduces next-generation conversational agents with a low-code visual builder supporting 40+ languages, powered by the latest Gemini models for natural voice interactions and deep enterprise integration. 
  • Early adopters like Commerzbank report 70% inquiry resolution rates, and Mercari projects 500% ROI through 20% workload reduction.
  • The announcement includes new developer tools like Gemini CLI (1M+ developers in 3 months) with extensions from Atlassian, GitLab, MongoDB, and others, plus industry protocols for agent interoperability (A2A), payments (AP2), and model context (MCP). 
  • This creates a foundation infrastructure for an agent economy where developers can monetize specialized agents.
  • Google’s partner ecosystem includes 100,000+ partners with expanded integrations for Box, Salesforce, ServiceNow, and deployment support from Accenture, Deloitte, and others. 
  • The company also launches Google Skills training platform and GEAR program to train 1 million developers, addressing the critical skills gap in enterprise AI adoption.

35:01 Justin – “I think both Azure and Amazon have similar problems; they are rushing so fast to make products, that they’re creating the same products over and over again, just with slightly different limitations or use cases.” 

36:05 Introducing LLM-Evalkit | Google Cloud Blog

  • Google releases LLM-Evalkit, an open-source framework that centralizes prompt engineering workflows on Vertex AI, replacing the current fragmented approach of managing prompts across multiple documents and consoles.
  • The tool shifts prompt development from subjective testing to data-driven iteration by requiring teams to define specific problems, create test datasets, and establish concrete metrics for measuring LLM performance.
  • LLM-Evalkit features a no-code interface designed to democratize prompt engineering, allowing non-technical team members like product managers and UX writers to contribute to the development process.
  • The framework integrates directly with Vertex AI SDKs and provides versioning, benchmarking, and performance tracking capabilities in a single application, addressing the lack of standardized evaluation processes in current workflows.
  • Available now on GitHub as an open-source project, with additional evaluation features accessible through the Google Cloud console, though specific pricing details are not mentioned in the announcement.

37:09 Ryan – “Reading through this announcement, it’s solving a problem I had – but I didn’t know I had.” 

38:17 Announcing enhancements to Google Cloud NetApp Volumes | Google Cloud Blog

  • Google Cloud NetApp Volumes now supports iSCSI block storage alongside file storage, enabling enterprises to migrate SAN workloads to GCP without architectural changes. 
  • The service delivers up to 5 GiB/s throughput and 160K IOPS per volume with independent scaling of capacity, throughput, and IOPS.
  • NetApp FlexCache provides local read caches of remote volumes for distributed teams and hybrid cloud deployments. 
  • This allows organizations to access shared datasets with local-like performance across regions, supporting compute bursting scenarios that require low-latency data access.
  • The service now integrates with Gemini Enterprise as a data store for RAG applications, allowing organizations to ground AI models on their secure enterprise data without complex ETL processes. 
  • Data remains governed within NetApp Volumes while being accessible for search and inference workflows.
  • Auto-tiering automatically moves cold data to lower-cost storage at $0.03/GiB for the Flex service level, with configurable thresholds from 2-183 days. Large-capacity volumes now scale from 15TiB to 3PiB with over 21GiB/s throughput per volume for HPC and AI workloads.
  • NetApp SnapMirror enables replication between on-premises NetApp systems and Google Cloud with zero RPO and near-zero RTO. 
  • This positions GCP competitively against AWS FSx for NetApp ONTAP and Azure NetApp Files for enterprise storage migrations.

40:30 Justin – “I have a specific workload that needs storage, that’s shared across boxes, and iSCSI is a great option for that, in addition to other methods you could use that I’m currently using, which have some sharp edges. So I’m definitely going to do some price calculation models. This might be good, because Google has multi-writer files, like EBS-type solutions, but does not have the performance that I need quite yet.”

Azure

41:08 GitHub Will Prioritize Migrating to Azure Over Feature Development – The New Stack

  • GitHub is migrating its entire infrastructure from its Virginia data center to Azure within 24 months, with teams being asked to delay feature development to focus on this migration due to capacity constraints from AI and Copilot workloads.
  • The migration represents a significant shift from GitHub’s previous autonomy since Microsoft’s 2018 acquisition, with GitHub losing independence after CEO Thomas Dohmke’s departure and being folded deeper into Microsoft’s organizational structure.
  • Technical challenges include migrating GitHub’s MySQL clusters that run on bare metal servers to Azure, which some employees worry could lead to more outages during the transition period, given recent service disruptions.
  • This positions Azure to capture one of the world’s largest developer platforms as a flagship customer, demonstrating Azure’s ability to handle massive scale workloads while potentially raising concerns among open source developers about tighter Microsoft integration.
  • The move highlights how AI workloads are straining traditional infrastructure, with GitHub citing “existential” needs to scale for AI and Copilot demands, showing how generative AI is forcing major architectural decisions across the industry.

43:17 Ryan – “I just hope the service stays up; it’s so disruptive to my day job when GitHub has issues.” 

43:33 Microsoft 365 services fall over in North America • The Register

  • Microsoft 365 experienced a North American outage on October 9, lasting just over an hour, caused by misconfigured network infrastructure that affected all services, including Teams, highlighting the fragility of centralized cloud services when configuration errors occur.
  • This incident followed another Azure outage where Kubernetes crashes took down Azure Front Door instances, suggesting potential systemic issues with Microsoft’s infrastructure management and configuration processes that enterprise customers should factor into their reliability planning.
  • Users reported that switching to backup circuits restored services, and some attributed issues to AT&T’s network, demonstrating the importance of multi-path connectivity and diverse network providers for mission-critical cloud services.
  • Microsoft’s response involved rerouting traffic to healthy infrastructure and analyzing configuration policies to prevent future incidents, though the lack of detailed root cause information raises questions about transparency and whether customers have sufficient visibility into infrastructure dependencies.
  • The back-to-back outages underscore why organizations need robust disaster recovery plans beyond single cloud providers, as even brief disruptions to productivity tools like Teams can significantly impact business operations across entire regions.

44:17 Introducing Microsoft Agent Framework | Microsoft Azure Blog

  • Microsoft Agent Framework converges AutoGen research project with Semantic Kernel into a unified open-source SDK for orchestrating multi-agent AI systems, addressing the fragmentation challenge as 80% of enterprises now use agent-based AI according to PwC.
  • The framework enables developers to build locally and then deploy to Azure AI Foundry with built-in observability, durability, and compliance, while supporting integration with any API via OpenAPI and cross-runtime collaboration through Agent2Agent protocol.
  • Azure AI Foundry now provides unified observability across multiple agent frameworks, including LangChain, LangGraph, and OpenAI Agents SDK, through OpenTelemetry contributions, positioning it as a comprehensive platform compared to AWS Bedrock or GCP Vertex AI’s more limited agent support.
  • Voice Live API reaches general availability, offering a unified real-time speech-to-speech interface that integrates STT, generative AI, TTS, and avatar capabilities in a single low-latency pipeline for building voice-enabled agents.
  • New responsible AI capabilities in public preview include task adherence, prompt shields with spotlighting, and PII detection, addressing McKinsey’s finding that the lack of governance tools is the top barrier to AI adoption.

44:48 Justin – “We continue to be in a world of confusion around Agentic and out of control of Agentic things.” 

45:54 NVIDIA GB300 NVL72: Next-generation AI infrastructure at scale | Microsoft Azure Blog

  • Microsoft deployed the first production cluster with over 4,600 NVIDIA GB300 NVL72 systems featuring Blackwell Ultra GPUs, enabling AI model training in weeks instead of months and supporting models with hundreds of trillions of parameters. 
  • This positions Azure as the first cloud provider to deliver Blackwell Ultra at scale for production workloads.
  • Each ND GB300 v6 VM rack contains 72 GPUs with 130TB/second of NVLink bandwidth and 37TB of fast memory, delivering up to 1,440 PFLOPS of FP4 performance. 
  • The system uses 800 Gbps NVIDIA Quantum-X800 InfiniBand for cross-rack connectivity, doubling the bandwidth of previous GB200 systems.
  • The infrastructure targets frontier AI workloads, including reasoning models, agentic AI systems, and multimodal generative AI, with OpenAI already using these clusters for training and deploying their largest models. 
  • This gives Azure a competitive edge over AWS and GCP in supporting next-generation AI workloads.
  • Azure implemented custom cooling systems using standalone heat exchangers and new power distribution models to handle the high energy density requirements of these dense GPU clusters. 
  • The co-engineered software stack optimizes storage, orchestration, and scheduling for supercomputing scale.
  • While pricing wasn’t disclosed, the scale and specialized nature of these VMs suggest they’ll target enterprise customers and AI research organizations requiring cutting-edge performance for training trillion-parameter models. Azure plans to deploy hundreds of thousands of Blackwell Ultra GPUs globally.

47:24 Ryan – “Pricing isn’t disclosed because it’s the GDP of a small country.” 

48:05 Generally Available: CLI command for migration from Availability Sets and basic load balancer on AKS 

  • Thanks for the timely heads up on this one… 
  • Azure introduces a single CLI command to migrate AKS clusters from deprecated Availability Sets to Virtual Machine Scale Sets before the September 2025 deadline, simplifying what would otherwise be a complex manual migration process.
  • The automated migration upgrades clusters from basic load balancers to standard load balancers, providing improved reliability, zone redundancy, and support for up to 1000 nodes compared to the basic tier’s 100-node limit.
  • This positions Azure competitively with AWS EKS and GCP GKE, which already use more modern infrastructure patterns by default, though Azure’s migration tool reduces the operational burden for existing customers.
  • Organizations running production AKS workloads on Availability Sets should prioritize testing this migration in non-production environments first, as the process involves recreating node pools, which could impact running applications.
  • While the migration itself has no direct cost, customers will see increased charges from standard load balancers (approximately $0.025/hour plus data processing fees) compared to free basic load balancers.

49:01 Ryan – “This is why you drag your feet on getting off of everything.” 

Oracle

49:12 Announcing Dark Mode For The OCI Console

  • Oracle finally joins the dark mode club with OCI Console, following years behind AWS (2017), Azure (2019), and GCP (2020) – a basic UI feature that took surprisingly long for a major cloud provider to implement.
  • The feature allows users to toggle between light and dark themes in the console settings, with Oracle claiming it reduces eye strain and improves battery life on devices – standard benefits that every other cloud provider has been touting for years.
  • Dark mode persists across browser sessions and devices when logged into the same OCI account, though Oracle hasn’t specified if this preference syncs across different OCI regions or tenancies.
  • While this is a welcome quality-of-life improvement for developers working late hours, it highlights Oracle’s ongoing challenge of playing catch-up on basic console features that competitors have long considered table stakes.
  • The rollout appears to be gradual with no specific timeline mentioned, and Oracle provides no details about API or CLI theme preferences, suggesting this is purely a web console enhancement.

Closing

And that is the week in the cloud! Visit theCloudPod.net, the home of The Cloud Pod, where you can join our newsletter or Slack team, send feedback, or ask questions, or tweet at us with the hashtag #theCloudPod





Download audio: https://episodes.castos.com/5e2d2c4b117f29-10227663/2170375/c1e-o838u27rrjt780gw-xxg4xkqxhjp-qcfp1h.mp3

Transform Your AI Applications with Local LLM Deployment


Introduction

Are you tired of watching your AI application costs spiral out of control every time your user base grows? As AI Engineers and Developers, we've all felt the pain of cloud-dependent LLM deployments. Every API call adds up, latency becomes a bottleneck in real-time applications, and sensitive data must leave your infrastructure to get processed. Meanwhile, your users demand faster responses, better privacy, and more reliable service.

What if there were a way to run powerful language models directly on your users' devices or your local infrastructure? Enter the world of Edge AI deployment with Microsoft's Foundry Local, a game-changing approach that brings enterprise-grade LLM capabilities to local hardware while maintaining full OpenAI API compatibility.

The Edge AI for Beginners curriculum (https://aka.ms/edgeai-for-beginners) provides AI Engineers and Developers with comprehensive, hands-on training to master local LLM deployment. This isn't just another theoretical course; it's a practical guide that will transform how you think about AI infrastructure, combining cutting-edge local deployment techniques with production-ready implementation patterns.

In this post, we'll explore why Edge AI deployment represents the future of AI applications, dive deep into Foundry Local's capabilities across multiple frameworks, and show you exactly how to implement local LLM solutions that deliver both technical excellence and significant business value. 

Why Edge AI Deployment Changes Everything for Developers

The shift from cloud-dependent to edge-deployed AI represents more than just a technical evolution; it's a fundamental reimagining of how we build intelligent applications. As AI Engineers, we're witnessing a transformation that addresses the most pressing challenges in modern AI deployment while opening up entirely new possibilities for innovation.

Consider the current state of cloud-based LLM deployment. Every user interaction requires a round-trip to external servers, introducing latency that can kill user experience in real-time applications. Costs scale linearly (or worse) with usage, making successful applications expensive to operate. Sensitive data must traverse networks and live temporarily in external systems, creating compliance nightmares for enterprise applications.

Edge AI deployment fundamentally changes this equation. By running models locally, we achieve several critical advantages:

Data Sovereignty and Privacy Protection: Your sensitive data never leaves your infrastructure. For healthcare applications processing patient records, financial services handling transactions, or enterprise tools managing proprietary information, this represents a quantum leap in security posture. You maintain complete control over data flow, meeting even the strictest compliance requirements without architectural compromises.

Real-Time Performance at Scale: Local inference eliminates network latency entirely. Instead of 200-500ms round-trips to cloud APIs, you get sub-10ms response times. This enables entirely new categories of applications—real-time code completion, interactive AI tutoring systems, voice assistants that respond instantly, and IoT devices that make intelligent decisions without connectivity.

Predictable Cost Structure: Transform variable API costs into fixed infrastructure investments. Instead of paying per-token for potentially unlimited usage, you invest in local hardware that serves unlimited requests. This makes ROI calculations straightforward and removes the fear of viral success destroying your margins.

Offline Capabilities and Resilience: Local deployment means your AI features work even when connectivity fails. Mobile applications can provide intelligent features in areas with poor network coverage. Critical systems maintain AI capabilities during network outages. Edge devices in remote locations operate autonomously.

The technical implications extend beyond these obvious benefits. Local deployment enables new architectural patterns: AI-powered applications that work entirely client-side, edge computing nodes that make intelligent routing decisions, and distributed systems where intelligence lives close to data sources. 

Foundry Local: Multi-Framework Edge AI Deployment Made Simple

Microsoft's Foundry Local https://www.foundrylocal.ai  represents a breakthrough in local AI deployment, designed specifically for developers who need production-ready edge AI solutions. Unlike single-framework tools, Foundry Local provides a unified platform that works seamlessly across multiple programming languages and deployment scenarios while maintaining full compatibility with existing OpenAI-based workflows.

The platform's approach to multi-framework support means you're not locked into a single technology stack. Whether you're building TypeScript applications, Python ML pipelines, Rust systems programming projects, or .NET enterprise applications, Foundry Local provides native SDKs and consistent APIs that integrate naturally with your existing codebase.

Enterprise-Grade Model Catalog: Foundry Local comes with a curated selection of production-ready models optimized for edge deployment. The `phi-3.5-mini` model delivers impressive performance in a compact footprint, perfect for resource-constrained environments. For applications requiring more sophisticated reasoning, `qwen2.5-0.5b` provides enhanced capabilities while maintaining efficiency. When you need maximum capability and have sufficient hardware resources, `gpt-oss-20b` offers state-of-the-art performance with full local control.

Intelligent Hardware Optimization: One of Foundry Local's most powerful features is its automatic hardware detection and optimization. The platform automatically identifies your available compute resources (NVIDIA CUDA GPUs, AMD GPUs, Intel NPUs, Qualcomm Snapdragon NPUs, or CPU-only environments) and downloads the most appropriate model variant. This means the same application code delivers optimal performance across diverse hardware configurations without manual intervention.

ONNX Runtime Acceleration: Under the hood, Foundry Local leverages Microsoft's ONNX Runtime for maximum performance. This provides significant advantages over generic inference engines, delivering optimized execution paths for different hardware architectures while maintaining model accuracy and compatibility.

OpenAI SDK Compatibility: Perhaps most importantly for developers, Foundry Local maintains complete API compatibility with the OpenAI SDK. This means existing applications can migrate to local inference by changing only the endpoint configuration—no rewriting of application logic, no learning new APIs, no disruption to existing workflows.
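
As a rough illustration of that claim, the migration is essentially a change to the client constructor; the endpoint URL and key values below are placeholders (Foundry Local's manager object supplies the real ones, as shown in the Python section later):

from openai import OpenAI

# Cloud-hosted inference (typical setup)
cloud_client = OpenAI(api_key="sk-...")  # placeholder key

# Local inference via Foundry Local: same SDK, same calls, different endpoint
local_client = OpenAI(
    base_url="http://localhost:5273/v1",  # placeholder; use the endpoint Foundry Local reports
    api_key="not-needed-locally",         # kept only for API compatibility
)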

The platform handles the complex aspects of local AI deployment automatically: model downloading, hardware-specific optimization, memory management, and inference scheduling. This allows developers to focus on building intelligent applications rather than managing AI infrastructure.

Framework-Agnostic Benefits: Foundry Local's multi-framework approach delivers consistent benefits regardless of your technology choices. Whether you're working in a Node.js microservices architecture, a Python data science environment, a Rust embedded system, or a C# enterprise application, you get the same advantages: reduced latency, eliminated API costs, enhanced privacy, and offline capabilities.

This universal compatibility means teams can adopt edge AI deployment incrementally, starting with pilot projects in their preferred language and expanding across their technology stack as they see results. The learning curve is minimal because the API patterns remain familiar while the underlying infrastructure transforms to local deployment. 

Implementing Edge AI: From Code to Production

Moving from cloud APIs to local AI deployment requires understanding the implementation patterns that make edge AI both powerful and practical. Let's explore how Foundry Local's SDKs enable seamless integration across different development environments, with real-world code examples that you can adapt for your production systems.

Python Implementation for Data Science and ML Pipelines

Python developers will find Foundry Local's integration particularly natural, especially in data science and machine learning contexts where local processing is often preferred for security and performance reasons.

import openai
from foundry_local import FoundryLocalManager

# Initialize with automatic hardware optimization
alias = "phi-3.5-mini"
manager = FoundryLocalManager(alias)

This simple initialization handles a remarkable amount of complexity automatically. The `FoundryLocalManager` detects your hardware configuration, downloads the most appropriate model variant for your system, and starts the local inference service. Behind the scenes, it's making intelligent decisions about memory allocation, selecting optimal execution providers, and preparing the model for efficient inference.

# Configure OpenAI client for local deployment
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key  # Not required for local, but maintains API compatibility
)

# Production-ready inference with streaming
def analyze_document(content: str):
    stream = client.chat.completions.create(
        model=manager.get_model_info(alias).id,
        messages=[
            {
                "role": "system",
                "content": "You are an expert document analyzer. Provide structured analysis."
            },
            {
                "role": "user",
                "content": f"Analyze this document: {content}"
            }
        ],
        stream=True,
        temperature=0.7
    )

    result = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            content_piece = chunk.choices[0].delta.content
            result += content_piece
            yield content_piece  # Enable real-time UI updates
    return result
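
A short usage sketch for the generator above, streaming the analysis to the console as it arrives (the sample document text is made up):

# Stream the analysis to stdout as the local model produces it
for piece in analyze_document("Q3 revenue grew 12% while support costs fell 4%."):
    print(piece, end="", flush=True)
print()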

Key implementation benefits here:

• Automatic model management: The `FoundryLocalManager` handles model lifecycle, memory optimization, and hardware-specific acceleration without manual configuration.

• Streaming interface compatibility: Maintains the familiar OpenAI streaming API while processing locally, enabling real-time user interfaces with zero latency overhead.

• Production error handling: The manager includes built-in retry logic, graceful degradation, and resource management for reliable production deployment.

JavaScript/TypeScript Implementation for Web Applications

JavaScript and TypeScript developers can integrate local AI capabilities directly into web applications, enabling entirely new categories of client-side intelligent features.

import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

class LocalAIService {
  constructor() {
    this.foundryManager = null;
    this.openaiClient = null;
    this.isInitialized = false;
  }

  async initialize(modelAlias = "phi-3.5-mini") {
    this.foundryManager = new FoundryLocalManager();
    const modelInfo = await this.foundryManager.init(modelAlias);

    this.openaiClient = new OpenAI({
      baseURL: this.foundryManager.endpoint,
      apiKey: this.foundryManager.apiKey,
    });

    this.isInitialized = true;
    return modelInfo;
  }

The initialization pattern establishes local AI capabilities with full error handling and resource management. This enables web applications to provide AI features without external API dependencies.

  async generateCodeCompletion(codeContext, userPrompt) {
    if (!this.isInitialized) {
      throw new Error("LocalAI service not initialized");
    }

    try {
      const completion = await this.openaiClient.chat.completions.create({
        model: this.foundryManager.getModelInfo().id,
        messages: [
          {
            role: "system",
            content: "You are a code completion assistant. Provide accurate, efficient code suggestions."
          },
          {
            role: "user",
            content: `Context: ${codeContext}\n\nComplete: ${userPrompt}`
          }
        ],
        max_tokens: 150,
        temperature: 0.2
      });

      return completion.choices[0].message.content;
    } catch (error) {
      console.error("Local AI completion failed:", error);
      throw new Error("Code completion unavailable");
    }
  }
}

Implementation advantages for web applications:

• Zero-dependency AI features: Applications work entirely offline once models are downloaded, enabling AI capabilities in disconnected environments.

• Instant response times: Eliminate network latency for real-time features like code completion, content generation, or intelligent search.

• Client-side privacy: Sensitive code or content never leaves the user's device, meeting strict security requirements for enterprise development tools.

Cross-Platform Production Deployment Patterns

Both Python and JavaScript implementations share common production deployment patterns that make Foundry Local particularly suitable for enterprise applications:

Automatic Hardware Optimization: The platform automatically detects and utilizes available acceleration hardware. On systems with NVIDIA GPUs, it leverages CUDA acceleration. On newer Intel systems, it uses NPU acceleration. On ARM-based systems like Apple Silicon or Qualcomm Snapdragon, it optimizes for those architectures. This means the same application code delivers optimal performance across diverse deployment environments.

Graceful Resource Management: Foundry Local includes sophisticated memory management and resource allocation. Models are loaded efficiently, memory is recycled properly, and concurrent requests are handled intelligently to maintain system stability under load.

Production Monitoring Integration: The platform provides comprehensive metrics and logging that integrate naturally with existing monitoring systems, enabling production observability for AI workloads running at the edge.

These implementation patterns demonstrate how Foundry Local transforms edge AI from an experimental concept into a practical, production-ready deployment strategy that works consistently across different technology stacks and hardware environments. 

Measuring Success: Technical Performance and Business Impact

The transition to edge AI deployment delivers measurable improvements across both technical and business metrics. Understanding these impacts helps justify the architectural shift and demonstrates the concrete value of local LLM deployment in production environments.

Technical Performance Gains

Latency Elimination: The most immediately visible benefit is the dramatic reduction in response times. Cloud API calls typically require 200-800ms round-trips, depending on geographic location and network conditions. Local inference with Foundry Local reduces this to sub-10ms response times—a 95-99% improvement that fundamentally changes user experience possibilities.

Consider a code completion feature: cloud-based completion feels sluggish and interrupts developer flow, while local completion provides instant suggestions that enhance productivity. The same applies to real-time chat applications, interactive AI tutoring systems, and any application where response latency directly impacts usability.
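
If you want to verify latency on your own hardware rather than take the figures above on faith, a simple wall-clock measurement against the local endpoint is enough. This sketch reuses the `client`, `manager`, and `alias` objects from the Python example earlier; your numbers will vary by model, prompt length, and hardware:

import time

def measure_latency(prompt: str, runs: int = 5) -> float:
    """Average end-to-end latency (seconds) for short completions against the local endpoint."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        client.chat.completions.create(
            model=manager.get_model_info(alias).id,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=16,
        )
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

print(f"Average local latency: {measure_latency('Say hello.'):.3f}s")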

Automatic Hardware Utilization: Foundry Local's intelligent hardware detection and optimization delivers significant performance improvements without manual configuration. On systems with NVIDIA RTX 4000 series GPUs, inference speeds can be 10-50x faster than CPU-only processing. On newer Intel systems with NPUs, the platform automatically leverages neural processing units for efficient AI workloads. Apple Silicon systems benefit from Metal Performance Shaders optimization, delivering excellent performance per watt.

ONNX Runtime Optimization: Microsoft's ONNX Runtime provides substantial performance advantages over generic inference engines. In benchmark testing, ONNX Runtime consistently delivers 2-5x performance improvements compared to standard PyTorch or TensorFlow inference, while maintaining full model accuracy and compatibility.

Scalability Characteristics: Local deployment transforms scaling economics entirely. Instead of linear cost scaling with usage, you get horizontal scaling through hardware deployment. A single modern GPU can handle hundreds of concurrent inference requests, making per-request costs approach zero for high-volume applications.

Business Impact Analysis

Cost Structure Transformation: The financial implications of local deployment are profound. Consider an application processing 1 million tokens daily through OpenAI's API—this represents $20-60 in daily costs depending on the model. Over a year, this becomes $7,300-21,900 in recurring expenses. A comparable local deployment might require a $2,000-5,000 hardware investment with no ongoing API costs.

For high-volume applications, the savings become dramatic. Applications processing 100 million tokens monthly face $60,000-180,000 annual API costs. Local deployment with appropriate hardware infrastructure could reduce this to electricity and maintenance costs—typically under $10,000 annually for equivalent processing capacity.
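
A back-of-the-envelope version of that comparison, using the figures quoted above as inputs (the token volumes, per-day API costs, and hardware prices are the article's assumptions, not measured values):

def annual_api_cost(daily_cost_usd: float) -> float:
    return daily_cost_usd * 365

def breakeven_months(hardware_usd: float, monthly_api_usd: float, monthly_opex_usd: float = 0.0) -> float:
    """Months until a one-time hardware purchase beats recurring API spend."""
    monthly_savings = monthly_api_usd - monthly_opex_usd
    return hardware_usd / monthly_savings if monthly_savings > 0 else float("inf")

# 1M tokens/day at $20-60/day works out to roughly $7,300-21,900/year, as stated above
print(annual_api_cost(20), annual_api_cost(60))  # 7300.0 21900.0

# $5,000 of hardware vs. ~$1,200/month of API spend ($40/day midpoint of the $20-60 range)
print(round(breakeven_months(5000, 40 * 30), 1), "months")  # ~4.2 months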

Enhanced Privacy and Compliance: Local deployment eliminates data sovereignty concerns entirely. Healthcare applications processing patient records, financial services handling transaction data, and enterprise tools managing proprietary information can deploy AI capabilities without data leaving their infrastructure. This simplifies compliance with GDPR, HIPAA, SOX, and other regulatory frameworks while reducing legal and security risks.

Operational Resilience: Local deployment provides significant business continuity advantages. Applications continue functioning during network outages, API service disruptions, or third-party provider issues. For mission-critical systems, this resilience can prevent costly downtime and maintain user productivity during external service failures.

Development Velocity: Local deployment accelerates development cycles by eliminating API rate limits, usage quotas, and external dependencies during development and testing. Developers can iterate freely, run comprehensive test suites, and experiment with AI features without cost concerns or rate limiting delays.

Enterprise Adoption Metrics

Real-world enterprise deployments demonstrate measurable business value:

Local Usage: Teams running Foundry Local for internal AI-powered tools report 60-80% reductions in AI-related operational costs while improving developer productivity through instant AI responses in development environments.

Manufacturing Applications: Industrial IoT deployments using edge AI for predictive maintenance show 40-60% reduction in unplanned downtime while eliminating cloud connectivity requirements in remote facilities.

Financial Services: Trading firms deploying local LLMs for market analysis report sub-millisecond decision latencies while maintaining complete data isolation for competitive advantage and regulatory compliance.

ROI Calculation Framework

For AI Engineers evaluating edge deployment, consider these quantifiable factors:

Direct Cost Savings: Compare monthly API costs against hardware amortization over 24-36 months. Most applications with >$1,000 monthly API costs achieve positive ROI within 12-18 months.

Performance Value: Quantify the business impact of reduced latency. For customer-facing applications, each 100ms of latency reduction typically correlates with 1-3% conversion improvement.

Risk Mitigation: Calculate the cost of downtime or compliance violations prevented by local deployment. For many enterprise applications, avoiding a single significant outage justifies the infrastructure investment.

Development Efficiency: Measure developer productivity improvements from unlimited local AI access during development. Teams report 20-40% faster iteration cycles when AI features can be tested without external dependencies.
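One way to pull these four factors into a single comparison is a rough annualized model like the sketch below. It is a planning aid under stated assumptions, not a financial model: the conversion lift per 100 ms, outage cost, developer hourly rate, and every other input are placeholders to replace with your own data.

```python
# Rough annualized ROI model that combines the four factors above.
# Every coefficient is an assumption for illustration; use your own data.
from dataclasses import dataclass


@dataclass
class EdgeAICase:
    hardware_cost: float                 # one-time spend
    amortization_months: int             # e.g. 24-36
    monthly_api_cost_replaced: float     # direct cost savings
    latency_saved_ms: float              # average reduction per request
    monthly_revenue: float               # revenue exposed to conversion lift
    conversion_lift_per_100ms: float = 0.01   # assumed 1% per 100 ms
    outages_avoided_per_year: float = 0.0     # risk mitigation
    cost_per_outage: float = 0.0
    dev_hours_saved_per_month: float = 0.0    # development efficiency
    dev_hourly_rate: float = 75.0

    def annual_benefit(self) -> float:
        api = 12 * self.monthly_api_cost_replaced
        conversion = (12 * self.monthly_revenue * self.conversion_lift_per_100ms
                      * self.latency_saved_ms / 100)
        risk = self.outages_avoided_per_year * self.cost_per_outage
        dev = 12 * self.dev_hours_saved_per_month * self.dev_hourly_rate
        return api + conversion + risk + dev

    def annual_hardware_cost(self) -> float:
        return 12 * self.hardware_cost / self.amortization_months

    def roi_multiple(self) -> float:
        return self.annual_benefit() / self.annual_hardware_cost()


case = EdgeAICase(hardware_cost=5_000, amortization_months=36,
                  monthly_api_cost_replaced=1_500, latency_saved_ms=300,
                  monthly_revenue=50_000, outages_avoided_per_year=1,
                  cost_per_outage=10_000, dev_hours_saved_per_month=20)
print(f"Annual benefit: ${case.annual_benefit():,.0f}")
print(f"ROI vs. amortized hardware: {case.roi_multiple():.1f}x")
```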

These metrics demonstrate that edge AI deployment with Foundry Local delivers both immediate technical improvements and substantial long-term business value, making it a strategic investment in AI infrastructure that pays dividends across multiple dimensions. 

Your Edge AI Journey Starts Here

The shift to edge AI represents more than a technical evolution; it's an opportunity to fundamentally improve your applications while building valuable expertise in an emerging field. Whether you're looking to reduce costs, improve performance, or enhance privacy, the path forward involves both learning new concepts and connecting with a community of practitioners solving similar challenges.

Master Edge AI with Comprehensive Training

The Edge AI for Beginners curriculum (https://aka.ms/edgeai-for-beginners) provides the complete foundation you need to become proficient in local AI deployment. This isn't a superficial overview; it's a comprehensive, hands-on program designed specifically for developers who want to build production-ready edge AI applications.

The curriculum takes you through 36-45 hours of structured learning, progressing from fundamental concepts to advanced deployment scenarios. You'll start by understanding the principles of edge AI and local inference, then dive deep into practical implementation with Foundry Local across multiple programming languages. The program includes working examples and comprehensive sample applications that demonstrate real-world use cases.

What sets this curriculum apart is its practical focus. Instead of theoretical discussions, you'll build actual applications: document analysis systems that work offline, real-time code completion tools, intelligent chatbots that protect user privacy, and IoT applications that make decisions locally. Each project teaches both the technical implementation and the architectural thinking needed for successful edge AI deployment.

The curriculum covers multi-framework deployment patterns extensively, ensuring you can apply edge AI principles regardless of your preferred development stack. Whether you're working in Python data science environments, JavaScript web applications, C# enterprise systems, or Rust embedded projects, you'll learn the patterns and practices that make edge AI successful.

Join a Community of AI Engineers

Learning edge AI doesn't happen in isolation; it requires connection with other developers who are solving similar challenges and discovering new possibilities. The Foundry Local Discord community (https://aka.ms/foundry-local-discord) provides exactly this environment, connecting AI Engineers and Developers from around the world who are implementing local AI solutions.

This community serves multiple crucial functions for your development as an edge AI practitioner. You'll find experienced developers sharing implementation patterns they've discovered, debugging complex deployment issues collaboratively, and discussing the architectural decisions that make edge AI successful in production environments.

The Discord community includes dedicated channels for different programming languages, specific deployment scenarios, and technical discussions about optimization and performance. Whether you're implementing your first local AI feature or optimizing a complex multi-model deployment, you'll find peers and experts ready to help problem-solve and share insights.

Beyond technical support, the community provides valuable career and business insights. Members share their experiences with edge AI adoption in different industries, discuss the business cases that have proven most successful, and collaborate on open-source projects that advance the entire ecosystem.

Share Your Experience and Build Expertise

One of the most effective ways to solidify your edge AI expertise is by sharing your implementation experiences with the community. As you build applications with Foundry Local and deploy edge AI solutions, documenting your process and sharing your learnings provides value both to others and to your own professional development.

Consider sharing your deployment stories, whether they're successes or challenges you've overcome. The community benefits from real-world case studies that show how edge AI performs in different environments and use cases. Your experience implementing local AI in a healthcare application, financial services system, or manufacturing environment provides valuable insights that others can build upon.

Technical contributions are equally valuable, whether that means sharing configuration patterns you've discovered, performance optimizations you've implemented, or integration approaches you've developed for specific frameworks or libraries. The edge AI field is evolving rapidly, and practical contributions from working developers drive much of the innovation.

Sharing your work also builds your professional reputation as an edge AI expert. As organizations increasingly adopt local AI deployment strategies, developers with proven experience in this area become valuable resources for their teams and the broader industry.

The combination of structured learning through the Edge AI curriculum, active participation in the community, and sharing your practical experiences creates a comprehensive path to edge AI expertise that serves both your immediate project needs and your long-term career development as AI deployment patterns continue evolving.

Key Takeaways

  • Local LLM deployment transforms application economics: Replace variable API costs with fixed infrastructure investments that scale to unlimited usage, typically achieving ROI within 12-18 months for applications with significant AI workloads.

  • Foundry Local enables multi-framework edge AI: Consistent deployment patterns across Python, JavaScript, C#, and Rust environments with automatic hardware optimization and OpenAI API compatibility.

  • Performance improvements are dramatic and measurable: Sub-10ms response times replace 200-800ms cloud API latency, while automatic hardware acceleration delivers 2-50x performance improvements depending on available compute resources.

  • Privacy and compliance become architectural advantages: Local deployment eliminates data sovereignty concerns, simplifies regulatory compliance, and provides complete control over sensitive information processing.

  • Edge AI expertise is a strategic career investment: As organizations increasingly adopt local AI deployment, developers with hands-on edge AI experience become valuable technical resources with unique skills in an emerging field.

Conclusion

Edge AI deployment represents the next evolution in intelligent application development, transforming both the technical possibilities and economic models of AI-powered systems. With Foundry Local and the comprehensive Edge AI for Beginners curriculum, you have access to production-ready tools and expert guidance to make this transition successfully.

The path forward is clear: start with the Edge AI for Beginners curriculum to build solid foundations, connect with the Foundry Local Discord community to learn from practicing developers, and begin implementing local AI solutions in your projects. Each step builds valuable expertise while delivering immediate improvements to your applications.

As cloud costs continue rising and privacy requirements become more stringent, organizations will increasingly rely on developers who can implement local AI solutions effectively. Your early adoption of edge AI deployment patterns positions you at the forefront of this technological shift, with skills that will become increasingly valuable as the industry evolves.

The future of AI deployment is local, private, and performance-optimized. Start building that future today.

Resources

Edge AI for Beginners Curriculum: Comprehensive training with 36-45 hours of hands-on content, working examples, and production-ready deployment patterns https://aka.ms/edgeai-for-beginners 

Foundry Local GitHub Repository: Official documentation, samples, and community contributions for local AI deployment https://github.com/microsoft/foundry_local 

Foundry Local Discord Community: Connect with AI Engineers and Developers implementing edge AI solutions worldwide https://aka.ms/foundry/discord 

Foundry Local Documentation: Complete technical documentation and API references (Foundry Local documentation | Microsoft Learn) 

Foundry Local Model Catalog: Browse available models and deployment options for different hardware configurations (Foundry Local Models - Browse AI Models)


MCP Support in Visual Studio Reaches General Availability


Microsoft announced in August 2025 that support for the Model Context Protocol (MCP) is generally available in Visual Studio. MCP enables AI agents within Visual Studio to connect to external tools and services via a consistent protocol.

By Edin Kapić

CSLA 9 with Rocky Lhotka

The next version of CSLA is out! Carl and Richard talk to Rocky Lhotka about his business objects framework that pre-dates .NET itself! Rocky discusses the surge in development that occurred for version 9, where a company heavily dependent on CSLA contracted developers to clear some of the backlog. The result is a few new long-term contributors, an increased development cadence, and a substantial modernization of the code base. The conversation also turns to AI and its role in development, as well as Rocky's experiments with making an MCP server for CSLA!



Download audio: https://dts.podtrac.com/redirect.mp3/api.spreaker.com/download/episode/68247228/dotnetrocks_1973_csla_9.mp3

Google Porting All Internal Workloads To Arm

Google is migrating all its internal workloads to run on both x86 and its custom Axion Arm chips, with major services like YouTube, Gmail, and BigQuery already running on both architectures. The Register reports: The search and ads giant documented its move in a preprint paper published last week, titled "Instruction Set Migration at Warehouse Scale," and in a Wednesday post that reveals YouTube, Gmail, and BigQuery already run on both x86 and its Axion Arm CPUs -- as do around 30,000 more applications. Both documents explain Google's migration process, which engineering fellow Parthasarathy Ranganathan and developer relations engineer Wolff Dobson said started with an assumption "that we would be spending time on architectural differences such as floating point drift, concurrency, intrinsics such as platform-specific operators, and performance." [...] The post and paper detail work on 30,000 applications, a collection of code sufficiently large that Google pressed its existing automation tools into service -- and then built a new AI tool called "CogniPort" to do things its other tools could not. [...] Google found the agent succeeded about 30 percent of the time under certain conditions, and did best on test fixes, platform-specific conditionals, and data representation fixes. That's not an enormous success rate, but Google has at least another 70,000 packages to port. The company's aim is to finish the job so its famed Borg cluster manager -- the basis of Kubernetes -- can allocate internal workloads in ways that efficiently utilize Arm servers. Doing so will likely save money, because Google claims its Axion-powered machines deliver up to 65 percent better price-performance than x86 instances, and can be 60 percent more energy-efficient. Those numbers, and the scale of Google's code migration project, suggest the web giant will need fewer x86 processors in years to come.

Read more of this story at Slashdot.
