Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

How we build Azure SRE Agent with agentic workflows


The Challenge: Ops is critical but takes time from innovation

Microsoft operates always-on, mission-critical production systems at extraordinary scale. Thousands of services, millions of deployments, and constant change are the reality of modern cloud engineering. These are titan systems that power organizations around the globe—including our own—with extremely low risk tolerance for downtime. While operations work like incident investigation, response and recovery, and remediation is essential, it’s also disruptive to innovation.

For engineers, operational toil often means being pulled away from feature work to diagnose alerts, sift through logs, correlate metrics across systems, or respond to incidents at any hour. On-call rotations and manual investigations slow teams down and contribute to burnout. What's more, in the era of AI, demand for operational excellence has spiked to new heights. It became clear that traditional human-only processes couldn't meet the scale and complexity of system maintenance, especially in an AI world where code-shipping velocity has increased exponentially.

At the same time, we needed to integrate with an AI landscape that continues to evolve at a breakneck pace. New models, new tooling, and new best practices are released constantly, fragmenting the ecosystem across different platforms for observability, DevOps, incident management, and security. Beyond simply automating tasks, we needed to build an adaptable approach that could integrate with existing systems and improve over time.

Microsoft needed a fundamentally different way to perform operations—one that reduced toil, accelerated response, and gave engineers the time to focus on building great products.

The Solution: How we build Azure SRE Agent using agentic workflows

To address these challenges, Microsoft built Azure SRE Agent, an AI-powered operations agent that serves as an always-on SRE partner for engineers. In practice, Azure SRE Agent continuously observes production environments to detect and investigate incidents. It reasons across signals like logs, metrics, code changes, and other deployment records to perform root cause analysis. It supports engineers from triage to resolution and operates at a range of autonomy levels, from assistive investigation to automated remediation proposals. Everything occurs within governance guardrails and human approval checks grounded in role-based access controls and clear escalation paths. What's more, Azure SRE Agent learns from past incidents, outcomes, and human feedback to improve over time. But just as important as what was built is how it was built.

Azure SRE Agent was created using the agentic workflow approach—building agents with agents. Rather than treating AI as a bolt-on tool, Microsoft embedded specialized agents across the entire software development lifecycle (SDLC) to collaborate with developers, from planning through operations.

[Diagram: specialized agents embedded at each stage of the SDLC]

The diagram above outlines the agents used at each stage of development. They come together to form a full lifecycle:

  • Plan & Code: Agents support spec-driven development to unlock faster inner-loop cycles for developers and even product managers. With AI, we can not only draft spec documentation that defines feature requirements for UX and software development agents, but also create prototypes and check code into staging, enabling PM, UX, and engineering to rapidly iterate on, generate, and improve code even for early-stage merges.
  • Verify, Test & Deploy: Agents for code quality review, security, evaluation, and deployment work together to shift left on quality and security issues. They also continuously assess reliability, ensure performance, and enforce consistent release best practices.
  • Operate & Optimize: Azure SRE Agent handles ongoing operational work, from investigating alerts to assisting with remediation and even resolving some issues autonomously. Moreover, it learns continuously over time, and we provide Azure SRE Agent with its own specialized instance of Azure SRE Agent to maintain itself and catalyze feedback loops.

While agents surface insights, propose actions, mitigate issues, and suggest long-term code or IaC fixes autonomously, humans remain in the loop for oversight, approval, and decision-making when required. This combination of autonomy and governance proved critical for safe operations at scale. We also designed Azure SRE Agent to integrate across existing systems: our team uses custom agents, Model Context Protocol (MCP) and Python tools, telemetry connections, incident management platforms, code repositories, knowledge sources, and business process and operational tools to add intelligence on top of established workflows rather than replacing them.
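The governance pattern described above, bounded autonomy plus a human approval gate, can be sketched in a few lines. This is illustrative Python only, not Azure SRE Agent's actual implementation; the action names and risk threshold are invented for the example:

```python
from dataclasses import dataclass

# Hypothetical risk scale: 1 = read-only diagnostics, 3 = mutating fix.
AUTO_APPROVE_RISK = 1

@dataclass
class ProposedAction:
    name: str
    risk: int

def dispatch(action, approval_queue):
    """Execute low-risk actions autonomously; queue the rest for a human."""
    if action.risk <= AUTO_APPROVE_RISK:
        return f"executed:{action.name}"
    approval_queue.append(action.name)  # human-in-the-loop gate
    return f"pending-approval:{action.name}"

queue = []
print(dispatch(ProposedAction("collect-logs", risk=1), queue))     # runs autonomously
print(dispatch(ProposedAction("restart-service", risk=3), queue))  # waits for approval
```

The key design point is that the escalation path is explicit in code: anything above the autonomy threshold cannot execute without landing in an approval queue first.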

Built this way, Azure SRE Agent was not just a new tool but a new operational system. And at Microsoft’s scale, transformative systems lead to transformative outcomes.

The Impact: Reducing toil at enterprise scale

The impact of Azure SRE Agent is felt most clearly in day-to-day operations. By automating investigations and assisting with remediation, the agent reduces burden for on-call engineers and accelerates time to resolution. 

Internally at Microsoft in the last nine months, we've seen:

  • 35,000+ incidents have been handled autonomously by Azure SRE Agent.
  • 50,000+ developer hours have been saved by reducing manual investigation and response work.
  • Teams experienced a reduced on-call burden and faster time-to-mitigation during incidents.

To share a couple of specific cases, the Azure Container Apps and Azure App Service product teams have had tremendous success with Azure SRE Agent. Engineers on Azure Container Apps responded overwhelmingly positively (89%) to the root cause analysis (RCA) results from Azure SRE Agent, which covered over 90% of incidents. Meanwhile, Azure App Service has brought its time-to-mitigation for live-site incidents (LSIs) down to 3 minutes, a drastic improvement from the 40.5-hour average with human-only activity.

And this impact is felt in the developer experience. When we asked developers how the agent has changed ops work, one of our engineers had this to say:

[It’s] been a massive help in dealing with quota requests which were being done manually at first. I can also say with high confidence that there have been quite a few CRIs that the agent was spot on/ gave the right RCA / provided useful clues that helped navigate my initial investigation in the right direction RATHER than me having to spend time exploring all different possibilities before arriving at the correct one. Since the Agent/AI has already explored all different combinations and narrowed it down to the right one, I can pick the investigation up from there and save me countless hours of logs checking.

- Software Engineer II, Microsoft Engineering

Beyond the impact of the agent itself, the agentic workflow process has also completely redefined how we build.

Key learnings: Agentic workflow process and impact

It's easy to think of agents as another form of advanced automation, but it's important to understand that Azure SRE Agent is also a collaborative tool. Engineers can prompt the agent during investigations to surface relevant context (logs, metrics, and related code changes) and propose actions far faster and more easily than with traditional troubleshooting. What's more, they can extend it for data analysis and dashboarding. Engineers can then focus on the agent's findings, approving actions or intervening when necessary. The result is a human-AI partnership that scales operations expertise without sacrificing control.

While the process took time and experimentation to refine, the payoff has been extraordinary: our team has been building high-quality features faster than ever since we introduced specialized agents for each stage of the SDLC. While these results were achieved inside Microsoft, the underlying patterns are broadly applicable.

First, building agents with agents is essential to scaling, as manual development quickly became a bottleneck; agents dramatically accelerated inner loop iteration through code generation, review, debugging, security fixes, etc. When it comes to agents, specialization matters, because generic agents plateau quickly. Real impact comes from agents equipped with domain‑specific skills, context, and access to the right tools and data.

Microsoft also learned to integrate deeply with existing systems, embedding agents into established telemetry, workflows, and platforms rather than attempting to replace them. Throughout this process, maintaining tight human‑in‑the‑loop governance proved critical. Autonomy had to be balanced with clear approval boundaries, role‑based access, and safety checks to build trust.

Finally, teams learned to invest in continuous feedback and evaluation, using ongoing measurement to improve agents over time and understand where automation added value versus where human judgment should remain central.

Want to learn more?

Azure SRE Agent is one example of how agentic workflows can transform both product development and operations at scale. Teams at Microsoft are on a mission of leading the industry by example, not just sharing results. We invite you to take the practical learnings from this blog and apply the same principles in your own environments.

Read the whole story
alvinashcraft
8 minutes ago
Pennsylvania, USA

Android Weekly Issue #721

Articles & Tutorials
Sponsored
Shipping white-label apps used to mean repeating the same steps and signing in and out of Google Play Console dozens of times per release. With Runway, ship everything in one place, just once.
Tezov shows how to use Koin with the Koin Compiler and annotations to generate the dependency graph, validating everything through unit tests.
Jaewoong Eum explains the suspendCoroutine bridge pattern for converting callback-based Android APIs into clean suspend functions.
Shreyas Patil explains Android AppFunctions, the new API that exposes app functionality to AI agents and assistants.
Jorge Castillo shares a set of animated loaders based on mathematics.
inDrive.Tech explains how Jetpack Compose's built-in maxLength filter skips programmatic text changes, causing TextField to become completely unusable.
KMP Bits shows how to write custom Detekt rules that enforce design system constraints like banning hardcoded colors in Compose.
Marcin Moskala walks through new IntelliJ IDEA warnings for common Kotlin coroutines misuses, including awaitAll, currentCoroutineContext, and more.
Nick Skelton walks through implementing drag and drop in Kotlin Multiplatform with Compose, navigating experimental API documentation gaps.
Nav Singh demonstrates the new biometric-compose library for integrating biometric authentication directly in Jetpack Compose.
Jaewoong Eum demonstrates hot-reloading Jetpack Compose UI on real Android devices using Compose HotSwan.
Place a sponsored post
We reach out to more than 80k Android developers around the world, every week, through our email newsletter and social media channels. Advertise your Android development related service or product!
Libraries & Code
A Kotlin Multiplatform network inspection SDK that intercepts HTTP and WebSocket traffic, mocks API responses, and throttles requests without a proxy.
A native block-based rich text editor for Compose Multiplatform with drag-and-drop, slash commands, and custom block types.
News
Google releases Media3 1.10 with Material 3 Compose playback widgets, a new Player composable, and improved Transformer export speed adjustment.
Google releases Android Studio Panda 3 with agent skills for custom AI workflows, granular Agent Mode permissions, and updated car development support.
Google announces a 64-bit native code requirement for Wear OS apps starting September 15, 2026, with guidance on how to prepare.
Google announces Gemma 4 is available via the AICore Developer Preview, the foundation for the next-generation Gemini Nano 4 on-device AI.
Google announces Gemma 4 for Android, enabling local AI for both Android Studio coding assistance and on-device app development.
Google announces Gemma 4 is now available in Android Studio for local AI coding assistance, offering privacy and cost efficiency.
Jake Wharton announces the Android KTX libraries are being retired, as Kotlin extensions have been merged into their respective AndroidX libraries.
Videos & Podcasts
Android Developers demonstrates building on-device AI experiences in Android apps with the new Gemma 4 model.
Android Developers shows how to build AI-powered Android apps using Gemma 4 for local coding assistance.

Anthropic’s $1B to $19B growth run: how Claude became the fastest-growing AI product in history | Amol Avasare


Amol Avasare is Head of Growth at Anthropic, which is going through an unprecedented growth trajectory—scaling from $1 billion to over $19 billion in ARR in just 14 months. Previously, Amol worked on the growth teams at Mercury and MasterClass. Before that he was a founder, and he cold-emailed his way into the Anthropic role when no job listing existed. Most remarkably, he overcame a traumatic brain injury from a Muay Thai match that kept him from working for nearly a year.

In our in-depth discussion, Amol shares:

1. How Amol landed his role by cold emailing Anthropic’s CPO Mike Krieger

2. How Anthropic is automating growth experiments with Claude (their internal tool called “CASH”)

3. Why the ratio of PMs to engineers might need to flip (more PMs than engineers) as AI makes engineers exponentially more productive

4. Why activation is the single highest-leverage growth problem in AI

5. Why Anthropic indexes 70/30 toward big bets (the opposite of most growth teams)

6. How he uses Cowork to detect team misalignment in Slack

7. How the company’s focus on AI coding created a research flywheel that accelerated their models

Brought to you by:

WorkOS—Modern identity platform for B2B SaaS, free up to 1 million MAUs

Vanta—Automate compliance, manage risk, and accelerate trust with AI

Episode transcript: https://www.lennysnewsletter.com/p/anthropics-1b-to-19b-growth-run

Archive of all Lenny's Podcast transcripts: https://www.dropbox.com/scl/fo/yxi4s2w998p1gvtpu4193/AMdNPR8AOw0lMklwtnC0TrQ?rlkey=j06x0nipoti519e0xgm23zsn9&st=ahz0fj11&dl=0

Where to find Amol Avasare:

• X: https://x.com/TheAmolAvasare

• LinkedIn: https://www.linkedin.com/in/amolavasare

Where to find Lenny:

• Newsletter: https://www.lennysnewsletter.com

• X: https://twitter.com/lennysan

• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

In this episode, we cover:

(00:00) Introduction to Amol and Anthropic’s growth

(03:15) The story of cold emailing Mike Krieger to get the job

(08:28) What it’s like leading growth at the fastest-growing company ever

(10:46) What the growth team actually does at Anthropic

(12:16) The concept of “success disasters”

(13:55) Why activation is the biggest challenge in AI products

(18:05) Improving Mercury’s onboarding experience

(20:57) The importance of adding the right kind of friction

(25:10) Anthropic’s org structure

(27:06) Why Anthropic focuses on big bets over micro-optimizations

(33:34) Automating growth experiments with Claude (CASH)

(38:20) How AI is starting to identify what experiments to run

(41:07) The future of PM, engineering, and design roles

(47:19) Why you might need more PMs as engineers get more productive

(51:13) How Amol uses AI to prototype ideas and skip PRDs

(58:10) Amol’s morning routine: AI analyzes 20 to 25 charts automatically

(1:03:31) Getting coaching from an AI version of your manager

(1:06:27) How Anthropic’s focus on coding and B2B drove their success

(1:12:10) Balancing growth with AI safety as a core mission

(1:18:09) Advice for thriving in an AI-first future

(1:22:53) Anthropic’s culture and the “notebook channels” on Slack

(1:35:12) Failure corner: Shutting down his startup after raising money

(1:38:25) The traumatic brain injury that changed everything

(1:46:49) Lightning round

References: https://www.lennysnewsletter.com/p/anthropics-1b-to-19b-growth-run

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.

Lenny may be an investor in the companies discussed.



To hear more, visit www.lennysnewsletter.com



Download audio: https://api.substack.com/feed/podcast/192660974/b8f2d64f00864328ab4d430039ff6b58.mp3

String Performance: Avoid Unnecessary Conversions with StringBuilder

The excerpt from "Rock Your Code" advises caution when using StringBuilder with non-string types, highlighting that unnecessary conversions can hinder performance.




AI Unit Testing in 2026: What Developers Still Get Wrong


AI is transforming software development at an incredible pace. Tools can now generate unit tests in seconds, covering edge cases, happy paths, and even complex flows.

It feels like we’ve solved testing.

We haven’t.

AI didn’t eliminate the need for unit testing.
It exposed a deeper problem:

We don’t validate our tests.


⚡ AI Unit Testing Is Fast, But Is It Correct?

Recent industry discussions highlight how AI is accelerating unit testing, but also raising new risks.

Two recent articles from SD Times point to a clear shift:

We’ve moved from writing tests → generating tests.

But they stop just before the real challenge:

Who validates those tests?


AI makes test creation incredibly easy.

Today, you can:

  • Generate hundreds of tests in seconds
  • Reach impressive coverage numbers
  • Simulate multiple execution paths

And that feels like progress.

But here’s the catch:

Speed amplifies mistakes.

AI doesn’t understand your system—it predicts patterns based on existing code.

That leads to tests that are:

  • Redundant
  • Based on incorrect assumptions
  • Passing… but not testing anything meaningful

That’s the gap in AI unit testing today.

Not generation.

Validation.

❗ The Dangerous Illusion: Passing Tests

A test passing used to mean something.

Today?

Not always.

Here’s a simple example:

TEST(CalculatorTests, Add_ReturnsCorrectValue)
{
    Calculator calc;
    ASSERT_EQ(calc.Add(2, 3), 5);
}

Now imagine AI generates 20 variations of this:

  • Different inputs
  • Same logic
  • Same assertions

You get:

  • More tests
  • Higher coverage

But no additional value.

This is what we call:

False confidence.


🧩 The Real Problem: These Aren’t True Unit Tests

AI-generated tests often:

  • Call real file systems
  • Depend on time (DateTime.Now)
  • Use real services or processes

They look like unit tests.
They pass like unit tests.

But they’re not isolated.

And without isolation, you don’t have unit testing.


🛠️ Why Mocking and Isolation Matter More Than Ever

In the age of AI, mocking isn’t optional—it’s critical.

A real unit test must:

  • Run fast
  • Be deterministic
  • Isolate dependencies

This is where tools like Typemock come in.

With isolator-based unit testing:

  • You can mock static, non-virtual, and hard dependencies
  • Ensure tests don’t touch external resources
  • Keep tests truly independent

Without this?

AI will happily generate tests that:

  • Pass today
  • Break tomorrow
  • And never tell you why

🧠 The Shift: From Test Creation to Test Validation

This is the real evolution:

Before AI              | After AI
Writing tests is hard  | Writing tests is easy
Few tests, high intent | Many tests, unclear value
Focus on creation      | Focus on validation

We are entering a new era:

Test Validation is the new bottleneck.


🔍 What Should You Validate?

To trust AI-generated tests, you need to verify:

1. Duplication

Are multiple tests checking the same thing?

2. Coverage Quality

Do tests actually exercise meaningful logic?

3. Isolation

Are external dependencies properly mocked?

4. Assertions

Do the assertions reflect real business intent?
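A first pass at the duplication check can be automated. The sketch below is illustrative Python only; the tuple representation of a generated test is invented for the example. It groups tests by the code path they exercise, flagging groups that inflate coverage without adding value:

```python
# Each generated test reduced to (callee, inputs, expected result).
generated_tests = [
    ("Calculator.Add", (2, 3), 5),
    ("Calculator.Add", (10, 20), 30),
    ("Calculator.Add", (0, 0), 0),
]

def duplicate_groups(tests):
    """Group tests by the callee they exercise.

    Groups with more than one entry are candidates for review:
    same logic, same assertions, different inputs only.
    """
    groups = {}
    for callee, args, expected in tests:
        groups.setdefault(callee, []).append((args, expected))
    return {callee: cases for callee, cases in groups.items() if len(cases) > 1}

# All three tests exercise the same path: more tests, no added value.
print(duplicate_groups(generated_tests))
```

A real validator would key on the exercised branches rather than the callee name, but the principle is the same: normalize each test to what it actually proves, then prune.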


🚀 Where Typemock Fits

Typemock was built for this exact challenge.

In a world of AI-generated tests, you need:

  • Strong mocking capabilities (.NET & C++)
  • Isolation of any dependency
  • Confidence that tests are real—not illusions

Typemock helps you:

  • Turn generated tests into real unit tests
  • Remove hidden dependencies
  • Ensure your test suite actually protects your code



💡 Final Thought

AI didn’t break testing.

It revealed something we ignored:

A test that passes is not necessarily a test you can trust.

The future isn’t about writing more tests.

It’s about knowing which ones matter.

The post AI Unit Testing in 2026: What Developers Still Get Wrong appeared first on Typemock.


Exploring Gemma 4: The Future of Local AI Models


Recently, DeepMind unveiled Gemma 4, the highly anticipated successor to the popular Gemma 3 model lineup. We're excited to explore its performance when run locally, especially using vLLM at full capacity. As we delve into its capabilities, we'll also share insights on setting up your own local AI environment to test Gemma 4's prowess.

Based on content from Digital Spaceport

Technical Setup

For those eager to replicate our setup, we recommend checking out the Hermes OpenwebUI Setup guide and the 8 GPU Rack build video for detailed instructions. Here’s a list of hardware essentials we used:

  • GPUs: 3090 24GB, 5060Ti 16GB, 4090 24GB
  • Motherboard: MZ32-AR0
  • CPU: AMD EPYC 7702
  • RAM: 256GB DDR4 DIMMs
  • Power Supplies: Corsair HX1500i, Seasonic PRIME PX1600
  • Riser Cables and Rack: x16 PCIe Risers, PCIe3 x1 USB risers, Plastic Rack

Visit Digital Spaceport for a comprehensive DIY guide.
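On a rig like this, a vLLM launch might look roughly like the following. This is a sketch, not a tested configuration: the model identifier is a placeholder, and the tensor-parallel setting assumes a matched pair of GPUs (vLLM splits weights evenly, so mixed-VRAM cards need care):

```shell
# Hypothetical launch command; substitute the actual Gemma 4
# checkpoint name from your model hub.
vllm serve google/gemma-4-PLACEHOLDER \
  --tensor-parallel-size 2 \
  --max-model-len 32768
```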

Exploring Gemma 4’s Features

Gemma 4 introduces several enhancements, including support for up to 140 languages and a context window of up to 256. Models range from lightweight variants like E2B and E4B, optimized for low-end hardware, to the most robust 31B model. One standout feature is its ability to handle diverse AI tasks with impressive reasoning and multimodality, even on smaller models.

Benchmarking and Performance

The improved context window prevents quality deterioration, a significant upgrade from its predecessor. Notably, tests showed exceptional performance jumps in MMLU and code evaluation scenarios, indicating a considerable leap compared to the Gemma 3 series. While we’re still conducting nuanced benchmark testing, early results are promising.

The Ethical Dimension

In exploring AI capabilities, ethical considerations remain paramount. One of our tests posed a classic ethical dilemma, where Gemma 4 demonstrated commendable reasoning, albeit with some limitations around inherent safety protocols. This scenario underscores the need for continual improvements in AI ethics training, ensuring comprehensive self-governance in complex situations.

Conclusion

Gemma 4 represents a promising stride in local AI deployment, offering versatility and power across various configurations. Whether you’re looking to harness its capabilities for coding tasks or exploring its safety features, Gemma 4’s versatility holds immense potential for both hobbyists and professionals.

To stay updated with our latest AI explorations, consider supporting us through membership, Patreon, or purchasing via our affiliate links. For more details on the Gemma 4 model and associated resources, visit the links provided.
