Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

A hodgepodge of ideas spewing in my head

As I sit down to write, I have a hodgepodge of ideas spewing in my head, but none that has taken hold in any immersive way. Usually a blog post has a single topic of focus, and I try to go somewhat deep into it. But this approach can be problematic: If I don't have an idea that catches my attention, I feel I have nothing to write about. Hence, I'll skip my writing time “until the muse strikes” or something. But then days pass without the muse striking, and I start to wonder if I've gone about the creative process all wrong.


Context Engineering Lessons from Building Azure SRE Agent


We spent a long time chasing model upgrades, polishing prompts, and debating orchestration strategies. The gains were visible in offline evals, but they didn’t translate into the reliability and outcomes we wanted in production. The real breakthrough came when we started caring much more about what we were adding to the context, when, and in what form. In other words: context engineering.

Every context decision involves tradeoffs: latency, autonomy (how far the agent goes without asking), user oversight, pre-work (retrieve/verify/compute before answering), how the agent decides it has sufficient evidence, and the cost of being wrong. Push on one dimension, and you usually pay for it elsewhere.

This post tells the story of our journey building Azure SRE Agent – a cloud AI agent that takes care of your Azure resources and handles your production incidents autonomously. We'll talk about how we got here, what broke along the way, which context patterns survived contact with production, and what we are doing next to treat context engineering as the primary lever for reliable AI-driven SRE.

Tool Explosion, Under-Reasoned

We started where everyone starts: scoped tools and prescriptive prompts. We didn't trust the model in prod, so we constrained it. Every action got its own tool. Every tool got its own guardrails.

Azure is a sprawling ecosystem - hundreds of services, each with its own APIs, failure modes, and operational quirks. Within 2 weeks, we had 100+ tools and a prompt that read like a policy manual.

The cracks showed fast. User hits an edge case? Add a tool. Tool gets misused? Add guardrails. Guardrails too restrictive? Add exceptions. The backlog grew faster than we could close it.

Worse, the agent couldn’t generalize. It was competent at the scenarios we’d already encoded and brittle everywhere else. We hadn't built an agent - we'd built a workflow with an LLM stapled on.

Insight #1: If you don’t trust the model to reason, you’ll build brittle workflows instead of an agent.

Wide Tools Beat Many Tools

Our first real breakthrough came from asking a different question: what if, instead of 100 narrow tools, we gave the model two wide ones?

We introduced `az` and `kubectl` CLI commands as first-class tools. These aren’t “tools” in the traditional sense - they’re entire command-line ecosystems. But from the model’s perspective, they’re just two entries: “execute this Azure CLI command” and “execute this Kubernetes command.”
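
To make this concrete, here is a minimal sketch of what a “wide” CLI tool can look like. The schema shape and the `run_cli` helper are illustrative assumptions, not the SRE Agent’s actual implementation:

```python
import shlex
import subprocess

# A "wide" tool: one entry per CLI, not one per action. Schema and helper are
# illustrative assumptions, not the SRE Agent's actual implementation.
WIDE_TOOLS = [
    {
        "name": "run_az",
        "description": "Execute an Azure CLI command and return its output.",
        "parameters": {"command": "Full az command line, e.g. 'az webapp list -g my-rg'."},
    },
    {
        "name": "run_kubectl",
        "description": "Execute a kubectl command against the connected cluster.",
        "parameters": {"command": "Full kubectl command line, e.g. 'kubectl get pods -A'."},
    },
]

def run_cli(binary: str, command: str, timeout: int = 60) -> str:
    """Run the model-proposed command under a whitelisted binary."""
    args = shlex.split(command)
    if args and args[0] == binary:  # accept either 'az vm list' or just 'vm list'
        args = args[1:]
    result = subprocess.run([binary, *args], capture_output=True, text=True, timeout=timeout)
    return result.stdout if result.returncode == 0 else result.stderr
```

Guardrails still matter - in a real deployment you would keep mutating commands behind approval - but the surface the model reasons over stays at two entries.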

The impact was immediate:

  • Context compression: Two wide tools instead of hundreds of narrow ones. Massive headroom recovered.
  • Capability expansion: The model now had access to the entire az/kubectl surface area, not just the subset we had wrapped.
  • Better reasoning: LLMs already “know” these CLIs from training data. By hiding them behind custom abstractions, we were fighting their priors.

This was our first hint of a deeper principle:

Insight #2: Don’t fight the model’s existing knowledge - lean on it. 

Multi-Agent Architectures: Promise, Pain, and the Pivot

Building on the success of generic tools, we went further and built a full multi-agent system with handoffs. A “handoff” meant one agent explicitly transferring control - along with the running context and intermediate results - to another agent.

Human teams are organized by specialty, so we mirrored that structure: specialized sub-agents with focused personas, each owning one Azure service and handing off when investigations crossed boundaries.

The theory was elegant: lazy tool loading.

  • The orchestrator knows about sub-agents, not individual tools.
  • User asks about Kubernetes? Hand off to the K8s agent.
    Networking question? Route to the networking agent.
  • Each agent loads only its own tools. Context stays lean.
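
Concretely, the routing we had in mind looked something like the sketch below; the agent registry, prompts, and field names are hypothetical:

```python
# Stripped-down sketch of the handoff pattern described above (the design we
# later moved away from). Agent names, prompts, and fields are hypothetical.
SUB_AGENTS = {
    "kubernetes": {"system_prompt": "You are the AKS specialist...", "tools": ["run_kubectl"]},
    "networking": {"system_prompt": "You are the networking specialist...", "tools": ["run_az"]},
    # ...dozens more entries at our peak
}

def handoff(target: str, context: dict) -> dict:
    """Transfer control: the target sub-agent inherits the running context and
    loads only its own tools, keeping each prompt lean."""
    spec = SUB_AGENTS[target]
    return {
        "active_agent": target,
        "system_prompt": spec["system_prompt"],
        "tools": spec["tools"],                 # lazy tool loading
        "history": context.get("history", []),  # running context + intermediate results
        "hops": context.get("hops", 0) + 1,     # hop limit guards against ping-pong loops
    }
```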

It worked beautifully at small scale. Then we grew to 50+ sub-agents and it fell apart.

The results showed a bimodal distribution: when handoffs worked, everything worked; when they didn't, the agent got lost. We saw a clear cliff – problems requiring more than four handoffs almost always failed.

The following patterns emerged:

  1. Discovery problems.
    Each sub-agent only knew sub-agents it could directly call. Users would ask reasonable questions and get “I don’t know how to help with that” - not because the capability didn’t exist, but because the orchestrator didn’t know that the right sub-agent was buried three hops away.
  2. System prompt fragility.
    Each sub-agent had its own system prompt. A poorly tuned sub-agent doesn’t just fail locally - it affects the entire reasoning chain with its conflicting instructions. The orchestrator’s context gets polluted with confused intermediate outputs, and suddenly nothing works. One bad agent drags down the whole interaction, and we had over 50 sub-agents at this point.
  3. Infinite Loops.
    In the worst cases, agents started bouncing work around without making progress. The orchestrator would call a sub-agent, which would defer back to the orchestrator or another sub-agent, and so on. From the user’s perspective, nothing moved forward; under the hood, we were burning tokens and latency on a “you handle it / no, you handle it” loop. Hop limits and loop detection helped, but they also undercut the original clean architecture of the design.
  4. Tunnel Vision.
    Human experts have overlapping domains - a Kubernetes engineer knows enough networking to suspect a route issue, enough about storage to rule it out. This overlap makes human handoffs intelligent. Our agents had hard boundaries. They either surrendered prematurely or developed tunnel vision, chasing symptoms in their domain while the root cause sat elsewhere.

Insight #3: Multi-agent systems are hard to scale - coordination is the real work.

The failures revealed a familiar pattern. With narrow tools, we'd constrained what the model could do – and paid in coverage gaps. With domain-scoped agents, we'd constrained what it could explore – and paid in coordination overhead. Same overcorrection, different layer.

The fix was to collapse dozens of specialists into a small set of generalists. This was only possible because we already had generic tools. We also moved the domain knowledge from system prompts into files the agents could read on demand (an approach that later morphed into an agent skills capability inspired by Anthropic).
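
A minimal sketch of the knowledge-as-files idea, assuming a plain directory of markdown skill documents; the layout and tool names are illustrative rather than a specific product API:

```python
from pathlib import Path

# Domain guidance lives in files the agent can list and load on demand,
# instead of being baked into system prompts. Layout and names are assumptions.
SKILLS_DIR = Path("skills")  # e.g. skills/aks-networking.md, skills/app-service-quota.md

def list_skills() -> list[str]:
    """Tool: enumerate available skills so the model can decide what to load."""
    return sorted(p.stem for p in SKILLS_DIR.glob("*.md"))

def load_skill(name: str, max_chars: int = 8000) -> str:
    """Tool: pull one skill document into context, truncated to protect the budget."""
    return (SKILLS_DIR / f"{name}.md").read_text()[:max_chars]
```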

Our system evolved: fewer agents, broader tools, and on-demand knowledge replaced brittle routing and rigid boundaries. Reliability improved as we stopped depending on the handoff roulette.

Insight #4: Invest context budget in capabilities, not constraints.

A Real Example: The Agent Debugging Itself

Case in point: Our own Azure OpenAI infrastructure deployment started failing. We asked the SRE agent to debug it. 

Without any predefined workflow, it checked deployment logs, spotted a quota error, queried our subscription limits, found the correct support request category, and filed a ticket with the support team. The next morning, we had an email confirming our quota increase.

Our old architecture couldn't have done this - we had no Cognitive Services sub-agent, no support request tool. But with `az` as a wide tool and cross-domain knowledge, the model could navigate Azure's surface area the same way a human would.

This is what we mean by capability expansion. We never anticipated this scenario. With generalist agents and wide tools, we didn't need to.

Context Management Techniques for Deep Agents

After consolidating tools and agents, we focused on context management for long-running conversations.

1. The Code Interpreter Revelation

Consider metrics analysis. We started with the naive approach: dump all metrics into the context window and ask the model to find anomalies.

This was backwards. We were taking deterministic, structured data and pushing it through a probabilistic system. We were asking an LLM to do what a single Pandas one-liner could do. We ended up paying in tokens, latency, and accuracy (models don’t like zero-valued metrics).

Worse, it kind of worked. For short windows. For simple queries. Just enough success to hide how fundamentally wrong the approach was. Classic “works in demo, fails in prod.”

The fix was obvious in hindsight: let the model write code.

  • Don’t send 50K tokens of metrics into the context.
  • Send the metrics to a code interpreter.
  • Let the model write the pandas/numpy analysis.
  • Execute it, and return only the results and their analysis.

Metrics analysis had been our biggest source of tool failures. After this change: zero failures. And because we weren’t paying the token tax anymore, we could extend time ranges by an order of magnitude.
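
For a sense of what the interpreter runs, here is the kind of script the model might write, with the CSV path, column names, and the 3-sigma threshold chosen purely for illustration; only the few printed lines return to the model’s context:

```python
import pandas as pd

# The kind of analysis the model writes and the sandbox executes, instead of
# pasting raw metrics into context. Path, columns, and threshold are illustrative.
df = pd.read_csv("metrics/http_5xx_per_minute.csv", parse_dates=["timestamp"])

# Ignore all-zero stretches (which tend to confuse LLM-only analysis) when
# estimating the baseline, then flag points well above it.
nonzero = df[df["value"] > 0]
threshold = nonzero["value"].mean() + 3 * nonzero["value"].std()
anomalies = df[df["value"] > threshold]

print(f"{len(anomalies)} anomalous minutes above {threshold:.1f}")
print(anomalies.nlargest(5, "value")[["timestamp", "value"]].to_string(index=False))
```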

Insight #5: LLMs are orchestrators, not calculators.
Use them to decide what computation to run, then let actual code perform the computation.

2. Planning and Compaction

We also added two other patterns: a todo-style planner and more aggressive compaction.

  • Todo planner: Represent the plan as an explicit checklist outside the model’s context, and let the model update it instead of re-deriving the workflow on every turn.
  • Compaction: Continuously shrink history into summaries and structured state (e.g., key incident facts), so the context stays a small working set rather than an ever-growing log.
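
As a rough sketch (the data shapes here are assumptions, not the agent’s real schema), both patterns can be as small as:

```python
# Externalized plan: the checklist lives outside the model's context and only
# a short rendering of it enters the prompt each turn.
plan = [
    {"step": "Pull deployment logs for the failing revision", "status": "done"},
    {"step": "Check subscription quota for the region", "status": "in_progress"},
    {"step": "File a support request if quota is exhausted", "status": "pending"},
]

def render_plan(plan: list[dict]) -> str:
    """Only this short rendering enters the prompt, not the full reasoning history."""
    return "\n".join(f"[{'x' if s['status'] == 'done' else ' '}] {s['step']}" for s in plan)

def compact(history: list[str], keep_last: int = 5, summarize=lambda msgs: "...") -> list[str]:
    """Fold older turns into a summary plus key facts; keep recent turns verbatim."""
    if len(history) <= keep_last:
        return history
    summary = summarize(history[:-keep_last])  # e.g. an LLM call that extracts key incident facts
    return [f"SUMMARY OF EARLIER TURNS: {summary}", *history[-keep_last:]]
```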

Insight #6: Externalizing plans and compacting history effectively “stretch” the usable context window.

3. Progressive Disclosure with Files

With code interpretation working, we hit the next wall: tool calls returning absurd amounts of data.

Real example: an internal App Service Control Plane log table against which a user fires off a SELECT *-style query. The table has ~3,000 columns. A single-digit number of log entries expands to 200K+ tokens. The context window is gone. The model chokes. The user gets an error.

Our solution was session-based interception.

Tool calls that can return large payloads never go straight into context. Instead, the payload is written as a “file” into a sandboxed environment where the data can be:

  • Inspected ("what columns exist?")
  • Filtered ("show only the error-related columns")
  • Analyzed via code ("find rows where latency > p99")
  • Summarized before anything enters the model’s context

The model never sees the raw 200K tokens. It sees a reference to a session and a set of tools for interacting with that session. We turned an unbounded context explosion into a bounded, interactive exploration. You may have seen this with coding agents, and the idea is the same - can the model find its way through a large amount of data on its own?

Insight #7: Treat large tool outputs as data sources, not context.
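
To make the pattern concrete, here is a minimal sketch of session-based interception; the paths, threshold, and helper names are illustrative:

```python
import json
import uuid
from pathlib import Path

# Oversized tool outputs become files the model explores with small follow-up
# tools. Paths and names are illustrative assumptions.
SESSION_DIR = Path("/tmp/sre-agent-sessions")

def intercept(rows: list[dict], threshold: int = 200) -> dict:
    """Persist a large payload and hand the model a reference, not the data."""
    if len(rows) <= threshold:
        return {"inline": rows}
    SESSION_DIR.mkdir(parents=True, exist_ok=True)
    handle = f"session-{uuid.uuid4().hex[:8]}.json"
    (SESSION_DIR / handle).write_text(json.dumps(rows))
    return {"file": handle, "rows": len(rows), "columns": sorted(rows[0].keys())[:50]}

def query_session(handle: str, column: str, predicate) -> list[dict]:
    """Tool the model calls to filter the stored data without ever seeing it raw."""
    rows = json.loads((SESSION_DIR / handle).read_text())
    return [r for r in rows if predicate(r.get(column))]
```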

4. What's Next: Tool Call Chaining

The next update we’re working on is tool call chaining. The idea started with our effort to express Troubleshooting Guides (TSGs) as code.

A lot of agent workflows are predictable: “run this query, fetch these logs, slice this data, summarize the result.” Today, we often force the model to walk that path one tool call at a time:

Model → Tool A → Model → Tool B → Model → Tool C → Model → … → Response

The alternative:

Model → [Script: Tool A → Tool B → Tool C → … → Final Output] → Model → Response

The model writes a small script that chains the tools together. The platform executes the script and returns consolidated results. Three roundtrips become one. Context overhead drops by 60–70%.

This also unlocks something subtle: deterministic workflows inside probabilistic systems. Long-running operations that must happen in a specific order can be encoded as scripts. The model decides what should happen; the script guarantees how it happens. Anthropic recently published a similar capability, programmatic tool calling.
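
As an illustration, a chained script might look like the sketch below, where run_az, query_logs, and summarize stand in for platform-provided tool bindings rather than real APIs:

```python
# Shape of a chained script the model might emit; run_az, query_logs, and
# summarize are placeholders for platform-provided tool bindings, not real APIs.
def investigate_slow_requests(run_az, query_logs, summarize) -> dict:
    """One sandbox execution replaces several model <-> tool roundtrips."""
    apps = run_az("az webapp list -g prod-rg --query \"[?state=='Running'].name\" -o tsv")
    app = apps.splitlines()[0]
    logs = query_logs(app=app, window="1h", where="durationMs > 5000")
    slow_sql = [entry for entry in logs if entry.get("dependencyType") == "SQL"]
    # Only this consolidated result goes back into the model's context.
    return summarize({"app": app, "slow_requests": len(logs), "slow_sql_calls": len(slow_sql)})
```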

The Meta Lesson

Six months ago, we thought we were building an SRE agent. In reality, we were building a context engineering system that happens to do Site Reliability Engineering.

Better models are table stakes, but what moved the needle was what we controlled: generalist capabilities and disciplined context management.

Karpathy’s analogy holds: if context windows are the agent’s “RAM,” then context engineering is memory management: what to load, what to compress, what to page out, and what to compute externally. As you fill the window up, model quality often drops non-linearly - “lost in the middle,” “not adhering to my instructions,” and plain old long-context degradation show up well before you hit the advertised limits. More tokens don’t just cost latency; they quietly erode accuracy.

We’re not done. Most of what we have done is “try it, observe, watch it break, tighten the loop”. But the patterns that keep working - wide tools, code execution, context compaction, tool chaining - are the same ones we see rediscovered across other agent stacks. In the end, the throughline is simple: give the model fewer, cleaner choices and spend your effort making the context it sees small, structured, and easy to operate on.


PPP 487 | Why Humor Is a Serious Leadership Skill, with comedian Adam Christing


Summary

In this episode, Andy talks with comedian and corporate emcee Adam Christing, author of The Laughter Factor: The 5 Humor Tactics to Link, Lift, and Lead. If you have ever hesitated to use humor at work because you were unsure it would land, or worried it might backfire, this conversation offers both encouragement and a practical path forward.

Adam shares how his early influences shaped his approach to humor and why he believes every human is also a "humor being." You will hear why humor is more than chasing chuckles, including how it can build trust, improve learning, and strengthen relationships on teams. Adam introduces the concept of "laugh languages" and walks through examples such as Surprise and Poke, along with guidance on how to tease without crossing the line. They also discuss tailoring humor across cultures and how leaders can bring the laughter factor home with their families.

If you are looking for practical insights on leading with humor, building trust, and bringing more humanity into your projects and teams, this episode is for you!

Sound Bites

  • "If you're a human being, you are also a humor being, and I would say not only do you have a sense of humor, but a sense of humor has you."
  • "The audience is actually, whether it's three people or 300, they're actually rooting for you."
  • "They don't want to be bored. They want to be entertained."
  • "When we think back on the things that have made us laugh the most, it's often the flops that are the funniest."
  • "They won't trust your humor until you do."
  • "There's a saying in show business, 'funny is money'."
  • "I really believe that humor is a bridge that helps you connect heart to heart with other people."
  • "You're a leader. You need to be the one building trust."
  • "Humor is a shortcut to trust."
  • "Leaders help their people learn with laughter."
  • "Increase your LPMs: laughs per meeting."
  • "If in doubt, leave it out."
  • "Every meeting really should be a party with a purpose."

Chapters

  • 00:00 Introduction
  • 01:43 Start of Interview
  • 03:38 Adam's Backstory and Early Influences
  • 05:23 "I'm Not Funny" and the Confidence Barrier
  • 10:36 Why Humor Is More Than Just Chuckles
  • 16:00 The Laughter Factor Explained
  • 18:10 Laugh Languages and the Power of Surprise
  • 21:09 Poke: Teasing Without Crossing the Line
  • 24:42 Using Humor Across Cultures
  • 30:14 How You Know the Laughter Factor Is Working
  • 32:17 Developing a Laughter Factor at Home
  • 34:25 End of Interview
  • 34:55 Andy Comments After the Interview
  • 38:02 Outtakes

Learn More

Get a copy of Adam's book The Laughter Factor: The 5 Humor Tactics to Link, Lift, and Lead.

You can learn more about Adam and his work at TheLaughterFactor.com. While you are there, check out the short questionnaire to discover your laugh language.

For more learning on this topic, check out:

  • Episode 316 with Jennifer Aaker and Naomi Bagdonas. They are completely on this theme of humor being a strategic ability for leaders and teams.
  • Episode 109 with Peter McGraw. Peter breaks down what makes something funny based on his book The Humor Code, an episode Andy still calls back to today.
  • Episode 485 with John Krewson, a conversation about lessons from sketch comedy that nicely reinforce ideas from today's episode.

Level Up Your AI Skills

Join other listeners from around the world who are taking our AI Made Simple course to prepare for an AI-infused future.

Just go to ai.PeopleAndProjectsPodcast.com. Thanks!

Pass the PMP Exam

If you or someone you know is thinking about getting PMP certified, we've put together a helpful guide called The 5 Best Resources to Help You Pass the PMP Exam on Your First Try. We've helped thousands of people earn their certification, and we'd love to help you too. It's totally free, and it's a great way to get a head start.

Just go to 5BestResources.PeopleAndProjectsPodcast.com to grab your copy. I'd love to help you get your PMP this year!

Join Us for LEAD52

I know you want to be a more confident leader, that's why you listen to this podcast. LEAD52 is a global community of people like you who are committed to transforming their ability to lead and deliver. It's 52 weeks of leadership learning, delivered right to your inbox, taking less than 5 minutes a week. And it's all for free. Learn more and sign up at GetLEAD52.com. Thanks!

Thank you for joining me for this episode of The People and Projects Podcast!

Talent Triangle: Power Skills

Topics: Leadership, Humor At Work, Trust Building, Communication, Team Culture, Psychological Safety, Cross-Cultural Leadership, Meeting Facilitation, Emotional Intelligence, Influence, Learning And Development, People Management, Project Management

The following music was used for this episode:

Music: The Fantastical Ferret by Tim Kulig
License (CC BY 4.0): https://filmmusic.io/standard-license

Music: Synthiemania by Frank Schroeter
License (CC BY 4.0): https://filmmusic.io/standard-license





Download audio: https://traffic.libsyn.com/secure/peopleandprojectspodcast/487-AdamChristing.mp3?dest-id=107017

IoTCT Webcast Episode 293 - "Non-Alcoholic Predictions" (Hot IoT+ guess for 2026)

From: IoT Coffee Talk
Duration: 1:00:46
Views: 8

Welcome to IoT Coffee Talk, where hype comes to die a terrible death. We have a fireside chat about all things #IoT over a cup of coffee or two with some of the industry's leading business minds, thought leaders and technologists in a totally unscripted, organic format.

This week Rob, Pete, and Leonard jump on Web3 to host a discussion about:

🎶 🎙️ BAD KARAOKE! 🎸 🥁 "Light Up The Sky" by Van Halen
🐣 AI music will rule the day because most people don't appreciate jazz!
🐣 Rob thinks that rock n' roll may be on an extinction path! How an analog board and Dave Grohl will save rock n' roll humanity!
🐣 Will our over-dependence and over-reliance on tech and networks be the downfall of humanity?
🐣 Did we shoot ourselves in the branding foot naming our podcast after "IoT"? The Metaverse of Everything (MOE) Coffee Talk?
🐣 What is an industry analyst, and how are they different from influencers and Wall Street analysts?
🐣 Why it was important for Arthur Andersen to have stayed independent from Enron.
🐣 Rob predicts that everyone will go back to Windows 10 to escape MS AI bloatware!
🐣 Let's get physical! We will be getting physical in 2026 with physical AI!
🐣 The gang drop their predictions for 2026, and it is sooooo good!

It's a great episode. Grab an extraordinarily expensive latte at your local coffee shop and check out the whole thing. You will get all you need to survive another week in the world of IoT and greater tech!

Tune in! Like! Share! Comment and share your thoughts on IoT Coffee Talk, the greatest weekly assembly of Onalytica and CBT tech and IoT influencers on the planet!!

If you are interested in sponsoring an episode, please contact Stephanie Atkinson at Elevate Communities. Just make a minimally required donation to www.elevatecommunities.org and you can jump on and hang with the gang and amplify your brand on one of the top IoT/Tech podcasts in the known metaverse!!!

Take IoT Coffee Talk on the road with you on your favorite podcast platform. Go to IoT Coffee Talk on Buzzsprout, like, subscribe, and share: https://lnkd.in/gyuhNZ62


Are your .NET 10 Framework Dependent (not self-contained) EXEs 5x larger than they were on .NET 9?


While building the unpackaged binaries for Text Grab, I couldn’t understand why building with --self-contained false was not producing EXEs that were any smaller than building with --self-contained true. Copilot and Claude were not helpful, so I did some searching and could not find the answer… until I searched GitHub and found several open issues discussing it.

According to this issue https://github.com/dotnet/sdk/issues/52070, the way I was trying to build framework-dependent EXEs with --self-contained false is no longer supported; instead, we should be using --no-self-contained.

Once I updated my build script, my x64 and Arm64 EXEs were the size I expected! Hopefully this helps someone else who was flipping all the different flags trying to find the issue.
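
For anyone hitting the same thing, the change boils down to swapping one flag for the other. The rest of the options below are just an example invocation, not Text Grab’s actual build script:

```
# The flag I had been using (per dotnet/sdk#52070, no longer the supported way
# to publish framework-dependent on .NET 10):
dotnet publish -c Release -r win-x64 --self-contained false

# The flag that worked:
dotnet publish -c Release -r win-x64 --no-self-contained
```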

Joe




How Do I Audit Source Code?

Learn how to audit source code step by step. This practical guide explains tools, techniques, and best practices for secure code audits, AI generated code review, and compliance ready software.