When you first set up Azure SRE Agent, it’s tempting to give it everything. Connect all your alert sources, route every severity, set up scheduled tasks to poll your channels every 30 seconds. The agent can handle all of it.
But a few simple configuration choices can help you get more value from every token the agent uses. Each investigation creates a conversation thread, and each thread consumes tokens. With the right setup, you can make sure the agent is spending those tokens on the work that has the highest impact.
The pattern that works best: start focused, see results, and expand from there. Here are three ways to do that.
1. Start with the incidents that matter most
It's natural to want full coverage from day one. But in practice, starting narrow and expanding works better. When you route only high-severity or high-impact incidents to the agent first, you get to see the quality of its investigations on the work that matters most. Once you trust the output, expanding to broader coverage is a confident decision, not a leap of faith.
The mechanism for this is your **incident response plan**. Instead of relying on a default handler that routes everything, create a targeted response plan with filters that match the incidents you want the agent to investigate.
Incident response plan filters: severity, title keywords, and exclusions.
Getting started:
- Go to Response plan configuration and create a new incident response plan.
- Set the Severity filter. A good starting point is Sev0 through Sev2. These are the incidents where deep investigation has the highest impact.
- Use Title contains to focus on specific incident patterns, or Title does not contain to exclude known noisy alerts.
- Preview the filter results to see which past incidents would have matched.
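To make the filter logic concrete, here is a minimal Python sketch of how severity and title filters might combine, and what a preview over past incidents could return. The `Incident` and `ResponsePlanFilter` classes, their field names, and the case-insensitive matching are all assumptions made for illustration; they are not the product's schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    title: str
    severity: int  # 0 = Sev0 (most severe), 3 = Sev3


@dataclass
class ResponsePlanFilter:
    # Hypothetical model of the filters described above, not the real schema.
    max_severity: int = 2  # match Sev0 through Sev2
    title_contains: list[str] = field(default_factory=list)
    title_does_not_contain: list[str] = field(default_factory=list)

    def matches(self, incident: Incident) -> bool:
        # Case-insensitive matching is an assumption of this sketch.
        title = incident.title.lower()
        if incident.severity > self.max_severity:
            return False
        # An empty "contains" list means "no keyword restriction".
        if self.title_contains and not any(
            keyword.lower() in title for keyword in self.title_contains
        ):
            return False
        # Any exclusion keyword in the title rejects the incident.
        if any(keyword.lower() in title for keyword in self.title_does_not_contain):
            return False
        return True


def preview(plan: ResponsePlanFilter, past_incidents: list[Incident]) -> list[Incident]:
    """Return the past incidents that the plan would have matched."""
    return [incident for incident in past_incidents if plan.matches(incident)]
```

Widening coverage later is then a matter of raising `max_severity` to 3 or emptying `title_does_not_contain`, which mirrors the incremental expansion described here.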
As you see results and get comfortable, widen the filters. Add Sev3. Remove title exclusions. Bring in more incident sources. The agent will handle the volume, and you'll know what the cost looks like because you've been watching it grow incrementally.
If you already have an agent running with broad filters, it's worth reviewing your response plan. A quick check on your severity and title filters can make sure the agent is spending its time on the incidents you care about.
2. Replace high-frequency polling with smarter patterns
Scheduled tasks are one of the most powerful features of the agent, but they're also where cost can quietly balloon. The reason is simple: a scheduled task runs on a timer whether or not there's anything to find. An incident investigation fires once per incident. A task polling every 2 minutes fires 720 times a day, and most of those runs may find nothing new.
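The arithmetic is easy to check for any interval. The short Python sketch below computes runs per day for a few schedules; the function name and the list of intervals are illustrative choices, not anything the agent exposes.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def runs_per_day(interval_minutes: float) -> int:
    """Number of times a task fires in 24 hours at a fixed interval."""
    return int(MINUTES_PER_DAY // interval_minutes)


for label, interval in [
    ("every 2 minutes", 2),
    ("every 5 minutes", 5),
    ("hourly", 60),
    ("daily", 1440),
]:
    print(f"{label:>16}: {runs_per_day(interval)} runs/day")
```

This prints 720, 288, 24, and 1 runs per day respectively. Each run opens or extends a conversation thread, so the run count is a rough multiplier on token spend.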
High-frequency polling is generally a weak engineering pattern regardless of cost. It wastes compute, creates unnecessary load, and in the case of an AI agent, burns tokens checking for changes that haven't happened. Better patterns exist.
Prefer push over poll. If the source system can send a signal (an alert, a webhook, a ticket), use that to trigger the agent. Push-based workflows fire only when something happens. This is cheaper and faster than polling.
When polling is the right fit, batch it. Instead of checking every 2 minutes, run a thorough check every hour. That is 24 runs a day instead of 720, and each consolidated report is more useful than a string of micro-checks that mostly say "nothing changed." The hourly report shows trends. The 2-minute poll shows snapshots.
Consider HTTP triggers. If you have an external system that knows when work is needed (a deployment pipeline, a CI/CD tool, a monitoring platform), use an HTTP trigger to invoke the agent on demand. The agent only runs when there's actually something to do.
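As a rough illustration, a pipeline step could call the trigger only when a deployment actually fails. The Python sketch below uses only the standard library. The URL, the payload fields, and the function names are hypothetical placeholders, and authentication is omitted; check your agent's trigger configuration for the real endpoint and whatever credentials it requires.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the trigger URL configured for your agent.
TRIGGER_URL = "https://example.invalid/sre-agent/trigger"


def build_trigger_request(url: str, event: dict) -> urllib.request.Request:
    """Build a JSON POST request describing the event (does not send it)."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_agent(url: str, event: dict, timeout: float = 10.0) -> int:
    """Send the event and return the HTTP status code."""
    request = build_trigger_request(url, event)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def on_deployment_finished(status: str, service: str) -> None:
    # Push, don't poll: only failures wake the agent.
    if status == "failed":
        # The payload shape is an assumption made for this sketch.
        notify_agent(TRIGGER_URL, {"source": "deploy-pipeline", "service": service, "status": status})
```

The point of the design is the `if` in `on_deployment_finished`: the external system already knows when work is needed, so the agent never runs just to discover that nothing happened.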
Match frequency to the operational cadence. A Teams channel monitor works fine at 5-minute intervals, because people don't post fast enough to need anything tighter. A health summary runs once a day. A shift-handoff report runs once per shift. Ask: how quickly do I actually need to detect this change? The answer is almost always slower than the timer you first set.
3. Keep threads fresh
Here's a detail that's easy to miss: every time a scheduled task runs, it adds to the same conversation thread. The agent reads the full thread history before responding. So a task that runs hourly adds 24 exchanges a day to the same thread. After a week, the agent is reading through 168 prior exchanges before it even starts on the new work.
The work stays the same. The cost per run keeps climbing. It's the equivalent of reopening a document and reading the entire thing from page one every time you want to add a sentence at the end.
The fix is one setting. When creating or editing a scheduled task, set "Message grouping for updates" to "New chat thread for each run."
That gives the agent a clean context on every execution. No accumulated history, no growing cost. One dropdown, predictable token usage on every run.
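A back-of-the-envelope model shows how large the difference can get. The Python sketch below assumes every exchange is the same size and that the full history is re-read on every run, with no truncation, summarization, or caching. The 2,000-token figure is an invented placeholder, not a measured value, so treat the absolute numbers as illustrative and focus on the ratio.

```python
def tokens_read_shared_thread(runs: int, tokens_per_exchange: int) -> int:
    """Total tokens read when every run appends to one thread.

    Run k reads the k - 1 earlier exchanges plus its own, so the total
    grows quadratically with the number of runs.
    """
    return sum(k * tokens_per_exchange for k in range(1, runs + 1))


def tokens_read_fresh_threads(runs: int, tokens_per_exchange: int) -> int:
    """Total tokens read when each run starts a new thread."""
    return runs * tokens_per_exchange


RUNS_IN_A_WEEK = 24 * 7        # hourly task: 168 runs
TOKENS_PER_EXCHANGE = 2_000    # assumed size of one run's exchange

shared = tokens_read_shared_thread(RUNS_IN_A_WEEK, TOKENS_PER_EXCHANGE)
fresh = tokens_read_fresh_threads(RUNS_IN_A_WEEK, TOKENS_PER_EXCHANGE)
print(f"shared thread: {shared:,} tokens")
print(f"fresh threads: {fresh:,} tokens")
print(f"ratio: {shared / fresh:.1f}x")
```

Under these assumptions, one week of hourly runs reads 28,392,000 tokens in a shared thread versus 336,000 with fresh threads, a ratio of 84.5x. The ratio is (n + 1) / 2 for n runs regardless of the exchange size, so it keeps widening the longer the shared thread lives.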
The pattern
Start small with incident routing, expand as you see results. Replace high-frequency polling with push signals, batching, and HTTP triggers. Keep scheduled task threads fresh with "New chat thread for each run."
The agent is built to handle whatever you throw at it. These patterns just make sure you're getting the most value for what you spend.