Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

How I set up Claude Code in iTerm2 to launch all my AI coding projects in one click

1 Share
Managing multiple Claude Code projects doesn't have to be chaotic. My iTerm2 setup dramatically reduces friction in my daily AI-assisted coding workflows - here's how.
Read the whole story
alvinashcraft
50 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Introducing Maturity Maps: A New Way to Measure AI Adoption

1 Share
From: AIDailyBrief
Duration: 22:27
Views: 493

Maturity Maps present a framework for assessing AI readiness across six dimensions: Use, Data and Infrastructure, Workflow Integration, Agent Deployment, Talent and Culture, and Governance. Benchmarks expose an adoption mirage in marketing and sales, along with widespread governance and monitoring gaps. Customer service reveals high AI adoption paired with oversight shortfalls and human workload strain, while the capability overhang highlights missing data pipelines, workflow integration, and organized agent management.

The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Get it ad free at http://patreon.com/aidailybrief
Learn more about the show https://aidailybrief.ai/


Random.Code() - Managing Properties From Records in C#, Part 8

1 Share
From: Jason Bock
Duration: 44:33
Views: 4

Hopefully I can wrap up this feature today by writing more tests, and confirming everything is done.

https://github.com/JasonBock/Transpire/issues/44

#dotnet #csharp


993: It’s Been A Hell Of Week

1 Share

Scott and Wes break down a chaotic week in dev news — the Claude Code source leak, a nasty Axios npm supply chain hack, and Railway’s private cache exposure — plus how to keep these nightmare scenarios from hitting your own projects.

Show Notes

Sick Picks

Shameless Plugs

Hit us up on Socials!

Syntax: X Instagram Tiktok LinkedIn Threads

Wes: X Instagram Tiktok LinkedIn Threads

Scott: X Instagram Tiktok LinkedIn Threads

Randy: X Instagram YouTube Threads





Download audio: https://traffic.megaphone.fm/FSI8822565885.mp3

Episode 513 - Everything's Broken w/ Mike Peditto

1 Share

If you want to check out all the things torc.dev has going on, head to linktr.ee/taylordesseyn for more information on how to get plugged in!





Download audio: https://anchor.fm/s/ce6260/podcast/play/117905116/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2026-3-3%2Fe6520b20-13ec-96be-b4a3-4a59cddf0109.mp3

Tracking Every Token: Granular Cost and Usage Metrics for Microsoft Foundry Agents

1 Share

As organizations scale their use of AI agents, one question keeps surfacing: how much is each agent actually costing us? Not at the subscription level. Not at the resource group level. Per agent, per model, per request.

This post walks through a solution that answers that question by combining three Azure services (Microsoft AI Foundry, Azure API Management (APIM), and Application Insights) into an observable, metered AI gateway with granular token-level telemetry, including custom date ranges longer than a month for deeper analysis.

The Problem: AI Costs Can Be a Black Box

Foundry’s built-in monitoring and cost views are ultimately powered by telemetry stored in Application Insights, but the out-of-the-box dashboards don’t always provide the exact per-request/per-caller token breakdown or the custom aggregations and joins teams may want for bespoke dashboards (for example, breaking down tokens by APIM subscription, product, tenant, user, route, or agent step). The approach here uses APIM to stamp consistent caller/context metadata (headers/claims), Foundry to generate the agent/model run telemetry, and App Insights as the queryable store, letting you correlate gateway, agent run, and tool/model calls and then build custom KQL-driven dashboards. With the data captured in App Insights, custom KQL queries can answer questions such as:

  • Which agent consumed the most tokens last week?
  • What's the average cost per request for a specific agent?
  • How do prompt tokens vs. completion tokens break down per model?
  • Is one agent disproportionately expensive compared to others?

Why This Solution Was Built

This solution was built to close the observability gap between "we deployed agents" and "we understand what those agents cost." The goals were straightforward:

  1. Per-agent, per-model cost attribution - Know exactly which agent is consuming what, down to the token.
  2. Real-time telemetry, not batch reports - Metrics flow into Application Insights within minutes and are queryable via KQL.
  3. Zero agent modification - The agents themselves don't need to know about telemetry. The tracking happens at the gateway layer.
  4. Extensibility - Any agent hosted in Microsoft Foundry and exposed through APIM can be added with a single function call.

How It Works

The architecture is intentionally simple: three services, one data flow. The notebook serves as a testing and prototyping environment, but the same `call_agent()` and `track_llm_usage()` code can be lifted directly into any production Python application that calls Foundry agents.


Azure API Management acts as the AI Gateway. Every request to a Foundry-hosted agent flows through APIM, which handles routing, rate limiting, authentication, and tracing. APIM adds its own trace headers (`Ocp-Apim-Trace-Location`) so you can correlate gateway-level diagnostics with your application telemetry. After the API request is successfully completed, we can extract the necessary data from response headers.

The notebook is designed for testing and rapid iteration: call an agent, inspect the response, and verify that telemetry lands in App Insights. It uses `httpx` to call agents through APIM, authenticating with `DefaultAzureCredential` and an APIM subscription key. After each response, it extracts the `usage` object (`input_tokens`, `output_tokens`, `total_tokens`) and calculates an estimated cost based on built-in per-model pricing.
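The usage extraction and cost math reduce to a couple of pure helpers. Here is a minimal sketch; the pricing table, helper names, and response shape are illustrative assumptions, not the notebook's actual values:

```python
# Illustrative per-model pricing in USD per 1K tokens -- assumed values,
# not the notebook's built-in table.
PRICING = {
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    "gpt-4.1": {"input": 0.002, "output": 0.008},
}

def extract_usage(response_json: dict) -> dict:
    """Pull the token counts out of a response's `usage` object."""
    usage = response_json.get("usage", {})
    return {
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }

def estimate_cost(model: str, usage: dict) -> float:
    """Estimated cost in USD from the per-model pricing table; 0.0 if unknown."""
    rates = PRICING.get(model)
    if rates is None:
        return 0.0
    return (usage["input_tokens"] / 1000 * rates["input"]
            + usage["output_tokens"] / 1000 * rates["output"])

usage = extract_usage({"usage": {"input_tokens": 1200,
                                 "output_tokens": 300,
                                 "total_tokens": 1500}})
cost = estimate_cost("gpt-4o-mini", usage)
# 1.2 * 0.00015 + 0.3 * 0.0006 = 0.00036 USD for this request
```

In the real notebook the `usage` object comes back from the APIM-fronted agent call; helpers like these then work unchanged.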

Application Insights receives this telemetry via OpenTelemetry. The solution sends data to two tables:

customMetrics - Cumulative counters for prompt tokens, completion tokens, total tokens, and cost in USD. These power dashboards and alerts.

traces - Structured log entries with `custom_dimensions` containing agent name, model, operation ID, token counts, and cost per request, stored as queryable records in Azure Monitor Logs. These power ad-hoc KQL queries.
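At the point of capture, each record is essentially a flat dictionary of custom dimensions. A simplified, stdlib-only stand-in for that shaping step (the field names match the `custom_dimensions` keys the KQL queries parse; the function names are illustrative, not the repo's exact API):

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm.telemetry")

def build_usage_record(agent_name: str, model: str, operation_id: str,
                       prompt_tokens: int, completion_tokens: int,
                       cost_usd: float) -> dict:
    """Shape one request's telemetry into a flat custom-dimensions payload."""
    return {
        "agent_name": agent_name,
        "model": model,
        "operation_id": operation_id,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
        "cost_usd": cost_usd,
    }

def track_llm_usage(record: dict) -> None:
    # In the real pipeline this log line is picked up by the Azure Monitor
    # OpenTelemetry exporter and lands in the App Insights `traces` table.
    logger.info("llm.usage", extra={"custom_dimensions": json.dumps(record)})

record = build_usage_record("FinanceAgent", "gpt-4o-mini", "op-123",
                            prompt_tokens=1200, completion_tokens=300,
                            cost_usd=0.00036)
track_llm_usage(record)
```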

Demonstrating Granular Cost and Usage Metrics

This is where the solution shines. Once telemetry is flowing, you can answer detailed questions with simple KQL queries.

Per-Request Detail

Query the `traces` table to see every individual agent call with full token and cost breakdown:

traces
| where message == "llm.usage"
| extend cd = parse_json(replace_string(tostring(customDimensions["custom_dimensions"]), "'", "\""))
| extend agent_name = tostring(cd["agent_name"]),
         model = tostring(cd["model"]),
         prompt_tokens = toint(cd["prompt_tokens"]),
         completion_tokens = toint(cd["completion_tokens"]),
         total_tokens = toint(cd["total_tokens"]),
         cost_usd = todouble(cd["cost_usd"])
| project timestamp, agent_name, model, prompt_tokens, completion_tokens, total_tokens, cost_usd
| order by timestamp desc

This gives you a line-item audit trail: every request, every agent, every token.

Aggregated Metrics Per Agent

Summarize across all requests to see averages and totals grouped by agent and model:

traces
| where message == "llm.usage"
| extend cd = parse_json(replace_string(tostring(customDimensions["custom_dimensions"]), "'", "\""))
| extend agent_name = tostring(cd["agent_name"]),
         model = tostring(cd["model"]),
         prompt_tokens = toint(cd["prompt_tokens"]),
         completion_tokens = toint(cd["completion_tokens"]),
         total_tokens = toint(cd["total_tokens"]),
         cost_usd = todouble(cd["cost_usd"])
| summarize calls = count(),
            avg_prompt = avg(prompt_tokens),
            avg_completion = avg(completion_tokens),
            avg_total = avg(total_tokens),
            avg_cost = avg(cost_usd),
            total_cost = sum(cost_usd)
  by agent_name, model
| order by total_cost desc


Now you can see at a glance:

  1. Which agent is the most expensive across all calls
  2. Average token consumption per request - useful for prompt optimization
  3. Prompt-to-completion ratio - a high ratio may indicate verbose system prompts that could be trimmed
  4. Cost trends by model - is GPT-4.1 worth the premium over GPT-4o-mini for a particular agent?

The same can be done in code with your custom solution:

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

KQL = r"""
traces
| where message == "llm.usage"
| extend cd_raw = tostring(customDimensions["custom_dimensions"])
| extend cd = parse_json(replace_string(cd_raw, "'", "\""))
| extend agent_name = tostring(cd["agent_name"]),
         model = tostring(cd["model"]),
         operation_id = tostring(cd["operation_id"]),
         prompt_tokens = toint(cd["prompt_tokens"]),
         completion_tokens = toint(cd["completion_tokens"]),
         total_tokens = toint(cd["total_tokens"]),
         cost_usd = todouble(cd["cost_usd"])
| project timestamp, agent_name, model, operation_id, prompt_tokens, completion_tokens, total_tokens, cost_usd
| order by timestamp desc
"""

def query_logs():
    credential = DefaultAzureCredential()
    client = LogsQueryClient(credential)
    resp = client.query_resource(
        resource_id=APP_INSIGHTS_RESOURCE_ID,  # defined in config cell
        query=KQL,
        timespan=None,  # No time filter - returns all available data (up to 90-day retention)
    )
    if resp.status != "Success":
        raise RuntimeError(f"Query failed: {resp.status} - {getattr(resp, 'error', None)}")
    table = resp.tables[0]
    rows = [dict(zip(table.columns, r)) for r in table.rows]
    return rows

if __name__ == "__main__":
    rows = query_logs()
    if not rows:
        print("No telemetry found. Wait 2-5 min after running the agent cell and try again.")
    else:
        print(f"Found {len(rows)} records\n")
        print(f"{'Timestamp':<28} {'Agent':<16} {'Model':<12} {'Op ID':<12} "
              f"{'Prompt':>8} {'Completion':>11} {'Total':>8} {'Cost ($)':>10}")
        print("-" * 110)
        for r in rows[:20]:
            ts = str(r.get("timestamp", ""))[:19]
            print(f"{ts:<28} {r.get('agent_name',''):<16} {r.get('model',''):<12} "
                  f"{r.get('operation_id',''):<12} {r.get('prompt_tokens',0):>8} "
                  f"{r.get('completion_tokens',0):>11} {r.get('total_tokens',0):>8} "
                  f"{r.get('cost_usd',0):>10.6f}")
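Once rows are back from `query_logs()`, derived checks like the prompt-to-completion ratio above are plain arithmetic. A sketch with made-up numbers (the row shape mirrors the KQL summarize output):

```python
def prompt_completion_ratio(avg_prompt: float, avg_completion: float) -> float:
    """Ratio of prompt tokens to completion tokens; high values suggest
    verbose system prompts relative to the answers they produce."""
    if avg_completion == 0:
        return float("inf")
    return avg_prompt / avg_completion

# Hypothetical aggregated rows, shaped like the summarize query's output.
rows = [
    {"agent_name": "FinanceAgent", "avg_prompt": 2400.0, "avg_completion": 300.0},
    {"agent_name": "SupportAgent", "avg_prompt": 500.0, "avg_completion": 400.0},
]

# Flag agents whose prompts dwarf their completions (threshold is arbitrary).
verbose = [r["agent_name"] for r in rows
           if prompt_completion_ratio(r["avg_prompt"], r["avg_completion"]) > 5]
# FinanceAgent has a ratio of 8 and gets flagged for prompt trimming.
```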


What You Can Build on Top

Azure Workbooks - Build interactive dashboards showing cost trends over time, agent comparison charts, and token distribution heatmaps.

Alerts - Trigger notifications when a single agent exceeds a cost threshold or when token consumption spikes unexpectedly.

Azure Dashboard pinning - Pin KQL query results directly to a shared Azure Dashboard for team visibility.

Power BI integration - Export telemetry data for executive-level cost reporting across all AI agents.
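The alerting idea above can be prototyped client-side before wiring up Azure Monitor alert rules: a threshold check over the queried cost rows. A sketch with an assumed per-agent budget:

```python
COST_THRESHOLD_USD = 5.0  # assumed per-agent budget, not a value from the repo

def agents_over_budget(rows: list[dict], threshold: float = COST_THRESHOLD_USD) -> list[str]:
    """Return agent names whose summed cost_usd exceeds the threshold."""
    totals: dict[str, float] = {}
    for r in rows:
        totals[r["agent_name"]] = totals.get(r["agent_name"], 0.0) + r["cost_usd"]
    return sorted(name for name, total in totals.items() if total > threshold)

# Hypothetical per-request rows, shaped like the traces query output.
rows = [
    {"agent_name": "FinanceAgent", "cost_usd": 3.2},
    {"agent_name": "FinanceAgent", "cost_usd": 2.5},
    {"agent_name": "SupportAgent", "cost_usd": 0.8},
]
flagged = agents_over_budget(rows)
# FinanceAgent totals 5.7 USD and exceeds the 5.0 threshold.
```

In production the same condition would live in an Azure Monitor alert rule over the KQL query rather than in application code.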

Extensibility: Add Any Agent in One Line

The solution is designed to scale with your agent portfolio. Any agent hosted in Microsoft Foundry and exposed through APIM can be integrated without modifying the telemetry pipeline. Adding a new agent is a single function call:

response = call_agent("YourNewAgent", "Your prompt here")

Token tracking, cost estimation, and telemetry export happen automatically. No additional configuration, no new infrastructure.

From Notebook to Production

The notebook is a testing harness, a fast way to validate agent connectivity, inspect raw responses, and confirm that telemetry arrives in App Insights. But the code isn't limited to notebooks.

The core functions `call_agent()`, `track_llm_usage()`, and the OpenTelemetry configuration are plain Python. They can be dropped directly into any production application that calls Foundry agents through APIM:


FastAPI / Flask web service - Wrap `call_agent()` in an endpoint and get per-request cost tracking out of the box.

Azure Functions - Call agents from a serverless function with the same telemetry pipeline.

Background workers or batch pipelines - Process multiple agent calls and aggregate cost data across runs.

CLI tools or scheduled jobs - Run agent evaluations on a schedule with automatic cost logging.


The pattern stays the same regardless of where the code runs:

# 1. Configure OpenTelemetry + App Insights (once at startup)
configure_azure_monitor(connection_string=APP_INSIGHTS_CONN)

# 2. Call any agent through APIM
response = call_agent("FinanceAgent", "Summarize Q4 earnings")

# 3. Token usage and cost are tracked automatically
#    → customMetrics and traces tables in App Insights

Start with the notebook to prove the pattern works. Then move the same code into your production codebase; the telemetry travels with it.

Key Takeaways

  • AI cost observability matters. As agent counts grow, per-agent cost attribution becomes essential for budgeting and optimization.
  • APIM as an AI Gateway gives you routing, rate limiting, and tracing in one place without touching agent code.
  • OpenTelemetry + Application Insights provides a battle-tested telemetry pipeline that scales from a single notebook to production workloads.
  • KQL makes the data actionable. Per-request audits, per-agent summaries, and cost trending are all a query away.
  • The solution is additive, not invasive. Agents don't need modification. The telemetry layer wraps around them.
  • This approach gives developers the ability to view metrics per user, API key, agent, request or tool call, or business dimensions (cost center, app, environment).

If you're running AI agents in Microsoft Foundry and want to understand what they cost at a granular level, this pattern gives you the visibility to make informed decisions about model selection, prompt design, and budget allocation.

The full solution is available on GitHub: https://github.com/ccoellomsft/foundry-agents-apim-appinsights

