Maturity Maps present a framework for assessing AI readiness across six dimensions: Use, Data and Infrastructure, Workflow Integration, Agent Deployment, Talent and Culture, and Governance. Benchmarks expose an adoption mirage in marketing and sales and widespread governance and monitoring gaps. Customer service reveals high AI adoption paired with oversight shortfalls and human workload strain, while the capability overhang highlights missing data pipelines, workflow integration, and organized agent management.
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Get it ad free at http://patreon.com/aidailybrief
Learn more about the show https://aidailybrief.ai/
Hopefully I can wrap up this feature today by writing more tests, and confirming everything is done.
https://github.com/JasonBock/Transpire/issues/44
#dotnet #csharp
Scott and Wes break down a chaotic week in dev news — the Claude Code source leak, a nasty Axios npm supply chain hack, and Railway’s private cache exposure — plus how to keep these nightmare scenarios from hitting your own projects.
Syntax: X Instagram Tiktok LinkedIn Threads
Wes: X Instagram Tiktok LinkedIn Threads
Scott: X Instagram Tiktok LinkedIn Threads
Randy: X Instagram YouTube Threads
if you want to check out all the things torc.dev has going on head to linktr.ee/taylordesseyn for more information on how to get plugged in!
As organizations scale their use of AI agents, one question keeps surfacing: how much is each agent actually costing us? Not at the subscription level. Not at the resource group level. Per agent, per model, per request.
This post walks through a solution that answers that question by combining three Azure services Microsoft AI Foundry, Azure API Management (APIM), and Application Insights into an observable, metered AI gateway with granular token-level telemetry including custom dates greater than a month for deeper analysis.
Foundry’s built-in monitoring and cost views are ultimately powered by telemetry stored in Application Insights, and the out-of-the-box dashboards don’t always provide the exact per-request/per-caller token breakdown or the custom aggregations/joins teams may want for bespoke dashboards (for example, breaking down tokens by APIM subscription, product, tenant, user, route, or agent step). Using APIM to stamp consistent caller/context metadata (headers/claims), Foundry to generate the agent/model run telemetry, and App Insights as the queryable store to let you correlate gateway, agent run, tool/model calls and then build custom KQL-driven dashboards. With data captured in App Insights and custom KQL queries, questions such as below can be answered:
This solution was built to close the observability gap between "we deployed agents" and "we understand what those agents cost." The goals were straightforward:
The architecture is intentionally simple three services, one data flow. The notebook serves as a testing and prototyping environment, but the same `call_agent()` and `track_llm_usage()` code can be lifted directly into any production Python application that calls Foundry agents.
Azure API Management acts as the AI Gateway. Every request to a Foundry-hosted agent flows through APIM, which handles routing, rate limiting, authentication, and tracing. APIM adds its own trace headers (`Ocp-Apim-Trace-Location`) so you can correlate gateway-level diagnostics with your application telemetry. After the API request is successfully completed, we can extract the necessary data from response headers.
The notebook is designed for testing and rapid iteration call an agent, inspect the response, verify that telemetry lands in App Insights. It uses `httpx` to call agents through APIM, authenticating with `DefaultAzureCredential` and an APIM subscription key. After each response, it extracts the `usage` object `input_tokens`, `output_tokens`, `total_tokens` — and calculates an estimated cost based on built-in per-model pricing.
Application Insights receives this telemetry via OpenTelemetry. The solution sends data to two tables:
customMetrics - Cumulative counters for prompt tokens, completion tokens, total tokens, and cost in USD. These power dashboards and alerts.
traces - Structured log entries with `custom_dimensions` containing agent name, model, operation ID, token counts, and cost per request. These power ad-hoc KQL queries.
traces - stores your application’s trace/log messages (plus custom properties/measurements) as queryable records in Azure Monitor Logs.
This is where the solution shines. Once telemetry is flowing, you can answer detailed questions with simple KQL queries.
Per-Request Detail
Query the `traces` table to see every individual agent call with full token and cost breakdown:
traces | where message == "llm.usage" | extend cd = parse_json(replace_string( tostring(customDimensions["custom_dimensions"]), "'", "\"")) | extend agent_name = tostring(cd["agent_name"]), model = tostring(cd["model"]), prompt_tokens = toint(cd["prompt_tokens"]), completion_tokens = toint(cd["completion_tokens"]), total_tokens = toint(cd["total_tokens"]), cost_usd = todouble(cd["cost_usd"]) | project timestamp, agent_name, model, prompt_tokens, completion_tokens, total_tokens, cost_usd | order by timestamp descThis gives you a line-item audit trail every request, every agent, every token.
Aggregated Metrics Per Agent
Summarize across all requests to see averages and totals grouped by agent and model:
traces | where message == "llm.usage" | extend cd = parse_json(replace_string( tostring(customDimensions["custom_dimensions"]), "'", "\"")) | extend agent_name = tostring(cd["agent_name"]), model = tostring(cd["model"]), prompt_tokens = toint(cd["prompt_tokens"]), completion_tokens = toint(cd["completion_tokens"]), total_tokens = toint(cd["total_tokens"]), cost_usd = todouble(cd["cost_usd"]) | summarize calls = count(), avg_prompt = avg(prompt_tokens), avg_completion = avg(completion_tokens), avg_total = avg(total_tokens), avg_cost = avg(cost_usd), total_cost = sum(cost_usd) by agent_name, model | order by total_cost desc
Now you can see at a glance:
The same can be done in code with your custom solution:
from datetime import timedelta from azure.identity import DefaultAzureCredential from azure.monitor.query import LogsQueryClient KQL = r""" traces | where message == "llm.usage" | extend cd_raw = tostring(customDimensions["custom_dimensions"]) | extend cd = parse_json(replace_string(cd_raw, "'", "\"")) | extend agent_name = tostring(cd["agent_name"]), model = tostring(cd["model"]), operation_id = tostring(cd["operation_id"]), prompt_tokens = toint(cd["prompt_tokens"]), completion_tokens = toint(cd["completion_tokens"]), total_tokens = toint(cd["total_tokens"]), cost_usd = todouble(cd["cost_usd"]) | project timestamp, agent_name, model, operation_id, prompt_tokens, completion_tokens, total_tokens, cost_usd | order by timestamp desc """ def query_logs(): credential = DefaultAzureCredential() client = LogsQueryClient(credential) resp = client.query_resource( resource_id=APP_INSIGHTS_RESOURCE_ID, # defined in config cell query=KQL, timespan=None, # No time filter — returns all available data (up to 90-day retention) ) if resp.status != "Success": raise RuntimeError(f"Query failed: {resp.status} - {getattr(resp, 'error', None)}") table = resp.tables[0] rows = [dict(zip(table.columns, r)) for r in table.rows] return rows if __name__ == "__main__": rows = query_logs() if not rows: print("No telemetry found. Wait 2-5 min after running the agent cell and try again.") else: print(f"Found {len(rows)} records\n") print(f"{'Timestamp':<28} {'Agent':<16} {'Model':<12} {'Op ID':<12} " f"{'Prompt':>8} {'Completion':>11} {'Total':>8} {'Cost ($)':>10}") print("-" * 110) for r in rows[:20]: ts = str(r.get("timestamp", ""))[:19] print(f"{ts:<28} {r.get('agent_name',''):<16} {r.get('model',''):<12} " f"{r.get('operation_id',''):<12} {r.get('prompt_tokens',0):>8} " f"{r.get('completion_tokens',0):>11} {r.get('total_tokens',0):>8} " f"{r.get('cost_usd',0):>10.6f}")
Azure Workbooks - Build interactive dashboards showing cost trends over time, agent comparison charts, and token distribution heatmaps.
Alerts - Trigger notifications when a single agent exceeds a cost threshold or when token consumption spikes unexpectedly.
Azure Dashboard pinning - Pin KQL query results directly to a shared Azure Dashboard for team visibility.
Power BI integration - Export telemetry data for executive-level cost reporting across all AI agents.
The solution is designed to scale with your agent portfolio. Any agent hosted in Microsoft Foundry and exposed through APIM can be integrated without modifying the telemetry pipeline. Adding a new agent is a single function call:
response = call_agent("YourNewAgent", "Your prompt here")Token tracking, cost estimation, and telemetry export happen automatically. No additional configuration, no new infrastructure.
The notebook is a testing harness, a fast way to validate agent connectivity, inspect raw responses, and confirm that telemetry arrives in App Insights. But the code isn't limited to notebooks.
The core functions `call_agent()`, `track_llm_usage()`, and the OpenTelemetry configuration are plain Python. They can be dropped directly into any production application that calls Foundry agents through APIM:
FastAPI / Flask web service - Wrap `call_agent()` in an endpoint and get per-request cost tracking out of the box.
Azure Functions - Call agents from a serverless function with the same telemetry pipeline.
Background workers or batch pipelines - Process multiple agent calls and aggregate cost data across runs.
CLI tools or scheduled jobs - Run agent evaluations on a schedule with automatic cost logging.
The pattern stays the same regardless of where the code runs:
# 1. Configure OpenTelemetry + App Insights (once at startup) configure_azure_monitor(connection_string=APP_INSIGHTS_CONN) # 2. Call any agent through APIM response = call_agent("FinanceAgent", "Summarize Q4 earnings") # 3. Token usage and cost are tracked automatically # → customMetrics and traces tables in App InsightsStart with the notebook to prove the pattern works. Then move the same code into your production codebase, the telemetry travels with it.
If you're running AI agents in Microsoft Foundry and want to understand what they cost at a granular level this pattern gives you the visibility to make informed decisions about model selection, prompt design, and budget allocation.
The full solution is available on GitHub: https://github.com/ccoellomsft/foundry-agents-apim-appinsights