Adoption of Generative AI (GenAI) and agentic AI has accelerated from experimentation into real enterprise deployments. What began with copilots and chat interfaces has quickly evolved into powerful business systems that autonomously interact with sensitive data, call external APIs, connect to consequential tools, initiate workflows, and collaborate with other agents across enterprise environments. As these AI systems become core infrastructure, establishing clear, continuous visibility into how these systems behave in production can help teams detect risk, validate policy adherence, and maintain operational control.
Observability is one of the foundational security and governance requirements for AI systems operating in production. Yet many organizations underestimate its importance or are unsure how to implement it effectively. That gap creates blind spots at precisely the moment when visibility matters most.
In February, Microsoft Corporate Vice President and Deputy Chief Information Security Officer Yonatan Zunger blogged about expanding Microsoft’s Secure Development Lifecycle (SDL) to address AI-specific security concerns. Today, we continue the discussion with a deep dive into observability as a necessity for the secure development of GenAI and agentic AI systems.
For additional context, read the Secure Agentic AI for Your Frontier Transformation blog that covers how to manage agent sprawl, strengthen identity controls, and improve governance across your tenant.
In traditional software, client apps make structured API calls and backend services execute predefined logic. Because code paths follow deterministic flows, traditional observability tools can surface straightforward metrics like latency, errors, and throughput to track software performance in production.
GenAI and agentic AI systems complicate this model. They are probabilistic by design and make complex decisions about what to do next as they run, which makes it much harder to rely on a predictable, finite set of success and failure modes. We need to evolve the types of signals and telemetry collected so that we can accurately understand and govern what is happening in an AI system.
Consider this scenario: an email agent asks a research agent to look up something on the web. The research agent fetches a page containing hidden instructions and passes the poisoned content back to the email agent as trusted input. The email agent, now operating under attacker influence, forwards sensitive documents to unauthorized recipients, resulting in data exfiltration.
In this example, traditional health metrics stay green: no failures, no errors, no alerts. The system is working exactly as designed… except a boundary between untrusted external content and trusted agent context has been compromised.
This illustrates how AI systems require a unique approach to observability. Without insights into how context was assembled at each step—what was retrieved, how it impacted model behavior, and where it propagated across agents—there is no way to detect the compromise or reconstruct what occurred.
Traditional monitoring, built around uptime, latency, and error rates, can miss the root cause here and provide limited signal for attribution or reconstruction in AI-related scenarios. This is an example of one of the new categories of risk that the SDL must now account for, and it is why Microsoft has incorporated enhanced AI observability practices within our secure development practices.
Observability of AI systems means the ability to monitor, understand, and troubleshoot what an AI system is doing, end-to-end, from development and evaluation to deployment and operation. Traditional services treat inputs as bounded and schema-defined. In AI systems, input is assembled context. This includes natural language instructions plus whatever the system pulls in and acts on, such as system and developer instructions, conversation history, outputs returned from tools, and retrieved content (web pages, emails, documents, tickets).
For AI observability, context is key: capture which input components were assembled for each run, including source provenance and trust classification, along with the resulting system outputs.
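As a rough sketch of what that capture could look like, the snippet below records the assembled context components, their sources, and trust labels as OpenTelemetry span attributes. It assumes Python with the OpenTelemetry SDK; the attribute names, trust labels, and the call_model helper are illustrative placeholders, not an established schema.

# Sketch: record how the model's context was assembled for one agent step.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("ai.observability.sketch")

def call_model(user_message, retrieved_docs):  # stand-in for the real model call
    return "stub response"

def run_agent_step(user_message, retrieved_docs):
    # One span per step; attributes capture what went into the context and where it came from.
    with tracer.start_as_current_span("agent.step") as span:
        span.set_attribute("context.components", ["system_prompt", "user_message", "retrieved_docs"])
        span.set_attribute("context.retrieved.sources", [d["source"] for d in retrieved_docs])
        span.set_attribute("context.retrieved.trust", [d["trust"] for d in retrieved_docs])
        response = call_model(user_message, retrieved_docs)
        span.set_attribute("response.length", len(response))
        return response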
Traditional observability is often optimized for request-level correlation, where a single request maps cleanly to a single outcome, with correlation captured inside one trace. In AI systems, dangerous failures can unfold across many turns. Each step looks harmless until the conversation escalates into disallowed output, as we’ve seen in multi-turn jailbreaks like Crescendo.
For AI observability, best practices call for propagating a stable conversation identifier across turns, preserving trace context end-to-end, so outcomes can be understood within the full conversational narrative rather than in isolation. This is “agent lifecycle-level correlation,” where the span of correlation should be the same as the span of persistent memory or state within the system.
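Under the same assumption of an OpenTelemetry-based pipeline, one way to sketch that propagation is to carry the conversation identifier in baggage and stamp it onto every span, so all turns and downstream tool calls can be joined during an investigation; the attribute name and the call_model stub are placeholders.

# Sketch: propagate a stable conversation id across turns via OpenTelemetry baggage.
from opentelemetry import baggage, context, trace

tracer = trace.get_tracer("ai.observability.sketch")

def call_model(user_message):  # stand-in for the real model call
    return "stub response"

def handle_turn(conversation_id, user_message):
    # Attach the conversation id so spans created downstream (tools, sub-agents) inherit it.
    token = context.attach(baggage.set_baggage("conversation.id", conversation_id))
    try:
        with tracer.start_as_current_span("agent.turn") as span:
            span.set_attribute("conversation.id", conversation_id)
            return call_model(user_message)
    finally:
        context.detach(token)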
Traditional observability is built on logs, metrics, and traces. This model works well for conventional software because it’s optimized around deterministic, quantifiable infrastructure and service behavior such as availability, latency, throughput, and discrete errors.
AI systems aren’t deterministic. They evaluate natural language inputs and return probabilistic results that can differ subtly (or significantly) from execution to execution. Logs, metrics, and traces still apply here, but what gets captured within them is different. Observability for AI systems updates traditional observability to capture AI-native signals.
Logs, metrics, and traces indicate what happened in the AI system at runtime.
AI observability also incorporates two new core components: evaluation and governance.
These key components of observability give teams improved oversight of AI systems, helping them ship with greater confidence, troubleshoot faster, and tune quality and cost over time.
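As a minimal illustration of the evaluation component, the sketch below records an evaluation score as an OpenTelemetry metric alongside runtime telemetry; the metric name and the stubbed groundedness evaluator are assumptions for illustration, not a prescribed scheme.

# Sketch: emit an evaluation score as a metric so quality can be tracked over time.
from opentelemetry import metrics

meter = metrics.get_meter("ai.observability.sketch")
groundedness = meter.create_histogram(
    "gen_ai.evaluation.groundedness",  # assumed metric name
    description="Groundedness score per agent response (0 to 1)",
)

def groundedness_evaluator(response, sources):  # stand-in for a real evaluator
    return 1.0 if sources else 0.0

def evaluate_and_record(response, sources, agent_name):
    score = groundedness_evaluator(response, sources)
    groundedness.record(score, {"agent.name": agent_name})
    return score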
The SDL provides a formal mechanism by which technology leaders and product teams can operationalize observability. The following five steps can help teams implement observability in their AI development workflows.
To learn more about how Microsoft can help you manage agent sprawl, strengthen identity controls, and improve governance across your tenant, read the Secure Agentic AI for Your Frontier Transformation blog.
Making enterprise AI systems observable transforms opaque model behavior into actionable security signals, strengthening both proactive risk detection and reactive incident investigation.
When embedded in the SDL, observability becomes an engineering control. Teams define data contracts early, instrument during design and build, and verify before release that observability is sufficient for detection and incident response. Security testing can then validate that key scenarios such as indirect prompt injection or tool-mediated data exfiltration are surfaced by runtime protections and that logs and traces enable end-to-end forensic reconstruction of event paths, impact, and control decisions.
Many organizations already deploy inference-time protections, such as Microsoft Foundry guardrails and controls. Observability complements these protections, enabling fast incident reconstruction, clear impact analysis, and measurable improvement over time. Security teams can then evaluate how systems behave in production and whether controls are working as intended.
Adapting traditional SDL and monitoring practices for non-deterministic systems doesn’t mean reinventing the wheel. In most cases, well-known instrumentation practices can be simply expanded to capture AI-specific signals, establish behavioral baselines, and test for detectability. Standards and platforms such as OpenTelemetry and Azure Monitor can support this shift.
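For teams standardized on Azure, a minimal wiring sketch might look like the following, assuming Python and the azure-monitor-opentelemetry distro; the connection string and span attributes are placeholders.

# Sketch: route OpenTelemetry traces, metrics, and logs to Azure Monitor.
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

# One-time setup; the connection string below is a placeholder.
configure_azure_monitor(
    connection_string="InstrumentationKey=00000000-0000-0000-0000-000000000000"
)

tracer = trace.get_tracer("ai.observability.sketch")
with tracer.start_as_current_span("agent.run") as span:
    span.set_attribute("conversation.id", "conv-123")  # same correlation pattern as above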
AI observability should be a release requirement. If you cannot reconstruct an agent run or detect trust-boundary violations from logs and traces, the system may not be ready for production.
The post Observability for AI Systems: Strengthening visibility for proactive risk detection appeared first on Microsoft Security Blog.

How are CTOs feeling about AI?
According to Andy Skipper, founder of CTO Craft, they’re experiencing fear, uncertainty, and doubt.
And if the technical leaders of companies are feeling that way, what can the rest of us expect? Certainly, we dream of productivity boosts and an AI El Dorado – but that’s not the reality.
That’s why we sat down with Skipper to talk about how CTOs should manage expectations for AI, and how to navigate the hype versus reality.
Many CTOs, Skipper notes, are navigating intense pressure from non-technical stakeholders and investors alike, especially with the massive resources being invested in AI and LLM technologies.
He’s cautious on this point:
AI is not going to reduce costs or increase productivity in the way some non-technical people think just yet. It’s getting there, but it’s not there yet.
At the same time, Skipper points out a surprising upside: AI is giving engineering leaders a chance to reconnect with the code and architecture without writing all the code themselves:
One of the things you have to accept as an engineering leader is that you are going to get further away from the code the more senior you become. AI gives people an opportunity to get back to architecture and development work, even if they aren’t coding themselves.
When Skipper became a CTO for the first time, he quickly realized just how isolating the role could be. There was nowhere for tech leaders to share challenges, get support, or navigate the non-technical side of the job.
That gap inspired him to start CTO Craft, now a community helping senior engineering leaders navigate team dynamics, strategy, and AI.
When I was a CTO for the first time, I didn’t have somebody who I could talk to about the issues I was seeing or compare notes with people who had similar challenges. That’s what CTO Craft is all about – helping people understand where the challenges come from and understand they’re not alone in having those challenges.
As a coach and mentor, Andy works closely with CTOs around the world, helping them deal with issues like burnout, communication with nontechnical stakeholders, and, lately, how to adapt in the AI era.
Many first-time CTOs struggle with burnout, overextending themselves to shield teams from stress, and balancing hands-on coding with high-level responsibilities. He explains:
A lot of the people that I work with directly are suffering from burnout. First-time CTOs commonly miss out on self-preservation. And usually that’s a combination of too much expectation of their own energy levels, their own abilities, backlogs…
And after overextending themselves, first-time CTOs often make another common mistake: chasing the newest technologies. While adopting the latest tools and frameworks can seem exciting, Skipper warns that it’s not always the best choice for fast-moving teams trying to scale.
“Using bleeding-edge tech can slow you down, make systems harder to maintain, and even complicate hiring because the talent pool for newer technologies might be limited,” he explains.
As a coach, Skipper says these are just some of the recurring challenges he sees among engineering leaders, alongside a range of other operational and people-related issues.
For aspiring engineering leaders, Skipper highlights that growing into a successful CTO requires more than technical excellence: commercial understanding, communication, coaching, and vision-setting are just as crucial:
The difference between a good engineering manager and a great CTO is understanding how technology drives business success, while still inspiring and guiding your teams.
But technical and business skills are only part of the picture. Motivation and team management are equally critical. Skipper stresses that not everyone is motivated by the same things, and leaders need to understand individual drivers:
Having a vision in the first place is very important. But when it comes to actually bringing individuals along on the journey, they all need to be worked with differently. You can’t just set it and expect everyone to be motivated.
He also warns against a common mistake among CTOs: trying to shield their teams from the challenges of a pivot or rapid change. While the instinct is understandable, it often backfires and drains the leader’s emotional energy. Instead, transparency and realistic communication are key:
Being transparent, being realistic, measuring your words, not being super negative about everything, but still being realistic, I think all these things are really important.
Skipper believes resilience and peer support are crucial for engineering leaders navigating the complexity of the CTO role. Sharing experiences and learning from others can help leaders realize they’re not alone when facing difficult decisions.
Looking ahead, however, he admits that the pace of technological change makes it hard to predict what the role will look like in the future.
Five years from now, I honestly have no idea what the role of a CTO will look like. The way we build software is already changing rapidly, especially with AI. But the fundamentals – setting a vision, communicating it clearly, and connecting technology with business outcomes – will always remain essential.
For Skipper, that uncertainty makes peer support crucial: it helps leaders adapt, learn, and navigate a fast-changing profession.
Ultimately, he believes the most important skill for CTOs is the ability to keep learning and tackle challenges without going it alone.
*Infobip, the global communications API leader that launched ShiftMag, was an Event Partner at CTO Craft 2026.
The post CTOs Face Pressure to Deliver AI Gains, but Productivity Isn’t There Yet appeared first on ShiftMag.

At CES 2026, NVIDIA CEO Jensen Huang made the case that AI will proliferate when open innovation is activated across every company and every industry. If that’s the future (and the trajectory of DeepSeek, Llama, Mistral, and the broader open-model ecosystem suggests it is), then the infrastructure that runs AI can’t be proprietary either.
Kubernetes has been running AI workloads almost as long as it has existed. It was not originally designed for AI, but AI teams have always found ways to make GPU workloads run, even when the core APIs didn’t understand GPUs beyond a simple integer count. As AI becomes the dominant consumer of compute, the community is closing the gap between “possible” and “first-class.” Here’s where that work stands.
The original device plugin API worked when all you needed was a count of available GPUs. It breaks down when a workload needs a specific partition of a shared GPU, when multiple pods need to share a single device, or when training jobs need high-speed interconnects across nodes.
Dynamic resource allocation (DRA) changes this. Vendors expose structured device information through ResourceSlices, and workloads declare ResourceClaims describing what they need. The scheduler matches claims to devices, reasoning about attributes, sharing policies, and topology. DRA, which reached GA in Kubernetes 1.34, gives us the primitives; the policies to use those primitives effectively are the next frontier.
Distributed training and inference jobs require gang scheduling, where all pods start together or not at all, to prevent resource deadlocks. But placement also depends on understanding the cluster’s physical topology: landing pods on nodes that share a network spine or high-speed interconnect domain can dramatically reduce communication overhead.
The KAI Scheduler, accepted into the CNCF Sandbox, provides DRA-aware gang scheduling, hierarchical queues with fairness policies, and topology-aware placement for large-scale clusters. Topograph discovers the underlying network topology and exposes it, enabling schedulers to make smarter placement decisions across cloud and on-premises environments. The Workload API discussions in the broader community are pushing these scheduling patterns further upstream.
Inference is where production GPU cycles increasingly concentrate, and where Kubernetes’ assumptions break hardest. The horizontal pod autoscaler scales on CPU and memory. LLM inference needs to scale with KV cache utilization, request queue depth, and time-to-first-token. Scaling on the wrong metrics means wasting GPU hours or missing latency targets.
Inference Gateway extends the Gateway API with model-aware routing. The llm-d and Dynamo communities are collaborating on distributed serving with prefix-cache-aware routing and disaggregated prefill/decode, creating entirely new scheduling and autoscaling demands. The building blocks are emerging, but the abstractions that tie them together will likely span both Kubernetes primitives and higher-level control planes.
And the next wave is already arriving. Teams are beginning to orchestrate autonomous AI agents as containerized workloads on Kubernetes, adding yet another class of compute to manage.
The Kubernetes AI Conformance Program, launched at KubeCon North America 2025 with twelve certified vendors, is a start. But the patterns to solve these problems exist across the organizations running AI at scale, even if implementations differ. That knowledge is currently locked inside individual companies. It belongs upstream, in the open, where it can compound.
Open-source AI doesn’t stop at the model weights. The infrastructure needs to be open, too, and the community is ready to build it.
This guest column is being published ahead of KubeCon + CloudNativeCon Europe, the Cloud Native Computing Foundation’s flagship conference, which will bring together adopters and technologists from leading open-source and cloud-native communities in Amsterdam, the Netherlands, from March 23-26, 2026.
The post The AI revolution will be open-sourced appeared first on The New Stack.
You can now run an AI agent on your local machine and send it messages—all from azd, no portal required.
The azure.ai.agents extension adds two commands:
azd ai agent run starts your agent locally with automatic dependency detection and installation.
azd ai agent invoke sends a message to a running agent—whether it’s local or deployed in Azure AI Foundry.

Developing AI agents often means having to switch between your editor, a terminal, and a cloud portal to test changes. Each round trip slows you down. With run and invoke, your inner development loop stays in the terminal. Start your agent, send it a prompt, see the streamed response, and iterate—all without leaving your workflow.
To start your agent locally, use azd ai agent run. The command detects your project type (Python, Node.js, etc.), installs dependencies, and launches the agent process. If you have multiple agents in your project, specify one by name. To send a message to your agent, use azd ai agent invoke. By default, invoke targets the remote Foundry endpoint; add --local to talk to a locally running agent instead.
azd ai agent run # Start the agent locally
azd ai agent run my-agent # Start a specific agent by name
azd ai agent invoke "Summarize this doc" # Send a message to the remote endpoint
azd ai agent invoke "Hello" --local # Send a message to a locally running agent
Session and conversation identifiers persist across invocations, so you can carry on a multi-turn conversation without extra flags.
These commands are available in the azure.ai.agents extension v0.1.14-preview. To upgrade an existing installation:
azd extension upgrade azure.ai.agents
New to azd? Install it, then run azd ai agent init to get started.
Have questions or ideas? File an issue or start a discussion on GitHub. Want to help shape the future of azd? Sign up for user research.
This feature was introduced in PR #7026 and contributed by Travis Angevine (https://github.com/trangevi).
The post Azure Developer CLI (azd): Run and test AI agents locally with azd appeared first on Azure SDK Blog.