Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Announcing Microsoft Web IQ


AI applications are only as good as the information they reason from. Without fresh, high-quality web data, they are less dependable. Today, Microsoft is launching Web IQ: a suite of AI-native grounding APIs built for the agentic era, connecting AI systems and agents to fresh, real-world intelligence from across the web — including web pages, news, images, and videos.

The systems that define the agentic era will be the ones that can retrieve fresh, authoritative evidence quickly, transform it into useful context, and do so within the latency and efficiency budgets that multi-step reasoning demands.

Model capability alone no longer determines whether an AI system is useful. What matters is how effectively the full system connects models to the world, including information created after models were trained, and information too vast to encode in model weights.

Web IQ is a search engine for AI systems. Where Bing was built to help people search the web, Web IQ is built to help AI agents find the right information, turn it into useful evidence, and use it inside reasoning. Unlike other APIs that layer on top of fragile infrastructure, Web IQ is a new kind of search system, one that delivers the right evidence with the speed, quality, and efficiency modern agents require.

It builds on years of learning from Bing, but it required a major ground-up re-architecture to meet the demands of agentic workloads.

Built on Bing, Re-Architected for the Agentic Era

Web IQ starts from the foundation Microsoft has been building for decades: the Bing global index and ecosystem. Grounding quality depends on the breadth, freshness, and trustworthiness of the world representation underneath it – something that is achieved by building on Bing’s expansive reach.

But the agentic era asks fundamentally different questions of the stack. Agents do not issue a single search and stop. They retrieve repeatedly, reason over evidence, adapt to new information, and operate inside tight latency budgets. Those requirements could not be met by tuning a single component. They required re-architecting the system from the ground up, from indexing and retrieval to ranking, passage selection, and orchestration, so that every layer is aligned around the needs of inference-time grounding. That is the core idea behind Web IQ: preserve the strengths of Bing’s foundation while redesigning the grounding stack to serve as the execution fabric for AI agents.

Shows the high-level architecture of Web IQ

At the base of this system is something that predates Web IQ by many years: the Bing global index and ecosystem.

More than a large crawl, it is a continuously refined representation of the web, built over decades through a combination of infrastructure, partnership, and discipline. It reflects millions of decisions about what to include, how to rank it, how to ensure freshness, and how to maintain trust.

That discipline extends to how we participate in the open web itself. Web IQ inherits Bing’s long-standing commitment to the conventions and evolving standards of the internet ecosystem, including honoring robots exclusion protocols, publisher controls, and access preferences that govern how content can be discovered, accessed, and used. We are actively engaging with the broader ecosystem through the IETF and other industry forums to help evolve interoperable standards for the AI era. Our goal is to be a sincere and trusted participant in the open web — one that respects publisher choice and helps sustain a healthy ecosystem for content providers, advertisers, developers, and users alike.

The role of that foundation is often underestimated. Grounding systems cannot exceed the quality of the world they observe. If the index is incomplete, stale, or unreliable, no amount of modeling can compensate. Web IQ begins from the premise that grounding quality is anchored in the quality of the underlying corpus, and that the corpus must be global, fresh, and continuously evolving, and must honor publisher preferences by default.

On top of that foundation sits the model layer, where we made a different kind of decision.

Rather than building a large collection of specialized models, we focused on a small number of models that are world-class and tightly integrated into the system. These models serve distinct but coordinated roles: they analyze content, they represent it in embedding space, and they rank and select it for use inside inference.

One of the central components here is our best-in-class embedding model, which defines how information is projected into a space where semantic similarity becomes computationally tractable. That decision alone has far-reaching consequences; the quality of embeddings determines not just recall during retrieval, but the shape of the candidate space that every downstream component operates on. We have built it to be competitive at the top of public benchmarks, but its role inside Web IQ is more pragmatic: when we search, we search the right neighborhood of the information space.
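To make "searching the right neighborhood" concrete, here is a minimal sketch of embedding-based retrieval. It is not Web IQ's embedding model or index: the three-dimensional vectors, the document ids, and the `top_k` helper are all hypothetical, and a production system would use learned embeddings and an approximate index rather than this exhaustive scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, corpus, k=2):
    """Return the k document ids most similar to the query (exhaustive scan)."""
    scored = sorted(corpus.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical toy embeddings; real ones have hundreds of dimensions.
corpus = {
    "rust-ownership": [0.9, 0.1, 0.0],
    "rust-lifetimes": [0.8, 0.2, 0.1],
    "sourdough-recipe": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
nearest = top_k(query, corpus, k=2)
```

Everything downstream only ever sees the candidates this step returns, which is why embedding quality shapes the whole pipeline.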

The embedding model is one part of the system. Alongside it are models that are optimized for content understanding and ranking, trained not for isolated metrics but for how their outputs are used inside LLM-driven reasoning. That alignment, between model objectives and system objectives, is what allows the stack to behave coherently under load.

Beneath the model layer, the problem changes character.

Grounding is no longer about semantics alone. It becomes a distributed systems problem at scale, and this is where a great deal of our earlier work becomes relevant, particularly with systems like DiskANN. DiskANN changed the practical limits of nearest neighbor search by making it possible to operate over large, disk-resident vector spaces without sacrificing latency, removing the need to trade recall for memory footprint.

In Web IQ, this work is extended into a broader retrieval fabric. Retrieval is executed across distributed partitions, routed globally, and tightly optimized to meet latency constraints. Networking, data placement, and execution paths are all part of the design space – and they matter, because grounding is not a one-time operation. It is executed repeatedly within agentic workflows, and at that scale even small inefficiencies compound.
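The fan-out pattern described above can be sketched in a few lines. This is an illustration of scatter-gather retrieval in general, not Web IQ's implementation: the `PARTITIONS` data, the precomputed scores, and both function names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each maps a passage id to a precomputed relevance score.
PARTITIONS = [
    {"p1": 0.91, "p2": 0.40},
    {"p3": 0.87, "p4": 0.15},
    {"p5": 0.95, "p6": 0.60},
]

def search_partition(partition, k):
    """Local top-k for one partition, as (score, passage_id) pairs."""
    return heapq.nlargest(k, ((score, pid) for pid, score in partition.items()))

def fan_out_search(partitions, k=3):
    """Query every partition in parallel, then merge local results into a global top-k."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        local_results = pool.map(lambda part: search_partition(part, k), partitions)
        candidates = [hit for hits in local_results for hit in hits]
    return [pid for _, pid in heapq.nlargest(k, candidates)]

top = fan_out_search(PARTITIONS, k=3)
```

Because each partition returns only its local top-k, the merge step handles at most `k` times the number of partitions candidates, which keeps the gather phase cheap even when the corpus is large.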

What happens after retrieval is equally consequential. Web IQ does not just return documents; it returns passages and structured evidence objects. Models do not need documents; they need information, and documents are often a poor proxy for it. By operating at the level of passages, we can concentrate useful signal while eliminating irrelevant context, producing a much higher ratio of information to tokens.

This is why we often summarize the system with a simple principle: fewer tokens in, better answers out, lower cost per call. Cost is only part of it. The deeper value is maintaining precision in reasoning under constrained contexts.
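As a rough illustration of passage-level selection under a token budget, the sketch below splits documents into passages, scores them against the query, and keeps only what fits. It makes several simplifying assumptions that are not from Web IQ: term overlap stands in for a learned ranker, word count stands in for token count, and all function names are hypothetical.

```python
def split_passages(doc, size=12):
    """Split a document into passages of roughly `size` words."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def overlap_score(query, passage):
    """Fraction of query terms that appear in the passage (a crude relevance proxy)."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0

def select_passages(query, docs, token_budget):
    """Greedily pick the highest-scoring passages that fit within the budget.

    Tokens are approximated by word count, an assumption made for illustration.
    """
    passages = [p for d in docs for p in split_passages(d)]
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    chosen, used = [], 0
    for p in ranked:
        cost = len(p.split())
        if overlap_score(query, p) > 0 and used + cost <= token_budget:
            chosen.append(p)
            used += cost
    return chosen, used

# A 24-word document in which only the first 12 words are relevant.
doc = " ".join(["disk", "index", "latency"] + ["x"] * 9 + ["y"] * 12)
chosen, used = select_passages("disk latency", [doc], token_budget=20)
```

Here the whole document would cost 24 units of context, while the selected evidence costs 12 and drops the irrelevant half.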

At the top of the stack sits the orchestration layer, where the system comes together.

Queries are interpreted, retrieval is fanned out, and results are merged, filtered, and transformed into evidence. Modalities are combined, trade-offs are enforced, and the system adapts to the structure of the request. What makes this layer different from traditional systems is that it is not just an outer API layer. It is part of the execution loop of an AI agent. Latency here is not just user-visible but structurally significant: it determines whether a system can afford to take multiple reasoning steps or must compress everything into a single attempt. That constraint shapes everything above it.

Quality, Latency, Token Density, and the Right Operating Points

When grounding becomes part of an agent’s execution loop, the core challenge is no longer retrieval in isolation. The system has to operate at the right point across latency, grounding quality, and token efficiency, because those three factors together determine whether multi-step reasoning is practical in the real world.

Quality is the first dimension. A grounding system has to return evidence that actually satisfies user intent: complete, fresh, authoritative, and useful for downstream reasoning. We measure that with GDSAT, or grounding satisfaction, a metric that, unlike traditional relevance scores, captures whether the grounding truly meets user intent across completeness, freshness, and authority. Across production query sets, Web IQ consistently achieves higher grounding satisfaction than alternative systems in comparable configurations, which matters because it translates directly into greater user trust and stronger downstream outcomes.

Bar chart shows Web IQ grounding satisfaction compared to competitors
Source: 3K global, blind queries sampled from production. Config: 10 results, 10K chars per result (or equivalent).

Speed is also imperative. It determines whether an agent can afford multiple retrieval-and-reasoning steps or must collapse everything into a single attempt. Web IQ is designed for production-scale speed, operating at sub-165ms p95 latency and, in our internal comparisons, nearly 2.5× faster than the next best alternative from the previous cohort of competitors under similar conditions.

Bar Graph shows Web IQ P95 latency compared to competitors 
Measured from VMs hosted in 5 DCs: West US2, North Central US, East US2, North Europe, South Korea. P95 numbers are averaged across DCs for the cohort of competitors. Unique queries were used to avoid cache hits. Config: 10 results, 10K chars per result.
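For readers who want to reproduce this kind of measurement on their own systems, the sketch below computes a p95 per datacenter and averages the results. The nearest-rank percentile definition is an assumption (the source does not say which definition was used), and the sample values are hypothetical, not measured data.

```python
def p95(samples):
    """Nearest-rank 95th percentile: smallest value with at least 95% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n, 1-based
    return ordered[rank - 1]

def mean_p95(samples_by_dc):
    """Average of the per-datacenter p95 values."""
    values = [p95(s) for s in samples_by_dc.values()]
    return sum(values) / len(values)

# Hypothetical latency samples in milliseconds.
samples_by_dc = {
    "dc-a": list(range(101, 121)),  # 101..120
    "dc-b": list(range(131, 151)),  # 131..150
}
average = mean_p95(samples_by_dc)
```

Averaging per-datacenter p95 values is not the same as taking the p95 of the pooled samples, so it is worth stating which one a benchmark reports.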

The third variable is token efficiency, which determines whether the system can scale economically. Every token sent to the model carries both a cost and a latency implication. By operating on passage-level evidence and maintaining high information density per token, Web IQ reduces how much context is required to achieve a given level of quality.

Graph shows Web IQ token efficiency compared to competitors

Source: 3K query set (global, sampled from production). Web results: 10, 15, 20. Chars per result: 3K, 5K, 10K, 20K (or equivalent).
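A back-of-the-envelope calculation shows why the configuration matters. The sketch below assumes roughly four characters per token, a common rule of thumb for English text that varies by tokenizer, and the latency figures passed to `max_steps` are hypothetical.

```python
CHARS_PER_TOKEN = 4  # assumption: rough average for English text; varies by tokenizer

def context_tokens(results, chars_per_result, chars_per_token=CHARS_PER_TOKEN):
    """Approximate tokens consumed by grounding context."""
    return results * chars_per_result // chars_per_token

def max_steps(latency_budget_ms, retrieval_ms, reasoning_ms):
    """How many retrieve-then-reason steps fit in a latency budget."""
    return latency_budget_ms // (retrieval_ms + reasoning_ms)

full = context_tokens(results=10, chars_per_result=10_000)    # about 25,000 tokens
compact = context_tokens(results=10, chars_per_result=3_000)  # about 7,500 tokens
steps = max_steps(latency_budget_ms=5_000, retrieval_ms=165, reasoning_ms=835)
```

Under these assumptions, cutting each result from 10K to 3K characters removes about 17,500 tokens per call, which only helps if quality holds at the smaller size.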

The result is a system designed to move the frontier on all three dimensions at once: lower latency, higher grounding satisfaction, and fewer tokens per call.

Architecture and Design Principles

The architecture of Web IQ follows a few core principles.

  • Grounding is a full-stack systems problem. Trustworthy outputs depend on aligning every layer, from corpus quality and retrieval to ranking and orchestration.
  • Foundation matters. A global and fresh corpus is essential for trustworthy grounding, which is why the Bing foundation remains so important even as the system above it is re-architected.
  • The right evidence unit is the passage or structured evidence object that maximizes information density while minimizing token cost.
  • Real operating points define the system. Latency, quality, and token constraints determine whether agents work reliably in practice, and the architecture is engineered around them.

The agentic web will be built by systems that can reason against the world as it actually is: fresh, contested, and constantly changing. That requires more than a model with a search tool attached. It requires a grounding layer engineered for the speed, quality, and economics of inference-time retrieval, built on a foundation the open web can trust.

That is what Web IQ is built to be, and where we believe the next decade of AI infrastructure is heading.

Find out more information about Web IQ, including how to express interest, here.

Knut Risvik
Distinguished Engineer, Search & AI


Build and run agents at scale with Microsoft Foundry at Build 2026


Developers are already building agents, and the early productivity gains speak for themselves. Thanks to coding agents like GitHub Copilot, standing up a working prototype is the easy part.

The hard part starts after the prototype. The moment an agent leaves your laptop and has to run inside an enterprise workflow, the cracks show. Every tool and data source becomes its own integration, with a different auth flow, protocol, and lifecycle to maintain — and grounding the agent in enterprise knowledge means building a RAG pipeline from scratch. Running the agent in production is its own problem: you need isolation between sessions, durable state, and a runtime that can hold up under real load. And once it’s live, you can’t see what’s happening — traces stop at the agent boundary, evaluations are manual, and there’s no path from “this failed in prod” to “here’s a fixed version.” This is the same inflection point microservices hit a decade ago: a single service is easy; everything around it (discovery, isolation, observability, deployment) is where the real work lives. Agents are there now.

The Microsoft Agent Platform is built for that work — build in GitHub, run in Foundry, and reach users where they already are. At Build 2026, we’re shipping a connected platform in Microsoft Foundry across three layers:

  • Build: Microsoft Agent Framework updates including the agent harness, skills support in Toolboxes in Foundry, procedural memory, and the Voice Live integration — so developers can stay in the IDE and frameworks they already use.
  • Deploy: hosted agents in Foundry Agent Service, long-running agents and routines, publishing to Microsoft Teams and Microsoft 365 Copilot — so any agent can ship into the apps your users already open.
  • Operate: tracing and evaluation for hosted agents and agent optimizer in Foundry Agent Service — a closed loop that turns production failures into ranked, reviewable agent improvements.

Microsoft Agent Factory image

 


Build: framework, tools, memory

Building agents today is no longer about getting a prototype to work — it’s about making the right architectural choices from the start.

Framework: your harness

Production agents shouldn’t force a framework choice up front. Microsoft Foundry treats the agent harness as a flex point, not a lock-in: investments in LangGraph, GitHub Copilot SDK, or Claude Agent SDK carry forward. If you’re starting fresh, Microsoft Agent Framework is our opinionated, open-source agent framework, stable across Python and .NET. It unifies the enterprise foundations of Semantic Kernel with the multi-agent orchestration of AutoGen, so you no longer need to choose between them. The updates in Microsoft Agent Framework include:

  • Agent harness with skills, memory, and middleware (stable release)
  • Integrations with GitHub Copilot SDK and Claude Agent SDK (stable release)
  • Multi-agent orchestration patterns including Magentic-One (stable release)
  • File system tools, memory tools, and the deep research agent (public preview)

“The development and integration of mobile data model within Azure services put us in a privileged way to speed-up network optimization transformation program. Foundry Agent Service and Microsoft Agent Framework enable AI-solutions embedded both within and on top of mobile networks, which are a must in future network development towards 6G”
Jaime Lluch, Head of Mobile Network Technology & Optimization, Telefonica Spain

It all composes locally. Foundry Toolkit for VS Code (GA) is the purpose-built developer experience without ever leaving the editor: create agents from templates or with GitHub Copilot, test and debug runs locally with full trace visualization, inspect agent behavior step by step, connect to Toolboxes, and deploy to Foundry Agent Service directly — all from VS Code.

The image depicts Foundry Toolkit in VS Code, showing various options for file editing, running a review agent, and a dialogue box for a multi-agent workflow process.


Tools: how agents take action

Tools are how agents do things — calling APIs, searching documents, executing code, talking to other agents. Most agents fail at this layer long before they fail at reasoning. Each tool introduces differences in protocol, authentication, and lifecycle management, increasing integration overhead.

Toolboxes in Foundry (public preview) gives your agent a single managed endpoint for every tool type. Configure tools once, point any MCP client at one URL, and let Foundry handle auth, lifecycle, and governance. Skills (preview) are now first-class — versioned in a project-scoped catalog and discoverable as MCP resources by any agent in the project. Tool search (preview) is available in Toolboxes to intelligently select the right tools per task instead of surfacing every tool to the model. Toolbox also connects to Microsoft IQ – including Web IQ, Work IQ (preview), Fabric IQ (preview) with Fabric data agent, Ontology, and semantic models, and Foundry IQ – so agents tap enterprise data without custom plumbing. Beyond Toolboxes, Foundry IQ is now generally available as the dedicated knowledge layer behind Foundry agents, unifying Work IQ, Fabric IQ, Azure SQL, File Search, and MCP sources behind one SLA-backed retrieval endpoint, with a Serverless tier in public preview and Web IQ for sub-200ms live web grounding.
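MCP messages are JSON-RPC 2.0 requests, so "pointing a client at one URL" ultimately means sending payloads like the ones below. This sketch only builds the request bodies and sends nothing. The URL is a placeholder rather than a real Foundry endpoint, the tool name is hypothetical, and a real client would also perform the MCP initialization handshake and authentication first.

```python
import itertools
import json

TOOLBOX_URL = "https://example.invalid/toolbox/mcp"  # hypothetical placeholder URL
_ids = itertools.count(1)

def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request body, the message envelope MCP uses."""
    body = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        body["params"] = params
    return json.dumps(body)

# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name; "search_documents" is a hypothetical tool.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_documents", "arguments": {"query": "refund policy"}},
)
```

In practice an MCP client library would construct and send these messages, so the value of a single managed endpoint is that the same two calls work regardless of which tool sits behind it.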

The image displays a Microsoft Foundry interface, featuring a search bar, various tabs such as Build, Docs, and Agents, and a list of tools with their names, versions, and descriptions.

Explore sample code for Microsoft Agent Framework + Toolbox.


Multimodal: eyes and voice for your agent

Production agents need to read documents and talk in real time, not just call endpoints.

Azure Content Understanding (ACU) is a unified content layer that simplifies how applications parse, classify, and extract information across documents, images, and more — whether identifying structured fields in digital documents, reading handwriting and signatures from images, or extracting key data from low-quality scanned invoices, all while significantly reducing token costs. At Build, ACU adds prebuilt analyzers now available in Microsoft Foundry, making it easier to integrate with other Foundry models and workflows. Developers can take advantage of support for the latest GPT models, along with seamless integrations across Microsoft Agent Framework, LangChain, and Logic Apps to accelerate end-to-end automation. Coming next month, ACU introduces agentic mode in preview, enabling multi-step document workflows with minimal orchestration, alongside synchronous read and layout APIs and an expanded set of prebuilt analyzers designed to reduce token costs by over 80 percent.

“By embedding Azure Content Understanding in DataSnipper, we are turning unstructured documents into structured, actionable data – directly within Excel. Together, we are enabling faster reviews, reliable evidence, and AI you can trust.”
Vidya Peters, CEO, DataSnipper

“By integrating Content Understanding into our solutions, our customers turn complex, unstructured data into actionable insights – faster and more accurately. The result is streamlined workflows, less manual effort, and clear, measurable business value from AI.”
Adam Orentlicher, SVP CTO, Wolters Kluwer

Azure Content Understanding playground in Microsoft Foundry showing a document analysis interface for extracting structured information from unstructured content

Voice Live unifies speech recognition, text-to-speech, turn detection, interruption handling, avatars, and other real-time conversational features into a single API.

For teams building with prompt agents, Voice Live is now generally available as the fastest path to adding real-time voice experiences. Existing agent capabilities, including tool calling, knowledge, memory, guardrails, and enterprise integrations, continue to work seamlessly, now enhanced with low-latency speech interactions. For teams that need full control over their agent runtime and orchestration framework, hosted agents with Voice Live is available in public preview. Developers can build with the frameworks they prefer (Microsoft Agent Framework, LangChain, or a custom orchestration stack), host on Foundry Agent Service, and connect directly to Voice Live for a smooth voice experience.

“Integrated with Foundry Agent Service and Voice Live, the real-time conversational capability enables executives, including the CEO, leadership team, and operational management, to speak naturally and receive immediate, accurate spoken answers grounded in live operational data.”
Ahmed Naeemi, Chief Information Officer, Technology and Digital Services, Gulf Air

The image shows the Foundry Agent port inside the agent builder feature for voice live building a customer support system, featuring various tabs for configuration, chat, and tools, and options for training and managing the AI agent.


Memory: long-term context across sessions

Tools let agents act. Memory makes those actions informed and better over time. Memory in Foundry Agent Service (public preview) now includes three types:

  • Procedural memory (new at Build in public preview) – agents learn how to do the work across runs, not just what was said. Early Tau-bench results show +7–14% absolute success-rate gains at near-baseline cost.
  • User memory — remembers preferences and facts across sessions (e.g., “user is allergic to dairy”)
  • Session memory — maintains context within a conversation thread

With the new procedural memory, a developer using a PR-review agent coaches it once: “Check test coverage first, then flag any new dependencies, then look for breaking API changes.” Weeks later, on a different PR, the agent runs the same three checks in the same order — no re-instruction.
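The distinction between the three memory types can be sketched as a small data structure. This is a toy in-process stand-in for illustration only; it is not the Foundry Agent Service API, and the class, method, and key names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Toy stand-in for the three memory types; not the Foundry API."""
    procedural: dict = field(default_factory=dict)  # task type -> ordered steps
    user: dict = field(default_factory=dict)        # user id -> facts and preferences
    session: dict = field(default_factory=dict)     # thread id -> message history

    def coach(self, task, steps):
        """Record how a kind of task should be done."""
        self.procedural[task] = list(steps)

    def remember_user(self, user_id, fact):
        """Store a fact that should persist across sessions."""
        self.user.setdefault(user_id, []).append(fact)

    def append_message(self, thread_id, message):
        """Keep context within a single conversation thread."""
        self.session.setdefault(thread_id, []).append(message)

    def plan(self, task):
        """Return the learned procedure for a task, or an empty plan if none exists."""
        return list(self.procedural.get(task, []))

memory = AgentMemory()
memory.coach("pr-review", [
    "check test coverage",
    "flag new dependencies",
    "look for breaking API changes",
])
memory.remember_user("user-1", "user is allergic to dairy")
memory.append_message("thread-1", "Please review this PR")
steps = memory.plan("pr-review")  # reused on a later PR without re-instruction
```

The design point is the key each store uses: procedures are keyed by task, facts by user, and history by thread, so each survives a different kind of boundary.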


Deploy: runtime, distribution, interoperability

Your agent works locally. Now it needs a production home — a runtime that isolates untrusted code, protocols that let agents talk to other agents, and a path into the apps your users already open every day.

Runtime: isolated execution for production agents

Hosted agents in Foundry Agent Service (reaching general availability in the next 30 days) is the managed runtime for production agents. Every session runs in its own sandbox, isolating every agent execution with dedicated compute, memory and filesystem. The runtime is framework-agnostic, so agents built with Microsoft Agent Framework, GitHub Copilot SDK, LangGraph, or other SDKs can be deployed without rewrites. Two protocols are supported: the Responses API for OpenAI-compatible stateful interactions, and the Invocations protocol for schema-free, pass-through scenarios where you control the request and response format. Explore sample code with Microsoft Agent Framework.

But production agents don’t just answer chat — they run continuously, hold state, and act on their own. Hosted agents now support long-running autonomous agents like OpenClaw and Hermes with durable state and file system access, and routines (public preview) for operationalizing any agent on a timer or a schedule. Imagine an agent that monitors a GitHub repo overnight, triages new issues by morning, and posts a summary to Teams before standup.

The image shows a black PowerShell terminal window using the Foundry CLI with a green arrow pointing to a command line, indicating a command is ready to be entered.

“Hosted agents in Foundry Agent Service use a framework-agnostic design and flexible invocation to let developers deploy Twilio Agent Connect directly inside its serverless runtime. Fast startup enables latency-sensitive real-time voice use cases, and zero idle cost suits messaging conversations where replies can take hours.”
Ryan Rouleau, Staff Software Engineer, Twilio

“Hosted agents in Foundry Agent Service will provide KPMG with the flexibility, observability, and control required to run agents at scale. This capability will be a foundational component of the global KPMG Workbench platform, enabling developers to build powerful agent-driven solutions for both client engagements and internal use cases.”
Werner Vanzyl, Sr. Director, KPMG AI & Data Labs


Distribution: agents inside the apps users already open

With publishing to Microsoft Teams and Microsoft 365 Copilot (generally available next month), any Foundry agent can be deployed directly into the tools employees already use, with identity, permissions, and policy flowing through automatically.

The image displays a Microsoft Foundry portal, specifically a form for managing the ZavaInventoryAgent, which is an inventory control assistant for ZavaInventory Corporation.

Foundry already supports two ways agents show up in Microsoft 365: assistive agents that act on the user’s behalf inside Copilot or chat, and autonomous agents that act on their own behalf in the background, triggered by events or schedules, with no collaborative surface. At Build, we’re introducing a third: autopilot agents (public preview). These agents act independently, each with its own Entra Agent ID, email address, Microsoft Teams presence, and place in the org chart. They can initiate conversations, work on shared files, follow up on action items, and collaborate with humans over time. Every action is attributable, auditable, and governed via Agent 365 in Microsoft Admin Center. To get started, clone the sample code and customize it for your scenario; the Azure Developer CLI handles provisioning, identity, and admin approval in a single workflow.


Interoperability: cross-framework, cross-org agent collaboration

Agents within your organization now have identity and reach. But enterprise agents also need to connect beyond it — to agents built by partners, vendors, or other teams on entirely different stacks. Foundry has supported outbound A2A (Agent2Agent) — calling remote agents as a tool — since the A2A tool launched. At Build, we’re adding the other direction: incoming A2A (public preview). Developers can now expose any Foundry agent as an A2A endpoint, and other agents discover it through its agent card and invoke it via the open A2A protocol, regardless of framework or cloud.
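Discovery through an agent card can be pictured as reading a small JSON document and checking what the remote agent offers. The card below is hypothetical and its field names are modeled loosely on the A2A agent card; treat them as assumptions and check the A2A specification for the authoritative schema. The `skills_with_tag` helper is also an illustration, not part of any SDK.

```python
import json

# Hypothetical agent card; field names are assumptions modeled on the A2A agent card.
CARD_JSON = """
{
  "name": "inventory-agent",
  "description": "Answers stock-level questions",
  "url": "https://example.invalid/a2a",
  "version": "1.0.0",
  "skills": [
    {"id": "check-stock", "name": "Check stock", "tags": ["inventory"]},
    {"id": "reorder", "name": "Create reorder", "tags": ["inventory", "purchasing"]}
  ]
}
"""

def skills_with_tag(card, tag):
    """Return the ids of skills on the card that advertise the given tag."""
    return [s["id"] for s in card.get("skills", []) if tag in s.get("tags", [])]

card = json.loads(CARD_JSON)
endpoint = card["url"]  # where a caller would send A2A requests
purchasing_skills = skills_with_tag(card, "purchasing")
```

Because the caller depends only on the card and the protocol, it needs no knowledge of which framework or cloud the remote agent runs on.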


Operate: observability and optimization

Most teams lose confidence at the operate layer. Traces stop at the agent boundary. Evaluation is manual. There’s no systematic path from “this agent failed” to “here’s a better version.” Foundry closes that gap with a connected loop: observe what’s happening, evaluate what matters, and let the platform propose the next improvement.

Observability: end-to-end tracing and evaluation

Tracing and evaluation for hosted agents will be generally available later in June 2026. Every model call, tool invocation, sub-agent hop, and handoff flows through one OpenTelemetry pipeline, and evaluations link directly back to the trace that produced them in the Foundry Control Plane. When a regression shows up, you move from the score to the exact production trace that exposed it instead of stitching the story together across separate dashboards. Without production traces and eval signals, there’s nothing to protect, score, or optimize. This is the foundation everything else builds on.
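The link between an evaluation score and the trace that produced it comes down to a shared trace identifier. The sketch below models that idea with plain dataclasses. It deliberately does not use the OpenTelemetry SDK or any Foundry API, and every class, field, and function name here is hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    name: str
    trace_id: str
    parent: Optional[str] = None  # name of the parent span, if any

@dataclass
class Trace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list = field(default_factory=list)

    def span(self, name, parent=None):
        """Record one step (model call, tool invocation, handoff) in this trace."""
        s = Span(name=name, trace_id=self.trace_id, parent=parent)
        self.spans.append(s)
        return s

def find_trace(traces, evaluation):
    """Follow an evaluation record back to the trace that produced it."""
    return next((t for t in traces if t.trace_id == evaluation["trace_id"]), None)

trace = Trace()
root = trace.span("agent_run")
trace.span("model_call", parent=root.name)
trace.span("tool_invocation", parent=root.name)

# An evaluation result carries the id of the trace it scored.
evaluation = {"metric": "task_success", "score": 0.4, "trace_id": trace.trace_id}
culprit = find_trace([Trace(), trace], evaluation)
```

With the identifier stored on the evaluation record, moving from a low score to the exact run that caused it is a lookup rather than a search across dashboards.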

“Hosted agents in Foundry Agent Service provide a production-grade foundation for AI – combining identity, memory, security, and observability by design. This allows us to scale AI systems across critical energy operations with full control and trust.”
Xabier Muruaga, Global Head of AI and Data, Iberdrola


Optimization: a closed loop from traces to better agents

Improving an agent today is a guess-and-check cycle: teams ship, watch users hit failures, try a prompt tweak, push the fix, and hope it sticks. Agent optimizer in Foundry Agent Service (coming to public preview in the next 30 days; sign up here for private preview) replaces that cycle with a governed, evidence-backed loop. It consumes production traces and evaluations from hosted agents, generates ranked candidate improvements across prompts and skills, validates each candidate against your scenarios and constraints, and recommends the winner with full lineage, diffs, audit, and rollback. The evaluation signal it relies on comes from a connected evaluation pipeline:

  • ASSERT generates adversarial tests from your policies and surfaces where the agent fails
  • Agent Control Specification turns those risks into enforceable runtime guardrails across input, model, state, tool execution, and output
  • Rubric (public preview) defines what “good” looks like — generating weighted evaluation criteria (task success, tone, safety, cost, latency) and scoring every run against them

Agent optimizer runs a reflective observe → evaluate → optimize → deploy cycle. Every candidate is evaluated against your rubric and surfaced with side-by-side comparisons — showing exactly what improved, what regressed, and why. Once promoted, new traces feed back into evaluation — a continuous improvement loop where every interaction makes the agent measurably better.
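The scoring and comparison step can be illustrated with a weighted rubric. This is a minimal sketch under stated assumptions, not how Agent optimizer or Rubric work internally: the weights, the scores, the candidate names, and all three functions are hypothetical.

```python
RUBRIC = {  # hypothetical weights; they sum to 1.0
    "task_success": 0.5,
    "safety": 0.2,
    "tone": 0.1,
    "cost": 0.1,
    "latency": 0.1,
}

def weighted_score(scores, rubric=RUBRIC):
    """Weighted average of per-criterion scores, each in [0, 1]."""
    return sum(rubric[c] * scores[c] for c in rubric)

def regressions(baseline, candidate):
    """Criteria where the candidate scores lower than the baseline."""
    return sorted(c for c in baseline if candidate[c] < baseline[c])

def pick_winner(baseline, candidates):
    """Name of the best-scoring candidate that beats the baseline, else None."""
    best_name, best = None, weighted_score(baseline)
    for name, scores in candidates.items():
        score = weighted_score(scores)
        if score > best:
            best_name, best = name, score
    return best_name

baseline = {"task_success": 0.6, "safety": 0.9, "tone": 0.8, "cost": 0.7, "latency": 0.7}
candidates = {
    "prompt-v2": {"task_success": 0.8, "safety": 0.9, "tone": 0.8, "cost": 0.6, "latency": 0.7},
    "skill-v2": {"task_success": 0.7, "safety": 0.9, "tone": 0.7, "cost": 0.7, "latency": 0.7},
}
winner = pick_winner(baseline, candidates)
tradeoffs = regressions(baseline, candidates[winner])
```

Reporting the regressions alongside the winner matters: here the best candidate improves task success but costs more, which is exactly the kind of trade-off a reviewer should see before promoting it.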

The image shows a screen capture of a Microsoft Foundry dashboard, displaying various tabs such as agents, projects, and a search bar. It also includes information about a build, including version number, duration, and various statuses like completed, failed, and running.

“Agent optimizer is a vital step in helping enterprises move AI agents beyond proof of concept and into trusted production use. By bringing together governance, observability, and continuous improvement, it helps organizations reduce hallucinations, enhance safety, and continuously evaluate and optimize agent performance. As these capabilities continue to evolve – including Context Engineering and AgentOps, one of the core technologies behind NTT DATA’s Smart AI Agent® concept – we believe Agent Optimizer will play an important role in enabling business leaders to confidently adopt agentic AI at scale.”
Yuji Shono, Head of the Global AI Office, NTT Data Group Corporation


Get started today

The easiest way to explore is through the Microsoft Foundry portal. From there you can create a project, deploy a model, and build your agent. Follow the documentation and Microsoft Learn courses. Developers can get started in minutes by following the Quickstart, which walks through setting up, testing, and deploying a production-ready hosted agent end to end.

Check out AI Agents for Beginners for a 12-lesson curriculum, then go deeper with guided labs: Develop AI Agents in Azure, Hosted Agents Workshop (.NET), and the ZavaShop Supply Chain Workshop.

📺 Watch: Foundry Agent Service + Microsoft Agent Framework Explained — Jeff Hollan walks through how to operationalize AI agents from deployment to real-world impact.

If you’re attending Microsoft Build 2026, or watching on-demand content later, be sure to check out these sessions:

The post Build and run agents at scale with Microsoft Foundry at Build 2026 appeared first on Microsoft Foundry Blog.


A Developer’s Guide to Managing Models, Cost and Quality in Microsoft Foundry



The hardest part of building AI systems today is no longer getting access to a capable model. It is knowing how to choose, validate, optimize, and operate the right model across the full lifecycle of a real application.

Take a retrieval-augmented generation (RAG)-based customer support copilot or a tool-calling agent that helps employees complete business workflows. In a prototype, it may be enough to pick a strong model, connect a few data sources, and get a useful response. In production, the system needs to retrieve the right context, call the right tools, meet quality and safety thresholds, stay within latency targets, and run at a cost the business can sustain.

Models evolve, costs shift, and production requirements often arrive after the first version is already working. Success depends less on choosing the most powerful model and more on building a disciplined operating approach around the application.

That is where Microsoft Foundry comes in: a unified platform to select, evaluate, optimize, operate, and continuously improve AI applications at production scale.

What’s new

Microsoft Foundry continues to expand the model ecosystem and operating surface for developers building production AI systems.

Fireworks AI on Microsoft Foundry is now generally available, giving developers access to production-grade open model inference through a single Azure endpoint, with enterprise service-level agreements (SLAs) and zero-setup onboarding.

Foundry is also adding new model families and capabilities across modalities, including Microsoft AI models, partner models, open-source models, custom models, and post-trained variants. Together, these updates give developers more choice while keeping selection, evaluation, deployment, and operations in one consistent workflow.

The challenge is no longer access. It is operations.

In a prototype, the questions are simple: Can the model answer the prompt? Can it connect to my data? Can it complete the happy path?

In production, the questions change. Which model fits each task? How do I validate it on my own data? What latency budget does this experience require? How much throughput do I need at peak? What happens when quota is constrained, costs spike, or a newer model becomes available? How do I monitor quality, detect eval drift, roll back safely, and prove the system is governed?

Agentic systems often fail when the model is mismatched, evaluation is incomplete, costs run unchecked, or governance arrives too late. Teams that rely on a single provider face another risk: lock-in, with no escape hatch when a model degrades, pricing changes, or capacity becomes constrained.

Foundry is built on the opposite philosophy. It is a model-agnostic platform spanning Microsoft, open-source, and independent software vendor (ISV) partner models, all on the same operating surface.

The answer is to treat model selection and optimization as a continuous operating discipline: 

Model optimization loop showing how teams select, evaluate, optimize, operate, and improve models over time

1. Select the right model for the task

Model selection is about workload fit, not leaderboard rank. Before choosing a model, define the task contract: what the model needs to do, what good looks like, what constraints it must operate within, and which failure modes are unacceptable.

A routing step may need low latency. A policy question may need grounded reasoning with citations. A coding agent may need deeper reasoning and tool use. A customer-facing copilot may need strong safety boundaries, predictable latency, and cost efficiency at scale.

A simple model selection framework:

Workload need | Favor this approach | Why
Classification, routing, extraction, or high-volume chat | Smaller, lower-latency model | Keeps cost and latency low
Complex reasoning, coding, or planning | Stronger reasoning model | Improves quality for harder tasks
Image, speech, voice, or physical AI | Modality-specific model | Matches the model to the input and output type
Mixed workloads with different complexity | Model Router | Routes each request based on quality, cost, and latency
Domain-specific behavior, tone, or format | Fine-tuned or custom model | Improves consistency for your scenario
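
As a rough illustration, the framework above can be written down as a lookup from workload type to model tier. This is only a sketch: the workload keys and tier labels are hypothetical names chosen for the example, not Foundry model identifiers, and it is not how Model Router works internally.

```python
# Hypothetical workload labels mapped to hypothetical model tiers.
# Neither side of this mapping is a real Foundry identifier.
ROUTING_TABLE = {
    "classification": "small-low-latency",
    "routing": "small-low-latency",
    "extraction": "small-low-latency",
    "high_volume_chat": "small-low-latency",
    "complex_reasoning": "strong-reasoning",
    "coding": "strong-reasoning",
    "planning": "strong-reasoning",
    "image": "modality-specific",
    "speech": "modality-specific",
    "domain_specific": "fine-tuned",
}


def choose_tier(workload: str) -> str:
    """Return a tier for a known workload; hand mixed or unknown work to a router."""
    return ROUTING_TABLE.get(workload, "model-router")


print(choose_tier("coding"))
print(choose_tier("mixed"))
```

Keeping the mapping in data rather than in branching code makes it easy to review and to change as evaluation results come in.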

Effective model choice depends on four dimensions: capability, safety, latency, and cost.

Foundry helps developers make these tradeoffs through a broad model ecosystem and a consistent operating surface. Developers can access Microsoft models, leading base models, partner models like Fireworks AI, open-source models, custom models, and post-trained variants through one selection, evaluation, and deployment workflow.

Developer tip: For developers who want to bypass manual selection, Foundry provides Model Router in Foundry Models. Model Router automatically routes each request to the most appropriate model based on workload characteristics, cost targets, and latency requirements.

2. Validate with your own evals and data

Benchmarks are not enough. A model that leads a public leaderboard may still underperform on your prompts, your data, your users, and your business rules. Production confidence comes from evaluating against the workloads your application will actually run.

With Foundry, developers can bring their own evaluation inputs, including CSV or JSONL datasets with prompts, expected outputs, labels, or ground-truth answers. They can run side-by-side comparisons across models and prompts, evaluate agents and multi-step workflows, and inspect results across datasets, traces, and production-like scenarios.
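
To make the idea of a bring-your-own dataset concrete, here is a minimal local sketch of a side-by-side comparison over JSONL rows. It does not call any Foundry API: the field names `prompt` and `expected`, the exact-match metric, and the two stand-in model functions are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict, List


def load_jsonl(text: str) -> List[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def exact_match_accuracy(model: Callable[[str], str], rows: List[dict]) -> float:
    """Fraction of rows where the model output equals the expected answer (case-insensitive)."""
    if not rows:
        return 0.0
    hits = sum(
        1
        for row in rows
        if model(row["prompt"]).strip().lower() == row["expected"].strip().lower()
    )
    return hits / len(rows)


def compare(models: Dict[str, Callable[[str], str]], rows: List[dict]) -> Dict[str, float]:
    """Score every candidate on the same rows so results are directly comparable."""
    return {name: exact_match_accuracy(fn, rows) for name, fn in models.items()}


# Hypothetical dataset and stand-in models; a real run would call deployed models.
SAMPLE = (
    '{"prompt": "2+2", "expected": "4"}\n'
    '{"prompt": "capital of France", "expected": "Paris"}\n'
)


def model_a(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")


def model_b(prompt: str) -> str:
    return {"2+2": "4"}.get(prompt, "")


scores = compare({"model_a": model_a, "model_b": model_b}, load_jsonl(SAMPLE))
print(scores)
```

Exact match is the simplest possible scorer; tasks with free-form answers would need a different metric, such as a groundedness or relevance evaluator.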

Built-in quality and safety evaluators help measure signals such as relevance, groundedness, coherence, fluency, safety, and policy adherence. Custom evaluators can capture application-specific rules, formats, and business logic.

A strong evaluation covers:

  • Quality: Did the model complete the task correctly?
  • Accuracy and groundedness: Did it produce reliable answers based on the right context?
  • Safety: Did it follow policies and avoid unacceptable responses?
  • Performance: Did it meet latency, throughput, and reliability requirements?
  • Cost: Did it deliver the right outcome at the right price?

Evaluation should run continuously as new model versions, fine-tuned variants, agent changes, or new model families become available.

Developer tip: Define success criteria before opening the model catalog. Criteria-first evaluation prevents anchoring on model reputation instead of workload fit.

3. Optimize cost and performance

Cost is a first-class architectural concern, not an afterthought. In prototypes, it may be acceptable to send every task to the most capable model. In production, that approach breaks down quickly.

A simple classification task, a RAG response, a long-context reasoning workflow, and a multi-step agentic process should not always use the same model or deployment strategy.

Foundry gives developers levers to optimize across quality, cost, and latency at the system level:

  • Intelligent routing: Send each task to the right model based on complexity and budget.
  • Batching: Use asynchronous processing for workloads that do not require real-time responses.
  • Caching: Avoid paying repeatedly for identical or near-identical requests.
  • Provisioned throughput: Use dedicated capacity for predictable performance at scale.
  • Quota management: Scale more predictably with quota tiering, global customer quota, and data zone customer quota.
  • Model optimization: Use model compression, fine-tuning, or distillation where appropriate.
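
The caching lever is the easiest of these to picture in code. The sketch below is an application-side cache, not a Foundry feature: the `PromptCache` class and its normalization rule (lowercase, collapsed whitespace) are assumptions for illustration, and this rule only catches requests that differ in case or spacing.

```python
import hashlib
from typing import Callable, Dict


class PromptCache:
    """Memoize model answers keyed by a hash of the normalized prompt."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Lowercase and collapse whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._call_model(prompt)
        self._store[key] = answer
        return answer


# Stand-in for a billable model call.
cache = PromptCache(lambda prompt: f"answer to: {prompt}")
first = cache.ask("What is  the refund policy?")
second = cache.ask("what is the refund policy?")
print(cache.hits, cache.misses)
```

A production cache would also need an eviction policy and an expiry time so stale answers are not served after the underlying data changes.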

Fireworks AI on Foundry is now generally available, giving developers access to a high-performance open model catalog through a single Azure endpoint, with enterprise SLAs, no separate infrastructure, and no separate contracts.

Developer tip: Profile cost by task type before optimizing globally. Routing decisions are workload-specific, not one-size-fits-all.
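
A per-task cost profile can start as a simple aggregation over request logs. In this sketch the log fields (`task`, `model`, `tokens`) and the per-1,000-token prices are invented for the example; real prices and log schemas will differ.

```python
from collections import defaultdict
from typing import Dict, List


def cost_by_task(records: List[dict], price_per_1k_tokens: Dict[str, float]) -> Dict[str, float]:
    """Sum estimated spend per task type from token counts and per-model prices."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        rate = price_per_1k_tokens[record["model"]]
        totals[record["task"]] += record["tokens"] / 1000 * rate
    return dict(totals)


# Hypothetical prices and log records, for illustration only.
PRICES = {"small": 0.5, "large": 5.0}
LOGS = [
    {"task": "classify", "model": "small", "tokens": 2000},
    {"task": "plan", "model": "large", "tokens": 1000},
    {"task": "classify", "model": "small", "tokens": 4000},
]

profile = cost_by_task(LOGS, PRICES)
print(profile)
```

Once spend is broken out this way, it becomes clear which task types are worth rerouting and which are already cheap enough to leave alone.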

4. Operate at scale with enterprise confidence

Deploying an endpoint is not the same as operating a production AI system. Teams need to understand how the system behaves, enforce policies, monitor usage and cost, test model changes safely, and roll back when quality or performance regresses.

Foundry brings these operating capabilities into one surface: versioning, SLA-backed reliability, security, governance, access controls, audit logging, usage monitoring, and controlled upgrades.

Teams can monitor token usage and throughput, inspect logs and traces, evaluate model and agent behavior, enforce policies, and compare changes before rolling them out broadly. As new model versions become available, they can test against evaluation datasets and traces, validate quality, latency, and cost impact, and reduce risk with versioning and rollback strategies.

The Fireworks AI on Foundry generally available (GA) release is a concrete example of this operating model, with enterprise SLAs, provisioned throughput unit (PTU) Data Zone support, SOC2 readiness, and the same access controls and audit logging that govern Foundry.

Production adopters span AI-native and traditional enterprise workloads, including Perplexity, Motif, UiPath, and StackBlitz. During preview, the platform processed more than 176 billion tokens across 17 S&P 500 enterprises.

Developer tip: Treat model upgrades like dependency upgrades: test against baselines, stage rollouts, monitor regressions, and keep a rollback plan.

5. Continuously improve as models and workloads evolve

AI systems are dynamic. Models improve, workloads shift, user behavior changes, pricing evolves, and new model families arrive. The best system today may not be the best system six months from now.

That is why the lifecycle loop matters:

  1. Select the right model for the task.
  2. Evaluate it against your own data and production baselines.
  3. Optimize for quality, cost, latency, and throughput.
  4. Operate with governance, observability, and reliability.
  5. Improve as new models, tools, and customization options emerge.

For engineering teams, every model, prompt, tool, agent, or workflow change should be treated like a production change. New model versions should be tested automatically against regression datasets, production traces, and known edge cases before rollout.

A model may improve quality but increase latency, reduce cost but weaken groundedness, or perform better on common cases while regressing on high-risk scenarios. Automated evaluations help teams detect those tradeoffs early.

Developer tip: Automate your evaluation pipeline so every new model version is compared against production baselines for quality, safety, latency, throughput, and cost before deployment.
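
One way to automate that comparison is a regression gate that fails a rollout when any metric moves the wrong way by more than a tolerance. The metric names, the 2% tolerance, and the sample numbers below are assumptions for illustration, not Foundry defaults.

```python
from typing import Dict, List

# Metrics where a higher value is better; all others are treated as lower-is-better.
HIGHER_IS_BETTER = {"quality", "safety", "throughput"}


def regressions(
    baseline: Dict[str, float], candidate: Dict[str, float], tolerance: float = 0.02
) -> List[str]:
    """Return the metrics on which the candidate is worse than baseline beyond the tolerance."""
    failed = []
    for name, base in baseline.items():
        new = candidate[name]
        if name in HIGHER_IS_BETTER:
            worse = new < base * (1 - tolerance)
        else:
            worse = new > base * (1 + tolerance)
        if worse:
            failed.append(name)
    return sorted(failed)


# Hypothetical numbers: the candidate is better on quality and cost but slower.
BASELINE = {"quality": 0.90, "safety": 0.99, "latency_ms": 800, "throughput": 50, "cost_usd": 1.00}
CANDIDATE = {"quality": 0.93, "safety": 0.99, "latency_ms": 950, "throughput": 50, "cost_usd": 0.80}

failed = regressions(BASELINE, CANDIDATE)
print(failed)
```

The gate reports which metrics regressed rather than a single pass or fail, so a team can decide whether a quality gain justifies a latency cost.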

What this means for developers

The next phase of AI development will not be won by teams that simply have access to the biggest models. It will be won by teams that know how to operate models well.

That means choosing by workload fit, validating with real data, optimizing cost and performance, deploying with governance, and improving as the landscape shifts.

Microsoft Foundry is designed for exactly this reality: a model-agnostic platform spanning Microsoft, open-source, and ISV models, all on one operating surface. No lock-in. No re-architecture. No guesswork.

The future of AI development is not about guessing which model might work. It is about building an operating discipline that lets you know.

Get started

The post A Developer’s Guide to Managing Models, Cost and Quality in Microsoft Foundry appeared first on Microsoft Foundry Blog.

Running an AI-native engineering org

Foundry IQ: Build smarter agents faster with unified knowledge and serverless retrieval

Developers building agent fleets keep hitting the same pattern: the agent logic is ready, but the knowledge infrastructure underneath is complex to do well. Getting to production means solving for stability, scale, data access, answer quality, security, and content ingestion all at once. Today, we are enabling developers to have faster impact by simplifying the enterprise knowledge platform.

Your company’s IQ, powered by Microsoft IQ, is the collective intelligence locked in documents, emails, meetings, operational data, and the live web. This is where your true competitive edge lives. Foundry IQ grounds agents with the knowledge from these sources and continuously improves based on your business goals.

The announcements today are designed to help customers provision knowledge bases faster, unify enterprise and external sources, and expose that knowledge through the Foundry IQ Model Context Protocol (MCP) server for any agent framework or MCP-compatible host.

What’s new

  • Foundry IQ Serverless in preview: Provision instant, no-friction context retrieval with scale to zero pricing. Developer tier now available in public preview with more coming soon. Docs | Create a Foundry IQ resource
  • New knowledge sources in preview: Ground agents across Work IQ, Fabric IQ (including Data agents and Ontology), File Search, Azure SQL, and MCP through a multi-source knowledge base, with no custom integrations required. Docs | Cookbook
  • Web IQ in Foundry IQ is now available: Extend agent context to the web and to marketplace data, honoring publisher preferences, with sub-165 ms latency and zero data retention. Blog | Website
  • Foundry IQ knowledge bases are generally available: Ship production agents on a fully SLA-backed knowledge layer with stable APIs, compliance certifications, and the Foundry IQ MCP server for any MCP-compatible host. Docs | Quickstart
  • Agentic retrieval quality improvements: The latest updates to the agentic retrieval engine improve answer performance across datasets, effort tiers, and model sizes while spending fewer tokens. Blog | Quickstart
  • Data pipeline updates in preview: Automatic layout-aware ingestion of documents, image enrichment, and broader SharePoint indexing ground agents in complete documents, not just raw text. Blog | Quickstart
  • Security updates in preview: New controls for encryption, permissions sync, and sensitivity-label governance keep enterprise policy intact as content flows into agents. Blog | Quickstart

Foundry IQ Serverless in public preview

We know agent workloads are bursty and event-driven: an agent might execute hundreds of steps in seconds, then go idle for hours. Serverless eliminates infrastructure friction: no clusters to manage, no capacity to reserve, no idle costs. Go from zero to production fast, with instant retrieval-augmented generation (RAG) and state-of-the-art retrieval quality built in.

Foundry IQ Serverless (Developer tier) is available in public preview. You are billed for compute resources and storage used, and the service scales to zero when idle.

Serverless tiers use Compute Units (CU) to measure resource consumption, including CPU utilization, memory and storage I/O. Usage is calculated each minute in increments of 0.25 CUs.

For large-scale serverless deployments, contact us for additional options.

Capability | Developer tier
Compute usage | $0.24 CU / hour
Indexed storage | Up to $0.29 GB / month; GB cost is region dependent
Indexed storage per index | 1 GB / index
Indexes per service | 30 indexes / service
Services per subscription per region | 5 services / subscription / region

Billing is expected to begin in late 2026 with details provided at least 30 days in advance. Customers using Serverless Developer won’t be charged before billing is enabled. Current Compute Unit measurements are estimates only and subject to change before billing is enabled.
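
For a rough sense of how per-minute metering could add up, here is a back-of-the-envelope estimator. It rests on two assumptions that the announcement does not confirm: that the compute price means $0.24 per Compute Unit per hour, and that each minute's usage is rounded up to the next 0.25 CU. Since current measurements are described as estimates, treat the output as illustrative only.

```python
import math
from typing import Iterable

RATE_PER_CU_HOUR = 0.24  # Assumed reading of the Developer tier compute price.
INCREMENT = 0.25         # Usage is calculated each minute in 0.25 CU increments.


def billable_cu(raw_cu: float) -> float:
    """Round one minute of usage up to the next 0.25 CU step (rounding direction is assumed)."""
    return math.ceil(raw_cu / INCREMENT) * INCREMENT


def estimate_compute_cost(per_minute_cu: Iterable[float]) -> float:
    """Convert per-minute CU readings into an estimated dollar cost."""
    cu_minutes = sum(billable_cu(cu) for cu in per_minute_cu)
    return cu_minutes / 60 * RATE_PER_CU_HOUR


# Four sample minutes, including one idle minute that scales to zero.
usage = [0.1, 0.3, 0.0, 1.0]
cost = estimate_compute_cost(usage)
print(f"${cost:.4f}")
```

In this model an idle minute contributes nothing, which matches the scale-to-zero behavior described above.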

Next steps: create a Foundry IQ Serverless resource in the Foundry portal.

New knowledge sources in preview

How do I give an agent access to organizational knowledge and structured business data without building a custom connector for every system?

Bringing enterprise knowledge into agent workflows often means stitching together custom integrations across each data source. Developers must account for different data formats, permission models, retrieval patterns, and source-specific logic before an agent can reliably use that knowledge.

Foundry IQ simplifies this by bringing enterprise content and structured systems into a single knowledge base for multi-source, agentic retrieval. Developers can give agents access to that knowledge without building and maintaining separate connectors or source-specific retrieval strategies.

New knowledge sources in preview:

  • Work IQ brings organizational signals like emails, meetings, files, and Teams messages into one enriched, AI-ready source, all while respecting user permissions. Agents can answer questions about how the organization operates, what decisions were made, and what is top of mind for teams.
  • Fabric IQ lets agents query data agents and company ontologies: formal models of business entities, relationships, and rules linked to live data in OneLake and a specialized semantic layer. This returns structured answers alongside unstructured document context for a query.
  • File Search allows you to directly upload files to a knowledge base.
  • Azure SQL brings structured relational data into a knowledge base.
  • MCP Server connects knowledge served over the Model Context Protocol.

A Microsoft Foundry interface showing knowledge source selection for Foundry IQ, with options for connecting enterprise data sources into a knowledge base.

Next steps: use the Foundry IQ Forgebook to try out additional knowledge sources.

Microsoft Web IQ in Foundry IQ now available

When an answer needs fresh, real-world context, how do I reach the open web without paying a latency or compliance penalty?

Microsoft Web IQ is available in limited access through the Foundry IQ MCP knowledge source. It gives agents access to external retrieval across web, news, images, video, and shopping sources while honoring publisher preferences. It is designed for large language model (LLM) workflows rather than traditional search pages, with industry-leading low-latency ranking.

Combined with Foundry IQ, agents can plan, search, reason, and synthesize answers that draw on both internal knowledge and real-world external context in one retrieval engine.

Next steps: read the blog announcement for Microsoft Web IQ.

Knowledge bases in Foundry IQ are generally available

What does it actually take to move a prototype into production?

Production means guarantees: stable contracts, predictable performance, and security that holds under audit. Foundry IQ knowledge bases, select knowledge sources, and security capabilities are generally available, with full SLA coverage, compliance certifications, stable APIs, and enterprise-grade network isolation with identity and policy enforced by default.

What is included in GA:

  • Knowledge bases: agentic retrieval references, output and activity logs, the Foundry IQ MCP server, and minimal retrieval reasoning effort.
  • Foundry IQ MCP server: exposes Foundry IQ knowledge bases as a remote MCP server, making them accessible from any MCP-compatible host or client, including Claude, ChatGPT, LangChain, and the Microsoft Agent Framework. Network isolation, document-level security, cross-source ranking, and agentic retrieval all work over the open MCP standard, making it available for the broader agent ecosystem.
  • Knowledge sources: Azure Blob Storage (with a status API to check indexing progress), search indexes, Web, and OneLake.
  • Security: network isolation and managed identity support.

“We’ve been using Foundry IQ in our research and prototyping work, and the reusable knowledge base approach has cut a lot of the setup overhead we’d normally expect. Being able to ground agents in trusted enterprise content from day one, without rebuilding retrieval logic each time, has made early-stage experimentation noticeably faster and higher quality.”

Jane Chen, Lead AI Developer, Baringa Partners

Next steps: use the Mastering Foundry IQ cookbook to get started building with the Foundry IQ MCP server.

Agentic retrieval quality improvements

The latest retrieval enhancements improved our answer quality benchmarks by up to 20% across our evaluated datasets, effort tiers, and model sizes. Compared to single-shot RAG, knowledge bases improved recall by up to 54%.

Foundry IQ improved its iterative agentic retrieval loop to batch queries more effectively, surface more relevant passages via semantic ranker, and apply server-side token caching to reduce redundant consumption across multi-turn conversations. The result is meaningfully fewer tokens spent while beating previous benchmarks on answer quality.

Next steps: read our blog for more on the latest evaluations and Foundry IQ benchmarks.

Security updates in preview

How do I keep enterprise data permissions intact as content flows into agents?

Security belongs at the data layer, not approximated in application code. Several security capabilities are now in preview:

  • Cross-tenant customer-managed keys (CMK) using federated identity credentials, which eliminates shared secrets.
  • Purview sensitivity-label auditing.
  • Incremental SharePoint permissions sync.
  • APIM support for Foundry model integrations.
  • Purview sensitivity labels surfaced inside knowledge sources, so label-based access controls are honored end to end.

Private connectivity between Foundry IQ and Foundry products, via Shared Private Link and Network Security Perimeter, is generally available.

“By integrating Foundry IQ, we provide a managed, permission-aware business context layer that connects marketing and brand knowledge into every agent so they can access the right information, at the right time, with the right governance.”

Andrei Pop, Director of PM, Innovation, Sitecore

Next steps: read more about the latest Foundry IQ security announcements.

Data pipeline updates in preview

How do I make sure agents are grounded in the whole document (tables, diagrams, and images), not just the raw text?

Ingestion quality sets the ceiling on retrieval quality. New data pipeline capabilities in preview include first-class SharePoint indexing for ASPX pages and Lists alongside document libraries, and document enrichment that processes images and serves them at query time in knowledge bases, so agents and users can reference original visuals and ask follow-up questions about them. We are also introducing Azure Content Understanding chunking with image verbalization, a layout-aware ingestion pipeline that converts diagrams, charts, and scanned images into meaningful text so agents are grounded in complete, semantically accurate representations of source documents.

Next steps: read Foundry IQ’s data pipeline deep dive blog post.

Get started today

Build once, reuse everywhere: Foundry IQ enables you to ground multiple agents with the same knowledge base, connecting and unifying data from anywhere. Foundry IQ is designed for agent workloads to deliver better results from your company’s IQ. With Foundry IQ, accelerate agent delivery, deliver context without blind spots, and ensure every answer respects your organization’s security by default.

The easiest way to explore Foundry IQ is through the Microsoft Foundry portal. From there you can create a knowledge base, access the documentation, and follow the Microsoft Foundry Learn courses, all in a few clicks.

Be sure to check out the latest news from Foundry IQ at Microsoft Build 2026:

  • BRK246: Foundry IQ: Fuel agents with enterprise knowledge and agentic retrieval
  • LAB532: From data to context: agent-ready knowledge with Foundry IQ
  • BRK240: Build context-aware agents: From data to decisions with Microsoft IQ
  • DEM331: Turn APIs, tools, and data into real agent velocity

The post Foundry IQ: Build smarter agents faster with unified knowledge and serverless retrieval appeared first on Microsoft Foundry Blog.

Build agents you can trust across any framework with open evals and a control standard

We are four years into the generative AI era, and agents are everywhere. Enterprises are deploying them at scale, but trust has not kept pace. The gap is concrete: written policies do not translate into working runtime controls, evaluating agent safety across changing contexts is hard, and controls scattered across prompts, code, gateways, and frameworks make it risky to move from demo to production.

At Microsoft Build 2026, we are closing that gap. By the end of this post, you will be able to evaluate an agent against your own policies, place runtime controls at the exact checkpoints where it can fail, and monitor its behavior in production. You can start today, on any framework, with open source.

What’s new

Today we are announcing a new trust framework and a set of capabilities for developers building AI agents on any framework. It starts with two open-source projects that any developer can use regardless of their stack:

  1. ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing), a policy-driven evaluation framework built on Microsoft Research.
  2. Agent Control Specification (ACS), a portable runtime control standard and part of the Agent Governance Toolkit, built for broad ecosystem adoption.

ASSERT: Open-source agent evaluation

Agents fail in ways that are hard to see. They drift from policy, produce unsafe outputs in edge cases, and behave differently in production than they did in testing. Generic benchmarks do not catch these failures because they are not built around your policies, your agent, or your use case.

ASSERT is Microsoft’s open-source framework for policy-driven agent evaluation, built on a proven Microsoft Research approach. ASSERT takes your organizational policies and requirements as input, systematically generates targeted evaluation scenarios, and surfaces safety and quality defects before they reach production.

ASSERT is:

  • Requirements driven. ASSERT converts your policies into concrete, measurable evaluations, so rather than generic benchmarks you get context-specific test cases tailored to your agent’s intended behavior.
  • Safety focused. ASSERT uses a systematization approach specifically validated for safety evaluation rather than quality alone, which distinguishes it from other evaluation tools that focus on quality metrics only.
  • Open source, any framework. ASSERT works across LangChain, CrewAI, LiteLLM, OpenAI, and more. Because it is not tied to Microsoft Foundry, it is built for the 6 to 13 million generative AI developers building today.
  • An integrated workflow. Run ASSERT to identify defects, apply controls, then re-run ASSERT to validate improvement, with before-and-after metrics telling a clear story. ASSERT is the developer’s starting point, giving you a way to understand what your agent is doing wrong before you try to fix it.

We are grateful to be launching ASSERT with support from partners who are already building with and validating this framework, including CrewAI, Arize AI, LiteLLM, Pipecat, and Pydantic. Their participation reflects a shared belief that agent evaluation needs to be open, policy-driven, and portable across the ecosystem.

Partner logos for organizations supporting ASSERT, including CrewAI, Arize AI, LiteLLM, Pydantic, and Pipecat

My favorite thing about ASSERT is that the eval is easy to configure and reason about. I describe the behavior I care about in YAML, point it at a real agent, and get artifacts back. Not just pass/fail. They show why the judge made each call. That openness matters. The spec, generated cases, model outputs, judge rationale, and metrics are all inspectable locally. The eval feels auditable, not like a black box.

— Lorenze Jay Hernandez, Open Source Lead, CrewAI

Agent Control Specification: An open standard for agent safety controls

Knowing where your agent is failing is only half the problem. The other half is having a consistent, portable way to fix it, one that works across frameworks, travels with the agent, and does not lock you into any single vendor or infrastructure.

ACS is an open industry specification for placing deterministic safety and security controls at checkpoints throughout agentic workflows, and it is part of the Agent Governance Toolkit. Think of ACS as the MCP or A2A of agent safety. Just as Model Context Protocol (MCP) standardized how agents connect to tools and Agent2Agent (A2A) standardized how agents communicate with each other, ACS provides one open standard for safety controls that any framework can adopt, with Microsoft providing reference implementations for major platforms.

What ACS does

ACS:

  • Defines five key validation checkpoints in an agent’s lifecycle, covering input, large language model (LLM), state, tool execution, and output.
  • Enables deterministic control logic, including classifier endpoints, LLM judges, and custom content filters, placed exactly where you need them.
  • Is expressed as standard policy YAML, making controls portable, versionable, and auditable.
  • Works with any agent framework and is intentionally designed for industry-wide adoption.
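
To picture what controls at checkpoints could look like in practice, here is a small in-process sketch. It is not the ACS schema or any reference implementation: the `Checkpoint` enum, the `ControlRegistry` class, and the `block_ssn` rule are hypothetical names, and only the five checkpoint categories come from the list above.

```python
import re
from enum import Enum
from typing import Callable, Dict, List, Optional


class Checkpoint(Enum):
    """The five validation points named in the list above."""
    INPUT = "input"
    LLM = "llm"
    STATE = "state"
    TOOL_EXECUTION = "tool_execution"
    OUTPUT = "output"


# A control inspects a payload and returns a block reason, or None to allow it.
Control = Callable[[str], Optional[str]]


class ControlRegistry:
    """Hypothetical registry that runs every control attached to a checkpoint."""

    def __init__(self) -> None:
        self._controls: Dict[Checkpoint, List[Control]] = {cp: [] for cp in Checkpoint}

    def register(self, checkpoint: Checkpoint, control: Control) -> None:
        self._controls[checkpoint].append(control)

    def check(self, checkpoint: Checkpoint, payload: str) -> List[str]:
        reasons = []
        for control in self._controls[checkpoint]:
            reason = control(payload)
            if reason is not None:
                reasons.append(reason)
        return reasons


def block_ssn(payload: str) -> Optional[str]:
    """Deterministic rule: flag text that looks like a US Social Security number."""
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", payload):
        return "possible SSN in payload"
    return None


registry = ControlRegistry()
registry.register(Checkpoint.OUTPUT, block_ssn)
print(registry.check(Checkpoint.OUTPUT, "Your SSN is 123-45-6789"))
print(registry.check(Checkpoint.INPUT, "Your SSN is 123-45-6789"))
```

The same payload is blocked at the output checkpoint and allowed at the input checkpoint, because the control was registered only where it was needed.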

ACS launches with a broad ecosystem of customers and partners spanning governance, security, observability, and framework categories. These partners have endorsed the specification and are building integrations and reference implementations.

Customer and partner logos for Agent Control Specification, including KPMG, Zscaler, Arize AI, IBM, CrewAI, and other ecosystem partners

  • Customers: KPMG, Zscaler
  • Partners: Arize AI, Aviatrix, BigSpin, CrewAI, Geordie, HoneyHive, IBM, Monte Carlo, Obsidian

Securing AI agents has been stuck between advisory system prompts and brittle per-framework code, and neither scales to the enterprise. Agent Control Specification (ACS) treats agent guardrails the way OpenInference treats traces: a portable, declarative contract enforced outside the model, reviewed once by security and applied everywhere. Every block, every human approval, and every state transition Agent Control Specification emits lands in Arize alongside the OpenInference trace that produced it, so policy and observability finally travel together.

— Aparna Dhinakaran, Co-founder & Chief Product Officer, Arize AI

Through our experience with Agent Control Specification, IBM has built AI agents for our clients that are not only innovative, but also secure, governed, and transparently compliant. Centralized agent controls give us the ability to consistently apply policies, monitor behavior, and ensure accountability across complex environments, so our clients can deploy agentic AI with confidence.

— Miha Kralj, Global CTO, IBM Consulting, Microsoft Practice

From policy to production confidence

ASSERT and ACS are designed to work together:

  1. Run ASSERT to identify where your agent is failing policy requirements.
  2. Use ACS to place the right controls at the right checkpoints to address those failures.
  3. Re-run ASSERT to confirm improvement.
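
The three steps can be mimicked with a toy agent to show the before-and-after measurement. This sketch uses neither ASSERT nor ACS: the agent, the policy check, the scenarios, and the wrapper are all invented stand-ins.

```python
from typing import Callable, List


def failure_rate(
    agent: Callable[[str], str],
    scenarios: List[str],
    violates_policy: Callable[[str], bool],
) -> float:
    """Share of scenarios in which the agent's answer violates the policy."""
    failures = sum(1 for scenario in scenarios if violates_policy(agent(scenario)))
    return failures / len(scenarios)


def with_output_control(
    agent: Callable[[str], str],
    violates_policy: Callable[[str], bool],
    fallback: str = "[blocked by policy]",
) -> Callable[[str], str]:
    """Wrap an agent so violating answers are replaced at the output checkpoint."""
    def guarded(prompt: str) -> str:
        answer = agent(prompt)
        return fallback if violates_policy(answer) else answer
    return guarded


# Toy agent that leaks a secret whenever the prompt mentions a password.
def toy_agent(prompt: str) -> str:
    return "the password is hunter2" if "password" in prompt else "ok"


def leaks_secret(text: str) -> bool:
    return "hunter2" in text


SCENARIOS = ["hello", "what is the password", "tell me the password please", "bye"]

before = failure_rate(toy_agent, SCENARIOS, leaks_secret)
after = failure_rate(with_output_control(toy_agent, leaks_secret), SCENARIOS, leaks_secret)
print(before, after)
```

Running the same scenarios before and after the control is what turns a fix into evidence.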

It is a closed loop from evaluation to enforcement, and ACS gives developers a portable control layer that travels with the agent, not locked to any infrastructure or dependent on any single vendor.

Workflow diagram showing ASSERT and Agent Control Specification moving from policy to evaluation, runtime controls, validation, and production confidence

Together, these capabilities help developers move through a continuous trust lifecycle: identify risk, evaluate the agent, apply controls, observe behavior, and improve over time.

Trust lifecycle diagram showing a loop from identifying risk to evaluation, applying controls, observing behavior, and improving agents over time

Continuous governance in Foundry: Guardrails recommended for your agent

Most teams know they need guardrails, but far fewer know which guardrails apply to their agent.

Guided Guardrail Setup in Foundry, now in public preview, gives developers personalized guardrail recommendations in minutes. A short questionnaire about your agent’s audience, data access, and use case surfaces the specific risks relevant to your scenario and recommends the right controls, including personally identifiable information (PII) filters, jailbreak protection, and task adherence, all with no security expertise required.

Learn more about guided guardrail setup in Foundry.

Guided Guardrail Setup closes the gap between knowing you need guardrails and knowing which ones apply, by translating your agent’s actual context into a concrete configuration you can ship with confidence.

Continuous observability in Foundry: See, evaluate, and improve agent behavior at every stage

Shipping an agent is the beginning, not the end. Keeping agents accurate, safe, and aligned with users requires the ability to see, evaluate, and improve behavior across the full lifecycle.

This spring marked a major milestone: tracing and evaluations in Foundry reached general availability, delivering production-ready visibility into agent behavior, with support for hosted agents coming soon. At Build 2026, we are building on that foundation with a new wave of capabilities. Learn more about agent observability.

Rubric: Context-aware evaluation at scale

Rubric evaluator, now in public preview, is a new evaluator in Microsoft Foundry that automatically generates evaluation criteria based on your agent’s specific context.

Unlike static benchmarks, Rubric:

  • Creates custom quality criteria from your agent definition and use case.
  • Uses a two-step process to generate the rubric, then evaluate performance against it.
  • Applies weighted dimensions for aggregate scoring, giving a more nuanced view of quality.
  • Feeds directly into Agent Optimizer, using evaluation results to drive continuous improvement across traces, evaluations, and memory.

Rubric bridges development-time evaluation and production monitoring. Where ASSERT is your open-source, safety-focused tool for inner-loop development, Rubric is your Foundry-native evaluator for measuring and improving quality at scale in production.
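As a rough illustration of weighted aggregate scoring, the sketch below combines per-dimension scores into one number with a weighted average. The `Dimension` type, the dimension names, and the choice of a weighted average are assumptions; the post does not specify how Rubric computes its aggregate, and the rubric-generation step is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One hypothetical rubric dimension with a relative weight and a score in [0, 1]."""
    name: str
    weight: float
    score: float


def aggregate(dimensions: list[Dimension]) -> float:
    """Weighted average of dimension scores; weights need not sum to 1."""
    total_weight = sum(d.weight for d in dimensions)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(d.weight * d.score for d in dimensions) / total_weight


rubric = [
    Dimension("accuracy", weight=2.0, score=1.0),
    Dimension("tone", weight=1.0, score=0.5),
    Dimension("safety", weight=1.0, score=0.5),
]
print(aggregate(rubric))  # (2*1.0 + 1*0.5 + 1*0.5) / 4 = 0.75
```

Weighting lets a dimension that matters more for a given agent, such as accuracy here, pull the aggregate more than the others.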

Interoperability and core observability

Foundry observability is designed to integrate with your existing stack. These capabilities bring production-grade tracing and evaluation to any agent without requiring teams to change frameworks or workflows.

  • Tracing and evaluations for any agent framework, now in public preview, brings Foundry’s production-grade tracing and evaluations to agents built on LangChain, Semantic Kernel, or any custom framework, so no team has to choose between their stack and their observability.
  • Azure Developer CLI (AZD) observability developer experience, now in public preview, brings tracing, logging, and insights directly into the developer workflow. This reduces friction and helps teams diagnose and improve applications without leaving their development environment.

Tracing, evaluation, and optimization

These capabilities help teams evaluate real-world performance, surface issues earlier, and close the loop from production signals to better agents.

  • Multi-turn evaluation, now in public preview, evaluates agent quality across full multi-step conversations, not just single responses, catching degradation and safety issues that only surface when context accumulates over time.
  • User Simulation, now in public preview, automatically generates realistic multi-turn conversations and scenarios to evaluate how agents perform.
  • Evaluations with intelligent sampling, now in public preview, automatically run evaluations against a curated sample of live production traces, using smart filtering to surface the most signal-rich interactions so quality monitoring happens continuously without the cost of evaluating every request.
  • Traces to dataset, now in public preview, converts production traces into relevant structured evaluation datasets to improve offline test coverage.
  • Trace replay and visualization, now in public preview, replays and visually steps through agent execution traces to understand exactly how outcomes are produced. This makes debugging faster, improves model behavior, and builds confidence in production AI systems.
  • Agent Optimizer in Foundry Agent Service, now in private preview, runs Foundry’s full evaluation suite directly within Foundry AI Operations Service and feeds results into Foundry Optimizer, closing the loop from production signal to continuous improvement.
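One way to picture sampling for signal-rich traces is a simple score-and-rank filter. The sketch below is an assumption-laden toy: the `Trace` fields and the scoring heuristic are invented for illustration and do not reflect how Foundry selects traces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """Hypothetical summary of one production trace."""
    trace_id: str
    had_error: bool
    turns: int
    negative_feedback: bool


def signal_score(trace: Trace) -> int:
    """Illustrative heuristic: errors and negative feedback outweigh length."""
    return 3 * trace.had_error + 2 * trace.negative_feedback + min(trace.turns, 5)


def sample_for_evaluation(traces: list[Trace], k: int) -> list[Trace]:
    """Return the k highest-scoring traces, breaking ties by trace_id for determinism."""
    ranked = sorted(traces, key=lambda t: (-signal_score(t), t.trace_id))
    return ranked[:k]
```

Evaluating only the top-ranked traces keeps cost bounded while still concentrating on the interactions most likely to reveal a problem.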

Microsoft Foundry evaluation results screen showing rubric pass and fail scores across detailed agent evaluation rows

Business value

Knowing your agent works is critical, but so is proving that it delivers business value. We are introducing a new capability to help close that gap.

Return on investment (ROI) for agents in Microsoft Foundry, now in private preview, measures the real business impact of your agents, including task completion rates, time saved, and cost efficiency, giving stakeholders the data they need to justify investment and prioritize what to improve.

Microsoft Foundry dashboard showing agent ROI metrics, net value, value generated, total cost, current ROI, and cost analysis charts
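The metrics named above relate to each other through simple arithmetic. The sketch below uses the conventional definition of ROI (net value divided by total cost) and estimates value from time saved; both the formula and the input names are assumptions, since the post does not state how Foundry computes these figures.

```python
def roi_summary(tasks_completed: int, minutes_saved_per_task: float,
                hourly_rate: float, total_cost: float) -> dict[str, float]:
    """Estimate value generated from time saved, then derive net value and ROI."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    value_generated = tasks_completed * minutes_saved_per_task / 60 * hourly_rate
    net_value = value_generated - total_cost
    return {
        "value_generated": value_generated,
        "total_cost": total_cost,
        "net_value": net_value,
        "roi": net_value / total_cost,  # 2.0 means a 200% return
    }


# 1,200 tasks, each saving 5 minutes of work valued at 60 per hour, for a cost of 2,000.
print(roi_summary(1200, 5, 60, 2000))
```

In this example the agent generates 6,000 in value against 2,000 in cost, for a net value of 4,000 and an ROI of 2.0.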

By combining evaluations and tracing capabilities in Microsoft Foundry with Azure Monitor, we transform AI into an enterprise-grade, production-ready system with built-in observability and continuous optimization — enabling ongoing evolution across the agent lifecycle and accelerating NTT DATA’s Smart AI Agent® vision.

— Yuji Shono, Head of the Global AI Office, NTT DATA Group Corporation

Security in Foundry: Developer-scoped data protection for agents

Evaluation and observability tell you how your agent is behaving. Security ensures every interaction adheres to your data protection policies, across prompts, responses, and tool calls. At Build 2026, Foundry brings Purview-grade data protection directly into the agent development experience, enabling real-time policy enforcement as agents are built and deployed.

  • Runtime Data Loss Prevention (DLP) in Foundry, now in public preview, extends Microsoft Purview DLP into agent interactions, enabling real-time detection and blocking of sensitive data in prompts and across AI interaction flows within Foundry-built apps and agents. By bringing Purview enforcement directly into the developer workflow, teams building agents can apply data protection controls as they build, rather than relying solely on centralized policy rollout. Learn more about Purview for developers.
  • Purview insights embedded directly into the Foundry Control Plane, now generally available, bring rich data security context to the place developers already work. Purview surfaces crucial signals in-line, such as sensitive information types (SITs) detected in agentic interactions, the percentage of agentic interactions involving sensitive data, and the spread of high-risk users, so Foundry admins can understand how AI apps and agents are built. This shift enables developers to make faster, better decisions in the moment, reducing rework and closing security gaps early on. For customers, the value is clear: stronger security by design and at enterprise scale, accelerated development cycles, and reduced risk of data leaks or compliance issues without slowing down innovation.

Together, these capabilities raise the bar for building safe agents, with built-in enforcement of data protection and policy at every interaction. Data protection moves into the inner loop, alongside evaluation, control, and observability, as a core part of building production agents.
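Conceptually, runtime detection and blocking means inspecting each prompt before it reaches the model. The sketch below shows that idea with two regular expressions. It is not the Purview DLP API: the pattern set, the `detect` and `enforce` functions, and the type names are hypothetical, and production classifiers are far more sophisticated than these patterns.

```python
import re

# Hypothetical, deliberately simple detectors for two sensitive information types.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect(text: str) -> set[str]:
    """Return the names of the sensitive types found in the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def enforce(prompt: str, blocked_types: set[str]) -> tuple[bool, set[str]]:
    """Return (allowed, detected): the prompt is blocked if any detected type is blocked."""
    detected = detect(prompt)
    return (not (detected & blocked_types), detected)


allowed, found = enforce("My SSN is 123-45-6789", blocked_types={"us_ssn"})
print(allowed, found)  # False {'us_ssn'}
```

Returning the detected types even when the prompt is allowed makes it possible to report aggregate signals, such as the share of interactions that involve sensitive data.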

Get started today

To learn more about ASSERT and ACS, check out these deeper-dive resources:

Join our open-source community:

Explore Microsoft Foundry documentation:

If you are attending Microsoft Build 2026, or watching on-demand content later, be sure to check out these sessions:

The post Build agents you can trust across any framework with open evals and a control standard appeared first on Microsoft Foundry Blog.
