Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
154949 stories
·
33 followers

A Developer’s Guide to Managing Models, Cost and Quality in Microsoft Foundry

1 Share

A Developer’s Guide to Managing Models, Cost and Quality in Microsoft Foundry

The hardest part of building AI systems today is no longer getting access to a capable model. It is knowing how to choose, validate, optimize, and operate the right model across the full lifecycle of a real application.

Take a retrieval-augmented generation (RAG)-based customer support copilot or a tool-calling agent that helps employees complete business workflows. In a prototype, it may be enough to pick a strong model, connect a few data sources, and get a useful response. In production, the system needs to retrieve the right context, call the right tools, meet quality and safety thresholds, stay within latency targets, and run at a cost the business can sustain.

Models evolve, costs shift, and production requirements often arrive after the first version is already working. Success depends less on choosing the most powerful model and more on building a disciplined operating approach around the application.

That is where Microsoft Foundry comes in: a unified platform to select, evaluate, optimize, operate, and continuously improve AI applications at production scale.

What’s new

Microsoft Foundry continues to expand the model ecosystem and operating surface for developers building production AI systems.

Fireworks AI on Microsoft Foundry is now generally available, giving developers access to production-grade open model inference through a single Azure endpoint, with enterprise service-level agreements (SLAs) and zero-setup onboarding.

Foundry is also adding new model families and capabilities across modalities, including Microsoft AI models, partner models, open-source models, custom models, and post-trained variants. Together, these updates give developers more choice while keeping selection, evaluation, deployment, and operations in one consistent workflow.

The challenge is no longer access. It is operations.

In a prototype, the questions are simple: Can the model answer the prompt? Can it connect to my data? Can it complete the happy path?

In production, the questions change. Which model fits each task? How do I validate it on my own data? What latency budget does this experience require? How much throughput do I need at peak? What happens when quota is constrained, costs spike, or a newer model becomes available? How do I monitor quality, detect eval drift, roll back safely, and prove the system is governed?

Agentic systems often fail when the model is mismatched, evaluation is incomplete, costs run unchecked, or governance arrives too late. Teams that rely on a single provider face another risk: lock-in, with no escape hatch when a model degrades, pricing changes, or capacity becomes constrained.

Foundry is built on the opposite philosophy. It is a model-agnostic platform spanning Microsoft, open-source, and independent software vendor (ISV) partner models, all on the same operating surface.

The answer is to treat model selection and optimization as a continuous operating discipline: 

Model optimization loop showing how teams select, evaluate, optimize, operate, and improve models over time

1. Select the right model for the task

Model selection is about workload fit, not leaderboard rank. Before choosing a model, define the task contract: what the model needs to do, what good looks like, what constraints it must operate within, and which failure modes are unacceptable.

A routing step may need low latency. A policy question may need grounded reasoning with citations. A coding agent may need deeper reasoning and tool use. A customer-facing copilot may need strong safety boundaries, predictable latency, and cost efficiency at scale.

A simple model selection framework:

Workload need Favor this approach Why
Classification, routing, extraction, or high-volume chat Smaller, lower-latency model Keeps cost and latency low
Complex reasoning, coding, or planning Stronger reasoning model Improves quality for harder tasks
Image, speech, voice, or physical AI Modality-specific model Matches the model to the input and output type
Mixed workloads with different complexity Model Router Routes each request based on quality, cost, and latency
Domain-specific behavior, tone, or format Fine-tuned or custom model Improves consistency for your scenario

Effective model choice depends on four dimensions: capability, safety, latency, and cost.

Foundry helps developers make these tradeoffs through a broad model ecosystem and a consistent operating surface. Developers can access Microsoft models, leading base models, partner models like Fireworks AI, open-source models, custom models, and post-trained variants through one selection, evaluation, and deployment workflow.

Developer tip: For developers who want to bypass manual selection, Foundry provides Model Router in Foundry Models. Model Router automatically routes each request to the most appropriate model based on workload characteristics, cost targets, and latency requirements.

2. Validate with your own evals and data

Benchmarks are not enough. A model that leads a public leaderboard may still underperform on your prompts, your data, your users, and your business rules. Production confidence comes from evaluating against the workloads your application will actually run.

With Foundry, developers can bring their own evaluation inputs, including CSV or JSONL datasets with prompts, expected outputs, labels, or ground-truth answers. They can run side-by-side comparisons across models and prompts, evaluate agents and multi-step workflows, and inspect results across datasets, traces, and production-like scenarios.

Built-in quality and safety evaluators help measure signals such as relevance, groundedness, coherence, fluency, safety, and policy adherence. Custom evaluators can capture application-specific rules, formats, and business logic.

A strong evaluation covers:

Quality: Did the model complete the task correctly? Accuracy and groundedness: Did it produce reliable answers based on the right context? Safety: Did it follow policies and avoid unacceptable responses? Performance: Did it meet latency, throughput, and reliability requirements? Cost: Did it deliver the right outcome at the right price?

Evaluation should run continuously as new model versions, fine-tuned variants, agent changes, or new model families become available.

Developer tip: Define success criteria before opening the model catalog. Criteria-first evaluation prevents anchoring on model reputation instead of workload fit.

3. Optimize cost and performance

Cost is a first-class architectural concern, not an afterthought. In prototypes, it may be acceptable to send every task to the most capable model. In production, that approach breaks down quickly.

A simple classification task, a RAG response, a long-context reasoning workflow, and a multi-step agentic process should not always use the same model or deployment strategy.

Foundry gives developers levers to optimize across quality, cost, and latency at the system level:

Intelligent routing: Send each task to the right model based on complexity and budget. Batching: Use asynchronous processing for workloads that do not require real-time responses. Caching: Avoid paying repeatedly for identical or near-identical requests. Provisioned throughput: Use dedicated capacity for predictable performance at scale. Quota management: Scale more predictably with quota tiering, global customer quota, and data zone customer quota. Model optimization: Use model compression, fine-tuning, or distillation where appropriate.

Fireworks AI on Foundry is now generally available, giving developers access to a high-performance open model catalog through a single Azure endpoint, with enterprise SLAs, no separate infrastructure, and no separate contracts.

Developer tip: Profile cost by task type before optimizing globally. Routing decisions are workload-specific, not one-size-fits-all.

4. Operate at scale with enterprise confidence

Deploying an endpoint is not the same as operating a production AI system. Teams need to understand how the system behaves, enforce policies, monitor usage and cost, test model changes safely, and roll back when quality or performance regresses.

Foundry brings these operating capabilities into one surface: versioning, SLA-backed reliability, security, governance, access controls, audit logging, usage monitoring, and controlled upgrades.

Teams can monitor token usage and throughput, inspect logs and traces, evaluate model and agent behavior, enforce policies, and compare changes before rolling them out broadly. As new model versions become available, they can test against evaluation datasets and traces, validate quality, latency, and cost impact, and reduce risk with versioning and rollback strategies.

The Fireworks AI on Foundry generally available (GA) release is a concrete example of this operating model, with enterprise SLAs, provisioned throughput unit (PTU) Data Zone support, SOC2 readiness, and the same access controls and audit logging that govern Foundry.

Production adopters span AI-native and traditional enterprise workloads, including Perplexity, Motif, UiPath, and StackBlitz. During preview, the platform processed more than 176 billion tokens across 17 S&P 500 enterprises.

Developer tip: Treat model upgrades like dependency upgrades: test against baselines, stage rollouts, monitor regressions, and keep a rollback plan.

5. Continuously improve as models and workloads evolve

AI systems are dynamic. Models improve, workloads shift, user behavior changes, pricing evolves, and new model families arrive. The best system today may not be the best system six months from now.

That is why the lifecycle loop matters:

Select the right model for the task. Evaluate it against your own data and production baselines. Optimize for quality, cost, latency, and throughput. Operate with governance, observability, and reliability. Improve as new models, tools, and customization options emerge.

For engineering teams, every model, prompt, tool, agent, or workflow change should be treated like a production change. New model versions should be tested automatically against regression datasets, production traces, and known edge cases before rollout.

A model may improve quality but increase latency, reduce cost but weaken groundedness, or perform better on common cases while regressing on high-risk scenarios. Automated evaluations help teams detect those tradeoffs early.

Developer tip: Automate your evaluation pipeline so every new model version is compared against production baselines for quality, safety, latency, throughput, and cost before deployment.

What this means for developers

The next phase of AI development will not be won by teams that simply have access to the biggest models. It will be won by teams that know how to operate models well.

That means choosing by workload fit, validating with real data, optimizing cost and performance, deploying with governance, and improving as the landscape shifts.

Microsoft Foundry is designed for exactly this reality: a model-agnostic platform spanning Microsoft, open-source, and ISV models, all on one operating surface. No lock-in. No re-architecture. No guesswork.

The future of AI development is not about guessing which model might work. It is about building an operating discipline that lets you know.

Get started

The post A Developer’s Guide to Managing Models, Cost and Quality in Microsoft Foundry appeared first on Microsoft Foundry Blog.

Read the whole story
alvinashcraft
just a second ago
reply
Pennsylvania, USA
Share this story
Delete

Running an AI-native engineering org

1 Share
Running an AI-native engineering org
Read the whole story
alvinashcraft
20 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Foundry IQ: Build smarter agents faster with unified knowledge and serverless retrieval

1 Share

Foundry IQ: Build smarter agents faster with unified knowledge and serverless retrieval

Developers building agent fleets keep hitting the same pattern: the agent logic is ready, but the knowledge infrastructure underneath is complex to do well. Getting to production means solving for stability, scale, data access, answer quality, security, and content ingestion all at once. Today, we are enabling developers to have faster impact by simplifying the enterprise knowledge platform.

Your company’s IQ, powered by Microsoft IQ, is the collective intelligence locked in documents, emails, meetings, operational data, and the live web. This is where your true competitive edge lives. Foundry IQ grounds agents with the knowledge from these sources and continuously improves based on your business goals.

The announcements today are designed to help customers provision knowledge bases faster, unify enterprise and external sources, and expose that knowledge through the Foundry IQ Model Context Protocol (MCP) server for any agent framework or MCP-compatible hosts.

What’s new

  • Foundry IQ Serverless in preview: Provision instant, no-friction context retrieval with scale to zero pricing. Developer tier now available in public preview with more coming soon. Docs | Create a Foundry IQ resource
  • New knowledge sources in preview: Ground agents across Work IQ, Fabric IQ (including Data agents and Ontology), File Search, Azure SQL, and MCP through a multi-source knowledge base, with no custom integrations required. Docs | Cookbook
  • Web IQ in Foundry IQ is now available: Extend agent context to the web, honoring publisher preferences, and marketplace data with sub-165 ms latency and zero data retention. Blog | Website
  • Foundry IQ knowledge bases are generally available: Ship production agents on a fully SLA-backed knowledge layer with stable APIs, compliance certifications, and the Foundry IQ MCP server for any MCP-compatible host. Docs | Quickstart
  • Agentic retrieval quality improvements: The latest updates to the agentic retrieval engine improve answer performance across datasets, effort tiers, and model sizes while spending fewer tokens. Blog | Quickstart
  • Data pipeline updates in preview: Automatic layout-aware ingestion of documents, image enrichment, and broader SharePoint indexing ground agents in complete documents, not just raw text. Blog | Quickstart
  • Security updates in preview: New controls for encryption, permissions sync, and sensitivity-label governance keep enterprise policy intact as content flows into agents. Blog | Quickstart

Foundry IQ Serverless in public preview

We know agent workloads are bursty and event-driven: an agent might execute hundreds of steps in seconds, then go idle for hours. Serverless eliminates infrastructure friction: no clusters to manage, no capacity to reserve, no idle costs. Go from zero to production fast, with instant retrieval-augmented generation (RAG) and state-of-the-art retrieval quality built in.

Foundry IQ Serverless (Developer tier) is available in public preview. You are billed for compute resources and storage used, and the service scales to zero when idle.

Serverless tiers use Compute Units (CU) to measure resource consumption, including CPU utilization, memory and storage I/O. Usage is calculated each minute in increments of 0.25 CUs.

For large-scale serverless deployments, contact us for additional options.

Capability Developer tier
Compute usage $0.24 CU / hour
Indexed storage Up to $0.29 GB / month; GB cost is region dependent
Indexed storage per index 1 GB / index
Indexes per service 30 indexes / service
Services per subscription per region 5 services / subscription / region

Billing is expected to begin in late 2026 with details provided at least 30 days in advance. Customers using Serverless Developer won’t be charged before billing is enabled. Current Compute Unit measurements are estimates only and subject to change before billing is enabled.

Next steps: create a Foundry IQ Serverless resource in the Foundry portal.

New knowledge sources in preview

How do I give an agent access to organizational knowledge and structured business data without building a custom connector for every system?

Bringing enterprise knowledge into agent workflows often means stitching together custom integrations across each data source. Developers must account for different data formats, permission models, retrieval patterns, and source-specific logic before an agent could reliably use that knowledge.

Foundry IQ simplifies this by bringing enterprise content and structured systems into a single knowledge base for multi-source, agentic retrieval. Developers can give agents access to that knowledge without building and maintaining separate connectors or source-specific retrieval strategies.

New knowledge sources in preview:

  • Work IQ brings organizational signals like emails, meetings, files, and Teams messages into one enriched, AI-ready source, all while respecting user permissions. Agents can answer questions about how the organization operates, what decisions were made, and what is top of mind for teams.
  • Fabric IQ lets agents query data agents and company ontologies: formal models of business entities, relationships, and rules linked to live data in OneLake and a specialized semantic layer. This returns structured answers alongside unstructured document context for a query.
  • File Search allows you to directly upload files to a knowledge base.
  • Azure SQL brings structured relational data into a knowledge base.
  • MCP Server connects knowledge served over the Model Context Protocol.

A Microsoft Foundry interface showing knowledge source selection for Foundry IQ, with options for connecting enterprise data sources into a knowledge base.

Next steps: use the Foundry IQ Forgebook to try out additional knowledge sources.

Microsoft Web IQ in Foundry IQ now available

When an answer needs fresh, real-world context, how do I reach the open web without paying a latency or compliance penalty?

Microsoft Web IQ is available in limited access through the Foundry IQ MCP knowledge source. It gives agents access to external retrieval across web, news, images, video, and shopping sources while honoring publisher preferences. It is designed for large language model (LLM) workflows rather than traditional search pages, with industry-leading low-latency ranking.

Combined with Foundry IQ, agents can plan, search, reason, and synthesize answers that draw on both internal knowledge and real-world external context in one retrieval engine.

Next steps: read the blog announcement for Microsoft Web IQ.

Knowledge bases in Foundry IQ are generally available

What does it actually take to move a prototype into production?

Production means guarantees: stable contracts, predictable performance, and security that holds under audit. Foundry IQ knowledge bases and select knowledge sources, and security capabilities are generally available: with full SLA coverage, compliance certifications, stable APIs, and enterprise-grade network isolation with identity and policy enforced by default.

What is included in GA:

  • Knowledge bases: agentic retrieval references, output and activity logs, the Foundry IQ MCP server, and minimal retrieval reasoning effort.
  • Foundry IQ MCP server: exposes Foundry IQ knowledge bases as a remote MCP server, making them accessible from any MCP-compatible host or client, including Claude, ChatGPT, LangChain, and the Microsoft Agent Framework. Network isolation, document-level security, cross-source ranking, and agentic retrieval all work over the open MCP standard, making it available for the broader agent ecosystem.
  • Knowledge sources: Azure Blob Storage (with a status API to check indexing progress), search indexes, Web, and OneLake.
  • Security: network isolation and managed identity support.

“We’ve been using Foundry IQ in our research and prototyping work, and the reusable knowledge base approach has cut a lot of the setup overhead we’d normally expect. Being able to ground agents in trusted enterprise content from day one, without rebuilding retrieval logic each time, has made early-stage experimentation noticeably faster and higher quality.”Jane Chen, Lead AI Developer, Baringa Partners

Next steps: use the Mastering Foundry IQ cookbook to get started building with the Foundry IQ MCP server.

Agentic retrieval quality improvements

The latest retrieval enhancements improved our answer quality benchmarks by up to 20%, across our evaluated datasets, effort tiers, and model sizes. Compared to single-shot RAG, knowledge bases improved recall by up to 54%.

Foundry IQ improved its iterative agentic retrieval loop to batch queries more effectively, surface more relevant passages via semantic ranker, and apply server-side token caching to reduce redundant consumption across multi-turn conversations. This results in meaningfully fewer tokens spent without sacrificing answer quality, while beating previous benchmarks on answer quality.

Next steps: read our blog for more on the latest evaluations and Foundry IQ benchmarks.

Security updates in preview

How do I keep enterprise data permissions intact as content flows into agents?

Security belongs at the data layer, not approximated in application code. Several security capabilities are now in preview, including cross-tenant customer-managed keys (CMK) using federated identity credentials — eliminating shared secrets — Purview sensitivity-label auditing, incremental SharePoint permissions sync, APIM support for Foundry model integrations, and surfacing Purview sensitivity labels inside knowledge sources so label-based access controls are honored end to end.

Private connectivity between Foundry IQ and Foundry products, via Shared Private Link and Network Security Perimeter, is generally available.

“By integrating Foundry IQ, we provide a managed, permission-aware business context layer that connects marketing and brand knowledge into every agent so they can access the right information, at the right time, with the right governance.”Andrei Pop, Director of PM, Innovation, Sitecore

Next steps: read more about the latest Foundry IQ security announcements.

Data pipeline updates in preview

How do I make sure agents are grounded in the whole document (tables, diagrams, and images) not just the raw text?

Ingestion quality sets the ceiling on retrieval quality. New data pipeline capabilities in preview include first-class SharePoint indexing for ASPX pages and Lists alongside document libraries and document enrichment to process images plus serve them at query time in knowledge bases, so agents and users can reference original visuals and ask follow-up questions about them. We are also introducing Azure Content Understanding chunking with image verbalization — a layout-aware ingestion pipeline that converts diagrams, charts, and scanned images into meaningful text so agents are grounded in complete, semantically accurate representations of source documents.

Next steps: read Foundry IQ’s data pipeline deep dive blog post.

Get started today

Build once, reuse everywhere: Foundry IQ enables you to ground multiple agents with the same knowledge base, connecting and unifying data from anywhere. Foundry IQ is designed for agent workloads to deliver better results from your company’s IQ. With Foundry IQ, accelerate agent delivery, deliver context without blind spots, and ensure every answer respects your organization’s security by default.

The easiest way to explore Foundry IQ is through the Microsoft Foundry portal. From there you can create a knowledge base, access the documentation, and follow the Microsoft Foundry Learn courses, all in a few clicks.

Be sure to check out the latest news from Foundry IQ at Microsoft Build 2026:

  • BRK246: Foundry IQ: Fuel agents with enterprise knowledge and agentic retrieval
  • LAB532: From data to context: agent-ready knowledge with Foundry IQ
  • BRK240: Build context-aware agents: From data to decisions with Microsoft IQ
  • DEM331: Turn APIs, tools, and data into real agent velocity

The post Foundry IQ: Build smarter agents faster with unified knowledge and serverless retrieval appeared first on Microsoft Foundry Blog.

Read the whole story
alvinashcraft
30 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Build agents you can trust across any framework with open evals and a control standard

1 Share

Build agents you can trust across any framework with open evals and a control standard

We are four years into the generative AI era, and agents are everywhere. Enterprises are deploying them at scale, but trust has not kept pace. The gap is concrete: written policies do not translate into working runtime controls, evaluating agent safety across changing contexts is hard, and controls scattered across prompts, code, gateways, and frameworks make it risky to move from demo to production.

At Microsoft Build 2026, we are closing that gap. By the end of this post, you will be able to evaluate an agent against your own policies, place runtime controls at the exact checkpoints where it can fail, and monitor its behavior in production. You can start today, on any framework, with open source.

What’s new

Today we are announcing a new trust framework and a set of capabilities for developers building AI agents on any framework. It starts with two open-source projects that any developer can use regardless of their stack:

  1. ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing), a policy-driven evaluation framework built on Microsoft Research.
  2. Agent Control Specification (ACS), a portable runtime control standard and part of the Agent Governance Toolkit, built for broad ecosystem adoption.

ASSERT: Open-source agent evaluation

Agents fail in ways that are hard to see. They drift from policy, produce unsafe outputs in edge cases, and behave differently in production than they did in testing. Generic benchmarks do not catch these failures because they are not built around your policies, your agent, or your use case.

ASSERT is Microsoft’s open-source framework for policy-driven agent evaluation, built on a proven Microsoft Research approach. ASSERT takes your organizational policies and requirements as input, systematically generates targeted evaluation scenarios, and surfaces safety and quality defects before they reach production.

ASSERT is:

  • Requirements driven. ASSERT converts your policies into concrete, measurable evaluations, so rather than generic benchmarks you get context-specific test cases tailored to your agent’s intended behavior.
  • Safety focused. ASSERT uses a systematization approach specifically validated for safety evaluation rather than quality alone, which distinguishes it from other evaluation tools that focus on quality metrics only.
  • Open source, any framework. ASSERT works across LangChain, CrewAI, LiteLLM, OpenAI, and more. Because it is not tied to Microsoft Foundry, it is built for the 6 to 13 million generative AI developers building today.
  • An integrated workflow. Run ASSERT to identify defects, apply controls, then re-run ASSERT to validate improvement, with before-and-after metrics telling a clear story. ASSERT is the developer’s starting point, giving you a way to understand what your agent is doing wrong before you try to fix it.

We are grateful to be launching ASSERT with support from partners who are already building with and validating this framework, including CrewAI, Arize AI, LiteLLM, Pipecat, and Pydantic. Their participation reflects a shared belief that agent evaluation needs to be open, policy-driven, and portable across the ecosystem.

Partner logos for organizations supporting ASSERT, including CrewAI, Arize AI, LiteLLM, Pydantic, and Pipecat

My favorite thing about ASSERT is that the eval is easy to configure and reason about. I describe the behavior I care about in YAML, point it at a real agent, and get artifacts back. Not just pass/fail. They show why the judge made each call. That openness matters. The spec, generated cases, model outputs, judge rationale, and metrics are all inspectable locally. The eval feels auditable, not like a black box.

— Lorenze Jay Hernandez, Open Source Lead, CrewAI

Agent Control Specification: An open standard for agent safety controls

Knowing where your agent is failing is only half the problem. The other half is having a consistent, portable way to fix it, one that works across frameworks, travels with the agent, and does not lock you into any single vendor or infrastructure.

ACS is an open industry specification for placing deterministic safety and security controls at checkpoints throughout agentic workflows, and it is part of the Agent Governance Toolkit. Think of ACS as the MCP or A2A of agent safety. Just as Model Context Protocol (MCP) standardized how agents connect to tools and Agent2Agent (A2A) standardized how agents communicate with each other, ACS provides one open standard for safety controls that any framework can adopt, with Microsoft providing reference implementations for major platforms.

What ACS does

ACS:

  • Defines five key validation checkpoints in an agent’s lifecycle, covering input, large language model (LLM), state, tool execution, and output.
  • Enables deterministic control logic, including classifier endpoints, LLM judges, and custom content filters, placed exactly where you need them.
  • Is expressed as standard policy YAML, making controls portable, versionable, and auditable.
  • Works with any agent framework and is intentionally designed for industry-wide adoption.

ACS launches with a broad ecosystem of customers and partners spanning governance, security, observability, and framework categories. These partners have endorsed the specification and are building integrations and reference implementations.

Customer and partner logos for Agent Control Specification, including KPMG, Zscaler, Arize AI, IBM, CrewAI, and other ecosystem partners

  • Customers: KPMG, Zscaler
  • Partners: Arize AI, Aviatrix, BigSpin, CrewAI, Geordie, HoneyHive, IBM, Monte Carlo, Obsidian

Securing AI agents has been stuck between advisory system prompts and brittle per-framework code, and neither scales to the enterprise. Agent Control Specification (ACS) treats agent guardrails the way OpenInference treats traces: a portable, declarative contract enforced outside the model, reviewed once by security and applied everywhere. Every block, every human approval, and every state transition Agent Control Specification emits lands in Arize alongside the OpenInference trace that produced it, so policy and observability finally travel together.

— Aparna Dhinakaran, Co-founder & Chief Product Officer, Arize AI

Through our experience with Agent Control Specification, IBM has built AI agents for our clients that are not only innovative, but also secure, governed, and transparently compliant. Centralized agent controls give us the ability to consistently apply policies, monitor behavior, and ensure accountability across complex environments, so our clients can deploy agentic AI with confidence.

— Miha Kralj, Global CTO, IBM Consulting, Microsoft Practice

From policy to production confidence

ASSERT and ACS are designed to work together:

  1. Run ASSERT to identify where your agent is failing policy requirements.
  2. Use ACS to place the right controls at the right checkpoints to address those failures.
  3. Re-run ASSERT to confirm improvement.

It is a closed loop from evaluation to enforcement, and ACS gives developers a portable control layer that travels with the agent, not locked to any infrastructure or dependent on any single vendor.

Workflow diagram showing ASSERT and Agent Control Specification moving from policy to evaluation, runtime controls, validation, and production confidence

Together, these capabilities help developers move through a continuous trust lifecycle: identify risk, evaluate the agent, apply controls, observe behavior, and improve over time.

Trust lifecycle diagram showing a loop from identifying risk to evaluation, applying controls, observing behavior, and improving agents over time

Continuous governance in Foundry: Guardrails recommended for your agent

Most teams know they need guardrails, but far fewer know which guardrails apply to their agent.

Guided Guardrail Setup in Foundry, now in public preview, gives developers personalized guardrail recommendations in minutes. A short questionnaire about your agent’s audience, data access, and use case surfaces the specific risks relevant to your scenario and recommends the right controls, including personally identifiable information (PII) filters, jailbreak protection, and task adherence, all with no security expertise required.

Learn more about guided guardrail setup in Foundry.

Most teams know they need guardrails, but far fewer know which guardrails apply to their agent. Guided Guardrail Setup closes that gap by translating your agent’s actual context into a concrete configuration you can ship with confidence.

Continuous observability in Foundry: See, evaluate, and improve agent behavior at every stage

Shipping an agent is the beginning, not the end. Keeping agents accurate, safe, and aligned with users requires the ability to see, evaluate, and improve behavior across the full lifecycle.

This spring marked a major milestone: tracing and evaluations in Foundry reached general availability, delivering production-ready visibility into agent behavior, with hosted agents coming soon. At Build 2026, we are building on that foundation with a new wave of capabilities. Learn more about agent observability.

Rubric: Context-aware evaluation at scale

Rubric evaluator, now in public preview, is a new evaluator in Microsoft Foundry that automatically generates evaluation criteria based on your agent’s specific context.

Unlike static benchmarks, Rubric:

  • Creates custom quality criteria from your agent definition and use case.
  • Uses a two-step process to generate the rubric, then evaluate performance against it.
  • Applies weighted dimensions for aggregate scoring, giving a more nuanced view of quality.
  • Feeds directly into Agent Optimizer, using evaluation results to drive continuous improvement across traces, evaluations, and memory.

Rubric bridges development-time evaluation and production monitoring. Where ASSERT is your open-source, safety-focused tool for inner-loop development, Rubric is your Foundry-native evaluator for measuring and improving quality at scale in production.

Interoperability and core observability

Foundry observability is designed to integrate with your existing stack. These capabilities bring production-grade tracing and evaluation to any agent without requiring teams to change frameworks or workflows.

  • Tracing and evaluations for any agent framework, now in public preview, brings Foundry’s production-grade tracing and evaluations to agents built on LangChain, Semantic Kernel, or any custom framework, so no team has to choose between their stack and their observability.
  • Azure Developer CLI (AZD) observability developer experience, now in public preview, brings tracing, logging, and insights directly into the developer workflow. This reduces friction and helps teams diagnose and improve applications without leaving their development environment.

Tracing, evaluation, and optimization

These capabilities help teams evaluate real-world performance, surface issues earlier, and close the loop from production signals to better agents.

  • Multi-turn evaluation, now in public preview, evaluates agent quality across full multi-step conversations, not just single responses, catching degradation and safety issues that only surface when context accumulates over time.
  • User Simulation, now in public preview, automatically generates realistic multi-turn conversations and scenarios to evaluate how agents perform.
  • Evaluations with intelligent sampling, now in public preview, automatically run evaluations against a curated sample of live production traces, using smart filtering to surface the most signal-rich interactions so quality monitoring happens continuously without the cost of evaluating every request.
  • Traces to dataset, now in public preview, converts production traces into relevant structured evaluation datasets to improve offline test coverage.
  • Trace replay and visualization, now in public preview, replays and visually steps through agent execution traces to understand exactly how outcomes are produced. This makes debugging faster, improves model behavior, and builds confidence in production AI systems.
  • Agent Optimizer in Foundry Agent Service, now in private preview, runs Foundry’s full evaluation suite directly within Foundry AI Operations Service and feeds results into Foundry Optimizer, closing the loop from production signal to continuous improvement.

Microsoft Foundry evaluation results screen showing rubric pass and fail scores across detailed agent evaluation rows

Business value

Knowing your agent works is critical, but so is proving that it delivers business value. We are introducing a new capability to help close that gap.

Return on investment (ROI) for agents in Microsoft Foundry, now in private preview, measures the real business impact of your agents, including task completion rates, time saved, and cost efficiency, giving stakeholders the data they need to justify investment and prioritize what to improve.

Microsoft Foundry dashboard showing agent ROI metrics, net value, value generated, total cost, current ROI, and cost analysis charts

By combining evaluations and tracing capabilities in Microsoft Foundry with Azure Monitor, we transform AI into an enterprise-grade, production-ready system with built-in observability and continuous optimization — enabling ongoing evolution across the agent lifecycle and accelerating NTT DATA’s Smart AI Agent® vision.

— Yuji Shono, Head of the Global AI Office, NTT DATA Group Corporation

Security in Foundry: Developer-scoped data protection for agents

Evaluation and observability tell you how your agent is behaving. Security ensures every interaction adheres to your data protection policies, across prompts, responses, and tool calls. At Build 2026, Foundry brings Purview-grade data protection directly into the agent development experience, enabling real-time policy enforcement as agents are built and deployed.

  • Runtime Data Loss Prevention (DLP) in Foundry, now in public preview, extends Microsoft Purview DLP into agent interactions, enabling real-time detection and blocking of sensitive data in prompts and across AI interaction flows within Foundry-built apps and agents. By bringing Purview enforcement directly into the developer workflow, teams building agents can apply data protection controls as they build, rather than relying solely on centralized policy rollout. Learn more about Purview for developers.
  • Purview insights embedded directly into the Foundry Control Plane, now generally available, brings rich data security context to the place developers already work. Purview surfaces crucial signals, such as sensitive information types (SITs) detected in agentic interactions, the percentage of agentic interactions involving sensitive data, and the spread of high-risk users, in-line so Foundry admins can understand how AI apps and agents are built. This shift enables developers to make faster, better decisions in the moment, reducing rework and closing security gaps early on. For customers, the value is clear: stronger security by design and at enterprise scale, accelerated development cycles, and reduced risk of data leaks or compliance issues without slowing down innovation.

Together, these capabilities raise the bar for building safe agents, with built-in enforcement of data protection and policy at every interaction. Data protection moves into the inner loop, alongside evaluation, control, and observability, as a core part of building production agents.

Get started today

To learn more about ASSERT and ACS, check out these deeper-dive resources:

Join our open-source community:

Explore Microsoft Foundry documentation:

If you are attending Microsoft Build 2026, or watching on-demand content later, be sure to check out these sessions:

The post Build agents you can trust across any framework with open evals and a control standard appeared first on Microsoft Foundry Blog.

Read the whole story
alvinashcraft
35 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Unlocking a New Way to Build Enterprise Data Apps with Microsoft Fabric

1 Share
We believe enterprise software should move at the speed of ideas. Today, we’re collaborating with Microsoft to make it dramatically easier for enterprises to build AI-powered apps in Replit and deploy them directly into Microsoft Fabric with enterprise-grade governance and security built in. The integration helps teams go from prompt to production faster- turning governed enterprise data into internal tools, dashboards, workflows, and AI applications in a fraction of the time traditional development requires. Bringing AI-Native Development to the Microsoft Data Ecosystem Microsoft Fabric is a unified platform for enterprise data, bringing together analytics, governance, storage, and business intelligence into a single environment. At the same time, Replit has emerged as one of the fastest ways to build software using AI-assisted development. With the introduction of Rayfin, a new open-source SDK and CLI for building application backends, business teams can now go beyond simply connecting to data. They can define and deploy complete, production-ready applications directly on top of it. This enables a new workflow: applications can be described, generated, and deployed directly to Fabric, where enterprise data already lives.

Read the whole story
alvinashcraft
42 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

What’s Coming Next in Visual Studio: Our Microsoft Build 2026 Announcements

1 Share

Microsoft Build kicks off today in San Francisco, June 2 and 3. If you cannot make it in person, the sessions are streaming online for free, and I want to walk you through what we are announcing for Visual Studio this week.

One idea tie most of it together. Code is an asset, not just an artifact. The tools around it should help you keep it healthy, correct, and easy to evolve as your codebase grows. Every announcement below is a step toward that.

Agents that participate in the work, not next to it

GitHub Copilot in Visual Studio is moving beyond chat and completions. The direction is agents that can participate more actively in the development lifecycle, helping with debugging, profiling, and testing alongside you.

This is not about replacing the tools you already rely on. It is about connecting them more effectively. The debugger, profiler, and test tools already provide deep insight. Agents help turn that insight into action:

  • Identify issues faster
  • Explain what is going on
  • Suggest concrete fixes
  • Help validate the results

This matters most if you work in large C# or C++ codebases where the hard problems are not “write this function” but “figure out why this thing is slow under load.” That is the work Visual Studio has always been built for. Agents extend it.

Catching errors before the build starts

This one is small and I think you will notice it daily.

Today, a build can still run even when there are obvious errors already sitting in the Error List. The build runs, you wait, the build fails on something you could have seen up front.

We are changing that flow so Visual Studio checks errors and warnings before the build starts. Simple change. Real time saved. The kind of thing that adds up across a week.

Merge conflicts with less manual work

Merge conflicts are something every developer runs into, and they are rarely a good use of anyone’s time.

We are working on AI-assisted conflict resolution to reduce the manual effort these situations require. The goal is not to auto-merge everything. The goal is to help you understand the conflict, make a sensible decision, and get back to the work you were actually doing.

Modernization that moves your apps forward

This summer, we are bringing new capabilities to GitHub Copilot modernization, the integrated agent experience built into Visual Studio that helps you upgrade your applications to the latest .NET stack.

You can migrate Web Forms applications to Blazor for a modern, component-based web stack. You can add Aspire to existing apps for cloud-ready observability and orchestration. The modernization agent assesses your project, builds a plan, and executes upgrades step by step, helping you improve performance and security without starting from scratch.

If you have been carrying a Web Forms app for years because the rewrite math never penciled out, this is worth a fresh look.

Skills that show up when you need them

One of the harder problems with AI tooling is that the right capability often exists, but it shows up at the wrong moment, or you have to know to ask for it.

We are introducing Microsoft-authored skills that apply automatically based on your project type and the task at hand. Less prompting. Less guesswork. A more helpful experience overall. The right capabilities show up when you need them, without requiring you to already know they exist.

Bring your own key, bring your own model

This is the one I have been waiting to talk about.

Historically, AI integration in Visual Studio has been limited to a small set of sanctioned endpoints. That works for a lot of developers, but it has left real customers behind, including teams whose environments call for different choices.

We are moving toward a BYOK approach, bring your own key or model, so you can use different AI models whether they run locally or in the cloud. That gives you more flexibility around performance, cost, and compliance based on the needs of your environment.

If you have been waiting for Visual Studio to meet your environment instead of asking your environment to bend, this is the announcement to watch.

Built on the GitHub Copilot SDK

Underneath all of this is a more unified foundation. Visual Studio is moving to the GitHub Copilot SDK as the foundation for its AI integration going forward.

This one sits below the surface. You will not see it in a menu. What it means in practice is that we can move faster, stay aligned with the broader ecosystem, and bring new capabilities into Visual Studio sooner. Worth knowing about, even though you will mostly feel it through everything else getting better.

Where this is heading

If there is one way to sum up this roadmap, it is this. We are focused on a set of meaningful improvements that remove friction from the inner loop and make day-to-day development feel better.

Code that compiles by default. Faster feedback before you build. Smarter handling of real-world pain points like merge conflicts. AI that works with your tools, not next to them. Flexibility in how you bring AI into your environment.

All of it is designed to fit how you already use Visual Studio, not force you into a different workflow.

Watch it live at Build this week

If you want to see this work in action, here are the sessions I would put on your schedule. All times in Pacific.

Microsoft Build opening keynote (KEY01) Tuesday, June 2, 9:30 AM to 12:00 PM PT Satya Nadella and Microsoft leaders open the week with how Microsoft is creating new opportunities for developers across our platforms in this era of AI. This is the one that sets the frame for everything else.

GitHub, Copilot, VS Code, and More: Live from San Francisco (LIVE104) Wednesday, June 3, 9:00 AM to 11:00 AM PT The closest thing Build has to a hallway conversation with the engineers shipping the work. Live demos, surprise guests, live coding, straight from the teams. Watch this one live if you can.

GitHub Copilot in Visual Studio: Agents That Debug, Profile, and Test (BRK207) Wednesday, June 3, 4:00 PM to 4:45 PM PT This is the demo-heavy session on the agents work above, with Mads Kristensen and Nik Karpinsky from the Visual Studio team. You will see agents root-cause bugs using live runtime behavior, pinpoint performance bottlenecks, and build test coverage to catch regressions before they ship. If you work in enterprise C#, .NET, or C++, this is the one.

Make GitHub Copilot Work Your Way: Custom Tools, Context and Workflows (LAB502D) Self-paced lab, opens Tuesday, June 2, at 12:00 PM PT Build custom Copilot agents from scratch, create reusable Agent Skills, and connect to external services via MCP. Works across VS Code, Visual Studio, CLI, and Copilot coding agent. Complete it on your own schedule.

The full Build schedule, including everything streaming online for free, is at build.microsoft.com.

If something we announced today changes how you think about your day-to-day in Visual Studio, I want to hear about it.

The post What’s Coming Next in Visual Studio: Our Microsoft Build 2026 Announcements appeared first on Visual Studio Blog.

Read the whole story
alvinashcraft
2 hours ago
reply
Pennsylvania, USA
Share this story
Delete
Next Page of Stories