Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Seven Years Later, I’d Still Say This

1 Share

Seven years ago, I was in an interview when I was asked what sounded like a simple question: “What advice would you give someone just starting out as a researcher?”

I did not pause. I did not hedge. I just said it.

“Learn to code.”

I did not get the job.

I remember driving home afterward replaying the moment in my head, wondering if I had missed something obvious. Maybe they were looking for something more traditional. Something safer. Advice like building strong relationships, getting really good at storytelling, or sharpening your interviewing craft.

All of that is solid advice. It is still true.

But that is not what came out of my mouth. And it is a moment I have found myself thinking about more than once in the years since.

The Part I Was Starting to Notice

At that point in my career, I had already started noticing a pattern. Research would land well. People would nod. Sometimes roadmaps would even shift. From the outside, it looked like the work was having impact.

But the impact was episodic.

A few months later we would often find ourselves right back where we started. New priorities would emerge. New debates would begin. The same kinds of decisions would be made in the same ways. The deck mattered. The insights mattered. But the system shaping those decisions had not changed.

What I slowly realized was that we were influencing moments, not mechanisms. We could shape a conversation or a decision in the moment, but the machinery that produced decisions quarter after quarter remained untouched.

And machinery is what scales.

What I Really Meant

When I said “learn to code,” I did not mean everyone should become a software engineer. What I meant was: stop standing outside the system that drives decisions.

Learn how the product is instrumented. Understand where the data actually comes from and how metrics are defined, and sometimes quietly redefined. Pay attention to how evidence moves through your organization, and just as importantly, where it gets stuck.

When you understand those mechanics, you stop depending on someone else to translate reality for you. You can pull telemetry yourself. You can question how a metric is calculated. You can prototype a quick dashboard instead of waiting two quarters for one. You can design research that connects directly to how decisions are made.

That changes your leverage. You are no longer just delivering insights. You are shaping the system that determines whether those insights actually matter.

Why It Hits Harder Now

Seven years ago, that answer probably sounded a little off. Today, it feels almost obvious.

We are building AI systems that depend on structured data. Teams are thinking more intentionally about evidence maturity, instrumentation, and how products learn over time. At the same time, we are trying to connect qualitative nuance with behavioral signals in ways that can operate in near real time.

In that environment, if you do not understand how the system works, it becomes much harder to meaningfully shape it. The researchers who will define the next phase of this field will not just be great interviewers or facilitators. They will understand how the product learns, how data flows through the system, and how insight becomes part of how decisions are actually made.

If I Were Asked Again

If someone asked me that question today, I might phrase the answer a little differently. I might talk about developing technical fluency, understanding how your work becomes operational, and learning enough about the system that you can actually influence it.

But the core idea would be the same. If you want system-level impact, you need system-level literacy. You do not need to become an engineer, but you do need to understand how things are built and how decisions move through the system.

Otherwise, you are limited to influencing one decision at a time. When you understand the system, you have the chance to influence the way decisions get made.

Seven years later, I would still say this.

And if you give an answer in an interview that costs you something, but you still believe it afterward, that is probably worth paying attention to. The roles you do not get are not always verdicts. Sometimes they are just redirections.

I proudly used AI to help shape and polish this post — not to replace my voice, but to strengthen it. Sometimes it can be a struggle to find the right words, and AI gives me the space and support to bring my real voice forward with clarity and confidence.


Seven Years Later, I’d Still Say This was originally published in UXR @ Microsoft on Medium, where people are continuing the conversation by highlighting and responding to this story.

Read the whole story
alvinashcraft
28 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

How RockBot Learns New Skills


RockBot Skills

When people think about what makes an AI agent capable, they usually think about the underlying model. Bigger model, smarter agent. But in practice, a large chunk of an agent’s usefulness comes from something much simpler: knowing how to do things in your specific environment.

A general-purpose LLM knows that email exists. It does not know that your organization routes all support requests through a specific label, that the MCP server you’re using has a quirk where threading works differently than expected, or that you’ve learned the hard way never to reply-all on a certain type of message. That kind of procedural, context-specific knowledge has to be built up over time — and it has to be surfaced at the right moment.

That’s the problem RockBot skills are designed to solve.

What Skills Actually Are

In RockBot, a skill is just a markdown file. It has a name, some content describing how to do something, and a short auto-generated summary. Nothing exotic.

What makes skills useful is what they represent: distilled procedural knowledge. Not general facts, but specific instructions for how to accomplish tasks in a given environment. A skill might describe how to schedule a meeting across multiple calendars, how to structure a research delegation task, or how to handle a particular edge case when using an MCP server. Skills are the difference between an agent that knows email exists and one that actually knows how to handle your email.

Skills are stored on disk, organized by category, and version-controlled alongside the rest of the agent configuration. This means they are auditable, shareable, and recoverable. If an agent learns something wrong, you can correct it directly. If you want to pre-populate an agent with knowledge about your systems before it starts learning on its own, you can do that too.

Why Skills Matter More Than You’d Expect

Most AI agent frameworks focus on tools: give the agent access to APIs, let it call them. Tools are necessary but not sufficient.

Tools tell the agent what actions are possible. Skills tell the agent how to use those actions well. And in real-world usage, the gap between those two things is enormous.

When you first give an agent access to a new MCP server — say, one that connects to your project management system — it can read the tool descriptions and probably muddle through. But it will make mistakes. It will try operations in the wrong order, misinterpret what certain fields mean, or miss subtle constraints that aren’t obvious from the schema. Over time, through interaction, it should learn. The question is whether that learning sticks.

Without something like skills, it doesn’t. Every session starts fresh. The agent makes the same mistakes it made last week, because it has no memory of having made them. Skills close that loop: when an agent learns something worth keeping, it writes a skill. The next time it needs to do something similar, it retrieves the relevant skill and starts from a better baseline.

Closed Feedback Loops

There’s a concept in control systems called a closed feedback loop: the output of a system feeds back into the system itself to correct and improve future behavior. An open loop system, by contrast, has no such correction mechanism — it just runs, regardless of how well or poorly it’s doing.

Most AI agent systems today are open loop. The agent does things. If it does them badly, you correct it in the conversation. But that correction evaporates at the end of the session. The next conversation starts from zero.

RockBot’s skill system is a mechanism for closing that loop. Feedback from users — both explicit (thumbs up or down on a response) and implicit (corrections mid-conversation) — feeds back into the agent’s skill set. The agent doesn’t just do things; it learns from doing them, in a way that persists.

This matters a lot in practice. The first time an agent handles a complex multi-step workflow, it will probably be clumsy. With a closed feedback loop, each subsequent attempt benefits from what was learned before. Without one, you’re training the same session from scratch, every time.

Pulling in the Right Skills at the Right Time

Having skills stored on disk is only useful if the agent retrieves the right ones at the right time. You can’t just dump every skill into the context window on every turn — that would be expensive, noisy, and would push out other relevant information.

RockBot uses two mechanisms to handle this.

Session-start injection. At the beginning of each session, the agent receives a structured index of all available skills: names, auto-generated one-line summaries, ages, and last-used timestamps. This is injected once per session, not on every turn. The agent now knows what skills exist without having to load all their content.

BM25 recall on each turn. When a user message arrives, RockBot runs a BM25 keyword search against the skill store to find the most relevant skills for what’s being discussed. BM25 is a well-understood retrieval algorithm — the same family of techniques behind many document search systems — that scores skills by how closely their content matches the current query.

Skills that surface through this search are injected into the context for that turn. But here’s the key detail: once a skill has been injected in a session, it won’t be injected again. This “delta injection” approach means the agent is always getting new information rather than repeatedly loading the same skills. As the conversation shifts topics, different skills surface naturally.
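The per-turn recall described above, BM25 scoring over the skill store plus delta injection, can be sketched in a few lines. This is a toy reimplementation using standard BM25 parameters (k1 = 1.5, b = 0.75); the SkillStore class, tokenizer, and sample skills are invented for illustration and are not RockBot's actual types.

```python
import math
from collections import Counter

K1, B = 1.5, 0.75  # standard BM25 parameters

def tokenize(text):
    return text.lower().split()

class SkillStore:
    def __init__(self, skills):
        # skills: dict of skill name -> markdown content
        self.skills = skills
        self.docs = {n: tokenize(c) for n, c in skills.items()}
        self.avgdl = sum(len(d) for d in self.docs.values()) / len(self.docs)
        self.df = Counter()  # document frequency per term
        for d in self.docs.values():
            self.df.update(set(d))

    def score(self, name, query_terms):
        doc, n = self.docs[name], len(self.docs)
        tf, s = Counter(doc), 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - self.df[t] + 0.5) / (self.df[t] + 0.5))
            norm = tf[t] + K1 * (1 - B + B * len(doc) / self.avgdl)
            s += idf * tf[t] * (K1 + 1) / norm
        return s

    def recall(self, query, injected, top_k=2):
        terms = tokenize(query)
        ranked = sorted(self.skills, key=lambda n: self.score(n, terms), reverse=True)
        # Delta injection: skip anything already injected in this session.
        fresh = [n for n in ranked
                 if n not in injected and self.score(n, terms) > 0][:top_k]
        injected.update(fresh)
        return fresh

store = SkillStore({
    "email-triage": "how to label and route support email threads",
    "calendar-sync": "how to schedule a meeting across multiple calendars",
})
seen = set()
print(store.recall("schedule a meeting", seen))   # ['calendar-sync']
print(store.recall("schedule a meeting", seen))   # [] -- already injected this session
```

The second call returns nothing even though the skill still matches, which is exactly the point: within a session, the agent only ever pays the token cost for a skill once.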

Skills can also cross-reference each other via seeAlso references. When one skill is retrieved, its related skills become candidates for retrieval too. This enables a kind of serendipitous discovery — the agent might not have searched for a particular skill, but because it’s related to something it did search for, it surfaces and becomes available.

The result is a system where the agent has awareness of everything it knows (via the index) and efficient access to what’s relevant right now (via BM25 recall and delta injection), without paying the token cost of loading everything upfront.

How Skills Are Created

Skills are created by the agent itself, using the SaveSkill tool. When the agent encounters a workflow it expects to repeat, or learns something specific about an environment or integration, it writes a skill.

After saving, a background task uses the LLM to generate a concise one-line summary — fifteen words maximum — describing what the skill covers and when to use it. This summary is what appears in the skill index at session start. The agent sees it and can make a quick judgment about whether to retrieve the full skill content.

The agent can also update existing skills as its understanding improves, and delete skills that are no longer accurate or relevant. Skills are living documents, not static ones.
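Since a skill is just a markdown file on disk, saving one is mostly filesystem plumbing. The sketch below assumes a hypothetical layout of skills/&lt;category&gt;/&lt;name&gt;.md with a small front-matter header; the field names and layout are illustrative, not RockBot's actual schema.

```python
from pathlib import Path
from datetime import datetime, timezone
import tempfile

def save_skill(root, category, name, content, summary=""):
    """Write a skill as a markdown file with minimal front matter."""
    path = Path(root) / category / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    header = (
        "---\n"
        f"name: {name}\n"
        f"summary: {summary}\n"
        f"savedAt: {datetime.now(timezone.utc).isoformat()}\n"
        "---\n"
    )
    path.write_text(header + content, encoding="utf-8")
    return path

root = tempfile.mkdtemp()
p = save_skill(root, "email", "email-triage",
               "Route support requests to the 'support' label before replying.",
               summary="How to label and route support email.")
print(p.read_text().splitlines()[1])   # name: email-triage
```

Because the result is plain files under version control, "correcting a skill" is an ordinary edit and "pre-populating an agent" is an ordinary commit.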

How Skills Improve Over Time

Creating skills is the easy part. Keeping them accurate and useful over time is harder.

RockBot handles this through feedback-driven background processing.

Explicit feedback. The chat UI supports thumbs up and thumbs down on agent responses. Positive feedback reinforces the pattern — a note is appended to conversation history signaling that the approach was well-received. Negative feedback triggers something more significant: the agent re-evaluates its response with full access to its tool set, including skills, memory, and MCP integrations. It can consult existing skills, update them if they led it astray, or create new ones capturing what it should have done differently. Both types of feedback are recorded in a feedback store for later analysis.

Anti-pattern mining. The Dream Service — a background process that runs periodically when the agent is idle — scans accumulated correction feedback for failure patterns. When it finds them, it creates anti-patterns/{domain} memory entries that surface as constraints. “Don’t do X because of Y; instead do Z.” These anti-patterns are retrieved via the same BM25 mechanism as skills, so the agent sees them when it’s about to do something it has been corrected on before.

Skill consolidation. The Dream Service also performs ongoing maintenance of the skill set itself. It looks for overlapping skills and merges them, prunes stale skills that haven’t been used in a long time, detects clusters of related skills that suggest an abstract parent skill would be useful, and improves structurally sparse skills — ones that are too short to be genuinely useful. This consolidation happens automatically, without requiring explicit user action.

Usage tracking. Every time a skill is retrieved via GetSkill, its LastUsedAt timestamp is updated. This gives the Dream Service the signal it needs for staleness detection: a skill that hasn’t been used in months is a candidate for pruning, especially if its content is thin. Skills that are frequently retrieved are treated as valuable and are candidates for optimization rather than pruning.
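The staleness heuristic the Dream Service applies can be approximated as: prune a skill when it is both long-unused and thin. The thresholds below (90 days, 200 characters) are invented for illustration; the actual values are a tuning decision.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)   # assumed threshold
THIN_CHARS = 200                   # assumed threshold

def prune_candidates(skills, now):
    """skills: list of dicts with 'name', 'last_used_at', 'content'.
    Returns names that are both stale and structurally thin."""
    out = []
    for s in skills:
        stale = now - s["last_used_at"] > STALE_AFTER
        thin = len(s["content"]) < THIN_CHARS
        if stale and thin:
            out.append(s["name"])
    return out

now = datetime(2026, 3, 1, tzinfo=timezone.utc)
skills = [
    {"name": "old-and-thin", "last_used_at": now - timedelta(days=200), "content": "short"},
    {"name": "old-but-rich", "last_used_at": now - timedelta(days=200), "content": "x" * 500},
    {"name": "fresh",        "last_used_at": now - timedelta(days=2),   "content": "short"},
]
print(prune_candidates(skills, now))   # ['old-and-thin']
```

Note that "old-but-rich" survives: a substantial skill that merely went quiet is a candidate for optimization, not deletion.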

The Effect Over Time

What you end up with is an agent that gets meaningfully better at its job as you use it. Not in a vague, hard-to-measure way, but concretely: specific workflows become more reliable, edge cases that caused problems are handled correctly, and the agent stops making the same class of mistakes it was corrected on before.

This is what it means to close the feedback loop. The agent’s behavior isn’t determined solely by the LLM’s general capabilities — it’s shaped by an accumulated layer of specific, contextual knowledge that grows and refines itself over time.

The first week with a new agent, you’re correcting a lot. A month in, you’re correcting much less. The skills system is the mechanism that makes that trajectory possible.

If you’re interested in seeing how this works in practice, the full source is at https://github.com/MarimerLLC/rockbot. The skill-related code lives in RockBot.Skills, with the agent-side handling in RockBot.Agent. It’s all open source under the MIT license.


Optimizing Multi-Agent AI Systems With Couchbase


In a previous post, Building Multi-Agent AI Workflows With Couchbase Capella AI Services, we explored how collaborative AI agents can be designed and orchestrated using Capella AI Services, Vector Search, and RAG patterns.

As AI systems move from experimentation into production, the next step is not just building agents, but learning how to operate them responsibly at scale.

Running production-grade multi-agent systems means they need to be: 

  • Reliable
  • Observable
  • Predictable
  • Economically sustainable

Multi-agent systems require more than coordination logic; they require structured architectural foundations.

Agent Catalog: Establishing a Control Plane for Autonomy

In production environments, agents cannot remain implicit pieces of application logic. They must be treated as governed, versioned, auditable assets.

Capella AI enables structured Agent Catalog integration, allowing teams to define each agent in terms of:

  • Agent definition
  • Model configuration
  • Tool integration
  • Deployment configuration
  • Runtime parameters

This transforms autonomy from something opaque into something intentional.

The Agent Catalog becomes the control plane of the system. It defines deployment and capability boundaries. It clarifies ownership. It makes capabilities explicit. And it enables controlled evolution as agents change over time.

Episodic Memory: Reasoning at Scale

As agents operate, they accumulate decisions: inputs, retrieved knowledge, outputs, confidence scores, and outcomes. These events form the lived history of the system.

But episodic memory is not traditional logging.

Traditional application logic relies on identifiers and deterministic queries. Episodic reasoning, however, requires similarity-based retrieval.

For this reason, episodic memory must support similarity-based retrieval rather than simple identifier lookups. Using Capella Vector Search, each interaction can be embedded and stored as a searchable artifact. This allows agents to retrieve prior situations that are contextually similar, not just structurally related.

This enables:

  • Precedent-based reasoning
  • Consistent decision patterns
  • Improved explainability
  • Reduced behavioral randomness

In production systems, this continuity matters. Decisions are grounded in prior experience, not generated in isolation.

Episodic memory becomes part of behavioral governance.
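The shape of similarity-based retrieval can be shown with a toy nearest-neighbor lookup. A real deployment would use Capella Vector Search with learned embeddings over real interaction records; the three-dimensional vectors and field names below are invented purely to illustrate the mechanism.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Each stored episode pairs an embedding with the decision it led to.
episodes = [
    {"id": "ep-1", "vector": [0.9, 0.1, 0.0], "outcome": "granted 15% bonus"},
    {"id": "ep-2", "vector": [0.0, 0.2, 0.9], "outcome": "escalated to review"},
]

def most_similar(query_vector, store):
    """Retrieve the episode whose embedding is closest to the query."""
    return max(store, key=lambda e: cosine(query_vector, e["vector"]))

best = most_similar([0.8, 0.2, 0.1], episodes)
print(best["id"], best["outcome"])   # ep-1 granted 15% bonus
```

The contrast with a deterministic lookup is the point: the query vector matches no stored identifier exactly, yet the contextually nearest precedent still surfaces.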

Semantic Memory: Policy and Knowledge Grounding

If episodic memory answers “What happened before?”, semantic memory answers “What is allowed?”.

Enterprise AI systems rely on approved knowledge:

  • Corporate policies
  • Regulatory constraints
  • Product documentation
  • Compliance rules
  • Operational guidelines

Through semantic search, agents retrieve and ground their reasoning in enterprise-approved knowledge. This layer is conceptually different from episodic memory. It does not provide precedent. It provides alignment.

Semantic memory ensures that autonomous decisions remain within defined business, regulatory, and operational boundaries. It is the normative layer of the system.

Observational Memory: Turning Autonomy Into Measurable Behavior

Autonomous systems without observability are operational risks.

Observational memory captures structured behavioral telemetry across agents, including:

  • Agent-to-agent delegation
  • Tool and API usage
  • Model invocation metadata such as model version, token usage, latency, cache utilization signals, and retrieval references
  • Error rates

Observational memory transforms distributed autonomous behavior into measurable system activity. Capella AI Services provides tracing capabilities, including Agent Tracer, that make these execution paths visible and inspectable in real time. 

It allows organizations to reconstruct decisions, analyze behavior, and build confidence in systems that act independently.

Analytical Governance: From Interactions to Patterns

Individual interactions rarely reveal structural inefficiencies.

Patterns emerge when behavior is analyzed across thousands or millions of sessions.

With Capella Analytics, organizations can perform large-scale aggregations on operational telemetry without impacting transactional workloads. This enables:

  • Drift detection
  • Retrieval efficiency analysis
  • Token consumption forecasting
  • Autonomy risk scoring
  • Context-shift pattern identification

Governance operates at the level of patterns, not individual events.

At this stage, memory itself becomes subject to refinement:

  • Retrieval filters can be tightened
  • Episodic segmentation strategies can be improved
  • Low-impact interactions can be deprioritized
  • Cost-heavy patterns can be optimized

When these structural insights require systemic adjustment, they can be written back into operational configurations in a controlled manner.

Memory evolves based on evidence.

Active Governance: Closing the Loop

Observation without enforcement is incomplete.

Using Capella Eventing, governance policies can respond dynamically to behavioral signals:

  • Adjusting autonomy thresholds
  • Applying memory decay strategies
  • Triggering escalation to human oversight
  • Throttling high-cost patterns
  • Limiting risk exposure

Runtime governance can also incorporate model-level safeguards such as guardrails, output filtering, and deployment-time policy constraints defined within Capella AI Services.

These mechanisms create a continuous feedback loop:

Observe → Analyze → Enforce → Adapt

Multi-agent systems do not simply act. They adapt within defined boundaries. Governance becomes dynamic rather than static.

A Real-World Scenario: Multi-Agent in Online Gaming

Consider a large-scale multiplayer strategy game with a dynamic in-game economy.

The AI system includes:

  • Session Agent that orchestrates player interactions
  • Reward Agent that calculates loot and bonuses
  • Economy Agent that monitors inflation and balance
  • Moderation Agent that detects anomalous behavior

Each agent is registered in the Agent Catalog with defined autonomy, tool access, and memory scope.

Step 1: A High-Level Raid Completion

A player completes a high-difficulty raid.

Before assigning rewards, the Reward Agent queries episodic memory. It retrieves prior sessions with similar characteristics:

  • Comparable player level
  • Similar completion time
  • Equivalent raid difficulty
  • Previously granted 15% bonus

The similarity score is high.

Rather than inventing a reward, the agent reasons from precedent.

Step 2: Policy Grounding via Semantic Memory

Before finalizing the 15% bonus, the agent retrieves economy policies:

  • Maximum reward multiplier without review is 20%
  • Inflation threshold limits
  • Anti-exploitation safeguards

The agent verifies that the proposed reward aligns with macroeconomic constraints.

Precedent does not override policy.
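The decision logic of Steps 1 and 2 can be sketched as precedent-first, policy-clamped. The 20% review cap comes from the scenario above; the function shape and return fields are invented for illustration.

```python
POLICY = {"max_multiplier_without_review": 0.20}  # from semantic memory

def decide_reward(precedent_bonus, policy=POLICY):
    """Start from the bonus suggested by the most similar episode,
    then enforce the policy ceiling retrieved from semantic memory."""
    cap = policy["max_multiplier_without_review"]
    if precedent_bonus <= cap:
        return {"bonus": precedent_bonus, "needs_review": False}
    # Precedent exceeds policy: clamp the reward and flag for human review.
    return {"bonus": cap, "needs_review": True}

print(decide_reward(0.15))   # {'bonus': 0.15, 'needs_review': False}
print(decide_reward(0.25))   # {'bonus': 0.2, 'needs_review': True}
```

The 15% precedent passes untouched; a hypothetical 25% precedent is clamped, which is what "precedent does not override policy" means operationally.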

Step 3: Observational Capture

The full decision trace is stored as structured telemetry within Capella:

  • Similar episode ID
  • Similarity score
  • Policy documents referenced
  • Token usage
  • Latency
  • Final reward decision
  • Raid map identifier
  • Player progression tier
  • Current global currency index

This structured persistence ensures that decisions can be reconstructed, segmented, and analyzed across millions of sessions. It also provides the contextual metadata necessary for later optimization, segmentation, and structural adjustments.

Autonomy becomes auditable and optimizable.

Step 4: Analytical Governance

After millions of matches, Capella Analytics reveals:

  • Certain raid maps generate 23% higher currency output
  • Context shifts from gameplay to trading correlate with token spikes
  • Specific reward patterns cluster around exploit-prone scenarios

These insights are not visible at the level of a single session. They emerge through aggregated analysis.

Memory segmentation strategies are refined. Retrieval precision improves. Reward for specific raid maps can be recalibrated through controlled writeback. Inflation stabilizes.

Step 5: Adaptive Enforcement

If the in-game economy crosses predefined inflation thresholds:

  • Reward multipliers are automatically adjusted
  • Reward Agent autonomy is temporarily reduced
  • Manual review is triggered for extreme cases

These safeguards are enforced in real time through event-driven logic.

The system adapts to protect long-term balance while continuing to learn from accumulated evidence.
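The event-driven safeguards in Step 5 amount to a handler that reacts to telemetry crossing a threshold. In Capella this would be an Eventing function; the plain-Python sketch below uses invented field names and thresholds to show the control flow only.

```python
INFLATION_THRESHOLD = 1.10   # assumed: 10% above the target currency index

def on_economy_event(event, state):
    """React to an inflation signal by tightening agent autonomy."""
    if event["currency_index"] > INFLATION_THRESHOLD:
        state["reward_multiplier"] *= 0.9          # automatic adjustment
        state["reward_agent_autonomy"] = "reduced"  # temporary reduction
        if event["currency_index"] > 1.5 * INFLATION_THRESHOLD:
            state["manual_review"] = True           # extreme case: escalate
    return state

state = {"reward_multiplier": 1.0,
         "reward_agent_autonomy": "full",
         "manual_review": False}
state = on_economy_event({"currency_index": 1.25}, state)
print(state["reward_agent_autonomy"], round(state["reward_multiplier"], 2))  # reduced 0.9
```

An index of 1.25 trips the first two safeguards but not the manual-review escalation, mirroring the graduated response described above.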

From Building Agents to Operating Intelligent Systems

Multi-agent architectures introduce new layers of complexity. Episodic reasoning, semantic grounding, behavioral telemetry, analytical insight, and adaptive enforcement are not optional enhancements. They are essential architectural components in production AI systems.

Each of these layers requires different technical capabilities and performance characteristics.

When treated as separate systems, complexity increases and operational efficiency becomes harder to maintain.

Cost-efficiency and execution stability are not achieved through isolated optimizations. They emerge from consolidation. Repeated reasoning patterns can be handled efficiently. Retrieval remains consistent at scale. Analytical workloads remain isolated from transactional flows.

As AI systems mature, the ability to support diverse reasoning patterns and workload characteristics within the same platform becomes essential.

Capella accelerates innovation within a unified operational data platform for AI. Organizations reduce architectural sprawl, minimize synchronization complexity, and maintain predictable performance characteristics. No more plugging holes. Entire stacks are replaced with a single AI-ready engine built for speed and flexibility.

Capella is already designed to meet these demands, enabling organizations to extend existing architectures into AI-driven systems without introducing unnecessary fragmentation.

The post Optimizing Multi-Agent AI Systems With Couchbase appeared first on The Couchbase Blog.


Call For Papers Listings for 3/6


A collection of upcoming CFPs (call for papers) from across the internet and around the world.

The post Call For Papers Listings for 3/6 appeared first on Leon Adato.


1.0.2


2026-03-06

To commemorate GitHub Copilot CLI reaching general availability last week, we're incrementing the major version to 1.0!

  • Type 'exit' as a bare command to close the CLI
  • Ask_user form now submits with Enter key and allows custom responses in enum fields
  • Support 'command' field as cross-platform alias for bash/powershell in hook configs
  • Hook configurations now accept timeout as alias for timeoutSec
  • Fix handling of meta with control keys (including shift+enter from /terminal-setup)

What’s new in Microsoft Foundry | February 2026

1 Share

TL;DR

  • Claude Opus 4.6 + Sonnet 4.6: Anthropic’s frontier models arrive in Foundry with 1M-token context (beta), adaptive thinking, and context compaction — Opus for deep reasoning, Sonnet for cost-efficient scale.
  • GPT-Realtime-1.5 & GPT-Audio-1.5: Next-gen audio models with +7% instruction following, better multilingual support, and +10% alphanumeric transcription accuracy.
  • Grok 4.0 (GA) + Grok 4.1 Fast (Preview): xAI’s reasoning model graduates to GA; the new Fast variant lands at $0.20/M input tokens for high-throughput non-reasoning workloads.
  • FLUX.2 Flex: Text-heavy image generation purpose-built for UI prototyping and typography at $0.05/megapixel.
  • Microsoft Agent Framework (RC): 1.0.0rc1 for Python — API surface locked. Major breaking changes in credentials, sessions, and response patterns. Migration guide published.
  • Durable Agent Orchestration: New HITL pattern pairs Azure Durable Functions with Agent Framework and SignalR for agents that survive restarts and wait days for human approval.
  • Foundry Local — Sovereign Cloud: Large multimodal models now run fully disconnected on local hardware with APIs that mirror the cloud surface.
  • AI Toolkit for VS Code v0.30.0: Tool Catalog, Agent Inspector with F5 debugging, and a redesigned Agent Builder.
  • REST API v1 (GA): The core Foundry REST surface is now production-ready. SDKs across all languages are building on it in pre-release — GA announcements imminent.
  • SDK releases across all languages: Python (2.0.0b4), .NET (2.0.0-beta.1), JS/TS (2.0.0-beta.4), and Java (2.0.0-beta.1) all shipped new betas targeting the GA v1 REST surface with significant breaking changes — tool class renames, credential updates, and preview feature opt-in flags.

Join the community

Connect with 25,000+ developers on Discord, ask questions in GitHub Discussions, or subscribe via RSS to get this digest monthly.


Models

Claude Opus 4.6

Claude Opus 4.6 — Anthropic’s most capable reasoning model — is now available in Microsoft Foundry as a first-party deployment. If you need a model that can hold an entire codebase in context and reason over it end-to-end, this is it.

  • 1 million token context window (beta) — roughly 2,000 pages of documentation or an entire medium-sized repository in a single prompt
  • 128K max output tokens — complete code refactors, full-length analyses, and comprehensive documentation in one response
  • Adaptive thinking — the model dynamically decides how much reasoning a task needs; you control the floor with four effort levels (low, medium, high, max)
  • Context compaction (beta) — in long-running agentic sessions, older context is automatically summarized so agents don’t hit the wall mid-workflow
  • Pricing: $5 / $25 per million tokens (standard); premium tier applies beyond 200K input tokens ($10 / $37.50)

Available via serverless and managed compute deployments across all Azure regions.
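The two-tier pricing above is easy to get wrong in capacity planning, so here is the arithmetic as a small function. One assumption is labeled explicitly: whether the premium rate applies to the whole request or only to the overage beyond 200K input tokens is a billing detail the announcement does not spell out; this sketch assumes the premium rate applies to the entire request once the threshold is crossed.

```python
def opus_cost(input_tokens, output_tokens):
    """Estimated USD cost for one Claude Opus 4.6 request.
    Rates: $5/$25 per million tokens standard; $10/$37.50 premium.
    ASSUMPTION: premium rate covers the whole request past 200K input."""
    if input_tokens > 200_000:
        in_rate, out_rate = 10.00, 37.50   # premium tier
    else:
        in_rate, out_rate = 5.00, 25.00    # standard tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(f"${opus_cost(100_000, 8_000):.2f}")   # $0.70
print(f"${opus_cost(500_000, 8_000):.2f}")   # $5.30
```

Even under this assumption, the takeaway holds: pushing past the 200K input threshold roughly doubles the marginal rate, so long-context requests deserve deliberate budgeting.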

Claude Sonnet 4.6

One week after Opus, Claude Sonnet 4.6 landed in Microsoft Foundry — nearly the same intelligence tier at a substantially lower price point. If Opus is the model you reach for when accuracy is everything, Sonnet is the one you deploy when you need that quality at scale.

  • Same 1M-token context (beta) and 128K output as Opus 4.6
  • Same adaptive thinking and context compaction capabilities
  • Optimized for coding, agentic workflows, and professional content at high throughput
  • Designed for teams prioritizing cost-per-token without sacrificing frontier performance

GPT-Realtime-1.5 & GPT-Audio-1.5

Two new audio models shipped on February 23, targeting the real-time voice and audio processing stack. If you’re building voice agents, IVR deflection, or live transcription pipelines, these are meaningful upgrades.

| Model | Deployment ID | Key Improvements |
| --- | --- | --- |
| GPT-Realtime-1.5 | gpt-realtime-1.5-2026-02-23 | +7% instruction following, improved multilingual support, better tool calling, +5% reasoning (Big Bench Audio) |
| GPT-Audio-1.5 | gpt-audio-1.5-2026-02-23 | +10.23% alphanumeric transcription accuracy, improved multilingual support |

Both maintain low-latency real-time interactions via chat completion APIs. Drop-in replacements for their predecessors — no API surface changes required.

Grok 4.0 (GA) & Grok 4.1 Fast (Preview)

Grok 4.0 from xAI graduated to general availability on February 27 — the first xAI model to reach GA in Microsoft Foundry (it entered preview back in September 2025). Alongside it, Grok 4.1 Fast arrived in public preview as a high-throughput, non-reasoning variant.

| Model | Status | Pricing (per M tokens) | Best For |
| --- | --- | --- | --- |
| Grok 4.0 | GA | $5.50 input / $27.50 output | Complex reasoning, multi-step analysis |
| Grok 4.1 Fast (non-reasoning) | Preview | $0.20 input / $0.50 output | High-throughput classification, extraction, routing |

The Grok 4.1 reasoning variant is coming soon. Both are available via serverless or provisioned throughput deployments.

FLUX.2 Flex

FLUX.2 Flex from Black Forest Labs is purpose-built for text-heavy design work — UI prototyping, infographics, typography, and marketing assets where text rendering fidelity matters. It complements the FLUX.2 [pro] model (previewed in December) with a focus on getting text right in generated images.

  • Adjustable inference steps for speed/quality tradeoff
  • Pricing: $0.05 per megapixel (Global Standard)
  • Strong multi-prompt adherence for complex, text-rich compositions

Model Router — GPT-5 Series Support

The Model Router now supports GPT-5 series models. Deploy the router as a single endpoint and it automatically selects the best underlying chat model based on your prompt characteristics — choose Balanced, Cost, or Quality mode to control the optimization axis. No application-level routing logic required.


Agents

Microsoft Agent Framework Reaches Release Candidate

The Microsoft Agent Framework hit 1.0.0rc1 for Python on February 19 — the API surface is locked, and GA is around the corner. This is the biggest developer-facing milestone of the month. If you’ve been tracking the beta releases, now is the time to migrate.

What’s new in the RC:

  • AgentFunctionApp — host agents on Azure Functions with automatic endpoints for workflow runs, status checks, and HITL responses
  • Fan-out/fan-in orchestration with shared state and configurable timeouts
  • BaseAgent implementations for Claude and GitHub Copilot SDKs — use Anthropic or GitHub models as first-class agent providers
  • Simplified AG-UI run method and Anthropic structured outputs via response_format

Breaking changes — this RC ships significant renames and API surface changes. Here’s what you need to update:

| Area | Before | After |
|---|---|---|
| Credentials | ad_token, ad_token_provider, get_entra_auth_token | Single credential parameter (Azure Identity) |
| Sessions | AgentThread, get_new_thread() | AgentSession, create_session() / get_session() |
| Context | Multiple context_providers | Single context_provider per Agent; middleware as list |
| Responses | text= on ChatResponse/AgentResponse | messages=; updates use contents=[Content.from_text(...)] |
| Exceptions | ServiceException, ServiceResponseException | AgentException, AgentInvalidResponseException |
| Factory | create_agent | as_agent |

Action: Pin agent-framework-core==1.0.0rc1 and agent-framework-azure-ai==1.0.0rc1. Follow the migration guide to update your credential handling and session management code before GA lands.

Durable Agent Orchestration — Human-in-the-Loop

A new architectural pattern dropped on February 26: Durable Agent Orchestration — pairing Azure Durable Functions with the Microsoft Agent Framework and SignalR to build agents that can pause indefinitely for human approval.

The core idea: your agent does the heavy lifting (analyze logs, draft a remediation plan, prepare infrastructure changes), then calls wait_for_external_event and halts until a human approves. The durable orchestration survives process restarts, can wait for days, and picks up exactly where it left off.

  • One-line pause: wait_for_external_event in your orchestrator function
  • Real-time UX via SignalR streaming — humans see results as they’re generated
  • Use cases: incident response, infrastructure provisioning, document review workflows
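The pause/resume mechanics can be simulated without the Durable Functions runtime. In a real orchestrator you would `yield context.wait_for_external_event("Approval")`; the self-contained sketch below uses a plain generator and a hypothetical `WaitForEvent` marker to show how an orchestration parks on an event and resumes with the human's response:

```python
class WaitForEvent:
    """Marker yielded by the orchestrator to signal 'pause here'."""
    def __init__(self, name: str):
        self.name = name

def orchestrator():
    plan = "restart unhealthy pods"            # agent drafts a remediation plan
    approval = yield WaitForEvent("Approval")  # pause until a human responds
    return f"{plan}: {'applied' if approval else 'abandoned'}"

def run_until_event(gen):
    """Advance the orchestration to its first external-event checkpoint."""
    return gen, next(gen)

def resume_with(gen, payload):
    """Deliver the event payload and run the orchestration to completion."""
    try:
        gen.send(payload)
    except StopIteration as done:
        return done.value

gen, waiting = run_until_event(orchestrator())
assert waiting.name == "Approval"  # orchestration is parked here
result = resume_with(gen, True)    # human clicks "approve"
# result == "restart unhealthy pods: applied"
```

The real runtime adds what the toy version can't: the parked state is checkpointed to storage, so the wait survives process restarts and can last for days.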

Platform

Foundry Local — Large Model Support for Sovereign Cloud

Foundry Local now supports large multimodal AI models in fully disconnected, sovereign environments. Announced February 24 as part of Microsoft’s broader Sovereign Cloud expansion, this is a significant capability jump — previously limited to smaller language models, Foundry Local can now run advanced multimodal models (text, image, audio) on local NVIDIA GPU hardware with zero cloud connectivity required.

  • APIs mirror the cloud surface: Responses API, function calling, agent services — same code, different runtime
  • Part of a unified sovereign stack alongside Azure Local (infrastructure) and Microsoft 365 Local (productivity)
  • Targets government, defense, finance, healthcare, and telecom organizations with strict data sovereignty requirements

AI Toolkit for VS Code v0.30.0

The AI Toolkit for VS Code shipped a major update — v0.30.0 — focused on making agent development more discoverable and debuggable:

  • Tool Catalog: A centralized hub to discover, configure, and integrate tools into your agents — no more hunting through docs to find what’s available
  • Agent Inspector: End-to-end debugging with F5 breakpoints, variable inspection, step-through execution, and real-time streaming visualization
  • Agent Builder redesign: Quick switcher between agents, Foundry prompt agent support, Tool Catalog integration, and an “Inspire Me” feature for generating agent instructions
  • Model Catalog: OpenAI Response API models (including gpt-5.2-codex) now appear in the catalog with improved reliability
  • Build Agent with GitHub Copilot: Generate entire agent workflows from natural language prompts

REST API v1 (GA)

The Foundry REST API v1 is now generally available. The core endpoints that everything else builds on — chat completions, responses, embeddings, files, fine-tuning, models, and vector stores — are production-ready and carry GA SLAs.

GA endpoints:

| Endpoint | Status |
|---|---|
| /openai/v1/chat/completions | GA |
| /openai/v1/responses | GA |
| /openai/v1/embeddings | GA |
| /openai/v1/files | GA |
| /openai/v1/fine_tuning/ | GA |
| /openai/v1/models | GA |
| /openai/v1/vector_stores | GA |

Still in preview: /openai/v1/evals and fine-tuning alpha graders.

Why this matters: every Foundry SDK across Python, .NET, JS/TS, and Java is building its pre-release versions on top of this GA REST surface. The REST API going GA is the prerequisite for the SDK GA announcements that are now imminent. If you need production stability today and can’t wait for the SDK GA, you can target the v1 REST API directly — the contract is locked.
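Targeting the REST surface directly is just an HTTP call. A sketch assembling such a call — the resource name, key, and `*.openai.azure.com` host pattern are placeholders; substitute your own project endpoint:

```python
def build_chat_completions_call(resource: str, api_key: str,
                                deployment: str, prompt: str):
    """Assemble URL, headers, and body for the GA /openai/v1/chat/completions
    endpoint. Returns the pieces; send with any HTTP client (requests, httpx)."""
    url = f"https://{resource}.openai.azure.com/openai/v1/chat/completions"
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    body = {
        "model": deployment,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, body

url, headers, body = build_chat_completions_call(
    "my-foundry-resource", "<key>", "gpt-5", "Hello")
```

Because the v1 contract is locked, code written against this path won't churn when the SDKs go GA.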


SDK & Language Changelog (February 2026)

The big picture: the Foundry REST API v1 went GA this month (see above). Every language SDK is now building its pre-release on that stable REST surface. The SDKs are converging on a unified azure-ai-projects package per language — agents, inference, evaluations, and memory all live under one roof. GA SDK announcements are imminent; here’s where each language stands today.

REST

The v1 REST surface is the foundation everything else builds on. Core endpoints (/openai/v1/chat/completions, /openai/v1/responses, /openai/v1/embeddings, /openai/v1/files, /openai/v1/fine_tuning/, /openai/v1/models, /openai/v1/vector_stores) are GA. Preview endpoints (/openai/v1/evals, fine-tuning alpha graders) remain in preview.

Action: If you need production SLAs today and your SDK is still pre-release, target the v1 REST API directly. The contract is locked.

API Lifecycle · REST Reference

Python

Foundry SDK (azure-ai-projects 2.0.0b4, Feb 24)

The fourth beta in the v2 consolidation line — and the first release explicitly targeting the GA v1 REST APIs. Python SDK GA is next.

Features:

  • Unified AIProjectClient for agents, evaluations, datasets, indexes, and memory stores
  • Bundles openai and azure-identity as direct dependencies
  • Tracing improvements align with OpenTelemetry gen_ai.* conventions

Breaking changes:

Tool class renames to align with OpenAI naming conventions (same pattern as JS/TS):

# Before
from azure.ai.projects import AzureAISearchAgentTool, OpenApiAgentTool, BingGroundingAgentTool

# After — GA tools drop the "Agent" infix
from azure.ai.projects import AzureAISearchTool, OpenApiTool, BingGroundingTool

# Before
from azure.ai.projects import MicrosoftFabricAgentTool, SharepointAgentTool, A2ATool

# After — Preview tools adopt a "PreviewTool" suffix
from azure.ai.projects import MicrosoftFabricPreviewTool, SharepointPreviewTool, A2APreviewTool

Agent creation API changed — create() and update() removed in favor of versioned methods:

# Before
agent = project_client.agents.create(model="gpt-5", name="my-agent", instructions="...")
project_client.agents.update(agent.id, name="updated-agent")

# After — use create_version() for all agent creation
agent = project_client.agents.create_version(model="gpt-5", name="my-agent", instructions="...")

Preview operations moved to .beta subclient:

# Before
project_client.memory_stores.create(name="my-store")
project_client.evaluators.list_latest_versions()

# After — preview operations live under .beta
project_client.beta.memory_stores.create(name="my-store")
project_client.beta.evaluators.list()  # also renamed from list_latest_versions()

Preview features now require explicit opt-in via foundry_features:

# Workflow Agents (preview) — requires opt-in flag
from azure.ai.projects import FoundryFeaturesOptInKeys

project_client.agents.create_version(
    model="gpt-5",
    name="my-workflow-agent",
    foundry_features=FoundryFeaturesOptInKeys.WORKFLOW_AGENTS_V1_PREVIEW,
)

Tracing provider and attribute renames:

# Before
# Provider name: "azure.ai.agents"
# Span names: "responses {model_name}" / "responses {agent_name}"
# Event: gen_ai.system.instructions

# After
# Provider name: "microsoft.foundry"
# Span names: "chat {model_name}" / "invoke_agent {agent_name}"
# Attribute: gen_ai.system_instructions (not an event)

Action: Upgrade to azure-ai-projects==2.0.0b4. This is the last beta before the GA announcement — get your code on it now.

Changelog

.NET

Foundry SDK (Azure.AI.Projects 2.0.0-beta.1, Feb 24)

The .NET SDK joins the v2 consolidation with its first beta targeting the GA v1 REST surface, now with full net10 framework compatibility.

Features:

  • Full net10 framework support — <EnablePreviewFeatures> flagging removed
  • Added Evaluation sample

Breaking changes:

ImageBasedHostedAgentDefinition has been merged into HostedAgentDefinition:

// Before
var agentDef = new ImageBasedHostedAgentDefinition(model, image);

// After — Image is now an optional property on HostedAgentDefinition
var agentDef = new HostedAgentDefinition(model) { Image = image };

Tracing provider name and event format updated:

// Before
// Tracing event: gen_ai.system.instructions
// Provider name: "azure.ai.agents"

// After
// Tracing attribute: gen_ai.system_instructions
// Provider name: "microsoft.foundry"

Known issues: Computer use tool, fine-tuning, red teams, and evaluation operations don’t yet support the latest API version — pin to the previous library version for those features until v1 operations are available.

Action: Upgrade to Azure.AI.Projects 2.0.0-beta.1. This is the first .NET beta on the v2 consolidation line — watch for the GA announcement.

Changelog

JavaScript / TypeScript

Foundry SDK (@azure/ai-projects 2.0.0-beta.4)

This release aligns all Azure tool class names with OpenAI naming conventions — a breaking change that will carry through to GA.

Breaking — GA tools drop the Agent infix:

// Before
import { AzureAISearchAgentTool, OpenApiAgentTool, AzureFunctionAgentTool, BingGroundingAgentTool } from "@azure/ai-projects";

// After
import { AzureAISearchTool, OpenApiTool, AzureFunctionTool, BingGroundingTool } from "@azure/ai-projects";

Breaking — Preview tools adopt a PreviewTool suffix:

// Before
import { MicrosoftFabricAgentTool, SharepointAgentTool, BingCustomSearchAgentTool, BrowserAutomationAgentTool, A2ATool } from "@azure/ai-projects";

// After
import { MicrosoftFabricPreviewTool, SharepointPreviewTool, BingCustomSearchPreviewTool, BrowserAutomationPreviewTool, A2APreviewTool } from "@azure/ai-projects";

Also breaking:

  • ResponsesUserMessageItemParam removed as a valid ItemUnion member

Action: Upgrade to @azure/ai-projects@2.0.0-beta.4 and update all tool class references. These renames are final.

Changelog

Java

Foundry SDK (azure-ai-projects 2.0.0-beta.1, Feb 25)

The Java SDK joins the v2 consolidation with its first beta targeting the GA v1 REST surface.

Features:

  • buildOpenAIClient() and buildOpenAIAsyncClient() on AIProjectClientBuilder for directly obtaining a Stainless OpenAI client
  • New FoundryFeaturesOptInKeys enum for preview feature opt-in flags: EVALUATIONS_V1_PREVIEW, SCHEDULES_V1_PREVIEW, RED_TEAMS_V1_PREVIEW, INSIGHTS_V1_PREVIEW, MEMORY_STORES_V1_PREVIEW
  • Added ModelSamplingParams and AzureAIModelTarget models

Breaking changes:

Service version updated from 2025-11-15-preview to v1. Credential classes and sub-client methods are renamed — here’s what your code looks like before and after:

Before (pre-v2):

// Credentials used plural names
ApiKeyCredentials creds = new ApiKeyCredentials("your-key");
EntraIdCredentials entraId = new EntraIdCredentials(tokenCredential);
AgenticIdentityCredentials agenticId = new AgenticIdentityCredentials();

// Sub-client methods were generic
DeploymentsClient deployments = client.getDeploymentsClient();
deployments.get("my-deployment");
deployments.delete("my-deployment");

SchedulesClient schedules = client.getSchedulesClient();
schedules.delete("my-schedule");

IndexesClient indexes = client.getIndexesClient();
indexes.createOrUpdate("my-index", indexSpec);

EvaluationsClient evals = client.getEvaluationsClient();
OpenAIClient openai = evals.getOpenAIClient();

After (2.0.0-beta.1):

// Credentials drop the plural suffix
ApiKeyCredential creds = new ApiKeyCredential("your-key");
EntraIdCredential entraId = new EntraIdCredential(tokenCredential);
AgenticIdentityPreviewCredentials agenticId = new AgenticIdentityPreviewCredentials();

// Sub-client methods now include the resource name
DeploymentsClient deployments = client.getDeploymentsClient();
deployments.getDeployment("my-deployment");
deployments.deleteDeployment("my-deployment");

SchedulesClient schedules = client.getSchedulesClient();
schedules.deleteSchedule("my-schedule");

IndexesClient indexes = client.getIndexesClient();
indexes.createOrUpdateVersion("my-index", indexSpec);

EvaluationsClient evals = client.getEvaluationsClient();
EvalService evalService = evals.getEvalService();

This rename pattern is pervasive across all sub-clients — see the full changelog for the complete list.

Action: First Java beta on the GA REST surface. Review the breaking changes thoroughly before adopting.

Changelog

Agent Framework

Agent Framework (agent-framework-core 1.0.0rc1, Feb 19)
Agent Framework (agent-framework-azure-ai 1.0.0rc1, Feb 19)

Release Candidate — the API surface is locked. See the Agents section above for the full breakdown.

Features:

  • AgentFunctionApp for hosting agents on Azure Functions
  • BaseAgent implementations for Claude and GitHub Copilot SDKs
  • Fan-out/fan-in orchestration with shared state and configurable timeouts
  • Simplified AG-UI run method and Anthropic structured outputs

Breaking — Credential handling unified under Azure Identity:

# Before
from agent_framework_azure_ai import get_entra_auth_token

agent = AzureAIAgent(
    ad_token=get_entra_auth_token(),
    ad_token_provider=my_token_provider,
)

# After — single credential parameter
from azure.identity import DefaultAzureCredential

agent = AzureAIAgent(
    credential=DefaultAzureCredential(),
)

Breaking — Sessions replace Threads:

# Before
thread = agent.get_new_thread()
result = await agent.run("Hello", thread=thread)

# After
session = await agent.create_session()
result = await agent.run("Hello", session=session)

Breaking — Response access pattern changed:

# Before
response = await agent.run("Summarize this")
print(response.text)

# After — use messages list
response = await agent.run("Summarize this")
print(response.messages[-1].content)

# Before — updating responses
update = AgentResponse(text="Updated content")

# After
from agent_framework_core import Content
update = AgentResponse(messages=[], contents=[Content.from_text("Updated content")])

Breaking — Exceptions and factory renames:

# Before
from agent_framework_core import ServiceException, ServiceResponseException
agent = create_agent(config)

# After
from agent_framework_core import AgentException, AgentInvalidResponseException
agent = as_agent(config)

Action: Pin to 1.0.0rc1. Follow the migration guide before GA.

GitHub Releases


Documentation updates

February was a landmark month for Foundry documentation. We shipped 100+ new articles across agents, fine-tuning, safety, and platform setup. Here are the highlights.

New articles

  • Model Context Protocol (MCP) – Connect agents to external tools via MCP
  • Build your own MCP server – Create custom MCP servers for agent tool integration
  • Hosted agents – Deploy and manage cloud-hosted managed agents
  • Agent-to-agent communication – Coordinate multiple agents in collaborative workflows
  • Computer use – Enable agents to interact with desktop applications
  • Foundry IQ – AI-powered assistant for navigating the Foundry platform
  • Agent development lifecycle – End-to-end guide from build to deploy for agents
  • Publish agents to Copilot – Publish Foundry agents to Microsoft 365 Copilot
  • Playgrounds – Interactive environments for testing models and agents
  • Fine-tuning vision models – Customize GPT-4 Vision for your domain data
  • Realtime Audio API – Build voice apps with streaming audio over WebSockets
  • Deep research – Use deep research mode for multi-step reasoning queries

Updated articles

  • Guardrails overview – Comprehensive rewrite of safety policy configuration
  • Content filter prompt shields – New severity levels and prompt injection defenses
  • Configure private link – Revised private networking and managed VNet guidance
  • Model retirement and lifecycle – Updated dates for GPT-4o, GPT-5, and Cohere models
  • Quotas and limits – Expanded with expected outputs and usage examples

Stay Connected

March is shaping up to be a big one — SDK GA announcements are on the horizon, and the Foundry SDK will be the single package you need across agents, inference, evaluations, and memory. Get ahead of it now by upgrading to the latest pre-release and targeting the v1 REST surface.

The post What’s new in Microsoft Foundry | February 2026 appeared first on Microsoft Foundry Blog.
