Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
154267 stories
·
33 followers

Episode 64: From Reaction to Creation: Understanding How We Can Create Our Desired Reality

1 Share

Why do we react the way we do—and what would it look like to choose differently?
In this episode, we’re joined by Shoshi Kalderon, coach and change practitioner, who introduces the Conscious-Mind Leadership framework developed by Dr. Mati Har-Lev. Together, we explore a powerful way of understanding how we respond to challenges, transitions, and everyday moments in our lives and careers.
Drawing on Dr. Har-Lev’s work, Shoshi walks us through the idea that each of us operates from an internal “operating system”—a set of automatic patterns that shape how we think, feel, and respond, often without us even realizing it.
Description: Why do we react the way we do—and what would it look like to choose differently?
In this episode, we’re joined by Shoshi Kalderon, coach and change practitioner, who introduces the Conscious-Mind Leadership framework developed by Dr. Mati Har-Lev. Together, we explore a powerful way of understanding how we respond to challenges, transitions, and everyday moments in our lives and careers.
Drawing on Dr. Har-Lev’s work, Shoshi walks us through the idea that each of us operates from an internal “operating system”—a set of automatic patterns that shape how we think, feel, and respond, often without us even realizing it.
We unpack what it means to move from autopilot reactions to more conscious choices, and how expanding our awareness can fundamentally shift the way we experience both work and life.





Download audio: https://anchor.fm/s/44cc6cdc/podcast/play/120002315/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2026-4-14%2Fecfb1784-8785-a99a-dbfa-55d20b67cdcd.mp3
Read the whole story
alvinashcraft
27 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

CI/CD for AI Agents on Microsoft Foundry

1 Share

Introduction

Building an AI agent is the straightforward part. Shipping it reliably to production with version control, evaluation-driven quality gates, multi-environment promotion, and enterprise governance is where most teams run into friction.

Microsoft Foundry changes this. It is Microsoft's AI app and agent factory: a fully managed platform for building, deploying, and governing AI agents at scale. It provides a first-class agent runtime with built-in lifecycle management, making it possible to apply the same CI/CD rigour you already use for application software to AI agents — regardless of whether you are building containerised hosted agents or declarative prompt-based agents.

This post walks through a complete, production-ready reference architecture for doing exactly that. You will find the GitHub Actions workflow, the Azure DevOps pipeline YAML, and the architecture diagram linked throughout.

Reference implementation repository: foundry-agents-lifecycle
and CI/CD for AI Agents on Microsoft Foundry


Why Agent CI/CD Is Different

Traditional software pipelines gate releases on test pass/fail. Agent pipelines require an additional, critical layer: evaluation-driven quality gates. Before any agent version can be promoted to the next environment, it must pass three categories of evaluation:

  • Quality — answer correctness, task completion rate, hallucination rate
  • Safety — grounded responses, policy compliance, tool usage validation
  • Performance — token usage per query, p95 response latency

A second key difference is the deployment unit. You are not deploying a binary or a container tag in isolation. You are deploying an agent version — an immutable artefact that bundles the model selection, system instructions, tool definitions, and configuration together. This is what enables deterministic promotion and full auditability across environments.

"Agents follow a standard CI/CD pattern, but with a critical shift: promotion happens at the agent version level, and release gates are driven by evaluation outcomes, not just test results."


Reference Architecture

Figure 1: End-to-end CI/CD reference architecture for hosted and prompt-based agents on Microsoft Foundry.

The architecture has five logical layers, flowing from developer commit to production monitoring:

Layer 1 — Developer Layer

The developer layer is a standard source-controlled repository in GitHub or Azure DevOps. It contains:

  • Agent code written in Python or .NET
  • agent.yaml or prompt definition files for prompt-based agents
  • Tool configurations: MCP servers, REST API connectors, or other integrations
  • Infrastructure as Code: Bicep or ARM templates for provisioning the Foundry project and dependencies

Layer 2 — CI Pipeline (Build · Validate · Evaluate)

Every push or pull request triggers the CI pipeline. It performs five steps:

  1. Docker build — for hosted agents, build and tag the container image
  2. Static checks — lint with ruff, security scan with bandit, agent YAML schema validation
  3. Unit and tool testspytest suites covering agent logic and tool integrations
  4. Evaluation gate — run evaluation datasets; fail the pipeline if thresholds are breached
  5. Image push — push the validated container to Azure Container Registry (ACR)

Prompt-based agents skip the Docker build step. Instead, the YAML definition and prompt bundle are validated against schema and evaluated against golden datasets.

Layer 3 — CD Pipeline (Multi-stage Promotion)

A single agent version is promoted through three Foundry project environments:

StageEnvironmentActivitiesGate
Stage 1Dev Foundry ProjectDeploy vNext version, smoke tests, developer evalsEval quality thresholds
Stage 2Test / QA Foundry ProjectScenario tests, HITL validation, safety evaluationEval gates + human approval
Stage 3Production Foundry ProjectPromote version, enable endpoint, post-deploy smoke testRequired reviewer approval

Rollback is straightforward: switch the active version pointer back to the previous agent version. No re-deployment is needed.

Layer 4 — Microsoft Foundry Agent Service

The Foundry Agent Service runtime provides:

  • Hosted agent runtime — managed container execution supporting Agent Framework, LangGraph, Semantic Kernel, or custom code
  • Prompt-based agent runtime — declarative agent definitions, no container required
  • Built-in lifecycle operations — version, start, stop, rollback
  • Entra Agent Identity — each deployed version receives a dedicated Microsoft Entra managed identity
  • RBAC and policy enforcement — Azure role-based access controls per project
  • Observability — distributed traces, structured logs, and evaluation signals

Layer 5 — Monitoring, Governance, and Control Plane

  • Foundry control plane: agent registry, environment configuration, version history
  • OpenTelemetry forwarded to Azure Monitor and Application Insights
  • Continuous evaluation pipelines for ongoing quality, grounding, and safety monitoring
  • Azure Policy and RBAC enforcement at the platform level

Environment Topology

There are two topology options. We recommend Option A for all production workloads:

OptionStructureBest forTrade-off
A — RecommendedDev Project → Test Project → Prod Project (separate Foundry projects)Enterprise workloadsFull isolation, clean RBAC boundaries, easier governance
B — LightweightSingle Foundry project with agent version tags (dev/test/prod)Small teams, prototypingSimpler setup, but weaker environment separation

Separate projects mean separate RBAC policies, separate connection strings, and separate evaluation signals. A developer service principal has access only to the Dev project; the CI/CD identity has restricted access to promote to Test and Production.


Evaluation Gates — The Core Difference

Evaluation gates transform a standard software pipeline into an AI-safe deployment pipeline. They run at two points: pre-merge (CI) and pre-promotion (CD).

Defining the Gates

CategoryMetricCI thresholdProd threshold
QualityHallucination rate< 5%< 3%
QualityTask completion rate> 90%> 95%
SafetyGrounded response rate> 95%> 98%
SafetyPolicy violations00
Performancep95 latency< 4 000 ms< 3 000 ms
CostToken usage per queryTrack onlyAlert on > 20% regression

Gate Enforcement (Python)

import json
import sys

def check_gates(results_path: str) -> None:
    with open(results_path) as f:
        results = json.load(f)

    failures = []

    if results["hallucination_rate"] > 0.05:
        failures.append(f"Hallucination rate {results['hallucination_rate']:.1%} exceeds 5% threshold")

    if results["task_completion_rate"] < 0.90:
        failures.append(f"Task completion {results['task_completion_rate']:.1%} below 90% threshold")

    if results["latency_p95_ms"] > 4000:
        failures.append(f"p95 latency {results['latency_p95_ms']}ms exceeds 4000ms threshold")

    if results.get("policy_violations", 0) > 0:
        failures.append(f"Policy violations detected: {results['policy_violations']}")

    if failures:
        for f in failures:
            print(f"GATE FAILED: {f}", file=sys.stderr)
        sys.exit(1)

    print("All evaluation gates passed — proceeding to deployment")

if __name__ == "__main__":
    check_gates(sys.argv[1])

Hosted vs Prompt-Based Agents — Pipeline Differences

CapabilityHosted AgentsPrompt-Based Agents
Deployment unitContainer image + agent definitionYAML / prompt configuration bundle
Build step requiredYes — Docker build + ACR pushNo — YAML validation only
Supported frameworksAgent Framework, LangGraph, Semantic Kernel, customFoundry declarative runtime
Promotion artefactVersioned agent with container image referenceVersioned prompt/config bundle
CI focusCode quality, tool tests, evaluationPrompt schema validation, evaluation
Rollback mechanismSwitch active agent versionSwitch active agent version
Runtime managementFoundry manages container lifecycleFoundry manages declarative runtime

CI Pipeline Walkthrough

The following steps are representative of the full GitHub Actions workflow available in github-actions-pipeline.yml alongside this post.

Hosted Agent CI

# 1. Static checks
ruff check .
bandit -r src/ -ll
python scripts/validate_agent_config.py --config agent.yaml

# 2. Tests
pytest tests/unit/ -v --tb=short
pytest tests/tools/ -v --tb=short

# 3. Evaluation gate
python scripts/run_evaluations.py \
    --dataset eval/datasets/golden_set.jsonl \
    --output  eval/results/results.json

python scripts/check_eval_gates.py \
    --results eval/results/results.json \
    --max-hallucination   0.05 \
    --min-task-completion 0.90 \
    --max-latency-p95     4000

# 4. Push container image
az acr build \
    --registry myregistry.azurecr.io \
    --image    "myagent:$SHA" \
    --file     Dockerfile .

Prompt-Based Agent CI

# Validate YAML / prompt definitions
python scripts/validate_agent_config.py --config agent.yaml

# Evaluation against golden dataset
python scripts/run_evaluations.py \
    --dataset eval/datasets/golden_set.jsonl \
    --output  eval/results/results.json

python scripts/check_eval_gates.py \
    --results eval/results/results.json

CD Pipeline Walkthrough

Stage 1 — Dev Deployment

python scripts/deploy_agent.py \
    --env              dev \
    --image            "myregistry.azurecr.io/myagent:$SHA" \
    --foundry-endpoint $FOUNDRY_ENDPOINT_DEV \
    --agent-config     agent.yaml

# Returns the new agent version ID, stored for promotion
AGENT_VERSION=$(python scripts/get_active_version.py --env dev)

Stage 2 — Promote to Test (after approval gate)

python scripts/promote_agent.py \
    --from-env         dev \
    --to-env           test \
    --agent-version    $AGENT_VERSION \
    --foundry-endpoint $FOUNDRY_ENDPOINT_TEST

# Run scenario tests and safety evaluation
python scripts/run_evaluations.py \
    --dataset  eval/datasets/scenario_set.jsonl \
    --output   eval/results/test-results.json

python scripts/check_eval_gates.py \
    --results              eval/results/test-results.json \
    --max-hallucination    0.03 \
    --min-task-completion  0.95

Stage 3 — Promote to Production (after required reviewer approval)

python scripts/promote_agent.py \
    --from-env         test \
    --to-env           prod \
    --agent-version    $AGENT_VERSION \
    --foundry-endpoint $FOUNDRY_ENDPOINT_PROD

# Enable the production endpoint
python scripts/enable_agent_endpoint.py \
    --agent-version    $AGENT_VERSION \
    --foundry-endpoint $FOUNDRY_ENDPOINT_PROD

Rollback

# Switch the active version to the previous known-good version
python scripts/promote_agent.py \
    --from-env         prod \
    --to-env           prod \
    --agent-version    $PREVIOUS_AGENT_VERSION \
    --foundry-endpoint $FOUNDRY_ENDPOINT_PROD

# OR delete the failing version
python scripts/delete_agent_version.py \
    --agent-version    $AGENT_VERSION \
    --foundry-endpoint $FOUNDRY_ENDPOINT_PROD

Deployment Using the Azure AI Projects SDK

The azure-ai-projects SDK provides programmatic control over the full agent lifecycle. This is the recommended approach for CI/CD scripts where you need deterministic, scriptable deployment.

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Connect to the Foundry project
client = AIProjectClient(
    endpoint=FOUNDRY_PROJECT_ENDPOINT,
    credential=DefaultAzureCredential()
)

# List existing agents (useful for idempotent deploy scripts)
for agent in client.agents.list():
    print(f"Agent: {agent.name}  version: {agent.id}")

# Create a new agent version (hosted agent)
agent = client.agents.create_agent(
    model="gpt-4o",
    name="my-enterprise-agent",
    instructions="You are a helpful assistant ...",
    tools=[...],  # tool definitions
    metadata={"version": GIT_SHA, "environment": "dev"}
)
print(f"Created agent version: {agent.id}")

For hosted agents, the SDK call also references the container image pushed to ACR. Refer to the Deploy a hosted agent — Microsoft Foundry documentation for the full SDK flow including container image registration and version polling.


Reference Implementation Stack

ConcernTechnology
Source control and pipelinesGitHub Actions or Azure DevOps Pipelines
Infrastructure and agent deploymentAzure Developer CLI (azd up)
Programmatic agent lifecycleazure-ai-projects Python SDK
Agent evaluationazure-ai-evaluation Python SDK
Agent runtimeMicrosoft Foundry Agent Service
Container registryAzure Container Registry (hosted agents only)
ObservabilityOpenTelemetry, Azure Monitor, Application Insights
Identity and accessMicrosoft Entra (Agent ID, OIDC workload identity federation)
GovernanceAzure Policy, RBAC, Foundry control plane

Governance and Responsible AI

Shipping AI agents at enterprise scale requires governance beyond what a traditional CI/CD pipeline provides. Microsoft Foundry addresses this at the platform level:

  • RBAC per environment — each Foundry project has independent access controls. Developers deploy to Dev; only CI/CD service principals (with audited OIDC tokens) can promote to Test and Production.
  • Agent registry and audit trail — the Foundry control plane records which agent version is active in each environment, who deployed it, and when. This satisfies enterprise audit requirements without additional tooling.
  • Content safety and policy enforcement — Azure Policy governs model access, data handling, and content safety rules at the infrastructure level, not just at the application code level. Policy violations block deployment automatically.
  • Entra Agent Identity — each deployed agent version receives a dedicated, short-lived managed identity. Agents authenticate to downstream services using least-privilege credentials scoped to that specific deployment.
  • Continuous evaluation in production — evaluation pipelines run on sampled production traffic, alerting when quality, safety, or cost metrics drift from their baseline.

A key trade-off to be transparent about: evaluation datasets must be maintained and updated as the agent's tasks evolve. Stale datasets produce misleading pass/fail signals. Treat your golden evaluation set as a first-class engineering artefact alongside the agent code itself.


Pipeline Files

Two pipeline files accompany this reference architecture. Both implement the same four-stage pipeline (CI Build, CI Evaluate, CD Dev, CD Test, CD Production) with environment-appropriate approval gates.

  • github-actions-pipeline.yml — GitHub Actions workflow. Uses GitHub Environments for approval gates and OIDC Workload Identity Federation for passwordless Azure authentication. No stored Azure credentials required.
  • azure-devops-pipeline.yml — Azure DevOps multi-stage YAML pipeline. Uses ADO Environments with required approvers and variable groups per environment.

Both pipelines share these security practices:

  • OIDC / Workload Identity Federation — no long-lived Azure credentials stored in pipeline secrets
  • Per-environment variable groups, each with scoped connection strings and endpoints
  • Evaluation quality gates enforced before every promotion step
  • Mandatory human approval before production deployment

Summary

The full pipeline in one view:

Developer commit
        |
   CI Pipeline
   ├── Docker build (hosted agents) / YAML validation (prompt agents)
   ├── Static checks + unit tests + tool tests
   └── Evaluation gate  ←  quality · safety · performance
        |
   Agent Version created  ← immutable, versioned artefact
        |
   CD Pipeline
   ├── Deploy to Dev       → smoke tests + eval gate
   ├── Promote to Test     → scenario tests + HITL + approval gate
   └── Promote to Prod     → enable endpoint + monitoring
        |
   Microsoft Foundry Agent Service
   └── Versioned runtime · Entra identity · RBAC · Observability
        |
   Control Plane
   └── Agent registry · Governance · Continuous evaluation

Microsoft Foundry provides the platform primitives — versioned agent deployments, multi-environment Foundry projects, built-in lifecycle management, and an enterprise observability stack — needed to operate AI agents with the same confidence as any production software system.

The key takeaway: treat the agent version as your deployment artefact, and evaluation outcomes as your release gate. The rest follows familiar CI/CD patterns you already know and trust.


Next Steps

Read the whole story
alvinashcraft
27 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Choosing CSS Selectors for Production: Specificity, Modern Pseudo-Classes, and Maintainable Styles

1 Share

Choosing CSS Selectors for Production Specificity, Modern Pseudo‑Classes, and Maintainable Styles

TL;DR: Production CSS selectors require more than correctness; they demand predictable specificity, modern pseudo classes like :has(), :is(), and :where(), and patterns that scale. This article breaks down how to choose selectors that improve maintainability, reduce bugs, and keep large codebases fast, readable, and resilient as apps grow.

Selectors look simple when a project is small. You add a class, style a few elements, and move on. As a codebase grows, selector choice starts to affect more than just appearance. It influences maintainability, override behavior, debugging effort, and, sometimes, rendering cost.

Modern CSS gives us more selector options than ever, but the real skill is knowing which ones keep a codebase easier to work with over time. This guide focuses on choosing selectors that are clear, resilient, easy to override, and realistic to ship in production.

A simple decision framework

A practical way to choose selectors is to move from simple to specialized:

  1. Use a type selector for safe global defaults
  2. Use a class selector for reusable UI
  3. Use an attribute selector for real state or semantics
  4. Use a combinator when the relationship matters
  5. Use :is(), :where(), or :has() when they remove real complexity

Start with the simplest selector that communicates intent, then add structural or relational logic only when the UI actually needs it. This helps avoid overengineering and keeps CSS predictable.

Quick Selector Decision Cheat Sheet

Use this table as a fast reference when choosing selectors in production.

Situation Recommended Selector
Global typography or resets Type selector (body, h1)
Reusable UI components Class selector (.button, .card)
State already in markup Attribute selector ([disabled], [aria-expanded])
Layout relationships Combinators (+, >)
Shared patterns :is()
Soft defaults :where()
Context-based styling :has()

Classes, attributes, and combinators: Choosing the right tool

Classes for reusable UI

For buttons, cards, modals, alerts, and most component styles, classes are usually the best default. They describe what an element is rather than where it appears in the DOM.

.button {
  display: inline-flex;
  align-items: center;
  gap: 0.5rem;
}

.button--primary {
  background: royalblue;
  color: white;
}

This works well because the styling hook is tied to the component itself; changes to the HTML layout do not break the selector.

Attribute selectors for real state and semantics

Attribute selectors are useful when a state already exists in markup, such as disabled, aria-expanded, aria-selected, or data-* attributes.

button[disabled] {
  opacity: 0.6;
  cursor: not-allowed;
}

.accordion-trigger[aria-expanded="true"] {
  font-weight: 600;
}

This keeps styling aligned with behavior and accessibility without introducing extra classes.

Combinators when structure is meaningful

Combinators are useful when the relationship between elements matters, especially for spacing or local layout rules.

.form-row + .form-row {
  margin-top: 1rem;
}

.card > h2 {
  margin-block-end: 0.5rem;
}

Combinators work best when the relationship is short and clear. They become harder to maintain when they describe long paths through the DOM.

Modern Selectors: When to use :is(), :where(), and :has()

:is() —  Reduce repetition

Use :is() when multiple selectors share the same pattern and repetition hurts readability.

:is(header, nav, footer) a:hover {
  text-decoration: underline;
}

This selector improves clarity, but it takes on the specificity of its most specific argument.

:where() — Low-specificity defaults

:where() always contributes zero specificity, making it ideal for broad defaults that should remain easy to override.

:where(article, section, aside) h2 {
  margin-block-end: 0.5rem;
}

When override flexibility matters more than selector weight, :where() is usually the better choice.

:has() — Context-driven styling

:has() lets you style an element based on what it contains or what appears around it, often replacing extra classes or writing JavaScript just to change styles.

/* Before: extra class and JavaScript */
.card.has-error {
  border-color: crimson;
}

/* After: pure CSS with real state */
.card:has(.error-message) {
  border-color: crimson;
}

The key question is whether :has() simplifies the code. If a plain class is clearer, it remains the better option.

Specificity: Rules that matter in practice

Most selector problems eventually manifest as specificity issues. Heavier selectors are harder to override cleanly.

Practical rules to remember:

  • Type selectors are light
  • Classes, attributes, and pseudo‑classes are heavier
  • IDs are heavier still
  • :is() and :has() inherit the weight of their most specific arguments
  • :where() contributes zero specificity

This is why production CSS usually favors classes over IDs, shallow selectors over deep chains, and low‑specificity defaults where possible. Frequent use of !important is often a sign of selector strategy issues.

Native CSS nesting and accidental complexity

Native CSS nesting makes it easy to accidentally create massive specificity chains. Nesting heavily (e.g., .card { .card-body { .card-title { span { ... } } } }) compiles down to deep descendant selectors that are fragile and hard to override. Keep nesting shallow, ideally, no more than one or two levels deep.

Selector performance: Measure before you optimize

Selector performance can matter in large or highly dynamic DOMs, but it is easy to overestimate its impact. Whether selector complexity affects performance depends on DOM size, rule count, and the frequency of DOM changes.

The practical approach is measurement. If CSS matching is causing slow interactions, use browser performance tools to identify which selectors actually appear in the data. Optimize what proves expensive, not what merely looks expensive.

Real-world patterns that work well

1. Form state without extra classes

A strong use of :has() is styling a wrapper based on input state.

.form-group:has(input:invalid) {
  border-color: crimson;
}

This removes the need for an extra state class and keeps the styling tied to the real form state.

2. Grouped interaction rules

:is() works well when repeated container patterns share the same interaction styles.

:is(header, nav, footer) a:hover,
:is(header, nav, footer) a:focus-visible {
  text-decoration: underline;
}

This is easier to read than repeating the same selector chain multiple times.

3. Soft defaults for content blocks

:where() is a good fit when you want consistency without creating override friction.

:where(article, section, aside) :where(h2, h3) {
  line-height: 1.2;
}

This works well for content-heavy layouts where flexibility still matters.

4. Accessibility-driven state styling

Attribute selectors are often the cleanest option when state is already exposed through accessibility attributes.

.disclosure-button[aria-expanded="true"] + .disclosure-panel {
  display: block;
}

That keeps state, behavior, and styling aligned.

Common Anti-Patterns We Avoid

  • Deep descendant chains that tightly couple styles to DOM structure
  • Styling everything with IDs, which adds unnecessary specificity
  • Using advanced selectors when simpler class‑based solutions are clearer

Powerful selectors are useful only when they reduce real complexity.

Frequently Asked Questions

Do modern selectors like :has() work in JavaScript selectors?

Yes. Browser APIs like querySelector() use CSS selector syntax, subject to browser support.

Should the selector strategy be combined with cascade layers?

Yes. Cascade layers help control precedence across resets, defaults, utilities, and components, reducing reliance on heavy selectors.

When should @supports selector(...) be used?

When a selector enhances the experience but is not required for core functionality, especially with newer features like :has().

Are there cases :where() is a bad choice?

Yes. If a rule needs meaningful selector weight inside a component, :where() may be too weak.

Conclusion

Thank you for reading! The best selectors in production are rarely the most advanced ones. They are the ones that make the code easier to understand, override, and change. Modern CSS provides better tools, but the real advantage comes from using them with judgment.

Choose selectors for clarity first, specificity second, and complexity only when it removes a real problem. If you’ve developed your own selector strategies in production, feel free to share them in the comments.

Read the whole story
alvinashcraft
28 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

DevExpress Blazor AI Chat — Multi-Model Support, MCP Server Integration, and a Look at What's Coming Next

1 Share

We continue to extend the capabilities of the DevExpress Blazor AI Chat component and publish GitHub examples designed to address real-world usage scenarios. This post highlights two new examples: a multi-model chat with persistent conversation history, and MCP server integration that extends AI context with external data sources. I'll also share planned features for v26.1 (scheduled for mid-June 2026).

Multi-Model Chat with Conversation History

Our earlier multi-LLM chat application example demonstrated how to switch between AI providers within a single chat session. The new DevExpress Blazor AI Chat — Multi-Model Chat with Conversation History example adds persistent conversation threads and automated chat session title generation.

The multi-model chat UI showing the left sidebar with a list of conversation threads (each with an auto-generated title), a model selector dropdown at the top, and the active chat pane on the right.

The application uses a two-pane layout with DxSplitter. The left pane is a sidebar that hosts a DxComboBox for model selection and a DxListBox for conversation threads. InMemoryChatThreadStore manages thread data. This thread-safe dictionary-backed store tracks message history and timestamps. The right pane hosts the DxAIChat component. The following Razor markup defines the layout:

<DxSplitter CssClass="chat-splitter" Height="100%">
    <Panes>
        <DxSplitterPane Size="320px" MinSize="220px" MaxSize="500px">
            <DxButton RenderStyle="ButtonRenderStyle.Primary"
                      RenderStyleMode="ButtonRenderStyleMode.Contained"
                      Text="New Chat"
                      Click="CreateNewThreadAsync" />
            <DxComboBox Data="@ModelsList"
                        Value="@SelectedModel"
                        TextFieldName="@nameof(ChatClientSession.Name)"
                        ValueChanged="@((ChatClientSession session) => OnSelectedThreadModelChangedAsync(session))" />
            <DxListBox Data="@Threads"
                       Value="@SelectedThread"
                       ValueChanged="@((ChatThread thread) => OnThreadSelectedAsync(thread))"
                       TextFieldName="@nameof(ChatThread.Title)">
                <ItemDisplayTemplate>
                    <div class="thread-list-item">
                        <div class="thread-title">@context.DataItem.Title</div>
                        <div class="thread-model">@GetModelName(context.DataItem.ModelSessionId)</div>
                    </div>
                </ItemDisplayTemplate>
            </DxListBox>
        </DxSplitterPane>
        <DxSplitterPane>
            <DxAIChat @ref="DxAiChat"
                      Initialized="OnChatInitialized" />
        </DxSplitterPane>
    </Panes>
</DxSplitter>

Automatic thread title generation is a key implementation detail. The CompositeChatClient class implements IChatClient and intercepts outgoing user messages via GetResponseAsync and GetStreamingResponseAsync methods. On the first message in a new thread, the class sends a background request to the selected AI model using a dedicated system prompt and requests a concise 3–6 word title. IChatThreadStore stores the result. The ThreadTitleUpdated event updates the UI and refreshes the sidebar without blocking the main chat response:

// CompositeChatClient.cs
public IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(IEnumerable<ChatMessagegt; messages, 
ChatOptions? options = null, CancellationToken cancellationToken = new CancellationToken())
    {
        var selectedSession = GetRequiredSelectedSession();
        var messageList = messages.ToList();
        TryQueueTitleGeneration(messageList, selectedSession);
		...
        await foreach (var update in selectedSession.Client.GetStreamingResponseAsync(
        	messageList, options, cancellationToken))
            yield return update;
        ...
    }

private void TryQueueTitleGeneration(IEnumerable<ChatMessage> messages, ChatClientSession selectedSession) {
    var threadId = _activeThreadId.Value;
    var firstUserMessage = GetFirstUserMessage(messages);
    ...
    _ = GenerateTitleForThreadAsync(threadId, selectedSession, firstUserMessage);
}

private async Task GenerateTitleForThreadAsync(Guid threadId,
    ChatClientSession selectedSession, string firstUserMessage) {
    try {
        var thread = await _threadStore.GetThreadAsync(threadId, CancellationToken.None);
        if (thread is null || thread.HasGeneratedTitle)
            return;

        var modelSession = AvailableChatClients
            .FirstOrDefault(x => x.Id == thread.ModelSessionId) ?? selectedSession;

        string generatedTitle;
            try {
                generatedTitle = await _titleGenerator.GenerateTitleAsync(modelSession, firstUserMessage, CancellationToken.None);
            }
            catch {
                generatedTitle = _titleGenerator.BuildFallbackTitle(firstUserMessage);
            }

            if (string.IsNullOrWhiteSpace(generatedTitle)) {
                generatedTitle = _titleGenerator.BuildFallbackTitle(firstUserMessage);
            }

            await _threadStore.UpdateTitleAsync(threadId, generatedTitle, true, CancellationToken.None);
            lock (_syncRoot) {
                _titledThreadIds.Add(threadId);
            }
            ThreadTitleUpdated?.Invoke(threadId, generatedTitle);
        }
        catch (OperationCanceledException) { }
        finally {
            lock (_syncRoot)
                _titleGenerationInProgress.Remove(threadId);
        }
}

The example includes an in-memory store. The IChatThreadStore interface allows for replacement with an EF Core-backed implementation for applications that require persistent history.

To download and explore our implementation, navigate to the following DevExpress GitHub repository: Blazor AI Chat — Multi-Model Chat with Conversation History.

MCP Server Integration

The DevExpress Blazor AI Chat — Integration with Model Context Protocol example connects our Blazor AI Chat component to external data through the Model Context Protocol (MCP).

The Blazor AI Chat UI with an MCP-powered chat session open, showing the chat querying a server access log and receiving an AI-generated analysis in response.

The solution includes two projects.

  • AIChatMcpServer is a custom MCP server that exposes sample tools, resources, and prompt templates to the client application.
  • AIChatMcpClient is a Blazor Server application that hosts DxAIChat and loads MCP capabilities at startup through a hosted McpRepository service.

The sample MCP server exposes three primitives: tools (executable functions the AI model can call automatically), resources (static content such as logs, text files, and binary images), and prompts (reusable parameterized templates). McpRepository loads these primitives at startup and passes them to DxAIChat.

Each primitive maps directly to a DxAIChat feature. Resources map to AIChatResource objects and populate the Resources collection. Prompts map to DxAIChatPromptSuggestion entries displayed when the chat opens. Tools attach to IChatClient through UseFunctionInvocation at startup.

Index.razor:

<DxAIChat FileUploadEnabled="true"
          Resources="Resources"
          IncludeFunctionCallInfo="true">
        <PromptSuggestions>
            @foreach (var suggestion in PromptSuggestions){
                <DxAIChatPromptSuggestion PromptMessage="@suggestion.PromptMessage" Title="@suggestion.Title" Text="@suggestion.PromptMessage"/>
            }
        </PromptSuggestions>
        <AIChatSettings>
            <DxAIChatFileUploadSettings MaxFileSize="10000000" MaxFileCount="3"/>
        </AIChatSettings>
    </DxAIChat>

@code {
    IEnumerable<AIChatResource> Resources { get; set; } = [];
    IEnumerable<PromptSuggestion> PromptSuggestions { get; set; } = [];

    protected override async Task OnInitializedAsync() {
        // Map MCP resources to AIChatResource — DxAIChat fetches content on demand via LoadResourceData
        Resources = McpRepository.Resources.Select(x =>
            new AIChatResource(x.Uri, x.Name, LoadResourceData, x.MimeType, x.Description));
        // Map MCP prompt templates to prompt suggestions shown in the chat UI
        PromptSuggestions = McpRepository.PromptSuggestions;
    }

    async Task<IList<AIContent>> LoadResourceData(AIChatResource resource, CancellationToken ct) {
        var result = await McpRepository.Client.ReadResourceAsync(resource.Uri, cancellationToken: ct);
        return result.Contents.ToAIContents();
    }
}

Program.cs:

using Azure;
using Azure.AI.OpenAI;
using AIChatMcpClient;
using AIChatMcpClient.Components;
using AIChatMcpClient.Services;
using Microsoft.Extensions.AI;
...

builder.Services.AddSingleton<McpRepository>();
builder.Services.AddHostedService(sp => sp.GetRequiredService<McpRepository>());

builder.Services.AddSingleton<IChatClient>(sp => {
    var mcpRepository = sp.GetService<McpRepository>();
    var azureOpenAIClient = new AzureOpenAIClient(
        new Uri(azureOpenAISettings.Endpoint),
        new AzureKeyCredential(azureOpenAISettings.ApiKey));
    var chatClient = azureOpenAIClient.GetChatClient(azureOpenAISettings.DeploymentName).AsIChatClient();
    return new ChatClientBuilder(chatClient)
        .ConfigureOptions(co => {
            co.Tools = mcpRepository.Tools.ToArray<AITool>();
        })
        .UseFunctionInvocation()
        .Build();
});
...

The implementation follows MCP standards. Client code requires no changes when you switch to another MCP-compliant backend. To connect the Blazor application to a different MCP server, modify the McpRepository endpoint:

using AIChatMcpClient.Models;
using ModelContextProtocol.Client;
using ModelContextProtocol.Protocol;
...
public class McpRepository : IHostedService, IAsyncDisposable {
    private readonly string _mcpEndpoint;

    public McpClient Client { get; private set; } = null!;
    public List<McpClientTool> Tools { get; } = [];
    public List<McpClientResource> Resources { get; } = [];
    public List<McpClientPrompt> Prompts { get; } = [];
    public List<PromptSuggestion> PromptSuggestions { get; } = [];

    public McpRepository(IConfiguration configuration) {
        _mcpEndpoint = configuration.GetSection("McpServer:Endpoint").Value 
                       ?? throw new InvalidOperationException("McpServer:Endpoint is not configured in appsettings.json");
    }

    public async Task StartAsync(CancellationToken cancellationToken) {
        var transport = new HttpClientTransport(new() { Endpoint = new(_mcpEndpoint) });
        Client = await McpClient.CreateAsync(transport);
        
        var tools = await Client.ListToolsAsync(cancellationToken: cancellationToken);
        var resources = await Client.ListResourcesAsync(cancellationToken: cancellationToken);
        var prompts = await Client.ListPromptsAsync(cancellationToken: cancellationToken);
        
        Tools.AddRange(tools);
        Resources.AddRange(resources);
        Prompts.AddRange(prompts);

        // Preload prompt suggestions at startup
        foreach (var prompt in Prompts) {
            var result = await prompt.GetAsync();
            var content = result.Messages[0].Content;
            PromptSuggestions.Add(new PromptSuggestion {
                PromptMessage = ((TextContentBlock)content).Text,
                Title = prompt.Title ?? "Untitled"
            });
        }
    }

    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;

    public async ValueTask DisposeAsync() {
        await Client.DisposeAsync();
    }
}

To download and explore our implementation, navigate to the following DevExpress GitHub repository: DevExpress Blazor AI Chat — Integration with Model Context Protocol.

What's Coming in v26.1

Our v26.1 release is scheduled for mid-June 2026 and includes the following enhancements to DevExpress AI Chat components for Blazor, WinForms, and WPF.

Microsoft Agent Framework and OpenAI Responses API Support

The most substantial addition is a new IChatResponseProvider abstraction layer that decouples the chat UI from the underlying AI service. This layer allows you to bind DxAIChat to a wider set of AI backends beyond the standard IChatClient interface, including the Microsoft Agent Framework (with support for agents, executors, and multi-step workflows), the OpenAI Responses API, and Azure AI Projects. The API also supports custom IChatResponseProvider implementations for usage scenarios that don't fit standard providers.

Planned demos will illustrate how to connect our AI Chat Control to individual agents, composite workflows, AG-UI backends, and tool approval workflows in agentic pipelines.

API Enhancements

v26.1 replaces the MessageSent event with MessageSending. This event fires before the message is added to chat history and sent to the AI service. Additionally, this event exposes an e.Cancel parameter that allows you to block send operations entirely. Use it to preprocess and validate input, filter content, call external services and handle the messaging pipeline manually. Alternatively, if e.Cancel is set to false, the AI Chat Control will continue sending and displaying messages and allow you to log and audit user messages without disruption to the normal message pipeline.

The new event also supports augmentation before delivery — for example, appending a system message or supplemental context to the chat history via the new AppendMessageAsync method:

async void AiChatControl1_MessageSending(object sender, AIChatControlMessageSendingEventArgs e) {
    // Append a system message before sending the user's prompt to the AI service.
    await e.Chat.AppendMessageAsync("Translate text to Spanish", ChatRole.System);
}

Empty Chat Customization

v26.1 introduces two new properties designed to customize initial chat state. EmptyMessageAreaText specifies text dispalyed in the empty chat area, and InputBoxNullText specifies placeholder text in the input box. These properties allow you to align the initial chat experience with application context and tone:

<DxAIChat EmptyMessageAreaText="How can I help you today?"
          InputBoxNullText="Ask a question or describe a task..." />
Blazor AI Chat — Customized Empty Text Area and Input Null Text

Share Your Feedback

Looking for a particular code example? Contact us via the DevExpress Support Center to share your usage scenario and we'll be happy to recommend an implementation.

Read the whole story
alvinashcraft
28 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Introduction to LM Studio

1 Share

LM Studio is a desktop app for discovering, downloading, and running open-weight language models on your own machine.

Open-weight means the model’s trained parameters (the “weights”) are published so you can download them and run them yourself — unlike a cloud-only API where you never get the files. It is related to open source, but not the same thing: the weights might be available under a licence that still restricts commercial use or redistribution, so check the model card before you ship anything.

Inference is what happens when you actually use the model: you send a prompt (and maybe a system message), it runs the maths stored in the weights, and it produces tokens back — the words you see in chat, or JSON if you asked for structured output. Training is the expensive one-off job that creates the weights; inference is the thing you do on every message. LM Studio is almost entirely about inference on your PC. There is no per-token bill from a cloud vendor for that step — you pay in disk space, RAM, electricity, and patience while a multi-gigabyte download finishes.

Quantisation is a way to shrink the weight files by storing numbers with fewer bits (less precision). A Q4 build is roughly 4-bit quantisation: much smaller on disk and often faster to run, with a small drop in quality compared with full FP16 (16-bit) weights. Names like Q4_K_M are specific recipes — you do not need to memorise them; treat them as “this is a popular compressed variant” unless you are benchmarking. For a first model on a laptop, a Q4 instruct build in the single-digit GB range is usually the sensible choice.

You can use this as a way to play with LLM concepts and ideas without paying for licenses for Claude or Chat GPT. In this post I’m simply introducing this tool - I’ve written more on interacting with docker here.

What You Get

At a glance, LM Studio gives you:

  • A model catalogue tied into community and publisher listings (search, sort, staff picks)
  • A chat UI with per-session controls (temperature, context, system prompt, presets)
  • A Developer view that runs a local HTTP server — OpenAI-compatible endpoints on port 1234 by default
  • Optional hooks for MCP (Model Context Protocol) so the same app can talk to external tools while you experiment

You do need reasonable hardware. A 4B model in Q4 quantisation is approachable on many laptops; a 31B download at ~20 GB is a different commitment entirely.

Finding and Downloading a Model

After install, the discover/search flow is where most people start. Search for a family (here, gemma), browse results on the left, and open the detail pane on the right.

Searching and downloading a model in LM Studio

The detail view is worth reading before you click download:

  • Publisher and variant — e.g. google/gemma-4-31b with parameter count and architecture
  • Format — typically GGUF, the common file packaging for running quantised weights locally
  • Quantisation — the row you pick on download (e.g. Q4_K_M); see the explanation above
  • Capability badges — vision, tool use, reasoning, depending on the model
  • Hardware hints — file size and whether partial GPU offload is possible

Pick a quantisation row, hit download, and wait. Smaller instruct models (a few GB) are better first steps than the largest dense variants unless you know you have the VRAM.

Loading a Model and Opening Chat

Switch to the Chat tab (keyboard shortcut Ctrl+L to pick a model). Until something is loaded, the centre panel prompts you to open the model loader or reload the last model you used.

Chat tab before a model is loaded

The right-hand rail is the Parameters panel — we’ll come back to that once a model is running.

Tuning Parameters

LM Studio exposes the knobs that affect how the model generates text, without editing JSON by hand (unless you want to).

Model parameters — presets, sampling, and system prompt

Useful controls you’ll see there:

Control What it does
Preset Save and recall combinations of settings
System prompt Instructions or persona applied across the session
Temperature Higher = more varied; lower = more deterministic
Limit response length Cap tokens per reply (e.g. 500)
Context overflow What to do when the conversation exceeds context — e.g. truncate middle
CPU threads How much CPU to use for inference on your machine
Top K / Top P Sampling filters on the next-token distribution
Repeat penalty Reduces stutter and repetition in longer answers

Blue dots on a row mean you’ve changed it from the default. For a first play, leave most values alone; nudge temperature and system prompt and see how replies change.

Your First Conversation

With google/gemma-3-4b loaded, the chat view shows the exchange, timing, and token stats under each assistant message (tokens per second, total tokens, stop reason).

Chat with a loaded model and performance stats

In this screenshot the model replied to a simple “Hello” — and the input area shows an MCP attachment (mcp-server-offline-llm). That’s LM Studio acting as a client to an MCP server you’ve registered, so the model can use tools defined elsewhere. You don’t need MCP to get started; it’s there when you want IDE-style integrations or custom tools.

The top bar shows the active model ID — the same string you’ll use later if you call the local API by name.

The Developer Tab — Local Server

When you’re ready to call the model from another app (Cursor, a script, your own .NET project), open the Developer tab and start the server.

Developer tab — server running with loaded model and endpoints

What you’re looking at:

  • Status — server on or off; reachable address (often http://127.0.0.1:1234 or your LAN IP on port 1234)
  • Loaded models — what’s in memory, size, parallelism, Eject when you want to free RAM
  • Supported endpoints — tabs for LM Studio’s API shape, OpenAI-compatible, and Anthropic-compatible routes (/api/v1/models, /api/v1/chat, load/unload, and so on)
  • Developer logs — requests, MCP plugin traffic, errors
  • Model information — GGUF, quantisation, architecture, capabilities (e.g. Vision on Gemma 3), and the API model identifier to pass in requests

The cURL shortcut on a loaded model is handy for a quick sanity check before you point your own code at the server.

MCP and mcp.json

If you’re experimenting with MCP, the Developer screen also surfaces server settings and an mcp.json entry point — the config LM Studio uses to spawn or connect MCP servers alongside the local model.

Developer tab — server settings and MCP configuration

In the logs you may see lines for ModelContextProtocol.Server.McpServer and named plugins (e.g. an offline LLM tool server). That confirms the bridge is live: LM Studio hosts the model, MCP supplies tools, and the chat or API session can combine both. Building those servers in .NET is a topic for another day; here it’s enough to know where the switch lives.

A Sensible First Session

If you’re new to the whole stack, this order works well:

  1. Install LM Studio from lmstudio.ai
  2. Download a small instruct model (single-digit GB, Q4 quantisation)
  3. Chat — try a system prompt, watch token speed, adjust temperature once
  4. Open Developer, start the server, hit cURL or open the OpenAI-compatible docs in the UI
  5. Only then add MCP or wire up your own client

Where to Go Next

LM Studio’s job is to make the first mile easy: find a model, run it, tune it, and expose it locally. Everything after that is just HTTP — but you don’t have to start there.

References

LM Studio

GGUF format (Hugging Face wiki)

Model Context Protocol

Read the whole story
alvinashcraft
28 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Agent Skills for Python: File, Code, and Class – Composed in One Provider

1 Share

Python developers working with Agent Skills can now author skills as files on disk, as inline Python code, or as reusable classes – and mix them freely through composable source classes that handle discovery, filtering, and deduplication. A skill living in your local repository, one installed from your organization’s internal package index, and a quick inline bridge you wrote ten minutes ago all plug into the same provider.

This is the third post in our Agent Skills series. The first post introduced file-based skills; the second added code-defined skills, script execution, and approval for Python. This post walks through the two additions that complete the picture: class-based skills and multi-source composition.

If you’ve been following the .NET side, the companion post Agent Skills in .NET: Three Ways to Author, One Provider to Run Them covers the same capabilities for C#. Everything shown here is the Python equivalent – same concepts, idiomatic Python API.

The scenario

Imagine you’re responsible for an HR self-service agent at your company. The first version has a single file-based skill that guides new hires through onboarding. Over the next few weeks, the HR systems team publishes a benefits enrollment skill as an installable Python package on your organization’s internal package index, and you want to slot it in next to the onboarding skill without touching existing code. Meanwhile, you learn they’re also building a time-off balance skill – but the packaged version won’t ship for another sprint. The HR data you need is already reachable through an internal client your application uses elsewhere, so you write a quick inline skill that wraps it. Once the official package lands, you swap out your bridge and move on.

Every step here is independent. Adding one skill never means rewriting another.

Step 1: Start with a file-based skill

The onboarding guide is a skill directory with a SKILL.md file, a Python script that checks whether IT accounts have been provisioned, and a reference document containing the checklist:

skills/
└── onboarding-guide/
    ├── SKILL.md
    ├── scripts/
    │   └── check-provisioning.py
    └── references/
        └── onboarding-checklist.md
---
name: onboarding-guide
description: >-
  Walk new hires through their first-week setup checklist. Use when a new
  employee asks about system access, required training, or onboarding steps.
---

## Instructions

1. Ask for the employee's name and start date if not already provided.
2. Run the `scripts/check-provisioning.py` script to verify their IT accounts are active.
3. Walk through the steps in the `references/onboarding-checklist.md` reference.
4. Follow up on any incomplete items.

To let the agent execute that script, provide a script_runner when creating the SkillsProvider and pass the provider to an agent:

import os
from pathlib import Path
from agent_framework import Agent, SkillsProvider
from agent_framework.foundry import FoundryChatClient
from azure.identity import AzureCliCredential

def my_runner(skill, script, args=None):
    """Run a file-based script as a subprocess."""
    import subprocess, sys
    script_path = Path(script.full_path)
    cmd = [sys.executable, str(script_path)]
    if isinstance(args, list):
        cmd.extend(args)
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=30, cwd=str(script_path.parent)
    )
    return result.stdout.strip()

# Discover skills from the 'skills' directory
skills_provider = SkillsProvider.from_paths(
    skill_paths=Path(__file__).parent / "skills",
    script_runner=my_runner,
)

endpoint = os.environ["FOUNDRY_PROJECT_ENDPOINT"]
deployment = os.environ.get("FOUNDRY_MODEL", "gpt-4o-mini")

client = FoundryChatClient(
    project_endpoint=endpoint,
    model=deployment,
    credential=AzureCliCredential(),
)

agent = Agent(
    client=client,
    instructions="You are a helpful HR self-service assistant.",
    context_providers=[skills_provider],
)

When a new hire asks about onboarding, the agent matches the request to the skill description, loads the instructions, and calls the provisioning script to verify account status.

The runner shown here is deliberately simple. In production, wrap it with sandboxing, resource limits, input validation, and logging.

Step 2: Bring in a class-based skill from a Python package

A few weeks later, the HR systems team publishes contoso-skills-hr-enrollment to your internal Python package index. Class-based skills package everything – metadata, instructions, resources, and scripts – inside a single Python class. They subclass ClassSkill and rely on @ClassSkill.resource and @ClassSkill.script decorators for automatic discovery:

# Inside the contoso-skills-hr-enrollment package
import json
from textwrap import dedent
from agent_framework import ClassSkill, SkillFrontmatter

class BenefitsEnrollmentSkill(ClassSkill):
    """Enroll employees in health, dental, or vision plans."""

    def __init__(self) -> None:
        super().__init__(
            frontmatter=SkillFrontmatter(
                name="benefits-enrollment",
                description=(
                    "Enroll an employee in health, dental, or vision plans. "
                    "Use when asked about benefits sign-up, plan options, or coverage changes."
                ),
            ),
        )

    @property
    def instructions(self) -> str:
        return dedent("""\
            Use this skill when an employee asks about enrolling in or changing their benefits.

            1. Read the available-plans resource to review current offerings and pricing.
            2. Confirm the plan the employee wants to enroll in.
            3. Use the enroll script to complete the enrollment.
        """)

    @property
    @ClassSkill.resource(description="Health, dental, and vision plan options with monthly pricing.")
    def available_plans(self) -> str:
        return dedent("""\
            ## Available Plans (2026)
            - Health: Basic HMO ($0/month), Premium PPO ($45/month)
            - Dental: Standard ($12/month), Enhanced ($25/month)
            - Vision: Basic ($8/month)
        """)

    @ClassSkill.script(description="Enrolls an employee in the specified benefit plan. Returns a JSON confirmation.")
    def enroll(self, employee_id: str, plan_code: str) -> str:
        success = HrClient.enroll_in_plan(employee_id, plan_code)
        return json.dumps({"success": success, "employee_id": employee_id, "plan_code": plan_code})

A bare @ClassSkill.resource decorator (no arguments) uses the method name as the resource name, converting underscores to hyphens. Pass name="..." and description="..." explicitly when you want different values. The same applies to @ClassSkill.script. Resources work as regular methods or @property descriptors – when combining the two, put @property first.

Now wire the class-based skill into the same provider that already serves the file-based onboarding guide. This is where source composition comes in – import BenefitsEnrollmentSkill from the installed package and combine the sources:

from contoso_skills_hr_enrollment import BenefitsEnrollmentSkill
from agent_framework import (
    AggregatingSkillsSource,
    DeduplicatingSkillsSource,
    FileSkillsSource,
    InMemorySkillsSource,
    SkillsProvider,
)

skills_provider = SkillsProvider(
    DeduplicatingSkillsSource(
        AggregatingSkillsSource([
            FileSkillsSource(
                Path(__file__).parent / "skills",    # file-based: onboarding guide
                script_runner=my_runner,
            ),
            InMemorySkillsSource([BenefitsEnrollmentSkill()]),  # class-based: benefits enrollment from internal package
        ])
    )
)

Here AggregatingSkillsSource merges the file-based and in-memory sources into a single stream, and DeduplicatingSkillsSource ensures that if two sources happen to supply a skill with the same name, the first one takes priority. The agent sees both skills in its system prompt and picks the right one based on the employee’s question – no routing logic on your side.

Step 3: Bridge the gap with an inline skill

The HR systems team is also building a time-off balance skill, but the package won’t be published to the internal index for another sprint. The underlying data is already reachable through the shared HrDatabase client your application uses elsewhere – it’s the same source the official skill will read from. Instead of waiting, you wrap it in an inline skill defined in your application code with InlineSkill:

import json
from textwrap import dedent
from agent_framework import InlineSkill, SkillFrontmatter

time_off_skill = InlineSkill(
    frontmatter=SkillFrontmatter(
        name="time-off-balance",
        description="Calculate an employee's remaining vacation and sick days. Use when asked about available time off or leave balances.",
    ),
    instructions=dedent("""\
        Use this skill when an employee asks how many vacation or sick days they have left.
        1. Ask for the employee ID if not already provided.
        2. Use the calculate-balance script to get the remaining balance.
        3. Present the result clearly, showing both used and remaining days.
    """),
)

@time_off_skill.script(description="Calculate remaining leave balance for an employee.")
def calculate_balance(employee_id: str, leave_type: str) -> str:
    # Temporary implementation - replace with the packaged skill when available
    total_days = HrDatabase.get_annual_allowance(employee_id, leave_type)
    days_used = HrDatabase.get_days_used(employee_id, leave_type)
    remaining = total_days - days_used
    return json.dumps({
        "employee_id": employee_id,
        "leave_type": leave_type,
        "total_days": total_days,
        "days_used": days_used,
        "remaining": remaining,
    })

Fold it into the existing provider alongside the other two skills:

skills_provider = SkillsProvider(
    DeduplicatingSkillsSource(
        AggregatingSkillsSource([
            FileSkillsSource(
                Path(__file__).parent / "skills",    # file-based: onboarding guide
                script_runner=my_runner,
            ),
            InMemorySkillsSource([
                BenefitsEnrollmentSkill(),            # class-based: benefits enrollment from internal package
                time_off_skill,                       # code-defined: temporary bridge
            ]),
        ])
    )
)

From the agent’s perspective, this skill looks identical to the file-based and class-based ones. When the official package eventually ships, swap out time_off_skill for the class-based version – nothing else changes.

InlineSkill also fits naturally when you need resources that execute logic at read time rather than serving static files, when skill definitions must be constructed at runtime from data (for example, a personalized skill per user session based on role or permissions), or when a skill needs to close over call-site state (local variables, closures) rather than resolve services through **kwargs.

Step 4: Add human approval for script execution

Some of these scripts carry real weight: check-provisioning hits production infrastructure, and enroll writes to the HR system. Before going live, you’ll want a human to sign off on each script call. Set require_script_approval=True on the provider:

skills_provider = SkillsProvider(
    DeduplicatingSkillsSource(
        AggregatingSkillsSource([
            FileSkillsSource(
                Path(__file__).parent / "skills",    # file-based: onboarding guide
                script_runner=my_runner,
            ),
            InMemorySkillsSource([
                BenefitsEnrollmentSkill(),            # class-based: benefits enrollment from internal package
                time_off_skill,                       # code-defined: temporary time-off balance bridge
            ]),
        ])
    ),
    require_script_approval=True,
)

With this flag set, the agent pauses whenever it wants to run a script and hands your application an approval request. You present it to a reviewer, collect a decision, and resume. If approved, execution proceeds normally. If rejected, the agent is told the call was declined and can adjust its response accordingly. For the complete approval-handling pattern, see Tool approval in the documentation.

Why this matters

Independent skill ownership. Different teams author and publish skills on their own schedule – as directories in a shared repo or as Python packages on your internal index – and source composition stitches them together without cross-team coordination.

Grow the agent one skill at a time. Each new skill is additive. You don’t refactor existing skills to accommodate new ones; the agent selects the right skill at runtime.

Prototype quickly, replace cleanly. InlineSkill lets you ship behavior the same day you need it. When the official package arrives, the swap is a one-line change – the agent can’t tell the difference.

Human oversight where it counts. Script approval inserts a review step before any script with side effects executes – a practical safeguard for sensitive environments.

Selective exposure from shared libraries. When your organization maintains a central skill repository but individual agents should only see a subset, FilteringSkillsSource handles it with a predicate:

from agent_framework import (
    DeduplicatingSkillsSource,
    FileSkillsSource,
    FilteringSkillsSource,
    SkillsProvider,
)

approved_skills = {"onboarding-guide", "benefits-enrollment"}

skills_provider = SkillsProvider(
    DeduplicatingSkillsSource(
        FilteringSkillsSource(
            FileSkillsSource(Path(__file__).parent / "all-skills"),
            predicate=lambda skill: skill.frontmatter.name in approved_skills,
        )
    )
)

Wrapping up

The Python SDK for Agent Skills now gives you three authoring options – file-based, code-defined, and class-based – along with composable source classes to combine, filter, and deduplicate them however you need. Start with a skill directory, pull in a packaged class from your internal index, fill gaps with inline code, and let the provider handle the rest. Add script approval when the stakes call for it.

The post Agent Skills for Python: File, Code, and Class – Composed in One Provider appeared first on Microsoft Agent Framework.

Read the whole story
alvinashcraft
28 minutes ago
reply
Pennsylvania, USA
Share this story
Delete
Next Page of Stories