Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Mistral releases a new open-source model for speech generation

1 Share
Mistral's new speech model can run on a smartwatch or a smartphone.
Read the whole story
alvinashcraft
3 hours ago
reply
Pennsylvania, USA
Share this story
Delete

Building shared coding guidelines for AI (and people too)

Coding guidelines and standards for agents need to be a little different—more explicit, demonstrative of patterns, and obvious.

The Missing Layer in Agentic AI


The day two problem

Imagine you deploy an autonomous AI agent to production. Day one is a success: the demos are fantastic; the reasoning is sharp. But before you hand over real authority, uncomfortable questions emerge.

What happens when the agent misinterprets a locale-specific decimal separator, turning a position of 15.500 ETH (15 and a half) into an order for 15,500 ETH (15 thousand) on leverage? What if a dropped connection leaves it looping on stale state, draining your LLM request quota in minutes?

What if it makes a perfect decision, but the market moves just before execution? What if it hallucinates a parameter like force_execution=True—do you sanitize it or crash downstream? And can it reliably ignore a prompt injection buried in a web page?

Finally, if an API call times out without acknowledgment, do you retry and risk duplicating a $50K transaction, or drop it?

When these scenarios occur, megabytes of prompt logs won’t explain the failure. And adding “please be careful” to the system prompt acts as a superstition, not an engineering control.

Why a smarter model is not the answer

I encountered these failure modes firsthand while building an autonomous system for live financial markets. It became clear that these were not model failures but execution boundary failures. While RL-based fine-tuning can improve reasoning quality, it cannot solve infrastructure realities like network timeouts, race conditions, or dropped connections.

The real issues are architectural gaps: contract violations, data integrity issues, context staleness, decision-execution gaps, and network unreliability.

These are infrastructure problems, not intelligence problems.

While LLMs excel at orchestration, they lack the “kernel boundary” needed to enforce state integrity, idempotency, and transactional safety where decisions meet the real world.

An architectural pattern: The Decision Intelligence Runtime

Consider modern operating system design. OS architectures separate “user space” (unprivileged computation) from “kernel space” (privileged state modification). Processes in user space can perform complex operations and request actions but cannot directly modify system state. The kernel validates every request deterministically before allowing side effects.

AI agents need the same structure. The agent interprets context and proposes intent, but the actual execution requires a privileged deterministic boundary. This layer, the Decision Intelligence Runtime (DIR), separates probabilistic reasoning from real-world execution.

The runtime sits between agent reasoning and external APIs, maintaining a context store: a centralized, immutable record that makes the runtime the single source of truth while agents operate only on temporary snapshots. It receives proposed intents, validates them against hard engineering rules, and handles execution. Ideally, an agent should never directly manage API credentials or “own” the connection to the external world, even for read-only access. Instead, the runtime should act as a proxy, providing the agent with an immutable context snapshot while keeping the actual keys in the privileged kernel space.

Figure 1: High-level design (HLD) of the Decision Intelligence Runtime, illustrating the separation of user space reasoning from kernel space execution

Bringing engineering rigor to probabilistic AI requires implementing five familiar architectural pillars.

Although several examples in this article use a trading simulation for concreteness, the same structure applies to healthcare workflows, logistics orchestration, and industrial control systems.

DIR versus existing approaches

The landscape of agent guardrails has expanded rapidly. Frameworks like LangChain and LangGraph operate in user space, focusing on reasoning orchestration, while tools like Anthropic’s Constitutional AI and Pydantic schemas validate outputs at inference time. DIR, by contrast, operates at the execution boundary, the kernel space, enforcing contracts, business logic, and audit trails after reasoning is complete.

The approaches are complementary; DIR is intended as a safety layer for mission-critical systems.

1. Policy as a claim, not a fact

In a secure system, external input is never trusted by default. The output of an AI agent is exactly that: external input. The proposed architecture treats the agent not as a trusted administrator, but as an untrusted user submitting a form. Its output is structured as a policy proposal—a claim that it wants to perform an action, not an order that it will perform it. This is the start of a Zero Trust approach to agentic actions.

Here is an example of a policy proposal from a trading agent:

proposal = PolicyProposal(
    dfid="550e8400-e29b-41d4-a716-446655440000", # Trace ID (see Sec 5)
    agent_id="crypto_position_manager_01",
    policy_kind="TAKE_PROFIT",
    params={
        "instrument": "ETH-USD",
        "quantity": 0.5,
        "execution_type": "MARKET"
    },
    reasoning="Profit target of +3.2% hit (Threshold: 3.0%). Market momentum slowing.",
    confidence_score=0.92
)

2. Responsibility contract as code

Prompts are not permissions. Just as traditional apps rely on role-based access control, agents require a strict responsibility contract residing in the deterministic runtime. This layer acts as a firewall, validating every proposal against hard engineering rules: schema, parameters, and risk limits. Crucially, this check is deterministic code, not another LLM asking, “Is this dangerous?” Whether the agent hallucinates a capability or obeys a malicious prompt injection, the runtime simply enforces the contract and rejects the invalid request.

Real-world example: A trading agent misreads a locale-specific separator and attempts to execute place_order(symbol='ETH-USD', quantity=15500), a catastrophic position-sizing error. The contract rejects it immediately:

ERROR: Policy rejected. Proposed order value exceeds hard limit.
Request: ~40000000 USD (15500 ETH)
Limit: 50000 USD (max_order_size_usd)

The agent’s output is discarded; the human is notified. No API call, no cascading market impact.

Here is the contract that prevented this:

# agent_contract.yaml
agent_id: "crypto_position_manager_01"
role: "EXECUTOR"
mission: "Manage news-triggered ETH positions. Protect capital while seeking alpha."
version: "1.2.0"                  # Immutable versioning for audit trails
owner: "jane.doe@example.com"     # Human accountability
effective_from: "2026-02-01"

# Deterministic Boundaries (The 'Kernel Space' rules)
permissions:
  allowed_instruments: ["ETH-USD", "BTC-USD"]
  allowed_policy_types: ["TAKE_PROFIT", "CLOSE_POSITION", "REDUCE_SIZE", "HOLD"]
  max_order_size_usd: 50000.00

# Safety & Economic Triggers (Intervention Logic)
safety_rules:
  min_confidence_threshold: 0.85      # Don't act on low-certainty reasoning
  max_drawdown_limit_pct: 4.0         # Hard stop-loss enforced by Runtime
  wake_up_threshold_pnl_pct: 2.5      # Cost optimization: ignore noise
  escalate_on_uncertainty: 0.70       # If confidence < 70%, ask human
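To make the "deterministic code, not another LLM" point concrete, here is a minimal sketch of contract enforcement in Python. The function name, dictionary shape, and the 2,600 USD reference price are assumptions for illustration; a real runtime would load the limits from agent_contract.yaml rather than hardcode them.

```python
# Illustrative contract limits; in practice these come from agent_contract.yaml.
CONTRACT = {
    "allowed_instruments": {"ETH-USD", "BTC-USD"},
    "allowed_policy_types": {"TAKE_PROFIT", "CLOSE_POSITION", "REDUCE_SIZE", "HOLD"},
    "max_order_size_usd": 50_000.00,
    "min_confidence_threshold": 0.85,
}

def validate_proposal(proposal: dict, price_usd: float) -> tuple:
    """Pure, deterministic checks -- no LLM in the loop."""
    params = proposal["params"]
    if proposal["policy_kind"] not in CONTRACT["allowed_policy_types"]:
        return False, "policy type not permitted"
    if params["instrument"] not in CONTRACT["allowed_instruments"]:
        return False, "instrument not permitted"
    order_value = params["quantity"] * price_usd
    if order_value > CONTRACT["max_order_size_usd"]:
        return False, f"order value {order_value:.0f} USD exceeds hard limit"
    if proposal["confidence_score"] < CONTRACT["min_confidence_threshold"]:
        return False, "confidence below threshold"
    return True, "accepted"

# The 15,500 ETH fat-finger from the example above is rejected regardless of
# how confident the agent's reasoning sounds (2600 USD/ETH is a stand-in price):
bad = {
    "policy_kind": "TAKE_PROFIT",
    "params": {"instrument": "ETH-USD", "quantity": 15500, "execution_type": "MARKET"},
    "confidence_score": 0.92,
}
ok, reason = validate_proposal(bad, price_usd=2600.0)
```

The check is boring on purpose: a prompt injection or hallucinated capability produces an invalid proposal, and invalid proposals simply bounce off the boundary.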

3. JIT (just-in-time) state verification

This mechanism addresses the classic race condition where the world changes between the moment you check it and the moment you act on it. When an agent begins reasoning, the runtime binds its process to a specific context snapshot. Because LLM inference takes time, the world will likely change before the decision is ready. Right before executing the API call, the runtime performs a JIT verification, comparing the live environment against the original snapshot. If the environment has shifted beyond a predefined drift envelope, the runtime aborts the execution.

Figure 2: JIT verification catches stale decisions before they reach external systems.

The drift envelope is configurable per context field, allowing fine-grained control over what constitutes an acceptable change:

# jit_verification.yaml
jit_verification:
  enabled: true
  
  # Maximum allowed drift per field before aborting execution
  drift_envelope:
    price_pct: 2.0           # Abort if price moved > 2%
    volume_pct: 15.0         # Abort if volume changed > 15%
    position_state: strict   # Any change = abort
  
  # Snapshot expiration
  max_context_age_seconds: 30
  
  # On drift detection
  on_drift_exceeded:
    action: "ABORT"
    notify: ["ops-channel"]
    retry_with_fresh_context: true
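The drift check itself can be sketched in a few lines, mirroring the fields in jit_verification.yaml above. The function name and snapshot shape are assumptions; a production runtime would also enforce max_context_age_seconds and the on_drift_exceeded actions.

```python
# Percentage-drift fields from the config above; 'position_state' is strict.
DRIFT_ENVELOPE = {"price_pct": 2.0, "volume_pct": 15.0}

def within_envelope(snapshot: dict, live: dict) -> bool:
    """Compare the live environment against the snapshot the agent reasoned over."""
    for field, max_pct in DRIFT_ENVELOPE.items():
        key = field.removesuffix("_pct")
        drift = abs(live[key] - snapshot[key]) / snapshot[key] * 100
        if drift > max_pct:
            return False  # world moved too far while the LLM was thinking
    # Strict fields: any change at all aborts execution.
    if live["position_state"] != snapshot["position_state"]:
        return False
    return True
```

If this returns False, the runtime aborts the call and (per the config) may retry with a fresh context snapshot.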

4. Idempotency and transactional rollback

This mechanism is designed to mitigate execution chaos and infinite retry loops. Before making any external API call, the runtime hashes the deterministic decision parameters into a unique idempotency key. If a network connection drops or an agent gets confused and attempts to execute the exact same action multiple times, the runtime catches the duplicate key at the boundary.

The key is computed as:

IdempotencyKey = SHA256(DFID + StepID + CanonicalParams)

Where DFID is the Decision Flow ID, StepID identifies the specific action within a multistep workflow, and CanonicalParams is a sorted representation of the action parameters.

Critically, the context hash (snapshot of the world state) is deliberately excluded from this key. If an agent decides to buy 10 ETH and the network fails, it might retry 10 seconds later. By then, the market price (context) has changed. If we included the context in the hash, the retry would generate a new key (SHA256(Action + NewContext)), bypassing the idempotency check and causing a duplicate order. By locking the key to the Flow ID and Intent params only, we ensure that a retry of the same logical decision is recognized as a duplicate, even if the world around it has shifted slightly.
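The key derivation above can be sketched directly in Python. Canonicalizing the params as sorted, compact JSON is one reasonable choice; the article's reference implementation may canonicalize differently.

```python
import hashlib
import json

def idempotency_key(dfid: str, step_id: str, params: dict) -> str:
    """SHA256 over the flow ID, step ID, and canonicalized (sorted) params.

    The context snapshot is deliberately excluded, so a retry of the same
    logical decision collides with the original even if the market moved."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{dfid}|{step_id}|{canonical}".encode()).hexdigest()
```

Because the params are sorted before hashing, two proposals that differ only in dictionary key order map to the same key, while any change to the actual intent (say, quantity) produces a new one.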

Furthermore, when an agent makes a multistep decision, the runtime tracks each step. If one step fails, it knows how to perform a compensation transaction to roll back what was already done, instead of hoping the agent will figure it out on the fly.

A DIR does not magically provide strong consistency; it makes the consistency model explicit: where you require atomicity, where you rely on compensating transactions, and where eventual consistency is acceptable.
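The compensation logic for multistep decisions can be sketched as a saga-style loop. The step shape and function name are hypothetical; the point is that rollback lives in deterministic runtime code, not in the agent's reasoning.

```python
def run_workflow(steps: list) -> bool:
    """Run steps in order; on failure, unwind completed steps in reverse.

    Each step is a dict with 'execute' and 'compensate' callables (an
    illustrative shape, not a prescribed interface)."""
    done = []
    for step in steps:
        try:
            step["execute"]()
            done.append(step)
        except Exception:
            # Compensate only the steps that actually completed, newest first.
            for completed in reversed(done):
                completed["compensate"]()
            return False
    return True
```

A failed third step therefore triggers compensations for steps two and one, in that order, rather than hoping the agent notices the partial state and improvises a cleanup.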

5. DFID: From observability to reconstruction

Distributed tracing is not a new idea. The practical gap in many agentic systems is that traces rarely capture the artifacts that matter at the execution boundary: the exact context snapshot, the contract/schema version, the validation outcome, the idempotency key, and the external receipt.

The Decision Flow ID (DFID) is intended as a reconstruction primitive—one correlation key that binds the minimum evidence needed to answer critical operational questions:

  • Why did the system execute this action? (policy proposal + validation receipt + contract/schema version)
  • Was the decision stale at execution time? (context snapshot + JIT drift report)
  • Did the system retry safely or duplicate the side effect? (idempotency key + attempt log + external acknowledgment)
  • Which authority allowed it? (agent identity + registry/contract snapshot)

In practice, this turns a postmortem from “the agent traded” into “this exact intent was accepted under these deterministic gates against this exact snapshot, and produced this external receipt.” The goal is not to claim perfect correctness; it is to make side effects explainable at the level of inputs and gates, even when the reasoning remains probabilistic.

At the hierarchical level, DFIDs form parent-child relationships. A strategic intent spawns multiple child flows. When multistep workflows fail, you reconstruct not just the failing step but the parent mandate that authorized it.

Figure 3: Hierarchical Decision Flow IDs enable full process reconstruction across multi-agent interactions.

In practice, this level of traceability is not about storing prompts—it is about storing structured decision telemetry.
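One way to picture that telemetry is as a structured record keyed by DFID, with children linking back to the parent mandate. All field names here are illustrative, not the project's actual schema.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

@dataclass
class DecisionFlow:
    """Reconstruction record: one row of evidence per decision flow."""
    dfid: str
    parent_dfid: Optional[str] = None  # links a child flow to its mandate
    contract_version: str = ""         # which responsibility contract applied
    context_snapshot_id: str = ""      # what the agent actually saw
    idempotency_key: str = ""          # proof of safe retry behavior
    external_receipt: str = ""         # acknowledgment from the outside world

    def spawn_child(self) -> "DecisionFlow":
        """A strategic intent spawns child flows that inherit its lineage."""
        return DecisionFlow(dfid=f"{self.dfid}.{uuid.uuid4().hex[:8]}",
                            parent_dfid=self.dfid)
```

Querying such records answers "why did this execute" from the stored gates and snapshots, without replaying any conversational history.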

In one trading simulation, each position generated a decision flow that could be queried like any other system artifact. This allowed inspection of the triggering news signal, the agent’s justification, intermediate decisions (such as stop adjustments), the final close action, and the resulting PnL, all tied to a single simulation ID. Instead of replaying conversational history, this approach reconstructed what happened at the level of state transitions and executable intents.

SELECT position_id
     , instrument
     , entry_price
     , initial_exposure
     , news_full_headline
     , news_score
     , news_justification
     , decisions_timeline
     , close_price
     , close_reason
     , pnl_percent
     , pnl_usd
  FROM position_audit_agg_v
 WHERE simulation_id = 'sim_2026-02-24T11-20-18-516762+00-00_0dc07774';
Figure 4: Example of structured decision telemetry. Each row links context, reasoning, intermediate actions, and financial outcome for a single simulation run.

This approach is fundamentally different from prompt logging. The agent’s reasoning becomes one field among many—not the system of record. The system of record is the validated decision and its deterministic execution boundary.

From model-centric to execution-centric AI

The industry is shifting from model-centric AI, measuring success by reasoning quality alone, to execution-centric AI, where reliability and operational safety are first-class concerns.

This shift comes with trade-offs. Implementing deterministic control introduces higher latency, lower throughput, and stricter schema discipline. For simple summarization tasks, this overhead is unjustified. But for systems that move capital or control infrastructure, where a single failure outweighs any efficiency gain, these are acceptable costs. A duplicate $50K order is far more expensive than a 200 ms validation check.

This architecture is not a single software package. Much like how Model-View-Controller (MVC) is a pervasive pattern without being a single importable library, DIR is a set of engineering principles: separation of concerns, zero trust, and state determinism, applied to probabilistic agents. Treating agents as untrusted processes is not about limiting their intelligence; it is about providing the safety scaffolding required to use that intelligence in production.

As agents gain direct access to capital and infrastructure, a runtime layer will become as standard in the AI stack as a transaction manager is in banking. The question is not whether such a layer is necessary but how we choose to design it.


This article provides a high-level introduction to the Decision Intelligence Runtime and its approach to production resiliency and operational challenges. The full architectural specification, repository of context patterns, and reference implementations are available as an open source project on GitHub.




Why ICE Is Allowed to Impersonate Law Enforcement

“There's no accountability,” one expert tells WIRED of ICE’s ability to lie to the public. "The consequence of this is that it’s going to be a systemic harm across all law enforcement.”

Getting started with GitLab feature flags in Python


You've spent weeks building a new feature. It passes every test, the code review is done, and it's ready to ship. So you deploy it and within an hour your inbox is full of bug reports. The feature works fine for most users, but something about production traffic you didn't anticipate is causing failures for a subset of them. Now you're scrambling to roll back, writing incident reports, and managing the PR fallout.

Feature flags prevent exactly this. They let you decouple deployment from release: push code to production whenever it's ready, then control who actually sees the new feature by flipping a toggle in GitLab. Start with your QA team using a "User IDs" strategy, then switch to a 10% "Percent rollout," then flip to "All users" when you're confident. If something goes wrong at any point, you turn it off in seconds. No redeployment, no hotfix, no bad press.

This tutorial walks through a working Flask application that reads GitLab feature flags through the Unleash Python SDK. A complete, runnable version of the code is available at gitlab.com/omid-blogs/gitlab-feature-flags-demo. Clone it into your own group or workspace, and you'll have live feature flag control in minutes.

By the end, you'll understand how the integration works and have a template you can drop straight into your own projects.

What you'll need

  • A GitLab project (Free, Premium, or Ultimate) with feature flags enabled. This is where you'll create and manage your flags. To enable it, go to your project and navigate to Settings > General > Visibility, project features, permissions and make sure the Feature Flags toggle is on.
  • The demo repo forked into your own GitLab namespace, then cloned locally

How GitLab feature flags work under the hood

GitLab exposes an Unleash-compatible API for every project. That means any Unleash client SDK (Go, Ruby, Python, JavaScript, and more) can connect directly to GitLab without a separate Unleash server.

On startup, the SDK fetches all flag definitions, then re-fetches on a configurable interval (the demo uses 15 seconds). Every call to is_enabled() evaluates locally against the cached configuration with no network call per flag check. That makes flag evaluation near-instant and resilient to transient network issues.

Here are the steps to take to integrate GitLab feature flags into a Python Flask app using the Unleash SDK.

1. Set up your GitLab project and clone the demo

You'll need:

  • Your own GitLab project to host the feature flags
  • The demo repo cloned locally to run the app

Fork or clone the demo repo

Go to gitlab.com/omid-blogs/gitlab-feature-flags-demo and fork it into your own GitLab namespace. This gives you a personal copy of the project where you can manage your own feature flags. Then clone it locally and open it in your favorite IDE:

git clone https://gitlab.com/<your-namespace>/gitlab-feature-flags-demo.git
cd gitlab-feature-flags-demo

What's inside the repo

.
├── app.py                # Flask app + Unleash SDK integration
├── requirements.txt      # Python dependencies
├── .env.example          # Template for required environment variables
├── .gitignore
├── templates/
│   └── index.html        # Web UI template
└── static/
    └── styles.css        # Styling

2. Create your feature flags in GitLab

Open your own GitLab project and navigate to Deploy > Feature Flags, then click New feature flag. Create the following four flags, setting each status to Active with a strategy of All users.

  • dark_mode — Switches the page to a dark color scheme.
  • holiday_banner — Shows a festive banner at the top of the page.
  • new_layout — Switches the card grid to a single-column layout.
  • fun_fonts — Swaps the body font to a playful handwritten style.

All four feature flags in the GitLab UI

Tip: A flag must be Active and have at least one strategy to evaluate as enabled. Without a strategy, the SDK treats the flag as disabled even if it's marked "Active."

Understanding strategies

"All users" is a simple on/off toggle, but GitLab supports several more out of the box:

  • Percent rollout (recommended): Gradually roll out to a percentage of users based on user ID, session ID, or randomly. This is the most flexible option and the one to reach for first.
  • Percent of users: Enable for a percentage of authenticated users. Less flexible than Percent rollout since it only works with logged-in users.
  • User IDs: Enable for specific user IDs only, great for internal testing with a named group.
  • User list: Enable for a predefined list of users.
  • All users: Enable for everyone.

Strategies are where feature flags get really powerful. Start with your QA team using a User IDs strategy, switch to a 10% Percent rollout, then flip to All users when you're confident. All from the GitLab UI, no code changes required.

3. Get your Unleash credentials

On the Feature Flags page, click Configure in the top-right corner. You'll see two values:

  • API URL: https://gitlab.com/api/v4/feature_flags/unleash/<your-project-id>
  • Instance ID: A unique token scoped to your project

Copy both values. You'll pass them to the app as environment variables. Note that the Instance ID is read-only: it can only fetch flag state, not modify anything. Still treat it as a secret to prevent information disclosure.

Configure panel shows your API URL and Instance ID

4. Set up the project locally

The README has the full setup walkthrough, but the short version is:

pip install -r requirements.txt

Then set your credentials. You can do this one of two ways:

Option A: Using the .env file (recommended)

The repo includes a .env.example file. Copy it and fill in your values:

cp .env.example .env

Open .env in your editor and replace the placeholder values with your credentials from Step 3:

UNLEASH_URL=https://gitlab.com/api/v4/feature_flags/unleash/<your-project-id>
UNLEASH_INSTANCE_ID=<your-instance-id>
UNLEASH_APP_NAME=production

Then export them:

export $(grep -v '^#' .env | xargs)

Option B: Export directly in your terminal

export UNLEASH_URL="https://gitlab.com/api/v4/feature_flags/unleash/<your-project-id>"
export UNLEASH_INSTANCE_ID="<your-instance-id>"
export UNLEASH_APP_NAME="production"

Never commit your .env file to version control. The .gitignore in the repo already excludes it, but it's worth knowing why: your Instance ID is a secret and should stay out of git history.

Three environment variables drive the entire integration:

Variable              Required  Description                                 Default
UNLEASH_URL           Yes       GitLab Unleash API URL for your project     (none)
UNLEASH_INSTANCE_ID   Yes       Instance ID from the Configure panel        (none)
UNLEASH_APP_NAME      No        Environment name; matches flag strategies   production

UnleashClient is the key dependency. It's the official Unleash Python SDK that handles polling, caching, and local flag evaluation so you don't have to build any of that yourself.

5. Understand the application

Before running it, read through app.py. Here are the key patterns worth understanding so you can replicate them in your own projects.

Initializing the SDK

unleash_client = UnleashClient(
    url=UNLEASH_URL,
    app_name=UNLEASH_APP_NAME,
    instance_id=UNLEASH_INSTANCE_ID,
    refresh_interval=15,
    metrics_interval=60,
)

unleash_client.initialize_client()

No personal access tokens, no credentials hardcoded in source. The app exits immediately with a clear error message if either required variable is missing.
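The fail-fast startup check described here might look like the following sketch. load_config is a hypothetical helper, not the demo's actual function; it mirrors the variable names from the table in Step 4.

```python
import sys

def load_config(env: dict) -> dict:
    """Exit immediately with a clear error if a required variable is missing."""
    url = env.get("UNLEASH_URL")
    instance_id = env.get("UNLEASH_INSTANCE_ID")
    if not url or not instance_id:
        sys.exit("ERROR: UNLEASH_URL and UNLEASH_INSTANCE_ID must be set. "
                 "See .env.example.")
    return {
        "url": url,
        "instance_id": instance_id,
        # Optional variable falls back to the documented default.
        "app_name": env.get("UNLEASH_APP_NAME", "production"),
    }

# In the app itself you would call it with the real environment:
# config = load_config(dict(os.environ))
```

Failing at startup beats failing on the first request: a misconfigured deployment never serves traffic with silently disabled flags.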

Checking a flag

def is_flag_enabled(flag_name):
    return unleash_client.is_enabled(flag_name)

Because the SDK caches flag definitions in memory, is_enabled() returns instantly with no network roundtrip.

Gating real behavior behind flags

The index route builds a feature dictionary, evaluating each flag and passing the results to the template:

features = {}
for flag_name, config in feature_configs.items():
    features[flag_name] = {
        **config,
        'enabled': is_flag_enabled(flag_name)
    }

return render_template('index.html', features=features)

The template uses those values to conditionally apply CSS classes and render UI elements: dark_mode toggles a body class, while holiday_banner shows or hides a banner element entirely. Open templates/index.html to see how it's wired together.

Note that index.html also auto-refreshes every 30 seconds via a small JavaScript snippet, so you can watch flag changes take effect without manually reloading.

Passing user context for targeted strategies

When you're ready to move beyond All users and use Percentage rollouts or User targeting, pass a context object to is_enabled():

unleash_client.is_enabled(
    'new_layout',
    context={'userId': current_user.id}
)

The SDK handles consistent hashing for percentage rollouts automatically. No math required on your end.

6. Run the app

python3 app.py

Visit http://localhost:8080. You should see all four feature cards showing their current enabled/disabled state.

Demo app with all four feature flags disabled

7. Toggle flags in real time

Go back to Deploy > Feature Flags in GitLab and toggle one of the flags. Try dark_mode or holiday_banner for the most visible effect. Wait about 15 seconds, then reload the page. The card updates to reflect the new state, and if you toggled dark_mode on, the entire page switches to a dark theme. Toggle it back off, wait, reload, and it snaps back instantly.

This is the core value of feature flags: You control application behavior from GitLab without touching code or redeploying.

Demo app with two feature flags toggled off

Why the Unleash SDK instead of the GitLab REST API?

For an app that evaluates flags on every request, the SDK is the clear winner. It's faster, simpler, and the Instance ID it uses carries no permissions beyond reading flag state. That's a much smaller security footprint than a PAT.

  • Authentication: The REST API requires a Personal Access Token with broader project permissions; the SDK uses only the Instance ID (read-only, scoped to flag state, no PAT needed).
  • Flag evaluation: The REST API needs a network call per check; the SDK evaluates locally from cached config.
  • Latency per check: A network round-trip versus near zero (in-memory).
  • Strategy support: The REST API requires manual parsing; the SDK has built-in support for percentage rollouts and user ID targeting.
  • Rate limits: REST calls are subject to GitLab.com API rate limits; the SDK uses a single polling connection per app instance.

Troubleshooting

  • App exits with "ERROR: UNLEASH_URL and UNLEASH_INSTANCE_ID...": Set both environment variables; see .env.example.
  • All flags show as disabled: Check that the flags exist in GitLab and have an active strategy, then wait 15 seconds for the SDK to refresh.
  • Changes in GitLab don't appear: The SDK polls every 15 seconds; reload the page after a short wait.
  • A local IP address doesn't work: Your OS firewall may be blocking port 8080; use localhost:8080 instead.

A note on rate limits in production

The 15-second polling interval works well for development and small deployments. With all clients polling from the same IP, GitLab.com can support around 125 clients at a 15-second interval before hitting rate limits. If you're building a larger production app, consider running an Unleash Proxy in front of your clients. It batches requests to GitLab on behalf of all your instances and dramatically reduces upstream traffic.

Security considerations

  1. debug=False is already set in the demo: Keep it that way. Flask's debug mode exposes an interactive debugger that allows remote code execution.
  2. Keep dependencies updated: The requirements.txt pins specific versions. Enable GitLab Dependency Scanning in your CI/CD pipeline to stay on top of vulnerabilities.
  3. Use environment variables for credentials: Never hardcode the Instance ID or any tokens in source code. The demo's .env.example shows the right pattern.
  4. The Instance ID is read-only: It can only fetch flag state, not modify it. Still treat it as a secret.

Summary

This tutorial covered the full lifecycle of integrating GitLab feature flags into a Python application: creating flags with the right strategies, retrieving Unleash credentials, initializing the SDK, evaluating flags locally in Flask, and toggling behavior in real time without a redeployment.

The entire integration requires one dependency (UnleashClient), three environment variables, and a single method call (is_enabled()). No separate Unleash server, no personal access tokens, no complex infrastructure.

Feature flags are one of the most practical tools available for reducing deployment risk. The ability to instantly disable a broken feature, or progressively roll one out from a targeted user group to a percentage to everyone, without touching a deployment pipeline, delivers outsized value for minimal setup. The demo repo provides a working foundation to fork and adapt for any project.



How to Use GPO to Install EXE Files on Multiple Computers

Group Policy is a powerful tool for managing Windows Environments, and we’ve previously discussed how you can deploy an MSI using GPOs. [...]