Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Beyond the benchmark: Advancing security at AI speed


Every vulnerability has two clocks running. One belongs to the defender racing to find it; the other to the cyberattacker hoping to find it first. For as long as software has existed, those clocks have favored the attacker, because modern code is vast, interconnected, and changing every day, while security reviews happen at fixed moments in time. The space between “code shipped” and “code reviewed” is where risk quietly accumulates. 

A few months ago, we set out to reshape that timing. We introduced codename MDASH, Microsoft Security’s multi-model agentic scanning system, built to discover, validate, and help remediate software vulnerabilities end-to-end. The goal was straightforward to articulate and hard to execute: take AI-powered vulnerability discovery and remediation capabilities from a research project and turn them into production-grade defense at enterprise scale. That meant going beyond pattern matching and building a system that could reason through the complexity of proprietary code and platforms like Windows, Hyper-V, Azure, and identity systems.

Rather than rely on any single model, the system orchestrates a panel of specialized AI agents, each with its own role in a structured pipeline, so security teams can surface hard bugs quickly and systematically, expanding the reach of human-led review. Findings flow into Microsoft Defender workflows, where they can be prioritized alongside threat intelligence and runtime signals, and into GitHub and Azure DevOps pipelines, where they can be validated and remediated. The result is a closed loop connecting discovery, validation, proof, and fix across the Microsoft stack.

When we introduced the system, it topped a leading industry benchmark. That was the announcement, and the starting line. In the weeks since, the system has moved from early capability validation into active use by Microsoft engineering teams across Windows, Azure, and identity systems, applied as part of real security workflows rather than isolated testing environments. This post explores what we have built since, the lessons we’ve learned from turning research into a production-quality system, and the opportunities ahead as we focus on delivering real-world security impact.

From the lab into the pipeline

The most meaningful change since launch is where the system is being used. Engineering teams across Windows, Azure, and identity systems are now applying the system as part of their security workflows, running it alongside existing processes and reviews, targeting it at the surfaces that are hardest to audit manually and have historically required the most effort to cover. The goal is to use AI-driven analysis to go deeper, earlier, and across a broader set of targets than traditional approaches allow. 

The surfaces in scope are among the most complex Microsoft builds: 

  • Windows: the kernel, Hyper-V, and the networking stack
  • Azure: virtualization and core infrastructure services
  • Identity: Active Directory Domain Services

These are not easy targets. They are the deep layers of the platform, components where reasoning about code requires understanding kernel calling conventions, object lifetime invariants, and trust boundaries that no language model encountered in its training data. A single overlooked flaw at this layer can have outsized consequences. The system is not replacing security teams working at this depth. It is giving them meaningful reach into territory they could not cover alone.

“Codename MDASH has enabled our security team to perform vulnerability hunting at the scale of Windows with a much higher depth of analysis than was previously possible.”

—Windows security team (kernel, Hyper-V, networking stack) 

This is also where the system fits into Microsoft’s existing DevSecOps story. It is not a standalone scanner bolted onto the side of engineering—it plugs into the tools teams already use. Validated findings surface as code scanning alerts in GitHub Advanced Security (GHAS), appearing inline on pull requests and in the repository’s security tab so engineers triage them in the same place they review code. The same findings flow into Azure DevOps, where they can gate pipeline builds and open work items for remediation, and into Microsoft Defender, where they are prioritized alongside threat intelligence and runtime signals. Discovery is only the entry point: because a finding travels the same path as every other code change—with an owner, a pull request, and a fix on the other side—it lands as actionable engineering work rather than stalling in a backlog. The effect is to strengthen the software development lifecycle from the inside, not to add one more tool for teams to tend.

This month’s set of discoveries

The measure of any security system is what it catches. This month’s Patch Tuesday cohort includes a set of vulnerability discoveries across the Windows ecosystem (Hyper-V, the Windows kernel, Active Directory Domain Services, Remote Desktop Client, HTTP.sys, DNS Client, and DHCP Client), spanning exploit classes including remote code execution, elevation of privilege, and information disclosure.

The range of attack vectors is significant. Several findings involve high-severity remote code execution vulnerabilities in core infrastructure layers that are difficult to scrutinize using manual approaches alone. Others surface more subtle issues, such as privilege escalation through DNS components and information disclosure through DHCP client behavior, that reflect the power of code-centric reasoning applied across many targets simultaneously. Each was identified before exploitation, in areas of the codebase that would traditionally demand significant manual effort to review. 

CVE ID | Component | Type | Exploit Class | CVSS (Common Vulnerability Scoring System)
CVE-2026-45607 | Windows Hyper-V | Out-of-bounds Read | Remote Code Execution | 8.4
CVE-2026-45641 | Windows Hyper-V | Type Confusion | Remote Code Execution | 8.4
CVE-2026-47652 | Windows Hyper-V | Heap-based Buffer Overflow | Remote Code Execution | 8.2
CVE-2026-41108 | Windows DNS Client | Heap-based Buffer Overflow | Elevation of Privilege | 7.0
CVE-2026-45608 | Windows DHCP Client | Out-of-bounds Read | Information Disclosure | 6.8
CVE-2026-45634 | Windows DHCP Client | Out-of-bounds Read | Information Disclosure | 5.5
CVE-2026-45648 | Windows Active Directory Domain Services | Stack-based Buffer Overflow | Remote Code Execution | 8.8
CVE-2026-47289 | Remote Desktop Client | Heap-based Buffer Overflow | Remote Code Execution | 8.8
CVE-2026-45657 | Windows Kernel | Use-after-free | Remote Code Execution | 9.8
CVE-2026-47291 | HTTP.sys | Integer Overflow | Remote Code Execution | 9.8

Beyond the headline: What the engineering work taught us 

How the system improved

To improve a system, you have to measure it. CyberGym, an industry benchmark built on 1,507 real-world vulnerabilities, gave us a way to iterate quickly and see exactly where we were getting better.

Since the initial announcement, we evolved the system significantly: new capabilities added, and the entire pipeline rebuilt based on customer feedback, CyberGym evaluation results, and extensive internal testing. The latest version has achieved 96.5% (any crash) on CyberGym, including both target and non-target vulnerabilities.

The gains were concentrated in the earliest stages of the pipeline: prepare and scan. These are foundational. Improvements there directly raise the quality of everything downstream, such as validation and proof generation, where precise understanding of the codebase and accurate exploration are critical. Specifically: 

  • Sharper scoping. The system now more clearly distinguishes the code under audit from contextual code, defining dependencies based on their role rather than their origin. Later stages can focus on what matters, improving both efficiency and signal quality. 
  • More comprehensive threat modeling. The system has a fuller view of a target repository’s attack surface, particularly in identifying entry points for untrusted input. This includes improved recognition of maintainer-defined entry points, such as fuzz harnesses, that may reside outside the primary codebase but are critical for assessing reachability. The system is better positioned to determine which findings are genuinely exploitable. 
  • A more reliable call graph. The correctness and robustness of the call graph, a core structure used across multiple pipeline stages, has been strengthened, improving the system’s ability to reason about code interactions, especially for reachability analysis during validation. 
  • Smarter routing to specialized agents. A new routing mechanism filters out clearly irrelevant agents while preserving strong candidates, reducing unnecessary computation while maintaining coverage and allowing the system to scale across diverse targets. 
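As a rough illustration of the reachability question a call graph answers during validation, the sketch below runs a breadth-first search from an untrusted entry point. It is not the system's implementation: the function name `reachableFrom`, the map-based graph representation, and every function name in the sample graph are assumptions made up for this example.

```dart
import 'dart:collection';

/// Returns every function reachable from [entryPoints] in [callGraph],
/// where [callGraph] maps a caller to the functions it calls directly.
Set<String> reachableFrom(
  Map<String, List<String>> callGraph,
  Iterable<String> entryPoints,
) {
  final seen = <String>{...entryPoints};
  final queue = Queue<String>.of(seen);
  while (queue.isNotEmpty) {
    final caller = queue.removeFirst();
    for (final callee in callGraph[caller] ?? const <String>[]) {
      // Set.add returns true only the first time a function is seen.
      if (seen.add(callee)) queue.add(callee);
    }
  }
  return seen;
}

void main() {
  // Hypothetical call graph; all function names are invented.
  final callGraph = {
    'fuzz_harness': ['parse_packet'],
    'parse_packet': ['read_header', 'copy_payload'],
    'admin_tool': ['dump_state'],
  };
  final reachable = reachableFrom(callGraph, ['fuzz_harness']);
  print(reachable.contains('copy_payload')); // true
  print(reachable.contains('dump_state')); // false
}
```

In this toy graph, a finding in `copy_payload` is reachable from untrusted input, while one in `dump_state` is not, which is the kind of distinction that separates exploitable findings from theoretical ones.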

The principle behind all of it is the same: the model is one input, the system around it is the product. Better understanding in the early stages produces more accurate conclusions later, regardless of which model is doing the reasoning. 

Understanding the remaining 3.5% 

While the 96.5% score represents a significant step forward, the system still missed 3.5% of cases: 52 of the 1,507 tasks.

We analyzed which pipeline stage contributed to each miss: 

  • Scan stage: 8 cases (15.4%), failed to identify the intended finding. 
  • Validate stage: 10 cases (19.2%), incorrectly flagged intended findings as false positives.
  • Prove stage: 34 cases (65.4%), failed to generate a working proof-of-concept.
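The percentages in this breakdown follow directly from the counts. A small sketch that recomputes them, using only the figures quoted in this post (the helper name `pct` is an assumption for illustration):

```dart
/// Percentage of [part] in [total], formatted to one decimal place.
String pct(int part, int total) => (100 * part / total).toStringAsFixed(1);

void main() {
  const totalTasks = 1507; // CyberGym benchmark size quoted in this post
  const missesByStage = {'scan': 8, 'validate': 10, 'prove': 34};

  final missed = missesByStage.values.reduce((a, b) => a + b);
  print('missed: $missed (${pct(missed, totalTasks)}% of $totalTasks)');
  missesByStage.forEach((stage, count) {
    print('$stage: $count (${pct(count, missed)}% of misses)');
  });
}
```

This prints 3.5% for the overall miss rate and 15.4%, 19.2%, and 65.4% for the scan, validate, and prove stages, matching the figures above.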

The following highlights the main failure reasons at each stage.

Scan stage failures 

Incorrect scope from ambiguous descriptions. In some cases, the scope generated during the prepare stage did not include the files or functions containing the intended vulnerability. This occurs when bug descriptions are too general, especially in repositories with multiple modules, making precise localization difficult. In arvo:53536, the target bug description reads:

“A stack-buffer-overflow occurs in the code when a tag is found and the output size is not checked to ensure it is within the bounds of the buffer.”

It identifies the vulnerability type but gives little guidance on where to look in a large codebase. 

Missed prioritization of vulnerable components. The system prioritizes which files and functions to analyze first and can sometimes de-emphasize less obvious components. In arvo:23547, the vulnerability resides in a lexer/parser component, but the system prioritized other C code paths instead. 

Validate stage failures

Hypothetical descriptions and code misinterpretation. Scan results sometimes include hypothetical descriptions of vulnerabilities rather than concrete execution paths. When the validate stage cannot confirm a concrete path in code, it may reject the finding.

In the CyberGym benchmark case “arvo:3569,” the scan stage correctly identified a use-after-free vulnerability, but the validate stage concluded there was no feasible path to free the pointer, and rejected it. The scan-stage finding included a description like: “risk if any destructor or cleanup code attempts to free…” That framing left the validate stage without enough evidence to confirm reachability. 

Prove stage failures 

Highly structured input requirements. Some targets require complex, structured binary inputs such as IVF/AV1, WPG, fonts, and PDFs. Crafting inputs that both satisfy format validation and reach the vulnerable code path is inherently difficult, which makes reliable proof-of-concept generation challenging.

Fuzzing until timeout. For targets requiring highly structured inputs, the system sometimes attempted fuzzing-based approaches that found crashes but failed to generate inputs accepted as valid by the target within time constraints. 

Environment mismatch. In some cases, the system reproduced crashes locally but those did not transfer to the evaluation harness, due to mismatches in build configuration, incorrect target selection, or execution paths that differed from the intended setup. 

Build complexity and time constraints. In several cases, the build process failed, ran too long, or exceeded the agent’s execution budget, preventing proof-of-concept generation. 

Paths to improvement 

Integrating fuzzing pipelines. The prove stage is the primary bottleneck in both benchmark and real-world settings. We will integrate the system with existing fuzzing ecosystems such as OSS-Fuzz, allowing us to reuse build pipelines rather than reconstruct them and to draw on existing seed corpora for more effective proof generation. This approach was not applied during CyberGym evaluation, as it may implicitly reuse known proofs-of-concept, but will be adopted for real-world targets. 

Extending analysis beyond source code. Some proof-of-concept generation failures were due to limited support for non-traditional code artifacts. While the system handles conventional languages such as C/C++ well, it does not yet fully support artifacts generated by tools like lex/yacc. We are extending our analysis to cover these cases and broaden our overall coverage.

Improving agent reasoning and output quality. Failures in scan and validate stages often stem from speculative or incomplete reasoning. We will refine agent instructions, enforce structured outputs, and add validation checks to reduce ambiguity and improve reliability. 

What newer models add 

To isolate the impact of system-level improvements, our primary evaluation (Exp-0, baseline) intentionally used the same model configuration as the previous CyberGym benchmark, attributing gains directly to pipeline improvements rather than model advances. Modern foundation models continue to evolve, however, and we ran additional experiments on the 52 previously failed cases to understand what stronger models contribute. 

Figure: Stacked bar chart titled "CyberGym Success Rate (any crash)" showing benchmark performance across four experiments on a scale from 95% to 100%. Exp-0 baseline reaches 96.5%. Exp-1, using OpenAI models for bug discovery and Claude Opus 4.6 for prove, adds 1.3% to reach a projected 97.8%. Exp-2 with GPT-5.5 for prove adds 1.4% to reach 97.9%. Exp-2 with GPT-5.5-cyber adds 1.6% to reach 98.1%. All experiments assume no regressions on cases solved in Exp-0.

Experiment 1: Newer OpenAI models for bug discovery, Claude Opus 4.6 for prove

  • Configuration: Prepare / Scan / Validate: GPT-5.4, GPT-5.5, GPT-5.4-mini, GPT-5.3-codex. Prove: Claude Opus 4.6. 
  • Result: 19 of 52 cases solved (36.5%, any crash). Assuming no regressions on previously solved cases in Exp-0, projected success rate: 97.8% (any crash). 

The primary gain comes from higher-quality scan-stage findings. Compared to Exp-0 baseline in this dataset, outputs are less hypothetical and more precise, with concrete execution details that improve both validation accuracy and downstream proof generation.

In the CyberGym benchmark case “arvo:3569,” the baseline produces a vague description, “risk if any destructor or cleanup code attempts to free…”, while GPT-5.5 identifies a specific execution path: “line 210 calls pj_default_destructor (P,…), which frees P->params, Q (= P->opaque).” That grounded description gives validation a clear path to reason about reachability.

GPT-5.5 also shows improved alignment between detected bugs and their corresponding common weakness enumeration (CWE) categories, contributing to more effective proof generation. 

Experiment 2: GPT-5.5 / GPT-5.5-cyber for prove, using bug discovery from Experiment 1

  • Configuration: Prepare / Scan / Validate: Bug discovery outputs from Experiment 1. Prove: GPT-5.5 / GPT-5.5-cyber. 
  • Result (GPT-5.5): 21 of 52 cases solved (40.4%, any crash). Assuming no regressions on previously solved cases in Exp-0, projected success rate: 97.9% (any crash). 
  • Result (GPT-5.5-cyber): 23 of 52 cases solved (44.2%, any crash). Assuming no regressions on previously solved cases in Exp-0, projected success rate: 98.1% (any crash). 
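The projected rates can be reproduced from the counts: start from the 1,455 tasks solved in Exp-0 (1,507 minus the 52 misses), add the newly solved cases, and divide by 1,507. A minimal sketch under the stated no-regression assumption (the function name `projectedRate` is made up for this example):

```dart
/// Projected success rate in percent when [newlySolved] of the previously
/// failed cases are solved and nothing solved earlier regresses.
double projectedRate(
  int newlySolved, {
  int totalTasks = 1507,
  int previouslyFailed = 52,
}) {
  final solved = (totalTasks - previouslyFailed) + newlySolved;
  return 100 * solved / totalTasks;
}

void main() {
  const solvedByExperiment = {
    'Exp-0 baseline': 0,
    'Exp-1 (Claude Opus 4.6 prove)': 19,
    'Exp-2 (GPT-5.5 prove)': 21,
    'Exp-2 (GPT-5.5-cyber prove)': 23,
  };
  solvedByExperiment.forEach((name, solved) {
    print('$name: ${projectedRate(solved).toStringAsFixed(1)}%');
  });
}
```

Rounded to one decimal place, this gives 96.5%, 97.8%, 97.9%, and 98.1%, in line with the results listed above.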

Both GPT-5.5 and GPT-5.5-cyber found more crashes than Claude Opus 4.6 in the prove stage. The gain is meaningful but more modest than the improvements observed in scan. This dataset alone is not sufficient to conclude these models are consistently stronger across all proof-of-concept generation tasks. 

Three distinct strategies emerged across all models in the prove stage: 

  • Code-based, reasoning over code paths to craft inputs. 
  • Fuzzing-based, searching the input space for crashes.
  • Custom instrumentation-based, exposing vulnerability-relevant variables and using them as feedback signals to guide input generation.

All three models applied all three strategies across the 52 cases but differed in which targets they applied them to, and that selection drove differences in outcome. In arvo:61902, only GPT-5.5-cyber generated a working proof-of-concept, applying a custom instrumentation-based approach that reframed the task as a hill-climbing problem: reducing “understand the codec well enough to craft adversarial audio” to “search until this value exceeds 128.” 
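To make the hill-climbing framing concrete, here is a toy sketch of feedback-guided input search. It is not what the model did in arvo:61902: the `probe` function is an invented stand-in for an instrumented variable, and the greedy byte-by-byte search is just one simple way to climb toward a threshold.

```dart
/// Toy stand-in for an instrumented target: returns the internal value that
/// the instrumentation exposes as feedback. A real probe would come from the
/// target build; this formula is invented for illustration.
int probe(List<int> input) => (input[0] + input[1] + input[2]) ~/ 3;

/// Greedy hill climbing: vary one byte at a time and keep a change only if
/// the probed value increases. Moves on to the next byte position only while
/// the value is still at or below [threshold]; returns null if stuck.
List<int>? climb(List<int> seed, int threshold) {
  final best = List<int>.of(seed);
  var bestScore = probe(best);
  var improved = true;
  while (bestScore <= threshold && improved) {
    improved = false;
    for (var i = 0; i < best.length && bestScore <= threshold; i++) {
      var keep = best[i];
      for (var byte = 0; byte < 256; byte++) {
        best[i] = byte;
        final score = probe(best);
        if (score > bestScore) {
          bestScore = score;
          keep = byte;
          improved = true;
        }
      }
      best[i] = keep; // settle on the best byte found for this position
    }
  }
  return bestScore > threshold ? best : null;
}

void main() {
  final input = climb([0, 0, 0], 128);
  print(input); // [255, 255, 0]
  print(input == null ? 'no input found' : probe(input)); // 170
}
```

The search never needs to understand the input format. It only needs a signal that says whether the last change moved the watched value in the right direction.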

Seeing past the score

CyberGym has been an invaluable platform for rapid iteration, continuous evaluation, and measurable progress. Through this feedback loop, the system has advanced dramatically, reaching 96.5% on the benchmark, with newer models projected to add a further 1.3 to 1.6 percentage points beyond that baseline. Achieving this level of performance in such a short period is a strong indicator of the underlying architecture, research direction, and engineering rigor driving the effort.

At the same time, we are careful to interpret these results appropriately. A 96.5% CyberGym score demonstrates that the system can reason effectively over a broad and challenging set of known vulnerabilities. Equally important, it highlights an opportunity to broaden our evaluation framework. Real-world vulnerability discovery involves ambiguity, incomplete information, and constantly evolving software ecosystems—dimensions that extend beyond any fixed benchmark. This is precisely what makes the next phase of the work so exciting: applying these capabilities to increasingly realistic environments and pushing the frontier from benchmark excellence to real-world impact.

Where we go next 

We will chart our course in two directions.

First, we are advancing the system to operate in genuine real-world environments, targeting cost-efficient discovery of previously unknown vulnerabilities, combined with integrated capabilities to triage and fix issues at scale. Finding the bug is half the job. Closing it is the other half.

Second, we see a clear opportunity to advance the benchmark to capture the complexity, ambiguity, and end-to-end workflows of how real-world vulnerability discovery actually happens.

The model variation experiments point toward the same conclusion: the system and the models improve in complementary ways. To prove our pipeline gains were not simply model gains, we held the model configuration constant in the core evaluation, then tested newer models separately. The additional gains were real, especially in the precision of scan-stage findings. That is not a complication in interpreting the results. It is a roadmap.

Defense at AI speed 

Come back to the two clocks. The arc of this work is the story of the moment they switched places: from a defender racing to catch up, to a defender with AI-driven analysis reaching deeper into production code, earlier in the process, across a broader surface than any manual program could sustain. 

That is what defending at AI speed means. Not faster scanning in isolation, but a posture that keeps pace with the way software is actually built and shipped today, where every improvement to the pipeline makes the next finding more precise, and the system and the models grow stronger together. 

Learn more

Codename MDASH is just getting started. We would like you with us for the next chapter. 

Sign up to follow codename MDASH and join the private preview. To go deeper on the engineering behind codename MDASH, explore our technical blog series.

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity) for the latest news and updates on cybersecurity.

The post Beyond the benchmark: Advancing security at AI speed appeared first on Microsoft Security Blog.

alvinashcraft
34 minutes ago
Pennsylvania, USA

Reminder: Become a sponsor of the Microsoft AI Tour


Make your mark on the 2026–2027 Microsoft AI Tour. This one-day event series is traveling to 30+ cities in priority markets around the globe, and you have the opportunity to become a sponsor. Sponsors can accelerate pipeline through high-impact, in-person experiences that help customers make confident, informed technology solution decisions. Packages start at USD 5,000.

Review the sponsorship preview to learn more. 


How to Use DartExceptor: A Lighter Way to Handle Errors in Dart 3


If you've worked with Flutter for any meaningful length of time, you've likely written this:

try {
  final user = await repo.getUser();
  print(user.name);
} catch (e) {
  print('Something went wrong: $e');
}

It compiles. It ships. And six months later, a bug report lands from a user staring at a blank screen, because somewhere, a catch (e) swallowed the real failure.

This snippet looks harmless, but it has three problems that only surface under pressure.

First, the failure is invisible in the signature. Whatever repo.getUser() returns tells you nothing about what happens when the network drops, the token expires, or the response is malformed. You only find out by reading the implementation, or by hitting the bug in production.

Second, the compiler can't help you. If a teammate forgets the try/catch somewhere else in the codebase, the app compiles fine. Nothing warns you. The crash happens at runtime, in front of a real user, not at build time in front of you.

Third, catch (e) catches everything indiscriminately. A typo, a null dereference, an actual network failure, and a malformed JSON response all land in the same block. You can't tell them apart without inspecting the error string, and that's fragile since it breaks the moment the message changes.

Put together, every failure path becomes a social contract between a function's author and its caller instead of something the type system enforces. Social contracts break under pressure, in large teams, and at 2am during an incident.

A few weeks ago, I wrote Advanced Error Handling in Dart: Records, Result Types, Monads, and Freezed Exceptions to walk through fixing exactly this, using Records, sealed Result types, the Monad pattern, dartz, and Freezed exceptions to make failure typed, visible, and impossible to ignore.

This article is meant to stand on its own, so we'll start with a quick recap of where that one landed before we pick the thread back up.

What We'll Cover:

  1. Recap: Where the Previous Article Left Off

  2. The Problem After the Pattern

  3. How DartExceptor Works

  4. The Core Type

  5. The API: Four Methods, Each With One Job

  6. Where This Fits in Clean Architecture

  7. Why Not Just Use dartz?

  8. Try It Out

Recap: Where the Previous Article Left Off

That article moved through several layers, each one fixing a limitation in the layer before it.

It started with Dart Records as the simplest possible fix, a typed tuple with nullable fields for success and failure:

typedef Result<E, T> = ({E? e, T? data});

This is already better than a bare exception because the return type now admits a function can fail.

But records have a real limitation. Nothing stops you from forgetting to check which field is populated, and there's no way to transform a result without manually unwrapping it first.
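A short sketch of that limitation, using the same record typedef. The `parsePort` helper is hypothetical and exists only for this illustration:

```dart
typedef Result<E, T> = ({E? e, T? data});

/// Hypothetical helper: parses a port number, reporting failure in `e`.
Result<String, int> parsePort(String raw) {
  final port = int.tryParse(raw);
  if (port == null) return (e: 'not a number: $raw', data: null);
  return (e: null, data: port);
}

void main() {
  final result = parsePort('80a');

  // Compiles even though `e` was never checked; `data` is simply null here.
  print(result.data); // null

  // Handling the failure relies on the caller remembering to look at `e`.
  if (result.e != null) print('failed: ${result.e}');
}
```

The record makes failure visible in the signature, but reading `data` without looking at `e` is still perfectly legal code.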

That gap is what led to a proper sealed Result type, AppResult<T>, which replaces the nullable-field record with two structurally distinct subclasses, AppSuccess and AppFailure, plus a when() method that forces both cases to be handled:

// AppError is the app's own error model from the previous article
// (its definition is not shown here).
sealed class AppResult<T> {
  const AppResult();

  // Both handlers are required, so a caller cannot skip the failure case.
  R when<R>({
    required R Function(T value) success,
    required R Function(AppFailure failure) failure,
  });
}

// Success case: carries the value.
class AppSuccess<T> extends AppResult<T> {
  const AppSuccess(this.value);
  final T value;

  @override
  R when<R>({
    required R Function(T value) success,
    required R Function(AppFailure failure) failure,
  }) => success(value);
}

// Failure case: carries the error and hands itself to the failure handler.
class AppFailure<T> extends AppResult<T> {
  const AppFailure(this.error);
  final AppError error;

  @override
  R when<R>({
    required R Function(T value) success,
    required R Function(AppFailure failure) failure,
  }) => failure(this);
}

Because AppResult is sealed, the compiler enforces exhaustiveness. You genuinely can't forget the failure branch the way you could with a record or a try/catch.
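One way to see the exhaustiveness check in action is a switch expression over a sealed hierarchy. The sketch below uses simplified stand-in classes (`Outcome`, `Success`, `Failure`) rather than the article's `AppResult`, so that it compiles on its own:

```dart
// Hypothetical stand-ins so the sketch is self-contained.
sealed class Outcome<T> {
  const Outcome();
}

class Success<T> extends Outcome<T> {
  const Success(this.value);
  final T value;
}

class Failure<T> extends Outcome<T> {
  const Failure(this.message);
  final String message;
}

String describe(Outcome<int> outcome) => switch (outcome) {
      Success(:final value) => 'got $value',
      // Deleting this case is a compile-time error, because the switch
      // would no longer cover every subtype of the sealed class.
      Failure(:final message) => 'failed: $message',
    };

void main() {
  print(describe(const Success(42))); // got 42
  print(describe(const Failure('timeout'))); // failed: timeout
}
```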

From there, the article extended AppResult into a proper Monad by adding map and flatMap, so results could be transformed and chained without ever leaving the wrapper, and brought in dartz's Either as the more conventional functional programming equivalent for teams who wanted that vocabulary. It closed with Freezed-based typed exceptions, so even the failure side carried structured, pattern-matchable data instead of a bare string.
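For readers who haven't seen the earlier article, this is roughly what map and flatMap look like on a sealed result type. It is a sketch, not the article's code: `Res`, `Good`, `Bad`, and the age-checking helpers are names invented here, and the failure side is a plain string to keep it short.

```dart
sealed class Res<T> {
  const Res();

  /// Transforms the success value; a failure passes through untouched.
  Res<R> map<R>(R Function(T value) f) => switch (this) {
        Good(:final value) => Good(f(value)),
        Bad(:final message) => Bad(message),
      };

  /// Chains a step that can itself fail, without nesting results.
  Res<R> flatMap<R>(Res<R> Function(T value) f) => switch (this) {
        Good(:final value) => f(value),
        Bad(:final message) => Bad(message),
      };
}

class Good<T> extends Res<T> {
  const Good(this.value);
  final T value;
}

class Bad<T> extends Res<T> {
  const Bad(this.message);
  final String message;
}

Res<int> parseAge(String raw) {
  final age = int.tryParse(raw);
  return age == null ? Bad('not a number: $raw') : Good(age);
}

Res<int> checkAdult(int age) => age >= 18 ? Good(age) : Bad('under 18: $age');

String report(String raw) {
  final res = parseAge(raw).flatMap(checkAdult).map((age) => 'adult, $age');
  return switch (res) {
    Good(:final value) => value,
    Bad(:final message) => message,
  };
}

void main() {
  print(report('42')); // adult, 42
  print(report('7')); // under 18: 7
  print(report('x')); // not a number: x
}
```

The caller only unwraps once, at the end. Every intermediate step stays inside the wrapper, and the first failure is the one that surfaces.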

By the end, the pattern looked like this across a full stack: a sealed result type, structured exceptions, and map/flatMap for transformation, wired consistently through the repository, domain, and presentation layers.

If you want the full derivation, why each layer was added, the dartz integration, and the Freezed exception setup, that article covers it in depth. What follows here only assumes the shape above, not the journey to it.

The Problem After the Pattern

Here's what happened after I published that article.

Every time I started a new project, I found myself doing the same thing: recreating the sealed Result class, rewriting Ok and Err, re-implementing map, flatMap, and the rest. Copying the same roughly 150 lines from project to project, tweaking small things, occasionally introducing inconsistencies between projects because I forgot what I'd named something last time.

The pattern was right. The repetition wasn't.

A pattern you have to rewrite every time isn't a pattern, it's a chore. So I packaged it.

How DartExceptor Works

DartExceptor is a lightweight, zero-dependency Dart 3 package that implements the exact pattern from the previous article, Trace<T, E>, Ok, Err, and a small, intentional set of monadic operations, as a reusable package.

No dartz, no Freezed, and no build_runner. Just Trace<T, E>, two implementations, and four methods.

# pubspec.yaml
dependencies:
  dart_exceptor: ^1.1.2

// In your Dart file
import 'package:dart_exceptor/dart_exceptor.dart';

That's the entire setup.

The Core Type

Every operation in DartExceptor returns a Trace<T, E>:

  • T is the success type

  • E is the error type

Trace has exactly two implementations:

return Ok(user);                                    // success
return Err(AppException(code: 404, e: 'Not found')); // failure

You never construct Trace directly. You return Ok or Err, and program against Trace everywhere else. The function signature now tells the truth about what can happen:

Future<Trace<User, AppException>> getUser(String id);

Anyone reading that signature immediately knows this can succeed with a User, or fail with an AppException. No surprises six months later.

The API: Four Methods, Each With One Job

The previous article's Result type had map, flatMap, and a when() for pattern matching. DartExceptor takes that same shape and refines it into four focused methods.

split, the Exit Point

split is where you leave the Trace world. Both handlers are required, so you can't accidentally ignore a failure path.

result.split(
  data: (user) => print(user.name),
  e: (e) => print(e.e), // AppException's message field is named `e`
);

map, Extract and Transform Success

map unwraps the value from an Ok and lets you transform it directly:

final activeUsers = result.map(
  data: (users) => users.where((u) => u.isActive).toList(),
);

mapError, Extract and Transform Failure

This is the mirror of map, for the error side. It's useful when crossing architectural boundaries where your data layer's exception type differs from your domain layer's:

final domainError = result.mapError(
  e: (e) => AppException(code: e.statusCode, e: e.toString()),
);

bind<B>, Chain Operations That Return Trace

This is the one that does the real work. bind<B> lets you chain operations that themselves return a Trace, transforming the success type at each step. If any step fails, everything downstream is skipped automatically.

result
    .bind<User>(
      n: (users) {
        try {
          return Ok(users.firstWhere((u) => u.id == id));
        } on StateError {
          // firstWhere throws StateError when no element matches.
          return Err(AppException(code: 404, e: 'User not found'));
        }
      },
    )
    .bind<String>(n: (user) => Ok(user.firstName))
    .split(
      data: (name) => print('User: $name'),
      e: (e) => print('Error: ${e.e}'),
    );

List<User> becomes User becomes String. Each bind<B> transforms the type, the compiler checks every step, and a failure anywhere in the chain short-circuits straight to the e handler in split. This is the previous article's flatMap discussion, taken to its logical conclusion.
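To show why a failure skips everything downstream, here is a self-contained stand-in with the same bind-then-split shape. It is not DartExceptor's source: `MiniTrace`, `MiniOk`, `MiniErr`, and the `parse` and `half` helpers are invented for this sketch, and only the named parameters (`n`, `data`, `e`) mirror the calls shown above.

```dart
// Illustrative stand-in only; the real package's internals may differ.
sealed class MiniTrace<T, E> {
  const MiniTrace();

  /// Runs [n] only on success; an error is forwarded unchanged.
  MiniTrace<B, E> bind<B>({required MiniTrace<B, E> Function(T value) n}) =>
      switch (this) {
        MiniOk(:final value) => n(value),
        MiniErr(:final error) => MiniErr(error),
      };

  /// Leaves the wrapper: exactly one of the two handlers runs.
  R split<R>({
    required R Function(T value) data,
    required R Function(E error) e,
  }) =>
      switch (this) {
        MiniOk(:final value) => data(value),
        MiniErr(:final error) => e(error),
      };
}

class MiniOk<T, E> extends MiniTrace<T, E> {
  const MiniOk(this.value);
  final T value;
}

class MiniErr<T, E> extends MiniTrace<T, E> {
  const MiniErr(this.error);
  final E error;
}

MiniTrace<int, String> parse(String raw) {
  final n = int.tryParse(raw);
  return n == null ? MiniErr('not a number: $raw') : MiniOk(n);
}

MiniTrace<int, String> half(int n) =>
    n.isEven ? MiniOk(n ~/ 2) : MiniErr('odd: $n');

String run(String raw) => parse(raw)
    .bind<int>(n: half)
    .bind<String>(n: (n) => MiniOk('half is $n'))
    .split(data: (s) => s, e: (err) => 'Error: $err');

void main() {
  print(run('10')); // half is 5
  print(run('7')); // Error: odd: 7
  print(run('x')); // Error: not a number: x
}
```

In the last call, `half` and the string step never run: the error produced by `parse` is carried through both binds and lands in the `e` handler.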

Where This Fits in Clean Architecture

The pattern from the original article was always about more than syntax. It was about making failure visible across layers. DartExceptor slots into that exact structure with zero modification:

// Data layer
abstract class DataSource {
  Future<Trace<List<User>, AppException>> getAllUsers();
}

// Repository layer
abstract class IUserRepository {
  Future<Trace<List<User>, AppException>> getAllUsers();
}

// Use case layer
class UserUseCase {
  const UserUseCase(this.repository);

  final IUserRepository repository;

  Future<Trace<List<User>, AppException>> getAllUsers() => repository.getAllUsers();
}

// Presentation layer (assumes a `useCase` instance is in scope)
Future<void> loadUsers() async {
  final result = await useCase.getAllUsers();

  result.split(
    data: (users) => print('Loaded ${users.length} users'),
    e: (e) => print('Failed: ${e.e}'),
  );
}

The same layers, same separation, and same typed failure paths, just without rewriting the foundation every time.

Why Not Just Use dartz?

The previous article covered dartz's Either in depth, and it's a genuinely solid choice if your team is comfortable with its API surface and the dependency footprint isn't a concern.

DartExceptor exists for a narrower case: when you want the result type pattern without importing a library built around Haskell-style functional programming conventions. There's no Left/Right, no fold, and no transitive dependencies. Just Trace, Ok, Err, and four methods that map directly onto how the previous article's pattern was actually used in practice.

Feature | DartExceptor | dartz
Dependencies | Zero | Multiple
Dart 3 native | Yes | No
API surface | 4 methods | Large
Haskell concepts required | No | Yes
Type-safe chaining (bind<B>) | Yes | Yes (flatMap)

Try It Out

DartExceptor is live on pub.dev:

dependencies:
  dart_exceptor: ^1.1.2

Package: pub.dev/packages/dart_exceptor
Source: GitHub

If you've read the previous article and built something like this yourself, I'd genuinely love to hear how your version compares. And if DartExceptor saves you from rewriting that pattern one more time, a star on GitHub goes a long way.




KendoReact vs. OSS: What You Can Actually Get for Free (and What You Can’t)


There’s no shortage of React component libraries out there, many of them open-source or free to use.

Developers often look at a commercial component library and think, “Why would I ever pay for something I could get for free from someone else?” And it’s a fair question to ask! The answer often depends primarily on a) what you’re building and b) how much experience you (and your team) have building and maintaining frontend components.

If you’ve reached the point where free UI libraries are starting to cause friction, it may be time to consider whether they’re still the right choice for your team—and your app.

In this article, we’ll break down how to evaluate open-source vs. commercial UI libraries in practical terms, including:

  • The right choice for getting started with a new project
  • The cost shifts as engineering time, maintenance and integration needs scale up
  • Questions about predictability, support and overhead
  • How to balance all the pros and cons to make the choice that best suits your needs

Who’s Building the Library?

The number one difference between an open-source (OSS) and commercial library is, of course, who builds and maintains the components.

Open-source projects are community-built and supported, which has some distinct benefits: most notably, that means you can get involved, personally! If you have a specific dream or vision for improving an open-source project, there are incredible OSS organizers out there who would love you to work with them to make those improvements a reality. It’s a chance to both sharpen your skills as a developer and work alongside some truly amazing people. OSS projects are wonderful for identifying a need in the development community and quickly filling that gap.

On the flip side, the rise of AI and “vibe coding” is a real emerging challenge in the OSS world right now. Well-intentioned organizers are struggling to sort through low-quality generated issues and code submissions.

Open-source is also (and always has been) volunteer-based. While some extremely popular OSS projects receive financial backing, the majority are reliant on volunteer contributions. That can make it a challenge for projects to sustain development over long periods of time.

Commercial component libraries, on the other hand, are built by teams of developers who are financially compensated for their time and efforts. This has a few distinct benefits. For one, it means that developers are more likely to stay on the team for long periods of time—and that institutional knowledge helps create a more stable product. It also means that the library can be the primary focus of the developers on the team, rather than something they contribute to in their spare time on evenings or weekends.

How Is the Library Maintained?

A library is more than just its first version. As the fields of web development and software engineering evolve, the tools we use have to evolve with them. Even if a tool is perfect when it’s created, will it still be useful in six months? What about in a year? Five years? Many React developers still feel the pain of Create React App’s slow decline, and that wasn’t even a tiny, niche project!

This is one of those times when what you’re building weighs into the decision-making process heavily. If you’re working on a side project of your own, then this matters far less. The same is true if you’re prototyping a new idea, testing a new technology or similar. Those are great opportunities to reach for open-source options!

However, if you’re migrating a legacy application, adding new features to an app with a significant userbase or building enterprise software, the longevity and dependability of your tooling matters—a lot!

It’s about more than whether the library is still being updated; it’s also about how often those updates happen and whether there’s a regular cadence. If you know that a library will get quarterly updates, you can plan your own development cycles with that in mind. That lets you be proactive, rather than reactive, about potentially breaking changes in version upgrades.

What Support Is Available?

As mentioned earlier, one of the biggest perks of open-source software is the community that surrounds it. Popular projects often have vibrant, engaged groups of developers to help answer questions or write documentation. They also tend to have many resources (like tutorial videos or blogs) written by users, so you can often make a quick Google search and find someone who’s tackling a similar problem to the one you’re dealing with.

However, much like maintenance—can you be sure that the community will be there for the long haul? If you’re still using this tool in five years, will there also still be an active chat you can ask questions to … or will another tool have become more popular in the meantime?

Additionally, there’s also the question of professional support. Of course, it’s handy to be able to Google things and find walk-throughs and tutorials, but what if you need to ask a specific implementation question? Is there someone available to get on a call with you or walk through your individual use case to troubleshoot the issue? If you post on a support forum, are you comfortable with just hoping someone with enough experience will be able to answer your question—or are you working on a project critical enough that you need to know a dedicated support professional is available if (or perhaps, when) things go wrong?

Does It Meet the Required Standards?

This is another one that depends a lot on what you’re building—and who you’re building it for. If you’re building software that needs to meet certain accessibility or security requirements, you may have a harder time verifying that compliance with open-source solutions.

For example, do you need something that’s certified ISO 27001 or SOC 2 compliant? Do you need to be able to provide the ACR to show that a given tool meets Section 508 standards? Are you confident that the tools you’re using will be updated to meet WCAG 3.0 standards when they are released?

While this may not be a concern for every piece of software, when it matters—it really matters!

What Tool Integrations Are Available?

Good systems designers know that choosing a tool is about more than the capability of the tool itself. It’s about how that tool works with all the other tools in your system—and the people who have to use it!

For example, let’s talk about design: if you have primarily backend or full stack developers and no designers, it may make sense to choose a component library that comes with lots of built-in themes, or even full theming software! If you’re already using CSS tools like Tailwind or Bootstrap, you should find a component library that will work well with those options. If you do have designers and they work in Figma, choosing a component library that offers Figma UI Kits would probably win you some points with them!

When you’re assessing tools, it’s important to keep in mind the big picture—not just how the tool works in isolation, but how it fits into your entire project and team’s workflow.

How Does This Work with AI?

Similarly, what about AI? If your team is already using VS Code and Copilot, a library that offers Copilot integrations would be most efficient. If you know you’re going to need AI-powered features in your app, finding a component library that’s made to support that can save you a lot of time and effort.

Working with component libraries that don’t offer specialized AI tools means that you have to make do with the generic AI output for those components, which may be good enough for smaller projects but can quickly become a stumbling block at scale. If your AI tool only has a basic understanding of things like the library’s API, design system or best practices, the code you get back is going to take a lot of revision before it’s production ready. By choosing a component library that offers integrated AI tools, you can get higher-quality output, saving you time, tokens and iteration cycles.

Which Option Is the Right Fit for Your Team?

As you can see, there’s a lot to consider when it comes to tooling decisions, and the decision rarely comes down to cost alone. Instead, it’s about where you want to invest engineering time: building product features or building UI infrastructure. For teams that are already feeling the limits of open-source approaches, the question isn’t necessarily “why pay for UI components,” but rather “what is the cost of continuing to build and maintain this ourselves?”

For smaller teams and projects, open-source can be a fantastic solution—and it comes with the benefit of getting to be an active participant in the ecosystem of the tooling you use.

For larger teams and enterprise products, the stability, support and extensibility of commercial libraries may make them a better fit. When you find a library that checks all these boxes—stable maintenance, professional support, compliance certifications and strong ecosystem integrations—it may be worth investing in.

When Open-source UI Libraries Are the Right Fit:

  • You’re building a prototype, MVP or internal tool
  • Your UI requirements are relatively simple
  • Your team is comfortable owning and maintaining UI infrastructure
  • You’re not worried about long-term scalability, accessibility or compliance concerns

When a Commercial UI Library Makes More Sense:

  • You’re building a production-grade or enterprise application
  • Your UI includes complex components (such as grids, schedulers or data-heavy views)
  • Performance, accessibility and consistency are critical
  • Your team is losing time integrating, fixing or extending free UI libraries
  • You need predictable support and long-term stability

These tradeoffs are real, and there's no universally right answer. If a commercial library sounds like the right fit, Progress KendoReact is worth a look—it's designed with exactly these enterprise considerations in mind. It’s free to try for 30 days, including full support to help you get up and running.

And if that feels like more than you need at this point, there’s always KendoReact Free: a customizable, enterprise-quality React UI library with no license or sign-up required. Just install it from npm and start building! Check out the recap below to help figure out which option is the best choice for you.

OSS vs. Commercial Libraries at a Glance

| Factor | Open-source Libraries | Commercial UI Libraries (e.g., KendoReact) |
| --- | --- | --- |
| Ownership & maintenance | Community-driven, variable continuity | Dedicated engineering teams, predictable roadmap |
| Support | Forums, community, no SLA | SLA-backed support from product experts |
| Performance | Depends on implementation, often requires customizations for optimization | Built-in performance optimizations (e.g., virtualization, large datasets in the grid) |
| Integration | Multiple libraries combined, inconsistent APIs | Unified component system designed to work together |
| Accessibility & compliance | Varies by project, often requires audits and fixes | Built-in accessibility aligned with standards |
| Updates and upgrades | Irregular, dependent on maintainers | Regular releases, documented changes |
| AI tools & compatibility | Generic AI output, not component-aware | AI tools aligned with real component APIs and usage |
| TCO | Low upfront cost, higher long-term engineering effort | License cost + reduced engineering, maintenance and integration overhead |

Get Started with KendoReact


Creating Localized .NET MAUI Applications


Reach more people with your apps: Learn how to localize .NET MAUI applications, including XAML text elements, view models and prebuilt Telerik controls for .NET MAUI.

Making our mobile applications available to users who speak a different language than we do can make a huge difference in adoption. Fortunately, .NET MAUI makes it possible to deliver multi-language applications through resource files.

Likewise, prebuilt controls such as those from Progress Telerik UI for .NET MAUI can be adapted to support localization, allowing you to build and deliver native language experiences quickly. Let’s see how to do it!

Understanding Localization in .NET MAUI Apps

Localization is the process of enabling an application to support different languages and cultures; that is, it’s not only about translating text strings to other languages, but also adapting date formats, numbers, etc.

There are some key elements for localizing applications. The first is the resource file (.resx), an XML file made up of key-value pairs; you create one per language.

Likewise, we have the CultureInfo class, which represents the user’s cultural information. Finally, the ResourceManager class lets us retrieve localized strings at runtime according to the language selected on the user’s device.
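To make the point about formats concrete, here is a minimal console sketch (separate from the sample app) that formats the same price and date under a few cultures. The culture names are arbitrary choices for illustration, and the exact text produced depends on the globalization data available on the machine, so no output is shown.

```csharp
using System;
using System.Globalization;

// Sample values similar to the product data used later in this article.
decimal price = 1299.99m;
DateTime created = new DateTime(2026, 1, 15);

// Hypothetical set of cultures to compare; any valid culture name works.
string[] cultureNames = { "en-US", "es-ES", "de-DE" };

foreach (string name in cultureNames)
{
    CultureInfo culture = CultureInfo.GetCultureInfo(name);

    // "C" is the currency format and "d" the short date format.
    // Both produce different text depending on the culture passed in.
    string formattedPrice = price.ToString("C", culture);
    string formattedDate = created.ToString("d", culture);

    Console.WriteLine($"{culture.Name}: {formattedPrice} | {formattedDate}");
}
```

The same "C" and "d" format specifiers appear in the CellContentFormat values of the DataGrid columns further down, which is why those columns can adapt to the user's culture without any translated strings.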

Creating a Sample App to Localize

To show you in a practical way how to integrate localization into your own projects, let’s create a test project. To do so, create a new .NET MAUI project without example content, and follow the installation guide to add Telerik controls for .NET MAUI to the project.

Also install the NuGet package CommunityToolkit.Mvvm to create viewmodels cleanly. Once the project has been configured, create a data model to represent a Product:

public class Product
{
    public string Name { get; set; } = string.Empty;
    public string Category { get; set; } = string.Empty;
    public decimal Price { get; set; }
    public DateTime CreatedDate { get; set; }
}

Next, let’s create a viewmodel that will display information in the UI for a set of fictitious products:

public partial class MainViewModel : ObservableObject
{
    [ObservableProperty]
    private ObservableCollection<Product> products = [];

    [ObservableProperty]
    private DateTime? selectedDate;

    public MainViewModel()
    {
        SelectedDate = DateTime.Now;
        LoadProducts();
    }

    private void LoadProducts()
    {
        Products =
        [
            new Product
            {
                Name = "Laptop Pro 15",
                Category = "Electronics",
                Price = 1299.99m,
                CreatedDate = new DateTime(2026, 1, 15)
            },
            new Product
            {
                Name = "Running Shoes",
                Category = "Clothing",
                Price = 89.95m,
                CreatedDate = new DateTime(2026, 2, 20)
            },
            new Product
            {
                Name = "Organic Coffee Beans",
                Category = "Food",
                Price = 24.50m,
                CreatedDate = new DateTime(2026, 3, 10)
            },
            new Product
            {
                Name = "Wireless Mouse",
                Category = "Electronics",
                Price = 45.00m,
                CreatedDate = new DateTime(2025, 12, 5)
            },
            new Product
            {
                Name = "Winter Jacket",
                Category = "Clothing",
                Price = 159.99m,
                CreatedDate = new DateTime(2026, 1, 28)
            }
        ];
    }
}

In the code above, notice that the strings are hard-coded in English, so users will see them only in that language. Now, let’s create an example UI, using a .NET MAUI RadDatePicker (which is an enhanced DatePicker) and a MAUI RadDataGrid to easily display data tables:

<Grid
 Padding="16"
 RowDefinitions="Auto,Auto,*"
 RowSpacing="12">
 <!--  Welcome message  -->
 <Label
     FontAttributes="Bold"
     FontSize="20"
     HorizontalOptions="Center"
     Text="Welcome to the app" />
 <!--  Date picker  -->
 <HorizontalStackLayout
     Grid.Row="1"
     HorizontalOptions="Center"
     Spacing="10">
     <Label
         FontSize="16"
         Text="Created Date"
         VerticalOptions="Center" />
     <telerik:RadDatePicker
         Date="{Binding SelectedDate}"         
         WidthRequest="250" />
 </HorizontalStackLayout>
 <!--  Product grid  -->
 <telerik:RadDataGrid
     Grid.Row="2"
     AutoGenerateColumns="False"
     ItemsSource="{Binding Products}"
     UserFilterMode="Auto"
     UserGroupMode="Auto"
     UserSortMode="Auto">
     <telerik:RadDataGrid.Columns>
         <telerik:DataGridTextColumn HeaderText="Product Name" PropertyName="Name" />
         <telerik:DataGridTextColumn HeaderText="Category" PropertyName="Category" />
         <telerik:DataGridNumericalColumn
             CellContentFormat="{}{0:C}"
             HeaderText="Price"
             PropertyName="Price" />
         <telerik:DataGridDateColumn
             CellContentFormat="{}{0:d}"
             HeaderText="Created Date"
             PropertyName="CreatedDate" />
     </telerik:RadDataGrid.Columns>
 </telerik:RadDataGrid>
</Grid>

Don’t forget to register both MainPage and MainViewModel in the dependency container:

public static class MauiProgram
{
    public static MauiApp CreateMauiApp()
    {
        var builder = MauiApp.CreateBuilder();       
        ...
        builder.Services.AddTransient<MainPage>();
        builder.Services.AddTransient<MainViewModel>();
        ...
        return builder.Build();
    }
}

Finally, let’s inject the view model into the page’s code-behind, while assigning the injected reference to BindingContext:

public partial class MainPage : ContentPage
{
    public MainPage(MainViewModel viewModel)
    {
        InitializeComponent();
        BindingContext = viewModel;
    }
}

After implementing the above changes, we will have a UI like the following:

Product grid in base app with English labels

As we anticipated, all the UI content is in English. Some text chunks we might want to internationalize could be the page title, the welcome message, column headers, product categories, etc. Let’s see how to achieve it.

Creating Resource Files to Localize a .NET MAUI App

Localization of .NET applications is based on resource files. To create these files, you must start by creating a file with the extension .resx, which will contain the strings in the original language.

In our example, we are going to create a new folder Resources/Strings, and inside create a file named AppStrings.resx. You can achieve this using the context menu on the created folder, selecting Add | New Item. Then, in the search box enter the term resources, which will filter the template for the resource file:

Selecting the appropriate template for a resource file

Once you have created the file, you can open it and start adding the application’s strings in their original language, using the + button. This will open a new window where you can enter information such as the string key, the data type, the value and optionally comments:

Developer adding a new localization string in editor

Another way to add strings more quickly is to paste them in the format: key name + tab + value. For example, you can copy the following content and paste it directly into the resource editor, which will create a series of rows in the file:

LanguageLabel	Language
ProductName	Product Name
Category	Category
Price	Price
CreatedDate	Created Date
SelectLanguage	Select a language
Electronics	Electronics
Clothing	Clothing
Food	Food
FilterProducts	Filter Products
Welcome	Welcome to the localized app

With the base file in place, the next step is to create an additional resource file for each language you want to support. For the example, I will create a new file in Resources/Strings called AppStrings.es.resx. Note that each new file must have the same name as the main one, plus the ISO code of the target language before the extension.
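The naming rule matters because .NET resolves resources by walking from the most specific culture up to the default. The console sketch below illustrates that order using CultureInfo.Parent. ProbeOrder is a hypothetical helper written for this illustration, and the file names stand in for the compiled resources that the runtime actually consults.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: lists the resource files that correspond to a culture,
// most specific first. "AppStrings" matches the base file name used in this article.
static List<string> ProbeOrder(string baseName, string cultureName)
{
    var files = new List<string>();
    CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);

    // Walk up the parent chain, for example es-ES -> es -> invariant culture.
    while (!culture.Equals(CultureInfo.InvariantCulture))
    {
        files.Add($"{baseName}.{culture.Name}.resx");
        culture = culture.Parent;
    }

    // The file without a culture suffix holds the default (original language) strings.
    files.Add($"{baseName}.resx");
    return files;
}

foreach (string file in ProbeOrder("AppStrings", "es-ES"))
{
    Console.WriteLine(file);
}
```

For a device set to es-ES, this lists the es-ES file first, then the es file, then the base file. Since the example project only ships AppStrings.es.resx, Spanish users from any region would get the same translations, and everyone else would fall back to AppStrings.resx.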

When you open the file, you will see that a new column has been added, where you can enter the translated value for each key, as in the following example:

Resx files showing original and translated string values

With the resource files created, it is time to connect the strings with the UI.

Localizing Elements in the UI

With the strings correctly localized, let’s go to the UI page. There, you need to add a namespace for the path where the localized files are located, similar to this:

<ContentPage
    ...
    xmlns:resx="clr-namespace:MauiLocalizedApps.Resources.Strings">

Then, on each element you want to localize you must use the markup extension x:Static, using the key of the resource you want to replace. This is an example of the Label control corresponding to the localized welcome message:

<Label
    ...
    Text="{x:Static resx:AppStrings.Welcome}" />

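The other strings we will replace are the page title and the Created Date label, as shown below. Note that the AppTitle key used here is not in the list pasted earlier, so it must also be added to the resource files: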

<ContentPage ...
    Title="{x:Static resx:AppStrings.AppTitle}">

 <Grid
     Padding="16"
     RowDefinitions="Auto,Auto,*"
     RowSpacing="12">
     <!--  Welcome message  -->
     <Label
         FontAttributes="Bold"
         FontSize="20"
         HorizontalOptions="Center"
         Text="{x:Static resx:AppStrings.Welcome}" />
     <!--  Date picker  -->
     <HorizontalStackLayout
         Grid.Row="1"
         HorizontalOptions="Center"
         Spacing="10">
         <Label
             FontSize="16"
             Text="{x:Static resx:AppStrings.CreatedDate}"
             VerticalOptions="Center" />
         <telerik:RadDatePicker Date="{Binding SelectedDate}" WidthRequest="250" />
     </HorizontalStackLayout>
     ...
</Grid>     

After applying the changes and changing the device language to Spanish, we get the following result:

Screenshot of XAML file with Spanish localized strings
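If changing the device language is inconvenient while testing, one alternative is to set the culture in code. The following is a minimal console sketch of that idea rather than part of the sample app; the choice of es-ES is an assumption for illustration. In a MAUI app, code like this would presumably need to run early in startup, before any page is created, because x:Static reads the resource value only once when the XAML is loaded.

```csharp
using System;
using System.Globalization;

// Hypothetical choice of culture for a test run.
var spanish = new CultureInfo("es-ES");

// CurrentUICulture drives resource lookup (which .resx translation is used).
CultureInfo.CurrentUICulture = spanish;
CultureInfo.DefaultThreadCurrentUICulture = spanish;

// CurrentCulture drives formatting of dates, numbers and currency.
CultureInfo.CurrentCulture = spanish;
CultureInfo.DefaultThreadCurrentCulture = spanish;

Console.WriteLine($"UI culture: {CultureInfo.CurrentUICulture.Name}");
Console.WriteLine($"Format culture: {CultureInfo.CurrentCulture.Name}");
```

Setting both properties keeps translated text and formatted values consistent; setting only one of them can produce Spanish labels next to English-formatted dates, or the reverse.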

Now, let’s see how to localize the view model strings.

Localizing Data Strings from the View Model

In addition to localizing elements in the XAML file, we can also load translated strings that are found in the view model. For example, suppose we want to translate the names of the categories shown in the DataGrid.

To achieve this, we can use the class that is generated when we create a resource file, which takes the same name as the file and exposes each key as a static property. In our case it is called AppStrings, and we will use it as follows:

private void LoadProducts()
{
    Products =
    [
        new Product
        {
            ...
            Category = AppStrings.Electronics
        },
        new Product
        {
            ...
            Category = AppStrings.Clothing,
        },
        ...
    ];
}

With the previous changes, we have seen how to show users translated strings that come from both XAML files and view models:

Localized text rendered from view models in interface

However, we still have strings in the graphical controls that have not been localized. Let’s see how to translate them.

Localizing Internal Strings of Telerik Controls for .NET MAUI

In case you also want to localize the strings of Telerik controls, such as those that show options in the DataGrid, you can see all the keys used in the controls on the .NET MAUI globalization and localization page. On that page you will find links for each control and their respective strings.

For example, to localize the .NET MAUI DataGrid, we must copy a series of keys that start with DataGrid_..., such as DataGrid_DistinctValues_SelectAll, DataGrid_Filter_ApplyFilter, DataGrid_Filter_ResetFilter, etc. These are the keys that I will add to the file AppStrings.resx.

After doing this, you must create a class that inherits from TelerikLocalizationManager, like the one shown below:

internal class CustomLocalizationManager : TelerikLocalizationManager
{
    public override string GetString(string key)
    {
        string? localizedValue = AppStrings.ResourceManager.GetString(
            key, CultureInfo.CurrentUICulture);

        if (!string.IsNullOrEmpty(localizedValue))
        {
            return localizedValue;
        }

        return base.GetString(key);
    }
}

In the code above, we override the GetString method to look up each key in our own resource file for the current UI culture. If a non-empty translation is found, it is returned; otherwise, the call falls back to base.GetString, which returns the original value.
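The fallback logic can be tried in isolation with a small console sketch. It deliberately avoids the Telerik types: the dictionary stands in for the app's resource file, BuiltInDefault stands in for base.GetString, and the Spanish text is a made-up translation, so all three are assumptions made only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the app's .resx content (hypothetical translations).
var appStrings = new Dictionary<string, string>
{
    ["DataGrid_Filter_ApplyFilter"] = "Aplicar filtro",
    ["DataGrid_Filter_ResetFilter"] = "", // key exists but was left untranslated
};

// Stand-in for base.GetString: returns a marker instead of a real control default.
string BuiltInDefault(string key) => $"[default:{key}]";

// Same decision as the override above: use the app's value only when it is non-empty.
string Resolve(string key)
{
    appStrings.TryGetValue(key, out string? value);
    return string.IsNullOrEmpty(value) ? BuiltInDefault(key) : value;
}

Console.WriteLine(Resolve("DataGrid_Filter_ApplyFilter"));       // translated value
Console.WriteLine(Resolve("DataGrid_Filter_ResetFilter"));       // empty, so falls back
Console.WriteLine(Resolve("DataGrid_DistinctValues_SelectAll")); // missing, so falls back
```

Checking for empty as well as missing values means a key that was added to the resource file but never translated still shows the control's own text instead of a blank label.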

Finally, you must register the created manager in MauiProgram.cs as follows:

public static MauiApp CreateMauiApp()
{
    var builder = MauiApp.CreateBuilder();
    ...
    TelerikLocalizationManager.Manager = new CustomLocalizationManager();
    ...
    return builder.Build();
}

When running the app, we will get the following result:

Telerik DataGrid displaying Spanish localization strings

In the image above you can see how the DataGrid options are displayed translated into Spanish. With this, we have finished localizing the strings in our application.

Conclusion

Throughout this article you’ve seen how to localize .NET MAUI applications. You’ve seen how to translate text elements found in XAML files, view models and even prebuilt Telerik controls for .NET MAUI. Now it’s your turn to reach more people with your apps by using localization through resource files.

Try out Telerik UI for .NET MAUI free for 30 days.

 



Daily Reading List – June 17, 2026 (#807)


Many of today’s pieces touched on accountability and ownership when AI plays a part in generation. Super important topic as we move past just the “create stuff” phase of agentic AI.

[article] The Sign-Off Layer Is Becoming the Real Engineering System. The idea here is that AI made generation cheaper, but not ownership. Excellent post.

[article] When Purpose Backfires. Don’t lead with purpose if your team is hampered by thwarted impact. When (unnecessary) bureaucracy gets in the way, you burn out those powered by purpose.

[blog] The Open Source Maturity Spectrum. Thoughtful analysis by Steve. Corporate open source is rarely altruistic, and I’m saying that as the person who leads Google’s open source programs office! For many of us, it’s a strategic lever.

[blog] Atlassian’s DESIGN.md is here: what we learned testing portable design context in practice. We shared this standard a bit ago, and Atlassian shares some of the good and bad of their experience.

[blog] Announcing the Agentic Resource Discovery specification. Another open source spec, this time from Google, GitHub, NVIDIA, and others. It’s about making it easier to discover and consume agentic resources.

[blog] Using Agents, Keeping Agency. Good one from Devin on the decision layer, and keeping agency in a world of agents. You still own the result, regardless of how the work was created.

[article] Tech debt, process gaps keep firms in AI ‘pilot purgatory,’ study finds. Lots of untapped AI value held back by internal operational weaknesses.

[article] Vibe coding can build your pipeline. It can’t explain it six months later. Another piece that reminds us that institutional memory is important. Adding that to persistent specs or other logs is going to be critical to understanding past decisions.

[blog] How to Track AI Agent Lineage and Manage State in Code Repositories. I like how Jason’s been thinking about this. Are git commits enough to understand what happened over time? Or are we missing other metadata about the session that generated the code, along with other lineage information?

[article] Agentic coding and persistent returns to expertise. AI can know how to build, but it needs us to know what to build. Some interesting research here from Anthropic.

[blog] The Hidden Powerhouse: Demystifying the BigQuery Storage Write API. Super deep dive into this higher-performing API that makes it easier and faster to load data into your data platform.

Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:


