Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

Is America Closer to Ending Daylight Saving Time?

A proposal to make daylight saving time permanent has advanced in the U.S. House of Representatives, reports California news station KCRA: A proposal to make daylight saving time permanent has advanced in the House, reigniting an age-old American debate around the twice-annual clock changes. And this time, the proposal has the president's backing. President Donald Trump said Thursday that he will work "very hard" to sign the so-called Sunshine Protection Act into law after the House Energy and Commerce Committee overwhelmingly approved the bill by a 48-1 vote. The bill still needs to pass the full U.S. House, and then the U.S. Senate would consider taking up the measure. The bill would allow U.S. states to decide whether to "exempt themselves" from Daylight Saving Time, according to the article. The bill's sponsor described the annual clock-switching as "inconvenient, unnecessary, and out of step with the needs of today's families and economy," while making daylight saving time permanent would bring "more usable daylight hours throughout the year."

Read more of this story at Slashdot.


US Layoffs Haven't Increased, and New Tech Industry Hiring Balances Firings

"The numbers show that layoffs in the U.S. are roughly at or below levels from before the pandemic," reports the Washington Post, "although they are higher than in 2022 when businesses snapped up workers as the economy roared back to life... "A different measure that accounts for the growing U.S. workforce shows that layoffs affected about 1.2% of employed people in March, a number that has been steady for years outside of the pandemic..." In the technology industry, where Meta and other companies are regularly announcing job cuts, the layoff picture is complex. There has been a marked increase in layoffs in recent months in what the Labor Department calls the information industry, which includes employment of software developers and other tech workers. But Matthew Martin, senior U.S. economist at the research and consulting firm Oxford Economics, noted that hiring has also increased in that category, which includes media and entertainment. The combination of hiring minus layoffs in the information industry is effectively a wash, Martin said. Layoffs at Big Tech companies like Meta and other high-profile employers don't necessarily reflect what is happening in the country, Martin said, and draw far more attention than what may be slow and steady workforce growth. "There's a lot more headlines about job cuts than there are [about] expansion plans by businesses," he said. In his view, technology companies may be pushing out some workers and replacing them with people who have different skills as they respond to the demands of AI. It's true that businesses in some industries are devoting enormous sums of money and attention to AI. It's changing how some people work and a minority of American businesses are rolling out AI tools. But it's also become a trend for bosses to blame layoffs on the productive capabilities of AI and its ability to replace workers, even when job cuts may have little to do with the technology. 
Sam Altman, CEO of ChatGPT-maker OpenAI, has taken note of the pattern that he and others call "AI washing," essentially a high-tech form of whitewashing... "You know something is happening all the time when they have a word for it," said Gautam Mukunda, who teaches leadership at the Yale School of Management... AI-related employment changes are tiny so far, said Nathan Goldschlag, director of research at the Economic Innovation Group, a Washington think tank. He pointed to a recently published analysis of Census Bureau surveys, which found more than 95 percent of businesses that use AI said it hasn't changed their staff sizes — and AI-related employment increases were more common than decreases.

Read more of this story at Slashdot.


Pwned with Purpose: Where Motives Meet Mayhem - Troy Hunt - NDC Sydney 2026

From: NDC
Duration: 1:00:41
Views: 163

This talk was recorded at NDC Sydney in Sydney, Australia.

Every data breach tells a story — not just of compromised information, but of human motive.

Drawing on a dozen years of experience running Have I Been Pwned, this talk explores why hackers exfiltrate and share data, what those motives reveal about the evolving threat landscape, and how understanding them helps us respond more effectively in partnership with website operators.


From Closed Internal Stack to Open-Source Ecosystem: I Finally Shipped Three Years of .NET Infrastructure


This is a submission for the
GitHub Finish-Up-A-Thon Challenge

Why this exists (the actual pain)

Before the architecture talk, the why. Three years ago
I watched my team — and every other .NET team I knew —
burn most of their hours not on business logic but on
plumbing the same four layers over and over:

  • EF Core + migrations. DbContext, fluent mappings, navigation properties, N+1 puzzles, lazy/eager loading, a migration per field, separate read/write models if you go CQRS. On a non-trivial object graph that's hundreds of hours just for the data layer, before any business logic is written.
  • A pile of integration connectors. Kafka here, RabbitMQ there, S3 for files, SFTP for the legacy partner, an HTTP webhook for the new one, each one hand-rolled with its own retry policy, dead-letter, idempotency, serialization, health check, telemetry. Ten integrations × 40-80 hours each.
  • Auth. OAuth flows, refresh tokens, scope checking, claims transformation, multi-tenant, M2M, DPoP if you want to be modern — built from IdentityServer / Auth0 / Keycloak plus 200-600 hours of glue per serious deployment.
  • Runtime. Hot-reload without dropping in-flight messages, graceful drain on redeploy, cluster coordination, leader election, watchdog, dashboard — usually missing entirely on .NET, or assembled from 5-7 unrelated NuGets that don't share conventions.

The worst part isn't any single layer — it's the seams
between them. Where the auth user becomes the EF entity
becomes the Kafka message becomes the S3 blob, every seam
is one more serialization, one more mapping, one more
versioning headache, one more place a 3 AM page comes from.

RedBase is built so that business code is the only thing
left to write. The class IS the schema (no EF, no
migrations). The 22 transports share one DSL (no per-connector
plumbing). Identity is a direct-vm:// Route (calling auth
is calling a function, not a network round-trip). Tsak gives
you hot-reload, cluster, dashboard, drain out of the box. The
seams collapse because everything lives on the same fabric.

On the one full business workflow where I had honest before/after
numbers (built the traditional way vs built on the RedBase
stack), the human effort was roughly 3,000 hours vs roughly 128
hours. That ratio isn't magic on any single layer (where
each subsystem gives 3-5× at most); it comes from the seams
no longer existing. When I asked Claude Opus 4.7 to
sanity-check that estimate against typical .NET project
breakdowns (data layer + integrations + auth + runtime +
inter-seam testing), the order of magnitude held up.

One stack, one team, one architectural style — so the team
gets to write features instead of wiring infrastructure.
That is the project. Everything below is just how I got
there.

What I Built

RedBase — a four-pillar open-source ecosystem for .NET
that grew up inside one production system over three years
and finally got shipped to the world this spring. Three
pillars are now public on GitHub under Apache 2.0, and the
fourth is in pre-release polish:

1. redb — typed
object storage engine. Your C# class IS the schema. Two
physical tables (_objects + _values), full LINQ, zero
migrations, recursive-CTE tree queries, bulk COPY BINARY
saves on PostgreSQL and SqlBulkCopy on SQL Server. Free
core + Pro tier with tree-diff ChangeTracking saves.

2. redb-route
— Apache Camel for .NET. Fluent C# DSL for
From → Process → To pipelines, 22 transport packages
(Kafka, RabbitMQ, MQTT, S3, gRPC, SFTP, AMQP, Azure
Service Bus, IBM MQ, Elasticsearch, Redis, LDAP, FTP, HTTP,
WebSocket, SignalR, Firebase, TCP, Mail, SQL, Quartz,
generic File), 80+ EIP patterns (Splitter, Aggregator,
CBR, Circuit Breaker, Saga, ...), compiled expression
engine, transactional routes, OpenTelemetry built-in.

3. redb-tsak
— the .NET analogue of Apache Karaf / Camel K. Production
runtime container: drop a .tpkg bundle (ZIP with
manifest.json + entry-point DLLs + dependency DLLs +
per-module JSON config) — or just a bare .dll for the
simplest cases — into Libs/, get hot-reload without
dropping in-flight messages, REST + Blazor dashboard,
watchdog, Quartz scheduler, three deployment modes
(Standalone / Single-node+redb / Cluster with leader
election & auto-rebalance), per-module
AssemblyLoadContext isolation, API Key + HMAC-SHA256
security, 30-command CLI, typed C# client. 415+ tests.

4. redb.Identity (pre-release, repo opening soon)
OAuth 2.1 / OIDC server, architecturally transport-agnostic:
every endpoint is a direct-vm:// Route. That gives two
ready usage modes today: (a) in-process — call the
auth server directly from another module in the same
run-time with zero network hop, zero serialization, zero
HTTP stack (this is what direct-vm:// is), and
(b) HTTP — a full OAuth 2.1 / OIDC HTTP facade is
shipped and working. The other RPC-capable facades —
gRPC, RabbitMQ RPC, AMQP request/reply, IBM MQ
request/reply, WebSocket, SignalR, TCP — are on the
roadmap and become near-trivial to add because the core
logic is already wire-format independent. (The
fire-and-forget transports in the Route set — Kafka
producer, File, S3, Mail, etc. — don't fit an auth
server, so they're correctly out of scope.) OpenIddict
under the hood, REDB-backed storage, DPoP / PAR / Dynamic
Client Registration / SCIM 2.0 / FIDO2 / WebAuthn /
RFC 8417 shared-signals support.
1751+ tests passing before public release.

Total scope: ~385k LOC (326k C# + 58k SQL) across
~2200 source files, Apache 2.0, NuGet-published, all
docs at redbase.app/why-redbase.

Demo

Live docs and positioning:

  • redbase.app/why-redbase — full "Sound familiar?" pain-point walkthrough with side-by-side Traditional-vs-RedBase code examples
  • The docs site itself is powered by RedBase — every page, example, and API reference is a REDB object in MSSQL. We eat our own cooking.

Published article series on dev.to
(author page) —
the ecosystem is too large for one post, so I'm publishing
a deep-dive series, one pillar at a time:

  1. May 13: We built an enterprise integration stack for .NET from scratch: EAV + DSL + runtime — overview of the whole stack
  2. May 14: I spent a year building Apache Camel for .NET. Here's the honest state of it. — discussion piece, what's done / what isn't
  3. May 17: redb.Route — Apache Camel for .NET: 22 transports, 30+ EIP patterns, compiled DSL — integration framework deep dive
  4. May 21: An EF Core alternative for .NET apps with complex object graphs — full LINQ, no migrations, no DbContext — REDB storage engine deep dive (this is the one that already triggered the multi-version-deploy thread referenced below)

…and the series continues. Tsak (runtime container,
hot-reload, cluster, watchdog) gets its own post next,
then Identity (transport-agnostic OAuth 2.1 / OIDC) when
the repo opens, then a deep dive on the tree-diff
ChangeTracking path, then a multi-DB benchmark post.
A stack this size is a genuinely hard engineering
problem to explain — trying to cram it into one article
would lie about the depth. So I'm taking it one layer
at a time.

The 30-second flavor:

// 1. Define schema — no migration needed
[RedbScheme("order")]
public class OrderProps
{
    public Customer Customer { get; set; }
    public Address ShippingAddress { get; set; }
    public List<OrderLineProps> Lines { get; set; } = [];
    public Dictionary<string, RedbObject<CouponProps>> Coupons { get; set; }
}

// 2. Save entire object graph in one call
await redb.SaveAsync(order);

// 3. Query with full LINQ over nested props
var hot = await redb.Query<OrderProps>()
    .Where(o => o.Customer.Address.City == "London"
             && o.Lines.Any(l => l.Qty > 10))
    .OrderByDescending(o => o.DateModify)
    .Take(50)
    .ToListAsync();

// 4. Tree query with recursive CTE under the hood
var allProducts = await redb
    .TreeQuery<Product>(londonHQ.Id, maxDepth: 10)
    .Where(p => p.InStock)
    .ToListAsync();
// Same dev defines a routing context with redb.Route.
// Real production patterns: REDB isn't a transport URI —
// it's accessed inside processors via DI (.ProcessWithRedb).
public class OrdersRouteBuilder : RouteBuilderBase
{
    public override void Build()
    {
        From("kafka://orders-raw")
            .Unmarshal<OrderProps>()
            .Split(o => o.Lines, EipParallel.Yes)
            .ProcessWithRedb(async (redb, exchange, ct) =>
            {
                var line = (OrderLineProps)exchange.In.Body;
                await redb.SaveAsync(line);   // REDB store
            })
            .Aggregate(by: e => e.In.Headers["OrderId"],
                       timeout: 5.Seconds())
            .To("rabbitmq://orders-enriched");

        // Staged queue with backpressure — exactly how the
        // production garage-sync route does it:
        From("seda://orders?concurrentConsumers=4")
            .ProcessWithRedb(EnrichFromInventoryAsync)
            .To("http://partner-api/orders");
    }
}

// Pack as a .tpkg bundle (manifest.json + entry-point DLL
// + dependency DLLs + module config.json) — or for trivial
// modules, just drop the bare .dll — into Libs/ of a Tsak
// node. Tsak picks it up, runs it across the cluster,
// rebalances on node failure, drains old versions on hot
// redeploy.
//
// Need the same flow gated by an OAuth scope?
// Two ready modes today:
//   * in-process — the call goes through direct-vm://
//     straight into redb.Identity, no network, no HTTP;
//   * over HTTP — standard OAuth 2.1 / OIDC endpoints.
// Tomorrow, the same auth server can also be exposed over
// any other RPC-capable Route transport (gRPC, RabbitMQ
// RPC, AMQP request/reply, IBM MQ, WebSocket, SignalR,
// TCP, ...) as those facades land — no change to the
// auth server itself:
//
//     .Process(ctx => ctx.RequireScope("orders.write"))

That last paragraph — "drops a .tpkg (or just a .dll),
gets hot-reload + cluster + dashboard + REST API + drain on
redeploy + an auth server callable from the same routing
fabric" — is the part I'm proudest of, because it's exactly
the piece that's been missing from the .NET integration story for a
decade. Java had Karaf, Camel K, and a mature OIDC story
glued together. .NET had nothing equivalent that wasn't
either vendor-locked or hand-rolled per project.

The Comeback Story

Before (early 2026): all four pillars existed and
worked. Years of work. Real production load — 3-node
cluster, ~550 internal users at a HoReCa distributor,
~500k business objects, ~15M _values rows, three months
running every operational integration in the company.
And yet:

  • 0 public repos. Everything in a closed monorepo.
  • 0 NuGet packages. Even our own internal services built it from source.
  • 0 published docs. A 14-line README.md in each project. "TODO: document this properly" in two of them.
  • 0 external users. Nobody outside the company even knew it existed.
  • 0 stars, 0 issues, 0 community. The classic "we'll open-source it when it's ready" trap — except it had been "ready enough for production" for over a year and "ready to share" still felt one polish-pass away every quarter.

The honest version: I was scared. ~385k LOC of opinionated
architecture decisions, exposed to public criticism, with
my real name on it. Easier to keep shipping it inside a
contract than to put it on the internet.

After (May 2026):

  • 3 public GitHub repos under github.com/redbase-app: redb, redb-route, redb-tsak. Apache 2.0. CI, badges, real READMEs. The fourth (redb-identity) is in final polish — 1751+ tests green, repo opens before the contest deadline.
  • 43 NuGet packages published (redb.Core, redb.Postgres, redb.Route.Kafka, redb.Tsak.CLI, ...) with ~18k cumulative downloads, the bulk of that (~15k) added in the active-launch window over the last ~10 days, on a clean upward curve day over day. External developers can dotnet add package today. Early signal: redb.Route (the integration framework) has already overtaken redb.Core in download count — 1497 vs 1482 — meaning the "Apache Camel for .NET" positioning is landing, not just the storage story. Inside Route, redb.Route.Http is the fastest-growing transport — confirming the hypothesis that HTTP is the natural first entry-point for new users exploring the DSL.
  • Marketing/docs site live at redbase.app — quick start, architecture page, pricing, full API docs. The site itself runs on REDB to prove the engine handles real content load.
  • A 4-post dev.to series already out (May 13 → May 21), with more deep dives queued. The REDB-storage post is already getting sharp technical questions in comments — including one about multi-version rolling deploys that surfaced a genuinely unsafe default in SyncStructuresFromTypeAsync (strictDeleteExtra = true) that I'd never noticed in three years of production use because we always single-version-deploy in our internal workflow. That defect went straight onto the fix list — exactly the kind of feedback that justifies open-sourcing in the first place.
  • First reactions and shares from the broader .NET community. Validation that the niche is real.

The completion arc isn't "I wrote a thing." The arc is
"I had a thing that worked for years inside one company
and I finally let it leave the building." Different muscle
entirely. Documentation, positioning, choosing what to call
it (RedBase / REDB / redb — we picked one and renamed
everything), writing READMEs for repos that had no audience
for years, recording first impressions, drafting tweets,
answering the first technical comment in public.

That last one is the hardest part of shipping. The code
was done. Letting it be judged was the work.

My Experience with GitHub Copilot

Honest framing first: a large part of this codebase is my
own work, written before Copilot was a serious factor.
The architectural decisions, the storage model, the
two-table layout, the tree-diff save strategy, the
direct-vm:// transport-agnostic routing fabric that lets
redb.Identity be called in-process today with zero
network hop and exposed over HTTP today and grow
additional facades over the RPC-capable Route transports
(gRPC, RabbitMQ RPC, AMQP, IBM MQ, WebSocket, SignalR,
TCP, ...) tomorrow — all without touching the auth server
itself — all of that came from years of sitting, thinking,
prototyping, and throwing prototypes away. No AI invented
those.

What Copilot changed was the finishing arc. The codebase
had the right shape, but it also had:

  • features sketched-in but not closed out;
  • helper layers half-typed because "I'll do this properly later";
  • XML doc-comments missing on 80% of public surfaces;
  • READMEs with TODO: document this;
  • corner cases I knew existed and had been deferring;
  • six different naming conventions across four years of rewrites that nobody had unified.

Copilot is what let me finally close all of that out
without burning a year of evenings.

Where it was a real force multiplier:

  1. Finishing deferred features. Pieces I had on a
    mental backlog for months — round-trip serialization
    edge cases, missing operators on the LINQ provider,
    Tsak CLI commands I'd been meaning to add, missing
    transport options in redb.Route — Copilot let me
    knock them out in evenings instead of weekends because
    it could read the surrounding code, infer the
    convention, and propose an implementation that I then
    reviewed and corrected. The decisions stayed mine.
    The typing speed and the "what was I about to do here"
    recovery time collapsed.

  2. Structuring code I already had. A lot of the work
    wasn't writing new logic — it was taking working but
    messy code, splitting it into the right files,
    extracting the right interfaces, renaming things
    consistently across ~2200 files. Copilot is excellent
    at that kind of mechanical-but-context-sensitive
    refactor where you need to keep semantics intact while
    moving things around.

  3. Comments and XML documentation at scale. Adding
    /// <summary> to thousands of public members with
    accurate descriptions of what each method actually
    does — by hand this is a multi-week slog and you'll
    give up after two days. With Copilot reading the
    implementation and proposing the comment, then me
    correcting where it was wrong, it became a steady
    background task that actually finished.

  4. Documentation and READMEs. ~12 projects, each
    needing a consistent voice, accurate API examples,
    correct cross-references, and a quick-start that
    actually runs. With Copilot it became "draft → review
    → correct → ship" instead of "stare at empty file →
    procrastinate." Same for the redbase.app positioning
    pages — the "Sound familiar? / WITH REDBASE" pattern
    you see on the site got drafted in one afternoon and
    then iterated.

  5. Cross-file architectural navigation while debugging.
    The redb.Core.Pro ChangeTracking save path crosses
    ValueTreeBuilder, ValueTreeDiff,
    ProObjectStorageProviderBase, and the SQL bulk-ops
    layer. Holding all four files in working memory
    simultaneously while chasing a deduplication bug in
    array hash updates — Copilot did the "where is this
    called from, what's the contract" lookup, my brain did
    the fix.

  6. Honest fact-checking before publishing. I once
    asked it to draft a reply claiming "old readers ignore
    unknown structures gracefully on multi-version deploys."
    It pushed back, we read the actual code together, and
    the answer turned out to be "reads are graceful, writes
    are destructive, and the production playbook
    compensates with runtime drain." That nuance got
    published. The first version would have been a small
    lie. Same loop surfaced the strictDeleteExtra = true
    default that's been on the fix list ever since.

  7. RFC-driven implementation (this one mattered a lot
    for Identity).
    redb.Identity is built almost
    entirely on top of public RFCs: OAuth 2.1 / 6749,
    Token Revocation 7009, Introspection 7662, Dynamic
    Client Registration 7591 / 7592, JWK Thumbprint 7638,
    SCIM 7644, OAuth for Native Apps 8252, Device
    Authorization 8628, Token Exchange 8693, DPoP 9449,
    Shared Signals 8417 — and that's not the full list.
    For a solo developer, staying strictly compliant with
    that many specs is brutal: every endpoint has
    "MUST / SHOULD / MAY" clauses scattered across multiple
    §-sections of multiple documents, edge cases like
    "what status code on revoke of unknown token" (7009
    §2.2: still 200), "must DPoP-Nonce rotate on every
    bearing response" (9449 §8: yes), "is dpop_jkt binding
    enforced at the token endpoint" (9449 §10.1: yes);
    miss one and you've shipped a non-compliant auth
    server. Copilot was constantly the "wait, RFC 7591 §3.2.1
    says the response MUST include a registration access
    token, did you cover that?" voice in the room. The
    test suite reflects that: 1751+ tests with explicit
    "RFC XXXX §Y.Z: ..." assertion messages, written
    in the loop of "I remember the spirit of this clause,
    Copilot, pull up the exact wording and let's make a
    test out of it."
    Without that, achieving real RFC
    compliance solo would have been a multi-year sub-project
    on its own.
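
The "RFC XXXX §Y.Z" assertion style described in item 7 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not redb.Identity's code: RevocationEndpoint, Rfc.Require, and RevocationChecks are hypothetical names, and the endpoint is an in-memory stand-in. It encodes the revocation rule quoted above, which RFC 7009 states in its Revocation Response section (§2.2): an invalid or unknown token still gets a 200.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a token revocation endpoint.
public sealed class RevocationEndpoint
{
    private readonly HashSet<string> _activeTokens;

    public RevocationEndpoint(IEnumerable<string> activeTokens) =>
        _activeTokens = new HashSet<string>(activeTokens);

    public bool IsActive(string token) => _activeTokens.Contains(token);

    // Returns the HTTP status code. Unknown tokens are not an error:
    // the response is 200 whether or not anything was actually revoked.
    public int Revoke(string token)
    {
        _activeTokens.Remove(token);
        return 200;
    }
}

public static class Rfc
{
    // Fails with a message that names the clause being enforced.
    public static void Require(bool condition, string clause, string message)
    {
        if (!condition)
            throw new InvalidOperationException($"{clause}: {message}");
    }
}

public static class RevocationChecks
{
    public static void Run()
    {
        var endpoint = new RevocationEndpoint(new[] { "known-token" });

        Rfc.Require(endpoint.Revoke("known-token") == 200,
            "RFC 7009 §2.2", "successful revocation must return 200");
        Rfc.Require(!endpoint.IsActive("known-token"),
            "RFC 7009 §2.1", "a revoked token must be invalidated");
        Rfc.Require(endpoint.Revoke("never-issued") == 200,
            "RFC 7009 §2.2", "an invalid token must still return 200");
    }
}
```

Calling RevocationChecks.Run() either completes silently or throws with the clause in the message, so a failing check points straight at the section of the spec to reread.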

The pattern that emerged: I came in with the design
and most of the code already in place. Copilot saw what
was there, understood the conventions, and helped me
finish. Polish XML docs. Close out deferred TODOs.
Write the documentation pages. Refactor the inconsistent
bits. Catch the contradictions before they hit a public
comment thread. That's a different relationship than
"AI wrote my project." It's closer to having a very
fast pair-programmer who actually read the codebase
before sitting down.

Honest summary: ~385k LOC of infrastructure in three
years of part-time work was the human heroic effort.
Finishing it — closing out the deferred features,
filling in all the missing comments and documentation,
unifying the naming, writing the READMEs, shipping the
NuGet packages, drafting the dev.to articles, answering
the first hard comment in public — that's where Copilot
collapsed months into weeks. The contest prompt is
"revive and finish a project you started but never
completed." My project wasn't unfinished in design.
It was unfinished in all the small, deferred, boring,
necessary things
that turn a working codebase into a
shippable product. Copilot is what made that finishing
arc actually fit inside one spring.

What's next (already in flight, not on a wish-list)

Shipping the four pillars wasn't the finish line — it was
the beginning of the public phase. Three big bets are
already underway, in source, in this repo:

  • Free engine is overtaking Pro. The
    docs/FreePvtQuery
    initiative is rewriting the REDB query path on top of
    PostgreSQL PVT functions, and the free tier is already
    ahead of Pro on a long list of features:
    $case / $coalesce / $cast / n-ary $concat in
    projections, full regex predicates ($regex,
    $iregex, $regexReplace), extended math
    ($power, $sqrt, $log, $sin/cos/tan, ...),
    extended string ops ($substring, $replace,
    $indexOf, $padLeft/Right), projection-level
    DISTINCT ON, date-extract in Select, and as of
    2026-05-23 — HAVING for both regular and array
    GroupBy (33/33 integration tests green across PG
    Free + PG Pro + MSSql Pro). Net effect: most of what
    used to be Pro-only is becoming free, and the
    free engine is gaining capabilities Pro doesn't have
    at all.

  • REDB outside C#. The whole point of building the
    query layer as language-neutral JSON AST evaluated by
    database-side PVT functions is that the C# LINQ
    provider is just one front-end. The same _objects +
    _values storage, the same scheme registry, the same
    query AST can be driven from Python, from Google Apps
    Script, from any language that can speak Postgres or
    MSSQL — without porting the engine. That unlocks REDB
    as a shared object store across heterogeneous stacks,
    not just .NET shops. Architecturally, the work is
    already done: every query goes through JSON AST →
    PVT functions, no C#-side filter compilation required
    for the heavy path.

  • Route keeps growing. New transport connectors are
    on the roadmap, deep-dive articles for the Tsak
    runtime container and for redb.Identity are queued
    up after the contest deadline, and the EIP catalogue
    is being expanded incrementally on top of the
    existing 80+ patterns. The fastest-growing transport
    (redb.Route.Http) is also driving a focused docs
    pass on the HTTP path specifically.

The honest version: the ecosystem has been moving for
years and barely anyone outside one production deployment
knew. The potential is genuinely large — language-neutral
object storage with a full EIP runtime and a compliant
identity server on top of the same fabric is not a thing
that currently exists on .NET. That's the bet I'm
finishing.

Team: solo submission. All commits, all docs, all
articles by one author with Copilot in the loop.

Live in production: 3-node cluster, ~550 internal
users, ~500k objects, ~15M _values rows, ~2500 hours
uptime, zero data-layer incidents in 3 months.

Stack: .NET 8 / 9, PostgreSQL 17, MS SQL 2022,
Blazor Server, Quartz.NET, Npgsql, OpenIddict,
OpenTelemetry.

License: Apache 2.0 across all four pillars.


The RAG Workbench I Actually Needed


Most RAG tutorials hand you the same shape of code.

Load some content. Chunk it. Embed it. Query it.

It works for a demo, but the moment you try to run that workflow repeatedly across different documentation sites, retrieval quality becomes the real problem, not embeddings, not vector databases, not orchestration frameworks.

Debugging retrieval is where the time goes.

You change a content selector. Re-ingest. Query again.

The chunking is wrong.  Fix it. Re-ingest. Query again.

The crawler pulled navigation instead of the article body.  Fix it. Re-ingest. Query again.

Every iteration takes longer than it should because the infrastructure sits in the middle of the feedback loop.

What I wanted was not another production-ready RAG platform.  I wanted a workbench.

Something that would let me:

  • Crawl a documentation site
  • Run end-to-end ingest locally
  • Inspect chunks and retrieval scores
  • Iterate on parser and chunking config quickly
  • Debug retrieval failures without fighting infrastructure
  • Carry the working configuration into production later

That became DocIngestion.  Built in .NET using Claude Code, it’s a local-first RAG ingestion and retrieval inspection tool designed around one thing:

  • shortening the retrieval debugging loop.

Let's dig in.

~

The Architectural Decision That Changed Everything

The most important decision in the entire tool was also the simplest:  a single IVectorStore abstraction with two implementations.

public interface IVectorStore
{
    Task UpsertAsync(
        string indexName,
        IReadOnlyList<Chunk> chunks,
        CancellationToken ct);

    Task<IReadOnlyList<ScoredChunk>> QueryAsync(
        string indexName,
        float[] queryEmbedding,
        int topK,
        CancellationToken ct);

    Task<IReadOnlyList<IndexedDocument>> ListDocumentsAsync(
        string indexName,
        CancellationToken ct);
}

That interface sits underneath the ingest pipeline.

The pipeline never knows where the chunks are stored.
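
A rough sketch of what that decoupling could look like. Everything beyond the IVectorStore signatures is an assumption: the article does not show Chunk, ScoredChunk, or IndexedDocument, so their shapes here are guesses, and IngestPipeline and InMemoryVectorStore are hypothetical names. The in-memory store exists only so the sketch runs on its own; it is not one of the tool's implementations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Assumed shapes: the article does not show these types.
public sealed record Chunk(string Id, string SourceUrl, string Text, float[] Embedding);
public sealed record ScoredChunk(Chunk Chunk, double Score);
public sealed record IndexedDocument(string SourceUrl, int ChunkCount);

public interface IVectorStore
{
    Task UpsertAsync(string indexName, IReadOnlyList<Chunk> chunks, CancellationToken ct);
    Task<IReadOnlyList<ScoredChunk>> QueryAsync(string indexName, float[] queryEmbedding, int topK, CancellationToken ct);
    Task<IReadOnlyList<IndexedDocument>> ListDocumentsAsync(string indexName, CancellationToken ct);
}

// Hypothetical pipeline step: it sees only the interface, never the backend.
public sealed class IngestPipeline
{
    private readonly IVectorStore _store;
    private readonly Func<string, float[]> _embed; // assumed embedding callback

    public IngestPipeline(IVectorStore store, Func<string, float[]> embed)
    {
        _store = store;
        _embed = embed;
    }

    public async Task<int> IngestAsync(
        string indexName, string sourceUrl, IEnumerable<string> chunkTexts, CancellationToken ct)
    {
        var chunks = chunkTexts
            .Select((text, i) => new Chunk($"{sourceUrl}#{i}", sourceUrl, text, _embed(text)))
            .ToList();
        await _store.UpsertAsync(indexName, chunks, ct);
        return chunks.Count;
    }
}

// Illustration-only stand-in so the sketch is runnable.
public sealed class InMemoryVectorStore : IVectorStore
{
    private readonly Dictionary<string, List<Chunk>> _indexes = new();

    private List<Chunk> Get(string indexName)
    {
        if (!_indexes.TryGetValue(indexName, out var list))
            _indexes[indexName] = list = new List<Chunk>();
        return list;
    }

    public Task UpsertAsync(string indexName, IReadOnlyList<Chunk> chunks, CancellationToken ct)
    {
        var list = Get(indexName);
        var incoming = chunks.Select(c => c.Id).ToHashSet();
        list.RemoveAll(existing => incoming.Contains(existing.Id)); // same Id replaces
        list.AddRange(chunks);
        return Task.CompletedTask;
    }

    public Task<IReadOnlyList<ScoredChunk>> QueryAsync(
        string indexName, float[] queryEmbedding, int topK, CancellationToken ct)
    {
        // Plain dot product keeps the stand-in short; it is not cosine similarity.
        IReadOnlyList<ScoredChunk> hits = Get(indexName)
            .Select(c => new ScoredChunk(c, queryEmbedding.Zip(c.Embedding, (x, y) => (double)x * y).Sum()))
            .OrderByDescending(s => s.Score)
            .Take(topK)
            .ToList();
        return Task.FromResult(hits);
    }

    public Task<IReadOnlyList<IndexedDocument>> ListDocumentsAsync(string indexName, CancellationToken ct)
    {
        IReadOnlyList<IndexedDocument> docs = Get(indexName)
            .GroupBy(c => c.SourceUrl)
            .Select(g => new IndexedDocument(g.Key, g.Count()))
            .ToList();
        return Task.FromResult(docs);
    }
}
```

Swapping the backend then means passing a different IVectorStore to the IngestPipeline constructor; the ingest code itself does not change.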

There are only two implementations:

  • JSON on disk for iteration
  • Elasticsearch for production parity

That’s it.

For most early-stage retrieval work, JSON is the better tool.  Not because it’s scalable, but because it’s quick, easy to eyeball, and gives you something good enough to iterate on during those crucial early stages.

You can optimise for scale and production later, once you’ve nailed down retrieval quality and pipeline behaviour.

~

Why JSON Beats a Vector Database Early On

When most people start a RAG project, they immediately provision infrastructure.

Pinecone. Qdrant. Elasticsearch. Azure AI Search.  That makes sense once the retrieval behaviour is already understood.

Before that, it’s mostly friction.  For a few thousand chunks, a JSON file is more than enough.

The implementation is deliberately primitive:

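public sealed class JsonFileVectorStore : IVectorStore
{
    // UpsertAsync, ListDocumentsAsync, and the LoadAsync and CosineSimilarity
    // helpers are not shown in this excerpt.

    public async Task<IReadOnlyList<ScoredChunk>> QueryAsync(
        string indexName,
        float[] queryEmbedding,
        int topK,
        CancellationToken ct)
    {
        // Load every chunk for the index into memory.
        var chunks = await LoadAsync(indexName, ct);

        // Brute-force scan: score each chunk against the query, keep the top K.
        return chunks
            .Select(c => new ScoredChunk(
                c,
                CosineSimilarity(queryEmbedding, c.Embedding)))
            .OrderByDescending(s => s.Score)
            .Take(topK)
            .ToList();
    }
}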
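
The CosineSimilarity helper called above is not shown in the post.  A minimal sketch of what it might look like, assuming both embeddings are float arrays of the same dimension and that a zero vector should simply score 0 (both are assumptions, not the author's confirmed behaviour):

```csharp
using System;

// Hypothetical helper: the post calls CosineSimilarity but does not show it.
float CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Embeddings must have the same dimension.");

    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // Assumption: treat a zero vector as "no similarity" instead of dividing by zero.
    if (normA == 0 || normB == 0) return 0f;

    return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
}

// Vectors pointing the same way score close to 1, orthogonal vectors score 0.
Console.WriteLine(CosineSimilarity(new[] { 1f, 2f }, new[] { 2f, 4f }));
Console.WriteLine(CosineSimilarity(new[] { 1f, 0f }, new[] { 0f, 1f }));
```

For a few thousand chunks, a plain loop like this is typically fast enough that no index structure is needed.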

No infrastructure.  No provisioning.  No auth.  No cluster configuration.

Just:

  • crawl
  • ingest
  • inspect
  • query
  • rerun

That changes the speed of iteration completely.  A broken content selector stops being a deployment problem and becomes a two-minute fix.
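
The persistence side of the JSON store is not shown in the post either.  As a rough sketch, a store like this could round-trip chunks with System.Text.Json.  The one-file-per-index layout, the temp directory, the SaveAsync name, and the shape of the Chunk record are all assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Assumed layout: one JSON file per index, <dataDir>/<indexName>.json.
var dataDir = Path.Combine(Path.GetTempPath(), "docingestion-sketch");
Directory.CreateDirectory(dataDir);

async Task SaveAsync(string indexName, IReadOnlyList<Chunk> chunks, CancellationToken ct)
{
    var path = Path.Combine(dataDir, indexName + ".json");
    // Indented output keeps the file easy to read by eye.
    var json = JsonSerializer.Serialize(chunks, new JsonSerializerOptions { WriteIndented = true });
    await File.WriteAllTextAsync(path, json, ct);
}

async Task<IReadOnlyList<Chunk>> LoadAsync(string indexName, CancellationToken ct)
{
    var path = Path.Combine(dataDir, indexName + ".json");
    if (!File.Exists(path)) return Array.Empty<Chunk>();

    // The whole index is read into memory in one go.
    var json = await File.ReadAllTextAsync(path, ct);
    return JsonSerializer.Deserialize<List<Chunk>>(json) ?? new List<Chunk>();
}

var sample = new[] { new Chunk("https://example.com/a", "hello", new[] { 0.1f, 0.2f }) };
await SaveAsync("docs", sample, CancellationToken.None);

var loaded = await LoadAsync("docs", CancellationToken.None);
Console.WriteLine($"{loaded.Count} chunk(s) loaded from {dataDir}");

// Hypothetical shape; the post does not show the real Chunk type.
record Chunk(string SourceUrl, string Text, float[] Embedding);
```

Because the file is ordinary indented JSON, chunk text can be opened and checked in any editor between runs.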

~

Retrieval Debugging Is the Real Problem

The most useful part of the workbench is not ingestion.  It’s inspection.

When retrieval quality is poor, there are usually only three questions:

  1. Is the correct chunk in the index?
  2. Is it being retrieved?
  3. Is it ranking highly enough?

The workbench surfaces all three immediately.  Query a tenant and the UI shows:

  • the synthesized answer
  • retrieved chunks
  • similarity scores
  • source URLs
  • indexed page counts

That turns retrieval debugging into a fast feedback loop instead of an infrastructure exercise.

If the chunk is missing entirely, the crawler or parser is wrong.

If the chunk exists but is never retrieved, the embedding or query phrasing is wrong.

If the chunk is retrieved but ranks badly, the chunking strategy is usually the issue.
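
The three checks above can be sketched as a small triage function.  This is illustrative only: the Diagnose name, the use of string chunk ids, and the goodRank cut-off of 3 are assumptions, not part of DocIngestion:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// indexedIds: every chunk id present in the index.
// rankedIds:  chunk ids returned for a query, best match first.
string Diagnose(
    IReadOnlyCollection<string> indexedIds,
    IReadOnlyList<string> rankedIds,
    string expectedId,
    int goodRank = 3)
{
    // 1. Is the correct chunk in the index?
    if (!indexedIds.Contains(expectedId))
        return "not indexed: check the crawler or parser";

    // 2. Is it being retrieved?  Rank is 1-based, 0 when absent.
    var rank = rankedIds.ToList().IndexOf(expectedId) + 1;
    if (rank == 0)
        return "indexed but not retrieved: check the embedding or query phrasing";

    // 3. Is it ranking highly enough?
    return rank > goodRank
        ? $"retrieved at rank {rank}: check the chunking strategy"
        : $"retrieved at rank {rank}: looks healthy";
}

var indexed = new[] { "intro#0", "install#0", "install#1" };
var ranked = new[] { "intro#0", "install#1" };
Console.WriteLine(Diagnose(indexed, ranked, "install#0"));
```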

That sounds obvious written down.  In practice, most RAG tooling makes answering those questions slow.

~

The Bugs This Catches Quickly

The inspection workflow caught issues that would have taken hours to diagnose inside a cluster:

  • A content selector scoped to the navigation panel instead of the article body, producing near-identical chunks across pages
  • A chunker splitting code samples mid-function so signatures and implementations were separated
  • A crawl indexing CDN error pages as canonical content across dozens of URLs

All of those became obvious by looking at:

  • chunk counts
  • retrieval scores
  • duplicate URLs
  • chunk text directly on disk

That’s the part most RAG tutorials skip.  Retrieval problems are usually data problems.

The faster you can inspect the data, the faster you improve retrieval.
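
As an illustration of that kind of inspection, the sketch below flags chunk text that shows up under several distinct URLs, which is the symptom a navigation-panel selector or an indexed error page would produce.  The tuple shape, the FindRepeatedChunks name, and the whitespace and case normalisation are assumptions for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Collapse whitespace and case so trivially different copies compare equal.
string Normalize(string text) =>
    Regex.Replace(text, @"\s+", " ").Trim().ToLowerInvariant();

// Hypothetical check: chunk texts that appear under several distinct URLs.
List<(string Text, int UrlCount)> FindRepeatedChunks(
    IEnumerable<(string Url, string Text)> chunks, int minUrls = 2) =>
    chunks
        .GroupBy(c => Normalize(c.Text))
        .Select(g => (Text: g.Key, UrlCount: g.Select(c => c.Url).Distinct().Count()))
        .Where(r => r.UrlCount >= minUrls)
        .OrderByDescending(r => r.UrlCount)
        .ToList();

var sample = new[]
{
    (Url: "https://example.com/a", Text: "Home  Docs  Blog"),
    (Url: "https://example.com/b", Text: "home docs blog"),
    (Url: "https://example.com/b", Text: "Actual article text"),
};

foreach (var (text, urlCount) in FindRepeatedChunks(sample))
    Console.WriteLine($"{urlCount} URLs share: {text}");
```

Exact-match grouping only catches identical boilerplate; near-duplicates with small differences would need a fuzzier comparison.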

~

Elasticsearch Still Matters

The point is not that JSON replaces a vector database.  It doesn’t.  The JSON store is intentionally limited:

  • everything loads into memory
  • concurrent writes are unsafe
  • performance drops as chunk counts grow

The point is that JSON is a better environment for discovering what “good retrieval” actually looks like.  Once retrieval quality is understood, switching to Elasticsearch is a config change:

"storage": {
  "type": "elasticsearch"
}

The pipeline code stays the same.

That’s where the abstraction earns its keep.
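
One way the config value could select an implementation is a small factory.  This is a sketch under assumptions: the CreateStore name, the "json" value, and the ElasticsearchVectorStore class name are hypothetical, and the types below are empty stand-ins rather than the real ones:

```csharp
using System;

// Hypothetical factory: maps the "storage.type" setting to a store.
// "elasticsearch" appears in the post; "json" is an assumed value for the file store.
IVectorStore CreateStore(string storageType) => storageType.Trim().ToLowerInvariant() switch
{
    "json" => new JsonFileVectorStore(),
    "elasticsearch" => new ElasticsearchVectorStore(),
    _ => throw new ArgumentException($"Unknown storage type '{storageType}'.", nameof(storageType)),
};

// The pipeline only ever sees IVectorStore, so nothing else changes.
IVectorStore store = CreateStore("elasticsearch");
Console.WriteLine(store.GetType().Name);

// Empty stand-ins so the sketch compiles; the real types carry the three methods.
interface IVectorStore { }
sealed class JsonFileVectorStore : IVectorStore { }
sealed class ElasticsearchVectorStore : IVectorStore { }
```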

Whatever I learn while iterating locally (parser rules, chunk sizes, overlap strategy, or retrieval tuning) carries directly into the production implementation.  No rewrite required.

No migration project.  No second ingest pipeline.

~

The Most Useful Part

The most useful abstractions are the ones that let you defer decisions until you have enough evidence to make them properly.

IVectorStore is not a complicated abstraction.  It’s three methods.

What it gave me was a way to:

  • debug retrieval quickly
  • iterate locally
  • inspect data directly
  • avoid infrastructure too early
  • move into Elasticsearch later without changing the pipeline

Most RAG problems are not solved by adding more infrastructure.

They’re solved by shortening the feedback loop between ingest and retrieval quality.

That’s what this workbench was built for.

Harald Fianbakken on Microsoft Sovereign Cloud

Episode 903

Microsoft PM Harald Fianbakken discusses a new Azure offering: Microsoft Sovereign Cloud. He describes the private, public, and national options and the issues each one is designed to address.

Links:
https://www.microsoft.com/sovereignty
https://learn.microsoft.com/azure/azure-sovereign-clouds/
