Read more of this story at Slashdot.
This talk was recorded at NDC Sydney in Sydney, Australia. #ndcsydney #ndcconferences #developer #softwaredeveloper
Attend the next NDC conference near you:
https://ndcconferences.com
https://ndcsydney.com/
#security
Every data breach tells a story — not just of compromised information, but of human motive.
Drawing on a dozen years of experience running Have I Been Pwned, this talk explores why hackers exfiltrate and share data, what those motives reveal about the evolving threat landscape, and how understanding them helps us respond more effectively in partnership with website operators.
This is a submission for the GitHub Finish-Up-A-Thon Challenge.
Before the architecture talk, the why. Three years ago
I watched my team — and every other .NET team I knew —
burn most of their hours not on business logic but on
plumbing the same four layers over and over:
DbContext, fluent mappings,
navigation properties, N+1 puzzles, lazy/eager loading,
a migration per field, separate read/write models if you
go CQRS. On a non-trivial object graph that's hundreds
of hours just for the data layer, before any business
logic is written.
IdentityServer / Auth0 / Keycloak plus 200-600 hours of glue per
serious deployment.
The worst part isn't any single layer; it's the seams
between them. Where the auth user becomes the EF entity
becomes the Kafka message becomes the S3 blob, every seam
is one more serialization, one more mapping, one more
versioning headache, one more place a 3 AM page comes from.
RedBase is built so that business code is the only thing
left to write. The class IS the schema (no EF, no
migrations). The 22 transports share one DSL (no per-connector
plumbing). Identity is a direct-vm:// Route (calling auth
is calling a function, not a network round-trip). Tsak gives
you hot-reload, cluster, dashboard, drain out of the box. The
seams collapse because everything lives on the same fabric.
On the one full business workflow where I had honest before/after
numbers (built the traditional way vs built on the RedBase
stack), the human effort was roughly 3,000 hours vs 128
hours. That ratio isn't magic on any single layer (where
each subsystem gives 3-5× at most); it comes from the seams
no longer existing. When I asked Claude Opus 4.7 to
sanity-check that estimate against typical .NET project
breakdowns (data layer + integrations + auth + runtime +
inter-seam testing), the order of magnitude held up.
One stack, one team, one architectural style — so the team
gets to write features instead of wiring infrastructure.
That is the project. Everything below is just how I got
there.
RedBase — a four-pillar open-source ecosystem for .NET
that grew up inside one production system over three years
and finally got shipped to the world this spring. Three
pillars are now public on GitHub under Apache 2.0, and the
fourth is in pre-release polish:
1. redb — typed
object storage engine. Your C# class IS the schema. Two
physical tables (_objects + _values), full LINQ, zero
migrations, recursive-CTE tree queries, bulk COPY BINARY
saves on PostgreSQL and SqlBulkCopy on SQL Server. Free
core + Pro tier with tree-diff ChangeTracking saves.
2. redb-route
— Apache Camel for .NET. Fluent C# DSL for
From → Process → To pipelines, 22 transport packages
(Kafka, RabbitMQ, MQTT, S3, gRPC, SFTP, AMQP, Azure
Service Bus, IBM MQ, Elasticsearch, Redis, LDAP, FTP, HTTP,
WebSocket, SignalR, Firebase, TCP, Mail, SQL, Quartz,
generic File), 80+ EIP patterns (Splitter, Aggregator,
CBR, Circuit Breaker, Saga, ...), compiled expression
engine, transactional routes, OpenTelemetry built-in.
3. redb-tsak
— the .NET analogue of Apache Karaf / Camel K. Production
runtime container: drop a .tpkg bundle (ZIP with
manifest.json + entry-point DLLs + dependency DLLs +
per-module JSON config) — or just a bare .dll for the
simplest cases — into Libs/, get hot-reload without
dropping in-flight messages, REST + Blazor dashboard,
watchdog, Quartz scheduler, three deployment modes
(Standalone / Single-node+redb / Cluster with leader
election & auto-rebalance), per-module
AssemblyLoadContext isolation, API Key + HMAC-SHA256
security, 30-command CLI, typed C# client. 415+ tests.
4. redb.Identity (pre-release, repo opening soon) —
OAuth 2.1 / OIDC server, architecturally transport-agnostic:
every endpoint is a direct-vm:// Route. That gives two
ready usage modes today: (a) in-process — call the
auth server directly from another module in the same
run-time with zero network hop, zero serialization, zero
HTTP stack (this is what direct-vm:// is), and
(b) HTTP — a full OAuth 2.1 / OIDC HTTP facade is
shipped and working. The other RPC-capable facades —
gRPC, RabbitMQ RPC, AMQP request/reply, IBM MQ
request/reply, WebSocket, SignalR, TCP — are on the
roadmap and become near-trivial to add because the core
logic is already wire-format independent. (The
fire-and-forget transports in the Route set — Kafka
producer, File, S3, Mail, etc. — don't fit an auth
server, so they're correctly out of scope.) OpenIddict
under the hood, REDB-backed storage, DPoP / PAR / Dynamic
Client Registration / SCIM 2.0 / FIDO2 / WebAuthn /
RFC 8417 shared-signals support.
1751+ tests passing before public release.
Total scope: ~385k LOC (326k C# + 58k SQL) across
~2200 source files, Apache 2.0, NuGet-published, all
docs at redbase.app/why-redbase.
Live docs and positioning:
Published article series on dev.to
(author page) —
the ecosystem is too large for one post, so I'm publishing
a deep-dive series, one pillar at a time:
…and the series continues. Tsak (runtime container,
hot-reload, cluster, watchdog) gets its own post next,
then Identity (transport-agnostic OAuth 2.1 / OIDC) when
the repo opens, then a deep dive on the tree-diff
ChangeTracking path, then a multi-DB benchmark post.
A stack this size is a genuinely hard engineering
problem to explain — trying to cram it into one article
would lie about the depth. So I'm taking it one layer
at a time.
The 30-second flavor:
// 1. Define schema: no migration needed
[RedbScheme("order")]
public class OrderProps
{
    public Customer Customer { get; set; }
    public Address ShippingAddress { get; set; }
    public List<OrderLineProps> Lines { get; set; } = [];
    public Dictionary<string, RedbObject<CouponProps>> Coupons { get; set; }
}

// 2. Save entire object graph in one call
await redb.SaveAsync(order);

// 3. Query with full LINQ over nested props
var hot = await redb.Query<OrderProps>()
    .Where(o => o.Customer.Address.City == "London"
             && o.Lines.Any(l => l.Qty > 10))
    .OrderByDescending(o => o.DateModify)
    .Take(50)
    .ToListAsync();

// 4. Tree query with recursive CTE under the hood
var allProducts = await redb
    .TreeQuery<Product>(londonHQ.Id, maxDepth: 10)
    .Where(p => p.InStock)
    .ToListAsync();
// Same dev defines a routing context with redb.Route.
// Real production patterns: REDB isn't a transport URI;
// it's accessed inside processors via DI (.ProcessWithRedb).
public class OrdersRouteBuilder : RouteBuilderBase
{
    public override void Build()
    {
        From("kafka://orders-raw")
            .Unmarshal<OrderProps>()
            .Split(o => o.Lines, EipParallel.Yes)
            .ProcessWithRedb(async (redb, exchange, ct) =>
            {
                var line = (OrderLineProps)exchange.In.Body;
                await redb.SaveAsync(line); // REDB store
            })
            .Aggregate(by: e => e.In.Headers["OrderId"],
                       timeout: 5.Seconds())
            .To("rabbitmq://orders-enriched");

        // Staged queue with backpressure, exactly how the
        // production garage-sync route does it:
        From("seda://orders?concurrentConsumers=4")
            .ProcessWithRedb(EnrichFromInventoryAsync)
            .To("http://partner-api/orders");
    }
}
// Pack as a .tpkg bundle (manifest.json + entry-point DLL
// + dependency DLLs + module config.json) — or for trivial
// modules, just drop the bare .dll — into Libs/ of a Tsak
// node. Tsak picks it up, runs it across the cluster,
// rebalances on node failure, drains old versions on hot
// redeploy.
//
// Need the same flow gated by an OAuth scope?
// Two ready modes today:
// * in-process — the call goes through direct-vm://
// straight into redb.Identity, no network, no HTTP;
// * over HTTP — standard OAuth 2.1 / OIDC endpoints.
// Tomorrow, the same auth server can also be exposed over
// any other RPC-capable Route transport (gRPC, RabbitMQ
// RPC, AMQP request/reply, IBM MQ, WebSocket, SignalR,
// TCP, ...) as those facades land — no change to the
// auth server itself:
//
// .Process(ctx => ctx.RequireScope("orders.write"))
That last paragraph ("drops a .tpkg (or just a .dll),
gets hot-reload + cluster + dashboard + REST API + drain on
redeploy + an auth server callable from the same routing
fabric") is the part I'm proudest of, because it's exactly
the piece that's been missing from the .NET integration story for a
decade. Java had Karaf, Camel K, and a mature OIDC story
glued together. .NET had nothing equivalent that wasn't
either vendor-locked or hand-rolled per project.
Before (early 2026): all four pillars existed and
worked. Years of work. Real production load — 3-node
cluster, ~550 internal users at a HoReCa distributor,
~500k business objects, ~15M _values rows, three months
running every operational integration in the company.
And yet:
README.md in each
project. "TODO: document this properly" in two of them.
The honest version: I was scared. ~385k LOC of opinionated
architecture decisions, exposed to public criticism, with
my real name on it. Easier to keep shipping it inside a
contract than to put it on the internet.
After (May 2026):
redb, redb-route, redb-tsak. Apache 2.0.
CI, badges, real READMEs. The fourth (redb-identity)
is in final polish — 1751+ tests green, repo opens
before the contest deadline.
redb.Core,
redb.Postgres, redb.Route.Kafka, redb.Tsak.CLI,
...) with ~18k cumulative downloads, the bulk of
that (~15k) added in the active-launch window over the
last ~10 days, on a clean upward curve day over day.
External developers can dotnet add package today.
Early signal: redb.Route (the integration framework)
has already overtaken redb.Core in download count
— 1497 vs 1482 — meaning the "Apache Camel for .NET"
positioning is landing, not just the storage story.
Inside Route, redb.Route.Http is the fastest-growing
transport — confirming the hypothesis that HTTP is the
natural first entry-point for new users exploring the
DSL.
SyncStructuresFromTypeAsync (strictDeleteExtra = true)
that I'd never noticed in three years of production use
because we always single-version-deploy in our internal
workflow. That defect went straight onto the fix list —
exactly the kind of feedback that justifies open-sourcing
in the first place.
The completion arc isn't "I wrote a thing." The arc is
"I had a thing that worked for years inside one company
and I finally let it leave the building." Different muscle
entirely. Documentation, positioning, choosing what to call
it (RedBase / REDB / redb — we picked one and renamed
everything), writing READMEs for repos that had no audience
for years, recording first impressions, drafting tweets,
answering the first technical comment in public.
That last one is the hardest part of shipping. The code
was done. Letting it be judged was the work.
Honest framing first: a large part of this codebase is my
own work, written before Copilot was a serious factor.
The architectural decisions, the storage model, the
two-table layout, the tree-diff save strategy, the
direct-vm:// transport-agnostic routing fabric that lets
redb.Identity be called in-process today with zero
network hop and exposed over HTTP today and grow
additional facades over the RPC-capable Route transports
(gRPC, RabbitMQ RPC, AMQP, IBM MQ, WebSocket, SignalR,
TCP, ...) tomorrow — all without touching the auth server
itself — all of that came from years of sitting, thinking,
prototyping, and throwing prototypes away. No AI invented
those.
What Copilot changed was the finishing arc. The codebase
had the right shape, but it also had:
TODO: document this;
Copilot is what let me finally close all of that out
without burning a year of evenings.
Where it was a real force multiplier:
Finishing deferred features. Pieces I had on a
mental backlog for months — round-trip serialization
edge cases, missing operators on the LINQ provider,
Tsak CLI commands I'd been meaning to add, missing
transport options in redb.Route — Copilot let me
knock them out in evenings instead of weekends because
it could read the surrounding code, infer the
convention, and propose an implementation that I then
reviewed and corrected. The decisions stayed mine.
The typing speed and the "what was I about to do here"
recovery time collapsed.
Structuring code I already had. A lot of the work
wasn't writing new logic — it was taking working but
messy code, splitting it into the right files,
extracting the right interfaces, renaming things
consistently across ~2200 files. Copilot is excellent
at that kind of mechanical-but-context-sensitive
refactor where you need to keep semantics intact while
moving things around.
Comments and XML documentation at scale. Adding
/// <summary> to thousands of public members with
accurate descriptions of what each method actually
does — by hand this is a multi-week slog and you'll
give up after two days. With Copilot reading the
implementation and proposing the comment, then me
correcting where it was wrong, it became a steady
background task that actually finished.
Documentation and READMEs. ~12 projects, each
needing a consistent voice, accurate API examples,
correct cross-references, and a quick-start that
actually runs. With Copilot it became "draft → review
→ correct → ship" instead of "stare at empty file →
procrastinate." Same for the redbase.app positioning
pages — the "Sound familiar? / WITH REDBASE" pattern
you see on the site got drafted in one afternoon and
then iterated.
Cross-file architectural navigation while debugging.
The redb.Core.Pro ChangeTracking save path crosses
ValueTreeBuilder, ValueTreeDiff,
ProObjectStorageProviderBase, and the SQL bulk-ops
layer. Holding all four files in working memory
simultaneously while chasing a deduplication bug in
array hash updates — Copilot did the "where is this
called from, what's the contract" lookup, my brain did
the fix.
Honest fact-checking before publishing. I once
asked it to draft a reply claiming "old readers ignore
unknown structures gracefully on multi-version deploys."
It pushed back, we read the actual code together, and
the answer turned out to be "reads are graceful, writes
are destructive, and the production playbook
compensates with runtime drain." That nuance got
published. The first version would have been a small
lie. Same loop surfaced the strictDeleteExtra = true
default that's been on the fix list ever since.
RFC-driven implementation (this one mattered a lot
for Identity). redb.Identity is built almost
entirely on top of public RFCs: OAuth 2.1 / 6749,
Token Revocation 7009, Introspection 7662, Dynamic
Client Registration 7591 / 7592, JWK Thumbprint 7638,
SCIM 7644, OAuth for Native Apps 8252, Device
Authorization 8628, Token Exchange 8693, DPoP 9449,
Shared Signals 8417 — and that's not the full list.
For a solo developer, staying strictly compliant with
that many specs is brutal: every endpoint has
"MUST / SHOULD / MAY" clauses scattered across multiple
§-sections of multiple documents, edge cases like
"what status code on revoke of unknown token" (7009
§2.2: still 200), "must DPoP-Nonce rotate on every
bearing response" (9449 §8: yes), "is dpop_jkt binding
enforced at the token endpoint" (9449 §10.1: yes);
miss one and you've shipped a non-compliant auth
server. Copilot was constantly the "wait, RFC 7591 §3.2.1
says the response MUST include a registration access
token, did you cover that?" voice in the room. The
test suite reflects that: 1751+ tests with explicit
"RFC XXXX §Y.Z: ..." assertion messages, written
in the loop of "I remember the spirit of this clause,
Copilot, pull up the exact wording and let's make a
test out of it." Without that, achieving real RFC
compliance solo would have been a multi-year sub-project
on its own.
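To make the "clause becomes a test" loop concrete, here is a minimal sketch of the token-revocation edge case mentioned above: RFC 7009's revocation response rules say the server answers 200 whether or not the token was known. The `InMemoryTokenStore` type and its members are hypothetical and written only for illustration; they are not redb.Identity or OpenIddict APIs.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory store, for illustration only.
public sealed class InMemoryTokenStore
{
    private readonly HashSet<string> _active = new();

    public void Issue(string token) => _active.Add(token);

    public bool IsActive(string token) => _active.Contains(token);

    // RFC 7009 (revocation response): invalid or unknown tokens do not
    // cause an error response, so the status is 200 in both cases.
    public int Revoke(string token)
    {
        _active.Remove(token); // no-op when the token is unknown
        return 200;
    }
}
```

A test built on this would assert the status code for both a known and an unknown token, and carry the clause reference in its failure message.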
The pattern that emerged: I came in with the design
and most of the code already in place. Copilot saw what
was there, understood the conventions, and helped me
finish. Polish XML docs. Close out deferred TODOs.
Write the documentation pages. Refactor the inconsistent
bits. Catch the contradictions before they hit a public
comment thread. That's a different relationship than
"AI wrote my project." It's closer to having a very
fast pair-programmer who actually read the codebase
before sitting down.
Honest summary: ~385k LOC of infrastructure in three
years of part-time work was the human heroic effort.
Finishing it — closing out the deferred features,
filling in all the missing comments and documentation,
unifying the naming, writing the READMEs, shipping the
NuGet packages, drafting the dev.to articles, answering
the first hard comment in public — that's where Copilot
collapsed months into weeks. The contest prompt is
"revive and finish a project you started but never
completed." My project wasn't unfinished in design.
It was unfinished in all the small, deferred, boring,
necessary things that turn a working codebase into a
shippable product. Copilot is what made that finishing
arc actually fit inside one spring.
Shipping the four pillars wasn't the finish line — it was
the beginning of the public phase. Three big bets are
already underway, in source, in this repo:
Free engine is overtaking Pro. The
docs/FreePvtQuery
initiative is rewriting the REDB query path on top of
PostgreSQL PVT functions, and the free tier is already
ahead of Pro on a long list of features:
$case / $coalesce / $cast / n-ary $concat in
projections, full regex predicates ($regex,
$iregex, $regexReplace), extended math
($power, $sqrt, $log, $sin/cos/tan, ...),
extended string ops ($substring, $replace,
$indexOf, $padLeft/Right), projection-level
DISTINCT ON, date-extract in Select, and as of
2026-05-23 — HAVING for both regular and array
GroupBy (33/33 integration tests green across PG
Free + PG Pro + MSSql Pro). Net effect: most of what
used to be Pro-only is becoming free, and the
free engine is gaining capabilities Pro doesn't have
at all.
REDB outside C#. The whole point of building the
query layer as language-neutral JSON AST evaluated by
database-side PVT functions is that the C# LINQ
provider is just one front-end. The same _objects +
_values storage, the same scheme registry, the same
query AST can be driven from Python, from Google Apps
Script, from any language that can speak Postgres or
MSSQL — without porting the engine. That unlocks REDB
as a shared object store across heterogeneous stacks,
not just .NET shops. Architecturally, the work is
already done: every query goes through JSON AST →
PVT functions, no C#-side filter compilation required
for the heavy path.
Route keeps growing. New transport connectors are
on the roadmap, deep-dive articles for the Tsak
runtime container and for redb.Identity are queued
up after the contest deadline, and the EIP catalogue
is being expanded incrementally on top of the
existing 80+ patterns. The fastest-growing transport
(redb.Route.Http) is also driving a focused docs
pass on the HTTP path specifically.
The honest version: the ecosystem has been moving for
years and barely anyone outside one production deployment
knew. The potential is genuinely large — language-neutral
object storage with a full EIP runtime and a compliant
identity server on top of the same fabric is not a thing
that currently exists on .NET. That's the bet I'm
finishing.
Team: solo submission. All commits, all docs, all
articles by one author with Copilot in the loop.
Live in production: 3-node cluster, ~550 internal
users, ~500k objects, ~15M _values rows, ~2500 hours
uptime, zero data-layer incidents in 3 months.
Stack: .NET 8 / 9, PostgreSQL 17, MS SQL 2022,
Blazor Server, Quartz.NET, Npgsql, OpenIddict,
OpenTelemetry.
License: Apache 2.0 across all four pillars.
Most RAG tutorials hand you the same shape of code.
Load some content. Chunk it. Embed it. Query it.
It works for a demo, but the moment you try to run that workflow repeatedly across different documentation sites, retrieval quality becomes the real problem: not embeddings, not vector databases, not orchestration frameworks.
Debugging retrieval is where the time goes.
You change a content selector. Re-ingest. Query again.
The chunking is wrong. Fix it. Re-ingest. Query again.
The crawler pulled navigation instead of the article body. Fix it. Re-ingest. Query again.
Every iteration takes longer than it should because the infrastructure sits in the middle of the feedback loop.
What I wanted was not another production-ready RAG platform. I wanted a workbench.
Something that would let me:
That became DocIngestion. Built in .NET using Claude Code, it’s a local-first RAG ingestion and retrieval inspection tool designed around one thing:
Let’s dig in.
~
The most important decision in the entire tool was also the simplest: a single IVectorStore abstraction with two implementations.
public interface IVectorStore
{
    Task UpsertAsync(
        string indexName,
        IReadOnlyList<Chunk> chunks,
        CancellationToken ct);

    Task<IReadOnlyList<ScoredChunk>> QueryAsync(
        string indexName,
        float[] queryEmbedding,
        int topK,
        CancellationToken ct);

    Task<IReadOnlyList<IndexedDocument>> ListDocumentsAsync(
        string indexName,
        CancellationToken ct);
}
That interface sits underneath the ingest pipeline:
The pipeline never knows where the chunks are stored.
There are only two implementations:
That’s it.
For most early-stage retrieval work, JSON is the better tool. Not because it’s scalable, but because it’s quick, easy to eyeball, and gives you something good enough to iterate on during those crucial early stages.
You can optimise for scale and production later, once you’ve nailed down retrieval quality and pipeline behaviour.
~
When most people start a RAG project, they immediately provision infrastructure.
Pinecone. Qdrant. Elasticsearch. Azure AI Search. That makes sense once the retrieval behaviour is already understood.
Before that, it’s mostly friction. For a few thousand chunks, a JSON file is more than enough.
The implementation is deliberately primitive:
public sealed class JsonFileVectorStore : IVectorStore
{
    public async Task<IReadOnlyList<ScoredChunk>> QueryAsync(
        string indexName,
        float[] queryEmbedding,
        int topK,
        CancellationToken ct)
    {
        // Brute-force scan: load every chunk, score it, keep the best topK.
        var chunks = await LoadAsync(indexName, ct);
        return chunks
            .Select(c => new ScoredChunk(
                c,
                CosineSimilarity(queryEmbedding, c.Embedding)))
            .OrderByDescending(s => s.Score)
            .Take(topK)
            .ToList();
    }
}
No infrastructure. No provisioning. No auth. No cluster configuration.
Just:
That changes the speed of iteration completely. A broken content selector stops being a deployment problem and becomes a two-minute fix.
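The query method above calls a CosineSimilarity helper that isn’t shown. A minimal sketch of what such a helper could look like follows; the `VectorMath` class name, the double return type, and the zero-vector handling are assumptions for illustration, not the tool’s actual code.

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity = dot(a, b) / (|a| * |b|).
    // Returns 0 when either vector has zero magnitude, to avoid dividing by zero.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            return 0;

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

For a few thousand chunks this linear scan is cheap, which is what makes the brute-force approach workable at this stage.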
~
The most useful part of the workbench is not ingestion. It’s inspection.
When retrieval quality is poor, there are usually only three questions:
The workbench surfaces all three immediately. Query a tenant and the UI shows:
That turns retrieval debugging into a fast feedback loop instead of an infrastructure exercise.
If the chunk is missing entirely, the crawler or parser is wrong.
If the chunk exists but never retrieves, the embedding or query phrasing is wrong.
If the chunk retrieves but ranks badly, the chunking strategy is usually the issue.
That sounds obvious written down. In practice, most RAG tooling makes answering those questions slow.
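The three rules above can be written down as a tiny decision function. This is only a sketch: the enum, the method name, and the idea of passing a 1-based rank from a wider candidate list are all hypothetical, not part of the workbench.

```csharp
public enum RetrievalDiagnosis
{
    CrawlerOrParser,   // chunk never made it into the index
    EmbeddingOrQuery,  // chunk is indexed but never comes back
    ChunkingStrategy,  // chunk comes back but ranks outside the top K
    LooksHealthy
}

public static class RetrievalTriage
{
    // rank: 1-based position of the expected chunk in a wide candidate
    // list, or null if it was not retrieved at all (assumed convention).
    public static RetrievalDiagnosis Diagnose(bool chunkExistsInIndex, int? rank, int topK)
    {
        if (!chunkExistsInIndex)
            return RetrievalDiagnosis.CrawlerOrParser;

        if (rank is null)
            return RetrievalDiagnosis.EmbeddingOrQuery;

        return rank.Value > topK
            ? RetrievalDiagnosis.ChunkingStrategy
            : RetrievalDiagnosis.LooksHealthy;
    }
}
```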
~
The inspection workflow caught issues that would have taken hours to diagnose inside a cluster:
All of those became obvious by looking at:
That’s the part most RAG tutorials skip. Retrieval problems are usually data problems.
The faster you can inspect the data, the faster you improve retrieval.
~
The point is not that JSON replaces a vector database. It doesn’t. The JSON store is intentionally limited:
The point is that JSON is a better environment for discovering what “good retrieval” actually looks like. Once retrieval quality is understood, switching to Elasticsearch is a config change:
"storage": {
  "type": "elasticsearch"
}
The pipeline code stays the same.
That’s where the abstraction earns its keep.
Whatever I learn while iterating locally (parser rules, chunk sizes, overlap strategy, retrieval tuning) carries directly into the production implementation. No rewrite required.
No migration project. No second ingest pipeline.
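One way a config switch like this could be wired is a small factory that reads the storage type and returns the matching implementation. The sketch below uses stub types so it is self-contained; the stub names, the "json" type value, and the assumption that the fragment sits inside a root JSON object are illustrative guesses, not the tool’s real wiring.

```csharp
using System;
using System.Text.Json;

// Stand-ins for the real store implementations (hypothetical).
public interface IVectorStoreStub { string Name { get; } }
public sealed class JsonFileStoreStub : IVectorStoreStub { public string Name => "json"; }
public sealed class ElasticsearchStoreStub : IVectorStoreStub { public string Name => "elasticsearch"; }

public static class VectorStoreFactory
{
    // Reads { "storage": { "type": "..." } } and picks an implementation.
    public static IVectorStoreStub FromConfig(string configJson)
    {
        using var doc = JsonDocument.Parse(configJson);
        var type = doc.RootElement
            .GetProperty("storage")
            .GetProperty("type")
            .GetString();

        return type?.ToLowerInvariant() switch
        {
            "json" => new JsonFileStoreStub(),
            "elasticsearch" => new ElasticsearchStoreStub(),
            _ => throw new NotSupportedException($"Unknown storage type '{type}'.")
        };
    }
}
```

Because the pipeline depends only on the interface, this factory would be the single place that knows which backend is in use.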
~
The most useful abstractions are the ones that let you defer decisions until you have enough evidence to make them properly.
IVectorStore is not a complicated abstraction. It’s three methods.
What it gave me was a way to:
Most RAG problems are not solved by adding more infrastructure.
They’re solved by shortening the feedback loop between ingest and retrieval quality.
That’s what this workbench was built for.
~
Episode 903
Harald Fianbakken on Microsoft Sovereign Cloud
Microsoft PM Harald Fianbakken discusses a new Azure offering: Microsoft Sovereign Cloud. He describes the private, public, national options and what issues each option is designed to address.
Links:
https://www.microsoft.com/sovereignty
https://learn.microsoft.com/azure/azure-sovereign-clouds/