Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
154387 stories · 33 followers

Qwen3.5 in .NET MEAI Hangs Forever And How OllamaSharp Fixes It in One Line

1 Share
If your Microsoft.Extensions.AI.OllamaChatClient call to qwen3.5 never returns, this post is for you. The culprit is Qwen3.5’s built-in thinking mode, and the fix is switching to a client that can actually control it. My Use Case: Entity Disambiguation Against a Fixed...
Read the whole story
alvinashcraft
17 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Higher Performance Dynamic Consistency Boundary Development with Marten 9.0


To try to explain “Dynamic Consistency Boundary” usage in Event Sourcing, I’d contrast it to “traditional” Event Sourcing, where events are only organized into a stream of related events. For example, all the events related to a single invoice in an invoicing system form one event stream. DCB came about because Axon IQ has weak consistency and couldn’t support transactions across multiple streams the way that Marten or Polecat can, and because it’s often impossible to model a system where every operation only involves a single event stream. To that end, folks created the idea of “Dynamic Consistency Boundary” Event Sourcing, where events are organized more by tags, and event stores that support DCB can enforce transactional boundaries based on an event tag query (think: all the events of these types that are related to this class id, this student id, or this instructor id). That lets systems stay much more flexible over time.
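To make the tag-query idea concrete, here is a minimal in-memory sketch of a DCB-style append condition in TypeScript. It is not Marten's API. The `InMemoryDcbStore` class, the `TagQuery` shape and the `"course:c1"`-style tag strings are hypothetical. The sketch assumes an append condition of the form "reject the append if any event matching the decision's query was appended after the position the decision read".

```typescript
// Hypothetical in-memory sketch of a DCB-style event store (not Marten's API).
type Tag = string; // e.g. "course:c1", "student:s1"

interface StoredEvent {
  position: number; // global, 1-based append position
  type: string;
  tags: Tag[];
}

// Matches events of any listed type that carry at least one listed tag.
interface TagQuery {
  types: string[];
  tags: Tag[];
}

function matches(e: StoredEvent, q: TagQuery): boolean {
  return q.types.includes(e.type) && e.tags.some((t) => q.tags.includes(t));
}

class InMemoryDcbStore {
  private readonly events: StoredEvent[] = [];

  // Read the events inside the boundary plus the position the decision is based on.
  read(q: TagQuery): { events: StoredEvent[]; lastPosition: number } {
    return { events: this.events.filter((e) => matches(e, q)), lastPosition: this.events.length };
  }

  // Append only if nothing matching the same query arrived after `afterPosition`.
  // The boundary is whatever the query covers, not a single stream.
  append(newEvents: { type: string; tags: Tag[] }[], q: TagQuery, afterPosition: number): boolean {
    const conflict = this.events.some((e) => e.position > afterPosition && matches(e, q));
    if (conflict) return false;
    for (const e of newEvents) {
      this.events.push({ position: this.events.length + 1, type: e.type, tags: e.tags });
    }
    return true;
  }
}

// Two decisions read the same course/student boundary before either one appends.
const store = new InMemoryDcbStore();
const enrollment: TagQuery = { types: ["StudentEnrolled"], tags: ["course:c1", "student:s1"] };
const first = store.read(enrollment);
const second = store.read(enrollment);

const firstOk = store.append(
  [{ type: "StudentEnrolled", tags: ["course:c1", "student:s1"] }],
  enrollment,
  first.lastPosition,
);
const secondOk = store.append(
  [{ type: "StudentEnrolled", tags: ["course:c1", "student:s2"] }],
  enrollment,
  second.lastPosition,
);
```

The first append succeeds. The second returns `false` even though it names a different student. It was decided against position 0, and an event inside the same boundary (tagged `course:c1`) has been appended since.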

Marten has had support for the Dynamic Consistency Boundary approach (DCB) to Event Sourcing for a little while. The Marten 9.0 release last week added a new, potentially more performant option for DCB using the PostgreSQL HSTORE extension — which is supported by all the major cloud providers plus specialized cloud providers for managed PostgreSQL like Neon and Supabase.

Unfortunately, we don’t yet have a way to retroactively switch from “classic” DCB to the HSTORE style DCB in an existing application, but let’s say that you’re starting:

  • A greenfield application
  • A problem domain where the event stream boundaries either aren’t clear upfront or, you suspect, will never line up cleanly in terms of event streams

You might want to adopt DCB-style Event Sourcing from the get-go, then use the HSTORE flavor of Marten DCB to be more performant. To get started, just opt into that style of DCB storage like this:

var builder = Host.CreateApplicationBuilder();
builder.Services.AddMarten(opts =>
{
    opts.Connection(builder.Configuration.GetConnectionString("postgres"));

    // Marten does need to know about the possible tag types
    // upfront, and we rely on value types for this
    opts.Events.RegisterTagType<StudentId>("student");
    opts.Events.RegisterTagType<CourseId>("course");

    // This is all you need to do, but this does assume
    // that the HSTORE extension is available
    opts.Events.DcbStorageMode = DcbStorageMode.HStore;
});

The HSTORE style of DCB is a big performance improvement if you are querying events by two or more tag values at a time. I’d probably argue that’s the only time DCB is worthwhile from a logical-structure perspective anyway :)

I do owe the Critter Stack community a better YouTube video and blog post on using DCB, but for now, I’ll actually send you to the Wolverine documentation on command handlers using DCB with Marten for more examples and context.

Summary

I was the technical lead of a very successful supply chain management software project in the early 2000s. The most popular feature within that early web application was a last-minute, throw-in report. I did it as a favor for our business contact, and it wasn’t even part of our original specification. Once the system was live and folks found out about that report, we actually had to add new servers to the application cluster to keep up with the unexpected load, just because that one report was so popular with supply chain analysts.

My only point there is that I’m not always sure which features will actually resonate with users. Sometimes you know from reported friction that a new feature will eliminate a pain point, but with the DCB support in Marten and Wolverine, I flat out don’t know. DCB is very popular in the Event Sourcing community outside of the Critter Stack, and it was clear we had to have that feature set just to be competitive from an adoption perspective. However, I’m not seeing a lot of interest in DCB from our existing community. Marten can already handle transactional consistency across event streams far more flexibly than specialist Event Store databases seem able to.

But if DCB is something you’re interested in, or it simply fits your mental model of how the domain should be modeled in your system, Marten has you covered!




How soon is now in PostgreSQL?



How soon is now? In PostgreSQL, it’s not always as soon as you’d think. I learned that the hard way recently, so you don’t have to.

It took me hours and wasn’t easy to reproduce, even though the fix is one line. I found it in a Cybertec post, as I quite often do when I’m staring at something odd in PostgreSQL. I’m supposed to know my way around the database, but I missed it, which is another reason I want to write this down.

I was working on distributed locking in Emmett. When you scale a service horizontally, you can easily end up with two instances of the same message processor running at once. That’s bad. Both instances would pull the same events, both would write to the same projection storage, and we’d get duplicated side effects, overwritten state and broken checkpoints. So we need to guarantee that exactly one instance of each processor is active at any time. Emmett does that using two things working hand in glove: PostgreSQL advisory locks and a row in the emt_processors table. The row keeps the durable side of ownership: which instance currently holds the processor (processor_instance_id), when it last checked in (last_updated), and what state it’s in (status). I described the full design in Rebuilding Event-Driven Read Models in a safe and resilient way, so I won’t bore you with the whole picture here.

For this story, the part that matters is what happens when an instance crashes. The crashed processor’s connection is gone, so its advisory lock has already been released. A new instance can grab the advisory lock without resistance. But the row in emt_processors still says status = 'running' and still points to the previous owner, because the crash didn’t give anyone a chance to clean it up.

From the outside, we can’t tell whether the previous owner has crashed or is just between heartbeats. So we wait. If the row’s last_updated is older than a configurable timeout, the new instance is allowed to claim ownership anyway. Anyone quiet for that long is treated as gone. To make this graceful, the lock acquisition runs inside a retry policy. A fresh instance starting just after a crash doesn’t fail straight away; it retries until the timeout window expires.
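The takeover rule can be read as a pure function. The TypeScript below is a hypothetical mirror of the three conditions the upsert in `emt_try_acquire_processor_lock` encodes: same instance, clean stop, or stale heartbeat. It is not Emmett code. `canClaim`, the `ProcessorRow` shape and the millisecond timestamps are assumptions for illustration.

```typescript
// Hypothetical mirror of the takeover rule; the real check is in the SQL function.
type ProcessorStatus = "running" | "stopped";

interface ProcessorRow {
  processorInstanceId: string;
  status: ProcessorStatus;
  lastUpdated: number; // epoch milliseconds of the owner's last heartbeat
}

function canClaim(
  row: ProcessorRow | undefined,
  instanceId: string,
  nowMs: number,
  lockTimeoutSeconds: number,
): boolean {
  // No row yet: the INSERT succeeds and this instance becomes the first owner.
  if (row === undefined) return true;
  return (
    row.processorInstanceId === instanceId || // same instance reconnecting
    row.status === "stopped" || // previous owner stopped cleanly
    row.lastUpdated < nowMs - lockTimeoutSeconds * 1000 // previous owner timed out
  );
}

// A crashed owner "a" left its row at t = 0 with status 'running'.
const crashed: ProcessorRow = { processorInstanceId: "a", status: "running", lastUpdated: 0 };
const early = canClaim(crashed, "b", 1_000, 5); // 1 s after the crash: keep waiting
const late = canClaim(crashed, "b", 6_000, 5); // 6 s after the crash: takeover allowed
```

`early` is `false` and `late` is `true`. The function only answers correctly if `nowMs` really moves between calls, which is exactly the assumption the rest of this story puts under pressure.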

The bug

The takeover decision lives in the upsert against emt_processors. In the real function, that upsert sits inside a Common Table Expression (CTE) alongside a pg_try_advisory_xact_lock call.

For the record: the snippets below skip that wrapping (and trim a couple of unused parameters) to keep the focus on the upsert, where the bug lives. The full version is in the source.

CREATE OR REPLACE FUNCTION emt_try_acquire_processor_lock(
    p_processor_id           TEXT,
    p_processor_instance_id  TEXT,
    p_lock_timeout_seconds   INT
)
RETURNS BOOLEAN
LANGUAGE plpgsql
AS $$
BEGIN
  INSERT INTO emt_processors (processor_id, processor_instance_id, status, last_updated)
  VALUES (p_processor_id, p_processor_instance_id, 'running', now())
  ON CONFLICT (processor_id) DO UPDATE
  SET processor_instance_id = p_processor_instance_id,
      status                = 'running',
      last_updated          = now()
  WHERE   
     -- same instance reconnecting
     emt_processors.processor_instance_id = p_processor_instance_id                      
     -- previous owner stopped cleanly
     OR emt_processors.status = 'stopped'     
     -- previous owner timed out    
     OR emt_processors.last_updated
        < now() - (p_lock_timeout_seconds || ' seconds')::interval;
  RETURN FOUND;
END;
$$;

The last branch is the takeover. It reads naturally: if the previous owner hasn’t checked in for longer than the timeout, the new instance can replace them. All tests were green. Stop me if you think you’ve heard this one before. The problem surfaced through user feedback (thanks, Martin!), and it took me a long time to reproduce; none of the existing tests covered the scenario that triggered it. Once I had a new end-to-end test that pinpointed the symptom, the rest was the usual long, boring debugging loop.

To see why, open psql and run this:

BEGIN;
SELECT now() AS tx_now, clock_timestamp() AS wall_clock;
SELECT pg_sleep(2);
SELECT now() AS tx_now, clock_timestamp() AS wall_clock;
COMMIT;

You’ll get something like:

            tx_now             |          wall_clock
-------------------------------+-------------------------------
 2026-05-25 10:00:00.123456+00 | 2026-05-25 10:00:00.124012+00

            tx_now             |          wall_clock
-------------------------------+-------------------------------
 2026-05-25 10:00:00.123456+00 | 2026-05-25 10:00:02.131845+00

The first column is the same in both rows. The second one is two seconds apart. As it turns out, now() is a synonym for transaction_timestamp(): it returns the time the transaction began, and keeps returning that value for every statement inside the same transaction. A light that never goes out, in other words. clock_timestamp() reads the wall clock each time it’s called, so it advances as time does. Cybertec wrote a good walkthrough of the whole family of timestamp functions if you want the full picture.

What difference does it make? For a column like last_updated, the constancy of now() is usually what you want: every row touched in the same transaction shares a single timestamp, which keeps audit logs and write batches coherent. For asking “has enough time passed?” inside the same transaction, the same constancy works against us.
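A toy model can make the difference tangible. The TypeScript below is not PostgreSQL code. `FakeClock` and `FakeTransaction` are hypothetical stand-ins: one reading captures a start time once, the way `now()`/`transaction_timestamp()` does, and the other reads a live clock on every call, the way `clock_timestamp()` does. It replays the psql session above with a virtual two-second sleep.

```typescript
// Toy model of PostgreSQL's two time sources; names are hypothetical.
class FakeClock {
  constructor(public ms: number) {}
  advance(by: number): void {
    this.ms += by;
  }
}

class FakeTransaction {
  private readonly startedAt: number;

  constructor(private readonly clock: FakeClock) {
    this.startedAt = clock.ms; // captured once, when the transaction begins
  }

  // Like now() / transaction_timestamp(): frozen for the whole transaction.
  now(): number {
    return this.startedAt;
  }

  // Like clock_timestamp(): reads the wall clock on every call.
  clockTimestamp(): number {
    return this.clock.ms;
  }
}

const clock = new FakeClock(10_000);
const tx = new FakeTransaction(clock);
const before = { now: tx.now(), wall: tx.clockTimestamp() };
clock.advance(2_000); // stands in for SELECT pg_sleep(2)
const after = { now: tx.now(), wall: tx.clockTimestamp() };
```

As in the psql output, `before.now` and `after.now` are identical, while the two wall-clock readings are 2,000 ms apart.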

Now back to the retry. The consumer that calls tryAcquire looks roughly like this:

pool.withTransaction((tx) =>
  asyncRetry(
    () => tryAcquireProcessorLock(tx.execute, options),
    { retries: 10, minTimeout: 200, maxTimeout: 1000 },
  ),
);

pool.withTransaction opens a database transaction and passes its executor to the body. asyncRetry then repeatedly calls the stored procedure on that same executor, with a backoff between attempts. So even though the retries are spread out in real time, every call runs inside the same database transaction:

withTransaction        (transaction starts at T)
  └── asyncRetry
       ├── call lock function    → now() = T
       ├── call lock function    → now() = T   (200 ms later)
       ├── call lock function    → now() = T   (400 ms later)
       └── ...

last_updated < now() - timeout evaluates the same way every iteration. The predicate is effectively constant for the lifetime of that transaction. From the database’s perspective, no time was passing between attempts, even though the retries were spread across real seconds. (Of course, it’s fair to ask whether retries should happen inside a transaction at all, but let’s say that’s out of scope for today’s article. Deal?)
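The same trap can be reproduced without a database. This hypothetical TypeScript simulation runs the takeover predicate several times inside one simulated transaction, once with a frozen `now()` and once with a live `clock_timestamp()`, advancing a virtual clock by the backoff between attempts. The `retryTakeover` helper and its timings (last heartbeat at 0 ms, transaction opened at 500 ms, 2-second timeout, 10 retries, 500 ms backoff) are assumptions, not Emmett's code.

```typescript
// Hypothetical simulation of a retry loop sharing one transaction.
type TimeSource = (txStart: number, wall: number) => number;

// now(): frozen at the start of the transaction
const frozenNow: TimeSource = (txStart) => txStart;
// clock_timestamp(): whatever the wall clock says at call time
const clockTimestamp: TimeSource = (_txStart, wall) => wall;

function retryTakeover(
  lastUpdated: number, // previous owner's last heartbeat (ms)
  timeoutMs: number,
  readTime: TimeSource,
  txStart: number, // when the wrapping transaction began (ms)
  retries: number,
  backoffMs: number,
): boolean {
  let wall = txStart;
  for (let attempt = 0; attempt <= retries; attempt++) {
    // Same shape as: last_updated < <time source> - timeout
    if (lastUpdated < readTime(txStart, wall) - timeoutMs) return true;
    wall += backoffMs; // real time passes, but the transaction stays open
  }
  return false;
}

const withNow = retryTakeover(0, 2_000, frozenNow, 500, 10, 500);
const withClock = retryTakeover(0, 2_000, clockTimestamp, 500, 10, 500);
```

With the frozen time source the predicate is false on every attempt, so `withNow` is `false`. With the live clock, the fifth attempt (at 2,500 ms) sees the row as stale, so `withClock` is `true`.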

So what was the fix? Change the time source inside the function. PL/pgSQL lets you declare local variables, so I added one at the top, initialised from clock_timestamp(), and used it everywhere the function previously called now():

CREATE OR REPLACE FUNCTION emt_try_acquire_processor_lock(
    p_processor_id           TEXT,
    p_processor_instance_id  TEXT,
    p_lock_timeout_seconds   INT
)
RETURNS BOOLEAN
LANGUAGE plpgsql
AS $$
DECLARE
  v_current_time TIMESTAMPTZ := clock_timestamp();
BEGIN
  INSERT INTO emt_processors (processor_id, processor_instance_id, status, last_updated)
  VALUES (p_processor_id, p_processor_instance_id, 'running', v_current_time)
  ON CONFLICT (processor_id) DO UPDATE
  SET processor_instance_id = p_processor_instance_id,
      status                = 'running',
      last_updated          = v_current_time
  WHERE emt_processors.processor_instance_id = p_processor_instance_id
     OR emt_processors.status = 'stopped'
     OR emt_processors.last_updated
        < v_current_time - (p_lock_timeout_seconds || ' seconds')::interval;
  RETURN FOUND;
END;
$$;

clock_timestamp() ignores transaction boundaries. Every call to the stored procedure reads the wall clock fresh, so each retry sees a slightly later value than the last. After enough retries inside the wrapping transaction, the takeover predicate flips, and the new instance wins.

The function uses the timestamp in two places: when setting last_updated on the new owner, and when comparing the previous owner’s last_updated to the timeout. I switched both to v_current_time, so the write and the check read from the same wall clock. Mixing clock_timestamp() on one side and now() on the other would leave a subtler version of the same bug.

A cleaner option for the future is to move the retry one layer up, so each attempt opens its own transaction. That would remove the trap entirely and let the function go back to plain now(). For now, the local variable does the job.
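Moving the retry one layer up could look roughly like the following. It is a synchronous, hypothetical sketch. `FakePool`, `retryEachInOwnTransaction` and the virtual `sleep` are illustrative names rather than Emmett or driver APIs, and the advisory lock is left out entirely. Each attempt opens its own transaction, so the transaction's start time, which stands in for `now()`, moves forward between attempts.

```typescript
// Hypothetical sketch: one transaction per retry attempt.
interface Tx {
  startedAt: number; // plays the role of now() inside this transaction
}

class FakePool {
  transactionsOpened = 0;
  constructor(private readonly wallClock: () => number) {}

  withTransaction<T>(body: (tx: Tx) => T): T {
    this.transactionsOpened++;
    return body({ startedAt: this.wallClock() }); // now() captured per transaction
  }
}

function retryEachInOwnTransaction(
  pool: FakePool,
  attempt: (tx: Tx) => boolean,
  retries: number,
  sleep: (ms: number) => void,
  backoffMs: number,
): boolean {
  for (let i = 0; i <= retries; i++) {
    // A fresh transaction per attempt, so its now() reflects this attempt's time.
    if (pool.withTransaction(attempt)) return true;
    sleep(backoffMs);
  }
  return false;
}

let wall = 500; // virtual wall clock (ms)
const pool = new FakePool(() => wall);
const lastUpdated = 0; // crashed owner's last heartbeat
const timeoutMs = 2_000;

const acquired = retryEachInOwnTransaction(
  pool,
  (tx) => lastUpdated < tx.startedAt - timeoutMs, // plain now()-style check
  10,
  (ms) => {
    wall += ms; // virtual sleep advances the clock
  },
  500,
);
```

Here the fifth transaction, opened at 2,500 ms, is the first to see the 0 ms heartbeat as older than the 2-second timeout. That happens without any switch to `clock_timestamp()`.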

Why my tests didn’t catch it

Now to the testing side. I had a careful suite of integration tests for the stored procedure: two instances racing for the lock, the same instance reconnecting after a crash, and takeover after a custom timeout. They all passed, and they could not have caught this bug. Here’s the shape of a typical one:

await pool.withTransaction((connection) =>
  lock.tryAcquire({ execute: connection.execute }),
);

In other words: set up some state, call tryAcquire once, check the result. A single call to the stored procedure works fine with now() in the WHERE clause. There is only one timestamp involved per call, and the predicate evaluates correctly against it. The bug only shows up when several calls share one transaction, which happens when the consumer’s withTransaction wraps a retry policy. The stored-procedure tests never put those two together.

The end-to-end consumer tests do go through the consumer’s withTransaction wrapper, but they covered the happy paths: clean start, clean stop, two consumers competing, and an instance reclaiming its own stale lock. None of them combined the three conditions that together expose the bug:

  1. a previous owner whose row still says status = 'running' (a crash, not a graceful stop),
  2. a new instance arriving with a different instance ID,
  3. a retry acquisition policy with a timeout short enough for the retries to outlast it inside the test’s deadline.

With any one of those missing, either the takeover predicate succeeded on the first attempt (so the retry never fired), or the test finished before the retry’s failure to make progress became visible.

So why did this slip through both layers? The stored-procedure tests never combined a retry policy with the stale-row state, so they never produced multiple calls inside one transaction. The end-to-end tests did exercise the retry path, but none of them happened to combine all three conditions above at once. Both layers had blind spots, and the bug lived exactly where they overlapped.

The Pull Request that fixes the bug also adds a new family of tests that mount tryAcquire under the same transactional wrapper the real consumer uses, with the crash + new-instance + retry combination wired up on purpose. That’s the kind of test I should have had from the start.
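One way to look for that seam is to enumerate the three conditions and check which combinations can tell the buggy and fixed versions apart. The TypeScript below is a hypothetical model, not the tests from the Pull Request. `acquireInOneTransaction` runs the takeover predicate in a retry loop inside one simulated transaction, using either a frozen `now()` or a live `clock_timestamp()`. The timings are assumed: a 2-second timeout, a transaction opened 500 ms after the last heartbeat, and a 500 ms backoff.

```typescript
// Hypothetical model of the three conditions from the list above.
interface Scenario {
  crashed: boolean; // row still says 'running' (vs a graceful 'stopped')
  differentInstance: boolean; // new instance ID (vs the same one reconnecting)
  retry: boolean; // retry policy that outlasts the timeout (vs a single call)
}

function acquireInOneTransaction(s: Scenario, useWallClock: boolean): boolean {
  const row = { instanceId: "a", status: s.crashed ? "running" : "stopped", lastUpdated: 0 };
  const instanceId = s.differentInstance ? "b" : "a";
  const timeoutMs = 2_000;
  const txStart = 500; // transaction opens shortly after the last heartbeat
  const retries = s.retry ? 10 : 0; // no retry policy means a single call
  let wall = txStart;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const t = useWallClock ? wall : txStart; // clock_timestamp() vs now()
    if (row.instanceId === instanceId || row.status === "stopped" || row.lastUpdated < t - timeoutMs) {
      return true;
    }
    wall += 500; // backoff between attempts; the transaction stays open
  }
  return false;
}

const scenarios: Scenario[] = [];
for (const crashed of [false, true]) {
  for (const differentInstance of [false, true]) {
    for (const retry of [false, true]) {
      scenarios.push({ crashed, differentInstance, retry });
    }
  }
}

// Scenarios in which a test could tell the buggy and fixed versions apart.
const revealing = scenarios.filter(
  (s) => acquireInOneTransaction(s, false) !== acquireInOneTransaction(s, true),
);
```

In this model, only one of the eight combinations produces different results: a crashed owner, a different instance and a retry policy together. That lines up with why neither test layer stumbled on the bug by accident.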

What I’m taking away

Two things about now():

  • now() is the right tool when you want every row touched in the same transaction to share a timestamp. created_at, last_updated, audit columns. That stability is a feature.
  • now() is the wrong tool when you want to ask “has time moved on?” from inside a transaction. Use clock_timestamp() when you genuinely mean the wall clock.

And one harder thing about tests. Inner tests for stored procedures give me a tight feedback loop and pinpoint failures, but only inside the scaffolding I build for them. End-to-end tests run the real wiring, but I can’t enumerate every combination of timeouts, instance IDs and crash states without the suite collapsing under its own weight. The combination where this bug lived wasn’t reachable from either side by accident; it needed a deliberate setup.

I don’t have a clean rule for where to draw that line, and honestly, I don’t think there is one. My takeaway is to look at the seam: the spot where the inner test invokes the code differently from how the production caller invokes it. Here, it was a single-call test against a retry loop sharing one transaction. Wherever that gap sits, write a test there. Not at the unit level, not at the full end-to-end level, but in a setup that mirrors how the real caller actually drives the code and exercises the path you care about.

TLDR

now() returns the start of the current transaction, not the current moment. Inside a transaction it doesn’t change between statements. If you wrap a retry loop around a function that uses now() in a WHERE clause, and the retry loop runs inside one transaction, the predicate is frozen and the retries do nothing. Use clock_timestamp() when you mean “right now”. And pay extra attention to the seam between inner and end-to-end tests, because that’s where mismatches between how tests drive the code and how production drives it tend to hide.

The fix and the new tests live in Emmett Pull Request #339.

Uff. That bug was nasty.


Cheers!

Oskar

p.s. Ukraine is still under brutal Russian invasion. A lot of Ukrainian people are hurt, without shelter and need help. You can help in various ways, for instance, directly helping refugees, spreading awareness, putting pressure on your local government or companies. You can also support Ukraine by donating, e.g., to the Red Cross, Ukraine humanitarian organisation or Ambulances for Ukraine.


Harness, Scaffold, and the AI Agent Terms Worth Getting Right


Will Big Tech Layoffs Bring a Culture Shift to Anxiety and Job Insecurity?

Tech industry layoffs may be worse at large tech companies than the rest of the IT industry. The New York Times argues those layoffs have now shifted the culture at Big Tech companies, after interviewing more than two dozen of their workers.

"Cooperation and collegiality are on the wane; chumminess between employees and managers has cooled as mutual suspicion pervades their relationships; and a throbbing economic anxiety infects almost every conversation.

"Perhaps no site on the internet reflects this transformation more vividly than Blind, where users can post in private channels restricted to employees of a single company, or public channels visible to anyone..."

Since 2022, large tech companies have collectively laid off more than 150,000 workers, unraveling what many tech workers once perceived as a guarantee of affluence and employability. The threat of being replaced by artificial intelligence has loomed over those who remain. This year alone, Amazon has indicated that it is laying off more than 15,000 workers, Block 4,000, Meta 8,000 and Oracle an estimated 30,000...

By most measures, the sentiments that Blind tracks have taken a turn for the worse. During the nearly four years before tech companies began major layoffs in the fall of 2022, Meta and Microsoft employees posted about career success — topics like how to maximize their salary or win promotions — more than four times as often as they posted about job insecurity, according to Blind. Since then, the ratios have lurched in the opposite direction: Meta and Microsoft employees have posted about job insecurity roughly 1.5 times as often as they post about success...

The shift has had practical effects. A Meta employee said in an interview that some workers on her team now used less vacation time and that, in a break with custom, people frequently checked on their projects while on vacation. They increasingly worry about getting a poor performance review or losing their job if they aren't constantly available. The employee, who declined to be identified for fear of retribution, said she and many of her colleagues frequently checked Blind because it could be comforting to see how many other Meta workers shared their anxieties.

Employees at several companies said in interviews that their morale was further undermined by the feeling that the layoffs were abrupt and arbitrary, and executed with little empathy. Several tech workers said it was the scarcity of information about possible layoffs that raised their cortisol levels and made it difficult to focus on their jobs. They often fill the vacuum by turning to Blind, which, in addition to posts by workers, features a "tech layoff tracker" that lists both layoff rumors and those it has confirmed.

"I was on Blind five days a week," said Faith Wilkins El, a software engineer who was laid off from Oracle in late March, after more than four years at the company. Wilkins El, who is part of the Oracle Workers Collective, a group seeking better severance agreements with the company, said navigating Blind was sometimes stressful because it was hard to know what was true or false. (Blind says it has a security team to weed out bad actors, like those who may try to register under fake email addresses.) Still, she found it more helpful than not because the layoffs came as less of a shock after she spent time on the site. "I was trying to get prepared mentally," she said.

Blind is capitalizing on the increased interest with new products. It plans to unveil a service called Blind AI, which will allow employers to simulate their workers' reactions to certain changes, like a stricter in-office mandate. And it is close to releasing a feature to alert users that layoffs are imminent.

Read more of this story at Slashdot.


Let’s Take A Look Inside Adobe’s Complete Career Ladder


We pulled data from job postings, Levels.fyi, Glassdoor, and Blind.

Put together, they form a consistent picture – not an official ladder, but a very real one that shows how engineers grow, gain influence, and move from writing code to shaping entire systems.

Compensation reveals the real hierarchy

At its core, Adobe uses a P-level system (Professional levels) that maps engineering growth from entry-level roles to company-wide technical leadership.

Adobe doesn’t hand out impressive titles quickly. But behind the modest titles, what’s actually expected of you keeps growing at every level. The ladder looks flat from the outside. From the inside, the gap between levels is real.

If titles are understated, compensation isn’t: across multiple sources, total compensation for software engineers at Adobe ranges from roughly $150K at the entry level to well above $500K at the top of the individual contributor track.

The numbers matter, but so does the curve. Pay grows steadily through early and mid-level roles and then jumps sharply after senior. That’s where Adobe starts paying for influence, not just output.

Senior level: where leverage begins

At P10 and P20, the job is straightforward: ship code, learn the systems, and figure out how Adobe builds and scales things. The goal is to become someone the team can rely on.

By P30, something shifts. Engineers stop executing tasks and start owning problems (taking a feature end-to-end), making real technical calls, and thinking about why something should be built, not just how.

At P40, the job changes for real. Senior engineers design systems, not just features. They cross team boundaries, shape architectural decisions, and lead bigger initiatives. For many, this is a long-term home – the next step demands a fundamentally different kind of growth.

Staff: the real career breakpoint

The jump from Senior (P40) to Staff (P50) is the most important one on the ladder. Same title family, completely different job.

Staff engineers operate as technical leaders without formal authority. They define architecture, guide technical direction, and shape roadmaps across teams. At Staff, you’re measured by what others can build because of you, and compensation starts to reflect that.

Beyond Staff, engineering becomes increasingly strategic. Senior Staff engineers (P55) operate across domains, aligning engineering efforts with business goals and driving initiatives that span multiple teams.

Principal engineers (P60) move to a company-wide level of influence. They define technical vision, tackle ambiguous problems, and shape decisions that impact entire product lines. At this level, engineering is less about building and more about direction-setting.

Cross-company level mapping

One useful way to understand Adobe’s ladder is to map it against more transparent systems at companies like Microsoft. While titles and expectations vary slightly, the underlying progression is broadly aligned across Big Tech. Adobe’s levels tend to appear slightly compressed in naming, but comparable in scope, especially from Staff level onward.

The important nuance is that while the mapping is directionally accurate, scope matters more than exact title equivalence. A P50 at Adobe may operate closer to a strong L6 at Google or even edge into L7 territory, depending on the organization, reinforcing the idea that Adobe’s ladder is less about labels and more about impact.

What does Adobe actually value?

One pattern runs through the whole ladder: scope drives everything.

  • Early levels – Can you execute?
  • Mid levels – Can you own?
  • Senior – Can you design systems?
  • Staff+ – Can you influence outcomes across teams?

That’s the real progression. The ladder feels invisible from the outside because titles aren’t the point — expanding impact is.

Adobe’s ladder stands out for how quietly it operates. No playbook, no loud framing, just one consistent logic: as you grow, you move from writing code to shaping systems to shaping decisions. At the top, one question defines everything: how much of the company changes because of your work?

The post Let’s Take A Look Inside Adobe’s Complete Career Ladder appeared first on ShiftMag.
