
Web dependencies are broken. Can we fix them?

A cartoon where JS is shown hanging out drinking with bundlers. Webpack is saying “just one more config!” and Vite and Rollup are saying “We can optimize this!”. The event looks like it's winding down after hours of depravity. On the right side, a group of fed up developers are holding up an “Intervention” banner.
No, this is not another rant about npm’s security issues.

Abstraction is the cornerstone of modern software engineering. Reusing logic and building higher-level solutions from lower-level building blocks is what makes all the technological wonders around us possible. Imagine if every time anyone wrote a calculator they also had to reinvent floating-point arithmetic and string encoding!

And yet, the web platform has outsourced this fundamental functionality to third-party tooling. As a result, code reuse has become a balancing act between tradeoffs that should never have existed in the first place.

In NodeJS, you just npm install and reference specifiers straight away in your code. Same in Python, with pip install. Same in Rust with cargo add. In healthy ecosystems you don’t ponder how or whether to use dependencies. The ecosystem assumes dependencies are normal, cheap, and first-class. You just install them, use them, and move on. “Dependency-free” is not a badge of honor.

Instead, dependency management in the web platform consists of bits and bobs of scattered primitives, with no coherent end-to-end solution. Naturally, bundlers such as Webpack, Rollup, and esbuild have picked up the slack, with Browserify being the one that started it all back in 2012.

There is nothing wrong with bundlers when used as a performance optimization to minimize waterfall effects and overhead from too many HTTP requests. You know, what a bundler is supposed to do. It is okay to require advanced tools for advanced needs, and performance optimization is generally an advanced use case. Same for most other things bundlers and build tools are used for, such as strong typing, linting, or transpiling. All of these are needs that come much later than dependency management, both in a programmer’s learning journey, as well as in a project’s development lifecycle.

Dependency management is such a basic and ubiquitous need, it should be a part of the platform, decoupled from bundling. Requiring advanced tools for basic needs is a textbook usability cliff. In other ecosystems, optimizations happen (and are learned) after dependency resolution. On the web, optimization is the price of admission! This is not normal.

Bundlers have become so ubiquitous that most JS developers cannot even imagine deploying code without them. READMEs are written assuming a bundler, without even mentioning the assumption. It’s just how JS is consumed. My heart breaks for the newbie trying to use a drag and drop library, only to get mysterious errors about specifiers that failed to resolve.

However, bundling is not technically a necessary step of dependency management. Importing files through URLs is natively supported in every browser, via ESM imports. HTTP/2 makes importing multiple small files far more reasonable than it used to be — at least from a connection overhead perspective. You can totally get by without bundlers in a project that doesn’t use any libraries.
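
For a project with no third-party libraries, that can be as simple as the following; the file and function names here are made up for illustration:

<script type="module">
	import { render } from "./js/app.js"; // a plain relative URL, no tooling involved
	render(document.querySelector("main"));
</script>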

But the moment you add that first dependency, everything changes. You are suddenly faced with a huge usability cliff: which bundler to use, how to configure it, how to deploy with it, a mountain of decisions standing between you and your goal of using that one dependency. That one drag and drop library. For newcomers, this often comes very early in their introduction to the web platform, and it can be downright overwhelming.

Dependencies without bundlers, today?

It is technically possible to use dependencies without bundlers, today. There are a few different approaches, and — I will not sugarcoat it — they all suck.

There are three questions here:

  1. Use specifiers or URLs?
  2. How to resolve specifiers to URLs?
  3. Which URL do my dependencies live at?

There is currently no good answer to any of them, only fragile workarounds held together by duct tape.

Using a dependency should not need any additional song and dance besides “install this package” + “now import it here”. That’s it. That’s the minimum necessary to declare intent. And that’s precisely how it works in NodeJS and other JS runtimes. Anything beyond that is reducing signal-to-noise ratio, especially if it needs to be done separately for every project or worse, for every dependency.

You may need to have something to bite hard on while reading the next few sections. It’s going to be bad.

Rawdogging node_modules/ imports

Typically, package managers like npm take care of deduplicating compatible package versions and may use a directory like node_modules to install packages. In theory, one could deploy node_modules/ as part of their website and directly reference files in client-side JS. For example, to use Vue:

import { createApp } from "../node_modules/vue/dist/vue.esm-browser.js";

It works out of the box, and is a very natural thing to try the first time you install a package and you notice node_modules. Great, right?

No. Not great.

First, deploying your entire node_modules directory is both wasteful and a security risk. In fact, most serverless hosts (e.g. Netlify or Vercel) automatically remove it from the publicly deployed files after the build is finished.

Additionally, it violates encapsulation: paths within a package are generally seen as an implementation detail of the package itself, and packages expose specifier exports like vue or colorjs.io/fn that they map to internal paths. If you decide to circumvent this and link to files directly, you now need to update your import paths whenever you update the package.
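
For context, that mapping lives in the package's exports field in package.json. A simplified sketch (the internal paths here are invented for illustration) looks like this:

{
	"name": "colorjs.io",
	"exports": {
		".": "./dist/color.js",
		"./fn": "./dist/color-fn.js"
	}
}

The package is free to move those internal files around between versions as long as the public specifiers keep working; that is exactly the guarantee you forfeit the moment you deep-link into node_modules/.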

It is also fragile, as not every module is installed directly in node_modules/ — though those explicitly marked as app dependencies are.

Importing from public CDNs

Another common path is importing from CDNs like Unpkg and JSDelivr. For Vue, it would look like this:

import { createApp } from "https://unpkg.com/vue@3/dist/vue.esm-browser.js";

It’s quick and easy. Nothing to install or configure! Great, right?

No. Not great.

It is always a bad idea to introduce a dependency on a whole other domain you do not control, and an even worse one when linking to executable code.

First, there is the obvious security risk. Unless you link to a specific version, down to the patch number and/or use SRI, the resource could turn malicious overnight under your nose if the package is compromised. And even if you link to a specific version, there is always the risk that the CDN itself could get compromised. Who remembers polyfill.io?
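
If you must go down this route, at least pin the exact version in the URL rather than a floating major (the version number below is just an example):

import { createApp } from "https://unpkg.com/vue@3.4.21/dist/vue.esm-browser.js";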

But even supply-chain attacks aside, any third-party domain is an unnecessary additional point of failure. I still remember scrambling to change JSDelivr URLs to Unpkg during an outage right before one of my talks, or having to hunt down all my repos that used RawGit URLs when it sunset, including many libraries.

The DX is also suboptimal. You lose the immediacy and resilience of local, relative paths. Without additional tooling (Requestly, hosts file edits, etc.), you now need to wait for CDN roundtrips even during local development. Wanted to code on a flight? Good luck. Needed to show a live demo during a talk, over clogged conference wifi? Maybe sacrifice a goat to the gods first.

And while they maintain encapsulation slightly better than raw file imports, as they let you reference a package by its name for its default export, additional specifiers (e.g. packagename/fn) typically still require importing by file path.

“But with public CDNs, I benefit from the resource having already been cached by another website the user visited!”
Oh my sweet summer child. I hate to be the one to break it to you, but no, you don’t, and that has been the case since about 2020. Double-keyed caching obliterated this advantage.

node_modules imports locally + rewrite to CDN remotely

A quick and dirty way to get local URLs during development and CDN URLs on the live site is to link to relative ./node_modules URLs, and add a URL rewrite to a CDN for when they are not found. E.g. with Netlify redirects, that looks like this:

/node_modules/:modulename/* https://cdn.jsdelivr.net/npm/:modulename@latest/:splat 301

Since node_modules is not deployed, the rule always kicks in on the live site, while local URLs still work during development. Great, right?

No. Not great.

Like the mythical hydra, it solves one problem and creates two new ones.

First, it still carries many of the same issues of the approaches it combines:

  • Linking to CDNs is inherently insecure
  • It breaks encapsulation of the dependencies

Additionally, it introduces a new problem: the locally installed version and the CDN version need to match, but the naïve approach above always just links to the latest version.

Sure, one could alleviate this by building the _redirects file with tooling that reads package-lock.json and pins specific versions. But the point is not that this is insurmountable; it’s that it should not be this hard.
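
For the sake of illustration, here is a rough sketch of what such tooling might look like, assuming npm's v2/v3 lockfile format and the jsDelivr URL scheme (treat it as a sketch, not a battle-tested tool):

// build-redirects.js: pin the CDN redirects to the exact versions in package-lock.json
import { readFileSync, writeFileSync } from "node:fs";

const lock = JSON.parse(readFileSync("package-lock.json", "utf8"));
const prefix = "node_modules/";

// Keep only top-level packages (keys like "node_modules/vue", not nested ones)
const rules = Object.entries(lock.packages ?? {})
	.filter(([path]) => path.startsWith(prefix) && !path.slice(prefix.length).includes(prefix))
	.map(([path, meta]) => {
		const name = path.slice(prefix.length);
		return `/node_modules/${name}/* https://cdn.jsdelivr.net/npm/${name}@${meta.version}/:splat 301`;
	});

writeFileSync("_redirects", rules.join("\n") + "\n");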

Copy packages or exports to local directory

Another solution is a lightweight build script that copies either entire packages or specific exports into a directory that will actually get deployed. When dependencies are few, this can be as simple as an npm script:

{
	"scripts": {
		"lib": "cp node_modules/vue/dist/vue.esm-browser.js common/lib/vue.js",
		"build": "npm run lib"
	}
}

So now we have our own nice subset of node_modules/ and we don’t depend on any third-party domains. Great, right?

No. Not great.

Just like most of the other solutions, this still breaks encapsulation, forcing us to maintain a separate, ad-hoc index of specifiers to file paths.

Additionally, it has no awareness of the dependency graph. Dependencies of dependencies need to be copied separately. But wait a second. Did I say dependencies of dependencies? How would that even work?

Dependencies that use dependencies

In addition to their individual flaws, all of the solutions above share a major flaw: they can only handle importing dependency-free packages. But what happens if the package you’re importing also uses dependencies? It gets unimaginably worse my friend, that’s what happens.

There is no reasonable way for a library author to link to dependencies without excluding certain consumer workflows. There is no local URL a library author can use to reliably link to dependencies, and CDN URLs are highly problematic. Specifiers are the only way here.

So the moment you include a dependency that uses dependencies, you’re forced into specifier-based dependency management workflows, whether these are bundlers, or import map flavored JSON vomit in every single HTML page (discussed later).

“Browser” bundles

As a fig leaf, libraries will often provide a “browser” bundle that does not use specifiers, which consumers can import instead of the normal dist. This combines all their dependencies into a single dependency-free file that you can import from a browser. This means they can use whatever dependencies they want, and you can still import that bundle using regular ESM imports in a browser, sans bundler. Great, right?

No. Not great.

It’s called a bundle for a reason. It bundles all their dependencies too, and now they cannot be shared with any other dependency in your tree, even if it’s exactly the same version of exactly the same package. You’re not avoiding bundling, you’re outsourcing it, and multiplying the size of your JS code in the process.

And if the library author has not done that, you’re stuck with little to do, besides a CDN that rewrites specifiers on the fly like esm.sh, with all CDN downsides described above.


As someone who regularly releases open source packages (some with billions of npm installs), I find this incredibly frustrating. I want to write packages that can be consumed by people using or not using bundlers, without penalizing either group, but the only way to do that today is to basically not use any dependencies. I cannot even modularize my own packages without running into this! This doesn’t scale.

But won’t import maps solve all our problems?

Browsers can import specifiers, as long as the mapping to a URL is explicitly provided through an import map. Import maps look like this:

<script type="importmap">
{
	"imports": {
		"vue": "./node_modules/vue/dist/vue.runtime.esm-bundler.js",
		"lodash": "./node_modules/lodash-es/lodash.js",
	}
}
</script>

Did you notice something? Yes, this is an HTML block. No, I cannot link to an import map that lives in a separate file. Instead, I have to include the darn thing in. Every. Single. Page. The moment you decide to use JS dependencies, you now need an HTML templating tool as well. 🙃

“💡 Oh I know, I’ll generate this from my library via DOM methods!” I hear you say. No, my sweet summer child. It needs to be present at parse time. So unless you’re willing to document.write() it (please don’t), the answer is a big flat NOPE.

“💡 Ok, at least I’ll keep it short by routing everything through a CDN or the same local folder” No, my sweet summer child. Go to sleep and dream of globs and URLPatterns. Then wake up and get to work, because you actually need to specify. Every. Single. Mapping. Yes, transitive dependencies too.

Yo dawg meme with the text “Yo Dawg, I heard you like dependencies, so I put the deps of your deps in your import maps”
Ursula from The Little Mermaid singing “It’s sad, but true. If you want to use dependencies, my sweet you gotta pay the toll” to the tune of “Poor Unfortunate Souls”

You wanted to use dependencies? You will pay with your blood, sweat, and tears. Or, well, another build tool.

So now I need a build tool to manage the import map, like JSPM. It also needs to talk to my HTML templating tool, which I now had to add so it can spit out these import maps on. Every. Single. HTML. Page.

There are three invariants that import maps violate:

  1. Locality: Dependency declarations live in HTML, not JS. Libraries cannot declare their own dependencies.
  2. Composability: Import maps do not compose across dependencies and require global coordination.
  3. Scalability: Mapping every transitive dependency is not viable without tooling.

Plus, you still have all of the issues discussed above, because you still need URLs to link to. By trying to solve your problem with import maps, you now have multiple problems.

To sum up, in their current form, import maps don’t eliminate bundlers — they recreate them in JSON form, while adding an HTML dependency and worse latency.

Are bundlers the lesser evil?

Given the current state of the ecosystem, not using bundlers in any nontrivial application does seem like an exercise in masochism. Indeed, per State of JS 2024, bundlers were extremely popular, with Webpack having been used by 9 in 10 developers and having close to 100% awareness! But sorting by sentiment paints a different picture, with satisfaction, interest, and positivity dropping year after year. Even those who never question the status quo can feel it in their gut that this is not okay. This is not a reasonable way to manage dependencies. This is not a healthy ecosystem.

Out of curiosity, I also ran two polls on my own social media. Obviously, this suffers from selection bias due to the snowball sampling nature of social media, but I was still surprised to see such a high percentage of bundle-less JS workflows.

I’m very curious how these folks manage the problems discussed here.

Oftentimes when discussing these issues, I get the question “but other languages are completely compiled, why is it a problem here?”. Yes, but their compiler is official and always there. You literally can’t use the language without it.

The problem is not compilation, it’s fragmentation. It’s the experience of linking to a package via a browser import only to see errors about specifiers. It’s adding mountains of config and complexity to use a utility function. It’s having no clear path to write a package that uses another package, even if both are yours.

Abstraction itself is not something to outsource to third-party tools. This is the programming equivalent of privatizing fundamental infrastructure — roads, law enforcement, healthcare — systems that work precisely because everyone can rely on them being there.

Like boiling frogs, JS developers have resigned themselves to immense levels of complexity and gruntwork as simply how things are. The rise of AI has introduced swaths of less technical folks to web development, and their overwhelm and confusion are forcing us to take a long hard look at the current shape of the ecosystem — and it’s not pretty.

Few things must always be part of a language’s standard library, but dependency management is absolutely one of them. Any cognitive overhead should be going into deciding which library to use, not whether to include it and how.

This is also actively harming web platform architecture. Because bundlers are so ubiquitous, we have ended up designing the platform around them, when it should be the opposite. For example, because import.meta.url is unreliable when bundlers are used, components have no robust way to link to other resources (styles, images, icons, etc.) relative to themselves, unless these resources can be part of the module tree. So now we are adding features to the web platform that break any reasonable assumption about what HTML, CSS, and JS are, like JS imports for CSS and HTML, which could have been a simple fetch() if web platform features could be relied on.
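
For reference, the pattern that breaks is the one native ESM actually makes trivial (the file name is illustrative):

// Inside a component module: resolve a sibling asset relative to this very file
const styleSheetURL = new URL("./styles.css", import.meta.url);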

And because using dependencies is nontrivial, we are adding features to the standard library that could have been userland or even browser-provided dependencies.

To reiterate, the problem isn’t that bundlers exist — it’s that they are the only viable way to get first-class dependency management on the web.

JS developers deserve better. The web platform deserves better.

Where do we go from here?

As a web standards person, my first thought when spotting such a gap is “how can the web platform improve?”. And after four years in the TAG, I cannot shake the holistic architectural perspective of “which part of the Web stack is best suited for this?”

Specifiers vs URLs

Before we can fix this, we need to understand why it is the way it is. What is the fundamental reason the JS ecosystem overwhelmingly prefers specifiers over URLs?

On the surface, people often quote syntax, but that seems to be a red herring. There is little DX advantage of foo (a specifier) over ./foo.js (a URL), or even ./foo (which can be configured to have a JS MIME type). Another oft-cited reason is immutability: Remote URLs can change, whereas specifiers cannot. This also appears to be a red herring: local URLs can be just as immutable as specifiers.

Digging deeper, it seems that the more fundamental reason has to do with purview. A URL is largely the same everywhere, whereas foo can resolve to different things depending on context. A specifier is app-controlled whereas a URL is not. There needs to be a standard location for a dependency to be located and referenced from, and that needs to be app-controlled.

Additionally, specifiers are universal. Once a package is installed, it can be imported from anywhere, without having to work out paths. The closest HTTP URLs can get to this is root-relative URLs, and that’s still not quite the same.

Specifiers are clearly the path of least resistance here, so the low hanging fruit would be to make it easier to map specifiers to URLs, starting by improving import maps.

Improving import maps

An area with huge room for improvement here is import maps. Both making it easier to generate and include import maps, and making the import maps themselves smaller, leaner, and easier to maintain.

External import maps

The biggest need here is external import maps, even if it’s only via <script type=importmap src>. This would eliminate the dependency on HTML templating and open the way for generating them with a simple build tool.
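
In markup, that could be as simple as this (hypothetical; no browser supports it today):

<script type="importmap" src="./importmap.json"></script>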

This was actually part of the original import map work, and was removed from the spec due to lack of implementer interest, despite overwhelming demand. In 2022, external import maps were prototyped in WebKit (Safari), which prompted a new WHATWG issue. Unfortunately, it appears that progress has since stalled once more.

Import maps without HTML?

External import maps do alleviate some of the core pain points, but are still globally managed in HTML, which hinders composability and requires heavier tooling.

What if import maps could be imported from JS code (e.g. via import "map.json" with { type: "importmap" })? This would eliminate the dependency on HTML altogether, allowing scripts to localize their own import info, and the graph to be progressively composed instead of globally managed.
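
A hypothetical sketch of how a library could then carry its own mappings (the syntax merely mirrors import attributes; nothing like this is specified, and the package names are just examples):

// Inside a library's entry module (hypothetical, not valid today)
import "./deps.importmap.json" with { type: "importmap" };

// ...after which the library's own bare specifiers would resolve
import { html } from "lit";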

Import maps via HTTP header?

Going further, import maps via an HTTP header (e.g. Link) would even allow webhosts to generate them for you and send them down the wire completely transparently. This could be the final missing piece for making dependencies truly first-class.
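
Purely as a strawman, the header could look something like this (no such rel type exists today):

Link: </importmap.json>; rel="importmap"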

Imagine a future where you just install packages and use specifiers without setting anything up, without compiling any files into other files, with the server transparently handling the mapping!

Deploying dependencies to URLs

However, import maps need URLs to map specifiers to, so we also need some way to deploy the relevant subset of node_modules to public-facing URLs, as deploying the entire node_modules directory is not a viable option.

clientDependencies in package.json?

One solution might be a way to explicitly mark dependencies as client side, possibly even specific exports. This would decouple detection from processing app files: in complex apps it can be managed via tooling, and in simple apps it could even be authored manually, since it would only include top-level dependencies.
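
A hypothetical package.json sketch; the field name, its shape, and the second package are entirely made up:

{
	"dependencies": {
		"vue": "^3.4.0",
		"my-drag-and-drop": "^2.0.0"
	},
	"clientDependencies": {
		"vue": ["."],
		"my-drag-and-drop": ["./sortable"]
	}
}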

Figuring out the dependency graph

Even if we had better ways to mark which dependencies are client-side and map specifiers to URLs, these are still pieces of the puzzle, not the entire puzzle. Without a way to figure out what depends on what, transitive dependencies will still need to be managed globally at the top level, defeating any hope of a tooling-light workflow.

The current system relies on reading and parsing thousands of package.json files to build the dependency graph. This is reasonable for a JS runtime where the cost of file reads is negligible, but not for a browser where HTTP roundtrips are costly. And even if it were, this does not account for any tree-shaking.

Defining specifiers as a type of URL?

Think of how this works when using URLs: modules simply link to other URLs and the graph is progressively composed through these requests. What if specifiers could work the same way? What if we could look up and route specifiers when they are actually imported?

Here’s a radical idea: What if specifiers were just another type of URL, and specifier resolution could be handled by the server in the same way a URL is resolved when it is requested? They could use a specifier: protocol, that can be omitted in certain contexts, such as ESM imports.

How would these URLs be different from regular local URLs?

  • Their protocol would be implied in certain contexts — that would be how we can import bare specifiers in ESM
  • Their resolution would be customizable (e.g. through import maps, or even regular URL rewrites)
  • Despite looking like absolute URLs, their resolution would depend on the request’s Origin header (thus allowing different modules to use different versions of the same dependency). A request to a specifier: URL without an Origin header would fail.
  • HTTP caching would work differently; basically in a way that emulates the current behavior of the JS module cache.

Architecturally, this has several advantages:

  • It bridges the gap between specifiers and URLs. Rather than having two entirely separate primitives for linking to a resource, it makes specifiers a high-level primitive and URLs the low-level primitive that explains it.
  • It allows retrofitting specifiers into parts of the platform that were not designed for them, such as CSS @import. This is not theoretical: I was at a session at TPAC where bringing specifiers to CSS was discussed. With this, every part of the platform that takes URLs could utilize specifiers; it would just need to spell out the protocol explicitly (sketched below).
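
To make the strawman concrete, usage might look like this (none of this exists; the CSS package name is made up):

// In JS, the protocol would be implied, so bare specifiers keep working as-is
import { createApp } from "vue";

/* In CSS, you would spell it out */
@import url("specifier:my-design-system/tokens.css");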

Obviously, this is just a loose strawman at this point, and would need a lot of work to turn into an actual proposal (which I’d be happy to help out with, with funding), but I suspect we need some way to bridge the gap between these two fundamentally different ways to import modules.

Too radical? Quite likely. But abstraction is foundational, and you often need radical solutions to fix foundational problems. Even if this is not the right path, I doubt incremental improvements can get us out of this mess for good.

But in the end, this is about the problem. I’m much more confident that the problem needs solving, than I am of any particular solution. Hopefully, after reading this, so are you.

So this is a call to action for the community. To browser vendors, to standards groups, to individual developers. Let’s fix this! 💪🏼

Thanks to Jordan Harband, Wes Todd, and Anne van Kesteren for reviewing earlier versions of this draft.


  1. In fact, when I was in the TAG, Sangwhan Moon and I drafted a Finding on the topic, but the TAG never reached consensus on it. ↩︎

Lessons Learned From Real-World NoSQL Database Migrations

In “Battle-Tested Tips for a Better NoSQL Migration,” I shared my top strategies for planning, executing and de-risking a NoSQL database migration. I discussed key steps like schema and data migration, data validation and important considerations such as technology switches, tooling, edge cases and the idea that you might not need to migrate all your data.

Now, let’s analyze how teams actually migrated their data — what challenges they faced, trade-offs, how they proceeded and lessons learned. These are all real-world examples with names and identifying details obfuscated.

Streaming Bulk Load (DynamoDB to ScyllaDB)

First example: A large media streaming company that decided to switch from DynamoDB to ScyllaDB to reduce costs.

One interesting aspect of this use case is that the team had an ingestion process that overwrote their entire data set daily. As a result, there was no requirement to forklift their data from one database to another. They could just configure their ingestion job to write to ScyllaDB in addition to DynamoDB.

As soon as the job kicked in, data was stored in both databases. Since DynamoDB and ScyllaDB data models are so similar, that greatly simplified the process. It’s more complex when switching from a document store or a relational database to wide-column NoSQL.

As I mentioned in the previous article, a migration from one technology to another almost always requires making some changes. Even with similar databases, features and inner workings vary. Some of this team’s migration concerns were related to the way ScyllaDB handled out-of-order writes, how they would implement record versioning and the efficiency of data compression. These were all valid and interesting concerns.

The main lesson from this migration is the need to understand the differences between your source and target databases. Even databases that are quite similar in many respects, such as ScyllaDB and DynamoDB, do have differences that you need to recognize and navigate. As you explore these differences, you may eventually stumble upon room for improvement, which is exactly what happened here.

The use case in question was very susceptible to out-of-order writes. Before we explain how they addressed it, let’s cover what an out-of-order write involves.

Understanding Out-of-Order Writes

Out-of-order writes occur when newer updates arrive before older ones.

For example, assume you’re running a dual-write setup, writing to both your source and target databases at the same time. Then you plug in a migration tool (such as the ScyllaDB Migrator) to start reading data from the source database and writing it to the destination one. The Spark job reads some data from the source database, then the client writes an update to that same data. The client writes the data to the target database first and the Spark job writes it after. The Spark job might overwrite the fresher data. That’s an out-of-order write.

Martin Fowler describes it this way: “An out-of-order event is one that’s received late, sufficiently late that you’ve already processed events that should have been processed after the out-of-order event was received.”

With both Cassandra and ScyllaDB, you can handle these out-of-order writes by using the CQL (Cassandra Query Language) protocol to explicitly set timestamps on writes. In our example, the client update would include a later timestamp than the Spark write, so it would “win” — no matter which arrives last.
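
For example, a migration writer using the Node.js cassandra-driver could carry over each record's original write time (the table, columns, and cluster details below are made up for illustration):

import cassandra from "cassandra-driver";

const client = new cassandra.Client({
	contactPoints: ["127.0.0.1"],
	localDataCenter: "datacenter1",
	keyspace: "catalog",
});

// Bind the source record's original write time (in microseconds) as the CQL timestamp,
// so a late-arriving migration write can never clobber a fresher client write.
async function migrateRow(id, payload, sourceWriteTimeMs) {
	const timestampMicros = cassandra.types.Long.fromNumber(sourceWriteTimeMs).multiply(1000);
	await client.execute(
		"INSERT INTO items (id, payload) VALUES (?, ?) USING TIMESTAMP ?",
		[id, payload, timestampMicros],
		{ prepare: true }
	);
}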

This capability doesn’t exist in DynamoDB.

How the Team Handled Out-of-Order Writes in DynamoDB

The team was handling out-of-order writes using DynamoDB’s Condition Expressions, which are very similar to lightweight transactions in Cassandra. However, Condition Expressions in DynamoDB are much more expensive (with respect to performance as well as cost) than regular non-conditional expressions.

How did this team try to handle out-of-order writes in ScyllaDB? Initially, they implemented a read-before-write check on every write. This effectively caused their number of reads to spike.

After we met with them and analyzed their situation, we improved their application and database performance considerably by simply manipulating the timestamp of their writes. That’s the same approach that another customer of ours, Zillow, uses to handle out-of-order events.

Engagement Platform: TTL’d Data (ScyllaDB Self-Managed to ScyllaDB Cloud)

Next, let’s look at a migration across different flavors of the same database: a ScyllaDB to ScyllaDB migration. An engagement platform company decided to migrate from a self-managed on-premises ScyllaDB deployment to the ScyllaDB Cloud managed solution, so we helped them move data over.

No data modeling changes were needed, greatly simplifying the process. Though we initially suggested carrying out an online migration, they chose to take the offline route instead.

Why an Offline Migration?

An offline migration has some clear drawbacks: There’s a data loss window equal to the time the migration takes and the process is rather manual. You have to snapshot each node, copy the snapshots somewhere and then load them into the target system. And if you choose not to dual-write, switching clients is a one-way move; going back would mean losing data.

We discussed those risks upfront, but the team decided they wouldn’t outweigh the benefits and simplicity of doing it offline. (They expected most of their data to eventually expire via TTL (Time to Live) anyway.)

Before the production migration, we tested each step to better understand the potential data loss window.

In most cases, it is also possible to completely shift from data loss to a temporary inconsistency when carrying out an offline migration. After you switch your writers, you simply repeat the migration steps from the source database (now a read-only system), thereby restoring any data that wasn’t captured in the initial snapshot.

A Typical TTL-Based Migration Flow

This team used TTL data to control their data expiration, so let’s discuss how a migration with TTL data typically works.

First, you configure the application clients to do dual-writing but keep the client reading only from the existing source of truth. Eventually, the TTL on that source of truth expires. At this point, you can switch the reads to the new target database and all data should be in sync.

How the Migration Actually Played Out

In this case, the client was only reading and writing against a single existing source of truth. With the application still running, the team took an online snapshot of their data across all nodes. The resulting snapshots were transferred to the target cluster and we loaded the data using Load and Stream (a ScyllaDB extension that builds on the Cassandra nodetool refresh command).

Rather than simply loading the data and discarding the tokens the node is not a replica for, Load and Stream streams that data to the appropriate cluster members. This greatly simplifies the overall migration process.

After the team’s Load and Stream completed, the client simply switched reads and writes over to the new source of truth.

Messaging App: Shadow Cluster (Cassandra to ScyllaDB)

Next, let’s explore how a messaging app company approached the challenge of migrating more than a trillion rows from Cassandra to ScyllaDB.

Since Cassandra and ScyllaDB are API compatible, such migrations shouldn’t require any schema or application changes. However, given the criticality of their data and consistency requirements, an online migration approach was the only feasible option. They needed zero user impact and had zero tolerance for data loss.

Using a Shadow Cluster for Online Migration

The team opted to create a “shadow cluster.” A shadow cluster is a mirror of a production cluster that has the same data (mostly) and receives the same reads and writes. They created it from the disk snapshots from nodes in the corresponding production cluster. Production traffic (both reads and writes) was mirrored to the shadow cluster via a data service that they created for this specific purpose.

With a shadow cluster, they could assess the performance impact of the new platform before they actually switched. It also allowed them to thoroughly test other aspects of the migration, such as longer-term stability and reliability.

The drawbacks? It’s fairly expensive, since it typically doubles your infrastructure costs while you’re running the shadow cluster. Having a shadow cluster also adds complexity to things like observability, instrumentation, potential code changes and so on.

Negotiating Throughput and Latency Trade-offs During Migration

One notable lesson learned from this migration: how important it is to ensure the source system’s stability during the actual data migration. Most teams just want to migrate their data as fast as possible. However, migrating as fast as possible can affect latencies, and that is a problem when low latencies are critical to end users’ satisfaction.

In this team’s case, the solution was to migrate the data as fast as possible, but only up to the point where it started to affect latencies on the source system.

And how many operations per second should you run to migrate? At which level of concurrency? There’s no easy answer here. Really, you have to test.

Wrapping Up

The “best” NoSQL migration approach? As the breadth and diversity of these examples show, the answer is quite simple: it depends. A daily batch ingestion let one team skip the usual migration steps entirely. Another had to navigate TTLs and snapshot timing. And yet another team was really focused on making sure migration didn’t compromise their strict latency requirements. What worked for one team wouldn’t have worked for the next — and your specific requirements will shape your own migration path as well.

I hope these examples provided an interesting peek into the types of trade-offs and technical considerations you’ll face in your own migration. If you’re curious to learn more, I encourage you to browse the library of ScyllaDB user migration stories.

Adding Hardcover.app Data to Eleventy

It's been far too long since I shared an Eleventy tip, and to be fair what I'm showing today can be used anywhere, but hopefully this will be useful to someone else out there. I enjoy tracking my media consumption, specifically movies and books. For movies I've been real happy with Letterboxd (you can see my profile if you wish). For books, I used Goodreads for a very long time, but have wanted to migrate off the platform and switch to something else. There's alternatives, but none really worked well for me. Earlier this week, an old friend of mine (hi Jason!) suggested Hardcover. This is a Goodreads competitor built, in their own words, out of spite, and I can totally get behind that. I signed up and imported my Goodreads data in about five minutes and while I haven't dug deep into the site at all, it seems totally fine to me so I'll be sticking there. You can find my profile here: https://hardcover.app/@raymondcamden

Ok, you aren't here (I assume) to peruse my books and see how few books I consume (teenage Ray would be embarrassed by the number). The biggest reason I switched to Hardcover was because of their API, which I wanted to use to display it on my Now page. Again, I don't honestly think anyone cares what I'm reading/listening to/watching, but I think it's cool and that's all that matters on my little piece of the Internet.

Their API docs make it incredibly easy to get started, including the ability to quickly run your own requests for testing. Their API is GraphQL based, which I'm a bit rusty with, but I had no trouble getting started. My goal was to simply get my list of books I'm currently reading. To do this, I needed:

  • My user id
  • The status value for a book that is currently being read.

For the first one, I used their link to a GraphQL client and ran this query:

query Test {
    me {
      username
      id
    }
  }

I didn't actually need my username, but it was already there. Anyway, this gave me my user id, 65213.

Next, I needed to know which books were in my "Currently Reading" status and luckily, they literally had a doc page for that, "Getting Books with a Status", that used that particular value. Here's their query:

{
  user_books(
      where: {user_id: {_eq: ##USER_ID##}, status_id: {_eq: 2}}
  ) {
      book {
          title
          image {
              url
          }
          contributions {
              author {
                  name
              }
          }
      }
  }
}

Simple, right? There is one minor nit to keep in mind - their dashboard makes it easy to get your key, but it expires in one year and you can't programmatically renew it. My solution? Adding a reminder to my calendar. Ok, now to how I actually used it.

Providing the Data to Eleventy

Here's how I added this to Eleventy, and again, you should be able to port this out anywhere else as well. I added a new file to my _data folder, hardcover_books.js. Per the docs for global data files in Eleventy, whatever my code returns there can be used in my templates as hardcover_books. Here's my implementation:

const HARDCOVER_BOOKS = process.env.HARDCOVER_BOOKS;

export default async function() {

    if(!HARDCOVER_BOOKS) return [];
    let req;

    let body = `
    {
    user_books(
        where: {user_id: {_eq: 65213}, status_id: {_eq: 2}}
    ) {
        book {
            title
            image {
                url
            }
            contributions {
                author {
                    name
                }
            }
        }
    }
    }
    `.trim();

    try {
        req = await fetch('https://api.hardcover.app/v1/graphql', {
            method:'POST', 
            headers: {
                'authorization':HARDCOVER_BOOKS,
                'Content-Type':'application/json'
            },
            body:JSON.stringify({query:body})
        });
    } catch (e) {
        console.log('Hardcover API error', e);
        return [];
    }

    let data = (await req.json()).data.user_books.map(ob => ob.book);
    /* normalize authors */
    data = data.map(b => {
        b.authors = b.contributions.reduce((list,c) => {
            if(c.author) list.push(c.author.name);
            return list;
        },[]);
        return b;
    });

    return data;
};

Most of the code is me just calling their API and passing the GraphQL query, nothing special. However, I did want to shape the data a bit before returning it, so I reduce it to an array of books and then flatten the nested contributions data into a simple array of author names. Here's an example of how this looks (reduced to two books for length):

[
  {
    title: 'Frankenstein',
    image: {
      url: 'https://assets.hardcover.app/external_data/46789420/6823e1155b2785ae31ac59ccb752c4f33b599b35.jpeg'
    },
    contributions: [
      { author: { name: 'Mary Shelley' } },
      { author: { name: 'Paul Cantor' } }
    ],
    authors: [ 'Mary Shelley', 'Paul Cantor' ]
  },
  {
    title: 'The Business Value of Developer Relations',
    image: {
      url: 'https://assets.hardcover.app/edition/30438817/content.jpeg'
    },
    contributions: [ { author: { name: 'Mary Thengvall' } } ],
    authors: [ 'Mary Thengvall' ]
  },
]

The last bit was adding it to my Now page. I used a simple grid of image cover + titles:


<div class="films">
{% for book in hardcover_books  %}
  <div class="film">
  {% if book.image != null %}
  <img src="https://res.cloudinary.com/raymondcamden/image/fetch/c_fit,w_216/{{book.image.url}}" alt="Cover of {{ book.title }}">
  {% else  %}
  <img src="https://res.cloudinary.com/raymondcamden/image/fetch/c_fit,w_216/https://static.raymondcamden.com/images/no_cover_available.jpg" alt="No Cover Available">
  {% endif %}
  "{{ book.title  }}" by {{ book.authors | join: ', ' }}
  </div>
{% endfor %}
</div>

Pardon the class names there - as I already had CSS for my films, I just re-used them as I was being lazy. Also note that sometimes a book will not have an image cover. On the web site, they use a few different images to handle this, but the API doesn't return that. I generated my own and put it up in my S3 bucket. If you don't feel like clicking over to my Now page, here's how it looks:

screenshot from my list of books

If you would like to see this code in context with the rest of the site, you can find my blog's repo here: https://github.com/cfjedimaster/raymondcamden2023. Let me know if you end up using their API!

Model Context Protocol Nov 2025 Specification Update: CIMD, XAA, and Security

The November 2025 Model Context Protocol (MCP) update introduces Client ID Metadata Documents (CIMD) and Cross App Access (XAA). Learn how these changes improve AI agent security.

Skylight debuts Calendar 2 to keep your family organized

Skylight, known for its digital picture frame, has a new digital product that puts software and AI at the center.

CES 2026 tech you can already buy

Belkin’s Charging Case Pro makes some thoughtful tweaks to its previous battery-equipped model, and it’s launching in mid-January. | Image: Belkin

News coming out of CES 2026 might be slowing down, but as of Wednesday we still have writers on the ground, zipping from hotel suites to the Las Vegas Convention Center to try everything that matters. We've published well over 100 articles, and there's plenty more content to come, including reviews of stuff we got to see at the show.

As expected, most of the product announcements we've covered don't launch for at least a few months, but some of the products are already available, or will be soon. So, in case you want to get your hands on the freshest tech money can buy, we've compiled where you can buy products that we've written about …

Read the full story at The Verge.
