Lessons Learned From Real-World NoSQL Database Migrations

In “Battle-Tested Tips for a Better NoSQL Migration,” I shared my top strategies for planning, executing and de-risking a NoSQL database migration. I discussed key steps like schema and data migration and data validation, as well as important considerations such as technology switches, tooling, edge cases and the idea that you might not need to migrate all your data.

Now, let’s analyze how teams actually migrated their data — what challenges they faced, what trade-offs they made, how they proceeded and what lessons they learned. These are all real-world examples with names and identifying details obfuscated.

Streaming Bulk Load (DynamoDB to ScyllaDB)

First example: A large media streaming company that decided to switch from DynamoDB to ScyllaDB to reduce costs.

One interesting aspect of this use case is that the team had an ingestion process that overwrote their entire data set daily. As a result, there was no requirement to forklift their data from one database to another. They could just configure their ingestion job to write to ScyllaDB in addition to DynamoDB.

As soon as the job kicked in, data was stored in both databases. Since DynamoDB and ScyllaDB data models are so similar, that greatly simplified the process. It’s more complex when switching from a document store or a relational database to wide-column NoSQL.
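
To make this concrete, here is a minimal sketch of what a dual-write step in such an ingestion job might look like, assuming boto3 and the Python Cassandra/ScyllaDB driver; the table, keyspace and column names are hypothetical.

# Minimal dual-write sketch for a daily ingestion job (hypothetical names).
import boto3
from cassandra.cluster import Cluster

dynamo = boto3.resource("dynamodb").Table("catalog")      # existing database
scylla = Cluster(["scylla-node-1"]).connect("media")      # new database

insert = scylla.prepare(
    "INSERT INTO catalog (item_id, payload) VALUES (?, ?)"
)

def ingest(records):
    """Write every record to both databases so they stay in sync."""
    for rec in records:
        dynamo.put_item(Item=rec)                                 # keep DynamoDB current
        scylla.execute(insert, (rec["item_id"], rec["payload"]))  # mirror to ScyllaDB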

As I mentioned in the previous article, a migration from one technology to another almost always requires making some changes. Even with similar databases, features and inner workings vary. Some of this team’s migration concerns were related to the way ScyllaDB handled out-of-order writes, how they would implement record versioning and the efficiency of data compression. These were all valid and interesting concerns.

The main lesson from this migration is the need to understand the differences between your source and target databases. Even databases that are quite similar in many respects, such as ScyllaDB and DynamoDB, have differences that you need to recognize and navigate. As you explore these differences, you may eventually stumble upon room for improvement, which is exactly what happened here.

The use case in question was very susceptible to out-of-order writes. Before we explain how they addressed it, let’s cover what an out-of-order write involves.

Understanding Out-of-Order Writes

Out-of-order writes occur when newer updates arrive before older ones.

For example, assume you’re running a dual-write setup, writing to both your source and target databases at the same time. Then you plug in a migration tool (such as the ScyllaDB Migrator) to start reading data from the source database and writing it to the destination one. The Spark job reads some data from the source database, then the client writes an update to that same data. The client writes the data to the target database first and the Spark job writes it after. The Spark job might overwrite the fresher data. That’s an out-of-order write.

Martin Fowler describes it this way: “An out-of-order event is one that’s received late, sufficiently late that you’ve already processed events that should have been processed after the out-of-order event was received.”

With both Cassandra and ScyllaDB, you can handle these out-of-order writes by using the CQL (Cassandra Query Language) protocol to explicitly set timestamps on writes. In our example, the client update would include a later timestamp than the Spark write, so it would “win” — no matter which arrives last.
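
As a rough illustration of that idea (not the team's actual code), here is a sketch of a write that carries an explicit timestamp taken from the source record, assuming the Python Cassandra/ScyllaDB driver; the table and column names are hypothetical.

# Sketch: both the live client and the migration job set the write timestamp
# explicitly, so the row with the newest source timestamp wins regardless of
# arrival order. Table and column names are hypothetical.
from cassandra.cluster import Cluster

session = Cluster(["scylla-node-1"]).connect("media")

upsert = session.prepare(
    "INSERT INTO catalog (item_id, payload) VALUES (?, ?) USING TIMESTAMP ?"
)

def write_with_timestamp(item_id, payload, updated_at_micros):
    # updated_at_micros comes from the source record (microseconds since epoch),
    # so a stale copy applied later can never overwrite a fresher one.
    session.execute(upsert, (item_id, payload, updated_at_micros))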

This capability doesn’t exist in DynamoDB.

How the Team Handled Out-of-Order Writes in DynamoDB

The team was handling out-of-order writes using DynamoDB’s Condition Expressions, which are very similar to lightweight transactions in Cassandra. However, Condition Expressions in DynamoDB are much more expensive (with respect to performance as well as cost) than regular non-conditional expressions.
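
For context, a conditional write in DynamoDB looks roughly like the hedged sketch below, using boto3; the table name, attribute names and version-based condition are hypothetical stand-ins, not the team's actual expressions.

# Sketch of a version-guarded write with a DynamoDB Condition Expression.
# The write is rejected unless the incoming version is newer, which protects
# against out-of-order updates but costs more than a plain PutItem.
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("catalog")

def conditional_put(item):
    try:
        table.put_item(
            Item=item,
            ConditionExpression="attribute_not_exists(item_id) OR version < :v",
            ExpressionAttributeValues={":v": item["version"]},
        )
    except ClientError as err:
        if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise  # anything other than "stale write rejected" is a real error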

How did this team try to circumvent out-of-order writes using ScyllaDB? Initially, they implemented a read-before-write check prior to every write, which effectively caused their read volume to spike.

After we met with them and analyzed their situation, we improved their application and database performance considerably by simply manipulating the timestamp of their writes. That’s the same approach that another customer of ours, Zillow, uses to handle out-of-order events.

Engagement Platform: TTL’d Data (ScyllaDB Self-Managed to ScyllaDB Cloud)

Next, let’s look at a migration across different flavors of the same database: a ScyllaDB to ScyllaDB migration. An engagement platform company decided to migrate from a self-managed on-premises ScyllaDB deployment to the ScyllaDB Cloud managed solution, so we helped them move data over.

No data modeling changes were needed, greatly simplifying the process. Though we initially suggested carrying out an online migration, they chose to take the offline route instead.

Why an Offline Migration?

An offline migration has some clear drawbacks: There’s a data loss window equal to the time the migration takes, and the process is rather manual. You have to snapshot each node, copy the snapshots somewhere and then load them into the target system. And if you choose not to dual-write, switching clients is a one-way move; going back would mean losing data.

We discussed those risks upfront, but the team decided they didn’t outweigh the benefits and simplicity of doing it offline. (They expected most of their data to eventually expire via TTL, or Time to Live.)

Before the production migration, we tested each step to better understand the potential data loss window.

In most cases, it is also possible to turn outright data loss into a temporary inconsistency when carrying out an offline migration. After you switch your writers, you simply repeat the migration steps from the source database (now a read-only system), thereby restoring any data that wasn’t captured in the initial snapshot.

A Typical TTL-Based Migration Flow

This team used TTLs to control their data expiration, so let’s discuss how a migration with TTL’d data typically works.

First, you configure the application clients to dual-write but keep them reading only from the existing source of truth. Eventually, all of the data written before dual-writing began expires per its TTL. At this point, you can switch the reads to the new target database and all data should be in sync.
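
As a simple illustration of that timing (with hypothetical dates and TTL values), once dual writes have been running for at least one full TTL window, every live row is guaranteed to exist in the new cluster:

# Sketch of the read-cutover calculation for a TTL-based migration.
# The TTL and the dual-write start date below are hypothetical.
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=7)                                    # table's time to live
dual_writes_started = datetime(2024, 3, 1, tzinfo=timezone.utc)

earliest_read_switch = dual_writes_started + TTL

if datetime.now(timezone.utc) >= earliest_read_switch:
    print("Safe to point reads at the new cluster")
else:
    print(f"Keep reading from the old cluster until {earliest_read_switch}")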

How the Migration Actually Played Out

In this case, the client was only reading and writing against a single existing source of truth. With the application still running, the team took an online snapshot of their data across all nodes. The resulting snapshots were transferred to the target cluster and we loaded the data using Load and Stream (a ScyllaDB extension that builds on the Cassandra nodetool refresh command).

Rather than simply loading the data on each node and discarding the token ranges that the node is not a replica for, Load and Stream streams that data to the other members of the cluster, which greatly simplifies the overall migration process.

After the team’s Load and Stream completed, the client simply switched reads and writes over to the new source of truth.

Messaging App: Shadow Cluster (Cassandra to ScyllaDB)

Next, let’s explore how a messaging app company approached the challenge of migrating more than a trillion rows from Cassandra to ScyllaDB.

Since Cassandra and ScyllaDB are API compatible, such migrations shouldn’t require any schema or application changes. However, given the criticality of their data and consistency requirements, an online migration approach was the only feasible option. They needed zero user impact and had zero tolerance for data loss.

Using a Shadow Cluster for Online Migration

The team opted to create a “shadow cluster.” A shadow cluster is a mirror of a production cluster that holds (mostly) the same data and receives the same reads and writes. They created it from disk snapshots of the nodes in the corresponding production cluster. Production traffic (both reads and writes) was mirrored to the shadow cluster via a data service that they created for this specific purpose.
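
The sketch below shows the general traffic-mirroring idea, not the team's actual data service; the client objects are hypothetical and assumed to expose an execute() call whose results can be compared directly.

# Rough sketch of traffic mirroring for a shadow cluster: every write goes to
# both clusters, reads are served from production, and the shadow's answer is
# compared in the background. Client objects and names are hypothetical.
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("shadow")
pool = ThreadPoolExecutor(max_workers=8)

class MirroringDataService:
    def __init__(self, production, shadow):
        self.production = production   # current source of truth
        self.shadow = shadow           # candidate cluster under evaluation

    def write(self, query, params):
        self.production.execute(query, params)            # authoritative write
        pool.submit(self.shadow.execute, query, params)   # best-effort mirror

    def read(self, query, params):
        result = self.production.execute(query, params)   # users see this answer
        pool.submit(self._compare, query, params, result) # "dark read" check
        return result

    def _compare(self, query, params, expected):
        try:
            if self.shadow.execute(query, params) != expected:
                log.warning("shadow mismatch for %s %s", query, params)
        except Exception:
            log.exception("shadow read failed")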

With a shadow cluster, they could assess the performance impact of the new platform before they actually switched. It also allowed them to thoroughly test other aspects of the migration, such as longer-term stability and reliability.

The drawbacks? It’s fairly expensive, since it typically doubles your infrastructure costs while you’re running the shadow cluster. Having a shadow cluster also adds complexity to things like observability, instrumentation, potential code changes and so on.

Negotiating Throughput and Latency Trade-offs During Migration

One notable lesson learned from this migration: how important it is to ensure the source system’s stability during the actual data migration. Most teams just want to migrate their data as fast as possible. However, migrating as fast as possible could affect latencies, and that could be a problem when low latencies are critical to the end users’ satisfaction.

In this team’s case, the solution was to migrate the data as fast as possible, but only up to the point where it started to affect latencies on the source system.

So how many operations per second should the migration run? At what level of concurrency? There’s no easy answer here. Really, you have to test.
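
One way to structure that testing is an adaptive throttle that backs off whenever the source system's latency crosses a budget and cautiously speeds up when there is headroom. The sketch below is purely illustrative; the latency probe, batch callback and limits are hypothetical.

# Sketch of throttling a migration writer based on the source cluster's latency.
import time

TARGET_P99_MS = 15.0   # hypothetical latency budget for the source system

def throttled_migration(read_p99_ms, migrate_batch, rate_limit=20_000):
    """Copy data as fast as possible without pushing source p99 latency past
    the budget. read_p99_ms() and migrate_batch(n) are hypothetical hooks."""
    while True:
        start = time.monotonic()
        copied = migrate_batch(rate_limit)        # copy roughly rate_limit rows
        if copied == 0:
            break                                 # nothing left to migrate
        if read_p99_ms() > TARGET_P99_MS:
            rate_limit = max(1_000, int(rate_limit * 0.8))   # back off
        else:
            rate_limit = int(rate_limit * 1.1)               # speed up cautiously
        time.sleep(max(0.0, 1.0 - (time.monotonic() - start)))  # ~1s per loop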

Wrapping Up

The “best” NoSQL migration approach? As the breadth and diversity of these examples show, the answer is quite simple: it depends. A daily batch ingestion let one team skip the usual migration steps entirely. Another had to navigate TTLs and snapshot timing. And yet another team was really focused on making sure migration didn’t compromise their strict latency requirements. What worked for one team wouldn’t have worked for the next — and your specific requirements will shape your own migration path as well.

I hope these examples provided an interesting peek into the types of trade-offs and technical considerations you’ll face in your own migration. If you’re curious to learn more, I encourage you to browse the library of ScyllaDB user migration stories.

The post Lessons Learned From Real-World NoSQL Database Migrations appeared first on The New Stack.

Adding Hardcover.app Data to Eleventy

It's been far too long since I shared an Eleventy tip, and to be fair, what I'm showing today can be used anywhere, but hopefully this will be useful to someone else out there. I enjoy tracking my media consumption, specifically movies and books. For movies I've been real happy with Letterboxd (you can see my profile if you wish). For books, I used Goodreads for a very long time, but have wanted to migrate off the platform and switch to something else. There are alternatives, but none really worked well for me. Earlier this week, an old friend of mine (hi Jason!) suggested Hardcover. This is a Goodreads competitor built, in their own words, out of spite, and I can totally get behind that. I signed up and imported my Goodreads data in about five minutes, and while I haven't dug deep into the site at all, it seems totally fine to me so I'll be sticking with it. You can find my profile here: https://hardcover.app/@raymondcamden

Ok, you aren't here (I assume) to peruse my books and see how few books I consume (teenage Ray would be embarrassed by the number). The biggest reason I switched to Hardcover was their API, which I wanted to use to display my current reads on my Now page. Again, I don't honestly think anyone cares what I'm reading/listening to/watching, but I think it's cool and that's all that matters on my little piece of the Internet.

Their API docs make it incredibly easy to get started, including the ability to quickly run your own requests for testing. Their API is GraphQL-based, which I'm a bit rusty with, but I had no trouble getting started. My goal was to simply get the list of books I'm currently reading. To do this, I needed:

  • My user id
  • The status value for a book that is currently being read.

For the first one, I used their link to a GraphQL client and ran this query:

query Test {
  me {
    username
    id
  }
}

I didn't actually need my username, but it was already there. Anyway, this gave me my user id, 65213.

Next, I needed to know which books were in my "Currently Reading" status and luckily, they literally had a doc page for that, "Getting Books with a Status", that used that particular value. Here's their query:

{
  user_books(
      where: {user_id: {_eq: ##USER_ID##}, status_id: {_eq: 2}}
  ) {
      book {
          title
          image {
              url
          }
          contributions {
              author {
                  name
              }
          }
      }
  }
}

Simple, right? There is one minor nit to keep in mind - their dashboard makes it easy to get your key, but it expires in one year and you can't programmatically renew it. My solution? Adding a reminder to my calendar. Ok, now to how I actually used it.

Providing the Data to Eleventy

Here's how I added this to Eleventy, and again, you should be able to port this out anywhere else as well. I added a new file to my _data folder, hardcover_books.js. Per the docs for global data files in Eleventy, whatever my code returns there can be used in my templates as hardcover_books. Here's my implementation:

const HARDCOVER_BOOKS = process.env.HARDCOVER_BOOKS;

export default async function() {

    // Skip the API call entirely if the token isn't configured.
    if(!HARDCOVER_BOOKS) return [];
    let req;

    // GraphQL query for everything in my "Currently Reading" status (status_id 2).
    let body = `
    {
    user_books(
        where: {user_id: {_eq: 65213}, status_id: {_eq: 2}}
    ) {
        book {
            title
            image {
                url
            }
            contributions {
                author {
                    name
                }
            }
        }
    }
    }
    `.trim();

    try {
        req = await fetch('https://api.hardcover.app/v1/graphql', {
            method:'POST',
            headers: {
                'authorization':HARDCOVER_BOOKS,
                'Content-Type':'application/json'
            },
            body:JSON.stringify({query:body})
        });
    } catch (e) {
        console.log('Hardcover API error', e);
        return [];
    }

    let result = await req.json();
    // If the API returns an error payload instead of data, don't break the build.
    if(!result.data || !result.data.user_books) {
        console.log('Hardcover API returned no data', result);
        return [];
    }

    // Flatten each entry down to just the book object...
    let data = result.data.user_books.map(ob => ob.book);
    /* ...and normalize authors into a simple array of names */
    data = data.map(b => {
        b.authors = b.contributions.reduce((list,c) => {
            if(c.author) list.push(c.author.name);
            return list;
        },[]);
        return b;
    });

    return data;
};

Most of the code is me just calling their API and passing the GraphQL query, nothing special. However, I did want to shape the data a bit before returning it, so I flatten the result to an array of books and then reduce the nested author data to a simple array of name strings. Here's an example of how this looks (reduced to two books for length):

[
  {
    title: 'Frankenstein',
    image: {
      url: 'https://assets.hardcover.app/external_data/46789420/6823e1155b2785ae31ac59ccb752c4f33b599b35.jpeg'
    },
    contributions: [
      { author: { name: 'Mary Shelley' } },
      { author: { name: 'Paul Cantor' } }
    ],
    authors: [ 'Mary Shelley', 'Paul Cantor' ]
  },
  {
    title: 'The Business Value of Developer Relations',
    image: {
      url: 'https://assets.hardcover.app/edition/30438817/content.jpeg'
    },
    contributions: [ { author: { name: 'Mary Thengvall' } } ],
    authors: [ 'Mary Thengvall' ]
  },
]

The last bit was adding it to my Now page. I used a simple grid of cover images + titles:


<div class="films">
{% for book in hardcover_books  %}
  <div class="film">
  {% if book.image != null %}
  <img src="https://res.cloudinary.com/raymondcamden/image/fetch/c_fit,w_216/{{book.image.url}}" alt="Cover of {{ book.title }}">
  {% else  %}
  <img src="https://res.cloudinary.com/raymondcamden/image/fetch/c_fit,w_216/https://static.raymondcamden.com/images/no_cover_available.jpg" alt="No Cover Available">
  {% endif %}
  "{{ book.title  }}" by {{ book.authors | join: ', ' }}
  </div>
{% endfor %}
</div>

Pardon the class names there - as I already had CSS for my films, I just re-used them as I was being lazy. Also note that sometimes a book will not have a cover image. On the website, they use a few different images to handle this, but the API doesn't return that, so I generated my own and put it up in my S3 bucket. If you don't feel like clicking over to my Now page, here's how it looks:

screenshot from my list of books

If you would like to see this code in context with the rest of the site, you can find my blog's repo here: https://github.com/cfjedimaster/raymondcamden2023. Let me know if you end up using their API!

Model Context Protocol Nov 2025 Specification Update: CIMD, XAA, and Security

The November 2025 Model Context Protocol (MCP) update introduces Client ID Metadata Documents (CIMD) and Cross App Access (XAA). Learn how these changes improve AI agent security.

Skylight debuts Calendar 2 to keep your family organized

Skylight, known for its digital picture frame, has a new digital product that puts software and AI at the center.

CES 2026 tech you can already buy

Belkin’s Charging Case Pro makes some thoughtful tweaks to its previous battery-equipped model, and it’s launching in mid-January. | Image: Belkin

News coming out of CES 2026 might be slowing down, but as of Wednesday we still have writers on the ground, zipping around from hotel suites to the Las Vegas Convention Center to try everything that matters. We've published well over 100 articles, and there's plenty more content to come, including reviews of stuff we got to see at the show.

As expected, most of the product announcements we've covered don't launch for at least a few months, but some of the products are already available, or will be soon. So, in case you want to get your hands on the freshest tech money can buy, we've compiled where you can buy products that we've written about …

Read the full story at The Verge.

Dell admits consumers don’t care about AI PCs

Dell has revealed that consumers aren't buying PCs for AI features right now. In an interview with PC Gamer ahead of CES, Dell made it clear its 2026 products won't be AI-first, and that it's moving beyond being "all about" AI PCs.

"We're very focused on delivering upon the AI capabilities of a device-in fact everything that we're announcing has an NPU in it - but what we've learned over the course of this year, especially from a consumer perspective, is they're not buying based on AI," admits Kevin Terwilliger, Dell's head of product, in the PC Gamer interview. "In fact I think AI probably confuses them more than it helps them …

Read the full story at The Verge.
