By Oleksii Tkachuk, Kartik Sathyanarayanan, Rajiv Shringi
Netflix has a diverse range of graph use cases, each serving specific business needs with unique functionality and performance requirements. These use cases fall into two broad categories:
Netflix’s Graph Abstraction was designed specifically for this second category of use cases. As of this writing, the abstraction is handling close to 10 million operations per second across 650 TB of graph datasets with low latency and cost efficiency.
This post is the first in a multi-part series that explores the Graph Abstraction architecture in depth. We’ll cover how the abstraction indexes data for real-time and historical views, manages strongly typed graphs, performs efficient traversals, and integrates with the Netflix Big Data ecosystem.
From a business standpoint, the primary driver for developing the Graph Abstraction was internal demand for supporting several key use cases:
Let’s examine the overall architecture of the Graph Abstraction and how it integrates with the Netflix Online Datastore ecosystem.
Instead of building the persistence and caching layers from scratch, we chose to build on top of existing Netflix data abstractions.

The Key-Value (KV) Abstraction stores the latest view of nodes and edges, serving as the real-time index for all queries. Optionally, users can plug in the TimeSeries (TS) Abstraction if they are interested in a historical view of how the graph evolves over time. Additionally, we use EVCache to achieve low-millisecond latencies and are actively experimenting with more specialized caching layers to further improve performance. Finally, the Graph Abstraction integrates with the Data Gateway Control Plane to manage graph schemas and automate the provisioning, deletion, and configuration of datasets in both KV and TS.
The Abstraction uses the Property Graph model to store its data. The graph consists of nodes and edges of various types, each with associated properties. These properties are strongly typed to enable efficient filtering and ensure consistent data exports. For semantic reasons, edges can be either unidirectional or bidirectional.

The Abstraction separates data into isolated units called “namespaces.” Each namespace is associated with a physical storage layer, as configured in the Data Gateway Control Plane, and can be deployed on either dedicated or shared hardware. The optimal, most cost-effective hardware configuration is determined by our provisioning automation, based on user-provided requirements such as throughput, latency, dataset size, and workload criticality. For more details on this topic, see this talk given by our stunning colleague Joey Lynch at AWS re:Invent.
Each namespace is further associated with an explicit graph schema configured in the Control Plane. The graph schema defines node and edge types, allowed properties, permitted relationships, and directions.

The Graph schema is implemented as a collection of edge mappings that describe the nature of the relationship between given node types.
{
  "edgeConfig": {
    "edgeMappings": [
      {
        "edgeMappingKey": {
          "fromNodeType": "account",
          "edgeType": "owns",
          "toNodeType": "profile"
        },
        "directionType": "UNIDIRECTIONAL"
      },
      {
        "edgeMappingKey": {
          "fromNodeType": "profile",
          "edgeType": "linked_to",
          "toNodeType": "device"
        },
        "directionType": "BIDIRECTIONAL"
      }
    ]
  }
}

Edge mappings are further extended with a property schema that specifies the allowed property names and their types:
{
  "edgeMappingKey": {
    "fromNodeType": "profile",
    "edgeType": "linked_to",
    "toNodeType": "device"
  },
  "propertySchema": {
    "propertyMappings": [
      { "propertyKey": "registration_time", "propertyValueType": "TIMESTAMP" },
      { "propertyKey": "status", "propertyValueType": "STRING" }
    ]
  }
}

The Abstraction servers load this schema on startup and build an in-memory metadata graph of possible relationships, enabling several key optimizations:
Further, the Abstraction servers periodically poll the schema from the Data Gateway Control Plane in order to keep it updated with user changes. Looking ahead, we plan to leverage the graph schema for additional improvements, such as:
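To make the in-memory metadata graph described above more concrete, here is a minimal Python sketch that loads the example edge mappings and answers "is this relationship permitted?" without touching storage. It is an illustration only, not the Abstraction's actual implementation: the function names `build_metadata_graph` and `is_edge_allowed` are hypothetical, and treating a BIDIRECTIONAL mapping as traversable from both node types is an assumption.

```python
import json
from collections import defaultdict

# The edge mappings from the schema example above.
SCHEMA_JSON = """
{
  "edgeConfig": {
    "edgeMappings": [
      {"edgeMappingKey": {"fromNodeType": "account", "edgeType": "owns",
                          "toNodeType": "profile"},
       "directionType": "UNIDIRECTIONAL"},
      {"edgeMappingKey": {"fromNodeType": "profile", "edgeType": "linked_to",
                          "toNodeType": "device"},
       "directionType": "BIDIRECTIONAL"}
    ]
  }
}
"""


def build_metadata_graph(schema: dict) -> dict:
    """Map each node type to the (edge_type, neighbor_type) pairs it may connect to."""
    allowed = defaultdict(set)
    for mapping in schema["edgeConfig"]["edgeMappings"]:
        key = mapping["edgeMappingKey"]
        src, edge, dst = key["fromNodeType"], key["edgeType"], key["toNodeType"]
        allowed[src].add((edge, dst))
        # Assumption: a BIDIRECTIONAL mapping is also valid from the 'to' side.
        if mapping["directionType"] == "BIDIRECTIONAL":
            allowed[dst].add((edge, src))
    return dict(allowed)


def is_edge_allowed(metadata: dict, src_type: str, edge_type: str, dst_type: str) -> bool:
    """Check a candidate edge against the schema before any storage call is made."""
    return (edge_type, dst_type) in metadata.get(src_type, set())


metadata = build_metadata_graph(json.loads(SCHEMA_JSON))
```

With a structure like this, a request for an edge the schema does not define (for example, `account` linked_to `device`) can be rejected up front.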
Next, let’s look at how this data is organized in a real-time index within the KV Abstraction.
Before we discuss how the data is organized into graph indexes, let’s discuss how KV organizes data within namespaces and provides idempotency guarantees:

We use KV as the underlying storage for all real-time graph indexes on nodes and edges. For more on Netflix’s Key-Value Abstraction, see this excellent post published by our KeyValue team.
The two-tiered partitioning strategy works well for node storage. Each node type is isolated within its own KV namespace, which stores all the properties for nodes of that type.

This storage format enables several efficient access patterns for nodes:
Edges utilize two distinct types of indexes: one exclusively for the edge connections (links), and one for edge properties.
The Edge links are arranged as an adjacency list mapping source nodes to their connected neighbors.

The Edge Property index stores information about properties of every edge.

Separating edge links from their properties brings several benefits, but also introduces a key trade-off:
Benefits:
Trade-off:
Additionally, edge indexes are separated into forward and reverse indexes to support traversals in either direction. The illustration below shows an example of the reverse index counterpart for the links namespace shown above.

To ensure consistent record identifiers when updating edge properties in either direction, the Abstraction lexicographically sorts and concatenates the source and destination node IDs to create a direction-agnostic identifier for property storage. This ensures that properties can be accessed or mutated in a single database call regardless of the direction specified in the request.

This storage format enables several efficient access patterns:
Next, let’s explore the caching strategies used by the Abstraction.
Although the Graph Abstraction already provides efficient reads and writes to durable storage, caching remains critical for the stability and performance of any graph datastore for two key reasons:
To address these challenges, the Graph Abstraction employs two distinct caching strategies.
An edge link contains no additional information beyond the link itself and its last-write timestamp. To reduce write amplification on durable storage, we cache edge links for short durations, helping to avoid writing a link that already exists. This mechanism is balanced with configurable TTL windows, cache invalidation on deletes, and lease acquisitions with exponential backoff. These strategies provide the necessary consistency guarantees while still allowing the last-write timestamp to be refreshed according to the predefined staleness.
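As a rough illustration of the write-side idea, the Python sketch below skips a durable write when the same link was written within a TTL window and forgets the link on delete. It is deliberately simplified: an in-process dictionary stands in for the distributed cache, the class and method names are hypothetical, and lease acquisition with exponential backoff is omitted entirely.

```python
import time


class EdgeLinkWriteCache:
    """Remembers recently written links so repeat writes can be skipped."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the TTL can be tested
        self._expires_at = {}        # (src, dst) -> expiry time

    def should_write(self, src: str, dst: str) -> bool:
        """Return True if the link must be written to durable storage now."""
        now = self._clock()
        key = (src, dst)
        expiry = self._expires_at.get(key)
        if expiry is not None and now < expiry:
            # Written recently: skip the write; the stored last-write
            # timestamp is allowed to be stale by up to the TTL.
            return False
        self._expires_at[key] = now + self._ttl
        return True

    def invalidate(self, src: str, dst: str) -> None:
        """Called on delete so a later re-create is not suppressed."""
        self._expires_at.pop((src, dst), None)
```

The TTL is the knob that trades write amplification against how stale the last-write timestamp is allowed to become.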

To reduce read amplification on the durable store, the Graph Abstraction leverages KV’s integration with EVCache. Multiple KV namespaces can share the same caching clusters for cost efficiency. The Abstraction first fetches data from durable storage, while subsequent reads are served from the cache. Caching is applied at both the record and item levels, benefiting all graph objects.
Graph Abstraction employs two invalidation strategies, selected based on write throughput and consistency requirements:
We are also developing a write-through caching strategy designed to store most of the data required by the Abstraction during traversals. This caching mechanism can organize indexes by different sort orders (e.g., sorting data by last-write timestamp), at the cost of increased memory consumption. Stay tuned for more details on this approach.
Next, let’s examine the consistency guarantees in Graph Abstraction and how they are enforced for both reads and writes.
Enforcing data consistency in Graph Abstraction poses several challenges. The connected nature of the data, low-latency API requirements, and the need to handle intermittent failures have led to design choices that enforce strict eventual consistency across multiple regions.
Each write in the Abstraction persists data for both inward and outward indices in parallel to support high throughput. Further, each write happens on multiple KV namespaces. To prevent inconsistencies or lasting entropy from failures in any operation, the Abstraction uses a robust retry mechanism using Kafka:

Deleting nodes in a highly connected graph is more complex than simply removing a KV record, as each node may have thousands of connected edges that must be handled to maintain graph integrity. Further, synchronously deleting all such connections would introduce unacceptable latency for the Abstraction callers.
The Abstraction employs an asynchronous deletion strategy to manage this issue. The consequence of this approach, however, is that the observed mutated state is only eventually consistent. Further, to ensure correctness of asynchronous deletes during concurrent updates, the Last-Write-Wins (LWW) conflict resolution mechanism is essential.
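To show why Last-Write-Wins matters for asynchronous deletes, here is a small Python sketch of an LWW merge in which a delete is modeled as a timestamped tombstone. The `Versioned` type and `lww_merge` function are hypothetical, and letting the incoming write win a timestamp tie is an assumption rather than a documented behavior of the Abstraction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Versioned:
    timestamp: int           # write timestamp, e.g. epoch microseconds
    value: Optional[dict]    # None marks a delete (tombstone)


def lww_merge(current: Optional[Versioned], incoming: Versioned) -> Versioned:
    """Keep whichever write carries the later timestamp.

    Assumption: on a tie, the incoming write wins.
    """
    if current is None or incoming.timestamp >= current.timestamp:
        return incoming
    return current


# An update at t=10 lands first; an async delete issued earlier (t=5) arrives late.
state = lww_merge(None, Versioned(10, {"status": "active"}))
state_after_stale_delete = lww_merge(state, Versioned(5, None))

# A delete issued after the update (t=20) does take effect.
state_after_fresh_delete = lww_merge(state, Versioned(20, None))
```

The stale delete is ignored because its timestamp is older than the update it would otherwise clobber, which is the property an asynchronous delete pipeline needs when writes race with it.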

The consistency guarantees of Graph Abstraction are shaped by its multi-region availability. As illustrated in the diagram below, both the caching layer and durable storage replicate data asynchronously across regions, resulting in an eventually consistent system.

Now that we’ve covered storing the real-time graph index, let’s see how it enables graph traversals.
The Abstraction provides a custom gRPC traversal API, inspired by Gremlin, which enables exploration of the distributed graph by letting users chain traversals, apply filter criteria, sort results, limit results, and more.
Let’s explore a hypothetical scenario where the Abstraction is used to recommend shows to users on a shared device, by considering the duration of the most recent viewing session for each show across all profiles and accounts associated with that device:
TraversalRequest.newBuilder()
    .setNamespace("<graph-namespace>")
    .setTraversalQuery(
        TraversalQuery.newBuilder()
            // Given id of the 'device' node type.
            .setStartNode(node("device", "my-device-id"))
            .setTraversal(
                Traversal.newBuilder()
                    // fetch the first 5 connections
                    .setEdgeLimit(5)
                    .setDirectionTraversal(
                        DirectionTraversal.newBuilder()
                            // traverse in the IN direction
                            .setDirection(IN)
                            // minimize data exchange: only interested in certain properties
                            .addNodePropertiesSelections(propSelection("account", "created_at"))
                            .addNodePropertiesSelections(propSelection("profile", "last_active"))
                            .setDirectionFilter(
                                DirectionFilter.newBuilder()
                                    // only interested in certain connected types
                                    .setTypeMatchingStrategy(EXCLUDE_NON_TARGETED)
                                    .addAllNodeFilters(typeFilters("account", "profile"))))
                    // chain traversals to the intermediate result
                    .addNextTraversals(
                        Traversal.newBuilder()
                            .setOrder(LATEST)
                            // limit to 200 connections for the 2nd hop
                            .setEdgeLimit(200)
                            .setDirectionTraversal(
                                DirectionTraversal.newBuilder()
                                    // now traverse in the OUT direction
                                    .setDirection(OUT)
                                    .addEdgePropertiesSelections(propSelection("watched", "view_time"))
                                    .addEdgePropertiesSelections(propSelection("has_plan", "active"))
                                    .setDirectionFilter(
                                        DirectionFilter.newBuilder()
                                            .setTypeMatchingStrategy(EXCLUDE_NON_TARGETED)
                                            .addAllNodeFilters(typeFilters("title", "plan")))))))
    .build();
And let’s visualize the intended results set produced by the request above:

We’ll explore the design and implementation of traversal planning and execution, along with different traversal types, in Part II of this blog series.
Now let’s look at the performance metrics of Graph Abstraction based on current production use cases.
Across all applications at Netflix, Graph Abstraction ensures high availability while processing up to 10 million operations per second across all writes, individual edge / node reads and traversals at peak hours:

Edge and node persistence achieve single-digit millisecond latencies (p99 shown in red, p90 shown in orange, and p50 shown in green):

Traversal performance depends on the number of hops, the edge fanout at each stage, and associated filters and sort orders. We parallelize work as much as possible to reduce latencies. Typically 1-hop traversals are executed with single-digit millisecond latency:

We also support a Count API that performs counting traversals at a very high rate with similar latencies, which we will cover in Part II of this series:

Currently, the Real-Time Distributed Graph (RDG) is powered by 2-hop traversals with a higher degree of fan-out. While these operations can reach upwards of 100 ms in latency, the 90th percentile (p90) latency remains under 50 ms.

We track the average and max edge fanout at different depths to give us insights into the traversal performance for different graph datasets.
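For readers who want to see what "average and max edge fanout at different depths" means operationally, the following Python sketch computes both numbers per hop over a toy adjacency list. It only illustrates the metric and is not how Netflix collects it; the function name and the sample graph are hypothetical.

```python
def fanout_by_depth(adjacency: dict, start: str, max_depth: int) -> list:
    """Return [{'avg': ..., 'max': ...}] with one entry per hop from `start`."""
    stats = []
    frontier = [start]
    visited = {start}
    for _ in range(max_depth):
        if not frontier:
            break
        # Fanout of a node = number of outgoing edges it has at this hop.
        fanouts = [len(adjacency.get(node, ())) for node in frontier]
        stats.append({"avg": sum(fanouts) / len(fanouts), "max": max(fanouts)})
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return stats


# Hypothetical graph: one device, two profiles, three titles.
sample_graph = {
    "device": ["profile-1", "profile-2"],
    "profile-1": ["title-a", "title-b", "title-c"],
    "profile-2": ["title-a"],
}
depth_stats = fanout_by_depth(sample_graph, "device", max_depth=2)
```

A large gap between the average and the max at a given depth points to a few very highly connected nodes, which is usually what drives tail latency in a traversal.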


Asynchronous operations such as node deletions can be slightly latent, but typically perform with sub-second latency:

At the moment, we are storing close to 650 TB of data globally across all our graph datasets.

As Netflix scales further into new verticals such as live content, games, and ads, Graph Abstraction will remain crucial for uncovering and leveraging rich connections, while continuing to support high throughput and availability at low latencies.
Stay tuned for Part II of this blog series, where we’ll explore the implementation of graph traversals, counting and constraint mechanisms.
In Part III, we’ll take a closer look at the temporal index implementation and its integration with the Time Series Abstraction.
Special thanks to our stunning colleagues who contributed to Graph Abstraction’s success: Kaidan Fullerton, Joey Lynch, Sudhesh Suresh, Vinay Chella, Sumanth Pasupuleti, Vidhya Arvind, Raj Ummadisetty, Jordan West, Chris Lohfink, Joe Lee, Jingxi Huang, Jessica Walton, Prudhviraj Karumanchi, Akashdeep Goel, Sriram Rangarajan, Chris Van Vlack, Christopher Gray, Luis Medina, Ajit Koti, Mohidul Abedin.
High-Throughput Graph Abstraction at Netflix: Part I was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.
In this blog post I will describe a new feature in SQL Project Power Tools: Publish programmability objects on save. This feature lets you instantly deploy stored procedures, views, functions, and triggers to your development database the moment you save the corresponding .sql file in Visual Studio — no full dacpac deployment required.
SQL Database Projects are a great way to manage your database schema in source control. The normal development workflow is:
1. Edit .sql files in Visual Studio
2. Build the project to produce a .dacpac file
3. Deploy the .dacpac to your development database (via dacpac publish or schema compare)

This workflow works well for schema objects like tables and indexes, where dacpac deployment is essential: it handles complex operations like renaming columns, adding constraints, and managing data migrations safely. You really do need the full deployment pipeline for those.
However, for programmability objects (stored procedures, views, functions, and triggers) the situation is different. These objects can be replaced atomically using SQL Server's CREATE OR ALTER statement, which has been supported since SQL Server 2016 SP1. There is no need to build and deploy a full .dacpac just to update a stored procedure body.
Unfortunately, SQL Database Projects (and the underlying DacFx tooling) do not allow you to selectively deploy a single file. Every change still requires building the full .dacpac and running a deployment, even if you only changed a single stored procedure. This has been raised as a feature request with the DacFx team, but in the meantime SQL Project Power Tools provides a practical workaround.
The new Publish programmability objects on save feature detects when you save a .sql file that contains a supported programmability object and immediately executes a CREATE OR ALTER version of that script against your development database. The status bar shows a confirmation when the publish completes, or an error message if something goes wrong.
Important distinction: This feature only works for programmability objects (stored procedures, views, functions, and triggers). Schema objects such as tables, indexes, and constraints are deliberately excluded. Changing a table definition has data implications that require the full dacpac deployment pipeline to handle safely.
When a .sql file is saved in Visual Studio:
1. The extension looks for a .env file in the project directory containing an AutoPublish connection string.
2. It checks that the file contains only CREATE PROCEDURE, CREATE VIEW, CREATE FUNCTION, or CREATE TRIGGER (and SET options).
3. Any CREATE statement that is not already CREATE OR ALTER is automatically rewritten to use CREATE OR ALTER.
4. The script is executed, and the status bar shows Publish completed: <filename> on success, or Publish failed: <filename> on failure.

Files that contain table definitions, ALTER TABLE, INSERT, DROP, or any other statements that are not programmability object definitions are silently skipped; the feature does nothing for those files.
The feature is off by default. To turn it on, open Tools > Options > SQL Server Tools > SQL Project Power Tools in Visual Studio and check the Enable auto publish on save option.
Create a file named .env in the root of your SQL Database Project directory (the same folder as your .sqlproj or .csproj file). Add a line in the following format:
AutoPublish=Server=localhost;Database=MyDevDatabase;Integrated Security=true;TrustServerCertificate=true
You can use any valid SQL Server connection string. The key must be AutoPublish (case-insensitive).
Security tip: The .env file contains a connection string, so make sure to add it to .gitignore to avoid committing credentials to source control.
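To illustrate the .env format described above, here is a short Python sketch that pulls the connection string out of such a file with a case-insensitive key match. It is not the extension's code: the function name is hypothetical, and skipping blank lines and `#` comments is an assumption based on common .env conventions.

```python
from typing import Optional


def read_auto_publish_connection_string(env_text: str) -> Optional[str]:
    """Return the value of the AutoPublish key, or None if it is absent."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        # Split on the FIRST '=' only; the connection string itself contains '='.
        key, separator, value = line.partition("=")
        if separator and key.strip().lower() == "autopublish":
            return value.strip()
    return None


sample_env = (
    "# development database\n"
    "autopublish=Server=localhost;Database=MyDevDatabase;"
    "Integrated Security=true;TrustServerCertificate=true\n"
)
connection_string = read_auto_publish_connection_string(sample_env)
```

Splitting on the first `=` only is the important detail, since everything after it belongs to the connection string.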
Open a .sql file in your project that contains a CREATE PROCEDURE, CREATE VIEW, CREATE FUNCTION, or CREATE TRIGGER statement, make a change, and save. The extension will automatically rewrite the statement to CREATE OR ALTER and execute it against the development database. Watch the status bar at the bottom of Visual Studio for the completion message.
Suppose you have a stored procedure in your project at dbo/Stored Procedures/GetCustomerById.sql:
CREATE PROCEDURE [dbo].[GetCustomerById]
    @Id INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT Id, Name, Email
    FROM dbo.Customers
    WHERE Id = @Id;
END
When you save this file with the feature enabled and a valid .env file present, the extension will execute the following against your development database:
CREATE OR ALTER PROCEDURE [dbo].[GetCustomerById]
    @Id INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT Id, Name, Email
    FROM dbo.Customers
    WHERE Id = @Id;
END
The stored procedure is updated immediately, without a full build and deploy cycle.
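The rewrite step in the example above can be approximated with a regular expression, shown here as a Python sketch. This is only an illustration of the idea and not how the extension is implemented: the function name is hypothetical, the pattern also accepts the T-SQL `PROC` abbreviation, and a plain regex would wrongly match text inside comments or string literals, which a real T-SQL parser would not.

```python
import re

# Matches CREATE followed directly by a programmability object keyword.
# "CREATE OR ALTER ..." does not match, because OR is not one of the keywords.
_CREATE_PATTERN = re.compile(
    r"\bCREATE\s+(PROCEDURE|PROC|VIEW|FUNCTION|TRIGGER)\b",
    re.IGNORECASE,
)


def to_create_or_alter(sql: str) -> str:
    """Rewrite CREATE <object> to CREATE OR ALTER <object>, leaving the rest intact."""
    return _CREATE_PATTERN.sub(lambda m: f"CREATE OR ALTER {m.group(1)}", sql)


original = "CREATE PROCEDURE [dbo].[GetCustomerById]\n    @Id INT\nAS\nBEGIN\n    SET NOCOUNT ON;\nEND"
rewritten = to_create_or_alter(original)
```

Scripts that already use CREATE OR ALTER, and scripts for other object types such as tables, pass through unchanged.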
SQL Project Power Tools includes many other features to improve the SQL Database Project developer experience:
.dacpac file directly from Solution Explorer.
INSERT statements for table data to use as seed data in post-deployment scripts, based on generate-sql-merge.

A getting started guide covers all of these features in more detail.
Install SQL Project Power Tools from the Visual Studio Marketplace, or search for it directly in Visual Studio via Extensions > Manage Extensions.
Install SQL Project Power Tools from the new SSMS Gallery.
The source code is open source and available on GitHub. If you find the extension useful, a ★★★★★ rating on the Marketplace is always appreciated. Bug reports and feature requests are welcome on the GitHub issue tracker.
You've probably seen this tweet by @trq212 floating around on Twitter about letting agents write HTML instead of markdown...
Listed below are some of the reasons mentioned in the article:
I don't disagree with Tariq, but rather than switch to HTML, I think the answer is to make markdown supported everywhere. We've been using it for years and it's powering much of the modern web. However, if we look at how software and platforms have evolved, markdown support is very dependent on the platform to render it.
Why does markdown work for humans and machines? It's pretty simple: humans write simple syntax that gets rendered into something rich, and unironically, that often happens by converting it to HTML and letting a browser engine render it. For machines, it's lightweight to parse and easy to generate token by token without the verbosity of HTML.
We write headers, code blocks, pull quotes, and bold text, and what typically happens is that something converts that markdown to HTML.
For example, I am literally typing this blog in markdown, and the only way I can share it to the masses is through a platform like dev.to that converts it to HTML and hosts it for me.
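To make that conversion step tangible, here is a deliberately tiny Python sketch that turns headers, bold text, and paragraphs into HTML. It handles only those three constructs and is nowhere near a full markdown implementation; the `render_markdown` name is hypothetical, and real platforms use complete parsers.

```python
import html
import re


def render_markdown(markdown_text: str) -> str:
    """Convert a very small markdown subset (headers, bold, paragraphs) to HTML."""
    rendered = []
    for line in markdown_text.splitlines():
        text = html.escape(line.strip())  # never pass raw '<' or '&' through
        if not text:
            continue
        text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
        heading = re.match(r"(#{1,6})\s+(.*)", text)
        if heading:
            level = len(heading.group(1))
            rendered.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            rendered.append(f"<p>{text}</p>")
    return "\n".join(rendered)


page = render_markdown("# Title\n\nSome **bold** text")
```

Even this toy version shows the asymmetry: two short lines of markdown become noticeably more verbose HTML.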
So if the feature is available in some places, why is it not everywhere? I believe that software vendors haven't prioritized adding markdown rendering support, and they should.
We should be able to send a standalone index.md file and view it in all web browsers, chat applications, and emails. Some apps, like Discord and Slack, already do this (though Slack's markdown support disappoints me). We can do this with HTML today: all modern browsers will render something nice. But if you load up markdown in your browser today, you will become sad.
We have to reach for things like Obsidian or Kiro to render the markdown, which I feel limits the portability of it all.
Curious what you think and where you see yourself heading in terms of AI agent output. Let me know in the comments if you're switching to HTML or sticking with markdown.
As always, happy coding 🫡!