Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.

High-Throughput Graph Abstraction at Netflix: Part I

1 Share

By Oleksii Tkachuk, Kartik Sathyanarayanan, Rajiv Shringi

Introduction

Netflix has a diverse range of graph use cases, each serving specific business needs with unique functionality and performance requirements. These use cases fall into two broad categories:

  1. OLAP: These use cases typically involve open-ended and algorithmic exploration of large graph datasets. They often utilize industry-standard models and languages such as RDF with SPARQL, Property Graphs with Gremlin or openCypher, and even SQL. The primary focus in these situations is in-depth analysis, rather than achieving high throughput and low latency.
  2. OLTP: These use cases require extremely high throughput — up to millions of operations per second — while delivering traversal results within milliseconds. Achieving such a level of performance often requires making trade-offs, which can include accepting eventual consistency or restricting query complexity. For example, the service can demand a specified starting point for traversals and enforce a maximum traversal depth. Such use cases are often directly tied to streaming or user experiences and demand high global availability.

Netflix’s Graph Abstraction was designed specifically for this second category of use cases. As of this writing, the abstraction is handling close to 10 million operations per second across 650 TB of graph datasets with low latency and cost efficiency.

This post is the first in a multi-part series that explores the Graph Abstraction architecture in depth. We’ll cover how the abstraction indexes data for real-time and historical views, manages strongly typed graphs, performs efficient traversals, and integrates with the Netflix Big Data ecosystem.

Usage at Netflix

From a business standpoint, the primary driver for developing the Graph Abstraction was internal demand for supporting several key use cases:

  • Real-Time Distributed Graph (RDG): A graph capturing dynamic relationships across entities and interactions throughout the Netflix ecosystem. You can learn more about the initial RDG implementation in this insightful blog post. This functionality has since been integrated into the Graph Abstraction.
  • Social Graph: A graph of social connections within Netflix Gaming, designed to boost user engagement.
  • Service Topology: A graph of all internal Netflix services, used for real-time and historical analysis to improve root cause analysis during incidents.

Let’s examine the overall architecture of the Graph Abstraction and how it integrates with the Netflix Online Datastore ecosystem.

Architecture

Instead of building the persistence and caching layers from scratch, we chose to build on top of existing Netflix data abstractions.

The Key-Value (KV) Abstraction stores the latest view of nodes and edges, serving as the real-time index for all queries. Optionally, users can plug in the TimeSeries (TS) Abstraction if they are interested in a historical view of how the graph evolves over time. Additionally, we use EVCache to achieve low-millisecond latencies and are actively experimenting with more specialized caching layers to further improve performance. Finally, the Graph Abstraction integrates with the Data Gateway Control Plane to manage graph schemas and automate the provisioning, deletion, and configuration of datasets in both KV and TS.

Property Graph Model

The Abstraction uses the Property Graph model to store its data. The graph consists of nodes and edges of various types, each with associated properties. These properties are strongly typed to enable efficient filtering and ensure consistent data exports. For semantic reasons, edges can be either unidirectional or bidirectional.

Namespaces

The Abstraction separates data into isolated units called “namespaces.” Each namespace is associated with a physical storage layer, as configured in the Data Gateway Control Plane, and can be deployed on either dedicated or shared hardware. The optimal, most cost-effective hardware configuration is determined by our provisioning automation, based on user-provided requirements such as throughput, latency, dataset size, and workload criticality. For more details on this topic, see this talk given by our stunning colleague Joey Lynch at AWS re:Invent.

Graph Schema

Each namespace is further associated with an explicit graph schema configured in the Control Plane. The graph schema defines node and edge types, allowed properties, permitted relationships, and directions.

The graph schema is implemented as a collection of edge mappings, each describing the relationship between a given pair of node types.

{
  "edgeConfig": {
    "edgeMappings": [
      {
        "edgeMappingKey": {
          "fromNodeType": "account",
          "edgeType": "owns",
          "toNodeType": "profile"
        },
        "directionType": "UNIDIRECTIONAL"
      },
      {
        "edgeMappingKey": {
          "fromNodeType": "profile",
          "edgeType": "linked_to",
          "toNodeType": "device"
        },
        "directionType": "BIDIRECTIONAL"
      }
    ]
  }
}

Each edge mapping can be further extended with a property schema that lists the allowed property names and their value types:

{
  "edgeMappingKey": {
    "fromNodeType": "profile",
    "edgeType": "linked_to",
    "toNodeType": "device"
  },
  "propertySchema": {
    "propertyMappings": [
      { "propertyKey": "registration_time", "propertyValueType": "TIMESTAMP" },
      { "propertyKey": "status", "propertyValueType": "STRING" }
    ]
  }
}

The Abstraction servers load this schema on startup and build an in-memory metadata graph of possible relationships, enabling several key optimizations:

  • Data Quality: The Abstraction rejects non-conforming nodes, edges, and properties during writes, ensuring high data quality and consistent exports.
  • Query Planning: The Abstraction uses the schema to quickly construct the possible traversal paths the service should take to answer a given user query.
  • Deduplication of Traversed Edges: For bidirectional traversals on edges between the same node type, the schema helps avoid redundant processing by deduplicating traversed paths.
  • Eliminating Traversal Paths: For a given user query, the Abstraction removes traversal paths associated with impossible relationships, as well as those where filters or property types are incompatible.
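To make the data-quality check concrete, here is a minimal Python sketch of loading edge mappings into an in-memory lookup and rejecting edges that the schema does not describe. The class and method names are hypothetical, and the rule that a BIDIRECTIONAL mapping also accepts the reversed node-type pair is an assumption made for illustration, not a documented behavior of the Abstraction.

```python
from dataclasses import dataclass

# Trimmed copy of the edge-mapping config shown earlier.
SCHEMA = {
    "edgeConfig": {
        "edgeMappings": [
            {"edgeMappingKey": {"fromNodeType": "account", "edgeType": "owns",
                                "toNodeType": "profile"},
             "directionType": "UNIDIRECTIONAL"},
            {"edgeMappingKey": {"fromNodeType": "profile", "edgeType": "linked_to",
                                "toNodeType": "device"},
             "directionType": "BIDIRECTIONAL"},
        ]
    }
}


@dataclass(frozen=True)
class EdgeMappingKey:
    from_node_type: str
    edge_type: str
    to_node_type: str


class GraphSchema:
    """Hypothetical in-memory view of the edge mappings (not Netflix's code)."""

    def __init__(self, config: dict):
        self.directions = {}
        for mapping in config["edgeConfig"]["edgeMappings"]:
            k = mapping["edgeMappingKey"]
            key = EdgeMappingKey(k["fromNodeType"], k["edgeType"], k["toNodeType"])
            self.directions[key] = mapping["directionType"]

    def allows(self, from_type: str, edge_type: str, to_type: str) -> bool:
        if EdgeMappingKey(from_type, edge_type, to_type) in self.directions:
            return True
        # Assumption: a BIDIRECTIONAL mapping also permits the reversed pair.
        reverse = EdgeMappingKey(to_type, edge_type, from_type)
        return self.directions.get(reverse) == "BIDIRECTIONAL"

    def validate_edge(self, from_type: str, edge_type: str, to_type: str) -> None:
        # Reject writes that do not conform to any edge mapping.
        if not self.allows(from_type, edge_type, to_type):
            raise ValueError(f"no mapping for {from_type} -[{edge_type}]-> {to_type}")


schema = GraphSchema(SCHEMA)
schema.validate_edge("account", "owns", "profile")  # accepted silently
```

The same lookup could also back query planning: a traversal step whose node and edge types have no mapping can be dropped before any storage call is made.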

Further, the Abstraction servers periodically poll the schema from the Data Gateway Control Plane in order to keep it updated with user changes. Looking ahead, we plan to leverage the graph schema for additional improvements, such as:

  • Minimizing Query Fanout: By using edge cardinality within edge mappings, we aim to select the most efficient traversal paths and minimize query fanout.
  • Improved Developer Experience: The schema will support generating a type-safe data access layer and enhance the Gremlin-like API with schema awareness.

Next, let’s look at how this data is organized in a real-time index within the KV Abstraction.

Real-Time Index: Key-Value Storage

Before we discuss how the data is organized into graph indexes, let’s discuss how KV organizes data within namespaces and provides idempotency guarantees:

  • Data partitioning: A namespace is associated with a table in the underlying storage layer. Within the table, data is partitioned into records by unique IDs, with each record holding multiple sorted items as key-value pairs. This structure effectively makes each namespace a map of sorted maps, providing flexibility for diverse access patterns.
  • Idempotency: Writes to a given ID and key are idempotent, enabling request hedging and safe retries. The idempotency token contains a timestamp, which KV uses to enforce Last-Write-Wins (LWW) semantics at the storage layer.
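The "map of sorted maps" shape and the Last-Write-Wins rule can be sketched in a few lines of Python. This is an illustrative in-memory model only: the class name, method names, and integer timestamps are assumptions, and tie-breaking between equal timestamps is arbitrary here.

```python
class KVNamespace:
    """Toy model of one namespace: record id -> sorted map of item key -> value."""

    def __init__(self):
        # record id -> {item key: (write timestamp, value)}
        self._records = {}

    def put(self, record_id, key, value, write_ts):
        items = self._records.setdefault(record_id, {})
        current = items.get(key)
        # Last-Write-Wins: an older timestamp never overwrites a newer one,
        # so retried or hedged requests are safe to replay.
        if current is None or write_ts >= current[0]:
            items[key] = (write_ts, value)

    def scan(self, record_id, start="", limit=None):
        # Items come back in key order, which is what makes range reads possible.
        items = self._records.get(record_id, {})
        keys = sorted(k for k in items if k >= start)
        if limit is not None:
            keys = keys[:limit]
        return [(k, items[k][1]) for k in keys]


kv = KVNamespace()
kv.put("node-1", "status", "active", write_ts=2)
kv.put("node-1", "status", "stale", write_ts=1)   # late-arriving older write, ignored
kv.put("node-1", "status", "active", write_ts=2)  # retry of the first write, no effect
kv.put("node-1", "created_at", "2024", write_ts=1)
```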

We use KV as the underlying storage for all real-time graph indexes on nodes and edges. For more on Netflix’s Key-Value Abstraction, see this excellent post published by our KeyValue team.

Node Storage

The two-tiered partitioning strategy works well for node storage. Each node type is isolated within its own KV namespace, which stores all the properties for nodes of that type.

This storage format enables several efficient access patterns for nodes:

  • Efficient reads: A given node and all its properties are fetched in a single partition lookup, achieving single-digit millisecond latency.
  • Property selection pushdown: Target property keys are pushed down to the KV layer, reducing the amount of data fetched and further decreasing latencies and network overhead.
  • Property filtering pushdown: Property keys and values can be efficiently filtered at the KV layer.
  • Efficient exports: This model supports highly parallelized node exports by node type.

Edge Storage

Links and Property Index

Edges utilize two distinct types of indexes: one exclusively for the edge connections (links), and one for edge properties.

The Edge links are arranged as an adjacency list mapping source nodes to their connected neighbors.

The Edge Property index stores information about properties of every edge.

Separating edge links from their properties brings several benefits, but also introduces a key trade-off:

Benefits:

  • Efficient property upserts: Allows individual properties to be upserted over time without needing to read the entire property set for an edge.
  • Wide row prevention: Decoupling edge links from their properties prevents large partitions in databases like Cassandra, enabling efficient storage and low-latency reads — even for edges with millions of connections.

Trade-off:

  • Non-atomic writes: Storing edges across multiple namespaces means that writes across these namespaces are not atomic. We’ll discuss how this is addressed in the Consistency Enforcement section.

Forward and Reverse Indexes

Additionally, edge indexes are separated into forward and reverse indexes to support traversals in either direction. The illustration below shows an example of the reverse index counterpart for the links namespace shown above.

To ensure consistent record identifiers when updating edge properties in either direction, the Abstraction lexicographically sorts and concatenates the source and destination node IDs to create a direction-agnostic identifier for property storage. This ensures that properties can be accessed or mutated in a single database call regardless of the direction specified in the request.
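A minimal sketch of such a direction-agnostic identifier follows. The function name, the separator character, and the `type:id` node ID format are assumptions for illustration; the source only states that the two node IDs are sorted lexicographically and concatenated.

```python
def edge_property_id(node_a: str, node_b: str, sep: str = "|") -> str:
    """Return the same identifier regardless of which endpoint is given first."""
    # Sorting the two IDs removes any dependence on the request's direction.
    first, second = sorted((node_a, node_b))
    return f"{first}{sep}{second}"


forward_id = edge_property_id("profile:42", "device:7")
reverse_id = edge_property_id("device:7", "profile:42")
```

Both calls resolve to one record, so a property update issued from either side of the edge lands on the same key.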

This storage format enables several efficient access patterns:

  • Point Reads: Given an edge id, all properties can be fetched in a single partition lookup on the properties index.
  • Range Reads: Given a source node, a range read on a partition in the links index can efficiently return all edges. Depending on the desired direction, the Abstraction can target the forward or reverse index.
  • Property Filtering: Properties are fetched only for the links that match the record or page limit criteria, minimizing the data exchanged over the network.
  • Sort Orders: By default, edge links are sorted lexicographically by their target node. To support fetching the latest connections, the Abstraction retrieves target edge links in memory, sorts them by their last-write time, and returns the results. In order to ensure optimal performance without exerting too much memory pressure, we aim to limit the number of edges per source node within the system.
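The access patterns above can be illustrated with a small in-memory model of the links index. The names below are hypothetical, and real storage would serve the default order directly from sorted keys rather than sorting on every read; only the LATEST order is described in the source as an in-memory sort.

```python
from collections import defaultdict


class LinksIndex:
    """Toy forward and reverse adjacency lists: node -> {neighbor: last-write ts}."""

    def __init__(self):
        self.forward = defaultdict(dict)
        self.reverse = defaultdict(dict)

    def add_link(self, source, target, write_ts):
        # Every link is written to both indexes so either direction can be read.
        self.forward[source][target] = write_ts
        self.reverse[target][source] = write_ts

    def neighbors(self, node, direction="OUT", order="DEFAULT", limit=None):
        index = self.forward if direction == "OUT" else self.reverse
        links = index.get(node, {})
        if order == "LATEST":
            # Sort in memory by last-write time, newest first.
            ordered = sorted(links, key=lambda n: links[n], reverse=True)
        else:
            # Default order: lexicographic by neighbor ID.
            ordered = sorted(links)
        return ordered[:limit]


idx = LinksIndex()
idx.add_link("profile:1", "title:b", write_ts=10)
idx.add_link("profile:1", "title:a", write_ts=30)
idx.add_link("profile:1", "title:c", write_ts=20)
```

Because the LATEST order needs every link for the source node in memory before it can sort, capping the number of edges per source node keeps that cost bounded.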

Next, let’s explore the caching strategies used by the Abstraction.

Caching Strategies in Graph Abstraction

Although the Graph Abstraction already provides efficient reads and writes to durable storage, caching remains critical for the stability and performance of any graph datastore for two key reasons:

  • Write amplification: A single write on the fronting service can result in multiple writes to the backing durable storage due to the use of multiple indexes. Whenever possible, it’s best to avoid unnecessary writes — for example, by not writing an edge link that already exists.
  • Read amplification: A single traversal request on the fronting service may translate into thousands of fetch operations on the backend, especially for highly interconnected graphs.

To address these challenges, the Graph Abstraction employs two distinct caching strategies.

Write-aside Caching of Edge Links

An edge link contains no additional information beyond the link itself and its last-write timestamp. To reduce write amplification on durable storage, we cache edge links for short durations, helping to avoid writing a link that already exists. This mechanism is balanced with configurable TTL windows, cache invalidation on deletes, and lease acquisitions with exponential backoff. These strategies provide the necessary consistency guarantees while still allowing the last-write timestamp to be refreshed according to the predefined staleness.
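The core of this idea, skipping a durable write when the same link was written recently, can be sketched as follows. This is a simplified single-process model with hypothetical names; it leaves out lease acquisition and backoff, and the injectable clock exists only to make the behavior easy to demonstrate.

```python
import time


class EdgeLinkWriteCache:
    """Remembers recently written links so duplicate writes can be skipped."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expiry = {}  # (source, target) -> time at which the entry expires

    def should_write(self, source, target):
        now = self.clock()
        key = (source, target)
        expiry = self._expiry.get(key)
        if expiry is not None and now < expiry:
            return False  # link was written recently, skip the durable write
        # Entry is missing or expired: write through and refresh the TTL, which
        # also lets the link's last-write timestamp be refreshed periodically.
        self._expiry[key] = now + self.ttl
        return True

    def invalidate(self, source, target):
        # Called on delete so a later re-creation is not suppressed.
        self._expiry.pop((source, target), None)


fake_now = [0.0]
cache = EdgeLinkWriteCache(ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.should_write("profile:1", "title:a")   # True: nothing cached yet
second = cache.should_write("profile:1", "title:a")  # False: still inside the TTL
```

The TTL doubles as the staleness bound on the last-write timestamp: a link that keeps being rewritten has its timestamp refreshed at most once per window.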

Read-aside Caching of Properties

To reduce read amplification on the durable store, the Graph Abstraction leverages KV’s integration with EVCache. Multiple KV namespaces can share the same caching clusters for cost efficiency. The Abstraction first fetches data from durable storage, while subsequent reads are served from the cache. Caching is applied at both the record and item levels, benefiting all graph objects.

Graph Abstraction employs two invalidation strategies, selected based on write throughput and consistency requirements:

  • Invalidation on write: Both record and item caches are invalidated with every write, ensuring consistency across regions. This strategy is ideal for graphs that change infrequently and cannot tolerate data staleness, but comes with the trade-off of placing a higher load on the cache.
  • TTL-driven invalidation: Cache entries are invalidated only when their TTL expires. This approach works best for frequently modified objects that can tolerate some staleness.
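The difference between the two strategies shows up clearly in a small read-aside cache model. The sketch below uses a plain dictionary as a stand-in for durable storage and hypothetical names throughout; it is not how KV or EVCache are implemented.

```python
import time


class ReadAsideCache:
    """Read-aside cache with either invalidate-on-write or TTL-only expiry."""

    def __init__(self, store, ttl_seconds, invalidate_on_write, clock=time.monotonic):
        self.store = store  # stand-in for durable storage (a plain dict here)
        self.ttl = ttl_seconds
        self.invalidate_on_write = invalidate_on_write
        self.clock = clock
        self._cache = {}  # key -> (expiry time, value)

    def get(self, key):
        now = self.clock()
        entry = self._cache.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # cache hit
        value = self.store.get(key)  # miss: read durable storage, then populate
        self._cache[key] = (now + self.ttl, value)
        return value

    def put(self, key, value):
        self.store[key] = value
        if self.invalidate_on_write:
            self._cache.pop(key, None)  # next read goes back to storage


fake_now = [0.0]
strict = ReadAsideCache({"k": "v1"}, 30, invalidate_on_write=True, clock=lambda: fake_now[0])
lazy = ReadAsideCache({"k": "v1"}, 30, invalidate_on_write=False, clock=lambda: fake_now[0])
```

With `strict`, a read after a write always sees the new value at the cost of an extra cache operation per write. With `lazy`, reads may return the old value until the TTL elapses.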

Work In Progress: Write-Through Caching

We are also developing a write-through caching strategy designed to store most of the data required by the Abstraction during traversals. This caching mechanism can organize indexes by different sort orders (e.g., sorting data by last-write timestamp), at the cost of increased memory consumption. Stay tuned for more details on this approach.

Next, let’s examine the consistency guarantees in Graph Abstraction and how they are enforced for both reads and writes.

Consistency Enforcement

Enforcing data consistency in Graph Abstraction poses several challenges. The connected nature of the data, low-latency API requirements, and the need to handle intermittent failures have led to design choices that enforce strict eventual consistency across multiple regions.

Entropy Repair

Each write in the Abstraction persists data for both inward and outward indexes in parallel to support high throughput. Further, each write touches multiple KV namespaces. To prevent inconsistencies or lasting entropy when any of these operations fails, the Abstraction uses a robust Kafka-based retry mechanism:

Node Deletions

Deleting nodes in a highly connected graph is more complex than simply removing a KV record, because each node may have thousands of connected edges that must be handled to maintain graph integrity. Further, synchronously deleting all such connections would introduce unacceptable latency for the Abstraction callers.

The Abstraction employs an asynchronous deletion strategy to manage this issue. The consequence of this approach, however, is that the observed mutated state is only eventually consistent. Further, to ensure correctness of asynchronous deletes during concurrent updates, the Last-Write-Wins (LWW) conflict resolution mechanism is essential.

Global Replication

The consistency guarantees of Graph Abstraction are shaped by its multi-region availability. As illustrated in the diagram below, both the caching layer and durable storage replicate data asynchronously across regions, resulting in an eventually consistent system.

Now that we’ve covered storing the real-time graph index, let’s see how it enables graph traversals.

Graph Traversals

The Abstraction provides a custom gRPC traversal API, inspired by Gremlin, which enables exploration of the distributed graph by letting users chain traversals, apply filter criteria, sort results, limit results, and more.

Let’s explore a hypothetical scenario where the Abstraction is used to recommend shows to users on a shared device, by considering the duration of the most recent viewing session for each show across all profiles and accounts associated with that device:

TraversalRequest.newBuilder()
    .setNamespace("<graph-namespace>")
    .setTraversalQuery(
        TraversalQuery.newBuilder()
            // Given id of the 'device' node type.
            .setStartNode(node("device", "my-device-id"))
            .setTraversal(
                Traversal.newBuilder()
                    // fetch the first 5 connections
                    .setEdgeLimit(5)
                    .setDirectionTraversal(
                        DirectionTraversal.newBuilder()
                            // traverse in the IN direction
                            .setDirection(IN)
                            // minimize data exchange: only interested in certain properties
                            .addNodePropertiesSelections(propSelection("account", "created_at"))
                            .addNodePropertiesSelections(propSelection("profile", "last_active"))
                            .setDirectionFilter(
                                DirectionFilter.newBuilder()
                                    // only interested in certain connected types
                                    .setTypeMatchingStrategy(EXCLUDE_NON_TARGETED)
                                    .addAllNodeFilters(typeFilters("account", "profile"))))
                    // chain traversals to the intermediate result
                    .addNextTraversals(
                        Traversal.newBuilder()
                            .setOrder(LATEST)
                            // limit to 200 connections for the 2nd hop
                            .setEdgeLimit(200)
                            .setDirectionTraversal(
                                DirectionTraversal.newBuilder()
                                    // now traverse in the OUT direction
                                    .setDirection(OUT)
                                    .addEdgePropertiesSelections(propSelection("watched", "view_time"))
                                    .addEdgePropertiesSelections(propSelection("has_plan", "active"))
                                    .setDirectionFilter(
                                        DirectionFilter.newBuilder()
                                            .setTypeMatchingStrategy(EXCLUDE_NON_TARGETED)
                                            .addAllNodeFilters(typeFilters("title", "plan")))))))
    .build();
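As a rough back-of-the-envelope check on how edge limits bound the work in a request like this, the sketch below computes a worst-case edge count. It assumes each edge limit applies per source node at its hop, which the request above does not confirm; if the limit were global per hop, the bound would be smaller. The function name is hypothetical.

```python
def max_edges_fetched(edge_limits):
    """Worst-case edge count when each limit applies per node at that hop (assumption)."""
    total = 0
    frontier = 1  # the traversal starts from a single node
    for limit in edge_limits:
        frontier *= limit  # every frontier node may fan out up to `limit` edges
        total += frontier
    return total


# Limits taken from the request above: 5 on the first hop, 200 on the second.
worst_case = max_edges_fetched([5, 200])
```

Under that assumption the two-hop request touches at most 5 + 5 × 200 = 1005 edges, which is why tight limits on early hops matter most.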

And let’s visualize the intended results set produced by the request above:

We’ll explore the design and implementation of traversal planning and execution, along with different traversal types, in Part II of this blog series.

Now let’s look at the performance metrics of Graph Abstraction based on current production use cases.

Real World Performance

Across all applications at Netflix, Graph Abstraction ensures high availability while processing up to 10 million operations per second across all writes, individual edge / node reads and traversals at peak hours:

Edge and node persistence achieve single-digit millisecond latencies (p99 shown in red, p90 shown in orange, and p50 shown in green):

Traversal performance depends on the number of hops, the edge fanout at each stage, and associated filters and sort orders. We parallelize work as much as possible to reduce latencies. Typically 1-hop traversals are executed with single-digit millisecond latency:

1-hop traversal latencies

We also support a Count API that performs counting traversals at a very high rate with similar latencies, which we will cover in Part II of this series:

Currently, the RDG is powered by 2-hop traversals with a higher degree of fan-out. While these operations can reach upwards of 100 ms in latency, the 90th percentile (p90) latency remains under 50 ms.

2-hop traversal latencies

We track the average and max edge fanout at different depths to give us insights into the traversal performance for different graph datasets.

Median edge fan-out
Max edge fan-out

Asynchronous operations such as node deletions can be slightly latent, but typically perform with sub-second latency:

At the moment, we are storing close to 650 TB of data globally across all our graph datasets.

Conclusion

As Netflix scales further into new verticals such as live content, games, and ads, Graph Abstraction will remain crucial for uncovering and leveraging rich connections, while continuing to support high throughput and availability at low latencies.

Stay tuned for Part II of this blog series, where we’ll explore the implementation of graph traversals, counting and constraint mechanisms.

In Part III, we’ll take a closer look at the temporal index implementation and its integration with the Time Series Abstraction.

Acknowledgments

Special thanks to our stunning colleagues who contributed to Graph Abstraction’s success: Kaidan Fullerton, Joey Lynch, Sudhesh Suresh, Vinay Chella, Sumanth Pasupuleti, Vidhya Arvind, Raj Ummadisetty, Jordan West, Chris Lohfink, Joe Lee, Jingxi Huang, Jessica Walton, Prudhviraj Karumanchi, Akashdeep Goel, Sriram Rangarajan, Chris Van Vlack, Christopher Gray, Luis Medina, Ajit Koti, Mohidul Abedin.


High-Throughput Graph Abstraction at Netflix: Part I was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Read the whole story
alvinashcraft
6 minutes ago
reply
Pennsylvania, USA
Share this story
Delete

Instantly deploy T-SQL programmability changes on save in SQL Projects with SQL Project Power Tools


In this blog post I will describe a new feature in SQL Project Power Tools: Publish programmability objects on save. This feature lets you instantly deploy stored procedures, views, functions, and triggers to your development database the moment you save the corresponding .sql file in Visual Studio — no full dacpac deployment required.

The problem: slow feedback loop for programmability objects

SQL Database Projects are a great way to manage your database schema in source control. The normal development workflow is:

  1. Edit your .sql files in Visual Studio
  2. Build the project to produce a .dacpac file
  3. Deploy the .dacpac to your development database (via dacpac publish or schema compare)
  4. Test your changes

This workflow works well for schema objects like tables and indexes, where dacpac deployment is essential — it handles complex operations like renaming columns, adding constraints, and managing data migrations safely. You really do need the full deployment pipeline for those.

However, for programmability objects (stored procedures, views, functions, and triggers) the situation is different. These objects can be replaced atomically using SQL Server's CREATE OR ALTER statement, which has been supported since SQL Server 2016 SP1. There is no need to build and deploy a full .dacpac just to update a stored procedure body.

Unfortunately, SQL Database Projects (and the underlying DacFx tooling) do not allow you to selectively deploy a single file. Every change still requires building the full .dacpac and running a deployment, even if you only changed a single stored procedure. This has been raised as a feature request with the DacFx team, but in the meantime SQL Project Power Tools provides a practical workaround.

The solution: Publish programmability objects on save

The new Publish programmability objects on save feature detects when you save a .sql file that contains a supported programmability object and immediately executes a CREATE OR ALTER version of that script against your development database. The status bar shows a confirmation when the publish completes, or an error message if something goes wrong.

Important distinction: This feature only works for programmability objects: stored procedures, views, functions, and triggers. Schema objects such as tables, indexes, and constraints are deliberately excluded. Changing a table definition has data implications that require the full .dacpac deployment pipeline to handle safely.

How it works

When a .sql file is saved in Visual Studio:

  1. The extension checks whether the file belongs to a SQL Database Project.
  2. It looks for an .env file in the project directory containing an AutoPublish connection string.
  3. If found, it parses the script using the T-SQL parser and verifies it contains only supported statements: CREATE PROCEDURE, CREATE VIEW, CREATE FUNCTION, or CREATE TRIGGER (and SET options).
  4. Any CREATE statement that is not already CREATE OR ALTER is automatically rewritten to use CREATE OR ALTER.
  5. The rewritten script is executed against the database specified in the connection string.
  6. The Visual Studio status bar shows Publish completed: <filename> on success, or Publish failed: <filename> on failure.

Files that contain table definitions, ALTER TABLE, INSERT, DROP, or any other statements that are not programmability object definitions are silently skipped — the feature does nothing for those files.
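The rewrite step can be approximated in a few lines of Python. Note the difference from the extension: it parses the script with a T-SQL parser, whereas this sketch uses a regular expression, so it would also match text inside comments or string literals. The function name is hypothetical and the sketch covers only the four object types listed above.

```python
import re

# Matches CREATE followed directly by a supported object keyword. A statement
# that already reads CREATE OR ALTER does not match, because OR follows CREATE.
_CREATE_OBJECT = re.compile(
    r"\bCREATE\s+(PROCEDURE|VIEW|FUNCTION|TRIGGER)\b",
    re.IGNORECASE,
)


def to_create_or_alter(script: str) -> str:
    """Rewrite plain CREATE statements for programmability objects."""
    return _CREATE_OBJECT.sub(lambda m: f"CREATE OR ALTER {m.group(1)}", script)


rewritten = to_create_or_alter("CREATE PROCEDURE [dbo].[GetCustomerById]\n    @Id INT\nAS\nSELECT 1;")
```

Scripts that contain only statements outside this set, such as CREATE TABLE, pass through unchanged here; in the real feature such files are skipped rather than executed.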

Setting up the feature

Step 1: Enable the option

The feature is off by default. To turn it on, open Tools > Options > SQL Server Tools > SQL Project Power Tools in Visual Studio and check the Enable auto publish on save option.

Step 2: Add the connection string

Create a file named .env in the root of your SQL Database Project directory (the same folder as your .sqlproj or .csproj file). Add a line in the following format:

AutoPublish=Server=localhost;Database=MyDevDatabase;Integrated Security=true;TrustServerCertificate=true

You can use any valid SQL Server connection string. The key must be AutoPublish (case-insensitive).

Security tip: The .env file contains a connection string, so make sure to add it to .gitignore to avoid committing credentials to source control.
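For illustration, reading the AutoPublish entry from a .env file could look like the Python sketch below. It is not the extension's code: the function name is hypothetical, and treating lines that start with `#` as comments is an assumption. The one detail that matters is splitting on the first `=` only, because the connection string itself contains `=` characters.

```python
def find_autopublish(env_text: str):
    """Return the AutoPublish connection string from .env content, or None."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):  # assumption: '#' starts a comment
            continue
        # Split on the first '=' only; the value keeps its own '=' characters.
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "autopublish":  # key is case-insensitive
            return value.strip()
    return None


env_text = (
    "# local development database\n"
    "autopublish=Server=localhost;Database=MyDevDatabase;"
    "Integrated Security=true;TrustServerCertificate=true\n"
)
connection_string = find_autopublish(env_text)
```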

Step 3: Save a programmability object

Open a .sql file in your project that contains a CREATE PROCEDURE, CREATE VIEW, CREATE FUNCTION, or CREATE TRIGGER statement, make a change, and save. The extension will automatically rewrite the statement to CREATE OR ALTER and execute it against the development database. Watch the status bar at the bottom of Visual Studio for the completion message.

Example

Suppose you have a stored procedure in your project at dbo/Stored Procedures/GetCustomerById.sql:

CREATE PROCEDURE [dbo].[GetCustomerById]
    @Id INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT Id, Name, Email
    FROM dbo.Customers
    WHERE Id = @Id;
END

When you save this file with the feature enabled and a valid .env file present, the extension will execute the following against your development database:

CREATE OR ALTER PROCEDURE [dbo].[GetCustomerById]
    @Id INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT Id, Name, Email
    FROM dbo.Customers
    WHERE Id = @Id;
END

The stored procedure is updated immediately, without a full build and deploy cycle.

Other features in SQL Project Power Tools

SQL Project Power Tools includes many other features to improve the SQL Database Project developer experience:

  • Templates — Project and item templates for creating new SDK-style SQL Database Projects and common object types from File > New > Project and Add > New Item.
  • Import database — Import the full schema of an existing database into your project, with configurable file layout (flat, by object type, by schema, or by schema and object type).
  • Schema compare — Visually compare your database project with a live database and apply changes in either direction.
  • Analyze — Run static code analysis on your project and view a detailed report of issues.
  • Manage code analysis rules — Enable or disable individual static analysis rules and set their severity directly from a visual dialog, for SDK-style projects.
  • Create Mermaid E/R diagram — Generate an Entity-Relationship diagram of selected tables for documentation.
  • .dacpac Solution Explorer node — Browse the contents of a built .dacpac file directly from Solution Explorer.
  • Script Table Data — Generate INSERT statements for table data to use as seed data in post-deployment scripts, based on generate-sql-merge.
  • Add pre- and post-deployment scripts — Quickly add new pre- and post-deployment scripts to your project from the context menu.
  • Scaffold SQL MCP Server (preview) — Generate a SQL MCP Server configuration file based on your database project schema.

A getting started guide covers all of these features in more detail.

Getting the extension

Visual Studio

Install SQL Project Power Tools from the Visual Studio Marketplace, or search for it directly in Visual Studio via Extensions > Manage Extensions.

SSMS

Install SQL Project Power Tools from the new SSMS Gallery.

Contribute

The source code is open source and available on GitHub. If you find the extension useful, a ★★★★★ rating on the Marketplace is always appreciated. Bug reports and feature requests are welcome on the GitHub issue tracker.




Markdown Should be Supported Everywhere Natively


You've probably seen this tweet by @trq212 floating around on Twitter about letting agents write HTML instead of markdown...


Listed below are some of the reasons mentioned in the article:

  • Information Density
  • Visual Clarity & Ease of Reading
  • Ease of Sharing (to me this is the most compelling)

I don't disagree with Tariq, but rather than switch to HTML, I think the answer is to make markdown supported everywhere. We've been using it for years and it's powering much of the modern web. However, if we look at how software and platforms have evolved, markdown support is very dependent on the platform to render it.

Why does markdown work for humans and machines? Well, it's pretty simple: humans write simple syntax that gets rendered into something rich, and, ironically enough, that usually happens by converting it to HTML and letting a browser engine render it. For machines, it's lightweight to parse and easy to generate token by token without the verbosity of HTML.

We write headers, code blocks, pull quotes, and bold text, and what typically happens is that something converts that markdown to HTML.

For example, I am literally typing this blog in markdown, and the only way I can share it to the masses is through a platform like dev.to that converts it to HTML and hosts it for me.

So if the feature is available in some places, why is it not everywhere? I believe that software vendors haven't prioritized adding markdown rendering support, and they should.

We should be able to send a standalone index.md file and view it in all web browsers, chat applications, and emails. Some apps, like Discord and Slack, already do this (Slack's markdown support disappoints me). We can do this with HTML today, and all modern browsers will render something nice, but if you load up markdown in your browser today you will become sad.

We have to reach for things like Obsidian or Kiro to render the markdown, which I feel limits the portability of it all.

Curious what you think and where you see yourself heading in terms of AI agent output. Let me know in the comments if you're switching to HTML or sticking with markdown.

As always, happy coding 🫡!



2.8.10


CLI: Stats command - fix incorrect CPU % reporting (#40627)


Introducing Work IQ in the IQ Series

From: Microsoft Developer
Duration: 0:36
Views: 44

🎬 Episode 1 premieres on June 2 at 12 PM PT
🔗 Learn more: https://aka.ms/iq-series

Welcome to the Work IQ track within The IQ Series. In the Work IQ episodes, we explore how Work IQ powers AI agents by understanding organizational data, context, and work patterns.

The IQ Series is a developer‑focused learning series exploring Microsoft IQ as the intelligence layer for modern AI systems.

Work IQ is a workplace intelligence layer that delivers a semantic understanding of everything happening across your business. Built on four core components—Chat, Context, Tools, and Workspaces—it continuously transforms signals from across Microsoft 365 and business systems into agent‑ready intelligence, enabling work grounding, reasoning, and permission awareness. The result is a platform purpose-built for agentic use, delivering higher intelligence, speed, efficiency, enterprise scale, and security and governance by design.


Blog: gRPC-Rust Client API Evolution (pt. 1/2)

PART 1

The following is a detailed explanation of the background behind the new gRPC-Rust client API. It covers the process I went through while designing it, alternatives considered, trade-offs, and advantages of the final design. If you are only interested in the result itself, please feel free to skip straight to our documentation instead.

Background: gRPC, Tonic, and Tower

gRPC is a high performance Remote Procedure Call (RPC) framework. If you aren’t already familiar with gRPC, please see our introduction.

Tonic is an implementation of the gRPC protocol for the Rust programming language. It is widely used and very popular, with over 12k stars on GitHub. Tonic is designed to be part of the Tower ecosystem, which is a framework that attempts to unify all networking client and service APIs, allowing easy introduction of common middleware for your application.

The gRPC-Rust project was started as an effort to bring all the advanced gRPC features missing from Tonic to the Rust community, like integrated health checking and retries, as well as performance-boosting strategies like zero-copy and arena optimizations.

Obvious Starting Point: Keep the Tonic API?

Tonic is already used by a large number of Rust users, so my initial thought was that it would be great if we could re-use its API. Here’s a quick sample of the current Tonic client-side API:

// Simple unary call (one request, one response):
let response = client
    .get_feature(Request::new(Point { latitude: 409146138, longitude: -746188906 }))
    .await?;

// Bidirectional streaming (many requests and responses in parallel):
let outbound = async_stream::stream! { /* request stream generator */ };
let response = client.route_chat(Request::new(outbound)).await?;
let mut inbound = response.into_inner();

while let Some(note) = inbound.message().await? { /* process responses */ }

When analyzing this more closely, I discovered several issues and limitations:

  • Abstraction Leak: Tonic exposes lower-level HTTP/2 primitives (e.g. GrpcService::ResponseBody is http_body::Body). gRPC applications should not assume that HTTP/2 is the transport, so that gRPC can adopt other transports (e.g. QUIC) in the future.
  • Unstable Dependency: Tonic builds on Stream from futures_core (via tokio_stream). These crates are unstable, and we want to be able to have a stable release of gRPC-Rust before they are stabilized.
  • Performance Limitation: Tonic accepts and returns owned protobuf messages. This means the Tonic library handles allocations, which prevents the application from using advanced allocation strategies like arenas. Arenas are important for performance in heavily loaded, highly concurrent RPC systems.
  • Security Concern: Tonic makes it easy (using the ? operator) to propagate outgoing client call statuses to server responses, including metadata, which could contain secrets like tokens or other private information.
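
To make the security concern concrete, here is a minimal sketch of the alternative to forwarding a failed backend call with the ? operator: the server builds a fresh status for its own caller and copies nothing sensitive across. The Code and Status types and the to_caller_status function are hypothetical stand-ins for illustration, not the Tonic or gRPC-Rust types.

```rust
/// Hypothetical status code for this sketch (not a real gRPC type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Code {
    Unavailable,
    PermissionDenied,
    Internal,
}

/// Hypothetical status: a code, a message, and metadata key/value pairs.
#[derive(Debug, Clone, PartialEq)]
struct Status {
    code: Code,
    message: String,
    metadata: Vec<(String, String)>,
}

/// Builds the status a server returns to its own caller after a backend
/// call failed. Only the code is consulted; the upstream message and
/// metadata are never copied, so tokens in them cannot leak.
fn to_caller_status(upstream: &Status) -> Status {
    let code = match upstream.code {
        // Retryable conditions are worth passing through as-is.
        Code::Unavailable => Code::Unavailable,
        // Everything else is reported as a generic internal failure.
        _ => Code::Internal,
    };
    Status {
        code,
        message: "backend request failed".to_string(),
        metadata: Vec::new(),
    }
}
```

The mapping policy here (keep Unavailable, collapse the rest to Internal) is only one possible choice; the point is that the conversion is explicit rather than implicit.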

What About Tower?

So we’ll need to change the Tonic API, but should we keep using Tower?

In our other supported languages, gRPC directly includes many of the features that you would get from Tower middleware, e.g. timeouts and retries, and it provides versions of this functionality specifically designed to work well with gRPC. In addition, Tower was not created with streaming in mind, even though it technically works. To implement a bidirectional stream, the Request and Response become asynchronous objects, and the call method returns early in the RPC’s lifecycle to allow the application and library to interact with them.

To illustrate these problems, let’s look at timeouts as an example. The Tower crate provides a timeout module for this. The approach is to race the Service it wraps with a timer future, dropping one when the other completes. This works fine for many Service implementations, but it does have some surprising behavior many people are unaware of:

  1. The timeout is not applied while waiting for flow control in poll_ready, meaning your call could block indefinitely.
  2. The timeout only applies to the portion of the call spent waiting for the Response object to be returned from the call method -- which, for streaming RPCs, will be almost immediately. In gRPC, timeouts apply to the total time of the RPC from the moment the client starts attempting the call to the receipt of the final status from the server.
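
As a rough illustration of the gRPC semantics described in the second point, the sketch below fixes a single deadline when the call starts and checks it on every later receive, so the whole RPC is bounded rather than just its first step. It uses a standard-library channel as a stand-in for a response stream; Deadline, RecvError, and recv_before are hypothetical names, not part of Tower, Tonic, or gRPC-Rust.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum RecvError {
    DeadlineExceeded,
    StreamClosed,
}

/// One deadline, fixed when the call starts and shared by every later step.
struct Deadline(Instant);

impl Deadline {
    fn after(timeout: Duration) -> Self {
        Deadline(Instant::now() + timeout)
    }

    /// Time left, or None once the deadline has passed.
    fn remaining(&self) -> Option<Duration> {
        self.0.checked_duration_since(Instant::now())
    }
}

/// Receives the next message without ever waiting past the call-wide deadline.
fn recv_before<T>(rx: &Receiver<T>, deadline: &Deadline) -> Result<T, RecvError> {
    let left = deadline.remaining().ok_or(RecvError::DeadlineExceeded)?;
    match rx.recv_timeout(left) {
        Ok(msg) => Ok(msg),
        Err(RecvTimeoutError::Timeout) => Err(RecvError::DeadlineExceeded),
        Err(RecvTimeoutError::Disconnected) => Err(RecvError::StreamClosed),
    }
}
```

Because each receive only gets the time that is still left, a stream that delivers a few messages and then stalls fails with DeadlineExceeded once the original budget is spent, no matter how quickly the call itself was set up.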

In addition to those behaviors which affect any Service, there is another incompatibility with gRPC: we write the timeout on the wire so that the server is aware of the client’s deadline. If an application was using the timeout module, the gRPC library would never be aware of the timeout in order to propagate it.
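
For a sense of what "on the wire" means here: in gRPC over HTTP/2 the deadline travels as a grpc-timeout header whose value is an integer of at most 8 digits followed by a unit (H, M, S, m, u, or n). The sketch below encodes a Duration in that shape. The function name encode_grpc_timeout and the choice to round up are assumptions of this example, not the gRPC-Rust implementation.

```rust
use std::time::Duration;

/// Largest value that fits in the 8 ASCII digits the wire format allows.
const MAX_TIMEOUT_VALUE: u128 = 99_999_999;

/// Encodes a timeout as a `grpc-timeout` header value, e.g. "5000000u".
fn encode_grpc_timeout(timeout: Duration) -> String {
    // (nanoseconds per unit, unit suffix), finest unit first.
    const UNITS: [(u128, char); 6] = [
        (1, 'n'),
        (1_000, 'u'),
        (1_000_000, 'm'),
        (1_000_000_000, 'S'),
        (60_000_000_000, 'M'),
        (3_600_000_000_000, 'H'),
    ];
    let nanos = timeout.as_nanos();
    for (unit_nanos, suffix) in UNITS {
        // Round up (a choice made for this sketch) so the encoded value
        // is never shorter than the requested timeout.
        let value = (nanos + unit_nanos - 1) / unit_nanos;
        if value <= MAX_TIMEOUT_VALUE {
            return format!("{value}{suffix}");
        }
    }
    // Anything longer than 99,999,999 hours is clamped to the maximum.
    format!("{MAX_TIMEOUT_VALUE}H")
}
```

A timeout applied by generic middleware never reaches code like this, which is why the server would not learn the client's deadline in that setup.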

Another limitation of Tower is that it makes it hard for the application to control memory allocations: the Response would need to be a complex type that allows you to call into it to receive the message into a buffer. For streaming RPCs, something like this would be needed anyway, but for unary RPCs, which is where arenas are most important, it would be awkward.

Use Tower’s “Style”?

If Tower is a poor fit, should we still keep the same style for calls that Tower and Tonic users are familiar with? That is:

async fn call(Request) -> Response

We ultimately decided against this approach. With this style, applications that wished to do interleaved operations (“send a message, receive a message, send a message, …”) would need to deal with the two Request and Response streams executing concurrently and implement their own synchronization between the two streams.

Two APIs: Channel vs. Generated Stubs

gRPC actually provides two different APIs: one that the application typically interacts with via the protobuf generated code, and one that the channel (the main client entry point) itself implements, and that interceptors (AKA middleware) would use. The generated API focuses more on usability while the channel API is a lower-level, more powerful, streaming-only design.

In Tonic, this split exists as well, but the proto generated code is just a specialization of the Tonic API using type generics. But this approach is not a requirement, and in fact, there are reasons to avoid it.

With gRPC-Rust, we are taking the opportunity to incorporate some lessons learned from our experience implementing gRPC in many other languages. One such idea we’d like to incorporate is to hide the details of gRPC itself from the generated API, and only expose protobuf messages and necessary primitives. As an example:

// *Not* something like this:
async fn call(ctx: grpc::Context, req: MyRequestMessage, options: grpc::CallOptions)
 -> grpc::Response<MyResponseMessage>;

// More like this instead:
async fn call(req: MyRequestMessage) -> Result<MyResponseMessage, Status>;

// Example usage:
let response = client.call(request).await.expect("RPC should succeed!");

This is similar to gRPC-Java’s design, and it allows applications to focus on their business logic: requests and responses. Other functionality that gRPC provides -- like accessing metadata, disabling retry, reading peer details, etc. -- is generally better handled by interceptors.

Generated API

With that in mind, let’s dive into the generated (protobuf) API in isolation first, and understand why the nice, simple API in the example above isn’t good enough. (Then I’ll explain how we ultimately achieved exactly that anyway!)

Arenas

As mentioned earlier, arenas are important for making highly efficient RPC systems that can handle extremely high QPS (Queries Per Second), by grouping related memory operations temporally and spatially. To allow the application to control allocations instead of the library, we were looking for a unary API something like:

// Definition:
async fn call(req: &MyRequestMessageView, resp: &mut MyResponseMessageView) -> Status;

// Usage (with a hypothetical arena API):
let req = MyRequestMessage::new_on_arena(arena).set_id(3);
let mut res = MyResponseMessage::new_on_arena(arena);
client.call(&req, &mut res).await.expect("RPC should succeed!");
// res now contains the RPC's response

Can we Make it More Usable?

That API is straightforward, but we understand that many users would not want to pre-declare the response message type manually before making every RPC. We can accommodate both use cases by using the async builder pattern. An async builder is able to return owned messages via an IntoFuture implementation, while also providing an alternative method to perform the call using a pre-allocated response.

We can also improve the ergonomics of the request message parameter. Instead of requiring a reference, we can accept an AsView protobuf type that allows either an owned message or a view to be passed into the call. This is the final API we have implemented:

// Definition:
async fn call<Req>(req: Req) -> UnaryFutureBuilder<..>
where
    Req: AsView<Proxied = MyRequestMessage>;

// Implements the simple usage:
impl IntoFuture for UnaryFutureBuilder {
    type Output = Result<MyResponseMessage, StatusError>;
}

// Implements the advanced usage method:
impl UnaryFutureBuilder {
    async fn with_response_message<Res>(self, res: &mut Res) -> Status
    where
        Res: AsMut<MutProxied = MyResponseMessage>;
}

// Usage (inside an async context, since the calls are awaited):
async fn main() {
    let request = proto!(MyRequestMessage{ id: 3 });

    // Simple Usage -- exactly what we wanted originally!:
    let response = client.call(request).await.expect("RPC should succeed!");

    // Arena usage (again with a hypothetical arena API):
    let req = MyRequestMessage::new_on_arena(arena).set_id(3);
    let mut res = MyResponseMessage::new_on_arena(arena);
    client.call(&req).with_response_message(&mut res).await.expect("RPC should succeed!");
}
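
The async builder pattern itself can be tried in isolation with only the standard library. The following is a minimal sketch under stated assumptions: EchoRequest, EchoResponse, UnaryCall, Client, and block_on are all hypothetical names invented for this example, the "RPC" simply echoes the request, and errors are plain strings rather than a gRPC status. It is not the gRPC-Rust implementation; it only shows how one builder can serve both the owned-response and the caller-provided-response paths.

```rust
use std::future::{Future, IntoFuture};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical message types standing in for generated protobuf messages.
struct EchoRequest {
    text: String,
}

#[derive(Debug, Default, PartialEq)]
struct EchoResponse {
    text: String,
}

/// Builder returned by `call`; nothing runs until it is awaited.
struct UnaryCall<'a> {
    req: &'a EchoRequest,
}

impl<'a> UnaryCall<'a> {
    /// Advanced path: write the response into a message the caller owns.
    async fn with_response_message(self, res: &mut EchoResponse) -> Result<(), String> {
        // A real client would perform the RPC here; this sketch just echoes.
        res.text.clear();
        res.text.push_str(&self.req.text);
        Ok(())
    }
}

impl<'a> IntoFuture for UnaryCall<'a> {
    type Output = Result<EchoResponse, String>;
    type IntoFuture = Pin<Box<dyn Future<Output = Self::Output> + 'a>>;

    /// Simple path: awaiting the builder allocates and returns an owned response.
    fn into_future(self) -> Self::IntoFuture {
        Box::pin(async move {
            let mut res = EchoResponse::default();
            let outcome = self.with_response_message(&mut res).await;
            outcome.map(|()| res)
        })
    }
}

struct Client;

impl Client {
    fn call<'a>(&self, req: &'a EchoRequest) -> UnaryCall<'a> {
        UnaryCall { req }
    }
}

// Tiny busy-polling executor so the sketch runs without a runtime crate.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

With these definitions, block_on(async { client.call(&req).await }) yields an owned EchoResponse, while block_on(client.call(&req).with_response_message(&mut res)) fills a response the caller already holds. Note that call here is a plain fn returning the builder, which is what lets the builder method be chained before the single await.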

We adapted these same concepts to our streaming APIs as well. Below is an example of the bidirectional streaming API:

// Definition:
async fn begin_stream() -> BidiCallBuilder<..>

// Implements the same async builder pattern:
impl IntoFuture for BidiCallBuilder<..> {
 type Output = (GrpcStreamingRequest, GrpcStreamingResponse);
}

// Usage (inside an async context, since the calls are awaited):
async fn main() {
    // Simple Usage:
    let (request_stream, response_stream) = client.begin_stream().await;

    request_stream.send(proto!(MyRequestMessage{..}));
    let response = response_stream.recv().await.expect("RPC should succeed!");

    // Arena usage:
    let mut res = MyResponseMessage::new_on_arena(arena);
    let response = response_stream.recv_into(&mut res).await.expect("RPC should succeed!");
}

Please see the full documentation for more detailed usage examples.

Next Time

This covers the generated code API design process. In Part 2 I’ll go into further details on the channel APIs.
