
Introducing Netflix’s Key-Value Data Abstraction Layer


Vidhya Arvind, Rajasekhar Ummadisetty, Joey Lynch, Vinay Chella

Introduction

At Netflix our ability to deliver seamless, high-quality, streaming experiences to millions of users hinges on robust, global backend infrastructure. Central to this infrastructure is our use of multiple online distributed databases such as Apache Cassandra, a NoSQL database known for its high availability and scalability. Cassandra serves as the backbone for a diverse array of use cases within Netflix, ranging from user sign-ups and storing viewing histories to supporting real-time analytics and live streaming.

Over time as new key-value databases were introduced and service owners launched new use cases, we encountered numerous challenges with datastore misuse. Firstly, developers struggled to reason about consistency, durability and performance in this complex global deployment across multiple stores. Second, developers had to constantly re-learn new data modeling practices and common yet critical data access patterns. These include challenges with tail latency and idempotency, managing “wide” partitions with many rows, handling single large “fat” columns, and slow response pagination. Additionally, the tight coupling with multiple native database APIs — APIs that continually evolve and sometimes introduce backward-incompatible changes — resulted in org-wide engineering efforts to maintain and optimize our microservice’s data access.

To overcome these challenges, we developed a holistic approach that builds upon our Data Gateway Platform. This approach led to the creation of several foundational abstraction services, the most mature of which is our Key-Value (KV) Data Abstraction Layer (DAL). This abstraction simplifies data access, enhances the reliability of our infrastructure, and enables us to support the broad spectrum of use cases that Netflix demands with minimal developer effort.

In this post, we dive deep into how Netflix’s KV abstraction works, the architectural principles guiding its design, the challenges we faced in scaling diverse use cases, and the technical innovations that have allowed us to achieve the performance and reliability required by Netflix’s global operations.

The Key-Value Service

The KV data abstraction service was introduced to solve the persistent challenges we faced with data access patterns in our distributed databases. Our goal was to build a versatile and efficient data storage solution that could handle a wide variety of use cases, ranging from the simplest hashmaps to more complex data structures, all while ensuring high availability, tunable consistency, and low latency.

Data Model

At its core, the KV abstraction is built around a two-level map architecture. The first level is a hashed string ID (the primary key), and the second level is a sorted map of a key-value pair of bytes. This model supports both simple and complex data models, balancing flexibility and efficiency.

HashMap<String, SortedMap<Bytes, Bytes>>

For complex data models such as structured Records or time-ordered Events, this two-level approach handles hierarchical structures effectively, allowing related data to be retrieved together. For simpler use cases, it also represents flat key-value Maps (e.g. id → {"" → value}) or named Sets (e.g. id → {key → ""}). This adaptability allows the KV abstraction to be used in hundreds of diverse use cases, making it a versatile solution for managing both simple and complex data models in large-scale infrastructures like Netflix.
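To make the two-level model concrete, here is a minimal Python sketch (not Netflix's implementation; the helper names are illustrative) showing how both a flat Map and a named Set fit the same structure:

```python
# Minimal sketch of the two-level map: a top-level record ID mapping
# to a key-sorted map of byte key-value pairs.
store = {}

def put_item(record_id, key, value):
    store.setdefault(record_id, {})[key] = value

def get_items(record_id):
    # The second level behaves like a SortedMap: items come back in key order.
    return sorted(store.get(record_id, {}).items())

# A flat key-value Map: id -> {"" -> value}
put_item("user:42", b"", b"profile-blob")
# A named Set: id -> {member -> ""}
put_item("watchlist:42", b"show-a", b"")
put_item("watchlist:42", b"show-b", b"")
```

The same storage shape serves both shapes of data; only the interpretation of the inner keys and values changes.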

The KV data can be visualized at a high level in the diagram below, which shows three records.

message Item (
    Bytes key,
    Bytes value,
    Metadata metadata,
    Integer chunk
)

Database Agnostic Abstraction

The KV abstraction is designed to hide the implementation details of the underlying database, offering a consistent interface to application developers regardless of the optimal storage system for that use case. While Cassandra is one example, the abstraction works with multiple data stores such as EVCache, DynamoDB, and RocksDB.

For example, when implemented with Cassandra, the abstraction leverages Cassandra’s partitioning and clustering capabilities. The record ID acts as the partition key, and the item key as the clustering column:

The corresponding Data Definition Language (DDL) for this structure in Cassandra is:

CREATE TABLE IF NOT EXISTS <ns>.<table> (
    id              text,
    key             blob,
    value           blob,
    value_metadata  blob,

    PRIMARY KEY (id, key)
) WITH CLUSTERING ORDER BY (key <ASC|DESC>)

Namespace: Logical and Physical Configuration

A namespace defines where and how data is stored, providing logical and physical separation while abstracting the underlying storage systems. It also serves as central configuration of access patterns such as consistency or latency targets. Each namespace may use different backends: Cassandra, EVCache, or combinations of multiple. This flexibility allows our Data Platform to route different use cases to the most suitable storage system based on performance, durability, and consistency needs. Developers just provide their data problem rather than a database solution!

In this example configuration, the ngsegment namespace is backed by both a Cassandra cluster and an EVCache caching layer, allowing for highly durable persistent storage and lower-latency point reads.

"persistence_configuration": [
  {
    "id": "PRIMARY_STORAGE",
    "physical_storage": {
      "type": "CASSANDRA",
      "cluster": "cassandra_kv_ngsegment",
      "dataset": "ngsegment",
      "table": "ngsegment",
      "regions": ["us-east-1"],
      "config": {
        "consistency_scope": "LOCAL",
        "consistency_target": "READ_YOUR_WRITES"
      }
    }
  },
  {
    "id": "CACHE",
    "physical_storage": {
      "type": "CACHE",
      "cluster": "evcache_kv_ngsegment"
    },
    "config": {
      "default_cache_ttl": "180s"
    }
  }
]

Key APIs of the KV Abstraction

To support diverse use-cases, the KV abstraction provides four basic CRUD APIs:

PutItems — Write one or more Items to a Record

The PutItems API is an upsert operation: it can insert new data or update existing data in the two-level map structure.

message PutItemRequest (
    IdempotencyToken idempotency_token,
    String namespace,
    String id,
    List<Item> items
)

As you can see, the request includes the namespace, Record ID, one or more items, and an idempotency token to ensure retries of the same write are safe. Chunked data can be written by staging chunks and then committing them with appropriate metadata (e.g. number of chunks).
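The upsert semantics can be sketched in a few lines of Python (illustrative only; the function name and namespace handling are assumptions, not the actual Netflix API):

```python
# Sketch of PutItems upsert semantics on the two-level map.
def put_items(store, namespace, record_id, items):
    record = store.setdefault((namespace, record_id), {})
    for key, value in items:
        record[key] = value  # upsert: insert new keys, overwrite existing ones
    return record

store = {}
put_items(store, "ns", "id1", [(b"a", b"1")])
put_items(store, "ns", "id1", [(b"a", b"2"), (b"b", b"3")])  # update + insert
```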

GetItems — Read one or more Items from a Record

The GetItems API provides a structured and adaptive way to fetch data using ID, predicates, and selection mechanisms. This approach balances retrieving large volumes of data against meeting stringent Service Level Objectives (SLOs) for performance and reliability.

message GetItemsRequest (
    String namespace,
    String id,
    Predicate predicate,
    Selection selection,
    Map<String, Struct> signals
)

The GetItemsRequest includes several key parameters:

  • Namespace: Specifies the logical dataset or table
  • Id: Identifies the entry in the top-level HashMap
  • Predicate: Filters the matching items and can retrieve all items (match_all), specific items (match_keys), or a range (match_range)
  • Selection: Narrows the returned response, for example page_size_bytes for pagination, item_limit to cap the total number of items across pages, and include/exclude to include or exclude large values from responses
  • Signals: Provides in-band signaling to indicate client capabilities, such as supporting client compression or chunking.

The GetItemResponse message contains the matching data:

message GetItemResponse (
    List<Item> items,
    Optional<String> next_page_token
)
  • Items: A list of retrieved items based on the Predicate and Selection defined in the request.
  • Next Page Token: An optional token indicating the position for subsequent reads if needed, essential for handling large data sets across multiple requests. Pagination is a critical component for efficiently managing data retrieval, especially when dealing with large datasets that could exceed typical response size limits.
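A client drains a paginated read with a simple token loop. This hedged sketch assumes a hypothetical get_items_page helper that enforces a byte budget per page and returns a next-page token:

```python
# Sketch of token-based pagination with a byte-budget per page.
def get_items_page(record, page_token, page_size_bytes):
    keys = sorted(record)
    i = page_token or 0
    out, used = [], 0
    while i < len(keys):
        size = len(record[keys[i]])
        if out and used + size > page_size_bytes:
            break  # page full; always return at least one item
        out.append((keys[i], record[keys[i]]))
        used += size
        i += 1
    return out, (i if i < len(keys) else None)

record = {bytes([k]): b"x" * 40 for k in range(5)}
items, token, pages = [], None, 0
while True:
    page, token = get_items_page(record, token, page_size_bytes=100)
    items.extend(page)
    pages += 1
    if token is None:
        break
```

With 40-byte items and a 100-byte page budget, the five items come back over three pages.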

DeleteItems — Delete one or more Items from a Record

The DeleteItems API provides flexible options for removing data, including record-level, item-level, and range deletes — all while supporting idempotency.

message DeleteItemsRequest (
    IdempotencyToken idempotency_token,
    String namespace,
    String id,
    Predicate predicate
)

Just like in the GetItems API, the Predicate allows one or more Items to be addressed at once:

  • Record-Level Deletes (match_all): Removes the entire record in constant latency regardless of the number of items in the record.
  • Item-Range Deletes (match_range): This deletes a range of items within a Record. Useful for keeping “n-newest” or prefix path deletion.
  • Item-Level Deletes (match_keys): Deletes one or more individual items.

Storage engines that defer true deletion, such as Cassandra, struggle with high volumes of deletes due to tombstone and compaction overhead. Key-Value optimizes both record and range deletes to generate a single tombstone for the operation — you can learn more about tombstones in About Deletes and Tombstones.

Item-level deletes create many tombstones, but KV hides that storage-engine complexity via TTL-based deletes with jitter. Instead of deleting items immediately, their metadata is marked as expired with a randomly jittered TTL to stagger the actual deletions. This technique maintains read pagination protections. While it doesn't completely solve the problem, it reduces load spikes and helps maintain consistent performance while compaction catches up. These strategies help maintain system performance, reduce read overhead, and meet SLOs by minimizing the impact of deletes.
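The jitter itself is simple to sketch. In this illustrative Python fragment (the base TTL and jitter window are assumptions, not Netflix's values), each expiring item gets a slightly different TTL so tombstone creation is spread over a window instead of spiking:

```python
import random

# Sketch of jittered TTL-based deletes: mark an item expired with a TTL
# spread over a jitter window so tombstones are not all created at once.
def jittered_expiry_ttl(base_ttl_s=3600, jitter_s=600, rng=random.random):
    return base_ttl_s + rng() * jitter_s

ttls = [jittered_expiry_ttl() for _ in range(1000)]
```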

Complex Mutate and Scan APIs

Beyond simple CRUD on single Records, KV also supports complex multi-item and multi-record mutations and scans via the MutateItems and ScanItems APIs. PutItems also supports atomic writes of large blob data within a single Item via a chunked protocol. These complex APIs require careful consideration to ensure predictable, linear low latency, and we will share details on their implementation in a future post.

Design Philosophies for reliable and predictable performance

Idempotency to fight tail latencies

To ensure data integrity the PutItems and DeleteItems APIs use idempotency tokens, which uniquely identify each mutative operation and guarantee that operations are logically executed in order, even when hedged or retried for latency reasons. This is especially crucial in last-write-wins databases like Cassandra, where ensuring the correct order and de-duplication of requests is vital.

In the Key-Value abstraction, idempotency tokens contain a generation timestamp and random nonce token. Either or both may be required by backing storage engines to de-duplicate mutations.

message IdempotencyToken (
    Timestamp generation_time,
    String token
)

At Netflix, client-generated monotonic tokens are preferred due to their reliability, especially in environments where network delays could impact server-side token generation. This combines a client-provided monotonic generation_time timestamp with a 128-bit random UUID token. Although clock-based token generation can suffer from clock skew, our tests on EC2 Nitro instances show drift is minimal (under 1 millisecond). In cases that require stronger ordering, regionally unique tokens can be generated using tools like Zookeeper, or globally unique tokens such as transaction IDs can be used.
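A minimal client-side token generator might look like the following sketch (the microsecond resolution and the monotonicity guard are illustrative assumptions):

```python
import time
import uuid

# Sketch of a client-generated idempotency token: a monotonic
# generation timestamp paired with a 128-bit random nonce.
_last_ts_us = 0

def make_idempotency_token():
    global _last_ts_us
    # Force monotonicity even if the wall clock steps backward slightly.
    ts_us = max(time.time_ns() // 1_000, _last_ts_us + 1)
    _last_ts_us = ts_us
    return (ts_us, uuid.uuid4().hex)  # (generation_time, token)

t1 = make_idempotency_token()
t2 = make_idempotency_token()
```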

The following graphs illustrate the observed clock skew on our Cassandra fleet, suggesting the safety of this technique on modern cloud VMs with direct access to high-quality clocks. To further maintain safety, KV servers reject writes bearing tokens with large drift, preventing both silent write discard (a timestamp far in the past) and immutable doomstones (a timestamp far in the future) in storage engines vulnerable to them.

Handling Large Data through Chunking

Key-Value is also designed to efficiently handle large blobs, a common challenge for traditional key-value stores. Databases often face limitations on the amount of data that can be stored per key or partition. To address these constraints, KV uses transparent chunking to manage large data efficiently.

For items smaller than 1 MiB, data is stored directly in the main backing storage (e.g. Cassandra), ensuring fast and efficient access. However, for larger items, only the id, key, and metadata are stored in the primary storage, while the actual data is split into smaller chunks and stored separately in chunk storage. This chunk storage can also be Cassandra but with a different partitioning scheme optimized for handling large values. The idempotency token ties all these writes together into one atomic operation.
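The splitting step can be sketched as follows (the 1 MiB threshold matches the text; the helper name and metadata shape are assumptions):

```python
# Sketch of transparent chunking: values above a threshold are split
# into fixed-size chunks for separate storage, with the chunk count
# kept alongside the primary row so reads can reassemble the value.
CHUNK_SIZE = 1024 * 1024  # 1 MiB

def split_into_chunks(value, chunk_size=CHUNK_SIZE):
    return [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]

blob = b"x" * (2 * CHUNK_SIZE + 10)
chunks = split_into_chunks(blob)
metadata = {"num_chunks": len(chunks)}  # committed with the primary row
```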

By splitting large items into chunks, we ensure that latency scales linearly with the size of the data, making the system both predictable and efficient. A future blog post will describe the chunking architecture in more detail, including its intricacies and optimization strategies.

Client-Side Compression

The KV abstraction leverages client-side payload compression to optimize performance, especially for large data transfers. While many databases offer server-side compression, handling compression on the client side reduces expensive server CPU usage, network bandwidth, and disk I/O. In one of our deployments, which helps power Netflix’s search, enabling client-side compression reduced payload sizes by 75%, significantly improving cost efficiency.
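As a hedged illustration of the client-side approach (the size threshold and helper name are assumptions; Netflix's actual codec may differ), the client compresses the payload before the write and decompresses on read, so the server only ever handles opaque bytes:

```python
import zlib

# Sketch of client-side payload compression before a write.
def compress_value(value, min_size=1024):
    if len(value) < min_size:
        return value, False  # small payloads are not worth compressing
    return zlib.compress(value), True

payload = b"searchable text " * 1000
body, compressed = compress_value(payload)
```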

Smarter Pagination

We chose payload size in bytes as the limit per response page rather than the number of items because it allows us to provide predictable operation SLOs. For instance, we can provide a single-digit millisecond SLO on a 2 MiB page read. Conversely, using the number of items per page as the limit would result in unpredictable latencies due to significant variations in item size. A request for 10 items per page could result in vastly different latencies if each item was 1 KiB versus 1 MiB.

Using bytes as a limit poses challenges because few backing stores support byte-based pagination; most limit by the number of results (DynamoDB and Cassandra, for example, limit by number of items or rows). To address this, we use a static limit for the initial queries to the backing store, query with this limit, and process the results. If more data is needed to meet the byte limit, additional queries are executed until the limit is met, the excess result is discarded, and a page token is generated.
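The query-accumulate-discard loop can be sketched like this (the static limit of 3 rows and the token-as-next-key convention are illustrative assumptions):

```python
# Sketch of byte-limited paging over a store that only limits by row
# count: query with a static row limit, accumulate until the byte
# budget is met, discard the excess, and emit a page token.
def fetch_page(rows, page_size_bytes, static_limit=3):
    out, used, pos = [], 0, 0
    while pos < len(rows):
        batch = rows[pos:pos + static_limit]  # the store limits by count, not bytes
        for key, value in batch:
            if used + len(value) > page_size_bytes:
                return out, key  # excess result discarded; token = next key
            out.append((key, value))
            used += len(value)
        pos += static_limit
    return out, None

rows = [(i, b"v" * 50) for i in range(10)]
page, token = fetch_page(rows, page_size_bytes=175)
```

With 50-byte rows and a 175-byte budget, three rows fit and the fourth is discarded, becoming the page token.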

This static limit can lead to inefficiencies: one large item in the result may cause us to discard many results, while small items may require multiple iterations to fill a page, resulting in read amplification. To mitigate these issues, we implemented adaptive pagination, which dynamically tunes the limits based on observed data.

Adaptive Pagination

When an initial request is made, a query is executed in the storage engine, and the results are retrieved. As the consumer processes these results, the system tracks the number of items consumed and the total size used. This data helps calculate an approximate item size, which is stored in the page token. For subsequent page requests, this stored information allows the server to apply the appropriate limits to the underlying storage, reducing unnecessary work and minimizing read amplification.

While this method is effective for follow-up page requests, what happens with the initial request? In addition to storing item size information in the page token, the server also estimates the average item size for a given namespace and caches it locally. This cached estimate helps the server set a more optimal limit on the backing store for the initial request, improving efficiency. The server continuously adjusts this limit based on recent query patterns or other factors to keep it accurate. For subsequent pages, the server uses both the cached data and the information in the page token to fine-tune the limits.
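The limit calculation itself reduces to a small formula; this sketch assumes a hypothetical next_storage_limit helper (the default fallback stands in for the cached per-namespace estimate):

```python
# Sketch of the adaptive limit calculation: the approximate item size
# observed on the previous page sizes the next storage query.
def next_storage_limit(page_size_bytes, items_seen, bytes_seen, default_limit=10):
    if items_seen == 0:
        return default_limit  # first request: fall back to the cached estimate
    avg_item_size = bytes_seen / items_seen
    return max(1, int(page_size_bytes / avg_item_size))

# 100 items totalling 100 KiB observed -> ~1 KiB each -> 2048 fit in 2 MiB.
limit = next_storage_limit(2 * 1024 * 1024, items_seen=100, bytes_seen=100 * 1024)
```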

In addition to adaptive pagination, a mechanism is in place to send a response early if the server detects that processing the request is at risk of exceeding the request’s latency SLO.

For example, let us assume a client submits a GetItems request with a per-page limit of 2 MiB and a maximum end-to-end latency limit of 500ms. While processing this request, the server retrieves data from the backing store. This particular record has thousands of small items, so gathering the full page of data would normally take longer than the 500ms SLO. If this happens, the client would receive an SLO violation error, causing the request to fail even though nothing exceptional occurred. To prevent this, the server tracks the elapsed time while fetching data. If it determines that continuing to retrieve more data might breach the SLO, the server will stop processing further results and return a response with a pagination token.
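The deadline check amounts to a guard inside the fetch loop; this hedged sketch assumes results arrive in batches and that the token simply points at the next unfetched batch:

```python
import time

# Sketch of SLO-aware early response: stop gathering results and
# return a partial page plus a token if the deadline is at risk.
def gather_with_deadline(batches, deadline_s, clock=time.monotonic):
    start, out = clock(), []
    for i, batch in enumerate(batches):
        if clock() - start > deadline_s:
            return out, i  # partial page; token points at the next batch
        out.extend(batch)
    return out, None  # full page gathered within the deadline

batches = [[1, 2], [3, 4], [5]]
items, token = gather_with_deadline(batches, deadline_s=60.0)
```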

This approach ensures that requests are processed within the SLO, even if the full page size isn’t met, giving clients predictable progress. Furthermore, if the client is a gRPC server with proper deadlines, the client is smart enough not to issue further requests, reducing useless work.

If you want to know more, the How Netflix Ensures Highly-Reliable Online Stateful Systems article talks in further detail about these and many other techniques.

Signaling

KV uses in-band messaging we call signaling that allows the dynamic configuration of the client and enables it to communicate its capabilities to the server. This ensures that configuration settings and tuning parameters can be exchanged seamlessly between the client and server. Without signaling, the client would need static configuration — requiring a redeployment for each change — or, with dynamic configuration, would require coordination with the client team.

For server-side signals, when the client is initialized, it sends a handshake to the server. The server responds back with signals, such as target or max latency SLOs, allowing the client to dynamically adjust timeouts and hedging policies. Handshakes are then made periodically in the background to keep the configuration current. For client-communicated signals, the client, along with each request, communicates its capabilities, such as whether it can handle compression, chunking, and other features.

KV Usage @ Netflix

The KV abstraction powers several key Netflix use cases, including:

  • Streaming Metadata: High-throughput, low-latency access to streaming metadata, ensuring personalized content delivery in real-time.
  • User Profiles: Efficient storage and retrieval of user preferences and history, enabling seamless, personalized experiences across devices.
  • Messaging: Storage and retrieval of the push registry for messaging needs, enabling millions of requests to flow through.
  • Real-Time Analytics: Persists large-scale impression data and provides insights into user behavior and system performance, moving data from offline to online and vice versa.

Future Enhancements

Looking forward, we plan to enhance the KV abstraction with:

  • Lifecycle Management: Fine-grained control over data retention and deletion.
  • Summarization: Techniques to improve retrieval efficiency by summarizing records with many items into fewer backing rows.
  • New Storage Engines: Integration with more storage systems to support new use cases.
  • Dictionary Compression: Further reducing data size while maintaining performance.

Conclusion

The Key-Value service at Netflix is a flexible, cost-effective solution that supports a wide range of data patterns and use cases, from low to high traffic scenarios, including critical Netflix streaming use-cases. The simple yet robust design allows it to handle diverse data models like HashMaps, Sets, Event storage, Lists, and Graphs. It abstracts the complexity of the underlying databases from our developers, which enables our application engineers to focus on solving business problems instead of becoming experts in every storage engine and their distributed consistency models. As Netflix continues to innovate in online datastores, the KV abstraction remains a central component in managing data efficiently and reliably at scale, ensuring a solid foundation for future growth.

Acknowledgments: Special thanks to our stunning colleagues who contributed to Key Value’s success: William Schor, Mengqing Wang, Chandrasekhar Thumuluru, John Lu, George Cambell, Ammar Khaku, Jordan West, Chris Lohfink, Matt Lehman, and the whole online datastores team (ODS, f.k.a CDE).


Introducing Netflix’s Key-Value Data Abstraction Layer was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.


How to Use CSS to Improve Web Accessibility

Did you know that CSS can play a significant role in web accessibility? While CSS primarily handles the visual presentation of a webpage, when you use it properly it can enhance the user’s experience and improve accessibility. In this article, I'll s...


Daily Reading List – September 18, 2024 (#400)


Today was another day of team meetings so I was offline for much of it. But I got some good early-morning reading finished, and you’ll find them in my 400th list below.

[article] Mistral launches a free tier for developers to test its AI models. Every dev loves a good sandbox environment. Mistral is encouraging folks to give their models a try.

[article] Writing CNNs from Scratch in PyTorch. If you’ve been dying to create convolutional neural networks, follow along with this tutorial.

[article] Are You Done Yet? Mastering Long-running Processes in Modern Architectures. Bernd knows what he’s talking about, so learn from him in this talk/transcript about long running workflows.

[article] Go makes a comeback: What’s fueling its revival? This article proposes that Go is getting more popular because of its security posture and AI friendliness.

[blog] Cloud CISO Perspectives: The high value of cross-industry communication. It’s good to see that companies and organizations of all kinds willingly partner together to solve security challenges.

[article] Study Finds No DevOps Productivity Gains from Generative AI. Hmm. I haven’t seen findings like this, which is why the article caught my eye. Run your own analysis when you introduce these tools into your environment to see where it makes a positive difference.

[article] Open Source: Paid Maintainers Keep Code Safer, Survey Says. A lot of folks depend on software with a small number of maintainers. This says that those who are paid spend more time on security work.

[article] Home Depot builds DIY GenAI model that tells human employees what to say. This post looks at a new whitepaper from the home improvement giant. They’re using a RAG pattern to help with support chats.

[article] This Is How To Conquer Anxiety: 4 Secrets From Research. You may not conquer anxiety, but these are solid ways to get it under control or reframe it.

Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:




How To Use Managed Identities in your Azure Logic Apps


Azure Logic Apps are a cloud-based service where you can create low code workflows to automate your business processes or to integrate different applications or services. The article How To Embed Your Azure Logic Apps in a Metadata-driven Data Platform gives a detailed introduction to Logic Apps and also explains how you can take it a step further by integrating them in a metadata-driven framework. At the end of the article, the following Logic App workflow was created:

a logic app with three actions. One HTTP trigger, a SharePoint action, and a SQL Server Execute Stored Procedure action

Figure 1: The starting point for the Logic App in this article

A stored procedure is executed on an Azure SQL Database. The connection to this database was configured using SQL Server Authentication. The goal of this article is to show you how you can connect using managed identities instead, which was left as an exercise to the reader in the previous article.

I recommend you to go through this article first if you don’t have a solid understanding of Logic Apps, or if you want to follow along as an exercise. It’s not necessarily a prerequisite to understand the concepts of this article and if you’re just interested in learning how managed identities work for Logic Apps, then keep on reading.

What are Managed Identities and why should you use them?

When developing applications or services that use multiple Azure resources, it’s not uncommon that those resources have to communicate with each other. Often the developer has to manage the secrets (such as usernames and passwords), credentials, certificates and keys to establish secure connections between those resources. For example, when you want to connect from an Azure Data Factory pipeline to an Azure SQL Database, you need to create a connection and specify a username and password. You can store all those credentials securely in Azure Key Vault, but it’s always extra steps to implement and retrieve those credentials. And what if a password changes? Or if an account key is rotated? Then you have to remember to update those in the Key Vault as well.

With Managed Identities, all this is avoided. A Managed Identity is an object in Azure Active Directory (Azure AD) that Azure resources can use to establish connections (as long as Azure AD authentication is supported). All the credentials are managed by Azure (hence the name), so you don’t need to keep track of passwords at all. Think of a managed identity as an identity impersonating an Azure service when it connects to another service.

In the example of ADF and Azure SQL Database, you would create a Managed Identity associated to the Data Factory instance, add this as a user to the database and give that user the appropriate permissions. In the ADF pipeline, you just have to choose to connect with a Managed Identity and everything else is taken care of for you.

a connection dialog of an ADF linked service, where the authentication type is set to System Assigned Managed Identity

Figure 2: Using a Managed Identity in Azure Data Factory

There are actually two types of Managed Identities:

  • System Assigned Managed Identities. This is the one used in Figure 2. For each instance of an Azure resource, you can assign one system managed identity (MI). This MI will bear the name of the resource itself, as you can see in Figure 2. Since the name is unique, you can have only one system-assigned MI.
  • User Assigned Managed Identities. Here you create the MI yourself and it’s not tied to one single resource. A user assigned MI can be assigned to multiple Azure resources, allowing for easy reuse.

Depending on your use case, you choose the type that suits your needs. Personally, in most cases I use system assigned managed identities as they’re specifically tied to the lifecycle of the resource. If the resource is deleted, so is the MI. It also makes it easier to track which resources have access to other resources (for example as users in a database), since the name of the resource is the name of the identity.

In the case of our Logic App, we could use a system assigned MI if there were only one Logic App. But what if we have the same Logic App in multiple environments (for example a development, test and production environment)? If you want to use a system assigned MI, you would need to change the name of the Logic App in each environment. Or what if you have 50 Logic Apps that need access to your database? Are you going to create 50 different users in the database? A more elegant solution would be to have one single user assigned managed identity to connect to the database, and let all Logic Apps use this MI. For these reasons, we’re going to use user assigned managed identities in this article.

If you’re not convinced yet to use managed identities, check out the blog post Why Managed Identities are Awesome.

Using a Managed Identity in the Logic App

In this section I will go through the process of setting up a managed identity for a Logic App.

Create a New Managed Identity

The first step is to create a user assigned MI. In the Azure Portal, I search for “managed identities” as shown in Figure 3:

A search box with the search word “managed”. In the results, managed identities are listed under services

Figure 3: Search for Managed in the Azure Portal

I click on Managed Identities, which leads me to an overview of all existing managed identities in my environment.

a screen with an overview of all existing managed identities. At the top, there’s an option to create a new one

Figure 4: Managed Identities Overview

I click on Create in the top left corner, which brings me to a new screen where I can specify the necessary info.

configuration screen of a user assigned managed identity. You need to specify a subscription, resource group, location and name

Figure 5: Configuration of the User Assigned MI

In the configuration, I specify the same subscription, location and resource group as my Logic App and then I give the MI a unique name. When I click on Create at the bottom, Azure will run a validation check. If it passes, I can click on Create again to start the provisioning of my new MI.

the final screen of the managed identity creation process. When clicking create, the identity will be provisioned

Figure 6: Create New User Assigned Managed Identity

In the Logic App page itself, I go to the Identity section. There I have the option to assign a system assigned MI or a user assigned MI.

identity settings for the logic app. By default the system assigned section is shown

Figure 7: Identity Section of a Logic App

I click on User assigned to go to configure the corresponding tab. I click on Add to assign my previously created user assigned MI.

after clicking on Add, you can search for the user assigned managed identity and add it to the logic app

Figure 8: Assigning the MI to the Logic App

In the pop-up, I can search for the MI I created earlier, as shown in Figure 8. I select the MI and click on Add at the bottom.

Create a Connection Using the Managed Identity

Now that the Logic App has an identity, I can configure it to use that identity to connect to my Azure SQL Database. In Figure 1 you can see there’s a connection defined with the name “sql”. This is an API Connection resource which can be found in the resource group of the Logic App.

an overview of all resources in the resource group, with an API connection highlighted

Figure 9: The SQL API Connection

In the API connection itself, I tried to edit the connection information directly. In the dropdown for Authentication Type, there’s an option to use the Logic Apps Managed Identity.

the edit screen of an API connection, with a dropdown for authentication type

Figure 10: Editing an API Connection

However, when I try to save my changes, an error is thrown with no further explanation.

an error screen with the message “failed to edit API connection”

Figure 11: Failure Editing API Connection

It seems I must create a new connection for my Logic App. In the Logic App editor, I go to the “Execute stored procedure” activity and choose to change the connection.

execute stored procedure action with the change connection option highlighted

Figure 12: Change Connection for the SQL Action

In the Connection menu, I choose the option to create a new connection.

connection menu with the option to add a new connection

Figure 13: Create new connection

In the next step, I simply need to specify a name for my new connection and set the authentication type to Logic Apps Managed Identity.

new connection configuration, with the name “sql-mi” specified and the authentication type set to Logic Apps Managed Identity

Figure 14: Configure the connection to use the managed identity

After changing the connection, you may notice that the server name is set to “default” and the database name to “use connection settings”. The name “Test” appears between brackets because it was the database name in the previously used connection. The new connection itself, however, doesn’t store any actual connection information; it only knows it has to use the Logic App’s managed identity.

a Logic App action for executing SQL queries. The server name says “default” and the database name says “use connection settings” with “Test” between brackets

Figure 15: Default server and database name

You can manually override these defaults, but a better option would be to pass the server and database name as parameters. This way, if we deploy our Logic App to another environment, we don’t have to change both names manually but rather supply different parameter values.

In the HTTP trigger, I add two extra fields to the JSON request:

screenshot of the HTTP trigger, with two extra fields highlighted in the JSON request body. One field for the database name and another for the server name.

Figure 16: Extra fields in the JSON request body

The trigger’s full request body schema now becomes:

{
  "properties": {
    "databasename": {
      "type": "string"
    },
    "listname": {
      "type": "string"
    },
    "servername": {
      "type": "string"
    },
    "sourcecol": {
      "type": "string"
    },
    "tablename": {
      "type": "string"
    }
  },
  "type": "object"
}

With those two new fields added to the HTTP trigger, I can now go back to the “Execute stored procedure” action and configure the server and database name fields to use the values from the request body. I do this by selecting the relevant fields from the dynamic content list.

screenshot of the execute stored procedure action with dynamic content for the server and database name. At the right, the dynamic content pane is shown with the output of the HTTP trigger action

Figure 17: Using dynamic content to parameterize an action
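Behind the designer, selecting dynamic content generates workflow expressions against the trigger output. In code view, the server and database inputs of the action would look roughly like the sketch below; the exact property names in the SQL connector’s action definition may differ, but the `@triggerBody()?['...']` expression syntax is how the trigger fields are referenced.

```json
{
  "inputs": {
    "parameters": {
      "serverName": "@triggerBody()?['servername']",
      "databaseName": "@triggerBody()?['databasename']"
    }
  }
}
```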

Configuring Permissions in the Database

The Logic App is now configured to use the managed identity to connect to the Azure SQL Database, but without the appropriate permissions, the stored procedure cannot be executed.

I log into my database using an account that has the necessary permissions to create a new user and assign permissions to that user. First I create a user with the name of the managed identity:

CREATE USER [logic-app-tekkigurus] FROM EXTERNAL PROVIDER;

Next, I grant a set of permissions that allow the newly created user to connect to the database, read and write data, truncate tables (which requires db_ddladmin membership), and execute stored procedures.

GRANT CONNECT TO [logic-app-tekkigurus];
ALTER ROLE db_datareader    ADD MEMBER [logic-app-tekkigurus];
ALTER ROLE db_datawriter    ADD MEMBER [logic-app-tekkigurus];
ALTER ROLE db_ddladmin      ADD MEMBER [logic-app-tekkigurus];
GRANT EXECUTE TO [logic-app-tekkigurus];
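To verify that the role memberships took effect, you can query the database’s catalog views. A quick check, assuming the same user name as above:

```sql
-- List the database roles the managed identity user belongs to
SELECT r.name AS role_name, m.name AS member_name
FROM sys.database_role_members AS drm
JOIN sys.database_principals AS r
  ON drm.role_principal_id = r.principal_id
JOIN sys.database_principals AS m
  ON drm.member_principal_id = m.principal_id
WHERE m.name = 'logic-app-tekkigurus';
```

If the ALTER ROLE statements succeeded, this returns one row for each of the three roles.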

Testing the Logic App

The Logic App can now be tested. In the editor, I select “Run with payload”, which allows me to specify a JSON request body for the HTTP trigger.

run with payload screen of the logic app, with a json body specified

Figure 18: Run the Logic App with a specified payload
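Based on the trigger schema defined earlier, a test payload could look like the following; all values here are hypothetical placeholders, not the ones used in the article:

```json
{
  "servername": "my-sql-server.database.windows.net",
  "databasename": "Test",
  "listname": "MyList",
  "sourcecol": "Title",
  "tablename": "dbo.MyTable"
}
```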

The Logic App runs without error, and it now connects to the Azure SQL Database using the user assigned managed identity. That’s one less password that needs to be stored and managed.

Conclusion

By using managed identities, we can allow a Logic App to connect to another Azure resource without specifying any password, credential, or account key. This is better security practice and reduces management overhead. After all, the fewer passwords we need to remember, the better.

When your environment has many Logic Apps, it’s advisable to use user assigned managed identities. That way, you only need to create one managed identity and reuse it across your Logic Apps.

The post How To Use Managed Identities in your Azure Logic Apps appeared first on Simple Talk.


The best, worst codebase (Interview)


Jimmy Miller talks to us about his experience with a legacy codebase at his first job as a programmer. The codebase was massive, with hundreds of thousands of lines of C# and Visual Basic, and a database with over 1,000 columns. Let’s just say Jimmy got into some stuff. There’s even a Gilfoyle involved. This episode is all about his adventures while working there.



Sponsors:

  • Assembly AI – Turn voice data into summaries with AssemblyAI’s leading Speech AI models. Built by AI experts, their Speech AI models include accurate speech-to-text for voice data (such as calls, virtual meetings, and podcasts), speaker detection, sentiment analysis, chapter detection, PII redaction, and more.
  • Supabase – Supabase just finished their 12th launch week! Check it out. Or get a month of Supabase Pro (FREE) by going to supabase.com/changelogpod
  • SpeakeasyProduction-ready, enterprise-resilient, best-in-class SDKs crafted in minutes. Speakeasy takes care of the entire SDK workflow to save you significant time, delivering SDKs to your customers in minutes with just a few clicks! Create your first SDK for free!
  • Test Double – Find out more about Test Double’s software investment problem solvers at testdouble.com.


Download audio: https://op3.dev/e/https://cdn.changelog.com/uploads/podcast/609/the-changelog-609.mp3

Securing US Elections from Nation-State Adversaries


Editor’s note: On Wednesday, September 18, Brad Smith testified before the Senate Intelligence Committee in a hearing titled, Foreign Threats to Elections in 2024 – Roles and Responsibilities of U.S. Tech Providers. To view the proceedings, please visit Hearings | Intelligence Committee (senate.gov).

In his written testimony, Brad notes that the threats to our democracy from abroad are sophisticated and persistent. We must stand together as a tech community, as leaders, and as a nation to protect the integrity of our elections. We pursue this work guided by two key principles: 

  1. We must uphold the foundational principle of free expression for our citizens.
  2. We must protect the American electorate from foreign nation-state cyber interference.

Securing US Elections from Nation-State Adversaries 

Written Testimony of Brad Smith
Vice Chair and President, Microsoft Corporation 

U.S. Senate Select Committee on Intelligence 

September 18, 2024 

Chairman Warner, Vice Chairman Rubio, Members of the Committee, I appreciate the opportunity to join you and other technology leaders today to discuss the timely and critical issue of protecting US elections from nation-state interference.

Today we are 48 days away from the general election; in some states like Pennsylvania, voters have already begun casting ballots, and three days from now all 50 states will send ballots to military and overseas voters. The election is here, and our adversaries have wasted no time in attempting to interfere. Earlier this week, Microsoft’s Threat Analysis Center (MTAC) reported efforts by our adversaries to interfere in our elections leveraging both old and new tactics. Earlier this month the United States Government sanctioned[1] Russian actors for their attempts to influence the election.[2]

The threats to our democracy from abroad are sophisticated and persistent. We must stand together as a tech community, as leaders, and as a nation to protect the integrity of our elections. We pursue this work guided by two key principles:

  1. We must uphold the foundational principle of free expression for our citizens.
  2. We must protect the American electorate from foreign nation-state cyber interference.

Our adversaries target our democracy in part because they fear the open and free expression it promotes and the success it has brought our country.

Current State of Nation-State Interference

Among Microsoft’s vast team of security professionals, dozens are part of Microsoft’s Threat Analysis Center (MTAC), a team whose mission is to detect, assess, and disrupt cyber influence threats to Microsoft, its customers, and democracies worldwide. Part of MTAC’s mission is protecting elections from nation-state adversaries who seek to use online operations to distort the information going to voters, change the outcome of an election, or interfere in electoral processes.

As MTAC has observed and reported, foreign adversaries are using cyber influence operations to target both political parties in the 2024 U.S. presidential election. In the last two years, Microsoft has detected and analyzed cyber-attacks and cyber-enabled influence operations stemming from Russia, Iran, and China, many of which pertain to elections and elections infrastructure.

This follows similar activity Microsoft has observed in several other countries that recently have held national elections. This includes the 2023 elections in the Netherlands and Slovakia and, in 2024, the Taiwanese, EU, UK and French elections (as well as the 2024 Paris Summer Olympics). Since the beginning of this year, we have been working directly with elected government officials and often with the public to combat these threats. We have used our findings to better understand adversarial behavior and intentions leading into the upcoming 2024 U.S. election, including with respect to nation states’ malicious employment of generative AI, of which we have detected and analyzed many such instances.

Today, we see Iran, Russia, and China using cyber operations to target the U.S. election in November. Iranian operations have targeted candidates of both parties but are inclined to denigrate former President Trump’s campaign, which indicates a preference for a Harris victory. Russian operations, meanwhile, are inclined to denigrate Vice President Harris’s campaign, indicating a preference for a Trump victory. China, for its part, has aimed to collect intelligence and to stoke discord, while to date not showing a clear preference for a specific candidate.

Let me share more about the details in what we have detected so far this year:

Iran

So far in 2024, Iranian election interference mirrors what we observed from Iran in 2020 in tempo, timing, and targets. As we noted in an August 8 report,[3] an Iranian actor we track as Sefid Flood, known for impersonating social and political activist groups, started in March to lay the groundwork for U.S. election cyber-enabled operations. Additionally, Iranian-linked covert propaganda sites and social media networks have begun, and continue, to amplify divisions among Americans across ethnic and religious lines.

In June 2024, Microsoft observed an Iranian actor tracked as Mint Sandstorm compromise a personal account linked to a U.S. political operative. Mint Sandstorm used this access to the political operative’s account to conduct a spear phishing attack on a staff member at a U.S. presidential campaign. Microsoft products automatically detected and blocked this phishing email. Microsoft took additional steps to notify the political operative and the campaign of this activity. Last month, Microsoft detected that Mint Sandstorm compromised additional personal accounts belonging to individuals linked to a U.S. presidential candidate. Microsoft quickly took action to notify these users and assist them in securing their accounts. We expect the pace and persistence of Iran’s cyberattacks and social media provocations will quicken as Election Day approaches in November.

Iran has a history of targeting voters in U.S. swing states. In 2020, an IRGC-directed group, Cotton Sandstorm, posed as the right-wing “Proud Boys” to stoke discord in the U.S. over purportedly fake votes. Using a Proud Boys-named email, Cotton Sandstorm sent emails to Florida residents warning them to “vote for Trump or else!” Cotton Sandstorm’s cyber activity ahead of the operation included scanning of at least one government organization in Florida.

In 2022, ahead of the midterm elections, Microsoft detected Mint Sandstorm targeting county-level government organizations in a few states, a pair of which were tightly contested states in 2020. Similarly, in 2024, we’ve observed another group operating on the IRGC’s behalf, Peach Sandstorm, successfully access an account at a county government in a tightly contested swing state.

We do not know if the IRGC’s targeting of swing states in 2022 or 2024 was election related; in fact, Peach Sandstorm’s targeting was part of a large-scale password spray operation. That said, Iran appears to have demonstrated an interest in U.S. swing states for potential follow-on operations similar to the one ahead of the 2020 elections that sought to sow discord on our electoral process.

Russia

Russian threat actors, the most notable adversary in previous U.S. election cycles, currently are spoofing reputable media outlets and distributing staged videos to spread the Kremlin’s preferred messages to U.S. voters online. In some cases, these campaigns gain a significant number of views and sizeable reach among U.S. and international audiences.

For example, in early May, Microsoft observed a Russia-affiliated influence actor we track as Storm-1516 disseminate a staged video that claimed to show Ukrainian soldiers burning an effigy of former President Trump. The fake video received some international press, which inaccurately covered the video as genuinely originating from Ukraine. The video was reposted across social media and received several million impressions.

Later, after Vice President Harris joined the presidential race, our team saw Storm-1516 pivot its campaigns. In a second video staged in a Storm-1516 operation in late August, several people who are depicted as Harris supporters are shown assaulting an alleged supporter of former President Trump. This video received at least five million impressions. In a third staged video released earlier this month, Storm-1516 falsely claimed that Harris was involved in a hit-and-run incident. This video similarly gained significant engagement, the original video reportedly receiving more than two million views in the week following its release.[4]

We also anticipate that Russian cyber proxies, which disrupted U.S. election websites during the 2022 midterms,[5] may seek to use similar tactics on Election Day in November 2024. In addition to the Russian cyber proxy “RaHDit,” which the U.S. State Department recently revealed as led by Russian intelligence,[6] Microsoft tracks nearly a dozen Russian cyber proxies that regularly use rudimentary cyberattacks to stoke fears about election and government security on social media.

In our August 9 elections report, we revealed a Russian actor that we track as Volga Flood (also known as Rybar) and their efforts to infiltrate U.S. audiences by posing as local activists.[7] Volga Flood created multiple social media accounts called “TexasvsUSA.” The accounts post inflammatory content about immigration at the Southern border and call for mobilization and violence. This month, we’ve seen Volga Flood shift its focus to the Harris-Walz campaign, posting two deceptively edited videos of Vice President Harris on social media.

Volga Flood is publicly positioned as an anonymous military blogger covering the war in Ukraine. In reality, however, Volga Flood is a media enterprise employing dozens of people and headed by EU-sanctioned Russian national Mikhail Zvinchuk. Volga Flood’s media enterprise is divided across multiple teams that include monitoring, regional analytics, illustration, video, foreign languages, and geospatial mapping—all to fulfill its mission statement of waging information warfare on behalf of the Kremlin. Volga Flood publishes analyses through dozens of social media brands and establishes and runs covert social media accounts.

Two additional Russian actors MTAC tracks have largely focused on European audiences but at times shift to U.S. electoral influence. Since March 2022, we have seen the Russian threat actor we track as Ruza Flood, known internationally as “Doppelganger,” attempt to undermine U.S. politics. Ruza Flood receives significant resourcing and direction from the Russian Presidential Administration.[8] The U.S. Justice Department, in its September 4 announcements, revealed Ruza Flood’s efforts to influence the U.S. citizenry through projects like the “Good Old USA Project,” “The Guerilla Media Campaign,” and the “U.S. Social Media Influencers Network Project.”[9]

Finally, Storm-1679, a Russian influence actor previously focused on malign influence operations targeting the 2024 Paris Olympic Games, has recently shifted its focus to the U.S. presidential election.[10] Storm-1679 routinely creates videos masquerading as reputable news services or impersonating international intelligence agencies, including France’s DGSI and the U.S.’s CIA. Storm-1679 recently pivoted to creating videos sowing conspiracies about Vice President Harris, which the actor distributes across a range of social media platforms.

Microsoft’s current tracking of current Russian influence operations targeting elections extends beyond the U.S. presidential election. We are also seeing efforts to influence the upcoming Moldovan presidential election and EU referendum on October 20, 2024. In Moldova, a longstanding target of Russian strategic influence campaigns, we currently observe pro-Kremlin proxy activity aimed at achieving Moscow’s goal of destabilizing democratic institutions and undermining pro-EU sentiment. We and others expect Russia will leverage an array of techniques in Moldova: political influence, electoral interference, cyberattacks, sabotage, and cyber-enabled influence campaigns that promote pro-Kremlin political parties and denigrate the current Moldovan leadership.

Microsoft is working in collaboration with the Moldovan government and others to assist in identifying and defending against Russian cyber and influence operations seeking to influence the outcome of these two elections.

China

Chinese actors’ election efforts are more extensive in 2024 than in previous U.S. election cycles. We observe Chinese influence actors spreading politically charged content over covert social media networks, pretending to be U.S. voters and polling Americans on divisive social issues. Chinese actors have also posed as student protestors online, seeking to stoke division over conflict in the Middle East. These fake accounts, which masquerade largely as U.S. conservative voters along with a handful of progressive personas, frequently ask their followers whether they agree with a political topic or political candidate. This tactic may be for reconnaissance purposes to better understand how Americans view nuanced political issues.

This messaging style may also be part of a broader engagement strategy: Over the past year, these China-linked personas have conducted more tailored audience engagement than observed previously, replying to comments, tagging politicians and political commentators, and creating online groups with likeminded voters. Their content strategy has shifted as well. Rather than producing original infographics and memes that largely failed to resonate with U.S. voters in the past cycle, these personas are creating simple short-form videos taken and edited from mainstream news media. Clips denigrating the Biden administration have successfully reached hundreds of thousands of views.

In July 2024, Microsoft responded to a cyberattack on an organization supporting the upcoming U.S. presidential election. Microsoft worked to remediate and secure the organization’s infrastructure. Subsequent investigation and analysis have attributed this attack to a state-affiliated actor based in China.

These examples, as well as others, underscore the ways in which Iranian, Russian, and Chinese influence actors may seek in the next two months to use social divisions and digital technology to further divide Americans and sow discord ahead of this November’s election. We also need to be vigilant in combatting the risk that nation-state adversaries will seek to conduct cyberattacks directly on key American entities that play critical roles in these elections. More information on these actors can be found in our most recent MTAC report.

Deceptive use of synthetic media (deepfakes)

AI is one tool among many that adversaries may leverage as part of a broader cyber influence campaign. As we have navigated the numerous global elections this year, AI-enabled interference has so far proven less impactful than many had feared. We recognize, however, that determined and advanced actors will continue to explore new tactics and techniques to target democratic countries, including additional and improved use of AI over time.

Though we have not, to date, seen impactful use of AI to influence or interfere in the U.S. election cycle, we do not know what is planned for the coming 48 days, and therefore we will continue to be vigilant in our protections and mitigations, against threats both traditional and novel.

As a leading technology company heavily invested in AI, we recognize our important responsibility to implement proactive measures to counter these risks. This includes developing robust frameworks for detecting and responding to deceptive AI election content, enhancing transparency within AI applications, and fostering international collaboration to protect the democratic process. The future of our elections depends on our collective ability to utilize AI responsibly and ethically.

In response to these challenges, we have taken significant steps, including joining together with twenty-seven of the world’s largest technology companies this year to sign the Tech Accord to Combat Deceptive Use of AI in 2024 Elections.[11] This accord addresses abusive AI-generated content through eight specific commitments, categorized into three pillars: Addressing Deepfake Creation, Detecting and Responding to Deepfakes, and Transparency and Resilience. It represents one of the important steps the tech sector has taken this year to protect our elections, and we appreciate the encouragement and support of this Committee to be more proactive, including through Chairman Warner’s presence and voice at the launch of this accord at the Munich Security Conference in February.

Here are some updates on how Microsoft is directly responding to these threats and upholding our joint commitments:

Addressing Deepfake Creation

We recognize that companies whose products are used to create AI generated content have a responsibility to ensure images and videos generated from their systems include indicators of their origin. One way to accomplish this is through content provenance, enabled by an open standard created by the Coalition for Content Provenance and Authenticity (C2PA).[12] Microsoft is a founding member of C2PA and has leveraged this standard (“content credentials”) across several of our products, ensuring that AI generated content is marked and readable.

Specifically, Microsoft has added content credentials to all images created with our most popular consumer facing AI image generation tools, including Bing Image Creator, Microsoft Designer, Copilot, and in our enterprise API image generation tools via Azure OpenAI. We recently started testing a content credentials display in Word. When images with content credentials are inserted into Word documents, future viewers will be able to right click and see the credits and author information of these images. In addition, C2PA tagged content is starting to be automatically labeled on LinkedIn.[13] The first place you’ll see the content credentials icon is on the LinkedIn feed, and we’ll work to expand our coverage to additional surfaces.

As important as it is to mark content as AI generated, a healthy information ecosystem relies on other indicators of authenticity as well. This is why in April we announced[14] the creation of a pilot program that allows political campaigns in the U.S. and the EU, as well as elections authorities and select news media organizations globally, to access a tool that enables them to easily apply content provenance standards to their own authentic images and videos.

We also joined forces with fellow Tech Accord signatory TruePic[15] to release an app that simplifies the process for participants in the pilot. This app has now launched for both Android and Apple devices and can be used by those enrolled in the Content Credentials program.

Detecting and Responding to Deepfakes

Microsoft is harnessing the data science and technical capabilities of our AI for Good Lab and MTAC teams to better assess whether abusive content—including that created and disseminated by foreign actors—is synthetic or not. Microsoft’s AI for Good lab has developed and is using detection models (image, video) to assess whether media was generated or manipulated by AI. The model is trained on approximately 200,000 examples of AI and real content. The Lab continues to invest in creating sample datasets representing the latest generative AI technology. When appropriate, we call on the expertise of Microsoft’s Digital Crimes Unit to operationalize the early detection of AI-powered criminal activity and respond fittingly, including through the filing of affirmative civil actions to disrupt and deter that activity and through threat intelligence programs and data sharing with customers and governments.

To build on the work of our AI for Good lab, in April we announced[16] that we were joining up with AI researcher, Oren Etzioni[17] and his new non-profit, True Media.[18] True Media provides governments, civil society and journalists with access to free tools that enable them to check whether an image or video was AI generated and/or manipulated. Microsoft’s contribution includes providing True Media with access to Microsoft classifiers, tools, personnel, and data. These contributions will enable True Media to train AI detection models, share relevant data, evaluate and refine new detection models as well as provide feedback on quality and classification methodologies.

We are also empowering candidates, campaigns and election authorities to help us detect and respond to deceptive AI that is targeting elections. In February we launched the Microsoft-2024 Elections site[19] where candidates in a national or federal election can directly report deceptive AI election content on Microsoft consumer services. This reporting tool allows for 24/7 reporting by impacted election entities who have been targeted by deceptive AI found on Microsoft platforms.

Transparency and Resilience

In advance of the EU elections this summer, we kicked off a global effort to engage campaigns and elections authorities. This enabled us to deepen understanding of the possible risks of deceptive AI in elections and empower those campaigns and election officials to speak directly to their voters about these risks and the steps they can take to build resilience and increase confidence in the election. So far this year we have conducted more than 150 training sessions for political stakeholders in 23 countries, reaching more than 4,700 participants. This included training and public education sessions at the Republican and Democratic National Conventions, as well as with state party chairpersons for both major political parties in the United States.

Building on this training, Microsoft also ran public awareness campaigns in the EU ahead of the EU Parliamentary elections,[20] as well as in France[21] and the UK[22] ahead of their national elections. We are now pursuing similar work in the United States ahead of the November general election. This campaign, which is entitled “Check, Recheck, Vote,” educates voters about the possible risks posed by deepfakes and empowers them to take steps to identify trusted sources of election information, look for indicators of trust like content provenance, and pause before they link to or share election content. This includes our ‘Real or Not?’ Quiz, developed by our AI for Good lab to expose users to the challenges of detecting a possible deepfake. So far, individuals from 177 countries have taken the quiz.

Overall, our public awareness campaigns outside the United States have reached more than 350 million people, driving almost three million engagements worldwide. Our U.S. Public Awareness campaign[23] has just begun and already has reached over six million people with over 30,000 engagements.

In May, we announced a series of societal resilience grants in partnership with OpenAI.[24] Grants delivered from the partnership have equipped several organizations, including Older Adults Technology Services (OATS) from AARP, International Institute for Democracy and Electoral Assistance (International IDEA), C2PA, and Partnership on AI (PAI) to deliver AI education and trainings that illuminate the potential of AI while also teaching how to use AI safely and mitigate against the harms of deceptive AI-content.

Protecting Campaign and Election Infrastructure

Since the 2016 election, adversaries have regularly targeted essential systems that support elections and campaigns in the U.S. to advance their cyber enabled influence operations. As mentioned earlier, recent Iranian hacking incidents involved attempts by these actors to provide stolen or allegedly stolen material to the media to propagate narratives of dissent and distrust. This underscores why we continue to invest in efforts that focus on safeguarding the critical infrastructure that underpins our elections.

Our efforts include several initiatives designed to support election officials and political organizations. First, we offer AccountGuard, a no-cost cybersecurity service available to our cloud email customers in 35 countries. This service provides advanced threat detection and notifications against nation-state adversaries for high-risk customers, including those involved in elections. AccountGuard extends beyond commercial customers to individuals at election organizations, their affiliates, and immediate family members who may use personal Microsoft accounts for email. We have observed that sophisticated adversaries often target both professional and personal accounts, amplifying the need for comprehensive protection. More than 5.4 million mailboxes of high-risk users are now protected by AccountGuard globally.

Additionally, our Election Security Advisors program provides federal political campaigns and state election departments with expert security consultation. This includes proactive security assessments or forensic investigations in the event of a cyber incident. Our goal is to ensure that these entities have the necessary support to maintain the integrity of their operations.

For critical election-adjacent systems, such as voter registration databases and voter information portals, we provide our Azure for Election service. This service provides proactive security reviews, resilience assessments, and load analysis. During election week, we offer our highest tier of reactive support to address any security or availability issues that may arise. Since we began offering this service in 2018, we have assisted more than half of U.S. states, including many counties and cities, in reviewing their election IT infrastructure.

In preparation for the election this November, we are also establishing a situation room staffed by our team to provide constant management and triage of any election-sensitive issues and maintain real-time communications with other situation rooms across our industry partners. This ensures that any incidents receive the highest level of priority and executive support.

While we continue to provide robust security services, we recognize that collaboration is essential. Public-private partnerships are crucial in strengthening the entire ecosystem. Our Democracy Forward team actively participates in tabletop cybersecurity training exercises with U.S. election officials at both national and state/county levels.

Microsoft remains steadfast in its commitment to supporting the security and integrity of democratic processes. Through our comprehensive programs and collaborative efforts, we aim to protect democracy from the evolving threats posed by nation-state actors.

Policy Recommendations

Finally, we find ourselves at a moment in history when anyone with access to the Internet can use AI tools to create a highly realistic piece of synthetic media that can be used to deceive: a voice clone of a family member, a deepfake image of a political candidate, or even a doctored government document. AI has made manipulating media significantly easier, quicker, and more accessible, while requiring little skill. As swiftly as AI technology has become a tool, it has become a weapon.

I want to acknowledge and thank this Committee for its longstanding leadership on these important issues. We particularly commend the efforts reflected in section 511 of the SSCI FY 25 Intelligence Authorization Act (IAA), which focuses on protecting technological measures designed to verify the authenticity and provenance of machine-manipulated media. These protections are essential as technology companies strive to provide users with reliable information about the origins of AI-generated content.

We are also encouraged by and supportive of the recent agreement by the Federal Election Commission (FEC)[25] to apply existing restrictions on fraudulent misrepresentation to campaigns' use of AI technology. Existing robocall provisions are another means of addressing the fraudulent use of synthetic content. These provisions have historically restricted the use of artificial or prerecorded voices and allow for enforcement actions when these rules are violated.

Along those lines, it is worth mentioning three ideas that may have an outsized impact in the future fight against deceptive and abusive AI-generated content.

  • First, Congress should enact a new federal “deepfake fraud statute.” We need to give law enforcement officials, including state attorneys general, a standalone legal framework to prosecute AI-generated fraud and scams as these schemes grow in speed and complexity.
  • Second, Congress should require AI system providers to use state-of-the-art provenance tooling to label synthetic content. This is essential to build trust in the information ecosystem and will help the public better understand whether content is AI-generated or manipulated.
  • Third, Congress should pass the bipartisan Protect Elections from Deceptive AI Act, sponsored by Senators Klobuchar, Hawley, Coons, and Collins. This important piece of legislation prohibits the use of AI to generate materially deceptive content falsely depicting federal candidates in political ads to influence federal elections, with important exceptions for parody, satire, and the use of AI-generated content by newsrooms. Such legislation is needed to ensure that bad actors cannot exploit ambiguities in current law to create and distribute deceptive content, and to ensure that candidates for federal office have meaningful recourse if they are the victim of such attacks. Several states have proposed or passed legislation similar to this federal proposal. While the language in these bills varies, we recommend states adopt prohibitions or disclosure requirements on “materially deceptive” AI-generated ads or something akin to that language and that the bills contain exceptions for First Amendment purposes.

Conclusion

In conclusion, we recognize that the protection of electoral integrity and public trust is a shared responsibility and a common good that transcends partisan interests and national borders.

This must be our guiding principle.

Looking ahead, we believe that new forms of multistakeholder action are essential. Initiatives like the Paris Call and Christchurch Call have demonstrated positive global impacts by uniting representatives from governments, the tech sector, and civil society. In addressing the challenges posed by deepfakes and other technological issues, it is evident that no single sector of society can solve these complex problems in isolation. Collaboration is crucial to preserving our timeless values and democratic principles amidst rapid technological change.

Thank you for your time and consideration. I look forward to answering any questions you may have.

[1] Treasury Takes Action as Part of a U.S. Government Response to Russia’s Foreign Malign Influence Operations | U.S. Department of the Treasury

[2] Office of Public Affairs | Justice Department Disrupts Covert Russian Government-Sponsored Foreign Malign Influence Operation Targeting Audiences in the United States and Elsewhere | United States Department of Justice

[3] Iran Targeting 2024 US Election – Microsoft On the Issues

[4] https://uk.news.yahoo.com/russia-spread-fake-rumour-kamala-153333198.html

[5] https://www.usatoday.com/story/news/politics/elections/2022/11/08/2022-midterm-websites-mississippi-hit-cyber-attack/8308615001/

[6] https://www.state.gov/u-s-department-of-state-takes-actions-to-counter-russian-influence-and-interference-in-u-s-elections

[7] https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/final/en-us/microsoft-brand/documents/5bc57431-a7a9-49ad-944d-b93b7d35d0fc.pdf

[8] https://www.justice.gov/opa/media/1366261/dl

[9] https://www.justice.gov/opa/pr/justice-department-disrupts-covert-russian-government-sponsored-foreign-malign-influence

[10] https://blogs.microsoft.com/on-the-issues/2024/06/02/russia-cyber-bots-disinformation-2024-paris-olympics/

[11] AI Elections accord – A Tech accord to Combat Deceptive Use of AI in 2024 Elections

[12] Overview – C2PA

[13] LinkedIn Adopts C2PA Standard | LinkedIn

[14] Expanding our Content Integrity tools to support global elections – Microsoft On the Issues

[15] Truepic’s Secure Camera Enhances Microsoft’s Content Integrity Tools – Truepic

[16] TrueMedia.org to Enhance Deepfake Detection Capabilities · TrueMedia

[17] An A.I. Researcher Takes On Election Deepfakes – The New York Times (nytimes.com)

[18] TrueMedia.org

[19] Microsoft-2024 Elections

[20] Addressing the deepfake challenge ahead of the European elections – EU Policy Blog (microsoft.com)

[21] Microsoft s’engage dans la préservation de la sincérité des élections législatives en France – News Centre

[22] Combating the deceptive use of AI in elections (microsoft.com)

[23] Combating the deceptive use of AI in US elections (microsoft.com)

[24] Microsoft and OpenAI launch Societal Resilience Fund – Microsoft On the Issues

[25] showpdf.htm (fec.gov)

The post Securing US Elections from Nation-State Adversaries appeared first on Microsoft On the Issues.
