Introducing pulumi do: Direct Resource Operations for Any Cloud

Infrastructure as code is the right model for production systems. State tracking, drift detection, and repeatable deployments all matter when you’re managing real workloads.

But sometimes, you also need a quick, one-off interaction with the cloud: create a bucket or a database, look up a VPC, delete a stray resource.

Today we’re introducing pulumi do, a new command for direct resource operations. With pulumi do, you can create, read, update, delete, and query any cloud resource from the terminal with a single command, across thousands of Pulumi-supported providers — no project, code, or state required.

The problem: Sometimes IaC is more than you need

When you’re managing production workloads, IaC is the proven solution. Code lets you declare complex systems, state tracking catches drift before it becomes a problem, dependency graphs sequence changes safely, and policy keeps everything in bounds. That full lifecycle, especially with the backing of a platform like Pulumi Cloud, is exactly what you want to build systems that scale.

But when you (or your coding agent) need an ad-hoc Postgres database, the simplest path with IaC still takes several steps: make a directory, create a project, configure your credentials, write the code, preview, deploy. It works, but it’s not always necessary for what should be a simple operation. pulumi do collapses all of those steps into one, using the same Pulumi providers, resource model, and ecosystem that powers the core Pulumi platform.

Resource creation is also only part of the problem. As Joe laid out in The Agentic Infrastructure Era, the real challenge for AI agents isn’t with code or CLI commands, it’s with everything else: getting a cloud account, resolving credentials, wiring configuration across multiple services. Agent accounts, also released this week, simplify this by letting an agent provision its own ephemeral Pulumi Cloud account, and Pulumi ESC takes care of consolidating credentials across providers. Together, with pulumi do, agents can now go from zero to deployed infrastructure without requiring a human in the loop — and when that one-off resource needs to grow into a more permanent system, there’s a clear graduation path back to full Pulumi IaC.

What it looks like

As an example, say you wanted to provision an S3 bucket. With the AWS CLI, you’d need to assemble an aws s3api create-bucket invocation with the right set of command-line flags, region constraints, a globally unique name, and so on. With pulumi do, it’s just this:

$ pulumi do aws:s3:Bucket create

That might not look all that different on the surface — but because you’re using the Pulumi engine and resource model, you can provide a minimal set of input properties, take advantage of provider-defined defaults, and use Pulumi’s auto-naming feature to give the bucket a unique name automatically:

$ pulumi do aws:s3:Bucket create

This will create aws:s3/bucket:Bucket with the following inputs:
{
 "bucket": "bucket-279ea56",
 "tagsAll": {}
}

Please confirm that this is what you'd like to do by typing `yes`:

Answer yes (or just pass --yes), and you’re done. To delete the bucket:

$ pulumi do aws:s3:Bucket delete bucket-279ea56 --yes

Need to look up an existing resource? Use a provider function:

$ pulumi do aws:ec2:getVpc --default

{
 "arn": "arn:aws:ec2:us-west-2:663782525873:vpc/vpc-d7b311af",
 "cidrBlock": "172.31.0.0/16",
 "enableDnsHostnames": true,
 "enableDnsSupport": true,
 "enableNetworkAddressUsageMetrics": false,
 "id": "vpc-d7b311af",
 ...
}

Same CLI, same output contract, same provider ecosystem.

The command shape

The do command accepts a Pulumi resource type, or type token, to determine the action to take. Type tokens have the form <package:module:resource>. For example, aws:s3:Bucket refers to the Amazon S3 Bucket resource that belongs to the s3 module of the aws package.
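
As a rough illustration of the token shape, here is a minimal Python sketch that splits a full type token into its three segments. The helper name parse_type_token and the TypeToken tuple are hypothetical and not part of the Pulumi CLI; they only mirror the package:module:resource form described above.

```python
from typing import NamedTuple


class TypeToken(NamedTuple):
    package: str
    module: str
    name: str


def parse_type_token(token: str) -> TypeToken:
    """Split a full 'package:module:resource' token (hypothetical helper)."""
    parts = token.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected package:module:resource, got {token!r}")
    return TypeToken(*parts)


print(parse_type_token("aws:s3:Bucket"))
```

The sketch deliberately rejects partial tokens; the CLI itself accepts them for discovery.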

You can also provide a portion of the token to help you find what you’re looking for without ever having to leave the terminal:

$ pulumi do aws:s3

Functions and resources for the s3 module.

Run 'pulumi do <module/resource/function> --help' for more details on usage.

Functions:
 aws:s3:getAccessPoint
 aws:s3:getAccountPublicAccessBlock
 aws:s3:getBucket
 aws:s3:getBucketObject
 ...

Resources:
 aws:s3:AccessPoint
 aws:s3:AccountPublicAccessBlock
 aws:s3:AnalyticsConfiguration
 aws:s3:Bucket
 ...

$ pulumi do aws:s3:Bucket read bucket-d20976f

{
 "arn": "arn:aws:s3:::bucket-d20976f",
 "bucket": "bucket-d20976f",
 "bucketDomainName": "bucket-d20976f.s3.amazonaws.com",
 "bucketNamespace": "global",
 ...
}

The package, module, and resource/function segments all come directly from the Pulumi provider schema, so --help works at every level of the tree. Pass a package name, optional module, and optional function or resource type, and do returns the appropriate level of detail.

You can also provide the input properties of a resource in a YAML or JSON file with the --input and --input-file options. For example, to create a container service in Google Cloud Run:

# service.yaml
location: us-central1
deletionProtection: false
template:
  containers:
    - image: us-docker.pkg.dev/cloudrun/container/hello

$ pulumi do gcp:cloudrunv2:Service create \
    --input yaml \
    --input-file service.yaml

This will create gcp:cloudrunv2/service:Service with the following inputs:
{
  "deletionProtection": false,
  "location": "us-central1",
  "name": "service-b8af752",
  "template": {
    "containers": [
      {
        "image": "us-docker.pkg.dev/cloudrun/container/hello"
      }
    ]
  }
}

The result:

{
 "createTime": "2026-05-22T23:00:22.415839Z",
 ...
 "urls": [
 "https://service-b8af752-921927215178.us-central1.run.app",
 "https://service-b8af752-ctnulmzwoa-uc.a.run.app"
 ]
}

Resource operations

Most resources support the full set of CRUD operations — create, read, update, delete, and list — directly from the CLI. Each operation maps to a provider CRUD method using the same provider logic a full Pulumi program would use, and resources are addressable by their cloud provider IDs:

# Create a resource
$ pulumi do aws:s3:Bucket create --yes | jq -r ".name"
bucket-4f5cb22

# Fetch it
$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq -r ".hostedZoneId"
Z3BJ6K6RIION7M

# Update/patch it
$ pulumi do aws:s3:Bucket patch bucket-4f5cb22 --input yaml --input-file tags.yaml

$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq ".tags"
{
 "key": "value"
}

# Delete it
$ pulumi do aws:s3:Bucket delete bucket-4f5cb22

Provider configuration

Today, pulumi do resolves provider configuration — for example, applying your AWS credentials — using environment variables or credential files as supported by each individual Pulumi provider. See the Pulumi Registry for provider-specific configuration details.

Designed for humans and agents

We’ve designed pulumi do to serve humans and coding agents equally well, guided by three fundamental ideas:

  • Consistent command structure across every provider. The do <package:module:type> <operation> pattern is the same for AWS, Azure, Google Cloud, Kubernetes, Cloudflare, Datadog, and every provider, including packages containing higher-level component resources. Once an agent learns that pattern, it applies across the board.

  • Predictable output contract. JSON on stdout, progress on stderr, consistent exit codes. An agent can parse the result programmatically without scraping human-formatted tables.

  • A single CLI command that works across every cloud. Many cloud and SaaS providers don’t have a full CLI at all. pulumi do generates commands from the provider schema, so if a Pulumi provider exists for it, the CLI just works. Neither humans nor agents need to install, learn, or even know about cloud provider-specific tooling.
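
To make the output contract concrete, here is a minimal sketch of how an agent-side wrapper might consume it, assuming only what the list above states: JSON on stdout, progress on stderr, and a non-zero exit code on failure. The run_json helper is hypothetical, and the stand-in command exists only so the sketch runs without Pulumi installed.

```python
import json
import subprocess
import sys


def run_json(argv: list[str]) -> dict:
    """Run a command and parse its stdout as JSON; raise on a non-zero exit."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # Progress and diagnostics are expected on stderr, so surface them on failure.
        raise RuntimeError(f"command failed ({proc.returncode}): {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Stand-in for a real invocation: progress goes to stderr, JSON goes to stdout.
stand_in = [
    sys.executable,
    "-c",
    "import sys, json; print('working...', file=sys.stderr); "
    "print(json.dumps({'bucket': 'bucket-279ea56'}))",
]
result = run_json(stand_in)
print(result["bucket"])
```

With the CLI installed, the argument list could instead be something like ["pulumi", "do", "aws:s3:Bucket", "read", "bucket-279ea56"], following the read example shown earlier.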

What’s next

Resource operations and provider functions are the foundation. The pulumi do roadmap extends the same direct-operation model with credential management, state tracking, and a path to full IaC.

Unified credentials with Pulumi ESC

One of the hardest parts of multi-cloud operations is credential management. Every provider has its own authentication scheme, environment variables, and session lifecycle. An agent working across AWS, Cloudflare, and Datadog today manages three separate credential mechanisms.

We’re building Pulumi ESC integration into pulumi do so you can manage credentials in one place and resolve them everywhere. ESC handles credential resolution (including OIDC-based dynamic credential generation and short-lived tokens) across all of your providers. Name the credential set, reference it, and ESC does the rest, with rotation, RBAC, and audit built in.

Cross-resource references

Real infrastructure has dependencies — subnets need VPCs, security group rules need their security groups, and so on. When you’re building resources one at a time, those references need to flow between commands somehow.

A future version of pulumi do will let resource inputs reference outputs from previously created resources, allowing the CLI to resolve them automatically and preserve the dependency graph. Later, when the time comes to graduate to a full IaC program, the generated code contains proper resource references rather than hard-coded strings.

Stateful mode and the graduation path

Today, pulumi do is stateless. Each command runs independently. A planned stateful mode will persist resource state across operations, enabling drift detection, lifecycle management, and a graduation path to full infrastructure as code.

Here’s what we’re planning:

  1. Zero setup. Your first pulumi do implicitly creates a project and stack. No manual initialization.

  2. Accumulate resources. Each operation stores resource state. After a few commands, you have a lightweight representation of your infrastructure.

  3. Eject to a full project. When the time comes, generate a Pulumi project in your chosen language with all resources imported and dependency graphs intact.

  4. Connect to Pulumi Cloud. Layer on governance, compliance, team collaboration, and deployment automation through Pulumi Cloud. Resources created via pulumi do can be governed by Pulumi Insights from day one, even before you opt into full IaC.

This path works because pulumi do uses the same providers, resource types, and property schemas as every other pulumi operation. Provisioned cloud resources stay where they are while management capabilities are added as needed.

Get started

pulumi do ships as a research preview in Pulumi CLI v3.242.0 and later. Install or update the CLI, install a provider plugin, and start running commands. The documentation has the full reference.

We can’t wait to hear your feedback. Give it a try today, tell us what works (and what doesn’t), and help shape the CLI that agents and humans both reach for first.


Regex support for LOB types in T-SQL—available in Azure SQL & SQL Server 2025

At a glance — Native regular expression (regex) functions in T-SQL now accept varchar(max) and nvarchar(max) inputs of up to 2 MB across all seven regex functions, including the two table-valued functions (REGEXP_MATCHES and REGEXP_SPLIT_TO_TABLE). This capability ships in SQL Server 2025 CU5 and is already available in Azure SQL Database, SQL Database in Fabric and Azure SQL Managed Instance configured with the Always-up-to-date update policy. It will reach Managed Instances on the SQL Server 2025 update policy as part of the CU5 rollout. You no longer need to split log files, HTML documents, or large JSON payloads into 8,000-byte chunks just to run a pattern match.

1. Introduction

Regular expressions have long been a cornerstone of modern data processing — used for validation, parsing, transformation, and extracting structured insights from unstructured text. With SQL Server 2025 and Azure SQL, regex is now a first-class T-SQL capability, removing the historical need to rely on SQLCLR functions or application-tier processing.

While the initial release made native regex broadly available, large-object (LOB) inputs were not yet supported on every function. CU5 closes that gap.

Under the hood, T-SQL regex implements POSIX Extended Regular Expression (ERE) semantics, augmented by a curated set of Perl-style features, and is powered by the RE2 engine. RE2 is a linear-time, non-backtracking implementation, which means it is not susceptible to catastrophic backtracking (a class of denial-of-service issue commonly known as ReDoS). That guarantee becomes far more important when the input is a 1.8 MB log blob than when it is an 8,000-byte string.

Release timeline

Milestone | What shipped
Ignite 2025 (General Availability) | Regex went GA in SQL Server 2025 and Azure SQL. LOB inputs were initially supported only on REGEXP_LIKE, REGEXP_COUNT, and REGEXP_INSTR. LOB support on REGEXP_REPLACE and REGEXP_SUBSTR was deferred, and the two table-valued functions (TVFs) accepted only non-LOB string types.
Azure SQL (post-GA service updates) | LOB inputs enabled across all seven functions.
SQL Server 2025 CU5 | LOB inputs up to 2 MB enabled on all seven functions in SQL Server.

What’s new in CU5

  • varchar(max) and nvarchar(max) inputs are accepted on every regex function.
  • The input string is capped at 2 MB per function call. The pattern is still capped at 8,000 bytes, which is far larger than any maintainable regular expression should ever need.
  • Behavior is consistent between Azure SQL and SQL Server, so code you write today is fully portable.

Note — The 2 MB limit applies to the input passed to a single function call, not to the column or row. A single value in a varchar(max) column can still store up to 2 GB; the constraint is that no single regex evaluation can consume more than 2 MB of that value.

Prerequisites

  • One of the following: SQL Server 2025 CU5 or later; Azure SQL Database; SQL Database in Fabric; or Azure SQL Managed Instance configured with the SQL Server 2025 or Always-up-to-date update policy.
  • The two table-valued functions (REGEXP_MATCHES and REGEXP_SPLIT_TO_TABLE) require database compatibility level 170, unless the database-scoped configuration ALLOW_BUILTIN_TVF_IN_ALL_COMPAT_LEVELS (preview) is enabled.

Note — On Azure SQL Managed Instance (Always-up-to-date), this capability is rolling out region by region. It is already live in regions where the rollout has completed and will light up in the remaining regions as the deployment finishes. Instances on the SQL Server 2025 update policy will receive it as part of the CU5 rollout — coming soon.

Verify the compatibility level (170 is required for the TVFs):

SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- If necessary:
-- ALTER DATABASE [<your-database>] SET COMPATIBILITY_LEVEL = 170;

2. Working with LOB Data

This section demonstrates the CU5 capabilities against realistic LOB data. We build a LogEntries table whose RawPayload column holds multi-KB to multi-MB chunks of web server and application output, plus an HtmlPages table for HTML cleansing examples.

2.1 Create the sample schema and data

IF OBJECT_ID('dbo.LogEntries', 'U') IS NOT NULL DROP TABLE dbo.LogEntries;
IF OBJECT_ID('dbo.HtmlPages',  'U') IS NOT NULL DROP TABLE dbo.HtmlPages;

CREATE TABLE dbo.LogEntries
(
    LogId       BIGINT IDENTITY(1,1) PRIMARY KEY,
    Source      SYSNAME       NOT NULL,
    IngestedAt  DATETIME2(3)  NOT NULL DEFAULT SYSUTCDATETIME(),
    RawPayload  VARCHAR(MAX)  NOT NULL   -- LOB column
);

CREATE TABLE dbo.HtmlPages
(
    PageId      INT IDENTITY(1,1) PRIMARY KEY,
    Url         NVARCHAR(2048) NOT NULL,
    Body        NVARCHAR(MAX)  NOT NULL  -- LOB column (Unicode)
);

Now generate realistically large rows. The REPLICATE(CAST(... AS varchar(max)), n) pattern is required because REPLICATE truncates its result at 8,000 bytes unless its first argument is a max type.

-- Synthetic web access-log payload (~252 KB in row 1, plus a separate ~586 KB row).
DECLARE @logLine VARCHAR(500) =
    '127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200 1532 ' +
    'user-agent="Mozilla/5.0" ip=10.0.0.7 email=alice@contoso.com card=4111-1111-1111-1234' + CHAR(10);

DECLARE @bigLog VARCHAR(MAX) =
    REPLICATE(CAST(@logLine AS VARCHAR(MAX)), 1500)                -- ~252 KB
    + '127.0.0.1 - mallory [21/May/2026:10:16:01 +0000] "POST /login HTTP/1.1" 500 0 ' +
      'ip=203.0.113.99 ssn=123-45-6789' + CHAR(10);

INSERT INTO dbo.LogEntries (Source, RawPayload) VALUES
    ('web-01', @bigLog),                                            -- ~252 KB
    ('web-02', REPLICATE(CAST('OK ' AS VARCHAR(MAX)), 200000));     -- ~586 KB

-- Synthetic HTML page (~775 KB / ~396,000 characters).
DECLARE @htmlChunk NVARCHAR(MAX) =
    N'<div class="row"><p>Hello <b>world</b>! Contact <a href="mailto:bob@contoso.com">bob</a>.</p></div>';

INSERT INTO dbo.HtmlPages (Url, Body) VALUES
    (N'https://contoso.example/page-1',
     N'<html><head><title>Big Page</title></head><body>'
     + REPLICATE(@htmlChunk, 4000)
     + N'</body></html>');

-- Confirm payload sizes in bytes.
SELECT LogId, Source, DATALENGTH(RawPayload) AS PayloadBytes FROM dbo.LogEntries;
SELECT PageId, DATALENGTH(Body) AS BodyBytes, LEN(Body) AS BodyChars FROM dbo.HtmlPages;

Results:

LogId | Source | PayloadBytes
1 | web-01 | 258,110
2 | web-02 | 600,000

PageId | BodyBytes | BodyChars
1 | 792,124 | 396,062

Before CU5, feeding any of these payloads into REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_MATCHES, or REGEXP_SPLIT_TO_TABLE would have failed with a type-mismatch error or required a LEFT(RawPayload, 8000)-style truncation. The same queries now run end-to-end.

2.2 REGEXP_LIKE — Filter rows by LOB content

-- Find logs that contain at least one HTTP 5xx response.
SELECT LogId, Source, DATALENGTH(RawPayload) AS PayloadBytes
FROM   dbo.LogEntries
WHERE  REGEXP_LIKE(RawPayload, '"[A-Z]+\s[^"]+\sHTTP/1\.[01]"\s5[0-9]{2}\s');

REGEXP_LIKE is a Boolean predicate: it evaluates to true when the pattern matches anywhere in the input and false otherwise. Because it returns a Boolean rather than a bit, use it directly in WHERE, CASE WHEN, IIF, or CHECK constraint contexts — do not compare it with = 1 or = 0 (the parser rejects that syntax).

Note: REGEXP_LIKE itself requires database compatibility level 170. The other scalar regex functions (REGEXP_COUNT, REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR) are available at all compatibility levels.

Results:

LogId | Source | PayloadBytes
1 | web-01 | 258,110

2.3 REGEXP_COUNT — Counting at scale

-- Per-row tally of GET requests, POST requests, and 5xx responses
-- across the entire LOB payload.
SELECT LogId,
       Source,
       REGEXP_COUNT(RawPayload, '"GET\s')        AS Gets,
       REGEXP_COUNT(RawPayload, '"POST\s')       AS Posts,
       REGEXP_COUNT(RawPayload, '\s5[0-9]{2}\s') AS ServerErrors
FROM   dbo.LogEntries;

Results:

LogId | Source | Gets | Posts | ServerErrors
1 | web-01 | 1,500 | 1 | 1
2 | web-02 | 0 | 0 | 0

2.4 REGEXP_INSTR — Locate the first error

-- 1-based character position (or 0 if no match) of the FIRST 5xx response in each payload.
SELECT LogId,
       Source,
       REGEXP_INSTR(RawPayload, '\s5[0-9]{2}\s', 1, 1, 0) AS FirstErrorPos
FROM   dbo.LogEntries;

Parameter recap: REGEXP_INSTR(string, pattern, start, occurrence, return_option [, flags [, group ]]). A return_option of 0 returns the starting position of the match; 1 returns the position immediately after the last character of the match.
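
The two return_option values are easy to mix up with 0-based offsets, so here is a small Python sketch of the same position arithmetic for the first match only. The function regexp_instr_sketch is a hypothetical stand-in, not the T-SQL function, and Python's re module is a different engine from RE2; only the 1-based positions are being illustrated.

```python
import re


def regexp_instr_sketch(text: str, pattern: str, return_option: int = 0) -> int:
    """Return 1-based positions for the first match, or 0 when nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return 0
    # Python offsets are 0-based, and match.end() already points one past the
    # last matched character, so both values shift by one.
    return match.start() + 1 if return_option == 0 else match.end() + 1


sample = "a 500 b"
print(regexp_instr_sketch(sample, r"\s5[0-9]{2}\s", 0))  # start of the match
print(regexp_instr_sketch(sample, r"\s5[0-9]{2}\s", 1))  # position after the match
```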

Results:

LogId | Source | FirstErrorPos
1 | web-01 | 258,072
2 | web-02 | 0

2.5 REGEXP_REPLACE — Redact sensitive data in place

PII redaction over LOB payloads was one of the most-requested CU5 scenarios. Before CU5, it required a custom chunked-replace routine; it is now a single expression.

-- Redact credit-card-shaped tokens, U.S. SSN-shaped tokens, and email addresses
-- across the entire payload.
SELECT LogId,
       REGEXP_REPLACE(
           REGEXP_REPLACE(
               REGEXP_REPLACE(
                   RawPayload,
                   '\b[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}\b',
                   '****-****-****-****'),
               '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b',
               '***-**-****'),
           '\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b',
           '[redacted-email]'
       ) AS RedactedPayload
FROM   dbo.LogEntries;

Or strip every HTML tag from an nvarchar(max) page in a single call:

SELECT PageId,
       LEN(Body)                                     AS OriginalLen,
       LEN(REGEXP_REPLACE(Body, N'<[^>]+>', N''))    AS TextOnlyLen
FROM   dbo.HtmlPages;

Results — the ~775 KB HTML document collapses from 396,062 to 100,008 characters of plain text in a single call:

PageId | OriginalLen | TextOnlyLen
1 | 396,062 | 100,008
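
The character counts can be cross-checked outside the database. The Python sketch below rebuilds the same synthetic page from the sample data and applies the same tag-stripping pattern; it uses Python's re module rather than the RE2 engine, so it verifies only the arithmetic, not engine behavior.

```python
import re

# Same chunk and wrapper as the sample INSERT statement.
chunk = ('<div class="row"><p>Hello <b>world</b>! Contact '
         '<a href="mailto:bob@contoso.com">bob</a>.</p></div>')
page = ("<html><head><title>Big Page</title></head><body>"
        + chunk * 4000
        + "</body></html>")

# Same pattern as the REGEXP_REPLACE call: remove anything between < and >.
text_only = re.sub(r"<[^>]+>", "", page)

print(len(page), len(text_only))
```

Each chunk is 99 characters, of which 25 are visible text, so 4,000 chunks plus the 62-character wrapper give 396,062 characters, and the text that remains is 4,000 × 25 plus the 8 characters of the title, or 100,008.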

 

2.6 REGEXP_SUBSTR — Extract a single value

-- Pull the first IPv4 address out of each log payload.
SELECT LogId,
       REGEXP_SUBSTR(RawPayload,
                     '\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b',
                     1,    -- start position
                     1,    -- occurrence
                     'c',  -- flags: case-sensitive
                     0     -- group: 0 returns the whole match
                    ) AS FirstIp
FROM   dbo.LogEntries;

To return the contents of a specific capture group instead of the entire match, pass its 1-based group number as the final argument.

Results:

LogId | FirstIp
1 | 127.0.0.1
2 | NULL

2.7 REGEXP_MATCHES — Every match, set-based

This is where the combination of TVF and LOB delivers the largest productivity gain: extract every structured value from a megabyte of unstructured text in a single set-based query, with no client round-trips.

REGEXP_MATCHES returns one row per match with these columns:

Column | Type | Description
match_id | bigint | Sequence number of the match (1-based).
start_position | int | 1-based start index of the match.
end_position | int | 1-based end index of the match.
match_value | same type as string_expression | The entire matched substring.
substring_matches | json | JSON array describing each capture group, with the shape [{"value":"…","start":N,"length":N}, …].

-- Every email address in every log payload, alongside its row of origin.
SELECT  l.LogId,
        m.match_id,
        m.match_value AS EmailFound
FROM    dbo.LogEntries AS l
CROSS APPLY REGEXP_MATCHES(
        l.RawPayload,
        '\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b'
) AS m
ORDER BY l.LogId, m.match_id;

Capture groups are even more useful — you can project the parts of every log line as columns by reading from the substring_matches JSON document:

-- Parse Common-Log-Format-ish entries into ip, user, status, and bytes columns.
-- The pattern has four capture groups, accessed below as $[0] through $[3].
SELECT  l.LogId,
        m.match_id,
        JSON_VALUE(m.substring_matches, '$[0].value') AS Ip,
        JSON_VALUE(m.substring_matches, '$[1].value') AS UserName,
        JSON_VALUE(m.substring_matches, '$[2].value') AS Status,
        JSON_VALUE(m.substring_matches, '$[3].value') AS Bytes
FROM    dbo.LogEntries AS l
CROSS APPLY REGEXP_MATCHES(
        l.RawPayload,
        '^([0-9.]+)\s-\s(\S+)\s\[[^\]]+\]\s"[^"]+"\s([0-9]{3})\s([0-9]+)',
        'm'    -- multi-line: ^ and $ anchor to each line, not just the whole input
) AS m
ORDER BY l.LogId, m.match_id;

Important — Without the 'm' flag, the ^ anchor matches only at the start of the entire 250 KB input, so you would receive exactly one match for the first line. The multi-line flag is what unlocks per-line extraction.
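
The anchor behavior is not specific to T-SQL, so it can be illustrated with a short Python sketch. This uses Python's re module, which is a different engine from RE2, purely to show how a multi-line flag changes what ^ matches; the two log lines are shortened versions of the sample data.

```python
import re

log = (
    '127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200 1532\n'
    '127.0.0.1 - mallory [21/May/2026:10:16:01 +0000] "POST /login HTTP/1.1" 500 0\n'
)
pattern = r'^([0-9.]+)\s-\s(\S+)\s\[[^\]]+\]\s"[^"]+"\s([0-9]{3})\s([0-9]+)'

# Without the flag, ^ matches only at the very start of the input.
single = re.findall(pattern, log)
# With the flag, ^ also matches at the start of every line.
multi = re.findall(pattern, log, flags=re.MULTILINE)

print(len(single), len(multi))
for ip, user, status, size in multi:
    print(ip, user, status, size)
```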

Results (first two parsed rows):

LogId | match_id | Ip | UserName | Status | Bytes
1 | 1 | 127.0.0.1 | alice | 200 | 1532
1 | 2 | 127.0.0.1 | alice | 200 | 1532

2.8 REGEXP_SPLIT_TO_TABLE — Shred a LOB into rows

-- Project the entire log payload as one row per non-empty line.
SELECT  l.LogId,
        s.ordinal AS [LineNo],
        s.value   AS LineText
FROM    dbo.LogEntries AS l
CROSS APPLY REGEXP_SPLIT_TO_TABLE(l.RawPayload, '\r?\n') AS s
WHERE   l.LogId = 1
  AND   s.value <> ''
ORDER BY s.ordinal;

You now have a tabular projection of a multi-megabyte text blob without leaving the engine. You can feed it into a CTE, aggregate it, join it to dimension tables, or materialize it into a staging table — all set-based.

Results (first three rows):

LogId | ordinal | LineText (first 80 chars)
1 | 1 | 127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200
1 | 2 | 127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200
1 | 3 | 127.0.0.1 - alice [21/May/2026:10:15:32 +0000] "GET /api/orders/42 HTTP/1.1" 200

Tip (composing LOB regex pipelines): CROSS APPLY (and OUTER APPLY when you need to preserve rows that produce no matches) is the primary composition primitive. You can stack REGEXP_SPLIT_TO_TABLE (lines) feeding REGEXP_MATCHES (fields per line) feeding ordinary aggregates, all within a single query plan.

2.9 The 2 MB ceiling — strategies for larger inputs

The 2 MB limit applies to the input string of a single regex call. If the value passed to a regex function exceeds 2 MB, the call raises an error (error number 19311, severity 16) rather than silently truncating. That is the intended behavior — silent truncation would hide correctness bugs.

In practice, 2 MB is a generous ceiling: a single log file or HTML document of that size is already unusual, and most real-world LOB data sit comfortably below it. When individual values do exceed the limit, the most reliable approach is to split them into smaller logical units before they land in the column you want to query — for example, by writing one log line, one document section, or one record per row at ingestion time. Because every regex function (including the two TVFs) shares the same 2 MB ceiling, sharding at query time is not generally feasible; doing it at the load path keeps every regex call well under the limit and avoids per-query workarounds.
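
One way to picture splitting at the load path is a small ingestion-side helper that groups whole lines into chunks under the ceiling before they are written as separate rows. The Python sketch below is an assumption about how such a loader could look, not a documented tool; split_on_lines is a hypothetical name, and the limit is measured in UTF-8 bytes.

```python
LIMIT_BYTES = 2 * 1024 * 1024  # the 2 MB ceiling quoted above


def split_on_lines(payload: str, limit: int = LIMIT_BYTES) -> list[str]:
    """Group whole lines into chunks whose UTF-8 size stays at or under limit."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in payload.splitlines(keepends=True):
        line_bytes = len(line.encode("utf-8"))
        if line_bytes > limit:
            raise ValueError("a single line exceeds the limit; pick a finer unit")
        if current and size + line_bytes > limit:
            # Adding this line would cross the limit, so close the current chunk.
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += line_bytes
    if current:
        chunks.append("".join(current))
    return chunks


payload = "OK line\n" * 1_000_000  # 8,000,000 bytes of ASCII
parts = split_on_lines(payload)
print(len(parts), max(len(p.encode("utf-8")) for p in parts))
```

Each returned chunk would then be inserted as its own row, so every later regex call sees a value that already fits.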

Bytes vs. characters — The 2 MB limit is measured in bytes, not characters, and the byte count is based on the UTF-8 encoding of the input regardless of the column’s declared type. ASCII characters take 1 byte each, so plain ASCII text can run to roughly two million characters; non-ASCII characters take 2–4 bytes in UTF-8, so fewer characters fit. Keep in mind that DATALENGTH() reports storage size in the column’s own encoding, which may differ from the UTF-8 byte count used by the limit, and LEN() (which counts characters) is best avoided as a sizing check here.

To measure the UTF-8 byte length that the limit actually checks, cast the value to varchar(max) under a UTF-8 collation and take its DATALENGTH:

SELECT DATALENGTH(
           CONVERT(varchar(max),
                   Body COLLATE Latin1_General_100_CI_AS_SC_UTF8)
       ) AS Utf8Bytes
FROM   dbo.HtmlPages;

Anything above 2 * 1024 * 1024 (2,097,152) bytes will be rejected by a regex call on that value.
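
The same check can be made on the client before a value is sent to the database. This is a minimal Python sketch under the assumption stated above, that the limit is 2,097,152 bytes of UTF-8; the helper names are hypothetical.

```python
LIMIT_BYTES = 2 * 1024 * 1024  # 2,097,152


def utf8_size(value: str) -> int:
    """Size of the value in bytes once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def fits_regex_limit(value: str) -> bool:
    """True when the value is at or under the 2 MB ceiling."""
    return utf8_size(value) <= LIMIT_BYTES


ascii_text = "a" * 2_000_000   # 1 byte per character
accented = "é" * 1_100_000     # 2 bytes per character in UTF-8
print(utf8_size(ascii_text), fits_regex_limit(ascii_text))
print(utf8_size(accented), fits_regex_limit(accented))
```

The second value has fewer characters than the first yet exceeds the limit, which is why a character count is a poor sizing check here.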

Have a scenario that genuinely needs more than 2 MB? If your workload requires regex evaluation on individual values larger than the current 2 MB ceiling, we would like to hear about it. Please share the details — data shape, payload size, pattern, and business need — on the Azure SQL feedback portal. Customer feedback directly informs how we prioritize future limit changes.

2.10 Cleanup

DROP TABLE IF EXISTS dbo.LogEntries;
DROP TABLE IF EXISTS dbo.HtmlPages;

3. Summary

What changed in CU5

  • Before CU5 — LOB inputs were accepted on REGEXP_LIKE, REGEXP_COUNT, and REGEXP_INSTR. The remaining functions — REGEXP_REPLACE, REGEXP_SUBSTR, and the two TVFs (REGEXP_MATCHES, REGEXP_SPLIT_TO_TABLE) — required non-LOB string inputs, which often meant truncating with LEFT(..., 8000) or chunking in the application tier.
  • After CU5 (and already in Azure SQL) — All seven functions accept varchar(max) and nvarchar(max) inputs of up to 2 MB. The pattern remains capped at 8,000 bytes.

Quick reference

Function | Returns | LOB input (CU5) | Common use case
REGEXP_LIKE | Boolean (predicate) | Yes | Filter rows in WHERE / CASE / CHECK predicates
REGEXP_COUNT | int | Yes | Count occurrences of a pattern
REGEXP_INSTR | int | Yes | Position of the nth match
REGEXP_REPLACE | string | Yes | Redact, cleanse, or normalize text
REGEXP_SUBSTR | string | Yes | Extract a single value
REGEXP_MATCHES (TVF) | (match_id, start_position, end_position, match_value, substring_matches) | Yes | Extract every match plus capture groups (via JSON), set-based
REGEXP_SPLIT_TO_TABLE (TVF) | (value, ordinal) | Yes | Split a LOB into rows by a regex delimiter

Closing thought. Native regex was already a significant quality-of-life improvement when it became generally available. CU5 completes the picture: every function, every input size up to 2 MB, every shape — scalar or table-valued. The next time you are tempted to export a column out of the database in order to grep it, try one of the seven regex functions first.

Happy matching. 🧠

The post Regex support for LOB types in T-SQL—available in Azure SQL & SQL Server 2025 appeared first on Azure SQL Dev Corner.


The Book of Redgate: Profits

Redgate is a for-profit company. We look to make money by building and selling tools that help you. If we do a good job, we make money. If we don’t, you shouldn’t buy our tools.

I found this value to be very interesting:

[Image: 2026-04_0228]

The next page has this statement:

Focusing purely on the numbers is a sure way to kill Red Gate’s culture. We believe that if we focus on the game – building awesome products that people want to buy, and then persuading them to buy them – then success will follow.

Profits matter. Certainly all of us want to be paid (and get a bonus of some sort). With the changes in Redgate’s board this year, this is a piece of culture that I believe in and advocate to keep as an item of focus.

We watch profits, but we don’t optimize for profit. We aim to keep building better and better products that meet the needs of our customers and prove their value from an ROI standpoint, especially in this era of subscription software.

Our goal is what’s in the quote: build awesome products.

I have a copy of the Book of Redgate from 2010. This was a book we produced internally about the company after 10 years in existence. At that time, I’d been there for about 3 years, and it was interesting to learn some things about the company. This series of posts looks back at the Book of Redgate 15 years later.

The post The Book of Redgate: Profits appeared first on SQLServerCentral.


AI, Disposable Apps, and the Sunday Evenings We Are Losing

It was a Sunday evening. Around 8:15 PM. The kind of evening where the whole house smells of cardamom and warmth, and you trick yourself into believing that time has stopped moving. Here is the story of AI, Disposable Apps, and the Sunday Evenings We Are Losing.

My wife had a novel open in her lap. I was on the sofa, half-asleep, letting the weight of a long work week melt into the cushions.

And our teenage daughter was sitting right next to us. Right there on the same sofa, in the same warm room, breathing the same cardamom air. We could have started talking about anything at all. About her day at school. About the book my wife was reading. About nothing in particular, the way families do when the evening is slow and there is nowhere else to be.

But we didn’t. And she didn’t either.

She was on her phone. Her thumbs moved so fast. Her face carried a deep frown. I watched her for a few seconds, this girl I used to carry on my shoulders, this girl who once cried if I left the room for two minutes. And I thought: when did she stop looking at me?

Then she looked up.

Not at her phone. At me. Directly at me. And her eyes were not angry. They were confused. They were tired. They were the eyes of a sixteen-year-old girl who is growing up in a world that does not make sense to her.

She looked up and said:

“Dad, why does every app feel temporary? I download something, I use it once, and I delete it. Nothing feels like it belongs to me anymore. Nothing stays.”


I opened my mouth. But nothing came out.

Because she was not asking about apps. Not really. She was asking about her life. She was asking why the world she is growing up in feels like sand running through her fingers. She was asking why nothing holds still long enough for her to love it.

And I did not have an answer. Because I feel it too.

She wasn’t asking about apps. She was asking why nothing stays.

The Age of the Paper Cup

Let me describe the world we are building.

You want to organize a small dinner with old friends. You do not download an app. You do not sign up for anything. You just tell an AI: “Make me a quick tool where five people can vote on pizza toppings and split the bill.” Five seconds later, the tool exists. You share a link. Everyone votes. You eat. You laugh. You hug your friends goodbye. And then, quietly, the tool vanishes from the server. Gone. Like it was never there. No one saves it. No one remembers its name. No one even notices it disappeared.

It is brilliant. It is efficient. It is the future.

And it is a paper cup. You drink from it once. You crush it in your hand. You throw it away without a second thought.


Now, you might be thinking: what is wrong with that? Paper cups are useful. Disposable apps are convenient. Why should I care?

Here is why.

Because every paper cup you throw away teaches your hands something. It teaches them that things are not worth holding onto. That if something is imperfect, if the design is a little off, if it does not match your mood in this exact second, you can just get a new one. No cost. No effort. No guilt. No grief.

Every paper cup teaches your hands that nothing is worth holding onto.

And that lesson does not stay inside your phone. It follows you home. It sits down at your dinner table. It crawls into your marriage, your friendships, your relationship with your children. It rewrites the way you love.

And you do not even notice. Not until it is too late.

The Muscle We Forgot to Exercise

I want to tell you something personal. Something I am not proud of.

Last month, I was sitting at my desk debugging a complex SQL Server query. It was a hard problem. The kind that does not give you the answer in five minutes. The kind that requires you to sit with the discomfort, stare at the screen, and think slowly.

And I caught myself reaching for my phone. Not because I needed to check anything. But because my brain could not tolerate the discomfort of not knowing the answer immediately. My own mind was trying to escape the difficulty. It wanted the fast thing. The easy thing. The paper cup.

That scared me.

I remembered the early days of computing in India. The heavy CRT monitors. The screaming sound of dial-up internet. The way a single webpage could take a full minute to load, and you just sat there, hands folded, watching a progress bar crawl across the screen like a tired animal. And you were fine with it. You did not rage. You did not swipe. You waited. You breathed. You let the slowness wash over you.

That waiting was not a waste of time. It was a workout. Every slow query, every stubborn bug, every hour of confused reading was training something inside us. I call it the patience muscle. And like any muscle, it grew stronger every time we used it.


But we stopped using it. And now it is dying.

We called it slowness. It was strength, and we let it go soft.

With AI disposable apps, there is no friction. If a button is in the wrong place, you do not learn to work with it. You command the AI to rebuild the whole thing. Instantly. You have become a tiny god of your own digital kingdom, demanding that reality reshape itself around your every preference, your every mood, your every passing whim.

And here is the part that should frighten every parent, every spouse, every human being who loves someone:

Your brain does not know the difference between how you treat your technology and how you treat your people.

When you spend ten hours a day commanding machines to obey you without resistance, your brain quietly recalibrates. Friction becomes intolerable. Waiting becomes unbearable. Imperfection becomes unforgivable. And then you close your laptop and sit down across from your wife, your husband, your child. Real, messy, beautiful, imperfect human beings who cannot be rewritten with a prompt. And you find yourself getting irritated. Not because they did anything wrong. But because they are not as fast, as smooth, as instantly perfect as the digital world you just left.

Think about the last time you felt impatient with someone you love. Not over something big. Over something small. A story that went on too long. A question that could have been Googled. A pause in conversation that felt uncomfortable.

Now ask yourself: was that impatience always there? Or did you learn it?

The Beautiful, Quiet Loneliness

But the loss of patience is not even the part that keeps me awake. The part that keeps me awake is the loneliness.

In the older world of technology, we shared a common landscape. We all used the same clunky operating systems. We all wrestled with the same confusing software. We all cursed at the same blue screens. And because we shared these small frustrations, we shared something much larger: a sense of belonging. You could walk up to a coworker and say, “Did you see that crash?” and they would nod and groan and laugh. And in that tiny, forgettable moment, neither of you was alone.

Disposable apps are destroying that shared world. Quietly. Invisibly. Without anyone voting for it.

When every piece of software is custom-generated by an AI that knows exactly how you think, exactly what you like, exactly what makes you comfortable, you are no longer part of a shared digital community. You are living inside a private universe. A universe of one.

A universe of one. Perfectly comfortable. Perfectly alone.

AI, Disposable Apps, and the Sunday Evenings We Are Losing aiaps4-800x600

Look at that image carefully. That is a girl inside a bubble. Everything inside is beautiful. Every app is tuned to her. Every notification is personalized. Every screen knows her name.

And right outside the bubble, three feet away, sit two people who love her more than any algorithm ever could. Two people who would give anything to hear her laugh. Two people whose tea is going cold because they are waiting for her to look up.

She does not look up. The bubble is too perfect. The bubble is too comfortable. The bubble asks nothing of her.

This is the loneliness I am afraid of. Not the dramatic kind you see in movies. Not the kind where someone is stranded on an island. The quiet kind. The kind where you are surrounded by people who love you, and you do not even notice them. The kind where your whole family is in one room, and everyone is in a different universe.

Real people are not customizable. Your spouse will have bad days when they are short-tempered and unreasonable. Your children will say hurtful things they do not mean. Your friends will cancel plans and forget to call back. None of these people can be debugged. None of them will update their personality based on your feedback. Loving them requires you to sit with imperfection, with frustration, with the slow and sometimes painful process of understanding another human heart.

No one you truly love can be rewritten with a prompt.

And if we spend our days inside bubbles that demand nothing of us, we will slowly lose the ability to do the one thing that makes life worth living: to love someone who is difficult to love, and to stay.

The Question That Will Not Leave Me

My daughter is sixteen. In two years, she will leave for college. Maybe in a different city. Maybe across the country. She will build her own life, with her own routines, her own Sunday evenings, her own cups of tea with people I may never meet.

The Sunday evenings we have left. The ones with the three of us on this sofa, in this room, with the smell of cardamom in the air. Those evenings are numbered. I can count them. And the number is so much smaller than I thought it would be.

So when I watch her disappear into her phone, into a world that is custom-built to hold her attention forever, I feel something I do not have a word for. It is not anger. It is not frustration. It is something older and heavier than both of those things.

It is grief for a moment that has not ended yet.

She is right there. Three feet away. I can hear her breathing. And I am losing her to a paper cup.

But here is the thing that makes my chest tight when I really think about it: she is not the only one disappearing. I am too. Every time I check my email during dinner. Every time I scroll through my phone while she is talking. Every time I choose the screen over the human being sitting next to me. I am teaching her, with my own hands, that people are interruptible. That presence is optional. That love can wait.

What if she learns that lesson? What if she carries it into her marriage, her friendships, her own family one day? What if the reason she cannot put her phone down is because I never put mine down first?

Maybe she cannot look up because she learned it from a father who never did.

That is the question that will not leave me.

A Point to Ponder

I am not writing this to lecture anyone. I have no right to. I am as guilty as anyone. Maybe more.

But I am writing this because I believe we are standing at a crossroads that most of us do not even see. On one side is a world of perfect convenience, where every tool is disposable, every experience is customized, and every moment of friction is eliminated before you even feel it. On the other side is something messier. Slower. Harder. And infinitely more beautiful.

Here are a few things I have started doing. Not because I have figured anything out. But because I am afraid of what will happen if I don’t:

  • I write with a real pen. When I make a mistake, I cannot undo it. I cross it out and keep going. The smudge stays on the page. And somehow, that imperfection makes the words feel more honest than anything I have ever typed.
  • I let myself be bored. When the chai is brewing, I stand in the kitchen and listen to the water. I do not reach for my phone. I just stand there. Doing nothing. Being no one. And those ninety seconds of silence are more nourishing than anything on my screen.
  • I stay in the hard conversations. When a talk with my wife or my daughter gets tense, when every instinct tells me to glance at my phone and escape, I stay. I sit with the discomfort. I let the silence stretch. I remind myself that love is not about being comfortable. Love is about being present when it is hard.
  • I build things with my hands. A recipe I know by heart. A plant that needs watering every morning. Three clumsy chords on an old guitar. Things that resist my impatience. Things that teach me, again and again, that the most beautiful things in life are the ones that refuse to be rushed.


Technology will keep getting faster. Apps will become more disposable. AI will keep getting better at giving us exactly what we want, in exactly the moment we want it, with zero friction and zero resistance.

But I do not want to become disposable. And I do not want my daughter to grow up believing that the people in her life can be swiped away as easily as the apps on her phone.

So here is what I am going to do tonight. And I am asking you, from the bottom of my heart, to consider doing the same.

Close this screen. Put the phone face-down on the table. Walk into the room where the people you love are sitting. Look at them. Not at a screen. At them. Their faces. Their eyes. The way they hold their cup of tea. The way they breathe when they do not know you are watching.

And start a conversation. A real one. A slow one. An imperfect, stumbling, beautiful conversation about nothing in particular. The kind of conversation that has no purpose and no destination. The kind that cannot be optimized or prompted or generated by any machine.

Because some things in this world are not disposable. Your marriage is not a paper cup. Your friendships are not a one-time-use app. The evenings with your family, the ones you think will go on forever, are not infinite. They are running out. Right now. While you are reading this.

And if someone you love is in the next room right now, I am begging you: put this down. Go sit with them. This blog post will be here when you come back.

They might not.


Well, that’s it for today! Let’s keep building connections that last.

Reference: Pinal Dave (https://blog.sqlauthority.com/), X

First appeared on AI, Disposable Apps, and the Sunday Evenings We Are Losing


20 Fun Ways To Find Plot Ideas For Your Story

1 Share

Struggling to come up with a story idea? Discover 20 fun and creative ways to find plot ideas worth turning into your next novel.

Do you struggle to find ideas for plots?

For some of us, the problem with writing a book is finding a great idea for a plot. Use these fun suggestions to help you find a story idea that makes you want to write a novel or a short story.


1. Turn Your Favourite Song Into A Book

  1. List your five favourite songs.
  2. Download the lyrics. Use a site like A-ZLyrics to find the lyrics.
  3. Which one would make a great story?

Use it as a starting point, or as the basis, for a novel. Change names and places to avoid being sued for copyright infringement.

2. What If?

Look at reality and turn it on its head. A ‘What If?’ scenario envisions a reality with a critical difference to our own.

  1. Find a news site. Look at the headlines. Write a list of five ‘What if?’ scenarios based on the headlines.
  2. Look at pop stars, politicians, neighbours, and colleagues. What is the worst thing that could happen to them?
  3. Look at trends. Choose three that interest you. Write a ‘What If?’ scenario for each one. Could you turn one of them into a novel?

3. Outrageous Titles

Keep a list of the most out-there titles you can think of. When you have 10, choose one that would make an interesting story.

Examples of weird titles that became books:

  1. Half Asleep in Frog Pajamas by Tom Robbins
  2. The Particular Sadness of Lemon Cake by Aimee Bender
  3. Do Androids Dream of Electric Sheep? by Philip K. Dick
  4. So Long, and Thanks for All the Fish by Douglas Adams

Use this: Book Title Generator to get some ideas.

4. The List

Write a list about your childhood with our acrostic ‘I remember ABC’ poem method. (You can make it rhyme – or not.) After you’ve completed it, see if you can come up with a synopsis for a story. Create a character who is the opposite sex to you and who lives in another city. Name them, and let them drive the story.

Example:

I remember:

Asking for my mother
Being an outsider
Catching fish with my father
Ditching my little sister

etc.

5. Issues

They don’t change. Examples include: abortion, environment, corruption, crime, government incompetence, and the death sentence.

You need to find a character who would live for this issue and one who would die for it. Set them against each other and write the book.


6. Opening lines

Write random, weird, odd opening lines. Keep them. Look at them when you’re serious about writing a book. Or use this opening line generator.

Examples:

  1. People trust me with their husbands; they shouldn’t.
  2. Dear reader, I wish I could tell you that it ends well for you.
  3. I always wanted to be just like my stepmother.

Choose one.

  1. Which plot would it suit?
  2. Which genre would it suit?
  3. Name the protagonist and the antagonist.
  4. Write the ending.

7. Steal plots

Plagiarism is the key to originality. Somebody famous once said: ‘Good artists copy; great artists steal.’ We’re not sure who it was, but the statement is true.

There is nothing new under the sun. Great artists and writers look at those who have gone before, take their ideas, rework them, and stamp their own style and authority on it. If they are good enough, their versions seem new.

8. Flip a genre

  1. Take a Western and set it in outer space.
  2. Take literary fiction and make your heroine a detective.
  3. Take a romance and set it in Narnia.

Read: 6 Things To Consider Before You Cross Your Genres

9. Research

If you’re interested in something, research it. The findings will suggest stories. Philippa Gregory finds new stories in all the ones she is busy writing. Her research leads to more leads.

Use trusted sources like print and digital encyclopedias, such as Encyclopaedia Britannica and InfoPlease.com. Use newspaper archives.

10. Use Obsessions

What are people obsessed about? It could be serious or trivial.

They could be obsessed about:

  1. Getting even.
  2. Their jobs.
  3. Money.
  4. Leaving town.
  5. Climate change.
  6. Appearances.
  7. Their looks.
  8. Animals.
  9. The odd noise from the house next door.

It’s always a good idea to create a plot around an obsessed character. It’s easier to motivate them and they are not distracted from their story goals. A good example of this is The Old Man and the Sea by Ernest Hemingway.


11. Use Myths

A myth is a traditional story about supernatural beings, ancestors, or heroes. We use myths to explain nature, or to show where a society’s customs, religions, and ideals come from. Myths exist in every culture on Earth.

Click here to find 20 Myths To Use As Writing Prompts

You can rewrite an old myth or write a new one and turn it into a novel or short story.

12. What I Really Want To Do

What do you want to do? What would you do if you had no obligations or restrictions? Write down a list of five things. Could you turn one of them into a plot?

13. Occupations

Keep a list of unusual job titles or use this Random Job Generator. Make a list of the five occupations that fascinate you most. Could you create a story around a character with one of these jobs?

Examples:

  1. Mortician
  2. Croupier
  3. Lap dancer
  4. Motel owner
  5. Leaflet distributor
  6. Magician

This would work well for a short story.

14. The Test Of Time

  1. Buy a newspaper every day for one week.
  2. Cut out one article a day.
  3. Look at them a month later.
  4. Are there follow ups to the stories?
  5. Which have stood the test of time?
  6. Are any of them interesting enough to build a story around?

15. Play A Role-Playing Game (RPG)

The Expanse was the result of an online RPG written one post at a time by a group of about five people. RPGs can be played in person or online.

Examples of role-playing games are: Dungeons & Dragons, White Wolf, and Numenera.

Who knows? You may come up with a science-fiction or fantasy bestseller.


16. Write A Prologue

We don’t think you should necessarily use prologues in your final novel, but this could be the inspiration for a book. Prologues are easier and shorter to write than a book.

Write an inciting moment strong enough to cause a story, as an action-packed prologue.

17. Buy A Pile Of Comic Books

Comic books contain the largest number of recycled plots in the world. Buy them on Comixology.

18. Use The Top TikTok or Instagram Hashtags/Trends

Go to Instagram or TikTok. Write a premise for a story based on the top trending hashtags. (A premise in fiction is a brief statement that has been revealed in a story, for example: People don’t learn from voting for the same party.)

19. Make A Post On Facebook Or Instagram Asking For Plot Ideas

Ask your friends to make a short list of five plot ideas they would like to read or write about.

20. Use Your Senses

  1. Look around you. Look up. Look down. Zoom in. Notice colours.
  2. Listen. Sounds and music create memories and feelings.
  3. Touch things. Get a sense of texture and temperature.
  4. Smell everything. Smell is the most powerful sense to take you back to a place or time.
  5. Taste things. Take your time chewing. Become aware of textures.

You will find plenty of material for your novels when you do this. You may even find an idea that is so startling, you can use it as a plot.

An Idea Is Only The Beginning

Remember that ideas are only the beginning. After that you need to develop a plot and identify your four main characters. We suggest you read: The Top 10 Tips for Plotting and Finishing a Book to get you started.

Top Tip: If you want to learn how to write a book, sign up for our online course.

© Amanda Patterson

If you liked this article, you may enjoy:

  1. All About Pacing: 4 Key Questions Every Writer Should Ask
  2. Past Tense Or Present Tense: Which Works Best For Your Story?
  3. A Guide To The 17 Most Popular Fiction Genres
  4. How To Write A Spy Novel
  5. How To Outline A Short Story – For Beginners
  6. 6 Sub-Plots Every Writer Should Know
  7. How To Write Great Dialogue In Fiction
  8. What Is An Unreliable Narrator? 9 Types Every Writer Should Know
  9. How Writers Use The 4 Main Characters As Literary Devices
  10. Mastering Point Of View In Writing

Top Tip: Sign up for our free daily writing links.

The post 20 Fun Ways To Find Plot Ideas For Your Story appeared first on Writers Write.


Has the Anthropic Settlement Changed Everything?

1 Share
Header image: Black ISBN bar code on a white background (credit: Janaka Dharmasena / Shutterstock.com)

Recent developments in the world of copyright have been making many writers rethink their attitudes toward copyright registration and reversion of publishing rights.

Because many artificial intelligence companies used pirated books to train their large language models, there are now a growing number of copyright infringement class action lawsuits against them. While it is still undetermined whether these companies’ use of the copyrighted material was fair use or not, it has become clear that the use of copyrighted material from pirate libraries is a no-no, especially when the method involves torrenting, which means the companies participated in redistributing the materials.

The first of these lawsuits, Bartz v. Anthropic, just held the final Fairness Hearing on a class action settlement and, although some minor factors delayed Judge Martinez-Olguin’s approval, it looks as if the settlement will be approved and $1.5 billion will eventually be paid out to claimants who meet the definition of the class.

Needless to say, this will be an unprecedented class action settlement involving copyright. As currently calculated, claimants for each copyrighted work that was pirated by Anthropic will share $3,100. If the work was self-published or the work’s rights had reverted to the author, they will receive the entire amount. It’s safe to say this is the first time the average writer will benefit from their copyright registration in any substantial way.

But not every writer benefited, for a number of reasons; the primary one was that the book had to have had its copyright registered with the US Copyright Office. To fall within the class for the Anthropic class action, a work had to:

  • have been downloaded by Anthropic from LibGen or PiLiMi;
  • have an International Standard Book Number (ISBN) or Amazon Standard Identification Number (ASIN);
  • have been registered with the United States Copyright Office within five years of the work’s first publication; and
  • have been registered before being downloaded by Anthropic, or within three months of the work’s first publication.

Of the estimated seven million works that were pirated by Anthropic, fewer than 500,000 were part of the class. As of the May 14 settlement hearing, the number of works claimed was 447,576. That’s about 7% of the pirated works.

The requirement that a work have an ISBN or ASIN is essentially unfair because it only recognizes individual books, but at least it doesn’t discriminate against self-published works. There is nothing in US Copyright Law that distinguishes between “books” and other literary works that may have a registered copyright. The requirement exists only to make identifying works and verifying authors and publishers easier for the settlement administrators. Most books do have one or the other, though, even if the ASIN is connected to a long-out-of-print book being sold used. Presumably some book authors have managed to avoid Amazon entirely, but they must be a small number.

Another similar class action lawsuit, Elsevier Inc. v. Meta Platforms, Inc., was filed on May 5 by a group of publishers, with Scott Turow as the only named author. It restricts the class even further. The proposed class definition is:

All legal or beneficial owners of registered copyrights, in whole or in part, for any book possessing an International Standard Book Number (ISBN) or journal article possessing a Digital Object Identifier (DOI) or International Standard Serial Number (ISSN), that Meta, without such owner’s authorization, (1) reproduced by downloading during torrenting and/or copying of web scrapes; or (2) distributed during torrenting; or (3) reproduced in connection with the development and/or training of a Llama Model. For purposes of this definition, copyrighted works are limited to those registered with the United States Copyright Office (a) within five years of the work’s publication and before being reproduced or distributed by Meta, or (b) within three months of publication.

The main difference from the Anthropic class is the limitation to only books that have ISBNs. ASINs don’t count, cutting out a large majority of self-published and indie-published works, even if they do have registered copyrights. You can understand, I suppose, why the plaintiff publishers want the class restricted to the books that they published, but it’s even more grossly unfair to ebooks that were published without ISBNs, because ISBNs are only important for physical book distribution. It’s hard to justify limiting a class action this way when, for all practical purposes, the fairer Anthropic settlement’s class definition worked (fingers crossed).

So will there now be a rush by indie authors to purchase ISBNs? It makes sense if, for example, you claimed a book without an ISBN in the Anthropic settlement, since there’s a good chance it will turn up again in the Turow class. As with copyright registration and rights reversion, the effort and outlay start to look worthwhile. Large copyright class actions and settlements change everything.

Postscript. The fundamental problem is that there is no comprehensive registry for published works.

The post Has the Anthropic Settlement Changed Everything? appeared first on Writer Beware.
