
The SQL Server Transaction Log, Part 1: Logging Basics


(This post first appeared on SQLperformance.com four years ago as part of a blog series, before that website was mothballed later in 2022 and the series was curtailed. Reposted here with permission, with a few tweaks.)

All through my career as a data professional, both inside Microsoft and as a consultant, I’ve found that one of the most misunderstood parts of SQL Server is the transaction log. Lack of knowledge of how the transaction log works and needs to be managed, or just simple misconceptions, can lead to all kinds of production problems, including:

  • The transaction log growing out-of-control and potentially running out of space
  • Performance issues from repeated shrinking of the transaction log
  • Performance issues from a problem known as VLF fragmentation, which I discussed in this post
  • The inability to recover to a desired point in time using transaction log backups
  • The inability to perform a tail-log backup during disaster recovery (see here for an explanation of tail-log backups)
  • Various issues around failovers and restore performance

With this post I’m starting an occasional series (now here on my SQLskills blog) on the transaction log and how it works and should be managed, and I’ll touch on all the problems above over its course. In this post I’ll explain what logging is and why it’s required.

Basic Terminology around Logging

When I’m talking about any mechanism inside SQL Server, I find there’s a chicken-and-egg problem where I need to use a word or phrase before I’ve explained it. To avoid that problem in this series, I’m going to start by explaining some terminology that needs to be used when discussing logging, and I’ll expand on many of these terms as the series progresses.

Transaction, Commit, and Rollback

A transaction encompasses a change or a set of changes to a database. It has a defined beginning and a defined end. The beginning is when a BEGIN TRANSACTION statement is used, or SQL Server automatically begins a transaction for you. The end can be one of four things:

  • The transaction commits, when a COMMIT TRANSACTION statement is executed
  • The transaction commits, when SQL Server automatically commits the transaction in the case of an autocommit transaction
  • The transaction finishes rolling back, after a ROLLBACK TRANSACTION statement is executed
  • The transaction finishes rolling back, after a problem occurred and SQL Server automatically rolled back the transaction
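
To make those four endings concrete, here is a minimal client-side sketch, written in Python with the pyodbc driver rather than in T-SQL. This is my own illustration and not part of the original post; the connection string and the dbo.Accounts table are hypothetical. With autocommit turned off, the driver starts a transaction for you, and you end it explicitly with a commit or a rollback:

import pyodbc

# Hypothetical connection string and table; adjust for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=SalesDB;Trusted_Connection=yes;TrustServerCertificate=yes",
    autocommit=False,   # the statements below run inside one transaction
)
cursor = conn.cursor()

try:
    cursor.execute(
        "UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountId = ?", 1)
    cursor.execute(
        "UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountId = ?", 2)
    conn.commit()      # the transaction commits (ending 1 above)
except pyodbc.Error:
    conn.rollback()    # the transaction finishes rolling back (endings 3/4)
finally:
    conn.close()

In T-SQL itself, the same pattern is an explicit BEGIN TRANSACTION followed by either COMMIT TRANSACTION or ROLLBACK TRANSACTION.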

When a transaction commits, the changes that the transaction made are finalized in the database and are durable in the transaction log on disk. Note that I said ‘in the transaction log’. The actual changes to the data file pages in memory are *not* written to disk when the transaction commits. They don’t need to be made durable in the data files because the changes are already durable in the transaction log. Eventually the data file pages will be written to disk by a checkpoint operation (or possibly because of the lazywriter).

Conversely, when a transaction rolls back, the data changes that the transaction made are no longer present in the database. There will still be some physical changes in the database, as rolling back a transaction means performing more changes, but you can think of a rolled back transaction as not having affected the data in the database.

Checkpoints and rollback operations are topics worthy of their own posts, so I’ll explain them later in the series.

I discuss these three terms in much more depth in the tutorial Introduction to SQL Server Transactions on the SentryOne blog. <insert link to article that’s due to be published>

Logging, Log Records, and the Transaction Log

Logging simply means creating a durable description of a change to a database, almost always in the context of a transaction. When a change is made, the change is described in a log record. A log record usually has enough information in it to allow the change to be replayed in the database or rolled back in the database, if needed.

This log record will initially be in memory and may be written to disk before the transaction commits, but must definitely be written to disk before the transaction can finish committing, otherwise the transaction would not be durable. An exception to this rule is when the delayed durability feature is enabled, which Aaron Bertrand discusses in this post.

Log records are stored in the transaction log (one or more log files on disk) which has a somewhat complex internal architecture, and I’ll discuss that and much more on log records in the next post in the series.

Crash Recovery

A crash is when SQL Server shuts down unexpectedly and databases with changes in them can’t be shut down correctly (that is, with all changed data file pages written to disk and the database marked as cleanly shut down).

When SQL Server starts up, it checks all the databases to see whether any were not marked as cleanly shut down. If it finds one, that database must go through crash recovery. This ensures the following:

  • For any transaction that was committed before the crash, ensure all the changes in the transaction are reflected in the database (i.e. replay the transaction)
  • For any transaction that was not committed before the crash, ensure that none of the changes in the transaction are reflected in the database (i.e. roll the transaction back)

In other words, crash recovery makes a database transactionally consistent as of the time the crash occurred. Crash recovery is used:

  • When SQL Server starts and finds a database that needs to be recovered
  • During a failover to a secondary copy of a database
  • At the end of a restore sequence involving backups (see here)
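
As a purely conceptual sketch of that redo/undo behavior (a Python illustration of the idea, not SQL Server’s actual recovery algorithm), imagine the log as an ordered list of change and commit records:

# Conceptual sketch only: redo every logged change, then undo the changes
# belonging to transactions that never committed before the crash.

def crash_recovery(log_records):
    pages = {}                                    # stand-in for data file pages
    committed = {r["txn"] for r in log_records if r["op"] == "COMMIT"}

    # Redo phase: replay every change in log order.
    for r in log_records:
        if r["op"] == "CHANGE":
            pages[r["page"]] = r["new"]

    # Undo phase: walk the log backwards, reverting changes whose
    # transaction never committed.
    for r in reversed(log_records):
        if r["op"] == "CHANGE" and r["txn"] not in committed:
            pages[r["page"]] = r["old"]

    return pages

log = [
    {"op": "CHANGE", "txn": 1, "page": "A", "old": 0, "new": 10},
    {"op": "COMMIT", "txn": 1},
    {"op": "CHANGE", "txn": 2, "page": "B", "old": 0, "new": 99},
    # crash happens here: transaction 2 never committed
]
print(crash_recovery(log))   # {'A': 10, 'B': 0}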

Crash recovery is a complex process and requires another few posts in the series before I can explain it in depth.

Why is Logging Required?

The most basic reason for logging is to allow SQL Server to make transactions durable so they can be recovered during crash recovery or rolled back if needed during normal database operations. If there was no logging, a database would be transactionally inconsistent and possibly even structurally corrupt after a crash.

Without logging though, a host of other features in SQL Server would not be possible, including:

  • Data backups that can be recovered consistently
  • Transaction log backups that can be used during a restore operation and to implement log shipping
  • Replication, which relies on being able to harvest transactions from the transaction log
  • Change Data Capture, which uses the transactional replication Log Reader Agent to harvest changes from the transaction log
  • Change Event Streaming, new for SQL Server 2025, that is similar to CDC but mutually-exclusive to it and uses a different log-scan mechanism
  • Database mirroring and availability groups, which rely on sending log records to secondary copies of the database for replaying

SQL Server (and all similar database servers) uses what is called write-ahead logging. This means that the descriptions of changes have to be written to disk before the changes themselves, to guarantee the ability to properly crash recover a database. If a change was written to a data file before the log records describing it, and SQL Server crashed, there would be no way to know what to roll back, and the database would be inconsistent. This ordering is an invariant, no matter what isolation level, type of transaction, or whether the delayed durability feature is used. Log records first, data pages later.
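
Here is a tiny sketch of that ordering rule, again an illustration of the idea rather than anything resembling SQL Server internals; the file names are made up. The point is simply that the log record reaches stable storage before the changed page is allowed to reach the data file:

import json, os

class MiniWal:
    def __init__(self, log_path):
        self.log = open(log_path, "a")     # the "transaction log" on disk
        self.pages = {}                    # changed pages held in memory

    def change(self, txn, page, old, new):
        # 1. Write the log record describing the change and force it to disk...
        self.log.write(json.dumps(
            {"txn": txn, "page": page, "old": old, "new": new}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. ...and only then change the in-memory page. It can be written
        #    to the data file later, e.g. by something like a checkpoint.
        self.pages[page] = new

    def checkpoint(self, data_path):
        # Data pages reach disk after their log records, never before.
        with open(data_path, "w") as f:
            json.dump(self.pages, f)

wal = MiniWal("minilog.jsonl")             # hypothetical file names
wal.change(txn=1, page="A", old=0, new=10)
wal.checkpoint("minidata.json")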

Just the Tip of the Iceberg

As you can see from this introductory post, there’s a huge amount that goes into the transaction log and logging in SQL Server, and all I’ve done so far is define some high-level terminology and explain why logging is required. I hope you’ll join me as I branch out and go deeper as the series progresses!

The post The SQL Server Transaction Log, Part 1: Logging Basics appeared first on Paul S. Randal.


Azure Cosmos DB Data Explorer now supports Dark Mode


If you spend time in the Azure Portal using Azure Cosmos DB Data Explorer, you know it’s a “lots of screens, lots of tabs, lots of work happening” kind of place. So I’m excited to share a small upgrade that makes a big difference: Dark Mode is now supported in Cosmos DB Data Explorer.

It just works with Azure Portal Dark Mode

If you already have Dark Mode enabled in the Azure Portal, Data Explorer will automatically switch to dark with it. No extra setup, no separate preference buried in a panel somewhere.

If you want to switch it on demand, you can also use the sun/moon toggle in the Data Explorer header to turn Dark Mode on or off whenever you want.

Full screen Data Explorer note

One thing to be aware of: full screen Data Explorer doesn’t carry over your Azure Portal appearance setting.

So if you’re working in full screen and you switch the Cosmos DB account you’re working on, the appearance may go back to the default. If you prefer Dark Mode there too, you can quickly flip it back using the sun/moon toggle.

Conclusion

Dark Mode isn’t just aesthetic. When you’re debugging a query, scanning items, checking throughput settings, or hopping between containers, reducing glare and visual noise helps you stay in flow. It’s the kind of small improvement that makes Data Explorer feel calmer and easier to work in, especially during long sessions.

If you try it out, we’d love to hear where it feels best for you: the query editor, items view, or scale and settings. You can leave us feedback by clicking the smile button in the Data Explorer top menu.

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of NoSQL and vector data.

To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn.

The post Azure Cosmos DB Data Explorer now supports Dark Mode appeared first on Azure Cosmos DB Blog.


Introducing the Codex app



OpenAI just released a new macOS app for their Codex coding agent. I've had a few days of preview access - it's a solid app that provides a nice UI over the capabilities of the Codex CLI agent and adds some interesting new features, most notably first-class support for Skills, and Automations for running scheduled tasks.

[Screenshot: the Codex macOS app. A dark sidebar lists New thread, Automations, Skills, and recent threads for two projects (local-codex-scratch and shot-scraper); the main pane shows a conversation in which the agent uses the skill-creator skill to scaffold, write, and package a shot-scraper-cli skill, producing SKILL.md, cli-reference.md, and shot-scraper-cli.skill.]

The app is built with Electron and Node.js. Automations track their state in a SQLite database - here's what that looks like if you explore it with uvx datasette ~/.codex/sqlite/codex-dev.db:

[Screenshot: Datasette schema view of that database, showing three tables: automation_runs (1 row), automations (1 row, with columns id, name, prompt, status, next_run_at, last_run_at, cwds, rrule, created_at, updated_at), and inbox_items (0 rows).]

Here’s an interactive copy of that database in Datasette Lite.
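
If you’d rather poke at the same database from code instead of Datasette, something like the following works with Python’s standard sqlite3 module. The path comes from the post above and the column names are taken from the screenshot, so treat them as assumptions and adjust if your local schema differs:

import sqlite3
from pathlib import Path

db_path = Path.home() / ".codex" / "sqlite" / "codex-dev.db"
conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row             # rows behave like dicts

# List automations and when they are next due to run.
for row in conn.execute(
    "SELECT name, status, rrule, next_run_at FROM automations ORDER BY next_run_at"
):
    print(dict(row))

conn.close()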

The announcement gives us a hint at some usage numbers for Codex overall - the holiday spike is notable:

Since the launch of GPT‑5.2-Codex in mid-December, overall Codex usage has doubled, and in the past month, more than a million developers have used Codex.

Automations are currently restricted in that they can only run when your laptop is powered on. OpenAI promise that cloud-based automations are coming soon, which will resolve this limitation.

They chose Electron so they could target other operating systems in the future, with Windows “coming very soon”. OpenAI’s Alexander Empiricos noted on the Hacker News thread that:

it's taking us some time to get really solid sandboxing working on Windows, where there are fewer OS-level primitives for it.

Like Claude Code, Codex is really a general agent harness disguised as a tool for programmers. OpenAI acknowledge that here:

Codex is built on a simple premise: everything is controlled by code. The better an agent is at reasoning about and producing code, the more capable it becomes across all forms of technical and knowledge work.

Claude Code had to rebrand to Cowork to better cover the general knowledge work case. OpenAI can probably get away with keeping the Codex name for both.

OpenAI have made Codex available to free and Go plans for "a limited time" (update: Sam Altman says two months) during which they are also doubling the rate limits for paying users.

Tags: sandboxing, sqlite, ai, datasette, electron, openai, generative-ai, llms, ai-agents, coding-agents, codex-cli


What’s new in Git 2.53.0?


The Git project recently released Git 2.53.0. Let's look at a few notable highlights from this release, which includes contributions from the Git team at GitLab.

Geometric repacking support with promisor remotes

Newly written objects in a Git repository are often stored as individual loose files. To ensure good performance and optimal use of disk space, these loose objects are regularly compressed into so-called packfiles. The number of packfiles in a repository grows over time as a result of the user’s activities, like writing new commits or fetching from a remote. As the number of packfiles in a repository increases, Git has to do more work to look up individual objects. Therefore, to preserve optimal repository performance, packfiles are periodically repacked via git-repack(1) to consolidate the objects into fewer packfiles. When repacking there are two strategies: “all-into-one” and “geometric”.

The all-into-one strategy is fairly straightforward and the current default. As its name implies, all objects in the repository are packed into a single packfile. From a performance perspective this is great for the repository as Git only has to scan through a single packfile when looking up objects. The main downside of such a repacking strategy is that computing a single packfile for a repository can take a significant amount of time for large repositories.

The geometric strategy helps mitigate this concern by maintaining a geometric progression of packfiles based on their size instead of always repacking into a single packfile. To explain more plainly: when repacking, Git maintains a set of packfiles ordered by size, where each packfile in the sequence is expected to be at least twice the size of the preceding packfile. If a packfile in the sequence violates this property, packfiles are combined as needed until the progression is restored. This strategy has the advantage of still minimizing the number of packfiles in a repository while also minimizing the amount of work that must be done for most repacking operations.
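
To illustrate that invariant (and only the invariant; this is not how git-repack(1) is actually implemented), here is a small Python sketch that merges the smallest packfiles together until every pack is at least twice the size of the one before it:

# Toy model of the geometric progression: sizes are in arbitrary units and
# "merging" two packs just sums their sizes, standing in for a real repack.

def geometric_repack(pack_sizes, factor=2):
    packs = sorted(pack_sizes)
    i = 1
    while i < len(packs):
        if packs[i] >= factor * packs[i - 1]:
            i += 1                                   # progression holds here
        else:
            # Combine the offending pack with its smaller neighbour and
            # re-check the sequence from the start.
            merged = packs[i - 1] + packs[i]
            packs = sorted(packs[:i - 1] + [merged] + packs[i + 1:])
            i = 1
    return packs

print(geometric_repack([1, 1, 2, 3, 100]))           # [7, 100]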

One problem with the geometric repacking strategy was that it was not compatible with partial clones. Partial clones allow the user to clone only parts of a repository by, for example, skipping all blobs larger than 1 megabyte. This can significantly reduce the size of a repository, and Git knows how to backfill missing objects that it needs to access at a later point in time.

The result is a repository that is missing some objects, and any object that may not be fully connected is stored in a “promisor” packfile. When repacking, this promisor property needs to be retained going forward for packfiles containing a promisor object so it is known whether a missing object is expected and can be backfilled from the promisor remote. With an all-into-one repack, Git knows how to handle promisor objects properly and stores them in a separate promisor packfile. Unfortunately, the geometric repacking strategy did not know to give special treatment to promisor packfiles and instead would merge them with normal packfiles without considering whether they reference promisor objects. Luckily, due to a bug, the underlying git-pack-objects(1) dies when geometric repacking is used in a partial clone repository, so repositories in this configuration could not be repacked at all, which isn’t great, but is better than repository corruption.

With the release of Git 2.53, geometric repacking now works with partial clone repositories. When performing a geometric repack, promisor packfiles are handled separately in order to preserve the promisor marker and repacked following a separate geometric progression. With this fix, the geometric strategy moves closer towards becoming the default repacking strategy. For more information check out the corresponding mailing list thread.

This project was led by Patrick Steinhardt.

git-fast-import(1) learned to preserve only valid signatures

In our Git 2.52 release article, we covered signature-related improvements to git-fast-import(1) and git-fast-export(1). Be sure to check out that post for a more detailed explanation of these commands, how they are used, and the changes being made with regards to signatures.

To quickly recap, git-fast-import(1) provides a backend to efficiently import data into a repository and is used by tools such as git-filter-repo(1) to help rewrite the history of a repository in bulk. In the Git 2.52 release, git-fast-import(1) learned the --signed-commits=<mode> option similar to the same option in git-fast-export(1). With this option, it became possible to unconditionally retain or strip signatures from commits/tags.

In situations where only part of the repository history has been rewritten, any signature for rewritten commits/tags becomes invalid. This means git-fast-import(1) is limited to either stripping all signatures or keeping all signatures even if they have become invalid. But retaining invalid signatures doesn’t make much sense, so rewriting history with git-filter-repo(1) results in all signatures being stripped, even if the underlying commit/tag is not rewritten. This is unfortunate because if the commit/tag is unchanged, its signature is still valid and thus there is no real reason to strip it. What is really needed is a means to preserve signatures for unchanged objects, but strip invalid ones.

With the release of Git 2.53, the git-fast-import(1) --signed-commits=<mode> option has learned a new strip-if-invalid mode which, when specified, only strips signatures from commits that become invalid due to being rewritten. Thus, with this option it becomes possible to preserve some commit signatures when using git-fast-import(1). This is a critical step towards providing the foundation for tools like git-filter-repo(1) to preserve valid signatures and eventually re-sign invalid signatures.

This project was led by Christian Couder.

More data collected in git-repo-structure

In the Git 2.52 release, the “structure” subcommand was introduced to git-repo(1). The intent of this command was to collect information about the repository and eventually become a native replacement for tools such as git-sizer(1). At GitLab, we host some extremely large repositories, and having insight into the general structure of a repository is critical to understanding its performance characteristics. In this release, the command now also collects total size information for reachable objects in a repository to help understand the overall size of the repository. In the output below, you can see the command now collects both the total inflated and disk sizes of reachable objects by object type.

$ git repo structure

| Repository structure | Value      |
| -------------------- | ---------- |
| * References         |            |
|   * Count            |   1.78 k   |
|     * Branches       |      5     |
|     * Tags           |   1.03 k   |
|     * Remotes        |    749     |
|     * Others         |      0     |
|                      |            |
| * Reachable objects  |            |
|   * Count            | 421.37 k   |
|     * Commits        |  88.03 k   |
|     * Trees          | 169.95 k   |
|     * Blobs          | 162.40 k   |
|     * Tags           |    994     |
|   * Inflated size    |   7.61 GiB |
|     * Commits        |  60.95 MiB |
|     * Trees          |   2.44 GiB |
|     * Blobs          |   5.11 GiB |
|     * Tags           | 731.73 KiB |
|   * Disk size        | 301.50 MiB |
|     * Commits        |  33.57 MiB |
|     * Trees          |  77.92 MiB |
|     * Blobs          | 189.44 MiB |
|     * Tags           | 578.13 KiB |

The keen-eyed among you may also have noticed that the size values in the table output are now listed in a more human-friendly manner with units appended. In subsequent releases we hope to further expand this command's output to provide additional data points such as the largest individual objects in the repository.

This project was led by Justin Tobler.

Read more

This article highlighted just a few of the contributions made by GitLab and the wider Git community for this latest release. You can learn about these from the official release announcement of the Git project. Also, check out our previous Git release blog posts to see other past highlights of contributions from GitLab team members.


Inside an AI-Run Company


AI agents are moving from demos to real workplaces, but what actually happens when they run a company? In this episode, journalist Evan Ratliff, host of Shell Game, joins Chris to discuss his immersive journalism experiment building a real startup staffed almost entirely by AI agents. They explore how AI agents behave as coworkers, how humans react when interacting with them, and where ethical and workplace boundaries begin to break down.

Download audio: https://media.transistor.fm/bc10d952/62cf79c5.mp3

The Moltbook Uprising, NVIDIA’s OpenAI Pullback, Apple’s Conundrum


M.G. Siegler of Spyglass is back for our monthly tech news discussion. M.G. joins us to discuss Moltbook, the new Reddit-style social network where 150,000 AI agents are chatting, upvoting, and even proposing their own private language to keep humans out. Tune in to hear whether this is a preview of the singularity or just elaborate role-play—and why the security vulnerabilities are genuinely concerning. We also cover NVIDIA quietly backing away from its $100 billion OpenAI deal, Apple's record quarter that Wall Street shrugged off, and OpenAI's race to IPO before Anthropic (with Elon potentially beating them both). Hit play for a conversation about where AI is heading and what it means when the bots start talking to each other.


Download audio: https://pdst.fm/e/tracking.swap.fm/track/t7yC0rGPUqahTF4et8YD/pscrb.fm/rss/p/traffic.megaphone.fm/AMPP2622010939.mp3?updated=1770055939