Sr. Content Developer at Microsoft, working remotely in PA, TechBash conference organizer, former Microsoft MVP, Husband, Dad and Geek.
155585 stories
·
33 followers

Creating a SQL Stored Procedure to Load a SCD2

1 Share

This is a blog that I am writing for future me and hopefully it’ll help a few of you save some time too! It’s not often that I get to build out a data warehouse from scratch, but when I do, I want to make sure I do it well with best practices in place. Because this is not something I do a lot of, I frequently forget lessons I’ve learned and have to go back and drop tables to recreate them in the best way before it’s too late. One table type that is vital to do right the first time is a Slowly Changing Dimension Type 2 (SCD2 for short).

Image created using ChatGPT, but everything else in this blog post is all my own imperfections & hallucinations.

A SCD2 is a dimension, or lookup, table that soft deletes old records instead of overwriting, modifying, or fully removing records that no longer exist. They can be useful for a number of circumstances, but the primary reason is for time traveling. For example, let’s say that my sales data contains a customer key, but I want to know where a customer lived when they bought the item, not where they live now. In that case, I would need a record in my customer table that contains both their current and past addresses. The example below is for GL accounts, but it can be used with anything.

Skip to the end of this blog for the full script including the table creation.

In the example below, I am tracking any changes to the descriptions of a given account. For example, if GL code 40 changes from INVENTORY to GYM INVENTORY on 6/10/2026 12:00 PM, then any records before that minute in time will return INVENTORY as the GL code description while everything on or after will return GYM INVENTORY.

There are a couple of columns you need to do a proper SCD2 – effective start date, effective end date, hash key, and hash diff. You can have different flavors of names than what I’ve chosen (and some folks don’t leverage a hash key), but the pattern should remain. Effective start and end dates don’t need to include time if you are only loading data on a daily basis. Some folks prefer not to hash keys, but I will warn you that those patterns are not SARGable and may result in very slow inserts if you have a bunch of columns. Instead, hash keys give you an easy way to uniquely identify a unique row that may not have a unique id. Without a hash key, you’ll have to write out long SQL scripts to see if any values in any column match between source and target tables. With hash keys, it’s a simple = or <> to manage.

The downside to using a hash key is that if a column is added, you’ll need to manually rerun all the hashes to include the new column so you don’t end up with a huge change on the day the new column was added. Small price to pay for the daily speed in my opinion, but it could be a valid concern if columns change frequently enough.

Creating the SCD2 Table

One caveat to the data types in the create table, you may notice I’m using DATETIME2(7) instead of just DATETIME. The reason is that I’ve been working in Microsoft Fabric data warehouses for a while and DATETIME is not supported (Data Types in Fabric Data Warehouse – Microsoft Fabric | Microsoft Learn). If you’re using an on-prem SQL Server, feel free to use your preferred datetime data type.

Why the || between each field? The || is a character combo that doesn’t appear in the wild very often and it’s critical to have a clear separation between fields so the hash is truly unique. It’s a small thing that makes a huge difference to make sure you don’t end up modifying records unintentionally.

I’ve also created all the columns to be in alphabetical order for future user sanity with the standard fields at the end (effective start/end and the hashes). There’s also an index on this table for the current records since I’m anticipating most queries will only care about the current account descriptions.

Why two hash columns? The first hash gives us what I think of as Type 1 fields, that is fields that if they change actually denote a new record entirely, not a modification on existing record. The hash diff is built off Type 2 fields, fields that when they change I want the old version soft deleted and the new one set to current.

ONLY DO THIS ONCE. If you drop the table after facts have started flowing in, be aware that it will be very hard to rebuild since you will have deleted the ids that the fact table is trying to join to. If you need to make adjustments to this dimension, put it in a temp table and temporarily suspend the identity insert so you can preserve the original ids of the table.

DROP TABLE IF EXISTS dim.Account;
CREATE TABLE dim.Account (
id INT IDENTITY(1,1) NOT NULL
,[4444] VARCHAR(10) NULL
,[4444_desc] VARCHAR(550) NULL
,aa VARCHAR(10) NULL
,aa_desc VARCHAR(550) NULL
,fund VARCHAR(10) NULL
,gl VARCHAR(10) NULL
,gl_desc VARCHAR(550) NULL
,llll VARCHAR(10) NULL
,llll_desc VARCHAR(550) NULL
,o VARCHAR(10) NULL
,obbb VARCHAR(10) NULL
,obbb_desc VARCHAR(550) NULL
,pp VARCHAR(10) NULL
,ppss VARCHAR(10) NULL
,pprr_desc VARCHAR(550) NULL
,s VARCHAR(10) NULL
,s_desc VARCHAR(550) NULL
,effective_start_datetime DATETIME2(7) NOT NULL
,effective_end_datetime DATETIME2(7) NOT NULL
,is_current BIT NOT NULL
-- identity hash: the 10 business key columns
,hash_key AS CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT(
ISNULL([4444], ''), '||'
,ISNULL(aa, ''), '||'
,ISNULL(fund, ''), '||'
,ISNULL(gl, ''), '||'
,ISNULL(llll, ''), '||'
,ISNULL(o, ''), '||'
,ISNULL(obbb, ''), '||'
,ISNULL(pp, ''), '||'
,ISNULL(ppss, ''), '||'
,ISNULL(s, '')))) PERSISTED
-- change hash: the 7 tracked description columns
,hash_diff AS CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT(
ISNULL([4444_desc], ''), '||'
,ISNULL(aa_desc, ''), '||'
,ISNULL(gl_desc, ''), '||'
,ISNULL(llll_desc, ''), '||'
,ISNULL(obbb_desc, ''), '||'
,ISNULL(pprr_desc, ''), '||'
,ISNULL(s_desc, '')))) PERSISTED
);
-- One current row per identity. Also fails the load loudly if the
-- source ever produces two current versions of the same key.
-- INCLUDE (hash_diff) covers the Step 3 existence check.
CREATE UNIQUE INDEX UX_Account_current
ON dim.Account (hash_key)
INCLUDE (hash_diff)
WHERE is_current = 1;

Create Default “NULL” Row

This is optional, but highly encouraged. Having a NULL row allows your fact table to never have a NULL value in the surrogate key for dim account. Pretty handy for reporting purposes to have a default “Unknown Account” show up instead of an empty record or, heaven forbid, the record get removed automatically when you group on a field from the account dimension. This only needs to be done once and is not included in the stored proc because it never needs to be updated.

SET IDENTITY_INSERT dim.Account ON;
INSERT INTO dim.Account (
id
,[4444]
,[4444_desc]
,aa
,aa_desc
,fund
,gl
,gl_desc
,llll
,llll_desc
,o
,obbb
,obbb_desc
,pp
,ppss
,pprr_desc
,s
,s_desc
,effective_start_datetime
,effective_end_datetime
,is_current
)
VALUES (
-1
,'Unknown' --you can be more descriptive with these unknowns if you want
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'1900-01-01'
,'9999-12-31 23:59:59.9999999'
,1
);
SET IDENTITY_INSERT dim.Account OFF;

Create Loading Stored Procedure

This is where the real work happens. Keep in mind, this stored procedure will need to be called at some point by an orchestrator. Orchestrators can be a data pipeline, a SQL trigger, a SQL agent job, a notebook, etc. Anything that can call EXEC dim.usp_LoadAccount will work. Be sure that this comes BEFORE you create your fact tables. Fact tables will need to know what id to write into the surrogate key for your dimension.

This stored procedure does a few things – creates the source table, expires (aka soft deletes) records changed in the source, insert new and updated records, and expire the records that don’t exist at all in source anymore.

General notes about this stored procedure:

  • The source temporary table does contain an index on it in the example below since that source temp table is called 3 times throughout the procedure.
  • Everything is wrapped in transactions and try catches so nothing fails silently, and clear failure messages will bubble up to our orchestrator.
CREATE OR ALTER PROCEDURE dim.usp_Load_Account
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
DECLARE @now DATETIME2(7) = SYSDATETIME();
DECLARE @end DATETIME2(7) = DATEADD(MINUTE, -1, @now); -- expired rows end 1 min before successor starts
DECLARE @future DATETIME2(7) = '9999-12-31 23:59:59.9999999';
/*------------------------------------------------------------
Step 1. Stage distinct source rows (outside the transaction to keep the write transaction short).
COLLATE DATABASE_DEFAULT normalizes lakehouse collation,
so the hashes computed here are byte comparable with dim.Account's persisted hashes.
------------------------------------------------------------*/
DROP TABLE IF EXISTS #src;
;WITH src_cast AS (
SELECT DISTINCT
CAST([4444] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS [4444]
,CAST([4444 Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS [4444_desc]
,CAST([AA] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS aa
,CAST([AA Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS aa_desc
,CAST([Fund] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS fund
,CAST([GL] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS gl
,CAST([GL Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS gl_desc
,CAST([LLLL] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS llll
,CAST([LLLL Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS llll_desc
,CAST([O] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS o
,CAST([OBBB] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS obbb
,CAST([OBBB Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS obbb_desc
,CAST([PP] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS pp
,CAST([PPSS] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS ppss
,CAST([PPRR Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS pprr_desc
,CAST([S] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS s
,CAST([S Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS s_desc
FROM LakehouseName.dbo.[RawTable]
)
SELECT
[4444]
,[4444_desc]
,aa
,aa_desc
,fund
,gl
,gl_desc
,llll
,llll_desc
,o
,obbb
,obbb_desc
,pp
,ppss
,pprr_desc
,s
,s_desc
-- expressions MUST match dim.Account's computed columns exactly
,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT(
ISNULL([4444], ''), '||'
,ISNULL(aa, ''), '||'
,ISNULL(fund, ''), '||'
,ISNULL(gl, ''), '||'
,ISNULL(llll, ''), '||'
,ISNULL(o, ''), '||'
,ISNULL(obbb, ''), '||'
,ISNULL(pp, ''), '||'
,ISNULL(ppss, ''), '||'
,ISNULL(s, '')))) AS hash_key
,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT(
ISNULL([4444_desc], ''), '||'
,ISNULL(aa_desc, ''), '||'
,ISNULL(gl_desc, ''), '||'
,ISNULL(llll_desc, ''), '||'
,ISNULL(obbb_desc, ''), '||'
,ISNULL(pprr_desc, ''), '||'
,ISNULL(s_desc, '')))) AS hash_diff
INTO #src
FROM src_cast;
-- index the staged rows: read 3 times on hash_key below
CREATE CLUSTERED INDEX CIX_src ON #src (hash_key);
/*------------------------------------------------------------
Steps 2 to 4 mutate the dimension.
Run them atomically so a failure can never leave a key with zero current rows.
------------------------------------------------------------*/
BEGIN TRY
BEGIN TRANSACTION;
/*--------------------------------------------------------
Step 2. Expire current rows whose identity still exists in the source but whose description set changed.
--------------------------------------------------------*/
UPDATE d
SET effective_end_datetime = @end
,is_current = 0
FROM dim.Account d
INNER JOIN #src s
ON s.hash_key = d.hash_key
WHERE d.is_current = 1
AND d.id <> -1
AND s.hash_diff <> d.hash_diff;
/*--------------------------------------------------------
Step 3. Insert brand new identities and new versions of the rows just expired in Step 2.
A current row matching on both hashes is unchanged and skipped.
--------------------------------------------------------*/
INSERT INTO dim.Account (
[4444]
,[4444_desc]
,aa
,aa_desc
,fund
,gl
,gl_desc
,llll
,llll_desc
,o
,obbb
,obbb_desc
,pp
,ppss
,pprr_desc
,s
,s_desc
,effective_start_datetime
,effective_end_datetime
,is_current
)
SELECT
s.[4444]
,s.[4444_desc]
,s.aa
,s.aa_desc
,s.fund
,s.gl
,s.gl_desc
,s.llll
,s.llll_desc
,s.o
,s.obbb
,s.obbb_desc
,s.pp
,s.ppss
,s.pprr_desc
,s.s
,s.s_desc
,@now
,@future
,1
FROM #src s
WHERE NOT EXISTS (
SELECT 1
FROM dim.Account d
WHERE d.is_current = 1
AND d.id <> -1
AND d.hash_key = s.hash_key
AND d.hash_diff = s.hash_diff
);
/*--------------------------------------------------------
Step 4. Expire current rows whose identity has dropped out of the source entirely.
--------------------------------------------------------*/
UPDATE d
SET effective_end_datetime = @end
,is_current = 0
FROM dim.Account d
WHERE d.is_current = 1
AND d.id <> -1
AND NOT EXISTS (
SELECT 1
FROM #src s
WHERE s.hash_key = d.hash_key
);
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
THROW;
END CATCH;
DROP TABLE IF EXISTS #src;
END;

Full SQL Script (Including Create Table)

/*================================================================
dim.Account -- Type 2 SCD Setup + Load Procedure
* hash_key : identity columns (defines "same account")
* hash_diff : tracked description columns (drives versioning)
* filtered UNIQUE index guarantees one current row per account
* #src is indexed after load
* load runs inside a single transaction with rollback
* expired rows end one minute before the successor starts
================================================================---*/
/*----------------------------------------------------------------
1. Table
----------------------------------------------------------------*/
DROP TABLE IF EXISTS dim.Account;
CREATE TABLE dim.Account (
id INT IDENTITY(1,1) NOT NULL
,[4444] VARCHAR(10) NULL
,[4444_desc] VARCHAR(550) NULL
,aa VARCHAR(10) NULL
,aa_desc VARCHAR(550) NULL
,fund VARCHAR(10) NULL
,gl VARCHAR(10) NULL
,gl_desc VARCHAR(550) NULL
,llll VARCHAR(10) NULL
,llll_desc VARCHAR(550) NULL
,o VARCHAR(10) NULL
,obbb VARCHAR(10) NULL
,obbb_desc VARCHAR(550) NULL
,pp VARCHAR(10) NULL
,ppss VARCHAR(10) NULL
,pprr_desc VARCHAR(550) NULL
,s VARCHAR(10) NULL
,s_desc VARCHAR(550) NULL
,effective_start_datetime DATETIME2(7) NOT NULL
,effective_end_datetime DATETIME2(7) NOT NULL
,is_current BIT NOT NULL
-- identity hash: the 10 business key columns
,hash_key AS CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT(
ISNULL([4444], ''), '||'
,ISNULL(aa, ''), '||'
,ISNULL(fund, ''), '||'
,ISNULL(gl, ''), '||'
,ISNULL(llll, ''), '||'
,ISNULL(o, ''), '||'
,ISNULL(obbb, ''), '||'
,ISNULL(pp, ''), '||'
,ISNULL(ppss, ''), '||'
,ISNULL(s, '')))) PERSISTED
-- change hash: the 7 tracked description columns
,hash_diff AS CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT(
ISNULL([4444_desc], ''), '||'
,ISNULL(aa_desc, ''), '||'
,ISNULL(gl_desc, ''), '||'
,ISNULL(llll_desc, ''), '||'
,ISNULL(obbb_desc, ''), '||'
,ISNULL(pprr_desc, ''), '||'
,ISNULL(s_desc, '')))) PERSISTED
);
-- One current row per identity. Also fails the load loudly if the
-- source ever produces two current versions of the same key.
-- INCLUDE (hash_diff) covers the Step 3 existence check.
CREATE UNIQUE INDEX UX_Account_current
ON dim.Account (hash_key)
INCLUDE (hash_diff)
WHERE is_current = 1;
/*----------------------------------------------------------------
2. Default unknown row (-1)
----------------------------------------------------------------*/
SET IDENTITY_INSERT dim.Account ON;
INSERT INTO dim.Account (
id
,[4444]
,[4444_desc]
,aa
,aa_desc
,fund
,gl
,gl_desc
,llll
,llll_desc
,o
,obbb
,obbb_desc
,pp
,ppss
,pprr_desc
,s
,s_desc
,effective_start_datetime
,effective_end_datetime
,is_current
)
VALUES (
-1
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'Unknown'
,'1900-01-01'
,'9999-12-31 23:59:59.9999999'
,1
);
SET IDENTITY_INSERT dim.Account OFF;
/*----------------------------------------------------------------
3. Load procedure
----------------------------------------------------------------*/
CREATE OR ALTER PROCEDURE dim.usp_Load_Account
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
DECLARE @now DATETIME2(7) = SYSDATETIME();
DECLARE @end DATETIME2(7) = DATEADD(MINUTE, -1, @now); -- expired rows end 1 min before successor starts
DECLARE @future DATETIME2(7) = '9999-12-31 23:59:59.9999999';
/*------------------------------------------------------------
Step 1. Stage distinct source rows (outside the transaction to keep the write transaction short).
COLLATE DATABASE_DEFAULT normalizes lakehouse collation
so the hashes computed here are byte comparable with dim.Account's persisted hashes.
------------------------------------------------------------*/
DROP TABLE IF EXISTS #src;
;WITH src_cast AS (
SELECT DISTINCT
CAST([4444] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS [4444]
,CAST([4444 Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS [4444_desc]
,CAST([AA] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS aa
,CAST([AA Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS aa_desc
,CAST([Fund] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS fund
,CAST([GL] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS gl
,CAST([GL Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS gl_desc
,CAST([LLLL] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS llll
,CAST([LLLL Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS llll_desc
,CAST([O] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS o
,CAST([OBBB] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS obbb
,CAST([OBBB Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS obbb_desc
,CAST([PP] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS pp
,CAST([PPSS] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS ppss
,CAST([PPRR Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS pprr_desc
,CAST([S] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS s
,CAST([S Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS s_desc
FROM LakehouseName.dbo.[RawTable]
)
SELECT
[4444]
,[4444_desc]
,aa
,aa_desc
,fund
,gl
,gl_desc
,llll
,llll_desc
,o
,obbb
,obbb_desc
,pp
,ppss
,pprr_desc
,s
,s_desc
-- expressions MUST match dim.Account's computed columns exactly
,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT(
ISNULL([4444], ''), '||'
,ISNULL(aa, ''), '||'
,ISNULL(fund, ''), '||'
,ISNULL(gl, ''), '||'
,ISNULL(llll, ''), '||'
,ISNULL(o, ''), '||'
,ISNULL(obbb, ''), '||'
,ISNULL(pp, ''), '||'
,ISNULL(ppss, ''), '||'
,ISNULL(s, '')))) AS hash_key
,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT(
ISNULL([4444_desc], ''), '||'
,ISNULL(aa_desc, ''), '||'
,ISNULL(gl_desc, ''), '||'
,ISNULL(llll_desc, ''), '||'
,ISNULL(obbb_desc, ''), '||'
,ISNULL(pprr_desc, ''), '||'
,ISNULL(s_desc, '')))) AS hash_diff
INTO #src
FROM src_cast;
-- index the staged rows: read 3 times on hash_key below
CREATE CLUSTERED INDEX CIX_src ON #src (hash_key);
/*------------------------------------------------------------
Steps 2 to 4 mutate the dimension.
Run them atomically so a failure can never leave a key with zero current rows.
------------------------------------------------------------*/
BEGIN TRY
BEGIN TRANSACTION;
/*--------------------------------------------------------
Step 2. Expire current rows whose identity still exists in the source but whose description set changed.
--------------------------------------------------------*/
UPDATE d
SET effective_end_datetime = @end
,is_current = 0
FROM dim.Account d
INNER JOIN #src s
ON s.hash_key = d.hash_key
WHERE d.is_current = 1
AND d.id <> -1
AND s.hash_diff <> d.hash_diff;
/*--------------------------------------------------------
Step 3. Insert brand new identities and new versions of the rows just expired in Step 2.
A current row matching on both hashes is unchanged and skipped.
--------------------------------------------------------*/
INSERT INTO dim.Account (
[4444]
,[4444_desc]
,aa
,aa_desc
,fund
,gl
,gl_desc
,llll
,llll_desc
,o
,obbb
,obbb_desc
,pp
,ppss
,pprr_desc
,s
,s_desc
,effective_start_datetime
,effective_end_datetime
,is_current
)
SELECT
s.[4444]
,s.[4444_desc]
,s.aa
,s.aa_desc
,s.fund
,s.gl
,s.gl_desc
,s.llll
,s.llll_desc
,s.o
,s.obbb
,s.obbb_desc
,s.pp
,s.ppss
,s.pprr_desc
,s.s
,s.s_desc
,@now
,@future
,1
FROM #src s
WHERE NOT EXISTS (
SELECT 1
FROM dim.Account d
WHERE d.is_current = 1
AND d.id <> -1
AND d.hash_key = s.hash_key
AND d.hash_diff = s.hash_diff
);
/*--------------------------------------------------------
Step 4. Expire current rows whose identity has dropped out of the source entirely.
--------------------------------------------------------*/
UPDATE d
SET effective_end_datetime = @end
,is_current = 0
FROM dim.Account d
WHERE d.is_current = 1
AND d.id <> -1
AND NOT EXISTS (
SELECT 1
FROM #src s
WHERE s.hash_key = d.hash_key
);
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION;
THROW;
END CATCH;
DROP TABLE IF EXISTS #src;
END;

Additional Information

The post Creating a SQL Stored Procedure to Load a SCD2 appeared first on SQLServerCentral.

Read the whole story
alvinashcraft
just a second ago
reply
Pennsylvania, USA
Share this story
Delete

Learn T-SQL With Erik: Indexed View Matching

1 Share

Learn T-SQL With Erik: Indexed View Matching


Chapters

  • 00:00:00 – Introduction
  • 00:02:45 – Creating the Table with a Computed Column
  • 00:06:39 – Index View and Filter Definition
  • 00:08:17 – Non-Parallel Plan Reason
  • 00:10:34 – Scalar UDF Not Parallelizable Warning
  • 00:12:07 – No Expand Hint Effect
  • 00:12:57 – Summary and Next Video

Full Transcript

Erik Darling here with Darling Data, and in today’s video we’re going to continue on the Learn T-SQL voyage that we have started, and I’m going to talk today a little bit about indexed view matching, because SQL Server is, let’s just call it a mature, or an experienced database engine, and is quite capable, at least in Enterprise Edition, Standard Edition, you do not pay the Microsoft Friendship Tax, so you will be taxed performance-wise, but is quite capable of matching base queries to an indexed view where the syntax matches in some way between them. So, usually exactly between them, not in some way, usually pretty close to just about what you would ask for.

But that’s why, you know, it’s important when you’re, if you’re going to go down the indexed view path, and I do, I do want to say that if you’re going to start creating indexed views, you know, you should be very careful with them, you know, usually you want to limit them to just involving one table, and you want to write indexed views that are sort of a general purpose variety. And that’simo Scrum, and the more obvious, like the number of times, you want to stay in one table because that’s very common.

And you aren’t really going to send a lot of pub, or a ton of files, or not a ton of files back into the indexed. You’re going to have to deal with a diverse set of Mmmm. Repetitive Data or Web connect devices and all sorts of things, in terms of the three sections.

But a lot of this being ordered is flip side up a lot more often than previous, and I work with native comic books, of course. And I would like to turn it up a little bit now with a full-on summary of some 할게요 and that kind of, cause Iersh entrepreneurial app. CAR N stick?

store still has quite a bit of hobbling on standard edition and so on standard edition you just might need to go down the index view path. There’s also something I think to be said for having the data already pre-aggregated for you in some cases because look as powerful as columnstore is you know you still have to do the work to get the result if you have that data pre-aggregated you’re skipping a lot of work in a lot of cases but you know anyway I’m going on too long without showing you anything interesting rambling babbling like a brook let’s let’s talk about some stuff and then look at some management studio goodies down in the video description you can hire me for all sorts of helpful things for you mostly well I mean it helps both of us right like you get expert SQL Server help and or training and I get money so that’s that’s that’s that’s that’s that’s that’s that’s that’s that’s that’s that’s that’s that is a really fair deal I think you can also become a supporting member of the channel if you’re like wow Eric thanks for doing all this YouTube and for us you sure sure put in some hard work you can support this channel monetarily again links are all down down down below that’s why the fingers are pointing this way and of course if you’re just I don’t know if you’re you’re too poor for words or I don’t know maybe you just don’t like me that much you can always say well maybe someone else will like them better and you can you can like subscribe and tell a friend who might like me better right because that’s what friends are for even if I mean your friends not where I don’t know about you and me maybe someday what we have to meet and sort some things out I think if you would like absolutely free no strings attached SQL server performance monitoring I have my free open source monitoring tool available on github that’s a link but there’s a link down in the video description below if you want to check that out as well that’ll take you there you know just average ordinary everyday T-SQL collectors brings all the important performance data in you know the stuff that I would care about and look at if I were performance tuning a SQL Server on my own makes them up puts them in all pretty charts and graphs and whatnot and one thing that’s pretty cool is if you want if you optionally would like to use your robot companions to help you analyze summarize or analyze about your SQL Server performance problems you can do that there are a whole bunch of read-only built-in MCP tools that allow you to interact with your performance data and just your performance data for weeding out your problems I think when this gets released I will be in Chicago so it might even be May 7th and 8th this comes up no I think I think it’s probably maybe just the seven uh let’s see um actually maybe not I don’t know whatever uh so uh you will have missed your opportunity for pass on tour uh but you would still be able to get to go to SQL day Poland and see me so that would be cool if you did that I have an advanced T-SQL pre-con there uh I’ll be hopefully wowing my my friends from Poland and the surrounding surrounding climbs or areas or whatever uh with with my T-SQL wit and wisdom um after that uh home for a little bit uh then I will get data Saturday Croatia June 12th and 13th uh I also have an advanced T-SQL pre-con there uh while I’ll be uh wowing my Croatian friends and friends from the surrounding areas uh I hope some Hungarians show up so I can I can talk about being part Hungarian and then saying yeah okay you’re American uh I’m gonna say yeah yeah you’re right but I had a grandma and they’ll say great anyway uh after that uh presumably home for even longer and then it passed data Community Summit in Seattle Washington November 9th through 11. uh where uh undecided events will unfold uh as far as my my speaking goes but for now uh we are still Maying about being real rainy cold May people um bringing out the skull I guess uh ghosts of Springs past haunting us reminding remember remember when this used to be a beautiful time of year not this year damn it anyway um so the the thing that I want to talk about uh with index View matching is uh how like the optimizer even even if it matches your indexed view uh will always expand the view as part of query optimization it just does that um even if very early on it’s like yo index view why it matched to you it still expands that and like looks at stuff and that can have some weird impacts not only on your query plans but also on performance. I want to show you a very simple example of that with some stuff here. So like, you know, like in a lot of videos, I’ve talked about, you know, like, ah, you make a computed column, the filter definition of a filtered index can’t go on it. So you can sort of like work your way around that a little bit with index views. And I want to show you that. But I also want to sort of like prove out a little bit in the full video material. This is, of course, you know, this is I have a terrible salesman. This is, of course, part of the Learn T-SQL with Eric course. So the full like module material, I go way in depth on this with like trace flags and stuff to show you what’s going on. But in this one, I just want to show you a shortened version to hopefully whet your appetite for buying the full version. You know, it’s what you what I made it for so you can buy it, right? So let’s drop a view and a table if they exist.

And what I want to show you in this one is a simple table with something that I would normally find absolutely abhorrent and scream, holler and curse at people about. And that is this table has a computed column in it. And that computed column has a scalar user defined function in it, right?

One of the absolute worst things you can do to a table in SQL Server is this, don’t do this, do not follow this. But sometimes you have very, very big tables, that have these things in them. And sometimes you need to figure out a way to get your queries to not get beat up by them. And you put a scalar UDF in a computed column, or a check constraint, or a default constraint I learned recently from a brave and bold YouTube buddy.

It messes up parallelism for any query that touches that table. So if you have what even even a small table, and you join that small table routinely off to larger tables, where parallelism would be of some unknown benefit, you will not get it, right? Even scalar UDF inlining does not help you with computed column, check constraint or default constraint scalar UDF.

So we don’t we do not like this, we do not enjoy this. But we’ve got this situation here. And I think I created the table, I think instead of just pondering about it, that’s good. So of course, you know, we can’t create a computed column where the filter definition hits that, right? So fine.

But we can create a computed column where that’s the filter definition in our table. Because, you know, we can create a view on our table. And this can make a lot of sense and we can maybe want to try to get around that, you know, we can create in a view on our, on our on our table.

And we could even index that view. But what, you know, further annoying things in our lives is that we cannot create a filtered index on an indexed view, right? So this just extends even further into the things that don’t cooperate with each other in SQL Server.

into the index view, or rather into the view, then create the index. The thing is, you will still find the same problems unless you use a no expand hint.

And this is because SQL Server’s query optimizer, like I said before, really does expand those index view definitions. So let’s start off by looking at a select from the base table, right?

We run this, we actually, we end up using the index view, which, you know, great for us, I guess. I don’t know what that proves. Maybe that, you know, T-SQL’s, or rather SQL Server’s cost-based query optimizer really does do index view matching, but we have this warning, this non-parallel plan reason.

In earlier versions of SQL Server, you’ll just see a generic, could not generate valid parallel plan, but newer versions of SQL Server, we get like a reason, right? T-SQL user-defined functions not parallelizable. Oh boy, right?

Nuts, as they say. Even if we select directly from the view, right? And we say, SQL Server, I bet you’re not gonna do anything stupid now. SQL Server says, oh my God, you’re gonna hate me.

And we get the same thing here, right? So whether or not SQL Server does all that stuff, we still get the bad effect of the scalar UDF. However, if we add a no expand hint to this query, right?

We’re still not getting any results back. Notice, actually, one thing I should do here is I should run all three of these at the same time. Notice in the execution plans for the first two, we have a compute scalar, right?

And that compute scalar, if we’re having a very lucky day, right? Let’s see, is it gonna give it, let’s see. Well, no, it just tells us the column name. It doesn’t give us the full thing.

But the important part here, of course, is that the third query with the no expand hint does not have that compute scalar operator. There is nothing betwixt or between the stream aggregate and the clustered end.

Dot, dot, dot. What are you? I don’t know. But this query, I mean, obviously, there are no rows in the table or in the view. So it doesn’t help here.

But notice the lack of a non-parallel plan reason. It’s always difficult to prove a negative like this. But you’ll notice that we do not have anything in here that says non-parallel plan reason, the T-SQL scalar UDFs, yada, yada, right?

Whereas with these ones, of course, that thing, it does show back up. If we just tinkle back and forth, and then it’s gone again.

So the no expand hint can be quite useful beyond just getting SQL Server to directly match to an indexed view. When you query it, it can also keep SQL Server from doing some of the expansion activities that it is wont to do when you get involved, when you jump in bed with indexed views.

So anyway, that’s about good here. I hope you enjoyed yourselves. I hope you learned something. And I will see you in tomorrow’s video. Oh yeah, I did talk about that there. All right, let’s get rid of that index view.

Actually, no, there is no tomorrow’s video. I will see you in Tuesday’s office hours video. Forget. I’m taking long weekends now. It’s great. I get to rest, relax, do my hair, tidy up my face a little bit.

It’s beautiful. Drink more. It’s a good time. Anyway, goodbye.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.

The post Learn T-SQL With Erik: Indexed View Matching appeared first on Darling Data.

Read the whole story
alvinashcraft
just a second ago
reply
Pennsylvania, USA
Share this story
Delete

I Almost Did Not Publish This Book

1 Share

I Almost Did Not Publish This Book ai-nobodys-in-there-cover-800x1280 I almost did not publish this book.

Not because I doubted the ideas. Because I was not ready for anyone to see how much of it was about me. And, I suspect, about you.

For twenty-five years my job was to make slow things fast. Queries, indexes, servers. I was good at it, and I built a quiet pride on being good at it. I never imagined that a fast thing would be the one to break my heart a little.

I have written more than five thousand articles. Not one of them ever kept me awake at night. This one did, for months.

I Almost Did Not Publish This Book ai-nobodys-in-there-social-800x420

It started with a small grief

One ordinary afternoon, I watched a machine do in four seconds something that used to take me half a day. Something that, if I am honest, I used to look forward to all morning.

I waited to feel amazed. Everyone told me I was supposed to feel amazed.

Instead, something in my chest went very quiet.

It was grief. The small, embarrassing kind you feel watching someone do, without effort, the thing you loved doing slowly. And then I understood what I was actually mourning. It was not the task. It was the part of me that the task used to need.

I did not even have a word for it. So I started writing, just to find one.

Thirty essays later, I found it. The word was caring.

The machine can answer. It cannot care. It can write the sentence. It cannot want to. That part never left you, and it was never going to.

The sentence I was most afraid to write

Somewhere in the middle of the book, I typed one line, and then I had to push my chair back from the desk and just sit there for a while.

This is hard. And without the hard part, nothing works.

It is the thing none of us wants to say out loud, because it sounds ungrateful in a world this fast. But it is true. The struggle was never standing in the way of the learning. The struggle was where the learning was quietly being built, the whole time. And every time we hand the hard part to a machine because it is faster, we save an hour and give away a small piece of who we were becoming.

I was afraid of that sentence because, for one honest moment, I was not sure I still believed it. I do now. Writing this book is what gave it back to me.

Then I gave it to people I love

I sent the early pages to a few friends. Engineers, leaders, teachers. I told them I wanted feedback. The truth was smaller, and more frightening than that. I wanted to know if I was the only one who felt this way.

One of them called me and did not say hello. He was quiet for a long moment. Then, almost in a whisper, he said, “How did you know? This is exactly what I have been carrying and could never put into words.”

Another wrote back one line that I still cannot read in a steady voice. “I did not expect a book about AI to make me cry.”

That was the moment the fear left me. Because it was never only my feeling. It was ours. It is yours too. You simply have not had anyone say it back to you yet.

What is actually inside

This is not a book about prompts. It is not about tools, or tricks, or being ten percent faster. There are a thousand books for that, and most of them will be out of date by spring.

It is thirty short essays, and they ask the questions I could not stop asking in the quiet. What is your expertise worth when the answer arrives in seconds? What happens to the young ones who never get to struggle, the way struggling once shaped us? Why does easier work leave us feeling emptier instead of lighter? When everyone can make something polished, what suddenly becomes rare? And when the machine can fill the page in a blink, who is left to decide whether the page deserved to exist at all?

Each one is short enough to read between two meetings. None of them are disposable. They are about the things that do not expire. Judgment. Taste. Verification. Responsibility. The quiet pride you spent years building around being good at something hard, and are secretly afraid of losing.

That is the whole reason this book exists. Not to teach you another tool. To sit down beside you and remind you why you started.

Nobody’s in there. But we’re still in here.

If you have felt it too

If you have ever looked at a flawless answer on your screen and felt impressed and strangely hollow in the same breath. If your days have grown faster and somehow smaller. If some stubborn part of you misses the struggle you used to complain about. Then please hear this clearly.

You are not behind. You are not too sentimental. You are not the only one.

I did not write this book about you. I wrote it for you. I wrote it because I needed to read it myself, on the nights I forgot.

And if you decide not to buy it at all? No worries, truly. Every one of those thirty essays is free to read right here on my blog. Go and read one tonight. It takes a few minutes. If a single line finds the place you keep quiet, the book will be here when you are ready for the rest.

I put my whole heart into this. It is the truest thing I have ever made. If you read only one page, I hope it is the page you did not know you needed.

You can find AI: Nobody’s in There. But we’re still in here. on Amazon now, as a Kindle ebook, an audiobook, and a paperback.

AI: Nobody’s in There.
But we’re still in here.

Reference: Pinal Dave (https://blog.sqlauthority.com/), X

First appeared on I Almost Did Not Publish This Book

Read the whole story
alvinashcraft
13 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Copilot Autofix for GitHub Advanced Security for Azure DevOps

1 Share

Over the last few years, we’ve encouraged customers to move their repositories from Azure Repos to GitHub, where the newest AI-powered and agentic development experiences land first.

Migrating isn’t equally simple for everyone. A move to GitHub can range from straightforward to a multi-year program, depending on an organization’s size, customizations, compliance requirements, tooling, and industry constraints. While many customers are actively planning or running migrations today, others aren’t ready yet. They continue to rely on Azure Repos for day-to-day development.

For teams still building on Azure Repos, here’s what’s new. Copilot Autofix is available today in limited private preview for GitHub Advanced Security for Azure DevOps. To request enrollment, sign up here. Enablement is processed in waves, and it may take a few weeks before the functionality is available for your organization. We will notify each customer by email once the feature has been enabled for their organization.

This phased rollout allows us to closely monitor usage, collect feedback, and validate the experience before making the feature more broadly available.

Why Autofix

Advanced Security has been good at finding vulnerabilities. CodeQL scans your code, flags the SQL injection or the path traversal, and hands you an alert. Until now, fixing it has been the part left to you. You research the vulnerability, work out a safe change, write the patch, and open a pull request. For most teams, that’s where alerts pile up.

Autofix closes that gap. It uses the same CodeQL engine that finds a vulnerability to generate an AI-suggested fix for it, right in the Azure DevOps alert experience. You review the suggested change, edit it if you need to, and then commit it to a pull request without leaving the alert.

From scanning to remediation, on the same surface

When you open a CodeQL alert in the Advanced Security tab of your repository, you’ll see a new Generate fix button on alerts from supported rules.

copilot autofix generate fix image

Autofix gathers the surrounding code and alert context to return a suggested change automatically as a pull request.

advanced security autofix pull request image

Your usual review and build gates run on the pull request. Once it merges and the next CodeQL scan completes, the alert resolves on its own.

This works well alongside CodeQL default setup. Default setup turns scanning on without any pipeline configuration, and Autofix then turns the resulting alerts into pull requests without a manual rewrite. Together, they shorten the path from “we have a vulnerability” to “we have a fix in review.” A developer can do it in a few minutes.

What’s in the preview

Copilot Autofix at limited private preview covers:

  • All CodeQL-supported languages — C/C++, C#, Go, Java, Kotlin, JavaScript, TypeScript, Python, Ruby, and Swift
  • A curated set of CodeQL queries — the same set GitHub uses on GitHub.com, covering the highest-frequency vulnerability classes: SQL injection, cross-site scripting, path traversal, hardcoded credentials, and more
  • Backlog alerts in the Advanced Security tab for the default branch

It’s included with your GitHub Advanced Security for Azure DevOps license. Fix generation consumes AI credits from your organization’s Azure billing meter.

Billing

Each fix generation consumes tokens. These include input tokens for the code context sent to the model, output tokens for the suggested change, and cached tokens that reuse existing context.

To keep billing simple, we convert those tokens into a standard unit called a GitHub AI credit, where 1 credit equals $0.01 USD. We bill the charges to the Azure subscription linked to your Azure DevOps organization, and they appear as a separate meter in Azure Cost Management.

The cost of each fix varies with the size of the surrounding code context and the complexity of the change. So before you roll Autofix out widely, enable it on one or two repositories first and watch daily usage.

To monitor your daily charges, go to Subscription > Cost Management > Cost analysis in the Azure portal.

Getting started

Copilot Autofix is gated for limited private preview. To request enrollment, sign up here. We’ll be onboarding customers in waves over the next several weeks.

Once your organization is enrolled:

  1. Make sure Code Security and CodeQL code scanning is configured on the repository — either with default setup or by adding CodeQL tasks to your pipeline.
  2. Enable the feature at the repository-level. advanced security autofix enablement
  3. Select a CodeQL alert in Advanced Security.
  4. Select Generate fix on any alert from a supported rule.
  5. Review the suggestion, refine if needed, and merge the pull request.

Full setup and usage guidance lives in the new Fix code scanning alerts with Copilot Autofix (Preview) documentation.

What’s next

Next, we’re working on bringing Autofix to all CodeQL alerts for all branches, then enabling Autofix for all code scanning alerts.

If you want to shape our roadmap, sign up for the preview and tell us what’s working and what isn’t. We’re closing the gap on remediation, and we’d like to build the rest of it with you.

The post Copilot Autofix for GitHub Advanced Security for Azure DevOps appeared first on Azure DevOps Blog.

Read the whole story
alvinashcraft
40 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

Introducing Claude Corps

1 Share
Introducing Claude Corps
Read the whole story
alvinashcraft
55 seconds ago
reply
Pennsylvania, USA
Share this story
Delete

One Copilot to Rule Them All? Microsoft’s Unified AI Platform for Work and Life

1 Share

Key Takeaways:

  • Microsoft is unifying consumer and enterprise Copilot into one platform to reduce fragmentation and create a more seamless AI experience across work and personal use.
  • The strategy may improve product coherence and speed innovation, but it also increases the risk of deeper customer lock-in to Microsoft’s cloud, apps, identity, and AI stack.
  • For IT leaders, the big questions are governance and control: data separation, admin policies, licensing, and how to manage Copilot across both personal and business contexts.

Microsoft went all-in on One Copilot earlier this year. In March 2026, CEO Satya Nadella reorganized the company’s AI efforts, consolidating the consumer and enterprise Copilot teams under one unified leadership. That means Microsoft’s once-separate personal and business Copilot projects, like Copilot for individuals and Microsoft 365 Copilot for enterprises, are now being built as one platform with a single boss, Jacob Andreou, reporting directly to Nadella.

Microsoft’s rationale is that if Copilot is supposed to be the everyday AI assistant across everything you do, it’s better designed as a single, integrated system rather than a scattered collection of AI features.

This consolidation makes pragmatic sense for Microsoft. For one, it reduces fragmentation. Instead of two different AI assistants with separate capabilities and roadmaps, you will get one Copilot spanning Windows, Microsoft 365 apps, Teams, and personal accounts. The unified approach could also accelerate innovation by aligning every Copilot feature on one foundation, with shared AI models and design ethos.

So where’s the catch? In a word: lock-in.

One Copilot everywhere inherently rests on being deeply embedded in Microsoft’s platform (cloud, OS, apps) at every turn. It’s a good strategy for Microsoft, boosting the “platform gravity” that keeps customers orbiting their services.

But if you’re an IT leader trying to maintain flexibility, the unified Copilot approach may heighten your long-term dependency on Microsoft’s stack. The more your users rely on the seamless Copilot spanning their lives, the less room you’ll have to adopt alternate AI solutions or switch providers down the road.

In effect, Microsoft is making the case: stick with us, we’ll make AI easy everywhere. It’s a compelling pitch but be aware of the golden handcuffs.

A cautionary tale of Teams and Skype

Even as Microsoft’s Teams collaboration suite has become ubiquitous in business, it hasn’t matched the consumer reach of its predecessor Skype. Microsoft tried to collapse personal and enterprise communications into one platform, phasing out Skype in 2025 in favor of “Teams (Free) for personal use”. Even baking a “Chat with Teams” button into Windows 11 by default, which it has since removed.

But Teams hasn’t achieved the same consumer ubiquity Skype once had (the latter still boasted some 300 million monthly users as recently as 2019). This mixed track record of uniting enterprise and consumer experiences (with Teams still mainly perceived as a work app) stands as a cautionary tale for Copilot’s unification. It underscores that even sensible platform consolidation doesn’t guarantee broad adoption. Especially if consumers see the product as an enterprise tool rather than an everyday essential.

Copilot data and identity boundaries

There are also practical governance questions. When Copilot lives in both personal and business contexts, data and identity boundaries become paramount. Today, enterprise admins can control and configure Microsoft 365 Copilot or Windows Copilot separately via policies and settings (like toggling Copilot on corporate devices, for example).

Under a unified Copilot, how will Microsoft ensure corporate data stays completely separated from personal Copilot interactions? Today, this is enforced by forcing you to switch between Microsoft work and personal accounts. It provides clear separation but it doesn’t lead to an elegant user experience.

With Copilot integration pervasive, companies may need to update internal guidelines, training employees on personal vs. work usage and adjusting compliance rules for AI-generated content.

Microsoft MAI

Notably, the Copilot mega-merger coincides with another strategic shift: Microsoft’s launch of its first in-house AI models (nicknamed “MAI”) for speech-to-text, voice, and image generation. Rolled out in April 2026, these models are Microsoft’s hedge to reduce dependence on OpenAI’s tech. The move aims to give Microsoft more control over the AI stack powering Copilot for cost efficiency, scale, and customization.

In short, the unified Copilot vision is increasingly backed by Microsoft’s own AI engines, potentially making it an even more fully Microsoft-native platform going forward.

Convenience comes with deeper platform entrenchment

Unifying Copilot under one umbrella is a forward-looking bet from Microsoft. It could usher in seamless, contextual AI assistance, one that follows a user from writing a Word report to planning a family holiday, no disjointed handoffs or app-hopping needed.

For IT decision-makers and practitioners, the upside is a more coherent AI deployment (one platform to manage) and a workforce that can reap productivity gains.

Just go in with eyes open: convenience comes with deeper platform entrenchment. Now is a good time to ask questions about licensing, admin controls, and data segregation. And to fine-tune your governance policies for a world where Copilot is everywhere.

The post One Copilot to Rule Them All? Microsoft’s Unified AI Platform for Work and Life appeared first on Petri IT Knowledgebase.

Read the whole story
alvinashcraft
4 hours ago
reply
Pennsylvania, USA
Share this story
Delete
Next Page of Stories