Being able to track when data is added, modified or removed from a SQL Server database is desirable for several potential business scenarios. Where things start to get tricky is in identifying a solution that runs efficiently and also only takes effect against data that has actually changed. Whereas a decade ago, you would have to configure a complex solution involving tables, functions and other sorts of wizardry, these days we are truly blessed. This state of affairs is thanks, in no small part, to the capabilities on offer as part of SQL Server change tracking. There’s a good chance if you are a seasoned SQL Server professional that you are aware of this feature. But if you are coming into the world of SQL Server for the first time, then the change tracking feature may tick a few boxes for you. It provides an easy to configure and scalable solution to detect which tables and columns have changed over time, allowing you then to kick off any appropriate business logic. In this week’s post, I’m going to provide an overview of how to set up change tracking functionality within an Azure SQL Database. From there, we’ll take a look at how you can then leverage this as part of an example scenario.
Setting up your Database
Change Tracking must be enabled at the database level; it is not a server-level property that is cascaded down to all databases. As part of this, you will need to consider the following:
- Do you want SQL Server to manage the removal of change tracking history automatically?
- How long do you want any changes tracked within the database?
The answers to these questions will be dictated largely by the total amount of data in your database. Having change tracking enabled to retain data indefinitely could have adverse impacts on your storage costs over time. In most cases, I would suggest that a retention period of between 7 and 14 days, working on the basis that you are querying for changes at least once per day. This length will provide you with an appropriate buffer, should any problems occur. The following script will enable Change Tracking for the CT-Test database, with a 7-day retention period:
--Enable Change Tracking, with auto-cleanup and retention period of 7 days. ALTER DATABASE [CT-Test] SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON); GO
You can then verify that this has taken effect successfully by running the following query and confirming you get a result back:
--Verify that the database has been enabled for Change Tracking SELECT d.name, ctd.is_auto_cleanup_on, ctd.retention_period, ctd.retention_period_units_desc FROM sys.change_tracking_databases AS ctd INNER JOIN sys.databases AS d ON ctd.database_id = d.database_id; GO
If you are working with a SQL Server database project, then you would instead add the following lines into your .sqlproj file, within the node:
It’s also worth noting that you have several different frequency options when defining your retention periods, such as days (as indicated above), hours and minutes.
Configuring your Tables
Enabling change tracking at a database level will not automatically apply this to each of your tables (thankfully!). You must instead enable these individually. For example, let’s assume we have the following simple table:
CREATE TABLE [dbo].[MyTable] ( [ID] INT IDENTITY(1,1) PRIMARY KEY NOT NULL, [Name] NVARCHAR(50) NULL, [Birthday] DATETIME NULL, [FavouriteCake] NVARCHAR(50) );
A very concise ALTER TABLE statement will enable change tracking for this table:
ALTER TABLE [dbo].[MyTable] ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON); GO
An important thing to point out here is that your table must have a Primary Key defined for it. If not, then you will likely get the following error when creating your table:
An additional option you have available here is to track which columns have changed as part of an UPDATE operation. While this may introduce additional storage/performance overhead into your solution, it can be useful for particular auditing scenarios.
Querying Changed Data
The CHANGETABLE() table function is a central component when working with change tracking. Through this, you can determine what data changed at a specific version of its history. SQL Server provides two additional functions, to help you investigate this further:
- CHANGE_TRACKING_CURRENT_VERSION(): This will return the database current change tracking version.
- CHANGE_TRACKING_MIN_VALID_VERSION(): This will return the earliest possible change version that is queryable, for the supplied table object. Availability is dictated by the retention and AUTO_CLEANUP settings defined earlier.
Therefore, you can use the following query to check for changes to the MyTable object, based on supplying the current change tracking version:
DECLARE @changeTrackingMinimumVersion INT = CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.MyTable')); SELECT * FROM CHANGETABLE(CHANGES [dbo].[MyTable], @changeTrackingMinimumVersion) AS CT;
When using the CHANGETABLE() function, you must ensure that it is aliased properly, as indicated above. Otherwise, you may start getting errors similar to the below:
While this is all well and good, there is no way to persist the current change tracking version for a particular table. This fact is important, as you may have to rely on this value as a reference point as part of a synchronisation task. What’s more, some additional work is required to facilitate this process from start to finish. A potential solution to get around this involves the following steps:
- Setting up a table that records the last Change Tracking version for the table, based on when data was last synchronised.
- Building a stored procedure that will update the change tracking table with the latest version.
- Defining a query that will grab the latest changes and move the data into a staging table (for example purposes, this will exist in the same database)
- Writing a MERGE statement that will combine the changes from the staging table into the new table and insert/update/delete records accordingly.
With a bit of tinkering, we can achieve all of the above by executing the following SQL statements
--First, we setup our change tracking logging table --The constraints are optional, but can help to enforce data integrity --We also initialise the table based on the current change tracking version in the DB CREATE TABLE [dbo].[ChangeTrackingVersion] ( [TableName] VARCHAR(255) NOT NULL, CONSTRAINT CHK_TableName CHECK ([TableName] IN ('MyTable')), CONSTRAINT UC_TableName UNIQUE ([TableName]), [SYS_CHANGE_VERSION] BIGINT NOT NULL ); INSERT INTO [dbo].[ChangeTrackingVersion] VALUES ('MyTable', CHANGE_TRACKING_CURRENT_VERSION()); GO --Next, we create our staging tables and table where data will be merged into --In this case, we want to aggregate cake preferences based on birthday and remove any personally identifiable information --These would typically sit in a seperate database and be moved across via SSIS or Azure Data Factory CREATE TABLE [dbo].[Staging_MyTable] ( [ID] INT NOT NULL, [Birthday] DATETIME NULL, [FavouriteCake] NVARCHAR(50), [SYS_CHANGE_OPERATION] NCHAR(1) NOT NULL, CONSTRAINT [CHK_MyTable] CHECK ([SYS_CHANGE_OPERATION] IN ('U','I','D')) ); CREATE TABLE [dbo].[ReportingMyTable] ( [ID] INT NOT NULL, [Birthday] DATETIME NULL, [FavouriteCake] NVARCHAR(50) ); GO --Finally, we create the stored procedure that will be called as part of the query/merge operation CREATE PROCEDURE [dbo].[uspUpdateChangeTrackingVersion] @CurrentTrackingVersion BIGINT, @TableName varchar(50) AS BEGIN UPDATE [dbo].[ChangeTrackingVersion] SET [SYS_CHANGE_VERSION] = @CurrentTrackingVersion WHERE [TableName] = @TableName END; GO --With all objects created, we can now run the following query to start synchronising data. --First, run a one-off query to get all current records moved across. INSERT INTO dbo.ReportingMyTable SELECT ID, Birthday, FavouriteCake FROM dbo.MyTable; --Verify table results SELECT * FROM dbo.ReportingMyTable; --Then, make some additional data changes UPDATE dbo.MyTable SET [Birthday] = '1989-10-1' WHERE ID = 2; GO INSERT INTO dbo.MyTable VALUES ('Mary', '1991-10-11', 'Banana'), ('Jude', '1978-09-25', 'Pannacotta'); GO DELETE FROM MyTable WHERE ID = 3; GO --Now, we can run the synchronisation scripts. --First, import all data into the staging table DECLARE @lastChangeTrackingVersion BIGINT = (SELECT TOP 1 SYS_CHANGE_VERSION FROM [dbo].[ChangeTrackingVersion]), @currentChangeTrackingVersion BIGINT = (SELECT CHANGE_TRACKING_CURRENT_VERSION()); INSERT INTO [dbo].[Staging_MyTable] SELECT CT.ID, ISNULL(MT.Birthday, '') AS Birthday, ISNULL(MT.FavouriteCake, '') AS FavouriteCake, CT.SYS_CHANGE_OPERATION FROM [dbo].[MyTable] AS MT RIGHT JOIN CHANGETABLE(CHANGES [dbo].[MyTable], @lastChangeTrackingVersion) AS CT ON MT.ID = CT.ID WHERE CT.SYS_CHANGE_VERSION <= @currentChangeTrackingVersion; --Then, run a merge script, with logic in place to handle each potential record operation MERGE [dbo].[ReportingMyTable] AS target USING [dbo].[Staging_MyTable] AS source ON target.[ID] = source.[ID] --If change was an INSERT, add it to the database. WHEN NOT MATCHED BY TARGET AND source.[SYS_CHANGE_OPERATION] = 'I' THEN INSERT ([ID], [Birthday], [FavouriteCake]) VALUES (source.[ID], source.[Birthday], source.[FavouriteCake]) --If change was an UPDATE, update existing record. WHEN MATCHED AND source.[SYS_CHANGE_OPERATION] = 'U' THEN UPDATE SET target.[Birthday] = source.[Birthday], target.[FavouriteCake] = source.[FavouriteCake] --If change was a DELETE, then delete the record in target WHEN MATCHED AND source.[SYS_CHANGE_OPERATION] = 'D' THEN DELETE; GO --Finally, we update the change tracking table to record the fact that we have grabbed the latest changes DECLARE @currentChangeTrackingVersion BIGINT = CHANGE_TRACKING_CURRENT_VERSION(); EXEC [dbo].[uspUpdateChangeTrackingVersion] @currentChangeTrackingVersion, 'MyTable' --Verify the results now - should be 3 results and ID 2 should have a birthday of '1989-10-1' SELECT * FROM ReportingMyTable
With this, you can start to use change tracking to manage your end-to-end synchronisation processes between multiple databases.
Change tracking is an incredibly useful tool to have within your arsenal, particularly in the context of working with SQL Server Integration Services (SSIS) or Azure Data Factory integration tasks. When working with enormous datasets, you can build in logic to only interact with the data you need, reducing the amount of time it takes to complete these tasks. Also, if you are also running these workloads in the cloud, there is a vast cost-reduction potential as well. And, as I hope, this post demonstrated just how easy it is to get started with using change tracking. You can download the entire, end-to-end SQL script file for everything covered in this post from my GitHub page. Let me know if you have any queries in the comments below! :)