Change tracking used to be a nightmare. It involved creating tables to keep track of inserts, updates and deletes, triggers to keep these tables up to date, and a whole lot of logic to perform cleanups on these tables. This usually (read: always) has the result of turning your nicely designed database into something that looks awful. Enter SQL 2008.
SQL Server 2008 shipped with a wonderful feature called SQL Change Tracking which allows applications to keep track and take advantage of change tracking information without having to create and manage tables and triggers. All the hard stuff is done for free, meaning that the developers can focus on application features rather than background tasks. It’s not entirely for free though.
While SQL Change Tracking is a great product, you will probably want to use it in conjunction with a tool such as the Sync Services for ADO.Net which unfortunately only supports SQL CE at the moment. There is a SQL Express client sample, but there’s no official support for it yet. Hence this post. Over the next few blog posts I’ll be documenting the process involved in creating a SQL Express Change Tracking Framework using Sync Services for ADO.Net.
Why use SQL Change Tracking?
Change tracking allows applications to (just about) seamlessly cater for occasionally connected clients or to allow for replication where a full replication solution may not be possible or required. In the instance which prompted this series of posts, I need to synchronise data from various client databases to a central data warehouse. Sounds simple, but the client databases are all running SQL Express which means no SQL Agent meaning no SQL Log Shipping.
Change tracking not only keeps a record of data changes, but it also keeps it all nicely versioned which was always a big problem with custom change tracking solutions. Each time a client queries the change tracking records, the client would specify which version of the data it has would be able to see all the changes since that version.
Back to the current series:
The requirement was for one central data warehouse to house all the data from all the clients. Each client would only ever have a subset of the data based on some or other filter. Data could also be deleted from the client, but the data warehouse should retain all data indefinitely. Some of the tables needed to be bi-directional, some download only, some upload only. The solution also needed to work regardless of whether the client was connected 24/7, once a day, once a week, once a month, etc. Again, all of the requirements would have been met by LOG Shipping or some other form of replication, but the client databases weren’t going to be big enough to warrant a full version of SQL.

We eventually decided on using SQL Change Tracking to keep track of the changes at the data level and to use Sync Services and WCF to communicate the changes and versions between the data warehouse and the client DBs.
We have a working solution in place, but it’s definitely not scalable or adaptable to different environments or changing requirements. Hence the series. I’m going to use this series of articles to document the process I followed to convert our current solution into a decent framework which can be used to synchronise data between different types of data stores (I don’t want to be limited to SQL databases), from different environments with different requirements to a central location.
There were a couple problems we experienced with our initial implementation (besides the lack of proper architecture):
- Currently Sync Services only supports CE, not SQL Express
- Large tables needed to be synced in batches
- Self-Joins on tables doesn’t work
There are others, but these are the biggies which spring to mind at the moment.
Why SQL Express?
Sync Services supports SQL CE which doesn’t always provide a workable solution. Some of the shortcomings of SQL CE include:
- Lack of full security management
- No stored procedures, triggers or views
- No user defined types
There are others, but these are the typical deal-breakers for me. And yes, there are work-arounds for most of them but I don’t think the fact that you can get by without these features is any excuse that you should.
Just for completeness, we did try and run the client apps using SQL CE but the performance was completely abysmal. That and the lack of remote connections meant it wasn’t a workable solution at all.
How does it work?
In order to use change tracking it needs to be enabled at both the database and the table level. Enabling change tracking at the DB level is as easy as:
1: ALTER DATABASE ChangeTrackingDemo SET CHANGE_TRACKING = ON
2: (
3: CHANGE_RETENTION = 31 DAYS,
4: AUTO_CLEANUP = ON
5: )
There are 2 optional parameters:
- CHANGE_RETENTION (Default: 2 days) – specifies how long change tracking data must be stored for. The longer you need to store the information, the more space you’ll use. Keep in mind that if a client tries to synchronise with the server and the client hasn’t synced within the amount of time stored, the client won’t be able to sync. E.g. A client syncs on the 1st of the month and again on the 10th. If the server only stores change tracking data for 5 days the client won’t be able to sync.
- AUTO_CLEANUP (Default: On) – determines whether or not the change tracking data must be cleared automatically after the retention period has passed. You may want to put this off if your clients are having problems with connecting or synching and you need to keep the change tracking data for longer than the initial retention period (I haven’t yet worked out how to clear the change tracking data manually without truncating the table, so please let me know if you know a way).
Now that change tracking is enabled at the DB level we need to enable it at the table level. Enable change tracking on a table like so:
1: ALTER TABLE TableToBeTracked ENABLE CHANGE_TRACKING WITH
2: (
3: TRACK_COLUMNS_UPDATED = OFF
4: )
One optional parameter TRACK_COLUMNS_UPDATED (Default: Off) which records the columns which were updated on the table. This adds additional overhead and isn’t terribly useful unless you have really large rows in your table and want to be sure to only sync the columns that have changed.
Note that change tracking cannot be enabled for a table without a primary key.
So that’s it, you now have a database which supports change tracking and a table which is being tracked.
As soon as you enabled change tracking on the table, an internal table was created to record the changes to the data. This table would be called sys.change_tracking_<object id of table being tracked>. The size of this table depends on how long you’re keeping the change data for as well as whether the column mask is being recorded. To see the current size of the table you can use the sp_spaceused store proc.
How do we use it?
Now that we’ve enable change tracking let’s play around with some data to check the changes.
There are a few functions which we’re going to use frequently when dealing with change tracking:
- CHANGE_TRACKING_CURRENT_VERSION
- CHANGE_TRACKING_MIN_VALID_VERSION
- CHANGETABLE
CHANGE_TRACKING_CURRENT_VERSION is a database wide function which returns the version of the last committed transaction. Every transaction that is committed on a change tracking enabled table will affect the result of this function.
CHANGE_TRACKING_MIN_VALID_VERSION accepts as a parameter the object ID of a table. The result is the minimum change tracking version stored for the table. These 2 functions give us the range of changes stored. If the current version of the database is 100,000 and the min version for a table is 45,000 then all clients which have versions greater than 45,000 of the table in question can sync. If the client’s version is less than 45,000 the client won’t be able to sync.
The CHANGETABLE function can be used to get a list of all the changes made to the table when compared to a version. To do this you would specify that you want the changes to a particular table made since a particular version, like so:
1: SELECT
2: *
3: FROM
4: CHANGETABLE(CHANGES TableToBeTracked, @CurrentVersionNumber) ct
You’ll notice that the function is a table valued function and that it has to be aliased.
Also, prefixing the table name is the keyword CHANGES. This specifies that you want a list of all the changes made to the table. The result
You can also pass in the keyword VERSION there which will change the behaviour of the function to accept different parameters and to return the change tracking info for a specific row in the table. If you use this version of the function you have to specify the primary key columns for the table as well as values for the primary key. E.g.
1: SELECT
2: *
3: FROM
4: CHANGETABLE(VERSION TableToBeTracked, (IDTableToBeTracked), (1)) ct
This returns the change tracking version information about the row with the PK (IDTableToBeTracked) equal to 1. This will allow you to see the version of the row in question. You can use this to determine what effect the changes on the row had on the version of the db. You can use this to check if a client and a server have both made changes to the same row.
Tying it together
Let’s take these functions and try and do something useful. Let’s assume that a client has synced with the server at version 100, has gone away for a while and come back when the server was at version 120. How does the client go about trying to figure out what data has changed?
The easiest way would be to query the CHANGETABLE function and specify 100 as the current version number. This would give us all the operations that have occurred on that table since version 100.
1: SELECT
2: ct.SYS_CHANGE_VERSION,
3: ct.SYS_CHANGE_OPERATION,
4: ct.IDTableToBeTracked
5: FROM
6: CHANGETABLE(CHANGES Table2, 100) ct
Which results in:
We now know all the ID’s which have been changed since version 100.
The SYS_CHANGE_OPERATION column tells us what the operation was (I = Insert, U = Update, D = Delete). Careful though, the delete operations can occur on rows that were created after the previous sync and can therefore be ignored as we’re going to look at the data as it is now. So if we change our query to this:
1: SELECT
2: ct.SYS_CHANGE_VERSION,
3: ct.SYS_CHANGE_OPERATION,
4: ct.IDTableToBeTracked,
5: -- Select columns here
6: Table2.Data
7: FROM
8: CHANGETABLE(CHANGES Table2, @PreviousSyncedVersion) ct
9: LEFT JOIN Table2 ON Table2.IDTableToBeTracked = ct.IDTableToBeTracked
10: WHERE
11: CASE ct.SYS_CHANGE_OPERATION
12: WHEN 'D' THEN COALESCE(ct.SYS_CHANGE_CREATION_VERSION, @PreviousSyncedVersion)
13: ELSE @PreviousSyncedVersion
14: END <= @PreviousSyncedVersion
We get the following results:
This tells us which rows have changed since the last sync and gives us the current values of the table.
These queries should probably be done in a single transaction set at snapshot isolation so that you can be sure that the data you’re working with isn’t changing while you’re processing it.
Round-up
We’ve taken a look at the basics of change tracking from a db perspective, why we’d want to use it as well as how to query the underlying change tracking tables but still have no idea of how to pull the pieces together to make a sync framework.
Next time
Next time we’ll be taking a look at tying the SQL change tracking into a simple app using Sync Services to get the data to synchronise between a client and server database.
Stay Tuned!