In large database applications, it is common to encounter partial data offline and multiple database consolidation issues. For example, there is now a province-wide application, each city has deployed a separate application server and database server, each month need to aggregate all the city's data for the province's report, which is a very common problem of database consolidation. For example, we do a smartclient application, each client has an application and a database, and a central database to summarize all the client's data. Each smart client can be modified on its own database, and once the smart client connects to the network, the system will apply the data changes in the client database all to the central database, this occasionally connected application also needs the synchronization of the database.
For the applications mentioned earlier, the simplest way to synchronize is to delete the original data and then repopulate the new data, which is fine for small data volumes, but if each city has millions of tens of millions of data, It is obviously not feasible to delete the data in the Save database and populate the data in each city into the provincial database. In this case, you should use the method of tracking data changes to apply the monthly data changes for each city to the provincial database (feeling a bit like a differential backup, just recording the changes). Two scenarios for tracking data changes are available in SQL Server 2008:
- Change data capture
- Change Tracking (Chang Tracking)
Today I am mainly talking about change tracking, and change data capture will be explained later.
Enable change tracking
Change tracking is a new feature of SQL Server 2008, which is not enabled by default. Change tracking can be applied to specific tables in a specific database or even to specific columns. Change tracking does not create objects such as triggers, except that when a user adds, modifies, and deletes a table with change tracking enabled, it automatically generates a version number that records information such as the timestamp of the action, the type of operation, the primary key of the affected data, and so on. The performance impact on data operations is not significant after you enable change tracking. This information is logged to the SQL Server system tables and the system is automatically responsible for cleanup and maintenance.
To use change tracking, you need to enable the database change tracking feature and the table change tracking feature. Database change tracking can be enabled in the Properties window of the database in SSMs:
This sets the change tracking option to true to enable change tracking. The other 3 options are tracking the data automatically clean the switch and cleanup time, this automatic cleanup time must be greater than we want to synchronize the period of data, such as our data is one months synchronization, then this retention period should be greater than 31 days, if the set retention period is too short, Then our tracking data is automatically cleaned up before it can be synchronized.
This is just to enable change tracking for the database, and next is to enable change tracking for the table. Here we create a new table T1 and initialize several of the data:
CREATE TABLE T1
(
C1 INT IDENTITY PRIMARY KEY,
C2 VARCHAR () not NULL,
C3 DATETIME not NULL,
C4 VARCHAR (max)
)
GO
INSERT into T1 VALUES (' test1 ', ' 2009-1-1 ', ' Www.cnblogs.com/studyzy ')
INSERT into T1 VALUES (' test2 ', ' 2009-1-1 ', ' Www.cnblogs.com/studyzy ')
INSERT into T1 VALUES (' test3 ', ' 2009-1-2 ', ' Www.cnblogs.com/studyzy ')
Next, in SSMs, look at the Properties window for the table T1, which enables change tracking for the table in the Properties window:
The second option, "Track updated columns", indicates whether change tracking is refined to a column. For a typical table, we only need to know which rows were changed, and then update the entire row of data to the central database when merging the data, but if the table has large object columns (columns of data types such as text image varchar (max) varbinary (max) XML) , it can be very slow to update the entire row, so we can enable "track the updated columns" to record which columns are updated, so that the columns are updated directly when the data is merged.
Change tracking common functions
One of the most important things in change tracking is the version number, which is incremented from 0, and each change to the table produces a new version number. Use
SELECT change_tracking_min_valid_version (object_id (' dbo.t1 '))
The minimum version number of the T1 table can be obtained, because the change tracking is just created, so this return is 0, if we do a lot of operations, and these operations have exceeded the retention period set in the database change tracking, then the expired version will be automatically cleaned up, the minimum version is not 0 after the cleanup, Instead, the earliest version that is available is retained.
SELECT change_tracking_current_version () can get the latest version of change tracking for the current database. This is because we have not done database operations since we enabled change tracking, so we are returning 0.
Now let's insert a piece of data into the table T1 and then view the current latest version:
INSERT into T1 VALUES (' Test ', ' 2009-1-4 ', ' Www.cnblogs.com/studyzy ')
SELECT change_tracking_current_version ()--return 1
The version number returned is now 1.
Next we modify 2 data and delete 1 data, and then view the version number:
UPDATE T1 SET c3=getdate () WHERE c1<3--Affected 2 data
DELETE from T1 WHERE c2= ' test3 '--Affected 1 data
SELECT change_tracking_current_version ()--Return 3
Here we have a total of 4 data, but the version number of 3 indicates that the version number is not determined by the affected line, the update operation regardless of the impact of a lot of data (of course, this cannot be 0) version number only increased by 1.
Now that the version number is there, the next step is to query for changes in the time T1, using the table-valued Function: changetable (changes [the name of the table to query for change tracking], which version starts the change). Here to query the T1 table from 0 to now all data changes, then the corresponding query statement is:
SELECT *
From changetable (changes dbo.t1,0) as CT
The system returns the result:
Sys_change_version |
Sys_change_creation_version |
Sys_change_operation |
Sys_change_columns |
Sys_change_context |
C1 |
2 |
Null |
U |
0x0000000003000000 |
Null |
1 |
2 |
Null |
U |
0x0000000003000000 |
Null |
2 |
3 |
Null |
D |
Null |
Null |
3 |
1 |
1 |
I |
Null |
Null |
4 |
The data type, meaning, etc. of each column here are explained clearly in the online books, and I'll simply describe the table that is returned here:
In the data change operation with version number 1, a piece of data is inserted, the primary key of the data is inserted c1=4, the 2 data is updated in the operation of version number 2, respectively, the rows of C1=1 and c1=2, and a piece of data from C1=3 is deleted in the operation of version 3.
Synchronizing data based on change tracking
Now that all the changes have been queried, we can then synchronize the data based on the results of the query. For illustrative purposes, I will establish the TESTDB1 database in the same instance and initialize the T1 table to represent the central database. Then the operation of synchronizing the data should be:
--First insert the new data into the central database:
SET Identity_insert TestDB1.dbo.t1 on
INSERT into TestDB1.dbo.t1 (C1,C2,C3,C4)
SELECT t1.*
From changetable (changes dbo.t1,0) as CT
INNER JOIN T1
On CT.C1=T1.C1
WHERE Ct. sys_change_operation= ' I '
--then apply the changed data to the central database:
UPDATE TestDB1.dbo.t1
SET c2=newt1.c2,c3=newt1.c3,c4=newt1.c4
From changetable (changes dbo.t1,0) as CT
INNER JOIN Dbo.t1 as Newt1
On CT.C1=NEWT1.C1
WHERE Ct. Sys_change_operation= ' U ' and T1.C1=NEWT1.C1
--Delete the deleted data from the central database:
DELETE from TestDB1.dbo.t1
WHERE C1 in (
SELECT C1
From changetable (changes dbo.t1,0) as CT
WHERE Ct. sys_change_operation= ' D ')
This allows us to synchronize the database using change tracking. The version number of this synchronization operation is 3, this version number must be written down separately, then the next time synchronization is starting from 3 query.
Update columns with change tracking
The data update operation in the previous synchronization script is:
UPDATE TestDB1.dbo.t1
SET c2=newt1.c2,c3=newt1.c3,c4=newt1.c4
Since C4 is a large object data type, if there are a few 10 trillion or larger data stored in it, and in fact we are not updating the C4 column, then this kind of update method is a waste of time and resources. We have already enabled "track updated columns" to the T1 table, so you can update the data based on the columns that are actually updated.
Use the Change_tracking_is_column_in_mask () function to determine whether a column has changed, return 1 if a change has occurred, or 0 if no change is made. For example, whether the query C2 changes:
SELECT *, Change_tracking_is_column_in_mask (ColumnProperty (object_id (' dbo.t1 '), ' C2 ', ' ColumnId '), Sys_change_ COLUMNS)
From changetable (changes dbo.t1,0) as CT
WHERE Ct. sys_change_operation= ' U '
This returns 0 note that the C2 column is not changed, and the same method can be used to determine if the C3 column has changed.
Now that you can tell which columns have changed, you can update the column's data based on the changed columns, such as the UPDATE statement for C2:
UPDATE TestDB1.dbo.t1
SET C2=NEWT1.C2--Update C2 column
From changetable (changes dbo.t1,0) as CT
INNER JOIN Dbo.t1 as Newt1
On CT.C1=NEWT1.C1
WHERE Ct. Sys_change_operation= ' U ' and T1.C1=NEWT1.C1
and Change_tracking_is_column_in_mask (ColumnProperty (object_id (' dbo.t1 '), ' C2 ', ' ColumnId '), CT. Sys_change_columns) =1--Update only when changes occur
SQL Server 2008 New Features--change tracking