Introduction
SQL Server 2008 introduces CDC (Change Data Capture), which can record:
1. What data lines have changed
2. The history of data row changes, not just the final value.
It implements asynchronous change tracking (like transactional replication) through the job, compared to the Tracking CT, which is implemented synchronously. As a result, it has less impact on performance and does not affect transactions.
A typical application is to extract, transfer, and load data to other data sources, just like the Data warehouse in the diagram.
650) this.width=650; "Width=" 147 "height=" 244 "title=" clip_image001 "style=" border-top-width:0px;border-right-width : 0px;border-bottom-width:0px; "alt=" clip_image001 "src=" http://s3.51cto.com/wyfs02/M00/5A/16/wKioL1T2fBnS_cp_ Aabwu-ftkdc786.jpg "border=" 0 "/>
Implement
Microsoft recommends that the CDC be used in conjunction with snapshot snapshot isolation levels to avoid read-write blocking of reading change data and change data writing.
It is important to note that the snapshot isolation level has additional overhead, especially tempdb (all data changes are versioned to tempdb).
Use mastergocreate database cdctestgoalter database cdctest set allow_snapshot_isolation ongo--enable CDC on database cdct Estuse cdctestgoexec Sys.sp_cdc_enable_dbgo
When CDC is enabled, a schema called CDC and a series of system tables, SPS, and view are added. The official recommendation is not to query the system tables directly but to use the corresponding system SP/FN to obtain CDC data.
650) this.width=650; "Width=" "height=" 197 "title=" clip_image002 "style=" border-top-width:0px;border-right-width : 0px;border-bottom-width:0px; "alt=" clip_image002 "src=" http://s3.51cto.com/wyfs02/M01/5A/16/ Wkiol1t2fbninewqaacoci0qnzg012.jpg "border=" 0 "/>
System objects |
Description |
Recommended objects for use |
Cdc.captured_columns |
Returns a row for each column tracked in the capture instance |
Sys.sp_cdc_get_source_columns |
Cdc.change_tables |
Returns a row for each change table in the database |
Sys.sp_cdc_help_change_data_capture |
Cdc.ddl_history |
Returns one row for each data definition language (DDL) change made for a table with change data capture enabled |
Sys.sp_cdc_get_ddl_history |
Cdc.lsn_time_mapping |
Returns a row for each transaction that has rows in the change table |
sys.fn_cdc_map_lsn_to_time (Transact-SQL) , SYS.FN_CDC_MAP_TIME_TO_LSN (Transact-SQL) |
Cdc.index_column |
Returns a row for each indexed column associated with the change table |
Sys.sp_cdc_help_change_data_capture |
Msdb.dbo.cdc_jobs |
Store change data capture configuration parameters for capturing and purging jobs |
NA |
Cdc.<capture_instance>_ct |
The change table that is created when change data capture is enabled on the source table. The table returns one row for each insert and delete operation performed on the source table, and two rows for each update operation performed on the source table. capture_instance Format =schamename_tablename |
cdc.fn_cdc_get_all_changes_<capture_instance> , Cdc.fn_cdc_get_net_changes_<capture_instance> |
Create a test table and enable CDC for the period. Use sys.sp_cdc_enable_table to enable CDC for a table.
--create a test table for Cdcuse cdctestgocreate table TB (ID int primary key, name varchar (), Weight decimal (10,2)); Goexe CUTE [Email Protected]_schema = n ' dbo ', @source_name = N ' tb ', @role_name = null; GO
Sys.sp_cdc_enable_table also creates a capture and cleanup job for the database if the source table is the first table in the database to enable change data capture, and the database does not have a transactional publication. It sets the IS_TRACKED_BY_CDC column in the Sys.tables catalog view to 1.
The corresponding trace table cdc.dbo_tb_ct contains all the change data for the source table. It contains all of the original columns and 5 new columns, the structure
650) this.width=650; "width=" 226 "height=" 187 "title=" clip_image003 "style=" border-top-width:0px;border-right-width : 0px;border-bottom-width:0px; "alt=" clip_image003 "src=" http://s3.51cto.com/wyfs02/M02/5A/16/ Wkiol1t2fbqioireaacstdmesie160.jpg "border=" 0 "/>
Validation
When you conduct data change operations in the source table, the table cdc.dbo_tb_ct is recorded. Try it:
650) this.width=650; "width=" 470 "height=" 168 "title=" clipboard[22] "style=" border-top-width:0px;border-right-width : 0px;border-bottom-width:0px; "alt=" clipboard[22] "src=" http://s3.51cto.com/wyfs02/M01/5A/16/wKioL1T2fBrQ__ Abaacls_ooznc833.jpg "border=" 0 "/>
Why is there no data? Because the CDC was working on a job to capture change data, my agent hasn't run yet.
When enabled manually, data is available.
650) this.width=650, "width=" 632 "height=", "title=" clip_image005 "style=" border-top-width:0px;border-right-width : 0px;border-bottom-width:0px; "alt=" clip_image005 "src=" http://s3.51cto.com/wyfs02/M00/5A/1A/ Wkiom1t2ewfqptr5aaclovaer1o944.jpg "border=" 0 "/>
Meaning of the result column:
Column Name |
Data type |
Description |
__$start_lsn |
Binary (10) |
Changes the LSN of the commit. Changes committed in the same transaction will share the same commit LSN value. |
__$seqval |
Binary (10) |
Multiple changes within a transaction can occur, and this value is used to sort them. |
__$operation |
Int |
change the type of operation: 1 = delete 2 = insert 3 = Update (the captured column value is the value before the update operation). 4 = Update (the captured column value is the value after the update operation). |
__$update_mask |
varbinary (128) |
Bitmask that corresponds to one bit for each column in the source table that is tracked by the CDC. If __$operation = 1 or 2, this value sets all defined bits to 1. If __$operation = 3 or 4, only those bits that correspond to the changed column are set to 1. |
Now insert the row again, update it, and then delete the Id=1 row. To see the results again:
650) this.width=650; "Width=" 636 "height=" 298 "title=" clip_image006 "style=" border-top-width:0px;border-right-width : 0px;border-bottom-width:0px; "alt=" clip_image006 "src=" http://s3.51cto.com/wyfs02/M01/5A/1A/ Wkiom1t2ewjdp-fiaagwcnsx5me864.jpg "border=" 0 "/>
A brief description of the tracked query results: A total of 5 rows, the first and second rows are inserted data, the third row and the fourth row is the data before and after the update, and the fifth row deletes the data. The type of operation is known by the _$operation value.
Simple Application
The TB tables created in the previous article record each person's name and weight change information. Another database (table Tb_rs), which is the data source for the weight change trend report. It synchronizes data once a day and updates its own data. How to use CDC to achieve this demand?
START_LSN is logged in the CDC if you know the maximum LSN that was synchronized in TB when the Tb_rs last synchronization was completed. The next time you synchronize, you only need to synchronize the change records in the TB table that are larger than this LSN.
The problem is simple: get the maximum LSN of the last synchronization complete TB, get all the change records larger than this LSN, update tb_rs.
The SYS.FN_CDC_MAP_TIME_TO_LSN can map the time to the corresponding LSN, which is the day before.
The cdc.fn_cdc_get_net_changes_<capture_instance> can get all the Net Change records in a day.
Custom synchronization logic and statements are made by the change record.
insert into tbvalues (1, ' Ken ', 70.2), (3, ' Joe '), (4, ' Rose '), update tbset weight= 70where id=3;delete from tb where name= ' Rose ';godeclare @begin_time datetime, @end_time datetime, @begin_lsn binary (Ten), @end_lsn binary ( -) -get the intervalselect @begin_time =getdate ()-1, @end_time =getdate ();--map the time to LSN of the CDC table tbselect @begin_lsn = sys.fn_ CDC_MAP_TIME_TO_LSN (' smallest greater than or equal ', @begin_time), @end_ LSN&NBSP;=&NBSP;SYS.FN_CDC_MAP_TIME_TO_LSN (' largest less than or equal ', @end_time);- -get the net changes within the specified lsnsselect * from CDC.FN_CDC_GET_NET_CHANGES_DBO_TB (@begin_lsn, @end_lsn, ' all ');
650) this.width=650; "Width=" 520 "height=" "title=" clip_image007 "style=" border-top-width:0px;border-right-width : 0px;border-bottom-width:0px; "alt=" clip_image007 "src=" Http://s3.51cto.com/wyfs02/M02/5A/1A/wKiom1T2ewjCy_ Owaadfsmwap9o920.jpg "border=" 0 "/>
There's no record of Rose? Joe's information has been updated, how is a record?
This is because this is the net Change line, which is the meaning of the final result. Added and then deleted, without affecting the final result, so no. Updates a column of data on the same row more than once, returning only the results of the last update.
With this result, we can define the logic of synchronizing the data based on the __$operation and the actual data. Like what:
--generate Sync Statementsselect (case __$operation when 2 then's insert into tb_rs values (' +cast (ID as varchar (2) + ', ' +n Ame+ ', ' +cast (weight as varchar () + ') ' When 4 Then ' update tb_rs set name= ' +name+ ', weight= ' +cast (weight as varchar (10)) + ' Where id= ' ++cast (ID as varchar (2)) END) from CDC.FN_CDC_GET_NET_CHANGES_DBO_TB (@begin_lsn, @end_lsn, ' all ');
For the updated rows, when synchronizing the data, I want to first determine if the dequeue has been changed or changed. The changed columns need to be synchronized, not all columns. Take name as an example:
declare @begin_time datetime, @end_time datetime, @begin_lsn binary (Ten), @ End_lsn binary (; --get the intervalselect @begin_time =getdate ()-1, @end_time = GETDATE ();--map the time to lsn of the cdc table tbselect @begin_lsn = sys.fn_cdc_map_time_to_lsn (' smallest greater than or equal ', @begin_time), @end_lsn = sys.fn_cdc_map_time_to_lsn (' Largest less than or equal ', @end_time);--get the all changes within the specified lsnsselect *, (case sys.fn_cdc_has_column_changed (' Dbo_tb ', ' name ', __$update_mask) when 1 then ' Yes ' when 0 then ' No ' end) as isnameupdated,sys.fn_ Cdc_map_lsn_to_time (__$START_LSN) &NBSP;AS&NBSP;UPDATETIMEFROM&NBSP;CDC.FN_CDC_GET_ALL_CHANGES_DBO_TB (@begin_ lsn, @end_lsn, ' all') where __$operation in (3,4); go
The CDC can record not only DML operations, but also DDL operations. Query cdc.ddl_history.
650) this.width=650, "width=" 727 "height=", "title=" clip_image008 "style=" border-top-width:0px;border-right-width : 0px;border-bottom-width:0px; "alt=" clip_image008 "src=" http://s3.51cto.com/wyfs02/M00/5A/1A/ Wkiom1t2ewnjxznaaaf9utryvtw539.jpg "border=" 0 "/>
But one thing to keep in mind : The new columns can be tracked by the CDC DDL, but the new column's data changes cannot be tracked by the CDC. If you need to track it, disable CDC on the table before you enable it.
CDC Agent Job
When CDC is first enabled in the specified database and there is no transactional replication, two jobs are created for capture and cleanup:
650) this.width=650; "Width=" "height=" 107 "title=" clip_image009 "style=" border-top-width:0px;border-right-width : 0px;border-bottom-width:0px; "alt=" clip_image009 "src=" http://s3.51cto.com/wyfs02/M02/5A/16/ Wkiol1t2fb2gjl-vaabe3ioa29c323.jpg "border=" 0 "/>
The capture job is used to scan the log files and write the change records to the change table. Called Sp_mscdc_capture_job, the scan parameters and scan intervals can be set based on the actual transaction throughput of the current library, resulting in a reasonable balance between performance overhead and tracking requirements.
The cleanup job is to clean up the data in the change table by default for three days.
Therefore, it is very important to set the interval of cleanup rationally.
The related configuration for these two jobs is stored in msdb.dbo.cdc_jobs. The current default configuration
650) this.width=650; "Width=" 804 "height=" "The Title=" clip_image010 "style=" border-top-width:0px;border-right-width : 0px;border-bottom-width:0px; "alt=" clip_image010 "src=" http://s3.51cto.com/wyfs02/M00/5A/16/ Wkiol1t2fb6cg98taadmzfev5n0590.jpg "border=" 0 "/>
Summary
1. CDC is easy to use, easy to configure, and can be used in conjunction with applications such as synchronous extraction.
2. The CDC can meet most of the data audit requirements, but cannot tell you who changed the data.
3. Although the CDC is asynchronous and has a small impact on performance, it increases overhead, especially IO Read and write and capacity. When CDC is turned on, at least one additional data file write and log file write is added to each change.
This article is from "Joe TJ" blog, be sure to keep this source http://joetang.blog.51cto.com/2296191/1617163
Getting Started with SQL Server audit features: CDC (change Data Capture)