測試原由
隨著PXC的逐步上線。線上資料庫的同步方式慢慢由之前的STATEMENT模式轉換到了ROW模式。由於同步方式的改變引發了一些同步問題。
測試目的
一定程度上解決ROW模式下主從同步的問題。作為以後PXC叢集down掉,人工修複的操作文檔。
測試環境
masterold02:7301
masterold03:7302
skavetest178:7303
主庫操作
vim my.cnf 加入下一面一句
binlog_format=ROW 資料庫binlog使用ROW模式同步
分別賦予叢庫同步處理的使用者的許可權
grant all on *.* to okooo_rep@'192.168.%.%' identified by 'Bjfcmlc@Mhxzkhl';
flush privileges;
測試開始
測試基礎同步功能
?.讓test178作為從去同步old02的資料
CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',
MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000001',MASTER_LOG_POS=4;
? 查看主從狀態,我們看到很快test178就可以和old02保持一致了。
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.8.72
Master_User: okooo_rep
Master_Port: 7301
Connect_Retry: 60
Master_Log_File: logbin.000006
Read_Master_Log_Pos: 332
Relay_Log_File: relay.000007
Relay_Log_Pos: 475
Relay_Master_Log_File: logbin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
? 讓test178作為從去同步old03的資料,我們看到很快test178也和old03保持一致了。
stop slave;
CHANGE MASTER TO MASTER_HOST='192.168.8.73',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7302,MASTER_LOG_FILE='logbin.000001',MASTER_LOG_POS=4;
start slave;
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.8.73
Master_User: okooo_rep
Master_Port: 7302
Connect_Retry: 60
Master_Log_File: logbin.000005
Read_Master_Log_Pos: 332
Relay_Log_File: relay.000006
Relay_Log_Pos: 475
Relay_Master_Log_File: logbin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
總結:基礎同步測試完成,說明在資料庫新搭建結束的時候資料庫中資料一致的情況下,test178可以正常的和old02和old03中任意主庫同步資料。
寫入測試
? 分別在old02,old03上建立新的資料庫和表
create database row_slave;
CREATE TABLE `row_test` (
`id` int(10) unsigned NOT NULL,
`hostname` varchar(20) NOT NULL default '',
`create_time` datetime NOT NULL default '0000-00-00 00:00:00',
`update_time` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 ;
? old02寫入資料
insert into row_test values(1,'old02','2013-12-11 00:00:00','2013-12-11 00:00:00');
insert into row_test values(2,'old02','2013-12-11 00:00:00','2013-12-11 00:00:00');
insert into row_test values(3,'old03','2013-12-11 01:00:00','2013-12-11 01:00:00');
insert into row_test values(4,'old03','2013-12-11 01:00:00','2013-12-11 01:00:00');
?查看old02,old03,test178 皆可以查出來
mysql> select * from row_test;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
+----+----------+---------------------+---------------------+
?old03寫入資料,此時old03主)和test178叢)在同步
insert into row_test values(5,'old03','2013-12-11 02:00:00','2013-12-11 02:00:00');
insert into row_test values(6,'old03','2013-12-11 02:00:00','2013-12-11 02:00:00');
?查看old03,test178 皆可查出。此時test178和 old02資料已經不一致了,叢庫比old02多出2條資料id=5,6。
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 5 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |
| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |
+----+----------+---------------------+---------------------+
?old02寫入資料 此時主從庫還是test178和old03在同步,和old02沒有關係
insert into row_test values(7,'old02','2013-12-11 03:00:00','2013-12-11 03:00:00');
insert into row_test values(8,'old02','2013-12-11 03:00:00','2013-12-11 03:00:00');
?查看 old02的binlog 來找到插入id =7,8的 pos點
cd /home/okooo/apps/tmp_slave01/logs
../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000007
# at 1399
#131211 11:36:42 server id 1287301 end_log_pos 1472 Query thread_id=5 exec_time=0 error_code=0
SET TIMESTAMP=1386733002/*!*/;
BEGIN
/*!*/;
# at 1472
# at 1529
#131211 11:36:42 server id 1287301 end_log_pos 1529 Table_map: `row_slave`.`row_test` mapped to number 33
#131211 11:36:42 server id 1287301 end_log_pos 1585 Write_rows: table id 33 flags: STMT_END_F
### INSERT INTO row_slave.row_test
### SET
### @1=7 /* INT meta=0 nullable=0 is_null=0 */
### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
# at 1585
#131211 11:36:42 server id 1287301 end_log_pos 1612 Xid = 40
COMMIT/*!*/;
# at 1612
#131211 11:36:43 server id 1287301 end_log_pos 1685 Query thread_id=5 exec_time=0 error_code=0
SET TIMESTAMP=1386733003/*!*/;
BEGIN
/*!*/;
# at 1685
# at 1742
#131211 11:36:43 server id 1287301 end_log_pos 1742 Table_map: `row_slave`.`row_test` mapped to number 33
#131211 11:36:43 server id 1287301 end_log_pos 1798 Write_rows: table id 33 flags: STMT_END_F
### INSERT INTO row_slave.row_test
### SET
### @1=8 /* INT meta=0 nullable=0 is_null=0 */
### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
# at 1798
#131211 11:36:43 server id 1287301 end_log_pos 1825 Xid = 41
COMMIT/*!*/;
DELIMITER ;
# End of log file
?改變test178的同步點和old02同步
stop slave;
CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000007',MASTER_LOG_POS=1399;
start slave;
show slave status\G
?發現old02資料改變以後叢庫同步了old02的資料,這時候的test178叢庫) 已經擁有全部資料了。 其中id in1,2,3,4)3庫共有的。 id in5,6 )old03專屬的 id in 7,8) odl03專屬的。
mysql> select * from row_test;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 5 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |
| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |
| 7 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |
| 8 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |
+----+----------+---------------------+---------------------+
總結:確認叢庫表比主庫表少資料不影響新資料寫入
更新測試
?改變一條old02和test78都存在的資料 此時test178和old02同步資料,主從依然同步
update row_test set update_time =now() ,hostname ='old021' where id=7;
?改變一條old03和test178都有的資料此時test178和old02同步資料,沒有和old03同步,改變old03的資料為下面做準備
update row_test set update_time =now() ,hostname ='old031' where id=5;
? 查看old03的binlog,尋找要同步的POS點
../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000006
# at 1825
#131211 15:20:16 server id 1807302 end_log_pos 1906 Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=1386746416/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
BEGIN
/*!*/;
# at 1906
# at 1963
#131211 15:20:16 server id 1807302 end_log_pos 1963 Table_map: `row_slave`.`row_test` mapped to number 33
#131211 15:20:16 server id 1807302 end_log_pos 2048 Update_rows: table id 33 flags: STMT_END_F
### UPDATE row_slave.row_test
### WHERE
### @1=5 /* INT meta=0 nullable=0 is_null=0 */
### @2='old03' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### SET
### @1=5 /* INT meta=0 nullable=0 is_null=0 */
### @2='old031' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 15:20:16 /* DATETIME meta=0 nullable=0 is_null=0 */
# at 2048
#131211 15:20:16 server id 1807302 end_log_pos 2075 Xid = 32
COMMIT/*!*/;
DELIMITER ;
# End of log file
?改變test178的同步點和old03同步
stop slave;
CHANGE MASTER TO MASTER_HOST='192.168.8.73',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7302,MASTER_LOG_FILE='logbin.000006',MASTER_LOG_POS=1825;
start slave;
show slave status\G
?查看test178資料,發現更新成功 確認修改不同行資料的時候,同時多個主同步資料不會相互牽制。深層理解,主從同步不會校正表資料是否一致和行資料是否一致。之後會繼續驗證這個觀點)
mysql> select * from row_test;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |
| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |
| 5 | old031 | 2013-12-11 02:00:00 | 2013-12-11 15:20:16 |
| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |
| 7 | old021 | 2013-12-11 03:00:00 | 2013-12-11 15:15:34 |
| 8 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |
+----+----------+---------------------+---------------------+
?修改在3個庫上全都有的資料 首先改old03上的 id=1的資料
update row_test set update_time =now() ,hostname ='old032' where id=1;
?主叢庫同步資料以後 test178和old03在同步資料
mysql> select * from row_test where id=1;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old032 | 2013-12-11 00:00:00 | 2013-12-11 15:49:53 |
+----+----------+---------------------+---------------------+
?修改old02上同樣的資料。
update row_test set update_time =now() ,hostname ='old022' where id=1;
? 查看old02上的binlog
../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000007
# at 2075
#131211 15:51:15 server id 1287301 end_log_pos 2156 Query thread_id=9 exec_time=0 error_code=0
SET TIMESTAMP=1386748275/*!*/;
BEGIN
/*!*/;
# at 2156
# at 2213
#131211 15:51:15 server id 1287301 end_log_pos 2213 Table_map: `row_slave`.`row_test` mapped to number 33
#131211 15:51:15 server id 1287301 end_log_pos 2298 Update_rows: table id 33 flags: STMT_END_F
### UPDATE row_slave.row_test
### WHERE
### @1=1 /* INT meta=0 nullable=0 is_null=0 */
### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### SET
### @1=1 /* INT meta=0 nullable=0 is_null=0 */
### @2='old022' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */
### @3=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */
### @4=2013-12-11 15:51:15 /* DATETIME meta=0 nullable=0 is_null=0 */
# at 2298
#131211 15:51:15 server id 1287301 end_log_pos 2325 Xid = 73
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
?修改test178到old02的同步點主從和old02同步)
stop slave;
CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000007',MASTER_LOG_POS=2075;
start slave;
show slave status\G
?探索資料可以同步過來old02的資料 覆蓋了old03的資料,在一開始我們分析第一個binlog的時候就已經發現,ROW的同步是一個全行的update操作)
mysql> select * from row_test where id=1;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old022 | 2013-12-11 00:00:00 | 2013-12-11 15:51:15 |
+----+----------+---------------------+---------------------+
總結:同時多個主同步資料不會相互牽制。深層理解,主從同步不會校正表資料是否一致和行資料是否一致。ROW的同步是一個全行的update操作。屬於無腦執行,不會判斷未經處理資料內容。
刪除測試
?刪除test178的id=1的資料
delete from row_test where id=1;
?更新old02的id=1的資料主庫和old02在同步資料)
update row_test set update_time =now() ,hostname ='old023' where id=1;
mysql> select * from row_test where id=1;
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old023 | 2013-12-11 00:00:00 | 2013-12-11 16:09:12 |
+----+----------+---------------------+---------------------+
?在test178上看叢庫同步狀態
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.8.72
Master_User: okooo_rep
Master_Port: 7301
Connect_Retry: 60
Master_Log_File: logbin.000007
Read_Master_Log_Pos: 3078
Relay_Log_File: relay.000002
Relay_Log_Pos: 500
Relay_Master_Log_File: logbin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Could not execute Update_rows event on table row_slave.row_test; Can't find record in 'row_test', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log logbin.000007, end_log_pos 2549
Skip_Counter: 0
Exec_Master_Log_Pos: 2325
Relay_Log_Space: 1399
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table row_slave.row_test; Can't find record in 'row_test', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log logbin.000007, end_log_pos 2549
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1287301
錯誤解釋:主從資料庫中表的資料不一致導致。進過上面的實驗我們發現,只有delete資料才會出現這個錯誤。
現在為止重現了schedule的PXC倒掉以後的備份庫同步失敗的現象。
總結:當資料不存在叢庫的時候,主庫的更新無法執行。
測試總結:當叢庫上表的資料和主庫不一致的時候,可以執行insert操作。update操作會把最後一次執行的記錄覆蓋到叢庫上。delete的資料如果不存在的話,則detele失敗,導致主從不同步。
修複方式
1.暴力的方法,也是對資料重要的方法
stop slave;
SET GLOBAL sql_slave_skip_counter=1; 跳過一句叢庫同步
start slave;
2.針對小量資料比較好的方式,手動修改叢庫資料。以為在上面我們知道ROW模式檢驗資料一致性,只是覆蓋資料。所以,我們只要補上缺失的資料即可。
insert into row_test values(1,'new_row',now(),now());
mysql> select * from row_test where id=1; 我們加入了一條自己編的資料 hostname=‘new_row’
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | new_row | 2013-12-11 08:49:37 | 2013-12-11 08:49:37 |
+----+----------+---------------------+---------------------+
stop slave;
start slave;
mysql> select * from row_test where id=1; 資料變成了同步以後的資料
+----+----------+---------------------+---------------------+
| id | hostname | create_time | update_time |
+----+----------+---------------------+---------------------+
| 1 | old023 | 2013-12-11 00:00:00 | 2013-12-11 16:09:12 |
+----+----------+---------------------+---------------------+
3.最保險的方式,同時也是資料量比較大的時候。我們可以找到主庫上寫入id=1的這個時間點的binlog,讓資料重頭開始同步資料。這個方式時間比較長,基本是基於時間點的增量資料恢複)