mysql行模式(ROW)主從同步測試及錯誤修複

來源:互聯網
上載者:User

測試原由

隨著PXC的逐步上線。線上資料庫的同步方式慢慢由之前的STATEMENT模式轉換到了ROW模式。由於同步方式的改變引發了一些同步問題。

測試目的

一定程度上解決ROW模式下主從同步的問題。作為以後PXC叢集down掉,人工修複的操作文檔。


測試環境

masterold02:7301

masterold03:7302

skavetest178:7303

主庫操作

          vim my.cnf 加入下一面一句

          binlog_format=ROW  資料庫binlog使用ROW模式同步

          分別賦予叢庫同步處理的使用者的許可權

grant all on *.* to okooo_rep@'192.168.%.%' identified by 'Bjfcmlc@Mhxzkhl';

flush privileges;



測試開始

測試基礎同步功能

?.讓test178作為從去同步old02的資料

CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',

MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000001',MASTER_LOG_POS=4;

          ? 查看主從狀態,我們看到很快test178就可以和old02保持一致了。

mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.8.72

Master_User: okooo_rep

Master_Port: 7301

Connect_Retry: 60

Master_Log_File: logbin.000006

Read_Master_Log_Pos: 332

Relay_Log_File: relay.000007

Relay_Log_Pos: 475

Relay_Master_Log_File: logbin.000006

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

           ?  讓test178作為從去同步old03的資料,我們看到很快test178也和old03保持一致了。

stop slave;


CHANGE MASTER TO MASTER_HOST='192.168.8.73',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7302,MASTER_LOG_FILE='logbin.000001',MASTER_LOG_POS=4;



start slave;


mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.8.73

Master_User: okooo_rep

Master_Port: 7302

Connect_Retry: 60

Master_Log_File: logbin.000005

Read_Master_Log_Pos: 332

Relay_Log_File: relay.000006

Relay_Log_Pos: 475

Relay_Master_Log_File: logbin.000005

Slave_IO_Running: Yes

Slave_SQL_Running: Yes

總結:基礎同步測試完成,說明在資料庫新搭建結束的時候資料庫中資料一致的情況下,test178可以正常的和old02和old03中任意主庫同步資料。


寫入測試

          ? 分別在old02,old03上建立新的資料庫和表

create database row_slave;


CREATE TABLE `row_test` (

`id` int(10) unsigned NOT NULL,

`hostname` varchar(20) NOT NULL default '',

`create_time` datetime NOT NULL default '0000-00-00 00:00:00',

`update_time` datetime NOT NULL default '0000-00-00 00:00:00',

PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 ;

            ? old02寫入資料

insert into row_test values(1,'old02','2013-12-11 00:00:00','2013-12-11 00:00:00');

insert into row_test values(2,'old02','2013-12-11 00:00:00','2013-12-11 00:00:00');

insert into row_test values(3,'old03','2013-12-11 01:00:00','2013-12-11 01:00:00');

insert into row_test values(4,'old03','2013-12-11 01:00:00','2013-12-11 01:00:00');

            ?查看old02,old03,test178 皆可以查出來

mysql> select * from row_test;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

+----+----------+---------------------+---------------------+

             ?old03寫入資料,此時old03主)和test178叢)在同步

insert into row_test values(5,'old03','2013-12-11 02:00:00','2013-12-11 02:00:00');

insert into row_test values(6,'old03','2013-12-11 02:00:00','2013-12-11 02:00:00');

             ?查看old03,test178 皆可查出。此時test178和 old02資料已經不一致了,叢庫比old02多出2條資料id=5,6。

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 5 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |

| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |

+----+----------+---------------------+---------------------+

             ?old02寫入資料 此時主從庫還是test178和old03在同步,和old02沒有關係

insert into row_test values(7,'old02','2013-12-11 03:00:00','2013-12-11 03:00:00');

insert into row_test values(8,'old02','2013-12-11 03:00:00','2013-12-11 03:00:00');

            ?查看 old02的binlog 來找到插入id =7,8的 pos點

cd /home/okooo/apps/tmp_slave01/logs

../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000007

# at 1399

#131211 11:36:42 server id 1287301 end_log_pos 1472 Query thread_id=5 exec_time=0 error_code=0

SET TIMESTAMP=1386733002/*!*/;

BEGIN

/*!*/;

# at 1472

# at 1529

#131211 11:36:42 server id 1287301 end_log_pos 1529 Table_map: `row_slave`.`row_test` mapped to number 33

#131211 11:36:42 server id 1287301 end_log_pos 1585 Write_rows: table id 33 flags: STMT_END_F

### INSERT INTO row_slave.row_test

### SET

### @1=7 /* INT meta=0 nullable=0 is_null=0 */

### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

# at 1585

#131211 11:36:42 server id 1287301 end_log_pos 1612 Xid = 40

COMMIT/*!*/;

# at 1612

#131211 11:36:43 server id 1287301 end_log_pos 1685 Query thread_id=5 exec_time=0 error_code=0

SET TIMESTAMP=1386733003/*!*/;

BEGIN

/*!*/;

# at 1685

# at 1742

#131211 11:36:43 server id 1287301 end_log_pos 1742 Table_map: `row_slave`.`row_test` mapped to number 33

#131211 11:36:43 server id 1287301 end_log_pos 1798 Write_rows: table id 33 flags: STMT_END_F

### INSERT INTO row_slave.row_test

### SET

### @1=8 /* INT meta=0 nullable=0 is_null=0 */

### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 03:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

# at 1798

#131211 11:36:43 server id 1287301 end_log_pos 1825 Xid = 41

COMMIT/*!*/;

DELIMITER ;

# End of log file

            ?改變test178的同步點和old02同步

stop slave;


CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000007',MASTER_LOG_POS=1399;



start slave;


show slave status\G

           ?發現old02資料改變以後叢庫同步了old02的資料,這時候的test178叢庫) 已經擁有全部資料了。 其中id in1,2,3,4)3庫共有的。 id in5,6 )old03專屬的  id in 7,8) odl03專屬的。      

mysql> select * from row_test;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 5 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |

| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |

| 7 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |

| 8 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |

+----+----------+---------------------+---------------------+


總結:確認叢庫表比主庫表少資料不影響新資料寫入



更新測試

            ?改變一條old02和test78都存在的資料 此時test178和old02同步資料,主從依然同步

update row_test set update_time =now() ,hostname ='old021' where id=7;

             ?改變一條old03和test178都有的資料此時test178和old02同步資料,沒有和old03同步,改變old03的資料為下面做準備

update row_test set update_time =now() ,hostname ='old031' where id=5;

            ? 查看old03的binlog,尋找要同步的POS點

../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000006

# at 1825

#131211 15:20:16 server id 1807302 end_log_pos 1906 Query thread_id=4 exec_time=0 error_code=0

SET TIMESTAMP=1386746416/*!*/;

SET @@session.time_zone='SYSTEM'/*!*/;

BEGIN

/*!*/;

# at 1906

# at 1963

#131211 15:20:16 server id 1807302 end_log_pos 1963 Table_map: `row_slave`.`row_test` mapped to number 33

#131211 15:20:16 server id 1807302 end_log_pos 2048 Update_rows: table id 33 flags: STMT_END_F

### UPDATE row_slave.row_test

### WHERE

### @1=5 /* INT meta=0 nullable=0 is_null=0 */

### @2='old03' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### SET

### @1=5 /* INT meta=0 nullable=0 is_null=0 */

### @2='old031' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 02:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 15:20:16 /* DATETIME meta=0 nullable=0 is_null=0 */

# at 2048

#131211 15:20:16 server id 1807302 end_log_pos 2075 Xid = 32

COMMIT/*!*/;

DELIMITER ;

# End of log file

             ?改變test178的同步點和old03同步

stop slave;



CHANGE MASTER TO MASTER_HOST='192.168.8.73',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7302,MASTER_LOG_FILE='logbin.000006',MASTER_LOG_POS=1825;



start slave;


show slave status\G

                ?查看test178資料,發現更新成功 確認修改不同行資料的時候,同時多個主同步資料不會相互牽制。深層理解,主從同步不會校正表資料是否一致和行資料是否一致。之後會繼續驗證這個觀點)

mysql> select * from row_test;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 2 | old02 | 2013-12-11 00:00:00 | 2013-12-11 00:00:00 |

| 3 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 4 | old03 | 2013-12-11 01:00:00 | 2013-12-11 01:00:00 |

| 5 | old031 | 2013-12-11 02:00:00 | 2013-12-11 15:20:16 |

| 6 | old03 | 2013-12-11 02:00:00 | 2013-12-11 02:00:00 |

| 7 | old021 | 2013-12-11 03:00:00 | 2013-12-11 15:15:34 |

| 8 | old02 | 2013-12-11 03:00:00 | 2013-12-11 03:00:00 |

+----+----------+---------------------+---------------------+

              ?修改在3個庫上全都有的資料 首先改old03上的 id=1的資料

update row_test set update_time =now() ,hostname ='old032' where id=1;

               ?主叢庫同步資料以後 test178和old03在同步資料

mysql> select * from row_test where id=1;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old032 | 2013-12-11 00:00:00 | 2013-12-11 15:49:53 |

+----+----------+---------------------+---------------------+

                ?修改old02上同樣的資料。

update row_test set update_time =now() ,hostname ='old022' where id=1;

                ? 查看old02上的binlog

../bin/mysqlbinlog --no-defaults --base64-output=decode-rows -v -v ./logbin.000007

# at 2075

#131211 15:51:15 server id 1287301 end_log_pos 2156 Query thread_id=9 exec_time=0 error_code=0

SET TIMESTAMP=1386748275/*!*/;

BEGIN

/*!*/;

# at 2156

# at 2213

#131211 15:51:15 server id 1287301 end_log_pos 2213 Table_map: `row_slave`.`row_test` mapped to number 33

#131211 15:51:15 server id 1287301 end_log_pos 2298 Update_rows: table id 33 flags: STMT_END_F

### UPDATE row_slave.row_test

### WHERE

### @1=1 /* INT meta=0 nullable=0 is_null=0 */

### @2='old02' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### SET

### @1=1 /* INT meta=0 nullable=0 is_null=0 */

### @2='old022' /* VARSTRING(20) meta=20 nullable=0 is_null=0 */

### @3=2013-12-11 00:00:00 /* DATETIME meta=0 nullable=0 is_null=0 */

### @4=2013-12-11 15:51:15 /* DATETIME meta=0 nullable=0 is_null=0 */

# at 2298

#131211 15:51:15 server id 1287301 end_log_pos 2325 Xid = 73

COMMIT/*!*/;

DELIMITER ;

# End of log file

ROLLBACK /* added by mysqlbinlog */;

              ?修改test178到old02的同步點主從和old02同步)

stop slave;


CHANGE MASTER TO MASTER_HOST='192.168.8.72',MASTER_USER='okooo_rep',MASTER_PASSWORD='Bjfcmlc@Mhxzkhl',MASTER_PORT=7301,MASTER_LOG_FILE='logbin.000007',MASTER_LOG_POS=2075;



start slave;


show slave status\G

                ?探索資料可以同步過來old02的資料 覆蓋了old03的資料,在一開始我們分析第一個binlog的時候就已經發現,ROW的同步是一個全行的update操作)

mysql> select * from row_test where id=1;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old022 | 2013-12-11 00:00:00 | 2013-12-11 15:51:15 |

+----+----------+---------------------+---------------------+

總結:同時多個主同步資料不會相互牽制。深層理解,主從同步不會校正表資料是否一致和行資料是否一致。ROW的同步是一個全行的update操作。屬於無腦執行,不會判斷未經處理資料內容。


刪除測試

              ?刪除test178的id=1的資料

delete from row_test where id=1;

              ?更新old02的id=1的資料主庫和old02在同步資料)

update row_test set update_time =now() ,hostname ='old023' where id=1;


mysql> select * from row_test where id=1;

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old023 | 2013-12-11 00:00:00 | 2013-12-11 16:09:12 |

+----+----------+---------------------+---------------------+

                ?在test178上看叢庫同步狀態

mysql> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.8.72

Master_User: okooo_rep

Master_Port: 7301

Connect_Retry: 60

Master_Log_File: logbin.000007

Read_Master_Log_Pos: 3078

Relay_Log_File: relay.000002

Relay_Log_Pos: 500

Relay_Master_Log_File: logbin.000007

Slave_IO_Running: Yes

Slave_SQL_Running: No

Replicate_Do_DB:

Replicate_Ignore_DB: mysql

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 1032

Last_Error: Could not execute Update_rows event on table row_slave.row_test; Can't find record in 'row_test', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log logbin.000007, end_log_pos 2549

Skip_Counter: 0

Exec_Master_Log_Pos: 2325

Relay_Log_Space: 1399

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 1032

Last_SQL_Error: Could not execute Update_rows event on table row_slave.row_test; Can't find record in 'row_test', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log logbin.000007, end_log_pos 2549

Replicate_Ignore_Server_Ids:

Master_Server_Id: 1287301

錯誤解釋:主從資料庫中表的資料不一致導致。進過上面的實驗我們發現,只有delete資料才會出現這個錯誤。

現在為止重現了schedule的PXC倒掉以後的備份庫同步失敗的現象。

總結:當資料不存在叢庫的時候,主庫的更新無法執行。


測試總結:當叢庫上表的資料和主庫不一致的時候,可以執行insert操作。update操作會把最後一次執行的記錄覆蓋到叢庫上。delete的資料如果不存在的話,則detele失敗,導致主從不同步。


修複方式

1.暴力的方法,也是對資料重要的方法


stop slave;


SET GLOBAL sql_slave_skip_counter=1; 跳過一句叢庫同步


start slave;


2.針對小量資料比較好的方式,手動修改叢庫資料。以為在上面我們知道ROW模式檢驗資料一致性,只是覆蓋資料。所以,我們只要補上缺失的資料即可。

insert into row_test values(1,'new_row',now(),now());


mysql> select * from row_test where id=1; 我們加入了一條自己編的資料 hostname=‘new_row’

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | new_row | 2013-12-11 08:49:37 | 2013-12-11 08:49:37 |

+----+----------+---------------------+---------------------+


stop slave;


start slave;


mysql> select * from row_test where id=1;  資料變成了同步以後的資料

+----+----------+---------------------+---------------------+

| id | hostname | create_time | update_time |

+----+----------+---------------------+---------------------+

| 1 | old023 | 2013-12-11 00:00:00 | 2013-12-11 16:09:12 |

+----+----------+---------------------+---------------------+


3.最保險的方式,同時也是資料量比較大的時候。我們可以找到主庫上寫入id=1的這個時間點的binlog,讓資料重頭開始同步資料。這個方式時間比較長,基本是基於時間點的增量資料恢複)


相關文章

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.