Xtrabackup a serious business failure caused by a backup trap

Source: Internet
Author: User

First, scene: Wake up at 7 o ' clock in the morning, saying that there were a large number of business orders were unsuccessful at night, Web server escalated many SqlException, suspected to be related to the newly deployed Xtrabackup backup script. Production system environment is REDHAT5.8,MYSQL version 5.6,xtrabackup version is 2.2.10


Second, the analysis process:

1, on the DB a look, the backup time is really unusual, 1 o'clock crontab, run until 6:55 to complete:

Innobackupex:backup created in directory '/backup/inc_1 '
150825 06:55:57 innobackupex:connection to database server closed
150825 06:55:57 innobackupex:completed ok!


2, before the test, with the production of the same size of the database (30G), fully prepared in half an hour can be completed, incremental backup of about 5 minutes to complete, why this backup consumed nearly 6 hours? Is it a lock table? Should not ah, all the information on the internet said Xtrabackup backup process is absolutely not lock InnoDB table, will only lock MyISAM table, and the business table is all InnoDB, where is the problem? Go ahead and look at the backup log:

150825 01:04:09 innobackupex:continuing after ibbackup have suspended
150825 01:04:09 innobackupex:executing FLUSH TABLES with READ LOCK ...
>> Log scanned up to (547712171257)
>> Log scanned up to (547712171257)
>> Log scanned up to (547712171257)

。。。。。。

A bunch of log scanned up to

150825 06:55:44 innobackupex:all tables locked and flushed to disk
150825 06:55:44 innobackupex:starting to backup Non-innodb tables and files

。。。。。。

150825 06:55:57 innobackupex:all Tables Unlocked


Obviously, Xtrabackup was blocked while executing "FLUSH TABLES with READ LOCK" and was blocked until 06:55 to execute successfully. I didn't understand the power of "FLUSH TABLES with READ LOCK" before, and the execution of this thing would happen in two ways:

1), quickly execute the success, this way the subsequent backup can be completed faster. The problem is that "FLUSH TABLES with READ lock" to "all TABLES unlocked (unlock TABLES)" is bound to lock the table, whether you are a InnoDB table or a MyISAM table. As can be seen from the above log, the lock for 13 seconds, this 13 seconds, all DML statements will be blocked.


2), unable to execute the success quickly, so the disaster came, that is, the cause of the failure occurred. What will block the execution of "FLUSH TABLES with READ LOCK"? You can do the next experiment to confirm

A, long query before the run is not finished

650) this.width=650; "src=" Http://s3.51cto.com/wyfs02/M00/72/26/wKioL1XdyMmS1OdOAAJwS7lsKBU012.jpg "title=" Lock1.jpg "alt=" Wkiol1xdymms1odoaajws7lskbu012.jpg "/>


As you can see, a long query blocks the "FLUSH TABLES with READ LOCK".


B. Any tables are explicitly locked before running

For any table, such as T1, execute LOCK table T1 read, and then execute "FLUSH TABLES with Read LOCK" in the other session, blocking. If you write to T1 lock table T1 write, the "FLUSH TABLES with READ lock" is blocked.


Whatever the reason, "FLUSH TABLES with READ LOCK" is blocked, the result is very sad. Not only are the DML statements of all libraries blocked, but queries can be problematic for long queries or explicitly locked databases.


Third, DB log analysis

You can verify the previous analysis from the slow query log:

1. Orders

# time:150825 6:55:57
# [email protected]: Root[root] @ [172.17.5.15] id:9922648
# query_time:1534.181557 lock_time:0.000000 rows_sent:0 rows_examined:0
SET timestamp=1440456957;
INSERT into ' globallink_g '. ' G_order ' (' id ', ' userid ', ' create_date ', ' modify_date ', ' amount_paid ', ' order_status ') VALUES (234977, ' Test ', ' 2015-08-25 06:36:39 ', ' 2015-08-25 06:36:42 ', 232, 1);

Such a simple SQL, executed for nearly half an hour, and did not execute at 6:55:57, this is exactly the backup completed (Unlocked moment)


2. Report

# time:150825 6:55:58
# [email protected]: Root[root] @ [192.168.1.200] id:9921401
# query_time:4173.803182 lock_time:0.000081 rows_sent:1 rows_examined:9742
SET timestamp=1440456958;
Call G_getactiveimsi (' 201508250551 ', ' 10008 ');


This is also blocked for 1 hours, after the backup is completed.


Iv. Summary

1, the Production system database architecture is the traditional ha mode, that is, the data directory in the shared magnetic array, cluster software monitoring MySQL process, not the popular MySQL master-slave architecture, if the Master-slave architecture, can be directly backed up from the library, This will not have any effect on the main library.


2, "Xtrabackup full backup does not lock InnoDB table, only lock MyISAM table" This argument has a problem. After testing, in the Innobackupex command to add a--no-lock parameter, it really does not lock the table, but the use of this parameter has several premises.

The official explanations are as follows:

Use the This option to the Disable table lock with the FLUSH TABLES with READ lock. Use it only if all your tables is InnoDB and you don't care about the binary log position of the backup. This option shouldn ' t is used if there is any DDL statements being executed or if any updates is happening on Non-innod B tables (this includes the system MyISAM tables in the MySQL database), otherwise it could leads to an inconsistent backup . If you is considering to use--no-lock because your backups is failing to acquire the lock,this could is because of Inco Ming replication events preventing the lock from succeeding. Please try using
--safe-slave-backup to momentarily stop the replication slave thread, this could help the backup to succeed and your then don ' t need to resort to using the This option. Xtrabackup_binlog_info is isn't created When–no-lock option is used (because SHOW MASTER STATUS could be inconsistent), but u NDEr certain conditions Xtrabackup_binlog_pos_innodb can be used instead to get consistent binlog coordinates as described In working with Binary Logs.


3, this problem in the business is not busy system, may not be easy to appear. 1 o ' Night, lock table lock 10 seconds, the problem is not big.

If you are running in a busy system, or if the backup is just working with a heavy task (a day knot, and so on), then Xtrabackup is likely to cause a business failure by locking the business table for a long time.


This article is from "Memory Fragment" blog, declined reprint!

Xtrabackup a serious business failure caused by a backup trap

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.