The risk of running Slave-related commands when the SlaveSQL thread is blocked

Source: Internet
Author: User
The content of this article is subject to the CC copyright agreement and can be reproduced at will. However, the original source and author information of the article and the copyright statement URL must be indicated in hyperlink form: www.penglixun.comtechdatabaseslave_ SQL _locked_bug.html is working on a batch of secondary, unexpectedly, if one thread blocks the SlaveSQL thread

The content of this article complies with the CC copyright agreement, can be reproduced at will, but must be in the form of hyperlink to indicate the original source of the article and the author information and copyright statement Web site: http://www.penglixun.com/tech/database/slave_ SQL _locked_bug.html today to do a batch of backup machine and primary key work, unexpectedly, if one thread blocks the Slave SQL thread

The content of this article complies with the CC copyright agreement, can be reproduced at will, but must be in the form of hyperlink to indicate the original source of the article and author information and copyright statement URL: http://www.penglixun.com/tech/database/slave_ SQL _locked_bug.html

When I was working on a batch of backup hosts with primary keys today, I was surprised to find that if one thread blocked Slave SQLThread Application Log, resulting in Slave SQLWhen you attempt to execute the Slave Stop command in the Locked status, the statement such as show slave status/master status must be executed to death.
The solution is to wait for the Slave SQL to be locked.Or restart the database. It has been reproduced on MySQL 5.0.68, 5.1.30/34/40.
Search for the Bug library, indeed found this bug, http://bugs.mysql.com/bug.php? Id = 56676, at least before 5.1.50.

Check the source code, mainly becauseMi-> run_lockAndLOCK_active_miProblems caused by two locks.
The slave running process is to use the start_slave_thread function to create the handler_slave_ SQL thread to poll logs, and handler_slave_ SQL calls exec_relay_log_event to apply log events, exec_relay_log_event calls apply_event_and_update_pos to read the application logs of a log event to the storage engine and update the pos information of relay-log. Finally, the XXX_log_event of different types of overload is called Based on the log type :: do_apply_event is used to actually extract logs.

The reason for Hang is as follows:
Once slave_ SQL is successfully started, it will hold the mi-> run_lock lock. mi is an instance of Master_info and records the host information, which is the content of master.info, mi-> run_lock is held, indicating that the mi Slave is running (mi is defined as Master_info *, which is also mentioned in the comment. After Multi Master is written, mi is an array, each Master can hold the lock separately, so MySQL is also doing this). Because currently only one Master is supported, the mi lock is global, that is, LOCK_active_mi. When an SQL statementSlave SQLHold mi-> run_lock, cond_wait cannot wait for the conditions to continue, so if (! SQL _slave_killed (thd, rli) statement. So stop_slave cannot be identified when it sends a kill command, so slave stop will hold the Hang. Because stop slave holds LOCK_active_mi (to disable Slave, you need to save master.info), and show slave status/show status will first do pthread_mutex_lock (& LOCK_active_mi); thus, all are blocked.
Another possible risk is that the tables_to_lock linked list of the Relay_log_info class stores the table to be locked by the Slave. If the Slave cannot continue in time, the tables_to_lock linked list cannot be cleared in time, which may cause many lock problems, it may cause large-scale blocking. The last time there was a fault, MySQL Hang died, it is likely that we skipped the copy error script show slave status and slave start/stop execution frequency is very high, when a large number of connections are required to be established between the master and slave nodes, many CPU context switches will occur, and the LOCK_active_mi lock release will not be able to keep up with the speed. Other scripts for the show slave status collection and monitoring will be quickly blocked, as a result, the tables_to_lock linked list cannot be released in time, leading to normal SQLThe execution is blocked by the lock. Because of the large amount of changes, the blocking spread rapidly, and the lock wait almost killed the database Hang.

So I would like to remind you that there are long SQL statements in SlaveOr Locked SQLIn addition to show processlist, do not run slave commands such as show slave/master status and slave stop.

Handler_slave_ SQL statement execution cycle:
03058 while (! SQL _slave_killed (thd, rli ))
03059 {
03060 thd_proc_info (thd, "Reading event from the relay log ");
03061 DBUG_ASSERT (rli-> SQL _thd = thd );
03062 THD_CHECK_SENTRY (thd );
03063
03064 if (saved_skip & rli-> slave_skip_counter = 0)
03065 {omitted
03076}
03077
03078 if (exec_relay_log_event (thd, rli ))
03079 {
03080 DBUG_PRINT ("info", ("exec_relay_log_event () failed "));
03081 // do not scare the user if SQLThread was simply killed or stopped
03082 if (! SQL _slave_killed (thd, rli ))
03083 {omitted
03144}
03145 goto err;
03146}
03147}

Show slave status command
07409 static int show_slave_running (THD * thd, SHOW_VAR * var, char * buff)
07410 {
07411 var-> type = SHOW_MY_BOOL;
07412 pthread_mutex_lock (& LOCK_active_mi );
07413 var-> value = buff;
07414 * (my_bool *) buff) = (my_bool) (active_mi &&
07415 active_mi-> slave_running = MYSQL_SLAVE_RUN_CONNECT &&
07416 active_mi-> rli. slave_running );
07417 pthread_mutex_unlock (& LOCK_active_mi );
07418 return 0;
07419}

Clear clear_tables_to_lcok of the locked table
01222 void Relay_log_info: clear_tables_to_lock ()
01223 {
01224 while (tables_to_lock)
01225 {
01226 uchar * to_free = reinterpret_cast (tables_to_lock );
01227 if (tables_to_lock-> m_tabledef_valid)
01228 {
01229 tables_to_lock-> m_tabledef.table_def ::~ Table_def ();
01230 tables_to_lock-> m_tabledef_valid = FALSE;
01231}
01232 tables_to_lock =
01233 static_cast (tables_to_lock-> next_global );
01234 tables_to_lock_count -;
01235 my_free (to_free, MYF (MY_WME ));
01236}
01237 DBUG_ASSERT (tables_to_lock = NULL & tables_to_lock_count = 0 );
01238}

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.