MySQL source code: how read/write locks are handled

Source: Internet
Author: User
Tags: semaphore

If you repost this article, please credit: Yin Feng
-----------------------------------------------------------
Recently I ran into a problem on a production machine: a thread waited on a semaphore for a long time, the mysqld monitoring thread concluded that mysqld had hung, and it deliberately crashed the server, which then restarted. This raises an interesting question: how does MySQL handle read/write locks?
This article covers four parts:
1. Creating a lock
2. Locking
3. Unlocking
4. Monitoring locks
The analysis below is based on Percona Server 5.5.18.
 
1. Create a lock
Creating a lock really means initializing an rw-lock struct (rw_lock_t). The macro that is actually called is:
 
#define rw_lock_create(K, L, level) \
    rw_lock_create_func((L), #L)
 
rw_lock_create takes three parameters, but in this configuration only the second one, L, is actually passed on.
K is a mysql_pfs_key_t, which appears to exist for the Performance Schema. level appears to be the latching-order level of the lock (the SYNC_* constants are defined in sync0sync.h).
For example, the read/write lock for the purge thread is created like this:
 
rw_lock_create(trx_purge_latch_key,
               &purge_sys->latch, SYNC_PURGE_LATCH);
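
The #L in the macro is the C preprocessor's stringification operator, so the name recorded for the lock is simply the text of the second argument, while K and level are dropped. The following is a minimal sketch of that behaviour only; sketch_lock_t, sketch_lock_create and sketch_lock_create_func are hypothetical stand-ins, not InnoDB names.

```c
/* Hypothetical stand-in for rw_lock_t: it only keeps the creation name. */
typedef struct {
    const char *cname;
} sketch_lock_t;

/* Receives the lock and the stringified argument expression. */
static void sketch_lock_create_func(sketch_lock_t *lock, const char *cname)
{
    lock->cname = cname;
}

/* K and level are accepted but ignored; #L turns the second argument
   into a string literal, mirroring the shape of the macro quoted above. */
#define sketch_lock_create(K, L, level) \
    sketch_lock_create_func((L), #L)
```

With this sketch, a call such as sketch_lock_create(0, &purge_latch, 42) stores the string "&purge_latch" in the lock, which is the kind of name that is handy in diagnostic output.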
 
Let's step into rw_lock_create_func to see how the lock is created.
The logic of this function is quite simple. The key assignment is:
lock->lock_word = X_LOCK_DECR; // key field
lock_word starts at X_LOCK_DECR, which limits the maximum number of concurrent read locks. The comment in the code reads:
 
/* We decrement lock_word by this amount for each x_lock. It is also the
start value for the lock_word, meaning that it limits the maximum number
of concurrent read locks before the rw_lock breaks. The current value of
0x00100000 allows 1,048,575 concurrent readers and 2047 recursive writers. */
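
Taking that comment together with the lock and unlock paths described in this article, lock_word can be decoded roughly as in the sketch below. This is one reading of the counter, not InnoDB code: sketch_state, sketch_decode and sketch_reader_count are hypothetical names, and the "writer waiting for readers" state is an assumption based on a writer subtracting X_LOCK_DECR while readers still hold the lock.

```c
#define X_LOCK_DECR 0x00100000L

typedef enum {
    SKETCH_UNLOCKED,        /* lock_word == X_LOCK_DECR */
    SKETCH_READ_LOCKED,     /* 0 < lock_word < X_LOCK_DECR */
    SKETCH_WRITE_LOCKED,    /* lock_word == 0 */
    SKETCH_WRITER_WAITING,  /* -X_LOCK_DECR < lock_word < 0 (assumed) */
    SKETCH_RECURSIVE_WRITE  /* lock_word <= -X_LOCK_DECR */
} sketch_state;

/* Classify a lock_word value. Values above X_LOCK_DECR are not
   expected and are not handled specially in this sketch. */
static sketch_state sketch_decode(long lock_word)
{
    if (lock_word == X_LOCK_DECR) return SKETCH_UNLOCKED;
    if (lock_word > 0)            return SKETCH_READ_LOCKED;
    if (lock_word == 0)           return SKETCH_WRITE_LOCKED;
    if (lock_word > -X_LOCK_DECR) return SKETCH_WRITER_WAITING;
    return SKETCH_RECURSIVE_WRITE;
}

/* Number of readers implied by lock_word: each reader subtracts 1. */
static long sketch_reader_count(long lock_word)
{
    if (lock_word > 0)            return X_LOCK_DECR - lock_word;
    if (lock_word > -X_LOCK_DECR) return -lock_word;
    return 0;
}
```

Since 0x00100000 is 1,048,576 and a reader count of 1,048,576 would drive lock_word to 0 (which looks write-locked), the usable maximum is 1,048,575 readers, matching the comment.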
 
When a thread tries to take the lock, rw_lock_lock_word_decr is called to decrement lock_word.
After initializing a series of other fields, rw_lock_create_func executes:
 
lock->event = os_event_create(NULL);
lock->wait_ex_event = os_event_create(NULL);
os_event_create creates an event object. Internally it initializes a mutex (os_fast_mutex_init(&(event->os_mutex));) and a condition variable (os_cond_init(&(event->cond_var));).
Finally, the lock is added to the global linked list rw_lock_list.
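
To make the "mutex plus condition variable" idea concrete, here is a minimal sketch of such an event object built directly on pthreads. It is not the InnoDB os_event implementation; sketch_event and its functions are hypothetical names, and the real structure carries more state than the single is_set flag assumed here.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, stripped-down event: a flag guarded by a mutex,
   plus a condition variable that waiters sleep on. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond_var;
    bool            is_set;
} sketch_event;

/* Allocate an event in the "not set" state; returns NULL on failure. */
static sketch_event *sketch_event_create(void)
{
    sketch_event *ev = malloc(sizeof *ev);
    if (ev == NULL) {
        return NULL;
    }
    pthread_mutex_init(&ev->mutex, NULL);
    pthread_cond_init(&ev->cond_var, NULL);
    ev->is_set = false;
    return ev;
}

/* Mark the event as set and wake every waiting thread. */
static void sketch_event_set(sketch_event *ev)
{
    pthread_mutex_lock(&ev->mutex);
    ev->is_set = true;
    pthread_cond_broadcast(&ev->cond_var);
    pthread_mutex_unlock(&ev->mutex);
}

/* Block until the event is set; returns at once if it already is. */
static void sketch_event_wait(sketch_event *ev)
{
    pthread_mutex_lock(&ev->mutex);
    while (!ev->is_set) {
        pthread_cond_wait(&ev->cond_var, &ev->mutex);
    }
    pthread_mutex_unlock(&ev->mutex);
}

/* Release the pthread objects and the memory. */
static void sketch_event_free(sketch_event *ev)
{
    pthread_cond_destroy(&ev->cond_var);
    pthread_mutex_destroy(&ev->mutex);
    free(ev);
}
```

The while loop around pthread_cond_wait guards against spurious wakeups, which is why the flag is needed in addition to the condition variable.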
 
2. Lock
The locking functions are invoked through macros. The functions actually called are:
1) Write lock
 
#define rw_lock_x_lock(M) \
    rw_lock_x_lock_func((M), 0, __FILE__, __LINE__)
 
When a write lock is requested, rw_lock_x_lock_func performs the following steps:
(1) Call rw_lock_x_lock_low to try to obtain the lock. If it succeeds, add the loop counter to the statistics (rw_x_spin_round_count += i) and return immediately. If it fails, continue.
(2) Increment rw_x_spin_wait_count. This happens only once per call, no matter how many times the loop is repeated.
(3) Spin-wait in a short busy loop until the lock looks free or the round limit is reached:
 
while (i < SYNC_SPIN_ROUNDS
       && lock->lock_word <= 0) {
    if (srv_spin_wait_delay) {
        ut_delay(ut_rnd_interval(0,
                                 srv_spin_wait_delay));
    }
    i++;
}
 
Two system variables are involved:
innodb_sync_spin_loops (SYNC_SPIN_ROUNDS)
innodb_spin_wait_delay (srv_spin_wait_delay)
 
Inside the SYNC_SPIN_ROUNDS loop, ut_delay is called. This function is very simple: it runs delay * 50 iterations of an empty loop.
 
ut_delay(ulint delay):
for (i = 0; i < delay * 50; i++) {
    j += i;
    UT_RELAX_CPU();
}
UT_RELAX_CPU() executes an assembly instruction (pause on x86) that tells the CPU the thread is in a spin-wait loop. The thread keeps the CPU while it spins instead of going to sleep.
(4) If the loop count has reached SYNC_SPIN_ROUNDS, call os_thread_yield (which actually calls pthread_yield, so the calling thread gives up the CPU). Otherwise, jump back to step (1) and continue the loop.
(5) Reserve a cell in sync_primary_wait_array by calling sync_array_reserve_cell. There appear to be 1000 cells (sync_primary_wait_array->n_cells).
(6) Call rw_lock_x_lock_low once more to try to obtain the lock. If it succeeds, return.
(7) Call sync_array_wait_event to wait on the condition variable, then jump back to step (1) and continue the loop.
The locking function itself (rw_lock_x_lock_low) is analyzed later.
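
Steps (1) to (4) can be sketched in a few lines of C11. This is a simplified illustration, not the InnoDB code: all sketch_* names are hypothetical, the value 30 for the spin rounds is an assumed setting, sched_yield stands in for os_thread_yield, and the try-lock here only succeeds when the lock is completely free. Because of that last simplification the spin condition tests "not yet free" rather than lock_word <= 0, and steps (5) to (7), the wait array, are represented only by the false return value.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define X_LOCK_DECR        0x00100000L
#define SKETCH_SPIN_ROUNDS 30u  /* assumed; plays the role of SYNC_SPIN_ROUNDS */

typedef struct {
    atomic_long lock_word;
} sketch_rw_lock;

/* Simplified try-lock: succeeds only if nobody holds the lock. */
static bool sketch_x_lock_low(sketch_rw_lock *lock)
{
    long expected = X_LOCK_DECR;
    return atomic_compare_exchange_strong(&lock->lock_word, &expected, 0L);
}

/* Busy loop of delay * 50 iterations, in the spirit of ut_delay. */
static void sketch_delay(unsigned delay)
{
    volatile unsigned j = 0;
    for (unsigned i = 0; i < delay * 50; i++) {
        j += i;
    }
}

/* Returns true if the lock was taken while spinning. Returns false
   after yielding once the round limit is hit; a real implementation
   would then reserve a wait-array cell, retry, and sleep. */
static bool sketch_x_lock_spin(sketch_rw_lock *lock, unsigned spin_delay,
                               unsigned *rounds)
{
    unsigned i = 0;
    for (;;) {
        if (sketch_x_lock_low(lock)) {          /* step (1) */
            *rounds = i;
            return true;
        }
        while (i < SKETCH_SPIN_ROUNDS           /* step (3) */
               && atomic_load(&lock->lock_word) != X_LOCK_DECR) {
            if (spin_delay) {
                sketch_delay(spin_delay);
            }
            i++;
        }
        if (i == SKETCH_SPIN_ROUNDS) {          /* step (4) */
            sched_yield();
            *rounds = i;
            return false;
        }
    }
}
```

The point of spinning first is that a short busy wait is often cheaper than putting the thread to sleep and waking it again, which is why the two system variables above are worth tuning.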
 
2) Read lock
 
#define rw_lock_s_lock(M) \
    rw_lock_s_lock_func((M), 0, __FILE__, __LINE__)
 
This function is defined in sync0rw.ic. It is also very simple:
 
if (rw_lock_s_lock_low(lock, pass, file_name, line)) {
    return; /* Success */
} else {
    /* Did not succeed, try spin wait */
    rw_lock_s_lock_spin(lock, pass, file_name, line);
    return;
}
 
Here rw_lock_s_lock_low is called to take the lock. If that fails, rw_lock_s_lock_spin is called to wait. The logic of rw_lock_s_lock_spin is similar to that of rw_lock_x_lock_func.
rw_lock_s_lock_spin calls rw_lock_s_lock_low again on every pass of its retry loop.
 
The actual lock and unlock operations appear to be controlled by a counter:
(1) In the rw_lock_s_lock_low function:
rw_lock_lock_word_decr(lock, 1) subtracts 1 from lock->lock_word.
It returns TRUE if the subtraction succeeds and FALSE otherwise.
This part of the logic is quite simple.
 
(2) In the rw_lock_x_lock_low function:
rw_lock_lock_word_decr(lock, X_LOCK_DECR) subtracts X_LOCK_DECR from lock->lock_word.
If the subtraction succeeds, the function runs:
 
rw_lock_set_writer_id_and_recursion_flag(lock, pass ? FALSE : TRUE), which, when pass is 0, sets:
lock->writer_thread = os_thread_get_curr_id()
lock->recursive = TRUE
 
Then rw_lock_x_lock_wait is called to wait until lock->lock_word == 0, that is, until all read locks have been released.
 
One interesting detail: the .ic code uses the macro INNODB_RW_LOCKS_USE_ATOMICS.
It depends on the gcc version. When it is defined, atomic operations are implemented with gcc built-in functions.
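
The counter logic can be illustrated with the gcc built-in __sync_bool_compare_and_swap. What follows is a sketch under stated assumptions, not the InnoDB source: the sketch_* names are hypothetical, recursion and the writer thread id are left out, and the rule "subtract only while lock_word is positive" is an assumption about how a failed subtraction is detected.

```c
#include <stdbool.h>

#define X_LOCK_DECR 0x00100000L

typedef struct {
    volatile long lock_word;
} sketch_rw_lock;

static void sketch_lock_init(sketch_rw_lock *lock)
{
    lock->lock_word = X_LOCK_DECR;
}

/* Atomically subtract amount, but only while lock_word > 0.
   Returns false without changing anything once lock_word <= 0. */
static bool sketch_lock_word_decr(sketch_rw_lock *lock, long amount)
{
    long local = lock->lock_word;
    while (local > 0) {
        if (__sync_bool_compare_and_swap(&lock->lock_word,
                                         local, local - amount)) {
            return true;
        }
        local = lock->lock_word;  /* lost a race: reload and retry */
    }
    return false;
}

/* Read lock attempt: one unit per reader. */
static bool sketch_s_lock_low(sketch_rw_lock *lock)
{
    return sketch_lock_word_decr(lock, 1);
}

/* Write lock attempt (non-recursive). After success the writer must
   still wait until sketch_readers_drained() becomes true. */
static bool sketch_x_lock_low(sketch_rw_lock *lock)
{
    return sketch_lock_word_decr(lock, X_LOCK_DECR);
}

/* True once every reader that held the lock has released it. */
static bool sketch_readers_drained(const sketch_rw_lock *lock)
{
    return lock->lock_word == 0;
}
```

In this sketch, once a writer has subtracted X_LOCK_DECR the word is no longer positive, so new readers and other writers fail immediately, while readers that already hold the lock can still release it.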
 
3. Unlock
Unlocking covers releasing a read lock (#define rw_lock_s_unlock(L) rw_lock_s_unlock_gen(L, 0)) and releasing a write lock (#define rw_lock_x_unlock(L) rw_lock_x_unlock_gen(L, 0)).
The functions actually called are rw_lock_s_unlock_func and rw_lock_x_unlock_func.
 
1) Releasing a read lock (rw_lock_s_unlock_func)
Increment the counter: rw_lock_lock_word_incr(lock, 1)
 
2) Releasing a write lock (rw_lock_x_unlock_func)
The function performs the following operations:
(1) If this is the last of the recursive callers holding the lock, set lock->recursive = FALSE. The comment in the code reads:
 
/* lock->recursive flag also indicates if lock->writer_thread is
valid or stale. If we are the last of the recursive callers
then we must unset lock->recursive flag to indicate that
lock->writer_thread is now stale.
Note that since we still hold the x-lock we can safely read
lock_word. */
 
(2) Increment the counter with rw_lock_lock_word_incr(lock, X_LOCK_DECR). If the result equals X_LOCK_DECR, the lock is now free, and a signal must be sent to the threads waiting for it:
 
if (lock->waiters) {
    rw_lock_reset_waiter_flag(lock);
    os_event_set(lock->event);
    sync_array_object_signalled(sync_primary_wait_array);
}
 
The os_event_set function wakes the waiting threads with pthread_cond_broadcast.
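
The release path can be sketched with the gcc built-in __sync_add_and_fetch. This is an illustration only: the sketch_* names are hypothetical, signals_sent is a made-up counter standing in for the os_event_set call, and waking a writer that is waiting for the last reader is left out because the article does not cover it.

```c
#include <stdbool.h>

#define X_LOCK_DECR 0x00100000L

typedef struct {
    volatile long lock_word;
    volatile int  waiters;       /* non-zero if some thread sleeps on the lock */
    int           signals_sent;  /* hypothetical stand-in for os_event_set */
} sketch_rw_lock;

/* Atomically add amount and return the new lock_word value. */
static long sketch_lock_word_incr(sketch_rw_lock *lock, long amount)
{
    return __sync_add_and_fetch(&lock->lock_word, amount);
}

/* Releasing a read lock gives back one unit. */
static void sketch_s_unlock(sketch_rw_lock *lock)
{
    sketch_lock_word_incr(lock, 1);
}

/* Releasing a write lock gives back X_LOCK_DECR. Returns true when the
   lock became completely free and a waiter had to be signalled. */
static bool sketch_x_unlock(sketch_rw_lock *lock)
{
    if (sketch_lock_word_incr(lock, X_LOCK_DECR) == X_LOCK_DECR) {
        if (lock->waiters) {
            lock->waiters = 0;
            lock->signals_sent++;
            return true;
        }
    }
    return false;
}
```

Note that a recursive holder releasing one of several write locks does not bring lock_word back to X_LOCK_DECR, so in this sketch no signal is sent until the outermost release.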
 
4. Monitor read/write locks
To detect the case where mysqld hangs because an rw-lock wait never ends, the error monitor thread watches for threads that have been waiting a long time. This thread loops once per second
(os_event_wait_time_low(srv_error_event, 1000000, sig_count);)
Function entry: srv_error_monitor_thread
The sync_array_print_long_waits() function handles threads that have waited on a semaphore for a long time. The process is as follows:
1. Examine all waiting threads in the sync_primary_wait_array array.
-> If a wait has lasted more than 240 seconds, write a warning to the error log and set noticed = TRUE;
-> If a wait has lasted more than 600 seconds, set fatal = TRUE;
2. If noticed is TRUE, print the InnoDB monitor output and then sleep for 30 seconds.
3. Return the value of fatal.
 
When sync_array_print_long_waits() returns TRUE, the same waiting thread gets ten more chances, so the total is roughly 600 + 1*10 seconds (the monitor thread sleeps 1 s on each loop). If the wait still has not ended, the monitor thread triggers an assertion failure:
 
if (fatal_cnt > 10) {
    fprintf(stderr,
            "InnoDB: Error: semaphore wait has lasted"
            " > %lu seconds\n"
            "InnoDB: We intentionally crash the server,"
            " because it appears to be hung.\n",
            (ulong) srv_fatal_semaphore_wait_threshold);

    ut_error;
}
 
ut_error is a macro:
 
#define ut_error assert(0)
The assertion failure is what crashes mysqld.
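
The decision logic of the monitor can be condensed into the sketch below. It is a model of the behaviour described above, not the InnoDB code: the sketch_* names are hypothetical, the wait times are passed in as a plain array instead of being read from the wait array, and resetting the counter when the waiter changes or the wait ends is an assumption drawn from the "same waiting thread" wording.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_WARN_SECS   240.0  /* warning threshold from the text */
#define SKETCH_FATAL_SECS  600.0  /* fatal threshold from the text */
#define SKETCH_FATAL_LIMIT 10     /* extra monitor passes before the crash */

typedef struct {
    bool noticed;  /* some wait exceeded the warning threshold */
    bool fatal;    /* some wait exceeded the fatal threshold */
} sketch_scan;

/* One scan over all current waits, given their durations in seconds. */
static sketch_scan sketch_scan_waits(const double *wait_secs, size_t n)
{
    sketch_scan r = { false, false };
    for (size_t i = 0; i < n; i++) {
        if (wait_secs[i] > SKETCH_WARN_SECS) {
            r.noticed = true;
        }
        if (wait_secs[i] > SKETCH_FATAL_SECS) {
            r.fatal = true;
        }
    }
    return r;
}

/* One pass of the monitor loop. Returns true when the monitor would
   intentionally crash the server. */
static bool sketch_monitor_tick(int *fatal_cnt, bool fatal, bool same_waiter)
{
    if (fatal && same_waiter) {
        (*fatal_cnt)++;
        return *fatal_cnt > SKETCH_FATAL_LIMIT;
    }
    *fatal_cnt = 0;  /* assumed: a different waiter or no fatal wait resets */
    return false;
}
```

With these values the crash is triggered on the eleventh consecutive fatal pass for the same waiter, which is where the "ten more chances" comes from.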
srv_error_monitor_thread also contains an interesting parameter, srv_kill_idle_transaction. The corresponding system variable is innodb_kill_idle_transaction, which is used to kill transactions that have been idle for too long; the variable specifies the maximum idle time for a transaction. Its implementation will be analyzed in a later article.
