When reprinting, please credit the author: Yin Feng
-----------------------------------------------------------
Recently I ran into a problem in which a production machine waited on a semaphore for a long time. MySQL's monitoring thread concluded that mysqld had hung, so it deliberately crashed the server, which then restarted. This raises an interesting question: how does MySQL handle read/write locks?
The discussion consists of four parts:
1. Creating a lock
2. Locking
3. Unlocking
4. Monitoring locks
The following analysis is based on Percona Server 5.5.18.
1. Create a lock
Creating a lock simply initializes an rw-lock struct (rw_lock_t). The macro actually invoked is:
# define rw_lock_create(K, L, level) \
    rw_lock_create_func((L), #L)
rw_lock_create takes three parameters, but in practice only the second one is used.
K is of type mysql_pfs_key_t and appears to be there for the Performance Schema; level appears to indicate the level of the current operation (at least that is how it looks in sync0sync.h).
For example, creating the read/write lock for the purge thread:
rw_lock_create(trx_purge_latch_key,
               &purge_sys->latch, SYNC_PURGE_LATCH);
Let's look at rw_lock_create_func to see how the lock is created.
The logic of this function is actually very simple:
lock->lock_word = X_LOCK_DECR; // key field
This field limits the maximum number of concurrent read and write locks. The comment in the code reads:
/* We decrement lock_word by this amount for each x_lock. It is also the
start value for the lock_word, meaning that it limits the maximum number
of concurrent read locks before the rw_lock breaks. The current value of
0x00100000 allows 1,048,575 concurrent readers and 2047 recursive writers. */
When a thread tries to take the lock, rw_lock_lock_word_decr is called to decrease lock_word.
After initializing a series of other fields, the function executes:
lock->event = os_event_create(NULL);
lock->wait_ex_event = os_event_create(NULL);
os_event_create creates an event object. Under the hood it creates a mutex (os_fast_mutex_init(&(event->os_mutex));) and a condition variable (os_cond_init(&(event->cond_var));).
Finally, the lock is added to the global linked list rw_lock_list.
2. Lock
The locking functions are defined through macros. The functions actually called are:
1) Write lock
# define rw_lock_x_lock(M) \
    rw_lock_x_lock_func((M), 0, __FILE__, __LINE__)
Acquiring a write lock goes through the following steps:
(1) Call rw_lock_x_lock_low to try to acquire the lock. If it succeeds, execute rw_x_spin_round_count += i and return immediately; otherwise, continue.
(2) Execute rw_x_spin_wait_count++; this happens only once over the whole loop.
(3) Spin-wait in a loop with a short delay on each round:
while (i < SYNC_SPIN_ROUNDS
       && lock->lock_word <= 0) {
    if (srv_spin_wait_delay) {
        ut_delay(ut_rnd_interval(0,
                                 srv_spin_wait_delay));
    }
    i++;
}
Two system variables are involved:
innodb_sync_spin_loops (SYNC_SPIN_ROUNDS)
innodb_spin_wait_delay (srv_spin_wait_delay)
ut_delay is called on each of the SYNC_SPIN_ROUNDS iterations. The function is very simple: it runs delay * 50 empty loop iterations.
ut_delay(uint delay):
for (i = 0; i < delay * 50; i++) {
    j += i;
    UT_RELAX_CPU();
}
UT_RELAX_CPU() executes an assembly instruction (pause on x86) that tells the CPU the thread is in a spin-wait loop; the thread stays on the CPU while spinning instead of going to sleep.
(4) If the loop count has reached SYNC_SPIN_ROUNDS, call os_thread_yield (which actually calls pthread_yield, making the calling thread give up the CPU); otherwise, jump back to step (1) and continue the loop.
(5) Reserve a cell in sync_primary_wait_array by calling sync_array_reserve_cell. There appear to be 1000 cells (sync_primary_wait_array->n_cells).
(6) Call rw_lock_x_lock_low again to try to acquire the lock. If it succeeds, return.
(7) Call sync_array_wait_event to wait on the condition variable, then go back to step (1) and continue the loop.
The function that does the actual locking (rw_lock_x_lock_low) is analyzed below.
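The spin-then-yield part of the steps above can be sketched in isolation. This is a simplified model, not InnoDB's code: SPIN_ROUNDS, spin_delay and spin_on_lock_word are hypothetical names, the delay is fixed rather than drawn from ut_rnd_interval, and sched_yield() stands in for os_thread_yield.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Illustrative stand-in for SYNC_SPIN_ROUNDS; the value is an assumption. */
#define SPIN_ROUNDS 30

/* Busy loop of delay * 50 iterations, mirroring the shape of ut_delay.
   Returns the accumulated sum so the loop has an observable result. */
static unsigned long spin_delay(unsigned long delay)
{
    volatile unsigned long j = 0;
    unsigned long i;

    for (i = 0; i < delay * 50; i++) {
        j += i;
    }
    return j;
}

/* Spin while *lock_word <= 0, at most SPIN_ROUNDS times. If every round
   was used up, give the CPU away once with sched_yield(). Returns the
   number of rounds spent spinning. */
static int spin_on_lock_word(const volatile long *lock_word,
                             unsigned long delay)
{
    int i = 0;

    assert(lock_word != NULL);
    while (i < SPIN_ROUNDS && *lock_word <= 0) {
        if (delay) {
            spin_delay(delay);
        }
        i++;
    }
    if (i == SPIN_ROUNDS) {
        sched_yield();
    }
    return i;
}
```

A caller would retry the lock attempt after spin_on_lock_word returns and fall back to a blocking wait only if the retry fails, which is the role steps (5) to (7) play in the real function.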
2) Read lock
# define rw_lock_s_lock(M) \
    rw_lock_s_lock_func((M), 0, __FILE__, __LINE__)
This function is defined in sync0rw.ic. It is also very simple:
if (rw_lock_s_lock_low(lock, pass, file_name, line)) {
    return; /* Success */
} else {
    /* Did not succeed, try spin wait */
    rw_lock_s_lock_spin(lock, pass, file_name, line);
    return;
}
It first calls rw_lock_s_lock_low to take the lock; if that fails, it calls rw_lock_s_lock_spin to wait. The logic of rw_lock_s_lock_spin is similar to that of rw_lock_x_lock_func.
rw_lock_s_lock_spin in turn calls rw_lock_s_lock_low again on each pass of its loop.
The actual lock and unlock operations appear to be controlled by a counter:
(1) rw_lock_s_lock_low calls:
rw_lock_lock_word_decr(lock, 1), which subtracts 1 from lock->lock_word.
It returns TRUE if the subtraction succeeds and FALSE otherwise.
This part of the logic is quite simple.
(2) rw_lock_x_lock_low calls:
rw_lock_lock_word_decr(lock, X_LOCK_DECR), which subtracts X_LOCK_DECR from lock->lock_word.
After the subtraction succeeds, it runs:
rw_lock_set_writer_id_and_recursion_flag(lock, pass ? FALSE : TRUE) to set:
lock->writer_thread = os_thread_get_curr_id()
lock->recursive = TRUE
It then calls rw_lock_x_lock_wait to wait until lock->lock_word == 0, that is, until all read locks have been released.
One interesting detail: the .ic code uses the macro INNODB_RW_LOCKS_USE_ATOMICS, which depends on the gcc version and implements the atomic operations with gcc built-in functions.
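To make the counter scheme concrete, here is a minimal model of lock_word. It is a sketch under assumptions, not the InnoDB implementation: rw_model_t and all function names are hypothetical, C11 <stdatomic.h> is used in place of the gcc built-ins, and recursive x-locks, the writer id and the waiting logic are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define X_LOCK_DECR 0x00100000L

/* Hypothetical stand-in for rw_lock_t: only the counter is modelled. */
typedef struct {
    atomic_long lock_word;
} rw_model_t;

static void rw_model_init(rw_model_t *l)
{
    atomic_init(&l->lock_word, X_LOCK_DECR);
}

/* Subtract amount, but only while lock_word is still positive.
   Returns true if the subtraction was applied. */
static bool lock_word_decr(rw_model_t *l, long amount)
{
    long cur;

    assert(l != NULL && amount > 0);
    cur = atomic_load(&l->lock_word);
    while (cur > 0) {
        /* On failure, cur is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&l->lock_word, &cur, cur - amount)) {
            return true;
        }
    }
    return false;
}

/* Add amount and return the new value of lock_word. */
static long lock_word_incr(rw_model_t *l, long amount)
{
    return atomic_fetch_add(&l->lock_word, amount) + amount;
}

/* A reader takes one unit. */
static bool try_s_lock(rw_model_t *l)
{
    return lock_word_decr(l, 1);
}

/* A writer takes X_LOCK_DECR units; it still has to wait for readers. */
static bool try_x_reserve(rw_model_t *l)
{
    return lock_word_decr(l, X_LOCK_DECR);
}

/* True once every reader that was inside has left. */
static bool readers_drained(rw_model_t *l)
{
    return atomic_load(&l->lock_word) == 0;
}
```

In this model, a writer that arrives while one reader holds the lock drives lock_word to -1, which blocks new readers immediately; the writer owns the lock once the reader's release brings the value back to 0.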
3. Unlock
Unlocking covers releasing a read lock (# define rw_lock_s_unlock(L) rw_lock_s_unlock_gen(L, 0)) and releasing a write lock (# define rw_lock_x_unlock(L) rw_lock_x_unlock_gen(L, 0)).
The functions actually called are rw_lock_s_unlock_func and rw_lock_x_unlock_func.
1) Releasing a read lock (rw_lock_s_unlock_func)
Increment the counter: rw_lock_lock_word_incr(lock, 1)
2) Releasing a write lock (rw_lock_x_unlock_func)
The following operations are performed:
(1) If this is the last of the recursive callers holding the lock, set lock->recursive = FALSE. The comment in the code reads:
/* lock->recursive flag also indicates if lock->writer_thread is
valid or stale. If we are the last of the recursive callers
then we must unset lock->recursive flag to indicate that
lock->writer_thread is now stale.
Note that since we still hold the x-lock we can safely read
lock_word. */
(2) Increment the counter with rw_lock_lock_word_incr(lock, X_LOCK_DECR). If the result equals X_LOCK_DECR, the lock is now free, and the threads waiting for it must be signalled:
if (lock->waiters) {
    rw_lock_reset_waiter_flag(lock);
    os_event_set(lock->event);
    sync_array_object_signalled(sync_primary_wait_array);
}
os_event_set wakes the waiting threads with pthread_cond_broadcast.
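The event object behind this signalling is a mutex plus a condition variable, which can be modelled with plain pthreads. The sketch below is an assumption-laden illustration: event_t, event_init, event_set and event_wait are hypothetical names, and details of the real os_event_t such as the signal count and reset are omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down event: a flag guarded by a mutex,
   with a condition variable to wake sleepers. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            is_set;
} event_t;

static void event_init(event_t *ev)
{
    assert(ev != NULL);
    pthread_mutex_init(&ev->mutex, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->is_set = false;
}

/* Mark the event as set and wake every thread blocked in event_wait. */
static void event_set(event_t *ev)
{
    pthread_mutex_lock(&ev->mutex);
    if (!ev->is_set) {
        ev->is_set = true;
        pthread_cond_broadcast(&ev->cond);
    }
    pthread_mutex_unlock(&ev->mutex);
}

/* Block until the event is set; returns at once if it already is. */
static void event_wait(event_t *ev)
{
    pthread_mutex_lock(&ev->mutex);
    while (!ev->is_set) {
        pthread_cond_wait(&ev->cond, &ev->mutex);
    }
    pthread_mutex_unlock(&ev->mutex);
}
```

Broadcast rather than signal matters here: several readers may be waiting on the same lock, and all of them can proceed once the writer leaves.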
4. Monitor read/write locks
To catch rw-lock waits that drag on because mysqld has hung, the error monitor thread watches for threads that have been waiting a long time. This thread loops once per second
(os_event_wait_time_low(srv_error_event, 1000000, sig_count);)
Entry function: srv_error_monitor_thread
The function sync_array_print_long_waits() handles threads that have been waiting on a semaphore for a long time. It works as follows:
1. Examine every waiting thread in the sync_primary_wait_array array.
-> If a wait has lasted more than 240 seconds, write a warning to the error log and set noticed = TRUE;
-> If a wait has lasted more than 600 seconds, set fatal = TRUE;
2. If noticed is TRUE, print the InnoDB monitor output, then sleep for 30 seconds.
3. Return the value of fatal.
When sync_array_print_long_waits returns TRUE, the same waiting thread gets ten more chances, that is, 300 + 1*10 seconds (the monitor thread's loop sleeps 1 s per iteration); if the wait still has not ended, the monitor thread triggers an assertion failure:
if (fatal_cnt > 10) {
    fprintf(stderr,
            "InnoDB: Error: semaphore wait has lasted"
            " > %lu seconds\n"
            "InnoDB: We intentionally crash the server,"
            " because it appears to be hung.\n",
            (ulong) srv_fatal_semaphore_wait_threshold);
    ut_error;
}
ut_error is a macro:
# define ut_error assert(0)
The assertion failure is what crashes mysqld.
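The decision logic of the monitor can be summarised in a small model. This is only a sketch of the thresholds described above: scan_waits and monitor_pass are hypothetical names, wait times are passed in as plain numbers, and the real check that the same thread is still waiting on the same semaphore is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WARN_AFTER_SECS  240   /* wait length that triggers a warning */
#define FATAL_AFTER_SECS 600   /* wait length that counts as fatal */
#define MAX_FATAL_COUNT   10   /* extra chances before crashing */

typedef struct {
    bool noticed;
    bool fatal;
} scan_result_t;

/* Model of one scan over the wait array: wait_secs[i] is how long
   waiter i has been waiting, in seconds. */
static scan_result_t scan_waits(const double *wait_secs, int n)
{
    scan_result_t r = { false, false };
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        if (wait_secs[i] > WARN_AFTER_SECS) {
            r.noticed = true;
        }
        if (wait_secs[i] > FATAL_AFTER_SECS) {
            r.fatal = true;
        }
    }
    return r;
}

/* Model of one monitor iteration. Counts consecutive fatal scans and
   returns true when the server would be crashed on purpose. */
static bool monitor_pass(bool fatal, int *fatal_cnt)
{
    assert(fatal_cnt != NULL);
    if (fatal) {
        (*fatal_cnt)++;
    } else {
        *fatal_cnt = 0;
    }
    return *fatal_cnt > MAX_FATAL_COUNT;
}
```

With these numbers, a stuck waiter is reported after 240 seconds, flagged as fatal after 600 seconds, and the crash happens on the eleventh consecutive fatal scan.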
While reading srv_error_monitor_thread I also noticed an interesting parameter, srv_kill_idle_transaction. The corresponding system variable is innodb_kill_idle_transaction, which is used to clean up transactions that have been idle for some time; the variable specifies the maximum time a transaction may stay idle. Its implementation will be analyzed in the next installment.