MySQL series: an analysis of the InnoDB engine's thread concurrency synchronization mechanism

Source: Internet
Author: User
Tags: mutex, posix, read-write lock, semaphore, volatile

InnoDB is a multi-threaded concurrent storage engine: its internal reads and writes are performed by multiple threads, so InnoDB implements a fairly efficient concurrency synchronization mechanism internally. InnoDB does not directly use the lock (latch) structures provided by the system; it wraps them in its own encapsulation and optimized implementations, while remaining compatible with the system's locks. Let's first look at an internal InnoDB comment (MySQL-3.23):

Semaphore operations in operating systems are slow: Solaris on a 1993 Sparc takes 3 microseconds (us) for a lock-unlock pair and Windows NT on a 1995 Pentium takes 20 microseconds for a lock-unlock pair. Therefore, we have to implement our own efficient spin lock mutex. Future operating systems may provide efficient spin locks, but we cannot count on that.

In other words, around 1995 a lock-unlock pair cost about 20 us on Windows NT, and even under Solaris it cost 3 us; this is why the author wanted to implement a custom latch. In InnoDB the author implements a wrapper around the system latch, a custom mutex, and a custom rw_lock. Let's analyze each in turn.

1 Mutex and event of the system

In the InnoDB engine, the basic mutex and event (semaphore) primitives provided by the operating system are wrapped. The implementation under Windows is not covered here; we mainly look at the support for POSIX systems, where the wrappers are os_fast_mutex_t and os_event_t. os_fast_mutex_t is quite simple: it is actually pthread_mutex_t, defined as follows:
typedef pthread_mutex_t os_fast_mutex_t;
os_event_t is more complex: it is implemented with an os_fast_mutex_t and a pthread_cond_t, and is defined as follows:
typedef struct os_event_struct
{
    os_fast_mutex_t  os_mutex;
    ibool            is_set;
    pthread_cond_t   cond_var;
} os_event_t;
The following is an example flow of two-thread signal control for os_event_t:

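Since the flow diagram is unavailable, here is a minimal, self-contained sketch of the set/wait protocol between two threads. The struct layout follows the definition above; the method bodies and the demo threads are simplified assumptions in the spirit of the InnoDB wrappers, not the actual implementations:

#include <pthread.h>
#include <stdio.h>

typedef struct os_event_struct {
    pthread_mutex_t os_mutex;   /* protects is_set */
    int             is_set;     /* 1 once the event has been signaled */
    pthread_cond_t  cond_var;   /* threads block here until is_set */
} os_event_t;

/* Signal the event: every current and future waiter is released. */
static void os_event_set(os_event_t *ev)
{
    pthread_mutex_lock(&ev->os_mutex);
    ev->is_set = 1;
    pthread_cond_broadcast(&ev->cond_var);
    pthread_mutex_unlock(&ev->os_mutex);
}

/* Reset the event so that later waiters block again. */
static void os_event_reset(os_event_t *ev)
{
    pthread_mutex_lock(&ev->os_mutex);
    ev->is_set = 0;
    pthread_mutex_unlock(&ev->os_mutex);
}

/* Block until the event is set; the while loop guards against
   spurious condition-variable wakeups. */
static void os_event_wait(os_event_t *ev)
{
    pthread_mutex_lock(&ev->os_mutex);
    while (!ev->is_set)
        pthread_cond_wait(&ev->cond_var, &ev->os_mutex);
    pthread_mutex_unlock(&ev->os_mutex);
}

static os_event_t ev = { PTHREAD_MUTEX_INITIALIZER, 0,
                         PTHREAD_COND_INITIALIZER };

static void *waiter(void *arg)
{
    (void)arg;
    os_event_wait(&ev);               /* thread B blocks here ... */
    printf("waiter: event received\n");
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);
    os_event_set(&ev);                /* ... until thread A signals */
    pthread_join(&t, NULL);
    os_event_reset(&ev);              /* later waiters would block again */
    return 0;
}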
In the wrapping of the system primitives, the main work is the encapsulation of the os_event_t interface; within that wrapper, os_event_set, os_event_reset, and os_event_wait are the three key methods.
2 CPU atomic operations

In the implementation of InnoDB mutexes, besides referencing the system's os_mutex_t, atomic operations are used to wrap an efficient mutex implementation. When the system supports atomic operations, InnoDB uses its own wrapped mutex for mutual exclusion; if not, it uses os_mutex_t. Before GCC 4.1.2 the compiler did not provide an API for atomic operations, so in MySQL-3.23 InnoDB implemented an equivalent of __sync_lock_test_and_set itself; the code uses assembly:
asm volatile("movl $1, %%eax; xchgl (%%ecx), %%eax"
             : "=eax" (res), "=m" (*lw)
             : "ecx" (lw));
What does this code do? It sets the value of lw to 1 and returns the value lw held before the store (in res). Because xchgl forces the exchange to go through memory, the CPU's view and memory stay exactly consistent. Besides the set-to-1 operation above, there is a reset implementation, as follows:
asm volatile("movl $0, %%eax; xchgl (%%ecx), %%eax"
             : "=m" (*lw)
             : "ecx" (lw)
             : "eax");
Used together, these two functions are the basic equivalent of the __sync_lock_test_and_set/__sync_lock_release pair that GCC provides since 4.1.2. In MySQL-5.6's InnoDB engine the assembly above is replaced by __sync_lock_test_and_set, and we can implement a simple mutex with atomic operations:
#define LOCK()    while (__sync_lock_test_and_set(&lock, 1)) {}
#define UNLOCK()  __sync_lock_release(&lock)
The above is a basic lock-free-style mutex; tested under Linux it is indeed much more efficient than pthread_mutex. Of course, the mutex implementation inside InnoDB is not this simple; there are many more factors to consider, such as repeated locking by the same thread, the lock's spin cycle, deadlock detection, and so on.
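As a quick, hedged illustration of that claim (not InnoDB code; the thread and iteration counts are arbitrary), the two macros can be exercised like this:

#include <pthread.h>
#include <stdio.h>

static volatile int lock = 0;      /* 0 = free, 1 = held */
static long counter = 0;

#define LOCK()    while (__sync_lock_test_and_set(&lock, 1)) {}
#define UNLOCK()  __sync_lock_release(&lock)

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        LOCK();                     /* spin until we swap 0 -> 1 */
        counter++;                  /* critical section */
        UNLOCK();                   /* store 0 with release semantics */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    printf("counter = %ld (expect 4000000)\n", counter);
    return 0;
}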
3 Implementation of the mutex

In InnoDB, the custom mutex built on atomic operations is the lowest-level concurrency synchronization mechanism. It is designed to reduce CPU context switches, and it is very efficient when a mutex wait generally lasts no more than 100 microseconds. If waits are longer, the os_mutex approach is the better choice. Although a custom mutex falls into a signal-wait state once its spin time exceeds the spin threshold, the whole process is then less efficient than os_mutex, which defeats the purpose of a custom mutex. The definition of the custom mutex is as follows:
struct mutex_struct
{
    ulint            lock_word;      /* mutex atomic control variable */
    os_fast_mutex_t  os_fast_mutex;  /* system os_mutex used in place of the
                                        atomic mutex when the compiler or
                                        system does not support atomic
                                        operations */
    ulint            waiters;        /* whether any thread is waiting for the lock */
    UT_LIST_NODE_T(mutex_t) list;    /* mutex list node */
    os_thread_id_t   thread_id;      /* id of the thread holding the mutex */
    char*            file_name;      /* file where the mutex was locked */
    ulint            line;           /* line of the file where the mutex was locked */
    ulint            level;          /* lock level id */
    char*            cfile_name;     /* file where the mutex was created */
    ulint            cline;          /* line where the mutex was created */
    ulint            magic_n;        /* magic number */
};
Among the interface methods of the custom mutex_t, the two core methods are mutex_enter_func and mutex_exit:
mutex_enter_func obtains the mutex lock; if the mutex is held by another thread, the caller first spins up to sync_spin_rounds rounds and then waits for a signal from the thread holding the lock.
mutex_exit releases the mutex lock and sends a signal to the waiting threads so that they can contend for the mutex again. A simplified sketch of this pair follows.
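The sketch below is not InnoDB's actual code: the role of the sync array's os_event_t is played by a pthread mutex/cond pair, and SYNC_SPIN_ROUNDS, the yield call, and the last-chance recheck before sleeping are assumptions based on the description in this section.

#include <pthread.h>
#include <sched.h>

#define SYNC_SPIN_ROUNDS 30   /* spin budget before sleeping (assumed value) */

typedef struct {
    volatile int    lock_word;  /* 0 = free, 1 = held (atomic control word) */
    volatile int    waiters;    /* set when some thread sleeps on the event */
    pthread_mutex_t ev_mutex;   /* these two fields stand in for the */
    pthread_cond_t  ev_cond;    /* os_event_t used by the real sync array */
} simple_mutex_t;

#define SIMPLE_MUTEX_INIT \
    { 0, 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Spin first; only when the lock stays busy past the spin budget do we
   pay the system calls of a condition wait. */
static void mutex_enter(simple_mutex_t *m)
{
    for (;;) {
        for (int i = 0; i < SYNC_SPIN_ROUNDS; i++) {
            if (m->lock_word == 0
                && __sync_lock_test_and_set(&m->lock_word, 1) == 0)
                return;            /* acquired while spinning */
            sched_yield();         /* InnoDB uses a calibrated delay here */
        }
        pthread_mutex_lock(&m->ev_mutex);
        m->waiters = 1;            /* register before the last check ... */
        if (__sync_lock_test_and_set(&m->lock_word, 1) == 0) {
            pthread_mutex_unlock(&m->ev_mutex);
            return;                /* ... to avoid a lost wakeup */
        }
        pthread_cond_wait(&m->ev_cond, &m->ev_mutex);
        pthread_mutex_unlock(&m->ev_mutex);
    }
}

/* Release the word, then wake any sleepers so they can contend again. */
static void mutex_exit(simple_mutex_t *m)
{
    __sync_lock_release(&m->lock_word);
    if (m->waiters) {
        pthread_mutex_lock(&m->ev_mutex);
        m->waiters = 0;
        pthread_cond_broadcast(&m->ev_cond);
        pthread_mutex_unlock(&m->ev_mutex);
    }
}

The last-chance test-and-set after registering as a waiter is what prevents a lost wakeup if the holder releases the lock between the spin phase and the sleep.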
3.1 mutex_enter_func flowchart:


The process above is implemented mainly in the mutex_spin_wait function. From its code we can see that this function tries hard to let the thread acquire the lock during the spin cycle, because once the thread enters the cell-wait state it costs at least one or two system calls: cell_add may trigger a wait on an os_mutex_t lock, and then the thread performs an event_wait. That path is far less efficient than the system os_mutex. In the debug build, each acquired lock is also added to thread_levels, which is used for deadlock checking and debugging.
3.2 mutex_exit flowchart
3.3 Memory structure diagram of mutex_t
3.4 mutex_t lock acquisition and release
4 Implementation of rw_lock

To improve read performance, InnoDB implements a custom read-write lock, rw_lock. Its design principles are:
1. Multiple threads are allowed to read a variable in memory at the same time.
2. Only one thread is allowed to modify a variable in memory at any one time.
3. While any thread is reading the variable, no thread may write it.
4. While a thread is modifying the variable, no thread may read it, and no thread other than the writer itself may write it (the owning thread can take the lock recursively).
5. When threads hold the rw_lock in read mode and a writer is waiting, any new read request must wait until the pending write completes.
From the above 5 points we can see that rw_lock can be occupied in a read state or a write state, which we call S-latch (shared, for reading) and X-latch (exclusive, for writing). "MySQL Technical Insider: InnoDB Engine" describes the compatibility of S-latch and X-latch as follows:
            S-latch          X-latch
S-latch     Compatible       Not compatible
X-latch     Not compatible   Not compatible
The rw_lock in InnoDB is built on top of the custom mutex_t, and all of its control is based on the mutex and on thread cells. The following is the structure definition of rw_lock_t:
struct rw_lock_struct
{
    ulint            reader_count;      /* number of readers holding the S-latch;
                                           once non-zero, the lock is S-latched */
    ulint            writer;            /* X-latch state: one of RW_LOCK_EX,
                                           RW_LOCK_WAIT_EX, RW_LOCK_NOT_LOCKED;
                                           RW_LOCK_EX means the lock is X-latched,
                                           RW_LOCK_WAIT_EX means it is still
                                           S-latched while a writer waits */
    os_thread_id_t   writer_thread;     /* id of the thread holding the X-latch,
                                           or of the first thread waiting to
                                           become the X-latch holder */
    ulint            writer_count;      /* number of times the same thread has
                                           X-latched recursively */
    mutex_t          mutex;             /* mutex protecting the data in the
                                           rw_lock structure */
    ulint            pass;              /* 0 by default; if 0, the thread may
                                           transfer latch control to another
                                           thread; used by the insert buffer */
    ulint            waiters;           /* whether readers or writers are
                                           waiting to get the latch */
    ibool            writer_is_wait_ex;
    UT_LIST_NODE_T(rw_lock_t) list;
    UT_LIST_BASE_NODE_T(rw_lock_debug_t) debug_list;
    ulint            level;             /* level indicator, used to detect
                                           deadlocks */
    char*            cfile_name;        /* file where the rw_lock was created */
    ulint            cline;             /* line where the rw_lock was created */
    char*            last_s_file_name;  /* file of the last S-latch */
    char*            last_x_file_name;  /* file of the last X-latch */
    ulint            last_s_line;       /* line of the last S-latch */
    ulint            last_x_line;       /* line of the last X-latch */
    ulint            magic_n;           /* magic number */
};

The main interfaces of rw_lock_t for acquiring and releasing locks are four key functions: rw_lock_s_lock_func, rw_lock_x_lock_func, rw_lock_s_unlock_func, and rw_lock_x_unlock_func. Spin functions are defined for rw_lock_s_lock_func and rw_lock_x_lock_func; their flow is similar to the spin function in mutex_t, and their purpose is likewise to acquire the lock during the spin phase. The details can be seen in the implementations of rw_lock_s_lock_spin/rw_lock_x_lock_func in sync0rw.c. From the structure definition above and the function implementations we know that rw_lock has four states:
RW_LOCK_NOT_LOCKED: the idle state
RW_LOCK_SHARED: the shared read state, held by one or more concurrent reader threads
RW_LOCK_WAIT_EX: the state of waiting to become X-latched while still S-latched
RW_LOCK_EX: the exclusive single-writer state
The following are the transitions among these four states:
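Since the original transition diagram is unavailable, the transitions can be roughly summarized from the semantics described above (an inference from the text, not the original figure):

1. RW_LOCK_NOT_LOCKED to RW_LOCK_SHARED: a reader takes the S-latch (reader_count becomes non-zero).
2. RW_LOCK_SHARED to RW_LOCK_NOT_LOCKED: the last reader releases (reader_count drops back to 0).
3. RW_LOCK_NOT_LOCKED to RW_LOCK_EX: a writer takes the X-latch directly.
4. RW_LOCK_SHARED to RW_LOCK_WAIT_EX: a writer requests the X-latch while readers still hold the S-latch.
5. RW_LOCK_WAIT_EX to RW_LOCK_EX: the last reader leaves and the waiting writer takes over.
6. RW_LOCK_EX to RW_LOCK_NOT_LOCKED: the writer releases its last (possibly recursive) X-latch.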
Through these transitions we can clearly understand the operating mechanism of rw_lock. In addition to state handling, rw_lock also provides interfaces for debugging; their relationships can be understood through the memory structure diagram. A simplified code sketch of S-latch and X-latch acquisition follows.
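This sketch shows how latch acquisition can be layered on a protecting mutex, following the structure fields above. It is not the actual sync0rw.c code: standard pthread types stand in for InnoDB's, and the spin and sync-array wait paths are omitted.

#include <pthread.h>

enum { RW_LOCK_NOT_LOCKED, RW_LOCK_EX, RW_LOCK_WAIT_EX };

typedef struct {
    unsigned long   reader_count;   /* non-zero means S-latched */
    int             writer;         /* one of the states above */
    unsigned long   writer_count;   /* recursive X-latch count */
    pthread_t       writer_thread;  /* holder of / first waiter for the X-latch */
    pthread_mutex_t mutex;          /* protects all fields (mutex_t in InnoDB) */
} rw_lock_t;

/* One S-latch attempt; returns 1 on success.  The real
   rw_lock_s_lock_func wraps such an attempt in a spin loop and, if
   the spin fails, a sync-array wait. */
static int rw_lock_s_lock_low(rw_lock_t *lock)
{
    int success = 0;
    pthread_mutex_lock(&lock->mutex);
    /* Design principle 5: a waiting writer (RW_LOCK_WAIT_EX) blocks new
       readers too, so only RW_LOCK_NOT_LOCKED admits an S-latch. */
    if (lock->writer == RW_LOCK_NOT_LOCKED) {
        lock->reader_count++;
        success = 1;
    }
    pthread_mutex_unlock(&lock->mutex);
    return success;
}

/* One X-latch attempt; the owning thread may latch recursively. */
static int rw_lock_x_lock_low(rw_lock_t *lock)
{
    int success = 0;
    pthread_mutex_lock(&lock->mutex);
    if (lock->reader_count == 0 && lock->writer == RW_LOCK_NOT_LOCKED) {
        lock->writer        = RW_LOCK_EX;
        lock->writer_thread = pthread_self();
        lock->writer_count  = 1;
        success = 1;
    } else if (lock->writer == RW_LOCK_EX
               && pthread_equal(lock->writer_thread, pthread_self())) {
        lock->writer_count++;            /* recursive X-latch */
        success = 1;
    }
    pthread_mutex_unlock(&lock->mutex);
    return success;
}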
5 Deadlock detection and debugging

In addition to implementing the custom mutex_t and rw_lock_t, InnoDB implements deadlock detection and debugging for both kinds of latch. This greatly simplifies latch debugging in InnoDB: latch status and information can be seen in real time, although only in the debug version of InnoDB. The modules involved in deadlock detection are mainly the mutex level, the rw_lock level, and sync_cell. The latch-level-related definitions are:
/* sync_thread_t */
struct sync_thread_struct
{
    os_thread_id_t  id;      /* id of the thread holding latches */
    sync_level_t*   levels;  /* latch information; see the sync_level_t structure */
};

/* sync_level_t */
struct sync_level_struct
{
    void*  latch;  /* latch handle: a pointer to a mutex_t or rw_lock_t structure */
    ulint  level;  /* latch level identification id */
};

When a latch is acquired, InnoDB calls the mutex_set_debug_info function to add the latch-acquisition state information to sync_thread_t; this information includes the thread ID holding the latch, the file location where the latch was acquired, and the latch level ID (see mutex_enter_func and mutex_spin_wait for details). Only a held latch is reflected in sync_thread_t; a latch that is merely being waited for is not added. InnoDB can use the sync_thread_levels_empty_gen function to output all latch waits in the cell_t sequence, to trace where threads are waiting.

5.1 The memory structure relationship between sync_thread_t and sync_level_t:
The length of sync_thread_level_arrays is OS_THREAD_MAX_N (by default 10000 under Linux), the same as the maximum number of threads.
The length of levels is SYNC_THREAD_N_LEVELS (10000).
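As a rough sketch of the per-thread bookkeeping just described (the types follow the structures above; the function name and the slot-scan logic are illustrative assumptions, not the actual mutex_set_debug_info):

#include <stddef.h>

typedef unsigned long ulint;
typedef unsigned long os_thread_id_t;

#define SYNC_THREAD_N_LEVELS 10000

typedef struct { void *latch; ulint level; } sync_level_t;
typedef struct { os_thread_id_t id; sync_level_t *levels; } sync_thread_t;

/* Record one acquired latch in the calling thread's slot: scan the
   levels array for a free entry and stamp it with the latch handle
   and its level id.  Returns 0 if the array is full. */
static int sync_levels_record(sync_thread_t *slot, void *latch, ulint level)
{
    for (ulint i = 0; i < SYNC_THREAD_N_LEVELS; i++) {
        if (slot->levels[i].latch == NULL) {
            slot->levels[i].latch = latch;  /* mutex_t* or rw_lock_t* */
            slot->levels[i].level = level;
            return 1;
        }
    }
    return 0;   /* no free entry */
}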
5.2 Deadlock and deadlock detection

What is a deadlock? The following example gives a simple illustration:
Thread A            Thread B
mutex1 enter        mutex2 enter
mutex2 enter        mutex1 enter
perform task        perform task
mutex2 release      mutex1 release
mutex1 release      mutex2 release
When these two threads run concurrently, a deadlock may occur: thread A holds mutex1 and waits for mutex2, while thread B holds mutex2 and waits for mutex1. Thread A waits for thread B and thread B waits for thread A, which produces a deadlock.
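The scenario in the table can be reproduced with a small, intentionally buggy program; plain pthread mutexes stand in for the latches (an illustration, not InnoDB code):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;

static void *thread_a(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex1);   /* A holds mutex1 ... */
    pthread_mutex_lock(&mutex2);   /* ... and waits for mutex2 */
    puts("A: performing task");
    pthread_mutex_unlock(&mutex2);
    pthread_mutex_unlock(&mutex1);
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex2);   /* B holds mutex2 ... */
    pthread_mutex_lock(&mutex1);   /* ... and waits for mutex1: deadlock */
    puts("B: performing task");
    pthread_mutex_unlock(&mutex1);
    pthread_mutex_unlock(&mutex2);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);     /* may never return if the deadlock hits */
    pthread_join(b, NULL);
    return 0;
}

Whether the hang actually occurs depends on scheduling: the deadlock strikes only when A acquires mutex1 and B acquires mutex2 before either takes its second lock.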

Once we understand the concept of a deadlock, we can analyze the details of deadlock detection in InnoDB. The essence of InnoDB's deadlock check is to judge whether the latch about to be waited for produces a closed loop across all threads, and this judgment is made from the contents of sync_array_cell_t. When a thread is about to wait for a cell signal, it puts its state information into a sync_array_cell_t; before entering the OS event wait, it calls sync_array_detect_deadlock to judge whether a deadlock would occur, and if the lock would deadlock, an exception is raised. The key to deadlock detection is the sync_array_detect_deadlock function. The detection process is as follows:
1. The cell corresponding to the latch about to be waited for is passed into sync_array_detect_deadlock; at the start, both the start parameter and the dependent-cell parameter are the cell itself.

2. On entry, sync_array_detect_deadlock first checks whether the cell is waiting for a latch. If not, there is no deadlock and it returns directly. If it is, the function determines which thread holds the lock being waited for and obtains the ID of that holding thread; with that thread ID and the global sync_array_t wait-cell array it calls sync_array_deadlock_step to judge the lock dependencies of the waiting thread.

3. sync_array_deadlock_step looks up the cell belonging to the holding thread. If that cell is the same cell as the one originally about to enter the event wait, a closed loop exists, which means a deadlock. If not, it recursively calls sync_array_detect_deadlock with the found cell as the parameter, repeating step 2. This is a process of cross-recursive judgment between the two functions.

During detection, the latch handle, thread ID, and cell handle are chained and recursed through, and the closed-loop deadlock is judged from the state of the latches themselves. Step 2 distinguishes between mutex and rw_lock because the two have different operating mechanisms. Since the latches of a relational database are used very frequently and in complex ways, checking for deadlocks is very effective for lock debugging, especially combined with the thread_levels state output; it is very helpful for troubleshooting deadlocks.
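The cross-recursion of the two functions can be sketched as follows. This is schematic C with simplified types: find_cell_of_thread and latch_holder_thread are assumed helpers standing in for the real lookups through the global sync_array_t and the latch structures, and the mutex/rw_lock distinction in step 2 is omitted.

#include <stddef.h>

typedef struct cell {
    void          *wait_latch;  /* latch this cell's thread waits for, or NULL */
    unsigned long  thread_id;   /* id of the waiting thread */
} cell_t;

/* Assumed helpers, stubbed for illustration: the real code resolves
   these through the sync array and the latch structures. */
static cell_t *find_cell_of_thread(unsigned long tid) { (void)tid; return NULL; }
static unsigned long latch_holder_thread(void *latch) { (void)latch; return 0; }

static int detect_deadlock(cell_t *start, cell_t *cell);

/* Follow one dependency edge: locate the cell of the thread holding
   the latch; a loop back to the start cell means deadlock. */
static int deadlock_step(cell_t *start, unsigned long holder)
{
    cell_t *next = find_cell_of_thread(holder);
    if (next == NULL)
        return 0;               /* holder is not itself waiting: no cycle */
    if (next == start)
        return 1;               /* closed loop: deadlock */
    return detect_deadlock(start, next);   /* recurse: step 2 again */
}

static int detect_deadlock(cell_t *start, cell_t *cell)
{
    if (cell->wait_latch == NULL)
        return 0;               /* not waiting for anything: no deadlock */
    return deadlock_step(start, latch_holder_thread(cell->wait_latch));
}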
Deadlock detection flowchart:

6 Summary

From the analysis above we know that, in addition to wrapping the latch structures provided by the operating system, InnoDB provides a custom latch built at the level of atomic operations. Why implement a custom latch at all? My personal understanding is that it is mainly to reduce operating-system context switches and increase the efficiency of concurrency. The custom latch implemented in InnoDB is only suitable for short lock waits (preferably no more than 50 us); for long lock waits it is better to use what the operating system provides. Although the custom lock enters the operating system's event wait after its spin cycle, that path undoubtedly consumes more resources than the system's mutex lock. Finally, let's look at the author's summary in the code:

We conclude that the best choice is to set the spin time at 20 us. Then the system should work well on a multiprocessor. On a uniprocessor we have to make sure that thread switches due to mutex collisions are not frequent, i.e., they do not happen every 100 us or so, because that wastes too much resources. If the thread switches are not frequent, the 20 us wasted in the spin loop is not too much.
