MySQL series: Analysis of the thread concurrency synchronization mechanism in the InnoDB engine

Source: Internet
Author: User
InnoDB is a multi-threaded concurrent storage engine: its internal reads and writes are carried out by multiple threads, so it needs a highly efficient concurrency synchronization mechanism. InnoDB does not use the latch primitives provided by the operating system directly. Instead it wraps and optimizes them, while remaining compatible with the system locks. Consider this comment from the InnoDB source (MySQL 3.23):

Semaphore operations in operating systems are slow: Solaris on a 1993 Sparc takes 3 microseconds (us) for a lock-unlock pair and Windows NT on a 1995 Pentium takes 20 microseconds for a lock-unlock pair. Therefore, we have to implement our own efficient spin lock mutex. Future operating systems may provide efficient spin locks, but we cannot count on that.

In other words, a lock-unlock pair took 20 µs on Windows NT in 1995 and still 3 µs on Solaris, which is why the author implemented custom latches. InnoDB wraps the system latches and also implements its own mutex and its own rw_lock. We analyze each of them in turn below.

1. System latch wrappers. InnoDB's mutex and event wrap the basic mutex and event provided by the operating system. The Windows implementation is left out here; this article covers POSIX systems. On POSIX, InnoDB implements os_fast_mutex_t and os_event_t. os_fast_mutex_t is simple: it is just a pthread mutex. Definition:
typedef pthread_mutex_t os_fast_mutex_t;
os_event_t is more complex: it is built from an os_fast_mutex_t and a pthread_cond_t. Its definition is as follows:
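typedef struct os_event_struct {
    os_fast_mutex_t  os_mutex;   /* protects is_set and the condition variable */
    ibool            is_set;     /* TRUE while the event is in the signalled state */
    pthread_cond_t   cond_var;   /* condition variable that waiting threads sleep on */
} os_event_t;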
The following is an example of two threads coordinating through an os_event_t:

Among the system wrappers the most important interface is os_event_t, and its most critical methods are os_event_set, os_event_reset, and os_event_wait.
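The set/reset/wait trio can be sketched with plain POSIX threads. This is a simplified illustration, not InnoDB's code: the names event_t, event_set, event_reset, event_wait, and event_demo are hypothetical, and the struct mirrors the os_mutex / is_set / cond_var layout shown above. The demo has one thread publish a value and signal the event while the other sleeps until it is set.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical sketch of an os_event_t-style event on POSIX. */
typedef struct {
    pthread_mutex_t mutex;   /* plays the role of os_mutex */
    int             is_set;  /* 1 while the event is signalled */
    pthread_cond_t  cond;    /* plays the role of cond_var */
} event_t;

static void event_init(event_t *e) {
    pthread_mutex_init(&e->mutex, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->is_set = 0;
}

static void event_destroy(event_t *e) {
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->mutex);
}

/* Like os_event_set: mark the event signalled and wake every waiter. */
static void event_set(event_t *e) {
    pthread_mutex_lock(&e->mutex);
    if (!e->is_set) {
        e->is_set = 1;
        pthread_cond_broadcast(&e->cond);
    }
    pthread_mutex_unlock(&e->mutex);
}

/* Like os_event_reset: return the event to the non-signalled state. */
static void event_reset(event_t *e) {
    pthread_mutex_lock(&e->mutex);
    e->is_set = 0;
    pthread_mutex_unlock(&e->mutex);
}

/* Like os_event_wait: block until signalled. The loop re-checks is_set
   because pthread_cond_wait may wake up spuriously. */
static void event_wait(event_t *e) {
    pthread_mutex_lock(&e->mutex);
    while (!e->is_set)
        pthread_cond_wait(&e->cond, &e->mutex);
    pthread_mutex_unlock(&e->mutex);
}

/* Two-thread demo: the producer publishes a value, then sets the event. */
static event_t ready;
static int shared_value;

static void *producer(void *arg) {
    shared_value = *(int *)arg;
    event_set(&ready);
    return NULL;
}

static int event_demo(int value) {
    pthread_t t;
    shared_value = 0;
    event_init(&ready);
    pthread_create(&t, NULL, producer, &value);
    event_wait(&ready);          /* sleeps until the producer calls event_set */
    int seen = shared_value;     /* the event's mutex orders the write before this read */
    pthread_join(t, NULL);
    event_reset(&ready);         /* ready for reuse */
    event_destroy(&ready);
    return seen;
}
```

Using pthread_cond_broadcast rather than pthread_cond_signal wakes every waiter, which matches the event semantics described here: once set, the event stays set until someone resets it.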
2. Atomic operations. For its mutex, besides using the system's os_fast_mutex_t, InnoDB uses CPU atomic operations to build an efficient mutex of its own. When the platform supports atomic operations, InnoDB uses this custom mutex for mutual exclusion; otherwise it falls back to the OS mutex. Before GCC 4.1.2 the compiler did not provide atomic-operation builtins, so InnoDB in MySQL 3.23 implemented its own equivalent of __sync_lock_test_and_set in assembly:
asm volatile("movl $1, %%eax; xchgl (%%ecx), %%eax" :
             "=eax" (res), "=m" (*lw) :
             "ecx" (lw));
What does this code do? It sets *lw to 1 and returns the previous value of *lw in res. On x86, xchg with a memory operand is implicitly locked, so the read and the write happen as one atomic step and the new value is written back to memory where every CPU sees it consistently. Besides setting the word to 1, there is also a reset implementation:
asm volatile("movl $0, %%eax; xchgl (%%ecx), %%eax" :
             "=m" (*lw) :
             "ecx" (lw) :
             "eax");
Used together, these two functions provide what __sync_lock_test_and_set and __sync_lock_release offer from GCC 4.1.2 onward. In the InnoDB engine of MySQL 5.6, the assembly above is replaced by __sync_lock_test_and_set. With these atomic operations we can implement a simple mutex:
#define LOCK()   while (__sync_lock_test_and_set(&lock, 1)) {}
#define UNLOCK() __sync_lock_release(&lock)
The above is a minimal spin lock built purely on atomic operations, with no OS lock involved. In the author's tests on Linux it is indeed much faster than pthread_mutex. Of course, InnoDB's mutex is not that simple: it must also deal with the same thread locking repeatedly, the number of spin rounds, deadlock detection, and more.
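A runnable version of the LOCK()/UNLOCK() idea is sketched below, assuming GCC or Clang (for the __sync builtins) and POSIX threads. The names spin_lock, spin_unlock, and spin_demo are hypothetical. The inner read-only loop (test-and-test-and-set) is a common refinement that is not stated in the text; it keeps waiting CPUs from issuing a locked exchange on every iteration.

```c
#include <assert.h>
#include <pthread.h>

static volatile int lock_word = 0;  /* 0 = free, 1 = held */
static long counter = 0;            /* data protected by lock_word */

static void spin_lock(volatile int *lw) {
    /* __sync_lock_test_and_set stores 1 and returns the old value with
       acquire semantics; an old value of 1 means another thread holds it. */
    while (__sync_lock_test_and_set(lw, 1)) {
        while (*lw) { }   /* spin on plain reads until the lock looks free */
    }
}

static void spin_unlock(volatile int *lw) {
    __sync_lock_release(lw);  /* stores 0 with release semantics */
}

static void *worker(void *arg) {
    long n = (long)arg;
    for (long i = 0; i < n; i++) {
        spin_lock(&lock_word);
        counter++;                /* critical section */
        spin_unlock(&lock_word);
    }
    return NULL;
}

/* Run nthreads workers (at most 16), each doing iters increments. */
static long spin_demo(int nthreads, long iters) {
    pthread_t t[16];
    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, (void *)iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Without the lock, concurrent counter++ operations would lose updates; with it, the final count equals nthreads × iters.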
3. Mutex implementation in InnoDB. The custom mutex built on atomic operations is a basic concurrency and synchronization mechanism. It is designed to reduce CPU context switches, and it is very efficient when mutex wait times are short, generally under 100 microseconds. If waits are long, the OS mutex is the better choice: although the custom mutex falls back to waiting for a signal once spinning exceeds the spin threshold, the whole process is then much less efficient than an OS mutex, which defeats the purpose of the custom mutex. The custom mutex is defined as follows:
struct mutex_struct {
    ulint            lock_word;      /* atomic control variable of the mutex */
    os_fast_mutex_t  os_fast_mutex;  /* OS mutex used in place of lock_word when the compiler or system lacks atomic operations */
    ulint            waiters;        /* whether any thread is waiting for the lock */
    UT_LIST_NODE_T(mutex_t) list;    /* node in the mutex list */
    os_thread_id_t   thread_id;      /* ID of the thread holding the mutex */
    char*            file_name;      /* file in which the mutex was locked */
    ulint            line;           /* line in that file where the mutex was locked */
    ulint            level;          /* latch level ID */
    char*            cfile_name;     /* file in which the mutex was created */
    ulint            cline;          /* line in that file where the mutex was created */
    ulint            magic_n;        /* magic number */
};
The two core methods of the custom mutex_t interface are mutex_enter_func and mutex_exit.
mutex_enter_func acquires the mutex. If another thread holds it, the caller first spins for SYNC_SPIN_ROUNDS rounds and then waits for a signal from the thread holding the lock.
mutex_exit releases the mutex and signals the waiting threads that they may now contend for it.
3.1 mutex_enter_func flowchart:

This process is implemented in the mutex_spin_wait function. Its code shows that it tries hard to acquire the lock within the spin loop, because once the thread enters the cell wait state, one or two system calls may follow: an os_mutex_t lock wait when the cell is added to the sync array, and an event wait. That path is much less efficient than a plain OS mutex. In debug builds, once the lock is acquired, a record that the lock is held is added to thread_levels for deadlock checking and debugging.
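The spin-then-wait flow of mutex_enter_func and mutex_exit can be sketched as follows. This is a hypothetical model, not InnoDB's code: spin_mutex_t, SPIN_ROUNDS, and the other names are invented, a pthread condition variable stands in for the sync array cell and its event, and details such as spin delays and thread yielding are omitted. The key point is the slow path: the waiter sets the waiters flag and retries the lock once more before sleeping, so a release that happens in between is never missed.

```c
#include <assert.h>
#include <pthread.h>

#define SPIN_ROUNDS 30  /* hypothetical value; InnoDB uses SYNC_SPIN_ROUNDS */

typedef struct {
    volatile int    lock_word;  /* 0 = free, 1 = held */
    volatile int    waiters;    /* 1 if some thread may be sleeping */
    pthread_mutex_t ev_mutex;   /* protects the sleep/wake handshake */
    pthread_cond_t  ev_cond;    /* sleeping waiters block here */
} spin_mutex_t;

static void spin_mutex_init(spin_mutex_t *m) {
    m->lock_word = 0;
    m->waiters = 0;
    pthread_mutex_init(&m->ev_mutex, NULL);
    pthread_cond_init(&m->ev_cond, NULL);
}

static void spin_mutex_enter(spin_mutex_t *m) {
    for (;;) {
        /* Fast path: spin, hoping the holder releases the lock soon. */
        for (int i = 0; i < SPIN_ROUNDS; i++)
            if (m->lock_word == 0 && !__sync_lock_test_and_set(&m->lock_word, 1))
                return;
        /* Slow path: announce a waiter, then retry once before sleeping. */
        pthread_mutex_lock(&m->ev_mutex);
        m->waiters = 1;
        __sync_synchronize();  /* publish waiters before re-reading lock_word */
        if (!__sync_lock_test_and_set(&m->lock_word, 1)) {
            pthread_mutex_unlock(&m->ev_mutex);
            return;
        }
        pthread_cond_wait(&m->ev_cond, &m->ev_mutex);
        pthread_mutex_unlock(&m->ev_mutex);  /* woken: go back to spinning */
    }
}

static void spin_mutex_exit(spin_mutex_t *m) {
    __sync_lock_release(&m->lock_word);  /* lock_word = 0, release semantics */
    __sync_synchronize();                /* order the release before reading waiters */
    if (m->waiters) {
        pthread_mutex_lock(&m->ev_mutex);
        m->waiters = 0;
        pthread_cond_broadcast(&m->ev_cond);  /* wake every sleeper to compete */
        pthread_mutex_unlock(&m->ev_mutex);
    }
}

static spin_mutex_t demo_mutex;
static long demo_counter;

static void *spin_worker(void *arg) {
    long n = (long)arg;
    for (long i = 0; i < n; i++) {
        spin_mutex_enter(&demo_mutex);
        demo_counter++;
        spin_mutex_exit(&demo_mutex);
    }
    return NULL;
}

/* Run nthreads workers (at most 8) and return the final counter. */
static long spin_mutex_demo(int nthreads, long iters) {
    pthread_t t[8];
    if (nthreads > 8)
        nthreads = 8;
    spin_mutex_init(&demo_mutex);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, spin_worker, (void *)iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

The full barriers on both sides form a Dekker-style handshake: either the waiter's retry sees the lock as free, or the releasing thread sees waiters == 1 and wakes it.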
3.2 mutex_exit flowchart
3.4 mutex_t memory structure diagram
4. rw_lock: InnoDB implements its own read-write lock to improve read performance. Its design principles are:
1. Multiple threads may read a variable in memory at the same time.
2. Only one thread at a time may modify a variable in memory.
3. While any thread is reading a variable, no thread may write it.
4. While a thread is modifying a variable, no other thread may read or write it (the owning thread may acquire the lock recursively).
5. If the rw_lock is held in read mode and a thread is waiting to write, new read requests from other threads wait until that pending write completes.
These five points show that an rw_lock is held in either read or write mode. We call these the S-latch (shared, for reading) and the X-latch (exclusive, for writing). The InnoDB engine describes their compatibility as follows:
           S-latch        X-latch
S-latch    Compatible     Incompatible
X-latch    Incompatible   Incompatible
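The compatibility table can be encoded directly as a lookup matrix. This is only an illustration; latch_mode_t, latch_compat, and latch_compatible are hypothetical names, not InnoDB identifiers.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { S_LATCH = 0, X_LATCH = 1 } latch_mode_t;

/* latch_compat[held][requested]: can `requested` be granted while another
   thread holds the latch in mode `held`? Rows and columns follow the table. */
static const bool latch_compat[2][2] = {
    /*               requested S  requested X */
    /* held S */   { true,        false },
    /* held X */   { false,       false },
};

static bool latch_compatible(latch_mode_t held, latch_mode_t requested) {
    return latch_compat[held][requested];
}
```

Only the S/S pair is compatible, which is exactly principles 1 to 3 in the list above.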
InnoDB's rw_lock is built on the custom mutex_t, and all of its control relies on the mutex and the sync array cells (thread_cell). The rw_lock_t structure is defined as follows:
struct rw_lock_struct {
    ulint            reader_count;       /* number of readers holding the S-latch; nonzero means the lock is S-latched */
    ulint            writer;             /* X-latch status: RW_LOCK_EX if a writer holds the X-latch,
                                            RW_LOCK_WAIT_EX if a writer is waiting for the S-latches to be released,
                                            RW_LOCK_NOT_LOCKED otherwise */
    os_thread_id_t   writer_thread;      /* ID of the thread holding the X-latch, or of the first thread waiting to take it */
    ulint            writer_count;       /* number of times the same thread holds the X-latch */
    mutex_t          mutex;              /* mutex protecting the data in this structure */
    ulint            pass;               /* default 0; if nonzero, the thread can pass control of the latch to another thread (used by the insert buffer) */
    ulint            waiters;            /* nonzero if read or write requests are waiting for the latch */
    ibool            writer_is_wait_ex;  /* TRUE if writer is RW_LOCK_WAIT_EX */
    UT_LIST_NODE_T(rw_lock_t) list;      /* node in the rw_lock list */
    UT_LIST_BASE_NODE_T(rw_lock_debug_t) debug_list;
    ulint            level;              /* level ID, used to detect deadlocks */
    /* the following fields are used for debugging */
    char*            cfile_name;         /* file in which the rw_lock was created */
    ulint            cline;              /* line in that file where the rw_lock was created */
    char*            last_s_file_name;   /* file in which an S-latch was last acquired */
    char*            last_x_file_name;   /* file in which the X-latch was last acquired */
    ulint            last_s_line;        /* line where an S-latch was last acquired */
    ulint            last_x_line;        /* line where the X-latch was last acquired */
    ulint            magic_n;            /* magic number */
};
The main interfaces for acquiring and releasing an rw_lock_t are rw_lock_s_lock_func, rw_lock_x_lock_func, rw_lock_s_unlock_func, and rw_lock_x_unlock_func. The spin functions are used from rw_lock_s_lock_func and rw_lock_x_lock_func. Their flow is similar to that of the mutex_t spin function: the aim is to acquire the lock while spinning. For details, see rw_lock_s_lock_spin and rw_lock_x_lock_func in sync0rw.c. From the structure definition above and the function implementations, we can see that rw_lock has four states:
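RW_LOCK_NOT_LOCKED: no thread holds the latch
RW_LOCK_SHARED: held in shared mode, possibly by several threads concurrently
RW_LOCK_WAIT_EX: a writer is waiting for the S-latches to drain before it becomes an X-latch
RW_LOCK_EX: held in write mode by a single thread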
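The four states and principles 4 and 5 can be modelled with a small read-write lock, sketched below under stated assumptions. It is hypothetical and not InnoDB's implementation: the state names are borrowed from the text (their numeric values here are arbitrary), a pthread mutex and condition variable replace InnoDB's mutex_t and sync array cells, and there is no spinning. A writer first reserves the lock (RW_LOCK_WAIT_EX), which blocks new readers, then waits for existing readers to leave before switching to RW_LOCK_EX.

```c
#include <assert.h>
#include <pthread.h>

enum { RW_LOCK_NOT_LOCKED, RW_LOCK_SHARED, RW_LOCK_WAIT_EX, RW_LOCK_EX };

typedef struct {
    pthread_mutex_t mutex;         /* protects every field below */
    pthread_cond_t  cond;          /* all waiters sleep here */
    unsigned long   reader_count;  /* number of S-latches held */
    int             writer;        /* RW_LOCK_NOT_LOCKED, RW_LOCK_WAIT_EX or RW_LOCK_EX */
    pthread_t       writer_thread; /* meaningful only when writer != RW_LOCK_NOT_LOCKED */
    unsigned long   writer_count;  /* recursive X-latch depth */
} rw_sketch_t;

static void rw_init(rw_sketch_t *l) {
    pthread_mutex_init(&l->mutex, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->reader_count = 0;
    l->writer = RW_LOCK_NOT_LOCKED;
    l->writer_count = 0;
}

/* Derive the overall state from the writer field and the reader count. */
static int rw_state(rw_sketch_t *l) {
    pthread_mutex_lock(&l->mutex);
    int s = l->writer != RW_LOCK_NOT_LOCKED ? l->writer
          : l->reader_count > 0 ? RW_LOCK_SHARED : RW_LOCK_NOT_LOCKED;
    pthread_mutex_unlock(&l->mutex);
    return s;
}

static void rw_s_lock(rw_sketch_t *l) {
    pthread_mutex_lock(&l->mutex);
    while (l->writer != RW_LOCK_NOT_LOCKED)   /* principle 5: pending writer blocks new readers */
        pthread_cond_wait(&l->cond, &l->mutex);
    l->reader_count++;
    pthread_mutex_unlock(&l->mutex);
}

static void rw_s_unlock(rw_sketch_t *l) {
    pthread_mutex_lock(&l->mutex);
    if (--l->reader_count == 0)
        pthread_cond_broadcast(&l->cond);     /* a WAIT_EX writer may proceed */
    pthread_mutex_unlock(&l->mutex);
}

static void rw_x_lock(rw_sketch_t *l) {
    pthread_t self = pthread_self();
    pthread_mutex_lock(&l->mutex);
    if (l->writer == RW_LOCK_EX && pthread_equal(l->writer_thread, self)) {
        l->writer_count++;                    /* principle 4: recursive X-latch */
        pthread_mutex_unlock(&l->mutex);
        return;
    }
    while (l->writer != RW_LOCK_NOT_LOCKED)   /* only one writer may reserve */
        pthread_cond_wait(&l->cond, &l->mutex);
    l->writer = RW_LOCK_WAIT_EX;              /* reserved: new readers now wait */
    l->writer_thread = self;
    while (l->reader_count > 0)               /* wait for existing S-latches to drain */
        pthread_cond_wait(&l->cond, &l->mutex);
    l->writer = RW_LOCK_EX;
    l->writer_count = 1;
    pthread_mutex_unlock(&l->mutex);
}

static void rw_x_unlock(rw_sketch_t *l) {
    pthread_mutex_lock(&l->mutex);
    if (--l->writer_count == 0) {
        l->writer = RW_LOCK_NOT_LOCKED;
        pthread_cond_broadcast(&l->cond);
    }
    pthread_mutex_unlock(&l->mutex);
}
```

In this model the transitions are NOT_LOCKED to SHARED (first reader), SHARED to NOT_LOCKED (last reader leaves), NOT_LOCKED or SHARED to WAIT_EX (writer reserves), WAIT_EX to EX (readers drained), and EX to NOT_LOCKED (last recursive release).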
The transitions between the four states are as follows:
These transitions make the operating mechanism of rw_lock clear. Besides state handling, rw_lock also provides debugging interfaces; the memory relationship diagram shows how they relate:
5. Deadlock detection and debugging. InnoDB not only implements custom mutex_t and rw_lock_t, it also performs deadlock detection on these two kinds of latch in debug builds. This greatly simplifies latch debugging in InnoDB, since latch status and information can be inspected in real time, but it is available only in the debug version of InnoDB. Deadlock detection relies on the mutex level, the rw_lock level, and the sync_cell module. The latch level definitions are:
/* sync_thread_t */
struct sync_thread_struct {
    os_thread_id_t  id;      /* ID of the thread holding the latches */
    sync_level_t*   levels;  /* latch information: an array of sync_level_t */
};

/* sync_level_t */
struct sync_level_struct {
    void*  latch;  /* latch handle: pointer to a mutex_t or rw_lock_t */
    ulint  level;  /* latch level ID */
};
When a latch is acquired, InnoDB calls mutex_set_debug_info to add it to sync_thread_t, recording the ID of the acquiring thread, the file location where it was acquired, and the latch level (for details, see mutex_enter_func and mutex_spin_wait). A latch appears in sync_thread_t only while it is held; a thread that is merely waiting for a latch is not added. InnoDB can use the sync_thread_levels_empty_gen function to output the sequence of cell_t entries that all latch waits depend on, and to track where threads are waiting.
5.1 Memory structure of sync_thread_t and sync_level_t:
The length of sync_thread_level_arrays is OS_THREAD_MAX_N (10000 by default on Linux), the same as the maximum number of threads.
The length of each levels array is SYNC_THREAD_N_LEVELS (10000).
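The bookkeeping described above (a per-thread array of held latches and their levels, updated on acquire and release) can be sketched as follows. This is a hypothetical miniature: the sizes are tiny instead of 10000, threads are indexed by a small integer rather than looked up by os_thread_id_t, and add_level, remove_level, and held_count are invented names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 4   /* hypothetical; the text gives OS_THREAD_MAX_N = 10000 */
#define N_LEVELS    8   /* hypothetical; the text gives SYNC_THREAD_N_LEVELS = 10000 */

typedef struct { void *latch; unsigned long level; } level_slot_t;  /* like sync_level_t */
typedef struct { level_slot_t levels[N_LEVELS]; } thread_slot_t;   /* like sync_thread_t */

static thread_slot_t thread_levels[MAX_THREADS];  /* zero-initialized: all slots empty */

/* Record that thread tid now holds latch at the given level; 0 on success. */
static int add_level(int tid, void *latch, unsigned long level) {
    level_slot_t *s = thread_levels[tid].levels;
    for (size_t i = 0; i < N_LEVELS; i++)
        if (s[i].latch == NULL) {
            s[i].latch = latch;
            s[i].level = level;
            return 0;
        }
    return -1;  /* array full */
}

/* Forget latch when thread tid releases it; 0 if it was recorded. */
static int remove_level(int tid, void *latch) {
    level_slot_t *s = thread_levels[tid].levels;
    for (size_t i = 0; i < N_LEVELS; i++)
        if (s[i].latch == latch) {
            s[i].latch = NULL;
            return 0;
        }
    return -1;
}

/* Number of latches thread tid currently holds. */
static size_t held_count(int tid) {
    size_t n = 0;
    for (size_t i = 0; i < N_LEVELS; i++)
        if (thread_levels[tid].levels[i].latch != NULL)
            n++;
    return n;
}
```

As in the text, only held latches are recorded; a thread that is still waiting for a latch calls add_level only after it acquires it.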
5.2 What is a deadlock, and how is it detected? The following simple example illustrates it:
Thread A          Thread B
mutex1 enter      mutex2 enter
mutex2 enter      mutex1 enter
execute task      execute task
mutex2 release    mutex1 release
mutex1 release    mutex2 release
When these two threads run concurrently, a deadlock can occur: thread A holds mutex1 and waits for mutex2, while thread B holds mutex2 and waits for mutex1. Each thread waits for the other, so neither can ever proceed.

With the concept of deadlock in hand, we can examine how InnoDB detects deadlocks. The essence of the check is to determine whether waiting for the latch about to be locked would close a loop among the threads, and this is decided from the contents of sync_array_cell_t. Before waiting for a cell's signal, the thread records its wait state in a sync_array_cell_t and, before entering the OS event wait, calls sync_array_detect_deadlock to determine whether a deadlock would occur. If it would, an exception is triggered. The key to deadlock detection is the sync_array_detect_deadlock function, which works as follows:
1. The cell corresponding to the latch is passed to sync_array_detect_deadlock; both the start cell parameter and the dependent cell parameter are set to that cell itself.
2. Inside sync_array_detect_deadlock, first check whether the dependent cell is waiting for a latch. If not, there is no deadlock and the function returns immediately. If it is, determine which thread holds the latch being waited for, and call sync_array_deadlock_step with that thread's ID and the global sync_array_t (the array of waiting cells) to find what the holding thread is itself waiting for.
3. Inside sync_array_deadlock_step, find the cell in which the holding thread is waiting. If that cell is the start cell (the one that originally requested the event wait), the waits form a closed loop and a deadlock would occur. Otherwise, call sync_array_detect_deadlock recursively with the cell just found, returning to step 2.
The two functions thus recurse into each other, following the chain of latch handles, thread IDs, and cell handles; a closed loop in that chain is a deadlock. Along the way, mutex and rw_lock latches are handled separately, because the two work differently (a mutex has a single owner, while an rw_lock may be held by several readers). Because latches are used very frequently and in complex ways in a relational database, deadlock checking is very effective for latch debugging, especially combined with the thread_levels state information output for debugging.
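The cross-recursion between the two functions can be sketched for mutex-style latches (single owner) only. This is a hypothetical model, not InnoDB's code: latch_t, cell_t, cells, detect_deadlock, and deadlock_step are invented names, rw_lock handling is omitted, and the depth guard is an assumption added so the sketch always terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CELLS 8
#define NO_THREAD (-1)

typedef struct {
    int owner;             /* thread id holding the latch, or NO_THREAD */
} latch_t;

typedef struct {
    bool     in_use;       /* slot describes a waiting thread */
    int      thread;       /* id of the waiting thread */
    latch_t *waiting_for;  /* latch that thread is waiting for */
} cell_t;

static cell_t cells[MAX_CELLS];  /* stands in for the global sync array */

static bool detect_deadlock(const cell_t *start, const cell_t *cell, int depth);

/* Step 3: find the cell in which `thread` itself waits; reaching the
   start cell again means the waits form a closed loop. */
static bool deadlock_step(const cell_t *start, int thread, int depth) {
    for (size_t i = 0; i < MAX_CELLS; i++) {
        const cell_t *c = &cells[i];
        if (!c->in_use || c->thread != thread)
            continue;
        if (c == start)
            return true;
        return detect_deadlock(start, c, depth + 1);
    }
    return false;  /* the holder is not waiting, so it will release the latch */
}

/* Step 2: follow cell -> latch -> holding thread. */
static bool detect_deadlock(const cell_t *start, const cell_t *cell, int depth) {
    if (depth > MAX_CELLS)       /* assumed guard: loop that does not pass through start */
        return false;
    if (cell->waiting_for == NULL)
        return false;            /* not waiting for a latch: no deadlock */
    int holder = cell->waiting_for->owner;
    if (holder == NO_THREAD)
        return false;            /* latch is free: the wait will end */
    return deadlock_step(start, holder, depth);
}
```

Applied to the thread A / thread B example (A has id 0 and holds mutex1, B has id 1 and holds mutex2): when B starts waiting for mutex1, A is not waiting yet, so no loop is found; once A also waits for mutex2, following A's cell leads to B's cell and back to A's, which is reported as a deadlock.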
6. The analysis above shows that, besides wrapping the latch structures provided by the operating system, InnoDB provides custom latches built on atomic operations. Why implement custom latches? My understanding is that they reduce operating-system context switches and improve concurrency. InnoDB's custom latches are suitable only for short lock waits (preferably no more than 50 µs); for long waits it is better to use the lock provided by the operating system, because although the custom lock enters the operating system's event_wait after a spin cycle, it undoubtedly consumes more resources than the system mutex. Finally, here is the author's own summary from the code: "We conclude that the best choice is to set the spin time at 20 us. Then the system should work well on a multiprocessor. On a uniprocessor we have to make sure that thread switches due to mutex collisions are not frequent, i.e., they do not happen every 100 us or so, because that wastes too much resources. If the thread switches are not frequent, the 20 us wasted in spin loop is not too much."