InnoDB is currently the most popular storage engine in MySQL. What sets it apart from other storage engines is its support for transactions and row-level locking. Today I would like to share the basics of how InnoDB row locks are implemented. Because there is a lot to cover, the article is organized as follows.
{
InnoDB lock structure
Key processes of the lock mechanism
InnoDB row lock overhead
InnoDB lock synchronization mechanism
InnoDB wait event implementation
}
Let's start with a simple example, as in table 1 below.
Time | User A (T1) | User B (T2)
T1 | SELECT * FROM t WHERE id=1 FOR UPDATE |
T2 | | SELECT * FROM t WHERE id=1 FOR UPDATE
T3 | | Waiting
T4 | COMMIT |
T5 | | Executes successfully
Table 1
At time T1, user A obtains an exclusive lock on the record with id 1 in table t. When user B requests an exclusive lock on the same record at time T2, B has to wait (T3). Once A commits at T4, B's statement executes immediately (T5). Two questions lie behind this simple example: first, how does InnoDB suspend user B's execution thread, and second, how does user B resume and return successfully right after user A commits? In essence, InnoDB uses a lock to serialize the operations of users A and B on the record with id 1. The rest of this article describes how this is implemented, along with some basic information about locks.
1. InnoDB Lock Structure
InnoDB locks are managed through lock_sys. Every row lock object (lock_t) is inserted into a hash table, and row locks are managed by maintaining that hash table. The hash key is computed from the page identifier (space_id, page_no).
1) lock system structure diagram
2) Important Data structures
struct lock_sys_t {
    hash_table_t* rec_hash;         // row lock hash table
    srv_slot_t*   waiting_threads;  // array of wait objects
};

struct lock_rec_t {
    ulint space;                    // tablespace number
    ulint page_no;                  // data page number
    ulint n_bits;                   // number of records on the data page
    byte  bitmap[1 + n_bits / 8];   // bitmap array
};
2. Key processes
1) Creating a lock (lock_rec_create)
a) Calculate the number of records on the page
b) Calculate the required storage space at one bit per record
c) Allocate storage for the lock_t object
d) Initialize the bitmap and set the bit corresponding to heap_no to 1 to mark the record as locked
e) Insert the lock object pointer into the hash table
f) Insert the lock object into the transaction's lock list
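The creation steps above can be sketched roughly as follows. This is a simplified model rather than InnoDB's code: `RecLock`, `page_key`, `LockTable`, and `create_rec_lock` are hypothetical names, `std::unordered_multimap` stands in for the lock hash table, the key function is an assumption, and step f) (linking the lock into the transaction's list) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a record lock object (not lock_t).
struct RecLock {
    uint64_t trx_id;              // owning transaction
    uint32_t space;               // tablespace number
    uint32_t page_no;             // page number
    bool exclusive;               // true = exclusive, false = shared
    std::vector<uint8_t> bitmap;  // one bit per record (heap_no) on the page
};

// Assumed key function: pack (space, page_no) into one 64-bit value.
inline uint64_t page_key(uint32_t space, uint32_t page_no) {
    return (static_cast<uint64_t>(space) << 32) | page_no;
}

using LockTable = std::unordered_multimap<uint64_t, std::shared_ptr<RecLock>>;

// Steps a) to e): size the bitmap from the record count, set the bit for
// heap_no, and insert the lock into the page-keyed table.
// The caller must pass heap_no <= n_records.
inline std::shared_ptr<RecLock> create_rec_lock(LockTable& table, uint64_t trx_id,
                                                uint32_t space, uint32_t page_no,
                                                std::size_t n_records,
                                                std::size_t heap_no, bool exclusive) {
    auto lock = std::make_shared<RecLock>();
    lock->trx_id = trx_id;
    lock->space = space;
    lock->page_no = page_no;
    lock->exclusive = exclusive;
    lock->bitmap.assign(1 + n_records / 8, 0);  // one bit per record
    lock->bitmap[heap_no / 8] |= static_cast<uint8_t>(1u << (heap_no % 8));
    table.emplace(page_key(space, page_no), lock);
    return lock;
}
```

For a page with 200 records, the bitmap in this sketch is 26 bytes, and locking heap_no 5 sets bit 5 of the first byte.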
2) Checking whether a record is locked (and the lock type)
a) Get the record's location: (space_id, page_no) and heap_no
b) Look up (space_id, page_no) in the hash table to get the lock object lock_t
c) Determine from the lock object whether it is a shared or an exclusive lock
d) If a lock object exists, check its bitmap to see whether the bit for heap_no is 1
e) If the bit is 1, the record is locked
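As a rough illustration of the lookup just described, the sketch below finds the lock objects for a page and tests the bit for one heap_no. The types and names (`PageLock`, `LockState`, `check_rec_lock`, `page_key`) are hypothetical, and a `std::unordered_multimap` is assumed in place of InnoDB's own hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical lock object: a mode plus a per-page bitmap.
struct PageLock {
    bool exclusive;               // true = exclusive, false = shared
    std::vector<uint8_t> bitmap;  // bit i set => record with heap_no i is locked
};

using LockTable = std::unordered_multimap<uint64_t, PageLock>;

// Assumed key function: pack (space, page_no) into one 64-bit value.
inline uint64_t page_key(uint32_t space, uint32_t page_no) {
    return (static_cast<uint64_t>(space) << 32) | page_no;
}

enum class LockState { None, Shared, Exclusive };

// Steps a) to e): walk every lock object stored for the page and test the
// bit that belongs to heap_no.
inline LockState check_rec_lock(const LockTable& table, uint32_t space,
                                uint32_t page_no, std::size_t heap_no) {
    LockState state = LockState::None;
    auto range = table.equal_range(page_key(space, page_no));
    for (auto it = range.first; it != range.second; ++it) {
        const PageLock& lock = it->second;
        std::size_t byte = heap_no / 8;
        if (byte >= lock.bitmap.size()) continue;  // bitmap does not cover heap_no
        if (lock.bitmap[byte] & (1u << (heap_no % 8))) {
            if (lock.exclusive) return LockState::Exclusive;
            state = LockState::Shared;
        }
    }
    return state;
}
```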
3) Locking a row
a) Look up the hash table to determine whether there is a lock on the page
b) If not, create a lock and insert the lock object into the hash table
c) If so, determine whether the transaction already holds an equal or stronger lock (lock_rec_has_expl)
d) If it does, go to step e); if not, go to step f) (lock_rec_lock_slow)
e) Set the bit for the record's heap_no in the page's lock bitmap, and finish
f) Determine whether the requested lock conflicts with an existing lock
g) If it does, create the lock with mode LOCK_WAIT and set wait_lock (lock_rec_enqueue_waiting)
h) If it does not, the lock succeeds and joins the lock queue (lock_rec_add_to_queue)
i) Based on the returned error code, the upper-layer caller invokes the lock wait logic (lock_wait_suspend_thread)
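The decision at the heart of these steps (already covered, conflict, or grant) can be sketched for a single record as below. It is a minimal model under stated assumptions: only shared and exclusive modes exist, the queue holds the locks on one record, and `HeldLock`, `Outcome`, `compatible`, and `request_lock` are hypothetical names, not InnoDB functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Mode { Shared, Exclusive };
enum class Outcome { Granted, MustWait };

// Hypothetical entry in the lock queue of one record.
struct HeldLock {
    uint64_t trx_id;
    Mode mode;
    bool waiting;  // true while the request has not been granted
};

// Shared locks are compatible with each other; exclusive conflicts with all.
inline bool compatible(Mode held, Mode requested) {
    return held == Mode::Shared && requested == Mode::Shared;
}

inline Outcome request_lock(std::vector<HeldLock>& queue, uint64_t trx_id, Mode mode) {
    // Step c): the transaction already holds an equal or stronger granted lock.
    for (const HeldLock& l : queue) {
        bool strong_enough = (l.mode == Mode::Exclusive) || (mode == Mode::Shared);
        if (l.trx_id == trx_id && !l.waiting && strong_enough) return Outcome::Granted;
    }
    // Step f): look for a conflicting lock owned by another transaction.
    bool conflict = false;
    for (const HeldLock& l : queue) {
        if (l.trx_id != trx_id && !compatible(l.mode, mode)) conflict = true;
    }
    // Steps g) and h): enqueue as waiting on conflict, as granted otherwise.
    queue.push_back({trx_id, mode, conflict});
    return conflict ? Outcome::MustWait : Outcome::Granted;
}
```

In the scenario of Table 1, user A's request is granted, and user B's request for the same record is enqueued as waiting.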
4) Lock waiting (lock_wait_suspend_thread)
a) Obtain the transaction from the worker thread information
b) Reserve a slot (lock_wait_table_reserve_slot) and initialize its wait event
c) Wait on the event (implemented on Linux with a condition variable), which suspends the thread
Call stack
#0 pthread_cond_wait
#1 os_cond_wait (pthread_cond_t*, os_fast_mutex_t*) ()
#2 ... long) ()
#3 lock_wait_suspend_thread (que_thr_t*) ()
#4 row_mysql_handle_errors (dberr_t*, trx_t*, que_thr_t*, trx_savept_t*) ()
5) Releasing a lock
InnoDB row locks are not released until the transaction commits or rolls back. When a lock is released, InnoDB checks whether any lock object is waiting for it; if so, it grants the waiting lock and wakes the corresponding thread.
a) Take each lock of type LOCK_WAIT and determine whether it still needs to wait
b) If it no longer needs to wait, grant it (lock_grant)
c) Find the corresponding transaction (lock_t->trx) from the lock object
d) Find the corresponding worker thread (trx_lock_t->wait_thr) through the transaction
e) Find the corresponding slot (wait event) from the thread information
f) Call os_event_set to signal the event
Call stack
#0 os_event_set (thr->slot->event)
#1 lock_wait_release_thread_if_suspended
#2 lock_grant
#3 lock_rec_dequeue_from_page
#4 lock_trx_release_locks
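The wait and release paths fit together as a suspend and wake pair. The sketch below models that pairing with the C++ standard library rather than InnoDB's primitives; `WaitSlot`, `suspend_until_granted`, `grant_and_wake`, and `run_table1_scenario` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical wait slot: one per blocked thread.
struct WaitSlot {
    std::mutex m;
    std::condition_variable cv;
    bool granted = false;  // set once the waiting lock has been granted
};

// Waiting side (user B): sleep until the lock has been granted.
inline void suspend_until_granted(WaitSlot& slot) {
    std::unique_lock<std::mutex> guard(slot.m);
    slot.cv.wait(guard, [&slot] { return slot.granted; });
}

// Releasing side (user A at commit): grant the lock, then wake the waiter.
inline void grant_and_wake(WaitSlot& slot) {
    {
        std::lock_guard<std::mutex> guard(slot.m);
        slot.granted = true;
    }
    slot.cv.notify_all();
}

// Replays Table 1: B blocks, A commits, B continues.
inline bool run_table1_scenario() {
    WaitSlot slot;
    bool b_finished = false;
    std::thread user_b([&] {
        suspend_until_granted(slot);
        b_finished = true;
    });
    grant_and_wake(slot);  // user A commits
    user_b.join();
    return b_finished;
}
```

Because the `granted` flag is checked under the mutex, the waiter does not miss the wake-up even if the grant happens before it starts waiting.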
6) Slot management
A lock wait is implemented through the wait event on a slot object (explained below). Each slot object contains one wait event, and the number of slots is tied to the number of threads: because the entity that blocks is a thread, it is enough to initialize as many slots as the maximum number of threads. The slots are stored in the waiting_threads array of lock_sys, and a slot is taken from that array whenever one is needed.
Slot initialization
lock_sys = static_cast<lock_sys_t*>(mem_zalloc(lock_sys_sz));
lock_stack = static_cast<lock_stack_t*>(
    mem_zalloc(sizeof(*lock_stack) * LOCK_STACK_SIZE));
void* ptr = &lock_sys[1];
lock_sys->waiting_threads = static_cast<srv_slot_t*>(ptr);
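The `&lock_sys[1]` expression places the slot array directly behind the header structure inside a single allocation. A minimal sketch of that layout follows, with hypothetical `Slot`, `LockSys`, `lock_sys_create`, and `reserve_slot` definitions and `std::calloc` assumed in place of InnoDB's allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct Slot {      // hypothetical wait slot
    bool in_use;
};

struct LockSys {   // hypothetical header; the slot array follows it in memory
    Slot* waiting_threads;
    std::size_t n_slots;
};

// Allocate the header and max_threads slots in one zero-filled block.
inline LockSys* lock_sys_create(std::size_t max_threads) {
    std::size_t bytes = sizeof(LockSys) + max_threads * sizeof(Slot);
    LockSys* sys = static_cast<LockSys*>(std::calloc(1, bytes));
    if (sys == nullptr) return nullptr;
    void* ptr = &sys[1];  // first byte after the header
    sys->waiting_threads = static_cast<Slot*>(ptr);
    sys->n_slots = max_threads;
    return sys;
}

// Take the first free slot from the array, or nullptr when all are in use.
inline Slot* reserve_slot(LockSys* sys) {
    for (std::size_t i = 0; i < sys->n_slots; ++i) {
        if (!sys->waiting_threads[i].in_use) {
            sys->waiting_threads[i].in_use = true;
            return &sys->waiting_threads[i];
        }
    }
    return nullptr;
}
```

A single `std::free(sys)` releases both the header and the slots in this sketch.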
3. InnoDB Row Lock Overhead
InnoDB row locks are stored in a bitmap, so in theory one record needs only one bit. The basic unit of locking is the row, but locks are managed and organized by transaction and page: the object that represents a lock is lock_t, and one lock_t instance can cover all the records of an index page.
1) Row lock cost calculation
Memory overhead comes mainly from the pointers and the bitmap that store lock information. Each bit in the bitmap corresponds to one record on the page. For a page with 200 records, a row lock object is about 100 bytes. If only one row on the page is locked, the cost is 100 bytes per row; if all records share one lock object, the cost is 100 bytes / 200 = 4 bits per row. In practice, a page is guaranteed to have only one lock object only when the same transaction locks all records on the page with the same lock mode.
Memory space occupied by a lock_t object
/* Make lock bitmap bigger by a safety margin */
n_bits = page_dir_get_n_heap(page) + LOCK_PAGE_BITMAP_MARGIN;
n_bytes = 1 + n_bits / 8;
lock = static_cast<lock_t*>(
    mem_heap_alloc(trx->lock.lock_heap, sizeof(lock_t) + n_bytes));
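The arithmetic behind these figures can be checked with two small helpers. `bitmap_bytes` and `bits_per_row` are hypothetical names, and the margin is left as a parameter because its value is not given in the text.

```cpp
#include <cassert>
#include <cstddef>

// Bitmap size in bytes, following n_bytes = 1 + n_bits / 8,
// where n_bits = records on the page + safety margin.
constexpr std::size_t bitmap_bytes(std::size_t n_heap, std::size_t margin) {
    return 1 + (n_heap + margin) / 8;
}

// Memory cost in bits per locked row when one lock object of lock_bytes
// bytes covers locked_rows rows of a page.
constexpr double bits_per_row(std::size_t lock_bytes, std::size_t locked_rows) {
    return static_cast<double>(lock_bytes) * 8.0 / static_cast<double>(locked_rows);
}
```

With the article's numbers, `bits_per_row(100, 200)` gives 4 bits per row and `bits_per_row(100, 1)` gives 800 bits (100 bytes) per row. Assuming a margin of 64 bits, `bitmap_bytes(200, 64)` is 34 bytes.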
2) Lock reuse
The InnoDB lock mechanism reuses lock objects to keep the memory cost of locks as small as possible. Specifically, no new lock object is needed in two cases: when the same transaction locks another record on the same page with the same lock mode, and when the transaction already holds a lock stronger than the one requested.
4. InnoDB Lock Synchronization Mechanism (spinlock + mutex + condition variable)
InnoDB does not use a native primitive such as a spinlock, mutex, or condition variable directly; it combines them to get the best result. The main logic lives in two functions, mutex_enter_func and mutex_exit_func.
1) Data structure
struct ib_mutex_t {
    os_event_t event;                // wait event
    volatile lock_word_t lock_word;  // lock variable
    os_fast_mutex_t os_fast_mutex;   // used on systems without atomic lock support
    ulint waiters;                   // whether there is a waiting thread
};
2) Acquiring a mutex (mutex_enter_func(ib_mutex))
a) Spin first, checking mutex->lock_word to see whether the lock can be obtained
b) On systems that do not support spinlocks, use pthread_mutex_trylock, with os_fast_mutex protecting mutex->lock_word, to determine whether the lock can be obtained
c) If the lock cannot be obtained, reserve a cell from the global sync_wait_array and set the cell's wait_object to ib_mutex
d) Set waiters of ib_mutex to 1
e) Call os_event_wait_low(ib_mutex->event) to suspend the thread
f) After being signaled, the thread goes back to step a) and tries again
3) Releasing a mutex (mutex_exit_func(ib_mutex))
a) Reset mutex->lock_word
b) With spinlock support, reset it via os_atomic_test_and_set_byte
c) On systems that do not support spinlocks, release os_fast_mutex and set lock_word to 0
d) Check whether waiters of the ib_mutex object is 1 (that is, whether a thread is suspended)
e) If so, call mutex_signal_object(ib_mutex->event)
f) This calls pthread_cond_broadcast(event->cond_var) to wake all waiting threads
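The spin-then-sleep pattern described in the two flows above can be sketched with standard C++ primitives. This is an illustrative model under stated assumptions, not InnoDB's implementation: `SpinWaitMutex`, `run_counter_demo`, and the spin count of 30 are hypothetical, a `std::condition_variable` replaces the wait array and event, and the lock is re-checked after registering as a waiter so that a release in between is not missed.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the spin-then-wait pattern; not ib_mutex_t.
class SpinWaitMutex {
public:
    void enter() {
        for (;;) {
            // Spin for a bounded number of rounds on the lock word.
            for (int i = 0; i < kSpinRounds; ++i) {
                if (try_enter()) return;
                std::this_thread::yield();
            }
            // Register as a waiter, re-check the lock word, then sleep.
            std::unique_lock<std::mutex> guard(event_mutex_);
            waiters_.store(true);
            if (try_enter()) return;  // released while we were registering
            event_cv_.wait(guard);    // after wake-up, loop and spin again
        }
    }

    void exit() {
        lock_word_.store(false);      // reset the lock word
        if (waiters_.load()) {        // wake sleepers only when there are any
            std::lock_guard<std::mutex> guard(event_mutex_);
            waiters_.store(false);
            event_cv_.notify_all();
        }
    }

private:
    // Atomic test-and-set: returns true when the caller obtained the lock.
    bool try_enter() { return !lock_word_.exchange(true); }

    static constexpr int kSpinRounds = 30;  // assumed value
    std::atomic<bool> lock_word_{false};
    std::atomic<bool> waiters_{false};
    std::mutex event_mutex_;
    std::condition_variable event_cv_;
};

// Two threads increment a shared counter under the mutex.
inline int run_counter_demo(int per_thread) {
    SpinWaitMutex m;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            m.enter();
            ++counter;
            m.exit();
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

Spinning first avoids the cost of a sleep and wake-up when the lock is held only briefly, and the waiters flag lets the releasing thread skip the broadcast when nobody is sleeping.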
5. InnoDB Wait Event Implementation
1) Structure of the event
struct os_event {
    os_cond_t cond_var;        // condition variable
    ibool is_set;              // if true, threads do not block on the event
    os_fast_mutex_t os_mutex;  // mutex protecting the condition variable
};
2) os_event_set process
a) Acquire the mutex os_mutex
b) If is_set is true, do nothing and release os_mutex
c) If is_set is false, set is_set to true
d) Call pthread_cond_broadcast on the condition variable to wake all waiting threads
3) os_event_wait process
a) Acquire the mutex os_mutex
b) If is_set is true, do nothing and release os_mutex
c) If is_set is false, call pthread_cond_wait to suspend the thread
d) After being woken, release the mutex os_mutex
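The set and wait flows above map closely onto a mutex, a flag, and a condition variable. Below is a minimal sketch using the C++ standard library; the `Event` class is hypothetical and only borrows the member names from the structure above. One deliberate difference: the wait loops on the flag, which guards against spurious wake-ups.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of a manual-set event; not InnoDB's os_event.
class Event {
public:
    // Set the flag and wake every waiter; a no-op when already set.
    void set() {
        std::lock_guard<std::mutex> guard(os_mutex_);
        if (!is_set_) {
            is_set_ = true;
            cond_var_.notify_all();
        }
    }

    // Return at once when the flag is set; otherwise sleep until it is.
    void wait() {
        std::unique_lock<std::mutex> guard(os_mutex_);
        while (!is_set_) {
            cond_var_.wait(guard);
        }
    }

    bool is_set() {
        std::lock_guard<std::mutex> guard(os_mutex_);
        return is_set_;
    }

private:
    std::mutex os_mutex_;               // protects is_set_
    std::condition_variable cond_var_;  // waiters sleep here
    bool is_set_ = false;
};
```

A thread that calls `wait()` before `set()` sleeps until the flag is set, while one that calls it afterwards returns immediately.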
Back to the question raised at the beginning of the article. Assume that the record with id=1 in table t is on page (1,20). In Figure 2, each lock node is shown as a red box, and one node represents one lock object. In addition, transactions T2 and T3 already hold 2 locks on page (0,200). Why are there 2 locks on the same page? Because the lock objects have different owners. Different transactions must each create their own lock object, even for locks of the same mode on the same record; lock reuse applies only when the same transaction locks multiple records on the same page. If T1 also needs to lock a record on page (0,200) and that record conflicts with an existing lock, a lock is created and T1 is suspended to wait; otherwise, the lock is created and the call returns successfully.
InnoDB Row Lock Source Code Study (Part 1)