InnoDB row lock source code learning (1)

Last Update:2015-01-28 Source: Internet

Author: User

Tags lock queue

InnoDB is the most popular storage engine in MySQL. Unlike most other MySQL storage engines, it supports transactions and row-level locks. This article covers the basics of how InnoDB implements row locks. Because the topic is long, the article is organized as follows.

{
InnoDB Lock Structure
Key Processes of the Lock Mechanism
InnoDB Row Lock Overhead
InnoDB Lock Synchronization Mechanism
InnoDB Wait Event Implementation
}

Let's start with a simple example, shown in Table 1 below.

Time	User A (T1)	User B (T2)
t1	SELECT * FROM t WHERE id = 1 FOR UPDATE
t2		SELECT * FROM t WHERE id = 1 FOR UPDATE
t3		Waiting (blocked)
t4	COMMIT
t5		Execution successful

Table 1

At time t1, user A acquires an exclusive lock on the record with id = 1 in table t. When user B requests an exclusive lock on the same record at time t2, user B has to wait. After user A commits its transaction at time t4, user B's statement immediately completes successfully. This simple example raises two questions. First, how does InnoDB suspend user B's execution thread? Second, how does user B's statement return successfully right after user A commits? In this example, InnoDB uses a lock to serialize the operations of user A and user B on the record with id = 1. The implementation is described in detail below, along with some basic knowledge about locks.

1. InnoDB Lock Structure

InnoDB locks are managed through lock_sys. All row lock objects (lock_t) are inserted into a hash table, and row locks are managed by maintaining that hash table. The hash key is computed from the page identifier (space_id, page_no).

1) Lock System Structure

2) Important Data Structure

lock_sys
{
    hash_table_t* rec_hash;        // row lock hash table
    srv_slot_t*   waiting_threads; // array of wait slots
}

lock_rec_t
{
    ulint space;                // tablespace number
    ulint page_no;              // data page number
    ulint n_bits;               // number of bits in the bitmap (records on the data page)
    byte  bitmap[1 + n_bits/8]; // bitmap array
};

2. Key Processes

1) Create a lock [lock_rec_create]

A) Calculate the number of records on the page.

B) Each record needs one bit; calculate the storage space required for the bitmap.

C) Allocate a lock_t object.

D) Initialize the bitmap and set the bit corresponding to heap_no to 1.

E) Insert the lock object pointer into the hash chain.

F) Insert the lock object into the transaction's lock list.
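The steps above can be sketched as follows. This is a simplified model rather than the InnoDB code: RecLock, LockTable, PageId, and create_rec_lock are hypothetical names, std::map stands in for the hash table and for the transaction's lock list, and the margin of 64 bits is an assumed value for the safety margin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the InnoDB structures.
struct RecLock {
    std::uint64_t trx_id = 0;          // owning transaction
    std::uint32_t space = 0;           // tablespace id
    std::uint32_t page_no = 0;         // page number
    std::size_t n_bits = 0;            // number of bits in the bitmap
    std::vector<std::uint8_t> bitmap;  // one bit per heap_no on the page
};

using PageId = std::pair<std::uint32_t, std::uint32_t>;  // (space_id, page_no)

struct LockTable {
    // Stand-in for the row lock hash table: one chain of lock objects per page.
    std::map<PageId, std::vector<std::shared_ptr<RecLock>>> rec_hash;
    // Stand-in for each transaction's own list of lock objects.
    std::map<std::uint64_t, std::vector<std::shared_ptr<RecLock>>> trx_locks;
};

constexpr std::size_t kBitmapMargin = 64;  // assumed safety margin, in bits

std::shared_ptr<RecLock> create_rec_lock(LockTable& table, std::uint64_t trx_id,
                                         std::uint32_t space, std::uint32_t page_no,
                                         std::size_t n_heap, std::size_t heap_no) {
    assert(heap_no < n_heap);  // the record must exist on the page

    // Steps A-C: size the bitmap from the record count and allocate the object.
    auto lock = std::make_shared<RecLock>();
    lock->trx_id = trx_id;
    lock->space = space;
    lock->page_no = page_no;
    lock->n_bits = n_heap + kBitmapMargin;
    lock->bitmap.assign(1 + lock->n_bits / 8, 0);

    // Step D: set the bit that belongs to heap_no.
    lock->bitmap[heap_no / 8] |= static_cast<std::uint8_t>(1u << (heap_no % 8));

    // Step E: append to the chain of the page. Step F: append to the transaction's list.
    table.rec_hash[PageId(space, page_no)].push_back(lock);
    table.trx_locks[trx_id].push_back(lock);
    return lock;
}
```

Calling create_rec_lock(table, 1, 0, 200, 10, 2) in this sketch creates one lock object for page (0,200) with only the bit for heap_no 2 set.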

2) Query the lock status of a record (whether it is locked, and the lock type)

A) Obtain the record's location: (space_id, page_no) and heap_no.

B) Look up the hash table by (space_id, page_no) to obtain the lock object lock_t.

C) Determine from the lock object whether the lock is a shared lock or an exclusive lock.

D) If a lock object exists, examine its bitmap and check whether the bit corresponding to heap_no is 1.

E) A value of 1 indicates that the record is locked.
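A minimal sketch of this lookup, assuming the chain of lock objects for the page has already been found in the hash table. RecLock, LockMode, LockState, bit_is_set, and rec_lock_state are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class LockMode { Shared, Exclusive };
enum class LockState { NotLocked, Shared, Exclusive };

// Hypothetical, simplified lock object: a mode plus one bit per heap_no.
struct RecLock {
    LockMode mode;
    std::vector<std::uint8_t> bitmap;
};

// Steps D and E: test the bit that belongs to heap_no.
bool bit_is_set(const RecLock& lock, std::size_t heap_no) {
    assert(heap_no / 8 < lock.bitmap.size());  // heap_no must fit in the bitmap
    return ((lock.bitmap[heap_no / 8] >> (heap_no % 8)) & 1u) != 0;
}

// page_locks is the chain found in step B for (space_id, page_no).
// Returns the strongest mode that any lock object holds on the record.
LockState rec_lock_state(const std::vector<RecLock>& page_locks, std::size_t heap_no) {
    LockState state = LockState::NotLocked;
    for (const RecLock& lock : page_locks) {
        if (!bit_is_set(lock, heap_no)) continue;  // this object does not cover the record
        if (lock.mode == LockMode::Exclusive) return LockState::Exclusive;
        state = LockState::Shared;
    }
    return state;
}
```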

3) Acquire a lock

A) Look up the hash table to check whether a lock already exists on the page.

B) If no lock exists, create one and insert the lock object into the hash chain.

C) If a lock exists, determine whether the transaction already holds a stronger lock on the record (lock_rec_has_expl).

D) If it does, go to step E. If not, go to step F (lock_rec_lock_slow).

E) Set the bit for heap_no on the page.

F) Determine whether the requested lock conflicts with an existing lock.

G) If it does, create the lock with mode LOCK_WAIT and set wait_lock (lock_rec_enqueue_waiting).

H) If not, the lock is acquired; add it to the lock queue (lock_rec_add_to_queue).

I) Based on the returned error code, the upper-layer caller invokes the lock wait logic (lock_wait_suspend_thread).
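The decision part of this flow (steps C, F, G, and H) can be sketched roughly as below. The sketch makes several simplifying assumptions: bitmaps are omitted and each queue entry covers a single record, only shared (S) and exclusive (X) modes exist, and a waiting lock of another transaction counts as a conflict just like a granted one. Lock, Mode, Result, and lock_record are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Mode { S, X };
enum class Result { Granted, MustWait };

// Hypothetical queue entry: one lock request on one record of the page.
struct Lock {
    std::uint64_t trx;
    Mode mode;
    std::size_t heap_no;
    bool waiting;  // true corresponds to a LOCK_WAIT request
};

// Two locks are compatible only when both are shared.
inline bool compatible(Mode a, Mode b) { return a == Mode::S && b == Mode::S; }

// A held lock covers a request when it is exclusive or the request is shared.
inline bool covers(Mode held, Mode wanted) { return held == Mode::X || wanted == Mode::S; }

Result lock_record(std::vector<Lock>& page_queue, std::uint64_t trx, Mode mode,
                   std::size_t heap_no) {
    // Step C: the transaction may already hold an equal or stronger lock.
    for (const Lock& l : page_queue) {
        assert(!(l.trx == trx && l.waiting));  // a suspended transaction cannot request locks
        if (l.trx == trx && l.heap_no == heap_no && covers(l.mode, mode)) {
            return Result::Granted;
        }
    }
    // Step F: look for a conflicting lock owned by another transaction.
    bool conflict = false;
    for (const Lock& l : page_queue) {
        if (l.trx != trx && l.heap_no == heap_no && !compatible(l.mode, mode)) {
            conflict = true;
        }
    }
    // Steps G and H: enqueue the request, marked as waiting when it conflicts.
    page_queue.push_back({trx, mode, heap_no, conflict});
    return conflict ? Result::MustWait : Result::Granted;
}
```

In this sketch, the caller would suspend the thread when MustWait is returned, which corresponds to step I.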

4) Lock wait [lock_wait_suspend_thread]

A) Obtain the transaction from the worker thread information.

B) Reserve a slot (lock_wait_table_reserve_slot) and initialize its wait event.

C) Wait on the event (implemented with a condition variable on Linux) to suspend the thread.

Call stack
#0 pthread_cond_wait
#1 os_cond_wait(pthread_cond_t*, os_fast_mutex_t*)
#2 os_event_wait_low(os_event*, long)
#3 lock_wait_suspend_thread(que_thr_t*)
#4 row_mysql_handle_errors(dberr_t*, trx_t*, que_thr_t*, trx_savept_t*)

5) Release the lock

InnoDB row locks are released only when the transaction commits or rolls back. When a lock is released, InnoDB checks whether other lock objects are waiting for it. If so, it grants the lock to the waiter and wakes up the corresponding thread.

A) For each lock object of type LOCK_WAIT, determine whether it still has to wait.

B) If it no longer has to wait, grant the lock (lock_grant).

C) Find the owning transaction from the lock object (lock_t->trx).

D) Find the waiting worker thread from the transaction (trx_lock_t->wait_thr).

E) Find the slot (and its wait event) from the thread information.

F) Call os_event_set to signal the event.

Call stack
#0 os_event_set(thr->slot->event)
#1 lock_wait_release_thread_if_suspended
#2 lock_grant
#3 lock_rec_dequeue_from_page
#4 lock_trx_release_locks
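The grant logic can be illustrated with a small single-threaded model. It assumes a first-come, first-served queue in which a waiting request must keep waiting while any conflicting request of another transaction sits ahead of it; Lock, Mode, still_has_to_wait, and release_locks are hypothetical names, and the returned transaction ids stand in for the threads that the real code would wake by signalling their events.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Mode { S, X };

// Hypothetical queue entry: one lock request on one record of the page.
struct Lock {
    std::uint64_t trx;
    Mode mode;
    std::size_t heap_no;
    bool waiting;
};

inline bool compatible(Mode a, Mode b) { return a == Mode::S && b == Mode::S; }

// Step A: does the request at position idx still conflict with a request ahead of it?
bool still_has_to_wait(const std::vector<Lock>& queue, std::size_t idx) {
    assert(idx < queue.size());
    const Lock& lock = queue[idx];
    for (std::size_t i = 0; i < idx; ++i) {
        const Lock& ahead = queue[i];
        if (ahead.trx != lock.trx && ahead.heap_no == lock.heap_no &&
            !compatible(ahead.mode, lock.mode)) {
            return true;
        }
    }
    return false;
}

// Removes every lock of trx (commit or rollback) and grants waiters that are
// no longer blocked. Returns the transactions whose threads should be woken.
std::vector<std::uint64_t> release_locks(std::vector<Lock>& queue, std::uint64_t trx) {
    std::vector<Lock> remaining;
    for (const Lock& l : queue) {
        if (l.trx != trx) remaining.push_back(l);
    }
    queue.swap(remaining);

    std::vector<std::uint64_t> woken;
    for (std::size_t i = 0; i < queue.size(); ++i) {
        if (queue[i].waiting && !still_has_to_wait(queue, i)) {
            queue[i].waiting = false;        // step B: grant the lock
            woken.push_back(queue[i].trx);   // steps C-F: wake the owner's thread
        }
    }
    return woken;
}
```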

6) Slot management

A lock wait is implemented through the wait event on a slot object (described below); each slot object contains one wait event. The number of slots is tied to the number of running threads: because the blocked entity is a thread, it is enough to initialize as many slots as the maximum number of threads. The slots are stored in lock_sys->waiting_threads, and a slot is taken from this array whenever one is needed.

Slot initialization
lock_sys = static_cast<lock_sys_t*>(mem_zalloc(lock_sys_sz));
lock_stack = static_cast<lock_stack_t*>(
    mem_zalloc(sizeof(*lock_stack) * LOCK_STACK_SIZE));
/* the slot array starts right after the lock_sys_t structure */
void* ptr = &lock_sys[1];
lock_sys->waiting_threads = static_cast<srv_slot_t*>(ptr);
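The layout used by the initialization code, one allocation that holds the control structure followed directly by the slot array, can be sketched as below. LockSys, Slot, and the four functions are hypothetical stand-ins, the wait event inside each slot is left out, and the linear scan for a free slot is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, simplified slot: the real slot also carries a wait event.
struct Slot {
    bool in_use;
    unsigned long thread_id;
};

struct LockSys {
    Slot* waiting_threads;  // points just past this structure
    std::size_t n_slots;
};

static_assert(alignof(Slot) <= alignof(LockSys),
              "slots placed after LockSys must be suitably aligned");

// One zero-filled allocation: the LockSys header followed by max_threads slots.
LockSys* lock_sys_create(std::size_t max_threads) {
    std::size_t sz = sizeof(LockSys) + max_threads * sizeof(Slot);
    LockSys* sys = static_cast<LockSys*>(std::calloc(1, sz));
    if (sys == nullptr) return nullptr;
    sys->waiting_threads = reinterpret_cast<Slot*>(sys + 1);  // same idea as &lock_sys[1]
    sys->n_slots = max_threads;
    return sys;
}

// Takes the first free slot, or returns nullptr when every slot is in use.
Slot* reserve_slot(LockSys* sys, unsigned long thread_id) {
    for (std::size_t i = 0; i < sys->n_slots; ++i) {
        Slot* slot = &sys->waiting_threads[i];
        if (!slot->in_use) {
            slot->in_use = true;
            slot->thread_id = thread_id;
            return slot;
        }
    }
    return nullptr;
}

void release_slot(Slot* slot) {
    assert(slot != nullptr);
    slot->in_use = false;
}

void lock_sys_free(LockSys* sys) { std::free(sys); }
```

Because the slots live in the same allocation as the header, freeing the header frees the slot array as well.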

3. InnoDB Row Lock Overhead

InnoDB row locks are stored as bitmaps, so in theory a record needs only one bit. The basic unit of locking is a row, but locks are managed and organized by transaction and by page. A lock is created as a lock_t instance, and one lock_t instance can cover all records on an index page.

1) Row lock cost calculation

The memory overhead comes mainly from the pointers and the bitmap that store the lock information. In the bitmap, each bit corresponds to one record on the page. For a page with 200 records, a row lock object is about 100 bytes. If only one row on the page is locked, the cost is 100 bytes per row; if all 200 records share one lock object, the cost is 100 bytes / 200 = 4 bits per row. In practice, a page has only one lock object only when a single transaction locks all records on the page in the same lock mode.

Memory occupied by a lock_t object
/* Make lock bitmap bigger by a safety margin */
n_bits = page_dir_get_n_heap(page) + LOCK_PAGE_BITMAP_MARGIN;
n_bytes = 1 + n_bits / 8;
lock = static_cast<lock_t*>(
    mem_heap_alloc(trx->lock.lock_heap, sizeof(lock_t) + n_bytes));
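The arithmetic behind these figures can be reproduced with a few helper functions. The function names are hypothetical, the margin of 64 bits is an assumed value for the safety margin, and the fixed part of the lock object is passed in as a parameter because its size depends on the build.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBitmapMargin = 64;  // assumed safety margin, in bits

// Bytes needed for the bitmap, following n_bytes = 1 + n_bits / 8.
constexpr std::size_t bitmap_bytes(std::size_t n_heap) {
    return 1 + (n_heap + kBitmapMargin) / 8;
}

// Total size of one lock object: fixed part plus bitmap.
constexpr std::size_t lock_object_bytes(std::size_t fixed_bytes, std::size_t n_heap) {
    return fixed_bytes + bitmap_bytes(n_heap);
}

// Average cost in bits per locked row when locked_rows rows share one lock object.
double bits_per_row(std::size_t object_bytes, std::size_t locked_rows) {
    assert(locked_rows > 0);
    return 8.0 * static_cast<double>(object_bytes) / static_cast<double>(locked_rows);
}
```

With a lock object of 100 bytes, bits_per_row(100, 200) gives 4 bits per row and bits_per_row(100, 1) gives 800 bits (100 bytes) per row, matching the two extremes described above.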

2) Lock reuse

InnoDB reuses lock objects to keep the memory overhead of locks as small as possible. There are two cases: the same transaction locks another record on the same page in the same lock mode, or the transaction already holds a lock on the same record that is at least as strong as the requested mode. In both cases, no new lock object needs to be created.
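Both reuse cases can be sketched as follows. The sketch ignores conflicts with other transactions and wait handling, uses std::vector<bool> in place of the raw bitmap, and assumes that every lock object on a page has the same number of bits; RecLock, Mode, and lock_with_reuse are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Mode { S, X };

// Hypothetical, simplified lock object for one page.
struct RecLock {
    std::uint64_t trx;
    Mode mode;
    std::vector<bool> bits;  // one flag per heap_no on the page
};

// Returns true when the request was satisfied without creating a new lock object.
bool lock_with_reuse(std::vector<RecLock>& page_locks, std::uint64_t trx, Mode mode,
                     std::size_t heap_no, std::size_t n_bits) {
    assert(heap_no < n_bits);

    // Case 2: the transaction already holds an equal or stronger lock on this record.
    for (const RecLock& l : page_locks) {
        if (l.trx == trx && l.bits[heap_no] && (l.mode == Mode::X || l.mode == mode)) {
            return true;
        }
    }
    // Case 1: same transaction, same page, same mode: set one more bit.
    for (RecLock& l : page_locks) {
        if (l.trx == trx && l.mode == mode) {
            l.bits[heap_no] = true;
            return true;
        }
    }
    // Otherwise a new lock object is needed.
    RecLock fresh{trx, mode, std::vector<bool>(n_bits, false)};
    fresh.bits[heap_no] = true;
    page_locks.push_back(fresh);
    return false;
}
```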

4. InnoDB Lock Synchronization Mechanism (spinlock + mutex + condition variable)

InnoDB does not use a native synchronization primitive such as a spinlock, mutex, or condition variable directly. Instead, it combines them to get better performance. The main functions are mutex_enter_func and mutex_exit_func.

1) Data structure

ib_mutex_t
{
    os_event_t event;               // wait event
    volatile lock_word_t lock_word; // lock variable
    os_fast_mutex_t os_fast_mutex;  // mutex used when the system does not support atomic operations
    ulint waiters;                  // whether there is a waiting thread
}

2) Acquiring a mutex [mutex_enter_func(ib_mutex)]

A) First, test mutex->lock_word to determine whether the lock can be obtained.

B) On systems that do not support atomic operations (and therefore the spin lock), call pthread_mutex_trylock on os_fast_mutex to protect mutex->lock_word and determine whether the lock can be obtained.

C) If the lock cannot be obtained, reserve a cell in the global sync_wait_array and set the cell's wait_object to ib_mutex.

D) Set waiters of ib_mutex to 1.

E) Call os_event_wait_low(ib_mutex->event) to suspend the thread.

F) After being woken up, the thread jumps back to step A and tries again.

3) Releasing a mutex [mutex_exit_func(ib_mutex)]

A) Reset mutex->lock_word.

B) With spin locks, reset it through os_atomic_test_and_set_byte.

C) On systems that do not support spin locks, set lock_word to 0 and release os_fast_mutex.

D) Check whether waiters of ib_mutex is 1 (that is, whether a thread is suspended).

E) If so, call mutex_signal_object(ib_mutex->event).

F) This calls pthread_cond_broadcast(event->cond) to wake up all waiting threads.
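The enter and exit flows can be approximated with standard C++ primitives. This is a sketch of the general technique (spin on a lock word, then sleep on a condition variable), not the InnoDB implementation: SpinWaitMutex is a hypothetical class, std::atomic replaces the platform atomics, a std::mutex plus std::condition_variable replaces the event, the wait array is left out, and the spin count of 30 is an arbitrary assumption.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class SpinWaitMutex {
public:
    void enter() {
        for (;;) {
            // Spin phase: try the lock word a bounded number of times.
            for (int i = 0; i < kSpinRounds; ++i) {
                if (try_lock()) return;
                std::this_thread::yield();
            }
            // Sleep phase: announce a waiter, retry once, then block.
            std::unique_lock<std::mutex> guard(event_mutex_);
            waiters_.store(true);
            if (try_lock()) return;  // retry after announcing, so no wakeup is lost
            event_cv_.wait(guard);   // may wake spuriously; the outer loop retries
        }
    }

    void exit() {
        assert(lock_word_.load());  // only the holder may release
        lock_word_.store(false);    // reset the lock word
        if (waiters_.exchange(false)) {
            // The waiter holds event_mutex_ until it blocks, so this
            // notification cannot slip in before the wait starts.
            std::lock_guard<std::mutex> guard(event_mutex_);
            event_cv_.notify_all();  // wake all waiters, as with a broadcast
        }
    }

private:
    // Atomic test-and-set: returns true when this call took the lock.
    bool try_lock() { return !lock_word_.exchange(true); }

    static constexpr int kSpinRounds = 30;  // assumed spin count
    std::atomic<bool> lock_word_{false};
    std::atomic<bool> waiters_{false};
    std::mutex event_mutex_;
    std::condition_variable event_cv_;
};
```

The waiter sets the waiters flag and retries the lock word while holding the event mutex, and the releasing thread notifies under the same mutex. This ordering is what keeps a release that happens between the failed attempt and the wait from being missed in this sketch.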

5. InnoDB Wait Event Implementation

1) Event structure

os_event
{
    os_cond_t cond_var;       // condition variable
    ibool is_set;             // when true, threads do not block on the event
    os_fast_mutex_t os_mutex; // mutex protecting the condition variable
}

2) os_event_set process

A) Acquire the mutex os_mutex.

B) If is_set is true, do nothing and release os_mutex.

C) If is_set is false, set is_set to true.

D) Call pthread_cond_broadcast on the condition variable to wake up all waiting threads, then release os_mutex.

3) os_event_wait process

A) Acquire the mutex os_mutex.

B) If is_set is true, do nothing and release os_mutex.

C) If is_set is false, call pthread_cond_wait and wait.

D) After being woken up, release the mutex os_mutex.
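The set and wait processes map closely onto a mutex, a flag, and a condition variable. The Event class below is a hypothetical sketch built on the C++ standard library instead of pthreads. The reset method, the is_set accessor, and the waiter counter are additions for illustration, and any extra bookkeeping the real event may perform is not modelled.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class Event {
public:
    ~Event() { assert(n_waiting_ == 0); }  // destroying an event with waiters is a bug

    // Set: mark the event as signalled and wake every waiting thread.
    void set() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (is_set_) return;  // already signalled, nothing to do
        is_set_ = true;
        cond_.notify_all();   // counterpart of the broadcast on the condition variable
    }

    // Reset: later calls to wait() block again until the next set().
    void reset() {
        std::lock_guard<std::mutex> guard(mutex_);
        is_set_ = false;
    }

    // Wait: return at once when signalled, otherwise sleep until set() is called.
    void wait() {
        std::unique_lock<std::mutex> guard(mutex_);
        ++n_waiting_;
        while (!is_set_) {
            cond_.wait(guard);  // the loop guards against spurious wakeups
        }
        --n_waiting_;
    }

    bool is_set() {
        std::lock_guard<std::mutex> guard(mutex_);
        return is_set_;
    }

private:
    std::mutex mutex_;              // protects is_set_ and n_waiting_
    std::condition_variable cond_;
    bool is_set_ = false;
    int n_waiting_ = 0;
};
```

In terms of the opening example, user B's thread would call wait() on the event of its slot, and the commit of user A would end in a call to set() on that same event.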

Return to the question raised at the beginning of the article. Assume that the record with id = 1 in table t is on the page () and 2; the lock node can then be represented by the red box, where each node represents a lock object. In addition, transactions T2 and T3 hold two locks on the page (0,200). Why can there be two locks on the same page? Because the lock objects have different owners: different transactions must each create their own lock object, even for the same lock mode on the same record. Lock reuse applies only when the same transaction locks multiple records on the same page. If T1 also needs to lock a record on page (0,200) and the requested lock conflicts with an existing lock, a lock object is created and the thread is suspended to wait; otherwise, a lock object is created and success is returned.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
