InnoDB Source Reading Notes: Buffer Pool

Source: Internet
Author: User
Tags: mutex

When I first learned Oracle, two of the most important concepts were the SGA and the PGA, which are, in essence, memory areas that act as buffers. InnoDB's design resembles Oracle's, and it likewise sets up a buffer pool in memory. CPU speed and disk I/O speed differ by orders of magnitude, so our predecessors used RAM to bridge the gap. Database designers inherited the idea: they reserve a region of memory to cache data and so improve database efficiency.

The buffer pool can be understood through a simple model: an area of memory divided into data blocks (blocks/pages), each 16 KB by default. The model can be pictured as a grid of squares:

Each of these squares represents a page. In the code, this area has two key data structures: buf_pool_struct and buf_block_struct. buf_pool_struct describes the buffer pool itself, and buf_block_struct describes a single data block.

To manage the buffer pool, InnoDB maintains a free list of memory blocks that are not yet in use; every request for a block is served from this list. In general, however, the buffer pool is smaller than the actual data set, so sooner or later the pool is used up, that is, every page on the free list has been allocated. At that point another data structure comes into play: the LRU list.
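The free-list behaviour described here can be sketched in a few lines of C. This is only a toy model: the types and functions (toy_block_t, toy_pool_t, toy_pool_init, toy_get_free_block) are hypothetical names made up for illustration and are not the InnoDB structures. It shows just one point: blocks are handed out from the free list until it runs dry, and a NULL result is the signal that a block must instead be reclaimed from the LRU list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types for illustration; not the InnoDB structures. */
typedef struct toy_block {
    struct toy_block *next;     /* next block on the free list */
    int               page_no;  /* -1 while the block holds no page */
} toy_block_t;

typedef struct {
    toy_block_t *free_head;     /* head of the list of unused blocks */
} toy_pool_t;

/* Put all n blocks on the free list, as at system start-up. */
void toy_pool_init(toy_pool_t *pool, toy_block_t *blocks, size_t n)
{
    pool->free_head = NULL;
    for (size_t i = 0; i < n; i++) {
        blocks[i].page_no = -1;
        blocks[i].next = pool->free_head;
        pool->free_head = &blocks[i];
    }
}

/* Take one block from the free list. Returns NULL once the list is
   exhausted; a real pool would then look for a victim in the LRU list. */
toy_block_t *toy_get_free_block(toy_pool_t *pool)
{
    toy_block_t *block = pool->free_head;

    if (block != NULL) {
        pool->free_head = block->next;
        block->next = NULL;
    }
    return block;
}
```

With a pool of two blocks, the first two calls to toy_get_free_block return a block and the third returns NULL, which is the moment the LRU list has to take over.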

LRU is a classic algorithm; the name stands for Least Recently Used. The most recently used page is always at the front of the linked list, and the page at the tail is the next to be freed. InnoDB, however, does not use this off-the-shelf version. It uses an improved variant, which some people call midpoint LRU.

InnoDB's main improvement is that a page read from disk is not placed directly at the head of the list. Instead it is inserted at a midpoint, about 3/8 of the list length from the tail (the value is configurable), and is moved to the head only when it is accessed again. The reason for this improvement is discussed in the book "MySQL Kernel: InnoDB Storage Engine" (p. 250). The midpoint divides the list into two parts: the part before the midpoint is called the young list, and the part after it is called the old list. The block at the tail of the list is the one to be freed, and the buf_LRU_search_and_free_block function does the following:

    block = UT_LIST_GET_LAST(buf_pool->LRU);

    while (block != NULL) {
        ut_a(block->in_LRU_list);

        mutex_enter(&block->mutex);
        freed = buf_LRU_free_block(block);
        mutex_exit(&block->mutex);

        if (freed) {
            break;
        }

        block = UT_LIST_GET_PREV(LRU, block);
    }

The above code snippet shows the release process described above.

Everything so far assumes that all the pages on the free list have already been allocated. But when the database has just started, the free list still has plenty of pages to hand out. How does InnoDB work then?

The comment on the buf_LRU_add_block function states explicitly that it adds a block to the LRU list. Every addition of a block to the LRU therefore goes through this function, whether or not the free list still has pages to allocate. While reading the function I noticed a constant, BUF_LRU_OLD_MIN_LEN, which is set to 80 in the 5.1.73 code. The function decides the young/old flag of the block: after system initialization it marks every block as young and puts it at the head of the list until the length of the LRU list reaches BUF_LRU_OLD_MIN_LEN.

Once the LRU length is greater than or equal to BUF_LRU_OLD_MIN_LEN, InnoDB first marks every page in the LRU as old (buf_LRU_old_init) and then calls the buf_LRU_old_adjust_len function to adjust the midpoint, until the list reaches the state described above. Here's the code:

    void
    buf_LRU_old_adjust_len(void)
    /*========================*/
    {
        ulint   old_len;
        ulint   new_len;

        ut_a(buf_pool->LRU_old);
        ut_ad(mutex_own(&(buf_pool->mutex)));
        ut_ad(3 * (BUF_LRU_OLD_MIN_LEN / 8) > BUF_LRU_OLD_TOLERANCE + 5);

        for (;;) {
            old_len = buf_pool->LRU_old_len;
            new_len = 3 * (UT_LIST_GET_LEN(buf_pool->LRU) / 8);

            ut_a(buf_pool->LRU_old->in_LRU_list);

            /* Update the LRU_old pointer if necessary */

            if (old_len < new_len - BUF_LRU_OLD_TOLERANCE) {

                buf_pool->LRU_old = UT_LIST_GET_PREV(LRU, buf_pool->LRU_old);
                (buf_pool->LRU_old)->old = TRUE;
                buf_pool->LRU_old_len++;

            } else if (old_len > new_len + BUF_LRU_OLD_TOLERANCE) {

                (buf_pool->LRU_old)->old = FALSE;
                buf_pool->LRU_old = UT_LIST_GET_NEXT(LRU, buf_pool->LRU_old);
                buf_pool->LRU_old_len--;
            } else {
                ut_a(buf_pool->LRU_old); /* Check that we do not
                                         fall out of the LRU list */
                return;
            }
        }
    }

As you can see, the function uses an unconditional loop to keep moving buf_pool->LRU_old until the length of the old segment is within BUF_LRU_OLD_TOLERANCE of the target.
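To make the arithmetic concrete, here is a small stand-alone sketch of the same loop that tracks only the lengths, not the list pointers. The function name toy_old_adjust_len and the constant TOY_OLD_TOLERANCE are hypothetical, and the tolerance of 20 is an assumed value chosen for illustration (the text above does not state it).

```c
#include <assert.h>

/* Assumed tolerance for illustration; not taken from the text above. */
#define TOY_OLD_TOLERANCE 20UL

/* Given the LRU length and the current length of the old segment,
   return the old-segment length at which the adjustment loop stops. */
unsigned long toy_old_adjust_len(unsigned long lru_len, unsigned long old_len)
{
    unsigned long new_len = 3 * (lru_len / 8);  /* target: 3/8 of the list */

    /* The sketch assumes the target exceeds the tolerance, so the
       unsigned subtraction below cannot wrap around. */
    assert(new_len > TOY_OLD_TOLERANCE);

    for (;;) {
        if (old_len < new_len - TOY_OLD_TOLERANCE) {
            old_len++;          /* midpoint moves one block toward the head */
        } else if (old_len > new_len + TOY_OLD_TOLERANCE) {
            old_len--;          /* midpoint moves one block toward the tail */
        } else {
            return old_len;     /* within tolerance of the target */
        }
    }
}
```

Under these assumptions, a list of 80 blocks that have all just been marked old has a target of 3 * (80 / 8) = 30, so the loop shrinks the old segment from 80 until it reaches 30 + 20 = 50 and then stops. The loop does not aim for exactly 3/8; it stops as soon as the length is inside the tolerance band.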

Insertion into the LRU list is simple: each newly inserted page is placed immediately after buf_pool->LRU_old, and when the page is accessed again later, the buf_LRU_make_block_young function moves it to the head of the list.

    UT_LIST_INSERT_AFTER(LRU, buf_pool->LRU, buf_pool->LRU_old, block);

The comment on UT_LIST_INSERT_AFTER is clear: "Inserts a NODE2 after NODE1 in a list." Here NODE1 is buf_pool->LRU_old and NODE2 is the block. The key step in the buf_LRU_make_block_young function is:

    UT_LIST_ADD_FIRST(LRU, buf_pool->LRU, block);

The comment on UT_LIST_ADD_FIRST reads: "Adds the node as the first element in a two-way linked list."
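The two list operations can be illustrated with a minimal doubly linked list. This is a sketch only: toy_node_t, toy_list_t and the toy_* functions are hypothetical stand-ins, not the InnoDB macros, and the real buf_LRU_make_block_young may do more bookkeeping than the simplified toy_make_young shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy list for illustration; not the InnoDB list macros. */
typedef struct toy_node {
    struct toy_node *prev;
    struct toy_node *next;
    int              old;   /* 1 while the node is in the old segment */
} toy_node_t;

typedef struct {
    toy_node_t *first;
    toy_node_t *last;
} toy_list_t;

/* Add node as the first element of the list. */
void toy_add_first(toy_list_t *list, toy_node_t *node)
{
    node->prev = NULL;
    node->next = list->first;
    if (list->first != NULL) {
        list->first->prev = node;
    } else {
        list->last = node;
    }
    list->first = node;
}

/* Insert node2 directly after node1, which must already be in the list. */
void toy_insert_after(toy_list_t *list, toy_node_t *node1, toy_node_t *node2)
{
    assert(node1 != NULL);
    node2->prev = node1;
    node2->next = node1->next;
    if (node1->next != NULL) {
        node1->next->prev = node2;
    } else {
        list->last = node2;
    }
    node1->next = node2;
}

/* Unlink node from the list. */
void toy_remove(toy_list_t *list, toy_node_t *node)
{
    if (node->prev != NULL) {
        node->prev->next = node->next;
    } else {
        list->first = node->next;
    }
    if (node->next != NULL) {
        node->next->prev = node->prev;
    } else {
        list->last = node->prev;
    }
    node->prev = node->next = NULL;
}

/* Simplified "make young": unlink the node and put it at the head. */
void toy_make_young(toy_list_t *list, toy_node_t *node)
{
    toy_remove(list, node);
    node->old = 0;
    toy_add_first(list, node);
}
```

If the list is a, b with b playing the role of the midpoint pointer, inserting c after b gives a, b, c; touching c again with toy_make_young gives c, a, b.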

At this point we have a basic understanding of how a data page is read into memory. To summarize, the process is as follows:

1. When the system is initialized, every page is on the free list and available for allocation.

2. When data is requested, the block read from disk is placed in the LRU list. At this stage every block is simply marked young and inserted at the head of the list, until the LRU length reaches BUF_LRU_OLD_MIN_LEN.

3. When the LRU length reaches BUF_LRU_OLD_MIN_LEN, InnoDB does the following:

3.1 Marks all LRU blocks as old (buf_LRU_old_init).

3.2 Calls the buf_LRU_old_adjust_len function to move buf_pool->LRU_old to the appropriate position.

4. From then on, each time a new page is inserted into the LRU, the buf_LRU_add_block function is called with old set to true, which inserts the page immediately after buf_pool->LRU_old.

5. If the page from step 4 is accessed again, InnoDB calls the buf_LRU_make_block_young function to move the page to the head of the LRU list.

6. Once the free list is exhausted, a block that can be freed must be found starting from the LRU tail; this is done by buf_LRU_search_and_free_block.

Tips

One thing to note: the block at the tail of the LRU list can be released only if two preconditions hold: the page is not dirty, and the page is not in use by another thread. A dirty page must be written to disk eventually, so before a dirty page can be replaced it first has to be flushed to disk. buf_LRU_free_block, the function used to release the tail block, enforces this with a check:

    if (!buf_flush_ready_for_replace(block)) {
        return(FALSE);
    }

If the page does not meet the criteria, the function returns FALSE, and buf_LRU_search_and_free_block moves on to the block before the tail block:

    block = UT_LIST_GET_PREV(LRU, block);

It then checks whether that block can be released, and so on. The complete code follows; I added some comments of my own:

    ibool
    buf_LRU_search_and_free_block(
    /*==========================*/
                    /* out: TRUE if freed */
        ulint   n_iterations)   /* in: how many times this has been called
                    repeatedly without result: a high value means
                    that we should search farther; if value is
                    k < 10, then we only search k/10 * [number
                    of pages in the buffer pool] from the end
                    of the LRU list */
    {
        buf_block_t*    block;
        ulint           distance = 0;
        ibool           freed;

        mutex_enter(&(buf_pool->mutex));

        freed = FALSE;
        block = UT_LIST_GET_LAST(buf_pool->LRU);

        while (block != NULL) {
            ut_a(block->in_LRU_list);

            mutex_enter(&block->mutex);
            /* My note: this function first checks whether the block
            can be released */
            freed = buf_LRU_free_block(block);
            mutex_exit(&block->mutex);

            if (freed) {
                /* My note: if the check above found that the page
                cannot be released, we do not break out of the loop here */
                break;
            }

            /* My note: the tail page cannot be released, so look at the
            block in front of it and continue the loop */
            block = UT_LIST_GET_PREV(LRU, block);
            distance++;

            if (!freed && n_iterations <= 10
                && distance > 100 + (n_iterations * buf_pool->curr_size)
                / 10) {
                buf_pool->LRU_flush_ended = 0;

                mutex_exit(&(buf_pool->mutex));

                return(FALSE);
            }
        }

        if (buf_pool->LRU_flush_ended > 0) {
            buf_pool->LRU_flush_ended--;
        }

        if (!freed) {
            buf_pool->LRU_flush_ended = 0;
        }

        mutex_exit(&(buf_pool->mutex));

        return(freed);
    }
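Stripped of locking and bookkeeping, the search reduces to a backward walk from the tail that skips pages which are dirty or in use and gives up after a bounded distance. The sketch below shows only that shape. The type toy_page_t, its fields dirty and fix_count, and both functions are hypothetical names for illustration; the real replaceability check is whatever buf_flush_ready_for_replace tests, and the distance limit is passed in as a plain parameter rather than derived from n_iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy page for illustration; not the InnoDB block type. */
typedef struct toy_page {
    struct toy_page *prev;      /* neighbour closer to the head of the list */
    int              dirty;     /* modified in memory, not yet flushed */
    int              fix_count; /* > 0 while some thread is using the page */
} toy_page_t;

/* A page may be replaced only if it is clean and nobody is using it. */
int toy_ready_for_replace(const toy_page_t *page)
{
    return !page->dirty && page->fix_count == 0;
}

/* Walk from the tail toward the head and return the first replaceable
   page. Gives up (returns NULL) once more than max_distance pages have
   been rejected, or when the head of the list has been passed. */
toy_page_t *toy_search_victim(toy_page_t *tail, unsigned long max_distance)
{
    unsigned long distance = 0;

    for (toy_page_t *page = tail; page != NULL; page = page->prev) {
        if (toy_ready_for_replace(page)) {
            return page;
        }
        distance++;
        if (distance > max_distance) {
            return NULL;
        }
    }
    return NULL;
}
```

For a three-page list whose tail is dirty and whose middle page is pinned, a generous limit finds the head page as the victim, while a limit of 1 gives up before reaching it. Bounding the walk keeps a single caller from scanning the whole list while holding the pool mutex.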

I have spent the last two days reading the InnoDB buffer pool source code, and this is all I have gathered so far. The C used here is beyond my level (I can really only read simple C code and get lost once pointers are involved), but with the comments and reference material to hand, it was still more enjoyable than just reading the documentation.
