InnoDB source code reading notes -- buffer pool

Last Update:2016-06-10 Source: Internet

Author: User


When I first learned Oracle, I came across two very important concepts, the SGA and the PGA. They are essentially buffer areas in memory. InnoDB's design is similar to Oracle's: it also sets aside a buffer pool in memory. As we all know, there is a huge gap between CPU speed and disk I/O speed. Our clever predecessors used RAM to bridge this gap, and database designers inherited the idea: they reserve an area of memory to hold cached data and improve database efficiency.

The buffer pool can be understood with a simple model: an area made up of data blocks. The default size of a data block (block/page) is 16 KB. The model is easy to picture as a grid in which each cell represents one page.

In the code, this area has two key data structures: buf_pool_struct and buf_block_struct. buf_pool_struct is the data structure of the buffer pool, and buf_block_struct is the data structure of a data block.
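To make the grid model concrete, here is a minimal sketch of how such a pool could be laid out. The names mini_block_t, mini_pool_t, mini_pages_for and mini_frame_at are hypothetical and only stand in for the real structures, which contain many more fields.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE (16 * 1024)       /* default page size: 16 KB */

/* Hypothetical, heavily simplified stand-in for buf_block_struct. */
typedef struct mini_block {
    unsigned char *frame;           /* this block's 16 KB page frame */
    int            in_use;          /* 0 while the block is still unused */
} mini_block_t;

/* Hypothetical, heavily simplified stand-in for buf_pool_struct. */
typedef struct mini_pool {
    unsigned char *frame_mem;       /* one contiguous region with all frames */
    mini_block_t  *blocks;          /* one control block per frame */
    size_t         n_pages;         /* number of pages (grid cells) */
} mini_pool_t;

/* How many whole pages fit into a pool of the given size in bytes. */
size_t mini_pages_for(size_t pool_bytes)
{
    return pool_bytes / PAGE_SIZE;
}

/* Address of the frame that belongs to grid cell i. */
unsigned char *mini_frame_at(const mini_pool_t *pool, size_t i)
{
    assert(i < pool->n_pages);
    return pool->frame_mem + i * PAGE_SIZE;
}
```

With these assumptions, an 8 MB pool holds 8 * 1024 * 1024 / 16384 = 512 pages.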

InnoDB manages the buffer pool with a free list, which records the unused memory blocks. Every request for a data block is served from the free list. In general, however, the buffer pool is smaller than the actual data volume, so sooner or later the pool is used up, that is, every page on the free list has been allocated. At that point another data structure comes into play: the LRU list.
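The free list idea can be sketched in a few lines of C. This is only an illustration under my own assumptions: free_page_t, free_pool_t and the two functions are hypothetical names, and the real code uses the UT_LIST macros and mutexes instead.

```c
#include <assert.h>
#include <stddef.h>

typedef struct free_page {
    struct free_page *next;     /* next page on the free list */
    int               id;       /* hypothetical page slot number */
} free_page_t;

typedef struct free_pool {
    free_page_t *free_head;     /* first unused page, or NULL */
    size_t       n_free;        /* number of unused pages */
} free_pool_t;

/* At start-up every page of the array goes onto the free list. */
void free_pool_init(free_pool_t *pool, free_page_t *pages, size_t n)
{
    pool->free_head = NULL;
    pool->n_free = 0;
    for (size_t i = n; i-- > 0; ) {     /* push in reverse so slot 0 is first */
        pages[i].id = (int) i;
        pages[i].next = pool->free_head;
        pool->free_head = &pages[i];
        pool->n_free++;
    }
}

/* Take one page from the free list. NULL means the list is used up and
   the caller has to release a page from the tail of the LRU list. */
free_page_t *free_pool_get(free_pool_t *pool)
{
    free_page_t *page = pool->free_head;

    if (page == NULL) {
        return NULL;
    }
    pool->free_head = page->next;
    page->next = NULL;
    pool->n_free--;
    return page;
}
```

A NULL result from free_pool_get corresponds to the moment described above, when the LRU list has to supply the next page.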

LRU (Least Recently Used) is a classic algorithm. The most recently used pages are always at the front of the linked list, and the page at the tail is the next one to be released. InnoDB does not use the plain version, however. It uses its own variant of LRU, which some people call midpoint LRU.

InnoDB's main improvement is that a page read from disk is not placed directly at the head of the list but at a point 3/8 of the way from the tail (this value can be configured); the page is moved to the head of the list only when it is accessed again. The reason for this improvement is described in MySQL kernel-InnoDB Storage engine (p. 250). The list is thus divided into two parts: the part before the midpoint is called the young list, and the part after it is called the old list. Blocks at the tail of the list are the ones that get released. The buf_LRU_search_and_free_block function performs this operation:

    block = UT_LIST_GET_LAST(buf_pool->LRU);

    while (block != NULL) {
        ut_a(block->in_LRU_list);

        mutex_enter(&block->mutex);
        freed = buf_LRU_free_block(block);
        mutex_exit(&block->mutex);

        if (freed) {
            break;
        }
        /* ... rest of the loop omitted in this excerpt ... */
    }

The snippet above shows the release process just described.

All of the above assumes that every page on the free list has already been allocated. But when the database has just started, the free list still has plenty of pages to hand out. How does InnoDB work then?

The comment on buf_LRU_add_block clearly states that the function adds a block to the LRU list, so it is used whether or not the free list still has pages to allocate. While reading this function I noticed a constant, BUF_LRU_OLD_MIN_LEN, which is set to 80 in the 5.1.73 source. The function decides whether a block is marked young or old. Right after startup, it marks every block young and places it at the head of the list, until the LRU list length reaches BUF_LRU_OLD_MIN_LEN.

Once the LRU length reaches BUF_LRU_OLD_MIN_LEN, InnoDB marks all pages in the LRU list as old (buf_LRU_old_init) and calls buf_LRU_old_adjust_len to move the midpoint until the list is split into a young part and an old part as described above. The code is as follows:

    void
    buf_LRU_old_adjust_len(void)
    /*========================*/
    {
        ulint    old_len;
        ulint    new_len;

        ut_a(buf_pool->LRU_old);
        ut_ad(mutex_own(&(buf_pool->mutex)));
        ut_ad(3 * (BUF_LRU_OLD_MIN_LEN / 8) > BUF_LRU_OLD_TOLERANCE + 5);

        for (;;) {
            old_len = buf_pool->LRU_old_len;
            new_len = 3 * (UT_LIST_GET_LEN(buf_pool->LRU) / 8);

            ut_a(buf_pool->LRU_old->in_LRU_list);

            /* Update the LRU_old pointer if necessary */

            if (old_len < new_len - BUF_LRU_OLD_TOLERANCE) {

                buf_pool->LRU_old = UT_LIST_GET_PREV(
                    LRU, buf_pool->LRU_old);
                (buf_pool->LRU_old)->old = TRUE;
                buf_pool->LRU_old_len++;

            } else if (old_len > new_len + BUF_LRU_OLD_TOLERANCE) {

                (buf_pool->LRU_old)->old = FALSE;
                buf_pool->LRU_old = UT_LIST_GET_NEXT(
                    LRU, buf_pool->LRU_old);
                buf_pool->LRU_old_len--;
            } else {
                ut_a(buf_pool->LRU_old); /* Check that we did not
                                         fall out of the LRU list */
                return;
            }
        }
    }

As you can see, the function uses an unconditional loop to keep moving buf_pool->LRU_old until the length of the old list is within BUF_LRU_OLD_TOLERANCE of 3/8 of the LRU list length.
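The arithmetic of that loop can be tried out without any list at all. The sketch below keeps only the two lengths; lru_adjust_old_len is a hypothetical name, and the tolerance of 20 is my assumption about the value of BUF_LRU_OLD_TOLERANCE, so check it against your source tree.

```c
#include <assert.h>

#define OLD_TOLERANCE 20UL      /* assumed value of BUF_LRU_OLD_TOLERANCE */

/* Returns the old-list length once the adjustment loop settles.
   lru_len: total LRU length; old_len: old-list length before adjusting. */
unsigned long lru_adjust_old_len(unsigned long lru_len, unsigned long old_len)
{
    unsigned long new_len = 3 * (lru_len / 8);  /* target: 3/8 of the list */

    /* Plays the role of the ut_ad() guard in the original: the unsigned
       subtraction below must not wrap around. */
    assert(new_len >= OLD_TOLERANCE);

    for (;;) {
        if (old_len < new_len - OLD_TOLERANCE) {
            old_len++;          /* LRU_old moves one block towards the head */
        } else if (old_len > new_len + OLD_TOLERANCE) {
            old_len--;          /* LRU_old moves one block towards the tail */
        } else {
            return old_len;     /* within tolerance: nothing more to do */
        }
    }
}
```

Under these assumptions, a list of 80 blocks that are all marked old has a target of 3 * (80 / 8) = 30, and the loop stops as soon as the old length drops to 30 + 20 = 50. The loop therefore only brings the midpoint into a band around 3/8, not to the exact value.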

Inserting into the LRU list is actually very simple: each newly inserted page is placed right after buf_pool->LRU_old, and when the page is accessed again later, buf_LRU_make_block_young is called to move it to the head of the list.

    UT_LIST_INSERT_AFTER(LRU, buf_pool->LRU, buf_pool->LRU_old,
                         block);

The comment on UT_LIST_INSERT_AFTER reads: "Inserts a NODE2 after NODE1 in a list." Here node1 is buf_pool->LRU_old and node2 is block. The key step in the buf_LRU_make_block_young function is as follows:

    UT_LIST_ADD_FIRST(LRU, buf_pool->LRU, block);

The comment on UT_LIST_ADD_FIRST reads: "Adds the node as the first element in a two-way linked list."
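The two list operations can be illustrated on a tiny doubly linked list. This is a sketch under my own assumptions, not the InnoDB code: lru_block_t, lru_list_t and the functions are hypothetical names, and the length bookkeeping (LRU_old_len) is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct lru_block {
    struct lru_block *prev, *next;
    int               old;      /* 1 while the block is in the old list */
} lru_block_t;

typedef struct lru_list {
    lru_block_t *first, *last;
    lru_block_t *lru_old;       /* first block of the old list (midpoint) */
} lru_list_t;

/* Unlink a block from the list. */
static void lru_unlink(lru_list_t *lru, lru_block_t *b)
{
    if (b->prev) b->prev->next = b->next; else lru->first = b->next;
    if (b->next) b->next->prev = b->prev; else lru->last = b->prev;
    b->prev = b->next = NULL;
}

/* Like UT_LIST_ADD_FIRST: link the block in at the head, marked young. */
void lru_add_first(lru_list_t *lru, lru_block_t *b)
{
    b->prev = NULL;
    b->next = lru->first;
    if (lru->first) lru->first->prev = b; else lru->last = b;
    lru->first = b;
    b->old = 0;
}

/* Like UT_LIST_INSERT_AFTER with node1 = lru_old: a page that was just
   read lands right behind the midpoint and is marked old. */
void lru_insert_old(lru_list_t *lru, lru_block_t *b)
{
    lru_block_t *after = lru->lru_old;

    assert(after != NULL);
    b->prev = after;
    b->next = after->next;
    if (after->next) after->next->prev = b; else lru->last = b;
    after->next = b;
    b->old = 1;
}

/* A later access moves the block to the head of the list. */
void lru_make_young(lru_list_t *lru, lru_block_t *b)
{
    if (lru->first == b) {
        return;                         /* already at the head */
    }
    if (lru->lru_old == b) {            /* keep the midpoint pointer valid */
        lru->lru_old = b->prev;
        lru->lru_old->old = 1;
    }
    lru_unlink(lru, b);
    lru_add_first(lru, b);
}
```

For a list a, b, c with the midpoint at b, lru_insert_old puts a new block d between b and c, and a later lru_make_young(d) gives d, a, b, c.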

So far, I have learned how a data page is read into memory. To sum up, the process from startup is as follows:

1. During system initialization, all pages are on the free list and can be allocated.

2. When there is a data request, the block read from disk is put into the LRU list. Until the LRU length reaches BUF_LRU_OLD_MIN_LEN, every block is marked young and inserted at the head of the list.

3. When the LRU length reaches BUF_LRU_OLD_MIN_LEN, InnoDB performs the following operations:

3.1 Mark all LRU blocks as old (buf_LRU_old_init).

3.2 Call the buf_LRU_old_adjust_len function to move buf_pool->LRU_old to the appropriate position.

4. When a new page is to be inserted into the LRU list, call the buf_LRU_add_block function, set old to TRUE, and insert the page right after buf_pool->LRU_old.

5. If the page from step 4 is accessed again, InnoDB calls the buf_LRU_make_block_young function to move the page to the head of the LRU list.

6. Once the free list is used up, an available block has to be found from the tail of the LRU list. This is done by buf_LRU_search_and_free_block.

Tips:

Note that the block at the tail of the LRU list can be released only if two conditions are met: the page is not dirty, and the page is not being used by another thread. Dirty pages must eventually be written to disk, so a dirty page has to be flushed to disk before it can be replaced. The buf_LRU_free_block function, which releases the tail block, contains this check:

    if (!buf_flush_ready_for_replace(block)) {

        return(FALSE);
    }

If the page does not meet the conditions, FALSE is returned. In that case the buf_LRU_search_and_free_block function goes on to the block before the tail block:

    block = UT_LIST_GET_PREV(LRU, block);

It then checks whether that block can be released. The complete code is as follows, with some comments I added myself (the ones starting with //):

    ibool
    buf_LRU_search_and_free_block(
    /*==========================*/
                    /* out: TRUE if freed */
        ulint    n_iterations)    /* in: how many times this has been called
                    repeatedly without result: a high value means
                    that we should search farther; if value is
                    k < 10, then we only search k/10 * [number
                    of pages in the buffer pool] from the end
                    of the LRU list */
    {
        buf_block_t*    block;
        ulint           distance = 0;
        ibool           freed;

        mutex_enter(&(buf_pool->mutex));

        freed = FALSE;
        block = UT_LIST_GET_LAST(buf_pool->LRU);

        while (block != NULL) {
            ut_a(block->in_LRU_list);

            mutex_enter(&block->mutex);
            freed = buf_LRU_free_block(block); // first checks whether the block can be released
            mutex_exit(&block->mutex);

            if (freed) { // if the page could not be released, we do not break out of the loop here
                break;
            }

            block = UT_LIST_GET_PREV(LRU, block); // the tail page cannot be released: take the block before it and keep looping
            distance++;

            if (!freed && n_iterations <= 10
                && distance > 100 + (n_iterations * buf_pool->curr_size)
                / 10) {
                buf_pool->LRU_flush_ended = 0;

                mutex_exit(&(buf_pool->mutex));

                return(FALSE);
            }
        }

        if (buf_pool->LRU_flush_ended > 0) {
            buf_pool->LRU_flush_ended--;
        }

        if (!freed) {
            buf_pool->LRU_flush_ended = 0;
        }

        mutex_exit(&(buf_pool->mutex));

        return(freed);
    }
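The tail scan and its search limit can be isolated in a small sketch. Everything here is hypothetical naming of mine (tail_page_t, tail_ready_for_replace, tail_scan_limit, tail_find_victim), and the replaceability test is reduced to the two conditions from the tip above; the real buf_flush_ready_for_replace may check more state.

```c
#include <assert.h>
#include <stddef.h>

typedef struct tail_page {
    struct tail_page *prev;     /* neighbour towards the head of the LRU list */
    int               dirty;    /* modified in memory, not yet on disk */
    int               fix_count;/* > 0 while some thread is using the page */
} tail_page_t;

/* Assumed stand-in for buf_flush_ready_for_replace(). */
int tail_ready_for_replace(const tail_page_t *p)
{
    return !p->dirty && p->fix_count == 0;
}

/* Same bound as in the function above; it applies while n_iterations <= 10. */
unsigned long tail_scan_limit(unsigned long n_iterations,
                              unsigned long pool_size)
{
    return 100 + (n_iterations * pool_size) / 10;
}

/* Walk from the tail towards the head and return the first page that can
   be released, or NULL if none is found within the search limit. */
tail_page_t *tail_find_victim(tail_page_t *tail,
                              unsigned long n_iterations,
                              unsigned long pool_size)
{
    unsigned long distance = 0;

    for (tail_page_t *p = tail; p != NULL; p = p->prev) {
        if (tail_ready_for_replace(p)) {
            return p;
        }
        distance++;
        if (n_iterations <= 10
            && distance > tail_scan_limit(n_iterations, pool_size)) {
            return NULL;        /* give up; the caller may retry later */
        }
    }
    return NULL;
}
```

With a pool of 1000 pages, the first call (n_iterations = 0) looks at roughly the last 100 pages only, while a call with n_iterations = 5 may look at up to 600.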

I have been reading the source code of the InnoDB buffer pool over the past two days, and this is all I have worked out for the time being. The C used here is beyond my level (I can only follow simple C code and can barely understand pointers), but with the comments and reference material it still feels much better than simply reading documentation.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
