MySQL kernel source code in depth: an overview of the buffer pool (bufpool)
MySQL memory management is large and sophisticated. It is described in the comment at the beginning of the mem0pool.c file. Roughly, memory is used by nine kinds of objects, which are managed in four parts. The nine are:
1. Buffer pool,
2. Parsed and optimized SQL statements,
3. Data dictionary cache,
4. Log buffer,
5. Locks for each transaction,
6. Hash table for the adaptive index,
7. State and buffers for each SQL query currently being executed,
8. Session for each user, and
9. Stack for each OS thread.
These nine kinds of large blocks are managed through 4 parts.
A solution to the memory management:
1. the buffer pool size is set separately;
2. log buffer size is set separately;
3. the common pool size for all the other entries, except 8, is set separately.
That is, the four parts are the buffer pool, the redo log buffer, the common pool, and item 8 (user session information can be regarded as a part of its own).
The redo log buffer is managed separately by the redo part. The buffer pool (bufpool) is a complex part with a lot of content. The common pool manages everything else, that is, everything apart from item 8 and the two areas sized in points 1 and 2. Together these make up the complete picture of the MySQL memory subsystem.
This series starts with the buffer pool (buf pool) and then goes on to explain the common pool and the insert buffer.
Before analyzing the buffer pool architecture in full detail, it helps to survey all the components and key points of the bufpool subsystem.
(1) Five modules of the Buf pool
First, the bufpool subsystem can be divided into five basic modules.
Buffer pool routine management (buffer pool routines) manages the (multiple) buf pool instances and drives the other four modules. It provides the function interfaces used by external subsystems such as storage, transactions, redo logs, and the main system threads, and it is the core of the five modules.
LRU replacement algorithm. Both compressed and non-compressed pages must be managed in their corresponding LRU linked lists. From a design perspective, the module must add and delete LRU linked list nodes and, on top of that, implement two higher-level functions: allocating blocks and releasing blocks. From these, 15 main functions that external modules can call are derived. There are also 14 important auxiliary functions that are not called externally but are tied to the main functions (they carry the implementation and nesting details of the main functions). Together these two groups make up more than 60% of all 43 LRU functions.
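The two primitive operations of the LRU module, adding a list node and deleting one, can be sketched with a plain doubly linked list. This is only an illustration under stated assumptions, not InnoDB's code: the names lru_list_t, lru_node_t, lru_add_first, lru_remove, lru_make_young, and lru_victim are hypothetical, and the old/young split controlled by LRU_old_ratio is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node and list types; InnoDB uses its UT_LIST macros instead. */
typedef struct lru_node {
    struct lru_node *prev;    /* towards the head (most recently used) */
    struct lru_node *next;    /* towards the tail (least recently used) */
    unsigned long    page_no; /* illustrative payload */
} lru_node_t;

typedef struct {
    lru_node_t *head;
    lru_node_t *tail;
    size_t      len;
} lru_list_t;

/* Insert a node at the head of the list. */
static void lru_add_first(lru_list_t *l, lru_node_t *n)
{
    n->prev = NULL;
    n->next = l->head;
    if (l->head != NULL) {
        l->head->prev = n;
    } else {
        l->tail = n;
    }
    l->head = n;
    l->len++;
}

/* Unlink a node that is currently in the list. */
static void lru_remove(lru_list_t *l, lru_node_t *n)
{
    assert(l->len > 0);
    if (n->prev != NULL) n->prev->next = n->next; else l->head = n->next;
    if (n->next != NULL) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
    l->len--;
}

/* Move an accessed node back to the head of the list. */
static void lru_make_young(lru_list_t *l, lru_node_t *n)
{
    lru_remove(l, n);
    lru_add_first(l, n);
}

/* Return the eviction candidate (the tail), or NULL if the list is empty. */
static lru_node_t *lru_victim(const lru_list_t *l)
{
    return l->tail;
}
```

Block allocation and release would sit on top of these primitives: releasing a block means taking lru_victim(), unlinking it with lru_remove(), and handing it back to the free list.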
The refresh mechanism (flush algorithm) is the bufpool module that interacts most closely with the underlying IO. It performs the flush action directly and provides the flush interfaces called by the other modules in the system.
Flushing can be done in two ways: through the LRU linked list (BUF_FLUSH_LRU) or through the flush linked list (BUF_FLUSH_LIST). The flush module also manages the flush linked list, inserting and deleting list nodes. On this basis it implements three main functions: single-block flush, neighboring-page flush, and flush through the doublewrite component. It also implements the external interface through which the transaction part (and the mini-transaction part) inserts dirty blocks into the flush linked list during commit.
Buddy system (binary buddy allocator for compressed pages). In the bufpool, the buddy system is not a unit for unified memory allocation and management. Its role is limited to the cases where a buf control block (buf_page_struct) needs page memory allocated, but it still plays a large part. The ultimate goal of the bufpool is to implement external interfaces for the file storage system, transaction management, the master thread scheduling unit, and so on, and nearly all of these read or write the data records (buf_block_struct->buf_page_struct.zip.data) on the pages managed through the buf control blocks. Beyond managing the memory it allocates, the underlying layer must also decompress compressed pages into non-compressed pages as the running system requires. This makes the buddy system an extremely important component.
The read cache (buffer read) implements simple, direct functions: single-page asynchronous reads, random read-ahead, and linear read-ahead, used according to the situation. For the control block (buf_block_struct or buf_page_struct) of each page read, these functions insert nodes into, delete nodes from, and move nodes to the end of the LRU linked lists (LRU and unzip_LRU). Wherever LRU operations occur, flush actions on the flush linked list may occur as well.
(2) Four important structures
Second, four important structures are involved when the five modules of the subsystem run.
The buffer pool control instance (buf_pool_struct) is the core data structure for bufpool instance management. It includes the six linked list headers (UT_LIST_BASE_NODE_T) described in detail below, the asynchronous IO flush condition variables (os_event_t), the LRU and flush linked list mutexes, and the hash tables for compressed and non-compressed pages. These are only the most important parts.
struct buf_pool_struct {
    ......
    ulint LRU_old_ratio;
    mutex_t LRU_list_mutex;
    rw_lock_t page_hash_latch;
    mutex_t free_list_mutex;
    ......
    hash_table_t* page_hash;
    ......
    UT_LIST_BASE_NODE_T(buf_page_t) flush_list;
    ......
    UT_LIST_BASE_NODE_T(buf_page_t) free;
    ......
    UT_LIST_BASE_NODE_T(buf_page_t) LRU;
    ......
}; // The full content of the struct is not described here. Later sections will reference and explain the fields one by one as each module needs them.
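To make the list headers above easier to picture, here is a minimal sketch of how a base node in the pool and a list node embedded in each page descriptor can work together. The SK_LIST_* macros and the sk_page_t and sk_pool_t types are hypothetical simplifications written for this article; they are only modeled loosely on the UT_LIST family and are not InnoDB's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified macros, modeled loosely on the UT_LIST family. */
#define SK_LIST_BASE_NODE_T(TYPE) struct { size_t count; TYPE *start; TYPE *end; }
#define SK_LIST_NODE_T(TYPE)      struct { TYPE *prev; TYPE *next; }

#define SK_LIST_INIT(BASE) \
    do { (BASE).count = 0; (BASE).start = NULL; (BASE).end = NULL; } while (0)

/* Append element N to list BASE through its embedded node field NAME. */
#define SK_LIST_ADD_LAST(NAME, BASE, N)                      \
    do {                                                     \
        assert((N) != NULL);                                 \
        (N)->NAME.next = NULL;                               \
        (N)->NAME.prev = (BASE).end;                         \
        if ((BASE).end != NULL) (BASE).end->NAME.next = (N); \
        else (BASE).start = (N);                             \
        (BASE).end = (N);                                    \
        (BASE).count++;                                      \
    } while (0)

typedef struct sk_page sk_page_t;
struct sk_page {
    unsigned space, offset;
    SK_LIST_NODE_T(sk_page_t) free;        /* node in the free list */
    SK_LIST_NODE_T(sk_page_t) flush_list;  /* node in the flush list */
};

typedef struct {
    SK_LIST_BASE_NODE_T(sk_page_t) free;        /* header of the free list */
    SK_LIST_BASE_NODE_T(sk_page_t) flush_list;  /* header of the flush list */
} sk_pool_t;
```

Because the node is embedded in the element itself, a descriptor can sit in several lists at once without any extra allocation, which is the property the subset relationships between the lists rely on.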
The underlying memory allocation unit (buf_chunk_struct) is the part most easily overlooked in the countless MySQL kernel analysis articles covering version 5.5 and later, yet in a sense it is extremely important, because it is the part of the bufpool closest to the underlying OS memory allocation. All the buf control blocks (buf_block_struct) are attached to the six linked lists and can be managed directly through buf_pool_struct, but the initial memory allocation for these structs happens when buf_chunk_struct is initialized. buf_chunk_struct is the basic, lowest-level memory allocation unit of buf_pool_struct. Although this part of the code may look "insignificant", it cannot be ignored.
struct buf_chunk_struct {
    ulint mem_size;       /*!< allocated size of the chunk */
    ulint size;           /*!< size of frames[] and blocks[] */
    void* mem;            /*!< pointer to the memory area which
                          was allocated for the frames */
    buf_block_t* blocks;  /*!< array of buffer control blocks */
}; // The chunk struct is small, and all of its content is listed here. Later sections describe the fields as the modules reference them.
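The role of the chunk can be illustrated with a small sketch: one large allocation is carved into page-aligned frames, each frame gets a control block, and every control block starts out on the free list. The names sk_chunk_t, sk_block_t, sk_chunk_init, and sk_chunk_free are hypothetical, the 16 KB page size is an assumption, and the control blocks are allocated separately from the frames purely to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_PAGE_SIZE 16384u  /* assumed page size for this sketch */

typedef struct sk_block {
    unsigned char   *frame;      /* page-aligned frame owned by this block */
    struct sk_block *next_free;  /* link in a simple free list */
} sk_block_t;

typedef struct {
    size_t      mem_size;  /* allocated size of the chunk in bytes */
    size_t      size;      /* number of frames and control blocks */
    void       *mem;       /* raw memory holding the frames */
    sk_block_t *blocks;    /* array of control blocks */
} sk_chunk_t;

/* Allocate memory for n_pages frames plus their control blocks and push
   every block onto the free list. Returns 0 on success, -1 on failure. */
static int sk_chunk_init(sk_chunk_t *c, size_t n_pages, sk_block_t **free_head)
{
    assert(n_pages > 0);
    /* One extra page leaves room to round the first frame up to alignment. */
    c->mem_size = (n_pages + 1) * SK_PAGE_SIZE;
    c->mem = malloc(c->mem_size);
    c->blocks = calloc(n_pages, sizeof *c->blocks);
    if (c->mem == NULL || c->blocks == NULL) {
        free(c->mem);
        free(c->blocks);
        return -1;
    }
    c->size = n_pages;

    /* Round the start of the memory area up to a page boundary. */
    uintptr_t first = ((uintptr_t) c->mem + SK_PAGE_SIZE - 1)
        & ~(uintptr_t) (SK_PAGE_SIZE - 1);
    for (size_t i = 0; i < n_pages; i++) {
        c->blocks[i].frame = (unsigned char *) first + i * SK_PAGE_SIZE;
        c->blocks[i].next_free = *free_head;
        *free_head = &c->blocks[i];
    }
    return 0;
}

/* Release both allocations made by sk_chunk_init(). */
static void sk_chunk_free(sk_chunk_t *c)
{
    free(c->mem);
    free(c->blocks);
    c->mem = NULL;
    c->blocks = NULL;
}
```

After initialization the pool never needs to call the OS allocator for page frames again; it only moves control blocks between lists.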
The non-compressed page control block (buf_block_struct) is described in the book mysql kernel innodb Storage engine volume 1 by lajiang of Netease. As versions changed and MySQL's features evolved, the page control block evolved too: the non-compressed page control block (buf_block_struct) is now managed separately from the compressed page control block (buf_page_struct). The former embeds the latter and also holds the frame address of the physical page, the unzip_LRU linked list node (UT_LIST_NODE_T), the read/write lock, the mutex, and other important objects. It is the most critical control unit for implementing the core interfaces of the bufpool.
struct buf_block_struct {
    buf_page_t page;  /*!< page information; this must
                      be the first field, so that
                      buf_pool->page_hash can point
                      to buf_page_t or buf_block_t */
    byte* frame;      /*!< pointer to buffer frame which
                      is of size UNIV_PAGE_SIZE, and
                      aligned to an address divisible by
                      UNIV_PAGE_SIZE */
    ......
}; // The other parts of this struct are not listed yet, but remember the two above. The page field is how the non-compressed page struct embeds the compressed page struct, and it is what makes the important cast (buf_block_t *) bpage possible. The frame field is the core of what the bufpool actually serves: it is the page frame address of a non-compressed page (data page, undo page, special page ......). Once a page has been read in by the read module and placed under linked list management, every modification to it is a memory modification to the page frame. Writing it back to disk is a matter of asynchronous (or synchronous) I/O (the I/O mechanisms will be fully described with the file storage subsystem).
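The cast from a page descriptor to its enclosing block works only because the descriptor is the first member of the block. The sketch below shows the idea with hypothetical types (sk_page_t, sk_block_t) and a hypothetical helper (sk_page_get_block); the state values are assumptions standing in for the real page states.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed states: a compressed-only descriptor or a full uncompressed block. */
typedef enum { SK_ZIP_PAGE, SK_FILE_PAGE } sk_state_t;

typedef struct {
    unsigned   space;
    unsigned   offset;
    sk_state_t state;   /* tells which kind of descriptor this is */
} sk_page_t;

typedef struct {
    sk_page_t      page;   /* must stay the first member */
    unsigned char *frame;  /* uncompressed page frame */
} sk_block_t;

/* Return the enclosing block if bpage describes an uncompressed page,
   or NULL if it is a compressed-only descriptor. */
static sk_block_t *sk_page_get_block(sk_page_t *bpage)
{
    assert(bpage != NULL);
    if (bpage->state != SK_FILE_PAGE) {
        return NULL;
    }
    /* Valid because a pointer to a struct also points to its first member. */
    return (sk_block_t *) bpage;
}
```

This is why a single hash table can store pointers to either kind of control block: the state field, reachable through the common prefix, tells the caller whether the cast is allowed.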
Compressed page control block (buf_page_struct). In theory, the non-compressed pages are only a subset of the compressed pages (I still need to verify this against the actual behavior). The compressed page control block (buf_page_struct) does not contain a mutex, so to stay consistent with buf_block_struct it must implement a lock counter. It also contains the tablespace id, offset, page state, and flush type of the corresponding physical page (BUF_FLUSH_LRU and BUF_FLUSH_LIST, described above), the compressed page reference, the hash table link, and the list nodes (UT_LIST_NODE_T) for five of the linked lists (the unzip_LRU node lives in the non-compressed page struct, buf_block_struct, so the remaining five are here). It further records whether the page is in the old end of the LRU (the old part of the LRU linked list structure), the page's first access time, and other details. Only the most important members are listed here, not all of them.
struct buf_page_struct {
    unsigned space:32;          /*!< tablespace id; also protected
                                by buf_pool->mutex. */
    unsigned offset:32;         /*!< page number; also protected
                                by buf_pool->mutex. */
    ......
    unsigned flush_type:2;      /*!< if this block is currently being
                                flushed to disk, this tells the
                                flush_type.
                                @see enum buf_flush */
    unsigned io_fix:2;          /*!< type of pending I/O operation;
                                also protected by buf_pool->mutex
                                @see enum buf_io_fix */
    unsigned buf_fix_count:19;  /*!< count of how manyfold this block
                                is currently bufferfixed */
    ......
    UT_LIST_NODE_T(buf_page_t) free;
    UT_LIST_NODE_T(buf_page_t) flush_list;
    UT_LIST_NODE_T(buf_page_t) zip_list;
    ......
};
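Since every control block carries the tablespace id and the page offset, the pair can serve as the lookup key of the page hash table. The following is a rough sketch of such a lookup with chained hash cells. Everything in it is an assumption made for illustration: the types sk_page_t and sk_page_hash_t, the functions sk_fold, sk_hash_insert, and sk_hash_get, the table size, and the fold formula are not taken from the InnoDB source.

```c
#include <assert.h>
#include <stddef.h>

#define SK_HASH_CELLS 64u  /* assumed table size for this sketch */

typedef struct sk_page {
    unsigned        space;   /* tablespace id */
    unsigned        offset;  /* page number within the tablespace */
    struct sk_page *hash;    /* next page in the same hash cell */
} sk_page_t;

typedef struct {
    sk_page_t *cells[SK_HASH_CELLS];
} sk_page_hash_t;

/* Hypothetical fold function mixing both halves of the page identity. */
static unsigned sk_fold(unsigned space, unsigned offset)
{
    return (space * 2654435761u + offset) % SK_HASH_CELLS;
}

/* Link a descriptor into the chain of its hash cell. */
static void sk_hash_insert(sk_page_hash_t *h, sk_page_t *p)
{
    assert(p != NULL);
    unsigned cell = sk_fold(p->space, p->offset);
    p->hash = h->cells[cell];
    h->cells[cell] = p;
}

/* Return the descriptor for (space, offset), or NULL if it is not buffered. */
static sk_page_t *sk_hash_get(const sk_page_hash_t *h, unsigned space, unsigned offset)
{
    sk_page_t *p = h->cells[sk_fold(space, offset)];
    while (p != NULL && (p->space != space || p->offset != offset)) {
        p = p->hash;
    }
    return p;
}
```

A miss in this lookup is what would send a request to the read module, which then takes a block from the free list and reads the page from disk.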
(3) Six important linked lists
Third, to implement their various complex functions, the subsystem modules and the basic control structs cannot do without six important linked lists.
The idle (free) linked list (buf_pool->free) is the first of the three most important linked lists of the bufpool. Earlier sections mentioned the LRU and flush lists several times but not the free list. In fact, it is the only bufpool linked list that is initialized by an explicit call to the list initialization function after the bufpool is set up during the system initialization phase. The first block placed on the LRU linked list must be obtained from the free list. When the flush module flushes a dirty page, the corresponding LRU list node is either evicted or moved to the end of the LRU list to await eviction. Once evicted from the LRU list, the node returns to the free list.
LRU linked list (buf_pool->LRU): the second of the three most important linked lists of the bufpool. When the LRU list is empty, a free node is obtained from the free list, asynchronous IO reads the page into the bufpool, and the block is added to the LRU list. When the LRU list grows too long, its tail is evicted; if eviction fails, dirty pages are flushed directly through the LRU (BUF_FLUSH_LRU). Once the dirty pages have been flushed, their flush list nodes are removed, and the dirty blocks are removed from the LRU list at the same time. The three lists (LRU, free, and flush) and the five modules are thus inextricably linked.
Non-compressed LRU linked list (buf_pool->unzip_LRU): this list is actually a subset of the LRU linked list. It comes into play when the compressed page held in a compressed page control block (buf_page_struct) must be decompressed for record-level read/write operations. Every page inserted into the unzip_LRU list must therefore also be in the LRU list, but the converse does not hold.
Dirty block linked list (buf_pool->flush_list): the third of the three most important linked lists of the bufpool, and in fact a subset of the LRU linked list. Pages read into the bufpool are managed through compressed or non-compressed control blocks and start out in the LRU list. When the transaction part (and the mini-transaction part) completes a commit, the memory write has succeeded: the dirty pages must be added to the flush list, where they wait to be flushed by the asynchronous IO threads (part of the system's main thread subsystem). Asynchronous IO here is either native asynchronous IO on Linux or the simulated asynchronous IO that the author, Heikki Tuuri, implemented with the condition variables mentioned under buf_pool_struct; the storage part will describe it in detail.
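The commit-time insertion into the flush list can be sketched as follows. The sketch assumes that a page enters the flush list only the first time it becomes dirty and leaves it once it has been written out; the field oldest_modification is used here as the "is dirty" marker, and the names sk_page_t, sk_pool_t, sk_note_modification, and sk_flush_complete are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sk_page {
    struct sk_page *flush_prev, *flush_next;
    unsigned long   oldest_modification;  /* 0 means the page is clean */
} sk_page_t;

typedef struct {
    sk_page_t *flush_head;  /* most recently dirtied page */
    size_t     n_dirty;     /* length of the flush list */
} sk_pool_t;

/* Called when a (mini-)transaction commits a change made at log position lsn.
   A page enters the flush list only the first time it becomes dirty. */
static void sk_note_modification(sk_pool_t *pool, sk_page_t *p, unsigned long lsn)
{
    assert(lsn > 0);
    if (p->oldest_modification != 0) {
        return;  /* already in the flush list */
    }
    p->oldest_modification = lsn;
    p->flush_prev = NULL;
    p->flush_next = pool->flush_head;
    if (pool->flush_head != NULL) {
        pool->flush_head->flush_prev = p;
    }
    pool->flush_head = p;
    pool->n_dirty++;
}

/* Called after the page has been written out: unlink it and mark it clean. */
static void sk_flush_complete(sk_pool_t *pool, sk_page_t *p)
{
    assert(p->oldest_modification != 0);
    if (p->flush_prev != NULL) p->flush_prev->flush_next = p->flush_next;
    else pool->flush_head = p->flush_next;
    if (p->flush_next != NULL) p->flush_next->flush_prev = p->flush_prev;
    p->flush_prev = p->flush_next = NULL;
    p->oldest_modification = 0;
    pool->n_dirty--;
}
```

Note that the page never leaves the LRU list in either function, which matches the statement that the flush list is a subset of the LRU list.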
The linked list of unmodified compressed blocks (buf_pool->zip_clean) holds compressed blocks that have not been modified. In the source code this list is used only for debugging.
The free list of the buddy system (buf_pool->zip_free[]) is the most unusual of the six bufpool linked lists. Its base node can be seen as an array of pointers. The essence of the buddy system is to split and merge adjacent memory blocks in power-of-2 sizes, which gives efficient management with low code complexity. The pointer array actually contains 4 layers divided by block size (4096 and 8192 among them), and the base node of each layer manages only blocks of the same size.
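The split-and-merge arithmetic of a binary buddy allocator is compact enough to show directly. In the sketch below, the smallest block size (1024 bytes) and the helper names sk_buddy_slot, sk_buddy_size, and sk_buddy_of are assumptions chosen for illustration; only the idea of one free list per power-of-2 size comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_BUDDY_LOW   1024u  /* assumed smallest block size */
#define SK_BUDDY_SIZES 4u     /* number of layers in this sketch */

/* Return the free-list slot for a request of `size` bytes, or
   SK_BUDDY_SIZES if the request is larger than the largest layer. */
static unsigned sk_buddy_slot(size_t size)
{
    unsigned slot = 0;
    size_t   s = SK_BUDDY_LOW;
    assert(size > 0);
    while (s < size && slot < SK_BUDDY_SIZES) {
        s <<= 1;
        slot++;
    }
    return slot;
}

/* Block size managed by a given layer. */
static size_t sk_buddy_size(unsigned slot)
{
    return (size_t) SK_BUDDY_LOW << slot;
}

/* Offset of the buddy of the block at `offset` (relative to the start of
   the arena) for a block of `size` bytes. Flipping the size bit pairs each
   block with the neighbour it was split from. */
static size_t sk_buddy_of(size_t offset, size_t size)
{
    assert(offset % size == 0);
    return offset ^ size;
}
```

On free, an allocator of this kind checks whether the block at sk_buddy_of() is also free in the same layer; if so, the two are merged and the check repeats one layer up. On allocation, an empty layer is refilled by splitting a block from the next larger layer.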
This overview of the bufpool is coming to an end. New content will be added later as the material is sorted out further, and Liu will keep up this analysis.
The next section covers the LRU in detail. If you find any errors in the article, please point them out.