This is an original article; please credit the source when reproducing it.
MySQL's memory management is large and sophisticated. As described in the comments at the beginning of the mem0pool.c file, it is roughly divided into four parts covering nine areas:
1. Buffer pool,
2. Parsed and optimized SQL statements,
3. Data dictionary cache,
4. Log buffer,
5. Locks for each transaction,
6. Hash table for the adaptive index,
7. State and buffers for each SQL query currently being executed,
8. Session for each user, and
9. Stack for each OS thread.
These nine areas are managed in four parts.
A solution to the memory management:
1. The buffer pool size is set separately;
2. The log buffer size is set separately;
3. The common pool size for all of the other entries, except 8, is set separately.
That is: the buffer pool, the redo log buffer, the common pool, and item 8 (user session information, which can be regarded as one part).
The redo log buffer is managed separately by the redo log module. The buffer pool is the most complex part and has the most content. The common pool, as stated above, holds everything else except item 8. Together these form the complete picture of the MySQL memory subsystem.
This series starts with the buffer pool (buf) and then explains the common pool and the insert buffer.
Before analyzing the buffer pool architecture completely and in detail, it is worth listing all the components of the buf pool subsystem and the key points involved.
(1) The five major components of the buf pool
First, the buf pool subsystem can be divided into five basic module components.
Buffer pool routines (general management of the buffer pool): this module manages the (multiple) buf pool instances, coordinates the other four modules, and provides the function interfaces called by external subsystems such as storage, transactions, the redo log, and the system main thread. It is the core of the five modules.
LRU list management (LRU replacement algorithm): the basic control and storage units of the buf pool, buf_block_struct and buf_page_struct (detailed below), must be managed in the corresponding uncompressed and compressed LRU lists. In design terms, the module must implement two basic operations, adding a node to the LRU list and removing a node from it, and then two higher-level functions, allocating a block and releasing a block. This yields 15 important main functions that external modules can call. There are also 14 important auxiliary functions that are not external interfaces but are tied to the main functions (they hold the implementation and nested details of the main functions). Together they account for more than 60% of the 43 functions in the LRU module.
Flush mechanism (flush algorithm): the part of the buf pool that interacts with the underlying IO. It performs the flush itself, and the flush interfaces it provides are called by every module within the subsystem.
There are two flush methods: flushing through the LRU list (BUF_FLUSH_LRU) and flushing through the flush list (BUF_FLUSH_LIST). Functionally, the flush module must also manage the flush list by implementing insertion and deletion of list nodes; implement the three main functions of single-block flush, neighbor-page flush, and flush through the doublewrite component; and provide the external interface through which a transaction (or mini-transaction) commit inserts dirty blocks into the flush list.
Buddy system (binary buddy allocator for compressed pages): in the buf pool, the buddy system is not a unit for unified memory allocation management. Its role is limited to allocating the memory that a buf control block (buf_page_struct) needs for a compressed page, but that role is still too important to neglect. The ultimate goal of the buf pool is to provide external interfaces to the file storage system, transaction management, the main thread scheduling unit, and so on, and nearly all of them read or write the data records in pages managed by buf control blocks (buf_block_struct->buf_page_struct.zip.data). After the buddy system has allocated the memory, the lower layer must also convert compressed pages to uncompressed pages as the running system requires. So it is fair to call it an extremely important component.
Read buffer (buffer read): simple and direct in implementation, covering single-page asynchronous read, random read, and linear read-ahead. Depending on the situation, these functions insert, delete, or move to the list tail the control block (buf_block_struct or buf_page_struct) of the page being read in the LRU lists (LRU and unzip_LRU). Wherever the LRU lists are operated on, a flush of the flush list may also occur.
(2) Four important structures
Second, four important structures are involved while the five components of the subsystem are running.
Buffer pool control instance (buf_pool_struct): the core data structure for buf pool instance management. It contains the six list base nodes (UT_LIST_BASE_NODE_T) described in detail below, the condition variables (os_event_t) for asynchronous IO flushing, the LRU and flush list mutexes, and the hash tables for compressed and uncompressed pages. These are only the most important parts.
struct buf_pool_struct{
    ......
    ulint           LRU_old_ratio;
    mutex_t         LRU_list_mutex;
    rw_lock_t       page_hash_latch;
    mutex_t         free_list_mutex;
    ......
    hash_table_t*   page_hash;
    ......
    UT_LIST_BASE_NODE_T(buf_page_t) flush_list;
    ......
    UT_LIST_BASE_NODE_T(buf_page_t) free;
    ......
    UT_LIST_BASE_NODE_T(buf_page_t) LRU;
    ......
}; // The contents of this structure are hard to describe all at once; they are referenced and explained one by one in each module later.
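The list base nodes in the structure above are intrusive lists: the base holds the count and the two ends, and each element carries its own prev/next links. As a rough illustration only, here is a minimal sketch of that idea. The macro names LIST_BASE_NODE_T, LIST_NODE_T, LIST_INIT, LIST_ADD_FIRST and LIST_REMOVE and the type demo_page_t are hypothetical simplifications, not the real InnoDB macros, which differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the InnoDB list macros. */
#define LIST_BASE_NODE_T(TYPE) \
    struct { size_t count; TYPE *start; TYPE *end; }
#define LIST_NODE_T(TYPE) \
    struct { TYPE *prev; TYPE *next; }

typedef struct demo_page demo_page_t;       /* hypothetical page type */

struct demo_page {
    unsigned long offset;                   /* page number */
    LIST_NODE_T(demo_page_t) LRU;           /* links used by the LRU list */
    LIST_NODE_T(demo_page_t) free;          /* links used by the free list */
};

typedef LIST_BASE_NODE_T(demo_page_t) demo_list_t;

/* Make BASE an empty list. */
#define LIST_INIT(BASE) \
    do { (BASE).count = 0; (BASE).start = NULL; (BASE).end = NULL; } while (0)

/* Add ELEM at the head of BASE, using the link member called NAME. */
#define LIST_ADD_FIRST(NAME, BASE, ELEM)                          \
    do {                                                          \
        (ELEM)->NAME.prev = NULL;                                 \
        (ELEM)->NAME.next = (BASE).start;                         \
        if ((BASE).start != NULL) {                               \
            (BASE).start->NAME.prev = (ELEM);                     \
        } else {                                                  \
            (BASE).end = (ELEM);                                  \
        }                                                         \
        (BASE).start = (ELEM);                                    \
        (BASE).count++;                                           \
    } while (0)

/* Unlink ELEM from BASE, using the link member called NAME. */
#define LIST_REMOVE(NAME, BASE, ELEM)                             \
    do {                                                          \
        assert((BASE).count > 0);                                 \
        if ((ELEM)->NAME.prev != NULL) {                          \
            (ELEM)->NAME.prev->NAME.next = (ELEM)->NAME.next;     \
        } else {                                                  \
            (BASE).start = (ELEM)->NAME.next;                     \
        }                                                         \
        if ((ELEM)->NAME.next != NULL) {                          \
            (ELEM)->NAME.next->NAME.prev = (ELEM)->NAME.prev;     \
        } else {                                                  \
            (BASE).end = (ELEM)->NAME.prev;                       \
        }                                                         \
        (ELEM)->NAME.prev = NULL;                                 \
        (ELEM)->NAME.next = NULL;                                 \
        (BASE).count--;                                           \
    } while (0)
```

Because every list has its own link member inside the element, one page can sit in several lists at once without any extra allocation, which is presumably why the control blocks carry one node per list.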
Underlying memory allocation unit (buf_chunk_struct): in the countless kernel analysis articles on MySQL 5.5 and later, this is the structure most easily overlooked, yet in a sense it is extremely important because it is the part of the buf pool closest to the bottom layer, OS memory allocation. All buf control blocks (buf_block_struct) are attached to the six lists and can be managed directly through buf_pool_struct, but the initial memory allocation for these structures happens during the initialization of buf_chunk_struct. buf_chunk_struct is the most basic, lowest-level memory allocation unit of buf_pool_struct. Although this part is "insignificant" in amount of code, its importance must not be ignored.
struct buf_chunk_struct{
    ulint           mem_size;   /*!< allocated size of the chunk */
    ulint           size;       /*!< size of frames[] and blocks[] */
    void*           mem;        /*!< pointer to the memory area which
                                was allocated for the frames */
    buf_block_t*    blocks;     /*!< array of buffer control blocks */
}; // The chunk structure is small, so all of its contents are listed here; they are described below as each module refers to them.
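To make the role of the chunk more concrete, here is a minimal sketch of how one allocation could be carved into page-aligned frames, each owned by one control block. It is an illustration under assumptions, not the InnoDB code: the names demo_chunk_t, demo_block_t, demo_chunk_init and demo_chunk_free are hypothetical, the page size of 16384 bytes is assumed, and the control blocks are allocated separately with calloc for simplicity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 16384u               /* assumed frame size */

typedef struct demo_block {
    unsigned char *frame;                   /* page frame owned by this block */
} demo_block_t;

typedef struct demo_chunk {
    size_t        mem_size;                 /* allocated size of the chunk */
    size_t        size;                     /* number of frames and blocks */
    void         *mem;                      /* memory area holding the frames */
    demo_block_t *blocks;                   /* array of control blocks */
} demo_chunk_t;

/* Allocate room for n_frames frames and point each control block at its
   own page-aligned frame. Returns 0 on success, -1 on failure. */
static int demo_chunk_init(demo_chunk_t *chunk, size_t n_frames)
{
    uintptr_t start;
    size_t    i;

    /* One extra page so that the first frame can be aligned. */
    chunk->mem_size = (n_frames + 1) * DEMO_PAGE_SIZE;
    chunk->mem = malloc(chunk->mem_size);
    chunk->blocks = calloc(n_frames, sizeof *chunk->blocks);
    if (chunk->mem == NULL || chunk->blocks == NULL) {
        free(chunk->mem);
        free(chunk->blocks);
        return -1;
    }

    /* Round the start address up to a multiple of the page size. */
    start = ((uintptr_t) chunk->mem + DEMO_PAGE_SIZE - 1)
            & ~((uintptr_t) DEMO_PAGE_SIZE - 1);

    for (i = 0; i < n_frames; i++) {
        chunk->blocks[i].frame = (unsigned char *) start + i * DEMO_PAGE_SIZE;
    }
    chunk->size = n_frames;
    return 0;
}

/* Release everything the chunk owns. */
static void demo_chunk_free(demo_chunk_t *chunk)
{
    free(chunk->blocks);
    free(chunk->mem);
    chunk->blocks = NULL;
    chunk->mem = NULL;
    chunk->size = 0;
}
```

After demo_chunk_init succeeds, every blocks[i].frame is aligned to the page size, which matches the alignment requirement stated in the comment on the frame member of buf_block_struct.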
Uncompressed page control block (buf_block_struct): this structure was already described in "MySQL Kernel: InnoDB Storage Engine, Volume 1" by Lao Jiang of NetEase. As versions changed and MySQL's functionality evolved, the page control block evolved too. The uncompressed page control block (buf_block_struct) is managed separately from the compressed page control block (buf_page_struct). It contains an embedded instance of the latter, the address of the physical page frame, the unzip_LRU list node (UT_LIST_NODE_T), a read-write lock, a mutex, and other important objects. It is the most critical control unit for implementing the core interface functions of the buf pool.
struct buf_block_struct{
    buf_page_t  page;   /*!< page information; this must
                        be the first field, so that
                        buf_pool->page_hash can point
                        to buf_page_t or buf_block_t */
    byte*       frame;  /*!< pointer to buffer frame which
                        is of size UNIV_PAGE_SIZE, and
                        aligned to an address divisible by
                        UNIV_PAGE_SIZE */
    ......
}; // The other members of this structure are not listed, but be sure to remember the two above. page is the important handle through which the uncompressed page structure refers to the compressed page structure, and it is the key to the cast ((buf_block_t*) bpage) used in important functions. frame is the core of what the buf pool really serves: the page frame address of an uncompressed page (data page, undo page, special page, ...). Once a page has been read in through the read module and placed in the lists for management, every modification of it amounts to a memory modification of this page frame. Writing it back to disk is a matter for asynchronous (or synchronous) IO (the IO mechanism is explained fully together with the file storage subsystem).
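The cast ((buf_block_t*) bpage) works only because page is the first member of the block. A minimal sketch of that layout trick follows; the types demo_page_t and demo_block_t and the function demo_page_get_block are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_page {
    unsigned space;                         /* tablespace id */
    unsigned offset;                        /* page number */
} demo_page_t;

typedef struct demo_block {
    demo_page_t    page;                    /* must stay the first member */
    unsigned char *frame;                   /* uncompressed page frame */
} demo_block_t;

/* Recover the enclosing block from a pointer to its embedded page.
   Valid only if bpage really is the page member of a demo_block_t. */
static demo_block_t *demo_page_get_block(demo_page_t *bpage)
{
    /* C places the first member at offset 0, with no padding before it. */
    assert(offsetof(demo_block_t, page) == 0);
    return (demo_block_t *) bpage;
}
```

This is why a hash table can store plain page pointers and still hand back full blocks: the two pointers have the same address, and only the caller's knowledge of the page state decides whether the cast is allowed.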
Compressed page control block (buf_page_struct): in theory, all uncompressed pages are only a subset of compressed pages (this remains to be verified against the actual behavior). Because the core operations all take place on uncompressed pages, the compressed page control block (buf_page_struct) does not contain a mutex, but to stay consistent with buf_block_struct it still has to implement a lock counter. In addition, it contains the tablespace ID of the corresponding physical page, the offset of the page within the tablespace, the page state, the flush type (BUF_FLUSH_LRU or BUF_FLUSH_LIST, described above), the reference to the compressed page, the hash table link, the list nodes (UT_LIST_NODE_T) for five of the six lists (the unzip_LRU node is in the uncompressed page structure buf_block_struct, so only the other five are here), whether the page is in the old part of the LRU list, the time the page was first accessed, and other details. These are only the most important members, not all of them.
struct buf_page_struct{
    unsigned    space:32;       /*!< tablespace id; also protected
                                by buf_pool->mutex. */
    unsigned    offset:32;      /*!< page number; also protected
                                by buf_pool->mutex. */
    ......
    unsigned    flush_type:2;   /*!< if this block is currently being
                                flushed to disk, this tells the
                                flush_type.
                                @see enum buf_flush */
    unsigned    io_fix:2;       /*!< type of pending I/O operation;
                                also protected by buf_pool->mutex
                                @see enum buf_io_fix */
    unsigned    buf_fix_count:19;   /*!< count of how manyfold this block
                                    is currently bufferfixed */
    ......
    UT_LIST_NODE_T(buf_page_t) free;
    UT_LIST_NODE_T(buf_page_t) flush_list;
    UT_LIST_NODE_T(buf_page_t) zip_list;
    ......
};
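To show how the io_fix and buf_fix_count bit-fields could cooperate, here is a small sketch of a pin counter: a page that is fixed, or that has pending IO, must not be evicted. The function names and the value DEMO_IO_NONE are hypothetical, and the eviction rule is my simplified reading, not a copy of the InnoDB logic.

```c
#include <assert.h>

#define DEMO_IO_NONE 0u                     /* assumed: no pending I/O */

typedef struct demo_page {
    unsigned space:32;                      /* tablespace id */
    unsigned offset:32;                     /* page number */
    unsigned io_fix:2;                      /* type of pending I/O */
    unsigned buf_fix_count:19;              /* how many users pin the page */
} demo_page_t;

/* Pin the page so that it stays in memory while the caller uses it. */
static void demo_page_fix(demo_page_t *bpage)
{
    /* A 19-bit counter tops out at 2^19 - 1. */
    assert(bpage->buf_fix_count < (1u << 19) - 1);
    bpage->buf_fix_count++;
}

/* Drop one pin taken by demo_page_fix(). */
static void demo_page_unfix(demo_page_t *bpage)
{
    assert(bpage->buf_fix_count > 0);
    bpage->buf_fix_count--;
}

/* In this sketch a page may be evicted only when nobody has it
   pinned and no I/O is pending on it. */
static int demo_page_can_evict(const demo_page_t *bpage)
{
    return bpage->io_fix == DEMO_IO_NONE && bpage->buf_fix_count == 0;
}
```

Since the structure has no mutex of its own, a real implementation would have to update these fields under some outer lock; the sketch leaves locking out entirely.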
(3) Six important lists
Third, the subsystem components and the basic control structures cannot implement their complex functions without six important lists.
Free list (buf_pool->free): the first of the three most important lists. The LRU and flush lists are mentioned often, the free list much less so. Yet it is the only buf pool list for which the list initialization function is called explicitly (the INIT operation) during the initialization phase of the buf pool. The first blocks to enter the LRU list must be obtained from the free list. When the flush module flushes dirty pages, LRU list nodes are removed or moved to the tail of the LRU list for cleanup, and nodes cleaned out of the LRU list are returned to the free list.
LRU list (buf_pool->LRU): the second of the three most important lists. When the LRU list is empty, a free node must be taken from the free list, the page read into the buf pool by asynchronous IO, and the block added to the LRU list. If the LRU list grows too large, a flush from the tail takes place; if that fails, a more thorough flush writes dirty pages out directly through the LRU list (BUF_FLUSH_LRU mode). When a dirty page is taken from a flush list node and flushed, the corresponding dirty block in the LRU list is removed as well. The LRU, free, and flush lists and the five modules are thus inextricably linked.
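The cycle between the free list and the LRU list described in the last two paragraphs can be sketched roughly as follows. This is a simplified model under assumptions: the names demo_pool_t, demo_get_block and demo_release_block are hypothetical, and eviction here simply picks the first clean block found when scanning from the LRU tail.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_block demo_block_t;
struct demo_block {
    int           dirty;                    /* 1 if modified since last flush */
    demo_block_t *prev;                     /* neighbor toward the list head */
    demo_block_t *next;                     /* neighbor toward the list tail */
};

typedef struct demo_list {
    demo_block_t *head;
    demo_block_t *tail;
    size_t        len;
} demo_list_t;

typedef struct demo_pool {
    demo_list_t free;                       /* unused blocks */
    demo_list_t lru;                        /* blocks holding pages */
} demo_pool_t;

static void demo_list_add_first(demo_list_t *l, demo_block_t *b)
{
    b->prev = NULL;
    b->next = l->head;
    if (l->head != NULL) {
        l->head->prev = b;
    } else {
        l->tail = b;
    }
    l->head = b;
    l->len++;
}

static void demo_list_remove(demo_list_t *l, demo_block_t *b)
{
    if (b->prev != NULL) { b->prev->next = b->next; } else { l->head = b->next; }
    if (b->next != NULL) { b->next->prev = b->prev; } else { l->tail = b->prev; }
    b->prev = NULL;
    b->next = NULL;
    l->len--;
}

/* Move a clean block from the LRU list back to the free list. */
static void demo_release_block(demo_pool_t *pool, demo_block_t *b)
{
    assert(!b->dirty);                      /* dirty blocks must be flushed first */
    demo_list_remove(&pool->lru, b);
    demo_list_add_first(&pool->free, b);
}

/* Get a block for a page that is about to be read in and put it at the
   LRU head. Returns NULL if the free list is empty and every LRU block
   is dirty, in which case the caller would have to flush first. */
static demo_block_t *demo_get_block(demo_pool_t *pool)
{
    demo_block_t *b;

    if (pool->free.len == 0) {
        /* Scan from the LRU tail for a clean victim. */
        for (b = pool->lru.tail; b != NULL && b->dirty; b = b->prev) {
        }
        if (b == NULL) {
            return NULL;
        }
        demo_release_block(pool, b);
    }
    b = pool->free.head;
    demo_list_remove(&pool->free, b);
    demo_list_add_first(&pool->lru, b);
    return b;
}
```

Note that an evicted block always passes through the free list before it is reused, which mirrors the statement that nodes cleaned out of the LRU list are returned to the free list.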
Uncompressed LRU list (buf_pool->unzip_LRU): this list is actually a subset of the LRU list. It comes into play when the compressed page held by a compressed page control block (buf_page_struct) has to be decompressed for record-level reads and writes. So a page in the unzip_LRU list is always in the LRU list, but not the other way around.
Dirty list (buf_pool->flush_list): also one of the three most important lists, and actually a subset of the LRU list as well, because every page read into the buf pool, whether managed by a compressed or an uncompressed control block, must first be in the LRU list. When a transaction (or mini-transaction) commits, which in practice means the write to memory has succeeded, the dirty page must be added to the flush list, where it waits for the asynchronous IO threads (one of the subsystems of the system main thread) to flush it. (Asynchronous IO is either Linux native asynchronous IO or the simulated asynchronous IO that the author Heikki Tuuri implemented himself with condition variables, as mentioned under buf_pool_struct; the storage part will describe it in detail.)
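As an illustration of how a commit might put a dirty page on the flush list, here is a minimal sketch. It assumes that each page remembers the log position of its first and latest unflushed change and joins the list only on the first one. The members oldest_modification and newest_modification and the functions demo_note_modification and demo_flush_done are assumptions of this sketch and do not appear in the excerpts above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct demo_page demo_page_t;
struct demo_page {
    uint64_t     oldest_modification;       /* 0 means the page is clean */
    uint64_t     newest_modification;       /* position of the latest change */
    demo_page_t *flush_next;                /* next (older) flush list entry */
};

typedef struct demo_flush_list {
    demo_page_t *head;                      /* most recently dirtied page */
    size_t       len;
} demo_flush_list_t;

/* Called when a committed change at log position lsn touches page p.
   Only the first change since the last flush inserts the page. */
static void demo_note_modification(demo_flush_list_t *fl, demo_page_t *p,
                                   uint64_t lsn)
{
    assert(lsn > 0);
    p->newest_modification = lsn;
    if (p->oldest_modification == 0) {
        p->oldest_modification = lsn;
        p->flush_next = fl->head;
        fl->head = p;
        fl->len++;
    }
}

/* Called after page p has been written out: unlink it and mark it clean. */
static void demo_flush_done(demo_flush_list_t *fl, demo_page_t *p)
{
    demo_page_t **link = &fl->head;

    while (*link != NULL && *link != p) {
        link = &(*link)->flush_next;
    }
    if (*link == p) {
        *link = p->flush_next;
        fl->len--;
    }
    p->oldest_modification = 0;
    p->flush_next = NULL;
}
```

A page modified many times between two flushes therefore appears on the list once, so the list length counts dirty pages rather than modifications.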
Unmodified compressed block list (buf_pool->zip_clean): in the current source code this list is used only for debugging.
Buddy system free lists (buf_pool->zip_free[]): the most special of the six lists in the buf pool. The base node of this list can be seen as an "array of pointers". The essence of the buddy system is to merge and split adjacent memory blocks in multiples of 2, which gives efficient management with low code complexity. The array actually has four levels, for block sizes of 1024, 2048, 4096, and 8192 bytes, and the base node at each level manages only blocks of that size.
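The two calculations at the heart of such a buddy system, choosing the level for a request and finding the neighbor to merge with, can be sketched as below. The function names are hypothetical, the four levels follow the sizes listed above, and offsets are taken relative to the start of an aligned memory area.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ZIP_MIN   1024u                /* smallest block size */
#define DEMO_ZIP_SIZES 4                    /* 1024, 2048, 4096, 8192 */

/* Index into a zip_free[]-style array for a request of `size` bytes,
   or -1 if the request does not fit the largest block. */
static int demo_buddy_slot(size_t size)
{
    size_t block = DEMO_ZIP_MIN;
    int    i;

    for (i = 0; i < DEMO_ZIP_SIZES; i++, block <<= 1) {
        if (size <= block) {
            return i;
        }
    }
    return -1;
}

/* Block size managed at level `slot`. */
static size_t demo_buddy_block_size(int slot)
{
    assert(slot >= 0 && slot < DEMO_ZIP_SIZES);
    return (size_t) DEMO_ZIP_MIN << slot;
}

/* Offset of the buddy of the block that starts at offset `off`.
   Two buddies differ only in the bit equal to their block size,
   so merging them yields one aligned block of the next level. */
static size_t demo_buddy_of(size_t off, int slot)
{
    size_t size = demo_buddy_block_size(slot);

    assert(off % size == 0);
    return off ^ size;
}
```

For example, the 2048-byte block at offset 4096 has its buddy at offset 6144, and merging the two gives the 4096-byte block at offset 4096 one level up.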
This concludes the overall overview of the buf pool for now; new content will be added and the material further organized. Lao Liu will keep this analysis going. Interested readers can also add me on QQ for a private chat: 275787374. You are always welcome.
The next section will explain the LRU. If you find any errors in the article, corrections are welcome.
Original: In-Depth Analysis of the MySQL Kernel Source Code: Buffer Pool Overall Overview