In-depth understanding of the underlying implementation of MySQL
This article comes from GitChat. For the full version, see the original article, "Understanding the underlying implementation of MySQL".
Edit | Habi
Common MySQL Engines
1. InnoDB
InnoDB stores each table in two files, with the suffixes .frm and .ibd: .frm is the table definition file and .ibd is the data file.
InnoDB has both table locks and row locks, but row locks take effect only when the statement hits an index.
InnoDB supports transactions and four isolation levels (read uncommitted, read committed, repeatable read, and serializable). InnoDB's default level is repeatable read. Oracle Database supports only the serializable and read committed levels, and its default is read committed.
2. MyISAM
MyISAM stores each table in three files, with the extensions .frm, .MYD, and .MYI. The .frm file is the table definition file, the .MYD file is the data file, and the .MYI file is the index file.
MyISAM supports only table locks and does not support transactions. Because it keeps a separate index file, it performs well on reads.
3. Storage Structure
Both InnoDB and MyISAM use B+Tree to store data.
MySQL Data and Index Storage Structure
1. Data storage principle (hard disk)
Information is stored on the hard disk. A hard disk consists of many platters, and data is stored by the magnetic material on the platter surfaces.
Put a platter under a microscope and its surface looks uneven: a raised area is magnetized and represents 1, and a recessed area is not magnetized and represents 0. This is how a hard disk stores text, images, and other information in binary form.
Hard disks come in many types, but all are composed of platters, heads, a spindle, a control motor, a head controller, a data converter, an interface, a cache, and other parts.
All the platters are fixed on one rotating axis, the spindle.
All platters are strictly parallel. Each platter surface has its own head, and the gap between a head and the surface is smaller than the diameter of a hair.
All the heads are attached to one head controller, which is responsible for moving them. A head can move along the radius of the platter (in practice it sweeps along a slanted arc). The heads must stay coaxial at all times; that is, seen from directly above, all heads overlap at any moment.
Technology has since produced drives with independently moving heads; that case is not considered here.
The platters spin at thousands to tens of thousands of revolutions per minute, so a head can read and write data at a specified position on the platter.
Because a hard disk is a high-precision device and dust is its enemy, it must be completely sealed.
2. Principles of data read/write
Hard disks are logically divided into tracks, cylinders, and sectors.
The heads park close to the spindle, where linear velocity is lowest. This special area stores no data and is called the start-stop zone or landing zone. Everything outside the start-stop zone is the data area.
The outermost ring, farthest from the spindle, is track 0; data storage on a hard disk starts from the outermost ring.
The hard disk also has a component called the track 0 detector, which performs the initial positioning of the hard disk.
Disk Surface
Platters are generally made with an aluminum alloy substrate. Each platter has two surfaces, upper and lower. Generally every surface is used to store data and is then called a valid surface, although a very small number of hard disks have an odd number of surfaces.
Each valid surface has a surface number, counted from 0 from top to bottom.
In a hard disk system the surface number is also called the head number, because each valid surface has its own read/write head. A hard disk's platter stack holds 2 to 14 platters, usually 2 or 3.
Track
The platter is divided into many concentric circles during formatting. These concentric circles are called tracks.
Tracks are numbered sequentially from 0, from the outside inward. Each surface has 300 to 1024 tracks, and newer large-capacity disks have more. Information is recorded along these tracks as pulse trains. The concentric circles are not recorded continuously but are divided into arc segments.
These arcs all have the same angular velocity. Because their radii differ, their linear velocities differ too: the outer tracks move faster than the inner ones. That is, at the same rotational speed and in the same time interval, a point on an outer track sweeps a longer arc than a point on an inner track.
Each arc segment is called a sector. Sectors are numbered from 1, and the data in a sector is read or written as a single unit.
Tracks are invisible to the eye. They are simply areas of the surface magnetized in a particular pattern, laid out when the disk is formatted.
Cylinder
The same track on all surfaces forms a cylinder.
The heads on each cylinder are numbered from 0, from top to bottom. Data is read and written cylinder by cylinder: the operation starts with head 0 of a cylinder and then proceeds downward through the other surfaces, that is, the other heads, of the same cylinder.
Only after all heads in the same cylinder have finished reading or writing do the heads move to the next cylinder (the next concentric circle inward). Selecting a head only requires electronic switching, whereas selecting a cylinder requires mechanical movement, and electronic switching is much faster than mechanically moving the heads to an adjacent track.
Therefore data is read and written by cylinder rather than by surface. Once a track is full, writing continues on the next surface of the same cylinder, and only when the whole cylinder is full do the heads move to the next cylinder. Reads follow the same order, which improves the read/write efficiency of the hard disk.
Sector
The operating system stores information on the hard disk in sectors. Each sector holds 512 bytes of data plus some other information. A sector has two main parts: an identifier that records where the data is stored, and a data segment that holds the data.
The identifier is the sector header. It contains the three numbers that make up the sector's three-dimensional address: the surface (head) number, the cylinder number, and the sector number (block number).
The data segment is divided into the data itself and the error correction code (ECC) that protects it. During initial preparation, the computer fills this part with 512 dummy bytes (where the actual data will be stored) and the ECC values corresponding to those dummy bytes.
3. Complete disk access request process
1) Determine the disk address (cylinder number, head number, sector number) and the memory address (source/destination):
When data needs to be read from the disk, the system passes the logical address of the data to the disk. The disk's control circuit translates the logical address into a physical address according to its addressing logic, determining which track and sector hold the data.
2) To read the data in that sector, the head must be placed above it. This takes two steps:
A. First the cylinder must be found, that is, the heads move to the corresponding track. This is called seeking, and the time it takes is the seek time.
B. Then the platter rotates until the target sector is under the head. The time this takes is the rotation time.
3) Completing one disk access request (read or write) therefore consists of three actions:
A. Seek (time): move the head to the specified track.
B. Rotational delay (time): wait for the specified sector to rotate under the head.
C. Data transfer (time): the actual transfer of data between the disk and memory.
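The three components can be added up to estimate the cost of a single request. The sketch below is only an illustration: the function name and all the drive figures (9 ms seek, 7200 rpm, 100 MB/s) are assumptions, not measurements from the article.

```python
def access_time_ms(seek_ms, rpm, transfer_mb_per_s, request_kb):
    """Estimate one disk request as seek + rotational delay + transfer time."""
    # On average the target sector is half a revolution away from the head.
    rotation_ms = 0.5 * 60_000 / rpm
    # Time to stream the requested bytes once the sector is under the head.
    transfer_ms = request_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms + rotation_ms + transfer_ms

# Assumed figures: 9 ms seek, 7200 rpm, 100 MB/s, one 16 KB request.
total = access_time_ms(seek_ms=9.0, rpm=7200, transfer_mb_per_s=100.0, request_kb=16)
print(f"{total:.2f} ms")
```

With these assumed numbers the mechanical parts (seek and rotation) dominate and the transfer itself is a small fraction of the total, which is why the rest of the article focuses on reducing the number of I/O operations.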
4. Disk read/write principles
When the system stores a file on disk, it proceeds in the order cylinder, head, sector: first all sectors under the first head on the first track, then the next head of the same cylinder, and so on.
Once a cylinder is full, it moves on to the next cylinder, until the whole file has been written to disk.
The system reads data in the same order. To read, it tells the disk controller the cylinder number, head number, and sector number (the three parts of the physical address) of the sector to read.
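The cylinder, head, sector order can be made concrete with the conventional mapping between a three-part address and a sequential block number. This is a sketch under assumptions: the geometry (4 heads, 63 sectors per track) and the function names are hypothetical.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Map a (cylinder, head, sector) address to a sequential block number."""
    # Sectors are numbered from 1; cylinders and heads are numbered from 0.
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse mapping: sequential block number back to (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector0 = divmod(rest, sectors_per_track)
    return cylinder, head, sector0 + 1

HEADS, SECTORS = 4, 63  # assumed geometry
order = [lba_to_chs(n, HEADS, SECTORS) for n in (0, 62, 63, 251, 252)]
print(order)
```

Consecutive block numbers first run through the sectors of one track, then switch heads within the same cylinder, and only block 252 moves to cylinder 1, matching the write order described above.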
5. Reducing I/O: the read-ahead principle
Because of the characteristics of the storage medium, disk access is much slower than main memory access. Add the time spent on mechanical movement, and disk access is often hundreds of times slower than main memory.
Therefore, to improve efficiency, we need to minimize disk I/O.
The disk is usually not read strictly on demand; instead it reads ahead every time. Even if only one byte is needed, the disk reads a certain length of data sequentially from that position into memory.
The rationale is the well-known principle of locality in computer science:
When a piece of data is used, nearby data is usually used soon afterward.
The data a program needs while running is usually clustered together.
Because sequential disk reads are very efficient (no seek time and very little rotation time), read-ahead improves I/O efficiency for programs with locality.
The read-ahead length is generally an integer multiple of the page size. Pages are the logical blocks of computer memory management: hardware and operating systems often divide main memory and disk storage into contiguous blocks of equal size.
Each block is called a page (in many operating systems the page size is 4 KB). Main memory and disk exchange data in units of pages. When the data a program wants to read is not in main memory, a page fault is triggered.
The system then sends a read request to the disk. The disk finds the starting position of the data and reads one or more consecutive pages into memory; the fault handler then returns and the program continues to run.
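The effect of read-ahead on a sequential workload can be simulated with a toy page cache. Everything here is hypothetical (the class, the 512-byte access stride, the 64-page file); it only counts how many disk requests a sequential scan would trigger.

```python
PAGE_SIZE = 4096  # assumed page size, matching the 4 KB figure above

class ReadAheadCache:
    """Toy page cache: a miss loads the missing page plus the next few pages."""

    def __init__(self, read_ahead_pages):
        self.read_ahead_pages = read_ahead_pages
        self.cached = set()
        self.disk_reads = 0

    def read_byte(self, offset):
        page = offset // PAGE_SIZE
        if page not in self.cached:
            # One disk request fetches a run of consecutive pages.
            self.disk_reads += 1
            self.cached.update(range(page, page + 1 + self.read_ahead_pages))

def sequential_scan(read_ahead_pages, total_bytes=64 * PAGE_SIZE):
    cache = ReadAheadCache(read_ahead_pages)
    for offset in range(0, total_bytes, 512):
        cache.read_byte(offset)
    return cache.disk_reads

print(sequential_scan(0), sequential_scan(3))
```

Without read-ahead the scan issues one request per page (64); fetching three extra pages per miss cuts that to 16, because locality means the extra pages are used soon afterward.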
6. MySQL Index
An index is a data structure that helps MySQL retrieve data efficiently.
When we talk about creating an index on a field, we mean that MySQL stores that field in an index data structure and then uses the matching search algorithm during lookups.
The fundamental purpose of an index is to speed up searches, especially over large data sets. Common search algorithms include sequential search, binary search, and others.
However, each search algorithm applies only to a specific data structure. For example, binary search requires the data to be sorted, and binary tree search works only on a binary search tree or a red-black tree. Therefore, in addition to the data itself, the database system maintains data structures that satisfy specific search algorithms.
These data structures reference the data in some way, so that advanced search algorithms can run on them. These data structures are indexes.
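As a minimal illustration of "a structure that references the data and supports a search algorithm", the sketch below keeps a sorted list of keys next to unsorted rows and uses binary search on it. The table contents and names are made up, and a sorted list is a stand-in for illustration only, not how MySQL stores its indexes.

```python
from bisect import bisect_left

# Table rows in arbitrary insertion order: (id, name).
rows = [(42, "dora"), (7, "alan"), (19, "carl"), (13, "beth")]

# The "index": key values kept sorted, each paired with the row's position.
index = sorted((row[0], pos) for pos, row in enumerate(rows))
keys = [key for key, _ in index]

def find_by_id(target):
    """Binary-search the sorted keys, then follow the reference to the row."""
    i = bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return rows[index[i][1]]
    return None

print(find_by_id(19), find_by_id(20))
```

The rows themselves never need to be sorted; only the auxiliary structure does, which is the role an index plays.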
7. MySQL B+Tree
Currently, most database systems and file systems use B-Tree or its variant B+Tree as the index structure.
The B+Tree index is the database's implementation of the B+ tree and is the most common and most frequently used index type. The B in B+Tree stands for balance, not binary.
This is because the B+Tree evolved from the earliest balanced binary tree: it was reached by successive optimization from the binary search tree, through the balanced binary tree (AVL tree) and the balanced multi-way search tree (B-Tree).
Binary search tree: every key in the left subtree is smaller than the root's key, and every key in the right subtree is greater than the root's key.
AVL tree: satisfies the binary search tree conditions, and the heights of the two subtrees of any node differ by at most 1.
Balanced multi-way search tree (B-Tree): a balanced search tree designed for external storage devices such as disks.
When the system reads data from disk into memory, the basic unit is the disk block: all data in the same disk block is read at once rather than on demand.
The InnoDB storage engine reads data in units of pages. The page is its smallest unit of disk management, and the default page size is 16 KB.
A disk block in the system is usually smaller than that, so each time InnoDB requests disk space it takes several consecutive disk blocks to make up a 16 KB page.
InnoDB reads data from disk into memory with the page as the basic unit. During a query, if every item in a page helps locate the target record, the number of disk I/O operations drops and query efficiency improves.
Data organized as a B-Tree lets the system efficiently find the disk block where the data lives.
To describe B-Tree, first define a data record as a pair [key, data]. The key is the record's key value and differs from record to record; data is the rest of the record apart from the key.
B-Tree is a data structure that meets the following conditions:
d is a positive integer greater than 1, called the degree of the B-Tree.
h is a positive integer, called the height of the B-Tree.
Each non-leaf node consists of n-1 keys and n pointers, where d <= n <= 2d.
Each leaf node contains at least one key and two pointers, and at most 2d-1 keys and 2d pointers. The pointers of leaf nodes are all null.
All leaf nodes have the same depth, which is equal to the tree height h.
Keys and pointers alternate within a node, and both ends of a node are pointers.
Keys in a node are arranged from left to right in non-descending order.
All nodes form a tree structure.
Each pointer is either null or points to another node.
If a pointer is at the far left of a node and is not null, all keys in the node it points to are smaller than v(key1), where v(key1) is the value of the node's first key.
If a pointer is at the far right of a node and is not null, all keys in the node it points to are greater than v(keym), where v(keym) is the value of the node's last key.
If a pointer lies between two adjacent keys keyi and keyi+1 and is not null, all keys in the node it points to are smaller than v(keyi+1) and greater than v(keyi).
Each node in a B-Tree can hold a large number of keys and branches, depending on the actual situation. For example:
Each node occupies one disk block. A node holds two keys in ascending order and three pointers to the root nodes of its subtrees; each pointer stores the address of the disk block where the child node lives.
The two keys divide the key space into three ranges, which correspond to the data ranges of the subtrees the three pointers point to.
Take the root node as an example. Its keys are 17 and 35. The subtree pointed to by P1 holds data smaller than 17, the subtree pointed to by P2 holds data between 17 and 35, and the subtree pointed to by P3 holds data greater than 35.
Simulating the search for key 29:
Find disk block 1 from the root node and read it into memory. [1st disk I/O]
Key 29 falls in the range (17, 35), so take pointer P2 of disk block 1.
Find disk block 3 through pointer P2 and read it into memory. [2nd disk I/O]
Compare key 29 with the keys in disk block 3 and take its pointer P2.
Find disk block 8 through pointer P2 and read it into memory. [3rd disk I/O]
Find key 29 in the key list of disk block 8.
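The walk above can be reproduced with a small in-memory model in which each dictionary entry plays the role of one disk block. Only the root keys 17 and 35, the path through blocks 1, 3, and 8, and the target key 29 come from the text; every other key and block number below is an assumption made up to complete the tree.

```python
from bisect import bisect_left

# block number -> (sorted keys, child block numbers); leaves have no children.
# Only the root keys 17 and 35 and the path 1 -> 3 -> 8 follow the text.
DISK = {
    1: ([17, 35], [2, 3, 4]),
    2: ([8], [5, 6]),
    3: ([26, 30], [7, 8, 9]),
    4: ([65, 87], [10, 11, 12]),
    5: ([3, 5], []), 6: ([9, 10], []), 7: ([20, 22], []),
    8: ([28, 29], []), 9: ([31, 33], []), 10: ([36, 60], []),
    11: ([75, 79], []), 12: ([90, 99], []),
}

def search(key, root=1):
    """Return (block holding the key or None, number of blocks read)."""
    block, reads = root, 0
    while True:
        keys, children = DISK[block]  # stands in for one disk I/O
        reads += 1
        i = bisect_left(keys, key)    # binary search inside the loaded node
        if i < len(keys) and keys[i] == key:
            return block, reads
        if not children:
            return None, reads
        block = children[i]           # follow pointer P(i+1)

print(search(29), search(17), search(4))
```

Searching for 29 reads blocks 1, 3, and 8, three reads in total, while a key that sits in the root (17) is found after a single read.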
MySQL's InnoDB storage engine is designed to keep the root node resident in memory and strives to keep the tree depth at 3 or less, so a lookup needs no more than 3 I/O operations.
Analyzing the process above shows that it takes three disk I/O operations and three in-memory searches. Because the keys in memory form an ordered list, binary search can be used to speed up the in-memory part.
The three disk I/O operations are what determine the efficiency of the whole B-Tree search.
Compared with an AVL tree, a B-Tree has fewer nodes, so every disk I/O brings data into memory that is actually useful, which improves query efficiency.
B+Tree is an optimization of B-Tree that makes it better suited to external-storage index structures. The InnoDB storage engine implements its indexes with B+Tree.
In a B-Tree, every node holds both keys and data, and the storage space of a page is limited. If the data is large, the number of keys each node (that is, each page) can hold becomes small.
With a large amount of data, the B-Tree then becomes deep, which increases the number of disk I/O operations per query and hurts query efficiency.
In a B+Tree, all data records are stored in key order in leaf nodes on the same level, while non-leaf nodes store only key information. This greatly increases the number of keys each node can hold and lowers the height of the B+Tree.
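A back-of-the-envelope calculation shows how much difference keeping data out of the non-leaf nodes makes. The entry sizes here are assumptions chosen for illustration (8-byte key, 6-byte pointer, 1 KB rows), and page header overhead is ignored.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size from the text

def fanout(key_bytes, pointer_bytes):
    """Approximate number of (key, pointer) entries in one non-leaf page."""
    return PAGE_SIZE // (key_bytes + pointer_bytes)

def rows_at_height(height, entries_per_node, rows_per_leaf):
    """Rows reachable by a tree with `height` levels (leaf level included)."""
    return entries_per_node ** (height - 1) * rows_per_leaf

# Assumptions: 8-byte key, 6-byte pointer, 1 KB rows, page overhead ignored.
keys_only = fanout(8, 6)                      # B+Tree non-leaf node
keys_with_data = PAGE_SIZE // (8 + 6 + 1024)  # B-Tree node that also carries rows
rows_per_leaf = PAGE_SIZE // 1024
print(keys_only, keys_with_data, rows_at_height(3, keys_only, rows_per_leaf))
```

Under these assumptions a key-only page holds about 1170 entries against 15 when each entry carries its row, so a three-level B+Tree can address roughly 21.9 million rows.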
B+Tree makes two changes to B-Tree:
Data is stored only in leaf nodes;
Data nodes are linked to each other by pointers.
Because the non-leaf nodes of a B+Tree store only key information, assume that each disk block can hold four keys with their pointers. [Figure: the resulting B+Tree structure]
A B+Tree generally has two head pointers, one pointing to the root node and one pointing to the leaf node with the smallest key, and all leaf nodes (that is, the data nodes) are linked together in a chain.
Therefore two kinds of lookup can be performed on a B+Tree: range and paging lookups on the primary key, which walk the leaf chain, and random lookups, which start from the root node.
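The leaf chain is what makes range lookups cheap. A minimal sketch, assuming a hypothetical `Leaf` class and made-up keys, walks the chain from the head pointer of the smallest leaf:

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys  # sorted keys stored in this leaf
        self.next = None  # pointer to the neighbouring leaf

def link(leaves):
    """Chain the leaves left to right and return the leftmost one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]

def range_scan(first_leaf, low, high):
    """Collect keys in [low, high] by walking the leaf chain left to right."""
    result, leaf = [], first_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result

head = link([Leaf([5, 8]), Leaf([10, 15]), Leaf([18, 26]), Leaf([28, 30])])
print(range_scan(head, 8, 26))
```

This version starts at the smallest leaf for simplicity; a lookup could equally descend from the root to the leaf containing the lower bound and follow the chain from there, without returning to the upper levels.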
8. B+Tree in MyISAM
The MyISAM engine also uses B+Tree as its index structure.
Because MyISAM stores indexes and data in different files, what a leaf node of the index tree stores is the address of the data record that the key refers to. Since the data is kept apart from the index, a MyISAM index is a non-clustered index.
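The separation of index and data can be sketched with a byte string standing in for the data file and a dictionary standing in for the leaf level of the index. The record format and names are invented for illustration; the real .MYD and .MYI formats are different.

```python
# Stand-in for the .MYD data file: records stored back to back.
records = [b"3,alice", b"1,bob", b"2,carol"]
data_file = b"".join(r + b"\n" for r in records)

# Stand-in for the .MYI index: key -> byte offset of the record in the data file.
index, offset = {}, 0
for r in records:
    index[int(r.split(b",")[0])] = offset
    offset += len(r) + 1

def lookup(key):
    """Find the record's address in the index, then read it from the data file."""
    start = index[key]
    end = data_file.index(b"\n", start)
    return data_file[start:end].decode()

print(lookup(2))
```

The index never holds the row itself, only where to find it, which is the sense in which the index is non-clustered.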
9. B+Tree in InnoDB
InnoDB stores table data organized by the primary key ID.
A table using the InnoDB engine has two storage files: a definition file and a data file.
InnoDB builds a B+Tree index on the ID and stores the records themselves in the leaf nodes.
If the indexed field is not the primary key ID, InnoDB builds an index on that field whose leaf nodes store the primary key of each record; the record is then found through the primary key index.
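The two-step lookup through a non-primary-key index can be sketched with two dictionaries, each standing in for the leaf level of one B+Tree. The table contents and the `find_by_name` helper are hypothetical.

```python
# Clustered index: primary key -> full row (rows live in the leaf nodes).
clustered = {
    1: {"id": 1, "name": "bob", "age": 30},
    2: {"id": 2, "name": "carol", "age": 25},
    3: {"id": 3, "name": "alice", "age": 41},
}

# Secondary index on `name`: indexed value -> primary key, not the row itself.
name_index = {row["name"]: pk for pk, row in clustered.items()}

def find_by_name(name):
    pk = name_index.get(name)  # first lookup: secondary index
    if pk is None:
        return None
    return clustered[pk]       # second lookup: primary key index

print(find_by_name("alice"))
```

A lookup by name therefore costs two index traversals, while a lookup by ID costs one.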
MySQL-Related Optimization
1. MySQL performance optimization: structure and table design
Enable the query cache (available only in versions before MySQL 8.0). Avoid calling certain SQL functions, such as NOW(), directly in statements, because they prevent the MySQL cache from being used.
Avoid doing more than necessary. For example, if the logic only needs to know whether any female user exists, stop at the first match instead of fetching all of them; use LIMIT in this case.
Create appropriate indexes, in the right places and on the right objects. Fields that are frequently searched, compared, or tested should be indexed.
Size fields appropriately. When a field's values are limited and fixed, use ENUM; an IP address field can be stored as an unsigned INT.
Table design. Split tables vertically, separating fixed-length tables from variable-length ones, to reduce table complexity and the number of fields.
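To illustrate the point about field size, an IPv4 address fits in a 4-byte unsigned integer instead of a string of up to 15 characters. The sketch below does the conversion in application code with Python's standard `ipaddress` module; the helper names are made up, and MySQL's own INET_ATON() and INET_NTOA() functions perform the same conversion inside the database.

```python
import ipaddress

def ip_to_int(text):
    """Dotted-quad IPv4 string -> value that fits an unsigned 32-bit column."""
    return int(ipaddress.IPv4Address(text))

def int_to_ip(value):
    """Unsigned 32-bit value -> dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(value))

n = ip_to_int("192.168.1.10")
print(n, int_to_ip(n))
```

Integer storage also keeps comparisons and range conditions on addresses numeric rather than textual.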
2. SQL statement optimization: avoid full table scans
Index creation: generally, create indexes on the columns used in WHERE and ORDER BY. Try not to index fields whose values are highly repetitive.
Try to avoid != (<>) and OR in WHERE clauses, and do not test for NULL values.
Do not apply functions or expressions to fields in the WHERE clause.
Try to avoid LIKE patterns with a leading %; full-text search can be used instead.
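Why a leading % defeats an index can be seen on any sorted key list: a known prefix lets the search jump straight to the first candidate, while a known suffix gives the sort order nothing to work with. This is an illustrative sketch on a plain Python list with made-up names, not MySQL's actual execution path.

```python
from bisect import bisect_left

names = sorted(["adam", "alan", "alex", "beth", "carl", "dale"])  # index order

def prefix_search(prefix):
    """LIKE 'prefix%': jump to the first match, stop at the first non-match."""
    i, hits, examined = bisect_left(names, prefix), [], 0
    while i < len(names):
        examined += 1
        if not names[i].startswith(prefix):
            break
        hits.append(names[i])
        i += 1
    return hits, examined

def suffix_search(suffix):
    """LIKE '%suffix': sort order does not help, so every key is examined."""
    hits = [name for name in names if name.endswith(suffix)]
    return hits, len(names)

print(prefix_search("al"), suffix_search("le"))
```

The prefix search examines three keys to return two matches; the suffix search has to examine all six to return one.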