1. The choice of the MySQL database engine
1.1 Overview
One of the most important concepts in the MySQL database is the database engine. Different database engines work on very different principles, which ultimately leads to differences in the performance of the MySQL database service. For example, if a database engine needs to support transactions, it must satisfy the basic properties of transactions, the ACID properties (atomicity, consistency, isolation, and durability; these are basic knowledge and will not be repeated here), and it naturally needs some processing mechanism to implement them. The practical consequence is that, for the same amount of written data, an engine that supports transactions spends more time than one that does not. The following subsection lists some of the database engines supported by MySQL Community Edition.
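As a quick practical check, the engines actually available on a particular server, and the engine used by default, can be listed from any MySQL client (the exact list returned depends on the version and build):

-- List the storage engines compiled into this server, including whether
-- each one supports transactions.
SHOW ENGINES;

-- Show the engine used when CREATE TABLE does not name one explicitly.
SHOW VARIABLES LIKE 'default_storage_engine';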
1.2 Engine
1. MEMORY:
The MEMORY storage engine keeps all table data in memory. A similar engine in historical versions of MySQL was HEAP, which used to be the fastest engine in the MySQL database. However, because both of these engines work entirely in memory, the data they hold is lost if MySQL or the server restarts.
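As a minimal sketch, the statement below creates a hypothetical session-cache table on the MEMORY engine; it is fast to read and write, but its contents disappear whenever mysqld restarts, which is exactly the trade-off described above:

-- A hypothetical lookup table kept entirely in memory: very fast,
-- but its contents do not survive a restart of MySQL or the server.
CREATE TABLE session_cache (
    session_id VARCHAR(64) NOT NULL PRIMARY KEY,
    user_id    INT         NOT NULL,
    expires_at DATETIME    NOT NULL
) ENGINE = MEMORY;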
2. BLACKHOLE:
BLACKHOLE, the "black hole" engine: a table using the BLACKHOLE engine stores no data at all; it only records the binary log of the operations performed against it. Its main use is to act as a relay in MySQL master-slave replication, on top of which a business-level filtering mechanism can be added.
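A minimal sketch of that relay role, using a hypothetical orders table: writes to the BLACKHOLE copy are discarded, but with binary logging enabled they still appear in the binary log and can therefore be passed on to downstream replicas:

-- On the relay server: the table structure exists, but no row data is stored.
-- INSERT/UPDATE/DELETE against it are only recorded in the binary log.
CREATE TABLE orders (
    order_id BIGINT       NOT NULL,
    payload  VARCHAR(255) NOT NULL
) ENGINE = BLACKHOLE;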
3. MyISAM:
MyISAM was MySQL's default database engine for a long time (InnoDB has been the default since MySQL 5.5). MyISAM uses a table-level locking mechanism to optimize concurrent read and write operations (in effect, a mechanism to avoid dirty reads of data), although this comes at the cost of some wasted storage space. MyISAM also has some useful extensions, such as the myisamchk tool for repairing database files and the myisampack tool for reclaiming wasted space. The MySQL techniques introduced in this article will not involve this database engine.
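myisamchk and myisampack are command-line utilities; roughly equivalent maintenance can also be done from SQL, as in the rough sketch below (legacy_log is a hypothetical MyISAM table):

-- Check a MyISAM table for corruption, repair it, and reclaim the space
-- wasted by deleted rows (REPAIR TABLE applies to MyISAM, not to InnoDB).
CHECK TABLE legacy_log;
REPAIR TABLE legacy_log;
OPTIMIZE TABLE legacy_log;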
4. InnoDB:
The InnoDB database engine is the most widely used engine across MySQL versions. Unless stated otherwise, the rest of this article assumes the InnoDB database engine. InnoDB uses a logging mechanism to provide transaction support.
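To make the transaction support concrete, here is a minimal sketch using a hypothetical account table: the two updates either take effect together or not at all.

CREATE TABLE account (
    id      INT           NOT NULL PRIMARY KEY,
    balance DECIMAL(10,2) NOT NULL
) ENGINE = InnoDB;

-- Transfer 100 from account 1 to account 2 atomically.
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;   -- a ROLLBACK here would undo both updates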
2. Basic I/O performance
2.1 Overview
To understand the performance problems of the MySQL database, we must first work out what the MySQL service does, and how it does it, when a client submits a transaction operation. The working process described in this section revolves mainly around the InnoDB database engine:
The figure above only shows the major working areas that the InnoDB database engine uses while processing an insert/update transaction; InnoDB's actual working details are far more complicated than the steps shown there. As mentioned above, InnoDB is an engine that supports transactions, so one of the most important problems in its design is how to keep data consistent across an abnormal crash. The InnoDB database engine uses logs to solve this problem. Note that the InnoDB engine log discussed here is not the global binary log of the MySQL database. The InnoDB engine log has another name, the redo log, because its main purpose is to recover the data in the InnoDB engine after the database crashes abnormally and restarts.
To improve performance, the InnoDB database engine carries out its data operations almost entirely in memory and then, according to a certain strategy (described in detail later), synchronizes the log data in the InnoDB Log Buffer memory area to the InnoDB File Log Group area on disk. The InnoDB File Log Group area stores the InnoDB engine's log files; it consists of multiple log files of the same size, which are read and written sequentially. The innodb_log_file_size parameter determines the size of each file, and the innodb_log_files_in_group parameter determines how many log files the log group contains.
When the MySQL database finishes its initialization process, these log files occupy a continuous stretch of disk space according to the configured values. The visible effect is that, even though the database contains no data yet, the total size of the log files is already innodb_log_file_size * innodb_log_files_in_group:
InnoDB database engine log file example
....
total 1.0G
-rw-rw---- 1 mysql mysql 500M May 4 06:09 ib_logfile0
-rw-rw---- 1 mysql mysql 500M May 4 06:09 ib_logfile1
....
The purpose of this is to ensure that the subsequent synchronization of log data is written sequentially rather than randomly. When log data reaches the end of the last file, writing wraps around and continues from the beginning of the first log file.
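The two parameters can be checked directly on a running server; assuming a configuration of innodb_log_file_size = 500M and innodb_log_files_in_group = 2, they produce exactly the 1.0 GB layout shown in the listing above:

-- Size of each redo log file and the number of files in the log group.
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_log_files_in_group';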
2.2 The occurrence of I/O performance problems
The four identifier pointers maintained in the InnoDB Log Buffer memory space are the most important elements of the InnoDB database engine's log handling: Log sequence, Log flushed, Pages flushed and Last checkpoint. These four identifiers determine InnoDB's data-recovery strategy when it crashes and restarts, and they are also central to I/O performance optimization. They are four values drawn from a shared counter called the LSN (log sequence number), a 64-bit unsigned integer, and they represent the current state of InnoDB's transaction processing. Their values always satisfy the following relationship:
Log sequence >= Log flushed >= Pages flushed >= Last checkpoint
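The current values of these four pointers can be read from the LOG section of the InnoDB status report; the exact field labels vary somewhat between MySQL versions ("Pages flushed up to", for example, appears only in newer releases):

-- Look for the LOG section in the output:
--   Log sequence number, Log flushed up to,
--   Pages flushed up to, Last checkpoint at
SHOW ENGINE INNODB STATUS\G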
1. I/O performance issues (1)
Whenever InnoDB receives a complete insert/update transaction request, it generates a new LSN: new LSN = old LSN + size of the log written this time. The newest log record is marked by the Log sequence pointer. If several transaction requests arrive at once, InnoDB sorts their logs in a predetermined order and then generates the new LSNs one after another. This step is performed entirely in memory, so it causes no I/O performance problem.
2. I/O performance issues (2)
——1. Next, MySQL begins to carry out the detailed operations within the transaction. The InnoDB database engine has a memory area called the InnoDB Buffer Pool that is used for modifying and adding data. Its size is controlled by the innodb_buffer_pool_size parameter; its data comes from the InnoDB data files and is held in the Buffer Pool in the form of pages. When the log records an insert operation, a new page is generated; when the log records an update operation, InnoDB checks whether the affected data already exists in a page in the Buffer Pool. If it does (a hit), that page is updated directly; if it does not (a miss), the original data is first read from the InnoDB data files into the Buffer Pool and then updated.
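The size of this memory area, and a rough picture of how often page reads hit it rather than the data files, can be observed on a running server (Innodb_buffer_pool_read_requests counts logical page reads, while Innodb_buffer_pool_reads counts the misses that had to go to disk):

-- Buffer pool size in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Hit/miss picture: logical page reads vs. reads served from the data files.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';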
There are several issues to note here:
——2. Remember the "read-ahead" technique we mentioned when discussing disk devices? The idea behind it is that if data in a certain area is read and used, the adjacent areas are likely to be read and used in the near future. So, to improve read efficiency, the disk controller reads the target block together with several adjacent blocks. The InnoDB database engine applies the same idea: when it reads a page, it also reads the adjacent pages. Whether this actually improves I/O performance, however, depends on the operating environment (explained later).
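InnoDB's read-ahead behaviour can be inspected from the server itself: linear read-ahead is triggered once enough pages of an extent have been read sequentially, random read-ahead is off by default, and the two status counters below show how many pages were prefetched and how many of those were evicted without ever being used:

-- Read-ahead configuration.
SHOW VARIABLES LIKE 'innodb_read_ahead_threshold';
SHOW VARIABLES LIKE 'innodb_random_read_ahead';

-- Pages prefetched by read-ahead, and prefetched pages evicted unused.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_ahead';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_ahead_evicted';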
——3. When InnoDB finishes a data operation in the InnoDB Buffer Pool, the pages holding the changed data differ from the corresponding data stored on disk; such pages are called dirty pages. How dirty pages are handled is the key to maintaining data consistency. The InnoDB database engine's approach is to first write the log records of the transaction to the InnoDB File Log Group log files. There are three write strategies, controlled by the innodb_flush_log_at_trx_commit parameter:
When innodb_flush_log_at_trx_commit = 0, InnoDB writes the log records of all completed transactions to disk roughly once per second. "Written successfully" here does not mean merely written to the Linux operating system's page cache; it means waiting for the operating system to confirm that the data has actually reached the physical disk (see the earlier article on file systems for details). This means that even though the data operations in the InnoDB Buffer Pool have succeeded, an abnormal crash of the database can cost the business system up to the last second of written data: because those log records never reached the disk medium, the data cannot be recovered when the database restarts after the crash.
When innodb_flush_log_at_trx_commit = 1, InnoDB writes the transaction's log records to disk every time a log operation completes, that is, it writes the transaction log as each transaction is executed. Again, "written successfully" means waiting for the operating system to confirm that the data has actually reached the physical disk.
When innodb_flush_log_at_trx_commit = 2, InnoDB also writes the log records to disk each time a log operation completes, but it does not wait for the operating system to confirm that the write has reached the physical disk before continuing. In practice the data usually still sits in the operating system's cache at that moment, so in this mode it is better to use a file system with journaling support and to make sure journaling is enabled.
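A short sketch of how this strategy is typically inspected and switched at runtime (innodb_flush_log_at_trx_commit is a dynamic global variable, so no restart is needed):

-- Current flush strategy.
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- Trade durability for throughput: committed transactions from roughly the
-- last second may be lost if the operating system itself crashes before
-- the cached log reaches disk, as described above.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- Strictest durability: write and flush the redo log at every commit.
SET GLOBAL innodb_flush_log_at_trx_commit = 1;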
The last action the InnoDB database engine performs in this step is to advance the Log flushed identifier pointer to the LSN of the last transaction log record whose flush has just completed. Only after this step has been performed is a transaction operation truly successful.