No. 01 MySQL Architecture and History

Last Update:2018-06-08 Source: Internet

Author: User


MySQL's most important feature is its storage engine architecture, which separates query processing and other server tasks from data storage and retrieval.

1 MySQL Logical Architecture

The top layer handles client connections: it is mainly responsible for connection handling, authentication and authorization, security, and so on.

The middle layer is the core of MySQL, the server: it is mainly responsible for query parsing, analysis, optimization, caching, and all built-in functions. All functionality that spans storage engines is implemented at this level, such as stored procedures, triggers, and views.

The bottom layer consists of the storage engines, which are responsible for storing and retrieving the data held in MySQL. MySQL can use different storage engines, each with its own advantages and disadvantages. Storage engines do not communicate with one another; they only respond to requests from the server layer above. The server talks to the storage engines through an API whose interfaces hide the differences between engines, making them largely transparent to the query layer above.

1.1 Client

Each client connection gets its own thread within the server process, and the connection's queries execute only within that thread. The server caches threads, so they do not need to be created and destroyed for each new connection.

When a client connects to the MySQL server, the server authenticates it. Authentication is based on the username, the originating host, and the password; over a Secure Sockets Layer (SSL) connection, certificate authentication can also be used. Once the client has connected, the server verifies that it has the privileges required for each query it issues.

1.2 Optimization and Execution

MySQL parses queries to create an internal structure, the parse tree, and then applies a variety of optimizations. These can include rewriting the query, determining the order in which it will read tables, and choosing which indexes to use. You can influence the optimizer's decisions by passing hints through special keywords in the query. You can also ask the server to explain various aspects of the optimization, which lets you see how the server makes its decisions.
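As a rough illustration of hints and plan explanation, the sketch below uses EXPLAIN together with the STRAIGHT_JOIN and USE INDEX keywords. The customers and orders tables, their columns, and the index names are hypothetical and exist only for this example; the plan the server actually chooses depends on its version and on the data.

```sql
-- Hypothetical tables used only for illustration.
CREATE TABLE customers (
    id   INT PRIMARY KEY,
    name VARCHAR(50),
    KEY idx_name (name)
) ENGINE=InnoDB;

CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT,
    total       DECIMAL(10,2),
    KEY idx_customer (customer_id)
) ENGINE=InnoDB;

-- Ask the server to explain its plan: join order, chosen indexes, estimated rows.
EXPLAIN
SELECT c.name, o.total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
WHERE c.name = 'xiaoming';

-- Hint: join the tables in the order in which they are listed.
EXPLAIN
SELECT STRAIGHT_JOIN c.name, o.total
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
WHERE c.name = 'xiaoming';

-- Hint: restrict the optimizer to a named index on one table.
EXPLAIN
SELECT o.total
FROM orders AS o USE INDEX (idx_customer)
WHERE o.customer_id = 1;
```

Comparing the EXPLAIN output with and without a hint is usually the quickest way to see whether the hint changed the plan at all.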

The optimizer does not care which storage engine a table uses, but the storage engine does affect how the server optimizes a query. The optimizer asks the storage engine about its capabilities and the cost of specific operations, and for statistics on the table data.

Before even parsing a SELECT query, the server checks the query cache. If an identical query is found there, the server skips parsing, optimization, and execution entirely and returns the stored result set directly. (The query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0.)

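2 Concurrency Control

2.1 Read-Write Locks

Systems that handle concurrent reads and writes use two types of locks: shared locks and exclusive locks, also known as read locks and write locks. Read locks are shared, so many clients can read the same resource at the same time; a write lock is exclusive and blocks both read locks and other write locks.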

2.2 Lock Granularity

A locking strategy is a compromise between lock overhead and data safety. Most database systems give you little choice and simply apply row-level locking to tables.

The MySQL storage engine can implement its own lock policy and lock granularity.

Table locks: Table locking is the most basic locking strategy in MySQL and has the lowest overhead; it locks the entire table. Write locks have a higher priority than read locks, so a write lock request may be placed ahead of read lock requests already in the lock queue.

Row-level locks: Row locks allow the greatest concurrency but also carry the greatest locking overhead. Row-level locking is implemented only in the storage engines; the MySQL server layer does not implement it.

3 Transactions

A transaction is a group of SQL queries that are treated atomically, as a single unit of work. If the database engine can apply the entire group of queries to the database, it does so. If any of them cannot be executed because of a crash or for any other reason, none of them is applied.

In MySQL, transactions are implemented by the underlying storage engine.
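The all-or-nothing behavior can be sketched with a simple transfer between two rows. The bank_account table and its values are hypothetical, and InnoDB is assumed because the example needs a transactional engine.

```sql
-- Hypothetical table; InnoDB is assumed because it supports transactions.
CREATE TABLE bank_account (
    id      INT PRIMARY KEY,
    balance DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

INSERT INTO bank_account (id, balance) VALUES (1, 500.00), (2, 100.00);

-- Move 200.00 from account 1 to account 2 as one unit of work.
START TRANSACTION;
UPDATE bank_account SET balance = balance - 200.00 WHERE id = 1;
UPDATE bank_account SET balance = balance + 200.00 WHERE id = 2;
COMMIT;    -- both updates become permanent together

-- Abandon a unit of work part-way through.
START TRANSACTION;
UPDATE bank_account SET balance = balance - 50.00 WHERE id = 1;
ROLLBACK;  -- the update is discarded; account 1 still holds 300.00
```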

3.1 ACID

ACID stands for:

Atomicity: A transaction must be treated as a single indivisible unit of work. Either all of its operations are committed, or all of them fail and are rolled back; a transaction can never apply only a subset of its operations.
Consistency: The database always moves from one consistent state to another consistent state.
Isolation: The changes made by a transaction are usually invisible to other transactions until the transaction commits. This is not always the case; the degree of visibility is determined by the isolation level.
Durability: Once a transaction commits, its changes are persisted to the database and are not lost even if the system crashes. Durability is a somewhat fuzzy concept, because there are in fact different levels of durability.

The ACID properties guarantee that the data stays correct, but they are very difficult to achieve in practice. An ACID-compliant database does a great deal of complex work, largely invisible to the user, to provide these guarantees.

A database that implements ACID requires more CPU processing power, more memory, and more disk space than one that does not.

3.2 Isolation Level

The SQL standard defines four isolation levels:

READ UNCOMMITTED: Changes made in a transaction are visible to other transactions even before they are committed. Reading uncommitted data is called a dirty read. This level causes many problems and does not perform much better than the other levels, so it is seldom used.
READ COMMITTED (non-repeatable read): This is the default isolation level for most database systems, though not for MySQL. A transaction sees only changes made by transactions that have already committed, which solves the dirty read problem. However, running the same query twice in one transaction can return different results, because another transaction may modify the data and commit between the first and second query.
REPEATABLE READ: This level solves the non-repeatable read problem, but it cannot prevent phantom reads. A phantom read occurs when one transaction modifies all rows in a range while another transaction inserts new rows into that range; when the first transaction reads the range again, it sees phantom rows. Put simply, it believed it had modified everything, yet a later read finds rows that were not modified.
SERIALIZABLE: The highest isolation level. It forces transactions to execute serially, which avoids the phantom reads described above. This level is rarely used in practice.

In MySQL, you can change the isolation level with a statement such as SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
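A minimal sketch of a non-repeatable read under READ COMMITTED follows. It needs two separate connections, labeled session A and session B in the comments; the demo_balance table is hypothetical. The variable name @@transaction_isolation applies to MySQL 5.7.20 and later; older servers expose the same setting as @@tx_isolation.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE demo_balance (
    id      INT PRIMARY KEY,
    balance INT NOT NULL
) ENGINE=InnoDB;
INSERT INTO demo_balance VALUES (1, 100);

-- Session A: inspect and then change the session's isolation level.
SELECT @@transaction_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Session A: first read inside a transaction.
START TRANSACTION;
SELECT balance FROM demo_balance WHERE id = 1;

-- Session B (a separate connection, autocommit on) runs in between:
--   UPDATE demo_balance SET balance = balance + 10 WHERE id = 1;

-- Session A: the same query again. Under READ COMMITTED it sees B's
-- committed change; under REPEATABLE READ it would see the original value.
SELECT balance FROM demo_balance WHERE id = 1;
COMMIT;
```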

3.3 Deadlocks

A deadlock occurs when two or more transactions each hold a lock on a resource and request a lock on a resource held by the other, so that none of them can obtain the resources it needs.

To deal with this problem, database systems implement various forms of deadlock detection and deadlock timeouts. The more sophisticated storage engines can detect the circular dependency that forms a deadlock and return an error immediately. InnoDB currently handles a deadlock by rolling back the transaction that holds the fewest row-level exclusive locks.
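The circular wait can be reproduced with two connections that update the same two rows in opposite order. This is only a sketch: the stock_price table and its values are hypothetical, and the statements must be interleaved across two sessions in the order shown.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE stock_price (
    stock_id    INT,
    day         DATE,
    close_price DECIMAL(8,2),
    PRIMARY KEY (stock_id, day)
) ENGINE=InnoDB;
INSERT INTO stock_price VALUES (3, '2002-05-02', 19.00), (4, '2002-05-01', 45.00);

-- Session 1: locks the row for stock 4.
START TRANSACTION;
UPDATE stock_price SET close_price = 45.50 WHERE stock_id = 4 AND day = '2002-05-01';

-- Session 2: locks the row for stock 3.
START TRANSACTION;
UPDATE stock_price SET close_price = 19.80 WHERE stock_id = 3 AND day = '2002-05-02';

-- Session 1: blocks, waiting for session 2's lock on stock 3.
UPDATE stock_price SET close_price = 20.12 WHERE stock_id = 3 AND day = '2002-05-02';

-- Session 2: needs session 1's lock on stock 4, which closes the cycle.
-- InnoDB detects the deadlock and rolls one transaction back with error 1213
-- (Deadlock found when trying to get lock; try restarting transaction).
UPDATE stock_price SET close_price = 47.20 WHERE stock_id = 4 AND day = '2002-05-01';

-- Either session: the status output includes a section on the latest detected deadlock.
SHOW ENGINE INNODB STATUS;
```

Applications are generally expected to catch the deadlock error and retry the rolled-back transaction.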

3.4 Transaction Log

The transaction log helps make transactions more efficient. With a transaction log, the storage engine only needs to change its in-memory copy of the data when a modification is made, and then record the change in the transaction log on disk, instead of persisting the modified data itself to disk every time.

The transaction log is append-only, so writing it involves sequential I/O within a small area of the disk rather than random I/O in many places. Once the log record has been persisted, the modified data in memory can be flushed back to disk gradually in the background. This technique is called write-ahead logging, and it means that each modification is ultimately written to disk twice.
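For InnoDB specifically, how eagerly the transaction log is flushed at commit is configurable, which is one place where the different levels of durability become visible. The sketch below only inspects the setting; the comments summarize the commonly documented behavior and should be checked against the manual for the server version in use.

```sql
-- Show how InnoDB flushes its transaction (redo) log when a transaction commits.
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- 1 (the default): the log is written and flushed to disk at every commit.
-- 0 or 2:          the log is flushed to disk roughly once per second, which is
--                  faster but can lose about a second of committed transactions
--                  in a crash.
```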

3.5 MySQL Transaction

There are two storage engines in MySQL that provide transactions: InnoDB and NDB Cluster.

The features of the MySQL transaction are:

3.5.1 Autocommit

MySQL operates in autocommit mode by default, which means that each query is committed as its own transaction unless you explicitly begin one. You can check the current setting with SHOW VARIABLES LIKE 'autocommit'; and enable or disable it with SET autocommit = 1; (1 is on, 0 is off). The setting has no effect on tables that use a non-transactional storage engine.

When autocommit is off, all queries run in a single transaction until you explicitly issue COMMIT or ROLLBACK, at which point a new transaction begins.
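A short session shows the effect of turning autocommit off. The autocommit_demo table is hypothetical, and InnoDB is assumed so that ROLLBACK has something to undo.

```sql
-- Hypothetical table; InnoDB is assumed so that ROLLBACK has an effect.
CREATE TABLE autocommit_demo (id INT PRIMARY KEY) ENGINE=InnoDB;

SHOW VARIABLES LIKE 'autocommit';        -- ON by default

SET autocommit = 0;                      -- turn autocommit off for this session
INSERT INTO autocommit_demo VALUES (1);
ROLLBACK;                                -- discards the insert of id = 1
INSERT INTO autocommit_demo VALUES (2);
COMMIT;                                  -- makes the insert of id = 2 permanent

SET autocommit = 1;                      -- back to one transaction per statement
SELECT id FROM autocommit_demo;          -- returns only the row with id = 2
```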

3.5.2 Mixing Storage Engines in Transactions

MySQL transactions are implemented by the underlying storage engines, not by the server layer. It is therefore unreliable to mix multiple storage engines in a single transaction.

If a transaction mixes transactional and non-transactional tables, it works properly as long as it commits normally. If a rollback is required, however, the changes to the non-transactional tables cannot be undone.
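The following sketch makes the problem concrete with one InnoDB table and one MyISAM table. Both table names are hypothetical, and default server settings are assumed (some replication configurations reject such mixed transactions outright).

```sql
-- Hypothetical tables: one transactional, one non-transactional.
CREATE TABLE txn_orders   (id INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE nontxn_audit (id INT PRIMARY KEY) ENGINE=MyISAM;

START TRANSACTION;
INSERT INTO txn_orders   VALUES (1);
INSERT INTO nontxn_audit VALUES (1);
ROLLBACK;

-- The server typically warns that some non-transactional changed tables
-- could not be rolled back.
SHOW WARNINGS;

SELECT COUNT(*) FROM txn_orders;    -- 0: the InnoDB insert was undone
SELECT COUNT(*) FROM nontxn_audit;  -- 1: the MyISAM insert survived the rollback
```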

3.5.3 Implicit and Explicit Locking

InnoDB uses a two-phase locking protocol. It can acquire locks at any time during a transaction, but it releases them only when the transaction executes COMMIT or ROLLBACK, and it releases all of them at the same time. This locking is implicit and automatic.

InnoDB also supports explicit locking: append LOCK IN SHARE MODE or FOR UPDATE to a SELECT statement to acquire a read lock or a write lock on the rows it reads.

MySQL also implements the LOCK TABLES and UNLOCK TABLES statements at the server level, independently of the storage engines.
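The explicit forms can be sketched as follows. The lock_demo table is hypothetical; the row locks last until the enclosing transaction ends, while the table lock lasts until UNLOCK TABLES.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE lock_demo (id INT PRIMARY KEY, qty INT NOT NULL) ENGINE=InnoDB;
INSERT INTO lock_demo VALUES (1, 10);

-- Shared (read) row lock: other sessions can still read the row,
-- but cannot modify it until this transaction ends.
START TRANSACTION;
SELECT qty FROM lock_demo WHERE id = 1 LOCK IN SHARE MODE;
COMMIT;

-- Exclusive (write) row lock: taken before a read-then-update sequence.
START TRANSACTION;
SELECT qty FROM lock_demo WHERE id = 1 FOR UPDATE;
UPDATE lock_demo SET qty = qty - 1 WHERE id = 1;
COMMIT;

-- Server-level table lock, independent of the storage engine.
LOCK TABLES lock_demo WRITE;
UPDATE lock_demo SET qty = 0 WHERE id = 1;
UNLOCK TABLES;
```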

4 Multi-Version Concurrency Control

Multi-version concurrency control (MVCC) can be thought of as a variant of row-level locking that avoids locking operations in many cases and therefore has lower overhead.

MVCC works by keeping a snapshot of the data as it existed at a point in time. This means that a transaction sees a consistent view of the data no matter how long it runs. It also means that, depending on when they started, different transactions can see different data in the same table at the same time.

Each storage engine implements MVCC differently. Typical variants include optimistic and pessimistic concurrency control.

InnoDB implements MVCC by storing two hidden columns with each row. One column records when the row was created, and the other records when it expired (was deleted). Rather than actual times, these columns store system version numbers. The system version number is incremented automatically each time a new transaction begins, and its value at the start of a transaction serves as that transaction's version number, which is compared against the version numbers of the rows the transaction queries.

SELECT: InnoDB returns only rows that meet two criteria. First, the row's creation version must be less than or equal to the transaction's version number, which ensures that the row either existed before the transaction began or was inserted or modified by the transaction itself. Second, the row's deletion version must be undefined or greater than the transaction's version number, which ensures that the row was not deleted before the transaction began. Only rows that meet both criteria are returned as results.

INSERT: Saves the current system version number as the creation version of each new row.

DELETE: Saves the current system version number as the deletion version of each deleted row.

UPDATE: Inserts a new copy of the row with the current system version number as its creation version, and at the same time saves the current system version number as the deletion version of the old row.

Storing these two extra version numbers means that most reads never need to acquire locks. Reads stay simple and fast, and they are guaranteed to return only rows that meet the criteria above. (A single logical row may therefore be stored as several row versions, each a snapshot from a different time.) The drawback is that every row requires extra storage space.

MVCC works only with the REPEATABLE READ and READ COMMITTED isolation levels. READ UNCOMMITTED always reads the newest row version rather than one that matches the transaction's version, and SERIALIZABLE locks every row it reads.
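The visibility rules can be modeled with an ordinary table that makes the two version columns explicit. This is only a teaching model under the simplified description above, not InnoDB's actual on-disk format; the mvcc_model table, its columns, and the @txn_ver variable are all hypothetical.

```sql
-- Teaching model only: an ordinary table with explicit version columns.
CREATE TABLE mvcc_model (
    id          INT NOT NULL,
    val         VARCHAR(20),
    created_ver INT NOT NULL,   -- system version when this row version was created
    deleted_ver INT NULL        -- system version when it expired; NULL = still live
);

-- History: version 1 inserts id 1, version 2 inserts id 2,
-- version 3 updates id 1 (old copy expires, new copy is created),
-- version 4 deletes id 2.
INSERT INTO mvcc_model VALUES
    (1, 'a',  1, 3),
    (2, 'b',  2, 4),
    (1, 'a2', 3, NULL);

-- What a transaction that started at version 2 sees: (1, 'a') and (2, 'b').
SET @txn_ver = 2;
SELECT id, val
FROM mvcc_model
WHERE created_ver <= @txn_ver
  AND (deleted_ver IS NULL OR deleted_ver > @txn_ver)
ORDER BY id;

-- What a transaction that started at version 4 sees: only (1, 'a2').
SET @txn_ver = 4;
SELECT id, val
FROM mvcc_model
WHERE created_ver <= @txn_ver
  AND (deleted_ver IS NULL OR deleted_ver > @txn_ver)
ORDER BY id;
```

Both transactions read the same table at the same time without locks, yet each sees the state that was current when it began.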

5 MySQL Storage Engines

5.1 Table Information

Use SHOW TABLE STATUS LIKE 'table_name' \G to display information about a table. (The \G terminator prints each field on its own line instead of in tabular form.)

The fields in the output have the following meanings:

Name: Table name

Engine: Storage Engine Type

Version: The version number of the table's .frm file

Row_format: The row format. Possible values include Dynamic, Fixed, and Compressed. Dynamic rows vary in length because they usually contain variable-length fields, Fixed rows always have the same length, and Compressed rows exist only in compressed tables

Rows: The number of rows in the table, for InnoDB the value is an estimate

Avg_row_length: Average number of bytes per row

Data_length: The data size of the table, in bytes

Max_data_length: The maximum capacity of the table, which is related to the storage engine

Index_length: The size of the indexes, in bytes

Data_free: For a MyISAM table, the space that has been allocated but is not currently used

Auto_increment: Next auto_increment value

Create_time: When the table was created

Update_time: Last Modified time of table

Check_time: When the table was last checked with the CHECK TABLE command or the myisamchk tool

Collation: The default character set and collation of the table

Checksum: A live checksum of the table's contents, if one is maintained

Create_options: Additional options specified when the table was created

Comment: Contains additional information that varies by storage engine; for InnoDB it reports the free space remaining in the tablespace
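The same information can be read either from SHOW TABLE STATUS or from the information_schema.TABLES view, which is easier to filter and join. In this sketch the table name account is only a placeholder, and \G is a feature of the mysql command-line client rather than of the server.

```sql
-- One field per line, in the mysql command-line client.
SHOW TABLE STATUS LIKE 'account'\G

-- The same fields as ordinary columns, restricted to the current database.
SELECT TABLE_NAME, ENGINE, ROW_FORMAT, TABLE_ROWS,
       AVG_ROW_LENGTH, DATA_LENGTH, INDEX_LENGTH, DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'account';
```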

5.2 InnoDB Storage Engine

InnoDB is the default transactional engine for MySQL and the most widely used storage engine.

It is designed to handle a large number of short-lived transactions, most of which commit normally rather than roll back.

InnoDB stores its data in a tablespace, a black box managed by InnoDB that consists of a series of data files. InnoDB can also place each table's data and indexes in a separate file.

InnoDB uses MVCC to support high concurrency and implements all four standard isolation levels. Its default level is REPEATABLE READ, and it uses gap locks to prevent phantom reads. With gap locking, InnoDB locks not only the rows involved in a query but also the gaps in the index structure, which prevents phantom rows from being inserted. (A gap can be pictured as the interval between adjacent entries in the B-tree index.)

InnoDB tables are built on a clustered index. Clustered indexes make primary key lookups very fast, but every secondary index (that is, every non-primary-key index) must contain the primary key columns, so a large primary key makes all the other indexes large as well. If a table has many indexes, the primary key should therefore be as small as possible.
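One common way to follow the advice about small primary keys is to use a compact surrogate key and index the wider columns separately. The page_visit table below is hypothetical and is only a sketch of that design choice.

```sql
-- Hypothetical table: a compact surrogate primary key keeps secondary indexes small,
-- because every entry in idx_url and idx_visited also stores the primary key value.
CREATE TABLE page_visit (
    id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    url        VARCHAR(255) NOT NULL,
    visited_at DATETIME NOT NULL,
    KEY idx_url (url),
    KEY idx_visited (visited_at)
) ENGINE=InnoDB;
```

Had (url, visited_at) been chosen as the primary key instead, each secondary index entry would have to carry both of those columns.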

5.3 MyISAM Storage Engine

MyISAM provides features such as full-text indexing, compression, and spatial (GIS) functions, but it does not support transactions or row-level locks, and it cannot recover safely after a crash.

MyISAM stores each table in two files, a data file and an index file, with the extensions .MYD and .MYI respectively. MyISAM tables can contain either dynamic or static (fixed-length) rows.

Features of MyISAM:

Locking and concurrency: MyISAM locks entire tables, not rows.

Repair: Repairing a table can cause some data to be lost, and the repair operation is very slow.

Indexes: Long fields such as BLOB and TEXT columns can be indexed on their first 500 characters.

Delayed key writes: When a table is created with the DELAY_KEY_WRITE option, modified index data is not written to disk immediately after each change, but to a key buffer in memory.

Tables that are never modified after their data is loaded are good candidates for MyISAM compressed tables. Use the myisampack tool to compress a MyISAM table. A compressed table cannot be modified unless it is decompressed first. Compressed tables greatly reduce disk usage, which also reduces disk I/O and improves query performance.
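Several of these MyISAM features map directly onto SQL syntax, as the sketch below shows. The myisam_demo table is hypothetical, and the 100-character prefix is an arbitrary choice that stays within MyISAM's key length limit even with multi-byte character sets.

```sql
-- Hypothetical MyISAM table with a prefix index and delayed key writes.
CREATE TABLE myisam_demo (
    id   INT PRIMARY KEY AUTO_INCREMENT,
    body TEXT,
    KEY idx_body (body(100))      -- index only the first 100 characters of body
) ENGINE=MyISAM DELAY_KEY_WRITE=1;

INSERT INTO myisam_demo (body) VALUES ('first entry'), ('second entry');

-- Check the table for errors, then repair it if the check reports problems.
CHECK TABLE myisam_demo;
REPAIR TABLE myisam_demo;
```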

6 Engine Selection

Unless you need a feature that InnoDB does not provide, use InnoDB.

When choosing a storage engine, consider the following factors:

Transactions: InnoDB supports transactions.
Backup: InnoDB supports online hot backup.
Crash recovery
Other features

Recommendations for specific scenarios:

Logging applications: Insert speed is the main requirement, so MyISAM or Archive is a better fit because both have low overhead and fast inserts. When the logged data needs to be analyzed, it can be copied elsewhere and processed on the copy to avoid affecting the main database.
Read-only or mostly read-only tables: InnoDB is recommended.
Order processing: InnoDB.
Bulletin boards and forums: Read and write load is heavy, and data consistency is required.

7 Converting a Table's Storage Engine

When you create a table in MySQL without specifying an ENGINE option, the InnoDB storage engine is used by default.

There are three ways to modify the storage engine for a table.

7.1 ALTER TABLE

The simplest way to move a table from one storage engine to another is with an ALTER TABLE statement:

ALTER TABLE account ENGINE=MyISAM;

MySQL copies the data row by row from the original table into a new table. The copy generates a large amount of I/O, and the original table is read-locked while it runs.

Converting between storage engines can also lose engine-specific features; foreign keys, for example, may disappear.

7.2 Import and Export

To gain more control over the conversion, you can use the mysqldump tool to export the data to a file and then edit the file: change the ENGINE option in the CREATE TABLE statement, and change the table name as well, because two tables in the same database cannot share a name. Note that mysqldump by default writes a DROP TABLE statement before each CREATE TABLE statement, which would delete the original table if left in place.

For example:

The statements that create the database and the table are:

DROP DATABASE IF EXISTS employ;
CREATE DATABASE employ;
USE employ;
CREATE TABLE account(
    id int PRIMARY KEY AUTO_INCREMENT,
    name varchar(10)
) engine=myisam;
INSERT employ.account (name) VALUES ('xiaoming'),('xiaohong'),('xiaozhang');

Then use the mysqldump tool:

mysqldump -uroot -p employ

By default the dump is written to the terminal, so redirect it to a file, for example by appending > employ.sql to the command.

Invoked this way, mysqldump exports the entire database. To export a single table, name it after the database, as in mysqldump -uroot -p employ account.

7.3 Create and Select

This approach combines elements of the first two.

Use CREATE TABLE tb_name LIKE old_tb to create a new table with the same structure as the original, use an ALTER TABLE statement to change the new table's storage engine, and then use INSERT INTO tb_name SELECT * FROM old_tb to copy all the data across.
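Applied to the account table from the earlier example, the three steps might look like the sketch below. The name account_innodb is hypothetical, and the final RENAME TABLE step is an optional addition that is not part of the method described above.

```sql
-- 1. Create an empty table with the same structure as the original.
CREATE TABLE account_innodb LIKE account;

-- 2. Switch the empty table to the target engine (cheap, because it holds no rows).
ALTER TABLE account_innodb ENGINE=InnoDB;

-- 3. Copy the data across.
INSERT INTO account_innodb SELECT * FROM account;

-- For a large table, the copy could instead be done in primary key ranges,
-- committing after each chunk, for example:
--   START TRANSACTION;
--   INSERT INTO account_innodb SELECT * FROM account WHERE id BETWEEN 1 AND 1000;
--   COMMIT;

-- Optional: swap the tables once the copy has been verified.
RENAME TABLE account TO account_old, account_innodb TO account;
```

Because the original table is left untouched until the optional swap, the conversion can be checked, or abandoned, without any risk to the source data.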

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
