In-Depth Analysis of the MySQL Architecture

Source: Internet
Author: User

MySQL is composed of the following components: the connection pool, management services and utilities, the SQL interface, the parser, the optimizer, caches and buffers, pluggable storage engines, and the physical files.
MySQL's pluggable storage engine architecture is unique; each storage engine has its own characteristics.





MySQL storage engine overview:
InnoDB storage engine: designed for OLTP (online transaction processing); row-level locking; foreign key support; consistent non-locking reads; REPEATABLE READ isolation by default, using the next-key locking strategy to avoid phantom reads; insert buffer, doublewrite, adaptive hash index, and read-ahead.
MyISAM storage engine: no transaction support; table-level locking; full-text indexes; suited to OLAP (online analytical processing) workloads. Each table has a .MYD file holding the data and a .MYI file holding the indexes.
NDB storage engine: a shared-nothing cluster storage engine that improves availability.
Memory storage engine: data is stored in memory; table-level locking; poor concurrency; hash indexes by default.
Archive storage engine: supports only INSERT and SELECT; uses zlib compression, making it suitable for archived data such as logs; row-level locking.
Maria storage engine: intended as a replacement for MyISAM; caches both data and indexes; row-level locking; MVCC.





InnoDB features:
Thread structure: 7 background threads by default: 4 I/O threads (insert buffer, log, read, write), 1 master thread (highest priority), 1 lock monitor thread, and 1 error monitor thread. They can be inspected with SHOW ENGINE INNODB STATUS. Newer versions raise the default number of read and write I/O threads to 4 each, visible through SHOW VARIABLES LIKE 'innodb_%io_threads'.
Memory structure: the buffer pool, the redo log buffer, and the additional memory pool. Their sizes can be checked with SHOW VARIABLES LIKE 'innodb_buffer_pool_size', SHOW VARIABLES LIKE 'innodb_log_buffer_size', and SHOW VARIABLES LIKE 'innodb_additional_mem_pool_size'.
Buffer pool: the largest block of memory, caching index pages, data pages, undo pages, the insert buffer, the adaptive hash index, InnoDB lock information, and data dictionary information. InnoDB always reads database files into the buffer pool page by page (16 KB per page), then keeps cached pages according to a least-recently-used (LRU) algorithm. When a database file must be modified, the page is always modified in the buffer pool first (becoming a dirty page), and dirty pages are flushed to the file at a certain frequency. The buffer pool state can be observed with SHOW ENGINE INNODB STATUS.
Redo log buffer: redo log records are placed in this buffer first and then flushed to the redo log files at a certain frequency.
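As a quick sketch, the memory settings and status described above can be inspected from any client session (the values shown will differ per server, and innodb_additional_mem_pool_size was removed in newer MySQL versions):

```sql
-- Sizes of the three InnoDB memory structures, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_log_buffer_size';
SHOW VARIABLES LIKE 'innodb_additional_mem_pool_size';  -- absent on newer servers

-- The "BUFFER POOL AND MEMORY" section of this output reports total pages,
-- dirty pages, and LRU activity at the moment the command runs.
SHOW ENGINE INNODB STATUS\G
```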




Master thread:
Operations in the main loop every second:
Flush the log buffer to disk, even if the transaction has not yet committed (always; this is why even a large transaction commits quickly)
Merge the insert buffer (only if there were fewer than 5 I/O operations in the last second)
Flush up to 100 dirty pages from the buffer pool to disk (only if the proportion of dirty pages in the buffer pool exceeds the threshold set by innodb_max_dirty_pages_pct in the parameter file; the default is 90, newer versions use 75, and Google recommends 80)
Switch to the background loop if there is no user activity

Operations in the main loop every 10 seconds:
Flush 100 dirty pages to disk (only if there were fewer than 200 I/O operations in the past 10 seconds)
Merge up to 5 insert buffer pages (always)
Flush the log buffer to disk (always)
Delete useless undo pages (always)
Flush 100 or 10 dirty pages to disk (if more than 70% of pages are dirty, flush 100; otherwise flush 10)
Generate a checkpoint

Background loop, entered when there is no user activity (the database is idle) or the database is being shut down:
Delete useless undo pages (always)
Merge 20 insert buffer pages (always)
Jump back to the main loop (always)
Flush 100 pages at a time until the dirty-page condition is met (may be completed in the flush loop)

If there is nothing to do in the flush loop, the InnoDB storage engine switches to the suspend loop and suspends the master thread until an event occurs. If InnoDB is enabled but no InnoDB tables are used, the master thread stays suspended.

Insert Buffer: the insert buffer is not part of the buffer pool; it is a physical page in its own right, and it exists to improve InnoDB performance. By the nature of the B+ tree (described below), inserts in primary key order are sequential, so the database does not read pages randomly. For non-clustered (secondary) indexes, however, leaf-node inserts are no longer sequential, and the resulting discrete access to non-clustered index pages degrades insert performance. InnoDB therefore introduced the insert buffer: it first checks whether the non-clustered index page is already in the buffer pool. If it is, the insert happens directly; if not, the change is placed in the insert buffer, and the master thread described above merges the insert buffer at a certain frequency. One restriction: the secondary index cannot be unique, because no index page lookup is performed when inserting into the insert buffer; a uniqueness check would reintroduce exactly the random reads the insert buffer exists to avoid. The insert buffer can consume memory inside the buffer pool, by default up to 1/2 of it; this can be lowered to 1/3 via IBUF_POOL_SIZE_PER_MAX_SIZE (2 means 1/2, 3 means 1/3).
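A minimal sketch of the case the insert buffer targets; the table and column names here are hypothetical:

```sql
-- Inserts arrive in primary-key order (sequential on the clustered index)
-- but in effectively random order on the secondary index. The secondary
-- index must be non-unique for the insert buffer to apply.
CREATE TABLE orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT,
  KEY idx_customer (customer_id)   -- non-unique secondary index
) ENGINE=InnoDB;

-- The "INSERT BUFFER AND ADAPTIVE HASH INDEX" section of this output
-- shows the insert buffer's size, free list length, and merge counts.
SHOW ENGINE INNODB STATUS\G
```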

Doublewrite: the doublewrite buffer provides reliability for InnoDB data. After a write failure, you would normally recover by replaying the redo log; but the redo log records physical operations on a page, so if the page itself was only partially written and is corrupted, there is nothing valid to redo against. A copy of the page is therefore taken before the redo log is applied: when a write failure occurs, the page is first restored from its copy, and the redo log is then replayed. This is the doublewrite mechanism.
Data Recovery = page copy + redo log





Adaptive hash index: the InnoDB storage engine monitors lookups against the indexes on each table, and if it observes that building a hash index would speed them up, it creates one automatically; hence "adaptive". The adaptive hash index can only serve equality lookups, for example SELECT * FROM table WHERE index_col = '***'. It is managed entirely by the InnoDB storage engine; all we can do is enable or disable it through innodb_adaptive_hash_index, which is ON by default.
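The switch can be checked and flipped as sketched below; whether SET GLOBAL works at runtime depends on the server version (older builds require setting it in the parameter file and restarting):

```sql
-- ON by default; it can only be enabled or disabled, not tuned.
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
SET GLOBAL innodb_adaptive_hash_index = OFF;

-- Only equality lookups of this shape can benefit from the hash index;
-- range predicates always go through the B+ tree:
-- SELECT * FROM some_table WHERE index_col = 'value';
```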

MySQL files
Parameter file: tells the MySQL instance at startup where to find the database files and specifies initialization parameters, such as the sizes of various memory structures. It is stored as a plain file and can be edited directly; if it cannot be loaded at startup, the instance fails to start (unlike some other databases). Parameters are either dynamic or static: static parameters are effectively read-only, while dynamic ones can be changed at runtime. For example, a key/value pair found through SHOW VARIABLES LIKE '...' can be changed directly with SET key = value. Changes also have a scope: valid for this session only, or globally. Prefix the key with session or global accordingly, e.g. SELECT @@session.read_buffer_size, SET @@global.read_buffer_size.
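A short sketch of the two scopes; read_buffer_size is just one example of a dynamic parameter:

```sql
-- Read the value at each scope.
SELECT @@session.read_buffer_size, @@global.read_buffer_size;

-- Without a scope keyword, SET changes only the current session.
SET read_buffer_size = 524288;

-- GLOBAL affects sessions opened afterwards, and is lost on restart
-- unless the parameter file is updated as well.
SET GLOBAL read_buffer_size = 524288;
```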
Log files: record what a MySQL instance writes out when it responds to certain conditions, e.g. the error log, the binary log, the slow query log, and the general query log.
Error log: use SHOW VARIABLES LIKE 'log_error' to find where the error log is stored.
Slow query log: use SHOW VARIABLES LIKE '%long%' to view the threshold above which a statement is recorded in the slow query log (set to 0.05 here on the newer version). Use SHOW VARIABLES LIKE 'log_slow_queries' to check whether the slow log is enabled; it is disabled by default. SHOW VARIABLES LIKE 'log_queries_not_using_indexes' shows whether statements that use no index are also logged as slow. MySQL ships the mysqldumpslow command for viewing and summarizing slow logs.
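Sketch of checking the slow-log settings; the variable names differ across versions, as noted above:

```sql
SHOW VARIABLES LIKE 'long_query_time';                -- threshold in seconds
SHOW VARIABLES LIKE 'log_slow_queries';               -- older versions
SHOW VARIABLES LIKE 'slow_query_log';                 -- newer versions
SHOW VARIABLES LIKE 'log_queries_not_using_indexes';  -- also log index-less statements
```

From the shell, mysqldumpslow then summarizes the resulting log file.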
Binary log: records all modifications to the database, but not queries. Its purposes are recovery (point-in-time recovery) and replication. Use SHOW VARIABLES LIKE 'datadir' to view the storage path. The binary log supports three formats, STATEMENT, ROW, and MIXED, set by the binlog_format parameter. It is usually set to ROW, which improves the reliability of recovery and replication but increases the size of the binary log files and the network overhead during replication. MySQL ships mysqlbinlog for viewing binary log files.
Socket file:Files required for connection using Unix domain sockets.
Pid file:The process ID file of the Mysql instance.
MySQL table structure files: store the table structure definitions. Because of MySQL's pluggable storage engine architecture, every table has a corresponding file ending with the .frm suffix.
Storage engine files: each storage engine has its own files to hold its various kinds of data; this is where the data and indexes actually live. The following describes InnoDB's tablespace files and redo log files.
Tablespace files: InnoDB's default tablespace file is ibdata1. Use SHOW VARIABLES LIKE 'innodb_file_per_table' to check whether each table gets its own .ibd tablespace file. Note that a per-table tablespace file stores only that table's data, indexes, and insert buffer bitmap; the remaining information is still kept in the default (shared) tablespace.
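To check and, on versions where the variable is dynamic, enable per-table tablespaces; only tables created afterwards get their own .ibd file:

```sql
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- Dynamic from the InnoDB Plugin onwards; older built-in InnoDB must set
-- this in the parameter file and restart the server.
SET GLOBAL innodb_file_per_table = ON;
```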





Redo log files: used when the instance or the medium fails. For example, if the server loses power, the InnoDB storage engine uses the redo log to restore the data to the moment before the power loss, guaranteeing data integrity. The innodb_log_file_size parameter sets the size of each redo log file; innodb_log_files_in_group sets the number of redo log files in a log group (default 2); innodb_mirrored_log_groups sets the number of mirrored log groups (default 1, meaning a single log group with no mirroring); innodb_log_group_home_dir sets the path of the log group, the data directory by default.

Differences between the binary log and the redo log: first, the binary log records all changes to the MySQL server, including those of other storage engines such as InnoDB, MyISAM, and Heap, while InnoDB's redo log stores only transaction log records for InnoDB itself. Second, whatever its format (STATEMENT, ROW, or MIXED), the binary log records the logical operations of a transaction, while InnoDB's redo log records the physical changes made to each page. Finally, the binary log is written once, before the transaction commits, while redo log entries are written continuously into the redo log file as the transaction proceeds.

MySQL InnoDB tables
Tablespace: the tablespace is the highest level of InnoDB's logical storage structure.
Segment: tablespaces are composed of segments; common segments include data segments, index segments, and rollback segments.
Extent: composed of 64 consecutive pages. Each page is 16 KB, so each extent is 1 MB.
Page: 16 KB each, and the size cannot be changed. Common page types include: data pages, undo pages, system pages, transaction system pages, insert buffer bitmap pages, insert buffer free list pages, uncompressed binary large object (BLOB) pages, and compressed BLOB pages.
Row: the InnoDB storage engine is row-oriented; each page can store at most 7992 rows of data.
Row record formats: the two common formats are Compact and Redundant; from MySQL 5.1 on, Compact is the main format. Under Compact, NULL values of both CHAR and VARCHAR columns occupy no storage space. Under Redundant, a NULL VARCHAR occupies no space, but a NULL CHAR does occupy storage space.

The VARCHAR type is limited to 65,535 bytes, but that limit cannot actually be reached: there is other overhead, so the practical maximum is around 65,530, and it also depends on the chosen character set. Moreover, the limit applies to the whole row; for example, CREATE TABLE test (a VARCHAR(22000), b VARCHAR(22000), c VARCHAR(22000)) CHARSET=latin1 ENGINE=InnoDB also fails with an error.
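The row-length limit can be demonstrated directly; the exact usable maximum varies with version and character set, so treat the numbers as illustrative:

```sql
-- Fails with "Row size too large": the three columns together exceed
-- the 65,535-byte row limit.
CREATE TABLE test (
  a VARCHAR(22000),
  b VARCHAR(22000),
  c VARCHAR(22000)
) CHARSET=latin1 ENGINE=InnoDB;

-- A single column just under the per-row limit succeeds with latin1.
CREATE TABLE test2 (
  a VARCHAR(65530)
) CHARSET=latin1 ENGINE=InnoDB;
```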

For BLOB data (and VARCHAR values that overflow the page), the data page stores only the first 768 bytes of prefix data, followed by an offset pointing to the row overflow page, i.e. the Uncompressed BLOB Page. The InnoDB Plugin introduced a new file format named Barracuda, which has two new row record formats: Compressed and Dynamic. Both use complete overflow for BLOB fields: the data page stores only a 20-byte pointer, and the actual data lives entirely in BLOB pages.





Data page structure: an InnoDB data page consists of the following seven parts:
File Header: records header information about the page, such as the page's offset in the tablespace, the previous and next pages, and the page type; fixed length of 38 bytes.

Page Header: records the status of the page: the number of records in the heap, the pointer to the free list, the number of deleted bytes, the position of the last insert, and so on; fixed length of 56 bytes.

Infimum + Supremum records: in the InnoDB storage engine, every data page contains two virtual row records that bound the records on the page. The Infimum record is smaller than any primary key on the page, and the Supremum record is greater than any possible value. Both are created when the page is created and are never deleted under any circumstances. They occupy different numbers of bytes under Compact and Redundant.



User Records (row records): the actual record content. Once again, InnoDB storage engine tables are always organized as B+ tree indexes.
Free Space: the free space on the page, organized as a linked list; when a record is deleted, its space is added to the free list.

Page Directory: stores the relative positions of records within the page, not their offsets. Its entries are called slots. InnoDB does not give every record its own slot; the page directory is sparse, so one slot may cover several records, at least four and at most eight. Note that the B+ tree index itself does not locate a specific record; it only locates the page the record is on. The database then loads that page into memory and runs a binary search through the Page Directory. Because an in-memory binary search is very cheap, this lookup cost is usually ignored.

File Trailer: used to verify that the page was written to disk completely (guarding against, for example, disk damage or a crash during the write); 8 bytes long.

Views: views in MySQL are always virtual tables; materialized views are not supported. However, other techniques (such as triggers) can emulate some simple materialized-view functionality.

Partition:Mysql databases support RANGE, LIST, HASH, KEY, and COLUMNS partitions, and can use HASH or KEY for subpartitions.

MySQL InnoDB common indexes and algorithms:
B+ tree index: the B+ tree data structure is relatively complex. The B stands for "balanced": it evolved from the balanced binary tree, but a B+ tree is not a binary tree. A more detailed introduction can be found in this article: http://blog.csdn.net/v_JULY_v/article/details/6530142. Because of the B+ tree's high fan-out, its height in a database is generally only 2 to 3 levels, so looking up a row by a single key value takes at most 2 to 3 I/O operations. An ordinary disk today can perform at least 100 I/O operations per second, so 2 to 3 I/Os mean the lookup takes only 0.02 to 0.03 seconds.

Database B+ tree indexes fall into clustered indexes and secondary indexes. Both are B+ trees, i.e. height-balanced, with the data stored in the leaf nodes.

Clustered index: because the clustered index organizes the table by its primary key, each table can have only one clustered index. Its data pages are connected through a doubly linked list, and the leaf nodes store entire rows, which is why the query optimizer tends to prefer the clustered index. Moreover, the storage of a clustered index is logically contiguous, so it is very fast for primary key sorting and range searches.

Secondary index: also called a non-clustered index. Its leaf nodes do not store full rows, but the key value plus a bookmark (in InnoDB, the clustered index key) that tells InnoDB where to find the row the index entry points to. For example, with a secondary index of height 3 and a clustered index of height 3, looking up a row record through the secondary index takes 6 I/O operations in total. In addition, a table can have multiple secondary indexes.
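A sketch of the two lookup paths; the table and column names are hypothetical:

```sql
CREATE TABLE users (
  id INT PRIMARY KEY,        -- clustered index: leaves hold the full rows
  name VARCHAR(64),
  age INT,
  KEY idx_name (name)        -- secondary index: leaves hold (name, id)
) ENGINE=InnoDB;

-- Walks idx_name, then does a bookmark lookup on the clustered index
-- for each match to fetch the remaining columns.
EXPLAIN SELECT * FROM users WHERE name = 'Mary';

-- Range scan directly on the clustered index; leaf pages already hold
-- the whole rows, so no second lookup is needed.
EXPLAIN SELECT * FROM users WHERE id BETWEEN 100 AND 200;
```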


Indexing principles: build indexes on highly selective columns, i.e. for queries that retrieve only a small fraction of the table's rows. As a rule of thumb, if a query retrieves more than about 20% of the rows in the table, the optimizer will not use the index and will do a full table scan instead. Indexing a low-selectivity column such as gender, for example, is meaningless.

Composite index: an index created over multiple columns (two or more). A composite index in InnoDB is also a B+ tree; its entries contain multiple columns (col1, col2, col3, ...) sorted by col1, then col2, then col3, for example (1, 2), (1, 3), (2, 0), ... When using a composite index, exploit the leftmost-prefix principle: as the name suggests, the leftmost column has the highest priority. If you create an index ind_col1_col2 (col1, col2), then queries with WHERE col1 = xxx AND col2 = xxx, or with WHERE col1 = xxx alone, can use ind_col1_col2, but a query with only WHERE col2 = xxx cannot use the index. When creating a multi-column index, place the column that appears most frequently in WHERE clauses and filters best on the leftmost side, according to your workload.
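The leftmost-prefix rule can be checked with EXPLAIN; the table t and its columns here are hypothetical:

```sql
ALTER TABLE t ADD INDEX ind_col1_col2 (col1, col2);

EXPLAIN SELECT * FROM t WHERE col1 = 1 AND col2 = 2;  -- can use ind_col1_col2
EXPLAIN SELECT * FROM t WHERE col1 = 1;               -- can use it (leftmost prefix)
EXPLAIN SELECT * FROM t WHERE col2 = 2;               -- cannot use it
```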

Hash index: hashing is a common technique; InnoDB uses the standard chaining method to resolve collisions. As mentioned above, hashing in InnoDB is adaptive: the system decides when a hash index is used, and it cannot be configured manually.
Binary search: a common algorithm, not elaborated here. In InnoDB, the slots in each page's Page Directory are stored in primary key order, so locating a specific record is done by a binary search over the Page Directory.

Mysql innodb locks
The InnoDB storage engine's locking is very similar to Oracle's: it provides consistent non-locking reads and row-level locking, and its row-level locking carries no associated overhead, so it achieves concurrency and consistency at the same time.
The InnoDB storage engine implements the following two standard row-level locks:
Shared lock (S lock): allows a transaction to read a row of data.
Exclusive lock (X lock): allows a transaction to delete or update a row of data.
If a transaction has obtained the shared lock on row r, another transaction can immediately obtain a shared lock on row r as well, because reading does not change the data of row r; this is called lock compatibility. But if another transaction wants to obtain the exclusive lock on row r, it must wait for the shared locks on row r to be released; this is called lock incompatibility.



Before the InnoDB Plugin, you could only use commands such as SHOW FULL PROCESSLIST and SHOW ENGINE INNODB STATUS to view the current database requests and then infer the locks held by current transactions. The InnoDB Plugin adds the INNODB_TRX, INNODB_LOCKS, and INNODB_LOCK_WAITS tables under the INFORMATION_SCHEMA schema. With these three tables you can easily monitor current transactions and analyze possible lock problems.
INNODB_TRX consists of eight fields:
Trx_id: the transaction's unique ID inside the InnoDB storage engine.
Trx_state: the current state of the transaction.
Trx_started: the start time of the transaction.
Trx_requested_lock_id: the ID of the lock the transaction is waiting for. If trx_state is LOCK WAIT, this is the ID of the lock resource the transaction is currently waiting on; otherwise the value is NULL.
Trx_wait_started: the time when the transaction began waiting.
Trx_weight: the weight of the transaction, reflecting how many rows it has modified and locked. When a deadlock occurs and a rollback is needed, InnoDB chooses the transaction with the smallest weight to roll back.
Trx_mysql_thread_id: the thread ID in MySQL, as shown by SHOW PROCESSLIST.
Trx_query: the SQL statement the transaction is running.
SELECT * FROM information_schema.INNODB_TRX;

INNODB_LOCKS table, which consists of the following fields:
Lock_id: the ID of the lock.
Lock_trx_id: the transaction ID.
Lock_mode: Lock mode.
Lock_type: the lock type, table lock or row lock.
Lock_table: The table to be locked.
Lock_index: The index of the lock.
Lock_space: ID of the tablespace of the InnoDB Storage engine.
Lock_page: Number of locked pages. If it is a table lock, the value is NULL.
Lock_rec: Number of locked rows. If it is a table lock, the value is NULL.
Lock_data: The primary key value of the locked row. When it is a table lock, the value is NULL.
SELECT * FROM information_schema.INNODB_LOCKS;

INNODB_LOCK_WAITS consists of four fields:
Requesting_trx_id: the ID of the transaction requesting the lock.
Requested_lock_id: the ID of the lock being waited for.
Blocking_trx_id: the ID of the blocking transaction.
Blocking_lock_id: the ID of the lock held by the blocking transaction.
SELECT * FROM information_schema.INNODB_LOCK_WAITS;
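The three tables are most useful joined together. A common monitoring query, sketched below (run it while a lock wait is in progress):

```sql
-- For each waiting transaction, show the transaction that blocks it.
SELECT
  r.trx_id              AS waiting_trx_id,
  r.trx_mysql_thread_id AS waiting_thread,
  r.trx_query           AS waiting_query,
  b.trx_id              AS blocking_trx_id,
  b.trx_mysql_thread_id AS blocking_thread,
  b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;
```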

Consistent non-locking read: the InnoDB storage engine uses multi-versioning to read the data of a row as of the current point in time. If the row being read is locked by a DELETE or UPDATE, the read does not wait for the row lock to be released; instead, InnoDB reads a snapshot of the row. Snapshot data means an earlier version of the row, implemented through the undo segment. Since undo is needed anyway to roll back transactions, the snapshot itself carries no extra overhead, and reading snapshot data requires no lock, because nobody modifies historical data. A row may have more than one snapshot version, which is why this technique is called multi-versioning; the resulting concurrency control is called Multi-Version Concurrency Control (MVCC).

Transaction isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. Under READ COMMITTED and REPEATABLE READ, the InnoDB storage engine uses consistent non-locking reads, but the snapshot definition differs: at the READ COMMITTED isolation level, a consistent read always reads the latest committed snapshot of the locked row, while at REPEATABLE READ it always reads the version of the row as of the start of the transaction.
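The isolation level can be inspected and changed per session or globally; the variable is tx_isolation in MySQL 5.x (later versions rename it transaction_isolation):

```sql
SELECT @@session.tx_isolation, @@global.tx_isolation;

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET GLOBAL  TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- new sessions only
```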


Lock algorithms:
Record Lock: a lock on a single row record.
Gap Lock: locks a range, but not the record itself.
Next-Key Lock: Gap Lock + Record Lock, i.e. locks a range and the record itself. A more detailed introduction is available in this blog post: http://www.db110.com/?p=1848

Lock problems:
Lost update: a classic database problem. When two or more transactions select the same row and then update it based on the value originally selected, a lost update occurs: each transaction is unaware of the others, so the last update overwrites the updates made by the other transactions, and data is lost.
Example:
Transactions A and B modify the value of the same row at the same time:
1. Transaction A changes the value to 1 and commits.
2. Transaction B changes the value to 2 and commits.
The value ends up as 2, and the update made by transaction A is lost.
Solution: serialize the parallel transactions by taking an exclusive lock for the update operation.
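A sketch of the exclusive-lock fix, using a hypothetical counters table. SELECT ... FOR UPDATE makes a second transaction block until the first commits, so the read-modify-write sequences can no longer interleave:

```sql
START TRANSACTION;
-- Takes an exclusive (X) lock on the row; a concurrent transaction
-- issuing the same statement waits here until COMMIT.
SELECT val FROM counters WHERE id = 1 FOR UPDATE;
UPDATE counters SET val = val + 1 WHERE id = 1;
COMMIT;
```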

Dirty read: one transaction reads updates not yet committed by another transaction, i.e. it reads dirty data.
Example:
1. Mary's original salary is 1000; a finance clerk changes it to 8000 (but does not commit).
2. Mary reads her salary and finds it has become 8000!
3. Finance notices the operation was a mistake and rolls back the transaction, and Mary's salary returns to 1000. The 8000 Mary read was dirty data.
Solution: dirty reads occur only at the READ UNCOMMITTED isolation level. InnoDB uses REPEATABLE READ by default, so dirty reads do not occur in a production environment.

Non-repeatable read: within the same transaction, reading the same data several times returns different results; in other words, later reads can see data that another transaction has committed in the meantime. By contrast, "repeatable read" guarantees that reads of the same data within one transaction always return the same result, i.e. later reads cannot see updates committed by other transactions. The key difference between a dirty read and a non-repeatable read is that a dirty read sees uncommitted data, while a non-repeatable read sees committed data.
Example:
1. In transaction 1, Mary reads her salary as 1000; the transaction is not yet finished.
2. In transaction 2, a finance clerk changes Mary's salary to 2000 and commits.
3. In transaction 1, when Mary reads her salary again, it has become 2000.
Solution: reading committed data is generally acceptable, so many databases set the transaction isolation level to READ COMMITTED. MySQL InnoDB avoids non-repeatable reads with the Next-Key Lock algorithm; its default isolation level is REPEATABLE READ.

Transactions in MySQL InnoDB
The four properties of transactions: atomicity, consistency, isolation, and durability.
Isolation is achieved through locks; atomicity, consistency, and durability are achieved through the database's redo and undo.
The redo log records transaction behavior and guarantees the integrity of transactions through redo. But a transaction sometimes needs to be undone, which is where undo comes in: undo is the opposite of redo. When the database is modified, it generates not only redo but also a certain amount of undo, so that if the transaction or statement fails for some reason, or a ROLLBACK statement is issued, the undo information can take the data back to the state it was in before. Unlike redo, which is stored in the redo log files, undo is stored in a special segment inside the database called the undo segment, which lives in the shared tablespace. Another important point: an undo record is the logical inverse of the transaction's operation; for example, the undo of an INSERT is a DELETE. So undo only restores the database logically to its state before the transaction began. For example, if you insert 100,000 rows, the tablespace may grow, and after a rollback the tablespace will not shrink again.
