Deep Analysis of MySQL Based on the MySQL Architecture

MySQL is composed of: connection pooling components, management services and utility components, the SQL interface, the query parser, the optimizer,
cache and buffer components, pluggable storage engines, and the physical files.
MySQL has a distinctive pluggable storage engine architecture, and each storage engine has its own characteristics.





Overview of MySQL's storage engines:
InnoDB storage engine: designed for OLTP (online transaction processing); row-level locks; foreign key support; consistent non-locking reads; the default REPEATABLE READ isolation level, which avoids phantom reads; insert buffering; doublewrite; adaptive hash index; read-ahead.
MyISAM storage engine: no transaction support; table-level locks; full-text indexing; suited to OLAP (online analytical processing); the .MYD file holds the data and the .MYI file holds the indexes.
NDB storage engine: a cluster storage engine with a share-nothing design, which improves availability.
Memory storage engine: data is kept in memory; table-level locks; poor concurrency; hash indexes by default.
Archive storage engine: supports only INSERT and SELECT; compresses data with the zlib algorithm (roughly 1:10); suitable for archived data such as logs; row-level locks.
Maria storage engine: intended to replace MyISAM; caches data and indexes; row-level locks; MVCC.





InnoDB Features:
Thread architecture: by default there are 7 background threads: 4 I/O threads (insert buffer, log, read, write), 1 master thread (highest priority), 1 lock monitor thread, and 1 error monitor thread. They can be observed with SHOW ENGINE INNODB STATUS. Newer versions raise the default number of read threads and write threads to 4 each, which can be checked with SHOW VARIABLES LIKE 'innodb_%io_threads'.
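As a quick check (output varies by version; in newer versions the thread counts are plain server variables):

    -- List the InnoDB background threads (see the BACKGROUND THREAD and FILE I/O sections).
    SHOW ENGINE INNODB STATUS\G
    -- Newer versions: innodb_read_io_threads and innodb_write_io_threads, default 4 each.
    SHOW VARIABLES LIKE 'innodb_%io_threads';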
Memory composition: the buffer pool, the redo log buffer, and the additional memory pool. Their sizes can be checked with SHOW VARIABLES LIKE 'innodb_buffer_pool_size', SHOW VARIABLES LIKE 'innodb_log_buffer_size', and SHOW VARIABLES LIKE 'innodb_additional_mem_pool_size', as shown below.
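For example, the three sizes (in bytes) can be read directly:

    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    SHOW VARIABLES LIKE 'innodb_log_buffer_size';
    -- Removed in recent MySQL versions; present in the versions this article describes.
    SHOW VARIABLES LIKE 'innodb_additional_mem_pool_size';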
Buffer pool: the largest piece of memory; it caches many kinds of data: index pages, data pages, undo pages, insert buffer pages, the adaptive hash index, InnoDB lock information, data dictionary information, and so on. InnoDB always reads database files into the buffer pool page by page (16 KB per page) and keeps cached pages according to a least-recently-used (LRU) algorithm. When a database file has to be modified, the page is always modified in the buffer pool first (becoming a dirty page), and the dirty pages are then flushed to the files at a certain frequency. The buffer pool can be observed with SHOW ENGINE INNODB STATUS.
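Besides SHOW ENGINE INNODB STATUS, the standard status counters give a quick view of how full and how dirty the pool is:

    -- Total, free and dirty pages in the buffer pool (16 KB each).
    SHOW STATUS LIKE 'Innodb_buffer_pool_pages%';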
Redo log buffer: redo log records are placed in this buffer first and then flushed to the redo log files at a certain frequency.




Master Thread:
The main loop performs these operations once per second:
Flush the log buffer to disk, even if the transaction has not committed yet (always executed; this is why even large transactions commit quickly).
Merge the insert buffer (only if fewer than 5 I/O operations occurred in the last second).
Flush up to 100 dirty pages in the buffer pool to disk (only if the proportion of dirty pages exceeds the threshold configured by innodb_max_dirty_pages_pct, which defaulted to 90, is 75 in newer versions, and was recommended as 80 by Google; see the check after this list).
If there is no current user activity, switch to the background loop.
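The dirty-page threshold mentioned above is an ordinary dynamic variable, so it can be inspected and adjusted at runtime:

    SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';
    SET GLOBAL innodb_max_dirty_pages_pct = 75;  -- dynamic; takes effect immediately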

The main loop performs these operations every 10 seconds:
Flush 100 dirty pages to disk (only if fewer than 200 I/O operations occurred in the past 10 seconds).
Merge at most 5 insert buffer entries (always).
Flush the log buffer to disk (always).
Delete undo pages that are no longer needed (always).
Flush 100 or 10 dirty pages to disk (if more than 70% of the pages are dirty, flush 100 dirty pages; otherwise flush 10).
Produce a checkpoint.

Background loop, entered when there is no user activity (the database is idle) or when the database is shut down:
Delete undo pages that are no longer needed (always).
Merge 20 insert buffer entries (always).
Jump back to the main loop (always).
Keep flushing 100 pages until the dirty-page condition is met (possibly done in the flush loop).

If there is nothing to do in the flush loop, the InnoDB storage engine switches to the suspend loop, suspending the master thread to wait for an event. If the InnoDB storage engine is enabled but no InnoDB table is ever used, the master thread is always in the suspended state.

Insert buffer: not a part of the buffer pool but a component of the physical pages, and it brings a sizeable performance gain to InnoDB. Because of the characteristics of the B+ tree algorithm (discussed below), inserts ordered by primary key are sequential and cause no random reads of the database; but for a nonclustered index (that is, a secondary index), leaf-node inserts are no longer sequential, and the discrete access to nonclustered index pages degrades insert performance. InnoDB therefore introduces the insert buffer: on insert it checks whether the nonclustered index page is in the buffer pool; if so, the insert is done directly; if not, the change is placed in the insert buffer first. Then, as described for the master thread above, insert buffer merges are performed at a certain frequency. One restriction: the secondary index must not be unique, because the index page is not consulted when inserting into the buffer; consulting it would itself cause random reads and defeat the purpose of the insert buffer. The insert buffer can take up memory in the buffer pool, by default up to 1/2, so the value can be reduced, e.g. to 1/3, via IBUF_POOL_SIZE_PER_MAX_SIZE (2 means 1/2, 3 means 1/3).
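As a sketch, a table like the following (names are hypothetical) benefits from insert buffering: primary key inserts are sequential, while changes to the non-unique secondary index idx_name can be buffered and merged later:

    CREATE TABLE t (
        id   INT AUTO_INCREMENT PRIMARY KEY,  -- clustered index: inserts arrive in order
        name VARCHAR(30),
        KEY idx_name (name)                   -- non-unique secondary index: eligible for the insert buffer
    ) ENGINE=InnoDB;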

Doublewrite: it brings reliability to InnoDB data. When a write fails, the redo log can be used for recovery, but the redo log records physical operations on pages; if the page itself is already corrupted, redoing it is meaningless. Therefore, a copy of the page is made before the redo log is applied: when a write failure occurs, the page is first restored from its copy and then the redo log is replayed. This is doublewrite.
recovered data = copy of page + redo log
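Doublewrite is enabled by default, and its activity shows up in the status counters:

    SHOW VARIABLES LIKE 'innodb_doublewrite';  -- ON unless explicitly disabled
    SHOW STATUS LIKE 'Innodb_dblwr%';          -- pages written and doublewrite operations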





Adaptive hash index: the InnoDB storage engine monitors the lookups performed on a table's indexes and, if it observes that building a hash index would give a speedup, it builds one automatically; hence "adaptive". The adaptive hash index can only be used for equality lookups, such as SELECT * FROM table WHERE index_col = 'xxx'. It is controlled entirely by the InnoDB storage engine; we can only disable or enable it through innodb_adaptive_hash_index, and it is on by default.
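A quick check (the switch is dynamic in most versions):

    SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
    SET GLOBAL innodb_adaptive_hash_index = OFF;  -- disable; ON to re-enable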

MySQL Files
Parameter file: tells MySQL where to find the database files when the instance starts, and specifies initialization parameters that define the sizes of certain memory structures. The file is stored as editable text; unlike some other databases, MySQL cannot start successfully if the file cannot be loaded at startup. Parameters are dynamic or static: static ones are effectively read-only, dynamic ones can be changed with SET. For example, after finding a key and its value with SHOW VARIABLES LIKE '...', it can be modified directly with SET key = value. The modification also has a scope: valid for the session or globally, selected by prefixing the key with session or global, as in SELECT @@session.read_buffer_size or SET @@global.read_buffer_size.
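For instance, with the standard dynamic variable read_buffer_size:

    SET @@session.read_buffer_size = 524288;   -- affects only the current connection
    SET @@global.read_buffer_size = 1048576;   -- affects new connections (needs SUPER)
    SELECT @@session.read_buffer_size, @@global.read_buffer_size;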
Log files: record the events a MySQL instance writes in response to various conditions: the error log, the binary log, the slow query log, the general query log, and so on.
Error log: find where it is stored with SHOW VARIABLES LIKE 'log_error'.
Slow query log: the threshold is checked with SHOW VARIABLES LIKE '%long%' (newer versions support sub-second values such as 0.05); whether the slow query log is enabled is checked with SHOW VARIABLES LIKE 'log_slow_queries'; and SHOW VARIABLES LIKE 'log_queries_not_using_indexes' shows whether queries that use no index are also written to the slow log. MySQL ships with the mysqldumpslow command for inspecting the slow log directly.
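The checks and a typical mysqldumpslow invocation look like this (the log path is hypothetical):

    SHOW VARIABLES LIKE 'long_query_time';
    SHOW VARIABLES LIKE 'log_queries_not_using_indexes';
    -- From the shell: the ten slowest statements, sorted by query time.
    --   mysqldumpslow -s t -t 10 /path/to/slow.log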
Binary log: does not record queries; it records all modifications made to the database. Its purposes are recovery (point-in-time recovery) and replication. The storage path can be found with SHOW VARIABLES LIKE 'datadir'. The binary log supports three formats: STATEMENT, ROW and MIXED, set through the binlog_format parameter. It is usually set to ROW, which brings better reliability for database recovery and replication, but increases the size of the binary log files and the network overhead of replication. MySQL provides mysqlbinlog to view the contents of binary log files.
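For example (the file name below is hypothetical):

    SHOW VARIABLES LIKE 'binlog_format';
    SHOW BINARY LOGS;   -- list the current binary log files
    -- From the shell, decode a ROW-format log:
    --   mysqlbinlog --base64-output=decode-rows -v mysql-bin.000001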
Socket file: the file needed when connecting through a UNIX domain socket.
pid file: holds the process ID of the MySQL instance.
Table structure files: store the MySQL table structure definitions. Because of MySQL's pluggable storage engine architecture, every table has a corresponding definition file ending in the .frm suffix.
Storage engine files: each storage engine also keeps files of its own to hold its data and indexes. The following describes the tablespace files and redo log files of the InnoDB storage engine.
Tablespace files: InnoDB's default tablespace file is ibdata1. Whether every table produces a separate .ibd tablespace file can be checked with SHOW VARIABLES LIKE 'innodb_file_per_table'. Note, however, that a per-table tablespace file stores only that table's data, indexes and insert buffer information; the remaining information is still stored in the default tablespace.
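A quick check, with the runtime switch (in the InnoDB Plugin and later the variable is dynamic; it affects only tables created afterwards):

    SHOW VARIABLES LIKE 'innodb_file_per_table';
    SET GLOBAL innodb_file_per_table = ON;  -- new tables get their own .ibd file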





Redo log files: they come into play when the instance or the media fails. For example, if the host loses power, the InnoDB storage engine uses the redo logs to recover to the moment before the power loss, guaranteeing data integrity. The parameter innodb_log_file_size specifies the size of each redo log file; innodb_log_files_in_group specifies the number of redo log files in a log group (default 2); innodb_mirrored_log_groups specifies the number of mirrored log groups (default 1, meaning a single group with no mirroring); innodb_log_group_home_dir specifies the path of the log group (default: the database path).
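All of these are plain server variables:

    SHOW VARIABLES LIKE 'innodb_log%';                 -- file size, files per group, group path
    SHOW VARIABLES LIKE 'innodb_mirrored_log_groups';  -- present only in older versions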

The differences between the binary log and the redo log: first, the binary log records all MySQL-level changes, regardless of storage engine (InnoDB, MyISAM, Memory, and so on), while the InnoDB storage engine's redo log stores only its own transaction log. Second, the content differs: whether the binary log format is set to STATEMENT, ROW or MIXED, it records the logical operations of a transaction, whereas the InnoDB redo log records the physical changes made to each page. Third, the write timing differs: the binary log is written once when the transaction commits, while redo log entries are written continuously into the redo log files while the transaction is in progress.

MySQL InnoDB Tables
Tablespace: the tablespace can be seen as the highest level of the InnoDB storage engine's logical structure.
Segment: a tablespace consists of various segments; the common ones are data segments, index segments, rollback segments, and so on.
Extent: consists of 64 consecutive pages; each page is 16 KB, so each extent is 1 MB.
Page: 16 KB per page, and this cannot be changed. Common page types: data pages, undo pages, system pages, transaction data pages, insert buffer bitmap pages, insert buffer free list pages, uncompressed binary large object pages, compressed binary large object pages.
Row: the InnoDB storage engine is row-oriented; each page stores at most 7992 rows of data.
Row record format: there are two common row record formats, Compact and Redundant; from MySQL 5.1 on, Compact is the main one. In Compact, NULL values occupy no storage space, whether the column is CHAR or VARCHAR; in Redundant, a NULL VARCHAR occupies no space but a NULL CHAR does occupy storage space.

The length limit of the VARCHAR type is 65535 bytes, but in practice there is other overhead, so the usable limit is around 65530, and it also depends on the chosen character set. Moreover, this limit applies to the row as a whole. For example: CREATE TABLE test (a VARCHAR(22000), b VARCHAR(22000), c VARCHAR(22000)) CHARSET=latin1 ENGINE=InnoDB; also fails with an error.

For BLOB-type data, and for VARCHAR(65535) values that overflow, the data page stores only the first 768 bytes of prefix data, followed by an offset pointing to the row overflow page, that is, an uncompressed BLOB page. The newer InnoDB Plugin introduces a file format called Barracuda, which has two new row record formats, Compressed and Dynamic. Both store BLOB fields entirely off-page: the database page holds only a 20-byte pointer, and the actual data is stored in BLOB pages.





Data page structure: an InnoDB data page consists of the following 7 parts:
File Header: records general information about the page, such as the page offset, the previous page, the next page and the page type; fixed length of 38 bytes.

Page Header: records the status of the page: the number of records in the heap, the pointer to the free list, the number of bytes in deleted records, the last insert position, and so on; fixed length of 56 bytes.

Infimum and Supremum records: in the InnoDB storage engine, every data page contains two virtual row records that bound the records on the page.

The Infimum record is smaller than any primary key value in the page, and the Supremum record is larger than any possible value. The two records are created when the page is created and are never deleted under any circumstances. They occupy different numbers of bytes in the Compact row format and the Redundant row format.



User Records: the actual row records. Once again, InnoDB tables are always organized as B+ tree indexes.
Free Space: the free space on the page, also a linked-list data structure. When a record is deleted, its space is added to the free list.

Page Directory: the page directory stores the relative positions of records; these positions are sometimes called slots. InnoDB does not give every record its own slot; slots form a sparse directory, so one slot may cover several records: at least 4 and at most 8. Keep in mind that the B+ tree index itself cannot locate a specific record; it can only locate the page the record is on. The database loads that page into memory and then performs a binary search through the page directory. The binary search has low time complexity, and in-memory search is fast, so the time spent on this part of the lookup can usually be ignored.

File Trailer: fixed length of 8 bytes; used to verify that the page was written to disk completely (detecting, for example, disk corruption or a crash during the write).

Views: a view in MySQL is always a virtual table; materialized views are not natively supported. But with certain tricks, such as triggers, some simple materialized-view functionality can be implemented.

Partitions: the MySQL database supports RANGE, LIST, HASH, KEY and COLUMNS partitioning, and subpartitioning by HASH or KEY.
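A minimal RANGE partitioning sketch (table and column names are hypothetical):

    CREATE TABLE orders (
        id      INT NOT NULL,
        created DATE NOT NULL
    ) ENGINE=InnoDB
    PARTITION BY RANGE (YEAR(created)) (
        PARTITION p2019 VALUES LESS THAN (2020),
        PARTITION p2020 VALUES LESS THAN (2021),
        PARTITION pmax  VALUES LESS THAN MAXVALUE  -- catch-all for later years
    );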

Common MySQL InnoDB indexes and algorithms:
B+ tree index: the B+ tree data structure is relatively complex; the B stands for balance, as it evolved from the balanced binary tree, but a B+ tree is not a binary tree. A more detailed description can be found in this article: http://blog.csdn.net/v_JULY_v/article/details/6530142. Because of the high fan-out of a B+ tree index, the height of the tree in a database is generally only 2 to 3 levels, so looking up a row by key value takes at most 2 to 3 I/Os. A typical disk today can perform at least 100 I/Os per second, so 2 to 3 I/Os mean the lookup takes only around 0.02 to 0.03 seconds.

The B+ tree indexes in a database are divided into clustered indexes and secondary indexes (nonclustered indexes). Both are internally B+ trees: height-balanced, with data stored at the leaf nodes.

Clustered index: because the clustered index organizes the table by its primary key, each table can have only one clustered index. The data pages are linked by a doubly linked list, and the leaf nodes store complete rows, so the query optimizer tends to prefer the clustered index. In addition, the storage of a clustered index is logically contiguous, which makes clustered indexes very fast for sorting and range lookups on the primary key.

Secondary index: also called a nonclustered index. Its leaf nodes do not store all of the row data; they store the key values plus a bookmark (in InnoDB, the clustered index key of the row) that tells InnoDB where to find the row data for the index entry. For example, with a secondary index of height 3 and a clustered index of height 3, finding a row record through the secondary index takes 6 I/Os in total. A table can have multiple secondary indexes.


Guidelines for using indexes: index columns with high selectivity, where queries fetch only a small portion of the rows in the table (as with a unique index). As a rule of thumb, once a query fetches more than about 20% of the rows in a table, the optimizer will not use the index and will do a full table scan instead. Indexing fields such as gender is therefore meaningless.

Composite index: also called a federated index; an index built on multiple columns (two or more). A composite index in InnoDB is also a B+ tree structure. The indexed data contains multiple columns (col1, col2, col3, ...), sorted in the index by col1, then col2, then col3, for example (1, 2), (1, 3), (2, 0), ... To use a composite index, make full use of the leftmost-prefix principle: as the name suggests, the leftmost column takes priority. If you create an index idx_col1_col2 (col1, col2), then queries such as WHERE col1 = xxx, or WHERE col1 = xxx AND col2 = xxx, can use the index, but WHERE col2 = xxx cannot. When creating a multi-column index, put the columns most frequently used for filtering in WHERE clauses at the far left, according to your business needs; see the sketch below.
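A minimal sketch (hypothetical table) of which predicates can use the index:

    CREATE TABLE t (
        col1 INT, col2 INT, col3 INT,
        KEY idx_col1_col2 (col1, col2)
    ) ENGINE=InnoDB;
    SELECT * FROM t WHERE col1 = 1;               -- can use idx_col1_col2
    SELECT * FROM t WHERE col1 = 1 AND col2 = 2;  -- can use idx_col1_col2
    SELECT * FROM t WHERE col2 = 2;               -- cannot: col1, the leftmost column, is missing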

Hash index: hashing is a common algorithm; MySQL's InnoDB uses the common chained (separate chaining) method to resolve collisions. In addition, as mentioned above, the hash index in InnoDB is adaptive: when to use it is decided by the system and cannot be configured manually.
Binary search: this algorithm is common enough that little needs to be said here. In InnoDB, the slots in the page directory are stored in primary key order, so a specific record is located by binary search over the page directory.

Locks in MySQL InnoDB
The InnoDB storage engine's lock implementation is very similar to Oracle's: it provides consistent non-locking reads and row-level lock support, and row-level locks come with no extra overhead, so it achieves concurrency together with consistency.
The InnoDB storage engine implements two standard row-level locks:
Shared lock (S lock): allows a transaction to read a row of data.
Exclusive lock (X lock): allows a transaction to delete or update a row of data.
When a transaction has acquired a shared lock on row r, another transaction can immediately acquire a shared lock on row r as well, because reading does not change row r; this case is called lock compatibility. But if a transaction wants to acquire an exclusive lock on row r, it must wait until the other transactions release their shared locks on row r; this is known as lock incompatibility, illustrated below.
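The compatibility rules are easy to see with two sessions and any InnoDB table t (hypothetical):

    -- Session 1: take a shared lock on row 1.
    START TRANSACTION;
    SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;
    -- Session 2: another shared lock succeeds at once (S is compatible with S) ...
    SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;
    -- ... but an exclusive lock must wait until session 1 commits (X conflicts with S).
    SELECT * FROM t WHERE id = 1 FOR UPDATE;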



Before the InnoDB Plugin, the current database requests could only be viewed with commands such as SHOW FULL PROCESSLIST and SHOW ENGINE INNODB STATUS, from which the locks held in current transactions had to be inferred. The newer InnoDB Plugin adds the tables INNODB_TRX, INNODB_LOCKS and INNODB_LOCK_WAITS under the INFORMATION_SCHEMA database. These three tables make it much easier to monitor current transactions and analyze possible lock problems.
INNODB_TRX consists of 8 fields:
trx_id: the unique transaction ID inside the InnoDB storage engine.
trx_state: the current state of the transaction.
trx_started: the start time of the transaction.
trx_requested_lock_id: the lock ID the transaction is waiting for. If trx_state is LOCK WAIT, this is the ID of the lock the current transaction is waiting to acquire; otherwise it is NULL.
trx_wait_started: the time the transaction started waiting.
trx_weight: the weight of the transaction, reflecting the number of rows it has modified and locked. When a deadlock has to be resolved by a rollback, the InnoDB storage engine picks the transaction with the smallest weight to roll back.
trx_mysql_thread_id: the thread ID in MySQL, as shown by SHOW PROCESSLIST.
trx_query: the SQL statement the transaction is running.
It can be viewed with SELECT * FROM information_schema.INNODB_TRX;

The INNODB_LOCKS table consists of the following fields:
lock_id: the ID of the lock.
lock_trx_id: the ID of the transaction holding or requesting the lock.
lock_mode: the mode of the lock.
lock_type: the type of the lock: table lock or record (row) lock.
lock_table: the table being locked.
lock_index: the index being locked.
lock_space: the tablespace ID within the InnoDB storage engine.
lock_page: the number of the locked page; NULL for a table lock.
lock_rec: the number of the locked row; NULL for a table lock.
lock_data: the primary key value of the locked row; NULL for a table lock.
It can be viewed with SELECT * FROM information_schema.INNODB_LOCKS;

INNODB_LOCK_WAITS consists of 4 fields:
requesting_trx_id: the ID of the transaction requesting the lock resource.
requested_lock_id: the ID of the requested lock.
blocking_trx_id: the ID of the blocking transaction.
blocking_lock_id: the ID of the blocking lock.
It can be viewed with SELECT * FROM information_schema.INNODB_LOCK_WAITS;
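Joining the three tables gives a direct who-blocks-whom view, for example:

    SELECT r.trx_id              AS waiting_trx,
           r.trx_mysql_thread_id AS waiting_thread,
           b.trx_id              AS blocking_trx,
           b.trx_mysql_thread_id AS blocking_thread
    FROM information_schema.INNODB_LOCK_WAITS w
    JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
    JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;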

Consistent non-locking reads: the InnoDB storage engine uses multi-versioning to read the data of a row as of a point in time. If the row being read is locked by a DELETE or UPDATE in progress, the read operation does not wait for the row lock to be released; instead, the InnoDB storage engine reads a snapshot of the row. Snapshot data means a previous version of the row, implemented through the undo segment. Since undo is needed anyway to roll back data within transactions, the snapshot itself incurs no extra overhead, and reading snapshot data requires no locking, because no one modifies historical data. A row may have more than one snapshot version, so this technique is called row multi-versioning, and the concurrency control it enables is called multi-version concurrency control (MVCC).

Transaction isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE. Under READ COMMITTED and REPEATABLE READ, the InnoDB storage engine uses consistent non-locking reads, but the snapshots are defined differently. Under READ COMMITTED, a non-locking read always reads the latest snapshot of the locked row; under REPEATABLE READ, it always reads the version of the row as of the start of the transaction.
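A sketch with a hypothetical accounts table, run in two sessions, shows the difference:

    -- Session A (REPEATABLE READ, the InnoDB default):
    START TRANSACTION;
    SELECT balance FROM accounts WHERE id = 1;  -- say it returns 1000
    -- Session B meanwhile: UPDATE accounts SET balance = 0 WHERE id = 1; COMMIT;
    SELECT balance FROM accounts WHERE id = 1;  -- still 1000 under REPEATABLE READ,
                                                -- but 0 under READ COMMITTED
    COMMIT;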


Locking algorithms:
Record lock: a lock on a single row record.
Gap lock: locks a range, but not the record itself.
Next-key lock: gap lock + record lock; locks a range and the record itself (see the sketch after this list). A more detailed introduction can be found in this blog post: http://www.db110.com/?p=1848
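A minimal sketch (hypothetical table t with an index on col):

    START TRANSACTION;
    SELECT * FROM t WHERE col BETWEEN 10 AND 20 FOR UPDATE;
    -- Next-key locks now cover the matching records and the gaps between them:
    -- other transactions can neither change those rows nor insert into the range.
    COMMIT;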

Locking problems:
Lost updates: a classic database problem. When two or more transactions select the same row and then update the row based on the value originally selected, a lost update occurs: each transaction is unaware of the others, so the last update overwrites the updates made by the other transactions, and data is lost.
Example:
Transactions A and B modify the value of the same row.
1. Transaction A changes the value to 1 and commits.
2. Transaction B changes the value to 2 and commits.
The value ends up as 2, and the update made by transaction A is lost.
Solution: make the parallel transactions serial by adding an exclusive lock to the update operation, as sketched below.
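A sketch with a hypothetical counters table:

    START TRANSACTION;
    SELECT val FROM counters WHERE id = 1 FOR UPDATE;  -- X lock: a second writer blocks here
    UPDATE counters SET val = val + 1 WHERE id = 1;
    COMMIT;  -- only now can the second writer read and update the row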

Dirty reads: a transaction reads data that another transaction has not committed, i.e. it reads dirty data.
Example:
1. Mary's original salary is 1000; the accountant changes it to 8000 (but does not commit the transaction).
2. Mary reads her salary, finds it has become 8000, and is overjoyed!
3. The accountant then notices the mistake and rolls back the transaction, so Mary's salary returns to 1000. The 8000 that Mary read was dirty data.
Workaround: dirty reads appear only at the READ UNCOMMITTED isolation level. InnoDB's default isolation level is REPEATABLE READ, so dirty reads do not occur in production environments.

Non-repeatable reads: the same data is read more than once within the same transaction and different results are returned; in other words, later reads can see updated data committed by another transaction. Repeatable reads, by contrast, guarantee that reading the same data repeatedly within a transaction returns the same result, meaning later reads cannot see data committed by other transactions in the meantime. The main difference between dirty reads and non-repeatable reads is that a dirty read sees uncommitted data, while a non-repeatable read sees committed data.
Example:
1. In transaction 1, Mary reads her salary of 1000; the transaction is not yet finished.
2. In transaction 2, the accountant changes Mary's salary to 2000 and commits the transaction.
3. In transaction 1, when Mary reads her salary again, it has changed to 2000.
Workaround: reading committed data is acceptable for most applications, so the isolation level is often set to READ COMMITTED (see below). MySQL InnoDB uses the next-key locking algorithm to avoid non-repeatable reads, and its default isolation level is REPEATABLE READ.
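Switching the level for the current session is a one-liner:

    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
    SELECT @@tx_isolation;  -- verify (variable name used by MySQL 5.x)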

Transactions in MySQL InnoDB
The four properties of a transaction: atomicity, consistency, isolation, durability.
Isolation is achieved through locks; atomicity, consistency and durability are achieved through the database's redo and undo.
The redo log records the actions of a transaction and guarantees its integrity; but sometimes a transaction needs to be undone, and that is what undo is for. Undo is the opposite of redo: when the database is modified, it produces not only redo but also a certain amount of undo, so that if the execution of a transaction or statement fails for some reason, or a ROLLBACK statement requests a rollback, the undo information can be used to roll the data back to what it was before the modification. Unlike redo, which is stored in the redo log files, undo is stored in a special segment inside the database called the undo segment, located in the shared tablespace. Also important: undo records the logical operation opposite to the transaction's operation; for example, the undo record of an INSERT is a DELETE. So undo only restores the database logically to the state before the transaction started. For example, inserting 100,000 rows of data may grow the tablespace; after a rollback, the tablespace does not shrink back.
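A tiny illustration with a hypothetical table t:

    START TRANSACTION;
    INSERT INTO t (col1) VALUES (1), (2), (3);
    ROLLBACK;  -- undo applies the logical inverse (deletes the rows);
               -- any tablespace growth caused by the inserts is not reclaimed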