Go MySQL Technology insider InnoDB Storage Engine-table & index Algorithms and Locks

Source: Internet
Author: User
Tags bulk insert compact dba mutex mysql view


Original: http://yingminxing.com/mysql%E6%8A%80%E6%9C%AF%E5%86%85%E5%B9%95innodb%E5%AD%98%E5%82%A8%E5%BC%95%E6%93%8E% ef%bc%8d%e8%a1%a8%e7%b4%a2%e5%bc%95%e7%ae%97%e6%b3%95%e5%92%8c%e9%94%81/

4.1. InnoDB Storage Engine Table type

InnoDB tables are similar to Oracle's IoT tables (indexed clustered Tables-indexorganized table), each table in the InnoDB table has a primary key, and if a defined primary key is not displayed when the table is created InnoDB the primary key is selected or created as follows.
The first table has a unique non-null index (unique NOT NULL), and if so, the column is the primary key.
If this is not the case, the InnoDB storage engine automatically creates a 6-byte pointer, rowID ().

4.2. InnoDB Logical Storage structure

The logical storage units of the InnoDB are made up of tablespace,segment,extent,page (blocks), respectively, from large to small.

4.2.1, table space (tablespace)

All data is stored in the table space, the parameter innodb_file_per_table is enabled, the data in each table can be placed in a single table space, each table space is only the data, index and insert buffer, other classes of data, such as undo information, system transaction information, Two write buffers and so on are still stored in the original your shared table space.

4.2.2, Duan (segment)

Common segment are data segments, index segments, and rollback segments. InnoDB is the index aggregation table, so the data is the index, the index is the data, then the data segment is a B + Tree page node (leaf node segment), the index segment is a B + tree non-indexed nodes (non-leaf node segment). And the management of the segment is done by the engine itself.

4.2.3, District (extend)

The zone is made up of 64 consecutive pages, each with a size of 16K, i.e. the size of each zone (64*16k) =1MB, and for large data segments, MySQL can apply up to 4 extents at a time to guarantee the sequential performance of the data.

4.2.4, pages (page)

page is the smallest unit of INNODB Disk Management, InnoDB each page is 16K in size and cannot be changed. Common types are: Data page b-tree node;undo page, Undo Log page, System page, the Transaction data page Transaction system page, insert buffer bitmap page, insert buffer Bitmap; The Idle list page inserts Buffer Freebitmap, the uncompressed Binary large Object page uncompressed the Blob page, and the compressed binary large object page compressed blob pages.

4.2.5, line

The InnoDB storage engine is line-oriented (row-oriented), which means that the data is stored on a row. Each page can hold a maximum of 16k/2~200 rows, which is 7,992 rows.

4.3. InnoDB Physical storage structure

The InnoDB engine consists of a shared tablespace, a log file (redo log), and a table structure definition file.

4.4, InnoDB line record format

Starting with MySQL 5.1, INNODB provides the compact and redundant (for compatibility with previous versions) to store row-and-record data in two formats.

4.4.1, compact line record format

The compact line records are designed to efficiently store data. Whether char or varchar, NULL refers to the non-consumption of storage space. The row record also includes two hidden column transaction id columns (6 bytes) and rollback pointer columns (7 bytes) If no PrimaryKey is defined, a 6-byte rowid column is added. InnoDB within a page is a list of rows in which each row is recorded.

4.4.2, redundant row record format

Redundant the row record format in order to be compatible with previous versions. Each row has a maximum of 1023 columns, because the number of columns occupies 10 bits. For a null value of varchar, it does not occupy any storage space, while a null value of type char takes up space.

4.4.3, row overflow data

The InnoDB storage engine can store some of the data in a record outside the true data page as row overflow data. The N in Varchar (n) refers to the length of the character, and the 65535 length defined in the official manual refers to the sum of the lengths of all varchar columns.

The data is typically stored in B-tree node's page type, but when a row overflow occurs, the type of page that holds the row overflow is the Uncompress BLOB page. If there is at least two rows of data in a page, varchar will not be stored in a BLOB page with a threshold length of 8098. For the text or BLOB data type, we always think that they are placed in the uncompressed BLOB page, in fact, this is not accurate, in the data page or BLOB page is the same as the previous discussion of varchar.

4.4.4, compressed, and dynamic record formats

InnoDB plugin introduced the new file format into the Barracuda file format, which has two new row record formats compressed and dynamic two, and it uses a secure row overflow approach for storing BLOB data.

Row structure storage for 4.4.5, Char

From mysql4.1 to Chr (n), medium n Specifies the length of the character, not the length of the previous version of the byte. In other words, the internal storage of char is not a fixed-length data under different character sets. The number of characters and bytes can be viewed by select A,char_length (a), length (a) from T; So in a multi-character set, char and varchar occupy a space the same way.

4.5. INNODB data page structure

The InnoDB data page consists of seven parts:
File Header: Header (bytes)
Page Header: Header (bytes)
Infimum + supremum Records: Upper/Lower bound record
Users Records: User records, records
Free Space: Empty spaces
Page Directory: Leaf directory
File Trailer: End of document information

4.6. Named file formats

The InnoDB storage engine addresses the structure compatibility issues of different versions of the page by named the file formats mechanism. The previous version was defined as Antelope (including the compact and redudant file formats) and was recently defined as Barracuda (including compressed and dynamic file formats). Use the parameter Innodb_file_format to specify the file format.

4.7. Constraint 4.7.1, data integrity

InnoDB provides the following four kinds of constraints: Primary key,unique key,foreign key,default,not NULL.

4.7.2, constraint creation and lookup

Defined at the time of creation, or by using ALTER TABLE.

Differences in 4.7.3, constraints, and indexes

Primary key and unique key are both constraints and primary keys. Constraints are a logical concept that guarantees data integrity, whereas an index is a data structure that has a logical concept that is more of a physical storage in a database.

4.7.4, constraints On Error data

It is possible to modify the Sql_mode to ensure the constraint is mandatory.

4.7.5, enum, and set constraints

Since MySQL does not support check constraints, it is possible to implement partial requirements through enum and set, and to implement check constraints through triggers, noting the need to modify sql_mode= ' strict_trans_tables '; can only be limited to discrete numeric constraints, for enum if inserting an illegal value inserts an empty string as a special error value.

4.7.6, triggers and constraints

The purpose of a trigger is to automatically invoke SQL commands or stored procedures before or after the Insert,delete and update commands. So a table can build up to 6 triggers.

4.7.7, foreign key 4.8, view 4.8.1, function of view 4.8.2, materialized view

The Oracle database supports materialized views-the view is not based on the virtual table of the base table, but rather the materialized view can be used to pre-calculate and save more time-consuming results such as table links or aggregates, based on actual tables that the base table actually exists. In MS, this view is an indexed view. When a DML operation occurs on the base table, the materialized view is synchronized with on demand and on commit mode refreshes. The MySQL view does not support materialized views and is virtual.

4.9, the partition table 4.9.1, the partition table summary

Partitioned tables are not available in the storage engine, so more than InnoDB supports partitioned table functionality. MYISMA,NDB, etc. are supported. MySQL partition table is a horizontal partition, not a vertical partition, MySQL partition table is a local partition index, a partition stored both data and index. The current MySQL database supports several types of partitions:
Range partition, the row data is placed into a partition based on the column values belonging to a given contiguous interval, which can only be integers. Value less than the partition that specifies the MaxValue value, primarily for the partition of the date column. For a range partition query, the optimizer can only optimize the selection for the year () To_days () To_seconds () and Unix_timestamp () functions.
The list partition is similar to range, except that the list partition is a discrete value, which can only be an integer. (VALUE in for undefined insertions, MySQL throws an exception.) When there are undefined values in the simultaneous insertion of multiple records, the MyISAM partition allows the previous row data to be inserted, while the row data after the rejection is inserted, but InnoDB treats it as a transaction rollback the entire insert.
Hash partitioning, the return value of a user-defined expression is not negative (PARTITION by HASH (expr) distributes the data evenly and can be divided by the linear hash partition, which differs in terms of the algorithm). The purpose of the hash partition is to distribute the data evenly across the predefined partitions, ensuring that the data volume of each partition is roughly the same.
Key partition, which is partitioned according to the functions provided by the MySQL database. The key partition is similar to the hash partition, except that the hash partition is a user-defined function partition, and the key partition is partitioned using the functions provided by the MySQL database.
Columns partition, mysql-5.5 begins to support columns partition, can be considered as the evolution of range and list partition, columns partition can be partitioned directly using non-shaping data. The RANGE columns partition allows you to partition the values of multiple columns.
Regardless of the type of partition, if there is a primary key and a unique index in the table, the partition column must be an integral part of the primary key or unique index. Otherwise the return is wrong.

4.9.2, sub-partitions

MySQL allows hash or key sub-partitions on the range and list partitions. The number of sub-partitions on each partition must be the same. Within each partition, the name of the child partition is unique and the partition can be placed on a different disk.

4.9.3, null values in the partition

Range,hash,key partition If you insert a null value, MySQL will put it in the leftmost partition, and if you delete the leftmost partition, the null value will not be deleted and he will record to the new leftmost partition. List partition If you do not specify a location for null values, you will get an error.

4.9.4, partitioning performance

OLTP (online transaction processing such as blogs, e-commerce, online games) systems are not suitable for partitioning tables, and partitioning tables are not recommended if disk space and disk IO do not appear to be bottlenecks. OLAP (online analytical processing, such as Data Warehouse, Data mart) is more suitable for partitioning operations.

Indexes and algorithms

Indexes and overhead are the need to find a balance, too much or too little will affect performance, resulting in excessive load, wasting hardware resources. And the index should be added at the outset, and it takes a lot of time for the DBA to monitor a large number of SQL statements after it is added.

5.1. InnoDB Storage Engine Overview

InnoDB supports two common indexes, B + Tree index and hash index. The hash index is self-adaptive and cannot be considered an intervention. B + trees are evolved from balanced binary trees, but B + trees are not a binary tree.
B + Tree does not directly find the exact row, B + Tree index can only find the page where the data row, then the database by the page read into memory, and then in memory to find.

5.2, two-part search method

The specific line in the page is found by the dichotomy method. The binary search method, invented in 1946, did not appear until 1962 for a complete and correct binary search method.

5.3. Balanced binary Tree

Balanced binary tree (left node key value < root node key value < Right node key value) First of all, according to the binary tree definition, the second must meet any node of the left and the two sub-tree height difference 1. The efficiency of balanced binary tree is higher, but it needs more resources to maintain balance two times. Used in memory-structured objects, the maintenance overhead is relatively small.

5.4. B + Tree

B + trees are evolved from the tree and indexed sequential access methods. In the B + tree, all the record nodes are the size of the key values placed in the leaf node of the same layer, each page node pointer is linked. At the same time their parent node is used only as an index node.

5.4.1, B + Tree insert operation

The B + Tree will always be balanced, but for the newly inserted value it may require a large number of splits, which consumes a lot of disk resources, so the B + tree has a rotation (rotation) function, the rotation occurs when the Leat page is full, but the left and right nodes are not full, then the B + The tree is not anxious to split the page, and the record is transferred to the sibling node of the page, usually the left sibling is checked first. The specific operation of reading.

5.4.2, B + Tree delete operation

The B + Tree uses a fill factor (fill factor) to control the deletion of the tree, and 50% is the minimum value that the fill factor can set. The delete operation of the B + tree must also ensure that the records in the deleted page node are still sorted. The specific operation of reading.

5.5. B + Tree Index

The B + Tree index has a feature in the database that is high fanout (fan out), and the B + tree is typically 2-3 levels high. The B + Tree index can be divided into a clustered index (clustered index) and a secondary clustered index (secondary index) with a B + tree within it, and the leaf node holds all the data. They are different: whether the leaf node is storing a whole line of information.
Clustered index: The data in the table is stored in the primary key order, and the clustered index constructs a B + tree according to the primary key of each table, and the leaf node holds the row record data of the whole table. The storage of a clustered index is not a physical continuation, but a logical continuation. Another benefit of this is that the sort lookup and range lookup for the primary key is very fast.

Secondary index: Also known as a nonclustered index, the leaf level does not contain all the data for the row, and the leaf node contains a bookmark in addition to the package row key value, which is the clustered index key corresponding to the row data in each leaf level.

Management of 5.5.1 and B + Tree indexes

Indexes can index the entire column of data, or you can index only the first part of a column. InnoDB Plugin supports a method called Fast index creation, which is limited to secondary indexes, and creating an index adds an S lock to the table, simply marking the space of the secondary index as available and deleting the index definition of the table on the inner view.

5.6. B + Tree Index use 5.6.1, when to use B + Tree index

A B + Tree index is most appropriate when a field has a wide range of values, with little repetition, or high selectivity. According to my experience, the optimizer does not use an index, but a full table scan, when the data is generally taken up to 20% of the total.

5.6.2, sequential read, random read and pre-read

Sequential read refers to the sequential reading of the required row data based on the leaf node data of the index, but it is possible to read logically sequentially on the physical disk or randomly.
Random reading refers to the general need to find the actual row data according to the primary key in the leaf node of the secondary index, while the secondary index and the primary key are in different data segments, so the access mode is random.
To improve read performance, the InnoDB uses a pre-read method to read the required data into memory, including random read ahead and linear pre-read linear read ahead. However, since InnoDB Plugin1.0.4, the random access pre-read is canceled, the linear pre-read is retained, and the Innodb_read_ahead_threshold parameter is added. It controls how many pages in a zone are sequentially accessed, InnoDB enables pre-read, and pre-reads all pages in the next page.

5.7. Hash index

The adaptive hash index in the InnoDB storage engine uses the data structure of the hash table. But hash lists exist not only in adaptive hashing, but in every database, to speed up the lookup of data in memory.

5.7.1 Hassi (hash table)

Hash table is also known as a hash list, improved by the direct addressing table. The hash function solves the problem of direct addressing, while using the link to solve the collision problem.

5.7.2 Adaptive Hasi Index

It is created and used by the database system itself, and the DBA itself cannot intervene. It should be noted that the hash index can only be used to search for the value of the query, for other lookups can not use the hash index. We can disable or start this attribute only with the parameter innodb_adaptive_hash_index.


Locks are a key feature of distinguishing between file systems and database systems.

6.1. What is a lock?

Locks are used to manage concurrent access to shared files. InnoDB locks the database at the row level. However, the InnoDB storage engine uses locks in many other places within the database, allowing concurrent access to different resources. For example, manipulate the LRU list in the buffer pool, delete, add, move elements in the LRU list, and in order to ensure consistency, there must be a lock involved.

6.2. InnoDB lock 6.2.1, type of lock in the storage engine

The S lock shared lock allows a transaction to read a row of data.
The X lock exclusive lock allows a transaction to delete or update a single piece of data.
is lock intent shared lock transaction wants to get a shared lock for a few rows in a table.
IX lock intention to take his affairs. Want to get an exclusive lock on a few rows in a table.
Because the InnoDB storage engine supports row-level locks, intent locks do not actually block any requests other than full-table scans.

6.2.2, consistent, non-locking read operations

Consistent non-lock read (consistent nonlocking read) refers to the innodb of data in a row of the current execution time database by means of multiple versioning (multi versioning). The mechanism of non-locking reads greatly improves the concurrency of data reading, and is the default read method in the InnoDB engine, that is, the read will not occupy and wait for the lock on the representative.
Multi-versioning is implemented through snapshots, which are actually historical versions of the current data, and may have multiple versions. This technique, known as the line-to-multi-version technology, brings concurrency control called most of this concurrency control (multi version concurrency CONTROL,MVCC).
Under Read committed and Repeatable read (INNODB default transaction isolation level), the InnoDB storage engine uses a non-locking, consistent read. However, the definition of snapshot data is different. At the read commited level, for snapshot data, non-conforming reads always read the latest snapshot of the locked line. Under the repeatable level, for snapshot data, non-conforming reads always read the row data version at the beginning of the transaction.

6.2.3, SELECT ... For UPDATE &select ... LOCK in SHARE MODE

SELECT ... The for UPDATE can obtain an X lock.
SELECT ... Lock in SHARE MODE can obtain a s lock.
Note that this must be done using the display submission method, which adds Begin,start transaction or set autocommit = 0.

6.2.4, self-growth and lock

When inserting a table with a child growth counter, the Select MAX (auto_inc_col) from T for UPDATE is executed. The insert operation will be more of this self-growing counter value plus 1 for the self-growing column. This implementation is called Auto-inc Locking. This is a special kind of lock that, in order to increase concurrency, does not release after the transaction executes, but is released immediately after the statement executes.
Starting with the mysql-5.1.22 version, the InnoDB engine provides a lightweight mutex self-growth implementation mechanism that greatly improves the performance of sub-growth value insertions. And mysql-5.1.22 starts, the InnoDB engine provides a parameter Innodb_autoinc_lock_mode, the default value is 1. Before we discuss the new growth pattern, we need to classify the self-growth implementation pattern:
1.insert-like: Refers to all INSERT statements, such as INSERT, REPLACE, insert ... SELECT, REPLACE ... Select,load data and so on.
2.Simple Insert: A statement that determines the number of inserted rows before inserting, including insert, REPLACE, not insert ... On DUPLICATE KEY Update this type of statement.
3.Bulk inserts: A statement that does not determine the insertion line before insertion. such as insert ... Select,replace ... Select,load DATA.
4.mixed-mode inserts: Refers to a portion of which is a sub-growth, and part of it is determined.
There are now simple insert, BULK inserts, Mixed-mode inserts three types of INSERT statements, with Auto-inc locking (the earliest) and lightweight mutex self-growth two kinds of auto-increment locks.
1.innodb_autoinc_lock_mode=0 5.1.22 before the way, that is, all types of inserts are used auto-inc locking.
2.innodb_autoinc_lock_mode=1 This parameter is also the default after 5.1.22, for simple insert, using a lightweight mutex lock, for bulk INSERT, use Auto-inc locking.
3.innodb_autoinc_lock_mode=2 refers to the most efficient use of lightweight, mutually exclusive locks regardless of the circumstances, but only row-basereplication is used for replication, because statement-base Replication can be problematic.
Another is InnoDB and myisam a difference, innodb, self-growth must be an index, and must be the first column of the index, or it will be an error, MyISAM will not appear this problem.

6.2.5, foreign keys and locks

Foreign keys are primarily used for referential integrity constraint checking. InnoDB, for a foreign key column, if no index is displayed for the column, InnoDB automatically adds an index to it.

6.3, the lock algorithm

1.Record Lock, locks on single-line records, locks Index Records.
2.GapLock, Gap Lock can lock a range, but does not include the record itself, such as < 6 o'clock, can still insert 6.
3.next-keylock:gap lock + record lock, locks a range and locks the record itself, such as < 6, which is blocked when inserted 6 o'clock.
In repeatable read mode, the Next-keylock algorithm is the default row record locking algorithm.

6.4, lock problem

The lock problem will result in the loss of updates, phantom reading, dirty reading, non-repeatable reading, but InnoDB author only wrote three kinds of questions, may be phantom read through InnoDB Next-key lock solved, the author did not mention. These lock issues correspond to the 4 security levels for transaction isolation:
READ UNCOMMITTED (the lowest level of transaction isolation, where transaction isolation resolves the loss of updates, but there is a problem with dirty reads).
Read commited (the default isolation level for Oracle and SQL Server, which resolves dirty reads, but a transaction that reads more than once and has a non-repeatable read problem).
Read repeatable (Repeatable read, InnoDB the default transaction isolation level of the engine, resolves non-repeatable reads, but generates Phantom reads, INNODB through Next-key Lock to resolve Phantom reads).
SERIALIZABLE (string jargon, which can reduce performance by forcing a transaction sort to solve a phantom reading problem) Overall it seems InnoDB the default READ repeatable is great.

6.5, blocking

Locks that require other transactions in the InnoDB release the resources that it locks up, and this time a lock wait occurs, which is blocking. InnoDB engine has two parameters: Innodb_lock_wait_timeout used to set the waiting time, the default is 50 seconds, which is a dynamic parameter, can be adjusted at any time; innodb_rollback_on_ Timeout is used to set whether to roll back the in-progress transaction while waiting for a timeout, which is off by default, which means no rollback, which is a static parameter.

6.6. Deadlock

Deadlocks can cause blocking, so the time-out blocking is rolled back by 6.5 of the parameters. There is the time of development, each transaction on the table, the fields, the operation of the row, are sequential, so that the deadlock can be avoided to a large extent.

Go MySQL Technology insider InnoDB Storage Engine-table & index Algorithms and Locks

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.