Creating Indexes with Inplace and Online in MySQL (InnoDB)

This article describes how InnoDB creates indexes using the Inplace and Online methods in MySQL. With InnoDB as the storage engine, the way Add Index is processed differs by MySQL version. There are three main methods:

(1) Copy Table
This is the earliest index creation method supported by InnoDB. As the name suggests, the index is built by copying the table.

Create a temporary table that contains the new index, copy all the data from the original table into the temporary table, and then rename the temporary table to replace the original, which completes the index creation.

With this method the original table remains readable while the index is being built, but it consumes roughly twice the storage space.
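To make the mechanism concrete, here is a minimal hand-rolled sketch of what the Copy Table method amounts to (the temporary table name and the idx_t1_b index are only for illustration):

  create table t1_tmp like t1;                  -- temporary table with the same structure
  alter table t1_tmp add index idx_t1_b(b);     -- define the new index on the copy
  insert into t1_tmp select * from t1;          -- copy all rows from the original table
  rename table t1 to t1_old, t1_tmp to t1;      -- swap the copy in for the original
  drop table t1_old;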

(2) Inplace method
This is how indexes are created in native MySQL 5.5 and in the InnoDB Plugin. "Inplace" means that the index is built directly on the original table, without copying it into a temporary table, which is an improvement over the Copy Table method.

With the Inplace method the original table also remains readable during index creation, but it cannot be written.

(3) Online mode
This is the index creation method introduced in MySQL 5.6.7. With both the Copy Table and Inplace methods the original table can only be read, not written, while the index is being created. To lift this restriction, InnoDB in the latest MySQL versions supports creating indexes Online.

InnoDB's Online Add Index is also built Inplace rather than through a temporary table. While the clustered index is scanned and its records are collected and inserted into the new index, records in the original table can still be modified; the modifications are stored in the Row Log. After the clustered index scan completes and all entries have been inserted into the new index, the modifications recorded in the Row Log are replayed so that the new index becomes consistent with the clustered index.

Compared with the Copy Table method, Online Add Index uses the Inplace approach, so no table copy is needed and the space overhead is reduced; at the same time, Online Add Index locks the table only while the last Block of the Row Log is replayed, which shortens the lock time.

Compared with the Inplace method, Online Add Index keeps the Inplace advantages while further reducing the time the table is locked.
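For reference, since MySQL 5.6 the desired method can also be requested explicitly in the ALTER statement, and the statement fails with an error if the server cannot honor the request instead of silently falling back (a sketch, using the idx_t1_b index from the examples below):

  -- request an in-place build that stays readable and writable (Online)
  alter table t1 add index idx_t1_b(b), algorithm=inplace, lock=none;
  -- request the old table-copy behavior, allowing reads only
  alter table t1 add index idx_t1_b(b), algorithm=copy, lock=shared;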


1. Inplace Add Index


Test table

  create table t1 (a int primary key, b int) engine=innodb;
  insert into t1 values (1,1),(2,2),(3,3),(4,4);

Inplace Add Index processing flow
SQL

  alter table t1 add index idx_t1_b(b);

Process

  sql_table.cc: mysql_alter_table();
    // Determine whether the current operation can be done Inplace.
    // Operations that cannot be done Inplace include:
    //   1. modifying an Auto Increment column;
    //   2. renaming a column;
    //   3. changing the row storage format.
    mysql_compare_tables() -> ha_innobase::check_if_incompatible_data();
  // First (main) stage of Inplace index creation
  handler0alter.cc: add_index();
    ...
    // Create the index data dictionary entry
    row0merge.c: row_merge_create_index();
      index = dict_mem_index_create();
      // Each index data dictionary entry carries a trx_id that records the
      // transaction which created the index. What is this trx_id for?
      // See the analysis below.
      index->trx_id = trx_id;
    // Scan the clustered index, build the entries of the new index,
    // sort them, and insert them into the new index
    row0merge.c: row_merge_build_indexes();
      // Scan the clustered index. Note: only non-deleted records are read;
      // all delete-marked records are skipped. Why? See the analysis below.
      row_merge_read_clustered_index();
      // Sort in the sort file
      row_merge_sort();
      // Read the sorted index entries sequentially and insert them one by
      // one into the new index
      row_merge_insert_index_tuples();
  // Wait for all open read-only transactions on this table to commit
  sql_base.cc: wait_while_table_is_used();
  // Final stage of index creation
  handler0alter.cc: final_add_index();
  // Inplace Add Index is complete

Analysis of Inplace Add Index

After the index is created, the MySQL Server can use the new index for queries immediately. However, the process above raises three questions:

Question 1: Why is a trx_id maintained in the index data dictionary, and what is it for?

Question 2: Why can delete-marked records be skipped when scanning the clustered index?
Because only non-deleted records are read, the new index carries no version information and therefore cannot serve the snapshot reads of transactions that started earlier.

Question 3: Why does the MySQL Server layer wait for the read-only transactions that have the table open to commit?
Waiting for the read-only transactions on the table guarantees that those transactions will never use the new index.

Putting these together, it is easy to see why MySQL waits for the read-only transactions that have the table open to end: the new index has no version information, so if those transactions used it they could not read the correct record versions.
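A quick way to observe this wait (a sketch, assuming MySQL 5.5 or later metadata locking; the exact state string can vary by version): keep a transaction that has read the table open, then start the ALTER in another session.

  -- session 1: an open transaction that has read t1 holds a metadata lock until commit
  begin;
  select * from t1;

  -- session 2: the ALTER blocks until session 1 commits
  alter table t1 add index idx_t1_b(b);

  -- session 1: SHOW PROCESSLIST now shows the ALTER waiting
  -- (state "Waiting for table metadata lock"); committing releases it
  commit;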

So how does InnoDB handle old transactions that were started before the index creation but have not yet committed? These transactions are not waited for, because they have not read the table up to that point. At the RR isolation level, such transactions cannot read the correct record versions through the new index, since it carries no version information.

Of course, InnoDB has considered this problem and adopts a fairly simple solution: keep a trx_id on the index that identifies the transaction which created it. If a transaction older than that transaction tries to use the newly created index for a snapshot read, an error is reported directly.

Consider the following concurrent execution (the transaction isolation level is RR):

  session 1: select * from t2;                      -- creates the global ReadView
  session 2: delete from t1 where b = 1;
  session 2: alter table t1 add index idx_t1_b(b);  -- creates the idx_t1_b index
  session 1: select * from t1 where b = 1;          -- the ReadView was taken before the
                                                    -- delete, so b = 1 should still be read

When session 1 executes the last select statement, the MySQL optimizer chooses the idx_t1_b index for the query, but that index contains no entry for b = 1, so using it would return a wrong result. How does InnoDB handle this situation?

Process:

  ha_innobase::index_init();
    change_active_index();
      // Check whether the ReadView of session 1's transaction can see the
      // transaction that created the index in session 2. Here session 2's
      // transaction is of course not visible, so prebuilt->index_usable = false
      prebuilt->index_usable = row_merge_is_index_usable(readview, index->trx_id);
    ...
  ha_innobase::index_read();
    // Check the index_usable flag; if it is false, return "table definition
    // has changed" to the upper layer
    if (!prebuilt->index_usable)
      return HA_ERR_TABLE_DEF_CHANGED;

After the MySQL Server receives the error returned by InnoDB, it reports it to the user, who sees the following:

mysql> select * from t1 where b = 1;

ERROR 1412 (HY000): Table definition has changed, please retry transaction
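The message asks the client to retry: once the old transaction is ended, the next transaction takes a fresh ReadView that can see the index creation, so the same query works again. A sketch (in this scenario the row with b = 1 was deleted by session 2, so the retried query returns nothing):

  mysql> commit;   -- or rollback; end the old transaction
  mysql> select * from t1 where b = 1;
  Empty set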

2. Online Add Index

Test table

  create table t1 (a int primary key, b int) engine=innodb;
  insert into t1 values (1,1),(2,2),(3,3),(4,4);

Online Add Index processing flow
SQL

  alter table t1 add index idx_t1_b(b);

Process

  sql_table.cc: mysql_alter_table();
    // 1. Determine whether the current DDL operation can be done Inplace
    check_if_supported_inplace_alter();
    ...
    // 2. Preparations for the Online build
    prepare_inplace_alter_table();
      ...
      // Modify the table's data dictionary information
      prepare_inplace_alter_table_dict();
        ...
        // Wait for all InnoDB background threads to stop using this table
        dict_stats_wait_bg_to_stop_using_tables();
        ...
        // The key difference between Online Add Index and Inplace Add Index:
        // during the Online build the original table can still be read and
        // written, so the modifications made in this period must be recorded
        // in the Row Log
        row0log.cc: row_log_allocate();
          row_log_t* log = (row_log_t*)&buf[2 * srv_sort_buf_size];
        // Mark the index as being created Online; DML on this index will then
        // be written into the Row Log instead of being applied to the index
        dict_index_set_online_status(index, ONLINE_INDEX_CREATION);
    ...
    // 3. The real Online Add Index work (the most important step)
    inplace_alter_table();
      // The first part of this function is basically the same as Inplace Add
      // Index: scan the clustered index, sort, and insert into the new index.
      // The big difference is that after the insert completes, Online Add
      // Index must also apply the changes recorded in the Row Log to the new index.
      row0merge.cc: row_merge_build_indexes();
        ...
        // After the clustered index has been scanned, sorted, and inserted
        // into the new index, the real difference between Online and Inplace
        // appears -- the essence of the Online operation: replaying the Row
        // Log accumulated during the build
        row0log.cc: row_log_apply();
          // Temporarily X-lock the whole index tree of the new index.
          // Note: only temporarily; the lock is not held for the entire Row
          // Log replay. (An optimization to avoid holding the lock too long;
          // how is it optimized? See below.)
          rw_lock_x_lock(dict_index_get_lock(new_index));
          ...
          // The core of InnoDB's Online processing: replay the Row Log
          // recorded during the Online build onto the new index. The algorithm:
          // 1. The Row Log records the DML applied to the original table while
          //    the index was being built Online, including ROW_OP_INSERT,
          //    ROW_OP_DELETE_MARK, ...
          // 2. The Row Log is stored in Blocks; if there is a lot of DML it may
          //    occupy several Blocks. row_log_t keeps two pointers, head and
          //    tail: head is used to read the Row Log, tail to append new Row
          //    Log entries.
          // 3. When replaying the Row Log, the algorithm minimizes the time the
          //    index tree is locked (X-locking the index tree also blocks new
          //    DML on the table). The index tree must be locked:
          //    (1) briefly, when replay crosses from one Block to the next;
          //    (2) for the whole time, when replaying the last Block. While the
          //        last Block is replayed new DML is blocked; after it, the new
          //        index is consistent with the clustered index and the replay
          //        phase ends;
          //    (3) not at all while replaying a middle Block: new DML can still
          //        run, and the Row Log it generates is appended to the last Block.
          // 4. When creating a Unique index, replaying the Row Log may hit
          //    uniqueness violations, which are recorded in dup.
          row_log_apply_ops(trx, index, &dup);
            row_log_apply_op();
              row_log_apply_op_low();
          ...
          // Set the Online Row Log of the new index to NULL, marking that the
          // new index is now exactly consistent with the clustered index.
          // From this point on, new DML no longer needs to write the Row Log.
          index->online_status = ONLINE_INDEX_COMPLETE;
          index->online_log = NULL;
          rw_lock_x_unlock(dict_index_get_lock(new_index));
          row_log_free();
    ...
    // 4. Final step of Online Add Index: finish up
    commit_inplace_alter_table();
    ...

Analysis of Online Add Index

Having walked through the basic processing flow of InnoDB 5.6.7-RC above, several questions remain:

Does Online Add Index support Unique indexes?

The exact answer is: yes (but with bugs, analyzed later). InnoDB supports creating Unique indexes Online.

With that support comes the Duplicate Key check problem: what if the Row Log contains the same key value as the index? How is such a duplicate detected?

InnoDB's solution to this problem is easy to understand. It maintains a row_merge_dup_t structure and records in it the Row Log entries that violate uniqueness during Row Log replay. After the Row Log has been fully applied, it checks whether any Unique conflicts occurred (the number of conflicts is counted); if so, the Online creation of the Unique index fails.

What is the structure of the Row Log, and how is it organized?

During the Online Add Index process, the changes generated by concurrent DML are recorded in the Row Log. First of all, the Row Log is not InnoDB's Redo Log; it is a structure private to each index that is being created Online.

Creating an index Online follows the approach of first creating the index data dictionary entry and then filling in the data. Therefore, once the index data dictionary entry has been created, new DML operations can see the index and try to update it. However, because the index is in the ONLINE_INDEX_CREATION state, these updates cannot be applied to the new index directly; they are placed in the Row Log instead, waiting to be replayed onto the index.

The Row Log stores the DML operations in Blocks. The size of a Block is controlled by the innodb_sort_buffer_size parameter and defaults to 1 MB (1048576 bytes). In the initialization phase, the Row Log allocates two such Blocks.
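A quick way to check the Block size on a given server (innodb_sort_buffer_size exists in MySQL 5.6 and later and, as far as I know, can only be changed at server startup, e.g. in my.cnf):

  mysql> show variables like 'innodb_sort_buffer_size';
  +-------------------------+---------+
  | Variable_name           | Value   |
  +-------------------------+---------+
  | innodb_sort_buffer_size | 1048576 |
  +-------------------------+---------+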

How long is the table locked during Row Log replay?

The locking problem was already mentioned in the process analysis above (internally it is implemented as a lock on the index tree of the new index).

When replaying the Row Log, the table must be locked in two cases:

Case 1: when replay of one Block finishes and moves on to the next Block, the lock is taken briefly to determine whether the next Block is the last Block of the Row Log. If it is not the last one, the lock is released once the jump completes, and user DML can continue while that Block is replayed.

Case 2: while the last Block is replayed, the lock is held the whole time and new DML is not allowed. After the last Block has been replayed, the new index is consistent with the clustered index.

Summing up the two locking cases: case 2 does hold the lock continuously, but since it only covers the last Block the lock time is short and user operations are affected only briefly; during off-peak hours this impact is acceptable.

Does Online Add Index have the same restriction as Inplace?

Because Online Add Index is also an Inplace build, it has the same problem as the Inplace method: the new index carries no version information, so it cannot serve snapshot reads for old transactions.

Moreover, the Online method is even more restrictive than the Inplace method: not only can transactions started before the index creation not use the new index, transactions started while the index was being built cannot use it either.

This stricter restriction is implemented in row0merge.cc: row_merge_read_clustered_index(). After the clustered index scan completes, the trx_id of the new index is set to the largest transaction ID seen in the Online Row Log. Once the index has been created, any transaction whose ID is smaller than this trx_id cannot use the new index.

The clustered index scan reads the latest version of each record. Could such a record also appear in the Row Log, and how does InnoDB handle that?

First of all, the answer is yes. The clustered index scan reads the latest record versions, and those records may have been modified or inserted by new transactions. They are applied to the new index during the scan phase, while the same operations are also recorded in the Row Log, so a record can exist in the new index and in the Row Log at the same time.

Of course, InnoDB has considered this. During Row Log replay, each Row Log entry is first checked against the new index to see whether the record already exists there (row0log.cc: row_log_apply_op_low()). If it does, the current Row Log entry can be skipped (or its operation type converted).

For example, if the Row Log records an INSERT and the inserted record already exists in the new index, the entry can simply be discarded (when an identical entry exists), or the INSERT can be converted into an UPDATE (when the Row Log record and the record in the new index differ in some index columns).

Is there a Bug in Online Add Index?

The answer is yes, there is a Bug. The reproduction steps are as follows:

  create table t1 (a int primary key auto_increment, b int, c char(250)) engine=innodb;
  insert into t1 (b, c) values (1, 'aaaaaaa');
  -- make sure there is enough data in the table
  insert into t1 (b, c) select b, c from t1;
  insert into t1 (b, c) select b, c from t1;
  insert into t1 (b, c) select b, c from t1;
  ...
  -- max(a) = 196591
  select max(a) from t1;
  -- make sure b contains no duplicate values
  update t1 set b = a;

  session 1: alter table t1 add unique index idx_t1_b(b);
  session 2: insert into t1 (b, c) values (196592, 'b');
  session 2: -- this update generates a duplicate of b = 196589
  session 2: update t1 set b = 196589 where a = 196582;
  session 2: delete from t1 where a = 262127;

In this test, the table is first loaded with enough data so that session 1's Online Add Index is still scanning the clustered index while session 2 runs its statements, and the scan can also read the records newly written by session 2.

After session 1's Online Add Index completes (successfully), run the following two commands:

mysql> show create table t1;

  CREATE TABLE `t1` (
    `a` int(11) NOT NULL AUTO_INCREMENT,
    `b` int(11) DEFAULT NULL,
    `c` char(250) DEFAULT NULL,
    PRIMARY KEY (`a`),
    UNIQUE KEY `idx_t1_b` (`b`)
  ) ENGINE=InnoDB AUTO_INCREMENT=262129 DEFAULT CHARSET=gbk

  mysql> select * from t1 where a in (196582, 196589);
  +--------+--------+---------+
  | a      | b      | c       |
  +--------+--------+---------+
  | 196582 | 196589 | aaaaaaa |
  | 196589 | 196589 | aaaaaaa |
  +--------+--------+---------+
  2 rows in set (0.04 sec)

As you can see, column b now has a Unique index, yet the table contains two rows with the same value b = 196589.

This Bug is caused by cases that the Row Log replay process does not fully handle. Use the feature with caution until MySQL 5.6 has stabilized!
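If you suspect this bug has left duplicates behind a supposedly Unique index, a simple check is to look for repeated values while forcing the scan through the clustered index rather than the possibly inconsistent secondary index (a sketch; FORCE INDEX (PRIMARY) is only there to keep the optimizer off idx_t1_b):

  select b, count(*) as cnt
  from t1 force index (primary)
  group by b
  having cnt > 1;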

Postscript: two file-operation functions

Reading the MySQL 5.6.7 code also turned up two useful file-operation functions. The first is posix_fadvise(): called with the POSIX_FADV_DONTNEED flag, it keeps read/write data from lingering in the page cache (see "Improving Linux performance by preserving Buffer Cache State" and unbuffered I/O in Linux). The second is fallocate(): called with the FALLOC_FL_PUNCH_HOLE flag, it deallocates a byte range within a file, punching a hole in it (see the Linux Programmer's Manual, FALLOCATE(2)). Both can be useful if you have similar requirements.

posix_fadvise() with POSIX_FADV_DONTNEED mainly discards a file's clean blocks from the page cache. So if you do not want a file to occupy too much of the file system cache, you can periodically call fdatasync() and then posix_fadvise(POSIX_FADV_DONTNEED) to drop the file's clean blocks from the cache. A handy feature!
