A tutorial on using inplace and online to create an index under Mysql _mysql

Source: Internet
Author: User
Tags create index readable table definition

MySQL versions, for the Add index processing is different, there are three main types:

(1) Copy table method
This is the earliest supported way to create an index INNODB. As the name suggests, creating indexes is done through temporary table copies.

Create a temporary table with a new index, copy all of the original table data to a temporary table, and then rename to complete the index creation operation.

This way the index is created, and the original table is readable during the creation process. But it consumes up to a single storage space.

(2) InPlace mode
This is native MySQL 5.5, and the way the index is created in Innodb_plugin. The so-called inplace, that is, index creation on the original table directly, does not copy temporary tables. This is an improvement relative to the Copy table method.

The InPlace method creates an index, in which the original table is equally readable, but not writable.

(3) Online mode
This is the way to create an index provided in the MySQL 5.6.7. Whether it's a copy table method or a InPlace method, the original table can only be read and not writable during the index creation process. There is a large limit to the application, so in the latest version of MySQL, InnoDB supports the so-called online way to create an index.

InnoDB's online ADD index is the first way to create an index inplace, without using a temporary table. In the process of traversing a clustered index, collecting records and inserting them into a new index, the original table record can be modified. The modified record is saved in row log. When the clustered index is traversed and all inserted into the new index, the record changes in row log are replayed so that the new index and the clustered index record are in a consistent state.

Compared to the Copy table, the online add index uses the InPlace method, eliminating the need for copy table to reduce space overhead; Meanwhile, the online add index only locks the table when the last block of row log is replayed. Reduced the time to lock the table.

Compared with the inplace approach, Online ADD Index absorbs the advantages of the inplace approach, but reduces the time to lock the table.

1.Inplace Add Index

Test table

  CREATE TABLE t1 (a int primary key, b int) Engine=innodb;

  INSERT into T1 values (1,1), (2,2), (3,3), (4,4);

InPlace ADD Index Processing process

  ALTER TABLE T1 ADD index idx_t1_b (b);

Processing process

  Sql_table.cc::mysql_alter_table (); To determine whether the current operation can be implemented InPlace, inplace alter is not allowed to include://1. Auto increment field modification;//2. Rename named;//3.

    Row storage format modification; Mysql_compare_tables ()-> ha_innobase::check_if_incompatible_data ();

      InPlace CREATE index first stage (Main Stage) Handler0alter.cc::add_index ();

        ..//CREATE Index data dictionary row0merge.c::row_merge_create_index ();

        index = Dict_mem_index_create ();

         On each index data dictionary, there is a trx_id that records what the transaction//This trx_id does to create this index, and then looks down index->trx_id = trx_id;

            Read the clustered index, construct the item for the new index, sort and insert the new index row0merge.c::row_merge_build_indexes (); Read the clustered index, note: Read only the non-delete items//Skip all Deleted Items, why do you do this?

            Look down Row_merge_read_clustered_index ();

            File sort Row_merge_sort ();

    Sequentially reads the index entries in the sort file, inserts one by one into the new index row_merge_insert_index_tuples ();

    Waits for all read-only transactions to open the current table to be submitted sql_base.cc::wait_while_table_is_used (); Create index end, do final cleanup work Handler0alter.cc::final_add_index ();

 InPlace Add index Complete

InPlace ADD Index Implementation Analysis
After the index creation is complete, MySQL server can immediately use the new index to make the query. However, according to the above process, for me personally, there are three points of doubt:

Why do I need to maintain a trx_id in the index data dictionary?
What's the role of trx_id?

When traversing a clustered index to read all records, why can I skip deleting items?
Read-only non-deletion, there is no version information on the new index, unable to process the snapshot read of the original transaction;

MySQL server layer, why do I need to wait for a read-only transaction to open the table?
Waits for read-only transactions on the current table to ensure that these transactions are not used to the new index

Based on the analysis, it is better to wait for read-only transactions to open the table. Because there is no version information on the new index, the correct version record will not be read if these transactions use the new index.

So how does InnoDB deal with other old transactions that started before the index was created, but have not yet been submitted? These transactions will not be waiting to end because the current table was not read earlier. These transactions are not able to read the correct version record under the RR isolation level because there is no version information on the index used.

Of course, InnoDB also takes this into account and uses a more concise approach. Maintains a trx_id on the index identifying the transaction ID that created the index. If you have a transaction older than this, you intend to use the newly created index to read the snapshot, then you can make a direct error.

Consider the following concurrency process (transaction isolation level is RR):

Session 1: Session                               2:

//The global Readview

select * from T2 is created at this time;

                                       Delete from T1 where B = 1;

                                       On the Idx_t1_b index, there is no item

                                       ALTER TABLE T1 add index Idx_t1_b (b) for b = 1;

Because Readview gets

//So B = 1 before delete this item should be read to

select * from t1 where B = 1;

When session 1 executes the last select, the MySQL optimizer selects the Idx_t1_b index for the query, but the index does not have an entry for B = 1, which results in a query error. So, how does InnoDB deal with this situation?

Processing process:

... Ha_innobase::index_init ();

  Change_active_index ();

    To determine if the Readview of Session 1 transactions can see the transaction that the session 2 creates an index

    //Here, Session 2 transactions are certainly not visible, then Prebuilt->index_usable = False

    prebuilt->index_usable = row_merge_is_index_usable (Readview, index->trx_id);

... Ha_innobase::index_read ();

  The Index_usable property is judged, false, and the upper table definition modification is returned, and the query fails

  if (!prebuilt->index_usable) return

    ha_err_table_def_ CHANGED;

After the MySQL server receives the error returned by InnoDB, the error is reported to the user and the user receives the following error:

Mysql> SELECT * from t1 where B = 1;
ERROR 1412 (HY000): Table definition has changed, please retry transaction

2.Online Add Index

Test table

  CREATE TABLE t1 (a int primary key, b int) Engine=innodb;

  INSERT into T1 values (1,1), (2,2), (3,3), (4,4);

Online ADD Index Processing process

  ALTER TABLE T1 ADD index idx_t1_b (b);

Processing process

  Sql_table.cc::mysql_alter_table (); 1.

      Determines whether the current DDL operation can be inplace for Check_if_supported_inplace_alter (); ...//2.

      Start the preparatory work for online creation prepare_inplace_alter_table ();

        ...//modify the table data dictionary information prepare_inplace_alter_table_dict ();

        .../wait for InnoDB all background threads to stop the operation of this table dict_stats_wait_bg_to_stop_using_tables (); ...//online ADD index difference with inplace add index key//When the online operation, the original table can read and write, so you need///The modification operation in this procedure is recorded to row lo

          G among row0log.cc::row_log_allocate ();

          row_log_t* log = (row_log_t*) &buf[2 * Srv_sort_buf_size]; Identifies the current index status to online creation, then the//DML operation on this index is written to row Log instead of being updated on the index dict_index_set_online_status (index, Onli

      Ne_index_creation); ...//3.

      Start the operation of the real online ADD index (the most important process) inplace_alter_table (); The operation of this function, the first part and InPlace ADD index are basically consistent//read clustered indexes, sorted, and inserted into the new index//The biggest difference is that when the insert completes, Online ADD index//also needs

 Changes the record in row log to a new index     Row0merge.cc::row_merge_build_index (); ...///after the cluster index reads, sorts, inserts the new index the operation completes/enters the online and inplace real difference, also is the online operation//The essence part--will this process produces the row log heavy

          With Row0log.cc::row_log_apply (); Temporarily lock new index entire index tree//NOTE: Only temporary lock, not in the whole process of reusing row Log//Lock (prevent the lock time too long optimization, how to optimize?)

            ) Rw_lock_x_lock (Dict_index_get_lock (New_index)); 

          ...///InnoDB online operation The most important process//the algorithm for replaying row log in the online Copy table, recorded row log replay to new index//is as follows: 1. Record in Row log is online during the index creation, the DML operations on the original table//These operations include: Row_op_insert;row_op_delete_mark; ...//2. Row log is stored as block, and if there is more DML, then row logs may//may occupy multiple blocks. The row_log_t structure contains two pointers: the head and tail//head pointers are used to read the row log,tail pointer for appending a new row log; When reusing row log, the algorithm follows a primitive Then: Minimize the time that the index tree locks//(the index tree plus the x lock, also means that the new DML operation is prohibited on the table)//The scene in which the index tree needs to be locked://(a) reusing row log across the new Blo A short lock is required for CK;With the row Log block being the last blocks, the last blocks are always locked, and because the new DML operation is blocked, the new index record is consistent with the clustered index. State,//reuse phase end;//(c) When the row log on the middle row log block is applied, the new//DML operation can still be performed without locking, and the resulting R ow log is recorded on the last//ROW log block;//4. If you are creating a unique index, you may see//Violate uniqueness constraints when applying row log, which is recorded in//row_merge_dup_t structure Row_

            Log_apply_ops (TRX, Index, &dup);

              Row_log_apply_op ();

                Row_log_apply_op_low (); ...///Set the online row log for new index to NULL,//identifies the new index with the data already in full agreement with the clustered index//After this, the next DML operation without logging R

            ow Log dict_index_set_online_status ();

          Index->online_status = Online_index_complete;

          Index->online_log = NULL;

          Rw_lock_x_unlock (Dict_index_get_block (New_index));

      Row_log_free (); ...//4. The final step of Online Add index, do some follow-up finishing work Commit_inplace_alter_table ();


Online ADD Index Implementation Analysis
After reviewing the basic processing process implemented in the previous analysis of the InnoDB 5.6.7-RC version, the individual remains a few problems, the main problems are:

Does Online ADD index support a unique index?

The exact answer is: support (though there are bugs, back analysis). InnoDB supports online to create a unique index.

Now that you are supported, you will face the problem of check Duplicate key. What if the same key value in the index is handled in Row log? How do I detect the existence of the same key value?

InnoDB solution to this problem is also relatively easy to understand. It maintains a row_merge_dup_t data structure that stores a row log that violates the uniqueness violation encountered during row log replay. After the row log is applied, the external judge whether there is a unique conflict (how many unique conflicts are recorded) and the online creation of the unique index failed.

What is the structure of Row log, and how is it organized?

In the online ADD index process, the changes that are generated by DML are recorded in row log. First, Row log is not the InnoDB redo log, but the exclusive structure of each index that is being created online.

Online creates an index that follows the way in which you first create an indexed data dictionary and then populate the data. Therefore, when the index data dictionary is successfully created, the new DML operation can read the index and attempt an update. However, because the status state on the index structure is online_index_creation, these updates cannot be applied directly to the new index, but instead are placed in row log, waiting to be replayed on top of the index.

In Row log, manages the storage of the DML operation content in block mode. The size of a block is controlled by the parameter innodb_sort_buffer_size and the default size is 1M (1048576). In the initialization phase, ROW log requests two such blocks.

How long does it take to lock the table during row log replay?

In the previous process analysis, the problem of locking the table was also mentioned (the operation of the new index tree for the lock was implemented internally).

When you replay row log, there are two situations where you need to lock the table:

Situation one: After using a block, jump to the next block, you need to lock the table briefly to determine whether the next block is row log the last block. If it is not the last one, the jump completes, release the lock, use the row log in the block without locking, user DML operation can still proceed.

Case two: When using the last block, locks are held all the time. The new DML operation is not allowed at this time. After the last block replay is completed, the new index and the clustered index record are in a consistent state.

Comprehensive analysis of two lock table situation, the situation will continue to lock the table, but because it is only the last block, so the lock table time is also shorter, only briefly affect the user operation, in the low peak period, this effect is acceptable.

3. Does the Online ADD index have the same limitations as the InPlace method?

Because the online ADD index is also a inplace way, there are also problems with the online approach: the lack of version information on the new index makes it impossible to provide snapshot reading for old transactions.

More than that, the online approach is more restrictive than the inplace approach, not only for all transactions that are less than the creation of this index, but for all transactions that start during the creation of the new index, and the new index is not available.

This enhanced restriction, adjusted in the Rowmerge.cc::row_merge_read_clustered_index () function, assigns the trx_id of the new index to the largest transaction ID in the online row log after the clustered index traversal completes. All transactions that are less than this transaction ID are not allowed to use the new index after the completion of the index creation.

When reading data by traversing a clustered index, the current version of the record is read, so does this record exist in row log? InnoDB How to deal with this situation?

First of all, the answer is yes. When you traverse a clustered index to read the latest version of a record, these records may be new transactions modified/inserted. These records are in the traversal phase and have been applied to the new index, at the same time, the operations of these records are also recorded in row log, and a record exists in the new index, as well as in row log.

Of course, InnoDB has taken the issue into account. In the process of replaying row log, for each record in row log, it is first judged whether it already exists in the new Index (Row0log.c::row_log_apply_op_low ()), and if so, the current row log can be skipped (or the action type is converted )。

For example, the record in row log is an insert operation, and if the insert record already exists in the new index, the record in row log can be dropped directly (if the entry is exactly the same as the Insert entry), or the insert is converted to an update action (Row Log records in the new index, and some of the index columns are different);

Is there a bug in Online ADD index?

The answer is also yes, there are bugs.

There is a bug in which the recurrence scenario is as follows:

CREATE TABLE t1 (a int primary key, b int, C char ()) Engine=innodb;

INSERT into T1 (b,c) VALUES (1, ' aaaaaaa ');

Ensure enough data to

insert into T1 (b,c) select b,c from T1;

INSERT into T1 (b,c) select b,c from T1;

INSERT into T1 (b,c) select b,c from T1;

... Max (a) = 196591

Select Max (a) from T1;

There is also no same item in B

Update T1 set B = A;

Session 1 session                                   2

ALTER TABLE T1 add unique index idx_t1_b (b);

                                           INSERT into T1 (b,c) VALUES (196592, ' B ');

                                           This update produces b=196589 duplicates

                                           update T1 set b=196589 where a=196582;

                                           Delete from t1 where a = 262127;

In the above test, first prepare enough data for the table, the purpose is session 1 to do online ADD index reading cluster indexing stage, the sessions 2 new records can be read.

After session 1 's online ADD index completes (successfully), the following two commands are executed, with the following results:

Mysql> Show CREATE table T1;

+ ——-+ ———————————————— –

| Table | Create Table

+ ——-+ ————————————————-

| t1 | CREATE TABLE ' t1 ' (

' a ' int (one) not null auto_increment,

' b ' int (one) default null,

' C ' char () default NULL,

PRIMARY KEY (' a '),

UNIQUE KEY ' Idx_t1_b ' (' B ')

) engine=innodb auto_increment=262129 DEFAULT CHARSET=GBK |

+ ——-+ ———————————————— –

mysql> select * T1 where a In (196582,196589);

+--–+--–+ ——— +

| a | b | c |

+--–+--–+ ——— +

| 196582 | 196589 |


| 196589 | aaaaaaa | +--–+--–+ ——— +

2 rows in Set (0.04 sec)

As you can see, there is already a unique index on B, but there are two values in the table that have the same value of 196589.

This bug, which handles the replay of row log, is not considered in detail. Therefore, before the MySQL 5.6 version of stability, use caution!

Online ADD Index can be used for reference
In the MySQL 5.6.7 to learn two file operation functions: First, the posix_fadvise () function, specify Posix_fadv_dontneed parameters, can be read and write does not cache:improving Linux performance by Preserving Buffer Cache State unbuffered I/O in Linux, and the Fallocate () function, which specifies the Falloc_fl_punch_hole parameter, can be emptied when read: Linux Programmer ' s Manual fallocate (2) has similar needs for friends that can be tried.

posix_fadvise function +posix_fadv_dontneed parameters, the main function is to discard files in the cache of clean blocks. Therefore, if the user does not want a file to occupy too much of the file system cache, you can periodically call Fdatasync (), and then Posix_fadvise (posix_fadv_dontneed), empty the file in the cache of clean blocks, Good function!

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.