Mysql Custom Hash index delivers huge performance gains

Source: Internet
Author: User
Tags crc32 joins sha1 time 0

There is a business scenario that needs to be compared in 2 tables that exist in table A, and do not exist in table B data. The table structure is as follows:

T_settings_backup | CREATE TABLE ' t_settings_backup ' (  ' FID ' bigint () not NULL auto_increment,  ' Fuserid ' bigint (a) NOT null Commen T ' User ID ',  ' fdevice ' varchar (+) ' NOT null default ' COMMENT ' user device number (SN) ',  ' fappid ' varchar (UP) ' NOT NULL ' COMMENT ' app id ',  ' fkeyid ' varchar (+) NOT null default ' COMMENT ' Set item ID ',  ' fcontent ' varchar (+) NOT NULL Defaul  T ' COMMENT ' Set item contents ',  ' Fupdatetime ' datetime not NULL DEFAULT ' 1970-01-01 00:00:00 ' COMMENT ' Modified time ',  ' fcreatetime ' DateTime not NULL DEFAULT ' 1970-01-01 00:00:00 ' COMMENT ' creation time ',  PRIMARY key (' FID '),  UNIQUE KEY ' Udx_userid_devic E_appid_keyid ' (' Fuserid ', ' fdevice ', ' fappid ', ' Fkeyid ')) Engine=innodb auto_increment=21934 DEFAULT charset=utf8mb4

The table above is defined as a table, record number: 21933

Table B is structured as follows, number of records: 4794959

CREATE TABLE ' meizu_device_tmp_1 ' (  ' id ' int (one) unsigned NOT null default ' 0 ',  ' IMEI ' bigint (a) ' NOT NULL default ' 0 ' COMMENT ' imei ',  ' sn ' varchar (CHARACTER) SET UTF8 not NULL DEFAULT ' COMMENT ' sn ',  UNIQUE KEY ' imei ' (' IME I '),  UNIQUE KEY ' sn ' (' sn ')) Engine=innodb DEFAULT CHARSET=UTF8MB4

The SN of Fdevice and B of a is the associated field, and now requires the number of records that Fdevice A is not in B. Naturally think of the following left JOIN

Mysql> explain select A.fdevice from T_settings_backup A left joins Meizu_device_tmp_1 B on a.fdevice=b.sn where b.sn i s null;+----+-------------+-------+-------+---------------+-------------------------------+---------+------+--- ------+--------------------------------------+| ID | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |+----+-------------+-------+-------+---------------+-------------------------------+  ---------+------+---------+--------------------------------------+| 1 | Simple | A | Index | NULL | Udx_userid_device_appid_keyid | 654 |   NULL | 22232 |  Using Index | | 1 | Simple | B | Index | NULL | sn | 62 | NULL | 4772238 | Using where; Using index; NOT exists |+----+-------------+-------+-------+---------------+-------------------------------+---------+------ +---------+--------------------------------------+rows in Set (0.00 sec) 

  

The execution time is more than 1 hours, wait for the result to kill directly.

Analysis of the above execution plan, two tables are used to cover the index, each table has no filter conditions, so need to scan all rows, 2W times 470W is a huge number, the actuator is constantly doing the internal loop judgment, until the completion of 22232*4772238 times. In addition to this large number of cycles, there are 2 other areas to consider in this implementation plan.

1) Type=index 2) Key_len

The type=index performs only more efficiently than full-table scans and, in some cases, worse than all scans. Key_len is larger, indicating that the index is too long. The index of a table is a combined index of 4 fields, the useful comparison field is only Fdevice, in order to overwrite the index optimizer to add all fields to judge.

For Key_len There are two questions 1) Why is the key_len=654 of a watch? 2) Why is the ken_len=62 of table B?

Optimization Key_len, considering the business characteristics, Fuserid must be greater than 0, change SQL, the execution plan looks better, type=range,key_len=8, in fact, a table only uses the Fuserid field index, the leftmost prefix, Fdevice is judged by where.

Mysql> desc Select A.fdevice from T_settings_backup A left joins Meizu_device_tmp_1 B on A.FDEVICE=B.SN where A.fuserid >0 and B.SN is null;+----+-------------+-------+-------+-------------------------------+----------------------- --------+---------+------+---------+--------------------------------------+| ID | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |+----+-------------+-------+-------+-------------------------------+----------------  ---------------+---------+------+---------+--------------------------------------+| 1 | Simple | A | Range | Udx_userid_device_appid_keyid | Udx_userid_device_appid_keyid | 8 |   NULL | 11116 | Using where;  Using Index | | 1 | Simple | B | Index | NULL | sn | 62 | NULL | 4911049 | Using where; Using index; NOT exists |+----+-------------+-------+-------+-------------------------------+-------------------------------+---------+------+---------+-------- ------------------------------+rows in Set (0.01 sec)

The execution time is still very long, can't wait to kill directly.

Optimization to this step, what other tricks can improve execution performance? Seems to have come to an end.

Recalling the next two table related fields, A.FDEVICE=B.SN, two fields are strings, in the data type of consideration, naturally think, is it possible to compare the record comparison field from the string to the comparison of the number? This is a direction of optimization. The bottom of the computer in the data are 01010 this way, only need to convert the number to 0101 can do the equivalent comparison, but into the character, you need to go to the character encoding table to find the corresponding number of characters, in the number converted to 0101, here to find a step more. On the other hand, characters occupy much more space than numbers, and a page can have fewer item entries than numbers, which results in more data page reads.

According to this direction, try to use a custom hash index, the common hash function has MD5,PASSWORD,CRC32,SHA1, etc., only the CRC32 hash after the value of the digital type.

Mysql> Select MD5 (' Sdsafa '), password (' Sdsafa '), CRC32 (' Sdsafa '), SHA1 (' Sdsafa '); +------------------------------  ----+-------------------------------------------+-----------------+------------------------------------------+| MD5 (' Sdsafa ') | Password (' Sdsafa ') | CRC32 (' Sdsafa ') | SHA1 (' Sdsafa ') |+----------------------------------+-------------------------------------------+--------------- --+------------------------------------------+| c5067032ca64a35620fc5c75aa42265c | *45abb21dbd1e6a5659e05f1ebaf589a3b39eb835 | 1766538443 | B9349F6A0B8138E3E6461745FD257678EEFEB9A2 |+----------------------------------+---------------------------------- ---------+-----------------+------------------------------------------+row in Set (0.00 sec)

Add a field in the table to record the value after the hash and index the field.

mysql> CREATE TABLE ' Meizu_device_tmp_3 ' (-id ' int ') unsigned not NULL DEFAULT ' 0 ', ' IMEI ' big Int (a) NOT null default ' 0 ' COMMENT ' imei ', ' sn ' varchar ' CHARACTER SET UTF8 not null default ' COMMENT ' s n ', ' Hash_sn ' bigint () DEFAULT NULL, UNIQUE key ' Imei ' (' IMEI '), key ' Hash_sn ' (' hash_s n ') Engine=innodb DEFAULT charset=utf8mb4; Query OK, 0 rows affected (0.20 sec) mysql> insert into Meizu_device_tmp_3 select ID,IMEI,SN,CRC32 (SN) from Meizu_device _tmp_1; Query OK, 4794959 rows affected (1 min 50.31 sec) records:4794959 duplicates:0 warnings:0mysql> select * from Meizu _device_tmp_3 limit 1;+----------+---------+--------------------+------------+| ID | IMEI | sn | HASH_SN |+----------+---------+--------------------+------------+| 23930528 | 1311265 | mx21ca2alhr2460302 | 2330453935 |+----------+---------+--------------------+------------+row in Set (0.00 sec)

When querying, the associated fields are CRC32 calculated and then compared, so it becomes a comparison of numbers and numbers, and the driver table comparison fields are indexed.

But the CRC32 algorithm may have a hash collision, that is, different values hash out the result is the same, this "hit" on the. In order to avoid collisions caused by the results of inaccurate comparisons, after the hash comparison, then do a comparison of the original value.

The query statement after optimization is like this

Mysql> desc Select A.fdevice from T_settings_backup A left joins Meizu_device_tmp_3 B on CRC32 (a.fdevice) =B.HASH_SN and A.FDEVICE=B.SN where b.sn is null;+----+-------------+-------+-------+---------------+---------------------------- ---+---------+------+-------+-------------------------+| ID | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |+----+-------------+-------+-------+---------------+-------------------------------+---------+---  ---+-------+-------------------------+| 1 | Simple | A | Index | NULL | Udx_userid_device_appid_keyid | 654 | NULL | 22232 |  Using Index | | 1 | Simple | B | Ref | HASH_SN | HASH_SN | 9 |     Func | 1 | Using where; NOT exists |+----+-------------+-------+-------+---------------+-------------------------------+---------+------ +-------+-------------------------+rows in Set (0.00 sec)

Great changes were driven by the rows=1 of the table. SQL execution time 0.38 seconds.

Hash index has such a great advantage, but there are many shortcomings

1) hash can not deal with the range comparison, can only deal with equivalent comparison.

2) hash can not be ordered, hash out the results are randomly distributed.

3) hash does not support partial indexes, such as index A (10) is not supported.

4) hash cannot overwrite index

5) Hash has a collision, collision is more severe, the cost of handling collisions is relatively high.

Mysql Custom Hash index delivers huge performance gains

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.