Optimization of innodb table count () in mysql

Source: Internet
Author: User
Translator: ye Jinrong (imysql # imysql.com), source: imysql.com, please reprint it. Cause: It is too slow to make count (*) Statistics on the innodb table, so try to see if you can hurry up. Symptom: Let's take a look at several test cases. 1. Test showcreatetablesbtestG on the sbtest table ***********

Ye Jinrong (imysql # imysql.com), source: http://imysql.com, welcome to reprint. Cause: It is too slow to make count (*) Statistics on the innodb table, so try to see if you can hurry up. Symptom: Let's take a look at several test cases. 1. Test the show create table sbtest \ G ************ on the sbtest table ***********

Ye Jinrong (imysql # imysql.com>), source: http://imysql.com, welcome to reprint.

Cause: It is too slow to make count (*) Statistics on the innodb table, so try to see if you can hurry up.
Symptom: Let's take a look at several test cases as follows:
I. Test the sbtest table

show create table sbtest\G*************************** 1. row ***************************Table: sbtestCreate Table: CREATE TABLE `sbtest` (`aid` bigint(20) unsigned NOT NULL auto_increment,`id` int(10) unsigned NOT NULL default '0',`k` int(10) unsigned NOT NULL default '0',`c` char(120) NOT NULL default '',`pad` char(60) NOT NULL default '',PRIMARY KEY  (`aid`),KEY `k` (`k`),KEY `id` (`id`)) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1show index from sbtest;+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| sbtest |          0 | PRIMARY  |            1 | aid         | A         |     1000099 |     NULL | NULL   |      | BTREE      |         || sbtest |          1 | k        |            1 | k           | A         |          18 |     NULL | NULL   |      | BTREE      |         || sbtest |          1 | id       |            1 | id          | A         |     1000099 |     NULL | NULL   |      | BTREE      |         |+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

1 million records are filled.
1. Direct count (*)

explain SELECT COUNT(*) FROM sbtest;+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows    | Extra       |+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+|  1 | SIMPLE      | sbtest | index | NULL          | PRIMARY | 8       | NULL | 1000099 | Using index |+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+SELECT COUNT(*) FROM sbtest;+----------+| COUNT(*) |+----------+|  1000000 |+----------+1 row in set (1.42 sec)

We can see that if no conditions are added, the optimizer uses the primary key for scanning first.
2. count (*) uses the primary key field as a condition

explain SELECT COUNT(*) FROM sbtest WHERE aid>=0;+----+-------------+--------+-------+---------------+---------+---------+------+--------+--------------------------+| id | select_type | table  | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |+----+-------------+--------+-------+---------------+---------+---------+------+--------+--------------------------+|  1 | SIMPLE      | sbtest | range | PRIMARY       | PRIMARY | 8       | NULL | 485600 | Using where; Using index |+----+-------------+--------+-------+---------------+---------+---------+------+--------+--------------------------+SELECT COUNT(*) FROM sbtest WHERE aid>=0;+----------+| COUNT(*) |+----------+|  1000000 |+----------+1 row in set (1.39 sec)

As you can see, although the optimizer thinks that it only needs to scan 485600 records (actually an index), it is much less than just now, but it still needs to perform full table (INDEX) scanning. Therefore, the time consumption is equivalent to the first one.

3. count (*) uses the secondary index field as a condition.

explain SELECT COUNT(*) FROM sbtest WHERE id>=0;+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+|  1 | SIMPLE      | sbtest | range | id            | id   | 4       | NULL | 500049 | Using where; Using index |+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+SELECT COUNT(*) FROM sbtest WHERE id>=0;+----------+| COUNT(*) |+----------+|  1000000 |+----------+1 row in set (0.43 sec)

We can see that this query method is very fast.
Someone may ask, is it because the length of the id field is smaller than that of the aid field, which leads to faster scanning? Let's take a look at the following test example.
2. Test on sbtest1 table

show create table sbtest1\G*************************** 1. row ***************************Table: sbtest1Create Table: CREATE TABLE `sbtest1` (`aid` int(10) unsigned NOT NULL AUTO_INCREMENT,`id` bigint(20) unsigned NOT NULL DEFAULT '0',`k` int(10) unsigned NOT NULL DEFAULT '0',`c` char(120) NOT NULL DEFAULT '',`pad` char(60) NOT NULL DEFAULT '',PRIMARY KEY (`aid`),KEY `k` (`k`),KEY `id` (`id`)) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1show index from sbtest1;+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| Table   | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+| sbtest1 |          0 | PRIMARY  |            1 | aid         | A         |     1000099 |     NULL | NULL   |      | BTREE      |         || sbtest1 |          1 | k        |            1 | k           | A         |          18 |     NULL | NULL   |      | BTREE      |         || sbtest1 |          1 | id       |            1 | id          | A         |     1000099 |     NULL | NULL   |      | BTREE      |         |+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

In this table, the length of the aid and id fields is changed and 10 million records are filled.
1. Direct count (*)

explain SELECT COUNT(*) FROM sbtest1;+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows    | Extra       |+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+|  1 | SIMPLE      | sbtest1 | index | NULL          | PRIMARY | 4       | NULL | 1000099 | Using index |+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+SELECT COUNT(*) FROM sbtest1;+----------+| COUNT(*) |+----------+|  1000000 |+----------+1 row in set (1.42 sec)

We can see that if no conditions are added, the optimizer uses the primary key for scanning first.
2. count (*) uses the primary key field as a condition

explain SELECT COUNT(*) FROM sbtest1 WHERE aid>=0;+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+|  1 | SIMPLE      | sbtest1 | range | PRIMARY       | PRIMARY | 4       | NULL | 316200 | Using where; Using index |+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------------+1 row in set (0.00 sec)SELECT COUNT(*) FROM sbtest1 WHERE aid>=0;+----------+| COUNT(*) |+----------+|  1000000 |+----------+1 row in set (1.42 sec)

As you can see, although the optimizer thinks that it only needs to scan 485600 records (actually an index), it is much less than just now, but it still needs to perform full table (INDEX) scanning. Therefore, the time consumption is equivalent to the first one.

3. count (*) uses the secondary index field as a condition.

explain SELECT COUNT(*) FROM sbtest1 WHERE id>=0;+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+| id | select_type | table   | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+|  1 | SIMPLE      | sbtest1 | range | id            | id   | 8       | NULL | 500049 | Using where; Using index |+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+1 row in set (0.00 sec)SELECT COUNT(*) FROM sbtest1 WHERE id>=0;+----------+| COUNT(*) |+----------+|  1000000 |+----------+1 row in set (0.45 sec)

We can see that this query method is very fast.
All tests above are passed in mysql 5.1.24 environment, and mysqld is restarted before each query.
As you can see, after changing the length of aid and id, the use of secondary index query is still much faster than the use of primary key query. It seems that the index scanning speed is not caused by the field length, but the difference caused by the use of primary key and secondary index. So why is scanning with secondary index faster than scanning with primary key? What about innodb?Clustered index? AndSecondary index? .
The clustered index of innodb stores the primary key and row data together, while the secondary index stores the data separately, and then a pointer points to the primary key. Therefore, it is faster to use secondary index to scan the count (*) Statistical table records. Primary key is mainly used to scan indexes and return results records. For example:

SELECT * FROM sbtest WHERE aid = xxx;

Since secondary index is faster than primary key, why does the optimizer prioritize primary key scanning,Heikki Tuuri? The answer is:

in the example table, the secondary index is inserted into in a perfect order! That isvery unusual. Normally the secondary index would be fragmented, causing random disk I/O,and the scan would be slower than in the primary index.I am changing this to a feature request: keep 'clustering ratio' statistics on a secondaryindex and do the scan there if the order is almost the same as in the primary index. Idoubt this feature will ever be implemented, though.

For details, see this bug and this article: InnoDB Row Counting using Indexes.
Finally, I would like to thank Lao Yang for his help.

Technical problems :?

MySQL Optimization

InnoDB

Related Articles

  • About the ID in MySql explain
  • General mysql optimization steps (Tutorial)
  • [Basic knowledge of storage engine] six major differences between InnoDB and MyISAM
  • MySQL technical Insider: InnoDB Storage-3.6 InnoDB Storage engine files
  • Handler_read _ * of MySQL _*

Original article address: optimization of the innodb table count () in mysql. Thank you for sharing it with the original author.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.