COUNT(*) optimization in InnoDB tables in MySQL

Last Update:2017-01-13 Source: Internet

Author: User


Cause: COUNT(*) on an InnoDB table is too slow, so we looked for a way to make it faster.

Phenomenon: First, look at a few test cases, as follows.

First, the test on the sbtest table

SHOW CREATE TABLE sbtest\G
*************************** 1. row ***************************
Table: sbtest
Create Table: CREATE TABLE `sbtest` (
`aid` bigint unsigned NOT NULL auto_increment,
`id` int(10) unsigned NOT NULL default '0',
`k` int(10) unsigned NOT NULL default '0',
`c` char(...) NOT NULL default '',
`pad` char(...) NOT NULL default '',
PRIMARY KEY (`aid`),
KEY `k` (`k`),
KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1

1. Plain COUNT(*)

2. COUNT(*) with a condition on the primary key column

3. COUNT(*) with a condition on the secondary index column

EXPLAIN SELECT COUNT(*) FROM sbtest WHERE id>=0;
+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | sbtest | range | id            | id   | 4       | NULL | 500049 | Using where; Using index |
+----+-------------+--------+-------+---------------+------+---------+------+--------+--------------------------+
SELECT COUNT(*) FROM sbtest WHERE id>=0;
+----------+
| COUNT(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.43 sec)
As you can see, this query is very fast. One might ask whether that is because the id column is shorter than the aid column, so that its index can be scanned more quickly. Let's look at the following tests before jumping to conclusions.
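For reference, the three cases on sbtest can be written as the statements below. Only the third statement appears verbatim in the output above; the first two are reconstructed from the case headings, and the `aid >= 0` condition is an assumption about what the primary key test looked like.

```sql
-- Case 1: plain COUNT(*), no condition.
EXPLAIN SELECT COUNT(*) FROM sbtest;
SELECT COUNT(*) FROM sbtest;

-- Case 2: condition on the primary key column (assumed form of the test).
EXPLAIN SELECT COUNT(*) FROM sbtest WHERE aid >= 0;
SELECT COUNT(*) FROM sbtest WHERE aid >= 0;

-- Case 3: condition on the secondary index column, as shown above.
EXPLAIN SELECT COUNT(*) FROM sbtest WHERE id >= 0;
SELECT COUNT(*) FROM sbtest WHERE id >= 0;
```

Comparing the `key` column of each EXPLAIN row shows which index the optimizer picked for that case.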

Second, the test on the sbtest1 table

SHOW CREATE TABLE sbtest1\G
*************************** 1. row ***************************
Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
`aid` int(10) unsigned NOT NULL auto_increment,
`id` bigint unsigned NOT NULL DEFAULT '0',
`k` int(10) unsigned NOT NULL DEFAULT '0',
`c` char(...) NOT NULL DEFAULT '',
`pad` char(...) NOT NULL DEFAULT '',
PRIMARY KEY (`aid`),
KEY `k` (`k`),
KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1
SHOW INDEX FROM sbtest1;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| sbtest1 |          0 | PRIMARY  |            1 | aid         | A         |     1000099 |     NULL | NULL   |      | BTREE      |         |
| sbtest1 |          1 | k        |            1 | k           | A         |          18 |     NULL | NULL   |      | BTREE      |         |
| sbtest1 |          1 | id       |            1 | id          | A         |     1000099 |     NULL | NULL   |      | BTREE      |         |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

In this table the column types of aid and id are swapped (aid is now INT and id is BIGINT), and it is again filled with 1,000,000 records.

1. Plain COUNT(*)

As you can see, if no condition is added, the optimizer chooses to scan the primary key.

2. COUNT(*) with a condition on the primary key column

As you can see, even though the optimizer estimates that it only needs to scan 485,600 records (index entries, in fact), far fewer than before, it still has to do a full table (index) scan. The time taken is therefore about the same as in the first case.

3. COUNT(*) with a condition on the secondary index column

EXPLAIN SELECT COUNT(*) FROM sbtest1 WHERE id>=0;
+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type | table   | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | sbtest1 | range | id            | id   | 8       | NULL | 500049 | Using where; Using index |
+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)
SELECT COUNT(*) FROM sbtest1 WHERE id>=0;
+----------+
| COUNT(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.45 sec)

As you can see, querying this way is again very fast.

All of the tests above were run on MySQL 5.1.24, and mysqld was restarted before each query.

As you can see, even after swapping the column lengths of aid and id, querying through the secondary index is still much faster than querying through the primary key. So the speed difference comes mainly not from the length of the index being scanned, but from the difference between the primary key and a secondary index. Why is a secondary index scan faster than a primary key scan? To answer that, we need to understand the difference between InnoDB's clustered index and secondary indexes.

In InnoDB, the clustered index stores the primary key together with the full row data, whereas a secondary index is stored separately and each of its entries holds the indexed column plus the primary key value as a pointer back to the row. Counting the rows of a table therefore only requires scanning the much smaller secondary index, which is clearly faster. The primary key is most useful when rows are looked up by index and the row data itself has to be returned, for example:

SELECT * FROM sbtest WHERE aid = xxx;
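One way to see the size difference between the two kinds of index is to compare how much space they occupy. The query below is a minimal sketch, assuming both test tables live in the current default schema. For InnoDB tables, `DATA_LENGTH` approximates the space allocated to the clustered index (row data included), while `INDEX_LENGTH` is the total for all secondary indexes, so it is only an upper bound for any single one of them.

```sql
-- Approximate on-disk size of the clustered index (DATA_LENGTH)
-- versus all secondary indexes combined (INDEX_LENGTH), in bytes.
-- TABLE_ROWS is only an estimate for InnoDB tables.
SELECT TABLE_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('sbtest', 'sbtest1');
```

If `DATA_LENGTH` is much larger than `INDEX_LENGTH`, a scan of a secondary index has far fewer pages to read than a scan of the clustered index.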

Since using the secondary index is faster than using the primary key, why does the optimizer prefer to scan the primary key? Heikki Tuuri's answer is:

In the example table, the secondary index was inserted in a perfect order! That is
very unusual. Normally the secondary index would be fragmented, causing random disk I/O,
and the scan would be slower than in the primary index.
I am changing this to a feature request: keep 'clustering ratio' statistics on a secondary
index and do the scan there if the order is almost the same as in the primary index. I
doubt this feature would ever be implemented, though.
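Because the optimizer does not make this choice by itself, an index hint is another way to steer the scan to the secondary index, besides adding a condition such as `WHERE id>=0`. The statements below are a sketch and not something the original tests covered. `FORCE INDEX` is standard MySQL syntax, but whether the optimizer honors it for an unconditioned count, and whether the result is faster, depends on the server version and on how fragmented the index is, as the quote explains.

```sql
-- Ask the optimizer to use the secondary index `id` for the count.
-- Check the `key` column of the EXPLAIN output to confirm the hint took effect.
EXPLAIN SELECT COUNT(*) FROM sbtest FORCE INDEX (id);
SELECT COUNT(*) FROM sbtest FORCE INDEX (id);
```

Timing this against the plain `SELECT COUNT(*) FROM sbtest` on real data is the only reliable way to tell whether the hint helps.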

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
