Count () optimization in innodb tables in MySQL

Source: Internet
Author: User


Cause: It's too slow to make count (*) statistics on the InnoDB table, so find a way to see if it's going to be faster.

Phenomenon: First look at a few test cases, as follows

First, the test on the Sbtest table

Show CREATE TABLE Sbtest\g
1. Row ***************************
Table:sbtest
Create table:create Table ' sbtest ' (
' Aid ' bigint unsigned not NULL auto_increment,
' ID ' int (a) unsigned not NULL default ' 0 ',
' k ' int (a) unsigned not NULL default ' 0 ',
' C ' char (not NULL default '),
' Pad ' char ' not NULL default ',
PRIMARY KEY (' aid '),
KEY ' K ' (' K '),
KEY ' id ' (' ID ')
) Engine=innodb auto_increment=1000001 DEFAULT charset=latin1

Show index from Sbtest;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+-- ----+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+-- ----+------------+---------+
|          Sbtest | 0 |            PRIMARY | 1 | Aid |     A |     1000099 | NULL |      NULL | |         Btree | |
|          Sbtest | 1 |            K | 1 | K |          A |     18 | NULL |      NULL | |         Btree | |
|          Sbtest | 1 |            ID | 1 | ID |     A |     1000099 | NULL |      NULL | |         Btree | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+-- ----+------------+---------+
1 million records were filled in.

1, direct count (*)

Explain SELECT COUNT (*) from sbtest;
+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+
| ID | Select_type | table  | type  | Possible_keys | key     | Key_len | ref  | rows    | extra       |
+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+
|  1 | simple      | Sbtest | Index | null          | PRIMARY | 8       | NULL | 1000099 | Using Index |
+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------------+
SELECT COUNT (*) from sbtest;
+----------+
| COUNT (*) |
+----------+
|  1000000 |
+----------+
1 row in Set (1.42 sec)
You can see that if you do not add any conditions, then the optimizer takes primary K first EY to perform a scan.

2, COUNT (*) Use the primary key field to make the condition

Explain SELECT COUNT (*) from Sbtest WHERE aid>=0;
+----+-------------+--------+-------+---------------+---------+---------+------+--------+---------------------- ----+
| ID | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+--------+---------------------- ----+
| 1 | Simple | Sbtest | Range | PRIMARY | PRIMARY | 8 | NULL | 485600 | The Using where; Using Index |
+----+-------------+--------+-------+---------------+---------+---------+------+--------+---------------------- ----+
SELECT COUNT (*) from Sbtest WHERE aid>=0;
+----------+
| COUNT (*) |
+----------+
| 1000000 |
+----------+
1 row in Set (1.39 sec)
As you can see, even though the optimizer thinks it only needs to scan 485,600 records (in fact, it's an index), it's a lot less than just now, but it still has to do a full table (index) scan. Therefore time consuming and the first kind is quite.

3, COUNT (*) Use the secondary index field to make the condition

Explain SELECT COUNT (*) from Sbtest WHERE id>=0;
+----+-------------+--------+-------+---------------+------+---------+------+--------+------------------------- -+
| ID | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |
+----+-------------+--------+-------+---------------+------+---------+------+--------+------------------------- -+
| 1 | Simple | Sbtest | Range | ID | ID | 4 | NULL | 500049 | The Using where; Using Index |
+----+-------------+--------+-------+---------------+------+---------+------+--------+------------------------- -+
SELECT COUNT (*) from Sbtest WHERE id>=0;
+----------+
| COUNT (*) |
+----------+
| 1000000 |
+----------+
1 row in Set (0.43 sec)
As you can see, querying in this way can be very fast. One might ask, would it be because the length of the ID field is smaller than the length of the aid field, causing it to scan more quickly? Let's take a look at the following test examples before jumping to conclusions.

Second, the test on the Sbtest1 table

Show CREATE TABLE Sbtest1\g
1. Row ***************************
Table:sbtest1
Create table:create Table ' Sbtest1 ' (
' Aid ' int (a) unsigned not NULL auto_increment,
' ID ' bigint unsigned not NULL DEFAULT ' 0 ',
' k ' int (a) unsigned not NULL DEFAULT ' 0 ',
' C ' char (not NULL DEFAULT '),
' Pad ' char ' not NULL DEFAULT ',
PRIMARY KEY (' aid '),
KEY ' K ' (' K '),
KEY ' id ' (' ID ')
) Engine=innodb auto_increment=1000001 DEFAULT charset=latin1
Show index from Sbtest1;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+- -----+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+- -----+------------+---------+
|          Sbtest1 | 0 |            PRIMARY | 1 | Aid |     A |     1000099 | NULL |      NULL | |         Btree | |
|          Sbtest1 | 1 |            K | 1 | K |          A |     18 | NULL |      NULL | |         Btree | |
|          Sbtest1 | 1 |            ID | 1 | ID |     A |     1000099 | NULL |      NULL | |         Btree | |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+- -----+------------+---------+

This table, the aid and ID of the field length of the exchange, but also filled 10 million records.

1, direct count (*)

Explain SELECT COUNT (*) from Sbtest1;
+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+
| ID | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+
| 1 | Simple | Sbtest1 | Index | NULL | PRIMARY | 4 | NULL | 1000099 | Using Index |
+----+-------------+---------+-------+---------------+---------+---------+------+---------+-------------+
SELECT COUNT (*) from Sbtest1;
+----------+
| COUNT (*) |
+----------+
| 1000000 |
+----------+
1 row in Set (1.42 sec)

As you can see, if you do not add any conditions, then the optimizer takes the primary key first to scan.

2, COUNT (*) Use the primary key field to make the condition

Explain SELECT COUNT (*) from Sbtest1 WHERE aid>=0;
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------- -----+
| ID | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------- -----+
| 1 | Simple | Sbtest1 | Range | PRIMARY | PRIMARY | 4 | NULL | 316200 | The Using where; Using Index |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+--------------------- -----+
1 row in Set (0.00 sec)
SELECT COUNT (*) from Sbtest1 WHERE aid>=0;
+----------+
| COUNT (*) |
+----------+
| 1000000 |
+----------+
1 row in Set (1.42 sec)

As you can see, even though the optimizer thinks it only needs to scan 485,600 records (in fact, it's an index), it's a lot less than just now, but it still has to do a full table (index) scan. Therefore time consuming and the first kind is quite.

3, COUNT (*) Use the secondary index field to make the condition

Explain SELECT COUNT (*) from Sbtest1 WHERE id>=0;
+----+-------------+---------+-------+---------------+------+---------+------+--------+------------------------ --+
| ID | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |
+----+-------------+---------+-------+---------------+------+---------+------+--------+------------------------ --+
| 1 | Simple | Sbtest1 | Range | ID | ID | 8 | NULL | 500049 | The Using where; Using Index |
+----+-------------+---------+-------+---------------+------+---------+------+--------+------------------------ --+
1 row in Set (0.00 sec)
SELECT COUNT (*) from Sbtest1 WHERE id>=0;
+----------+
| COUNT (*) |
+----------+
| 1000000 |
+----------+
1 row in Set (0.45 sec)

As you can see, querying in this way can be very fast.

All of the tests above are passed in the MySQL 5.1.24 environment, and the mysqld is restarted before each query.

Can be seen, the length of the aid and ID exchange, the use of secondary index query is still more than using primary key query to come a lot faster. It seems that the main is not the length of the index scan caused by speed, but the use of primary key and secondary index caused by the difference. So why is it faster to use the secondary index scan than the primary key scan? We need to understand the difference between InnoDB's clustered index and secondary index.

InnoDB's clustered index is the primary key and row data are stored together, and secondary index is stored separately, and then there is a pointer to the primary key. As a result, the number of count (*) tab records is required to be scanned with secondary index, apparently faster. While primary key is mainly in the scan index, and to return the results of the record when the role of a larger, for example:

SELECT * from sbtest WHERE aid = XXX;

Since the use of secondary index will be faster than primary key, why the optimizer is preferred to select primary key to scan it, Heikki Tuuri answer is:

In the example table, the secondary index was inserted into a perfect order! The is
very unusual. Normally the secondary index would be fragmented, causing random disk I/O,
and the scan would to slower in the PR Imary index.
I am Changing this to a feature request:keep ' clustering ratio ' statistics on a secondary
index and do the scan There if is almost the same as in the primary index. I
Doubt this feature would ever be implemented, though.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.