Optimizing the COUNT(*) Function in MySQL's InnoDB Storage Engine

Source: Internet
Author: User
MyISAM and InnoDB are the two MySQL storage engines most commonly used in everyday development. One difference between them is how the COUNT(*) function computes the number of rows in a table.


Before writing this article I read a lot about database optimization. Most of it covered indexes, transactions, SELECT tuning and so on, but I had little hands-on practice: without a real project, it all stayed on paper. Practice is the sole criterion of truth, so I wanted to test some performance optimizations, such as indexes, on a real database. I did not want to use fake data, so I decided to collect some data from the Internet for analysis. I crawled the data with PHP (a blog post about that is coming soon) and collected a large batch of user records. Naturally I wanted to know exactly how many rows I had, so I ran the following SQL statement (the table uses the InnoDB storage engine):

SELECT COUNT(*) FROM tbl_name;

However, the result took 14 to 20 seconds to come back.

In a real application, a delay like that makes for a very poor user experience. Imagine opening a page and waiting 20 seconds to see the data; even 3 seconds would be far too slow. So I set out to optimize it.

Storage Engine


MyISAM stores the exact number of rows in each table, so when this statement runs against a MyISAM table, the engine simply reads the saved count. If the table does not need transactions or the other features MyISAM lacks, this is the simplest optimization available (the shortcut applies only to COUNT(*) without a WHERE clause). InnoDB, however, does not store the exact row count, because concurrent transactions can see different numbers of rows at the same time. When this statement runs against an InnoDB table, InnoDB therefore has to scan the table to count the rows.
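To see why a single stored counter does not work for InnoDB, here is a minimal Python sketch of multiversion row visibility. It is a deliberate simplification, not InnoDB's actual read-view logic: each row version records the hypothetical IDs of the transactions that inserted and deleted it, and a reader is assumed to see every change made by transactions with an ID up to its snapshot ID.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    created_by: int                   # hypothetical ID of the inserting transaction
    deleted_by: Optional[int] = None  # hypothetical ID of the deleting transaction, if any


def visible_count(rows: List[RowVersion], snapshot_id: int) -> int:
    """Count rows visible to a reader that sees every transaction with
    ID <= snapshot_id (simplified rule; real InnoDB read views also track
    which transactions were still active when the snapshot was taken)."""
    count = 0
    for row in rows:
        inserted = row.created_by <= snapshot_id
        deleted = row.deleted_by is not None and row.deleted_by <= snapshot_id
        if inserted and not deleted:
            count += 1
    return count


table = [
    RowVersion(created_by=1),
    RowVersion(created_by=2),
    RowVersion(created_by=2, deleted_by=4),
    RowVersion(created_by=5),
]

for snapshot in (1, 4, 5):
    print(snapshot, visible_count(table, snapshot))
```

The same four row versions give three different answers (1, 2 and 3) depending on the reader's snapshot, so there is no single number the engine could cache the way MyISAM does.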

Query Optimization Command: EXPLAIN

To understand query performance, you first need to find the bottleneck that makes a query slow. In the output of the EXPLAIN command, the rows column is the key performance indicator: it is MySQL's estimate of how many rows it must examine to execute the statement. A large rows value means MySQL has to scan many rows, and statements with a large rows value are usually slow. Query optimization therefore largely comes down to reducing rows.

First, let's look at the EXPLAIN output for the statement above:

The output shows that MySQL scans the entire table to execute this query.

Strange

When I designed the table, I added a unique index. Later I wrote a statement that counted values of another column, so I added an ordinary index on that column. When I ran the COUNT(*) statement above again, it returned the row count in about 0.2 seconds.

I was astonished: I had stumbled onto an optimization by accident. In InnoDB, adding an ordinary index (called a secondary index) on another column, in addition to the unique index, improves the performance of COUNT(*). But why? Here is the EXPLAIN output again:

EXPLAIN reports the same number of rows scanned, so why does adding an ordinary index make the query faster? I started digging through articles and documentation to understand the problem.

How the COUNT(*) Function Executes

COUNT(*) executes differently in different storage engines. In MyISAM, COUNT(*) reads the row count stored with the table and returns it directly. In InnoDB, COUNT(*) first reads the table's data pages from disk into an in-memory buffer, then scans them to count the rows. Once a WHERE condition is added to COUNT(*), the two engines behave the same way: both must scan the table (or a suitable index) to count the rows that match the condition.
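InnoDB's in-memory cache of pages is called the buffer pool; a page that is not already there must be read from disk before it can be scanned. The Python sketch below models this with a plain LRU cache to show how a repeated full scan behaves. It is a toy under stated assumptions: page counts and pool sizes are made up, and InnoDB's real buffer pool uses a midpoint-insertion variant of LRU rather than the plain LRU shown here.

```python
from collections import OrderedDict


class LRUBufferPool:
    """Toy page cache with plain LRU eviction (a simplification of InnoDB's
    buffer pool, which uses a midpoint-insertion LRU variant)."""

    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.pages = OrderedDict()   # page number -> cached marker (payload omitted)
        self.disk_reads = 0

    def get_page(self, page_no: int) -> None:
        if page_no in self.pages:
            self.pages.move_to_end(page_no)   # cache hit: mark most recently used
            return
        self.disk_reads += 1                  # cache miss: fetch the page from disk
        self.pages[page_no] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used page


def full_scan(pool: LRUBufferPool, table_pages: int) -> int:
    """Touch every page of the table in order; return the disk reads this scan caused."""
    before = pool.disk_reads
    for page_no in range(table_pages):
        pool.get_page(page_no)
    return pool.disk_reads - before


roomy = LRUBufferPool(capacity_pages=1000)   # pool larger than the 100-page table
cramped = LRUBufferPool(capacity_pages=50)   # pool smaller than the table
for name, pool in (("roomy", roomy), ("cramped", cramped)):
    first = full_scan(pool, 100)
    second = full_scan(pool, 100)
    print(name, first, second)
```

When the pool can hold the whole table, a second scan costs no disk reads. When the table is larger than the pool, plain LRU evicts each page before the scan comes back to it, so every scan goes to disk again.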

Indexing principles

Since performance improved only after the index was added, let's look at the problem from the index side.

The official documentation defines an index as a data structure that helps MySQL retrieve data efficiently. In other words, an index is essentially a data structure, and its purpose is to make queries faster.

An index lookup is like looking a word up in a dictionary. To find the word "mysql", you first turn to the section for the letter m, then to the words beginning with my, and so on until you reach "mysql"; its entry gives the page number, and you go to that page for more information. Without an index you would have to read the dictionary page by page, which is very inefficient. An index works by repeatedly narrowing the search range until only the result remains.
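The narrowing idea can be sketched in Python by comparing a page-by-page scan with a search that halves the candidate range on every step. This is only an analogy for the principle: InnoDB indexes are B+trees rather than a binary search over a sorted list, and the word list here is synthetic.

```python
def linear_search(words, target):
    """Check entries one by one, like reading a dictionary page by page."""
    steps = 0
    for i, word in enumerate(words):
        steps += 1
        if word == target:
            return i, steps
    return -1, steps


def binary_search(words, target):
    """Halve the sorted candidate range each step, like jumping to the right letter."""
    lo, hi = 0, len(words) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if words[mid] == target:
            return mid, steps
        if words[mid] < target:
            lo = mid + 1      # target sorts after mid: discard the left half
        else:
            hi = mid - 1      # target sorts before mid: discard the right half
    return -1, steps


words = [f"w{i:06d}" for i in range(100_000)]   # zero-padded, so already sorted
print(linear_search(words, "w099999"))
print(binary_search(words, "w099999"))
```

The linear scan needs 100,000 comparisons to reach the last word, while the halving search needs at most 17 for a list of this size.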

Databases work the same way, although their indexes are considerably more complex.

Disk access and pre-read

In general, an index is itself large and cannot be held entirely in memory, so it is usually stored on disk as an index file. Index lookups therefore have to read from disk, which incurs disk I/O. Each disk read has three components: seek time, rotational latency and transfer time. Seek time is the time for the disk arm to move to the target track, generally under 5 ms. Rotational latency is the time spent waiting for the target sector to rotate under the read head, which depends on the disk's rotation speed. Transfer time is the time to move the data from the disk into memory; it is short enough to ignore. Disk I/O is several orders of magnitude slower than memory access, so the most important measure of an index data structure is the asymptotic number of disk I/O operations a search requires. In other words, an index structure should minimize disk I/O during a search.
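The cost model above can be turned into a little arithmetic. The Python sketch below estimates the time of one random read and then counts roughly how many node reads a lookup needs in a tree of a given fanout. The 5 ms seek time comes from the text; the 7200 RPM disk and the fanout of 1200 for a wide B+tree node are assumptions chosen for illustration, and the level count ignores caching of upper levels.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """On average the platter must turn half a revolution before the sector arrives."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def random_io_ms(seek_ms: float, rpm: int, transfer_ms: float = 0.0) -> float:
    """Seek time + rotational latency + transfer time for one random disk read."""
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms


def tree_levels(n_keys: int, fanout: int) -> int:
    """Smallest number of levels L with fanout**L >= n_keys; roughly the node
    reads (disk I/Os, if nothing is cached) needed for one lookup."""
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


per_io = random_io_ms(seek_ms=5, rpm=7200)   # assumed 7200 RPM disk, transfer ignored
print(f"one random read: {per_io:.2f} ms")
for fanout in (2, 1200):                     # binary tree vs. assumed B+tree fanout
    levels = tree_levels(10_000_000, fanout)
    print(f"fanout={fanout}: {levels} I/Os, about {levels * per_io:.0f} ms")
```

For 10 million keys a binary tree needs 24 levels while a node fanout of 1200 needs only 3, which is why databases favor wide, shallow structures such as the B+trees InnoDB uses.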

As described above, disk I/O is a very expensive operation. Operating systems therefore exploit the principle of locality:

When a piece of data is used, the data near it is usually used soon afterward.

Operating systems optimize for this. When an I/O is performed, they read not only the data at the requested disk address into the memory buffer but also the adjacent data, which greatly reduces the number of I/Os needed to read data. The unit a database reads in one I/O is called a page. A page is typically a few kilobytes (for example 4 KB or 8 KB); InnoDB's default page size is 16 KB. Reading any data within one page therefore costs only a single I/O.
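The effect of page-granular reads can be sketched in Python by counting I/Os for two access patterns that each read 1,000 records. The sketch assumes records are stored in ID order, that 100 records fit in one page (roughly 160-byte records in a 16 KB page), and that a page stays in memory once loaded.

```python
def count_page_reads(record_ids, records_per_page: int) -> int:
    """Count I/Os when each read loads a whole page and loaded pages stay in
    memory (an unbounded cache, for simplicity)."""
    loaded = set()
    reads = 0
    for rid in record_ids:
        page = rid // records_per_page   # records are laid out in ID order
        if page not in loaded:
            loaded.add(page)
            reads += 1
    return reads


RECORDS_PER_PAGE = 100   # assumption for illustration
sequential = count_page_reads(range(1000), RECORDS_PER_PAGE)            # neighbouring records
scattered = count_page_reads(range(0, 100_000, 100), RECORDS_PER_PAGE)  # one record per page
print(sequential, scattered)  # 10 1000
```

Reading neighbouring records costs 10 I/Os because each page load brings in 99 other records that are about to be used; spreading the same 1,000 reads across different pages costs 1,000 I/Os.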

From this we can draw a preliminary conclusion: the performance difference before and after adding the index lies in the disk reads. But there is a wrinkle. Before adding the new index, I had already added a unique index, and from the MySQL documentation I learned that the unique index I added served as the clustered index, while the index added later is a secondary index. So let's look at how clustered indexes and secondary indexes differ.
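One rough way to test the disk-read explanation is to count leaf pages. An InnoDB clustered index stores complete rows in its leaf pages, while a secondary index stores only the indexed column values plus the primary key value. The MySQL reference manual also states that InnoDB processes SELECT COUNT(*) by traversing the smallest available secondary index, and scans the clustered index only when no secondary index exists. The Python sketch below assumes a 16 KB page with headers and fill factor ignored, a 500-byte row and a 20-byte secondary entry; these sizes are illustrative, not measured from the table in this article.

```python
PAGE_BYTES = 16 * 1024   # InnoDB's default page size; page headers and fill factor ignored


def leaf_pages(n_rows: int, entry_bytes: int) -> int:
    """Leaf pages needed to store n_rows index entries of entry_bytes each."""
    per_page = PAGE_BYTES // entry_bytes
    return (n_rows + per_page - 1) // per_page   # ceiling division


n_rows = 1_000_000
clustered = leaf_pages(n_rows, entry_bytes=500)  # assumed: full 500-byte row per leaf entry
secondary = leaf_pages(n_rows, entry_bytes=20)   # assumed: indexed column + primary key, 20 bytes
print(f"rows visited by either scan: {n_rows}")
print(f"clustered index leaf pages:  {clustered}")   # 31250
print(f"secondary index leaf pages:  {secondary}")   # 1222
print(f"ratio: {clustered / secondary:.1f}x")
```

Both scans visit the same 1,000,000 entries, which is consistent with EXPLAIN reporting the same rows value, but under these assumptions the secondary index occupies roughly 25 times fewer pages and so needs far fewer disk reads.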

Clustered Index and Secondary Index

Every InnoDB table has a special index that stores the data of each row, called the clustered index. Usually the clustered index is synonymous with the primary key. InnoDB chooses the clustered index as follows:

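If the table defines a primary key, InnoDB uses it as the clustered index;

Otherwise, if no primary key is defined, InnoDB uses the first UNIQUE index whose key columns are all NOT NULL as the clustered index;

If the table has neither a primary key nor a suitable unique index, InnoDB generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values.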
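The selection order can be written as a small decision function. This is a Python model of the documented rules, including the fallback to a hidden GEN_CLUST_INDEX on a synthetic row ID when there is neither a primary key nor a NOT NULL unique index; it is not MySQL source code. The unique index names are hypothetical, and "PRIMARY" is the name MySQL gives a primary key index.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UniqueIndex:
    name: str
    all_columns_not_null: bool   # True if every key column is declared NOT NULL


def choose_clustered_index(primary_key: Optional[str],
                           unique_indexes: List[UniqueIndex]) -> str:
    """Apply the documented rules in order (a model only, not MySQL code)."""
    if primary_key is not None:
        return primary_key                   # rule 1: the primary key
    for idx in unique_indexes:               # assumed to be in definition order
        if idx.all_columns_not_null:
            return idx.name                  # rule 2: first all-NOT-NULL unique index
    return "GEN_CLUST_INDEX"                 # rule 3: hidden index on a row-ID column


print(choose_clustered_index("PRIMARY", []))
print(choose_clustered_index(None, [UniqueIndex("uk_email", False),
                                    UniqueIndex("uk_username", True)]))
print(choose_clustered_index(None, []))
```

The second call is the rule 2 case: with no primary key, a unique index on NOT NULL columns becomes the clustered index, which matches the situation described earlier where a unique index ended up serving as the clustered index.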
