Summary of MySQL big table query optimization techniques and case study

Source: Internet
Author: User


SQL statements Use Basic principles: 1. To optimize the query, avoid full table scan as far as possible, and first consider indexing on the columns involved in where and order by. 2. You should try to avoid null values in the WHERE clause, otherwise it will cause the engine to discard full table scans using the index, such as: Select ID ...

SQL statements use Basic principles:

1. To optimize the query, avoid full-table scanning as far as possible, and first consider establishing an index on the columns involved in the Where and order by.

2. Avoid null-valued fields in the WHERE clause, which will cause the engine to discard full-table scans using the index, such as:

Select ID from t where num is null

You can set the default value of 0 on NUM, make sure that the NUM column in the table does not have a null value, and then query:

Select ID from t where num=0

3. Try to avoid using the! = or <> operator in the WHERE clause, or discard the engine for a full table scan using the index.

4. You should try to avoid using or in the WHERE clause to join the condition, otherwise it will cause the engine to abandon using the index for a full table scan, such as:

Select ID from t where num=10 or num=20

You can query this:

Select ID from t where num=10


Select ID from t where num=20 and not in should also be used with caution, otherwise it will result in full table scans, such as:

Select ID from t where num in

For consecutive values, you can use between instead of in:

Select ID from t where num between 1 and 3

6. The following query will also cause a full table scan:

Select ID from t where name like '%abc% '

To be more efficient, consider full-text indexing.

7. If you use a parameter in the WHERE clause, it also causes a full table scan. Because SQL resolves local variables only at run time, the optimizer cannot defer the selection of access plans to run time; it must be selected at compile time. However, if an access plan is established at compile time, the value of the variable is still unknown and therefore cannot be selected as an input for the index. The following statement will perform a full table scan:

Select ID from t where [email protected]

You can force the query to use the index instead:

Select ID from T with (index name) where [email protected]

8. You should try to avoid expression operations on the fields in the WHERE clause, which will cause the engine to discard the full table scan using the index. Such as:

Select ID from t where num/2=100

should read:

Select ID from t where num=100*2

9. You should try to avoid function operations on the fields in the WHERE clause, which will cause the engine to discard the full table scan using the index. Such as:

Select ID from t where substring (name,1,3) = ' abc '--name ID starting with ABC

Select ID from t where DATEDIFF (day,createdate, ' 2005-11-30 ') =0--' 2005-11-30 ' generated ID

should read:

Select ID from t where name like ' abc% '

Select ID from t where createdate>= ' 2005-11-30 ' and createdate< ' 2005-12-1 '

10. Do not perform functions, arithmetic operations, or other expression operations on the left side of "=" in the WHERE clause, or the index may not be used correctly by the system.

11. When using an indexed field as a condition, if the index is a composite index, you must use the first field in the index as a condition to guarantee that the system uses the index, otherwise the index will not be used, and the field order should be consistent with the index order as much as possible.

12. Do not write meaningless queries, such as the need to generate an empty table structure:

Select Col1,col2 into #t from T where 1=0

This type of code does not return any result sets, but consumes system resources and should be changed to this:


13. It is a good choice to replace in with exists in many cases:

Select num from a where num in (select num from B)

Replace with the following statement:

Select num from a where exists (select 1 from b where num=a.num)

14. Not all indexes are valid for queries, SQL is query-optimized based on data in the table, and when there is a large amount of data duplication in the index columns, SQL queries may not take advantage of the index, as there are fields in the table Sex,male, female almost half, So even if you build an index on sex, it doesn't work for query efficiency.

15. The index is not the more the better, although the index can improve the efficiency of the corresponding select, but also reduce the efficiency of insert and UPDATE, because the INSERT or update when the index may be rebuilt, so how to build the index needs careful consideration, depending on the situation. The number of indexes on a table should not be more than 6, if too many you should consider whether some of the indexes that are not commonly used are necessary.

16. You should avoid updating clustered index data columns as much as possible, because the order of the clustered index data columns is the physical storage order of the table records, which can consume considerable resources once the column values change to the order in which the entire table is recorded. If your application needs to update clustered index data columns frequently, you need to consider whether the index should be built as a clustered index.

17. Use numeric fields as much as possible, if the field containing only numeric information should not be designed as a character type, which will reduce the performance of queries and connections and increase storage overhead. This is because the engine compares each character in a string one at a time while processing queries and joins, and it is sufficient for a numeric type to be compared only once.

18. Use Varchar/nvarchar instead of Char/nchar as much as possible, because the first variable length field storage space is small, can save storage space, second, for the query, in a relatively small field in the search efficiency is obviously higher.

19. Do not use SELECT * from t anywhere, replace "*" with a specific field list, and do not return any fields that are not available.

20. Try to use table variables instead of temporary tables. If the table variable contains a large amount of data, be aware that the index is very limited (only the primary key index).

21. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.

22. Temporary tables are not unusable, and they can be used appropriately to make certain routines more efficient, for example, when you need to repeatedly reference a dataset in a large table or a common table. However, for one-time events, it is best to use an export table.

23. When creating a temporary table, if you insert a large amount of data at one time, you can use SELECT INTO instead of CREATE table to avoid causing a large number of logs to increase speed, and if the amount of data is small, create table to mitigate the resources of the system tables. Then insert.

24. If a temporary table is used, be sure to explicitly delete all temporary tables at the end of the stored procedure, TRUNCATE table first, and then drop table, which avoids longer locking of the system tables.

25. Avoid using cursors as much as possible, because cursors are inefficient and should be considered for overwriting if the cursor is manipulating more than 10,000 rows of data.

26. Before using a cursor-based method or temporal table method, you should first look for a set-based solution to solve the problem, and the set-based approach is generally more efficient.

27. As with temporary tables, cursors are not unusable. Using Fast_forward cursors on small datasets is often preferable to other progressive processing methods, especially if you must reference several tables to obtain the required data. Routines that include "totals" in the result set are typically faster than using cursors. If development time permits, a cursor-based approach and a set-based approach can all be tried to see which method works better.

28. Set NOCOUNT on at the beginning of all stored procedures and triggers, set NOCOUNT OFF at the end. You do not need to send a DONE_IN_PROC message to the client after each statement that executes the stored procedure and trigger.

29. Try to avoid large transaction operation and improve the system concurrency ability.

30. Try to avoid the return of large data to the client, if the amount of data is too large, should consider whether the corresponding demand is reasonable.

Comprehensive optimization Case Reference 1:

Question: How do I design or optimize a big table for tens? In addition, there is no other information, the personal feel that this topic is a little fan, have to simply say how to do, for a storage design, must consider the business characteristics, the information collected are as follows:

1. Data capacity: 1-3 years will be about how many data, each piece of data about how many bytes;

2. Data item: Whether there is a large number of paragraphs, the values of those fields are often updated;

3. Data query SQL Condition: The column names of which data items often appear in the Where, GROUP by, and ORDER by clauses are medium;

4. Data Update class SQL Condition: How many columns often appear in the WHERE clause of update or delete;

5.SQL amount of statistical ratio, such as: select:update+delete:insert= how much?

6. What is the magnitude of the total daily execution of large tables and associated SQL?

7. Data in the table: Updated business or query-based business

8. What database physical server are you planning to use, and the database server architecture?

9. What about concurrency?

10. Does the storage engine choose InnoDB or MyISAM?

Generally understand the above 10 questions, as to how to design such a large table, should be clear of everything!

As for optimization refers to the creation of a good table, can not change the table structure, it is recommended InnoDB engine, use more memory, reduce the disk IO load, because IO is often the bottleneck of the database server in addition to optimize the index structure to solve performance problems, it is recommended to prioritize the modification of class SQL statements to make them faster, The only way to rely on the index organization structure, of course, is that the index has been created very good, if read-oriented, you can consider opening query_cache, and adjust some parameter values: Sort_buffer_size,read_buffer_size,read_rnd _buffer_size,join_buffer_size

Comprehensive optimization Case Reference 2:

My current company has three tables, is 500 million of the data, the daily increment is 100w per day, each table is about 10 columns, the following is my test and comparison:

1. First look at the engine, in the case of large data volume, in the case of no zoning, mysiam than InnoDB in the case of read-only, the efficiency is about 13% higher.

2. After doing the partition, you can read the official MySQL document, in fact, for partition, specifically for the MyISAM to do optimization, for InnoDB, all the data is there ibdata inside, so even if you can see the schema changed, In fact, there is no essential change. In the case of partitioning out of the same physical disk, the elevation is probably only 1%. In the partition under the different physical disk, I divide into three different disks under, Ascend probably in 3%, actually so-called throughput, is determined by many factors, such as your explain parition time can see, record in that one partition, if each partition has , in fact, does not solve the problem of reading, it will only improve the efficiency of writing. Another problem is that partitioning, how you divide, if a table, there are three column is often used to make query conditions, is actually a very sad thing, because you have no way to do the targeted partitioning of all SQL, if you just like the official MySQL file, only to do a partition of time, And you only use time to inquire, congratulations.

3. The table is mainly used for reading or writing, in fact, this problem is not sufficient, it should be asked, you write at the same time, concurrent query how much? My problem is relatively simple, because MongoDB shredding support can not, after crush, or back to MySQL, so under normal circumstances, 9am-9pm, write a lot of situations, this time I will do a view, View is based on the most recently inserted or frequently queried, by doing a view to separate the read, that is, written on the table, read in the logical judgment before the operation of the view.

4. Do some archive table, such as the first to do a lot of statistical analysis of these large tables, and then through the existing analysis + increment to solve.

5. If you use Mysiam, there is one more question you should pay attention to, if your. Configure, when you add a max index length parameter, when your record number is greater than the length of the set, the index will be disable.

Summary of MySQL big table query optimization techniques and case study

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.