How to write efficient SQL for MySQL

Last Update:2016-04-14 Source: Internet

Author: User


Recently, at my team's request, I collected the following MySQL optimization notes. Some of them come from experience on real Java projects. Discussion is welcome.

SQL is a powerful and flexible language: for a given relational query, users can write many different statements that return the same result. In other words, statements with different syntax may be semantically identical. But although they return the same result in the end, the time they take to execute can vary considerably.

One. Optimization goals

Reduce the number of I/O operations

I/O is almost always the most likely bottleneck in a database, which follows from what a database is for. In most database operations, more than 90% of the time is spent on I/O. Reducing the number of I/O operations is therefore the first priority in SQL optimization, and also the most effective means.

Reduce CPU computation

Besides I/O, CPU usage is the next thing to consider in SQL optimization. ORDER BY, GROUP BY, DISTINCT and similar operations are CPU-intensive (they are essentially in-memory data comparisons done by the CPU). Once I/O optimization has reached a certain stage, reducing CPU computation becomes an important goal.

Two. MySQL statement execution order

Logically, a MySQL query is processed in 11 steps. The FROM step is always processed first and the LIMIT step last. Each step produces a virtual table that serves as the input to the next step.

The following is a detailed description of each stage of query processing.

FROM: Computes the Cartesian product of the table on the left of FROM and the table on the right, producing virtual table VT1.
ON: Filters VT1. Only rows that satisfy <join-condition> are recorded in virtual table VT2.
JOIN: If an OUTER JOIN is specified (such as LEFT JOIN or RIGHT JOIN), rows of the preserved table that found no match are added to VT2 as outer rows, producing virtual table VT3. If the FROM clause contains more than two tables, the three steps above are repeated with VT3 and the next table until all tables have been processed.
WHERE: Applies the WHERE filter to VT3. Only records that satisfy <where-condition> are inserted into virtual table VT4.
GROUP BY: Groups the records in VT4 by the columns in the GROUP BY clause, producing VT5.
CUBE | ROLLUP: Performs the CUBE or ROLLUP operation on VT5, producing VT6 (MySQL itself supports only WITH ROLLUP).
HAVING: Applies the HAVING filter to VT6. Only records that satisfy <having-condition> are inserted into virtual table VT7.
SELECT: Evaluates the SELECT list, picks the specified columns, and inserts them into virtual table VT8.
DISTINCT: Removes duplicate records from VT8, producing virtual table VT9.
ORDER BY: Sorts the records in VT9, producing virtual table VT10.
LIMIT: Takes the specified rows, produces virtual table VT11, and returns the result.
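
The logical order above can be traced on a small query. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL (the logical clause order is the same); the customers and orders tables and their data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Hangzhou'), (2, 'Beijing'), (3, 'Hangzhou');
    INSERT INTO orders VALUES (1, 1, 50), (2, 1, 70), (3, 2, 500), (4, 3, 5), (5, 3, 200);
""")

# The trailing comments name the logical step each clause belongs to.
rows = conn.execute("""
    SELECT c.city, SUM(o.amount) AS total      -- SELECT: runs after HAVING
    FROM customers AS c                        -- FROM
    JOIN orders AS o ON o.customer_id = c.id   -- ON / JOIN
    WHERE o.amount >= 10                       -- WHERE: drops the 5-unit order before grouping
    GROUP BY c.city                            -- GROUP BY
    HAVING SUM(o.amount) > 100                 -- HAVING: filters whole groups
    ORDER BY total DESC                        -- ORDER BY: can use the SELECT alias
    LIMIT 1                                    -- LIMIT: applied last
""").fetchall()

print(rows)
```

Because WHERE runs before GROUP BY, the small order never reaches the sum; because LIMIT runs last, it sees the already sorted groups.
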
Three. Some practices for writing efficient SQL

1. Use indexes sensibly

Advantages of using indexes:

① They greatly speed up data retrieval, which is the main reason for creating indexes.

② They speed up joins between tables.

③ When grouping and sorting clauses are used, they can significantly reduce the time spent grouping and sorting.

Disadvantages of using indexes:

① Creating and maintaining indexes takes time, and that time grows with the amount of data.

② Indexes occupy physical space: in addition to the space taken by the table data, each index takes up its own physical space.

③ When rows are inserted, deleted, or modified, the indexes must be maintained as well, which slows down data modification.

Guidelines for creating indexes:

① Index columns that are searched frequently, to speed up searches.

② Index columns that are frequently used in joins. These are mainly foreign keys, and the index speeds up the join.

③ Index columns that are often searched by range: the index is sorted, so the specified range is contiguous.

④ Index columns that are often sorted on: the index is already sorted, so the query can use that order to reduce sorting time.

⑤ Index columns that are often used in the WHERE clause, to speed up evaluation of the condition.

Columns that should not be indexed have the following characteristics:

① Columns that are seldom used in queries. Since they are seldom used, an index on them does not improve query speed. On the contrary, the extra index slows down maintenance and increases the space requirement.

② Columns that have only a few distinct values. Because there are so few values, the rows matching a query make up a large proportion of the table, so a large share of the table has to be read anyway. An index does not noticeably speed up retrieval.

③ Columns where modification performance matters far more than retrieval performance. The two conflict: adding indexes improves retrieval but slows down modification, and removing indexes does the opposite. So when modification performance is much more important than retrieval performance, do not create the index.

In the second and third statements below, the query speed differs greatly depending on whether the column is indexed. (An index can help for columns in the SELECT list as well as for conditions after ON and WHERE.)

SELECT * FROM test1; -- 1.863 2.008 2.062 (2.6 million rows); note that SELECT * was actually faster here than SELECT col on an unindexed column

SELECT name FROM test1; -- 4.989 5.001 4.855 (2.6 million rows, name not indexed)

SELECT name FROM test1; -- 1.227 1.476 1.403 (2.6 million rows, name indexed)

Most relational databases store data by row, and data access is done in fixed-size I/O units (called blocks or pages), typically 4 KB, 8 KB, and so on. Usually each I/O unit holds several rows, and each row stores all of its fields (except special types such as LOBs).

So whether we select one field or several, the amount of data the database has to read from the table is actually the same.

There are exceptions, namely when the query can be answered entirely from an index. Suppose an index contains columns a and b. A query that selects only a and b does not need to go back to the table, but if it also selects column c, which is not in the index, the database must go back to the table to fetch it. In such cases the I/O volume of the two queries differs significantly.
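
The index-only exception can be observed in a query plan. This is a rough sketch using Python's sqlite3 as a stand-in for MySQL; the table t, the columns a, b, c, and the index name are hypothetical, and the plan text is SQLite's (in MySQL the corresponding hint is "Using index" in the Extra column of EXPLAIN).

```python
import sqlite3

def plan(conn, sql):
    """Return the plan detail strings SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_t_a_b ON t (a, b)")  # hypothetical composite index

# Only a and b are needed: the index alone can answer the query.
covered = plan(conn, "SELECT a, b FROM t WHERE a = 1")

# c is not in the index: each match needs an extra lookup in the table.
needs_table = plan(conn, "SELECT a, b, c FROM t WHERE a = 1")

print(covered)
print(needs_table)
```
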

Most of the time the number of selected fields does not affect the I/O volume, but when the query also has an ORDER BY, the number of fields in the SELECT clause can greatly affect sorting efficiency.

Filter as early as possible

In the WHERE clause, put the most selective conditions (those that eliminate most of the rows) first.

As another example, when paginating a query over several tables, it is best to filter and paginate on one table first and then join the resulting page with the other table. This avoids unnecessary I/O as far as possible and saves a great deal of I/O time.
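
A minimal sketch of the paginate-then-join idea, using Python's sqlite3 as a stand-in for MySQL. The users and orders tables and the page size are made up for illustration, and the two queries are only equivalent under the assumption that every order matches exactly one user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 6)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, (i % 5) + 1, i * 10) for i in range(1, 21)])

# Join everything first, then cut out the page.
join_then_page = conn.execute("""
    SELECT o.id, u.name, o.amount
    FROM orders AS o JOIN users AS u ON u.id = o.user_id
    ORDER BY o.id LIMIT 5 OFFSET 10
""").fetchall()

# Cut out the page on one table first, then join only those five rows.
page_then_join = conn.execute("""
    SELECT p.id, u.name, p.amount
    FROM (SELECT id, user_id, amount FROM orders ORDER BY id LIMIT 5 OFFSET 10) AS p
    JOIN users AS u ON u.id = p.user_id
    ORDER BY p.id
""").fetchall()

print(page_then_join)
```

Both forms return the same page; the second one joins five rows instead of the whole table.
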

Use as few conditions as possible

In the SQL below, if col1=1 AND col2=2 is already the simplest condition that identifies the result exactly, the additional redundant condition is not needed.

SELECT * FROM table WHERE col1=1 AND col2=2 AND col3=3

Move load from the database to the application

Database resources are valuable, so operations that do not have to run there can be moved to the Java side: null handling, string concatenation, trimming, date formatting, and so on.

The work done in the following SQL can be handled in Java instead.

SELECT IFNULL(SUM(col), 0) FROM table;

SELECT CONCAT(col, ' Hello World') FROM table;

SELECT TRIM(col) FROM table;

SELECT DATE_FORMAT(col, '%d %b %Y %T:%f') FROM table;
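
As a rough illustration of doing this work in the application, here is a sketch in Python (the same idea applies on the Java side). The function names are hypothetical helpers, not part of any library, and the date layout (day, abbreviated month, year, 24-hour time, microseconds) is an assumption about the intended format.

```python
from datetime import datetime

def sum_or_zero(total):
    """Application-side stand-in for IFNULL(SUM(col), 0)."""
    return 0 if total is None else total

def with_greeting(col):
    """Application-side stand-in for CONCAT(col, ' Hello World')."""
    return f"{col} Hello World"

def trimmed(col):
    """Application-side stand-in for TRIM(col), which strips spaces."""
    return col.strip(" ")

def formatted(ts):
    """Application-side stand-in for a DATE_FORMAT call (assumed layout)."""
    return ts.strftime("%d %b %Y %H:%M:%S:%f")

print(sum_or_zero(None))
print(with_greeting("MySQL"))
print(trimmed("  abc "))
print(formatted(datetime(2016, 4, 14, 9, 5, 7, 123)))
```

The queries then shrink to plain SELECT SUM(col) or SELECT col, and the database returns raw values.
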

Try to avoid returning large amounts of data to the client.

If the amount of data returned from the database is too large, consider whether the requirement is reasonable and whether the data can be handled by pagination or similar methods.

Use UNION ALL instead of OR (rarely needed; OR is the usual choice)

SELECT * FROM test1 WHERE name = 'test1_1' OR name = 'test1_2'; -- 3.179 2.902 2.806 (tested on 1 million rows)

SELECT * FROM test1 WHERE name = 'test1_1'

UNION ALL

SELECT * FROM test1 WHERE name = 'test1_2'; -- 2.335 2.485 2.432 (tested on 1 million rows)

Where UNION ALL will do, do not use UNION

UNION ALL does not perform the SELECT DISTINCT step, which saves a lot of unnecessary work.
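
The difference between the three forms can be checked on a tiny table. This sketch uses Python's sqlite3 as a stand-in for MySQL; the sample rows are made up. Note that OR and UNION ALL only return the same rows when the two branches cannot both match the same row, as with two different name values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE test1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO test1 (name) VALUES (?)",
                 [("test1_1",), ("test1_2",), ("test1_2",), ("test1_3",)])

with_or = conn.execute("""
    SELECT id, name FROM test1
    WHERE name = 'test1_1' OR name = 'test1_2'
    ORDER BY id
""").fetchall()

with_union_all = conn.execute("""
    SELECT id, name FROM test1 WHERE name = 'test1_1'
    UNION ALL
    SELECT id, name FROM test1 WHERE name = 'test1_2'
    ORDER BY id
""").fetchall()

# UNION ALL keeps duplicate values; UNION pays for an extra deduplication step.
names_union_all = conn.execute("""
    SELECT name FROM test1 WHERE name = 'test1_1'
    UNION ALL
    SELECT name FROM test1 WHERE name = 'test1_2'
""").fetchall()

names_union = conn.execute("""
    SELECT name FROM test1 WHERE name = 'test1_1'
    UNION
    SELECT name FROM test1 WHERE name = 'test1_2'
""").fetchall()

print(len(names_union_all), len(names_union))
```
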

Avoid searching indexed character data with a pattern that does not start with a literal prefix

SELECT id FROM table WHERE name LIKE '%l%';

SELECT id FROM table WHERE name LIKE 'l%';

Even if the name field is indexed, the first query cannot use the index to speed up the search; the engine has to check every row in the table one by one. The second query can use the index.

Do not apply functions, arithmetic, or other expressions to the column on the left-hand side of the "=" in the WHERE clause

Any operation applied to the column itself (database functions, arithmetic expressions, and so on) causes a table scan, so move the operation to the right-hand side of the comparison whenever possible.

SELECT * FROM t1 WHERE f1/2 = 100

should read:

SELECT * FROM t1 WHERE f1 = 100*2

SELECT * FROM record WHERE SUBSTRING(card_no, 1, 4) = '5378'

should read:

SELECT * FROM record WHERE card_no LIKE '5378%'

SELECT member_number, first_name, last_name FROM members

WHERE TIMESTAMPDIFF(YEAR, dateOfBirth, NOW()) >= 21

should read:

SELECT member_number, first_name, last_name FROM members

WHERE dateOfBirth <= DATE_SUB(NOW(), INTERVAL 21 YEAR)
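
The effect of wrapping the column in an expression can be seen in a query plan. This is a sketch with Python's sqlite3 standing in for MySQL; the index name and data are hypothetical, the plan text is SQLite's, and the division is written as f1 / 2.0 because SQLite (unlike MySQL) would otherwise do integer division.

```python
import sqlite3

def plan(conn, sql):
    """Return the plan detail strings SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, f1 INTEGER, note TEXT)")
conn.execute("CREATE INDEX idx_t1_f1 ON t1 (f1)")  # hypothetical index on f1
conn.executemany("INSERT INTO t1 (f1, note) VALUES (?, 'x')",
                 [(n,) for n in range(150, 251)])

wrapped = "SELECT id FROM t1 WHERE f1 / 2.0 = 100"  # expression on the column
bare = "SELECT id FROM t1 WHERE f1 = 100 * 2"       # expression moved to the right

# Same result either way...
same_rows = conn.execute(wrapped).fetchall() == conn.execute(bare).fetchall()

# ...but only the bare column lets the engine seek into the index.
wrapped_plan = plan(conn, wrapped)
bare_plan = plan(conn, bare)

print(same_rows, wrapped_plan, bare_plan)
```
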

Try to use JOIN instead of subqueries

Although JOIN performance is not great either, it has a significant advantage over MySQL subqueries. MySQL's subquery execution plans have long been a problem: the issue has existed for many years and is present in all stable releases so far, with little improvement. The vendor acknowledged it early and promised to fix it soon, but at the time of writing we have not seen a version that resolves it well.

In many cases EXISTS is a good choice (whether IN or EXISTS is faster depends on the scenario, but NOT EXISTS is more efficient than NOT IN)

SELECT num FROM a WHERE num IN (SELECT num FROM b);

Replace with the following statement:

SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num)
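
To confirm that these rewrites are interchangeable, the sketch below runs the IN form, the EXISTS form, and a JOIN form on the same data, using Python's sqlite3 as a stand-in for MySQL. The sample rows are made up; the JOIN form needs DISTINCT here because table b contains a duplicate value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE a (num INTEGER);
    CREATE TABLE b (num INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (2), (2), (4), (5);
""")

with_in = conn.execute(
    "SELECT num FROM a WHERE num IN (SELECT num FROM b) ORDER BY num"
).fetchall()

with_exists = conn.execute(
    "SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num) ORDER BY num"
).fetchall()

# Without DISTINCT the duplicate 2 in b would yield the row for 2 twice.
with_join = conn.execute(
    "SELECT DISTINCT a.num FROM a JOIN b ON b.num = a.num ORDER BY a.num"
).fetchall()

print(with_in, with_exists, with_join)
```
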

Sometimes a reasonable amount of field redundancy is needed in table design

For example, suppose a table has a user ID field and queries on it frequently need the user name as well. The user name can then be stored in the table as a redundant field, so that it can be read without a join.

Requirement: the value of the redundant field should be constant or change only rarely.

Where GROUP BY will do, do not use DISTINCT

Deduplicating with GROUP BY is more efficient than with DISTINCT.

SELECT DISTINCT OrderID FROM Details WHERE UnitPrice > 10

Replace with the following statement:

SELECT OrderID FROM Details WHERE UnitPrice > 10 GROUP BY OrderID
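
A quick check that the two statements return the same rows, sketched with Python's sqlite3 as a stand-in for MySQL and made-up sample data. It shows only that the rewrite is safe; it does not measure which form is faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE Details (OrderID INTEGER, UnitPrice REAL)")
conn.executemany("INSERT INTO Details VALUES (?, ?)",
                 [(1, 5.0), (1, 12.0), (1, 20.0), (2, 8.0), (3, 15.0), (3, 15.0)])

with_distinct = conn.execute("""
    SELECT DISTINCT OrderID FROM Details
    WHERE UnitPrice > 10
    ORDER BY OrderID
""").fetchall()

with_group_by = conn.execute("""
    SELECT OrderID FROM Details
    WHERE UnitPrice > 10
    GROUP BY OrderID
    ORDER BY OrderID
""").fetchall()

print(with_distinct, with_group_by)
```
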

When an indexed field is used as a condition and the index is a composite index, the condition must include the first field of the index for the system to be able to use it; otherwise the index will not be used. The order of the fields in the condition should also match the index order as far as possible.
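
The leftmost-column rule can be observed in a query plan. The sketch below uses Python's sqlite3 as a stand-in for MySQL; the table, columns, and index name are hypothetical, and the plan text is SQLite's rather than MySQL's EXPLAIN output.

```python
import sqlite3

def plan(conn, sql):
    """Return the plan detail strings SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, c1 INTEGER, c2 INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_t_c1_c2 ON t (c1, c2)")  # hypothetical composite index

leading = plan(conn, "SELECT * FROM t WHERE c1 = 1")             # first index column
both = plan(conn, "SELECT * FROM t WHERE c1 = 1 AND c2 = 2")     # both index columns
trailing_only = plan(conn, "SELECT * FROM t WHERE c2 = 2")       # second column only

print(leading, both, trailing_only)
```

The first two queries can seek into the index; the last one cannot and falls back to scanning.
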
Arrange the order of the columns in the GROUP BY clause appropriately

Typically, a GROUP BY clause forces the database to group the data with a sort operation, which is CPU- and memory-intensive. In some cases the sort can be avoided if the clause is written appropriately. Specifically, when writing a GROUP BY clause, take the existing indexes into account: if all the columns in the GROUP BY clause are key columns of an index and occupy its leading positions, write the GROUP BY columns in the order of the keys in that index.

For example, take the following SQL statement:

SELECT C2, C3, C1, AVG(C4)

FROM T1

GROUP BY C2, C3, C1

In general, GROUP BY C2, C3, C1 makes the database perform a sort. But suppose an index IX1 (C1, C2, C3, C4) already exists on table T1. The columns referenced in the GROUP BY clause (C2, C3, C1) are exactly the first three keys of IX1, so the sort can be avoided by changing the order of the columns in the GROUP BY clause.

The SQL statement can be rewritten as follows:

SELECT C1, C2, C3, AVG(C4)

FROM T1

GROUP BY C1, C2, C3

With the GROUP BY columns in the same order as the keys of IX1, the database can read the already sorted key values from IX1 and pass them straight to the next step, avoiding the extra sort and improving query performance.

VARCHAR/NVARCHAR and CHAR/NCHAR

CHAR/NCHAR is fixed-length. If the length of the values in a field is known to be fixed, CHAR/NCHAR is more efficient than VARCHAR/NVARCHAR.

If a field stores numbers, design it as a numeric type rather than a character type; a character type reduces query and join performance and increases storage overhead. This is because the engine compares strings character by character when processing queries and joins, whereas a numeric value needs only a single comparison.
Four. Some misconceptions

The efficiency of COUNT(*), COUNT(1), and COUNT(id)

SELECT COUNT(*) FROM test1; -- 0.749 0.752 0.750

SELECT COUNT(1) FROM test1; -- 1.578 0.900 1.455

SELECT COUNT(id) FROM test1; -- 0.740 0.763 0.751

Many people use COUNT(1) or COUNT(primary_key) instead of COUNT(*) to count records, believing it performs better. In fact this is a misconception. In some scenarios it may even be worse, because the database has special optimizations for COUNT(*).

COUNT(column) is the same as COUNT(*) and COUNT(1)

COUNT(column) is a completely different operation from COUNT(*) and COUNT(1), and it means something completely different.

COUNT(column) returns the number of records in the result set in which the column is not NULL.

COUNT(*) and COUNT(1) return the number of records in the entire result set.
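
The difference in meaning is easy to verify. This sketch uses Python's sqlite3 as a stand-in for MySQL (the COUNT semantics shown are standard SQL); the table and its four rows are made up, and note that an empty string is not NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE test1 (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO test1 (col) VALUES (?)",
                 [("a",), (None,), ("",), (None,)])

# COUNT(*), COUNT(1) and COUNT(id) count rows; COUNT(col) skips NULLs in col.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(id), COUNT(col) FROM test1"
).fetchone()

print(counts)
```
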

Avoid testing a field for NULL in the WHERE clause

Myth: Avoid testing a field for NULL in the WHERE clause, otherwise the database gives up the index and does a full table scan, slowing down the query. Instead, set a default value on col so that the column contains no NULLs, and query for that value.

The results below show that without an index, the IS NULL query is much faster than the default-value query, and with an index the two are about the same.

SELECT id FROM table WHERE col IS NULL; -- 1.549 1.884 1.818 (col indexed, 2.6 million rows)

SELECT id FROM table WHERE col IS NULL; -- 2.003 1.963 1.975 (col not indexed, 2.6 million rows)

SELECT id FROM table WHERE col = '0'; -- 1.626 1.854 1.532 (col indexed, 2.6 million rows)

SELECT id FROM table WHERE col = '0'; -- 4.407 5.256 5.434 (col not indexed, 2.6 million rows)

ORDER BY always requires a sort operation

Index data is already ordered. If the data we need is in the same order as an index, and the query is executed through that index, the database will generally skip the sort and return the data directly, because it knows the data already satisfies the ordering requirement.

In fact, using an index to optimize SQL that has ordering requirements is a very important optimization technique.
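
Whether a sort step is needed shows up in the query plan. The sketch below uses Python's sqlite3 as a stand-in for MySQL; the table and index name are hypothetical, and the plan text is SQLite's (MySQL reports an explicit sort as "Using filesort" in EXPLAIN).

```python
import sqlite3

def plan(conn, sql):
    """Return the plan detail strings SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
conn.execute("CREATE INDEX idx_t_name ON t (name)")  # hypothetical index on name

# name is indexed: rows can be read in index order, no separate sort step.
indexed = plan(conn, "SELECT name FROM t ORDER BY name")

# score is not indexed: the engine has to sort the rows itself.
unindexed = plan(conn, "SELECT score FROM t ORDER BY score")

print(indexed)
print(unindexed)
```
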

The above are notes I compiled in Word and then actually tried out in MySQL. Some additions and a summary follow:

1. MySQL database names and table names are case-insensitive on Windows and case-sensitive on most Unix/Linux systems.

MySQL case sensitivity can be controlled with the lower_case_table_names parameter in the configuration file.

Windows:

Edit the my.ini file in the MySQL installation directory and add lower_case_table_names=0 under the [mysqld] section (0 means case-sensitive, 1 means case-insensitive; the default is 1). MySQL then creates table names with the case used in the SQL statement that created them. (Note that the MySQL manual advises against setting 0 when the data directory is on a case-insensitive file system.)

Linux:

Edit the /etc/my.cnf file, add the lower_case_table_names parameter under the [mysqld] section, and set the appropriate value (0 means case-sensitive, 1 means case-insensitive; the default is 0).

2. What a database stores: table structure, table data, and indexes.

3. Indexes speed up queries but slow down deletes and updates, so fields that are modified frequently should not be indexed.

4. LIKE queries with a leading % ('%...'), NOT IN, and NOT EXISTS cannot use an index.

5. Queries that use aggregate functions also cannot use an index. For a composite index, what usually matters is whether the first indexed column appears in the condition.

6. Queries should select specific fields rather than SELECT *. Do not use aggregate functions to filter large amounts of data.

COUNT(*) --- counts all rows, including those with NULL or empty columns.

COUNT(column) --- counts only rows where the column is not NULL.

PS: In MySQL 5.6, COUNT(*) appears to be faster than COUNT(1) and COUNT(column)?

7. Use as few query conditions as possible and put the more selective conditions first, so that most of the data is filtered out early. A result can also be obtained with several SQL statements to improve query performance.

8. MySQL rarely uses stored procedures to implement business logic the way SQL Server does (stored procedures are not MySQL's strength). It is used more for simple queries, while complex query logic is implemented directly in application code to move the load off the database (Java has many good ORM frameworks). MySQL is not as powerful as SQL Server or Oracle, but it is open source and free, so you can deploy multiple database servers and use sharding and clustering to scale performance.

9. InnoDB is one of MySQL's storage engines and one of the standard engines in the binary releases from MySQL AB. InnoDB was developed by Innobase Oy, which was acquired by Oracle Corporation in May 2006. Compared with the traditional ISAM and MyISAM engines, InnoDB's most important feature is support for ACID-compliant transactions, similar to PostgreSQL.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
