Database optimization and SQL optimization summary, SQL optimization Summary

Last Update:2017-08-19 Source: Internet

Author: User

Tags createindex

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Database optimization and SQL optimization summary, SQL optimization Summary
Database Optimization Methods1. select the most suitable field attribute

MySQL can support access to large data volumes, but generally, the smaller the table in the database, the faster the query will be executed on it. Therefore, when creating a table, we can set the field width in the table as small as possible to achieve better performance.

For example, if you set it to CHAR (255) when defining the zip code field, it is obvious that unnecessary space is added to the database, and even the VARCHAR type is redundant, because CHAR (6) can well complete the task. Similarly, if possible, we should use MEDIUMINT instead of BIGIN to define integer fields.

Another way to improve efficiency is to set the field to NOTNULL whenever possible, so that the database does not need to compare NULL values during future queries. Some text fields, such as "Province" or "gender", can be defined as ENUM. In MySQL, The ENUM type is processed as the numeric data, and the numeric data is processed much faster than the text type. In this way, we can improve the database performance.

2. Use JOIN instead of Sub-Queries)

MySQL supports SQL subqueries from 4.1. This technique can use the SELECT statement to create a single column query result, and then use this result as a filter condition in another query. For example, if you want to delete a customer who has no orders in the basic customer information table, you can use the subquery to retrieve the customer IDs of all orders from the sales information table, then pass the result to the primary query, as shown below:

DELETEFROMcustomerinfo

WHERECustomerIDNOTin (SELECTCustomerIDFROMsalesinfo)

Subqueries can be used to complete SQL operations that require multiple logical steps at a time. At the same time, transactions or tables can be prevented from being locked and can be easily written. However, in some cases, subqueries can be replaced by more efficient JOIN. For example, if we want to retrieve all users without order records, we can use the following query:

SELECT * FROMcustomerinfo

WHERECustomerIDNOTin (SELECTCustomerIDFROMsalesinfo)

If you use JOIN... to complete this query, the speed will be much faster. Especially when the salesinfo table has an index on CustomerID, the performance will be better. The query is as follows:

SELECT * FROMcustomerinfo

LEFTJOINsalesinfoONcustomerinfo. CustomerID = salesinfo. CustomerID

WHEREsalesinfo. CustomerIDISNULL

JOIN... it is more efficient because MySQL does not need to create a temporary table in the memory to perform the query in two steps.

3. Use UNION instead of creating a temporary table manually

MySQL 4.0 and later versions support union queries. It can merge two or more select queries in a temporary table. When the query Session on the client ends, the temporary table is automatically deleted to ensure the database is neat and efficient. When using union to create a query, we only need to use UNION as the keyword to connect multiple select statements. Note that the number of fields in all select statements must be the same. The following example demonstrates a query using UNION.

SELECTName, PhoneFROMclientUNION

SELECTName, BirthDateFROMauthorUNION

SELECTName, SupplierFROMproduct

4. Transactions

Although we can use Sub-Queries, JOIN, and UNION to create various Queries, however, not all database operations can be completed with only one or a few SQL statements. In more cases, a series of statements are needed to complete some work. However, in this case, when a statement in the statement block fails to run, the operation of the entire statement block becomes uncertain. Imagine that you want to insert data into two associated tables at the same time. This may happen: after the first table is successfully updated, the database suddenly experiences an unexpected situation, as a result, the operations in the second table are incomplete, which may damage the data in the database. To avoid this situation, you should use a transaction. Its function is to either operate successfully or fail each statement in the statement block. In other words, data consistency and integrity can be maintained in the database. The transaction starts with the BEGIN keyword and ends with the COMMIT keyword. If an SQL operation fails, the ROLLBACK command can restore the database to the State before the start of in.

BEGIN; INSERTINTOsalesinfoSETCustomerID = 14; UPDATEinventorySETQuantity = 11 WHEREitem = 'book'; COMMIT;

Another important role of transactions is that when multiple users use the same data source at the same time, they can use the database locking method to provide users with a safe access method, this ensures that user operations are not affected by other users.

5. Lock the table

Although transactions are a very good way to maintain database integrity, they sometimes affect database performance, especially in large application systems. Because the database will be locked during transaction execution, other user requests can only wait until the transaction ends. If a database system is used by only a few users, the impact of transactions will not become a big problem. However, assuming that thousands of users access a database system at the same time, for example, accessing an e-commerce website may cause serious response latency.

In fact, in some cases, we can lock the table to achieve better performance. The following example uses the table locking method to complete the transaction function in the previous example.

LOCKTABLEinventoryWRITESELECTQuantityFROMinventoryWHEREItem = 'book ';

...

UPDATEinventorySETQuantity = 11 WHEREItem = 'book'; UNLOCKTABLES

Here, we use a select statement to retrieve the initial data. Through some calculations, we use the update statement to update the new value to the table. The LOCKTABLE statement containing the WRITE keyword ensures that no other access is allowed to insert, update, or delete the inventory before the UNLOCKTABLES command is executed.

6. Use foreign keys

Locking a table can maintain data integrity, but it cannot guarantee data relevance. In this case, we can use the foreign key.

For example, a foreign key can ensure that each sales record points to an existing customer. Here, the foreign key can map the CustomerID in the customerinfo table to the CustomerID in the salesinfo table. No record without a valid CustomerID will be updated or inserted into the salesinfo table.

CREATETABLEcustomerinfo (CustomerIDINTNOTNULL, PRIMARYKEY (CustomerID) TYPE = INNODB;

CREATETABLEsalesinfo (SalesIDINTNOTNULL, CustomerIDINTNOTNULL,

PRIMARYKEY (CustomerID, SalesID ),

FOREIGNKEY (CustomerID) REFERENCEScustomerinfo (CustomerID) ONDELETECASCADE) TYPE = INNODB;

Note the parameter "ONDELETECASCADE" in the example ". This parameter ensures that when a customer record in the customerinfo table is deleted, all records related to this customer in the salesinfo table will also be deleted automatically. If you want to use foreign keys in MySQL, remember to define the table type as the InnoDB type of the Transaction Security table when creating the table. This type is not the default MySQL table type. The defined method is to add TYPE = innodb to the createtable statement. As shown in the example.

7. Use Indexes

Index is a common method to improve database performance. It allows database servers to retrieve specific rows at a much faster speed than no index, especially when a query statement contains MAX (), when using the MIN () and ORDERBY commands, the performance improvement is more obvious.

Which fields should be indexed?

In general, the index should be built on the fields that will be used for JOIN, WHERE judgment and ORDERBY sorting. Try not to index a field in the database that contains a large number of repeated values. For an ENUM field, it is very likely that a large number of repeated Values exist.

For example, the "province"... field in customerinfo does not help to create an index on this field. On the contrary, it may also reduce the performance of the database. When creating a table, we can create an appropriate index at the same time, or use ALTERTABLE or CREATEINDEX to create an index later. MySQL supports full-text indexing and search from version 3.23.23. Full-text index is a FULLTEXT index in MySQL, but it can only be used for tables of the MyISAM type. For a large database, it is very fast to load data to a table without FULLTEXT indexes and then create indexes using ALTERTABLE or CREATEINDEX. However, if you load data to a table with a FULLTEXT index, the execution will be very slow.

8. Optimized query statements

In most cases, using indexes can increase the query speed. However, if an SQL statement is improperly used, indexes cannot play its due role.

The following are some notes.

First, it is best to compare fields of the same type..

Before MySQL 3.23, this is even a required condition. For example, you cannot compare an indexed INT field with a BIGINT field. However, in special cases, when a CHAR field is of the same size as a VARCHAR field, you can compare them..

Second, do not use functions to operate indexed fields.

For example, when the YEAE () function is used on a DATE field, the index cannot play its due role. Therefore, although the two queries below return the same results, the latter is much faster than the former.

Third, when searching for a struct field, we sometimes use the LIKE keyword and wildcard. Although this method is simple, it is at the cost of system performance.

For example, the following query will compare each record in the table.

SELECT * FROMbooks

WHEREnamelike "MySQL %"

However, if the following query is used, the returned results are the same, but the speed is much faster:

SELECT * FROMbooks

WHEREname> = "MySQL" andname <"MySQM"

Finally, you should note that you should avoid making MySQL automatically convert the data type during the query, because the conversion process also makes the index ineffective.

How to optimize SQL statements First, note the following for the where statement:

1. Try to avoid null value determination on the field in the where clause. Otherwise, the engine will discard the index and perform full table scanning, for example:
Select id from t where num is null
You can set the default value 0 on num to make sure that the num column in the table does not have a null value, and then query it like this:
Select id from t where num = 0

2. Try to avoid using it in the where clause! = Or <> operator. Otherwise, the engine will discard the index for full table scanning.

3. Try to avoid using or in the where clause to connect to the condition. Otherwise, the engine will discard the index and perform full table scanning, for example:
Select id from t where num = 10 or num = 20
You can query it as follows:
Select id from t where num = 10
Union all
Select id from t where num = 20

4. The following query will also cause a full table scan:
Select id from t where name like '% abc %'
To improve efficiency, you can consider full-text search.

5. Use in and not in with caution. Otherwise, a full table scan may occur, for example:
Select id from t where num in (1, 2, 3)
For continuous values, you can use between instead of in:
Select id from t where num between 1 and 3

6. Avoid performing expression operations on fields in the where clause as much as possible, which will cause the engine to discard the use of indexes for full table scanning. For example:
Select id from t where num/2 = 100
Should be changed:
Select id from t where num = 100*2

7. If a parameter is used in the where clause, a full table scan is performed. Because SQL parses local variables only at runtime, the optimizer cannot postpone the selection of the access plan to runtime; it must be selected at compilation. However, if an access plan is created during compilation, the value of the variable is still unknown and thus cannot be used as an input for index selection. The following statement performs a full table scan:
Select id from t where num = @ num
You can change it to force query to use the index:
Select id from t with (index name) where num = @ num

8. Avoid performing function operations on fields in the where clause as much as possible. This will cause the engine to stop using the index for full table scanning. For example:
Select id from t where substring (name, 1, 3) = 'abc' -- id whose name starts with abc
Select id from t where datediff (day, createdate, '2017-11-30 ') = 0 -- '2017-11-30' generated id
Should be changed:
Select id from t where name like 'abc %'
Select id from t where createdate> = '2014-11-30 'and createdate <'2014-12-1'

9. Do not perform functions, arithmetic operations, or other expression operations on the left side of "=" in the where clause. Otherwise, the system may not be able to correctly use the index.

Other optimization considerations:

1. To optimize the query, try to avoid full table scanning. First, consider creating an index on the columns involved in where and order.

2. when using an index field as a condition, if the index is a composite index, you must use the first field in the index as the condition to ensure that the system uses the index, otherwise, the index will not be used, and the field order should be consistent with the index order as much as possible.

3. Do not write meaningless query statements. For example, you need to generate an empty table structure:
Select col1, col2 into # t from t where 1 = 0
This type of code will not return any result set, but will consume system resources. You can change it:
Create table # t (...)

4. For tables with multiple large data volumes to JOIN, you need to perform the JOIN operation by page. Otherwise, the logical read operation will be high and the performance will be poor.

5. Do not use select * from t anywhere, replace "*" with a specific field list, and do not return any fields that are not used.

6. use numeric fields whenever possible. If fields containing only numerical information are not designed as numeric fields, this will reduce query and connection performance and increase storage overhead. This is because the engine compares each character in the string one by one during query and connection processing, and only one comparison is required for the number type.

7. The more indexes, the better. Although the index can improve the efficiency of the select statement, it also reduces the efficiency of insert and update. Because the insert or update statement may re-create the index, You need to carefully consider how to create the index, depends on the actual situation. It is recommended that the number of indexes in a table be no more than six. If there are too many indexes, consider whether the indexes on some columns that are not frequently used are necessary.

8. In the Update statement, if only one or two fields are modified, do not Update all fields. Otherwise, frequent calls may cause significant performance consumption and a large number of logs.

9. try to use varchar/nvarchar instead of char/nchar, because the first step is to reduce the storage space of the variable-length field, which can save storage space. Secondly, for queries, searching in a relatively small field is obviously more efficient.

10. use numeric fields whenever possible. If fields containing only numerical information are not designed as numeric fields, this will reduce query and connection performance and increase storage overhead. This is because the engine compares each character in the string one by one during query and connection processing, and only one comparison is required for the number type.

11. update the clustered index data column should be avoided as much as possible, because the order of the clustered index data column is the physical storage order of the table records. Once the column value changes, the order of the entire table record will be adjusted, it will consume a considerable amount of resources. If the application system needs to frequently update the clustered index data column, consider whether to create the index as a clustered index.

12. when creating a temporary table, if a large amount of data is inserted at one time, you can use select into instead of create table to avoid creating a large number of logs to increase the speed. If the data volume is small, to ease system table resources, create table first and then insert.

13. Try to use table variables instead of temporary tables. If the table variable contains a large amount of data, note that the index is very limited (only the primary key index ).

14. when creating a temporary table, if a large amount of data is inserted at one time, you can use select into instead of create table to avoid creating a large number of logs to increase the speed. If the data volume is small, to ease system table resources, create table first and then insert.

15. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources. Temporary tables are not unavailable. Using them appropriately can make some routines more effective. For example, when you need to repeatedly reference a large table or a data set in a common table. However, it is best to use the export table for one-time events.

16. if a temporary table is used, you must explicitly delete all temporary tables at the end of the stored procedure. First truncate the table and then drop the table, so that the system table can be locked for a long time.

17. set nocount on at the beginning of all stored procedures and triggers, and set nocount off at the end. You do not need to send the DONE_IN_PROC message to the client after executing each statement of the stored procedure and trigger.

18. before using the cursor-based or temporary table method, you should first find a set-based solution to solve the problem. The set-based method is generally more effective.

19. Like a temporary table, the cursor is not unavailable. Using a FAST_FORWARD cursor for a small dataset is usually better than other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. A routine that includes "sum" in the result set is usually faster than a cursor. If this is allowed during development, you can try both the cursor-based method and the set-based method to see which method works better.

20. Avoid returning a large amount of data to the client as much as possible. If the data volume is too large, consider whether the corresponding requirements are reasonable.

21. Avoid large transaction operations as much as possible and improve the system concurrency capability.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More