SQL optimization tips and guidelines
To optimize a query and avoid full table scans, first consider creating indexes on the columns used in the WHERE and ORDER BY clauses.
Try to avoid testing a field for NULL in the WHERE clause; otherwise the engine may abandon the index and perform a full table scan:
select id from t where num is null
It is best not to leave NULL values in the database; declare columns NOT NULL wherever you can. Remarks, descriptions, comments, and similar free-text fields may be NULL; otherwise, avoid NULL.
Do not assume that NULL requires no space. For a fixed-length type such as char(100), the space is allocated when the field is created, and the column occupies 100 characters regardless of the value inserted (NULL included). For a variable-length type such as varchar, NULL occupies no space.
You can set a default value of 0 on num to ensure the num column contains no NULL values, and then query like this:
select id from t where num = 0
Try to avoid using the != or <> operators in the WHERE clause; otherwise the engine may abandon the index and perform a full table scan.
Try to avoid using OR to join conditions in the WHERE clause. If one field has an index and the other does not, the engine may abandon the index and perform a full table scan:
select id from t where num = 10 or name = 'admin'
You can query it as follows:
select id from t where num = 10
union all
select id from t where name = 'admin'
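As a quick sanity check that the rewrite is equivalent, here is a minimal sketch using Python's built-in sqlite3 module (the table and sample data are illustrative): the UNION ALL form returns the same rows as the OR form, as long as no row satisfies both conditions at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, num integer, name text)")
conn.executemany("insert into t (num, name) values (?, ?)",
                 [(10, "alice"), (20, "admin"), (30, "bob")])

# original OR form
or_rows = conn.execute(
    "select id from t where num = 10 or name = 'admin' order by id").fetchall()

# rewritten UNION ALL form
union_rows = conn.execute(
    "select id from t where num = 10 "
    "union all "
    "select id from t where name = 'admin' "
    "order by id").fetchall()

print(or_rows == union_rows)  # True: same ids, since no row matches both conditions
```

If a row could satisfy both conditions, UNION (which deduplicates) rather than UNION ALL would be the safe equivalent.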
Use IN and NOT IN with caution; otherwise a full table scan may occur:
select id from t where num in(1,2,3)
For continuous values, you can use between instead of in:
select id from t where num between 1 and 3
In many cases, replacing in with exists is a good choice:
select num from a where num in(select num from b)
can be replaced with the following statement:
select num from a where exists(select 1 from b where num=a.num)
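The equivalence of the two forms is easy to check. Here is a minimal sketch using Python's sqlite3 module, with illustrative tables a and b matching the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table a (num integer)")
conn.execute("create table b (num integer)")
conn.executemany("insert into a values (?)", [(1,), (2,), (3,)])
conn.executemany("insert into b values (?)", [(2,), (3,), (4,)])

# IN form
in_rows = conn.execute(
    "select num from a where num in (select num from b) order by num").fetchall()

# EXISTS form (correlated subquery)
exists_rows = conn.execute(
    "select num from a where exists (select 1 from b where b.num = a.num) "
    "order by num").fetchall()

print(in_rows == exists_rows)  # True: both return 2 and 3
```

Whether EXISTS actually outperforms IN depends on the engine and the data; modern optimizers often rewrite one into the other internally.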
The following query will also cause a full table scan.
select id from t where name like '%abc%'
To improve efficiency, consider using full-text search instead.
Using a local variable in the WHERE clause also causes a full table scan. SQL resolves local variables only at run time, but the optimizer cannot defer the choice of access plan to run time; the plan must be chosen at compile time. At compile time, however, the variable's value is still unknown, so it cannot be used as an input for index selection. The following statement performs a full table scan:
select id from t where num = @num
You can change it to force the query to use the index:
select id from t with (index(index_name)) where num = @num
Try to avoid performing expression operations on fields in the WHERE clause; this causes the engine to abandon the index and perform a full table scan. For example:
select id from t where num/2 = 100
should be changed to:
select id from t where num = 100*2
Try to avoid performing function operations on fields in the WHERE clause; this likewise causes the engine to abandon the index and perform a full table scan. For example:
select id from t where substring(name, 1, 3) = 'abc' -- ids whose name starts with 'abc'
select id from t where datediff(day, createdate, '2017-11-30') = 0 -- ids generated on '2017-11-30'
should be changed to:
select id from t where name like 'abc%'
select id from t where createdate >= '2017-11-30' and createdate < '2017-12-01'
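The effect of wrapping a column in an expression can be observed even in SQLite. Below is a minimal sketch using Python's sqlite3 module (table and index names are illustrative): EXPLAIN QUERY PLAN shows a scan for the expression form and an index search for the rewritten form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, num integer)")
conn.execute("create index idx_num on t (num)")

# expression on the column: the index on num cannot be used
plan_expr = conn.execute(
    "explain query plan select id from t where num/2 = 100").fetchone()[3]

# rewritten, index-friendly predicate
plan_plain = conn.execute(
    "explain query plan select id from t where num = 100*2").fetchone()[3]

print(plan_expr)   # a full table scan ("SCAN ...")
print(plan_plain)  # an index search on idx_num
```

The exact plan text varies across SQLite versions, but the scan-versus-search distinction is stable.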
Do not perform functions, arithmetic, or other expression operations on the left side of the "=" in the WHERE clause; otherwise the system may be unable to use the index correctly.
When using an indexed field as a condition, and the index is a composite index, the first field of the index must appear in the condition for the system to use the index; otherwise the index will not be used. The field order in the condition should match the index order as much as possible.
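The leading-column rule can be demonstrated the same way, in a sketch using Python's sqlite3 module (table and index names are illustrative): a composite index on (a, b) serves a condition on a, but not a condition on b alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, a integer, b integer, c text)")
conn.execute("create index idx_ab on t (a, b)")

# condition on the first indexed column: the composite index can be used
plan_a = conn.execute(
    "explain query plan select c from t where a = 1").fetchone()[3]

# condition on the second column only: the index cannot seek, so the table is scanned
plan_b = conn.execute(
    "explain query plan select c from t where b = 1").fetchone()[3]

print(plan_a)  # index search using idx_ab
print(plan_b)  # full table scan
```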
Do not write meaningless queries. If you need to generate an empty table structure:
select col1, col2 into #t from t where 1 = 0
This kind of code returns no result set yet still consumes system resources; it should be changed to:
create table #t (...)
In UPDATE statements, if only one or two fields are being changed, do not update all fields; otherwise frequent calls will incur significant performance overhead and generate a large volume of logs.
For JOINs between tables with large data volumes (here, a few hundred rows already counts as large), paginate first and then JOIN; otherwise logical reads will be high and performance will be poor.
select count(*) from t: a count with no conditions causes a full table scan and serves no business purpose, so it must be eliminated.
More indexes are not always better. Although an index can improve the efficiency of SELECT statements, it also reduces the efficiency of INSERT and UPDATE, because those statements may need to rebuild the index. So consider carefully how to create indexes, depending on the actual situation. It is recommended that a table have no more than six indexes; if there are more, consider whether indexes on rarely used columns are really necessary.
Avoid updating clustered index data columns as much as possible, because the order of the clustered index columns is the physical storage order of the table's records; once a column value changes, the order of the entire table's records must be adjusted, which consumes considerable resources. If the application needs to update clustered index columns frequently, reconsider whether the index should be clustered at all.
Use numeric fields whenever possible. If a field containing only numeric information is not designed as a numeric type, query and join performance suffers and storage overhead increases, because during queries and joins the engine compares a string character by character, whereas a numeric type needs only one comparison.
Use varchar/nvarchar instead of char/nchar whenever possible: first, variable-length fields require less storage, saving space; second, for queries, searching within a smaller field is obviously more efficient.
Never use select * from t anywhere; replace "*" with a specific list of fields, and do not return any field that is not used.
Use table variables instead of temporary tables where possible. If a table variable contains a large amount of data, note that its indexing is very limited (only the primary key index).
Avoid frequently creating and deleting temporary tables, to reduce consumption of system table resources. Temporary tables are not forbidden: used appropriately, they can make some routines more effective, for example when you need to repeatedly reference a large table or a dataset from a frequently used table. For one-off operations, however, an export table is preferable.
When creating a temporary table, if a large amount of data is inserted at once, use select into instead of create table, to avoid generating a large volume of logs and to increase speed; if the data volume is small, then to ease pressure on system table resources, create table first and then insert.
If temporary tables are used, explicitly delete them all at the end of the stored procedure: first truncate table, then drop table. This avoids long-term locking of system tables.
Avoid cursors as much as possible, because their efficiency is poor. If a cursor operates on more than 10,000 rows of data, you should consider rewriting it.
Before resorting to a cursor-based or temporary-table method, look first for a set-based solution to the problem; set-based methods are usually more efficient.
Like temporary tables, cursors are not forbidden. Using a FAST_FORWARD cursor on a small dataset is often better than other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that include "totals" in the result set are usually faster than using a cursor. If development time permits, try both the cursor-based and the set-based method and see which works better.
Set NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of a stored procedure or trigger.
Avoid large transactions, to improve system concurrency.
Avoid returning large volumes of data to the client whenever possible; if the data volume is too large, consider whether the corresponding requirement is reasonable.
Case study: Split large DELETE or INSERT statements and submit SQL statements in batches
If you need to execute a large DELETE or INSERT query against a live website, you must be very careful to avoid bringing the whole site to a halt. These two operations lock tables; once a table is locked, no other operations can proceed.
Apache has many child processes or threads, so it works quite efficiently, but our server does not want too many child processes, threads, and database connections, which heavily occupy server resources, especially memory.
If you lock a table for a period of time, say 30 seconds, then for a high-traffic site the access processes/threads, database connections, and open files that accumulate over those 30 seconds may not only crash your web service but also bring the entire server down immediately.
So if you have a large operation, be sure to split it up. Using LIMIT in MySQL (ROWNUM in Oracle, TOP in SQL Server) is a good approach. The following is a MySQL (PHP) example:
while (1) {
    // delete at most 1000 rows at a time
    mysql_query("delete from logs where log_date <= '2017-11-01' limit 1000");
    if (mysql_affected_rows() == 0) {
        // deletion completed, exit!
        break;
    }
    // pause for a while each round, releasing the table so other processes/threads can access it
    usleep(50000);
}
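The same batching pattern can be sketched with Python's sqlite3 module. SQLite builds usually lack DELETE ... LIMIT, so this sketch deletes by rowid batches instead; the table, dates, and batch size are illustrative:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("create table logs (id integer primary key, log_date text)")
# 2500 sample rows, all dated before the cutoff
conn.executemany("insert into logs (log_date) values (?)",
                 [("2017-10-%02d" % (i % 30 + 1),) for i in range(2500)])

BATCH = 1000  # delete at most 1000 rows per round
deleted_total = 0
while True:
    cur = conn.execute(
        "delete from logs where rowid in ("
        "  select rowid from logs where log_date <= '2017-11-01' limit ?)",
        (BATCH,))
    conn.commit()  # commit each batch so locks are released promptly
    if cur.rowcount == 0:
        break  # nothing left to delete
    deleted_total += cur.rowcount
    time.sleep(0.01)  # brief pause so other connections can access the table

print(deleted_total)  # 2500
```

Committing per batch is what keeps each lock window short; a single large transaction would defeat the purpose of splitting.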
How can I optimize SQL statements?
Executable SQL statements in a database application can be written in many ways, but it is difficult to determine which way is best. To solve this problem, SQL optimization is necessary. Simply put, optimizing an SQL statement means converting a low-performance SQL statement into a better-performing SQL statement that achieves the same purpose.
Reasons for optimizing SQL statements
The lifecycle of a database system can be divided into three stages: design, development, and production. Optimization at the design stage costs the least and yields the most; optimization at the production stage costs the most and yields the least. If a database system is compared to a building, corrections after the building is finished are often costly and of little effect (some cannot be corrected at all), whereas controlling the quality of every brick and tile during design and construction achieves low cost and high effectiveness.
To maximize benefits, we often need to optimize the database. Database optimization can usually target the network, hardware, operating system, database parameters, and the application. According to statistics, performance gains from optimizing the network, hardware, operating system, and database parameters account for only about 40% of the total performance improvement of a database application system; the other 60% comes from application optimization. Many optimization experts even believe that application optimization can improve system performance by 80%. It is therefore safe to say that optimizing the application yields the greatest benefit for the database system.
Application optimization falls into two areas: source-code optimization and SQL-statement optimization. Because it changes program logic, source-code optimization is costly in both time and risk (especially for systems already in use). Moreover, its effect on database system performance is limited, because the database operations an application performs are ultimately carried out by SQL statements executing against the database.
There are some direct reasons for optimizing SQL statements:
1. SQL statements are the only way to operate on a database (its data); the execution of an application ultimately comes down to the execution of SQL statements, so their efficiency plays a decisive role in database system performance.
2. SQL statements consume 70% to 90% of database resources.
3. SQL statements are independent of program design logic; optimizing them does not affect program logic. Compared with optimizing program source code, optimizing SQL statements is low-cost in both time and risk.
4. SQL statements can be written in different ways. The performance of different statements may vary greatly.
5. SQL is easy to learn but hard to master. The performance of an SQL statement often depends on the database structure and record counts of the actual running system; there is no universal rule for improving performance.
Traditional Optimization Methods
Traditionally, SQL programmers optimize SQL statements by manual rewriting. This relies mainly on a DBA or senior programmer analyzing the statement's execution plan, rewriting the statement based on experience, and then comparing results and performance in an attempt to find a better-performing version. This practice has the following shortcomings:
1. It is impossible to write out all possible SQL statements. Finding a better-performing statement may take a great deal of time, and even when one is found, you cannot know whether a still better one exists.
2. It depends heavily on human experience; the depth of that experience often determines the quality of the optimized SQL statement.
3. It is very time-consuming: the cycle of rewrite -> verify correctness -> compare performance takes a great deal of time.
Based on their capabilities, traditional SQL optimization tools can generally be divided into three generations:
The first generation consists of execution-plan analysis tools: for an input SQL statement, they extract the execution plan from the database and explain the meaning of the keywords in it.
The second generation can only recommend adding indexes: by analyzing the execution plan of the input SQL statement, it generates recommendations for new indexes. This type of tool has a fatal flaw: it analyzes only a single SQL statement to conclude that an index should be added, ignoring (in fact, unable to evaluate) the impact of the added index on the overall performance of the database system.
How to optimize SQL?
(1) Choose the most efficient table-name order (valid only in the rule-based optimizer): the ORACLE parser processes the table names in the FROM clause from right to left, so the base table (driving table) written last in the FROM clause is processed first. When the FROM clause contains multiple tables, choose the table with the fewest records as the base table. If more than three tables are joined, choose the intersection table (the table referenced by the other tables) as the base table.
(2) Join order in the WHERE clause: ORACLE parses the WHERE clause bottom-up. By this principle, joins between tables should be written before other WHERE conditions, and the conditions that filter out the largest number of records should be written at the end of the WHERE clause.
(3) Avoid '*' in the SELECT clause: during parsing, ORACLE converts '*' into all the column names in turn, a task completed by querying the data dictionary, which takes more time.
(4) Reduce the number of database accesses: for every access, ORACLE performs a great deal of internal work: parsing the SQL statement, estimating index utilization, binding variables, reading data blocks, and so on.
(5) Reset the ARRAYSIZE parameter in SQL*Plus, SQL*Forms, and Pro*C to increase the amount of data retrieved per database access; the recommended value is 200.
(6) Use the DECODE function to reduce processing time: DECODE can avoid repeatedly scanning the same records or repeatedly joining the same table.
(7) Consolidate simple, unrelated database accesses: if you have several simple query statements, you can combine them into a single query (even if they are unrelated).
(8) Delete duplicate records. The most efficient method (because it uses ROWID), for example:
delete from emp e where e.rowid > (select min(x.rowid) from emp x where x.emp_no = e.emp_no);
(9) Replace DELETE with TRUNCATE: when deleting records in a table, rollback segments normally store recovery information; if the transaction is not committed, ORACLE restores the data to its state before the delete (precisely, the state before the delete command was executed). With TRUNCATE, rollback segments store no recoverable information; once the command runs, the data cannot be restored, so few resources are used and execution is fast. (Translator's note: TRUNCATE applies only to deleting an entire table, and it is DDL, not DML.)
(10) Use COMMIT as often as possible: commit wherever the program logic allows. This improves program performance and reduces demand, because COMMIT releases resources: a. the information on rollback segments used to recover data; b. the locks acquired by program statements; c. space in the redo log buffer; d. ORACLE's internal overhead for managing the three resources above.
(11) Replace the HAVING clause with the WHERE clause: avoid HAVING, which filters the result set only after all records have been retrieved; this requires sorting, totaling, and other work. If the WHERE clause can limit the number of records first, that overhead is reduced. (In non-Oracle databases: of ON, WHERE, and HAVING, ON executes first, WHERE second, and HAVING last; because ON filters out non-matching records before statistics are computed, it reduces the data that intermediate operations must process and should in principle be fastest, and WHERE should likewise be faster than HAVING.)
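Tip (8) works in any database that exposes a row identifier. For instance, here is a sketch using SQLite's built-in rowid via Python's sqlite3 module (the emp table and its data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (emp_no integer, name text)")
conn.executemany("insert into emp values (?, ?)",
                 [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (2, "b"), (3, "c")])

# keep only the copy with the smallest rowid for each emp_no
conn.execute(
    "delete from emp where rowid > ("
    "  select min(x.rowid) from emp x where x.emp_no = emp.emp_no)")
conn.commit()

rows = conn.execute("select emp_no from emp order by emp_no").fetchall()
print(rows)  # [(1,), (2,), (3,)]
```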