Database Optimization: SQL Optimization Schemes for Million-Row Databases

Source: Internet
Author: User
Tags: joins, single table

1. To optimize a query and avoid a full table scan, first consider building indexes on the columns used in WHERE and ORDER BY.

2. Avoid testing fields for NULL in the WHERE clause; this causes the engine to abandon the index and perform a full table scan, such as:

select id from t where num is null

It is best not to leave NULL values in the database; declare columns NOT NULL wherever possible.

Comments, descriptions, remarks, and similar fields can be NULL; other fields are best declared NOT NULL.

Do not assume that NULL requires no space. For a char(100) column, the space is fixed when the field is created: 100 characters are occupied regardless of the inserted value (NULL included). For a variable-length type such as varchar, NULL occupies no space.

You can set a default value of 0 on num, make sure the num column in the table never holds NULL, and then query:

select id from t where num = 0

3. Try to avoid using the != or <> operator in the WHERE clause; it causes the engine to abandon the index and perform a full table scan.

4. Try to avoid joining conditions with OR in the WHERE clause. If one field is indexed and the other is not, the engine will abandon the index and perform a full table scan, such as:

select id from t where num = 10 or name = 'admin'

You can rewrite the query as:

select id from t where num = 10
union all
select id from t where name = 'admin'
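The OR-to-UNION ALL rewrite above can be sanity-checked in a few lines. The sketch below uses SQLite in memory as a stand-in (the article targets engines whose optimizer may abandon the index on OR; the table name t and columns id/num/name follow the article's example):

```python
import sqlite3

# Build the article's example table t(id, num, name) in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, num INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, 10, "alice"), (2, 20, "admin"), (3, 10, "admin")])

or_rows = conn.execute(
    "SELECT id FROM t WHERE num = 10 OR name = 'admin'").fetchall()
union_rows = conn.execute(
    "SELECT id FROM t WHERE num = 10 "
    "UNION ALL SELECT id FROM t WHERE name = 'admin'").fetchall()

# UNION ALL can return a row twice when both branches match it,
# so compare the row *sets*, not the raw lists.
print(sorted(set(or_rows)))
print(sorted(set(union_rows)))
```

Note the caveat in the comment: with UNION ALL, a row matching both branches appears twice; use UNION if exact duplicates matter.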

5. IN and NOT IN should also be used with caution; otherwise they can result in full table scans, such as:

select id from t where num in (1, 2, 3)

For consecutive values, use BETWEEN instead of IN:

select id from t where num between 1 and 3

Often, replacing IN with EXISTS is a good choice:

select num from a where num in (select num from b)

Replace with the following statement:

select num from a where exists (select 1 from b where b.num = a.num)
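The IN and EXISTS forms above are equivalent in result, which is what makes the rewrite safe. A minimal check with SQLite in memory (tables a and b and column num follow the article's example; actual performance differences depend on the engine and data):

```python
import sqlite3

# Two small tables with an overlapping num column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (num INTEGER)")
conn.execute("CREATE TABLE b (num INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,), (4,)])

in_rows = conn.execute(
    "SELECT num FROM a WHERE num IN (SELECT num FROM b)").fetchall()
exists_rows = conn.execute(
    "SELECT num FROM a WHERE EXISTS "
    "(SELECT 1 FROM b WHERE b.num = a.num)").fetchall()

print(in_rows)      # rows of a whose num also appears in b
print(exists_rows)  # same rows via the correlated EXISTS form
```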

6. The following query will also cause a full table scan:

select id from t where name like '%abc%'

To be more efficient, consider full-text indexing.

7. Using a local variable in the WHERE clause can also cause a full table scan. Because SQL resolves local variables only at run time, the optimizer cannot defer the choice of access plan to run time; it must choose at compile time. But if the access plan is built at compile time, the variable's value is still unknown and cannot be used to select an index. The following statement performs a full table scan:

select id from t where num = @num

You can force the query to use the index instead:

select id from t with (index(index_name)) where num = @num

8. Try to avoid expression operations on fields in the WHERE clause; this causes the engine to abandon the index and perform a full table scan. Such as:

select id from t where num / 2 = 100

should read:

select id from t where num = 100 * 2
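The effect of point 8 can actually be observed in an engine's query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN (SQLite-specific output; table t, index idx_num, and the num/2 example are taken from or modeled on the article):

```python
import sqlite3

# Table with an index on num, so an index search is possible in principle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, num INTEGER)")
conn.execute("CREATE INDEX idx_num ON t(num)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, i * 2) for i in range(1, 301)])

expr_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM t WHERE num/2 = 100").fetchall()
sarg_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM t WHERE num = 200").fetchall()

# The last column of each plan row is the human-readable detail.
expr_detail = " ".join(row[-1] for row in expr_plan)
sarg_detail = " ".join(row[-1] for row in sarg_plan)
print(expr_detail)  # expression on the column: full scan of t
print(sarg_detail)  # bare column predicate: search via idx_num
```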

9. Try to avoid function operations on fields in the WHERE clause; this causes the engine to abandon the index and perform a full table scan. Such as:

select id from t where substring(name, 1, 3) = 'abc'  -- ids whose name begins with abc
select id from t where datediff(day, createdate, '2005-11-30') = 0  -- ids generated on '2005-11-30'

should read:

select id from t where name like 'abc%'
select id from t where createdate >= '2005-11-30' and createdate < '2005-12-1'
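The substring-to-LIKE rewrite above must return exactly the same rows to be valid. A quick equivalence check with SQLite in memory (substr() is SQLite's spelling of substring; table and column names follow the article's example):

```python
import sqlite3

# Names with and without the 'abc' prefix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "abcdef"), (2, "abcxyz"), (3, "xabcde")])

fn_rows = conn.execute(
    "SELECT id FROM t WHERE substr(name, 1, 3) = 'abc'").fetchall()
like_rows = conn.execute(
    "SELECT id FROM t WHERE name LIKE 'abc%'").fetchall()

print(fn_rows)    # function on the column
print(like_rows)  # sargable rewrite, same rows
```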

10. Do not perform functions, arithmetic, or other expression operations on the left side of the "=" in the WHERE clause, or the system may fail to use the index correctly.

11. When using an indexed field as a condition, if the index is a composite index, the condition must use the first field of the index for the system to use the index; otherwise the index will not be used. Field order should match the index order as much as possible.

12. Do not write meaningless queries, such as when you need to generate an empty table structure:

select col1, col2 into #t from t where 1 = 0

This kind of code returns no result set but still consumes system resources; change it to:

CREATE TABLE #t (...)
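Point 12's "empty table structure" idiom can be illustrated with SQLite, whose CREATE TABLE ... AS SELECT is a rough analogue of SQL Server's SELECT ... INTO (the #t temp-table syntax above is SQL Server specific). Either way the new table ends up empty; the article's point is that an explicit CREATE TABLE states that intent without running a query at all:

```python
import sqlite3

# A source table whose column layout we want to clone.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'x')")

# Query-shaped version: copies the column layout, matches no rows.
conn.execute("CREATE TABLE t_copy AS SELECT col1, col2 FROM t WHERE 1 = 0")
n = conn.execute("SELECT COUNT(*) FROM t_copy").fetchone()[0]
print(n)  # the clone is empty
```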

13. In an UPDATE statement, if you only change one or two fields, do not update all fields; otherwise frequent calls cause significant performance cost and generate a large volume of log.

14. When joining multiple tables with large data volumes (here even a few hundred rows counts as large), paginate first and then join; otherwise logical reads will be very high and performance poor.

15. select count(*) from table: a count without any condition causes a full table scan and serves no business purpose; it must be eliminated.

16. More indexes are not always better. An index can improve the efficiency of the corresponding SELECT, but it reduces the efficiency of INSERT and UPDATE, since those operations may rebuild the index. How to build indexes therefore needs careful, case-by-case consideration. A table should preferably have no more than six indexes; if there are more, consider whether the rarely used ones are really necessary.

17. Avoid updating clustered index columns as much as possible, because the order of the clustered index columns is the physical storage order of the table's records; once a value changes, reordering the table's records consumes considerable resources. If the application needs to update clustered index columns frequently, reconsider whether the index should be clustered at all.

18. Use numeric fields as much as possible. A field that holds only numeric information should not be designed as a character type; that lowers query and join performance and increases storage overhead. The engine compares strings character by character when processing queries and joins, while a numeric type needs only one comparison.

19. Use varchar/nvarchar instead of char/nchar where possible. First, variable-length fields take less storage; second, for queries, searching within a smaller field is noticeably more efficient.

20. Never use select * from t anywhere; replace "*" with a specific field list, and do not return fields you do not need.

21. Use table variables instead of temporary tables where possible. If a table variable holds a large amount of data, note that its indexing is very limited (only a primary key index).

22. Avoid frequent creation and deletion of temporary tables, to reduce the consumption of system table resources. Temporary tables are not unusable; used appropriately, they can make certain routines more efficient, for example when you need to repeatedly reference a dataset from a large or commonly used table. For one-time events, however, an export table is better.

23. When creating a temporary table, if you insert a large amount of data at once, use select into instead of create table, to avoid generating a large volume of log and to increase speed; if the amount of data is small, create table first and then insert, to ease the load on system table resources.

24. If temporary tables are used, be sure to explicitly delete them at the end of the stored procedure: truncate table first, then drop table. This avoids longer locks on system tables.

25. Avoid cursors as much as possible, because they are inefficient; if a cursor operates on more than 10,000 rows of data, consider rewriting it.

26. Before using a cursor-based or temporary-table method, look first for a set-based solution to the problem; set-based approaches are usually more efficient.

27. As with temporary tables, cursors are not unusable. Using FAST_FORWARD cursors on small datasets is often preferable to other row-by-row methods, especially when several tables must be referenced to obtain the required data. Routines that include "totals" in the result set are usually faster than cursor-based equivalents. If development time permits, try both the cursor-based and the set-based approach and keep whichever works better.

28. Set NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of a stored procedure or trigger.

29. Try to avoid large transactions, to improve the system's concurrency.

30. Try to avoid returning large amounts of data to the client; if the data volume is too large, consider whether the corresponding requirement is itself reasonable.

A real case: split large DELETE or INSERT statements and commit SQL statements in batches

If you need to run a large DELETE or INSERT against a live website, be very careful to keep the operation from bringing the whole site to a halt. Both operations lock the table, and while the table is locked, no other operation can get in.

Apache runs many child processes or threads, so it works quite efficiently, but our servers do not want too many child processes, threads, and database connections, which consume enormous server resources, especially memory.

If you lock a table for a period of time, say 30 seconds, then for a high-traffic site the access processes/threads, database connections, and open files accumulated over those 30 seconds may not only crash your web service but may cause the whole server to hang immediately.

So if you have a big job, be sure to split it up; a LIMIT (MySQL), rownum (Oracle), or TOP (SQL Server) condition is a good method. Here is a MySQL example:

while (1) {
    // do only 1000 at a time
    mysql_query("DELETE FROM logs WHERE log_date <= '2012-11-01' LIMIT 1000");
    if (mysql_affected_rows() == 0) {
        // deletion complete, exit!
        break;
    }
    // pause each time, releasing the table so other processes/threads can access it
    usleep(50000);
}

--------------------------------------------------------------------------------

1. Efficient SQL statement design:


Typically, you can optimize the performance of SQL data operations with the following methods:
(1) Reduce the number of queries against the database, i.e. reduce requests for system resources, using distributed database objects such as snapshots and views.
(2) Use identical or very similar SQL statements for queries. This not only takes full advantage of the parsed syntax trees in the SQL shared pool, it also greatly increases the likelihood that the queried data hits in the SGA.
(3) Avoid executing SQL statements without any conditions. Such statements are usually executed as full table scans (FTS): the database locates one block of data and then reads the rest sequentially, a lengthy process for large tables.
(4) If you have constraints on the data in some tables, it is better to implement them with declarative integrity in the table definition than in SQL program code.

I. Operator optimization:

1. The IN operator

SQL written with IN is easier to write and easier to understand, which suits the modern software development style. But SQL using IN always performs worse; compared with SQL without IN, Oracle takes the following extra steps to parse it:

Oracle attempts to convert it into a join across multiple tables. If the conversion fails, it executes the inner subquery first and then queries the outer table; if it succeeds, it uses the multi-table join directly. So SQL with IN involves at least one extra conversion step. Ordinary SQL can usually be converted, but SQL involving grouping, statistics, and the like cannot. Try not to use the IN operator in business-intensive SQL.

When optimizing SQL, you will often encounter IN statements; be sure to replace them with EXISTS, because Oracle processes IN by converting it to ORs, which is very slow even when indexes can be used.

2. The NOT IN operator

Strongly recommended against, because it cannot use the table's indexes. Replace it with NOT EXISTS or an (outer join + IS NULL) scheme.

3. IS NULL and IS NOT NULL operations

Testing whether a field is empty generally cannot use an index, because a B-tree index does not index NULL values.

Use other operations with the same effect instead: change a is not null to a > 0 or a > '', and so on.

Better, do not allow the field to be empty at all, and give it a default value in place of NULL; for example, an application's status field that must not be empty can default to its initial status.

Avoid using IS NULL or IS NOT NULL on an indexed column; Oracle cannot use the index for any nullable column compared this way. For a single-column index, a row whose column value is NULL does not appear in the index. For a composite index, the row is absent from the index only if every indexed column is NULL; if at least one column is non-null, the row is in the index. For example, with a unique index on columns A and B of a table, if a row with (A, B) = (123, null) already exists, Oracle will not accept another insert with the same values (123, null). However, if all the index columns are empty, Oracle considers the entire key empty, and since NULL is not equal to NULL, you can insert 1000 records with that "same" key value; of course, they are all empty! Because NULL values do not exist in the index, a NULL comparison on an indexed column in the WHERE clause causes Oracle to deactivate that index.

Inefficient: (Index invalidation)

SELECT ... FROM DEPARTMENT WHERE DEPT_CODE IS NOT NULL;

Efficient: (Index valid)

SELECT ... FROM DEPARTMENT WHERE DEPT_CODE >= 0;

4. The > and < operators (greater-than and less-than)

The greater-than and less-than operators generally need no adjustment, because with an index they use an index range scan. But in some cases they can be optimized. Suppose a table has one million records and a numeric field A, with 300,000 records where a=0, 300,000 where a=1, 390,000 where a=2, and 10,000 where a=3. Then a > 2 and a >= 3 perform very differently: for a > 2, Oracle first finds the index entries for 2 and compares onward, while for a >= 3 it locates the index entries for 3 directly.
Replacing > with >=:
Efficient:

SELECT * FROM EMP WHERE DEPTNO >= 4

Low efficiency:

SELECT * FROM EMP WHERE DEPTNO > 3

The difference is that the former lets the DBMS jump directly to the first record where DEPTNO equals 4, while the latter first locates the DEPTNO = 3 records and then scans forward to the first DEPTNO greater than 3.

5. Like operator:


The LIKE operator supports wildcard queries, and wildcard combinations can express almost any query, but used poorly they cause performance problems. For example, LIKE '%5400%' cannot use an index, while LIKE 'X5400%' can use a range scan on the index. A practical example: querying the business number yy_bh in the YW_YHJBQK table by the user identification number that follows its leading letter. The condition yy_bh LIKE '%5400%' produces a full table scan; changed to yy_bh LIKE 'X5400%' OR yy_bh LIKE 'B5400%', the yy_bh index is used for two range queries, and performance clearly improves.

6. Replace DISTINCT with EXISTS:
When submitting a query that joins one-to-many table information, such as a department table and an employee table, avoid using DISTINCT in the SELECT clause. Generally consider replacing it with EXISTS, which makes the query faster because the RDBMS core returns a result row as soon as the subquery's condition is met.
Example:
(inefficient):

SELECT DISTINCT DEPT_NO, DEPT_NAME FROM DEPT D, EMP E WHERE D.DEPT_NO = E.DEPT_NO

(efficient):

SELECT DEPT_NO, DEPT_NAME FROM DEPT D WHERE EXISTS
(SELECT 'X' FROM EMP E WHERE E.DEPT_NO = D.DEPT_NO);
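The DISTINCT-vs-EXISTS rewrite above can be verified on small data. A sketch with SQLite in memory, using DEPT/EMP tables mirroring the article's example (the specific departments and employees are made up for illustration):

```python
import sqlite3

# DEPT/EMP with a one-to-many relationship; HR has no employees.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_no INTEGER, dept_name TEXT)")
conn.execute("CREATE TABLE emp (emp_no INTEGER, dept_no INTEGER)")
conn.executemany("INSERT INTO dept VALUES (?, ?)",
                 [(10, "SALES"), (20, "IT"), (30, "HR")])
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 20)])

distinct_rows = conn.execute(
    "SELECT DISTINCT d.dept_no, d.dept_name FROM dept d, emp e "
    "WHERE d.dept_no = e.dept_no").fetchall()
exists_rows = conn.execute(
    "SELECT dept_no, dept_name FROM dept d WHERE EXISTS "
    "(SELECT 'X' FROM emp e WHERE e.dept_no = d.dept_no)").fetchall()

print(sorted(distinct_rows))  # departments that have employees
print(sorted(exists_rows))    # same set via the semi-join form
```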

Also:
Replace IN with EXISTS, and NOT IN with NOT EXISTS:
In many base-table queries, another table must be joined to satisfy a condition. In such cases, using EXISTS (or NOT EXISTS) usually improves query efficiency. In a subquery, the NOT IN clause performs an internal sort and merge. In any case, NOT IN is the least efficient, because it performs a full table traversal of the subquery's table. To avoid NOT IN, rewrite it as an outer join (Outer Joins) or NOT EXISTS.
Example:
(efficient):

SELECT * FROM EMP (base table) WHERE EMPNO > 0 AND EXISTS
(SELECT 'X' FROM DEPT WHERE DEPT.DEPTNO = EMP.DEPTNO AND LOC = 'MELB')

(inefficient):

SELECT * FROM EMP (base table) WHERE EMPNO > 0 AND DEPTNO IN
(SELECT DEPTNO FROM DEPT WHERE LOC = 'MELB')

7. Replace OR with UNION (for indexed columns)


In general, replacing OR in the WHERE clause with UNION gives good results; using OR on an indexed column causes a full table scan. Note that this rule holds only when all the columns involved are indexed. If a column is not indexed, query efficiency may actually drop because you avoided OR. In the following example, indexes exist on both LOC_ID and REGION.
(efficient):

SELECT LOC_ID, LOC_DESC, REGION FROM LOCATION WHERE LOC_ID = 10
UNION
SELECT LOC_ID, LOC_DESC, REGION FROM LOCATION WHERE REGION = 'MELBOURNE'

(inefficient):

SELECT LOC_ID, LOC_DESC, REGION FROM LOCATION WHERE LOC_ID = 10 OR REGION = 'MELBOURNE'


If you insist on using OR, put the indexed column that returns the fewest records first.

8. Replace OR with IN


This is a simple, easy-to-remember rule, but the actual effect must be tested; under Oracle 8i, the execution paths appear to be the same.
Low efficiency:

SELECT ... FROM LOCATION WHERE LOC_ID = 10 OR LOC_ID = 20 OR LOC_ID = 30

Efficient:

SELECT ... FROM LOCATION WHERE LOC_ID IN (10, 20, 30);

II. Structural optimization of SQL statements

1. Avoid using '*' in the SELECT clause.

2. Replace DELETE with TRUNCATE:

Use TRUNCATE instead of DELETE to remove all records from a table (this method applies to tables with large data volumes).
When records are deleted with DELETE, rollback segments are generally used to hold information for recovery; if the transaction has not been committed, Oracle restores the data to its state before the delete (to be exact, before the DELETE command was executed). With TRUNCATE, the rollback segment holds no recoverable information.

3. Replace the HAVING clause with a WHERE clause:


Avoid the HAVING clause where possible; HAVING filters the result set only after all records have been retrieved, which requires sorting, totaling, and so on. If a WHERE clause can limit the number of records instead, that overhead is reduced. (Non-Oracle note) Of the three condition-bearing clauses, ON executes first, WHERE second, and HAVING last: ON filters non-matching records before statistics are computed, reducing the intermediate data to process, so it should be fastest; WHERE should likewise be faster than HAVING.

4. SQL statements in uppercase

Because Oracle always parses the SQL statement first, converting lowercase letters to uppercase, and only then executes it.

5. In Java code, use the "+" connector to concatenate SQL strings as little as possible!

6. Avoid changing the type of indexed columns:


Oracle automatically performs simple type conversions on columns when comparing data of different data types. Suppose EMPNO is an indexed column of numeric type.

SELECT ... FROM EMP WHERE EMPNO = '123'

In fact, after Oracle's type conversion, the statement becomes:

SELECT ... FROM EMP WHERE EMPNO = TO_NUMBER('123')

Fortunately, the type conversion did not occur on the indexed column, and the use of the index is unaffected. Now suppose EMP_TYPE is an indexed column of character type.

SELECT ... FROM EMP WHERE EMP_TYPE = 123

This statement is translated by Oracle to:

SELECT ... FROM EMP WHERE TO_NUMBER(EMP_TYPE) = 123

Because the type conversion happens internally on the column, this index will not be used! To prevent Oracle from implicitly converting your SQL, it is best to make type conversions explicit. Note that when comparing characters with numbers, Oracle converts the character data to the numeric type.

7. Optimize GROUP BY:


The efficiency of a GROUP BY statement can be improved by filtering out unwanted records before grouping. The following two queries return the same result, but the second is significantly faster.
Low efficiency:

SELECT JOB, AVG(SAL) FROM EMP GROUP BY JOB HAVING JOB = 'PRESIDENT' OR JOB = 'MANAGER'

Efficient:

SELECT JOB, AVG(SAL) FROM EMP WHERE JOB = 'PRESIDENT' OR JOB = 'MANAGER' GROUP BY JOB
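The WHERE and HAVING forms above return identical results, which is what makes filtering before grouping a safe optimization. A quick check with SQLite in memory (the EMP table and salary figures are invented for illustration):

```python
import sqlite3

# A small EMP table with jobs and salaries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (job TEXT, sal REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("PRESIDENT", 5000), ("MANAGER", 3000),
                  ("MANAGER", 2000), ("CLERK", 1000)])

having_rows = conn.execute(
    "SELECT job, AVG(sal) FROM emp GROUP BY job "
    "HAVING job = 'PRESIDENT' OR job = 'MANAGER'").fetchall()
where_rows = conn.execute(
    "SELECT job, AVG(sal) FROM emp "
    "WHERE job = 'PRESIDENT' OR job = 'MANAGER' GROUP BY job").fetchall()

print(sorted(having_rows))  # filter after grouping
print(sorted(where_rows))   # filter before grouping, same result
```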

Database optimization Scheme

1. Use table partitioning

Partitioning separates data physically; data from different partitions can be stored in data files on different disks. A query on the table then only needs to scan the relevant partitions instead of the whole table, significantly shortening query time. Moreover, placing partitions on different disks spreads the table's data transfer across different disk I/O channels; a well-configured partitioning scheme distributes the disk I/O competition evenly. You can take this approach when the data volume is large, for example by building table partitions automatically by month.

2. Use of aliases


Aliasing is a common large-database application technique: give the table and column names single-letter aliases in the query. Query speed can be 1.5 times faster than writing the join with full table names.

3. Optimized design of indexes

Indexes can greatly speed up database queries: an index maps the logical values in a table to ROWIDs, allowing fast location of a row's physical address. When querying a large table through an index, the index data may exhaust the available block cache, forcing Oracle to read and write disk frequently to fetch data; so after partitioning a large table, you can create partitioned indexes on the corresponding partitions. Personally, though, not every table needs an index; index only tables with large data volumes.

Drawbacks: First, creating and maintaining indexes takes time, and that time grows with data volume. Second, indexes occupy physical space: besides the data space used by the table, each index takes a certain amount of space, and a clustered index takes more. Third, when data in the table is inserted, deleted, or modified, the indexes must be maintained dynamically, which slows data maintenance.

Indexes need maintenance: to preserve system performance, after index pages have become fragmented by frequent inserts, deletes, and updates, the indexes must be maintained.

4. Adjust the hard disk I/O


This step is done before the information system is developed. The database administrator can place the data files that make up a tablespace on different hard disks to achieve I/O load balancing across drives. Where disk space is relatively plentiful, the following principles should also be followed:

Separate tables and indexes;

Create user tablespaces and separate them from the SYSTEM tablespace on different disks;

Specify different tablespaces when creating tables and indexes;

Create a dedicated tablespace for rollback segments, to prevent space competition from affecting transaction completion;

Create a temporary tablespace for sorting operations, and as far as possible keep database fragmentation from spreading across multiple tablespaces.

When using materialized views, we can basically "treat one as an actual data table" without worrying about the efficiency or optimization of the view's own underlying tables.

Materialized view

1. For complex, high-cost queries that are used frequently, build a materialized view

2. A materialized view is a typical space-for-time performance optimization

3. Use materialized views with caution on frequently updated tables

4. Select the appropriate refresh mode

An ordinary view is virtual, while a materialized view holds real data and occupies storage space.

Of course, materialized views differ from ordinary views in creation and management. By comparison, a materialized view occupies a certain amount of storage space, and refreshing it costs the system some resources, but this is exchanged for efficiency and flexibility.

Reduce I/O and network transmission times

1. Use as few database requests as possible to obtain the required data; what can be fetched in one pass should not be extracted in multiple passes

2. For bulk operations against a frequently accessed database, use stored procedures to reduce unnecessary network transfers
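The round-trip point above can also be illustrated from the client side: one batched call inside one transaction instead of a statement-and-commit per row. A sketch with Python's sqlite3 (the readings table is invented for illustration; with a networked database the savings come from fewer round trips):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (val INTEGER)")

rows = [(i,) for i in range(1000)]
with conn:  # a single transaction wraps the whole batch
    conn.executemany("INSERT INTO readings VALUES (?)", rows)

count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)  # all rows inserted in one batched call
```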

Deadlocks and blocking

1. For data that needs frequent updates, avoid placing the updates in long transactions, to prevent chain reactions

2. Unless it is a necessity, it is best not to add your own locks outside Oracle's locking mechanism

3. Reduce transaction size and commit transactions in a timely manner

4. Avoid cross-database distributed transactions as much as possible; their environmental complexity easily leads to blocking

5. Use bitmap indexes with caution; they are prone to deadlocks during updates

Automatically adding table partitions:

The following procedure can be run as an Oracle job before the 28th of each month (allowing for February's 28 days), automatically adding partitions to the partitioned tables under that user.

create or replace procedure guan_add_partition
/*
   Automatically add partitions for all partitioned tables under one user.
   The partition column is a date type, and partition names look like: p200706.
   create by David
*/
as
  v_table_name     varchar2(50);
  v_partition_name varchar2(50);
  v_month          char(6);
  v_add_month_1    char(6);
  v_sql_string     varchar2(2000);
  v_add_month      varchar2(20);
  cursor cur_part is
    select u.table_name, max(p.partition_name) max_part_name
      from user_tables u, user_tab_partitions p
     where u.table_name = p.table_name
       and u.partitioned = 'YES'
     group by u.table_name;
begin
  select to_char(sysdate, 'yyyymm') into v_month from dual;
  select to_char(add_months(sysdate, 1), 'yyyymm') into v_add_month_1 from dual;
  select to_char(add_months(trunc(sysdate, 'mm'), 2), 'yyyy-mm-dd') into v_add_month from dual;
  open cur_part;
  loop
    fetch cur_part into v_table_name, v_partition_name;
    exit when cur_part%notfound;
    if to_number(substr(v_partition_name, 2)) <= to_number(v_month) then
      v_sql_string := 'alter table ' || v_table_name || ' add partition p' || v_add_month_1 ||
                      ' values less than (to_date(''' || v_add_month || ''',''yyyy-mm-dd'')) tablespace users';
      execute immediate v_sql_string;
    else
      null;
    end if;
  end loop;
  close cur_part;
end;
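The date arithmetic inside the procedure is easy to get wrong, so here is a small, testable sketch of the DDL string it builds: given a table name and a reference date, produce the ALTER TABLE ... ADD PARTITION statement for next month with an upper bound two months out. The p-prefix and users tablespace follow the procedure above; this is an illustration, not Oracle-tested code:

```python
import datetime

def add_partition_ddl(table_name: str, today: datetime.date) -> str:
    # Work from the first day of the current month, then step forward.
    first = today.replace(day=1)

    def add_months(d: datetime.date, n: int) -> datetime.date:
        y, m = divmod(d.month - 1 + n, 12)
        return d.replace(year=d.year + y, month=m + 1)

    part_month = add_months(first, 1).strftime("%Y%m")        # partition name month
    upper_bound = add_months(first, 2).strftime("%Y-%m-%d")   # VALUES LESS THAN bound
    return (f"alter table {table_name} add partition p{part_month} "
            f"values less than (to_date('{upper_bound}','yyyy-mm-dd')) "
            f"tablespace users")

# Mid-June 2007 should yield a July partition bounded by August 1st.
ddl = add_partition_ddl("sales", datetime.date(2007, 6, 15))
print(ddl)
```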

