Summary of SQL optimization for million-row databases

Source: Internet
Author: User

There are many SQL optimization tutorials online, but they are scattered and messy. I recently found some time to tidy them up and am sharing the result here; where there are errors or omissions, please correct and supplement them. I spent a lot of time on this article searching for material, revising, and formatting. If you find it useful after reading, please recommend it to more people, so that more readers can see it, correct it, and add to it.

1. To optimize queries and avoid full table scans, first consider creating indexes on the columns involved in WHERE and ORDER BY clauses.
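For example, a minimal sketch (the table, columns, and index name here are illustrative): if queries filter on num and sort by createdate, one composite index can serve both:

Create index idx_t_num_createdate on t (num, createdate)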


2. Try to avoid NULL tests on fields in the WHERE clause; otherwise the engine will abandon the index and perform a full table scan, such as:

Select ID from t where num is null

It is best not to leave NULL values in the database; declare columns NOT NULL and populate them with defaults whenever possible.

Comments, descriptions, remarks, and the like can be set to NULL; for everything else, it is preferable not to use NULL.

Do not assume that NULL requires no space. For a fixed-length type such as char(100), the space is fixed when the field is created: whether or not a value is inserted (NULL included), it occupies the full 100 characters of space. For a variable-length field such as varchar, a NULL occupies no space.


You can set a default value of 0 on num, make sure the num column in the table contains no NULL values, and then query:

Select ID from t where num = 0


3. Try to avoid using the != or <> operator in the WHERE clause; otherwise the engine will abandon the index and perform a full table scan.
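As an illustrative sketch (assuming num is indexed), a <> condition can sometimes be rewritten as two ranges joined with union all so that the index can be used:

Select ID from t where num < 10
union all
Select ID from t where num > 10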

4. Try to avoid using OR to join conditions in the WHERE clause. If one field in the OR has an index and the other does not, the engine will abandon the index and perform a full table scan, such as:

Select ID from t where num = 10 or Name = 'admin'

You can rewrite the query like this:

Select ID from t where num = 10
union all
Select ID from t where Name = 'admin'


5. IN and NOT IN should also be used with caution; otherwise they will result in full table scans, such as:

Select ID from t where num in (1,2,3)

For consecutive values, you can use between instead of in:

Select ID from t where num between 1 and 3

In many cases, replacing IN with EXISTS is a good choice:

Select num from a where num in (select num from b)

Replace with the following statement:

Select num from a where exists (select 1 from b where num=a.num)

6. The following query will also cause a full table scan:

Select ID from t where name like '%abc%'

To be more efficient, consider full-text indexing.
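As a sketch, assuming a full-text index has already been created on the name column in SQL Server (and noting that the semantics differ: CONTAINS matches words rather than arbitrary substrings), the LIKE could be replaced with:

Select ID from t where contains(name, 'abc')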

7. Using a parameter in the WHERE clause also causes a full table scan. SQL resolves local variables only at run time, but the optimizer cannot defer the choice of access plan to run time; it must choose at compile time. At compile time, however, the value of the variable is still unknown and therefore cannot serve as an input for index selection. The following statement will perform a full table scan:

Select ID from t where num = @num

You can force the query to use the index instead:

Select ID from t with (index(index_name)) where num = @num

8. Try to avoid expression operations on fields in the WHERE clause; they cause the engine to abandon the index and perform a full table scan. Such as:

Select ID from t where NUM/2 = 100

should read:

Select ID from t where num = 100*2


9. Try to avoid function operations on fields in the WHERE clause; they cause the engine to abandon the index and perform a full table scan. Such as:

Select ID from t where substring(name, 1, 3) = 'abc'    -- IDs whose name starts with 'abc'
Select ID from t where DateDiff(day, createdate, '2005-11-30') = 0    -- IDs generated on '2005-11-30'

should read:

Select ID from t where name like 'abc%'
Select ID from t where createdate >= '2005-11-30' and createdate < '2005-12-1'


10. Do not perform functions, arithmetic, or other expression operations on the left side of the "=" in the WHERE clause, or the system may not be able to use the index correctly.

11. When using an indexed field as a condition, if the index is a composite index, the first field of the index must appear in the condition to guarantee that the system uses the index; otherwise the index will not be used. The field order should also match the index order as much as possible.
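A small sketch (the index name is illustrative): with a composite index on (num, name), only conditions that include the leading column num can use it:

Create index idx_t_num_name on t (num, name)
Select ID from t where num = 10 and name = 'admin'    -- can use the index
Select ID from t where name = 'admin'    -- leading column missing, index is not used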

12. Do not write meaningless queries. For example, to generate an empty table structure:

Select Col1,col2 into #t from T where 1=0

This type of code does not return any result sets, but consumes system resources and should be changed to this:
CREATE TABLE #t (...)

13. In an UPDATE statement, if you are changing only one or two fields, do not update all fields; otherwise frequent calls will cause significant performance loss and generate a large volume of log.
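For example (assuming only the name column changed):

Update t set name = 'admin' where ID = 1    -- update only the changed column, not every field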

14. When joining multiple tables with large data volumes (here, a few hundred rows already counts as large), page first and then join; otherwise the logical reads will be very high and performance poor.
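A rough sketch of "page first, then join" (table names and page size are assumptions):

Select a.ID, a.name, b.detail
from (select top 100 ID, name from big_table_1 order by ID) a
inner join big_table_2 b on a.ID = b.ID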

15. SELECT Count(*) FROM table: a count without any condition causes a full table scan and serves no business purpose; it must be eliminated.


16. More indexes are not always better. Although an index can improve the efficiency of the corresponding SELECT, it also reduces the efficiency of INSERT and UPDATE, because the indexes may need to be rebuilt on INSERT or UPDATE. How to build indexes therefore needs careful, case-by-case consideration. The number of indexes on one table should preferably not exceed 6; if there are more, consider whether the rarely used ones are really necessary.

17. Avoid updating clustered-index data columns as much as possible, because the order of the clustered-index columns is the physical storage order of the table's records; once a column value changes, the record order of the entire table must be adjusted, which consumes considerable resources. If the application needs to update clustered-index columns frequently, reconsider whether that index should be built as a clustered index.

18. Use numeric fields as much as possible. If a field contains only numeric information, do not design it as a character type; that reduces query and join performance and increases storage overhead. The engine compares each character of a string one at a time when processing queries and joins, whereas a numeric type needs only a single comparison.

19. Use varchar/nvarchar instead of char/nchar as much as possible. First, variable-length fields take less storage space; second, for queries, searching within a smaller field is obviously more efficient.

20. Never use SELECT * FROM t anywhere; replace the "*" with a specific field list, and do not return any fields that are not actually used.
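For example:

Select ID, name from t    -- list only the fields actually used, instead of select * from t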

21. Try to use table variables instead of temporary tables. If the table variable contains a large amount of data, be aware that its indexes are very limited (only a primary-key index).
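A minimal sketch (names are illustrative); the only index the table variable gets here is the primary key declared with it:

Declare @t table (ID int primary key, name varchar(50))
Insert into @t select ID, name from t where num = 10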

22. Avoid frequently creating and deleting temporary tables, to reduce consumption of system-table resources. Temporary tables are not unusable; using them appropriately can make certain routines more efficient, for example when you need to repeatedly reference a data set from a large table or a frequently used table. For one-time events, however, an export table is better.

23. When creating a temporary table, if a large amount of data is inserted at once, use SELECT INTO instead of CREATE TABLE, to avoid generating a large volume of log and to increase speed; if the amount of data is small, CREATE TABLE first and then INSERT, to ease the load on system tables.
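For example (the temporary-table and column names are assumptions):

Select ID, name into #t from t where num > 0    -- large volume: select into, less logging
Create table #t2 (ID int, name varchar(50))    -- small volume: create first...
Insert into #t2 select ID, name from t where num = 10    -- ...then insert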

24. If temporary tables are used, be sure to explicitly delete them all at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE. This avoids long locks on system tables.
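Concretely:

Truncate table #t
Drop table #t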

25. Avoid cursors as much as possible, because cursors are inefficient; if a cursor manipulates more than 10,000 rows of data, consider rewriting it.

26. Before using a cursor-based or temporary-table method, first look for a set-based solution to the problem; set-based approaches are generally more efficient.

27. As with temporary tables, cursors are not unusable. Using a FAST_FORWARD cursor on a small data set is often preferable to other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that compute "totals" in the result set are usually faster than cursor-based ones. If development time permits, try both the cursor-based and the set-based approach and see which works better.

28. SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of a stored procedure or trigger.
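A minimal stored-procedure sketch (the procedure name is illustrative):

Create procedure usp_example
as
begin
    set nocount on
    select ID from t where num = 10
    set nocount off
end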

29. Try to avoid large transactions, to improve the system's concurrency.

30. Try to avoid returning large amounts of data to the client; if the data volume is too large, consider whether the requirement itself is reasonable.

Real case analysis: splitting large DELETE or INSERT statements and committing SQL in batches
If you need to run a large DELETE or INSERT on a live website, you must be very careful, or your action may bring the whole site to a halt. These two operations lock the table, and while the table is locked no other operation can get in.
Apache runs many child processes or threads, so it works quite efficiently, but our servers do not want too many child processes, threads, and database connections, which consume enormous server resources, especially memory.
If you lock your table for a while, say 30 seconds, then for a high-traffic site the access processes/threads, database connections, and open files accumulated over those 30 seconds may not only crash your web service but may bring the entire server down immediately.
So if you have a big DELETE or INSERT job, you must split it up. Using a LIMIT (MySQL), rownum (Oracle), or TOP (SQL Server) condition is a good method. Here is a MySQL example:

while (1) {
    // Delete only 1000 rows at a time
    mysql_query("DELETE FROM logs WHERE log_date <= '2012-11-01' LIMIT 1000");
    if (mysql_affected_rows() == 0) {
        // Deletion complete, exit!
        break;
    }
    // Pause for a while each round, releasing the table so other processes/threads can access it
    usleep(50000);
}
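For SQL Server, a comparable sketch using TOP and a short pause (reusing the logs table and date from the example above):

while (1 = 1)
begin
    delete top (1000) from logs where log_date <= '2012-11-01'
    if @@ROWCOUNT = 0 break    -- deletion complete, exit
    waitfor delay '00:00:00.050'    -- pause so other sessions can access the table
end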

All right, that's it for now. I know there is a lot that has not been covered, and I ask you to add to it. Some SQL optimizer tools will be introduced later. Let's study together and make progress together!

SQL Server performance optimization with NOLOCK: greatly improving database query performance

The company database grows over time; the data keeps increasing and queries keep getting slower. Looking into the database, there are hundreds of thousands of rows, and the queries are indeed time-consuming.

To improve SQL query performance, the index is usually the first thing we consider. In fact, besides building indexes, adding WITH (NOLOCK) to the SQL statement can reduce blocking caused by locks in environments with heavy online query loads on large data sets, thereby improving query performance.

Note, however, that SELECT ... WITH (NOLOCK) may cause dirty reads, i.e., reading invalid (uncommitted) data.

The following briefly describes lock contention in SQL Server and the principle and use of NOLOCK and ROWLOCK:

Description of Lock contention

Databases that do not use only row-level locks employ a technique called lock escalation to obtain higher performance. Unless it is clear that an operation targets the entire table, such databases start with row-level locks and then switch to wider-ranging locks as the amount of modified data grows.

Unfortunately, this mixed-locking approach creates and amplifies a new problem: deadlocks. Suppose two users modify records in different tables in reverse order, and the two records, though logically unrelated, are physically adjacent. Each operation first takes row locks and then escalates to page locks; now each user holds a lock the other needs, resulting in a deadlock.

For example:

User A modifies some records of table A; the resulting page lock locks not only the records being modified but many others as well.

User B modifies some records of table B; the page lock that is raised likewise locks many records besides those being modified.

User A now wants to modify data in table B that User B has locked (but is not necessarily modifying).

User B now wants to modify, or simply access, data in table A that User A has locked (but is not necessarily modifying).

To resolve this, the database periodically detects deadlocks and, when one exists, revokes one of the transactions so that the other can complete successfully. Generally the transaction that has modified the least data is undone, since its rollback is cheapest. Databases with row-level locks rarely have this problem, because two users are unlikely to modify the same record at the same moment, and the incidental order of data modification produces fewer conflicting locks.

Databases also use lock timeouts to keep users from waiting too long. Query timeouts were introduced for the same purpose. We can resubmit queries that have timed out, but this only adds to the database's congestion. If timeouts occur frequently, there is a problem with how SQL Server is being used; in a healthy system, timeouts are rare.
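In SQL Server, the lock wait can be bounded explicitly per session, for example:

Set lock_timeout 5000    -- wait at most 5000 ms for a lock before the statement fails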

In a high-load production environment, SQL Server's hybrid locking mechanism does not perform well. The reason is lock contention. Lock contention causes deadlocks and lock waits. In a multi-user system, many users access the database simultaneously, locks are generated at all times, and users scramble to acquire locks to ensure their operations are correct; deadlocks occur frequently, and in that situation the users' frustration is easy to imagine.

Indeed, with only a handful of users, SQL Server will not run into much trouble. During internal testing and release it is hard to spot concurrency problems because there are few users. But once you fire hundreds of concurrent sessions continuously running INSERT and UPDATE, plus some DELETE operations, the trouble shows up, and you will be scrambling to clear locks.

Workaround for lock contention

SQL Server starts with row-level locks but often escalates them to page and table locks, resulting in deadlocks.

SQL Server takes locks even on a SELECT, even though the user is not modifying any data. Fortunately, we can intervene manually with two SQL Server hints: NOLOCK and ROWLOCK.

They are used in the following ways:

 

SELECT COUNT(UserID) FROM Users WITH (NOLOCK) WHERE Username LIKE 'football'

And

UPDATE Users WITH (ROWLOCK) SET Username = 'admin' WHERE Username = 'football'

Use of NOLOCK

NOLOCK ignores locks and reads data directly from the database. This means you can avoid being blocked by locks, improving performance and scalability. But it also means the code may read inconsistent data: you may read uncommitted data that a running transaction is still processing, without validation. This is a risk you have to weigh.

Use of ROWLOCK

ROWLOCK tells SQL Server to use only row-level locks. The ROWLOCK hint can be used in SELECT, UPDATE, and DELETE statements, though I am accustomed to using it only with UPDATE and DELETE. If an UPDATE statement specifies the primary key, a row-level lock is always taken. However, when SQL Server batches several such updates, some of the rows will happen to sit on the same page, which is quite likely in practice, like a directory in which files created over a long period all end up being updated at once. When a page lock is taken, things start to get worse. And if you do not specify a primary key in the UPDATE or DELETE, the database naturally assumes a lot of data is affected, takes a page lock straight away, and things get worse still.

Here is an example to illustrate the effect of NOLOCK, using a table with more than 10,000 rows. First, without NOLOCK:

DECLARE @start DATETIME;
DECLARE @end DATETIME;
SET @start = getdate();
SELECT * FROM captions_t18;
SET @end = getdate();
SELECT DATEDIFF(ms, @start, @end);

To make the effect more pronounced, SELECT * is used here to observe the execution result.

The elapsed time shown is 34720 ms. Next, try it with NOLOCK:

DECLARE @start DATETIME;
DECLARE @end DATETIME;
SET @start = getdate();
SELECT * FROM captions_t18 WITH (NOLOCK);
SET @end = getdate();
SELECT DATEDIFF(ms, @start, @end);

The result of this run:

This time it took 2563 ms; the gap is plain to see. Personally I suspect the difference should not be quite this large, but in short, performance improves a great deal. Run more tests yourself.
