Optimization and Analysis of SQL Queries with Large Data Volumes

1. Analysis Objectives

(1) Test data table: largedata;

(2) Data volume: 2 million rows;

(3) Code used to insert the data:

-- Select * into largedata from gsm_stop_record
declare @i int
set @i = 1
while @i < 60000
begin
    insert into largedata values ('dlbsclb', 'dbb2483', 'Zhuanghe Wangjia Zhen', 'Zhuanghe City', '2017-09-01 09:55:00', 'TRX local', '2017-09-01 10:04:00', 'Chen Xu', 'econnoisseurs', 'Master device hardware (CDU\BBU\RRU\TRU\DXU\antenna feeder fault, etc.)', '2017-09-01 10:07:00', 12, 'Chen Xu', 2012, 'test')
    set @i = @i + 1
end
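As a side note, a row-by-row loop is slow for generating test rows; a set-based insert is a faster alternative. This is a sketch under the assumption that largedata has exactly the 15 columns populated by the loop:

```sql
-- Cross-join two system views to produce at least 60,000 rows at once,
-- then insert the same constant values once per generated row.
;with n as (
    select top (60000) row_number() over (order by (select null)) as rn
    from sys.all_objects a cross join sys.all_objects b
)
insert into largedata
select 'dlbsclb', 'dbb2483', 'Zhuanghe Wangjia Zhen', 'Zhuanghe City',
       '2017-09-01 09:55:00', 'TRX local', '2017-09-01 10:04:00', 'Chen Xu',
       'econnoisseurs', 'Master device hardware fault', '2017-09-01 10:07:00',
       12, 'Chen Xu', 2012, 'test'
from n
```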

 

2. Technical Problems

3. Problem Analysis

3.1 Current Status of the Problem

The following describes the operations on the GSM outage record table (GSM_Stop_Record).

(1) Client: GSM_Stop_Record query

The SQL statement the client currently runs:

Select * from GSM_Stop_Record  join GSM_bts on GSM_Stop_Record.z_cell_id=GSM_bts.bts_id where z_start_time between '2011-04-10 06:10:00' and '2013-01-01 09:55:00' order by z_start_time

Running the actual required SQL statement takes 00:03:02 and returns 3,269,455 rows, about 570 MB when exported in CSV format.

During this query, the server performs the update operation.

 

(2) Server: GSM_Stop_Record update

The update statement executed on the server:

Update GSM_Stop_Record set z_bsc='AAAAAAA' where z_start_time='2012-08-31 11:05:00'
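While the sessions from (1) and (2) are running, the blocking itself can be observed from a third session. A sketch using the sys.dm_exec_requests DMV:

```sql
-- List every request that is currently blocked, together with the
-- session that is blocking it and the kind of wait involved.
select session_id, blocking_session_id, wait_type, wait_time, command
from sys.dm_exec_requests
where blocking_session_id <> 0
```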

 

3.2 Problem Solving

But why does the SELECT query block the UPDATE? The reason is as follows:

SQL Server's default [transaction isolation level] is READ COMMITTED: data modified inside a transaction cannot be read by other sessions until the transaction commits. SQL Server's locking therefore causes blocking, and by default other processes wait indefinitely (LOCK_TIMEOUT = -1).

To solve this problem, set the transaction isolation level to [READ UNCOMMITTED (dirty read)]:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Select * from GSM_Stop_Record join GSM_bts on GSM_Stop_Record.z_cell_id=GSM_bts.bts_id where z_start_time between '2011-04-10 06:10:00' and '2013-01-01 09:55:00' order by z_start_time

This no longer blocks data updates: the update now runs at the same time, taking 00:00:06 and affecting 148 records, which fully meets the requirements.

Alternatively, adding the WITH (NOLOCK) hint after the table name tells SQL Server to ignore the table's lock status when reading, which also reduces the probability of [deadlocks]. Note, however, that WITH (NOLOCK) cannot be used with INSERT, UPDATE, or DELETE; that is, this approach mainly solves query blocking.
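Applied to the query from section 3.1, the hint-based variant might look like this (a sketch of the approach just described):

```sql
-- Read both tables without taking or honoring shared locks.
-- Note: this is a dirty read; rows being modified may be seen mid-update.
select * from GSM_Stop_Record with (nolock)
join GSM_bts with (nolock) on GSM_Stop_Record.z_cell_id = GSM_bts.bts_id
where z_start_time between '2011-04-10 06:10:00' and '2013-01-01 09:55:00'
order by z_start_time
```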

3.3 The Nature of the Blocking

Let's return to the blocking above. In theory, under normal conditions (with no explicit locks), a SELECT should not block an UPDATE, so does the experiment above contradict the theory? Consider two transactions executed together:

Transaction A

Select * from GSM_Stop_Record  join GSM_bts on GSM_Stop_Record.z_cell_id=GSM_bts.bts_id where z_start_time between '2011-04-10 06:10:00' and '2013-01-01 09:55:00' order by z_start_time

 

Transaction B

Select * from GSM_Stop_Record where z_start_time='2012-09-01 09:55:00'

Run the two operations at the same time.

Both sessions show as executing, and both eventually return their results, so the two SELECTs do not block each other. The conclusion is that the earlier blocking is not initiated by the SELECT but arises during the UPDATE: the update scans the entire table and eventually reaches the row with z_start_time = '2012-09-01 09:55:00' (the row to be updated), which the running SELECT has already locked for reading; because the SELECT must not read modified data, the UPDATE is blocked.

In short, the blocking looks as if it is caused by the SELECT, but the root cause is that the UPDATE cannot obtain the locks it needs on the table.
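This explanation can be checked by looking at which locks each session holds while the two statements run. A sketch using the sys.dm_tran_locks DMV:

```sql
-- Show lock requests in the current database: the SELECT's granted shared
-- (S) locks and the UPDATE's waiting update/exclusive (U/X) requests.
select request_session_id, resource_type, request_mode, request_status
from sys.dm_tran_locks
where resource_database_id = db_id()
```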

3.4 Factors Affecting Query Efficiency

First, let's see what makes the query so inefficient. Consider the following example, in which a query and an update run concurrently.

Transaction A

Select * From gsm_stop_record
Where z_start_time between '2017-04-10 06:10:00' and '2017-01-01 09:55:00'

 

Transaction B

Update gsm_stop_record
Set z_bsc = 'aaaaaaa'
where z_start_time = '2017-08-31 11:05:00'

In this experiment, transaction A takes 00:01:35 and returns 3,269,455 rows; the concurrent update takes 00:00:05 and affects 148 records, basically the same as the figures above.

By contrast, the full SQL statement in section 3.1 takes three minutes.

So what makes the client query so slow? The following analysis looks for the cause:

3.4.1 Effect of the Foreign Key Association

Looking at the SQL statement above: querying the outage records also requires the Chinese name of each site, so GSM_Stop_Record is joined to the cell table gsm_bts (gsm_stop_record.z_cell_id = gsm_bts.bts_id). In fact, no foreign key association exists in the database, and inspecting the schema shows that bts_id is of type nvarchar(255) while z_cell_id is nvarchar(50). To verify the effect of this condition on the whole statement, make both columns nvarchar(50) and associate them with a foreign key. The process is as follows:

(1) Delete rows that would violate the foreign key

Delete from gsm_stop_record
where z_cell_id not in (Select bts_id from gsm_bts)

Result: (3 rows affected)
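Before the foreign key in step (2) can be added, the column types also have to match, and bts_id must be unique on gsm_bts. A sketch of the type change described in the text (assuming no existing constraint blocks the alteration; the constraint name is an assumption):

```sql
-- Align the referenced column's type with z_cell_id (nvarchar(50)).
alter table gsm_bts alter column bts_id nvarchar(50) not null
-- A foreign key target must be a primary key or unique constraint:
alter table gsm_bts add constraint uq_gsm_bts_bts_id unique (bts_id)
```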

(2) Add a foreign key Association

Alter table gsm_stop_record
Add foreign key (z_cell_id) References gsm_bts (bts_id)

With the foreign key in place, the SQL statement from section 3.2 runs in 00:03:08, basically the same as before. The foreign key association is therefore not the main factor affecting efficiency here.

3.4.2 Impact of an Index

As the preceding SQL statement shows, z_start_time is used both to filter and to sort the data, so create an index on z_start_time. Run SQL statement (1) in section 3.1 again. The result is as follows:

The query takes 00:02:45 and returns 3,269,455 rows.

It can be seen that the efficiency is indeed improved.
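For reference, the index described in this section can be created like this (the index name is an assumption):

```sql
-- Nonclustered index supporting both the BETWEEN filter and the ORDER BY.
create nonclustered index ix_gsm_stop_record_z_start_time
on gsm_stop_record (z_start_time)
```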

3.4.3 Impact of the Column Type

As section 3.4.2 shows, the index does not dramatically improve query efficiency. Is it related to the column's type? Continuing the transformation, convert z_start_time to the datetime type, then run SQL statement (1) in section 3.1 again. The result is as follows:
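The type conversion described here can be sketched as follows (any index on the column, such as the one from section 3.4.2, must be dropped before the change and recreated afterwards):

```sql
-- Convert the filter/sort column from nvarchar to a proper datetime,
-- so comparisons and sorting no longer operate on strings.
alter table gsm_stop_record alter column z_start_time datetime
```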

 

3.4.4 Application of Partitioned Tables
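One way such a partitioned table could be set up in SQL Server is to partition GSM_Stop_Record by z_start_time (assuming it has already been converted to datetime). The boundary dates, object names, and use of the PRIMARY filegroup below are all assumptions for illustration:

```sql
-- Partition function: one partition per year range of z_start_time.
create partition function pf_stop_by_year (datetime)
as range right for values ('2012-01-01', '2013-01-01', '2017-01-01')

-- Map every partition to a filegroup (all to PRIMARY here for simplicity).
create partition scheme ps_stop_by_year
as partition pf_stop_by_year all to ([primary])

-- Rebuild the table's clustered index on the scheme so the data is
-- physically split by z_start_time.
create clustered index cix_gsm_stop_record_time
on gsm_stop_record (z_start_time)
on ps_stop_by_year (z_start_time)
```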

 

4. Conclusion

(1) Query blocking can be solved either by adding SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED or by adding WITH (NOLOCK) to the queried tables;

(2) Adding indexes can improve query efficiency;

(3) Partitioned tables can improve query efficiency and facilitate programming and development.


 

 
