Toward DBA [MSSQL]: high-efficiency stored procedures for large tables ("principles"), with the evolution of a worst-performing SQL statement
The test results shown here illustrate the principles described in this article.
Design background
For historical reasons, the production environment holds an enormous amount of data; many tables contain tens of millions, even billions, of rows. The goal is to take N interrelated tables, treat one source table as the base table, and move the related data into an archive. In this example N is 50 and each table holds roughly 50 million rows.
The worst-performing SQL: a preview of its evolution
Both tables have a keyname field with the same meaning. The task: from table BUG01, fetch the first 500 keyname values that are not in table BUG02.
Worst-performing version:
SELECT TOP 500 a.keyname FROM Bug01 a LEFT JOIN Bug02 b ON a.keyname = b.keyname
WHERE a.keyname NOT IN (SELECT DISTINCT b.keyname FROM Bug02)
ORDER BY a.keyname ASC
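For reference, a minimal sketch of the kind of test tables this demo assumes; the column type and the indexes are assumptions, not taken from the original:

-- hypothetical test tables for the BUG01 / BUG02 demo (types and indexes assumed)
CREATE TABLE Bug01 (keyname varchar(50) NOT NULL);
CREATE TABLE Bug02 (keyname varchar(50) NOT NULL);
CREATE INDEX IX_Bug01_keyname ON Bug01 (keyname);
CREATE INDEX IX_Bug02_keyname ON Bug02 (keyname);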
The final evolved form is revealed at the end of this article.
Detailed design
Problem points: performance, safety, and fault tolerance.
The sections below explain why the procedure is designed this way.
STEP.1 Source table data filtering
There is not much to say here: define the filtering rules according to your own business scenario.
STEP.2 Source table data copy
The entry point of the procedure must be the source table; the related tables are expanded outward using the source table's key. So how does this expansion work?
Let's first clarify a concept: the hierarchical relationships among the 50 tables. Only about 10 of them are associated directly with the source table's key.
For example, suppose we want to archive all library details in a city; we use the library as the source table. A library is related to bookshelves, addresses, and member information, so these three tables are classified as level-one tables.
Bookshelves relate to book categories, addresses relate to street information, and members relate to user borrowing records, so these three are classified as level-two tables, and so on, expanding further as the scenario requires.
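Purely as an illustration, the library example might map to a schema like the following; every table and column name here is invented for the example:

-- source table
CREATE TABLE Library      (LibraryID int PRIMARY KEY, Name nvarchar(100));
-- level-one tables: keyed directly by LibraryID
CREATE TABLE Bookshelf    (ShelfID int PRIMARY KEY, LibraryID int, Location nvarchar(50));
CREATE TABLE LibAddress   (AddressID int PRIMARY KEY, LibraryID int, Detail nvarchar(200));
CREATE TABLE Member       (MemberID int PRIMARY KEY, LibraryID int, MemberName nvarchar(50));
-- level-two tables: keyed by a level-one table, not by LibraryID
CREATE TABLE BookCategory (CategoryID int PRIMARY KEY, ShelfID int, Title nvarchar(100));
CREATE TABLE StreetInfo   (StreetID int PRIMARY KEY, AddressID int, Street nvarchar(100));
CREATE TABLE BorrowRecord (RecordID int PRIMARY KEY, MemberID int, BorrowDate datetime);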
Scenario 1: Use a cursor to loop over the source table and process the related data by source table key value. Suppose each batch takes 500 rows from the source table.
That is, we traverse every related node by library ID. Even if we ignore the level-two and level-three tables, for the level-one tables alone the number of insert operations is 500 * 50, with the same number of select operations.
That pleases nobody, and it gets far worse once the level-two and level-three tables are included.
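A rough sketch of Scenario 1, reusing the invented library schema above; the Archive_* target tables are also assumptions. It makes the 500 * 50 operation count obvious:

-- Scenario 1 (naive): one cursor pass per source row, one statement per related table
DECLARE @LibraryID int;
DECLARE cur_src CURSOR FOR
    SELECT TOP 500 LibraryID FROM Library;   -- plus whatever archive filter applies
OPEN cur_src;
FETCH NEXT FROM cur_src INTO @LibraryID;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- one statement like this per related table: 500 rows * 50 tables of work
    INSERT INTO Archive_Bookshelf SELECT * FROM Bookshelf WHERE LibraryID = @LibraryID;
    INSERT INTO Archive_Member    SELECT * FROM Member    WHERE LibraryID = @LibraryID;
    -- ... 48 more tables ...
    FETCH NEXT FROM cur_src INTO @LibraryID;
END
CLOSE cur_src;
DEALLOCATE cur_src;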
Scenario 2: Collect the source table key values into a variable and use an IN expression. This seems feasible and cuts the number of operations to 1/500. But there is one frightening problem:
variables have a length limit; a varchar, for instance, cannot exceed 65,535 in length.
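A sketch of Scenario 2 under the same invented schema: the keys are concatenated into one variable and spliced into an IN list, which is exactly where the length limit bites:

-- Scenario 2: build a comma-separated key list in a variable, then reuse it in IN (...)
DECLARE @KeyList varchar(8000);
SELECT @KeyList = ISNULL(@KeyList + ',', '') + CAST(LibraryID AS varchar(20))
FROM (SELECT TOP 500 LibraryID FROM Library) AS t;

-- one statement per related table instead of one per source row
EXEC ('INSERT INTO Archive_Bookshelf SELECT * FROM Bookshelf WHERE LibraryID IN (' + @KeyList + ')');
-- risk: as the batch grows, @KeyList overflows the variable length limit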
Scenario 3: Turn the source table keys into a query filter pool (a SQL WHERE-condition string; how it is built down to the level-one tables is shown in detail below). Compared with Scenario 2 we appear to have increased the number of operations:
50 insert operations regardless of hierarchy, and 50 * 2 select operations, which is acceptable.
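In other words, the "filter pool" is just a reusable subquery string; a minimal sketch under the invented schema:

-- the filter pool: a subquery string spliced into each related table's WHERE clause
DECLARE @KeyText varchar(4000);
SET @KeyText = 'SELECT TOP 500 LibraryID FROM Library WITH (NOLOCK)';
EXEC ('INSERT INTO Archive_Bookshelf SELECT * FROM Bookshelf WHERE LibraryID IN (' + @KeyText + ')');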
Scenario 3, extended: 50 operations against a large table is still not an optimistic number, and that 50 can easily grow to 500, 5,000, or 50,000.
There is one more problem: while you work on those 500 rows the data may be disturbed, and the 500 rows you fetched one second ago are not necessarily the same one second later.
So we adopt a temporary table strategy.
CREATE TABLE #p (OrderID varchar(50) PRIMARY KEY);   -- key length assumed; the original omits it
SET @temp_text = 'INSERT INTO #p ' + @KeyText;
--PRINT @temp_text
EXEC (@temp_text);
SET @KeyText = 'SELECT OrderID FROM #p';

-- if a level-one table is hit by many later operations, materialize its keys in a
-- temp table instead of querying the physical source table again
SET @SubKeyText = 'SELECT <level-one table _a association key> FROM <level-one table _a> WITH (NOLOCK) '
                + 'WHERE <level-one table _a source-table key> IN (' + @KeyText + ')';
CREATE TABLE #q (OrderID varchar(50) PRIMARY KEY);
SET @temp_text = 'INSERT INTO #q ' + @SubKeyText;
EXEC (@temp_text);
SET @SubKeyText = 'SELECT OrderID FROM #q';

-- if a level-one table is not used much, its filter pool can be generated directly
SET @SubKeyTextforA = 'SELECT <level-one table _b level-two association key> FROM <level-one table _b> WITH (NOLOCK) '
                    + 'WHERE <level-one table _b source-table key> IN (' + @KeyText + ')';
SET @SubKeyTextforB = 'SELECT <level-one table _c level-two association key> FROM <level-one table _c> WITH (NOLOCK) '
                    + 'WHERE <level-one table _c source-table key> IN (' + @KeyText + ')';

-- if there are deeper levels, keep chaining the filter pools; this demo stops at three levels
SET @THKeyTextforA = 'SELECT <level-two table _a level-three association key> FROM <level-two table _a> WITH (NOLOCK) '
                   + 'WHERE <level-two table _a level-one key> IN (' + @SubKeyTextforA + ')';
STEP.3 Sub-table archiving operations
The issues in this step: how to keep the transaction safe, how large a transaction should be, how to handle failures, and how to keep the procedure extensible and maintainable.
Decide your own batch range according to the business scenario. For this demo (50 tables of roughly 50 million rows each), if a batch exceeds 5,000 rows the transaction should be placed in the inner layer (per table); below 5,000 it can sit at the outermost layer.
The size of a transaction directly affects performance fluctuation.
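A rough sketch of the two transaction placements described above; the 5,000 threshold comes from the text, everything else (the per-table loop, the error-handling style) is an assumption:

-- small batch (<= 5000 rows): one outer transaction wrapping all 50 tables
BEGIN TRAN;
    -- archive table 1 ... archive table 50
COMMIT TRAN;

-- large batch (> 5000 rows): one inner transaction per table, so a failure only
-- rolls back the current table's work and the batch can be retried later
BEGIN TRY
    BEGIN TRAN;
        -- archive table i
    COMMIT TRAN;
END TRY
BEGIN CATCH
    ROLLBACK TRAN;
    -- record the failed batch in the exception table (see below), then stop or continue
END CATCH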
Fault-tolerance solution: programmers can design their own. Here we use a third kind of table, an exception table: the keys of a failed batch are written into it and filtered out of the next batch, so a failed insert does not block the whole process.
-- put the failed batch's order numbers into the exception table (@ExTable holds the exception
-- data; how the table is referenced, via dynamic SQL or a table variable, is not shown here)
INSERT INTO <exception table> SELECT OrderID FROM #p
-- if the current batch fails, its order numbers are filtered out of the next batch;
-- the next batch's filter pool is then rebuilt:
SET @KeyText = 'SELECT TOP ' + CAST(@SynSize AS varchar) + ' ' + @Base_Key + ' FROM ' +
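The last statement above is cut off in the original; a hedged guess at how it might continue, assuming a hypothetical @SourceTable variable for the source table name and a physical exception table named ExceptionTable:

-- hypothetical completion: rebuild the filter pool for the next batch,
-- skipping keys that already failed (table and variable names are assumed)
SET @KeyText = 'SELECT TOP ' + CAST(@SynSize AS varchar(10)) + ' ' + @Base_Key
             + ' FROM ' + @SourceTable + ' WITH (NOLOCK)'
             + ' WHERE ' + @Base_Key + ' NOT IN (SELECT OrderID FROM ExceptionTable)';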
How do we make the procedure elegant and maintainable?
We can borrow the object-oriented idea inside the stored procedure. Stored procedures have no such concept built in, so we design one ourselves.
With what? Again, a temp table.
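The original never shows the definition of #k, but the cursor in STEP.4 reads the columns tablename, keyname, temptext and colname from it, so it presumably looks roughly like this (the column types are assumptions):

-- the home-made "object list": one row per table that must be archived
CREATE TABLE #k (
    tablename varchar(128),    -- table to process
    keyname   varchar(128),    -- column linking it to its parent level
    temptext  varchar(4000),   -- its filter pool SQL (@KeyText, @SubKeyText, ...)
    colname   varchar(4000)    -- extra custom fields / special handling, if any
);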
-- level one: directly associated with the source table primary key (and serving as the
-- parent key for the level-two tables)
INSERT INTO #k VALUES ('level-one table _a', @Base_Key, @KeyText, '')   -- level-one table _a
INSERT INTO #k VALUES ('level-one table _b', @Base_Key, @KeyText, '')   -- level-one table _b
INSERT INTO #k VALUES ('level-one table _c', @Base_Key, @KeyText, '')   -- level-one table _c
-- level-two rule: indirect association, driven by @SubKeyText
INSERT INTO #k VALUES ('level-two table _a', '<level-two table _a association key>', @SubKeyText, '')   -- level-two table _a
INSERT INTO #k VALUES ('level-two table _b', '<level-two table _b association key>', @SubKeyText, '')   -- level-two table _b
INSERT INTO #k VALUES ('level-two table _c', '<level-two table _c association key>', @SubKeyText, '')   -- level-two table _c
-- special handling: custom action
INSERT INTO #k VALUES ('<special table>', '<special table association key>', '<custom data filtering method>', '')
-- additional self-defined processing, e.g. modifying/cancelling order status and the order status history table
INSERT INTO #k VALUES ('<extra table>', @Base_Key, @KeyText, '<custom fields>')
STEP.4 Processing details
Loop over the temp table with a cursor, performing the operation once for each table.
DECLARE cur_orderheder INSENSITIVE CURSOR FOR
    SELECT tablename, keyname, temptext, colname FROM #k
OPEN cur_orderheder
FETCH cur_orderheder INTO @Cur_Table, @Cur_Key, @Cur_W, @Cur_K
WHILE @@FETCH_STATUS = 0
BEGIN
    EXECUTE p_task_sub_synchronization
        @OutParam   = @OutParam OUT,
        @OutMessage = @OutMessage OUT,
        @KeyText    = @Cur_W,
        @Table      = @Cur_Table,
        -- the remaining parameter names were garbled in the source; they pass @Cur_Key and @Cur_K
        @Key        = @Cur_Key,
        @Col        = @Cur_K
    --SET @OutMessage = @OutMessage + ...
    --PRINT @OutMessage
    IF @OutParam <> 0
    BEGIN
        SET @OutMessage = @OutMessage + @Cur_Table + ' operation failed'
        ROLLBACK TRAN
        -- put the failed batch's order numbers into the exception table (@ExTable)
        INSERT INTO <exception table> SELECT OrderID FROM #p
        DROP TABLE #k
        DROP TABLE #p
        DROP TABLE #q
        RETURN
    END
    FETCH cur_orderheder INTO @Cur_Table, @Cur_Key, @Cur_W, @Cur_K
END
CLOSE cur_orderheder
DEALLOCATE cur_orderheder
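The body of p_task_sub_synchronization is never shown in the article; below is a hedged sketch of what its core probably does, judging only from how it is called (copy the matching rows, delete them from the source, report failure through @OutParam). The Archive_ naming, the @sql variable and the @Key parameter name are assumptions:

-- hypothetical core of p_task_sub_synchronization (not from the original)
DECLARE @sql varchar(4000);
SET @sql = 'INSERT INTO Archive_' + @Table
         + ' SELECT * FROM ' + @Table + ' WITH (NOLOCK)'
         + ' WHERE ' + @Key + ' IN (' + @KeyText + ')';
EXEC (@sql);
SET @sql = 'DELETE FROM ' + @Table
         + ' WHERE ' + @Key + ' IN (' + @KeyText + ')';
EXEC (@sql);
SET @OutParam = @@ERROR;   -- non-zero tells the caller to roll back the batch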
STEP.5 Resource release
STEP.6 Process handling
These two parts need no further explanation.
Worst-performing SQL evolutionary process
STEP.1 NOT IN plus DISTINCT: NOT IN is already notorious for poor performance, and the DISTINCT tacked onto it adds nothing but extra work. Remove the DISTINCT first.
SQL after the change:
SELECT TOP 500 a.keyname FROM Bug01 a LEFT JOIN Bug02 b ON a.keyname = b.keyname
WHERE a.keyname NOT IN (SELECT b.keyname FROM Bug02)
ORDER BY a.keyname ASC
STEP.2 Aliases: never underestimate an alias. In the version above the subquery has no alias of its own, so b.keyname inside it resolves against the outer query's alias and distorts the original SQL plan. Give the subquery table its own alias.
SQL after the change:
SELECT TOP 500 a.keyname FROM Bug01 a LEFT JOIN Bug02 b ON a.keyname = b.keyname
WHERE a.keyname NOT IN (SELECT c.keyname FROM Bug02 c)
ORDER BY a.keyname ASC
STEP.3 Why keep the join at all? The outer NOT IN filter already does the work directly, so the LEFT JOIN can simply be removed.
SQL after the change:
SELECT TOP 500 a.keyname FROM Bug01 a
WHERE a.keyname NOT IN (SELECT c.keyname FROM Bug02 c)
ORDER BY a.keyname ASC
STEP.4 Finally, on the advice of classmate Luofer, evolve it into a direct EXCEPT.
SELECT TOP 500 a.keyname FROM Bug01 a
EXCEPT
SELECT b.keyname FROM Bug02 b
That concludes this article; everyone is welcome to discuss.