Toward DBA [MSSQL]: "Principles" for high-efficiency stored procedures on large tables, with a cameo by the evolution of a worst-performing SQL statement

Source: Internet
Author: User
Tags: mssql


The test results are presented here to illustrate the principles in this article.

Design background

For historical reasons, the data volume in the production environment is enormous: many tables hold tens of millions, even billions, of rows. The goal is to take N interrelated tables, use one source table as the base table, and move the data into an archive. In this example N is 50, with 50 million (5000W) rows per table.

The worst-performing SQL's evolution: a cameo

The two tables' keyname fields are identical in meaning, name, and so on. The task: from table BUG01, take the first 500 rows whose keyname is not in table BUG02.

Worst-performing version:

SELECT TOP 500 a.keyname FROM Bug01 a LEFT JOIN Bug02 b ON a.keyname = b.keyname
WHERE (a.keyname NOT IN (SELECT DISTINCT B.keyname FROM Bug02))
ORDER BY a.keyname ASC

The final evolved form is unveiled at the end.

Detailed design

Problem points: performance, safety, fault tolerance.

The following sections explain why the procedure is designed this way.

STEP.1 Source-table data filtering

Not much to say here: set filtering rules to suit your own business scenario.

STEP.2 Source-table data copy

The program's entry point must be the source table; the extension tables' contents are expanded outward with the source table as the key. So how does this expansion work?

Let's first pin down some concepts, namely the hierarchical relationships among the 50 tables. Perhaps only 10 of them are directly associated with the source table's key.

For example, suppose I want to collect all library details in a city; the library is then our source table. A library relates to bookshelves, to an address, and to member information, so these three kinds of information become our level-one tables.

Bookshelves relate to book categories, the address relates to street information, and members relate to user borrowing records, so these next three become level-two tables, and so on, expanding to fit the scenario.
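To keep the levels straight, here is that example hierarchy sketched as a comment block (the table names are hypothetical stand-ins, not a real schema):

-- Library (source table)
--     Bookshelf   (level one)  ->  BookCategory  (level two)
--     Address     (level one)  ->  StreetInfo    (level two)
--     Member      (level one)  ->  BorrowRecord  (level two)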

Scenario 1: Use a cursor to loop over the source table, processing the related data key by key based on the source table's key value. Suppose we take 500 rows of source-table data per batch.

That is, for each library ID we traverse all the nodes. Even if we ignore the level-two and level-three split and treat everything as level-one tables, our insert count is 500 * 50, with the same number of select operations.

That pleases no one, and it gets harder to contemplate once level-two and level-three tables join in.
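For concreteness, a minimal sketch of Scenario 1 with hypothetical names (SourceTable, Bookshelf, Archive_Bookshelf, LibraryID); the original describes the per-key looping, not these exact statements:

-- One cursor pass per key: a 500-key batch against 50 tables costs 500 * 50 = 25,000 inserts.
DECLARE @LibraryID varchar(50);
DECLARE cur_src CURSOR FOR
    SELECT TOP 500 LibraryID FROM SourceTable ORDER BY LibraryID;
OPEN cur_src;
FETCH NEXT FROM cur_src INTO @LibraryID;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- one INSERT ... SELECT per related table; imagine 49 more of these
    INSERT INTO Archive_Bookshelf
    SELECT * FROM Bookshelf WHERE LibraryID = @LibraryID;
    FETCH NEXT FROM cur_src INTO @LibraryID;
END
CLOSE cur_src;
DEALLOCATE cur_src;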

Scenario 2: Collect the source table's key data into a variable, then filter with an IN expression. Seems feasible: it cuts the operation count straight to 1/500th. But here lies one of the scariest problems.

Variables have a length limit: in SQL Server a varchar(n) variable cannot exceed 8,000 bytes, so the packed key list eventually will not fit.
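A minimal sketch of Scenario 2, again with hypothetical names (SourceTable, TableA, Archive_TableA, OrderID); the point is the single packed variable and its hard length cap:

-- All 500 keys packed into one varchar variable, then one IN-filtered statement per table
DECLARE @Keys varchar(8000);
SELECT @Keys = COALESCE(@Keys + ',', '') + '''' + OrderID + ''''
FROM (SELECT TOP 500 OrderID FROM SourceTable ORDER BY OrderID) s;
EXEC ('INSERT INTO Archive_TableA SELECT * FROM TableA WHERE OrderID IN (' + @Keys + ')');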

Scenario 3: Turn the source table keys into a query filter pool (a SQL WHERE-condition fragment; how it relates to the underlying level-one tables is detailed below). Compared with Scenario 2, we seem to have increased the number of operations:

Insert operations: 50, regardless of hierarchy. Select operations: 50 * 2. Acceptable.

Scenario 3, extended: 50 passes over a large table is not an optimistic number, and that 50 can easily become 500, 5,000, or 50,000.

One more problem: while you operate on those 500 rows, the data may shift underneath you; the 500 rows you fetched one second ago are not necessarily the same 500 one second later.

So we adopt a temp table strategy.

CREATE TABLE #p (OrderID varchar(50), PRIMARY KEY (OrderID));   -- pick a width that fits your key
SET @temp_text = 'INSERT INTO #p ' + @KeyText;   -- @KeyText holds the source-table key query built earlier
--PRINT @temp_text
EXEC (@temp_text);
SET @KeyText = 'SELECT OrderID FROM #p';

-- If a level-one table is involved in many operations, materialize its keys the same way,
-- so later reads hit the temp table instead of the physical table:
SET @SubKeyText = 'SELECT LevelOneTable_a_key FROM LevelOneTable_a WITH (NOLOCK) WHERE LevelOneTable_a_source_key IN (' + @KeyText + ')';
CREATE TABLE #q (OrderID varchar(50), PRIMARY KEY (OrderID));
SET @temp_text = 'INSERT INTO #q ' + @SubKeyText;
EXEC (@temp_text);
SET @SubKeyText = 'SELECT OrderID FROM #q';

-- If a level-one table is not used much, its filter pool can be generated directly:
SET @SubKeyTextforA = 'SELECT LevelOneTable_b_level2_key FROM LevelOneTable_b WITH (NOLOCK) WHERE LevelOneTable_b_source_key IN (' + @KeyText + ')';
SET @SubKeyTextforB = 'SELECT LevelOneTable_c_level2_key FROM LevelOneTable_c WITH (NOLOCK) WHERE LevelOneTable_c_source_key IN (' + @KeyText + ')';

-- If there are more layers, keep chaining filter pools; this demo stops at three:
SET @THKeyTextforA = 'SELECT LevelTwoTable_a_level3_key FROM LevelTwoTable_a WITH (NOLOCK) WHERE LevelTwoTable_a_level1_key IN (' + @SubKeyTextforA + ')';

STEP.3 Sub-table archiving operations

The problems at this stage: how to control transactions safely, how large a transaction should be, how to measure fault tolerance, and how to keep the program scalable and maintainable.

Set your batch range according to the business scenario. For this demo's 50 large tables of tens of millions of rows: if a batch exceeds 5,000 rows, the transaction should sit in the inner layer of processing; below 5,000, it can sit at the outermost layer.

The size of a transaction directly affects performance fluctuations
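A sketch of the two placements under those assumptions (table and column names hypothetical):

-- Small batches (under 5,000 rows): one outer transaction around all 50 tables.
BEGIN TRAN;
    -- ... archive all 50 tables for this batch ...
COMMIT TRAN;

-- Large batches (over 5,000 rows): the transaction stays in the inner layer,
-- one table at a time, so a failure rolls back only the current table.
BEGIN TRAN;
INSERT INTO Archive_TableA
SELECT * FROM TableA WITH (NOLOCK)
WHERE OrderID IN (SELECT OrderID FROM #p);
IF @@ERROR <> 0
    ROLLBACK TRAN;
ELSE
    COMMIT TRAN;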

Fault-tolerance solutions can also be designed by programmers themselves, using a third class of table, the exception table: batches that fail to insert are recorded there and filtered straight out of the next batch.

-- Put the failed batch's order numbers into the exception table (@ExTable holds its name)
EXEC ('INSERT INTO ' + @ExTable + ' SELECT OrderID FROM #p');
-- @ExTable stores the exception data: if the current batch errors, its order
-- information goes into @ExTable, and the next batch filters those rows out before executing:
SET @KeyText = 'SELECT TOP ' + CAST(@SynSize AS VARCHAR) + ' ' + @Base_Key + ' FROM ' +

How do we make the program elegant and maintainable?

We can borrow the object-oriented idea inside stored procedures. Stored procedures have no such concept built in, so we might as well design our own.

Using what? A temp table again.
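The original never shows #k's definition, but its columns can be inferred from the STEP.4 cursor (tablename, keyname, temptext, colname); a plausible definition:

CREATE TABLE #k (
    tablename varchar(100),   -- name of the table to archive
    keyname   varchar(100),   -- its association key (a column name or @Base_Key)
    temptext  varchar(4000),  -- the filter-pool SQL fragment for its WHERE clause
    colname   varchar(200)    -- extra custom columns, empty if none
);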

-- Level one: directly associated with the source table's primary key (or acting as
-- the main table for a level-two association)
INSERT INTO #k VALUES ('LevelOneTable_a', @Base_Key, @KeyText, '')   -- level-one table a
INSERT INTO #k VALUES ('LevelOneTable_b', @Base_Key, @KeyText, '')   -- level-one table b
INSERT INTO #k VALUES ('LevelOneTable_c', @Base_Key, @KeyText, '')   -- level-one table c
-- Level-two rule: indirect association, via @SubKeyText
INSERT INTO #k VALUES ('LevelTwoTable_a', 'LevelTwoTable_a_assoc_key', @SubKeyText, '')   -- level-two table a
INSERT INTO #k VALUES ('LevelTwoTable_b', 'LevelTwoTable_b_assoc_key', @SubKeyText, '')   -- level-two table b
INSERT INTO #k VALUES ('LevelTwoTable_c', 'LevelTwoTable_c_assoc_key', @SubKeyText, '')   -- level-two table c
-- Special handling: custom action
INSERT INTO #k VALUES ('SpecialTable', 'SpecialTable_assoc_key', 'custom data filtering method', '')
-- Other self-incrementing processing: modified orders, and the cancel/modify order status history table
INSERT INTO #k VALUES ('SelfIncrementTable', @Base_Key, @KeyText, 'CustomFields')

STEP.4 Processing details

Loop a cursor over the temporary table, operating once per table.

DECLARE cur_orderheder INSENSITIVE CURSOR FOR
    SELECT tablename, keyname, temptext, colname FROM #k
OPEN cur_orderheder
FETCH cur_orderheder INTO @Cur_Table, @Cur_Key, @Cur_W, @Cur_K
WHILE @@FETCH_STATUS = 0
BEGIN
    EXECUTE p_task_sub_synchronization
        @OutParam = @OutParam OUT,
        @OutMessage = @OutMessage OUT,
        @KeyText = @Cur_W,
        @Table = @Cur_Table,
        @ExTable = @ExTable,   -- the remaining four parameter names were mangled in the
        @SynSize = @SynSize,   -- source text; these are a best-effort reconstruction
        @Key = @Cur_Key,
        @Col = @Cur_K
    --SET @OutMessage = @OutMessage + ...
    --PRINT @OutMessage
    IF @OutParam <> 0
    BEGIN
        SET @OutMessage = @OutMessage + @Cur_Table + ' operation failed'
        ROLLBACK TRAN
        -- put the failed batch's order numbers into the exception table
        EXEC ('INSERT INTO ' + @ExTable + ' SELECT OrderID FROM #p')
        DROP TABLE #k
        DROP TABLE #p
        DROP TABLE #q
        RETURN
    END
    FETCH cur_orderheder INTO @Cur_Table, @Cur_Key, @Cur_W, @Cur_K
END
CLOSE cur_orderheder
DEALLOCATE cur_orderheder
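The body of p_task_sub_synchronization is not shown in the original; the following skeleton is only a guess at its shape, inferred from how the cursor calls it (one dynamic archive insert plus delete per table; names and the Archive_ prefix are illustrative):

CREATE PROCEDURE p_task_sub_synchronization
    @OutParam   int           OUT,
    @OutMessage varchar(500)  OUT,
    @KeyText    varchar(4000),      -- filter-pool fragment for this table
    @Table      varchar(100),       -- table to archive
    @ExTable    varchar(100),       -- exception table name
    @SynSize    int,                -- batch size
    @Key        varchar(100),       -- association key column
    @Col        varchar(200)        -- extra custom columns; handling omitted in this sketch
AS
BEGIN
    DECLARE @sql varchar(8000);
    -- copy the batch into the archive table
    SET @sql = 'INSERT INTO Archive_' + @Table
             + ' SELECT * FROM ' + @Table + ' WITH (NOLOCK)'
             + ' WHERE ' + @Key + ' IN (' + @KeyText + ')';
    EXEC (@sql);
    IF @@ERROR <> 0
    BEGIN
        SET @OutParam = 1;
        SET @OutMessage = @Table + ' archive insert failed ';
        RETURN;
    END
    -- remove the archived rows from the live table
    SET @sql = 'DELETE FROM ' + @Table
             + ' WHERE ' + @Key + ' IN (' + @KeyText + ')';
    EXEC (@sql);
    SET @OutParam = CASE WHEN @@ERROR <> 0 THEN 1 ELSE 0 END;
END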

STEP.5 Resource release

STEP.6 Process flow

These two parts need no further detail.
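For completeness, STEP.5 presumably amounts to dropping the working objects created above:

DROP TABLE #p;
DROP TABLE #q;
DROP TABLE #k;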

Worst-performing SQL evolutionary process

STEP.1 Remove the DISTINCT inside the NOT IN. NOT IN is infamous enough on its own; tacking a DISTINCT onto it is gilding the lily.

Post-change SQL:

SELECT TOP 500 a.keyname FROM Bug01 a LEFT JOIN Bug02 b ON a.keyname = b.keyname
WHERE (a.keyname NOT IN (SELECT B.keyname FROM Bug02))
ORDER BY a.keyname ASC

STEP.2 Aliases. Don't underestimate an alias; it can upend the original SQL's execution plan. Here the subquery's B.keyname actually resolves to the outer join's alias b, so giving the subquery its own alias c restores the intended comparison against all of Bug02.

Post-change SQL:

SELECT TOP 500 a.keyname FROM Bug01 a LEFT JOIN Bug02 b ON a.keyname = b.keyname
WHERE (a.keyname NOT IN (SELECT c.keyname FROM Bug02 c))
ORDER BY a.keyname ASC

STEP.3 Why keep the join at all? Can't the outer query filter directly? It can, so the LEFT JOIN goes.

Post-change SQL:

SELECT TOP 500 a.keyname FROM Bug01 a
WHERE (a.keyname NOT IN (SELECT c.keyname FROM Bug02 c))
ORDER BY a.keyname ASC

STEP.4 Following the advice of classmate Luofer, evolve it into a straight EXCEPT.

SELECT TOP 500 a.keyname FROM Bug01 a
EXCEPT
SELECT b.keyname FROM Bug02 b
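One caveat on this final form: in T-SQL, a TOP in the first query of an EXCEPT limits that query alone, not the difference. If the goal is the first 500 rows of the final result, the EXCEPT would need to be wrapped, for example:

SELECT TOP 500 keyname
FROM (SELECT keyname FROM Bug01
      EXCEPT
      SELECT keyname FROM Bug02) AS d
ORDER BY keyname ASC;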

Discussion of this article is welcome.

