Toward DBA [MSSQL]: high-efficiency stored procedures for large tables ("principles"), with the evolution of a worst-performing SQL statement
The test results shown here illustrate the principles described in this article.
Design background
For historical reasons, the production environment holds an enormous amount of data; many tables contain tens of millions, even billions, of rows. The goal is to take N interrelated tables, treat one source table as the base table, and move the related data into an archive. In this example N is 50 and each table holds roughly 50 million rows.
The worst-performing SQL: a preview of its evolution
Both tables have a keyname field with the same meaning. The task: from table BUG01, fetch the first 500 keyname values that are not in table BUG02.
Worst-performing version:
SELECT TOP 500 a.keyname FROM Bug01 a LEFT JOIN Bug02 b ON a.keyname = b.keyname
WHERE a.keyname NOT IN (SELECT DISTINCT b.keyname FROM Bug02)
ORDER BY a.keyname ASC
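For reference, a minimal sketch of the kind of test tables this demo assumes; the column type and the indexes are assumptions, not taken from the original:

-- hypothetical test tables for the BUG01 / BUG02 demo (types and indexes assumed)
CREATE TABLE Bug01 (keyname varchar(50) NOT NULL);
CREATE TABLE Bug02 (keyname varchar(50) NOT NULL);
CREATE INDEX IX_Bug01_keyname ON Bug01 (keyname);
CREATE INDEX IX_Bug02_keyname ON Bug02 (keyname);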
The final evolved form is revealed at the end of this article.
Detailed design
Problem points: performance, safety, and fault tolerance.
The sections below explain why the procedure is designed this way.
STEP.1 Source table data filtering
There is not much to say here: define the filtering rules according to your own business scenario.
STEP.2 Source table data copy
The entry point of the procedure must be the source table; the related tables are expanded outward using the source table's key. So how does this expansion work?
Let's first clarify a concept: the hierarchical relationships among the 50 tables. Only about 10 of them are associated directly with the source table's key.
For example, suppose we want to archive all library details in a city; we use the library as the source table. A library is related to bookshelves, addresses, and member information, so these three tables are classified as level-one tables.
Bookshelves relate to book categories, addresses relate to street information, and members relate to user borrowing records, so these three are classified as level-two tables, and so on, expanding further as the scenario requires.
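Purely as an illustration, the library example might map to a schema like the following; every table and column name here is invented for the example:

-- source table
CREATE TABLE Library      (LibraryID int PRIMARY KEY, Name nvarchar(100));
-- level-one tables: keyed directly by LibraryID
CREATE TABLE Bookshelf    (ShelfID int PRIMARY KEY, LibraryID int, Location nvarchar(50));
CREATE TABLE LibAddress   (AddressID int PRIMARY KEY, LibraryID int, Detail nvarchar(200));
CREATE TABLE Member       (MemberID int PRIMARY KEY, LibraryID int, MemberName nvarchar(50));
-- level-two tables: keyed by a level-one table, not by LibraryID
CREATE TABLE BookCategory (CategoryID int PRIMARY KEY, ShelfID int, Title nvarchar(100));
CREATE TABLE StreetInfo   (StreetID int PRIMARY KEY, AddressID int, Street nvarchar(100));
CREATE TABLE BorrowRecord (RecordID int PRIMARY KEY, MemberID int, BorrowDate datetime);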
Scenario 1: Use a cursor to loop over the source table and process the related data by source table key value. Suppose each batch takes 500 rows from the source table.
That is, we traverse every related node by library ID. Even if we ignore the level-two and level-three tables, for the level-one tables alone the number of insert operations is 500 * 50, with the same number of select operations.
That pleases nobody, and it gets far worse once the level-two and level-three tables are included.
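A rough sketch of Scenario 1, reusing the invented library schema above; the Archive_* target tables are also assumptions. It makes the 500 * 50 operation count obvious:

-- Scenario 1 (naive): one cursor pass per source row, one statement per related table
DECLARE @LibraryID int;
DECLARE cur_src CURSOR FOR
    SELECT TOP 500 LibraryID FROM Library;   -- plus whatever archive filter applies
OPEN cur_src;
FETCH NEXT FROM cur_src INTO @LibraryID;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- one statement like this per related table: 500 rows * 50 tables of work
    INSERT INTO Archive_Bookshelf SELECT * FROM Bookshelf WHERE LibraryID = @LibraryID;
    INSERT INTO Archive_Member    SELECT * FROM Member    WHERE LibraryID = @LibraryID;
    -- ... 48 more tables ...
    FETCH NEXT FROM cur_src INTO @LibraryID;
END
CLOSE cur_src;
DEALLOCATE cur_src;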
Scenario 2: Collect the source table key values into a variable and use an IN expression. This seems feasible and cuts the number of operations to 1/500. But there is one frightening problem:
variables have a length limit; a varchar, for instance, cannot exceed 65,535 in length.
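A sketch of Scenario 2 under the same invented schema: the keys are concatenated into one variable and spliced into an IN list, which is exactly where the length limit bites:

-- Scenario 2: build a comma-separated key list in a variable, then reuse it in IN (...)
DECLARE @KeyList varchar(8000);
SELECT @KeyList = ISNULL(@KeyList + ',', '') + CAST(LibraryID AS varchar(20))
FROM (SELECT TOP 500 LibraryID FROM Library) AS t;

-- one statement per related table instead of one per source row
EXEC ('INSERT INTO Archive_Bookshelf SELECT * FROM Bookshelf WHERE LibraryID IN (' + @KeyList + ')');
-- risk: as the batch grows, @KeyList overflows the variable length limit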
Scenario 3: Turn the source table keys into a query filter pool (a SQL WHERE-condition string; how it is built down to the level-one tables is shown in detail below). Compared with Scenario 2 we appear to have increased the number of operations:
50 insert operations regardless of hierarchy, and 50 * 2 select operations, which is acceptable.
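In other words, the "filter pool" is just a reusable subquery string; a minimal sketch under the invented schema:

-- the filter pool: a subquery string spliced into each related table's WHERE clause
DECLARE @KeyText varchar(4000);
SET @KeyText = 'SELECT TOP 500 LibraryID FROM Library WITH (NOLOCK)';
EXEC ('INSERT INTO Archive_Bookshelf SELECT * FROM Bookshelf WHERE LibraryID IN (' + @KeyText + ')');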
Scenario 3, extended: 50 operations against a large table is still not an optimistic number, and that 50 can easily grow to 500, 5,000, or 50,000.
There is one more problem: while you work on those 500 rows the data may be disturbed, and the 500 rows you fetched one second ago are not necessarily the same one second later.
So we adopt a temporary table strategy.
CREATE TABLE #p (OrderID varchar(50) PRIMARY KEY);   -- key length assumed; the original omits it
SET @temp_text = 'INSERT INTO #p ' + @KeyText;
--PRINT @temp_text
EXEC (@temp_text);
SET @KeyText = 'SELECT OrderID FROM #p';

-- if a level-one table is hit by many later operations, materialize its keys in a
-- temp table instead of querying the physical source table again
SET @SubKeyText = 'SELECT <level-one table _a association key> FROM <level-one table _a> WITH (NOLOCK) '
                + 'WHERE <level-one table _a source-table key> IN (' + @KeyText + ')';
CREATE TABLE #q (OrderID varchar(50) PRIMARY KEY);
SET @temp_text = 'INSERT INTO #q ' + @SubKeyText;
EXEC (@temp_text);
SET @SubKeyText = 'SELECT OrderID FROM #q';

-- if a level-one table is not used much, its filter pool can be generated directly
SET @SubKeyTextforA = 'SELECT <level-one table _b level-two association key> FROM <level-one table _b> WITH (NOLOCK) '
                    + 'WHERE <level-one table _b source-table key> IN (' + @KeyText + ')';
SET @SubKeyTextforB = 'SELECT <level-one table _c level-two association key> FROM <level-one table _c> WITH (NOLOCK) '
                    + 'WHERE <level-one table _c source-table key> IN (' + @KeyText + ')';

-- if there are deeper levels, keep chaining the filter pools; this demo stops at three levels
SET @THKeyTextforA = 'SELECT <level-two table _a level-three association key> FROM <level-two table _a> WITH (NOLOCK) '
                   + 'WHERE <level-two table _a level-one key> IN (' + @SubKeyTextforA + ')';
STEP.3 Sub-table archiving operations
The issues in this step: how to keep the transaction safe, how large a transaction should be, how to handle failures, and how to keep the procedure extensible and maintainable.
Decide your own batch range according to the business scenario. For this demo (50 tables of roughly 50 million rows each), if a batch exceeds 5,000 rows the transaction should be placed in the inner layer (per table); below 5,000 it can sit at the outermost layer.
The size of a transaction directly affects performance fluctuation.
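A rough sketch of the two transaction placements described above; the 5,000 threshold comes from the text, everything else (the per-table loop, the error-handling style) is an assumption:

-- small batch (<= 5000 rows): one outer transaction wrapping all 50 tables
BEGIN TRAN;
    -- archive table 1 ... archive table 50
COMMIT TRAN;

-- large batch (> 5000 rows): one inner transaction per table, so a failure only
-- rolls back the current table's work and the batch can be retried later
BEGIN TRY
    BEGIN TRAN;
        -- archive table i
    COMMIT TRAN;
END TRY
BEGIN CATCH
    ROLLBACK TRAN;
    -- record the failed batch in the exception table (see below), then stop or continue
END CATCH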
Fault-tolerance solution: programmers can design their own. Here we use a third kind of table, an exception table: the keys of a failed batch are written into it and filtered out of the next batch, so a failed insert does not block the whole process.
-- put the failed batch's order numbers into the exception table (@ExTable holds the exception
-- data; how the table is referenced, via dynamic SQL or a table variable, is not shown here)
INSERT INTO <exception table> SELECT OrderID FROM #p
-- if the current batch fails, its order numbers are filtered out of the next batch;
-- the next batch's filter pool is then rebuilt:
SET @KeyText = 'SELECT TOP ' + CAST(@SynSize AS varchar) + ' ' + @Base_Key + ' FROM ' +
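The last statement above is cut off in the original; a hedged guess at how it might continue, assuming a hypothetical @SourceTable variable for the source table name and a physical exception table named ExceptionTable:

-- hypothetical completion: rebuild the filter pool for the next batch,
-- skipping keys that already failed (table and variable names are assumed)
SET @KeyText = 'SELECT TOP ' + CAST(@SynSize AS varchar(10)) + ' ' + @Base_Key
             + ' FROM ' + @SourceTable + ' WITH (NOLOCK)'
             + ' WHERE ' + @Base_Key + ' NOT IN (SELECT OrderID FROM ExceptionTable)';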
How do we make the procedure elegant and maintainable?
We can borrow the object-oriented idea inside the stored procedure. Stored procedures have no such concept built in, so we design one ourselves.
With what? Again, a temp table.
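The original never shows the definition of #k, but the cursor in STEP.4 reads the columns tablename, keyname, temptext and colname from it, so it presumably looks roughly like this (the column types are assumptions):

-- the home-made "object list": one row per table that must be archived
CREATE TABLE #k (
    tablename varchar(128),    -- table to process
    keyname   varchar(128),    -- column linking it to its parent level
    temptext  varchar(4000),   -- its filter pool SQL (@KeyText, @SubKeyText, ...)
    colname   varchar(4000)    -- extra custom fields / special handling, if any
);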
-- level one: directly associated with the source table primary key (and serving as the
-- parent key for the level-two tables)
INSERT INTO #k VALUES ('level-one table _a', @Base_Key, @KeyText, '')   -- level-one table _a
INSERT INTO #k VALUES ('level-one table _b', @Base_Key, @KeyText, '')   -- level-one table _b
INSERT INTO #k VALUES ('level-one table _c', @Base_Key, @KeyText, '')   -- level-one table _c
-- level-two rule: indirect association, driven by @SubKeyText
INSERT INTO #k VALUES ('level-two table _a', '<level-two table _a association key>', @SubKeyText, '')   -- level-two table _a
INSERT INTO #k VALUES ('level-two table _b', '<level-two table _b association key>', @SubKeyText, '')   -- level-two table _b
INSERT INTO #k VALUES ('level-two table _c', '<level-two table _c association key>', @SubKeyText, '')   -- level-two table _c
-- special handling: custom action
INSERT INTO #k VALUES ('<special table>', '<special table association key>', '<custom data filtering method>', '')
-- additional self-defined processing, e.g. modifying/cancelling order status and the order status history table
INSERT INTO #k VALUES ('<extra table>', @Base_Key, @KeyText, '<custom fields>')
STEP.4 Processing details
Loop over the temp table with a cursor, performing the operation once for each table.
DECLARE cur_orderheder INSENSITIVE CURSOR FOR
    SELECT tablename, keyname, temptext, colname FROM #k
OPEN cur_orderheder
FETCH cur_orderheder INTO @Cur_Table, @Cur_Key, @Cur_W, @Cur_K
WHILE @@FETCH_STATUS = 0
BEGIN
    EXECUTE p_task_sub_synchronization
        @OutParam   = @OutParam OUT,
        @OutMessage = @OutMessage OUT,
        @KeyText    = @Cur_W,
        @Table      = @Cur_Table,
        -- the remaining parameter names were garbled in the source; they pass @Cur_Key and @Cur_K
        @Key        = @Cur_Key,
        @Col        = @Cur_K
    --SET @OutMessage = @OutMessage + ...
    --PRINT @OutMessage
    IF @OutParam <> 0
    BEGIN
        SET @OutMessage = @OutMessage + @Cur_Table + ' operation failed'
        ROLLBACK TRAN
        -- put the failed batch's order numbers into the exception table (@ExTable)
        INSERT INTO <exception table> SELECT OrderID FROM #p
        DROP TABLE #k
        DROP TABLE #p
        DROP TABLE #q
        RETURN
    END
    FETCH cur_orderheder INTO @Cur_Table, @Cur_Key, @Cur_W, @Cur_K
END
CLOSE cur_orderheder
DEALLOCATE cur_orderheder
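The body of p_task_sub_synchronization is never shown in the article; below is a hedged sketch of what its core probably does, judging only from how it is called (copy the matching rows, delete them from the source, report failure through @OutParam). The Archive_ naming, the @sql variable and the @Key parameter name are assumptions:

-- hypothetical core of p_task_sub_synchronization (not from the original)
DECLARE @sql varchar(4000);
SET @sql = 'INSERT INTO Archive_' + @Table
         + ' SELECT * FROM ' + @Table + ' WITH (NOLOCK)'
         + ' WHERE ' + @Key + ' IN (' + @KeyText + ')';
EXEC (@sql);
SET @sql = 'DELETE FROM ' + @Table
         + ' WHERE ' + @Key + ' IN (' + @KeyText + ')';
EXEC (@sql);
SET @OutParam = @@ERROR;   -- non-zero tells the caller to roll back the batch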
STEP.5 Resource release
STEP.6 Process handling
These two parts need no further explanation.
Worst-performing SQL evolutionary process
STEP.1 NOT IN plus DISTINCT: NOT IN is already notorious for poor performance, and the DISTINCT tacked onto it adds nothing but extra work. Remove the DISTINCT first.
SQL after the change:
SELECT TOP 500 a.keyname FROM Bug01 a LEFT JOIN Bug02 b ON a.keyname = b.keyname
WHERE a.keyname NOT IN (SELECT b.keyname FROM Bug02)
ORDER BY a.keyname ASC
STEP.2 Aliases: never underestimate an alias. In the version above the subquery has no alias of its own, so b.keyname inside it resolves against the outer query's alias and distorts the original SQL plan. Give the subquery table its own alias.
SQL after the change:
SELECT TOP 500 a.keyname FROM Bug01 a LEFT JOIN Bug02 b ON a.keyname = b.keyname
WHERE a.keyname NOT IN (SELECT c.keyname FROM Bug02 c)
ORDER BY a.keyname ASC
STEP.3 Why keep the join at all? The outer NOT IN filter already does the work directly, so the LEFT JOIN can simply be removed.
SQL after the change:
SELECT TOP 500 a.keyname FROM Bug01 a
WHERE a.keyname NOT IN (SELECT c.keyname FROM Bug02 c)
ORDER BY a.keyname ASC
STEP.4 Finally, on the advice of classmate Luofer, evolve it into a direct EXCEPT.
SELECT TOP 500 a.keyname FROM Bug01 a
EXCEPT
SELECT b.keyname FROM Bug02 b
That concludes this article; everyone is welcome to discuss.