Chapter 13 - Table and Index Partitioning (2): Using SPLIT to Delete and Load Large Amounts of Data


Objective:

You often need to archive or delete large volumes of data from a big table, and to periodically load large volumes of new data into it. When the table holds billions of rows, doing this with ordinary DELETE and INSERT statements can take several hours; if the table is partitioned, however, the same work can be done far more efficiently.

This article simulates deleting one quarter of data and loading a whole new quarter into an existing table, using the SPLIT, MERGE, and SWITCH partition operations.

Suppose we store the data by quarter, one quarter per partition. We now need to purge the oldest quarter and load the newest quarter's data. For example, the partitions contain:

1. Quarter 1: January 2011 to March 2011

2. Quarter 2: April 2011 to June 2011

3. Quarter 3: July 2011 to September 2011

4. Quarter 4: October 2011 to December 2011

Now we need to delete Quarter 1 and load the first quarter of 2012 into a new partition. To achieve this, the table is partitioned by quarter on a datetime column; partitions can then be merged together or split apart as quarters are removed and added.

This article demonstrates a sliding-window implementation, in which the oldest partition is cleared out as a new one is added. The three statements at the heart of the technique are sketched below.
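In condensed form (using the object names that the steps below create; the staging-table switches are shown here without the optional PARTITION clause):

-- 1) Archive/delete: switch the oldest populated partition out into an empty staging table.
ALTER TABLE tbl_MyData SWITCH PARTITION 2 TO tbl_MyStagingData

-- 2) Remove the now-unused boundary so the empty partition disappears.
ALTER PARTITION FUNCTION pf_Quaterly_RangeRight() MERGE RANGE ('20110101')

-- 3) Load: add a new boundary and switch the freshly loaded staging table in.
ALTER PARTITION SCHEME ps_Quaterly_RangeRight NEXT USED [PRIMARY]
ALTER PARTITION FUNCTION pf_Quaterly_RangeRight() SPLIT RANGE ('20120401')
ALTER TABLE tbl_MyStagingData SWITCH TO tbl_MyData PARTITION 5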

Steps:

1. Open SQL Server Management Studio and connect to the target instance.

2. Execute the following script to create a partition function using RANGE RIGHT, and then create a partition scheme on it:

USE master
GO

IF DB_ID('Sample_DB') IS NOT NULL
    DROP DATABASE Sample_DB

CREATE DATABASE Sample_DB
ON PRIMARY (NAME = N'Sample_DB', FILENAME = N'C:\SQLData\Sample_DB.mdf', SIZE = 3072KB, FILEGROWTH = 1024KB)
LOG ON (NAME = N'Sample_DB_log', FILENAME = N'C:\SQLData\Sample_DB_log.ldf', SIZE = 1024KB, FILEGROWTH = 10%)
GO

USE Sample_DB
GO

-- One boundary per quarter; with RANGE RIGHT each boundary value belongs to the partition on its right.
CREATE PARTITION FUNCTION pf_Quaterly_RangeRight (DATETIME)
AS RANGE RIGHT FOR VALUES ('20110101', '20110401', '20110701', '20111001', '20120101')
GO

-- Map every partition to the PRIMARY filegroup.
CREATE PARTITION SCHEME ps_Quaterly_RangeRight
AS PARTITION pf_Quaterly_RangeRight ALL TO ([PRIMARY])
GO

Note: when the partitioning column is a datetime type, you should generally prefer RANGE RIGHT. With RANGE LEFT, the boundary value belongs to the partition on its left, so you would have to specify the exact last instant of each period, right down to the milliseconds.
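As an illustration (the pf_Q_Right and pf_Q_Left names below are used only for this comparison and appear nowhere else in the article), RANGE RIGHT lets you use clean first-of-quarter dates, while the RANGE LEFT equivalent has to name the last instant the datetime type can store in each quarter; datetime is accurate to about 3 milliseconds, so that instant ends in .997:

-- RANGE RIGHT: each boundary starts a new quarter, e.g. '20110401' belongs to the April-June partition.
CREATE PARTITION FUNCTION pf_Q_Right (DATETIME)
AS RANGE RIGHT FOR VALUES ('20110401', '20110701')
GO

-- RANGE LEFT: each boundary must be the very last storable datetime value of its quarter.
CREATE PARTITION FUNCTION pf_Q_Left (DATETIME)
AS RANGE LEFT FOR VALUES ('20110331 23:59:59.997', '20110630 23:59:59.997')
GO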

3. Create a table tbl_MyData on the partition scheme and insert 275,000 rows of sample data:

USE Sample_DB
GO

CREATE TABLE tbl_MyData
(
    RecordDateTime DATETIME NOT NULL,
    RecordID       INT NOT NULL,
    RecordData     VARCHAR(50) NOT NULL
)
GO

-- Creating the clustered index on the partition scheme is what partitions the table.
CREATE CLUSTERED INDEX idx_tbl_MyData_RecordDateTime
    ON tbl_MyData (RecordDateTime, RecordID)
    ON ps_Quaterly_RangeRight (RecordDateTime)

-- Generate 275,000 rows spread across the months and days of 2011
-- (the YYYYMMDD strings convert implicitly to DATETIME on insert).
INSERT INTO tbl_MyData
SELECT '2011'
       + RIGHT('0' + CAST(CASE WHEN ID % 12 = 0 THEN 12 ELSE ID % 12 END AS VARCHAR), 2)
       + RIGHT('0' + CAST(CASE WHEN ID % 28 = 0 THEN 28 ELSE ID % 28 END AS VARCHAR), 2) AS RecordDateTime,
       ID,
       RecordData
FROM (SELECT TOP 275000
             ID         = ROW_NUMBER() OVER (ORDER BY c1.name),
             RecordData = NEWID()
      FROM sys.columns AS c1
           CROSS JOIN sys.columns AS c2
           CROSS JOIN sys.columns AS c3) AS T
GO


4. Execute the following statement to check the number of rows in each partition:

USE Sample_DB
GO

SELECT partition_number,
       rows
FROM   sys.partitions
WHERE  object_id = OBJECT_ID('tbl_MyData')
ORDER BY partition_number


5. The results list the row count for each of the six partitions; partitions 1 and 6 are empty because all of the generated data falls within 2011.
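If you also want to see which boundary each partition ends at, the catalog views can be joined as in the following sketch. With RANGE RIGHT, the boundary whose boundary_id equals N is the exclusive upper limit of partition N, and the last partition has no upper boundary, so it shows NULL:

SELECT p.partition_number,
       p.rows,
       prv.value AS upper_boundary        -- exclusive upper limit of the partition (NULL for the last one)
FROM   sys.partitions AS p
       INNER JOIN sys.indexes AS i
               ON i.object_id = p.object_id AND i.index_id = p.index_id
       INNER JOIN sys.partition_schemes AS ps
               ON ps.data_space_id = i.data_space_id
       INNER JOIN sys.partition_functions AS pf
               ON pf.function_id = ps.function_id
       LEFT JOIN sys.partition_range_values AS prv
               ON prv.function_id = pf.function_id AND prv.boundary_id = p.partition_number
WHERE  p.object_id = OBJECT_ID('tbl_MyData')
  AND  i.index_id IN (0, 1)               -- heap or clustered index only
ORDER BY p.partition_number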


6. Now remove the data for the first quarter of 2011, which lives in partition 2. First create a staging table, tbl_MyStagingData, with the same structure and clustered index key as tbl_MyData:

USE Sample_DB
GO

IF OBJECT_ID('tbl_MyStagingData') IS NOT NULL
    DROP TABLE tbl_MyStagingData

CREATE TABLE tbl_MyStagingData
(
    RecordDateTime DATETIME NOT NULL,
    RecordID       INT NOT NULL,
    RecordData     VARCHAR(50) NOT NULL
)
GO

CREATE CLUSTERED INDEX idx_tbl_MyStagingData_RecordDateTime
    ON tbl_MyStagingData (RecordDateTime, RecordID)
GO
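A SWITCH also requires the staging table to be empty, to match the source table's structure and clustered index, and to sit on the same filegroup as the partition being switched out (everything here is on PRIMARY). As an optional check, a catalog query such as this sketch shows where each object is stored:

SELECT o.name  AS table_name,
       ds.name AS data_space             -- the partition scheme for tbl_MyData, PRIMARY for the staging table
FROM   sys.indexes AS i
       INNER JOIN sys.objects     AS o  ON o.object_id      = i.object_id
       INNER JOIN sys.data_spaces AS ds ON ds.data_space_id = i.data_space_id
WHERE  o.name IN ('tbl_MyData', 'tbl_MyStagingData')
  AND  i.index_id IN (0, 1)              -- heap or clustered index only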


7. Now switch partition 2 of tbl_MyData into the staging table tbl_MyStagingData and empty the staging table with TRUNCATE TABLE. After emptying it, check the partition row counts again:

USE Sample_DB
GO

-- SWITCH is a metadata operation: the rows of partition 2 instantly become the staging table's rows.
ALTER TABLE tbl_MyData
    SWITCH PARTITION 2 TO tbl_MyStagingData PARTITION 1
GO

TRUNCATE TABLE tbl_MyStagingData
GO

SELECT partition_number,
       rows
FROM   sys.partitions
WHERE  object_id = OBJECT_ID('tbl_MyData')
ORDER BY partition_number


8. The results show that the partition has been emptied:


9. Now merge partition 2 into partition 1, set the partition scheme's NEXT USED filegroup to PRIMARY, and verify the data:

USE Sample_DB
GO

-- Removing the '20110101' boundary merges the (now empty) partition 2 into partition 1.
ALTER PARTITION FUNCTION pf_Quaterly_RangeRight()
    MERGE RANGE ('20110101')
GO

-- Tell the partition scheme which filegroup the next new partition will use.
ALTER PARTITION SCHEME ps_Quaterly_RangeRight
    NEXT USED [PRIMARY]

SELECT partition_number,
       rows
FROM   sys.partitions
WHERE  object_id = OBJECT_ID('tbl_MyData')
ORDER BY partition_number


10. The results show that the table now has five partitions:
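You can also confirm at the metadata level that the '20110101' boundary no longer exists on the partition function, for example with this optional check:

-- After the MERGE, only four boundary values should remain: 20110401, 20110701, 20111001, 20120101.
SELECT prv.boundary_id,
       prv.value
FROM   sys.partition_range_values AS prv
       INNER JOIN sys.partition_functions AS pf
               ON pf.function_id = prv.function_id
WHERE  pf.name = 'pf_Quaterly_RangeRight'
ORDER BY prv.boundary_id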


11. Now load the new data into tbl_MyData. To prepare for this step, generate first-quarter 2012 data and insert it into the staging table tbl_MyStagingData, after adding a CHECK constraint that restricts the staging table to that quarter:

USE Sample_DB
GO

-- The CHECK constraint restricts the staging table to the new quarter's date range.
ALTER TABLE tbl_MyStagingData
    ADD CONSTRAINT CK_tbl_MyStagingData_RecordDateTime
    CHECK (RecordDateTime >= '20120101' AND RecordDateTime < '20120401')

-- Generate 275,000 rows for January-March 2012.
INSERT INTO tbl_MyStagingData
SELECT '2012'
       + RIGHT('0' + CAST(CASE WHEN ID % 3 = 0 THEN 3 ELSE ID % 3 END AS VARCHAR), 2)
       + RIGHT('0' + CAST(CASE WHEN ID % 28 = 0 THEN 28 ELSE ID % 28 END AS VARCHAR), 2) AS RecordDateTime,
       ID,
       RecordData
FROM (SELECT TOP 275000
             ID         = ROW_NUMBER() OVER (ORDER BY c1.name),
             RecordData = NEWID()
      FROM sys.columns AS c1
           CROSS JOIN sys.columns AS c2
           CROSS JOIN sys.columns AS c3) AS T
GO
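The CHECK constraint is what makes the later SWITCH into partition 5 possible: SQL Server must be able to prove, from a trusted constraint, that every staged row falls inside the target partition's range. An optional way to confirm the constraint exists and is trusted is a query like this sketch:

SELECT name,
       is_not_trusted,                   -- should be 0 (trusted) for the SWITCH to succeed
       definition
FROM   sys.check_constraints
WHERE  parent_object_id = OBJECT_ID('tbl_MyStagingData')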



12. Now create a new partition by splitting the last (empty) partition. After the split, switch the staging table tbl_MyStagingData into partition 5 of tbl_MyData, and set the partition scheme's NEXT USED filegroup to PRIMARY:

USE Sample_DB
GO

-- Split the last partition so that [20120101, 20120401) becomes partition 5.
ALTER PARTITION FUNCTION pf_Quaterly_RangeRight()
    SPLIT RANGE ('20120401')
GO

-- Switch the fully loaded staging table into the new partition.
ALTER TABLE tbl_MyStagingData
    SWITCH PARTITION 1 TO tbl_MyData PARTITION 5

ALTER PARTITION SCHEME ps_Quaterly_RangeRight
    NEXT USED [PRIMARY]
GO
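Before or after the SPLIT you can also ask the partition function directly which partition a given value maps to, using the built-in $PARTITION function (shown here with an arbitrary date from the new quarter):

-- Should return 5 after the SPLIT: 2012-02-15 falls in the [20120101, 20120401) range.
SELECT $PARTITION.pf_Quaterly_RangeRight('20120215') AS partition_number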




13. Now verify the data:

USE Sample_DB
GO

SELECT partition_number,
       rows
FROM   sys.partitions
WHERE  object_id = OBJECT_ID('tbl_MyData')
ORDER BY partition_number



14. The results show that partition 5 now holds the newly loaded first-quarter 2012 rows:

Analysis:

This article first created the partition function pf_Quaterly_RangeRight with the CREATE PARTITION FUNCTION command, using RecordDateTime as the partitioning column and quarterly boundary values from 20110101 to 20120101. It then created the partition scheme ps_Quaterly_RangeRight with the CREATE PARTITION SCHEME command, mapping every partition to the PRIMARY filegroup. The data in the partitions was then manipulated through a series of SWITCH, MERGE, and SPLIT operations.

Extended Knowledge:

This article shows a simplified example. To run this kind of recurring process in a production environment, the script should be made intelligent and dynamic enough, for example by deriving boundary values and partition numbers at run time instead of hard-coding them.
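As one possible direction, the following is a minimal sketch only: it reuses the pf_Quaterly_RangeRight and ps_Quaterly_RangeRight objects from this article, assumes the oldest populated partition has already been switched out to a staging table, and derives the MERGE and SPLIT boundaries from the catalog instead of hard-coding them. A real job would also manage the staging tables, the CHECK constraint, and error handling:

DECLARE @oldest_boundary DATETIME,
        @newest_boundary DATETIME,
        @sql             NVARCHAR(MAX)

-- Oldest and newest boundary values currently defined on the partition function.
SELECT @oldest_boundary = MIN(CAST(prv.value AS DATETIME)),
       @newest_boundary = MAX(CAST(prv.value AS DATETIME))
FROM   sys.partition_range_values AS prv
       INNER JOIN sys.partition_functions AS pf
               ON pf.function_id = prv.function_id
WHERE  pf.name = 'pf_Quaterly_RangeRight'

-- Merge away the oldest (already emptied) boundary and split a new boundary
-- one quarter after the newest one.
SET @sql =
      N'ALTER PARTITION FUNCTION pf_Quaterly_RangeRight() MERGE RANGE ('''
    + CONVERT(NVARCHAR(8), @oldest_boundary, 112) + ''');' + CHAR(13)
    + N'ALTER PARTITION SCHEME ps_Quaterly_RangeRight NEXT USED [PRIMARY];' + CHAR(13)
    + N'ALTER PARTITION FUNCTION pf_Quaterly_RangeRight() SPLIT RANGE ('''
    + CONVERT(NVARCHAR(8), DATEADD(QUARTER, 1, @newest_boundary), 112) + ''');'

PRINT @sql                     -- inspect the generated statements first
-- EXEC sp_executesql @sql     -- then execute them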
