Use the sql05 feature to delete duplicate data in a table

Source: Internet
Author: User

Problem:A table has an auto-increment ID column, and some records in the table are duplicated. That is to say, these records have the same information except the IDS. Retained one record and deleted the remaining record.

 

Generally, developers can write two versions here.

 

Version 1:Because the record has an auto-increment column, the auto-increment column can be used as the unique identifier of the record. Therefore, the auto-increment ID of the duplicate record is an incremental relationship. Here we can keep only the record with the smallest ID, delete all others. Using a nested statement, you can easily write the following SQL statement. Sname and saddress indicate all columns except ID.

 

Delete from
Where id not in (select Min (ID)
From
Group by sname,
Saddress)

 

Version 2:Make full use of several practical features of sql05. Here we will briefly introduce several features to be used. For detailed usage, you can search online.

 

1: row_number, which is used to generate row numbers. The default value is from 1.

2: The common table expression (CTE). I will not use its recursion here, but use it to simplify nested queries and reference the table itself. The syntax of CTE is as follows:

 

[ < Common_table_expression > [, N]
< Common_table_expression > :: =
Expression_name [(column_name [, N])]
As
(Cte_query_definition)

 

 

Note: 1> in someProgramEquivalent to the table variable or temporary table function. However, compared with table variables, its biggest advantage is its reference. The CTE statement is followed by select, update, delete, and so on. The operation results will directly reflect the actual physical table. Compared with a temporary table, the biggest advantage is performance. A temporary table is actually a physical table. When operating on it, additional Io overhead and management overhead are generated.

2> the CTE syntax must be followed by the Select, update, and delete statements related to the use of the CTE. Otherwise, the CTE becomes invalid. The following statements are incorrect:

 

Code

With B as (select row_number () over (partition by sname, saddress order by sname, saddress) as RN,
*
From
)
Delete from B
Where Rn >   1
Select * From
Select * From B where Rn > 1

 

3: partition by, partition function. Different from Aggregate functions, an aggregate function can return multiple records in a group. Generally, an aggregate function has only one record that reflects the statistical value. partition by is used to group the result set, if this parameter is not specified, the entire result set is used as a group.

 

After the introduction of the three keywords above, the results after the combination of the three keywords are given below.

 

Code

With B as (select row_number () over (partition by sname, saddress order by sname, saddress) as RN,
*
From
)
Delete from B
Where Rn >   1

 

 

Comparison between version 1 and Version 2:

1: Version 2 is easier to read.

2: Version 2 has better performance than Version 1. We can look at it with information. We can see that the version has been scanned twice.

 

Code

Table ' A ' . Scan count 2 , Logical reads 4 , Physical reads 0 , Read - Ahead reads 0 , Lob logical reads 0 , Lob physical reads 0 , Lob read - Ahead reads 0 .
Table ' Worktable ' . Scan count 1 , Logical reads 0 , Physical reads 0 , Read - Ahead reads 0 , Lob logical reads 0 , Lob physical reads 0 , Lob read - Ahead reads 0 .

( 0 Row (s) affected)
Table ' A ' . Scan count 1 , Logical reads 2 , Physical reads 0 , Read - Ahead reads 0 , Lob logical reads 0 , Lob physical reads 0 , Lob read - Ahead reads 0 .

(0Row (s) affected)

 

 

 

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.