Problem:A table has an auto-increment ID column, and some records in the table are duplicated. That is to say, these records have the same information except the IDS. Retained one record and deleted the remaining record.
Generally, developers can write two versions here.
Version 1:Because the record has an auto-increment column, the auto-increment column can be used as the unique identifier of the record. Therefore, the auto-increment ID of the duplicate record is an incremental relationship. Here we can keep only the record with the smallest ID, delete all others. Using a nested statement, you can easily write the following SQL statement. Sname and saddress indicate all columns except ID.
Delete from
Where id not in (select Min (ID)
From
Group by sname,
Saddress)
Version 2:Make full use of several practical features of sql05. Here we will briefly introduce several features to be used. For detailed usage, you can search online.
1: row_number, which is used to generate row numbers. The default value is from 1.
2: The common table expression (CTE). I will not use its recursion here, but use it to simplify nested queries and reference the table itself. The syntax of CTE is as follows:
[ < Common_table_expression > [, N]
< Common_table_expression > :: =
Expression_name [(column_name [, N])]
As
(Cte_query_definition)
Note: 1> in someProgramEquivalent to the table variable or temporary table function. However, compared with table variables, its biggest advantage is its reference. The CTE statement is followed by select, update, delete, and so on. The operation results will directly reflect the actual physical table. Compared with a temporary table, the biggest advantage is performance. A temporary table is actually a physical table. When operating on it, additional Io overhead and management overhead are generated.
2> the CTE syntax must be followed by the Select, update, and delete statements related to the use of the CTE. Otherwise, the CTE becomes invalid. The following statements are incorrect:
Code
With B as (select row_number () over (partition by sname, saddress order by sname, saddress) as RN,
*
From
)
Delete from B
Where Rn > 1
Select * From
Select * From B where Rn > 1
3: partition by, partition function. Different from Aggregate functions, an aggregate function can return multiple records in a group. Generally, an aggregate function has only one record that reflects the statistical value. partition by is used to group the result set, if this parameter is not specified, the entire result set is used as a group.
After the introduction of the three keywords above, the results after the combination of the three keywords are given below.
Code
With B as (select row_number () over (partition by sname, saddress order by sname, saddress) as RN,
*
From
)
Delete from B
Where Rn > 1
Comparison between version 1 and Version 2:
1: Version 2 is easier to read.
2: Version 2 has better performance than Version 1. We can look at it with information. We can see that the version has been scanned twice.
Code
Table ' A ' . Scan count 2 , Logical reads 4 , Physical reads 0 , Read - Ahead reads 0 , Lob logical reads 0 , Lob physical reads 0 , Lob read - Ahead reads 0 .
Table ' Worktable ' . Scan count 1 , Logical reads 0 , Physical reads 0 , Read - Ahead reads 0 , Lob logical reads 0 , Lob physical reads 0 , Lob read - Ahead reads 0 .
( 0 Row (s) affected)
Table ' A ' . Scan count 1 , Logical reads 2 , Physical reads 0 , Read - Ahead reads 0 , Lob logical reads 0 , Lob physical reads 0 , Lob read - Ahead reads 0 .
(0Row (s) affected)