Common SQL Problems:
Select duplicates, eliminate duplicates, and select out sequences
Example of a table: EMP
Emp_no name Age
001 Tom 17
002 Sun 14
003 Tom 15
004 Tom 16
Requirements:
List the records of all the people whose names are duplicated
(1) The most intuitive way: to know that all names have duplicate person information, you must first know which name repeats:
Select name from emp
Group BY name
Having Count (*) >1
The records of all name duplicates are:
SELECT * FROM emp
where
Name in (
Select name from emp
Group BY name
Having Count (*) >1
)
(2) A little more clever, it will be thought, if each name is compared to the original table, more than 2 people name and this record is the same is qualified, there are
SELECT * FROM emp
where
(SELECT COUNT (*) from EMP
e where e.name=emp.name)
>1
--Pay attention to this >1, think about if it is = 1, if it is =2 if it is >2 if E is another table and is =0 the result is even more fun:)
The process is to get 001 First name (emp.name) and then compare it to the name of the original table when judging the number of 001 people e.name
Note that E is an alias for an EMP.
To think a little more, you would think that if there is another name with the same artificial number is not the same as her then this record meets the requirements:
SELECT * FROM emp
where exists
(SELECT * from emp e where
E.name=emp.name and E.emp_no<>emp.emp_no)
The join notation for this idea:
Select emp.*
From Emp,emp E
where
Emp.name=e.name and Emp.emp_no<>e.emp_no
/*
This statement is a more canonical join notation
Select emp.*
From EMP INNER JOIN EMP E
On
Emp.name=e.name and Emp.emp_no<>e.emp_no
But personal comparison tends to the former, the key is clearer
*/
b, a case table: EMP
Name age
Tom 16
Sun 14
Tom 16
Tom 16
Requirements:
Filter out all the extra duplicate records.
(1) We know that distinct and group by can filter and repeat, so there is the most intuitive
SELECT DISTINCT * from emp
Or
Select Name,age from EMP GROUP by Name,age
Get the data you need, if you can use a temporary table there is a solution:
SELECT DISTINCT * into #tmp from EMP
Delete from emp
INSERT INTO the EMP select * FROM #tmp
(2) But what if you can't use a temporary table?
We observe that we have no way of distinguishing between data (physical location, there is no difference to SQL Server, the idea is to find a way to distinguish the data, since all the columns are not able to distinguish between the data, the only way is to add a column to differentiate it, add what column? The best option is the identity column:
ALTER TABLE EMP add CHK int identity (1,1)
Example of a presentation:
Name Age Chk
Tom 16 1
Sun 14 2
Tom 16 3
Tom 16 4
Duplicate records can be expressed as:
SELECT * FROM emp
where
(SELECT COUNT (*) from EMP e where e.name=emp.name) >1
To be removed are:
Delete from emp
where
(SELECT COUNT (*) from EMP e where
E.name=emp.name and E.chk>=emp.chk) >1
The added column is deleted and the result appears.
ALTER TABLE EMP Drop column CHK
(3) Another idea:
View
Select min (chk)
From EMP
Group BY name
Having Count (*) >1
To obtain the minimum value of a duplicate record chk, so you can
Delete
From EMP
where
Chk not in
(
Select min (chk)
From EMP
Group BY name
)
The form of a join can also be:
(1) A case table: EMP
Emp_no name Age
001 Tom 17
002 Sun 14
003 Tom 15
004 Tom 16
Require serial number generation
(1) The simplest method, according to the solution of the B problem:
ALTER TABLE EMP add CHK int identity (1,1)
or select *,identity (int,1,1) chk into #tmp from EMP
What if you need a control order?
Select top 100000 *,identity (int,1,1)
Chk into #tmp from EMP
(2) What if the table structure cannot be changed?
If it is not possible to uniquely differentiate each record, you can use the count in a to solve the problem when you can uniquely differentiate each record.
Select Emp.*, (select COUNT (*) from
EMP e where e.emp_no<=emp.emp_no)
From EMP
Order BY (SELECT COUNT (*) from
EMP e where e.emp_no<=emp.emp_no)