Ways to remove duplicate data from a database using SQL

Source: Internet
Author: User

/***********************************************
"Duplicate records" has two meanings:
1. Completely duplicate records, where every field of the row is repeated.
2. Records duplicated on some key fields only, such as a repeated Username field.
The other fields may or may not be repeated and can be ignored. This kind of
duplication typically requires keeping the first record of each duplicate group.
************************************************/

/*1. Data is completely duplicated (with a temporary table #tmp) */
CREATE TABLE Admin1 (
[Username] [nvarchar](50) COLLATE Chinese_PRC_CI_AS NULL, -- length 50 is assumed; the original length was lost
[Password] [nvarchar](50) COLLATE Chinese_PRC_CI_AS NULL  -- length 50 is assumed here as well
) ON [PRIMARY]

INSERT INTO Admin1 (Username, Password) VALUES ('Liyan', '111')
INSERT INTO Admin1 (Username, Password) VALUES ('Liyan', '111')
INSERT INTO Admin1 (Username, Password) VALUES ('Liyan', '222')
INSERT INTO Admin1 (Username, Password) VALUES ('Liyan', '222')
INSERT INTO Admin1 (Username, Password) VALUES ('Liyan', '333')
SELECT * FROM Admin1

SELECT DISTINCT * INTO #Tmp FROM Admin1
drop table Admin1
SELECT * Into Admin1 from #Tmp
drop table #Tmp
SELECT * FROM Admin1
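
The same idea can be tried without a SQL Server instance. The following is only a minimal sketch, assuming SQLite through Python's standard sqlite3 module instead of SQL Server; the helper name dedupe_exact and the temporary table name tmp_dedupe are made up for illustration. Unlike the script above, it empties and refills Admin1 rather than dropping and recreating it, so the table definition is left alone.

```python
import sqlite3


def dedupe_exact(conn):
    """Collapse fully identical rows of Admin1 (hypothetical helper)."""
    # Step 1: copy the distinct rows into a temporary table.
    conn.execute("CREATE TEMP TABLE tmp_dedupe AS SELECT DISTINCT * FROM Admin1")
    # Step 2: empty the original table and refill it from the copy.
    conn.execute("DELETE FROM Admin1")
    conn.execute("INSERT INTO Admin1 SELECT * FROM tmp_dedupe")
    # Step 3: the temporary copy is no longer needed.
    conn.execute("DROP TABLE tmp_dedupe")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Admin1 (Username TEXT, Password TEXT)")
conn.executemany(
    "INSERT INTO Admin1 (Username, Password) VALUES (?, ?)",
    [("Liyan", "111"), ("Liyan", "111"), ("Liyan", "222"),
     ("Liyan", "222"), ("Liyan", "333")],
)
conn.commit()

dedupe_exact(conn)
rows = conn.execute(
    "SELECT Username, Password FROM Admin1 ORDER BY Password"
).fetchall()
print(rows)  # three rows remain, one per distinct (Username, Password) pair
```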

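/*2. Some fields are duplicated (two temporary tables are used: #Tmp1 and #Tmp2) */
CREATE TABLE [dbo].[Admin] (
[ID] [int] IDENTITY(1,1) NOT NULL, -- seed and increment (1,1) are assumed; they were garbled in the source
[Username] [nvarchar](50) COLLATE Chinese_PRC_CI_AS NULL, -- length 50 is assumed; the original length was lost
[Password] [nvarchar](50) COLLATE Chinese_PRC_CI_AS NULL, -- length 50 is assumed here as well
CONSTRAINT [pk_admin] PRIMARY KEY CLUSTERED
(
[ID] ASC
) WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]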

INSERT INTO admin (Username, Password) VALUES ('Adminstrator', '111')
INSERT INTO admin (Username, Password) VALUES ('Adminstrator', '111')
INSERT INTO admin (Username, Password) VALUES ('Adminstrator', '222')
INSERT INTO admin (Username, Password) VALUES ('Adminstrator', '222')
INSERT INTO admin (Username, Password) VALUES ('Adminstrator', '333')
INSERT INTO admin (Username, Password) VALUES ('Liyan', '111')
INSERT INTO admin (Username, Password) VALUES ('Liyan', '111')
INSERT INTO admin (Username, Password) VALUES ('Liyan', '222')
INSERT INTO admin (Username, Password) VALUES ('Liyan', '222')
INSERT INTO admin (Username, Password) VALUES ('Liyan', '333')

-- Copy the table, exposing ID a second time under the name autoid
IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE id = OBJECT_ID('tempdb..#Tmp1')) DROP TABLE #Tmp1
SELECT ID AS autoid, * INTO #Tmp1 FROM admin
-- Keep the smallest autoid of each (Username, Password) group
IF EXISTS (SELECT * FROM tempdb..sysobjects WHERE id = OBJECT_ID('tempdb..#Tmp2')) DROP TABLE #Tmp2
SELECT MIN(autoid) AS autoid INTO #Tmp2 FROM #Tmp1 GROUP BY Username, Password
-- Rebuild admin from the rows whose autoid survived
IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'admin') AND OBJECTPROPERTY(id, N'IsUserTable') = 1) DROP TABLE admin
SELECT ID, Username, Password INTO admin FROM #Tmp1 WHERE autoid IN (SELECT autoid FROM #Tmp2)
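
For comparison, the "keep the row with the smallest ID in each group" rule can also be written as a single statement. This is only a sketch, again assuming SQLite through Python's sqlite3 module rather than SQL Server; the function name dedupe_keep_first is hypothetical, and it deletes in place instead of rebuilding the table from temporary tables.

```python
import sqlite3


def dedupe_keep_first(conn):
    """Delete all but the lowest-ID row of each (Username, Password) group."""
    cur = conn.execute(
        """
        DELETE FROM admin
        WHERE ID NOT IN (
            SELECT MIN(ID) FROM admin GROUP BY Username, Password
        )
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows that were removed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admin (ID INTEGER PRIMARY KEY, Username TEXT, Password TEXT)"
)
passwords = ["111", "111", "222", "222", "333"]
conn.executemany(
    "INSERT INTO admin (Username, Password) VALUES (?, ?)",
    [(user, pw) for user in ("Adminstrator", "Liyan") for pw in passwords],
)
conn.commit()

deleted = dedupe_keep_first(conn)
kept = [row[0] for row in conn.execute("SELECT ID FROM admin ORDER BY ID")]
print(deleted, kept)
```

With the ten sample rows inserted in the order shown, the duplicated passwords 111 and 222 each lose one row per user, so four rows are removed in total.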

=====================================================

The following is taken from an article found on the Internet (http://tb.blog.csdn.net/TrackBack.aspx?PostId=1530926)

on how to use SQL to remove duplicate data from a database:

1. Data is completely duplicated

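Go through a transition table (table1 must already exist with the same columns; table, table1, and field are placeholder names):
INSERT INTO table1 SELECT DISTINCT field FROM table
DELETE FROM table -- empty the table rather than dropping it, otherwise the next INSERT fails
INSERT INTO table SELECT * FROM table1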

2. A field is duplicated

There are many solutions to this online; the more common ones are:
DELETE FROM table WHERE id NOT IN (SELECT MIN(id) FROM table GROUP BY name)
DELETE FROM table WHERE field IN (SELECT field FROM table GROUP BY field HAVING COUNT(*) > 1) -- note: this removes every row of a duplicated group, including the one you may want to keep

The methods above work well enough when deleting a small amount of data, but once the table holds hundreds of thousands of rows or more, an ordinary machine will probably grind to a halt as soon as the statement runs. A little common sense shows why: such a statement involves a large amount of computation, and I estimate its cost grows at least as a power of the row count, which is frightening to think about.

Here I mainly want to give a deduplication approach for tables with a large amount of data. It is actually very simple, and it also uses a transition table:
SELECT * INTO tabletemp FROM table
DELETE a FROM table AS a WHERE a.id > (SELECT MIN(b.id) FROM tabletemp AS b WHERE b.field = a.field)
DROP TABLE tabletemp
This takes advantage of the database's indexes (assuming id and field are indexed) and greatly reduces the amount of computation.
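
The transition-table approach can be sketched as follows. This is a rough illustration only, assuming SQLite through Python's sqlite3 module; the table name items, the index name ix_tabletemp, and the function dedupe_large are all made up for the example. The index on (field, id) is what lets the inner MIN(id) lookup avoid rescanning the whole copy for every row.

```python
import sqlite3


def dedupe_large(conn):
    """Delete every row of items that is not the lowest id for its field value."""
    # Copy only the columns the comparison needs into a transition table.
    conn.execute("CREATE TABLE tabletemp AS SELECT id, field FROM items")
    # Index the copy so that MIN(id) for a given field is a cheap lookup.
    conn.execute("CREATE INDEX ix_tabletemp ON tabletemp (field, id)")
    cur = conn.execute(
        """
        DELETE FROM items
        WHERE id > (
            SELECT MIN(b.id) FROM tabletemp AS b WHERE b.field = items.field
        )
        """
    )
    # Dropping the transition table also drops its index.
    conn.execute("DROP TABLE tabletemp")
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany(
    "INSERT INTO items (field) VALUES (?)",
    [("a",), ("a",), ("b",), ("a",), ("b",), ("c",)],
)
conn.commit()

removed = dedupe_large(conn)
remaining = conn.execute("SELECT id, field FROM items ORDER BY id").fetchall()
print(removed, remaining)
```

Because the subquery reads from the copy rather than from the table being modified, the delete never depends on rows it has already removed.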

========================================

How to delete duplicate rows with SQL:

DELETE FROM table WHERE id IN (
SELECT MAX(id) FROM table GROUP BY name HAVING COUNT(*) > 1
)
-- Deletes the row with the largest ID in each group of duplicates
-- (if a group has more than two duplicates, run the statement repeatedly until no rows are deleted)

If the rows are completely identical, you can import the data into a temporary table first.
Or:

DELETE FROM table WHERE id NOT IN (
SELECT MIN(id) FROM table GROUP BY name
) -- Keeps only the first record of each duplicate group (the one with the smallest ID)
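
To see why the MAX(id) version has to be run more than once, here is a small sketch, assuming SQLite through Python's sqlite3 module; the table name people and the function delete_max_until_unique are hypothetical. Each pass removes only one row per duplicated name, so the loop stops when a pass deletes nothing.

```python
import sqlite3


def delete_max_until_unique(conn):
    """Repeat the MAX(id) delete until no duplicated names are left."""
    passes = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM people
            WHERE id IN (
                SELECT MAX(id) FROM people GROUP BY name HAVING COUNT(*) > 1
            )
            """
        )
        if cur.rowcount == 0:  # nothing deleted: every name is now unique
            break
        passes += 1
    conn.commit()
    return passes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("a",), ("a",), ("a",), ("b",), ("b",), ("c",)],
)
conn.commit()

passes = delete_max_until_unique(conn)
survivors = [row[0] for row in conn.execute("SELECT id FROM people ORDER BY id")]
print(passes, survivors)
```

Here the name "a" appears three times, so two deleting passes are needed, whereas the MIN(id) version gets to the same result in one statement.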

That is not much content, so here is something extra.

CREATE PROCEDURE procedure_name -- Executes a dynamic SQL statement
(
@num int
)
AS
DECLARE @string nvarchar(100)
-- table_name is a placeholder for the real table name
SET @string = 'SELECT TOP ' + CAST(@num AS nvarchar(10)) + ' * FROM table_name'
EXEC (@string)
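
A rough equivalent of this dynamic statement outside a stored procedure might look like the sketch below. It assumes SQLite through Python's sqlite3 module, where TOP n is spelled LIMIT n; the function name select_top and the table items are made up for illustration. The row count is bound as a parameter, and the table name is checked before it is placed into the SQL text, because identifiers cannot be bound.

```python
import re
import sqlite3


def select_top(conn, table_name, num):
    """Return at most num rows from table_name (hypothetical helper)."""
    # Identifiers cannot be passed as bound parameters, so validate first.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError("unexpected table name: %r" % (table_name,))
    sql = 'SELECT * FROM "%s" LIMIT ?' % table_name
    return conn.execute(sql, (int(num),)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany("INSERT INTO items (field) VALUES (?)", [("a",), ("b",), ("c",)])
conn.commit()

top_two = select_top(conn, "items", 2)
print(len(top_two))
```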

=====================================

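SELECT id, name FROM house1 WHERE name = 'Zhong Kai' AND roomtype = 'double Room' AND startdate >= '2007-5-25' AND id IN (SELECT MIN(id) FROM house1 GROUP BY name)

When several rows have the same name, only the one with the smallest ID is returned. (The table name in the subquery was missing in the source; house1 is assumed.)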

======================

The usual tools are DISTINCT, GROUP BY, and a temporary table (#tempTable).

Of course, it will be faster with an index.
