Use SQL statements to delete duplicate data in a data table

Source: Internet
Author: User

During the internship, a large amount of data was collected using the "thief" program, but many of them were repeated.

Delete the duplicate data (keep one record) according to the practice requirements, and delete several data tables

And put the data in a data table. Based on your own experience, let's take a few points.

1. Merge data tables

There is a select into statement in the SQL statement.

Copy or archive records), for example: Select column_name (s) into newtable

[In externaldatabase] from source, which can assign values to data in one table to another

Table. Note that "newtable" does not exist.

I tried it. For some reason, I cannot insert a new data table. Report

Output "#1327-undeclared variable: newtable", and cannot be merged into one table

Data table.

The other is (1) creating a table with the same structure as the source table.
(2) Insert into newtable (column_name (s) Select

Distinct column_name (s) from source
I added an unqiue option to a newtable field during insertion.

You can do it after you drop it. The fields in the two tables are identical.

2. Delete duplicate fields

Many methods on the Internet are complicated. Use the distinct field in SQL. Select

Distinct * From sourse, the option of data duplication will be deleted. It should be noted that: heavy

The duplicate data is exactly the same, because each entry with an ID (auto_increment) has a separate ID, such

The data is different.

Some online materials:

There are two Repeated Records. One is a completely repeated record, that is, a record with all fields repeated,

Second, some records with duplicate key fields, such as duplicate name fields, are not necessarily duplicated or both

Repeat can be ignored.

1. For the first type of repetition, it is easier to solve.

Select distinct * From tablename

You can get the result set without repeated records.

If the table needs to delete duplicate records (one record is retained), you can delete the record as follows:

Select distinct * into # TMP from tablename
Drop table tablename
Select * into tablename from # TMP
Drop table # TMP

The reason for this repetition is that the table design is not weekly. You can add a unique index column.

2. Repeat problems usually require that the first record in the repeat record be retained. The procedure is as follows:

Assume that the duplicate fields are Name and address. You must obtain the unique result set of the two fields.

Select Identity (INT, 1, 1) as autoid, * into # TMP from tablename
Select min (autoid) as autoid into # tmp2 from # TMP group by name, autoid
Select * from # TMP where autoid in (select autoid from # tmp2)

The last select gets the result set with the name and address unique (but with an autoid added ).

Field. This column can be omitted in the select clause during actual writing)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.