During the internship, a large amount of data was collected using the "thief" program, but many of them were repeated.
Delete the duplicate data (keep one record) according to the practice requirements, and delete several data tables
And put the data in a data table. Based on your own experience, let's take a few points.
1. Merge data tables
There is a select into statement in the SQL statement.
Copy or archive records), for example: Select column_name (s) into newtable
[In externaldatabase] from source, which can assign values to data in one table to another
Table. Note that "newtable" does not exist.
I tried it. For some reason, I cannot insert a new data table. Report
Output "#1327-undeclared variable: newtable", and cannot be merged into one table
Data table.
The other is (1) creating a table with the same structure as the source table.
(2) Insert into newtable (column_name (s) Select
Distinct column_name (s) from source
I added an unqiue option to a newtable field during insertion.
You can do it after you drop it. The fields in the two tables are identical.
2. Delete duplicate fields
Many methods on the Internet are complicated. Use the distinct field in SQL. Select
Distinct * From sourse, the option of data duplication will be deleted. It should be noted that: heavy
The duplicate data is exactly the same, because each entry with an ID (auto_increment) has a separate ID, such
The data is different.
Some online materials:
There are two Repeated Records. One is a completely repeated record, that is, a record with all fields repeated,
Second, some records with duplicate key fields, such as duplicate name fields, are not necessarily duplicated or both
Repeat can be ignored.
1. For the first type of repetition, it is easier to solve.
Select distinct * From tablename
You can get the result set without repeated records.
If the table needs to delete duplicate records (one record is retained), you can delete the record as follows:
Select distinct * into # TMP from tablename
Drop table tablename
Select * into tablename from # TMP
Drop table # TMP
The reason for this repetition is that the table design is not weekly. You can add a unique index column.
2. Repeat problems usually require that the first record in the repeat record be retained. The procedure is as follows:
Assume that the duplicate fields are Name and address. You must obtain the unique result set of the two fields.
Select Identity (INT, 1, 1) as autoid, * into # TMP from tablename
Select min (autoid) as autoid into # tmp2 from # TMP group by name, autoid
Select * from # TMP where autoid in (select autoid from # tmp2)
The last select gets the result set with the name and address unique (but with an autoid added ).
Field. This column can be omitted in the select clause during actual writing)