Methods for querying duplicate data in a MySQL table

Source: Internet
Author: User

INSERT INTO hk_test(username, passwd) VALUES ('qmf1', 'qmf1'), ('qmf2', 'qmf11');
DELETE FROM hk_test WHERE username = 'qmf1' AND passwd = 'qmf1';

Querying duplicate records in a MySQL table:

First, look at the duplicated data itself:

Scenario 1: list the username values that occur more than once

select username, count(*) as count from hk_test group by username having count > 1;
SELECT username, count(username) as count FROM hk_test GROUP BY username HAVING count(username) > 1 ORDER BY count DESC;

This method only reports how many times each duplicated username occurs; it does not return the full records.
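As a small illustration of Scenario 1, the sketch below runs the same GROUP BY / HAVING query against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for MySQL, and the id column and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hk_test (id INTEGER PRIMARY KEY, username TEXT, passwd TEXT)"
)
# Hypothetical sample rows: qmf1 appears three times, qmf3 twice.
conn.executemany(
    "INSERT INTO hk_test (username, passwd) VALUES (?, ?)",
    [
        ("qmf1", "qmf1"),
        ("qmf1", "qmf1"),
        ("qmf1", "other"),
        ("qmf2", "qmf11"),
        ("qmf3", "x"),
        ("qmf3", "y"),
    ],
)

# Count how often each username occurs and keep only the duplicated ones.
dup_counts = conn.execute(
    "SELECT username, COUNT(*) AS cnt FROM hk_test "
    "GROUP BY username HAVING COUNT(*) > 1 "
    "ORDER BY cnt DESC, username"
).fetchall()

print(dup_counts)  # [('qmf1', 3), ('qmf3', 2)]
```

The result contains one row per duplicated username with its count, which is why the full records need a second query.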

Scenario 2: list the full records whose username is duplicated:

Select * from hk_test where username in (select username from hk_test group by username having count(username) > 1);
SELECT username, passwd FROM hk_test WHERE username in (SELECT username FROM hk_test group by username HAVING count(username) > 1);

However, this statement can be very inefficient in MySQL. According to the original author, MySQL does not generate a temporary table for the subquery, so the query takes a long time when the data volume is large.

Solution:

First, create a temporary table that holds the duplicated values:

Create table `tmptable` as (SELECT `name` FROM `table` group by `name` HAVING count(`name`) > 1);

Then use a multi-table join to run the query:

SELECT a.`id`, a.`name` FROM `table` a, `tmptable` t WHERE a.`name` = t.`name`;

The result comes back quickly. To remove repeated rows from the result, add DISTINCT:

SELECT distinct a.`id`, a.`name` FROM `table` a, `tmptable` t WHERE a.`name` = t.`name`;
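The temporary-table approach can be sketched as follows, again using Python's sqlite3 module as a stand-in for MySQL. The hk_test table, its id column and the sample rows are assumptions for the example. The sketch only checks that the join returns the same rows as the IN subquery; it says nothing about MySQL performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hk_test (id INTEGER PRIMARY KEY, username TEXT, passwd TEXT)"
)
# Hypothetical sample rows; ids are assigned 1..5 in insertion order.
conn.executemany(
    "INSERT INTO hk_test (username, passwd) VALUES (?, ?)",
    [("qmf1", "a"), ("qmf1", "b"), ("qmf2", "c"), ("qmf3", "d"), ("qmf3", "e")],
)

# Step 1: store the duplicated usernames once in a temporary table.
conn.execute(
    "CREATE TEMP TABLE tmptable AS "
    "SELECT username FROM hk_test GROUP BY username HAVING COUNT(username) > 1"
)

# Step 2: join the base table against the temporary table.
joined = conn.execute(
    "SELECT a.id, a.username FROM hk_test a "
    "JOIN tmptable t ON a.username = t.username ORDER BY a.id"
).fetchall()

# The original IN-subquery form, for comparison.
in_rows = conn.execute(
    "SELECT id, username FROM hk_test WHERE username IN "
    "(SELECT username FROM hk_test GROUP BY username HAVING COUNT(username) > 1) "
    "ORDER BY id"
).fetchall()

print(joined)  # [(1, 'qmf1'), (2, 'qmf1'), (4, 'qmf3'), (5, 'qmf3')]
print(joined == in_rows)  # True
```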

Scenario 3: list the records that are duplicated across several fields, for example records where both username and passwd are duplicated:

select * from hk_test a where (a.username, a.passwd) in (select username, passwd from hk_test group by username, passwd having count(*) > 1);

Scenario 4: count the records that are duplicated across several fields:

select username,passwd,count(*) from hk_test group by username,passwd having count(*) > 1
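Scenarios 3 and 4 can be tried out with the sketch below, which uses Python's sqlite3 module as a stand-in for MySQL. The id column and the sample rows are assumptions. The full rows are fetched with a join against the grouped subquery rather than the row-value IN form shown above; the join is a swapped-in equivalent chosen because it also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hk_test (id INTEGER PRIMARY KEY, username TEXT, passwd TEXT)"
)
# Hypothetical sample rows; ids are assigned 1..6 in insertion order.
conn.executemany(
    "INSERT INTO hk_test (username, passwd) VALUES (?, ?)",
    [
        ("qmf1", "qmf1"),
        ("qmf1", "qmf1"),
        ("qmf1", "other"),
        ("qmf2", "qmf11"),
        ("qmf2", "qmf11"),
        ("qmf3", "x"),
    ],
)

# Scenario 4: one row per duplicated (username, passwd) pair, with its count.
dup_groups = conn.execute(
    "SELECT username, passwd, COUNT(*) FROM hk_test "
    "GROUP BY username, passwd HAVING COUNT(*) > 1 ORDER BY username, passwd"
).fetchall()

# Scenario 3: the full rows that belong to those duplicated pairs.
dup_rows = conn.execute(
    "SELECT a.id, a.username, a.passwd FROM hk_test a "
    "JOIN (SELECT username, passwd FROM hk_test "
    "      GROUP BY username, passwd HAVING COUNT(*) > 1) d "
    "ON a.username = d.username AND a.passwd = d.passwd "
    "ORDER BY a.id"
).fetchall()

print(dup_groups)  # [('qmf1', 'qmf1', 2), ('qmf2', 'qmf11', 2)]
print(dup_rows)
```

Note that the row ('qmf1', 'other') is not reported: its username is duplicated, but the (username, passwd) pair is not.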

MySQL methods for querying and deleting duplicate records in a table

(1)

1. Find the redundant duplicate records in a table, where duplicates are determined by a single field (peopleId):

select * from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)

2. Delete the redundant duplicate records in a table, where duplicates are determined by a single field (peopleId), keeping only the record with the smallest id in each group:

delete from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1) and id not in (select min(id) from people group by peopleId having count(peopleId) > 1)

3. Find the redundant duplicate records in a table (multiple fields):

select * from vitae a where (a.peopleId, a.seq) in (select peopleId, seq from vitae group by peopleId, seq having count(*) > 1)

4. Delete the redundant duplicate records in a table (multiple fields), keeping only the record with the smallest rowid:

delete from vitae a where (a.peopleId, a.seq) in (select peopleId, seq from vitae group by peopleId, seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId, seq having count(*) > 1)

5. Find the redundant duplicate records in a table (multiple fields), excluding the record with the smallest rowid:

select * from vitae a where (a.peopleId, a.seq) in (select peopleId, seq from vitae group by peopleId, seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId, seq having count(*) > 1)

Note: rowid is an Oracle pseudo-column. In MySQL, substitute the table's primary key column (for example, id).

(2)

For example, table A has a field "name", and the "name" value may be the same in different records. To find the "name" values that occur more than once:

Select Name, Count(*) From A Group By Name Having Count(*) > 1

If sex must also be the same:

Select Name, sex, Count(*) From A Group By Name, sex Having Count(*) > 1

(3)

Method 1

The following cursor script uses SQL Server (T-SQL) syntax; main_field and table_name are placeholders for the duplicated field and the table.

declare @max integer, @id integer
declare cur_rows cursor local for select main_field, count(*) from table_name group by main_field having count(*) > 1
open cur_rows
fetch cur_rows into @id, @max
while @@fetch_status = 0
begin
select @max = @max - 1
set rowcount @max
delete from table_name where main_field = @id
fetch cur_rows into @id, @max
end
close cur_rows
set rowcount 0

SELECT * from tab1 where CompanyName in (SELECT companyname from tab1 group by CompanyName having count(*) > 1); -- 129.433 ms
SELECT * from tab1 INNER join (SELECT companyname from tab1 group by CompanyName having count(*) > 1) as tab2 USING (CompanyName); -- 0.482 ms

Method 2

There are two kinds of duplicate records. The first is a completely duplicated record, where every field is repeated. The second is a record with repeated key fields, for example the Name field is repeated while the other fields are not necessarily repeated or can be ignored.

1. The first kind is easy to handle. Use

select distinct * from tableName

to obtain a result set without repeated records. If the duplicates need to be deleted from the table (keeping one record of each), proceed as follows:

select distinct * into #Tmp from tableName
drop table tableName
select * into tableName from #Tmp
drop table #Tmp

This kind of duplication occurs because the table was poorly designed; adding a unique index column prevents it.

2. The second kind usually requires keeping the first record of each group of duplicates. Assuming the repeated fields are Name and Address, and a result set unique on these two fields is required:

select identity(int, 1, 1) as autoID, * into #Tmp from tableName
select min(autoID) as autoID into #Tmp2 from #Tmp group by Name, Address
select * from #Tmp where autoID in (select autoID from #Tmp2)

The last select returns a result set in which Name and Address are not repeated (an extra autoID column is added; it can be left out of the select list in practice).

(4)

Query the duplicates:

select * from tablename where id in (select id from tablename group by id having count(id) > 1)

Common statements

1. Find the redundant duplicate records in a table, where duplicates are determined by a single field (mail_id):

SELECT * FROM table WHERE mail_id IN (SELECT mail_id FROM table group by mail_id having count(mail_id) > 1);

2. Delete the redundant duplicate records in a table, where duplicates are determined by a single field (mail_id), keeping only the record with the smallest rowid:

delete from table WHERE mail_id IN (SELECT mail_id FROM table group by mail_id having count(mail_id) > 1) AND rowid not in (SELECT MIN(rowid) FROM table group by mail_id having count(mail_id) > 1);

3. Find the redundant duplicate records in a table (multiple fields):

SELECT * FROM table WHERE (mail_id, phone) IN (SELECT mail_id, phone FROM table group by mail_id, phone having count(*) > 1);

4. Delete the redundant duplicate records in a table (multiple fields), keeping only the record with the smallest rowid:

delete from table WHERE (mail_id, phone) IN (SELECT mail_id, phone FROM table group by mail_id, phone having count(*) > 1) AND rowid not in (select min(rowid) FROM table group by mail_id, phone having count(*) > 1);

5. Find the redundant duplicate records in a table (multiple fields), excluding the record with the smallest rowid:

SELECT * FROM table a WHERE (a.mail_id, a.phone) IN (SELECT mail_id, phone FROM table group by mail_id, phone having count(*) > 1) AND rowid not in (select min(rowid) FROM table group by mail_id, phone having count(*) > 1);

Stored procedure

declare @max integer, @id integer
declare cur_rows cursor local for select main_field, count(*) from table_name group by main_field having count(*) > 1
open cur_rows
fetch cur_rows into @id, @max
while @@fetch_status = 0
begin
select @max = @max - 1
set rowcount @max
delete from table_name where main_field = @id
fetch cur_rows into @id, @max
end
close cur_rows
set rowcount 0

(1) Single field

1. Find the redundant duplicate records in a table, determined by the question_title field:

select * from questions where question_title in (select question_title from questions group by question_title having count(question_title) > 1)

2. Delete the redundant duplicate records in a table, determined by the question_title field, keeping only one record of each:

delete from questions where question_title in (select question_title from questions group by question_title having count(question_title) > 1) and question_id not in (select min(question_id) from questions group by question_title having count(question_title) > 1)

(2) Multiple fields

Delete the redundant duplicate records in a table (multiple fields), keeping only the record with the smallest question_id:

delete from questions WHERE (questions_title, questions_scope) IN (SELECT questions_title, questions_scope FROM questions group by questions_title, questions_scope HAVING COUNT(*) > 1) AND question_id not in (select min(question_id) FROM questions group by questions_scope, questions_title having count(*) > 1)

The preceding statement cannot be used to delete the records; the deletion succeeds only when a temporary table is created first. The likely reason is that MySQL does not allow a DELETE statement to select from the table being deleted from in a subquery.

create table tmp as select question_id FROM questions WHERE (questions_title, questions_scope) IN (SELECT questions_title, questions_scope FROM questions group by questions_title, questions_scope having count(*) > 1) AND question_id not in (select min(question_id) FROM questions group by questions_scope, questions_title having count(*) > 1);
delete from questions WHERE question_id IN (SELECT question_id FROM tmp);
drop table tmp;
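The temporary-table deletion can be sketched as follows, using Python's sqlite3 module as a stand-in for MySQL. The sample rows are assumptions, and the selection of ids to delete is simplified to "every id that is not the smallest id of its group", which picks the same rows as the longer statement above. SQLite does not have MySQL's restriction on deleting from a table that the subquery also reads, so the sketch only illustrates the sequence of steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions ("
    "question_id INTEGER PRIMARY KEY, questions_title TEXT, questions_scope TEXT)"
)
# Hypothetical sample rows; question_id is assigned 1..6 in insertion order.
conn.executemany(
    "INSERT INTO questions (questions_title, questions_scope) VALUES (?, ?)",
    [("t1", "s1"), ("t1", "s1"), ("t1", "s2"),
     ("t2", "s1"), ("t2", "s1"), ("t2", "s1")],
)

# Step 1: collect the ids to delete, i.e. every row that is not the
# smallest question_id within its (title, scope) group.
conn.execute(
    "CREATE TEMP TABLE tmp AS "
    "SELECT question_id FROM questions WHERE question_id NOT IN ("
    "  SELECT MIN(question_id) FROM questions "
    "  GROUP BY questions_title, questions_scope)"
)

# Step 2: delete by id, reading the ids from the temporary table only.
conn.execute(
    "DELETE FROM questions WHERE question_id IN (SELECT question_id FROM tmp)"
)

# Step 3: drop the temporary table.
conn.execute("DROP TABLE tmp")

remaining = conn.execute(
    "SELECT question_id, questions_title, questions_scope "
    "FROM questions ORDER BY question_id"
).fetchall()
print(remaining)  # [(1, 't1', 's1'), (3, 't1', 's2'), (4, 't2', 's1')]
```

One record survives per (title, scope) pair, and the record that was never duplicated (id 3) is untouched.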

Finding duplicate records in a MySQL data table
As the amount of data in a MySQL database grows, duplicate data becomes hard to avoid. While maintaining the data, it is worth removing the unnecessary records and keeping the valuable ones.

The following SQL statement finds every duplicated value in a table, together with the number of times it occurs.
Select user_name, count(*) as count from user_table group by user_name having count > 1;

Parameter description:

user_name is the field to check for duplicates.

count is the number of occurrences; a value greater than one means the field value is duplicated.

user_table is the name of the table to search.

group by groups the rows by the field being checked.

having filters the groups.

Replace these parameters with the corresponding field and table names from your own database. You can run the statement in phpMyAdmin or Navicat to see which data is duplicated and then delete it from the database. You can also embed the SQL statement in a back-end administration page so that the query produces a list of duplicate data, from which duplicates can be deleted directly.
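Substituting the table and field names can be wrapped in a small helper, sketched below with Python's sqlite3 module as a stand-in for MySQL. The function name find_duplicates, the identifier check and the sample data are all hypothetical. Because table and column names cannot be passed as bound parameters, the sketch validates them against a simple pattern before building the SQL string.

```python
import re
import sqlite3

# Accept only plain identifiers, since names are interpolated into the SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def find_duplicates(conn, table, column):
    """Return (value, count) pairs for values of `column` that occur more than once."""
    for name in (table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = (
        f"SELECT {column}, COUNT(*) AS cnt FROM {table} "
        f"GROUP BY {column} HAVING COUNT(*) > 1 ORDER BY {column}"
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_table (id INTEGER PRIMARY KEY, user_name TEXT)")
# Hypothetical sample data: alice appears three times, bob twice.
conn.executemany(
    "INSERT INTO user_table (user_name) VALUES (?)",
    [("alice",), ("bob",), ("alice",), ("carol",), ("bob",), ("alice",)],
)

duplicates = find_duplicates(conn, "user_table", "user_name")
print(duplicates)  # [('alice', 3), ('bob', 2)]
```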

Disadvantage: when the amount of data in the database is large, this method becomes very slow. In my test with Navicat the data volume was small, so the query ran quickly. There are other SQL statements for finding duplicate data available online; compare them and choose the query that suits your site.
