Find duplicate data in a database with T-SQL

Last Update:2018-12-07 Source: Internet

Author: User

Tags mysql manual


========== Article 1 ==========

There are many ways to find duplicate values in a field of a table; the following statement is relatively efficient. It relies on the rowid pseudo-column, as found in Oracle:

    select data_guid from adam_entity_datas a
    where a.rowid > (select min(b.rowid) from adam_entity_datas b where b.data_guid = a.data_guid)

If the table holds a large amount of data but only a few duplicates, the following statement can be faster:

    select data_guid from adam_entity_datas
    where data_guid in (select data_guid from adam_entity_datas group by data_guid having count(*) > 1)

This method returns every duplicated record, including the first copy of each. The following statement may be more efficient still:

    select data_guid from adam_entity_datas
    where rowid in (select rid from (select rowid rid, row_number() over (partition by data_guid order by rowid) m from adam_entity_datas) where m <> 1)

These are the three effective methods known so far. The first is the easiest to understand but the slowest. The second is the fastest, but it selects all of the duplicated records rather than a list of the extra copies. The third is the one I consider best.

========== Article 2 ==========

    select usercode, count(*) from ptype group by usercode having count(*) > 1

========== Article 3 ==========

Find the IDs of duplicate records:

    select id from (select id, count(*) as cnt from [table to be deduplicated] group by id) t1 where t1.cnt > 1

Several methods for deleting duplicate data in a database

Duplicate data can cause problems in a program, such as the database not being set up correctly ......
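Before moving on to deletion, the lookup queries from Articles 1 and 2 can be tried end to end. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in engine (an assumption, chosen because SQLite also exposes a rowid), and the sample data_guid values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the article's table.
# The rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adam_entity_datas (data_guid TEXT)")
conn.executemany(
    "INSERT INTO adam_entity_datas (data_guid) VALUES (?)",
    [("g1",), ("g2",), ("g1",), ("g3",), ("g2",), ("g1",)],
)

# GROUP BY / HAVING: each value that occurs more than once, with its count.
duplicate_values = conn.execute(
    """
    SELECT data_guid, COUNT(*) AS cnt
    FROM adam_entity_datas
    GROUP BY data_guid
    HAVING COUNT(*) > 1
    ORDER BY data_guid
    """
).fetchall()

# Correlated subquery: every copy except the first (lowest rowid) of each value.
extra_copies = conn.execute(
    """
    SELECT a.rowid, a.data_guid
    FROM adam_entity_datas a
    WHERE a.rowid > (SELECT MIN(b.rowid)
                     FROM adam_entity_datas b
                     WHERE b.data_guid = a.data_guid)
    ORDER BY a.rowid
    """
).fetchall()

print(duplicate_values)
print(extra_copies)
```

The first query answers "which values are duplicated", while the second lists the surplus rows themselves, which is the form needed when the next step is deletion.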
Method 1

    declare @max integer, @id integer
    declare cur_rows cursor local for
        select [key field], count(*) from [table name] group by [key field] having count(*) > 1
    open cur_rows
    fetch cur_rows into @id, @max
    while @@fetch_status = 0
    begin
        select @max = @max - 1
        set rowcount @max
        delete from [table name] where [key field] = @id
        fetch cur_rows into @id, @max
    end
    close cur_rows
    set rowcount 0

Method 2

"Duplicate records" has two meanings. The first is fully duplicated records, where every field is repeated. The second is records whose key fields are duplicated, for example a repeated Name field, while the other fields may differ or can be ignored.

1. The first kind is easy to handle. Use

    select distinct * from tableName

to get a result set without duplicate records. If the table itself needs its duplicates removed, proceed as follows:

    select distinct * into #Tmp from tableName
    drop table tableName
    select * into tableName from #Tmp
    drop table #Tmp

2. The second kind usually requires keeping the first record of each group of duplicates. Assuming the duplicated fields are Name and Address, and the goal is a result set that is unique on those two fields:

    select identity(int,1,1) as autoID, * into #Tmp from tableName
    select min(autoID) as autoID into #Tmp2 from #Tmp group by Name, Address
    select * from #Tmp where autoID in (select autoID from #Tmp2)

The last select returns the result set with no duplicate Name and Address values.

Two methods for changing the owner of tables in a database: after restoring a database backup onto another machine, you may find that none of the tables can be opened, because the tables were created under the database user that was current at the time ......

========== Article 4 ==========

How do I query duplicate records in a database? For example, a table holds the following data:

    a
    a
    a
    b
    b
    c

The query result should be the number of records for each value:

    a 3
    b 2
    c 1

How should this SQL statement be written?
-----------------------

    select distinct(name), count(*) from tabname group by name;

-----------------------

Here is one that also sorts the data:

    select a1, count(a1) as total from tablename group by a1 order by total desc

-----------------------

    select distinct(a1), count(a1) as total from tablename group by a1 order by total desc

One reply claims that adding distinct makes this more efficient; since the query already groups by a1, distinct does not change the result.

-----------------------

    select p.*, m.* from table1 p left join table2 m on p.item1 = m.item2 where p.item3 = '#$#@%$@' order by p.item3 asc limit 10

That is how it can be written.

========== Article 5 ==========

How do I find duplicate records in a database? A method that works in Access:

    select * from [Table] as a inner join (select Field1, Field2 from [Table] group by Field1, Field2 having count(*) > 1) as b on a.Field1 = b.Field1 and a.Field2 = b.Field2

-----------------------

Question: I want to keep only one record for each combination of several fields, but all fields must be displayed. How can I query this? Thank you! For example:

    Field1 Field2 Field3 Field4
    a b c 1
    a b c 1
    a b d 2
    a b d 3
    b d 2

The expected result is:

    a b c 1
    a b d 2 (or 3)
    b d 2

That is, rows are unique on fields 1, 2 and 3, and field 4 is not considered. Three records are returned, but field 4 is still displayed.

Method 1: use a temporary table:

    currentproject.connection.execute "drop table temptable"
    currentproject.connection.execute "select * into temptable from Table2 where 1=2"
    currentproject.connection.execute "insert into temptable (Field1, Field2, Field3) select distinct Table2.Field1, Table2.Field2, Table2.Field3 from Table2;"
    currentproject.connection.execute "update temptable inner join Table2 on (Table2.Field1 = temptable.Field1) and (Table2.Field2 = temptable.Field2) and (Table2.Field3 = temptable.Field3) set temptable.Field4 = [Table2].[Field4];"

Method 2: a single SELECT statement can return the required data. For a table Table1 with fields [1] to [4]:

    select [1], [2], [3], min([4]) as min4 from Table1 group by Table1.[1], Table1.[2], Table1.[3];

Question: Table2 has the columns ID, name, R1 and R2, with the following data:

    1 1 1 w ee 1 1 1 1232 1 2 123 123 1 2 12 434 1 2 123 2 1 123 123 123

ID is numeric and name is text. The records have no unique identifier. The requirement is to keep only one record for each combination of ID and name, while still displaying all fields of the record that is kept.

Answer:

    select a.*, (select top 1 R1 from Table2 as a1 where a1.id = a.id and a1.name = a.name) as R1, (select top 1 R2 from Table2 as a2 where a2.id = a.id and a2.name = a.name) as R2 from [select distinct Table2.id, Table2.name from Table2]. as a;

    select a.*, dlookup("R1", "Table2", "id=" & a.id & " and name='" & a.name & "'") as R1, dlookup("R2", "Table2", "id=" & a.id & " and name='" & a.name & "'") as R2 from [select distinct Table2.id, Table2.name from Table2]. as a;

Note: because Table2 has no unique identifier column, it cannot be determined which R1 and R2 values the code above returns. Generally they follow the order in which the rows were entered, but Microsoft has no official documentation that states this order, so take care.

If you add a "primary key" field (PrimaryKey below) to Table2, you can use the following code instead:

    select a.id, a.name, b.R1, b.R2, b.PrimaryKey from (select Table2.id, Table2.name, min(Table2.PrimaryKey) as PrimaryKey from Table2 group by Table2.id, Table2.name) as a inner join Table2 as b on a.PrimaryKey = b.PrimaryKey

========== Article 6 ==========

1. Query repeated records in the database:

    select realname, count(*) from users group by realname having count(*) > 1

========== Article 7 ==========

    select t0.itemcode, t0.itemname from oitm t0 where exists (select 1 from oitm a where a.codebars = t0.codebars and a.itemcode <> t0.itemcode)

========== Article 8 ==========

Many people need to find the non-repeated records of a table at some point, and the first things that come to mind are DISTINCT and GROUP BY. I ran into some trouble the first time I used them, so I am sharing the experience here in the hope that it helps.

First, the table structure. Table name: test. Fields: ID, A, B, C and D. Field B contains the duplicate values:

    ID  A   B  C     D
    1   11  A  34    Bvb
    2   22  A  35    Fgg
    3   33  D  HT    Sdf
    4   44  A  345   De
    5   55  C  sfsf  Sscv
    6   66  B  RT    FG

Now for the SQL statements that retrieve data without duplicate records.

The DISTINCT keyword removes duplicate records from a SELECT query based on the value of a specified field:

    select distinct [field name] from [table name] where [search condition]

So a single SQL statement removes the repeated items:

    select distinct(B) from test

There is one very important caveat: no other field may follow select distinct [field name], otherwise the retrieved records will still contain repeated values of that field. The following form does not give the wanted result:

    select distinct [field name], [other field names] from [table name] where [search condition]

The result set of the first statement therefore contains only field B, which rarely meets the requirement. What if the other field values are needed in the record set as well? There is another way to solve the problem: use a subquery!
Note: in a query with a GROUP BY clause, each column in the select list must either be a column named in GROUP BY or be wrapped in an aggregate function. The following SQL statement removes the duplicate items:

    select * from test where ID in (select min(ID) from test group by B)

This gives the expected result set:

    ID  A   B  C     D
    1   11  A  34    Bvb
    3   33  D  HT    Sdf
    5   55  C  sfsf  Sscv
    6   66  B  RT    FG

========== Article 9 ==========

MySQL

-----------------------

The account column in my MySQL table holds an 8-digit random number. How can I check whether any account number is repeated?

-----------------------

    select count(*) as num, account from [table] group by account

Any account with num > 1 is repeated!

========== Article 10 ==========

(If you are in a hurry, skip straight to the text in red.)

When using MySQL you sometimes need to query the records that are not repeated in some field. MySQL provides the DISTINCT keyword to filter out redundant duplicates and keep only one record, but it is usually used only to return the number of distinct records, not to return all the values of those records. The reason is that DISTINCT can only return its target field and cannot return the other fields. This problem plagued me for a long time. If DISTINCT could not solve it, I would have to fall back on a double-loop query, which would directly hurt the performance of a site with a large volume of data. I spent a lot of time on the problem and could not find a solution on the Internet. I roped in Rongrong for help, and in the end both of us were stumped .........

Here is an example. The table looks like this:

    id  name
    1   a
    2   b
    3   c
    4   c
    5   b

This is only a simple example; the real situation is much more complicated.
For example, suppose I want a single statement that returns all rows with no duplicate names. DISTINCT is needed to drop the redundant records:

    select distinct name from table

The result is:

    name
    a
    b
    c

That seems to work, but what I actually want is the id values. Change the query:

    select distinct name, id from table

The result is:

    id  name
    1   a
    2   b
    3   c
    4   c
    5   b

Why did DISTINCT not work? It did, but it applies to both fields at once: a row is dropped only when both id and name are the same .......

Change the query again:

    select id, distinct name from table

Unfortunately this returns nothing but an error message: DISTINCT must come first. Could DISTINCT go in the WHERE condition instead? I tried; that reports an error too .......

Troublesome? Indeed. I tried everything and still could not solve it, so I kept asking around. I grabbed a Java programmer at the company, who showed me how DISTINCT is used in Oracle, but we found no equivalent solution for MySQL. Before leaving work he suggested that I try GROUP BY.

I tried for a long time without success, then finally found a usage in the MySQL manual: group_concat(distinct name) together with group by name does what I need. I tried it at once. Error ............ Depressing ....... Even the MySQL manual let me down: first it gave me hope, then it pushed me into disappointment .... I checked again: the group_concat function is supported from 4.1, and I had 4.0. Nothing for it but to upgrade. The upgrade worked ...... but in the end the customer would have to be asked to upgrade too.

Then I had a flash of inspiration: since group_concat can be used, can other functions be used as well? I tried the count function. Success ....... After all that time I could have cried ........ It turned out to be so simple ......
Here is the complete statement:

    select *, count(distinct name) from table group by name

Result:

    id  name  count(distinct name)
    1   a     1
    2   b     1
    3   c     1

The last column is redundant and can be ignored; the goal is achieved ..... It turns out MySQL is that easy to fool: one small trick was enough. I am so annoyed (and so is Rongrong). I hope this saves you from being stuck on the same problem. Oh, by the way, group by must be placed before order by and limit, otherwise an error is reported. That is about it; I will post this on Rongrong's website and get back to work ......

Something even more annoying happened. While preparing to submit this, we found an even simpler solution ......

    select id, name from table group by name
    select * from table group by name

Note that these statements rely on MySQL returning an arbitrary row's values for columns outside the GROUP BY; servers running with the ONLY_FULL_GROUP_BY SQL mode reject them.

========== Article 11 ==========

Methods for querying and deleting duplicate records

(1)

1. Find the redundant duplicate records in a table, where duplicates are determined by a single field (peopleId):

    select * from people
    where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)

2. Delete the redundant duplicate records in a table, where duplicates are determined by a single field (peopleId), keeping only the record with the smallest rowid:

    delete from people
    where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)
    and rowid not in (select min(rowid) from people group by peopleId having count(peopleId) > 1)

3. Find the redundant duplicate records in a table (multiple fields):

    select * from vitae a
    where (a.peopleId, a.seq) in (select peopleId, seq from vitae group by peopleId, seq having count(*) > 1)

4. Delete the redundant duplicate records in a table (multiple fields), keeping only the record with the smallest rowid:

    delete from vitae a
    where (a.peopleId, a.seq) in (select peopleId, seq from vitae group by peopleId, seq having count(*) > 1)
    and rowid not in (select min(rowid) from vitae group by peopleId, seq having count(*) > 1)

5. Find the redundant duplicate records in a table (multiple fields), excluding the record with the smallest rowid:

    select * from vitae a
    where (a.peopleId, a.seq) in (select peopleId, seq from vitae group by peopleId, seq having count(*) > 1)
    and rowid not in (select min(rowid) from vitae group by peopleId, seq having count(*) > 1)

(2)

For example, table A has a field "name", and different records may share the same "name" value. To find the "name" values that are repeated between records:

    select name, count(*) from A group by name having count(*) > 1

If sex must also match:

    select name, sex, count(*) from A group by name, sex having count(*) > 1

(3)

Method 1

    declare @max integer, @id integer
    declare cur_rows cursor local for
        select [key field], count(*) from [table name] group by [key field] having count(*) > 1
    open cur_rows
    fetch cur_rows into @id, @max
    while @@fetch_status = 0
    begin
        select @max = @max - 1
        set rowcount @max
        delete from [table name] where [key field] = @id
        fetch cur_rows into @id, @max
    end
    close cur_rows
    set rowcount 0

Method 2

"Duplicate records" has two meanings. The first is fully duplicated records, where every field is repeated. The second is records whose key fields are duplicated, for example a repeated Name field, while the other fields may differ or can be ignored.

1. The first kind is easy to handle. Use

    select distinct * from tableName

to get a result set without duplicate records. If the table itself needs its duplicates removed (keeping one record of each), proceed as follows:

    select distinct * into #Tmp from tableName
    drop table tableName
    select * into tableName from #Tmp
    drop table #Tmp

This kind of duplication happens because the table was poorly designed; adding a unique index column solves it.

2. The second kind usually requires keeping the first record of each group of duplicates.
The procedure is as follows, assuming the duplicated fields are Name and Address and the goal is a result set that is unique on those two fields:

    select identity(int,1,1) as autoID, * into #Tmp from tableName
    select min(autoID) as autoID into #Tmp2 from #Tmp group by Name, Address
    select * from #Tmp where autoID in (select autoID from #Tmp2)

The last select returns a result set with no duplicate Name and Address values (it also contains the extra autoID column, which can be left out of the select list in real code).

(4)

Query duplicates:

    select * from tablename where id in (select id from tablename group by id having count(id) > 1)
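The "keep the row with the smallest rowid" deletion from part (1) of Article 11 can be checked on a small table. This is a minimal sketch under stated assumptions: it runs the statement on SQLite through Python's built-in sqlite3 module rather than on the database the article targets, and the people rows and the name column are hypothetical sample data.

```python
import sqlite3

# In-memory SQLite table standing in for the article's people table.
# The name column and all rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (peopleid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO people (peopleid, name) VALUES (?, ?)",
    [
        (1, "Ann"),
        (2, "Bob"),
        (1, "Ann again"),
        (3, "Cy"),
        (2, "Bob again"),
        (1, "Ann third"),
    ],
)

# Delete rows whose peopleid is duplicated, except the lowest-rowid row
# of each duplicated group (same shape as item 2 in the article).
deleted = conn.execute(
    """
    DELETE FROM people
    WHERE peopleid IN (SELECT peopleid FROM people
                       GROUP BY peopleid HAVING COUNT(peopleid) > 1)
      AND rowid NOT IN (SELECT MIN(rowid) FROM people
                        GROUP BY peopleid HAVING COUNT(peopleid) > 1)
    """
).rowcount

remaining = conn.execute(
    "SELECT peopleid, name FROM people ORDER BY peopleid"
).fetchall()

print(deleted)
print(remaining)
```

Running the matching SELECT from item 5 first, with the same two conditions, is a cheap way to preview exactly which rows the DELETE would remove.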

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
