Someone in the group asked a question today.
I have a table named test, which contains three fields A, B, and C. Now field B has a lot of duplicate data. Some of these duplicate data fields are empty, some have data, and the C field is correct. Excuse me: I want to remove duplicates Based on the B field. If the two data fields are the same, and the field is empty, if there is data in the other field, delete
described in two parts: user_id and ID.
I. Check user_id first
1. The effects of Count (ID) and count (user_id) after group by user_id are different.
The following statement indicates that three rows of user_id are null, and count (null) seems invalid.
2. Make the userid in (null, 1257646) and user_id in (1257646) statements consistent, and the previous null will be ignored.
In fact, all results are user_id in (1257646 ).
Can it be u
A recently written data collection program generates a file containing more than million rows of data. The data consists of four fields, and the second field must be deleted as required, no suitable tool was found in linux. stream processing tools such as sed/gawk can only process one row but cannot find rows with duplicate fields. It seems that I had to use a py
PremiseWhen this article discusses only SQL Server queries,For non-composite statistics, where the statistics for each field contain only the data distribution of theHow to estimate the number of rows based on statistical information when combining queries with multiple fields.The algorithm principle of estimating the number of data rows using statistics from dif
SQL Duplicate record query
1, look for redundant records in the table, duplicate records are based on a single field (Peopleid) to judge
SELECT * from people
where Peopleid to (select Peopleid from People GROUP by Peopleid have count (Peopleid) > 1)
Case Two:
SELECT * from TestTable
where Numeber to (select number from people group by nu
emp_name salary
------------------------------------------------------------
1 sunshine 10000
2 semon 20000
SQL> select * from employee E1
Where rowid in (select max (rowid) from employe E2Where e1.emp _ id = e2.emp _ id and
E1.emp _ name = e2.emp _ name and e1.salary = e2.salary );
Emp_id emp_name salary
------------------------------------------------------------
1 sunshine 10000
3. XYZ 30000
2 semon 20000
2. delete several methods:
(1) create a
new record, the value of the affected row is 1. If the original record is updated, the value of the affected row is 2.For more information about the insert into... on duplicate key function, see MySQL Reference Documentation: 13.2.4. INSERT syntax.Now the question is, how can I specify the value of the field after ON DUPLICATE KEY UPDATE if multiple rows are ins
There are many ways for SQLServer to search for and delete duplicate records in tables. Below I will list several commonly used SQL statements with good performance. If you need them, please refer to them.
There are many ways for SQL Server to find and delete duplicate records in tables. Below I will cite several commo
SQL removes duplicate records, SQL removes duplicates
There are two Repeated Records. One is a completely repeated record, that is, records with all fields being repeated, and the other is records with duplicate key fields, such as duplicate Name fields, other fields are not
SQL statement for data duplication between Oracle and MySQL
It can be said that this data is repeated, no matter in development, data maintenance and experience interviews, you should encounter common problems! Here, I also paid special attention to some articles on the Internet and collected them for your reference and study. It also facilitates your future review!
Repeated data processing in Oracle
How to query
into duplicate_all
Select * from # tmp
(2) Use ROW_NUMBERCopy codeThe Code is as follows: with tmpAs(Select *, ROW_NUMBER () OVER (partition by c1, c2, c3 order by (getdate () as numFrom duplicate_allWhere c1 = 1)Delete tmp where num> 1
If multiple tables have completely repeated rows, you can use UNION to join multiple tables to a new table with the same structure, SQL Server helps remove
full-volume scan operation. You should find an alternative or use a precondition statement to minimize the number of rows before the like lookup.B, try to avoid the large table data for select top n * from XXX where xxxx order BYNEWID () takes a random record of the operation. The NEWID () operation reads the full amount of data and then sorts it. Also consumes a lot of CPU and read operations. Consider using the rand () function, which I'm still wor
the retrieval of rows from a table or view.It contains keys that are generated by one or more columns in a table or view. These keys are stored in a structure (B-tree), allowing SQL Server to quickly and efficiently find the rows associated with the key value.7. Type of index: clustered, non-clustered.There is also a unique index, a unique index that ensures tha
#tmp
From Duplicate_all
where C1 = 1
Go
Delete Duplicate_all where C1 = 1
Go
INSERT INTO Duplicate_all
SELECT * FROM #tmp
(2) using Row_number
Copy Code code as follows:
With TMP
As
(
Select *,row_number () over (PARTITION by c1,c2,c3 Order by (GETDATE ()) as Num
From Duplicate_all
where C1 = 1
)
Delete tmp where num > 1
If multiple tables have completely duplicate
also a very streamlined approach.
Look for duplicates and get rid of the smallest one.
The code is as follows
Copy Code
Delete Users_groups as a from users_groups as a,(Select *,min (ID) from Users_groups GROUP by UID has count (1) > 1) as Bwhere A.uid = B.uid and a.id > b.id;(7 row (s) affected)(0 ms taken)Query result (7 records)ID UID GID1 11 5022 107 5023 100 5034 110 5015 112 5016 104 5029 102 501
3. Now look at the efficiency of these two approaches
Then I thought about a statement:
The Code is as follows:
Mysql> use mysqlDatabase changedMysql> select * from duplicate;+ ---- + ------- +| Id | name |+ ---- + ------- +| 1 | wang || 3 | wdang || 5 | wdand || 6 | wddda || 2 | wang || 4 | wdang |+ ---- + ------- +6 rows in set (0.00 sec)Mysql> delete duplicate as a from
following prompt:Server: Msg 3604, Level 16, State 1, line 1The duplicate key has been ignored.It indicates that no duplicate rows appear in the Product information temporary table products_temp. Fourth Axe--import new data into the original tableEmpty the original product information sheet products, import the data from the temporary table products_temp, and
sec) Mysql> Delete Duplicate as a from duplicate as a, -> ( -> SELECT * from duplicate group by name has count (1) >1) as B nb sp;-> where a.name=b.name and a.id > b.id; Query OK, 2 rows Affected (0.00 sec) Mysql> select * FROM duplicate +----+-------+ | id | name |
# tmp(2) Use ROW_NUMBERCopy codeThe Code is as follows:With tmpAs(Select *, ROW_NUMBER () OVER (partition by c1, c2, c3 order by (getdate () as numFrom duplicate_allWhere c1 = 1)Delete tmp where num> 1
If multiple tables have completely repeated rows, you can use UNION to join multiple tables to a new table with the same structure, SQL Server helps remove duplicate
SQL statements remove duplicate records, get duplicate records
--Query a table to effectively remove duplicate records, UserID as a self-growing primary key, Roleid as a repeating field
The code is as follows
Copy Code
SELECT MIN (UserID) as UserID, Roleid from Tmptable GROUP by RoleidSELE
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.