How does SQL query repeated data in multiple columns of a database?

Source: Internet
Author: User
First, I have a movie database. This is part of it: [rel] movie_id, star_id, and time are generally a movie, corresponding to multiple performers. The same movie_id and star_id can only appear once in the database. However, previous system design vulnerabilities lead to repeated insertion... first, I have a Movie Database, which is part of it:
[Rel] movie_id, star_id, time

Generally, it is a movie that corresponds to multiple performers. The same movie_id and star_id can only appear once in the database.
However, previous system design vulnerabilities have resulted in repeated insertion of multiple pieces of data.:
For example:
Kung Fu, Stephen Chow
Kung Fu, Huang shengyi
Kung Fu, Stephen Chow
As a result, Zhou xingchi made two moves in our database.

Could you please design an SQL statement to help me quickly find out the redundant items and movie IDs.

Reply content:

First, I have a Movie Database, which is part of it:
[Rel] movie_id, star_id, time

Generally, it is a movie that corresponds to multiple performers. The same movie_id and star_id can only appear once in the database.
However, previous system design vulnerabilities have resulted in repeated insertion of multiple pieces of data.:
For example:
Kung Fu, Stephen Chow
Kung Fu, Huang shengyi
Kung Fu, Stephen Chow
As a result, Zhou xingchi made two moves in our database.

Could you please design an SQL statement to help me quickly find out the redundant items and movie IDs.

Keep records with the smallest id

Query duplicate data
Select * from TABLE_NAME where (movie_id, star_id) in (select movie_id, star_id from TABLE_NAME group by movie_id, star_id having count ()> 1) and id not in (select min (id) from TABLE_NAME group by movie_id, star_id having count ()> 1)

Delete duplicate data
Delete from TABLE_NAME where (movie_id, star_id) in (select movie_id, star_id from TABLE_NAME group by movie_id, star_id having count ()> 1) and id not in (select min (id) from TABLE_NAME group by movie_id, star_id having count ()> 1)

sqlSELECT movie_id, star_id, COUNT(*) FROM xxx GROUP BY (movie_id + '-' + star_id) HAVING COUNT(*) > 1

I have one that is similar to yours, but I am highly concurrent. I can store 3000 pieces of data in one minute.
I first put the data in memcahe,
First match of each warehouse receiving,
If not, store the database,
If yes, compare the data timestamp before processing ....

SELECT distinct movie_id, star_id FROM xxx

SELECT t.id, t.movie_id, t.star_id from xxx AS t LEFT JOIN xxx AS t1 on t.movie_id = t1.movie_id WHERE t.id != t1.id AND t.star_id = t1.star_id

Replace xxx with your table name. The id here is the primary key id. If you do not have a table, replace it with time. However, this may cause duplicate data with the same time to be found.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.