First, I have a movie database. This is part of it: [rel] movie_id, star_id, and time are generally a movie, corresponding to multiple performers. The same movie_id and star_id can only appear once in the database. However, previous system design vulnerabilities lead to repeated insertion... first, I have a Movie Database, which is part of it:
[Rel] movie_id, star_id, time
Generally, it is a movie that corresponds to multiple performers. The same movie_id and star_id can only appear once in the database.
However, previous system design vulnerabilities have resulted in repeated insertion of multiple pieces of data.:
For example:
Kung Fu, Stephen Chow
Kung Fu, Huang shengyi
Kung Fu, Stephen Chow
As a result, Zhou xingchi made two moves in our database.
Could you please design an SQL statement to help me quickly find out the redundant items and movie IDs.
Reply content:
First, I have a Movie Database, which is part of it:
[Rel] movie_id, star_id, time
Generally, it is a movie that corresponds to multiple performers. The same movie_id and star_id can only appear once in the database.
However, previous system design vulnerabilities have resulted in repeated insertion of multiple pieces of data.:
For example:
Kung Fu, Stephen Chow
Kung Fu, Huang shengyi
Kung Fu, Stephen Chow
As a result, Zhou xingchi made two moves in our database.
Could you please design an SQL statement to help me quickly find out the redundant items and movie IDs.
Keep records with the smallest id
Query duplicate data
Select * from TABLE_NAME where (movie_id, star_id) in (select movie_id, star_id from TABLE_NAME group by movie_id, star_id having count ()> 1) and id not in (select min (id) from TABLE_NAME group by movie_id, star_id having count ()> 1)
Delete duplicate data
Delete from TABLE_NAME where (movie_id, star_id) in (select movie_id, star_id from TABLE_NAME group by movie_id, star_id having count ()> 1) and id not in (select min (id) from TABLE_NAME group by movie_id, star_id having count ()> 1)
sql
SELECT movie_id, star_id, COUNT(*) FROM xxx GROUP BY (movie_id + '-' + star_id) HAVING COUNT(*) > 1
I have one that is similar to yours, but I am highly concurrent. I can store 3000 pieces of data in one minute.
I first put the data in memcahe,
First match of each warehouse receiving,
If not, store the database,
If yes, compare the data timestamp before processing ....
SELECT distinct movie_id, star_id FROM xxx
SELECT t.id, t.movie_id, t.star_id from xxx AS t LEFT JOIN xxx AS t1 on t.movie_id = t1.movie_id WHERE t.id != t1.id AND t.star_id = t1.star_id
Replace xxx with your table name. The id here is the primary key id. If you do not have a table, replace it with time. However, this may cause duplicate data with the same time to be found.