方法:
| 代碼如下 |
複製代碼 |
| group by XX having count(*)>1,rowid,distinct,temporary table,procedure |
下面語句可以查詢出那些資料是重複的:
| 代碼如下 |
複製代碼 |
| select 欄位1,欄位2,count(*) from 表名 group by 欄位1,欄位2 having count(*) > 1 |
將上面的>號改為=號就可以查詢出沒有重複的資料了。
想要刪除這些重複的資料,可以使用下面語句進行刪除
| 代碼如下 |
複製代碼 |
delete from 表名 a where 欄位1,欄位2 in (select 欄位1,欄位2,count(*) from 表名 group by 欄位1,欄位2 having count(*) > 1) |
例
1、查詢表中的重複資料
a.重複一個欄位
b.重複多個欄位
c.重複一整行
建立測試表:
| 代碼如下 |
複製代碼 |
create table cfa (businessid number,customer varchar2(50),branchcode varchar2(10),data_date varchar2(10)); insert into cfa values (1,'Albert','SCB','2011-11-11'); insert into cfa values (2,'Andy','DB','2011-11-12'); insert into cfa values (3,'Allen','HSBC','2011-11-13'); ---------------以下為重複資料---------------------------------------------- insert into cfa values (1,'Alex','ICBC','2011-11-14'); insert into cfa values (1,'Albert','CTBK','2011-11-15'); insert into cfa values (1,'Albert','SCB','2011-11-11'); |
對於a的情況,只有businessid重複
| 代碼如下 |
複製代碼 |
select * from cfa where businessid in (select businessid from cfa group by businessid having count(businessid)>1); |
如果是b的情況,businessid 和name同時存在重複
| 代碼如下 |
複製代碼 |
select * from cfa where (businessid,customer) in (select businessid,customer from cfa group by businessid,customer having count(*)>1); |
對於c的情況,重複一整行
參考b的方法:
| 代碼如下 |
複製代碼 |
| select * from cfa where (businessid,customer,branchcode,data_date) in (select * from cfa group by businessid,customer,branchcode,data_date having count(*)>1); |
2、刪除表中的重複資料
a情況,刪除表中多餘的重複記錄,重複記錄是根據單個欄位(businessid)來判斷,只留有rowid最小的記錄
也可以只保留rowid不是最小記錄,需要把代碼中的min改為max這裡不再贅述。
| 代碼如下 |
複製代碼 |
delete from cfa where businessid in (select businessid from cfa group by businessid having count(businessid) > 1) and rowid not in (select min(rowid) from cfa group by businessid having count(businessid) > 1); |
或者,使用下面更簡單高效的語句
| 代碼如下 |
複製代碼 |
DELETE FROM cfa t WHERE t.ROWID > (SELECT MIN(X.ROWID) FROM cfa X WHERE X.businessid = t.businessid); |
b情況,刪除表中多餘的重複記錄(多個欄位),只留有rowid最小的記錄
| 代碼如下 |
複製代碼 |
delete from cfa where (businessid,customer) in (select businessid,customer from cfa group by businessid,customer having count(*) > 1) and rowid not in (select min(rowid) from cfa group by businessid,customer having count(*) > 1); |
或者,使用下面更簡單高效的語句
| 代碼如下 |
複製代碼 |
DELETE FROM cfa t WHERE t.ROWID > (SELECT MIN(X.ROWID) FROM cfa X WHERE X.businessid = t.businessid and x.customer = t.customer); |
上面的語句非常簡單,就是將查詢到的資料刪除掉。不過這種刪除執行的效率非常低,對於大資料量來說,可能會將資料庫弔死。所以我建議先將查詢到的重複的資料插入到一個暫存資料表中,然後對進行刪除,這樣,執行刪除的時候就不用再進行一次查詢了。如下:
CREATE TABLE 暫存資料表 AS
| 代碼如下 |
複製代碼 |
(select 欄位1,欄位2,count(*) from 表名 group by 欄位1,欄位2 having count(*) > 1)
|
上面這句話就是建立了暫存資料表,並將查詢到的資料插入其中。
下面就可以進行這樣的刪除操作了:
| 代碼如下 |
複製代碼 |
delete from 表名 a where 欄位1,欄位2 in (select 欄位1,欄位2 from 暫存資料表);
|
這種先建暫存資料表再進行刪除的操作要比直接用一條語句進行刪除要高效得多。
c情況,這種情況就比較簡單,使用暫存資料表方法
| 代碼如下 |
複製代碼 |
create table cfabak as select distinct * from cfa; truncate table cfa;--如果是生產最好對該表backup Insert into cfa select * from cfabak; commit; |
我們要重複資料刪除資料,只保留最新的一條資料,就可以這樣寫了:
| 代碼如下 |
複製代碼 |
delete from 表名 a where a.rowid != ( select max(b.rowid) from 表名 b where a.欄位1 = b.欄位1 and a.欄位2 = b.欄位2 ) |