Temporary workaround for a hung session when deleting 800,000 rows at a time
Recently, while deduplicating a business table with a couple of million rows, the delete operation hung. Here is a brief review of what happened.
1. Query the row count of the business table. The table holds more than 2.5 million records.
SQL> select count(*) from tb_bj_banker_etl;
2552381
2. Query the number of duplicate rows that should be removed from the table. There are more than 800,000 of them.
SQL> select count(*) from tb_bj_banker_etl
       where id in (select id from tb_bj_banker_etl group by id having count(*) > 1)
         and rowid not in (select max(rowid) from tb_bj_banker_etl group by id having count(*) > 1);
830099
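For cross-checking, the same figure can also be obtained in a single pass with an analytic function instead of the two GROUP BY subqueries. This is not from the original post, just an equivalent sketch against the same table:

-- Sketch (cross-check only): count rows whose rowid is not the largest for their id
select count(*)
  from (select row_number() over (partition by id order by rowid desc) rn
          from tb_bj_banker_etl)
 where rn > 1;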
3. So, before leaving work that evening, I executed the following script to remove the duplicates.
SQL> delete from tb_bj_banker_etl
       where id in (select id from tb_bj_banker_etl group by id having count(*) > 1)
         and rowid not in (select max(rowid) from tb_bj_banker_etl group by id having count(*) > 1);
SQL> commit;
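For a table of this size, another approach that is often considered (not what was run that night) is to rebuild the table instead of deleting in place: create a deduplicated copy, verify it, then swap the names. A rough sketch, assuming the table has no constraints, indexes, or grants that would need to be recreated; tb_bj_banker_etl_dedup and tb_bj_banker_etl_bak are hypothetical names:

-- Sketch only: keep one row (the largest rowid) per id in a new table
create table tb_bj_banker_etl_dedup as
  select *
    from tb_bj_banker_etl
   where rowid in (select max(rowid) from tb_bj_banker_etl group by id);

-- After verifying the row counts, swap the tables:
-- rename tb_bj_banker_etl to tb_bj_banker_etl_bak;
-- rename tb_bj_banker_etl_dedup to tb_bj_banker_etl;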
4. The next day, back on site, I found that the statement started the night before was still running in PL/SQL Developer.
My first reaction was surprise: deleting just over 800,000 duplicate rows had not finished after a whole night? What was going on?
I suspected a table lock, so I checked whether any session was locking the table:
SELECT
  A.OWNER,           -- owner of the object
  A.OBJECT_NAME,     -- object name
  B.XIDUSN,
  B.XIDSLOT,
  B.XIDSQN,
  B.SESSION_ID,      -- session of the user holding the lock
  B.ORACLE_USERNAME, -- Oracle username of the locking user
  B.OS_USER_NAME,    -- operating system login of the locking user
  B.PROCESS,
  B.LOCKED_MODE,
  C.MACHINE,         -- machine name of the locking user
  C.STATUS,          -- session status
  C.SERVER,
  C.SID,
  C.SERIAL#,
  C.PROGRAM          -- client program used by the locking user
FROM
  ALL_OBJECTS A,
  V$LOCKED_OBJECT B,
  SYS.GV_$SESSION C
WHERE
  A.OBJECT_ID = B.OBJECT_ID
  AND B.PROCESS = C.PROCESS
ORDER BY 1, 2;
The results below show that the only lock on the table is held by the session that issued the deduplication statement itself; no other user is locking it. Does that mean the statement is still executing? With this question in mind, I started troubleshooting.
1 BJHYL tb_bj_banker_ETL 15 18 9000 913 BJHYL Administrator 4036:972 3 WORKGROUP\backdb active   dedicated 913 3381  plsqldev.exe
2 BJHYL tb_bj_banker_ETL 15 18 9000 913 BJHYL Administrator 4036:972 3 WORKGROUP\backdb inactive dedicated 649 41791 plsqldev.exe
3 BJHYL tb_bj_banker_ETL 15 18 9000 913 BJHYL Administrator 4036:972 3 WORKGROUP\backdb inactive dedicated 817 27777 plsqldev.exe
4 BJHYL tb_bj_banker_ETL 15 18 9000 913 BJHYL Administrator 4036:972 3 WORKGROUP\backdb inactive dedicated 841 1981  plsqldev.exe
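Looking back, one way to answer the "is it still running?" question would have been to watch the undo usage of the session's transaction: if USED_UREC keeps growing, the delete is still making progress (or rolling back); if it stays frozen, the session is likely stuck. A sketch of such a check (not part of the original troubleshooting), using the standard v$session / v$transaction join:

-- Sketch: undo usage of active transactions; re-run it a few times and compare
select s.sid,
       s.serial#,
       t.start_time,
       t.used_ublk,   -- undo blocks used so far
       t.used_urec    -- undo records used so far
  from v$transaction t,
       v$session     s
 where s.taddr = t.addr;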
5. Work around the problem with batched deletes
Since the one-shot delete would not finish, I tried removing the duplicates in batches instead.
-- First batch:
delete from tb_bj_banker_etl
 where id in (select id from tb_bj_banker_etl group by id having count(*) > 1)
   and rowid not in (select max(rowid) from tb_bj_banker_etl group by id having count(*) > 1)
   and rownum <= 100000;
commit;

-- Second batch:
delete from tb_bj_banker_etl
 where id in (select id from tb_bj_banker_etl group by id having count(*) > 1)
   and rowid not in (select max(rowid) from tb_bj_banker_etl group by id having count(*) > 1)
   and rownum <= 100000;
commit;

-- ... batches three through seven, identical to the above ...

-- Eighth (final) batch, without the rownum limit:
delete from tb_bj_banker_etl
 where id in (select id from tb_bj_banker_etl group by id having count(*) > 1)
   and rowid not in (select max(rowid) from tb_bj_banker_etl group by id having count(*) > 1);
commit;
Result: after splitting the 800,000+ duplicate rows into deletes of 100,000 rows each, the whole job took a little over 140 seconds and all duplicates were removed. The root cause of why the single large delete appeared to hang, however, still needs to be tracked down and analyzed later.
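In hindsight, the eight manual rounds could be wrapped in a small PL/SQL loop that keeps deleting 100,000 rows and committing until nothing is left. A sketch along the lines of the batch statements above (not part of the original fix):

begin
  loop
    delete from tb_bj_banker_etl
     where id in (select id from tb_bj_banker_etl group by id having count(*) > 1)
       and rowid not in (select max(rowid) from tb_bj_banker_etl group by id having count(*) > 1)
       and rownum <= 100000;
    exit when sql%rowcount = 0;  -- stop once a pass deletes nothing
    commit;                      -- commit each batch, as in the manual runs
  end loop;
  commit;
end;
/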
Blog Source: http://blog.csdn.net/huangyanlong/article/details/46041735