When deduplication is, temporary processing is disabled, and 80

Source: Internet
Author: User

When deduplication is, temporary processing is disabled, and 80

Original Works are from the blog of "Deep Blue blog". You are welcome to repost them. Please indicate the source when you repost them. Otherwise, the copyright will be held legally responsible.

Deep Blue blog: http://blog.csdn.net/huangyanlong/article/details/46041735

 

Temporary handling of 80 million data deduplication

Recently, When I deduplicate a million data business table, the de-duplication operation was stuck. The following is a simple review.

1. query the data volume of the business table. A total of more than 200 million data records are displayed.
SQL> select count (*) from tb_bj_banker_etl;
2552381

2. query the duplicate data volume that should be removed from the table. There are more than 80 million records.
SQL> select count (*) from tb_bj_banker_etl where (id) in (select id from tb_bj_banker_etl group by id having count (*)> 1) and rowid not in (select max (rowid) from tb_bj_banker_etl group by id having count (*)> 1 );
830099

3. So, before work in the evening, execute the following statement script to remove duplicates
SQL> delete from tb_bj_banker_etl where (id) in (select id from tb_bj_banker_etl group by id having count (*)> 1) and rowid not in (select max (rowid) from tb_bj_banker_etl group by id having count (*)> 1 );
SQL> commit;

4. The next day, when we arrived at the site, we found that the statements executed last night in PL/SQL Developer were still being executed.
First of all, I noticed that I had not finished running more than 80 million de-duplicated data for one night? What's wrong with this?
Suspected lock table.
The user queries whether the table has a lock.

Select. OWNER, -- User. OBJECT_NAME, -- OBJECT Name B. XIDUSN, B. XIDSLOT, B. XIDSQN, B. SESSION_ID, -- lock table user's session B. ORACLE_USERNAME, -- Oracle username B of the table lock user. OS _USER_NAME, -- lock table user's operating system login username B. PROCESS, B. LOCKED_MODE, C. MACHINE, -- lock table user's computer name C. STATUS, -- lock Table status c. SERVER, C. SID, C. SERIAL #, C. PROGRAM: the database management tool used by the lock table user FROM ALL_OBJECTS A, V $ LOCKED_OBJECT B, SYS. GV _ $ session c where. OBJECT_ID = B. OBJECT_ID and B. PROCESS = C. processorder by 1, 2

In the following results, we can see that only the initiate session of the deduplicate statement is locked, and no other users cause the lock table. Does this mean that the statement is still being executed? Start to solve the problem with questions.

1 BJHYL tb_bj_banker_ETL 15 18 9000 913 BJHYL Administrator 4036: 972 3 WORKGROUP \ backdb active dedicated 913 3381 plsqldev.exe
2 BJHYL tb_bj_banker_ETL 15 18 9000 913 BJHYL Administrator 4036: 972 3 WORKGROUP \ backdb inactive dedicated 649 41791 plsqldev.exe
3 BJHYL tb_bj_banker_ETL 15 18 9000 913 BJHYL Administrator 4036: 972 3 WORKGROUP \ backdb inactive dedicated 817 27777 plsqldev.exe
4 BJHYL tb_bj_banker_ETL 15 18 9000 913 BJHYL Administrator 4036: 972 3 WORKGROUP \ backdb inactive dedicated 841 1981 plsqldev.exe


5. batch processing is adopted to solve the problem of deduplication
Since direct de-duplication cannot proceed smoothly, I thought of a batch de-duplication method and tried it.

First time: delete from tb_bj_banker_etl where (id) in (select id from tb_bj_banker_etl group by id having count (*)> 1) and rowid not in (select max (rowid) from tb_bj_banker_etl group by id having count (*)> 1) and rownum <= 100000; commit; Second: delete from tb_bj_banker_etl where (id) in (select id from tb_bj_banker_etl group by id having count (*)> 1) and rowid not in (select max (rowid) from tb_bj_banker_etl group by I D having count (*)> 1) and rownum <= 100000; commit ;..................... Eighth time: delete from tb_bj_banker_etl where (id) in (select id from tb_bj_banker_etl group by id having count (*)> 1) and rowid not in (select max (rowid) from tb_bj_banker_etl group by id having count (*)> 1); commit;

 

Result: After Dividing more than 0.8 million data into 10 million data records for single deduplication operation, the total duration is over 140 seconds, and the goal of removing 0.8 million data is achieved. However, the reason for direct handling of zombie cases remains to be tracked and analyzed in the future.

 

Original Works are from the blog of "Deep Blue blog". You are welcome to repost them. Please indicate the source when you repost them. Otherwise, the copyright will be held legally responsible.

Deep Blue blog: http://blog.csdn.net/huangyanlong/article/details/46041735

 

Series link _ 20150523:

Blue growth note-chasing DBA (1): traveling on the road to Shandong 

Blue growth notes-Chase DBA (2): Install! Install! Long-lost memories have aroused my new awareness of DBAs.

Blue growth note-chasing DBA (3): importing and exporting data on antiques becomes a problem 

Blue growth note-chasing DBA (4): recalling the sorrow of teenagers, and exploring oracle Installation (10g and 11g in Linux) 

Blue growth note-chasing DBA (5): Not talking about technology or business, annoying Application Systems

Blue growth note-chasing DBA (6): doing things and being human: Small technology, great human

Blue growth note-Chase DBA (7): Basic commands, foundation stone

Blue growth notes-chasing DBA (8): repicking SP reports and recalling oracle's STATSPACK Experiment

Blue growth-chasing DBA (9): Chasing DBA, new planning, new departure

Blue growth note-chasing DBA (10): Flying knife defense, familiarity rather than expertise: Playing with middleware Websphere

Blue growth note-chasing DBA (11): It's easy to go home and wake up.

Blue growth notes-Chase DBA (12): seven days and seven gains of SQL

BlueGrowth story-chasing DBA (13): Coordinating hardware manufacturers. Six Stories: What you see: servers, storage, switches ......"

Blue growth note-chasing DBA (14): An unforgettable "Cloud" end, started hadoop deployment 

Blue growth notes-Chase DBA (15 ):Who thinks FTP is "simple "?

Blue growth note-chasing DBA (16): DBA also drank alcohol and was rejected

 

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.