Use rowid to quickly update massive data online (zt)

Source: Internet
Author: User
Rapid online updating of massive data (zt) http://www.itpub.net/thread-1052077-1-2.html using rowid

Recently, I have been tossing the issue of updating large tables, and today I have made a breakthrough. I am excited to post a post to share my experiences with you and discuss whether it may further increase the processing speed.
The problem is that there is no partition in a 0.5 billion-record table. Because a redundant field is added, you need to update this large table based on another table (0.48 billion entries. The following is a specific description:
Environment: HP-UX 11i + oracle9.2.0.8 + raid
Table to be updated: T1 (id1 number, Id2 number, curr_count number,...) -- id1 unique 0.5 billion records> 60 GB
Update Data source: T2 (Id2 number, curr_count number) -- Id2 unique 0.48 billion
Update logic: Find the corresponding record (t2.id2 = t1.id2) in T1 for each record in T2. update t1.curr _ COUNT = t2.curr _ count
Restrictions: You can only update the table online (because the application has been accessing this table, you cannot use insert select). You cannot use too many system resources and must update the table within three days.
Original practice:
Declare
Cursor cur_t2 is
Select/* + use_hash (T1, T2) parallel (T1, 16) parallel_index (ix_t1_id2, 16 )*/
T2.id2, t2.curr _ count, t1.rowid row_id
From T1, T2
Where t1.id2 = t2.id2;
V_counter number;
Begin
V_counter: = 0;
For row_t2 in cur_t2 Loop
Update T1 set curr_count = row_t2.curr_count
Where rowid = row_t2.row_id;
V_counter: = v_counter + 1;
If (v_counter> = 1000) then
Commit;
V_counter: = 0;
End if;
End loop;
Commit;
End;
/
Problem: update is too slow. It takes 22 days to update 260 rows/s!
After investigation, we found that the execution efficiency of the update statement is too low. Further tracking found that at least 90% of the update time is waiting for the DB file sequential read event. Why are there so many disks waiting? Let's look at disk reads again and understand that the original update statement generated a large number of physical reads, which of course slowed down. Presumably, table T1 is too big, data buffer cannot be installed, and other tables compete with it. So I just updated a piece of data and read a data block from the disk to the memory, soon it was squeezed out. The next time you update other data on the block, you have to read it from the disk. In this way, the efficiency of the data buffer cache is very low, and it is basically not used.
How can this problem be solved? It is better to be updated in the order of data blocks, so that after the first row of data in a data block is updated, other rows in the data block do not need to be read from the disk (it is unlikely that it will be squeezed out of the memory so quickly). Physical read is reduced and the speed will certainly be faster. But how to update data blocks in sequence? I have thought of the rowid structure: Data Object number (6-Bit String) + relative file number (3-Bit String) + block number (6-Bit String) + row number (3-Bit String), then the rowid sequence should be the data block sequence. So I modified PLSQL:
Alter table T1 storage (buffer_pool keep); -- keep buffer pool size = 6 GB
Declare
Cursor cur_t2 is
Select/* + use_hash (T1, T2) parallel (T1, 16) parallel_index (ix_t1_id2, 16 )*/
T2.id2, t2.curr _ count, t1.rowid row_id
From T1, T2
Where t1.id2 = t2.id2
Order by t1.rowid;
V_counter number;
Begin
V_counter: = 0;
For row_t2 in cur_t2 Loop
Update T1 set curr_count = row_t2.curr_count
Where rowid = row_t2.row_id;
V_counter: = v_counter + 1;
If (v_counter> = 1000) then
Commit;
V_counter: = 0;
End if;
End loop;
Commit;
End;
/
Alter table T1 storage (buffer_pool default );

The update speed is greatly improved: 10000 rows/s. The analysis trace file indicates that the DB file sequential reads and disk reads are rarely changed. The update will be completed within 20 hours.

TIPS: the amount of data processed has not been reduced, but the processing sequence can be changed to greatly improve the performance.

========================================================== ======================================
* Postscript 3: experiment with the influence of keep
(1) re-run test 1 (not order by rowid)
Change the statement to alter table T1 storage (buffer_pool keep );
Processing speed: 73 ~ 74 rows/second
(2) run the test again. 2 (order by rowid ):
Change the statement to alter table T1 storage (buffer_pool default );
Processing speed: 10 thousand entries per second
Conclusion: From this test, we can confirm the previous inference that whether the buffer_pool attribute of the table is set to keep has little impact on the processing speed and can be ignored. The reason for faster processing is that order by rowid processes data in block order, greatly reducing physical reads.

* Postscript 2: about order by rowid:
Http://rdc.taobao.com/blog/dba/html/199_oracle_rowid_order.html
This article says order by rowid causes a large number of physical reads to be queried. In fact, this is also true in the second test in this article. The opening time of cursor is longer than that of rowid without order by, because sort is more important. However, this is worthwhile, because there will be a large number of updates in the future, saving a lot of physical reads.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.