Use rowid to quickly update massive data online (zt)

Last Update:2018-12-03 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Rapid online updating of massive data (zt) http://www.itpub.net/thread-1052077-1-2.html using rowid

Recently, I have been tossing the issue of updating large tables, and today I have made a breakthrough. I am excited to post a post to share my experiences with you and discuss whether it may further increase the processing speed.
The problem is that there is no partition in a 0.5 billion-record table. Because a redundant field is added, you need to update this large table based on another table (0.48 billion entries. The following is a specific description:
Environment: HP-UX 11i + oracle9.2.0.8 + raid
Table to be updated: T1 (id1 number, Id2 number, curr_count number,...) -- id1 unique 0.5 billion records> 60 GB
Update Data source: T2 (Id2 number, curr_count number) -- Id2 unique 0.48 billion
Update logic: Find the corresponding record (t2.id2 = t1.id2) in T1 for each record in T2. update t1.curr _ COUNT = t2.curr _ count
Restrictions: You can only update the table online (because the application has been accessing this table, you cannot use insert select). You cannot use too many system resources and must update the table within three days.
Original practice:
Declare
Cursor cur_t2 is
Select/* + use_hash (T1, T2) parallel (T1, 16) parallel_index (ix_t1_id2, 16 )*/
T2.id2, t2.curr _ count, t1.rowid row_id
From T1, T2
Where t1.id2 = t2.id2;
V_counter number;
Begin
V_counter: = 0;
For row_t2 in cur_t2 Loop
Update T1 set curr_count = row_t2.curr_count
Where rowid = row_t2.row_id;
V_counter: = v_counter + 1;
If (v_counter> = 1000) then
Commit;
V_counter: = 0;
End if;
End loop;
Commit;
End;
/
Problem: update is too slow. It takes 22 days to update 260 rows/s!
After investigation, we found that the execution efficiency of the update statement is too low. Further tracking found that at least 90% of the update time is waiting for the DB file sequential read event. Why are there so many disks waiting? Let's look at disk reads again and understand that the original update statement generated a large number of physical reads, which of course slowed down. Presumably, table T1 is too big, data buffer cannot be installed, and other tables compete with it. So I just updated a piece of data and read a data block from the disk to the memory, soon it was squeezed out. The next time you update other data on the block, you have to read it from the disk. In this way, the efficiency of the data buffer cache is very low, and it is basically not used.
How can this problem be solved? It is better to be updated in the order of data blocks, so that after the first row of data in a data block is updated, other rows in the data block do not need to be read from the disk (it is unlikely that it will be squeezed out of the memory so quickly). Physical read is reduced and the speed will certainly be faster. But how to update data blocks in sequence? I have thought of the rowid structure: Data Object number (6-Bit String) + relative file number (3-Bit String) + block number (6-Bit String) + row number (3-Bit String), then the rowid sequence should be the data block sequence. So I modified PLSQL:
Alter table T1 storage (buffer_pool keep); -- keep buffer pool size = 6 GB
Declare
Cursor cur_t2 is
Select/* + use_hash (T1, T2) parallel (T1, 16) parallel_index (ix_t1_id2, 16 )*/
T2.id2, t2.curr _ count, t1.rowid row_id
From T1, T2
Where t1.id2 = t2.id2
Order by t1.rowid;
V_counter number;
Begin
V_counter: = 0;
For row_t2 in cur_t2 Loop
Update T1 set curr_count = row_t2.curr_count
Where rowid = row_t2.row_id;
V_counter: = v_counter + 1;
If (v_counter> = 1000) then
Commit;
V_counter: = 0;
End if;
End loop;
Commit;
End;
/
Alter table T1 storage (buffer_pool default );

The update speed is greatly improved: 10000 rows/s. The analysis trace file indicates that the DB file sequential reads and disk reads are rarely changed. The update will be completed within 20 hours.

TIPS: the amount of data processed has not been reduced, but the processing sequence can be changed to greatly improve the performance.

========================================================== ======================================
* Postscript 3: experiment with the influence of keep
(1) re-run test 1 (not order by rowid)
Change the statement to alter table T1 storage (buffer_pool keep );
Processing speed: 73 ~ 74 rows/second
(2) run the test again. 2 (order by rowid ):
Change the statement to alter table T1 storage (buffer_pool default );
Processing speed: 10 thousand entries per second
Conclusion: From this test, we can confirm the previous inference that whether the buffer_pool attribute of the table is set to keep has little impact on the processing speed and can be ignored. The reason for faster processing is that order by rowid processes data in block order, greatly reducing physical reads.

* Postscript 2: about order by rowid:
Http://rdc.taobao.com/blog/dba/html/199_oracle_rowid_order.html
This article says order by rowid causes a large number of physical reads to be queried. In fact, this is also true in the second test in this article. The opening time of cursor is longer than that of rowid without order by, because sort is more important. However, this is worthwhile, because there will be a large number of updates in the future, saving a lot of physical reads.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Use rowid to quickly update massive data online (zt)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Use rowid to quickly update massive data online (zt)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support