Oracle 的資料批量讀取__Oracle

來源:互聯網
上載者:User

有這麼一個大表假設數億條記錄,純資料文本百Gb以上,批量讀取的話咋整。

SQLLoader,算是一個選擇吧。嗯,開並行,資料表本身也許已經partition過了。可是如果自己批量unload資料應該怎麼做呢。多線程並行是個好主意,但是怎麼並行呢。如果你在想什麼數字主鍵均分的想法,那你也許不會有好的結果。原因,主鍵的範圍範圍不一定是均勻分布的,這會造成你某些背景工作執行緒很忙,有些卻草草結束了。深層次的原因,你也許會碰到不同線程同時爭用同一塊資料檔案的情況,這不是也許,可以說是肯定會發生的。那還有更直接的方法嗎。不幸的是,還真有。

思路就是,Oracle存放的資料檔案都是有據可查的,包括檔案編號,blocks個數等等。你要做的就是指定相關資料檔案的起止ROWID就可以了。一般來說表的擴充都是自動均勻的,比如20000條記錄一個等等。這樣我們基本上可以達到均勻分配任務,而且又避免了同一個資料檔案的IO爭用問題。

給出兩個SQL,一個是取自Quest 的 Sqoop Oracle 外掛程式程式,另一個取自前人的經驗,(你知道我在幹嘛了)。

    SELECT data_object_id,           file_id,           relative_fno,           file_batch,           MIN (start_block_id) start_block_id,           MAX (end_block_id) end_block_id,           SUM (blocks) blocks           FROM (SELECT o.data_object_id,           e.file_id,           e.relative_fno,           e.block_id start_block_id,           e.block_id + e.blocks - 1 end_block_id,           e.blocks,           CEIL (              SUM (                 e.blocks)              OVER (PARTITION BY o.data_object_id, e.file_id                    ORDER BY e.block_id ASC)              / (SUM (e.blocks)                    OVER (PARTITION BY o.data_object_id, e.file_id)                 / :numchunks))              file_batch           FROM dba_extents e, dba_objects o, dba_tab_subpartitions tsp           WHERE     o.owner = :owner           AND o.object_name = :object_name           AND e.owner = :owner           AND e.segment_name = :object_name           AND o.owner = e.owner           AND o.object_name = e.segment_name           AND (o.subobject_name = e.partition_name                OR (o.subobject_name IS NULL AND e.partition_name IS NULL))           AND o.owner = tsp.table_owner(+)           AND o.object_name = tsp.table_name(+)           AND o.subobject_name = tsp.subpartition_name(+)  )            GROUP BY data_object_id,                    file_id,                    relative_fno,                    file_batch           ORDER BY data_object_id,                    file_id,                    relative_fno,                    file_batch                   ;


下面這個SQL已經貼心的把起止的ROWID給你算出來了,還要啥單車啊。

                  select grp,       dbms_rowid.rowid_create( 1, data_object_id, lo_fno, lo_block, 0 ) min_rid,       dbms_rowid.rowid_create( 1, data_object_id, hi_fno, hi_block, 10000 ) max_rid  from (select distinct grp,       first_value(relative_fno)        over (partition by grp order by relative_fno, block_id              rows between unbounded preceding and unbounded following) lo_fno,       first_value(block_id    )       over (partition by grp order by relative_fno, block_id            rows between unbounded preceding and unbounded following) lo_block,       last_value(relative_fno)        over (partition by grp order by relative_fno, block_id        rows between unbounded preceding and unbounded following) hi_fno,       last_value(block_id+blocks-1)        over (partition by grp order by relative_fno, block_id         rows between unbounded preceding and unbounded following) hi_block,       sum(blocks) over (partition by grp) sum_blocks  from (select relative_fno,       block_id,       blocks,       trunc( (sum(blocks) over (order by relative_fno, block_id)-0.01) /              (sum(blocks) over ()/:numchunks) ) grp  from dba_extents where segment_name = upper(:object_name)   and owner = :owner   order by block_id       )       ),       (select data_object_id from dba_objects where owner=:owner and object_name = upper(:object_name) )       order by grp;



參考:

http://agstamy.blogspot.sg/2011/11/spliting-very-large-table-to-pieces.html

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:29986713177238

http://www.quest.com/data-connector-for-oracle-and-hadoop/

https://github.com/QuestSoftwareTCD/OracleSQOOPconnector



聯繫我們

該頁面正文內容均來源於網絡整理,並不代表阿里雲官方的觀點,該頁面所提到的產品和服務也與阿里云無關,如果該頁面內容對您造成了困擾,歡迎寫郵件給我們,收到郵件我們將在5個工作日內處理。

如果您發現本社區中有涉嫌抄襲的內容,歡迎發送郵件至: info-contact@alibabacloud.com 進行舉報並提供相關證據,工作人員會在 5 個工作天內聯絡您,一經查實,本站將立刻刪除涉嫌侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.