Using kettle paging to handle large data-volume extraction tasks

Source: Internet
Author: User

GreyOriginal address: http://www.cnblogs.com/greyzeng/p/5524614.html

Requirements:

Import a table history data from an Oracle database into a table in MySQL.

source table (Oracle): table1

Target table (MySQL): table2

Data Volume: 20,000,000

Ideas:

due to the limited memory resources of the server, it is not possible to use kettle to import the target table Tens data from the source table one time, and consider the method of paging import for data transfer, namely:

According to the actual situation to set a data volume per processing, such as: 5,000, and then according to the total number of data and the amount of data processed each time to calculate a few pages,

Assuming that the total data volume is: 20,000,000, so the page is: 20,000,000/5,000=4,000

Note: If there are decimals, the fractional part counts as one page, for example: 20.3 count 21 pages

steps:

According to the requirements of the conditions, the first data paging:

Data Volume: 20,000,000

Data volume per page: 5,000

Number of pages: 4,000

source table (Oracle): table1

Target table (MySQL): table2

Main flow: TRANSFER_TABLE1_TO_TABLE2.KJB

Process Description:
TRANSFER_TABLE1_TO_TABLE2.KJB: Main process

BUILD_QUERY_PAGE.KTR: Constructing page Cursors

LOOP_EXECUTE.KJB: Perform data import operations based on number of pages

We look at the composition of each part separately:

build_query_page.ktr: Constructing page Cursors

In this step, we need to construct a data structure like this:

Where P_page is the table header, the remainder is the number of pages,

Note: Take the page number here I construct it through the rownum of this table

Sql:

Select rownum as P_page from Mds.mds_balances_hist whererownum<=4000

Specific implementations such as:

LOOP_EXECUTE.KJB: Perform data import operations based on number of pages

In the previous step, we constructed the number of pages, and in this step we walked through the page numbers in the previous step, finding the corresponding data set by the number of pages,

These include SET_VALUES.KTR and EXECUTE_BY_PAGE.KTR two conversions

LOOP_EXECUTE.KJB specific implementations are as follows:

SET_VALUES.KTR: Represents the number of pages obtained from the previous step

EXECUTE_BY_PAGE.KTR: Indicates data import operations based on number of pages

Where Query_by_page uses Oracle's classic three-tier nested paging algorithm:

SELECT b.rn,b.* from

(

SELECT A.*, ROWNUM RN

from (SELECT * from Mds.mds_balances_hist) A

WHERE ROWNUM <= (${var_p_page}*5000)

) b

WHERE RN >= ((${var_p_page}-1) *5000+1)

Note: ${var_p_page} is the number of page numbers to get each time.

Select_field to set the name of the column you want to import:

The output_target purpose is to output to the target table table2:

Because you want to iterate over the results of the last execution, you need to do the following in TRANSFER_TABLE1_TO_TABLE2.KJB's LOOP_EXECUTE.KJB:

Finally, execute the TRANSFER_TABLE1_TO_TABLE2.KJB.

Summary:

Through the above method, we can solve the problem of low memory, the large amount of data in the import work between different databases.

Using kettle paging to handle large data-volume extraction tasks

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.