GreyOriginal address: http://www.cnblogs.com/greyzeng/p/5524614.html
Requirements:
Import a table history data from an Oracle database into a table in MySQL.
source table (Oracle): table1
Target table (MySQL): table2
Data Volume: 20,000,000
Ideas:
due to the limited memory resources of the server, it is not possible to use kettle to import the target table Tens data from the source table one time, and consider the method of paging import for data transfer, namely:
According to the actual situation to set a data volume per processing, such as: 5,000, and then according to the total number of data and the amount of data processed each time to calculate a few pages,
Assuming that the total data volume is: 20,000,000, so the page is: 20,000,000/5,000=4,000
Note: If there are decimals, the fractional part counts as one page, for example: 20.3 count 21 pages
steps:
According to the requirements of the conditions, the first data paging:
Data Volume: 20,000,000
Data volume per page: 5,000
Number of pages: 4,000
source table (Oracle): table1
Target table (MySQL): table2
Main flow: TRANSFER_TABLE1_TO_TABLE2.KJB
Process Description:
TRANSFER_TABLE1_TO_TABLE2.KJB: Main process
BUILD_QUERY_PAGE.KTR: Constructing page Cursors
LOOP_EXECUTE.KJB: Perform data import operations based on number of pages
We look at the composition of each part separately:
build_query_page.ktr: Constructing page Cursors
In this step, we need to construct a data structure like this:
Where P_page is the table header, the remainder is the number of pages,
Note: Take the page number here I construct it through the rownum of this table
Sql:
Select rownum as P_page from Mds.mds_balances_hist whererownum<=4000
Specific implementations such as:
LOOP_EXECUTE.KJB: Perform data import operations based on number of pages
In the previous step, we constructed the number of pages, and in this step we walked through the page numbers in the previous step, finding the corresponding data set by the number of pages,
These include SET_VALUES.KTR and EXECUTE_BY_PAGE.KTR two conversions
LOOP_EXECUTE.KJB specific implementations are as follows:
SET_VALUES.KTR: Represents the number of pages obtained from the previous step
EXECUTE_BY_PAGE.KTR: Indicates data import operations based on number of pages
Where Query_by_page uses Oracle's classic three-tier nested paging algorithm:
SELECT b.rn,b.* from
(
SELECT A.*, ROWNUM RN
from (SELECT * from Mds.mds_balances_hist) A
WHERE ROWNUM <= (${var_p_page}*5000)
) b
WHERE RN >= ((${var_p_page}-1) *5000+1)
Note: ${var_p_page} is the number of page numbers to get each time.
Select_field to set the name of the column you want to import:
The output_target purpose is to output to the target table table2:
Because you want to iterate over the results of the last execution, you need to do the following in TRANSFER_TABLE1_TO_TABLE2.KJB's LOOP_EXECUTE.KJB:
Finally, execute the TRANSFER_TABLE1_TO_TABLE2.KJB.
Summary:
Through the above method, we can solve the problem of low memory, the large amount of data in the import work between different databases.
Using kettle paging to handle large data-volume extraction tasks