Kettle is an open-source ETL tool written in Java. It runs on Windows, Linux, and Unix. It is portable software that needs no installation, and its data extraction is efficient and stable.
Business scenario: a large table in a relational database is split across odd and even databases. Each database has 100 identical tables, each table stores 1000 million data records, and once a table is full, data moves on to the next table. This data needs to be synchronized to hive (HDFS) by extracting it in a loop. Extracting by an incremental field is not practical here, because it is unknown which table, and whether the odd or the even database, receives each day's incremental data.
Option A: run sqoop directly from MySQL to hive. Some special characters may cause sqoop to terminate abnormally. In addition, reading a large number of databases on the server in a loop this way puts heavy pressure on the server and can easily bring it down.
Option B: use kettle to handle the transfer. Kettle supports querying data page by page, which reduces the pressure on the server to some extent.
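The paging idea can be sketched as follows. This is a minimal illustration in plain JavaScript, not Kettle configuration: the table name `big_table_1`, the ordering column `id`, and the helper `buildPagedQueries` are hypothetical, and the MySQL `LIMIT offset, count` form is assumed only because the source database mentioned above is MySQL.

```javascript
// Hypothetical helper: split one large read into fixed-size pages.
function buildPagedQueries(table, totalRows, pageSize) {
  if (pageSize <= 0) throw new Error("pageSize must be positive");
  var queries = [];
  for (var offset = 0; offset < totalRows; offset += pageSize) {
    // MySQL syntax: LIMIT <offset>, <row count>
    queries.push(
      "SELECT * FROM " + table + " ORDER BY id LIMIT " + offset + ", " + pageSize
    );
  }
  return queries;
}

// 25 rows read 10 at a time gives three queries.
var pages = buildPagedQueries("big_table_1", 25, 10);
pages.forEach(function (q) { console.log(q); });
```

Each query reads at most `pageSize` rows, so the server never has to return a whole table in a single request.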
1. First, look at the overview diagram (the version used below is 5.1)
2. Set Environment Variables
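3. JavaScript code
The script content is:
// read the loop counter from the parent job
var count;
count = parent_job.getVariable("v_id");
if (count == 10) {
    // result false: leave the loop
    false;
} else {
    // increment the counter, store it back, and continue the loop
    count++;
    parent_job.setVariable("v_id", count);
    true;
}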
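To see how this counter drives the loop, the logic can be run outside Kettle. The sketch below is an assumption-laden simulation: `makeMockJob` and `evaluateStep` are hypothetical names, the mock stands in for `parent_job`, and the starting value 0 for `v_id` is assumed because the original variable settings are not given in the text. It also parses the value with `parseInt`, since variable values are stored as strings.

```javascript
// Stand-in for parent_job so the script logic can run outside Kettle.
// It stores variable values as strings.
function makeMockJob(initial) {
  var vars = {};
  Object.keys(initial).forEach(function (k) { vars[k] = String(initial[k]); });
  return {
    getVariable: function (name) { return vars[name]; },
    setVariable: function (name, value) { vars[name] = String(value); }
  };
}

// Same decision as the job script: false stops the loop, true continues it.
function evaluateStep(parent_job) {
  var count = parseInt(parent_job.getVariable("v_id"), 10);
  if (count == 10) {
    return false;
  }
  count++;
  parent_job.setVariable("v_id", count);
  return true;
}

var job = makeMockJob({ v_id: 0 }); // assumed starting value
var runs = 0;
while (evaluateStep(job)) {
  runs++; // the transformation would execute here
}
console.log("transformation runs: " + runs + ", final v_id: " + job.getVariable("v_id"));
```

Starting from 0, the check returns true ten times before `v_id` reaches 10, so the transformation would run ten times.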
4. Create a new transformation
Edit the transformation. The content is as follows:
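One way the loop variable could drive the transformation is through Kettle's `${...}` variable substitution in the table input SQL. The sketch below only imitates that substitution in plain JavaScript. The SQL template, the table naming scheme `big_table_N`, and the helper `substituteVariables` are all assumptions, since the original transformation settings are not reproduced in the text.

```javascript
// Hypothetical imitation of ${NAME} substitution.
// In this sketch, names without a value are left untouched.
function substituteVariables(template, vars) {
  return template.replace(/\$\{([^}]+)\}/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(vars, name)
      ? String(vars[name])
      : match;
  });
}

var sqlTemplate = "SELECT * FROM big_table_${v_id}"; // assumed table naming
var sql = substituteVariables(sqlTemplate, { v_id: "3" });
console.log(sql);
```

With this kind of template, each pass of the loop would read a different table as `v_id` increases.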
5. Dummy entry for the condition check; leave it unmodified
The important part is setting up the loop logic correctly: the direction and the type of each hop (arrow).
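The loop wiring can be modeled as a small graph walk. The entry names and hop layout below are assumptions based on the steps listed in this article (set variables, JavaScript check, transformation, dummy), not a copy of the original diagram: the JavaScript entry's "true" hop leads to the transformation, the dummy entry's hop returns to the JavaScript entry, and the "false" hop ends the job.

```javascript
// Assumed hop layout: each entry maps a result (true/false) to the next entry.
var hops = {
  "START":          { "true": "Set variables" },
  "Set variables":  { "true": "JavaScript" },
  "JavaScript":     { "true": "Transformation", "false": "END" },
  "Transformation": { "true": "Dummy" },
  "Dummy":          { "true": "JavaScript" } // this hop closes the loop
};

function runJob(limit) {
  var vId = 0; // assumed initial value of v_id
  var trace = [];
  var entry = "START";
  while (entry !== "END") {
    trace.push(entry);
    var result = true;
    if (entry === "JavaScript") {
      result = vId !== limit; // stop once the counter reaches the limit
      if (result) vId++;
    }
    entry = hops[entry][String(result)];
  }
  return trace;
}

var trace = runJob(2);
console.log(trace.join(" -> "));
```

If the hop from the dummy entry pointed anywhere other than back to the JavaScript entry, or if the "true" and "false" hops were swapped, the walk would either never repeat or never end, which is why the hop direction and type matter.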
6. Execute the job and test the loop.
In addition, the loop setup for kettle version 3.2 is attached.
Set Variables
Set the condition check
Transformation: table input to file output
JS check