Use kettle to batch download files
In a recent project, I needed to download files in batches and import the results into a database. After some experimental testing, Kettle proved up to the task. The open question was how to download files in batches over HTTP, which this article describes in detail. If you need the basics of Kettle first, you can check my tutorials.
The sample code in this article can be downloaded here.
Main job
Kettle has no transformation step that downloads files over HTTP, but jobs have a corresponding entry. The main job therefore calls a sub-job (Download.kjb), and the list of files to download is provided by a transformation.
File list transformation
Here, I simply use a Data Grid step to provide five records with two fields, "filename" and "url". The url values depend on your business needs; the ones used here are test examples. To make the rows accessible to Download.kjb, the "Copy rows to result" step from the Job category follows the grid.
Download job
The download job downloads only a single file, but we need to run it once for each record in the file list. In the advanced settings of the job entry, select "Execute for every input row" so that the job is called in a loop.
In the HTTP step, we need to set the file name and the URL. In these two fields we use the variables ${URL} and ${FILENAME}. To map the row data to these variables, we need to do two things.
1) Declare "URL" and "FILENAME" as named parameters.
In the job properties, set them on the named parameters tab.
2) Specify the mapping between the fields and the variables (named parameters).
Double-click the download job entry in the main job, then configure the mapping under the named parameters option. In addition, a PATH variable is defined in the main job to determine where the files are stored. The HTTP step uses this variable to determine the location and name of the target file.
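To make the mechanism concrete, the per-row loop can be imitated outside Kettle with a few lines of shell. This is only a rough sketch of the idea, not how Kettle implements it: the function name plan_downloads, the input format (one "filename url" pair per line), and the example.com URLs in the usage note are all assumptions. The argument target_dir plays the role of the PATH variable from the main job; it is renamed because PATH has a special meaning in the shell.

```shell
# Sketch: imitate "Execute for every input row" for the download job.
# Reads rows of the form "filename url" from standard input and prints
# which URL would be saved to which target file.
plan_downloads() {
    target_dir=$1
    while read -r FILENAME URL; do
        # Skip blank lines in the list.
        [ -n "$FILENAME" ] || continue
        # One run of the download job per row: URL -> target_dir/FILENAME
        printf '%s -> %s/%s\n' "$URL" "$target_dir" "$FILENAME"
    done
}
```

For example, `printf 'a.txt http://example.com/a\n' | plan_downloads /tmp` prints one line per row. Replacing the printf line with `curl -fL -o "$target_dir/$FILENAME" "$URL"` would perform the actual download, assuming curl is installed.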
Conclusion
After running the job, the files are downloaded to the c:\temp directory. Reading the downloaded files into a database is not difficult, but that is a topic for other articles.
I have created multiple Kettle script files. How can I execute them in a batch? Running them one by one is tedious.
for i in *kettle*; do
    # run each matching script in the current shell
    source "$i"
done
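Sourcing only works if the files are shell scripts. If they are Kettle job files (.kjb), they would have to be passed to Kettle's command-line job runner instead. The following is a minimal sketch under that assumption: it expects kitchen.sh to be on the search path (or KITCHEN to point to it), and the function name run_all_jobs is made up for illustration.

```shell
# KITCHEN can be overridden; kitchen.sh is Kettle's command-line job runner.
KITCHEN=${KITCHEN:-kitchen.sh}

# Run every .kjb file in the given directory, one after another.
run_all_jobs() {
    dir=$1
    for job in "$dir"/*.kjb; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$job" ] || continue
        # Report a failed job on stderr and continue with the next one.
        "$KITCHEN" -file="$job" || echo "failed: $job" >&2
    done
}
```

A call such as `run_all_jobs /path/to/jobs` then runs the jobs in alphabetical order. Transformations (.ktr files) would need Kettle's pan.sh in place of kitchen.sh.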
How to use Kettle?
You can restart Kettle by running kettle.exe or spoon.bat again. Before restarting, delete the two items named ".kettle" and ".pentaho" in your user directory under C:\Documents and Settings. They record configuration information from your use of Kettle.
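Since deleting these settings cannot be undone, a more cautious variant is to rename them before restarting. The sketch below assumes a POSIX shell (on Windows, for example, Git Bash). The function name reset_kettle_config and the .bak suffix are hypothetical, and the user directory is passed in as an argument rather than guessed.

```shell
# Move Kettle's per-user settings out of the way instead of deleting them.
# $1 is the user directory that contains .kettle and .pentaho.
reset_kettle_config() {
    home_dir=$1
    stamp=$(date +%Y%m%d%H%M%S)
    for name in .kettle .pentaho; do
        # Only rename what actually exists.
        if [ -e "$home_dir/$name" ]; then
            mv "$home_dir/$name" "$home_dir/$name.bak.$stamp"
        fi
    done
}
```

Calling `reset_kettle_config "$HOME"` leaves timestamped backup copies behind, so the old configuration can be restored by renaming them back.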