Use Kettle to batch download files

Source: Internet
Author: User

Use kettle to batch download files

In a recent project, I needed to download files in batches and import the results into a database. After some experiments, Kettle proved to be up to the task. The remaining question is how to download files in batches over HTTP, which this article describes in detail. If you are not yet familiar with the basics of Kettle, you can check my tutorials.

The sample code in this article can be downloaded here.

Main job

Kettle transformations have no step for downloading files over HTTP, but jobs do have a corresponding entry. The main job therefore calls a sub-job (Download.kjb), and the list of files to download is provided by a transformation.

 

File list transformation

Here I simply use a Data Grid step to provide five records with two fields, "filename" and "url" (the url values depend on your business needs; here we use examples for testing). To make these rows accessible to Download.kjb, the "Copy rows to result" step from the Job category follows it.
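As a rough illustration of what this transformation produces, the sketch below prints five tab-separated records with the two fields and reads them back one row at a time. The file names and the example.com URLs are placeholders invented for the illustration, not the values from the sample download.

```shell
# Hypothetical stand-in for the file list transformation: emits five
# records, each with a "filename" and a "url" field separated by a tab.
file_list() {
    for i in 1 2 3 4 5; do
        printf 'file%d.txt\thttp://example.com/files/file%d.txt\n' "$i" "$i"
    done
}

# Read the records back row by row, similar to how the result rows are
# later handed to the download job.
tab="$(printf '\t')"
file_list | while IFS="$tab" read -r filename url; do
    echo "filename=${filename} url=${url}"
done
```

Each output line corresponds to one row that "Copy rows to result" would pass on to the job.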

 

 

Download job

The download job downloads only a single file, but we need to run it once for each record in the file list. In the advanced settings of the job entry, select "Execute for every input row" so that the job is called in a loop.
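Conceptually, "Execute for every input row" turns the sub-job into the body of a loop over the result rows. The sketch below mimics that behaviour under stated assumptions: download_one is a hypothetical stand-in for Download.kjb that only reports what it would fetch, and the two sample rows are made up.

```shell
# Hypothetical stand-in for Download.kjb: handles exactly one file.
# It only reports what it would do, so the sketch runs without network access.
download_one() {
    echo "would download $2 as $1"
}

# Mimics "Execute for every input row": call the single-file job once per
# record read from standard input (fields: filename, url; tab-separated).
run_for_every_row() {
    tab="$(printf '\t')"
    count=0
    while IFS="$tab" read -r filename url; do
        download_one "$filename" "$url"
        count=$((count + 1))
    done
    echo "rows processed: $count"
}

printf 'a.txt\thttp://example.com/a.txt\nb.txt\thttp://example.com/b.txt\n' | run_for_every_row
```

The point of the design is that the single-file logic stays unaware of the list; only the caller knows how many rows there are.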

In the HTTP entry, we need to set the filename and the url. Instead of fixed values, we enter the variables ${URL} and ${FILENAME} in these two fields. To make the row data map onto these variables, we need to do two things.

1) Declare "URL" and "FILENAME" as named parameters.

In the job properties of the download job, declare them on the named parameters tab.


2) Specify the mapping between the row fields and the variables (named parameters).

Double-click the download job entry in the main job, then open the named parameters option to configure the mapping. In addition, a PATH variable is defined in the main job to determine where the files are stored. The HTTP entry uses this variable to determine the location and name of the target file.
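The mapping can be pictured as assigning each row's fields to the declared parameters before the HTTP entry resolves them. In the sketch below, the parameter names URL and FILENAME come from the article; TARGET_DIR is used in place of the article's PATH variable, because PATH has a special meaning in a shell, and the function name and sample values are hypothetical.

```shell
# Stands in for the PATH variable defined in the main job. Renamed here,
# since overwriting the shell's own PATH would break command lookup.
TARGET_DIR="${TARGET_DIR:-/tmp}"

# Hypothetical helper: maps one row's fields onto the named parameters
# and prints the values the HTTP entry would see after substitution.
resolve_download() {
    FILENAME="$1"   # field "filename" -> parameter FILENAME
    URL="$2"        # field "url"      -> parameter URL
    echo "source: ${URL}"
    echo "target: ${TARGET_DIR}/${FILENAME}"
}

resolve_download "report.csv" "http://example.com/data/report.csv"
```

Keeping the storage location in a separate variable means the file list never has to contain absolute paths.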

 

Conclusion

After running the job, the files are downloaded into the c:\temp directory. Reading the downloaded files into a database is not difficult, but it is a topic for other articles.



I have created multiple Kettle script files. How can I execute these scripts in a batch? (Running them one by one is tedious.)

# Run every script in the current directory whose name contains "kettle"
for i in *kettle*; do
    source "$i"
done

How to Use kettle?

You can restart Kettle by running the kettle.exe file or the Spoon.bat file. Before restarting, delete ".kettle" and ".pentaho" from your user directory under C:\Documents and Settings. These two items record configuration information from your Kettle usage.
