Use Kettle to batch download files

Source: Internet
Author: User

In a recent project, I needed to download files in batches and import the results into a database. After some experiments, Kettle proved to be up to the task. This article describes in detail how to download files in batches over HTTP. If you need the basics of Kettle first, you can check my tutorials.

The sample code in this article can be downloaded here.

Main job

Kettle transformations have no step for downloading files over HTTP, but jobs have a corresponding step. The main job therefore calls a sub-job (Download.kjb), and the list of files to download is supplied by a transformation.

File list transformation

Here, I simply use a data table step to supply five records with two fields, "filename" and "url". The url values depend on your business needs; the ones here are only examples for testing. So that Download.kjb can access these rows, the "Copy rows to result" step from the Job category comes last.
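To make the shape of that file list concrete, here is a minimal Python sketch of the same data: five rows, each with a "filename" and a "url" field. The host name and file names are placeholders invented for illustration; they are not the values used in the original sample.

```python
# Assumption: placeholder host, not taken from the article's sample files.
BASE_URL = "http://example.com/files"


def build_file_list(count=5):
    """Return rows shaped like the transformation's output: filename and url."""
    rows = []
    for i in range(1, count + 1):
        filename = f"file{i}.txt"  # hypothetical file name
        rows.append({"filename": filename, "url": f"{BASE_URL}/{filename}"})
    return rows


if __name__ == "__main__":
    for row in build_file_list():
        print(row["filename"], row["url"])
```

Each dictionary plays the role of one result row that is handed on to the download job.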

Download job

The download job downloads only a single file, but it has to run once for each record in the file list. In the advanced settings of the job, select "Execute for every input row" so that it is called in a loop.
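The effect of running the job once per input row can be sketched in Python as a loop that calls a single-file download for each record. This is only an illustration of the control flow, not of how Kettle implements it; the function names are hypothetical, and the download function is passed in as an argument so the loop can be tried without network access.

```python
import os
import urllib.request


def download_one(url, target_path):
    """Download a single URL to target_path (the role of the download job)."""
    with urllib.request.urlopen(url) as response, open(target_path, "wb") as out:
        out.write(response.read())


def run_for_every_row(rows, target_dir, download=download_one):
    """Call the single-file download once per input row and return the saved paths."""
    saved = []
    for row in rows:
        target = os.path.join(target_dir, row["filename"])
        download(row["url"], target)
        saved.append(target)
    return saved
```

Passing a stub as `download` (for example, a function that only records its arguments) shows that one call is made per row, in row order.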

In the http step, we need to set the file name and the URL. Instead of fixed values, we enter the variables ${URL} and ${FILENAME} in these two fields. To connect the row data to these variables, we need to do two things.

1) Declare "URL" and "FILENAME" as named parameters.

In the job's property settings, define them on the named parameters tab.

2) Specify the mapping between the row fields and the variables (named parameters).

Double-click the download job step in the main job, and configure the mapping on its parameters tab. In addition, the PATH variable is defined in the main job to set the location where files are stored. The http step uses this variable to determine the location and name of the file.
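The two steps above amount to binding row fields to named parameters and then substituting ${NAME} placeholders. The Python sketch below imitates that idea; it is not Kettle's own code. The parameter names URL, FILENAME, and PATH come from the article, while the function names, the sample row, and the choice to leave unknown placeholders unchanged are assumptions made for illustration.

```python
import re

# Named parameter -> row field, mirroring the mapping configured on the parameters tab.
PARAMETER_MAPPING = {"URL": "url", "FILENAME": "filename"}

_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def bind_parameters(row, mapping, extra=None):
    """Build the variable set for one row: extra variables plus the mapped fields."""
    variables = dict(extra or {})
    for name, field in mapping.items():
        variables[name] = row[field]
    return variables


def resolve(template, variables):
    """Replace each ${NAME} with its value; unknown names are left as written."""
    return _PLACEHOLDER.sub(
        lambda match: variables.get(match.group(1), match.group(0)), template
    )


if __name__ == "__main__":
    # Hypothetical sample row; PATH is set once, like the variable in the main job.
    row = {"filename": "file1.txt", "url": "http://example.com/files/file1.txt"}
    variables = bind_parameters(row, PARAMETER_MAPPING, {"PATH": "c:\\temp"})
    print(resolve("${URL}", variables))
    print(resolve("${PATH}/${FILENAME}", variables))
```

Here PATH is supplied once for the whole run, whereas URL and FILENAME change with every row.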

Conclusion

After running the job, the files are downloaded successfully to the c:\temp directory. Reading the downloaded files into a database is not difficult; that is covered in other articles.
