"Kettle" Data integration ftp download + local photo file import Oracle Database

Source: Internet
Author: User

First, data integration business Scenario

1.1 Background

Because of a system of GA to adjust, resulting in the original from the system backup database obtained from the corresponding data resources can not be properly obtained, the subsequent data unified by the GA a system daily ztry photo data from the previous day is compressed and uploaded to the designated FTP server, need to download from the FTP, After decompression, the JPEG photo data is parsed and loaded into the database.

1.2 Goals

Need to set up a job can be scheduled from the FTP download server to download the zip file to the local designated folder, the zip file extracted, to obtain JPEG format photos, through the kettle conversion of photos to parse, and according to certain rules will be the name of the photos and other information to parse into a unique primary key, In order to achieve the integration of local photo files into the database for storage, after the integration of the zip file to the backup folder, while the extracted folder and photo files deleted.

1.3 Tools and Components

Tools: IDI (ETL tool based on kettle two-time development)

Components: ① Conversions: Input components, output components,

② Job:

Second, data integration design ideas

2.1 Design ideas of Job scheduling

Design ideas as shown in the process:

1. Get the time variable

When the job starts, it calls the time variable transformation to capture the time variable and gets the date of the format processed yesterday.

2, check the directory situation

Go to the local designated download folder to see if the zip file for yesterday's date has been downloaded in the folder.

3. Execution judgment

If the download folder is empty, or does not have a zip file of yesterday's date, jump into the ftp download, otherwise, jump out of the job stop, waiting for the next job to start.

4. FTP Download file

Download the zip file that meets the date requirements from FTP based on the date and save it in the local specified downloads folder.

5. Unzip the zip file

Unzip the zip file to the specified decompression folder.

6, Data Cleaning integration

Call the Data cleansing processing integration transformation, capture the data in the zip file and clean, transform, process and load the data.

7. Delete Directory

Data Cleansing conversion Complete, the data loaded into the database after normal loading, the extracted directories and files are deleted, freeing up space.

8. Moving files

Transfer the zip file to the backup folder for storage, so that later data problems can be recovered or checked.

9. End

Ends the entire job, waiting for the job to start the next time.

2.2 Conversion Extraction Design ideas

2.2.1 Capturing time variables

Design ideas as shown in the process:

1. Get the system date and time

Get the current system yesterday 0 point time, even if this day is July 15, 2018 18:36 47 seconds, the time to capture should be 2018/07/14 00:00:00.000.

2. Date and time formatting

Format the above 2018/07/14 00:00:00.000 as 20180714.

3. Set the time variable

Saves the formatted date data as a variable in the variable passed in.

2.2.2 Data Cleaning Integration

The design ideas are as follows:

1. Capture Photo file name

2. Photo File parsing

Parses the photo file into binary data.

3, photo data cleaning integration

The data in the data flow is cleaned, transformed and integrated into the target data stream through the components.

4. Data loading

Load the destination data stream into the database for storage.

Third, data integration solutions

3.1 Job scheduling model

3.1.1 Job scheduling process

Diagram: Job scheduling model

3.1.2 Job Model Analysis

3.1.2.1 Start

3.1.2.2 Call time Variable transformation model

3.1.2.3 Check if the directory is empty

3.1.2.4 Jump Job

3.1.2.5 FTP Download

3.1.2.6 Unzip a file

3.1.2.7 Call Data Cleansing Integrated transformation model

3.1.2.8 Deleting an extract directory

3.1.2.9 Moving Compressed files

3.1.2.10 Job End

3.2 Integrated Transformation model

3.2.1 Integrated Conversion process

Diagram: Capture time variable conversions.

Diagram: Data Cleansing Integration transformation

3.2.2 Integration Conversion Process Analysis

3.2.2.1 Capture Time variable conversion

3.2.2.1.1 Get System Date Time

3.2.2.1.2 Date Time formatting

Script here


var sys_date = Date2str (today, ' yyyyMMdd ');

3.2.2.1.3 Setting the time variable

3.2.2.2 Data Cleansing Integrated Conversion

3.2.2.2.1 Get file name

3.2.2.2.2 Photo Conversion binary

/script here

var file = new Java.io.File (filename);

var fileinputstream = new Java.io.FileInputStream (file);

var Content = Org.pentaho.di.core.Const.createByteArray (File.length ());

Fileinputstream.read (Content,0,file.length ());

Fileinputstream.close ();

3.2.2.2.3 Split Field

3.2.2.2.4 string substitution

3.2.2.2.5 Field Selection

3.2.2.2.6 career

3.2.2.2.7 Data modification

Script here

var zp01 = zp_1;
var zp02 = zp_2;
var zp03 = zp_3;
var zp04 = Zp_4;
var zp05 = zp_5;
var zp06 = zp_6;

var zp01modifytime = Zp_1_lastmodifytime;
var zp02modifytime = Zp_2_lastmodifytime;
var zp03modifytime = Zp_3_lastmodifytime;
var zp04modifytime = Zp_4_lastmodifytime;
var zp05modifytime = Zp_5_lastmodifytime;
var zp06modifytime = Zp_6_lastmodifytime;

var ZPSCRQ = replace (dirname, ' _zp ', ');
var dirname1 = ' ftp://10.78.1.30/' + dirname + '. zip ';

if (zp01==null && zp02==null && zp03==null) {
zp01=zp04;
ZP02=ZP05;
zp03=zp06;
Zp01modifytime=zp04modifytime;
Zp02modifytime=zp05modifytime;
Zp03modifytime=zp06modifytime;

}

3.2.2.2.8 Getting System Information

3.2.2.2.9 Field Selection

3.2.2.2.10 Filtering Records

3.2.2.2.11 inserting updates

3.2.2.2.12 Discard Invalid data

Iv. Other Notes

No.

"Kettle" Data integration ftp download + local photo file import Oracle Database

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.