Data Loading Tool (ETL: Extract, Transform, Load) Design


1. System requirements

Use a graphical interface to load external data files into the database, appending the loaded data to the target database or replacing existing data according to the specified rules.

Support data files in multiple formats.

Allow file loading rules to be customized flexibly.

Support configurable filtering of the contents of a specified data file.

Provide template management.

Provide operation logging and log management.

Track the result of each load.

Organize loading into batches.

Monitor the loading process.

2. Terminology

Parser: converts data files in different formats into one uniform format (a symbol-delimited file format, for example: #content#, #content2#, etc.).

Mapping: matches the content of a data file to the fields of the preprocessing library. Merging, splitting, and substring extraction are supported during mapping.

Filter: replaces or deletes the contents of a specified field, either unconditionally or depending on whether the content equals a given value.

3. System description

The system is configured through a graphical interface. Its main workflow has two parts: preprocessing and processing. In the preprocessing part, data files of all formats are first converted to the uniform format by a predefined parser. The system then generates a preprocessing table (PRD) based on the structure (format) of the converted data file and defines the transformation according to business rules, including field mapping and field content filtering. Finally, the content of the data file is added to the generated preprocessing table according to the mapping relationships and filtering rules, and the operation is logged. This ends the preprocessing part. Each stage of preprocessing has a corresponding template, so a stage can be set up directly by selecting its template.
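As a rough illustration of the last two preprocessing steps (generating the preprocessing table from the file structure, then adding the rows), here is a minimal sketch. It assumes SQLite as a stand-in for the preprocessing library; the function names and the column list are hypothetical and are not taken from the original system.

```python
import sqlite3

def create_preprocessing_table(conn, table, columns):
    """Create the preprocessing table from (name, sql_type) pairs.

    The pairs are assumed to be derived from the structure of the
    converted data file. Names are interpolated into the SQL text, so
    they must come from trusted configuration, not from user input.
    """
    cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')

def load_rows(conn, table, columns, rows):
    """Insert already mapped and filtered rows; return the table's row count."""
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
```

The returned row count is the kind of figure that would be written to the log at the end of preprocessing.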

In the processing part, first select the source database (the preprocessing library), then the destination database (the procurement handover library and the processing library), then the loading rules (a predefined adaptor configuration file), and finally set the log information. This part can also be set up directly by selecting a previously defined template.

Once both parts are configured, the contents of the data file are imported into the target database according to the specified rules.

The steps above describe how the whole loading process is configured. In actual operation you can select a loading template directly. This is a master template that contains the templates for every stage, so selecting it configures all stages at once; clicking OK then starts the load.
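One way to picture a master template that bundles the per-stage templates is the sketch below. The class names, stage names, and settings are hypothetical; the article does not describe how templates are stored.

```python
from dataclasses import dataclass, field

@dataclass
class StageTemplate:
    """Saved settings for one stage (stage names here are assumptions)."""
    stage: str                      # e.g. "parser", "mapping", "filter", "load"
    settings: dict = field(default_factory=dict)

@dataclass
class LoadTemplate:
    """Master template: holds one saved template per stage."""
    name: str
    stages: list

    def settings_for(self, stage):
        """Return the saved settings for a stage, or raise KeyError."""
        for template in self.stages:
            if template.stage == stage:
                return template.settings
        raise KeyError(stage)
```

Selecting a `LoadTemplate` would then give every stage its settings in one step, which matches the "select the template, then click OK" flow.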

4. Function implementation

When loading data files, the system first unifies their external formats: the user selects the parser that matches the data format and sets its conversion rules, so that data files in different formats are converted into the same data format. These settings can then be saved as a file format conversion template.
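The idea of several format-specific parsers producing one uniform format can be sketched as follows. The "#field#" wrapping with comma separators is an assumption based on the example given in the terminology section, and both parser functions are hypothetical. No escaping of "#" inside field values is attempted.

```python
import csv
import io

def to_uniform_line(fields):
    """Wrap each field in '#' and join with commas (assumed uniform format)."""
    return ",".join(f"#{value}#" for value in fields)

def parse_csv(text):
    """Hypothetical CSV parser: yields one uniform line per record."""
    for row in csv.reader(io.StringIO(text)):
        yield to_uniform_line(row)

def parse_fixed_width(text, widths):
    """Hypothetical fixed-width parser: slices each line by column widths."""
    for line in text.splitlines():
        fields, pos = [], 0
        for width in widths:
            fields.append(line[pos:pos + width].strip())
            pos += width
        yield to_uniform_line(fields)
```

Both parsers emit the same line shape, so every later stage only has to understand one format.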

After the formatted data file has been imported into the system, its contents are mapped to fields of the preprocessing database, and a field type is specified for each of those fields. The mapping operation supports field merging, field splitting, and substring extraction. In the interface, these operations are identified with the "+" symbol, directional arrows, and subString(start position, number of characters to extract).

After field mapping, the field contents are filtered. Filtering can replace or delete the specified field content, replace it with a value if it equals a given value, or replace it with a value if it does not equal a given value. A filter can target any selectable field and can act on either a value within the field content or the entire field. Date format conversion is also supported, for example from "04/12/2004" to "2004/04/12".
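The date conversion mentioned above can be sketched with the standard library. The function name is hypothetical, and the source format is assumed to be month/day/year, since the example does not say which component is the month.

```python
from datetime import datetime

def convert_date(value, source_format="%m/%d/%Y", target_format="%Y/%m/%d"):
    """Re-format a date string; raises ValueError if the input does not match."""
    return datetime.strptime(value, source_format).strftime(target_format)
```

Parsing into a real date first, rather than rearranging substrings, means that invalid inputs such as "13/40/2004" are rejected instead of being silently loaded.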

After filtering, specify the source database (the preprocessing library) and the destination database (the procurement handover library and the processing library), set up the loading log, and specify the loading rules. The system automatically writes the source database, destination database, and log information into the appropriate locations in the adaptor template file, then invokes the background adaptor engine to perform the load.
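Filling the chosen values into an adaptor template might look like the sketch below. The template layout and every key in it are hypothetical, because the article does not show the real adaptor file format.

```python
from string import Template

# Hypothetical adaptor template; the real file format is not given in the text.
ADAPTOR_TEMPLATE = Template(
    "source_db = $source_db\n"
    "target_db = $target_db\n"
    "load_rule = $load_rule\n"
    "log_table = $log_table\n"
)

def render_adaptor_config(source_db, target_db, load_rule, log_table):
    """Write the values selected in the interface into the template."""
    return ADAPTOR_TEMPLATE.substitute(
        source_db=source_db,
        target_db=target_db,
        load_rule=load_rule,
        log_table=log_table,
    )
```

The rendered text would be saved as the configuration file that the adaptor engine reads when it is invoked.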

During execution the adaptor engine runs as its own process, and the foreground system determines the progress of the load by monitoring the status of that process. While loading, the adaptor engine writes log information and load statistics to the corresponding data tables. The foreground system, which monitors the process carrying out the loading task, reads the load information (the number of records loaded) and the load log from those tables and displays them. This is how the loading process is monitored.

4.1 Parser implementation

A parser consists of one or more program classes and a configuration file. The program classes are developed for data files in one particular format and convert them to the uniform format.

The configuration file is what allows the system to recognize the parser and use it correctly: by parsing the configuration file, the system learns which parameters need to be passed in and how to invoke the parser's individual program classes to carry out specific functions.

The configuration file mainly describes the program classes, their method names, the invocation parameters, and the method comments.
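A configuration-driven call of a parser class could be sketched like this. The JSON layout, the `CsvParser` class, and the registry are all hypothetical; the article only says that the configuration names the class, the method, its parameters, and a comment.

```python
import json

class CsvParser:
    """Hypothetical parser class standing in for a format-specific parser."""
    def convert(self, text, delimiter):
        return [line.split(delimiter) for line in text.splitlines()]

# Hypothetical configuration: class name, method name, parameters, comment.
CONFIG = json.loads("""
{
  "class": "CsvParser",
  "method": "convert",
  "params": ["text", "delimiter"],
  "comment": "Splits each line on the given delimiter."
}
""")

# Registry of known parser classes, keyed by the name used in the config.
PARSER_CLASSES = {"CsvParser": CsvParser}

def invoke_parser(config, **kwargs):
    """Look up the class and method named in the config and call it."""
    missing = [p for p in config["params"] if p not in kwargs]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    parser = PARSER_CLASSES[config["class"]]()
    method = getattr(parser, config["method"])
    return method(**{p: kwargs[p] for p in config["params"]})
```

Because the caller only reads the configuration, a parser for a new format can be added by registering a class and writing its configuration entry, without changing the calling code.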

The interface appears as follows:

4.2 Mapping (mapping) implementation

The mapping operation matches the content read from the data file to the fields of the preprocessing library and specifies a data type for each of those fields. During mapping, fields can be merged, copied, split, and extracted as substrings.

Field merge: combines two or more data file strings into a single field in the preprocessing library. The strings are joined in the order in which they are listed. The "+" sign identifies a field merge in the interface.

For example: data file field identifier 1 + data file field identifier 2 + data file field identifier 3

Field copy: copies a data file string into a field in the preprocessing library. Directional lines identify this action in the interface.

Field split: splits a data file string into two or more strings and maps each to its corresponding field in the preprocessing library. Directional lines and subString() identify the split operation in the interface.

Field extraction: extracts part of a data file string as required and maps it to the corresponding field in the preprocessing library. Directional lines and subString() identify this operation in the interface.

Example: subString(start position, number of characters to extract)
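The four mapping operations can be sketched in a few lines. The helper names and the sample record are hypothetical, and the start position of `sub_string` is assumed to be 0-based, which the article does not specify.

```python
def sub_string(value, start, length):
    """Mirror of subString(start position, number of characters).

    The start position is assumed to be 0-based.
    """
    return value[start:start + length]

def merge(record, source_fields):
    """Field merge ("+"): join the source fields in the order listed."""
    return "".join(record[name] for name in source_fields)

def apply_mapping(record, mapping):
    """mapping: target field name -> rule that takes the source record."""
    return {target: rule(record) for target, rule in mapping.items()}

record = {"f1": "2004", "f2": "0412", "f3": "Zhang San"}
mapping = {
    "date_key": lambda r: merge(r, ["f1", "f2"]),     # field merge
    "name": lambda r: r["f3"],                        # field copy
    "month": lambda r: sub_string(r["f2"], 0, 2),     # field split, part 1
    "day": lambda r: sub_string(r["f2"], 2, 2),       # field split, part 2
}
```

Calling `apply_mapping(record, mapping)` produces one dictionary per source record, keyed by the preprocessing library's field names. A field split is simply two substring extractions from the same source field.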

The interface appears as follows:

4.3 Filtering (filter) implementation

The filtering operation replaces, deletes, conditionally replaces, conditionally deletes, or format-converts the contents of a specified preprocessing library field.

Replace: replaces certain characters in the specified field with other characters. "Replace" identifies this action in the interface.

Delete: removes certain characters from the specified field. "Delete" identifies this action in the interface.

Conditional replace: compares certain characters in the specified field with a value and replaces them if they are not equal to it. "Not equal" identifies this operation in the interface.

Alternatively, the characters are replaced if they are equal to the value. The "equals" symbol identifies this operation in the interface.

Conditional delete: compares certain characters in the specified field with a value and deletes them if they are not equal to it. "Not equal" identifies this operation in the interface.

Alternatively, the characters are deleted if they are equal to the value. The "equals" symbol identifies this operation in the interface.
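The filter operations can be approximated as small string functions. This is a sketch under two assumptions that the article leaves open: the conditional variants compare the entire field content with the given value, and "delete" means replacing the content with an empty string. All function names are hypothetical.

```python
def replace_chars(value, old, new):
    """Replace: substitute every occurrence of `old` with `new`."""
    return value.replace(old, new)

def delete_chars(value, chars):
    """Delete: remove every occurrence of `chars`."""
    return value.replace(chars, "")

def replace_if(value, compare_to, new, when_equal=True):
    """Conditional replace on the whole field.

    when_equal=True  -> replace if value == compare_to ("equals")
    when_equal=False -> replace if value != compare_to ("not equal")
    """
    matches = (value == compare_to)
    return new if matches == when_equal else value

def delete_if(value, compare_to, when_equal=True):
    """Conditional delete: like replace_if, with an empty replacement."""
    return replace_if(value, compare_to, "", when_equal)
```

For instance, `replace_if("N/A", "N/A", "unknown")` yields `"unknown"`, while `replace_if("x", "N/A", "unknown")` leaves `"x"` unchanged.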

4.4 Data transfer implementation

A data transfer operation moves data from the loaded preprocessing library to a different database according to the configured rules. It includes database operations such as checking record contents, adding records, and modifying records. To implement it, an adaptor template and an adaptor configuration file are first defined for the loading rules; the corresponding values in the adaptor configuration file are then modified through the system's graphical interface, and the log content to be used is set. When configuration is complete, the adaptor engine is called to perform the load. The adaptor engine runs as its own process, and execution progress is tracked by dynamically monitoring the status of that process. During the load, the adaptor engine writes log information, the number of records loaded, and other information to the load information table and the log table; the system reads these tables to show the progress of the load and the number of records processed. At the same time, the system determines whether the operation has ended by checking whether the process has finished, and displays all the load information once it has.
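The monitoring pattern described above (an engine process that writes to tables, and a foreground that polls both the process and the tables) can be sketched as follows. Everything here is a stand-in: the child script plays the role of the adaptor engine, SQLite plays the role of the database, and the table and column names are hypothetical.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile
import time

# Stand-in for the adaptor engine: runs as a child process and updates
# the load information and log tables after each record.
ENGINE_SCRIPT = """
import sqlite3, sys
conn = sqlite3.connect(sys.argv[1])
for n in range(1, 6):
    conn.execute("UPDATE load_info SET records_loaded = ?", (n,))
    conn.execute("INSERT INTO load_log (message) VALUES (?)",
                 (f"loaded record {n}",))
    conn.commit()
conn.close()
"""

def run_and_monitor(db_path, poll_interval=0.05):
    """Start the engine, poll its status, then return the final results."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE load_info (records_loaded INTEGER)")
    conn.execute("CREATE TABLE load_log (message TEXT)")
    conn.execute("INSERT INTO load_info VALUES (0)")
    conn.commit()

    engine = subprocess.Popen([sys.executable, "-c", ENGINE_SCRIPT, db_path])
    while engine.poll() is None:  # None means the process is still running
        (count,) = conn.execute(
            "SELECT records_loaded FROM load_info").fetchone()
        print(f"in progress: {count} records loaded")
        time.sleep(poll_interval)

    # The process has ended: read the final count and the full log.
    (count,) = conn.execute("SELECT records_loaded FROM load_info").fetchone()
    log = [row[0] for row in
           conn.execute("SELECT message FROM load_log ORDER BY rowid")]
    conn.close()
    return engine.returncode, count, log

def demo():
    """Run the sketch against a throwaway database file."""
    with tempfile.TemporaryDirectory() as tmp:
        return run_and_monitor(os.path.join(tmp, "load.db"))
```

The foreground never talks to the engine directly. It learns about progress only from the tables and about completion only from the process status, which is the separation the paragraph describes.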

Note: Throughout the operation, the contents of the data file are loaded into the database as a backup before the data is filtered.


