Having used both: Informatica is easy to manage later on, especially for data correction; when data is supplemented at a later stage, the data flow is clear at a glance. SQL is efficient, but it is inconvenient to maintain; tracing a data flow takes a long time. ETL tools are easier to manage and maintain, especially for complicated cleaning processes.
Data from the master data source is distributed into each subsidiary's own database tables. When developing reports, each subsidiary then only needs to connect to its own tables; this controls data access rights and also makes it easier to manage each subsidiary's data in its own database tables.
III. Project construction plan
1. Introduction to the tool used: Kettle
Kettle is an open-source ETL tool from abroad, written in pure Java, which can run on Windows, Linux, and Unix.
1. Alibaba open-source software: DataX
DataX is an offline synchronization tool for heterogeneous data sources, dedicated to achieving stable and efficient data synchronization between heterogeneous data sources, including relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, FTP, and more. (Excerpt from Wikipedia)
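As a hedged illustration (not from the original article): DataX jobs are described in JSON and launched with the datax.py script. The sketch below uses DataX's built-in streamreader/streamwriter demo plugins so that it needs no real database; the $DATAX_HOME location and file names are my assumptions.

# A minimal DataX sketch, assuming DataX is unpacked under $DATAX_HOME.
# streamreader/streamwriter are DataX's demo plugins; no database is needed.
cat > stream2stream.json <<'EOF'
{
  "job": {
    "setting": { "speed": { "channel": 1 } },
    "content": [{
      "reader": {
        "name": "streamreader",
        "parameter": {
          "sliceRecordCount": 3,
          "column": [{ "type": "string", "value": "hello, DataX" }]
        }
      },
      "writer": {
        "name": "streamwriter",
        "parameter": { "print": true }
      }
    }]
  }
}
EOF
python ${DATAX_HOME}/bin/datax.py stream2stream.json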
2. Apache open-source software: Sqoop
Sqoop (pronunciation: skup) is an open-source tool that is used primarily to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, etc.).
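For example (a hedged sketch, not from the original text; the host, database, and table names are invented), importing a MySQL table into HDFS with Sqoop looks like this:

# Import the MySQL table "orders" into HDFS (all names here are made up).
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales_db \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/staging/orders \
  --num-mappers 1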
In recent years my work has almost all involved ETL, and I have been exposed to a variety of ETL tools. I have organized these tools here to share with you.
I. ETL tools
Foreign tools:
1. DataStage
Review: the most professional ETL tool.
2. In "Transformation 1", hold down the SHIFT key and drag from the "Table input" icon to "Table output" to establish a connection (note the direction of the arrow).
3. Double-click "Table input" to configure it (Figure 4): set up and test the database connection (Figure 5), then configure the SQL statement that queries the specified table (Figure 6). You can inspect the records in the table with "Preview".
File contents:
① Select the file type.
② Set the separator between fields.
③ Set the field enclosure character if the fields are enclosed (the default is double quotation marks); remove it if they are not.
④ Specify whether the file contains a header and, if so, how many lines it occupies.
⑤ Set the file format: Unix or Windows.
⑥ Set the file character set; otherwise garbled characters will appear.
⑦ Set the fields to be read, in order from left to right as they appear in the text. (A made-up sample file follows this list.)
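To make the settings above concrete, here is a tiny invented sample file: comma as the field separator (②), double quotation marks as the enclosure character (③), and one header line (④):

# A made-up sample input file illustrating the settings above.
cat > sample_input.csv <<'EOF'
id,name,remark
1,"Zhang San","likes ""Kettle"""
2,"Li Si",plain text
EOF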
Using the ETL tool Kettle to extract data from one database into another database:
1. Open the ETL folder and double-click Spoon.bat to start Kettle.
2. When prompted to select a repository, cancel if you do not use one.
3. Select Close
4. Create a new transformation
5. Configure the required database connections.
6. Add a "Table input" step for the data table that needs to be extracted (a command-line sketch for running the finished transformation follows this list).
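Once the transformation is saved as a .ktr file, it does not have to be run from the Spoon GUI. A hedged sketch (the file path and log level are my choices, not the original author's) of running it with Kettle's command-line runner Pan:

# Run the saved transformation from the command line with Pan
# (use Pan.bat on Windows, pan.sh on Linux/Unix; the path is invented).
./pan.sh -file=/opt/etl/extract_to_target.ktr -level=Basic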
Consider using Services to set a preferred instance and a standby instance: once a single point of failure occurs on the preferred instance, the service automatically fails over to the standby instance. Suppose the current RAC database is defined with three nodes (srv1, srv2, srv3) and two different services, sales.2gotrade.com and settlement.2gotrade.com, running in the current database. The Sales department establishes connections through the sales.2gotrade.com service name, and the Settlement department establishes connections through the settlement.2gotrade.com service name.
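A hedged sketch of how such services might be defined with srvctl (the database unique name "orcl" and the exact node assignments are assumptions, and srvctl syntax varies across Oracle versions), plus the client-side tnsnames.ora entry that connects through the service name:

# Define sales.2gotrade.com with srv1 preferred and srv2/srv3 available.
srvctl add service -d orcl -s sales.2gotrade.com -r srv1 -a srv2,srv3
srvctl add service -d orcl -s settlement.2gotrade.com -r srv2 -a srv1,srv3
srvctl start service -d orcl -s sales.2gotrade.com

# Client-side tnsnames.ora entry used by the Sales department (host invented):
# SALES =
#   (DESCRIPTION =
#     (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.2gotrade.com)(PORT = 1521))
#     (CONNECT_DATA = (SERVICE_NAME = sales.2gotrade.com)))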
I. Purpose
Merge tables on different servers onto another server. For example, merge table A on server 1 and table B on server 2 into table C on server 3. Requirements: table A needs to be trimmed (removing unnecessary fields), and table B needs some fields added.
II. Method of use
(1) Create a new table C (with fields that conform to the actual system design) in the database on server 3.
(2) Create a new "Table input" step, connect to server 1, and select the table you want either by getting the SQL select statement or by writing the SQL manually.
Value mapping here is a bit like Oracle's CASE WHEN feature. For example, field a has the value 1, but I want rows where a = 1 to show "male"; mapping 1 to "male" is value mapping. How is this done? Kettle has a "Value mapping" component. Here is a brief introduction to its use: first type "value mapping" into the search box on the left side of the program, find the value mapping component, and drag it onto the canvas.
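For comparison, the same mapping expressed directly in SQL with CASE WHEN (a hedged sketch; the connection string, table, and column names are invented):

# The SQL equivalent of the value-mapping step, run through sqlplus.
sqlplus -s etl_user/secret@orcl <<'EOF'
SELECT id,
       CASE a WHEN 1 THEN 'male'
              WHEN 2 THEN 'female'
              ELSE 'unknown'
       END AS a_mapped
FROM person;
EOF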
ETL is responsible for extracting data from distributed, heterogeneous data sources, such as relational databases and flat data files, into a temporary middle tier, where it is cleaned, transformed, and integrated, and finally loading it into a data warehouse or data mart, where it becomes the basis for online analytical processing and data mining.
If data conversion is infrequent or the requirements are not high, it can be implemented manually.
Data cleansing:
Talend: you write SQL statements manually.
Kettle: data quality features in the GUI; you can manually write SQL statements, JavaScript, or regular expressions to complete the data cleansing.
Informatica: a specialized product, Informatica Data Quality, ensures data quality.
Inaplex Inaport: data cleansing is easier because only specific data is processed.
Monitoring:
Talend: provides monitoring and log tools.
Kettle: provides monitoring and log tools.
Informatica: provides very detailed monitoring and log tools.
Inaplex Inaport: provides monitoring and log tools.
Connectivity:
Talend: common databases, files, and Web services.
ETL
ETL is short for Extraction-Transformation-Loading, that is, data extraction, transformation, and loading. ETL tools include OWB (Oracle Warehouse Builder), ODI (Oracle Data Integrator), Informatica PowerCenter, AICloudETL, DataStage, Repository Explorer, BeeLoad, Kettle, and DataSpider.
ETL extracts the required data from distributed, heterogeneous data sources, cleans the data, and finally loads it into the data warehouse according to the pre-defined data warehouse model.
In a data warehouse system, ETL is a key link. Viewed broadly, ETL is a data integration solution; viewed narrowly, it is a tool for moving data. Recall the many data migration and transformation jobs of recent years: most were one-off jobs or involved small amounts of data, and you could handle them with Access, DTS, or a small self-written program. In a data warehouse, however, ETL is an ongoing, large-scale process that such ad hoc approaches cannot handle.
Therefore, how enterprises use various technical means to convert data into information and knowledge has become the main bottleneck in improving their core competitiveness, and ETL is a major technical means of doing so.
Some of the data is illegal or redundant, or the data rule standards are inconsistent; in addition, the file format cannot be used directly by the Quick Win project's ETL process. Therefore, the data files must be cleaned: illegal and redundant data are deleted, data rules and standards are unified, and the files are converted into a format that the Quick Win project's ETL process can "LOAD".
3. Data loading: load the cleaned data files through SQL.
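The article does not say which loader performs the "LOAD"; if it is something like Oracle SQL*Loader, the invocation might look like this (the control file contents and all names are my assumptions, not the original process):

# Hypothetical SQL*Loader run for a cleaned file (all names invented).
cat > load_cleaned.ctl <<'EOF'
LOAD DATA
INFILE 'cleaned_data.dat'
APPEND INTO TABLE stg_quick_win
FIELDS TERMINATED BY ','
(id, name, amount)
EOF
sqlldr userid=etl_user/secret@orcl control=load_cleaned.ctl log=load_cleaned.log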
Business metadata is a description of data from the business perspective. It is usually used by report tools and front-end users to analyze and use the data.
Technical metadata is a description of data from a technical perspective. It usually includes some attributes of data, such as the data type, length, or some results after data profile analysis.
Process metadata is statistical data about the ETL process itself, such as run start and end times, how many records were processed, and whether each step succeeded.
ETL scheduling development (5) -- a subprogram that connects to the database and executes database commands
In ETL scheduling, you need to connect to the database to read and write data. The following subprogram takes a database connection string and a database command (or SQL) as input and performs the required operation:
#!/usr/bin/bash
# created by lubinsu
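The script is truncated at this point. A minimal sketch of what such a subprogram could look like, assuming an Oracle database reached through sqlplus; the argument convention and error handling below are my reconstruction, not lubinsu's original code:

#!/usr/bin/bash
# created by lubinsu
# Hypothetical reconstruction of the truncated subprogram.
# Usage: exec_db_cmd.sh "<user/password@tns_alias>" "<SQL or database command>"
# Note: the command should carry its own terminating ";" or "/".
conn_str="$1"
db_cmd="$2"

sqlplus -s "${conn_str}" <<EOF
whenever sqlerror exit 1
${db_cmd}
exit 0
EOF

if [ $? -ne 0 ]; then
    echo "ERROR: database command failed: ${db_cmd}" >&2
    exit 1
fi
echo "database command executed successfully"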