ETL tool Pentaho Kettle: transformation and job integration
1. Kettle
1.1. Introduction
Kettle is an open-source ETL tool written in pure Java. It extracts data efficiently and stably (it is often used as a data migration tool). Kettle has two types of script files: transformations and jobs. A transformation performs the basic data conversion, while a job controls the entire workflow.
2. Integrated Development
2.1. Transformation implementation
Under a job's Start step there is a timer function that can schedule runs daily, weekly, and in other ways, which is very helpful for periodic ETL.
A. When you log in using a repository, the default username and password are admin/admin.
B. When a job is stored in a repository (a shared repository usually uses a database), the following command line is used to run the job with Kitchen.bat: Kitchen.bat /rep kettle /user admin
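A fuller Kitchen invocation follows the same pattern. In the sketch below, the repository name comes from the text above, while the password, job name, and logging level are placeholder assumptions:

```shell
REM Run a repository job from the command line.
REM /rep and /user match the text above; /pass, /job, and /level are placeholders.
Kitchen.bat /rep kettle /user admin /pass admin /job daily_load /level Basic
```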
ETL scheduling and reading/writing data both need to connect to the database. The subroutine below performs the required action from a database connection string and a database command (or SQL) passed in as parameters. Only the header and the sqlplus -s call survive in the source, so the heredoc body is a minimal reconstruction:
#!/usr/bin/bash
# created by Lubinsu, 2014
source ~/.bash_profile
# Parameters: $1 = database connection string, $2 = database command (or SQL statement)
values=`sqlplus -s "$1" <<EOF
$2
exit
EOF`
One: Code section
1. Create a new Maven project
2. Add the required Java code
3. Write the Mapper class
4. Write the Runner class
Two: Operation modes
1. Run locally
2.
3.
Three: Local operation mode
1. Unzip Hadoop locally
2. Modify the configuration file HADOOP_HOME
3. Unzip the common package
4. Copy the contents of the compressed package to bin
5. Prerequisite: the site files for core and hbase must exist in resources
6. Upload data: create the directory /eventlogs/2015/12/20, upload to Linux, then upload to H
Just finished a project involving an ETL interface, so here is a summary while it is still fresh. ETL interface functional test points:
1. Data volume check: the target table and the source table should contain the same number of rows.
2. Field correctness: the fields pulled from the source table are the required fields (fields are sometimes missed or misspelled).
3. Field value conversion correctness: whether a date or numeric field pulled into the target table needs conversion
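Check 1 above can be scripted. A minimal sketch, assuming the two row counts have already been fetched from the source and target databases (the variable names and values are illustrative):

```shell
# Compare source and target row counts.
# In practice the two values would come from queries against the two databases.
src_cnt=10000
tgt_cnt=10000
if [ "$src_cnt" -eq "$tgt_cnt" ]; then
  result="PASS"
else
  result="FAIL: source=$src_cnt target=$tgt_cnt"
fi
echo "$result"
```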
Method
Analyze and process the data in Hive, export the results to HDFS, and then use Sqoop to import the HDFS results into the database. 1) Extraction: Oracle data is extracted into Hive; see the previous two steps. 2) Conversion: insert the query result into the Hive table
INSERT OVERWRITE TABLE result_etl select a.empno, a.ename, a.comm, b.dname FROM emp_etl a join dept_etl b on (a.deptno = b.deptno);
3) Export: write the data to the HDFS file system
INSERT OVERWRITE DIRECTORY 'RESULT_ETL_HIVE' SELECT * FROM re
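The final load (HDFS into the relational database) is the Sqoop step mentioned above. A hedged sketch using `sqoop export`; the connection string, credentials, target table, and export directory are placeholders, not from the source:

```shell
# Sketch of the Sqoop step: push the HDFS result into an Oracle table.
# Host, credentials, table, and directory are all placeholder assumptions.
sqoop export \
  --connect jdbc:oracle:thin:@dbhost:1521:orcl \
  --username scott --password tiger \
  --table RESULT_ETL \
  --export-dir /user/hive/warehouse/result_etl \
  --input-fields-terminated-by '\001'   # Hive's default field delimiter
```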
Use Oracle Database 10g internal ETL infrastructure
Http://www.oracle.com/technology/global/cn/obe/10gr2_db_single/bidw/etl2/etl2_otn.htm
-- Some basic concepts and types of CDC were introduced in Change Data Capture (1). This article mainly demonstrates the basic steps of implementing synchronous-mode CDC through a practical example.
-- Create table
CREATE TABLE sales
(
  id        NUMBER,
  productid NUMBER,
  price     NUMBER,
  quantity  NUMBER,
  ....
);
2. Kettle job and transformation logs are retained in memory by default, regardless of whether execution has finished. For jobs that are executed continuously on a schedule, the log buffer fills up after running for a period of time.
This effect is especially uncomfortable, and the accumulation of such logs will eventually lead to a JVM OOM. However, it can be controlled by configuring some parameters.
It was then found that the port cannot be released after the cluster runs the job. So again, we can o
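The log-retention parameters in question are typically set in kettle.properties or as environment variables before starting Kitchen/Carte. A sketch with illustrative values (the specific numbers are assumptions, not from the source):

```shell
# Cap Kettle's in-memory log growth so long-running scheduled jobs don't OOM.
# Values are illustrative; tune them to your workload.
export KETTLE_MAX_LOG_SIZE_IN_LINES=5000       # discard log lines beyond this count
export KETTLE_MAX_LOG_TIMEOUT_IN_MINUTES=60    # drop log lines older than this
export KETTLE_MAX_JOB_TRACKER_SIZE=1000        # limit job-tracker entries kept in memory
```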
The value mapping here is a bit like Oracle's CASE WHEN. For example, field a has the value 1, but I now want rows with a=1 to read "male"; that is, 1 is mapped to "male". This is value mapping. So how do you do it? In fact, Kettle has a "Value Mapper" component. The following is a brief introduction to how to use it: first enter "value mapping" in the search box to the left of the program, find the Value Mapper component, and t
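As a sketch of the same mapping logic outside Kettle (the function name is hypothetical, and only the 1→male rule comes from the text; the 2→female and fallback rules are assumed companions):

```shell
# Mimic the Value Mapper step: translate a coded field value into a label.
map_value() {
  case "$1" in
    1) echo "male" ;;     # the mapping given in the text
    2) echo "female" ;;   # assumed companion mapping
    *) echo "unknown" ;;  # fallback for unmapped values
  esac
}
map_value 1
```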
the "Table type" and "File or directory" rows (Figure 3). When you click Add, the directory appears under "Selected files" (Figure 4). My data is in Sheet1, so Sheet1 is selected into the list (Figure 5). Open the Fields tab, click "Get fields from header data", and check that the Time field format is correct. 3. Set the "Table output" parameters: 1) double-click the "Table output" icon in the workspace (I saved it in "Convert 1") to open the settings window (Figure 6):
In the ETL process it is inevitable to use UNION ALL to assemble data, which raises the question of whether the parts can run in parallel. Whether a parallel map applies in Hive can be set by a parameter: set hive.exec.parallel=true. This is useful for the data of the previous blog post (link: http://www.cnblogs.com/liqiu/p/4873238.html). If we need some data: SELECT ... FROM (SELECT ... FROM ... WHERE create_time = '2015-10-10'
This was last Sunday; I think it is better to record it. After all, my memory is not very good and it is easy to forget.
On Saturday, I started to discuss with Northwest China and Zhang Wei where to go for dinner. I thought Annie said on
Data in a database may be transferred between different databases. Because different databases may use different character sets, the resulting data may be garbled. This time, we ran data in a job. After the data was run, the
Two months ago I went through the process of making a workbook. This is not a tutorial, so I will not say too much about the specific painting methods and software techniques; mainly, when I drew this picture I thought of some ways of analyzing the image, so strong
Cost: software costs include software products, pre-sales training, after-sales consulting, and technical support. Open-source products are free of charge, so the cost is mainly training and consulting and will remain at a low
When data is extracted from the production environment to the warehouse, Chinese characters in the target database come out garbled. My environment is MySQL to MySQL; there are no heterogeneous databases at the moment, and the architecture is
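A common fix for MySQL-to-MySQL garbling is to pin the character set on both Kettle connections via JDBC options. A hedged sketch; the host, database name, and the utf8 choice are assumptions (in Spoon these options go into the database connection's option panel):

```shell
# JDBC options that force a consistent character set on a Kettle MySQL connection.
# dbhost/dw are placeholders; pick the encoding that matches both databases.
JDBC_URL="jdbc:mysql://dbhost:3306/dw?useUnicode=true&characterEncoding=utf8"
echo "$JDBC_URL"
```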
The company has been involved in BI project development for a period of time, but only develops as needed. This afternoon the company gave everyone a training on data warehouse knowledge; we listened for two to three hours, what
2006-12-20
The Excel file in the source cannot be written when a data column is a combination of letters, and the date format contains "Noon". When the source file is an Excel worksheet, modify these columns ('A' is added to the