Kettle Beginner Study Notes 1: Preparatory Knowledge

Recently my advisor asked me to give a training session on Kettle to a junior labmate, which was instantly embarrassing. I had only just learned Kettle myself, barely knew the basics, and the last time I used it was last year...

I had no choice but to study it again. Fortunately, I had written a few documents earlier and kept a few lines of code, so I decided to put them on my blog, where they will be easier for me to look up later.

Data cleansing:

Data cleansing refers to the discovery and correction of identifiable errors in a data file, including checking data consistency, handling invalid values and missing values, and so on.

As the name suggests, data cleansing means washing away dirty data (discarding it) or cleaning it up (correcting it).
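To make "discard or correct" concrete, here is a minimal Java sketch of record-level cleansing. It is not Kettle's API or the author's code: the `name,age` input format, the class `CleansingSketch`, the default value `DEFAULT_AGE`, and the 0 to 150 valid-age range are all hypothetical choices for illustration. Structurally inconsistent rows and invalid values are discarded, while stray whitespace and a missing age are corrected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical cleansing rules for "name,age" records (illustration only, not Kettle's API).
class CleansingSketch {

    // A cleaned record: trimmed name and a validated age.
    record Person(String name, int age) {}

    // Hypothetical default used to fill in a missing age.
    static final int DEFAULT_AGE = 0;

    // Cleans one raw line; an empty Optional means the record is discarded.
    static Optional<Person> clean(String raw) {
        String[] fields = raw.split(",", -1);      // -1 keeps trailing empty fields
        if (fields.length != 2) {
            return Optional.empty();               // inconsistent structure: discard
        }
        String name = fields[0].trim();            // correct stray whitespace
        if (name.isEmpty()) {
            return Optional.empty();               // missing key value cannot be repaired: discard
        }
        String ageText = fields[1].trim();
        if (ageText.isEmpty()) {
            return Optional.of(new Person(name, DEFAULT_AGE)); // missing value: fill default
        }
        try {
            int age = Integer.parseInt(ageText);
            if (age < 0 || age > 150) {
                return Optional.empty();           // out-of-range (invalid) value: discard
            }
            return Optional.of(new Person(name, age));
        } catch (NumberFormatException e) {
            return Optional.empty();               // non-numeric (invalid) value: discard
        }
    }

    // Cleans a batch, keeping only records that survive (possibly after correction).
    static List<Person> cleanAll(List<String> rawLines) {
        List<Person> result = new ArrayList<>();
        for (String line : rawLines) {
            clean(line).ifPresent(result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("  Alice ,30", "Bob,", ",25", "Carol,abc", "Dave,200", "Eve,41,extra");
        for (Person p : cleanAll(raw)) {
            System.out.println(p.name() + " -> " + p.age());
        }
    }
}
```

Running `main` prints `Alice -> 30` and `Bob -> 0`: Alice's name is trimmed, Bob's missing age is filled with the default, and the other four rows are discarded. Whether a given problem should be corrected or discarded is a business decision; the rules above are only one possible choice.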

Just as putting an elephant into a refrigerator takes three steps, data cleansing can generally be divided into three steps:

ETL: Extract-Transform-Load. This describes the three stages of building a data warehouse: data extraction, data transformation, and data loading.

However, data cleansing is generally considered to refer only to the transformation stage.
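The three ETL stages can be sketched as three plain Java methods chained together. This is a toy illustration under stated assumptions, not how Kettle works internally: the CSV-style `city,amount` source, the class `EtlSketch`, the `Sale` record, and the in-memory list standing in for a warehouse table are all hypothetical. The cleansing work (dropping malformed rows, normalizing city names) happens only in `transform`, matching the view that cleansing belongs to the transformation stage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy Extract-Transform-Load pipeline (a sketch, not Kettle's implementation).
class EtlSketch {

    // One cleaned row destined for the hypothetical warehouse table.
    record Sale(String city, long amount) {}

    // Extract: read raw rows from the source, skipping the header line.
    static List<String> extract(List<String> source) {
        return source.isEmpty() ? List.of() : source.subList(1, source.size());
    }

    // Transform: parse, cleanse, and convert each raw row; rows that fail are dropped.
    static List<Sale> transform(List<String> rawRows) {
        List<Sale> rows = new ArrayList<>();
        for (String raw : rawRows) {
            String[] f = raw.split(",", -1);
            if (f.length != 2 || f[0].isBlank()) {
                continue;                                        // malformed row: discard
            }
            String city = f[0].trim().toUpperCase(Locale.ROOT);  // normalize spelling
            try {
                rows.add(new Sale(city, Long.parseLong(f[1].trim())));
            } catch (NumberFormatException e) {
                // non-numeric amount: discard the row
            }
        }
        return rows;
    }

    // Load: write the transformed rows into the target; a list stands in for a table.
    static int load(List<Sale> rows, List<Sale> targetTable) {
        targetTable.addAll(rows);
        return rows.size();
    }

    // Runs the three stages in order: extract, then transform, then load.
    static int run(List<String> source, List<Sale> targetTable) {
        return load(transform(extract(source)), targetTable);
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "city,amount",
                " beijing ,120",
                "Shanghai,abc",
                ",50",
                "shenzhen,80");
        List<Sale> warehouse = new ArrayList<>();
        int loaded = run(source, warehouse);
        System.out.println(loaded + " rows loaded: " + warehouse);
    }
}
```

With the sample input, two rows (`BEIJING`/120 and `SHENZHEN`/80) reach the target; the row with a non-numeric amount and the row with no city are dropped during transformation. A tool like Kettle lets you build this kind of pipeline graphically from reusable steps instead of hand-written methods.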

Kettle:

An open-source ETL tool written in pure Java.

The name Kettle literally means a kettle, the pot you boil water in: Matt, the project's lead programmer, wanted to pour all kinds of data into one pot and have it flow out in a specified format.

For downloads and usage help, visit: http://community.pentaho.com/projects/data-integration/

If you are interested in studying the Kettle source code, you can download it:

SVN address: svn://source.pentaho.org/svnkettleroot

Note: the SVN repository only contains version 5.0 and earlier; the project later migrated to GitHub.

Git address: https://github.com/pentaho/pentaho-kettle/

If you are interested in extending Kettle (secondary development), you may find this useful:

Online help manual (Javadoc): http://javadoc.pentaho.com/kettle/