This is my first blog post, so I am a little excited. There is surely plenty of room for improvement, so please bear with me.
Kettle is one of my favorite open source projects and the one I have studied in the most depth. It automates a lot of work that would otherwise be done by hand, and it is widely used for ETL data extraction. I am not familiar with every step that Kettle provides and only use the most common ones, but even so I was won over by how easy it is to use and how well it is designed.
Now to the point. I assume you are familiar with Java development and with basic use of Git and Kettle. The Kettle source was originally hosted on Kettle's official SVN and later migrated to GitHub, at https://github.com/pentaho/pentaho-kettle. I recommend registering a GitHub account and forking the kettle project to your own account. Then go to your working directory (e.g. E:/workspace) and clone the code locally with the following command:
git clone https://github.com/pentaho/pentaho-kettle
While the code is downloading, you can go to http://community.pentaho.com/projects/data-integration/ and download the latest Kettle release. Once the download is complete, follow the steps below to run the Kettle source code. JDK 7 or later is recommended.
- In Eclipse, import the existing projects: core, engine, UI, DB dialog, and the kettle project itself, as shown in the figure. The other projects, which appear closed, are not imported.
- Create a User Library in Eclipse and add all the jars from the lib directory of the downloaded, runnable Kettle release to it. Then remove the four jars that correspond to core, engine, UI, and DB dialog, and add swt.jar from the libswt\win32 directory to the user library, as shown in the figure.
- Right-click each imported project, select Build Path, and add the user library created in the previous step to the classpath. Then set up the project dependencies: engine, UI, and DB dialog all depend on core; UI also depends on engine and DB dialog; and the main kettle project depends on the other four projects and on the Kettle user library, as shown in the figure.
- Add the package-res directory under assembly as a source folder, as shown in the figure.
- With that, Kettle should run successfully from source.
- Figure: result of running the Kettle source.
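The dependency setup in the steps above is easy to get wrong, so here is a minimal Java sketch that writes it down as data and derives an order in which the projects can be built. It is only an illustration: the class name KettleProjectOrder and the lowercase project labels are my own, and nothing here calls Eclipse or Kettle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: models the Eclipse project dependencies described in the steps above. */
final class KettleProjectOrder {

    /** Project label -> labels of the projects it depends on (labels are made up for this sketch). */
    static Map<String, List<String>> dependencies() {
        Map<String, List<String>> deps = new LinkedHashMap<String, List<String>>();
        deps.put("kettle", Arrays.asList("core", "engine", "dbdialog", "ui"));
        deps.put("ui", Arrays.asList("core", "engine", "dbdialog"));
        deps.put("dbdialog", Arrays.asList("core"));
        deps.put("engine", Arrays.asList("core"));
        deps.put("core", Collections.<String>emptyList());
        return deps;
    }

    /** Depth-first topological sort: each project is listed after all of its dependencies. */
    static List<String> buildOrder(Map<String, List<String>> deps) {
        Set<String> done = new LinkedHashSet<String>();
        Set<String> inProgress = new LinkedHashSet<String>();
        for (String project : deps.keySet()) {
            visit(project, deps, done, inProgress);
        }
        return new ArrayList<String>(done);
    }

    private static void visit(String project, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(project)) {
            return;
        }
        if (!inProgress.add(project)) {
            throw new IllegalStateException("Dependency cycle at " + project);
        }
        List<String> required = deps.get(project);
        if (required != null) {
            for (String dep : required) {
                visit(dep, deps, done, inProgress);
            }
        }
        inProgress.remove(project);
        done.add(project);
    }

    public static void main(String[] args) {
        // Prints [core, engine, dbdialog, ui, kettle]
        System.out.println(buildOrder(dependencies()));
    }
}
```

If a project reports missing classes in Eclipse, comparing its Build Path against this list is a quick first check.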
That is how I run the Kettle source code. There are surely other ways, but this one should not be difficult; most of the effort goes into resolving the jar dependencies. Kettle manages the project with Ant. I usually use Maven and am not very familiar with Ant, and when I tried the Ant route, many of the jars were not downloaded at all.
I am only sharing one way to run the Kettle source here. To be honest, when I first started working with the Kettle source I could not get it to run at all; it only worked when I tried again after a long break. If the steps above seem like too much trouble, you can first create a User Library named kettle as described above, then download the preconfigured Kettle project from my GitHub, which in theory should work as downloaded. Here's how:
- Follow the steps above to download the Kettle release package and create the user library in Eclipse.
- Clone the code from https://github.com/ma459006574/pentaho-kettle.git, switch to the My_run branch, and import it into Eclipse. The only difference from the official repository is the .classpath files, so you can also diff them against the official ones and configure the projects yourself.
In later posts I will introduce the structure of the Kettle source, share some improvements to Kettle steps, and write about my own custom Kettle development (packaging part of Kettle's functionality into a web application) and the tools I wrote along the way. Let's make progress together.
Here are some of the areas I found in need of improvement when using Kettle:
- Modify the "load file content into memory" step so that the file content is binary by default. That way images, compressed files, and so on can be copied into a database.
- In the same step, convertRowMeta and outputRowMeta do not take the string encoding (stringEncoding) into account, and neither do the related clones.
- Transformations: support the #{} syntax for taking a field value from the previous step.
- JavaScript step: allow users to import additional packages.
- Excel 2007 support is poor; neither of the two parsers for the Excel 2007 format works well.
- Extract methods from the UI code into utility classes to make custom web development easier.
- Once a connection has failed, Kettle keeps failing to connect even after the data source is back to normal. A timed reconnection could be used instead, although the change touches many parts of the code.
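To make the last item a bit more concrete, here is a minimal Java sketch of one way a connection holder could retry after a failure instead of staying broken. It is not Kettle code and does not use any Kettle API: RetryingConnection, Clock, and invalidate are hypothetical names of my own. It also uses a time-gated retry on demand, which is a simpler stand-in for a real background timer.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper, not part of Kettle: after a failed connect it allows a new
 * attempt once a minimum interval has passed, instead of failing forever.
 */
final class RetryingConnection<T> {

    /** Time source in milliseconds; injectable so the sketch can be tested without sleeping. */
    interface Clock {
        long nowMillis();
    }

    private final Callable<T> connector;      // knows how to open a connection
    private final long retryIntervalMillis;   // minimum wait between failed attempts
    private final Clock clock;

    private T connection;                      // null until an attempt succeeds
    private Exception lastError;               // cause of the most recent failure, if any
    private long lastAttemptAt;                // time of the most recent attempt
    private boolean attempted;                 // whether any attempt has been made yet

    RetryingConnection(Callable<T> connector, long retryIntervalMillis, Clock clock) {
        this.connector = connector;
        this.retryIntervalMillis = retryIntervalMillis;
        this.clock = clock;
    }

    /** Returns the open connection, trying to reconnect if the last failure is old enough. */
    synchronized T get() {
        if (connection != null) {
            return connection;
        }
        long now = clock.nowMillis();
        if (attempted && now - lastAttemptAt < retryIntervalMillis) {
            throw new IllegalStateException("Connection unavailable, retry not due yet", lastError);
        }
        attempted = true;
        lastAttemptAt = now;
        try {
            connection = connector.call();
            lastError = null;
            return connection;
        } catch (Exception e) {
            lastError = e;
            throw new IllegalStateException("Connect attempt failed", e);
        }
    }

    /** Call when an open connection turns out to be broken, so the next get() reconnects. */
    synchronized void invalidate() {
        connection = null;
        attempted = false;
    }
}
```

In real use the Clock would simply return System.currentTimeMillis(), and the Callable would open whatever connection is needed. Throttling the retries keeps a data source that is still down from being hammered on every call.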
Kettle Series 1: Obtaining and running the Kettle source code