ETL Tool Pentaho Kettle: Transformation and Job Integration
1. Kettle
1.1. Introduction
Kettle is an open-source ETL tool written in pure Java. It extracts data efficiently and reliably, and is often used as a data-migration tool. Kettle has two types of script files: transformations (.ktr) and jobs (.kjb). A transformation performs the actual data conversion, while a job controls the overall workflow.
2. Integrated Development
2.1. Transformation Parsing and Execution
// Initialize the Kettle environment and load the configuration
KettleEnvironment.init();
// File path and file name
String filename = "/foo/bar/trans.ktr";
// Parse the transformation File
TransMeta transmeta = new TransMeta(filename);
// Load transformation
Trans trans = new Trans(transmeta);
// Execute transformation in an independent thread. "null" can be replaced by a parameter set.
trans.execute(null);
// Wait until transformation execution is complete
trans.waitUntilFinished();
// Obtain the execution result
Result result = trans.getResult();
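The steps above can be assembled into a minimal, self-contained runner. This is a sketch, not a definitive implementation: the class name, the success check on the error count, and the console output are illustrative assumptions; the file path is the example path from the text.

```java
// Minimal sketch of a file-based transformation runner.
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.Result;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class TransRunner {

    // Illustrative success check (assumption: zero errors means success).
    static boolean succeeded(long nrErrors) {
        return nrErrors == 0;
    }

    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment and load the configuration
        KettleEnvironment.init();
        // Parse the transformation file
        TransMeta transMeta = new TransMeta("/foo/bar/trans.ktr");
        // Load the transformation and execute it in a separate thread
        Trans trans = new Trans(transMeta);
        trans.execute(null); // null = no replacement parameter set
        // Wait until the transformation finishes
        trans.waitUntilFinished();
        // Inspect the execution result
        Result result = trans.getResult();
        System.out.println("errors=" + result.getNrErrors()
                + " success=" + succeeded(result.getNrErrors()));
    }
}
```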
2.2. Job Parsing and Execution
// Initialize the Kettle environment and load the configuration
KettleEnvironment.init();
// File path and file name
String filename = "/foo/bar/jobn.kjb";
// Parse the job file without using a repository
JobMeta jobmeta = new JobMeta(filename, null, null);
// Load the job
Job job = new Job(null, jobmeta);
// Execute a job in an independent thread
job.start();
// Wait until job execution is complete
job.waitUntilFinished();
// Obtain the execution result
Result result = job.getResult();
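The job flow mirrors the transformation flow; a hedged sketch that also reports the outcome (the class name and the outcome summary are assumptions, the file path and the null arguments follow the text):

```java
// Minimal sketch of a file-based job runner.
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.Result;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class JobRunner {

    // Illustrative outcome summary (assumption: zero errors means success).
    static String outcome(long nrErrors) {
        return nrErrors == 0 ? "SUCCESS" : "FAILED (" + nrErrors + " errors)";
    }

    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment and load the configuration
        KettleEnvironment.init();
        // Parse the job file; the nulls mean no repository is used, as in the text
        JobMeta jobMeta = new JobMeta("/foo/bar/jobn.kjb", null, null);
        // Create the job without a repository and run it in a separate thread
        Job job = new Job(null, jobMeta);
        job.start();
        // Wait until the job finishes, then inspect the result
        job.waitUntilFinished();
        Result result = job.getResult();
        System.out.println(outcome(result.getNrErrors()));
    }
}
```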
2.3. Repository-Based Integration
// Initialize the Kettle environment and load the configuration
KettleEnvironment.init();
// Initialize the repository plug-in types
PluginRegistry.init();
// Read the repository definitions
RepositoriesMeta repositoriesMeta = new RepositoriesMeta();
repositoriesMeta.readData();
// Optionally list the defined repositories
/*
for (int i = 0; i < repositoriesMeta.nrRepositories(); i++) {
    RepositoryMeta rinfo = repositoriesMeta.getRepository(i);
    System.out.println("#" + (i + 1) + " : " + rinfo.getName() + " [" + rinfo.getDescription() + "] id=" + rinfo.getId());
}
*/
// Look up the repository by its name, "1.0"
RepositoryMeta repositoryMeta = repositoriesMeta.findRepository("1.0");
// Obtain the PluginRegistry instance
PluginRegistry registry = PluginRegistry.getInstance();
// Load the repository implementation class
Repository repository = registry.loadClass(RepositoryPluginType.class, repositoryMeta, Repository.class);
// Initialize the repository
repository.init(repositoryMeta);
// Load the repository directory tree
RepositoryDirectoryInterface directory = repository.loadRepositoryDirectoryTree();
// Load the job "job2" from the repository
JobMeta jobmeta = repository.loadJob("job2", directory, null, null);
// Create the job against the repository
Job job = new Job(repository, jobmeta);
// Execute the job in a separate thread
job.start();
// Wait until job execution is complete
job.waitUntilFinished();
// Obtain the execution result
Result result = job.getResult();
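In practice a repository run usually needs an explicit connect before loading and a disconnect afterwards. A hedged sketch of the full flow follows; the repository name "1.0" and job name "job2" come from the text, while the class name, the placeholder credentials, the missing-repository check, and the connect/disconnect placement are assumptions:

```java
// Sketch: load and run a job from a named Kettle repository.
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.Result;
import org.pentaho.di.core.plugins.PluginRegistry;
import org.pentaho.di.core.plugins.RepositoryPluginType;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;
import org.pentaho.di.repository.RepositoriesMeta;
import org.pentaho.di.repository.Repository;
import org.pentaho.di.repository.RepositoryDirectoryInterface;
import org.pentaho.di.repository.RepositoryMeta;

public class RepoJobRunner {

    // Illustrative error message for a repository name that is not defined.
    static String notFoundMessage(String name) {
        return "Repository '" + name + "' is not defined in repositories.xml";
    }

    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        // Read the repository definitions and look up the one named "1.0"
        RepositoriesMeta repositoriesMeta = new RepositoriesMeta();
        repositoriesMeta.readData();
        RepositoryMeta repositoryMeta = repositoriesMeta.findRepository("1.0");
        if (repositoryMeta == null) {
            throw new IllegalStateException(notFoundMessage("1.0"));
        }
        // Load and initialize the repository implementation
        Repository repository = PluginRegistry.getInstance()
                .loadClass(RepositoryPluginType.class, repositoryMeta, Repository.class);
        repository.init(repositoryMeta);
        // Placeholder credentials; real deployments read these from configuration
        repository.connect("admin", "admin");
        try {
            // Load the job "job2" from the directory tree and run it
            RepositoryDirectoryInterface directory = repository.loadRepositoryDirectoryTree();
            JobMeta jobMeta = repository.loadJob("job2", directory, null, null);
            Job job = new Job(repository, jobMeta);
            job.start();
            job.waitUntilFinished();
            Result result = job.getResult();
            System.out.println("errors=" + result.getNrErrors());
        } finally {
            repository.disconnect();
        }
    }
}
```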
2.4. Log Integration
Log output preparation:
// Instantiate a FileLoggingEventListener that writes log events to the tran.log file
// (second argument: true = append to the file, false = overwrite it)
FileLoggingEventListener fileLoggingEventListener = new FileLoggingEventListener("tran.log", true);
Running result log output:
// Obtain the result files produced by the run
Map<String, ResultFile> map = result.getResultFiles();
// Create a log channel
LogChannelInterface log = new LogChannel("running result");
// Traverse the result files and write each file name to the log
for (String key : map.keySet()) {
    // Obtain the ResultFile object
    ResultFile rf = map.get(key);
    // Write the file name to the log
    log.logBasic(rf.getFile().getName().toString());
}
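Putting the two halves above together: a teardown-safe sketch, under the assumption (not stated in the original) that the listener is registered with Kettle's central log buffer via KettleLogStore and deregistered and closed when done; the class name and the log-line formatting are also assumptions:

```java
// Sketch: file-based log capture around result-file logging.
import java.util.Map;

import org.pentaho.di.core.Result;
import org.pentaho.di.core.ResultFile;
import org.pentaho.di.core.logging.FileLoggingEventListener;
import org.pentaho.di.core.logging.KettleLogStore;
import org.pentaho.di.core.logging.LogChannel;
import org.pentaho.di.core.logging.LogChannelInterface;

public class ResultLogger {

    // Illustrative formatting of one result-file log line.
    static String resultFileLine(String fileName) {
        return "result file: " + fileName;
    }

    public static void logResultFiles(Result result) throws Exception {
        // Append Kettle log output to tran.log (true = append)
        FileLoggingEventListener listener = new FileLoggingEventListener("tran.log", true);
        // Assumption: register the listener with the central log buffer
        KettleLogStore.getAppender().addLoggingEventListener(listener);
        try {
            LogChannelInterface log = new LogChannel("running result");
            Map<String, ResultFile> files = result.getResultFiles();
            // Write one line per result file
            for (ResultFile rf : files.values()) {
                log.logBasic(resultFileLine(rf.getFile().getName().toString()));
            }
        } finally {
            // Deregister and close the file listener when done
            KettleLogStore.getAppender().removeLoggingEventListener(listener);
            listener.close();
        }
    }
}
```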