Use Kettle to convert an XML document into a data table structure
Kettle can read and parse XML files with either the Get data from XML step or the XML Input Stream (StAX) step. The Get data from XML step parses with DOM, which consumes a lot of memory and is not suitable for large files. The XML Input Stream (StAX) step parses the file as a stream instead, so it can handle large and complex files and load data quickly. We therefore recommend this step for such files.
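The difference between the two parsing styles can be illustrated outside Kettle. The following is a minimal Python sketch, not Kettle code: `xml.etree.ElementTree.parse` builds the whole tree in memory (similar in spirit to DOM), while `iterparse` emits events as the document is read (similar in spirit to StAX). The shortened sample document and the function names `read_tree` and `read_stream` are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample document used only for this illustration.
SAMPLE = b"""<?xml version="1.0"?>
<Timeseries>
  <Measurement year="2000"><Item name="A">8.5</Item><Item name="B">9.8</Item></Measurement>
  <Measurement year="2001"><Item name="A">12.2</Item><Item name="B">9.4</Item></Measurement>
</Timeseries>"""


def read_tree(data):
    """Tree-style parsing: the whole document is held in memory at once."""
    root = ET.parse(io.BytesIO(data)).getroot()
    return [(m.get("year"), i.get("name"), i.text)
            for m in root.iter("Measurement") for i in m.iter("Item")]


def read_stream(data):
    """Stream-style parsing: handle each element as it ends, then discard it."""
    year = None
    for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        if event == "start" and elem.tag == "Measurement":
            year = elem.get("year")  # attributes are available at the start event
        elif event == "end" and elem.tag == "Item":
            yield (year, elem.get("name"), elem.text)
        elif event == "end" and elem.tag == "Measurement":
            elem.clear()  # drop the finished subtree to keep memory use low


tree_rows = read_tree(SAMPLE)
stream_rows = list(read_stream(SAMPLE))
```

Both functions return the same rows here; the difference only matters when the file is too large to hold comfortably in memory.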
The following example shows how to use this step. The content of the source XML file is as follows:
<?xml version="1.0"?>
<Timeseries>
  <Measurement year="2000" type="parmname" text="parmname">
    <!-- Value of item named A in 2000 is 8.5 -->
    <Item name="A">8.5</Item>
    <Item name="B">9.8</Item>
  </Measurement>
  <Measurement year="2001" type="parmname" text="parmname">
    <Item name="A">12.2</Item>
    <Item name="B">9.4</Item>
  </Measurement>
  <Measurement year="2002" type="parmname" text="parmname">
    <Item name="A">11.1</Item>
    <Item name="B">7.2</Item>
  </Measurement>
</Timeseries>
The format of the data to be parsed into a data table is as follows:
The transformation uses the following steps:
XML Input Stream (StAX): loads the XML document in streaming mode
Filter rows: removes irrelevant document elements
Switch / Case: separates level 1 (Measurement) rows from level 2 (Item) rows
Row denormaliser: converts the multiple level 2 rows into one row (rows to columns)
Merge Join: merges the child elements into the level 1 row (adds columns)
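The first three steps can be sketched in Python as follows. This is an illustration of the idea, not Kettle code: the row fields (`element_id`, `parent_id`, `level`, `kind`, `name`, `value`) and the kind labels are hypothetical names chosen for this sketch and are not claimed to match the fields the Kettle step emits. The order of the emitted rows is also simplified.

```python
import io
import xml.etree.ElementTree as ET

XML = b"""<Timeseries>
<Measurement year="2000" type="parmname" text="parmname">
<Item name="A">8.5</Item>
<Item name="B">9.8</Item>
</Measurement>
</Timeseries>"""


def stream_rows(data):
    """Flatten the XML event stream into rows (hypothetical field names)."""
    stack = []    # ids of the elements that are currently open
    next_id = 0
    for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        if event == "start":
            parent = stack[-1] if stack else None
            stack.append(next_id)
            next_id += 1
            base = {"element_id": stack[-1], "parent_id": parent,
                    "level": len(stack) - 1}
            yield {**base, "kind": "START_ELEMENT", "name": elem.tag, "value": None}
            for key, val in elem.attrib.items():
                yield {**base, "kind": "ATTRIBUTE", "name": key, "value": val}
        else:
            element_id = stack.pop()
            parent = stack[-1] if stack else None
            base = {"element_id": element_id, "parent_id": parent,
                    "level": len(stack)}
            text = (elem.text or "").strip()
            if text:
                yield {**base, "kind": "CHARACTERS", "name": elem.tag, "value": text}
            yield {**base, "kind": "END_ELEMENT", "name": elem.tag, "value": None}


# "Filter rows": keep only rows that carry data (attributes and text).
kept = [r for r in stream_rows(XML) if r["kind"] in ("ATTRIBUTE", "CHARACTERS")]

# "Switch / Case": route rows by nesting level.
by_level = {1: [], 2: []}
for row in kept:
    if row["level"] in by_level:
        by_level[row["level"]].append(row)
measurements, items = by_level[1], by_level[2]
```

After this, `measurements` holds the attribute rows of the Measurement element and `items` holds the name and value rows of its Item children, each carrying the `parent_id` needed to bring them back together later.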
The example can be downloaded here.
Row denormaliser step
The easiest way to understand how this step works is to preview the results step by step (in version 5.x you can view the data stream directly) and see what each step does to the rows.
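The effect of turning rows into columns can also be sketched in Python. This is a simplified illustration, not the Kettle implementation: the input tuples, the `parent_id` values, and the function name `denormalise` are assumptions. The sketch relies on the input being grouped by `parent_id`, because `itertools.groupby` only groups consecutive rows.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical level 2 rows: (parent_id, item_name, item_value).
item_rows = [
    (1, "A", "8.5"), (1, "B", "9.8"),
    (4, "A", "12.2"), (4, "B", "9.4"),
    (7, "A", "11.1"), (7, "B", "7.2"),
]


def denormalise(rows, keys=("A", "B")):
    """Collapse the rows of each parent into one row with a column per key."""
    out = []
    for parent_id, group in groupby(rows, key=itemgetter(0)):
        record = {"parent_id": parent_id}
        record.update({k: None for k in keys})  # missing items stay None
        for _, name, value in group:
            if name in keys:
                record[name] = float(value)
        out.append(record)
    return out


wide = denormalise(item_rows)
```

Six input rows become three output rows, one per Measurement, each with an `A` and a `B` column.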
Merge Join step
This step merges two streams from different sources. The principle is the same as a join in SQL, except that Kettle applies it to streamed rows rather than tables. It is essential that both streams are sorted by the key used in the join. In this example the rows are already in sorted order when the XML file is loaded, so a "Sort rows" step is not needed.
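Why the sort order matters can be seen in a minimal Python sketch of a merge join. It is not the Kettle implementation: the function name `merge_join`, the `id` key, and the sample rows are assumptions, and the sketch only handles an inner join where each key appears at most once per stream.

```python
def merge_join(left, right, key):
    """Inner join of two row streams that are both sorted ascending by key.

    Simplifying assumption: each key value occurs at most once per stream.
    """
    left_it, right_it = iter(left), iter(right)
    l, r = next(left_it, None), next(right_it, None)
    while l is not None and r is not None:
        if l[key] < r[key]:
            l = next(left_it, None)      # left is behind, advance it
        elif l[key] > r[key]:
            r = next(right_it, None)     # right is behind, advance it
        else:
            yield {**l, **r}             # keys match: emit the combined row
            l = next(left_it, None)
            r = next(right_it, None)


# Hypothetical level 1 rows and denormalised level 2 rows, both sorted by id.
measurements = [
    {"id": 1, "year": "2000"},
    {"id": 4, "year": "2001"},
    {"id": 7, "year": "2002"},
]
items = [
    {"id": 1, "A": 8.5, "B": 9.8},
    {"id": 4, "A": 12.2, "B": 9.4},
    {"id": 7, "A": 11.1, "B": 7.2},
]

joined = list(merge_join(measurements, items, "id"))
unsorted_result = list(merge_join(measurements, items[::-1], "id"))
```

Because each stream is read only once and never rewound, an unsorted input silently loses matches: in this sketch `unsorted_result` contains a single row instead of three.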