I. Extracting data from HDFS to an RDBMS
1. Download the sample file from the address below.
http://wiki.pentaho.com/download/attachments/23530622/weblogs_aggregate.txt.zip?version=1&modificationDate=1327067858000
2. Use the following command to place the extracted weblogs_aggregate.txt file in the /user/grid/aggregate_mr/ directory of HDFS.
hadoop fs -put weblogs_aggregate.txt /user/grid/aggregate_mr/
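To confirm that the upload succeeded, you can list the target directory (an optional quick check):
hadoop fs -ls /user/grid/aggregate_mr/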
3. Open PDI and create a new transformation, as shown in Figure 1.
Figure 1
4. Edit the 'Hadoop File Input' step, as shown in Figures 2, 3 and 4.
Figure 2
Figure 3
Figure 4
Description:
- For configuring PDI to connect to the Hadoop cluster, refer to http://blog.csdn.net/wzy0623/article/details/51086821.
- Use tab as the delimiter character.
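Before configuring the step, you can preview the file from the shell to confirm the tab-delimited layout (an optional check; the line count is arbitrary):
hadoop fs -cat /user/grid/aggregate_mr/weblogs_aggregate.txt | head -5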
5. Edit the 'Table Output' step, as shown in Figure 5.
Figure 5
Description:
- mysql_local is a previously created local MySQL database connection; its settings are shown in Figure 6.
Figure 6
- The Database fields tab does not need to be configured.
6. Execute the following script to create the MySQL table.
use test;
create table aggregate_hdfs (
    client_ip   varchar(15),
    year        smallint,
    month_num   tinyint,
    pageviews   bigint
);
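One way to run the script is to save it to a file and feed it to the mysql command-line client (the root user and the file name create_aggregate_hdfs.sql are illustrative assumptions; substitute your own credentials):
mysql -u root -p < create_aggregate_hdfs.sql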
7. Save and run the transformation; the execution log is shown in Figure 7.
Figure 7
As you can see from Figure 7, the transformation has been executed successfully.
8. Query the MySQL table; the result is shown in Figure 8.
Figure 8
As you can see from Figure 8, the data has been extracted from HDFS into the MySQL table.
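A quick spot check from the shell shows the same result (the credentials are an assumption):
mysql -u root -p -e "select count(*) from test.aggregate_hdfs; select * from test.aggregate_hdfs limit 5;"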
II. Extracting data from Hive to an RDBMS
1. Execute the following script to create the Hive table.
create table weblogs (
    client_ip string,
    full_request_date string,
    day string,
    month string,
    month_num int,
    year string,
    hour string,
    minute string,
    second string,
    timezone string,
    http_verb string,
    uri string,
    http_status_code string,
    bytes_returned string,
    referrer string,
    user_agent string
)
row format delimited
fields terminated by '\t';
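As with the MySQL script, one way to run it is to save it to a file and execute it with the hive command-line client (the file name is an illustrative assumption; note that the table must end up in the test database to match the warehouse path used in step 3, so add a use test; statement at the top of the script if needed):
hive -f create_weblogs.sql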
2. Download the sample file from the address below.
http://wiki.pentaho.com/download/attachments/23530622/weblogs_parse.txt.zip?version=1&modificationDate=1327068013000
3. Use the following command to place the extracted weblogs_parse.txt file in the /user/hive/warehouse/test.db/weblogs/ directory of HDFS, which is the storage directory of the weblogs table.
hadoop fs -put weblogs_parse.txt /user/hive/warehouse/test.db/weblogs/
At this point, the data in the Hive table is as shown in Figure 9.
Figure 9
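The contents of Figure 9 can be reproduced from the shell with a small sample query (the limit is arbitrary):
hive -e "select * from test.weblogs limit 5;"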
4. Open PDI and create a new transformation, as shown in Figure 10.
Figure 10
5. Edit the 'Table Input' step, as shown in Figure 11.
Figure 11
Description: hive_101 is a previously created Hive database connection; its settings are shown in Figure 12.
Figure 12
Description: for configuring PDI to connect to Hadoop Hive 2, refer to http://blog.csdn.net/wzy0623/article/details/50903133.
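The exact query belongs in the SQL box of the Table Input step (Figure 11). Since the target table created in step 7 has the columns client_ip, year, month, month_num and pageviews, the step must produce rows of that shape; the following is only a sketch of such an aggregation, not necessarily the exact statement used here, and it can be previewed from the shell first:
hive -e "select client_ip, year, month, month_num, count(*) as pageviews from test.weblogs group by client_ip, year, month, month_num limit 10;"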
6. Edit the 'Table Output' step, as shown in Figure 13.
Figure 13
Description:
- mysql_local is the same local MySQL database connection used in section I; its settings are shown in Figure 6.
- The Database fields tab does not need to be configured.
7. Execute the following script to create the MySQL table.
use test;
create table aggregate_hive (
    client_ip   varchar(15),
    year        varchar(4),
    month       varchar(10),
    month_num   tinyint,
    pageviews   bigint
);
8. Save and run the transformation; the execution log is shown in Figure 14.
Figure 14
As you can see from Figure 14, the transformation has been executed successfully.
9. Query the MySQL table; the result is shown in Figure 15.
Figure 15
As you can see from Figure 15, the data has been extracted from Hive into the MySQL table.
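As a final consistency check, the MySQL row count should match the number of aggregated groups in Hive (assuming the grouping keys from the sketch in step 5; the credentials are an assumption):
mysql -u root -p -e "select count(*) from test.aggregate_hive;"
hive -e "select count(*) from (select client_ip, year, month, month_num from test.weblogs group by client_ip, year, month, month_num) t;"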
Reference:
http://wiki.pentaho.com/display/BAD/Extracting+Data+from+HDFS+to+Load+an+RDBMS
http://wiki.pentaho.com/display/BAD/Extracting+Data+from+Hive+to+Load+an+RDBMS