I. Extracting data from HDFS to an RDBMS
1. Download the sample file from the address below.
http://wiki.pentaho.com/download/attachments/23530622/weblogs_aggregate.txt.zip?version=1&modificationDate=1327067858000
2. Use the following command to place the extracted weblogs_aggregate.txt file in the /user/grid/aggregate_mr/ directory of HDFS.
hadoop fs -put weblogs_aggregate.txt /user/grid/aggregate_mr/
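To confirm that the upload succeeded, you can list the target directory (an optional quick check):
hadoop fs -ls /user/grid/aggregate_mr/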
3. Open PDI and create a new transformation, as shown in Figure 1.
Figure 1
4. Edit the 'Hadoop File Input' step, as shown in Figures 2, 3 and 4.
Figure 2
Figure 3
Figure 4
Description:
- For configuring PDI to connect to the Hadoop cluster, refer to http://blog.csdn.net/wzy0623/article/details/51086821.
- Use tab as the delimiter character.
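Before configuring the step, you can preview the file from the shell to confirm the tab-delimited layout (an optional check; the line count is arbitrary):
hadoop fs -cat /user/grid/aggregate_mr/weblogs_aggregate.txt | head -5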
5. Edit the 'Table Output' step, as shown in Figure 5.
Figure 5
Description:
- mysql_local is a previously created local MySQL database connection; its settings are shown in Figure 6.
Figure 6
- The Database fields tab does not need to be configured.
6. Execute the following script to create the MySQL table.
use test;
create table aggregate_hdfs (
    client_ip   varchar(15),
    year        smallint,
    month_num   tinyint,
    pageviews   bigint
);
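One way to run the script is to save it to a file and feed it to the mysql command-line client (the root user and the file name create_aggregate_hdfs.sql are illustrative assumptions; substitute your own credentials):
mysql -u root -p < create_aggregate_hdfs.sql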
7. Save and run the transformation; the execution log is shown in Figure 7.
Figure 7
As you can see from Figure 7, the transformation has been executed successfully.
8. Query the MySQL table; the result is shown in Figure 8.
Figure 8
As you can see from Figure 8, the data has been extracted from HDFS into the MySQL table.
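A quick spot check from the shell shows the same result (the credentials are an assumption):
mysql -u root -p -e "select count(*) from test.aggregate_hdfs; select * from test.aggregate_hdfs limit 5;"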
II. Extracting data from Hive to an RDBMS
1. Execute the following script to create the Hive table.
create table weblogs (
    client_ip string,
    full_request_date string,
    day string,
    month string,
    month_num int,
    year string,
    hour string,
    minute string,
    second string,
    timezone string,
    http_verb string,
    uri string,
    http_status_code string,
    bytes_returned string,
    referrer string,
    user_agent string
)
row format delimited
fields terminated by '\t';
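As with the MySQL script, one way to run it is to save it to a file and execute it with the hive command-line client (the file name is an illustrative assumption; note that the table must end up in the test database to match the warehouse path used in step 3, so add a use test; statement at the top of the script if needed):
hive -f create_weblogs.sql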
2. Download the sample file from the address below.
http://wiki.pentaho.com/download/attachments/23530622/weblogs_parse.txt.zip?version=1&modificationDate=1327068013000
3. Use the following command to place the extracted weblogs_parse.txt file in the /user/hive/warehouse/test.db/weblogs/ directory of HDFS, which is the storage directory of the weblogs table.
hadoop fs -put weblogs_parse.txt /user/hive/warehouse/test.db/weblogs/
At this point, the data in the Hive table is as shown in Figure 9.
Figure 9
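The contents of Figure 9 can be reproduced from the shell with a small sample query (the limit is arbitrary):
hive -e "select * from test.weblogs limit 5;"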
4. Open PDI and create a new transformation, as shown in Figure 10.
Figure 10
5. Edit the 'Table Input' step, as shown in Figure 11.
Figure 11
Description: hive_101 is a previously created Hive database connection; its settings are shown in Figure 12.
Figure 12
Description: for configuring PDI to connect to Hadoop Hive 2, refer to http://blog.csdn.net/wzy0623/article/details/50903133.
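The exact query belongs in the SQL box of the Table Input step (Figure 11). Since the target table created in step 7 has the columns client_ip, year, month, month_num and pageviews, the step must produce rows of that shape; the following is only a sketch of such an aggregation, not necessarily the exact statement used here, and it can be previewed from the shell first:
hive -e "select client_ip, year, month, month_num, count(*) as pageviews from test.weblogs group by client_ip, year, month, month_num limit 10;"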
6. Edit the 'Table Output' step, as shown in Figure 13.
Figure 13
Description:
- mysql_local is the same local MySQL database connection used in section I; its settings are shown in Figure 6.
- The Database fields tab does not need to be configured.
7. Execute the following script to create the MySQL table.
use test;
create table aggregate_hive (
    client_ip   varchar(15),
    year        varchar(4),
    month       varchar(10),
    month_num   tinyint,
    pageviews   bigint
);
8. Save and run the transformation; the execution log is shown in Figure 14.
Figure 14
As you can see from Figure 14, the transformation has been executed successfully.
9. Query the MySQL table; the result is shown in Figure 15.
Figure 15
As you can see from Figure 15, the data has been extracted from Hive into the MySQL table.
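As a final consistency check, the MySQL row count should match the number of aggregated groups in Hive (assuming the grouping keys from the sketch in step 5; the credentials are an assumption):
mysql -u root -p -e "select count(*) from test.aggregate_hive;"
hive -e "select count(*) from (select client_ip, year, month, month_num from test.weblogs group by client_ip, year, month, month_num) t;"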
Reference:
http://wiki.pentaho.com/display/BAD/Extracting+Data+from+HDFS+to+Load+an+RDBMS
http://wiki.pentaho.com/display/BAD/Extracting+Data+from+Hive+to+Load+an+RDBMS