Installing R on CentOS and integrating it with Hadoop: RHive configuration and installation manual


RHive is a package that extends R's computing capability with Hive's high-performance query engine. It lets you call HQL easily from the R environment, and also lets you use R objects and functions inside Hive. In principle, data processing capacity can scale out on the Hive platform; combined with the R environment, this makes an excellent setup for big-data analysis and mining.


Install

The installation of Hadoop and Hive is skipped here. This section describes how to install the R language on CentOS and how to integrate RHive with Hadoop.

This experiment uses eight nodes, so R and the related modules must be installed on every node. First, let's look at how to install R.

Download R-3.2.0.tar.gz from the package and unpack it.

Before compiling, install the following build dependencies by running:

yum install gcc-gfortran gcc-c++ libXt-devel openssl-devel readline-devel

RHive depends on Rserve, so when compiling and installing R we pass the --disable-nls --enable-R-shlib options (the latter builds R as a shared library, which Rserve requires):

cd R-3.2.0/
./configure --disable-nls --enable-R-shlib
make
make install
cd ../

Use R CMD INSTALL to install the rJava, Rserve, and RHive modules:

R CMD INSTALL rJava_0.9-6.tar.gz
R CMD INSTALL Rserve_1.8-3.tar.gz
R CMD INSTALL RHive_2.0-0.2.tar.gz

Note: if you have multiple nodes, install the above modules on the master and on every node.
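On a multi-node cluster, the per-node steps above can be sketched as a small ssh loop. This is only a dry-run sketch: RUN=echo makes it print each command instead of executing it (unset RUN to run for real), and the hostnames cloud02 through cloud08 are placeholders for your own slave nodes.

```shell
# Dry run: RUN=echo prints each command; unset RUN to really execute.
RUN=echo
# Placeholder hostnames -- replace with your cluster's slave nodes.
NODES="cloud02 cloud03 cloud04 cloud05 cloud06 cloud07 cloud08"

for node in $NODES; do
  # Build prerequisites for compiling R
  $RUN ssh "$node" "yum install -y gcc-gfortran gcc-c++ libXt-devel openssl-devel readline-devel"
  # Ship the sources, then build R with the shared-library flag Rserve needs
  $RUN scp R-3.2.0.tar.gz rJava_0.9-6.tar.gz Rserve_1.8-3.tar.gz RHive_2.0-0.2.tar.gz "$node:/tmp/"
  $RUN ssh "$node" "cd /tmp && tar xzf R-3.2.0.tar.gz && cd R-3.2.0 && ./configure --disable-nls --enable-R-shlib && make && make install"
  # Install the R packages RHive depends on
  $RUN ssh "$node" "R CMD INSTALL /tmp/rJava_0.9-6.tar.gz /tmp/Rserve_1.8-3.tar.gz /tmp/RHive_2.0-0.2.tar.gz"
done
```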

After the installation is complete, go to the Environment configuration section.

Configuration

1. Create a new RHive data storage path (on the local filesystem, not HDFS). Here it is /www/store/rhive/data.

2. Create an Rserv.conf file, write "remote enable" in it, and save it to a directory of your choice. Here it is stored as /www/cloud/R/Rserv.conf.

3. On the master and every node, modify /etc/profile to add the environment variable:

export RHIVE_DATA=/www/store/rhive/data

4. Upload all files in the lib directory under the R installation directory to the /rhive/lib directory in HDFS (create the directory first if it does not exist):

cd /usr/local/lib64/R/lib
hadoop fs -mkdir /rhive/lib
hadoop fs -put ./* /rhive/lib

Start

1. Run:

R CMD Rserve --RS-conf /www/cloud/R/Rserv.conf

telnet cloud01 6311

Then, from the master node, telnet to every slave node in turn. If Rsrv0103QAP1 is displayed, the connection is successful.
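The same handshake can be checked without an interactive telnet session. A minimal sketch, assuming bash (for the /dev/tcp redirection) and Rserve's default port 6311; the helper names here are my own, not part of Rserve:

```shell
# Rserve greets every new connection with an ID string that begins
# with the protocol banner "Rsrv0103" (the QAP1 suffix names the protocol).
is_rserve_banner() {
  case $1 in
    Rsrv0103*) return 0 ;;
    *)         return 1 ;;
  esac
}

# Read the first 8 bytes from host:6311 (bash's /dev/tcp) and report
# whether the Rserve banner came back.
probe_rserve() {
  reply=$( (exec 3<>"/dev/tcp/$1/6311" && head -c 8 <&3) 2>/dev/null )
  if is_rserve_banner "$reply"; then
    echo "$1: Rserve OK"
  else
    echo "$1: no Rserve banner"
  fi
}
```

From the master, something like `for n in cloud01 cloud02; do probe_rserve "$n"; done` then checks every slave in one pass.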

2. Start the Hive remote service. RHive connects to HiveServer through Thrift, so the background Thrift service must be running; that is, start the Hive remote service on the Hive client. If it is already running, skip this step.

nohup hive --service hiveserver &
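Before testing from R, it can be worth confirming that something is listening on the Thrift port at all (10000 by default). A minimal sketch, again assuming bash's /dev/tcp; the function name is my own:

```shell
# Succeeds if a TCP connection to $1 on the default Hive Thrift
# port (10000) can be opened, fails otherwise.
hive_thrift_up() {
  ( exec 3<>"/dev/tcp/$1/10000" ) 2>/dev/null
}
```

For example, `hive_thrift_up master && echo "Thrift service is up"`.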

RHive Test

library(RHive)
rhive.connect("master", 10000, hiveServer2 = TRUE)

Finished!

The RHive documentation is available at https://github.com/nexr/RHive/wiki/User-Guide


