Installing R with Hadoop integration on CentOS: RHive configuration and installation manual
RHive is an R package that extends R's computing capabilities with Hive's high-performance queries. It makes it easy to call HQL from the R environment, and R objects and functions can also be used in Hive. In theory, data processing capacity can scale without limit on the Hive platform; combined with the R environment, this makes an excellent setup for big data analysis and mining.
Resource package:
FTP address: ftp://ftp1.bkjia.com
Username: ftp1.bkjia.com
Password: www.bkjia.com
Install
The installation of Hadoop and Hive is assumed and skipped here. This section describes how to install the R language on CentOS and how to integrate RHive with Hadoop.
This experiment uses eight nodes, so R and the related modules must be installed on every node. First, let's look at how to install R.
Download R-3.2.0.tar.gz from the resource package and unpack it.
Install the following dependencies before compiling, by running:
yum install gcc-gfortran gcc-c++ libXt-devel openssl-devel readline-devel
RHive depends on Rserve, so when compiling and installing R we pass the --disable-nls and --enable-R-shlib options (the latter builds R as a shared library, which Rserve requires):
cd R-3.2.0/
./configure --disable-nls --enable-R-shlib
make
make install
cd ../
Use R CMD INSTALL to install rJava, Rserve, RHive, and the other modules:
R CMD INSTALL rJava_0.9-6.tar.gz
R CMD INSTALL Rserve_1.8-3.tar.gz
R CMD INSTALL RHive_2.0-0.2.tar.gz
Note: if you have multiple nodes, install the above modules on every node, including the master.
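On a multi-node cluster, pushing out and installing the same packages on every node can be scripted. A minimal sketch, assuming passwordless SSH is configured and the slave nodes are named cloud02 through cloud08 (hypothetical host names, not from this manual; adjust to your cluster):

```shell
# Hypothetical node list -- replace with your actual host names.
for node in cloud02 cloud03 cloud04 cloud05 cloud06 cloud07 cloud08; do
  # Copy the package tarballs to the node, then install them there.
  scp rJava_0.9-6.tar.gz Rserve_1.8-3.tar.gz RHive_2.0-0.2.tar.gz "$node":/tmp/
  ssh "$node" 'cd /tmp && R CMD INSTALL rJava_0.9-6.tar.gz Rserve_1.8-3.tar.gz RHive_2.0-0.2.tar.gz'
done
```

This assumes R has already been compiled and installed on each node as described above.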
After the installation is complete, go to the Environment configuration section.
Configuration
1. Create a new RHive data storage path (on the local filesystem, not in HDFS).
Here it is stored in /www/store/rhive/data.
2. Create the Rserv.conf file, write "remote enable" into it, and save it to a directory of your choice.
Here it is stored as /www/cloud/R/Rserv.conf.
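This step can be done directly from the shell; a minimal sketch, using the example directory from this manual:

```shell
# Create the directory and write the Rserve configuration file.
mkdir -p /www/cloud/R
cat > /www/cloud/R/Rserv.conf <<'EOF'
remote enable
EOF
```

The "remote enable" directive allows Rserve to accept connections from other hosts, which the master needs in order to reach the slave nodes.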
3. Modify /etc/profile on every node, including the master, to add the environment variable:
export RHIVE_DATA=/www/store/rhive/data
4. Upload all files in the lib directory under the R installation directory to the /rhive/lib directory in HDFS (if the directory does not exist, create it manually):
cd /usr/local/lib64/R/lib
hadoop fs -put ./* /rhive/lib
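After the upload, it is worth confirming that the files actually landed in HDFS; a quick check, assuming the hadoop client is on the PATH:

```shell
# Create the target directory if it is missing, then list what was uploaded.
hadoop fs -mkdir -p /rhive/lib
hadoop fs -ls /rhive/lib
```

The listing should show the shared libraries copied from /usr/local/lib64/R/lib.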
Start
1. Run Rserve with the configuration file on each node:
R CMD Rserve --RS-conf /www/cloud/R/Rserv.conf
Then, from the master node, telnet to every slave node:
telnet cloud01 6311
If "Rsrv0103QAP1" is displayed, the connection is successful.
2. Start the Hive remote service. RHive connects to HiveServer through Thrift, so the Thrift service must be started in the background, that is, the Hive remote service on the Hive client. If you have already done this, skip this step.
nohup hive --service hiveserver &
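To confirm that the Thrift service is actually up before testing RHive, you can check that something is listening on HiveServer's default port 10000; a quick check, assuming netstat is available:

```shell
# HiveServer's Thrift service listens on port 10000 by default.
netstat -nlp | grep 10000
```

If no listener appears, check nohup.out for errors from the hive command.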
RHive test
library(RHive)
rhive.connect("master", 10000, hiveServer2 = TRUE)
Finished!
The RHive documentation is available at https://github.com/nexr/RHive/wiki/User-Guide