RHive is an R package that extends R's computing power with Hive's high-performance queries. It makes it easy to run HQL from the R environment, and it also lets you use R objects and functions inside Hive. In theory, the Hive platform's data processing capacity can be scaled out almost without limit; combined with R as a data mining tool, it makes a very good working environment for big data analysis and mining.
Resource bundle:
http://pan.baidu.com/s/1ntwzeTb
Installation
Installing Hadoop and Hive is not covered here. This section describes how to install R on CentOS and how to integrate RHive with Hadoop.
This experiment uses 8 nodes, so R and its supporting modules must be installed on every node. First, let's look at how to install R.
Download R-3.2.0.tar.gz from the resource bundle and unpack it.
Make sure the following packages are installed before compiling.
Run:
yum install gcc-gfortran gcc gcc-c++ libXt-devel openssl-devel
RHive relies on Rserve, so R must be configured with the options --disable-nls and --enable-R-shlib when it is compiled and installed:
cd R-3.2.0/
./configure --disable-nls --enable-R-shlib
make
make install
cd ../
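Since Rserve links against R's shared library, it can be worth confirming that the build actually produced libR.so before going any further. This is only a minimal sketch: has_r_shlib is a helper name I made up, and lib/libR.so under the R home directory is where a Linux build configured with --enable-R-shlib normally puts the library.

```shell
# Return 0 if the R installation rooted at the given R home directory
# contains the shared library libR.so, and non-zero otherwise.
# (Hypothetical helper; the lib/libR.so location is assumed for Linux builds.)
has_r_shlib() {
    r_home=$1
    [ -f "$r_home/lib/libR.so" ]
}
```

Calling has_r_shlib "$(R RHOME)" && echo ok prints ok when the library is present; R RHOME prints the home directory of the installed R.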
Run the following R commands to install rJava, Rserve, and RHive:
R CMD INSTALL rJava_0.9-6.tar.gz
R CMD INSTALL Rserve_1.8-3.tar.gz
R CMD INSTALL RHive_2.0-0.2.tar.gz
Note: if you have multiple nodes, install the above modules on every node as well as on the master.
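With 8 nodes, repeating the three installs by hand gets tedious. Here is a rough sketch that only prints the ssh commands so they can be reviewed first. The node names are an assumption (only cloud01 appears in this article, the others are made up to match), print_install_cmds is a hypothetical helper name, and the sketch presumes passwordless ssh and that the tarballs have already been copied to the same directory on every node.

```shell
# Hypothetical node names: only cloud01 is mentioned in the article.
NODES="cloud01 cloud02 cloud03 cloud04 cloud05 cloud06 cloud07 cloud08"
# Package tarballs, in the same order as the manual install.
PKGS="rJava_0.9-6.tar.gz Rserve_1.8-3.tar.gz RHive_2.0-0.2.tar.gz"

# Print (do not run) one ssh command per node and package.
# $1 is the directory on each node that holds the tarballs.
print_install_cmds() {
    pkg_dir=$1
    for node in $NODES; do
        for pkg in $PKGS; do
            echo "ssh $node R CMD INSTALL $pkg_dir/$pkg"
        done
    done
}
```

Running print_install_cmds /root/pkgs (the directory is just an example) lists 24 commands, one per node and package; piping that output to sh would execute them.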
That completes the installation; next comes the environment configuration.
Configuration
1. Create the RHive data storage path (on the local filesystem, not HDFS)
I keep it in /www/store/rhive/data.
2. Create a new rserv.conf file containing the line "remote enable" and save it to a directory of your choice
I keep it in /www/cloud/r/rserv.conf.
3. Edit /etc/profile on every node and on the master to add the environment variable
export RHIVE_DATA=/www/store/rhive/data
4. Upload all files from the lib directory of the R installation to the /rhive/lib directory in HDFS (create the directory manually if it does not exist)
cd /usr/local/lib64/R/lib
hadoop fs -put ./* /rhive/lib
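Steps 1 to 3 have to be repeated on every machine, so they are easy to wrap in a few small functions. The following is a sketch, not the only way to do it: the function names are my own invention, and the paths are passed in as arguments rather than hard-coded. The third function only appends the RHIVE_DATA line if it is not already there, so it is safe to run twice.

```shell
# Step 1: create the local (non-HDFS) data directory.
make_data_dir() {
    mkdir -p "$1"
}

# Step 2: write an Rserve config file that allows remote connections.
write_rserv_conf() {
    conf=$1
    mkdir -p "$(dirname "$conf")"
    echo "remote enable" > "$conf"
}

# Step 3: append the RHIVE_DATA export to a profile file,
# unless exactly that line is already present.
add_rhive_env() {
    profile=$1
    data_dir=$2
    line="export RHIVE_DATA=$data_dir"
    grep -qxF "$line" "$profile" 2>/dev/null || echo "$line" >> "$profile"
}
```

With the paths used in this article, the calls (as root, on each node) would be make_data_dir /www/store/rhive/data, write_rserv_conf /www/cloud/r/rserv.conf and add_rhive_env /etc/profile /www/store/rhive/data.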
Start
1. Run on all nodes and on the master
R CMD Rserve --RS-conf /www/cloud/r/rserv.conf
telnet cloud01 6311
Then, from the master node, telnet to every slave node in the same way; a reply of Rsrv0103QAP1 indicates a successful connection.
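2. Start the Hive remote service. RHive connects to HiveServer over Thrift, so the Thrift service must be running in the background; in other words, start the Hive remote service on the Hive client. If it is already running, skip this step. (The test below connects with hiveServer2=TRUE; against a HiveServer2 instance the service name would be hiveserver2 instead.)
nohup hive --service hiveserver &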
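The telnet check from step 1 can also be scripted, which is handy when there are many slave nodes. This is a sketch under a few assumptions: is_rserve_banner and check_rserve are names I made up, the connection uses bash's /dev/tcp pseudo-device, and the timeout command from coreutils is assumed to be installed. It reads the first 12 bytes that the server sends and checks that they look like the Rsrv0103QAP1 banner.

```shell
# Return 0 if the string starts like an Rserve ID banner:
# "Rsrv", a 4-digit version (e.g. 0103), then the protocol name "QAP1".
is_rserve_banner() {
    case $1 in
        Rsrv[0-9][0-9][0-9][0-9]QAP1*) return 0 ;;
        *) return 1 ;;
    esac
}

# Connect to host:port (port defaults to 6311), read the first 12 bytes
# and test them. Gives up after 5 seconds.
check_rserve() {
    host=$1
    port=${2:-6311}
    banner=$(timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1" && head -c 12 <&3' \
        "$host" "$port" 2>/dev/null)
    is_rserve_banner "$banner"
}
```

On the master, something like check_rserve cloud01 && echo "cloud01 ok" can then be run for each slave node in turn.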
RHive test
library(RHive)
rhive.connect("master", 10000, hiveServer2=TRUE)
Done!
Finally, here is the address of the RHive documentation:
https://github.com/nexr/RHive/wiki/User-Guide
This article draws on the following pages:
http://yangqijun.com/archives/341
http://www.cnblogs.com/end/archive/2013/02/18/2916105.html
Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
Installing R on CentOS and integrating it with Hadoop: RHive installation and configuration manual