Hadoop + R + RHIPE Installation


Hadoop is now a very popular platform for big data processing, while R is a powerful tool for statistical analysis and data mining; R, however, falls short when it comes to big data. Parallel-computing solutions such as RHadoop and RHIPE fill that gap, so let's try installing RHIPE.

Installation Environment

Environment                Version
CentOS (64-bit)            6.5
Java JDK                   1.6.0_45
R                          3.1.2
Rhipe                      0.73
Google Protocol Buffers    2.4.1
Hadoop                     CDH3u6, pseudo-distributed
1. Installing the Java JDK
wget http://download.oracle.com/otn/java/jdk/6u45-b06/jdk-6u45-linux-x64.bin
chmod u+x jdk-6u45-linux-x64.bin
./jdk-6u45-linux-x64.bin
mv jdk-6u45-linux-x64 /usr/java/

Configuring environment variables

sudo vim ~/.bashrc
export JAVA_HOME=/usr/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
source ~/.bashrc
[hadoop1@localhost usr]$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

Create a Hadoop user group and grant the user sudo permissions

sudo addgroup hadoop1
sudo adduser -ingroup hadoop1 hadoop1
sudo gedit /etc/sudoers

Add a line for the hadoop1 user to the sudoers file, as sketched below.
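A minimal sketch of such a sudoers entry (the exact form depends on your distribution; this line simply mirrors the usual root entry):

hadoop1 ALL=(ALL) ALL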
2. Installing SSH

(Note: the addgroup/adduser commands above are Ubuntu-style and can cause problems elsewhere; on CentOS, SSH is installed with yum as follows.)

yum install openssh-server openssh openssh-clients
ssh-keygen -t rsa      (press Enter at every prompt)
cat ./.ssh/id_rsa.pub >> ./.ssh/authorized_keys

If ssh localhost still asks for a password after this configuration, consult the following checklist, taken from a FAQ on configuring password-free SSH when building a Hadoop environment:

1. The user's home directory must be 755 or 700, never 77*.

2. The .ssh directory must be 755.

3. The .pub file and authorized_keys must be 644.

4. The private key must be 600.

Points to note:
1. Create the .ssh folder under the home directory of the user that starts Hadoop, on both the master and the slaves.

2. On the master, generate the key pair with ssh-keygen -t rsa, rename the .pub file to authorized_keys, and distribute it to the .ssh directory of every slave.

And here is where the pitfall lies: the permissions on the .ssh directory, the public key, and the private key are strictly enforced, exactly as in the checklist above. One way to set them is sketched below.

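For example (a sketch assuming the hadoop1 user with home directory /home/hadoop1; adjust to your setup):

chmod 700 /home/hadoop1
chmod 755 /home/hadoop1/.ssh
chmod 644 /home/hadoop1/.ssh/id_rsa.pub
chmod 644 /home/hadoop1/.ssh/authorized_keys
chmod 600 /home/hadoop1/.ssh/id_rsa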
If the permissions are wrong, passwordless SSH may simply not work (the exact reason is left as a legacy issue here). If the steps above still do not get you there, some debugging is needed. There are two useful tools:
1. ssh -v: prints detailed debugging information about each step of the SSH connection.
2. /var/log/secure: the reason for the failure is recorded in this log.
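For instance (assuming the default sshd logging location on CentOS):

ssh -v localhost
sudo tail -n 50 /var/log/secure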
3. Installing Hadoop

wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz
tar -zxvf hadoop-0.20.2-cdh3u6.tar.gz

Modify the configuration files

hadoop-env.sh

vi hadoop-env.sh

Set the Java path:

# The Java implementation to use. Required.
export JAVA_HOME=/usr/java

vi core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

vi hdfs-site.xml

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

vi mapred-site.xml

<property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
</property>

Format the NameNode

bin/hadoop namenode -format

Start and test

bin/start-all.sh

[hadoop1@localhost usr]$ jps
3690 TaskTracker
18338 Jps
3402 DataNode
3296 NameNode
3589 JobTracker
3513 SecondaryNameNode
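As an extra sanity check (a sketch; the directory name used here is arbitrary), you can exercise HDFS directly:

bin/hadoop fs -mkdir /tmp/rhipe-test
bin/hadoop fs -ls /tmp
bin/hadoop dfsadmin -report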
4. Installing R

Installing R on CentOS is tricky; downloading and building it yourself kept erroring out, so the solution found was to install it from the EPEL repository:
0. Install the yum priorities plug-in

yum install yum-priorities

1. Install EPEL

rpm -Uvh http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm -Uvh http://rpms.famillecollet.com/enterprise/remi-release-6.rpm

Adjust the URLs above to match your environment.

2. Check whether the installation succeeded

rpm -q epel-release

3. Import the key:

rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6

4. Modify the /etc/yum.repos.d/epel.repo file:
add the attribute priority=11 at the end of the [epel] section.

vi /etc/yum.repos.d/epel.repo

With this priority, yum looks in the official repositories first and only falls back to the EPEL source when a package is not found there.
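A sketch of what the [epel] section might look like after the edit (everything else stays as shipped by the epel-release package):

[epel]
...            (existing settings unchanged)
priority=11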

5. Rebuild the cache

yum makecache

Install the R language

yum install -y R
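A quick check that R is installed (the reported version should match the 3.1.2 listed in the environment table):

R --version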
5. Installing RHIPE (reference URL)

Preliminary setup: a stock CentOS system is missing several libraries, which makes the installation fail. The fixes collected from various references:
yum install gcc-gfortran      # otherwise it reports "configure: error: No F77 compiler found"
yum install gcc gcc-c++       # otherwise it reports "configure: error: C++ preprocessor "/lib/cpp" fails sanity check"
yum install readline-devel    # otherwise it reports "--with-readline=yes (default) and headers/libs are not available"
yum install libXt-devel       # otherwise it reports "configure: error: --with-x=yes (default) and X11 headers/libs are not available"

Perform the following steps as the root user; otherwise you will run into permission errors.
1. Download Google Protocol Buffers 2.4.1

wget http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
tar -zxvf protobuf-2.4.1.tar.gz
cd protobuf-2.4.1
./configure
make && make install

The libraries are installed under /usr/local/lib. Run the following commands; if the output looks like this, the installation is correct:

pkg-config --modversion protobuf
2.4.1
pkg-config --libs protobuf
-pthread -L/usr/local/lib -lprotobuf -lpthread

2. Modify the system environment

cd /etc/ld.so.conf.d
vi protobuf-x86.conf

Add the following line:

/usr/local/lib

Save, then run:

/sbin/ldconfig
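To confirm the linker now finds the library, check the ldconfig cache:

/sbin/ldconfig -p | grep protobuf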

Add the following environment variables:

vi /etc/profile
export HADOOP_HOME=/home/hadoop1/hadoop-cdh3
export HADOOP_BIN=$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/:/usr/local/lib64/pkgconfig/

Also add these environment variables to R's own startup environment.
This is key: if they are not set, library(Rhipe) fails with errors such as

HADOOP_HOME missing

and the like.

The setup is as follows:

sudo vim /usr/lib64/R/etc/Renviron

Append at the end:

HADOOP_HOME=/home/hadoop1/hadoop-cdh3
HADOOP_BIN=$HADOOP_HOME/bin
HADOOP_CONF_DIR=$HADOOP_HOME/conf
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/:/usr/local/lib64/pkgconfig/

source /etc/profile

3. Installing rJava

wget http://cran.r-project.org/src/contrib/rJava_0.9-6.tar.gz
R CMD javareconf                  # R reads the Java configuration from the system environment variables
R CMD INSTALL rJava_0.9-6.tar.gz
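A quick check that rJava installed and can start a JVM (.jinit() initializes the JVM from within R):

R -e 'library(rJava); .jinit()'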

4. Installing Rhipe

wget http://ml.stat.purdue.edu/rhipebin/Rhipe_0.73.5.tar.gz
R CMD INSTALL Rhipe_0.73.5.tar.gz

5. Verification
Switch back to the hadoop1 user (root is also fine), make sure Hadoop is running properly, and start R:

[hadoop1@localhost usr]$ R
> library(Rhipe)
------------------------------------------------
| Please call rhinit() else Rhipe will not run |
------------------------------------------------
> rhinit()
Rhipe: using Rhipe.jar file
Initializing Rhipe v0.73
Initializing mapfile caches

Success.
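As a further smoke test (assuming rhls() behaves in this Rhipe version as documented), listing an HDFS path from within R should return the files created earlier:

> rhls("/")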

However, I still ran into the following problem after library(Rhipe):

HADOOP_HOME missing (and similar messages)

Even though the environment variables were already set, exporting the variable directly in the shell fixed it; do the same for any other environment variable reported as missing:

[hadoop1@localhost usr]$ export HADOOP_HOME=/home/hadoop1/hadoop-cdh3/

Problem solved.
