Hadoop is now a very popular platform for big data processing, and R is a powerful tool for statistical analysis and data mining, but R on its own is weak at big data processing. For parallel computing there are the RHadoop and Rhipe solutions; here we try installing Rhipe.

Installation Environment
Environment | Version
CentOS (64-bit) | 6.5
Java JDK | 1.6.0_45
R | 3.1.2
Rhipe | 0.73
Google Protocol Buffers | 2.4.1
Hadoop | CDH3u6, pseudo-distributed
1. Installing the Java JDK
wget http://download.oracle.com/otn/java/jdk/6u45-b06/jdk-6u45-linux-x64.bin
chmod u+x jdk-6u45-linux-x64.bin
./jdk-6u45-linux-x64.bin
mv jdk1.6.0_45 /usr/java/
Configure the environment variables:
sudo vim ~/.bashrc
export JAVA_HOME=/usr/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
source ~/.bashrc
[hadoop1@localhost usr]$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
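The variable setup above can be sanity-checked before moving on; this is a minimal sketch (it assumes the JDK was moved to /usr/java as in the previous step):

```shell
# Re-export the variables from ~/.bashrc and verify they resolved as intended.
export JAVA_HOME=/usr/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

echo "$JAVA_HOME"      # → /usr/java
echo "${PATH%%:*}"     # first PATH entry → /usr/java/bin
```

If `java -version` still reports the wrong version afterwards, the usual culprit is an earlier PATH entry shadowing ${JAVA_HOME}/bin.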
Create a Hadoop user group, add the user, and grant it sudo permissions:
sudo addgroup hadoop1
sudo adduser --ingroup hadoop1 hadoop1
sudo gedit /etc/sudoers
Add the entry for hadoop1 to the file (e.g. hadoop1 ALL=(ALL) ALL)
2. Installing SSH
(The Ubuntu instructions cause problems here; on CentOS use yum:)
yum install openssh-server openssh openssh-clients
ssh-keygen -t rsa   (press Enter at every prompt)
cat ./.ssh/id_rsa.pub >> ./.ssh/authorized_keys
If ssh localhost still asks for a password after this configuration, consult the following solution (a frequently asked question when setting up password-free SSH for a Hadoop environment):
1. User directory: 755 or 700, never 77*
2. .ssh directory: 755
3. .pub file or authorized_keys: 644
4. Private key: 600
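The permission rules above can be sketched as follows. This uses a throwaway directory so it is safe to run anywhere; substitute the real user home (e.g. /home/hadoop1) in practice, and note the key files would normally come from `ssh-keygen -t rsa`:

```shell
# Apply the strict permission scheme required for passwordless SSH.
DEMO_HOME=$(mktemp -d)
mkdir -p "$DEMO_HOME/.ssh"
touch "$DEMO_HOME/.ssh/id_rsa" "$DEMO_HOME/.ssh/id_rsa.pub"   # normally created by ssh-keygen -t rsa
cat "$DEMO_HOME/.ssh/id_rsa.pub" >> "$DEMO_HOME/.ssh/authorized_keys"

chmod 700 "$DEMO_HOME"                        # user directory: 700 (or 755), never 77*
chmod 755 "$DEMO_HOME/.ssh"                   # .ssh directory: 755
chmod 644 "$DEMO_HOME/.ssh/authorized_keys"   # public side: 644
chmod 600 "$DEMO_HOME/.ssh/id_rsa"            # private key: 600

stat -c '%a' "$DEMO_HOME/.ssh/id_rsa"         # → 600
```

sshd refuses to honor authorized_keys when the surrounding directories are group- or world-writable, which is why the user directory must not be 77*.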
Notes
1. Create the .ssh folder under the home directory of the startup user on both the master and the slaves.
2. On the master, run ssh-keygen -t rsa, rename the .pub file to authorized_keys, and distribute it to each slave's .ssh directory.
This is where the pitfall lies: permissions on the .ssh directory, public key, and private key are strictly enforced:
1. User directory: 755 or 700, never 77*
2. .ssh directory: 755
3. .pub file or authorized_keys: 644
4. Private key: 600
If the permissions are wrong, passwordless ssh may fail (the exact reason remains an open question here). If the steps above still do not achieve the goal, debugging is necessary; two tools help:
1. ssh -v: prints detailed debugging information for the ssh connection
2. /var/log/secure: the failure reason is logged here

3. Installing Hadoop
wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz
tar -zxvf hadoop-0.20.2-cdh3u6.tar.gz
Modify the configuration files.
hadoop-env.sh:
vi hadoop-env.sh
Modify the Java path:
# The java implementation to use. Required.
export JAVA_HOME=/usr/java
vi core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/tmp</value>
    <description>.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
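A quick way to confirm the value actually written to core-site.xml is to extract it with grep/sed. This sketch works on a here-document copy of the file so it is self-contained; point CONF at the real conf/core-site.xml in practice:

```shell
# Extract fs.default.name's value and confirm it points at the local HDFS.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

grep -A1 '<name>fs.default.name</name>' "$CONF" \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
# → hdfs://localhost:9000
```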
vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
vi mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Format the NameNode:
bin/hadoop namenode -format
Start and test
bin/start-all.sh
[hadoop1@localhost usr]$ jps
3690 TaskTracker
18338 Jps
3402 DataNode
3296 NameNode
3589 JobTracker
3513 SecondaryNameNode
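A pseudo-distributed node is healthy only when all five daemons appear in jps. The check can be scripted; the sample output below is hardcoded from the listing above (in practice, capture it with `jps_out=$(jps)`):

```shell
# Hypothetical health check: every expected Hadoop daemon must appear in jps output.
jps_out="3690 TaskTracker
3402 DataNode
3296 NameNode
3589 JobTracker
3513 SecondaryNameNode"   # in practice: jps_out=$(jps)

for d in TaskTracker DataNode NameNode JobTracker SecondaryNameNode; do
  if echo "$jps_out" | grep -q "$d"; then
    echo "$d: up"
  else
    echo "$d: MISSING"
  fi
done
```

If DataNode or NameNode is missing, the most common causes are a stale hadoop.tmp.dir from a previous format, or the SSH permission problems described earlier.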
4. Installing R
Installing R on CentOS is troublesome: downloading and building it by hand kept failing, so the solution is to use the EPEL repository.
0. Install the yum priorities plug-in
yum install yum-priorities
1. Install the EPEL repositories
rpm -Uvh http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm -Uvh http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
Please amend the URLs above according to your actual mirror and architecture.
2. Check that the installation succeeded
rpm -q epel-release
3. Import the key:
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
4. Modify the /etc/yum.repos.d/epel.repo file
vi /etc/yum.repos.d/epel.repo
Add priority=11 at the end of the [epel] section, so yum tries the official repositories first and only falls back to EPEL for packages the official repositories do not provide.
5. Rebuild the cache
yum makecache
Install the R language:
yum install -y R
5. Installing Rhipe (reference URL)
Preliminary setup: a stock CentOS install lacks various libraries, which makes the installation fail. Install them first:
yum install gcc-gfortran   # otherwise "configure: error: No F77 compiler found"
yum install gcc gcc-c++    # otherwise "configure: error: C++ preprocessor \"/lib/cpp\" fails sanity check"
yum install readline-devel # otherwise "--with-readline=yes (default) and headers/libs are not available"
yum install libXt-devel    # otherwise "configure: error: --with-x=yes (default) and X11 headers/libs are not available"
Run the following steps as root, otherwise you will hit permission errors.
1. Download Google protocol buffers 2.4.1
wget http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
tar -zxvf protobuf-2.4.1.tar.gz
cd protobuf-2.4.1
./configure
make && make install
The libraries are installed into /usr/local/lib. Run the following commands; if the results shown are printed, the installation is correct:
pkg-config --modversion protobuf
2.4.1
pkg-config --libs protobuf
-pthread -L/usr/local/lib -lprotobuf -lpthread
2. Modify the system environment
cd /etc/ld.so.conf.d
vi protobuf-x86.conf
Add the following line:
/usr/local/lib
Save, then run:
/sbin/ldconfig
Add the following environment variables:
vi /etc/profile
export HADOOP_HOME=/home/hadoop1/hadoop-cdh3
export HADOOP_BIN=$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/:/usr/local/lib64/pkgconfig/
Add environment variables that R initializes automatically.
This is key: if they are not set, library(Rhipe) fails with
HADOOP_HOME missing
and similar errors.
The setup method is as follows:
sudo vim /usr/lib64/R/etc/Renviron
Append at the end:
HADOOP_HOME=/home/hadoop1/hadoop-cdh3
HADOOP_BIN=${HADOOP_HOME}/bin
HADOOP_CONF_DIR=${HADOOP_HOME}/conf
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/:/usr/local/lib64/pkgconfig/
source /etc/profile
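The Renviron edit above can be scripted. This sketch appends to a temporary file so it is safe to run as-is; point RENVIRON at the real /usr/lib64/R/etc/Renviron in practice (note that Renviron entries use NAME=value without `export`, and R expands ${...} references itself):

```shell
# Append the Rhipe-related variables to R's Renviron file and verify them.
RENVIRON=$(mktemp)   # stand-in for /usr/lib64/R/etc/Renviron
cat >> "$RENVIRON" <<'EOF'
HADOOP_HOME=/home/hadoop1/hadoop-cdh3
HADOOP_BIN=${HADOOP_HOME}/bin
HADOOP_CONF_DIR=${HADOOP_HOME}/conf
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig/:/usr/local/lib64/pkgconfig/
EOF

grep -c '^HADOOP' "$RENVIRON"   # → 3
```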
3. Installing rJava
wget http://cran.r-project.org/src/contrib/rJava_0.9-6.tar.gz
R CMD javareconf              # R reads the Java configuration from the system variables
R CMD INSTALL rJava_0.9-6.tar.gz
4. Installing Rhipe
wget http://ml.stat.purdue.edu/rhipebin/Rhipe_0.73.5.tar.gz
R CMD INSTALL Rhipe_0.73.5.tar.gz
5. Verification
Switch back to the hadoop1 user (root also works), make sure Hadoop is running properly, and start R:
[hadoop1@localhost usr]$ R
> library(Rhipe)
------------------------------------------------
| Please call rhinit() else RHIPE will not run |
------------------------------------------------
> rhinit()
Rhipe: using Rhipe.jar file
Initializing Rhipe v0.73
Initializing mapfile caches
Success.
However, even at this point library(Rhipe) gave me
HADOOP_HOME missing
and similar messages. Since I had already set the environment variable, I simply exported it directly (do the same if any other environment variable is reported missing):
[hadoop1@localhost usr]$ export HADOOP_HOME=/home/hadoop1/hadoop-cdh3/
Problem solved.