[Author]: Kwu (Hexun Big Data)
Deploying SparkR on CDH 5.4 with Spark 1.4.1 combines R with Spark and provides an efficient platform for data analysis, while HDFS in Hadoop provides distributed storage for the data. This article describes the steps of the integrated installation:
1. Cluster environment
CDH 5.4 + Spark 1.4.1
Configuring environment variables
#java
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export JAVA_BIN=$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
export MAVEN_HOME=/opt/softwares/apache-maven-3.3.3
export PATH=$PATH:$MAVEN_HOME/bin
#rhadoop
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_CMD=/usr/bin/hadoop
export HADOOP_STREAMING=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.4.0.jar
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export R_HOME=/usr/lib64/R
#spark
export SPARK_HOME=/opt/modules/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export LANG=zh_CN.UTF-8
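To make these exports permanent, they can be collected into a profile fragment and sourced so that every login shell (and the sparkR launcher) picks them up. A minimal sketch with three of the variables; on a real node append the full list above to /etc/profile or a file under /etc/profile.d/ (a common convention, not specified in the article):

```shell
# Sketch: write the exports to a file and source it. A temp file is used
# here so the snippet can be tried anywhere; on a real node use
# /etc/profile.d/cluster-env.sh or similar instead.
ENV_FILE=$(mktemp)
cat > "$ENV_FILE" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export SPARK_HOME=/opt/modules/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
EOF
. "$ENV_FILE"
echo "$SPARK_HOME/bin"   # the sparkR launcher will be found on PATH here
```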
2. R language environment (latest version: R-3.2.2)
1) Install the required system packages before building R:
yum -y install gcc-gfortran
yum -y install gcc gcc-c++
yum -y install readline-devel
yum -y install libXt-devel
2) Download the packages required for the R installation. Note: do not install R via yum, otherwise the Java version will be inconsistent.
Packages that need to be downloaded:
R-3.2.2.tar.gz
rJava_0.9-7.tar.gz
rhdfs_1.0.8.tar.gz
Download Link: http://pan.baidu.com/s/1nt5qkJn
3) Install R-3.2.2
Unpack the installation package:
tar zxvf R-3.2.2.tar.gz
Compile and install (rJava typically requires R to be built as a shared library, hence the extra configure flag):
./configure --enable-R-shlib
make && make install
4) Install rJava and rhdfs
R CMD INSTALL rJava_0.9-7.tar.gz
R CMD INSTALL rhdfs_1.0.8.tar.gz
5) Set the native library path
export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
Copy libhadoop.so and libhadoop.so.1.0.0 from the native directory to /usr/lib64:
cp /opt/cloudera/parcels/CDH/lib/hadoop/lib/native/libhadoop.so* /usr/lib64/
6) Note: the steps above must be performed on every node in the cluster.
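Since every node needs the same setup, the per-node work can be scripted. A hypothetical sketch (the hostnames are placeholders; the package names and commands are the ones used above). The commands are echoed rather than executed so the plan can be reviewed first; remove the echoes to actually run it:

```shell
# Hypothetical node list -- replace with your cluster's hostnames.
NODES="node01 node02 node03"
for host in $NODES; do
  # Ship the installation packages to each node.
  echo "scp R-3.2.2.tar.gz rJava_0.9-7.tar.gz rhdfs_1.0.8.tar.gz $host:/tmp/"
  # Build and install R, then the R packages, on each node.
  echo "ssh $host 'cd /tmp && tar zxvf R-3.2.2.tar.gz && cd R-3.2.2 && ./configure --enable-R-shlib && make && make install'"
  echo "ssh $host 'cd /tmp && R CMD INSTALL rJava_0.9-7.tar.gz && R CMD INSTALL rhdfs_1.0.8.tar.gz'"
done
```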
3. Running SparkR
Spark 1.4.1 integrates the SparkR component at compile time.
Start SparkR:
/opt/modules/spark/bin/sparkR --master spark://10.130.2.20:7077 --executor-memory 8g --total-executor-cores <cores> --conf spark.ui.port=54089
Allocate appropriate memory, CPU cores, and the UI port at startup.
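The startup command above can be wrapped in a small launcher so the resources are easy to tune. The master URL, memory, and UI port are the article's values; the core count is a placeholder since it depends on your cluster. The command is echoed for review here rather than executed:

```shell
# Sketch of a reusable launcher for the sparkR shell.
MASTER=spark://10.130.2.20:7077
MEM=8g
CORES=8            # hypothetical value -- size to your cluster
UI_PORT=54089
CMD="/opt/modules/spark/bin/sparkR --master $MASTER --executor-memory $MEM --total-executor-cores $CORES --conf spark.ui.port=$UI_PORT"
echo "$CMD"        # review, then launch with: eval "$CMD"
```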
1) Startup log:
On startup SparkR prints the R version, consistent with what a plain R session displays.
When the log shows "Welcome to SparkR!", SparkR has started successfully.
2) On the Spark cluster monitoring page you can see:
SparkR appears in the list of running applications, just like any other job on the cluster.
That completes the SparkR deployment on CDH 5.4 with Spark 1.4.1. I have personally tested the commands and installation packages above. Please credit Hexun Big Data when reprinting, thank you.
Copyright notice: this is the author's original article; do not reproduce without the author's permission.
Deployment of SparkR under Spark 1.4.1 based on CDH 5.4