Deploying SparkR with Spark 1.4.1 on CDH 5.4

[Author]: kwu (Hexun Big Data)

This article covers a basic SparkR deployment with Spark 1.4.1 on CDH 5.4. Combining R with Spark gives an efficient solution for data analysis, while HDFS in Hadoop supplies the distributed storage behind it. The steps for the integrated installation follow:


1. Cluster environment

CDH 5.4 + Spark 1.4.1

Configure the environment variables:

# Java
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export JAVA_BIN=$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
export MAVEN_HOME=/opt/softwares/apache-maven-3.3.3
export PATH=$PATH:$MAVEN_HOME/bin

# RHadoop
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_CMD=/usr/bin/hadoop
export HADOOP_STREAMING=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.4.0.jar
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export R_HOME=/usr/lib64/R

# Spark
export SPARK_HOME=/opt/modules/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export LANG=zh_CN.UTF-8
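
Before moving on, it can help to confirm that the variables above actually point at directories that exist on the node. The following is a minimal sketch, not part of the original setup; the function name check_env_dirs is hypothetical, and it only tests for presence, not for correct contents.

```shell
# check_env_dirs VAR...: report, for each named variable, whether it is set
# and whether its value names an existing directory.
# Returns non-zero if any variable fails the check.
check_env_dirs() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "MISSING $name"
            status=1
        elif [ ! -d "$value" ]; then
            echo "NOT A DIRECTORY $name=$value"
            status=1
        else
            echo "OK $name=$value"
        fi
    done
    return $status
}
```

After sourcing the profile, it could be called as: check_env_dirs JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR HIVE_HOME R_HOME SPARK_HOME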



2. R environment (latest version: R-3.2.2)

1) Install the prerequisite packages before installing R

yum -y install gcc-gfortran
yum -y install gcc gcc-c++
yum -y install readline-devel
yum -y install libXt-devel
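
A quick way to see whether the compilers are in place before building R is to look for them on the PATH. This is a rough sketch with a hypothetical helper name, missing_tools; it assumes the yum packages provide the usual gcc, g++ and gfortran commands, and it cannot detect the header-only packages (readline-devel, libXt-devel).

```shell
# missing_tools TOOL...: print the name of every tool that is not on the PATH.
# Prints nothing when all of them are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example: missing_tools gcc g++ gfortran make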


2) Download the packages required for the R installation. Note that R is not installed with yum here; otherwise the Java version would be inconsistent.

Packages to download:

R-3.2.2.tar.gz

rJava_0.9-7.tar.gz

rhdfs_1.0.8.tar.gz

Download Link: http://pan.baidu.com/s/1nt5qkJn
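
Since the same three archives are needed on every node, a small check that they are all present in the working directory can save a failed build later. The sketch below uses a hypothetical function name, missing_packages, and assumes the tarballs carry the upstream file names R-3.2.2.tar.gz, rJava_0.9-7.tar.gz and rhdfs_1.0.8.tar.gz.

```shell
# missing_packages [DIR]: print each required tarball that is absent from DIR
# (defaults to the current directory). Prints nothing when all are present.
missing_packages() {
    dir=${1:-.}
    for pkg in R-3.2.2.tar.gz rJava_0.9-7.tar.gz rhdfs_1.0.8.tar.gz; do
        [ -f "$dir/$pkg" ] || echo "$pkg"
    done
}
```

Running missing_packages /opt/softwares (the directory is only an example) lists whatever still has to be downloaded.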


3) Install R-3.2.2

Unpack the installation package:

tar zxvf R-3.2.2.tar.gz

Compile and install:

cd R-3.2.2
./configure
make && make install

4) Install rJava and rhdfs

R CMD INSTALL "rJava_0.9-7.tar.gz"
R CMD INSTALL "rhdfs_1.0.8.tar.gz"


5) Set up the native libraries

export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native

Copy libhadoop.so.0 and libhadoop.so.1.0.0 from the native directory to /usr/lib64.
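
The copy step can be scripted so that it is identical on every node. The following is only a sketch: the function name copy_native_libs is hypothetical, and it assumes that copying every file matching libhadoop.so* from the native directory is acceptable, which is broader than the two files named above.

```shell
# copy_native_libs SRC DEST: copy every libhadoop.so* file from SRC into DEST.
# Symbolic links are copied as links (-P). Returns non-zero if nothing was copied.
copy_native_libs() {
    src=$1
    dest=$2
    found=0
    for lib in "$src"/libhadoop.so*; do
        [ -e "$lib" ] || continue          # the glob matched nothing
        cp -P "$lib" "$dest"/ && found=1
    done
    [ "$found" -eq 1 ]
}
```

On a node it might be run as root with: copy_native_libs "$JAVA_LIBRARY_PATH" /usr/lib64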


6) Note: all of the steps above must be carried out on every node in the cluster.
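
One way to repeat the steps across the cluster is to generate an ssh command per node from a host list and review the result before running it. This is a sketch under assumptions not found in the original: a hypothetical plain-text file with one host name per line, passwordless ssh between nodes, and a hypothetical function name, node_commands. It only prints commands; it does not execute anything.

```shell
# node_commands HOSTS_FILE COMMAND...: print one "ssh -n HOST COMMAND" line
# per host. Blank lines and lines starting with # in HOSTS_FILE are skipped.
node_commands() {
    hosts_file=$1
    shift
    while IFS= read -r host || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        # -n keeps ssh from reading stdin, so the output can be piped to sh.
        printf 'ssh -n %s %s\n' "$host" "$*"
    done < "$hosts_file"
}
```

For instance, node_commands nodes.txt R --version prints the commands that would check the installed R version on each node listed in nodes.txt (a hypothetical file name).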


3. Running SparkR

Spark 1.4.1, the latest release, already integrates the SparkR component at build time.

Start SparkR:

/opt/modules/spark/bin/sparkR --master spark://10.130.2.20:7077 --executor-memory 8g --total-executor-cores <cores> --conf spark.ui.port=54089

Allocate appropriate memory, CPU cores, and a UI port at startup; <cores> stands for the total number of executor cores to request.
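
To keep the resource settings in one place, the launch line can be assembled from parameters. The sketch below is illustrative only: sparkr_command is a hypothetical helper, it assumes the UI port is passed through --conf, and the core count used in the example call (16) is a made-up value, not one taken from this deployment.

```shell
# sparkr_command MASTER MEMORY CORES UI_PORT: print the sparkR launch line.
# SPARK_HOME defaults to /opt/modules/spark when it is not set.
sparkr_command() {
    master=$1
    memory=$2
    cores=$3
    ui_port=$4
    printf '%s/bin/sparkR --master %s --executor-memory %s --total-executor-cores %s --conf spark.ui.port=%s\n' \
        "${SPARK_HOME:-/opt/modules/spark}" "$master" "$memory" "$cores" "$ui_port"
}
```

Calling sparkr_command spark://10.130.2.20:7077 8g 16 54089 prints the full command, which can then be checked and run by hand.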


1) Startup log:

The startup output first shows the R version, the same banner that plain R prints when it starts.

Once startup completes, the "Welcome to SparkR" message indicates that SparkR is running successfully.



2) On the Spark cluster monitoring page:

SparkR appears in the list of running applications, just like any other task in the cluster.


That completes the SparkR deployment with Spark 1.4.1 on CDH 5.4. I have personally tested all of the commands and installation packages mentioned above, and they run successfully. When reprinting, please credit Hexun Big Data. Thank you.

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
