[Author]: Kwu (Hexun Big Data)
Deploying SparkR on CDH 5.4 with Spark 1.4.1 combines R with Spark and provides an efficient platform for data analysis, while HDFS in Hadoop provides distributed storage for the data. This article describes the steps of the integrated installation:
1. Cluster environment
CDH 5.4 + Spark 1.4.1
Configuring environment variables
#java
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export JAVA_BIN=$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
export MAVEN_HOME=/opt/softwares/apache-maven-3.3.3
export PATH=$PATH:$MAVEN_HOME/bin
#rhadoop
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HADOOP_CMD=/usr/bin/hadoop
export HADOOP_STREAMING=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.4.0.jar
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export R_HOME=/usr/lib64/R
#spark
export SPARK_HOME=/opt/modules/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export LANG=zh_CN.UTF-8
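To make these exports permanent, they can be collected into a profile fragment and sourced so that every login shell (and the sparkR launcher) picks them up. A minimal sketch with three of the variables; on a real node append the full list above to /etc/profile or a file under /etc/profile.d/ (a common convention, not specified in the article):

```shell
# Sketch: write the exports to a file and source it. A temp file is used
# here so the snippet can be tried anywhere; on a real node use
# /etc/profile.d/cluster-env.sh or similar instead.
ENV_FILE=$(mktemp)
cat > "$ENV_FILE" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export SPARK_HOME=/opt/modules/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
EOF
. "$ENV_FILE"
echo "$SPARK_HOME/bin"   # the sparkR launcher will be found on PATH here
```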
2. R language environment (latest version: R-3.2.2)
1) Install the required system packages before building R:
yum -y install gcc-gfortran
yum -y install gcc gcc-c++
yum -y install readline-devel
yum -y install libXt-devel
2) Download the packages required for the R installation. Note: do not install R via yum, otherwise the Java version will be inconsistent.
Packages that need to be downloaded:
R-3.2.2.tar.gz
rJava_0.9-7.tar.gz
rhdfs_1.0.8.tar.gz
Download Link: http://pan.baidu.com/s/1nt5qkJn
3) Install R-3.2.2
Unpack the installation package:
tar zxvf R-3.2.2.tar.gz
Compile and install (rJava typically requires R to be built as a shared library, hence the extra configure flag):
./configure --enable-R-shlib
make && make install
4) Install rJava and rhdfs
R CMD INSTALL rJava_0.9-7.tar.gz
R CMD INSTALL rhdfs_1.0.8.tar.gz
5) Set the native library path
export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
Copy libhadoop.so and libhadoop.so.1.0.0 from the native directory to /usr/lib64:
cp /opt/cloudera/parcels/CDH/lib/hadoop/lib/native/libhadoop.so* /usr/lib64/
6) Note: the steps above must be performed on every node in the cluster.
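Since every node needs the same setup, the per-node work can be scripted. A hypothetical sketch (the hostnames are placeholders; the package names and commands are the ones used above). The commands are echoed rather than executed so the plan can be reviewed first; remove the echoes to actually run it:

```shell
# Hypothetical node list -- replace with your cluster's hostnames.
NODES="node01 node02 node03"
for host in $NODES; do
  # Ship the installation packages to each node.
  echo "scp R-3.2.2.tar.gz rJava_0.9-7.tar.gz rhdfs_1.0.8.tar.gz $host:/tmp/"
  # Build and install R, then the R packages, on each node.
  echo "ssh $host 'cd /tmp && tar zxvf R-3.2.2.tar.gz && cd R-3.2.2 && ./configure --enable-R-shlib && make && make install'"
  echo "ssh $host 'cd /tmp && R CMD INSTALL rJava_0.9-7.tar.gz && R CMD INSTALL rhdfs_1.0.8.tar.gz'"
done
```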
3. Running SparkR
Spark 1.4.1 integrates the SparkR component at compile time.
Start SparkR:
/opt/modules/spark/bin/sparkR --master spark://10.130.2.20:7077 --executor-memory 8g --total-executor-cores <cores> --conf spark.ui.port=54089
Allocate appropriate memory, CPU cores, and the UI port at startup.
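The startup command above can be wrapped in a small launcher so the resources are easy to tune. The master URL, memory, and UI port are the article's values; the core count is a placeholder since it depends on your cluster. The command is echoed for review here rather than executed:

```shell
# Sketch of a reusable launcher for the sparkR shell.
MASTER=spark://10.130.2.20:7077
MEM=8g
CORES=8            # hypothetical value -- size to your cluster
UI_PORT=54089
CMD="/opt/modules/spark/bin/sparkR --master $MASTER --executor-memory $MEM --total-executor-cores $CORES --conf spark.ui.port=$UI_PORT"
echo "$CMD"        # review, then launch with: eval "$CMD"
```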
1) Startup log:
On startup SparkR prints the R version, consistent with what a plain R session displays.
When the log shows "Welcome to SparkR!", SparkR has started successfully.
2) On the Spark cluster monitoring page you can see:
SparkR appears in the list of running applications, just like any other job on the cluster.
That completes the SparkR deployment on CDH 5.4 with Spark 1.4.1. I have personally tested the commands and installation packages above. Please credit Hexun Big Data when reprinting, thank you.
Copyright notice: this is the author's original article; do not reproduce without the author's permission.
Deployment of SparkR under Spark 1.4.1 based on CDH 5.4