1. Installation Configuration of SparkR
1.1. Installation of R and RStudio
1.1.1. Installation of R
Our working environment runs on Ubuntu, so we only cover installing R on Ubuntu:
1) Add the CRAN source to /etc/apt/sources.list:
deb http://mirror.bjtu.edu.cn/cran/bin/linux/ubuntu precise/
Then update the package index with apt-get update.
2) Install via apt-get:
sudo apt-get install r-base
1.1.2. Installation of RStudio
The official website has detailed instructions:
http://www.rstudio.com/products/rstudio/download-server/
sudo apt-get install gdebi-core
sudo apt-get install libapparmor1 # Required only for Ubuntu, not Debian
wget http://download2.rstudio.org/rstudio-server-0.97.551-amd64.deb
sudo gdebi rstudio-server-0.97.551-amd64.deb
1.2. rJava Installation
1.2.1. rJava Introduction
rJava is a communication interface between the R and Java languages. It is implemented on top of JNI and allows Java objects and methods to be invoked directly from R.
rJava also provides the reverse direction, invoking R from Java, through JRI (Java/R Interface). JRI is now bundled inside the rJava package, though the feature can also be tried on its own. The rJava package has become a basic building block for many R packages that rely on Java.
Because rJava is a low-level interface that calls through JNI, it is very efficient. In the JRI scenario, the JVM loads the R virtual machine directly in memory, so calls incur virtually no performance loss. This makes it a very efficient communication channel and the preferred package for connecting R and Java.
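As a quick illustration, the short R session below calls into Java through rJava. It is a minimal sketch: java.lang.String and its length() method are just stand-ins chosen to show the calling convention.
library(rJava)
.jinit()                                 # start the embedded JVM
s <- .jnew("java/lang/String", "hello")  # create a java.lang.String object
n <- .jcall(s, "I", "length")            # call its length() method; "I" means it returns an int
print(n)                                 # prints 5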
1.2.2. rJava Installation
1) Configure the rJava environment by executing R CMD javareconf:
root@testnode4:/home/payton# R CMD javareconf
2) Start R and install rJava:
root@testnode4:/home/payton# R
install.packages("rJava")
1.3. Installation of SparkR
1.3.1. SparkR Code Download
Download the code archive sparkr-pkg-master.zip from https://github.com/amplab-extras/SparkR-pkg
1.3.2. SparkR Code Compilation
1) Unzip sparkr-pkg-master.zip, then cd sparkr-pkg-master/
2) Specify the Hadoop version and the Spark version when compiling:
SPARK_HADOOP_VERSION=2.4.1 SPARK_VERSION=1.2.0 ./install-dev.sh
At this point, the standalone version of SparkR is installed.
1.3.3. Deployment Configuration for Distributed SparkR
1) After a successful build, a lib folder is generated. Enter the lib folder and package SparkR as SparkR.tar.gz; this archive is the key to the distributed SparkR deployment.
2) Install SparkR on each cluster node from the packaged SparkR.tar.gz:
R CMD INSTALL SparkR.tar.gz
The distributed SparkR deployment is now complete.
2. Operation of SparkR
2.1. Operation Mechanism of SparkR
SparkR is an R package released by AMPLab that provides a lightweight frontend for Apache Spark. SparkR exposes Spark's resilient distributed dataset (RDD) API from R, allowing users to run jobs interactively through the R shell on a cluster. SparkR thus combines the benefits of Spark and R. [Three diagrams illustrating how SparkR works appear in the original article.]
2.2. Using SparkR for Data Analysis
2.2.1. SparkR Basic Operations
First, let us introduce the basic operations of SparkR.
Step one, load the SparkR package:
library(SparkR)
Step two, initialize the Spark context:
sc <- sparkR.init(master="spark://localhost:7077",
                  sparkEnvir=list(spark.executor.memory="1g", spark.cores.max="10"))
Step three, read in data. The core abstraction of Spark is the resilient distributed dataset (RDD). RDDs can be created from Hadoop InputFormats (for example, HDFS files) or by transforming other RDDs. For example, the following reads data directly from HDFS as an RDD:
lines <- textFile(sc, "hdfs://sparkr_test.txt")
Alternatively, you can create an RDD from a vector or list with the parallelize function, for example:
rdd <- parallelize(sc, 1:10, 2)
From here, we can apply RDD actions and transformations to operate on an RDD and produce new RDDs, and we can also call R packages easily. Before running an operation on the cluster, you only need to load the R package with includePackage (for example, includePackage(sc, Matrix)); alternatively, an RDD can be converted back into an ordinary R data structure (for example, with collect) and manipulated in plain R.
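Putting these pieces together, the following short session sketches a classic word count with the SparkR-pkg RDD API. It assumes a context sc initialized as above and reuses the HDFS file path from the earlier example; splitting on single spaces is a simplification.
library(SparkR)
sc <- sparkR.init(master="spark://localhost:7077")
lines <- textFile(sc, "hdfs://sparkr_test.txt")                   # one RDD element per line
words <- flatMap(lines, function(line) strsplit(line, " ")[[1]])  # split each line into words
pairs <- lapply(words, function(word) list(word, 1L))             # map each word to a (word, 1) pair
counts <- reduceByKey(pairs, "+", 2L)                             # sum counts per word, 2 partitions
output <- collect(counts)                                         # action: bring results to the driver
for (pair in output) cat(pair[[1]], ":", pair[[2]], "\n")
Here textFile, flatMap, and lapply build up transformations lazily; nothing is computed on the cluster until the collect action is called.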
Refer to the following two links for details:
http://amplab-extras.github.io/SparkR-pkg/
https://github.com/amplab-extras/SparkR-pkg/wiki/SparkR-Quick-Start
To see how SparkR works in practice, work through the two examples in the quick-start guide linked above.
Article reprinted from: http://www.cnblogs.com/payton/p/4227770.html
When I followed these steps myself, the following problem occurred:
Invalid or corrupt jarfile sbt/sbt-launch-0.13.5.jar
My solution was to download a pre-built sbt-launch 0.13 jar and use it to replace sbt-launch-0.13.6.jar under /opt/SparkR-pkg-master/pkg/src/sbt. I have uploaded this file to my resources; please look for it there.