"Sparkr" under CentOS7 to compile and install R3.3.2 and SPARKR II (cluster installation) preparation
A: Install a local machine at least first. can refer to the single-machine installation of the post "Sparkr" under CentOS7 to compile and install R3.3.2 and SPARKR
B: Prepare three slave machines
C: Configuration file Copyok
D: Install Rstudio
E: Install Sparkr (Spark is less than version 1.4)
F: Configure R cluster. installation
1. Follow the previous blog post to install and configure the environment on every machine.
Once all three slave machines are set up, typing R and pressing Enter on each of them should open the R console.
2. Install rJava on all four machines.
After entering the R console, enter
> install.packages("rJava")
When prompted for a CRAN mirror, select 14 (Beijing, China).
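The interactive mirror prompt can be skipped by passing `repos` explicitly, which is convenient when the same step has to be repeated on four machines. The following is a minimal sketch, assuming the `https://cloud.r-project.org` mirror is reachable from the cluster. `install_if_missing` is a hypothetical helper name, not something R provides.

```r
# Hypothetical helper: install only the packages that are not yet available.
# Assumption: the default mirror URL below is reachable from every machine.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  invisible(missing)  # names of the packages that had to be installed
}

# Example call, left commented out so that loading this file installs nothing:
# install_if_missing(c("rJava", "snow"))
```

Because `repos` is given, R does not ask for a mirror, so the call can be pasted unchanged into the console on each machine.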
3. Installing RStudio is relatively simple: it can be installed directly from the RPM.
First download the RPM package from the website,
then install it with yum:
wget https://download2.rstudio.org/rstudio-server-rhel-1.0.136-x86_64.rpm
yum install --nogpgcheck rstudio-server-rhel-1.0.136-x86_64.rpm
4. If Spark is older than 1.4, you need to install and configure the SparkR project from GitHub:
https://github.com/amplab-extras/SparkR-pkg
If your Spark version is 2.0 or newer, Spark officially supports SparkR, so no separate configuration is needed.
There are many configuration guides online, and the process is similar to the one in this article.
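For the Spark 2.0 or newer case, a quick way to confirm that the bundled SparkR loads is to start a local session from R. The following is only a sketch: it assumes the `SPARK_HOME` environment variable points at the Spark installation, and it skips the check when that is not the case. `sparkr_lib_path` and the application name are illustrative names, not part of SparkR.

```r
# Hypothetical helper: directory where a Spark distribution keeps its R package.
sparkr_lib_path <- function(spark_home) file.path(spark_home, "R", "lib")

# Assumption: SPARK_HOME is set on this machine.
spark_home <- Sys.getenv("SPARK_HOME")

if (nzchar(spark_home) && dir.exists(sparkr_lib_path(spark_home))) {
  library(SparkR, lib.loc = sparkr_lib_path(spark_home))
  # Local mode only; pointing at a real cluster master is a separate step.
  sparkR.session(master = "local[*]", appName = "sparkr-smoke-test")
  df <- as.DataFrame(faithful)  # 'faithful' is a dataset that ships with R
  print(head(df))
  sparkR.session.stop()
} else {
  message("SPARK_HOME is not set or has no R/lib directory; skipping the SparkR check")
}
```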
5. Configure the R cluster.
First, install the snow package.
install.packages("snow")
When prompted for a CRAN mirror, select 12 (United Kingdom).
Then, in the console, enter
library(parallel)  # load the parallel package
workerlist <- list(
  list(host = masterip, port = 10187, outfile = "/usr/local/r/log/log1.log", rshcmd = "ssh -p 22"),
  list(host = slave1ip, port = 10187, outfile = "/usr/local/r/log/log2.log", rshcmd = "ssh -p 22"),
  list(host = slave2ip, port = 10187, outfile = "/usr/local/r/log/log2.log", rshcmd = "ssh -p 22"),
  list(host = slave3ip, port = 10187, outfile = "/usr/local/r/log/log3.log", rshcmd = "ssh -p 22")
)
cl <- makeCluster(workerlist, type = "SOCK", master = masterip)
# configure the cluster, specifying the master
In workerlist, replace each host value (masterip, slave1ip, and so on) with that machine's IP address, written as a string of the form "XXX.XXX.XXX.XXX".
Verification
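One quick check from the master's R console is to ask every worker to report where it is running and then push a trivial job through the cluster. The sketch below uses a two-worker cluster on the local machine as a stand-in, so it can be tried without the slave machines. On the real setup, the same `clusterCall` and `parLapply` calls would be run against the cluster object created from the worker list.

```r
library(parallel)

# Stand-in for the real cluster: two workers on this machine (an assumption
# made so the example runs anywhere).
cl <- makeCluster(2)

# Ask every worker for its hostname and process id.
info <- clusterCall(cl, function() {
  list(host = Sys.info()[["nodename"]], pid = Sys.getpid())
})

# Run a trivial computation spread across the workers.
squares <- parLapply(cl, 1:4, function(x) x^2)

print(info)
print(unlist(squares))

# Always release the workers when done.
stopCluster(cl)
```

On a multi-machine cluster, `info` should list one entry per worker, each with the hostname of the machine it runs on.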
To check whether R is working in parallel, look at the outfile logs we set up earlier.
The worker's message should appear in the log on each cluster node.
That means success.
RStudio startup
If RStudio cannot be reached directly at masterip:8787 after it starts, check the following two configuration files.
One points RStudio at the R installation to use.
The other configures the R session.
vim /etc/rstudio/rserver.conf
# Server Configuration File
rsession-which-r=/usr/local/r-3.3.2/bin/R
www-port=8787
www-address=0.0.0.0 # address the server listens on; defaults to 0.0.0.0
vim /etc/rstudio/rsession.conf
# R Session Configuration File
session-timeout-minutes=30 # session timeout, in minutes
r-cran-repos=http://ftp.ctex.org/mirrors/cran/ # CRAN repository to use
After that, the RStudio interface should load.
Also note that RStudio does not allow logging in as the root user, which is another pitfall.
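If the page still does not load, it helps to separate a configuration problem from a network or firewall problem by testing whether the port accepts connections at all. The following is a small sketch using base R sockets. `port_open` is a hypothetical helper, and `127.0.0.1` is a placeholder to be swapped for the master's IP when testing from another machine.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened.
port_open <- function(host, port, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, blocking = TRUE,
                       open = "r+", timeout = timeout)
    ),
    error = function(e) NULL  # connection refused, timed out, or unreachable
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

# 8787 is the www-port value from rserver.conf.
print(port_open("127.0.0.1", 8787))
```

A FALSE result from a remote machine combined with TRUE on the master itself suggests that something between the two machines, such as a firewall, is blocking the port.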