A Spark cluster was needed for a recent project, so I am documenting the deployment process here. Spark officially provides three cluster deployment modes: Standalone, Mesos, and YARN. Standalone is the most convenient, but this article focuses on deploying Spark on YARN.
Software Environment:
Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
hadoop: 2.6.0
spark: 1.3.0

0. Written up front
The demo in this article runs as a non-root user, so some commands require sudo; if you are running as root, ignore the sudo. I recommend placing the downloaded software under your home directory, for example ~/workspace. This is more convenient and avoids unnecessary permission trouble.

1. Environment Preparation

Modify the hostname
We will build a cluster with 1 master and 2 slaves. First modify the hostname with sudo vi /etc/hostname: set it to master on the master node, to slave1 on one slave, and likewise slave2 on the other.

Configure the hosts file
Modify the hosts file on each host:

vi /etc/hosts

10.1.1.107 master
10.1.1.108 slave1
10.1.1.109 slave2
After configuring, ping the hostnames to check that the mapping takes effect.
2. Passwordless SSH Login (if the scp command reports permission denied, copy the key files by some other means)
Install the OpenSSH server:

sudo apt-get install openssh-server
Generate private and public keys on all machines
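The key-generation command itself is not shown above; a minimal sketch, assuming an RSA key with an empty passphrase (run this on every node):

```shell
# Generate an RSA key pair with no passphrase, written to ~/.ssh/id_rsa
# and ~/.ssh/id_rsa.pub. -P "" sets an empty passphrase so login is
# fully non-interactive; -f gives the output path.
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
```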
The machines need to be able to reach one another, so send the id_rsa.pub from each machine to the master node; the public keys can be transferred with scp.
scp ~/.ssh/id_rsa.pub spark@master:~/.ssh/id_rsa.pub.slave1
On master, append all the public keys to the authentication file authorized_keys:

cat ~/.ssh/id_rsa.pub* >> ~/.ssh/authorized_keys
Distribute the authorized_keys file to each slave:

scp ~/.ssh/authorized_keys spark@slave1:~/.ssh/
Verify passwordless SSH on each machine:

ssh master
ssh slave1
ssh slave2
If the login test fails, you may need to fix the permissions on the authorized_keys file (the permissions matter: if they are too permissive, SSH will refuse to use the key):

chmod 600 ~/.ssh/authorized_keys
3. Installing Java

Download the latest JDK from the official website; Spark officially requires only Java 6 or later. I downloaded jdk-7u75-linux-x64.gz.

Extract it directly under the ~/workspace directory:

tar -zxvf jdk-7u75-linux-x64.gz
Modify the environment variables with sudo vi /etc/profile, add the following, and replace the home path with your own:

export WORK_SPACE=/home/spark/workspace
export JAVA_HOME=$WORK_SPACE/jdk1.7.0_75
export JRE_HOME=$WORK_SPACE/jdk1.7.0_75/jre
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
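Before committing these lines to /etc/profile, you can sanity-check the expansions in a subshell without touching your current environment (a sketch; the paths follow the example above):

```shell
# Evaluate the exports inside ( ... ), i.e. a subshell, so the parent
# shell's environment is untouched, then print the resulting JAVA_HOME.
(
  export WORK_SPACE=/home/spark/workspace
  export JAVA_HOME=$WORK_SPACE/jdk1.7.0_75
  echo "JAVA_HOME=$JAVA_HOME"
)
# prints: JAVA_HOME=/home/spark/workspace/jdk1.7.0_75
```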
Then apply the environment variables and verify that Java installed successfully:

$ source /etc/profile   # apply the environment variables
$ java -version         # if it prints version info like the following, the installation succeeded
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
4. Install Scala

Spark officially requires Scala version 2.10.x; take care not to pick the wrong version. I downloaded 2.10.4 from the official download page (downloads from within mainland China can be painfully slow).

Extract it under ~/workspace in the same way:

tar -zxvf scala-2.10.4.tgz
Modify the environment variables again with sudo vi /etc/profile, adding the following:

export SCALA_HOME=$WORK_SPACE/scala-2.10.4
export PATH=$PATH:$SCALA_HOME/bin
Apply the environment variables in the same way and verify that Scala installed successfully:

$ source /etc/profile   # apply the environment variables
$ scala -version        # if it prints version info like the following, the installation succeeded
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
5. Install and Configure Hadoop YARN

Download and unzip

Download Hadoop 2.6.0 from the official website (I used a mirror hosted at our school).

Extract it under ~/workspace in the same way:

tar -zxvf hadoop-2.6.0.tar.gz
Configure Hadoop

cd ~/workspace/hadoop-2.6.0/etc/hadoop to enter the Hadoop configuration directory. The following 7 files need to be configured: hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml.

Configure JAVA_HOME in hadoop-env.sh:

# The java implementation to use.
export JAVA_HOME=/home/spark/workspace/jdk1.7.0_75
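If you prefer not to edit the file by hand, the JAVA_HOME line can be patched in place with a one-liner (a sketch using GNU sed's -i option; the JDK path is the one assumed above):

```shell
# Replace the existing "export JAVA_HOME=..." line in hadoop-env.sh,
# editing the file in place. Using | as the sed delimiter avoids
# escaping the slashes in the path.
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/home/spark/workspace/jdk1.7.0_75|' hadoop-env.sh
```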
Configure JAVA_HOME in yarn-env.sh:

# some Java parameters
export JAVA_HOME=/home/spark/workspace/jdk1.7.0_75
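Of the seven files listed above, the slaves file is the simplest: it just names the worker nodes, one hostname per line. A sketch for the 1-master/2-slave layout assumed in this article:

```
slave1
slave2
```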