Spark + Hadoop (YARN mode)



A Spark cluster was needed for a recent project, so I am documenting the deployment process here. Spark officially provides three cluster deployment options: Standalone, Mesos, and YARN. Standalone is the most convenient, but this article focuses on deploying Spark on YARN.

Software Environment:

Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
hadoop: 2.6.0
spark: 1.3.0

0. Before we begin

The demo in this article runs as a non-root user, so some commands require sudo; if you are running as root, simply drop the sudo. I recommend downloading and installing all software under your home directory, for example ~/workspace, which is more convenient and avoids unnecessary permission trouble.
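For example, that directory can be created up front (this just uses the same path as the rest of this article):

mkdir -p ~/workspace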

1. Prepare the environment

Modify the hostname

We will build a cluster with 1 master and 2 slaves. First edit the hostname with sudo vi /etc/hostname: change it to master on the master node, to slave1 on one of the slaves, and to slave2 on the other (a sketch of the equivalent shell commands follows).
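If you prefer not to open an editor, the same change can be made from the shell. This is only a sketch using the hostnames above; run each line on the corresponding machine, then log out and back in (or reboot) for it to take effect:

sudo sh -c 'echo master > /etc/hostname'    # run on the master node
sudo sh -c 'echo slave1 > /etc/hostname'    # run on the first slave
sudo sh -c 'echo slave2 > /etc/hostname'    # run on the second slave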

Configure the hosts

Modify the hosts file on each host:

sudo vi /etc/hosts

10.1.1.107 master
10.1.1.108 slave1
10.1.1.109 slave2

After configuring, ping the other hostnames to check that the changes take effect:

ping slave1
ping slave2

2. Passwordless SSH login (if an scp command fails with permission denied, you can copy the key files by hand instead)

Installing OpenSSH server

sudo apt-get install openssh-server

Generate private and public keys on all machines

ssh-keygen -t rsa    # just press Enter at every prompt

The machines need to be able to log in to each other, so copy the id_rsa.pub from each machine to the master node. The public keys can be transferred with scp, for example:

scp ~/.ssh/id_rsa.pub spark@master:~/.ssh/id_rsa.pub.slave1
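Run the same command on each of the other slaves, changing only the destination file name; on slave2, for instance (this assumes the same spark user on every node):

scp ~/.ssh/id_rsa.pub spark@master:~/.ssh/id_rsa.pub.slave2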

On the master, append all public keys to the authentication file authorized_keys:

cat ~/.ssh/id_rsa.pub* >> ~/.ssh/authorized_keys

Distribute the authorized_keys file to each slave:

scp ~/.ssh/authorized_keys spark@slave1:~/.ssh/
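And likewise for the second slave:

scp ~/.ssh/authorized_keys spark@slave2:~/.ssh/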

Verify SSH password-free communication on each machine

ssh master
ssh slave1
ssh slave2

If the login test fails, you may need to fix the permissions of the authorized_keys file (permissions matter here: sshd refuses public-key authentication when the settings are too permissive):

chmod 600 ~/.ssh/authorized_keys
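If login still fails, the permissions on the ~/.ssh directory itself are another common culprit; a typical fix is:

chmod 700 ~/.ssh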

3. Installing Java

Download the latest JDK from the official website. According to the Spark documentation, any Java version 6 or later will do; I downloaded jdk-7u75-linux-x64.gz.
Extract it directly into the ~/workspace directory:

tar -zxvf jdk-7u75-linux-x64.gz

Edit the environment variables with sudo vi /etc/profile and add the following, replacing the home path with your own:

export WORK_SPACE=/home/spark/workspace/
export JAVA_HOME=$WORK_SPACE/jdk1.7.0_75
export JRE_HOME=$WORK_SPACE/jdk1.7.0_75/jre
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

Then apply the environment variables and verify that Java is installed successfully:

$ source /etc/profile    # apply the environment variables
$ java -version          # if the following version information is printed, the installation succeeded
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

4. Install Scala

Spark officially requires Scala 2.10.x, so take care not to pick the wrong version. I downloaded 2.10.4 from the official download page (downloading Scala from inside China tends to be painfully slow).

Extract it into ~/workspace in the same way:

tar -zxvf scala-2.10.4.tgz

Edit the environment variables again with sudo vi /etc/profile, adding the following:

export SCALA_HOME=$WORK_SPACE/scala-2.10.4
export PATH=$PATH:$SCALA_HOME/bin

Apply the environment variables in the same way and verify that Scala is installed successfully:

$ source /etc/profile    # apply the environment variables
$ scala -version         # if the following version information is printed, the installation succeeded
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

5. Install and configure Hadoop YARN

Download and unzip

Download Hadoop 2.6.0 from the official website; I used a mirror hosted by my school, which downloads faster.

Extract it into ~/workspace in the same way:

tar -zxvf hadoop-2.6.0.tar.gz

Configure Hadoop

Run cd ~/workspace/hadoop-2.6.0/etc/hadoop to enter the Hadoop configuration directory. The following 7 files need to be configured: hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml.

Configure JAVA_HOME in hadoop-env.sh:

# The Java implementation to use.
export JAVA_HOME=/home/spark/workspace/jdk1.7.0_75

Configure JAVA_HOME in yarn-env.sh:

# some Java parameters
export JAVA_HOME=/home/spark/workspace/jdk1.7.0_75
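The remaining five files are not shown above. As one small example, the slaves file simply lists the worker hostnames, one per line, so with the hosts used in this article it would presumably contain:

slave1
slave2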
