Spark 2.x Study Notes: 5. Spark on YARN Mode


Many blog posts that claim to cover Spark on YARN deployment actually describe Spark's standalone mode. If you start Spark's own master and worker services, you are running Spark in standalone mode, not Spark on YARN mode; do not confuse the two.

In production, Spark is usually deployed inside a Hadoop cluster and runs in Spark on YARN mode, letting YARN schedule Spark applications; this generally performs and integrates much better than Spark's default standalone mode.
You therefore need a working Hadoop distributed environment before you can deploy Spark on YARN.

If you already have a Hadoop distributed environment, go directly to section 5.5;
if you are unfamiliar with building a Hadoop distributed environment, follow sections 5.1 through 5.4 below.

Hadoop distributed environment: build the environment on the 192.168.1.180 node first, then distribute it to the other nodes.
My 192.168.1.180 node is a virtual machine, so the cluster can be built quickly by cloning virtual machines.

5.1 Basic Linux Environment Setup

(1) Configure IP address

(2) Modify the hosts file

[root@master ~]# vi /etc/hosts
[root@master ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.180   master
192.168.1.181   slave1
192.168.1.182   slave2
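
With the hosts file in place, a quick way to confirm the names resolve (a sanity check, not part of the original):

[root@master ~]# ping -c 1 slave1
PING slave1 (192.168.1.181) 56(84) bytes of data.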

(3) Turn off the firewall and disable SELinux
Stop the firewall

[root@master ~]# systemctl stop firewalld
[root@master ~]# systemctl disable firewalld

Disable SELinux

[root@master ~]# setenforce 0
[root@master ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

View the modified file

[root@master ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
[root@master ~]#

(4) Install openssh-clients

[root@master ~]# yum install -y openssh-clients

Prepare a script file sshutil.sh in the /root directory for the passwordless SSH login configuration (used later in 5.3), with the following content:

#!/bin/bash
ssh-keygen -q -t rsa -N "" -f /root/.ssh/id_rsa
ssh-copy-id -i localhost
ssh-copy-id -i master
ssh-copy-id -i slave1
ssh-copy-id -i slave2

Parameter description: -q quiet mode; -t specifies the key algorithm; -N specifies the passphrase (empty here); -f specifies the path of the generated key.

(5) Install JDK 8
Download and unpack:

[root@master ~]# tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt

Configure environment variables:

[root@master ~]# vi /etc/profile.d/custom.sh
[root@master ~]# cat /etc/profile.d/custom.sh
#!/bin/bash
# Java path
export JAVA_HOME=/opt/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
[root@master ~]# source /etc/profile.d/custom.sh
[root@master ~]# java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

5.2 Hadoop Environment Setup

(1) Hadoop cluster planning

No.  OS       IP             Hostname  NN  DN  RM  NM
1    CentOS7  192.168.1.180  master    Y   Y   Y   Y
2    CentOS7  192.168.1.181  slave1        Y       Y
3    CentOS7  192.168.1.182  slave2        Y       Y

(NN = NameNode, DN = DataNode, RM = ResourceManager, NM = NodeManager)

(2) Download Hadoop package

[root@master ~]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz
[root@master ~]# tar -zxvf hadoop-2.7.4.tar.gz -C /opt

(3) hadoop-env.sh

[root@master hadoop]# pwd
/opt/hadoop-2.7.4/etc/hadoop
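
The original does not show the edit itself; the change usually needed in hadoop-env.sh is to point JAVA_HOME at the JDK installed in 5.1 (5), since Hadoop's scripts do not always inherit it from the login environment. A minimal sketch, assuming the JDK path used earlier:

# In /opt/hadoop-2.7.4/etc/hadoop/hadoop-env.sh, replace the default line
# "export JAVA_HOME=${JAVA_HOME}" with the explicit path:
export JAVA_HOME=/opt/jdk1.8.0_144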

(4) core-site.xml

[root@master hadoop-2.7.4]# vi etc/hadoop/core-site.xml

Edit the content as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/data/hadoop</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>65536</value>
    </property>
</configuration>
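
Here fs.defaultFS is the HDFS address clients will connect to, and hadoop.tmp.dir is the base directory under which NameNode and DataNode data are stored. The directory is created automatically when the NameNode is formatted, but creating it up front does no harm (an optional step, not in the original):

[root@master ~]# mkdir -p /var/data/hadoop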

(5) hdfs-site.xml

[root@master hadoop]# vi hdfs-site.xml

The contents of the hdfs-site.xml file are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave2:50090</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>slave2:50091</value>
    </property>
</configuration>

(6) slaves

[root@master hadoop]# echo 'master' > slaves
[root@master hadoop]# echo 'slave1' >> slaves
[root@master hadoop]# echo 'slave2' >> slaves
[root@master hadoop]# cat slaves
master
slave1
slave2
[root@master hadoop]#

(7) mapred-site.xml

[root@master hadoop-2.7.4]# vi etc/hadoop/mapred-site.xml
[root@master hadoop-2.7.4]# cat etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
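
Note: the Hadoop 2.7 distribution ships only etc/hadoop/mapred-site.xml.template, so if mapred-site.xml does not exist yet, copy the template before editing (a step implied but not shown in the original):

[root@master hadoop-2.7.4]# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml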

(8) yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

(9) Configure environment variables
Edit /etc/profile.d/custom.sh and add the following:

# Hadoop path
export HADOOP_HOME=/opt/hadoop-2.7.4
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
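
To confirm the new variables are picked up, reload the profile and check the Hadoop version (a quick sanity check, not shown in the original):

[root@master ~]# source /etc/profile.d/custom.sh
[root@master ~]# hadoop version
Hadoop 2.7.4
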
5.3 Building a cluster

The basic Linux and Hadoop environments are now configured on the master node; next, build out the cluster. The quickest way is to clone the virtual machine.

(1) Clone the virtual machines
First shut down the master virtual machine (192.168.1.180), then clone a slave1 node as follows: in VMware, right-click Master and choose Manage -> Clone to open the Clone Wizard. On the "Clone from" page keep the default option and click Next; on the "Clone type" page select "Create a full clone"; enter the node name slave1 in the name box, keep the default storage directory, and click Next; click Finish to start cloning, then click Close when it completes.

Repeat the same steps to clone a slave2 node.
(2) Modify IP and hostname
Modify the IP and hostname of the new slave1 node first.
Change the IPADDR value directly with sed:

sed -i 's/192.168.1.180/192.168.1.181/' /etc/sysconfig/network-scripts/ifcfg-ens32

Then restart the network:

systemctl restart network

Modify the hostname:

hostnamectl set-hostname slave1

Repeat the same steps to modify the IP and hostname of the slave2 node.

Possible problem:
If the cloned virtual machine cannot reach the network, try editing the /etc/sysconfig/network-scripts/ifcfg-ens32 file, removing the UUID and HWADDR lines, and then restarting the network.

(3) Passwordless SSH setup
Execute the sshutil.sh script on the master node, entering yes and the corresponding node's password when prompted:

[root@master ~]# sh sshutil.sh
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is 22:5e:82:fa:7b:c3:26:de:30:76:73:bd:7c:a2:17:29.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@localhost's password:
Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'localhost'"
and check to make sure that only the key(s) you wanted were added.

The authenticity of host 'master (192.168.1.180)' can't be established.
ECDSA key fingerprint is 22:5e:82:fa:7b:c3:26:de:30:76:73:bd:7c:a2:17:29.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.

The authenticity of host 'slave1 (192.168.1.181)' can't be established.
ECDSA key fingerprint is 22:5e:82:fa:7b:c3:26:de:30:76:73:bd:7c:a2:17:29.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave1's password:
Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'slave1'"
and check to make sure that only the key(s) you wanted were added.

The authenticity of host 'slave2 (192.168.1.182)' can't be established.
ECDSA key fingerprint is 22:5e:82:fa:7b:c3:26:de:30:76:73:bd:7c:a2:17:29.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave2's password:
Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'slave2'"
and check to make sure that only the key(s) you wanted were added.
[root@master ~]#

Then execute sshutil.sh on the other two nodes:

[root@slave1 ~]# sh sshutil.sh
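
Once the script has run on all three nodes, every node should be able to reach every other node without a password. A quick loop to verify (not part of the original):

[root@master ~]# for h in master slave1 slave2; do ssh $h hostname; done
master
slave1
slave2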

(4) Make the environment variables take effect

[root@master ~]# source /etc/profile.d/custom.sh
[root@slave1 ~]# source /etc/profile.d/custom.sh
[root@slave2 ~]# source /etc/profile.d/custom.sh

5.4 Starting a Hadoop cluster

(0) Clear old data
Because a Hadoop pseudo-distributed environment was previously built on the 192.168.1.180 node and the NameNode was formatted there, clear the old Hadoop data first. Skip this step if a NameNode format has never been performed on the node.

[root@master ~]# rm -rf /tmp/*

(1) NameNode formatting

[root@master ~]# hdfs namenode -format
17/09/01 04:13:51 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.1.180
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.4
STARTUP_MSG:   classpath = /opt/hadoop-2.7.4/etc/hadoop:... (long classpath listing omitted)
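
The original log ends mid-classpath. After a successful format, the remaining step of section 5.4 is to start HDFS and YARN from the master node and check the daemons on each node. A minimal sketch of the standard Hadoop 2.x commands (these exact steps are not shown in the original):

# Start HDFS (NameNode, DataNodes, SecondaryNameNode) and YARN (ResourceManager, NodeManagers)
[root@master ~]# start-dfs.sh
[root@master ~]# start-yarn.sh

# Verify that each node runs the daemons planned in 5.2 (1)
[root@master ~]# jps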
