Spark on YARN Environment Setup


Contents:

Cluster mode
Machines
Software versions
Public ZooKeeper service
Download
Unified time
Configure the hosts
Firewall
Configure password-free login
Install Hadoop 2.7.3
Hadoop configuration: hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml
Distribute the configuration to the slaves
Start DFS: before starting DFS, DFS start command, DFS start process, DFS stop command, what to do when DFS fails to start
NameNode WebUI status view
Start YARN: YARN start command, YARN start effect, YARN stop command
Cluster WebUI status view
Install Spark 2.0.0
Spark configuration: spark-env.sh, slaves, enabling log4j
Start Spark
Run a test
Add a slave machine: stop all processes, configure the slave, configure the app, restart all processes
Impact of HBase cluster configuration changes on the Spark on YARN cluster
Replace/update Spark

Reference article: the Spark-on-YARN cluster plan from the Hua Tuo project task

Cluster Mode

ResourceManager HA

Machines

The machines used for the YARN cluster; see the machine list below (under "Configure the hosts").

Software Versions

Storage directory after download: /home/work/soft

spark: 2.0.0
hadoop: 2.7.3

Public ZooKeeper Service

Service address: zookeeper.waimai.baidu.com:2181/waimai/inf/spark-yarn

Download

Configure the proxy first, then download Hadoop and Spark:

wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz

wget http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
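
The article does not give the proxy settings themselves; a minimal sketch, assuming a hypothetical HTTP proxy at proxy.example.com:8080, would be:

# proxy.example.com:8080 is a placeholder, not from the source; substitute your real proxy
# before running the wget commands above
export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080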
Unified Time

The cluster machines provided by OP are already set to the Shanghai time zone (UTC+8), so no configuration is required; you can verify it with the date -R command.
If it differs, set it with the following commands:

sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
vi /etc/sysconfig/clock    # view the detailed time zone configuration
Configure the hosts

Our machines are split into 2 ResourceManager nodes and 8 NodeManager nodes.
spark00 is the Active node; spark01 is the Standby node.

IP      Hostname         Process
a.212   ahost.00.name    ResourceManager
a.213   ahost.01.name    ResourceManager
a.214   ahost.02.name    NodeManager
a.215   ahost.03.name    NodeManager
a.216   ahost.04.name    NodeManager
a.217   ahost.05.name    NodeManager
a.218   ahost.06.name    NodeManager
a.219   ahost.07.name    NodeManager
a.220   ahost.08.name    NodeManager
a.221   ahost.09.name    NodeManager

Add the following entries to /etc/hosts on all 10 machines:

a.212 ahost.00.name
a.213 ahost.01.name
a.214 ahost.02.name
a.215 ahost.03.name
a.216 ahost.04.name
a.217 ahost.05.name
a.218 ahost.06.name
a.219 ahost.07.name
a.220 ahost.08.name
a.221 ahost.09.name
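
A quick sanity check that the new entries are picked up (a sketch using two of the hosts above; getent reads /etc/hosts):

getent hosts ahost.00.name    # should print: a.212 ahost.00.name
ping -c 1 ahost.09.name       # should reach a.221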
Firewall

In practice the firewall did not interfere with our setup; if it does in your environment, the following commands may help:

Check status: sudo service iptables status

Start: sudo service iptables start

Stop: sudo service iptables stop
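
If you do end up stopping iptables, you may also want it to stay off after a reboot; a sketch assuming a CentOS 6 style init system (which the service commands above suggest):

sudo chkconfig iptables off      # do not start iptables at boot
sudo chkconfig --list iptables   # confirm it is off for all runlevels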
Configure a password-free login

Execute the following command on each node machine:

ssh-keygen -t rsa

Append the contents of /home/work/.ssh/id_rsa.pub from each node machine to ~/.ssh/authorized_keys on the master machine; once every node's id_rsa.pub has been appended, copy the master's ~/.ssh/authorized_keys to all the other machines:

scp ~/.ssh/authorized_keys work@a.213:~/.ssh/
scp ~/.ssh/authorized_keys work@a.214:~/.ssh/
scp ~/.ssh/authorized_keys work@a.215:~/.ssh/
scp ~/.ssh/authorized_keys work@a.216:~/.ssh/
scp ~/.ssh/authorized_keys work@a.217:~/.ssh/
scp ~/.ssh/authorized_keys work@a.218:~/.ssh/
scp ~/.ssh/authorized_keys work@a.219:~/.ssh/
scp ~/.ssh/authorized_keys work@a.220:~/.ssh/
scp ~/.ssh/authorized_keys work@a.221:~/.ssh/
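
After distributing the keys, it is worth confirming that the master really can log in to every node without a password; a minimal verification sketch using the hostnames configured earlier:

# each line should print the remote hostname without prompting for a password
for h in ahost.01.name ahost.02.name ahost.03.name ahost.04.name ahost.05.name \
         ahost.06.name ahost.07.name ahost.08.name ahost.09.name; do
    ssh -o BatchMode=yes work@$h hostname
done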
Installing Hadoop 2.7.3

Log in to the spark00 machine (the master machine).

Installation directory: /home/work/hadoop

Enter /home/work/soft and execute the following commands:

tar -zxvf hadoop-2.7.3.tar.gz
cp -r hadoop-2.7.3/* /home/work/hadoop/

The Hadoop file structure is as follows

Configuring the Hadoop environment variables

Add the following to /etc/profile (on both the NNA and NNS machines):

export HADOOP_HOME=/home/work/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_HOME=/home/work/hadoop
export YARN_CONF_DIR=${YARN_HOME}/etc/hadoop
PATH=$JAVA_HOME:$PATH:$HADOOP_HOME/bin

Run source /etc/profile to make the variables take effect, and use echo $HADOOP_HOME to check that they are set.
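
A quick check that the variables took effect (note that hadoop version also needs JAVA_HOME, which is configured in hadoop-env.sh below or can be exported in the shell):

source /etc/profile
echo $HADOOP_HOME                  # should print /home/work/hadoop
$HADOOP_HOME/bin/hadoop version    # should report Hadoop 2.7.3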

Hadoop Configuration

Enter /home/work/hadoop/etc/hadoop; all the files that need to be configured live in this directory. Since this is still the experimental phase, I start with a 1-active, 1-standby, 3-slave structure.

Before doing the configuration, we create a few directories that the files below will reference:

mkdir -p /home/work/tmp
mkdir -p /home/work/data/tmp/journal
mkdir -p /home/work/data/dfs/namenode
mkdir -p /home/work/data/dfs/datanode
mkdir -p /home/work/data/yarn/local
mkdir -p /home/work/log/yarn
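
The source only shows these directories being created on the master, but the data and log paths are referenced by the DataNode, JournalNode and NodeManager configuration below, so the same layout presumably has to exist on every machine; a sketch that creates it remotely over the password-free SSH set up earlier:

for h in ahost.01.name ahost.02.name ahost.03.name ahost.04.name ahost.05.name \
         ahost.06.name ahost.07.name ahost.08.name ahost.09.name; do
    # create the same data/log directory layout on each node
    ssh work@$h "mkdir -p /home/work/tmp /home/work/data/tmp/journal \
        /home/work/data/dfs/namenode /home/work/data/dfs/datanode \
        /home/work/data/yarn/local /home/work/log/yarn"
done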
hadoop-env.sh Configuration

The JDK path used by the Hadoop environment:

export JAVA_HOME=/usr/java/jdk1.8.0_65/
yarn-env.sh Configuration

The JDK path used by the YARN environment:

export JAVA_HOME=/usr/java/jdk1.8.0_65/
Slaves Configuration

Configure the node information:

ahost.02.name
ahost.03.name
ahost.04.name
core-site.xml Configuration
<?xml version= "1.0" encoding= "UTF-8"?> <?xml-stylesheet type= "text/xsl" href= " Configuration.xsl "?> <configuration><property> <name>fs.defaultFS</name> <v Alue>hdfs://cluster1</value> </property><property><name>io.file.buffer.size</name ><value>131072</value></property> <property> <name>hadoop.tmp.dir</name>
        ; <value>/home/work/tmp</value> </property> <property><name> Hadoop.proxyuser.hadoop.hosts</name><value>*</value></property> <property>< Name>hadoop.proxyuser.hadoop.groups</name><value>*</value></property> <property ><name>ha.zookeeper.quorum</name><value>zookeeper.waimai.baidu.com:2181/waimai/inf/ Spark-yarn</value></property> </configuration> 
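
Both core-site.xml and hdfs-site.xml rely on the shared ZooKeeper service for automatic failover, so it is worth confirming the quorum address is reachable before going further; a sketch using ZooKeeper's ruok four-letter command (assumes nc is installed):

echo ruok | nc zookeeper.waimai.baidu.com 2181    # a healthy ZooKeeper server answers "imok"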
hdfs-site.xml Configuration

hdfs-site.xml configuration explanation

<?xml version= "1.0" encoding= "UTF-8"?> <?xml-stylesheet type= "text/xsl" href= "configuration.xsl"?> < 
  configuration> <property> <name>dfs.nameservices</name> <value>cluster1</value> </property> <property> <name>dfs.ha.namenodes.cluster1</name> <value>nna,nns&lt
    ;/value> </property> <property> <name>dfs.namenode.rpc-address.cluster1.nna</name> <value>AHOST.00.name:9000</value> </property> <property> <name>dfs.namenode.rpc-add
    ress.cluster1.nns</name> <value>AHOST.01.name:9000</value> </property> <property> <name>dfs.namenode.http-address.cluster1.nna</name> <value>AHOST.00.name:50070</value> & lt;/property> <property> <name>dfs.namenode.http-address.cluster1.nns</name> <value>a
Host.01.name:50070</value>  </property> <property> <name>dfs.namenode.shared.edits.dir</name> <value>qjourna l://ahost.02.name:8485; ahost.03.name:8485; ahost.04.name:8485/cluster1</value> </property> <property> <name>dfs.client.failover.prox Y.provider.cluster1</name> <value> Org.apache.hadoop.hdfs.server.namenode.ha.configuredfailoverproxyprovider</value> </property> < property> <name>dfs.ha.fencing.methods</name> <value>sshfence</value> &LT;/PROPERTY&G
  T <property> <name>dfs.ha.fencing.ssh.private-key-files</name> <value>/home/work/.ssh/id_ rsa</value> </property> <property> <name>dfs.journalnode.edits.dir</name> <va lue>/home/work/data/tmp/journal</value> </property> <property> <name>dfs.ha.automatic- Failover.enabled</name> <value>true</value&Gt </property> <property> <name>dfs.namenode.name.dir</name> <value>/home/work/data/d fs/namenode</value> </property> <property> <name>dfs.datanode.data.dir</name> &L t;value>/home/work/data/dfs/datanode</value> </property> <property> <name>dfs.replicat ion</name> <value>3</value> </property> <property> <name>dfs.webhdfs.enabl ed</name> <value>true</value> </property> <property> <name>dfs.journalnode .http-address</name> <value>0.0.0.0:8480</value> </property> <property> <name >dfs.journalnode.rpc-address</name> <value>0.0.0.0:8485</value> </property> <proper Ty> <name>ha.zookeeper.quorum</name> <value>zookeeper.waimai.baidu.com:2181/waimai/inf/
 Spark-yarn</value> </property> </configuration> 
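
Once the file is saved, the standard hdfs getconf utility gives a quick way to verify that the nameservice and the two NameNodes are being picked up as intended (shown here as a verification sketch, not part of the original steps):

$HADOOP_HOME/bin/hdfs getconf -confKey fs.defaultFS    # expect hdfs://cluster1
$HADOOP_HOME/bin/hdfs getconf -namenodes               # expect ahost.00.name ahost.01.name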
mapred-site.xml Configuration

First copy mapred-site.xml from the template:
cp mapred-site.xml.template mapred-site.xml

<?xml version= "1.0"?>
<?xml-stylesheet type= "text/xsl" href= "configuration.xsl"?>

< configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value> yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address< /name>
    <value>AHOST.00.name:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>ahost.00.name:19888</ value>
  </property>
</configuration>
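
Since these XML files are edited by hand, one malformed tag is enough to keep a daemon from starting; if xmllint happens to be installed, it is a cheap syntax check (a sketch, not part of the original procedure):

cd /home/work/hadoop/etc/hadoop
xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml    # no output means the files are well formed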
yarn-site.xml Configuration
<?xml version= "1.0"?> <configuration> <property> <name>yarn.resou rcemanager.ha.enabled</name> <value>true</value> </property> <property> <name >yarn.resourcemanager.cluster-id</name> <value>yarn-cluster</value> </property> <PR operty> <name>yarn.resourcemanager.ha.rm-ids</name> <value>rm1,rm2</value> </prope rty> <property> <name>yarn.resourcemanager.hostname.rm1</name> <VALUE>AHOST.00.NAME&L t;/value> </property> <property> <name>yarn.resourcemanager.hostname.rm2</name> < ;value>ahost.01.name</value> </property> <property> << 
