Contents: cluster mode; machines; software version; public Zookeeper service; download; unified time; configure hosts; firewall; configure password-free login; install Hadoop 2.7.3; Hadoop configuration (hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml); distribute the configuration to the slaves; start DFS (before DFS startup, DFS startup command, DFS boot process, DFS shutdown command, what to do when DFS startup fails, NameNode WebUI status view); start YARN (YARN start command, YARN start effect, YARN stop command, cluster WebUI status view); install Spark 2.0.0 (spark-env.sh configuration, slaves configuration, turn on log4j, start Spark, run a test); add a slave machine (stop all processes, slave configuration, app configuration, restart all processes); impact of HBase cluster configuration changes on the Spark-on-YARN cluster (configuration replacement, update Spark)
Reference article: the Spark-on-YARN cluster scheme from the Hua Tuo project.
Cluster mode
ResourceManager HA
Machines
Using the YARN cluster machines; see the machine list below.
Software version
Storage directory after download: /home/work/soft
spark: 2.0.0
hadoop: 2.7.3
Public Zookeeper service
Service address: zookeeper.waimai.baidu.com:2181/waimai/inf/spark-yarn
Download
Configure the proxy first, then download Hadoop and Spark:
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz
wget http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
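If your machines reach the public network through an HTTP proxy, one way to configure it before running wget is to export the proxy environment variables (the proxy address below is a placeholder, not a real value from this setup):
export http_proxy=http://your.proxy.host:8080     # hypothetical proxy address, replace with your own
export https_proxy=http://your.proxy.host:8080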
Unified Time
OP provisioned the Spark cluster machines with Shanghai time (UTC+8) already set, so no configuration is required; you can verify it with the date -R command.
If it is different, you can set it with the following command
sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
vi /etc/sysconfig/clock    # view the specific configuration information
Configure the hosts
Our machines are divided into 2 ResourceManagers and 8 NodeManagers:
spark00 is Active
spark01 is Standby
IP    | Host name     | Process
a.212 | ahost.00.name | ResourceManager
a.213 | ahost.01.name | ResourceManager
a.214 | ahost.02.name | NodeManager
a.215 | ahost.03.name | NodeManager
a.216 | ahost.04.name | NodeManager
a.217 | ahost.05.name | NodeManager
a.218 | ahost.06.name | NodeManager
a.219 | ahost.07.name | NodeManager
a.220 | ahost.08.name | NodeManager
a.221 | ahost.09.name | NodeManager
We add the following entries to /etc/hosts on all 10 machines:
a.212 ahost.00.name
a.213 ahost.01.name
a.214 ahost.02.name
a.215 ahost.03.name
a.216 ahost.04.name
a.217 ahost.05.name
a.218 ahost.06.name
a.219 ahost.07.name
a.220 ahost.08.name
a.221 ahost.09.name
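Rather than editing /etc/hosts by hand on all 10 machines, a possible shortcut from the master is a small loop (assuming the entries above are saved locally in a file named hosts.append, a name chosen here for illustration; each ssh will still prompt for a password until the key setup below is done):
for ip in a.213 a.214 a.215 a.216 a.217 a.218 a.219 a.220 a.221; do
    cat hosts.append | ssh work@$ip "sudo tee -a /etc/hosts > /dev/null"   # append the shared entries remotely
done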
Firewall
In practice the firewall did not affect our build; if it does affect yours, the following commands may help.
View status: sudo service iptables status
Turn on: sudo service iptables start
Turn off: sudo service iptables stop
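Note that on CentOS-style systems a stopped iptables comes back after a reboot; if you want it to stay off, something like the following should work (assuming chkconfig is available):
sudo chkconfig iptables off    # prevent the firewall from starting at boot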
Configure a password-free login
Execute the following command on each node machine:
ssh-keygen -t rsa
Append the contents of /home/work/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on the master machine; every node machine's id_rsa.pub must be appended to the master's ~/.ssh/authorized_keys in the same way. Then copy the master's ~/.ssh/authorized_keys to all the other machines:
scp ~/.ssh/authorized_keys work@a.213:~/.ssh/
scp ~/.ssh/authorized_keys work@a.214:~/.ssh/
scp ~/.ssh/authorized_keys work@a.215:~/.ssh/
scp ~/.ssh/authorized_keys work@a.216:~/.ssh/
scp ~/.ssh/authorized_keys work@a.217:~/.ssh/
scp ~/.ssh/authorized_keys work@a.218:~/.ssh/
scp ~/.ssh/authorized_keys work@a.219:~/.ssh/
scp ~/.ssh/authorized_keys work@a.220:~/.ssh/
scp ~/.ssh/authorized_keys work@a.221:~/.ssh/
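A quick way to confirm the password-free login works is to run a remote command from the master against every node; each line should print the node's hostname without asking for a password:
for ip in a.213 a.214 a.215 a.216 a.217 a.218 a.219 a.220 a.221; do
    ssh work@$ip hostname    # should return immediately, with no password prompt
done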
Installing Hadoop 2.7.3
Log in to the spark00 machine (the master).
Installation directory: /home/work/hadoop
Enter /home/work/soft and execute the following commands:
tar -zxvf hadoop-2.7.3.tar.gz
cp -r hadoop-2.7.3/* /home/work/hadoop/
The Hadoop file structure is as follows
Configuring the Hadoop environment variables
Configure the following in /etc/profile (on both the nna and nns machines):
export HADOOP_HOME=/home/work/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_HOME=/home/work/hadoop
export YARN_CONF_DIR=${YARN_HOME}/etc/hadoop
PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin
Run source /etc/profile to make the environment variables take effect, and use echo $HADOOP_HOME to check them.
Hadoop Configuration
Enter /home/work/hadoop/etc/hadoop; the files that need to be configured are all in this directory. Since we are still in the experimental phase, I started with a 1-active, 1-standby, 3-slave structure.
Before configuring, we create some folders:
mkdir -p /home/work/tmp
mkdir -p /home/work/data/tmp/journal
mkdir -p /home/work/data/dfs/namenode
mkdir -p /home/work/data/dfs/datanode
mkdir -p /home/work/data/yarn/local
mkdir -p /home/work/log/yarn
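These directories are referenced by the configuration files below (hadoop.tmp.dir, dfs.journalnode.edits.dir, dfs.namenode.name.dir, dfs.datanode.data.dir, and the YARN local/log dirs), so they must exist on every machine, not only the master. A sketch that creates them remotely over the password-free SSH configured earlier:
for ip in a.213 a.214 a.215 a.216 a.217 a.218 a.219 a.220 a.221; do
    ssh work@$ip "mkdir -p /home/work/tmp /home/work/data/tmp/journal /home/work/data/dfs/namenode /home/work/data/dfs/datanode /home/work/data/yarn/local /home/work/log/yarn"
done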
hadoop-env.sh Configuration
The JDK path used by the Hadoop environment:
export JAVA_HOME=/usr/java/jdk1.8.0_65/
yarn-env.sh Configuration
The JDK path used by the YARN environment:
export JAVA_HOME=/usr/java/jdk1.8.0_65/
Slaves Configuration
Configure the node information:
ahost.02.name
ahost.03.name
ahost.04.name
core-site.xml Configuration
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster1</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/work/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zookeeper.waimai.baidu.com:2181/waimai/inf/spark-yarn</value>
  </property>
</configuration>
hdfs-site.xml Configuration
hdfs-site.xml configuration explanation
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.cluster1</name>
    <value>nna,nns</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster1.nna</name>
    <value>ahost.00.name:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster1.nns</name>
    <value>ahost.01.name:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster1.nna</name>
    <value>ahost.00.name:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster1.nns</name>
    <value>ahost.01.name:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://ahost.02.name:8485;ahost.03.name:8485;ahost.04.name:8485/cluster1</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.cluster1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/work/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/work/data/tmp/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/work/data/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/work/data/dfs/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.journalnode.http-address</name>
    <value>0.0.0.0:8480</value>
  </property>
  <property>
    <name>dfs.journalnode.rpc-address</name>
    <value>0.0.0.0:8485</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zookeeper.waimai.baidu.com:2181/waimai/inf/spark-yarn</value>
  </property>
</configuration>
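With dfs.nameservices set to cluster1, clients address HDFS by the logical name instead of a fixed NameNode, and ConfiguredFailoverProxyProvider routes requests to whichever of nna/nns is active. Once DFS is running (startup is covered later), a quick sanity check could be:
hdfs dfs -ls hdfs://cluster1/         # resolves through the nameservice, not a specific host
hdfs haadmin -getServiceState nna     # prints active or standby
hdfs haadmin -getServiceState nns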
mapred-site.xml Configuration
First copy a mapred-site.xml file from the template:
cp mapred-site.xml.template mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>ahost.00.name:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>ahost.00.name:19888</value>
  </property>
</configuration>
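The two jobhistory properties only matter if the JobHistory Server is actually running on ahost.00.name; in Hadoop 2.7 it is started with the bundled daemon script, roughly as follows (service startup is covered in a later section):
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
# the web UI is then served at http://ahost.00.name:19888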
yarn-site.xml Configuration
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>ahost.00.name</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>ahost.01.name</value>
  </property>
  <property>
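Once the full yarn-site.xml is in place and YARN has been started (covered later), the rm1/rm2 ids defined above can be used to check which ResourceManager is currently active:
yarn rmadmin -getServiceState rm1    # expect one of these to print active
yarn rmadmin -getServiceState rm2    # and the other standby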