Contents: cluster mode; machines; software version; public Zookeeper service; download; unified time; configure hosts; firewall; configure password-free login; install Hadoop 2.7.3; Hadoop configuration (hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml); distribute the configuration to the slaves; start DFS (before DFS startup, DFS startup command, DFS boot process, DFS shutdown command, what to do when DFS startup fails, NameNode WebUI status view); start YARN (YARN start command, YARN start effect, YARN stop command, cluster WebUI status view); install Spark 2.0.0 (spark-env.sh configuration, slaves configuration, turn on log4j, start Spark, run a test); add a slave machine (stop all processes, slave configuration, app configuration, restart all processes); impact of HBase cluster configuration changes on the Spark-on-YARN cluster (configuration replacement, update Spark)
Reference article: the Spark-on-YARN cluster scheme from the Hua Tuo project.
Cluster mode
ResourceManager HA
Machines
Using the YARN cluster machines; see the machine list below.
Software version
Storage directory after download: /home/work/soft
spark: 2.0.0
hadoop: 2.7.3
Public Zookeeper service
Service address: zookeeper.waimai.baidu.com:2181/waimai/inf/spark-yarn
Download
Configure the proxy first, then download Hadoop and Spark:
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.0.0-bin-hadoop2.7.tgz
wget http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
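If your machines reach the public network through an HTTP proxy, one way to configure it before running wget is to export the proxy environment variables (the proxy address below is a placeholder, not a real value from this setup):
export http_proxy=http://your.proxy.host:8080     # hypothetical proxy address, replace with your own
export https_proxy=http://your.proxy.host:8080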
Unified Time
OP provisioned the Spark cluster machines with Shanghai time (UTC+8) already set, so no configuration is required; you can verify it with the date -R command.
If it is different, you can set it with the following command
sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
vi /etc/sysconfig/clock    # view the specific configuration information
Configure the hosts
Our machines are divided into 2 ResourceManagers and 8 NodeManagers:
spark00 is Active
spark01 is Standby
IP    | Host name     | Process
a.212 | ahost.00.name | ResourceManager
a.213 | ahost.01.name | ResourceManager
a.214 | ahost.02.name | NodeManager
a.215 | ahost.03.name | NodeManager
a.216 | ahost.04.name | NodeManager
a.217 | ahost.05.name | NodeManager
a.218 | ahost.06.name | NodeManager
a.219 | ahost.07.name | NodeManager
a.220 | ahost.08.name | NodeManager
a.221 | ahost.09.name | NodeManager
We add the following entries to /etc/hosts on all 10 machines:
a.212 ahost.00.name
a.213 ahost.01.name
a.214 ahost.02.name
a.215 ahost.03.name
a.216 ahost.04.name
a.217 ahost.05.name
a.218 ahost.06.name
a.219 ahost.07.name
a.220 ahost.08.name
a.221 ahost.09.name
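Rather than editing /etc/hosts by hand on all 10 machines, a possible shortcut from the master is a small loop (assuming the entries above are saved locally in a file named hosts.append, a name chosen here for illustration; each ssh will still prompt for a password until the key setup below is done):
for ip in a.213 a.214 a.215 a.216 a.217 a.218 a.219 a.220 a.221; do
    cat hosts.append | ssh work@$ip "sudo tee -a /etc/hosts > /dev/null"   # append the shared entries remotely
done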
Firewall
In practice the firewall did not affect our build; if it does affect yours, the following commands may help.
View status: sudo service iptables status
Turn on: sudo service iptables start
Turn off: sudo service iptables stop
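Note that on CentOS-style systems a stopped iptables comes back after a reboot; if you want it to stay off, something like the following should work (assuming chkconfig is available):
sudo chkconfig iptables off    # prevent the firewall from starting at boot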
Configure a password-free login
Execute the following command on each node machine:
ssh-keygen -t rsa
Append the contents of /home/work/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on the master machine; every node machine's id_rsa.pub must be appended to the master's ~/.ssh/authorized_keys in the same way. Then copy the master's ~/.ssh/authorized_keys to all the other machines:
scp ~/.ssh/authorized_keys work@a.213:~/.ssh/
scp ~/.ssh/authorized_keys work@a.214:~/.ssh/
scp ~/.ssh/authorized_keys work@a.215:~/.ssh/
scp ~/.ssh/authorized_keys work@a.216:~/.ssh/
scp ~/.ssh/authorized_keys work@a.217:~/.ssh/
scp ~/.ssh/authorized_keys work@a.218:~/.ssh/
scp ~/.ssh/authorized_keys work@a.219:~/.ssh/
scp ~/.ssh/authorized_keys work@a.220:~/.ssh/
scp ~/.ssh/authorized_keys work@a.221:~/.ssh/
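A quick way to confirm the password-free login works is to run a remote command from the master against every node; each line should print the node's hostname without asking for a password:
for ip in a.213 a.214 a.215 a.216 a.217 a.218 a.219 a.220 a.221; do
    ssh work@$ip hostname    # should return immediately, with no password prompt
done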
Installing Hadoop 2.7.3
Log in to the spark00 machine (the master).
Installation directory: /home/work/hadoop
Enter /home/work/soft and execute the following commands:
tar -zxvf hadoop-2.7.3.tar.gz
cp -r hadoop-2.7.3/* /home/work/hadoop/
The Hadoop file structure is as follows
Configuring the Hadoop environment variables
Configure the following in /etc/profile (on both the nna and nns machines):
export HADOOP_HOME=/home/work/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_HOME=/home/work/hadoop
export YARN_CONF_DIR=${YARN_HOME}/etc/hadoop
PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin
Run source /etc/profile to make the environment variables take effect, and use echo $HADOOP_HOME to check them.
Hadoop Configuration
Enter /home/work/hadoop/etc/hadoop; the files that need to be configured are all in this directory. Since we are still in the experimental phase, I started with a 1-active, 1-standby, 3-slave structure.
Before configuring, we create some folders:
mkdir -p /home/work/tmp
mkdir -p /home/work/data/tmp/journal
mkdir -p /home/work/data/dfs/namenode
mkdir -p /home/work/data/dfs/datanode
mkdir -p /home/work/data/yarn/local
mkdir -p /home/work/log/yarn
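These directories are referenced by the configuration files below (hadoop.tmp.dir, dfs.journalnode.edits.dir, dfs.namenode.name.dir, dfs.datanode.data.dir, and the YARN local/log dirs), so they must exist on every machine, not only the master. A sketch that creates them remotely over the password-free SSH configured earlier:
for ip in a.213 a.214 a.215 a.216 a.217 a.218 a.219 a.220 a.221; do
    ssh work@$ip "mkdir -p /home/work/tmp /home/work/data/tmp/journal /home/work/data/dfs/namenode /home/work/data/dfs/datanode /home/work/data/yarn/local /home/work/log/yarn"
done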
hadoop-env.sh Configuration
The JDK path used by the Hadoop environment:
export JAVA_HOME=/usr/java/jdk1.8.0_65/
yarn-env.sh Configuration
The JDK path used by the YARN environment:
export JAVA_HOME=/usr/java/jdk1.8.0_65/
Slaves Configuration
Configure the node information:
ahost.02.name
ahost.03.name
ahost.04.name
core-site.xml Configuration
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster1</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/work/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zookeeper.waimai.baidu.com:2181/waimai/inf/spark-yarn</value>
  </property>
</configuration>
hdfs-site.xml Configuration
hdfs-site.xml configuration explanation
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.cluster1</name>
    <value>nna,nns</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster1.nna</name>
    <value>ahost.00.name:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster1.nns</name>
    <value>ahost.01.name:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster1.nna</name>
    <value>ahost.00.name:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster1.nns</name>
    <value>ahost.01.name:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://ahost.02.name:8485;ahost.03.name:8485;ahost.04.name:8485/cluster1</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.cluster1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/work/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/work/data/tmp/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/work/data/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/work/data/dfs/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.journalnode.http-address</name>
    <value>0.0.0.0:8480</value>
  </property>
  <property>
    <name>dfs.journalnode.rpc-address</name>
    <value>0.0.0.0:8485</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zookeeper.waimai.baidu.com:2181/waimai/inf/spark-yarn</value>
  </property>
</configuration>
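With dfs.nameservices set to cluster1, clients address HDFS by the logical name instead of a fixed NameNode, and ConfiguredFailoverProxyProvider routes requests to whichever of nna/nns is active. Once DFS is running (startup is covered later), a quick sanity check could be:
hdfs dfs -ls hdfs://cluster1/         # resolves through the nameservice, not a specific host
hdfs haadmin -getServiceState nna     # prints active or standby
hdfs haadmin -getServiceState nns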
mapred-site.xml Configuration
First copy a mapred-site.xml file from the template:
cp mapred-site.xml.template mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>ahost.00.name:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>ahost.00.name:19888</value>
  </property>
</configuration>
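The two jobhistory properties only matter if the JobHistory Server is actually running on ahost.00.name; in Hadoop 2.7 it is started with the bundled daemon script, roughly as follows (service startup is covered in a later section):
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
# the web UI is then served at http://ahost.00.name:19888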
yarn-site.xml Configuration
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>ahost.00.name</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>ahost.01.name</value>
  </property>
  <property>
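Once the full yarn-site.xml is in place and YARN has been started (covered later), the rm1/rm2 ids defined above can be used to check which ResourceManager is currently active:
yarn rmadmin -getServiceState rm1    # expect one of these to print active
yarn rmadmin -getServiceState rm2    # and the other standby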