Build a High-Availability Hadoop 2.7.1 / Spark 1.7 Cluster on a Single Machine with Docker
Get Ubuntu Image
sudo docker pull ubuntu
Download Spark 1.7, Hadoop 2.7.1, Scala 1.1, ZooKeeper 3.4.6, and JDK 1.8, decompress the packages, and place them in a local folder that will be mounted into the container.
In the same folder, create two files:
authorized_keys
hosts
This example uses /home/docker/config.
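For example, the host-side folder can be prepared like this (archive names and paths are illustrative):
mkdir -p /home/docker/config
cd /home/docker/config
tar -xzf /path/to/hadoop-2.7.1.tar.gz    # repeat for the spark, scala, zookeeper, and jdk archives
touch authorized_keys hosts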
Start the container
sudo docker run --name installspark -v /home/docker/config/:/config -it ubuntu:14.04
Install
After startup, the installation files are available under the container's /config folder.
Install the JDK and Scala:
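The original does not show the extraction commands; a minimal sketch, assuming the archives sit in the mounted /config folder (file names illustrative):
apt-get update && apt-get install -y openssh-server vim    # sshd is required by the startup lines added below
mkdir -p /root/.ssh                                        # target for the authorized_keys copy below
mkdir -p /usr/lib/jvm
tar -xzf /config/jdk-8.tar.gz -C /usr/lib/jvm && mv /usr/lib/jvm/jdk1.8* /usr/lib/jvm/java-8-sun
tar -xzf /config/scala.tgz -C /opt && mv /opt/scala-* /opt/scala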
vim ~/.bashrc
Append:
/usr/sbin/sshd
cat /config/hosts > /etc/hosts
cat /config/authorized_keys > /root/.ssh/authorized_keys
export JAVA_HOME=/usr/lib/jvm/java-8-sun
export PATH=${JAVA_HOME}/bin:$PATH
export HADOOP_HOME=/opt/hadoop
export PATH=${HADOOP_HOME}/bin:$PATH
export SCALA_HOME=/opt/scala
export PATH=${SCALA_HOME}/bin:$PATH
export SPARK_HOME=/opt/spark
export PATH=${SPARK_HOME}/bin:$PATH
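Reload the file so the variables take effect in the current shell:
source ~/.bashrc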
Copy Spark, Hadoop, and ZooKeeper to /opt.
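Assuming the decompressed directory names (illustrative), something like:
cp -r /config/hadoop-2.7.1 /opt/hadoop
cp -r /config/spark-1.7-bin-hadoop2.7 /opt/spark
cp -r /config/zookeeper-3.4.6 /opt/zookeeper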
Install Hadoop:
Create the folders /opt/hadoop/namenode, /opt/hadoop/datanode, /opt/hadoop/tmp, and /opt/hadoop/journal.
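A single command that creates all four:
mkdir -p /opt/hadoop/{namenode,datanode,tmp,journal}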
root@nn1:/opt/hadoop/etc/hadoop# vim hadoop-env.sh
Modify:
export JAVA_HOME=/usr/lib/jvm/java-8-sun
root@nn1:/opt/hadoop/etc/hadoop# vim core-site.xml
Add:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ns1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop/tmp</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>dnzk1:2181,dnzk2:2181,dnzk3:2181</value>
</property>
root@nn1:/opt/hadoop/etc/hadoop# vim hdfs-site.xml
Add:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/opt/hadoop/datanode</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/opt/hadoop/namenode</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn1</name>
  <value>nn1:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn1</name>
  <value>nn1:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn2</name>
  <value>nn2:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn2</name>
  <value>nn2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://dnzk1:8485;dnzk2:8485;dnzk3:8485/ns1</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/opt/hadoop/journal</value>
</property>
<property>
  <name>dfs.journalnode.http-address</name>
  <value>0.0.0.0:8480</value>
</property>
<property>
  <name>dfs.journalnode.rpc-address</name>
  <value>0.0.0.0:8485</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.ns1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>
    sshfence
    shell(/bin/true)
  </value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
root@nn1:/opt/hadoop/etc/hadoop# vim yarn-site.xml
Add:
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>dnzk1:2181,dnzk2:2181,dnzk3:2181</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
root@nn1:/opt/hadoop# vim /opt/hadoop/etc/hadoop/slaves
Add:
dnzk1
dnzk2
dnzk3
Install Spark
root@nn1:/opt/spark/conf# vim spark-env.sh
Add:
export SPARK_MASTER_IP=nn1
export SPARK_WORKER_MEMORY=256m
export JAVA_HOME=/usr/lib/jvm/java-8-sun
export SCALA_HOME=/opt/scala
export SPARK_HOME=/opt/spark
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export SPARK_LIBRARY_PATH=$SPARK_HOME/lib
export SCALA_LIBRARY_PATH=$SPARK_LIBRARY_PATH
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077
root@nn1:/opt/spark/conf# vim slaves
Add:
Install ZooKeeper
Create the folder /opt/zookeeper/tmp and the file /opt/zookeeper/tmp/myid:
echo 1 > /opt/zookeeper/tmp/myid
root@nn1:/opt/zookeeper/conf# vim zoo.cfg
Modify:
dataDir=/opt/zookeeper/tmp
server.1=dnzk1:2888:3888
server.2=dnzk2:2888:3888
server.3=dnzk3:2888:3888
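ZooKeeper 3.4.6 ships with only zoo_sample.cfg; if conf/zoo.cfg does not exist yet, create it from the sample before editing:
cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg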
Generate an SSH key
ssh-keygen -t dsa
Append id_dsa.pub to the host's /home/docker/config/authorized_keys file.
root@nn1:/opt/hadoop# cat ~/.ssh/id_dsa.pub
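Since /config is the host folder mounted into the container, the append can also be done from inside the container:
cat ~/.ssh/id_dsa.pub >> /config/authorized_keys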
Save the container as an image
sudo docker commit -m "namenode1" installspark ubuntu:ns1
Modify the host's /home/docker/config/hosts file and add:
172.17.0.11 nn1
172.17.0.12 nn2
172.17.0.13 rm1
172.17.0.14 rm2
172.17.0.15 dnzk1
172.17.0.16 dnzk2
172.17.0.17 dnzk3
Start the containers
sudo docker run --name dnzk1 -h dnzk1 --net=none -p 2185:2181 -p 50075:50070 -p 9005:9000 -p 8485:8485 -p 7075:7077 -p 2885:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name dnzk2 -h dnzk2 --net=none -p 2186:2181 -p 50076:50070 -p 9006:9000 -p 8486:8485 -p 7076:7077 -p 2886:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name dnzk3 -h dnzk3 --net=none -p 2186:2181 -p 50076:50070 -p 9006:9000 -p 8486:8485 -p 7076:7077 -p 2887:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name nn1 -h nn1 --net=none -p 2181:2181 -p 50071:50070 -p 9001:9000 -p 8481:8485 -p 7071:7077 -p 2881:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name nn2 -h nn2 --net=none -p 2182:2181 -p 50072:50070 -p 9002:9000 -p 8482:8485 -p 7072:7077 -p 2882:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name rm1 -h rm1 --net=none -p 2183:2181 -p 50073:50070 -p 9003:9000 -p 8483:8485 -p 7073:7077 -p 2883:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name rm2 -h rm2 --net=none -p 2184:2181 -p 50074:50070 -p 9004:9000 -p 8484:8485 -p 7074:7077 -p 2884:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
On dnzk2 run echo 2 > /opt/zookeeper/tmp/myid, and on dnzk3 run echo 3 > /opt/zookeeper/tmp/myid.
Configure the network
sudo pipework docker0 -i eth0 nn1 172.17.0.11/16
sudo pipework docker0 -i eth0 nn2 172.17.0.12/16
sudo pipework docker0 -i eth0 rm1 172.17.0.13/16
sudo pipework docker0 -i eth0 rm2 172.17.0.14/16
sudo pipework docker0 -i eth0 dnzk1 172.17.0.15/16
sudo pipework docker0 -i eth0 dnzk2 172.17.0.16/16
sudo pipework docker0 -i eth0 dnzk3 172.17.0.17/16
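A quick sanity check from the host that each address is reachable:
ping -c 1 172.17.0.11    # repeat for .12 through .17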
Start the Hadoop cluster
Start ZooKeeper and the JournalNodes on dnzk1/dnzk2/dnzk3:
/opt/zookeeper/bin/zkServer.sh start
/opt/hadoop/sbin/hadoop-daemon.sh start journalnode
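To confirm both daemons came up on each dnzk node:
jps                                      # expect QuorumPeerMain and JournalNode
/opt/zookeeper/bin/zkServer.sh status    # prints leader or follower once the quorum forms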
Format HDFS and ZooKeeper on nn1
/opt/hadoop/bin/hdfs namenode -format
Copy the formatted NameNode metadata to nn2:
scp -r /opt/hadoop/namenode/ nn2:/opt/hadoop/
or, on nn2:
/opt/hadoop/bin/hdfs namenode -bootstrapStandby
Then format the ZooKeeper failover state and start HDFS:
/opt/hadoop/bin/hdfs zkfc -formatZK
/opt/hadoop/sbin/start-dfs.sh
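On nn1 and nn2, verify the HA daemons:
jps    # expect NameNode and DFSZKFailoverController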
Start YARN on rm1
/opt/hadoop/sbin/start-yarn.sh
Start the standby ResourceManager on rm2
/opt/hadoop/sbin/yarn-daemon.sh start resourcemanager
Start Spark
/opt/spark/sbin/start-all.sh
View:
http://172.17.0.11:50070 (active)
http://172.17.0.12:50070 (standby)
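One quick way to exercise automatic failover (not part of the original steps): stop the active NameNode on nn1, refresh nn2's page, and it should switch to active; restarting nn1's NameNode brings it back as standby.
/opt/hadoop/sbin/hadoop-daemon.sh stop namenode     # on nn1
/opt/hadoop/sbin/hadoop-daemon.sh start namenode    # nn1 rejoins as standby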
Cluster service status after startup
Host    IP            Software                 Processes
nn1     172.17.0.11   jdk, hadoop              NameNode, DFSZKFailoverController (zkfc)
nn2     172.17.0.12   jdk, hadoop              NameNode, DFSZKFailoverController (zkfc)
rm1     172.17.0.13   jdk, hadoop              ResourceManager
rm2     172.17.0.14   jdk, hadoop              ResourceManager
dnzk1   172.17.0.15   jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain
dnzk2   172.17.0.16   jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain
dnzk3   172.17.0.17   jdk, hadoop, zookeeper   DataNode, NodeManager, JournalNode, QuorumPeerMain