Build a high-availability cluster of Hadoop 2.7.1 and Spark 1.7 on a single machine based on Docker


Get Ubuntu Image

sudo docker pull ubuntu

Download spark1.7, hadoop2.7.1, scala1.1, zookeeper3.4.6, and jdk1.8, decompress the packages, and place them in a local folder that will be mounted into the container.

In that folder, also create two files:

authorized_keys

hosts

In this example, the folder /home/docker/config is used.
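A minimal sketch of preparing that folder is shown below; the archive file names are assumptions and should be adjusted to the exact packages you downloaded.

# Prepare the folder that will be mounted into every container (archive names are assumptions)
mkdir -p /home/docker/config
cd /home/docker/config
tar -xzf ~/Downloads/hadoop-2.7.1.tar.gz
tar -xzf ~/Downloads/zookeeper-3.4.6.tar.gz
tar -xzf ~/Downloads/spark-bin-hadoop2.tgz   # hypothetical Spark archive name
tar -xzf ~/Downloads/scala.tgz               # hypothetical Scala archive name
touch authorized_keys hosts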

Start the container

sudo docker run --name installspark -v /home/docker/config/:/config -it ubuntu:14.04

Install
After startup, the installation files are visible in the /config folder inside the container.

Install the JDK and Scala, then edit the shell profile:

vim ~/.bashrc

Append:

/usr/sbin/sshd
cat /config/hosts > /etc/hosts
cat /config/authorized_keys > /root/.ssh/authorized_keys
export JAVA_HOME=/usr/lib/jvm/java-8-sun
export PATH=${JAVA_HOME}/bin:$PATH
export HADOOP_HOME=/opt/hadoop
export PATH=${HADOOP_HOME}/bin:$PATH
export SCALA_HOME=/opt/scala
export PATH=${SCALA_HOME}/bin:$PATH
export SPARK_HOME=/opt/spark
export PATH=${SPARK_HOME}/bin:$PATH

Copy spark, hadoop, and zookeeper to /opt.
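Once the packages are in place, the environment can be reloaded and spot-checked; a minimal check, assuming the paths above, might be:

source ~/.bashrc
java -version      # should report the 1.8 JDK
scala -version     # confirms SCALA_HOME/bin is on the PATH
hadoop version     # confirms HADOOP_HOME/bin is on the PATH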

Install hadoop:

Create the folders: /opt/hadoop/namenode, /opt/hadoop/datanode, /opt/hadoop/tmp, /opt/hadoop/journal
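With bash, all four directories can be created in one command, for example:

mkdir -p /opt/hadoop/{namenode,datanode,tmp,journal}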

root@nn1:/opt/hadoop/etc/hadoop# vim hadoop-env.sh

Modify:

export JAVA_HOME=/usr/lib/jvm/java-8-sun

root@nn1:/opt/hadoop/etc/hadoop# vim core-site.xml

Add:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ns1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop/tmp</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>dnzk1:2181,dnzk2:2181,dnzk3:2181</value>
</property>

root@nn1:/opt/hadoop/etc/hadoop# vim hdfs-site.xml
Add:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/opt/hadoop/datanode</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/opt/hadoop/namenode</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn1</name>
  <value>nn1:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn1</name>
  <value>nn1:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn2</name>
  <value>nn2:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn2</name>
  <value>nn2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://dnzk1:8485;dnzk2:8485;dnzk3:8485/ns1</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/opt/hadoop/journal</value>
</property>
<property>
  <name>dfs.journalnode.http-address</name>
  <value>0.0.0.0:8480</value>
</property>
<property>
  <name>dfs.journalnode.rpc-address</name>
  <value>0.0.0.0:8485</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.ns1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>
    sshfence
    shell(/bin/true)
  </value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

root@nn1:/opt/hadoop/etc/hadoop# vim yarn-site.xml

Add:

<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>dnzk1:2181,dnzk2:2181,dnzk3:2181</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

root@nn1:/opt/hadoop# vim /opt/hadoop/etc/hadoop/slaves

Add:

dnzk1
dnzk2
dnzk3

Install spark
root@nn1:/opt/spark/conf# vim spark-env.sh
Add:

export SPARK_MASTER_IP=nn1
export SPARK_WORKER_MEMORY=256m
export JAVA_HOME=/usr/lib/jvm/java-8-sun
export SCALA_HOME=/opt/scala
export SPARK_HOME=/opt/spark
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export SPARK_LIBRARY_PATH=$SPARK_HOME/lib
export SCALA_LIBRARY_PATH=$SPARK_LIBRARY_PATH
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077

root@nn1:/opt/spark/conf# vim slaves

Add:

Install zookeeper

Create the folder /opt/zookeeper/tmp
Create the file /opt/zookeeper/tmp/myid
echo 1 > /opt/zookeeper/tmp/myid
root@nn1:/opt/zookeeper/conf# vim zoo.cfg

Modify:

dataDir=/opt/zookeeper/tmp
server.1=dnzk1:2888:3888
server.2=dnzk2:2888:3888
server.3=dnzk3:2888:3888

Generate the SSH key

ssh-keygen -t dsa

Append id_dsa.pub to the host's /home/docker/config/authorized_keys file.

root@nn1:/opt/hadoop# cat ~/.ssh/id_dsa.pub
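A condensed sketch of the key exchange, run inside the container with the /config mount in place (the empty passphrase is an assumption for non-interactive logins):

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa            # generate the key without a passphrase
cat ~/.ssh/id_dsa.pub >> /config/authorized_keys     # /config is the mounted /home/docker/config folder

Because every container mounts the same folder and ~/.bashrc copies /config/authorized_keys into /root/.ssh/authorized_keys, all nodes will trust this key once they start.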

Run

sudo docker commit -m "namenode1" installspark ubuntu:ns1

Modify the host's /home/docker/config/hosts file and add:

172.17.0.11 nn1
172.17.0.12 nn2
172.17.0.13 rm1
172.17.0.14 rm2
172.17.0.15 dnzk1
172.17.0.16 dnzk2
172.17.0.17 dnzk3

Start the Docker containers

sudo docker run --name dnzk1 -h dnzk1 --net=none -p 2185:2181 -p 50075:50070 -p 9005:9000 -p 8485:8485 -p 7075:7077 -p 2885:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name dnzk2 -h dnzk2 --net=none -p 2186:2181 -p 50076:50070 -p 9006:9000 -p 8486:8485 -p 7076:7077 -p 2886:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name dnzk3 -h dnzk3 --net=none -p 2186:2181 -p 50076:50070 -p 9006:9000 -p 8486:8485 -p 7076:7077 -p 2887:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name nn1 -h nn1 --net=none -p 2181:2181 -p 50071:50070 -p 9001:9000 -p 8481:8485 -p 7071:7077 -p 2881:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name nn2 -h nn2 --net=none -p 2182:2181 -p 50072:50070 -p 9002:9000 -p 8482:8485 -p 7072:7077 -p 2882:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name rm1 -h rm1 --net=none -p 2183:2181 -p 50073:50070 -p 9003:9000 -p 8483:8485 -p 7073:7077 -p 2883:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
sudo docker run --name rm2 -h rm2 --net=none -p 2184:2181 -p 50074:50070 -p 9004:9000 -p 8484:8485 -p 7074:7077 -p 2884:2888 -v /home/docker/config/:/config -it spark1_7-hadoop2_7_1-scala1_1:basic
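After launching them, a quick sanity check from the host is to list the running containers:

sudo docker ps    # all seven containers (nn1, nn2, rm1, rm2, dnzk1, dnzk2, dnzk3) should show as Up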

On dnzk2, execute echo 2 > /opt/zookeeper/tmp/myid; on dnzk3, execute echo 3 > /opt/zookeeper/tmp/myid.

Configure the network

sudo pipework docker0 -i eth0 nn1 172.17.0.11/16
sudo pipework docker0 -i eth0 nn2 172.17.0.12/16
sudo pipework docker0 -i eth0 rm1 172.17.0.13/16
sudo pipework docker0 -i eth0 rm2 172.17.0.14/16
sudo pipework docker0 -i eth0 dnzk1 172.17.0.15/16
sudo pipework docker0 -i eth0 dnzk2 172.17.0.16/16
sudo pipework docker0 -i eth0 dnzk3 172.17.0.17/16
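To confirm the pipework addresses took effect, the containers should be able to reach each other by the names listed in the hosts file; a minimal check from inside nn1 might be:

ping -c 1 nn2      # resolved via /etc/hosts, which ~/.bashrc populates from /config/hosts
ping -c 1 dnzk1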

Start the hadoop cluster
Start zookeeper and the hadoop journalnodes on dnzk1/dnzk2/dnzk3

/opt/zookeeper/bin/zkServer.sh start
/opt/hadoop/sbin/hadoop-daemon.sh start journalnode
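A quick way to confirm the quorum formed and the journalnodes are running is to check each of the three nodes:

/opt/zookeeper/bin/zkServer.sh status   # one node should report "leader", the others "follower"
jps                                     # should list QuorumPeerMain and JournalNode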

Format ZooKeeper and format HDFS on nn1

/opt/hadoop/bin/hdfs namenode -format

Copy the formatted NameNode metadata to nn2:

scp -r /opt/hadoop/namenode/ nn2:/opt/hadoop/

or run on nn2:

/opt/hadoop/bin/hdfs namenode -bootstrapStandby

Then, back on nn1, format the failover-controller state in ZooKeeper and start HDFS:

/opt/hadoop/bin/hdfs zkfc -formatZK

/opt/hadoop/sbin/start-dfs.sh
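Once HDFS is up, the failover state of the two NameNodes can be checked with the HA admin tool; the expected result is one active and one standby:

/opt/hadoop/bin/hdfs haadmin -getServiceState nn1   # typically "active"
/opt/hadoop/bin/hdfs haadmin -getServiceState nn2   # typically "standby"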

Start yarn on rm1

/opt/hadoop/sbin/start-yarn.sh
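To verify that the NodeManagers registered with the ResourceManager, the node list can be queried, for example:

/opt/hadoop/bin/yarn node -list    # should show dnzk1, dnzk2 and dnzk3 in RUNNING state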

Start the standby ResourceManager on rm2

/opt/hadoop/sbin/yarn-daemon.sh start resourcemanager

Start spark

/opt/spark/sbin/start-all.sh
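A simple way to confirm the Spark master and workers are wired together is to run the bundled SparkPi example against the standalone master; the path to the examples jar below is an assumption and depends on the exact Spark package:

/opt/spark/bin/spark-submit \
  --master spark://nn1:7077 \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark/lib/spark-examples-*.jar 10    # examples jar location is an assumption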

View:
http://172.17.0.11:50070 (active)
http://172.17.0.12:50070 (standby)

Cluster service status after startup

Host    IP            Software                  Processes
nn1     172.17.0.11   jdk, hadoop               NameNode, DFSZKFailoverController (zkfc)
nn2     172.17.0.12   jdk, hadoop               NameNode, DFSZKFailoverController (zkfc)
rm1     172.17.0.13   jdk, hadoop               ResourceManager
rm2     172.17.0.14   jdk, hadoop               ResourceManager
dnzk1   172.17.0.15   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
dnzk2   172.17.0.16   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
dnzk3   172.17.0.17   jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
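Assuming passwordless SSH between the containers is working (it is also required by the fencing configuration above) and jps is available in a non-interactive shell, the process list on every node can be collected from nn1 in one loop:

for h in nn1 nn2 rm1 rm2 dnzk1 dnzk2 dnzk3; do echo "== $h =="; ssh $h jps; done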

