Building a Hadoop distributed cluster

Hadoop 2.0 has released a stable version that adds many features, such as HDFS HA and YARN. The newest release, hadoop-2.4.1, also adds YARN HA.
Note: the hadoop-2.4.1 installation package provided by Apache is compiled on a 32-bit operating system, because Hadoop relies on some native C++ libraries. If you want to run hadoop-2.4.1 on a 64-bit operating system, you need to recompile it on a 64-bit operating system. (It is recommended to use a 32-bit system for the first installation; I will also upload a compiled 64-bit version to the group share, and those interested can compile it themselves.)
Preliminary preparation (not covered in detail here; introduced in class):
1. Change the Linux hostname
2. Change the IP address
3. Edit the mapping between hostname and IP
   Note: if your company rents servers or uses cloud hosts (such as Huawei or Alibaba Cloud hosts), what you configure in /etc/hosts is the mapping between the internal network IP address and the hostname.
4. Turn off the firewall
5. Set up passwordless SSH login
6. Install the JDK, configure environment variables, etc.
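Step 5 above (passwordless SSH) can be sketched as follows; this is a hypothetical illustration, not from the original article, and it generates the key into a throwaway directory so it does not touch your real ~/.ssh:

```shell
# Generate an RSA key pair with no passphrase (-N '') into a temp directory
keydir=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$keydir/id_rsa" -q
# On a real cluster you would then push the public key to each node, e.g.:
#   ssh-copy-id -i "$keydir/id_rsa.pub" hadoop@weekend02
ls "$keydir"
```

After `ssh-copy-id` has run against every node, `ssh weekend02` should log in without a password prompt.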
Cluster plan:

Hostname     Installed software        Running processes
weekend01    JDK, Hadoop               NameNode, DFSZKFailoverController (ZKFC)
weekend02    JDK, Hadoop               NameNode, DFSZKFailoverController (ZKFC)
weekend03    JDK, Hadoop               ResourceManager
weekend04    JDK, Hadoop               ResourceManager
weekend05    JDK, Hadoop, ZooKeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
weekend06    JDK, Hadoop, ZooKeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
weekend07    JDK, Hadoop, ZooKeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
Description:
1. In Hadoop 2.0 there are usually two NameNodes, one in active state and the other in standby state. The active NameNode serves client requests; the standby NameNode does not, and only synchronizes the active NameNode's state so that it can take over quickly on failure. Hadoop 2.0 officially provides two HDFS HA solutions, one based on NFS and the other on QJM. Here we use the simpler QJM. In this scheme the active and standby NameNodes synchronize metadata through a group of JournalNodes, and a write is considered successful once it reaches a majority of the JournalNodes, so an odd number of JournalNodes is usually configured. A ZooKeeper cluster is also configured for ZKFC (DFSZKFailoverController) failover: when the active NameNode goes down, ZKFC automatically switches the standby NameNode to active.
2. hadoop-2.2.0 still has a problem: there is only one ResourceManager, which is a single point of failure. hadoop-2.4.1 solves this by running two ResourceManagers, one active and one standby, with their state coordinated by ZooKeeper.

Installation steps:
1. Install and configure the ZooKeeper cluster (on weekend05)
1.1 Decompress:
    tar -zxvf zookeeper-3.4.5.tar.gz -C /weekend/
1.2 Modify the configuration:
    cd /weekend/zookeeper-3.4.5/conf/
    cp zoo_sample.cfg zoo.cfg
    vim zoo.cfg
Change:
    dataDir=/weekend/zookeeper-3.4.5/tmp
Add at the end:
    server.1=weekend05:2888:3888
    server.2=weekend06:2888:3888
    server.3=weekend07:2888:3888
Save and exit, then create the tmp folder:
    mkdir /weekend/zookeeper-3.4.5/tmp
Create an empty file:
    touch /weekend/zookeeper-3.4.5/tmp/myid
Finally, write the ID to the file:
    echo 1 > /weekend/zookeeper-3.4.5/tmp/myid
1.3 Copy the configured ZooKeeper to the other nodes (first create the /weekend directory on weekend06 and weekend07: mkdir /weekend):
    scp -r /weekend/zookeeper-3.4.5/ weekend06:/weekend/
    scp -r /weekend/zookeeper-3.4.5/ weekend07:/weekend/
Note: change the content of /weekend/zookeeper-3.4.5/tmp/myid on weekend06 and weekend07 to match the server.N entries:
    weekend06: echo 2 > /weekend/zookeeper-3.4.5/tmp/myid
    weekend07: echo 3 > /weekend/zookeeper-3.4.5/tmp/myid
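The steps above pair each server.N line in zoo.cfg with a matching myid file. As a sketch (a hypothetical helper, not from the original article), the two can be generated together so the IDs and hostnames cannot drift apart; it writes into a temp directory instead of the real dataDir:

```shell
# Hosts from the cluster plan above
hosts="weekend05 weekend06 weekend07"
workdir=$(mktemp -d)
id=1
for h in $hosts; do
  # server.N=host:peerPort:electionPort, as appended to zoo.cfg above
  echo "server.${id}=${h}:2888:3888" >> "${workdir}/zoo.cfg.append"
  # each host's dataDir gets a myid file holding only its id
  mkdir -p "${workdir}/${h}"
  echo "${id}" > "${workdir}/${h}/myid"
  id=$((id + 1))
done
cat "${workdir}/zoo.cfg.append"
```

The contents of zoo.cfg.append would then be appended to every node's zoo.cfg, and each node would receive its own myid file.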
2. Install and configure the Hadoop cluster (operate on weekend01)
2.1 Decompress:
    tar -zxvf hadoop-2.4.1.tar.gz -C /weekend/
2.2 Configure HDFS (in Hadoop 2.0, all configuration files are in the $HADOOP_HOME/etc/hadoop directory)
# Add Hadoop to the environment variables
    vim /etc/profile
    export JAVA_HOME=/usr/java/jdk1.7.0_55
    export HADOOP_HOME=/weekend/hadoop-2.4.1
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
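A quick way to confirm the /etc/profile additions took effect is to check that both bin directories ended up on PATH; a minimal sketch, using the JAVA_HOME and HADOOP_HOME paths this walkthrough assumes:

```shell
# Same exports as added to /etc/profile above
export JAVA_HOME=/usr/java/jdk1.7.0_55
export HADOOP_HOME=/weekend/hadoop-2.4.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
# After `source /etc/profile`, the two bin directories are the last PATH entries
echo "$PATH" | tr ':' '\n' | tail -n 2
```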
# The Hadoop 2.0 configuration files are all under $HADOOP_HOME/etc/hadoop
    cd /home/hadoop/app/hadoop-2.4.1/etc/hadoop
2.2.1 Modify hadoop-env.sh:
    export JAVA_HOME=/home/hadoop/app/jdk1.7.0_55
2.2.2 Modify core-site.xml:

<configuration>
    <!-- Specify the HDFS nameservice as ns1. fs.defaultFS is the default
         file system used by the cluster; it now points at the nameservice
         instead of a single NameNode host. -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1/</value>
    </property>
    <!-- Specify the Hadoop temp directory. This is each node's Hadoop working
         directory: if the node is a NameNode, a "name" directory is created
         under it; if the node is a DataNode, a "data" directory is created
         under it. -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/app/hadoop-2.4.1/tmp</value>
    </property>
    <!-- Specify the ZooKeeper addresses; more machines can be appended here. -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>weekend05:2181,weekend06:2181,weekend07:2181</value>
    </property>
</configuration>
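Because ZooKeeper elects a leader by strict majority, ha.zookeeper.quorum should list an odd number of hosts (as the JournalNode discussion above also notes). A small sanity check, as a sketch not taken from the article:

```shell
# The quorum string as configured in core-site.xml above
quorum="weekend05:2181,weekend06:2181,weekend07:2181"
# Count the comma-separated host entries
n=$(echo "$quorum" | tr ',' '\n' | wc -l)
echo "quorum size: $n"
# An odd quorum of n hosts tolerates (n-1)/2 failures
[ $((n % 2)) -eq 1 ] && echo "majority needed: $((n / 2 + 1))"
```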
2.2.3 Modify hdfs-site.xml:

<configuration>
    <!-- Specify the HDFS nameservice as ns1; this must be consistent with
         core-site.xml. If you have multiple nameservices, add them
         comma-separated, e.g. ns1,ns2. -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <!-- Under nameservice ns1 there are two NameNodes, nn1 and nn2. These are
         logical IDs; the system does not yet know which two hosts they point
         to. That is defined below. -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC communication address of nn1: bind the logical ID to a host. -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>weekend01:9000</value>
    </property>
</configuration>
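Once hdfs-site.xml is in place, its values can be read back as a sanity check. A minimal sketch, assuming one tag per line as in the snippet above; on a live cluster the proper tool is `hdfs getconf -confKey dfs.nameservices`:

```shell
# Write a miniature config fragment to a temp file to stand in for hdfs-site.xml
conf=$(mktemp)
cat > "$conf" <<'EOF'
<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>
EOF
# Pull the text between <value> and </value>
ns=$(sed -n 's|.*<value>\(.*\)</value>.*|\1|p' "$conf")
echo "nameservice: $ns"
```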
