Apache Hadoop 2.2.0 HDFS HA + YARN Multi-Machine Deployment

Deployment logical architecture: [diagram not included in the source text]

HDFS HA deployment, physical architecture: [diagram not included in the source text]
Note: JournalNode uses very few resources, so even in a production environment it is fine to deploy a JournalNode and a DataNode on the same machine. In production, however, it is recommended to give the active and the standby NameNode each a dedicated machine.

YARN deployment architecture: [diagram not included in the source text]

Personal experiment environment deployment diagram: [diagram not included in the source text]

Environment: Ubuntu 12 (32-bit), Apache Hadoop 2.2.0, JDK 1.7.

Preparatory work:

1. Configure the hosts file on all 4 machines.
2. Configure password-free SSH login from the NameNode to all remaining nodes. One-way login is sufficient; it does not need to be bidirectional. Password-free login is only used when starting and stopping the cluster.
3. Install the JDK.
4. Create a dedicated account; do not use the root account to deploy and manage Hadoop.

Deploying Hadoop:

Step 1: Place the Hadoop installation package in a fixed directory on each node (the directory must be the same on every node), e.g. /home/yarn/hadoop/hadoop-2.2.0. You can also extract it on one node, finish the configuration in step 2, and then copy the tree to the remaining nodes with scp.

Step 2: Modify the configuration files (configure them on one node only, then distribute to the remaining nodes with scp). Config file path: etc/hadoop/

hadoop-env.sh
Modify the JDK path: search the file for the following line and set JAVA_HOME to your JDK installation path:

    # The java implementation to use.
    export JAVA_HOME=/usr/lib/jvm/java-6-sun

core-site.xml
Specifies the host name/IP and port number of the active NameNode; the port number can be modified to suit your needs:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://SY-0217:8020</value>
      </property>
    </configuration>

Note: SY-0217 in the configuration above is a fixed host, which only suits the scenario of manually switching the active and standby NameNodes. If you need automatic failover through ZooKeeper, you must configure the logical name instead; details follow.
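For reference, a minimal sketch of what core-site.xml looks like once the logical name is used. This variant is not shown in the original article: it assumes the nameservice ID hadoop-test configured in hdfs-site.xml below, and for automatic failover it adds the standard ha.zookeeper.quorum property; the ZooKeeper hosts and ports here are illustrative.

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <!-- logical nameservice ID instead of a fixed NameNode host -->
        <value>hdfs://hadoop-test</value>
      </property>
      <property>
        <!-- ZooKeeper ensemble used by the failover controllers (illustrative hosts) -->
        <name>ha.zookeeper.quorum</name>
        <value>SY-0355:2181,SY-0225:2181,SY-0226:2181</value>
      </property>
    </configuration>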
mapred-site.xml

    <configuration>
      <!-- MR YARN Application properties -->
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>The runtime framework for executing MapReduce jobs.
        Can be one of local, classic or yarn.</description>
      </property>

      <!-- JobHistory properties: the JobHistory server, through which you can
           view information about applications that have finished running. -->
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>SY-0355:10020</value>
        <description>MapReduce JobHistory Server IPC host:port</description>
      </property>

      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>SY-0355:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port</description>
      </property>
    </configuration>
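The original does not show how the JobHistory server is brought up. Assuming a stock Hadoop 2.2.0 layout, it is started on the configured host (SY-0355 here) with the bundled daemon script, after which the web UI is reachable at the mapreduce.jobhistory.webapp.address configured above:

    # Run on SY-0355, from the Hadoop installation directory:
    sbin/mr-jobhistory-daemon.sh start historyserver
    # Web UI: http://SY-0355:19888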
hdfs-site.xml
A very critical configuration file.

    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>hadoop-test</value>
        <description>The name of the namespace (nameservice); several can be
        given as a comma-separated list of nameservices.</description>
      </property>

      <property>
        <name>dfs.ha.namenodes.hadoop-test</name>
        <value>nn1,nn2</value>
        <description>The NameNode logical names under this namespace: for a
        given nameservice, a comma-separated list of NameNodes.</description>
      </property>

      <property>
        <name>dfs.namenode.rpc-address.hadoop-test.nn1</name>
        <value>SY-0217:8020</value>
        <description>RPC address for NameNode nn1 of hadoop-test, keyed by
        "namespace name.NameNode logical name".</description>
      </property>

      <property>
        <name>dfs.namenode.rpc-address.hadoop-test.nn2</name>
        <value>SY-0355:8020</value>
        <description>RPC address for NameNode nn2 of hadoop-test.</description>
      </property>

      <property>
        <name>dfs.namenode.http-address.hadoop-test.nn1</name>
        <value>SY-0217:50070</value>
        <description>HTTP address and base port on which the nn1 web UI
        listens.</description>
      </property>

      <property>
        <name>dfs.namenode.http-address.hadoop-test.nn2</name>
        <value>SY-0355:50070</value>
        <description>HTTP address and base port on which the nn2 web UI
        listens.</description>
      </property>

      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/dongxicheng/hadoop/hdfs/name</value>
        <description>Path where the NameNode stores its metadata (the name
        table, fsimage). If the machine has several hard drives, it is
        recommended to configure several paths, separated by commas; the name
        table is then replicated in all of the directories, for
        redundancy.</description>
      </property>

      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/dongxicheng/hadoop/hdfs/data</value>
        <description>Path where the DataNode stores its blocks. If the machine
        has several hard drives, it is recommended to configure several paths,
        separated by commas, typically on different devices; directories that
        do not exist are ignored.</description>
      </property>

      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://SY-0355:8485;SY-0225:8485;SY-0226:8485/hadoop-journal</value>
        <description>JournalNode configuration, in three parts: (1) qjournal is
        the protocol and must not be changed; (2) the host:port of the three
        machines that run a JournalNode, separated by semicolons; (3) the final
        hadoop-journal is the JournalNode namespace, which can be named freely.
        This is a directory on shared storage between the NameNodes of an HA
        cluster: the active NameNode writes it and the standby reads it, in
        order to keep the namespaces synchronized. It does not need to be
        listed in dfs.namenode.edits.dir, and it should be left empty in a
        non-HA cluster.</description>
      </property>

      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/dongxicheng/hadoop/hdfs/journal/</value>
      </property>
    </configuration>
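The original article breaks off at this point. dfs.journalnode.edits.dir is the local directory in which each JournalNode stores its edits; a complete HA setup would normally also configure dfs.client.failover.proxy.provider.hadoop-test and dfs.ha.fencing.methods, plus yarn-site.xml and the slaves file, none of which appear in the scraped text. For the manual-failover scenario described above, the typical Hadoop 2.2.0 first-start sequence looks roughly like the sketch below; it follows the standard procedure with the hostnames used in this article and is not taken from the original.

    # On each JournalNode host (SY-0355, SY-0225, SY-0226):
    sbin/hadoop-daemon.sh start journalnode

    # On the active NameNode (SY-0217): format HDFS, then start the NameNode.
    bin/hdfs namenode -format
    sbin/hadoop-daemon.sh start namenode

    # On the standby NameNode (SY-0355): copy the formatted metadata, then start.
    bin/hdfs namenode -bootstrapStandby
    sbin/hadoop-daemon.sh start namenode

    # Both NameNodes come up in standby state; manually promote nn1.
    bin/hdfs haadmin -transitionToActive nn1

    # From the active NameNode, start all DataNodes listed in etc/hadoop/slaves.
    sbin/hadoop-daemons.sh start datanode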
