Apache Hadoop 2.2.0 HDFS HA + YARN Multi-Machine Deployment

Deployment logical architecture: [diagram not included in the source text]

HDFS HA deployment, physical architecture: [diagram not included in the source text]
Note: JournalNode uses very few resources, so even in a production environment it is fine to deploy a JournalNode and a DataNode on the same machine. In production, however, it is recommended to give the active and the standby NameNode each a dedicated machine.

YARN deployment architecture: [diagram not included in the source text]

Personal experiment environment deployment diagram: [diagram not included in the source text]

Environment: Ubuntu 12 (32-bit), Apache Hadoop 2.2.0, JDK 1.7.

Preparatory work:

1. Configure the hosts file on all 4 machines.
2. Configure password-free SSH login from the NameNode to all remaining nodes. One-way login is sufficient; it does not need to be bidirectional. Password-free login is only used when starting and stopping the cluster.
3. Install the JDK.
4. Create a dedicated account; do not use the root account to deploy and manage Hadoop.

Deploying Hadoop:

Step 1: Place the Hadoop installation package in a fixed directory on each node (the directory must be the same on every node), e.g. /home/yarn/hadoop/hadoop-2.2.0. You can also extract it on one node, finish the configuration in step 2, and then copy the tree to the remaining nodes with scp.

Step 2: Modify the configuration files (configure them on one node only, then distribute to the remaining nodes with scp). Config file path: etc/hadoop/

hadoop-env.sh
Modify the JDK path: search the file for the following line and set JAVA_HOME to your JDK installation path:

    # The java implementation to use.
    export JAVA_HOME=/usr/lib/jvm/java-6-sun

core-site.xml
Specifies the host name/IP and port number of the active NameNode; the port number can be modified to suit your needs:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://SY-0217:8020</value>
      </property>
    </configuration>

Note: SY-0217 in the configuration above is a fixed host, which only suits the scenario of manually switching the active and standby NameNodes. If you need automatic failover through ZooKeeper, you must configure the logical name instead; details follow.
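For reference, a minimal sketch of what core-site.xml looks like once the logical name is used. This variant is not shown in the original article: it assumes the nameservice ID hadoop-test configured in hdfs-site.xml below, and for automatic failover it adds the standard ha.zookeeper.quorum property; the ZooKeeper hosts and ports here are illustrative.

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <!-- logical nameservice ID instead of a fixed NameNode host -->
        <value>hdfs://hadoop-test</value>
      </property>
      <property>
        <!-- ZooKeeper ensemble used by the failover controllers (illustrative hosts) -->
        <name>ha.zookeeper.quorum</name>
        <value>SY-0355:2181,SY-0225:2181,SY-0226:2181</value>
      </property>
    </configuration>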
mapred-site.xml

    <configuration>
      <!-- MR YARN Application properties -->
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>The runtime framework for executing MapReduce jobs.
        Can be one of local, classic or yarn.</description>
      </property>

      <!-- JobHistory properties: the JobHistory server, through which you can
           view information about applications that have finished running. -->
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>SY-0355:10020</value>
        <description>MapReduce JobHistory Server IPC host:port</description>
      </property>

      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>SY-0355:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port</description>
      </property>
    </configuration>
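The original does not show how the JobHistory server is brought up. Assuming a stock Hadoop 2.2.0 layout, it is started on the configured host (SY-0355 here) with the bundled daemon script, after which the web UI is reachable at the mapreduce.jobhistory.webapp.address configured above:

    # Run on SY-0355, from the Hadoop installation directory:
    sbin/mr-jobhistory-daemon.sh start historyserver
    # Web UI: http://SY-0355:19888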
hdfs-site.xml
A very critical configuration file.

    <configuration>
      <property>
        <name>dfs.nameservices</name>
        <value>hadoop-test</value>
        <description>The name of the namespace (nameservice); several can be
        given as a comma-separated list of nameservices.</description>
      </property>

      <property>
        <name>dfs.ha.namenodes.hadoop-test</name>
        <value>nn1,nn2</value>
        <description>The NameNode logical names under this namespace: for a
        given nameservice, a comma-separated list of NameNodes.</description>
      </property>

      <property>
        <name>dfs.namenode.rpc-address.hadoop-test.nn1</name>
        <value>SY-0217:8020</value>
        <description>RPC address for NameNode nn1 of hadoop-test, keyed by
        "namespace name.NameNode logical name".</description>
      </property>

      <property>
        <name>dfs.namenode.rpc-address.hadoop-test.nn2</name>
        <value>SY-0355:8020</value>
        <description>RPC address for NameNode nn2 of hadoop-test.</description>
      </property>

      <property>
        <name>dfs.namenode.http-address.hadoop-test.nn1</name>
        <value>SY-0217:50070</value>
        <description>HTTP address and base port on which the nn1 web UI
        listens.</description>
      </property>

      <property>
        <name>dfs.namenode.http-address.hadoop-test.nn2</name>
        <value>SY-0355:50070</value>
        <description>HTTP address and base port on which the nn2 web UI
        listens.</description>
      </property>

      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/dongxicheng/hadoop/hdfs/name</value>
        <description>Path where the NameNode stores its metadata (the name
        table, fsimage). If the machine has several hard drives, it is
        recommended to configure several paths, separated by commas; the name
        table is then replicated in all of the directories, for
        redundancy.</description>
      </property>

      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/dongxicheng/hadoop/hdfs/data</value>
        <description>Path where the DataNode stores its blocks. If the machine
        has several hard drives, it is recommended to configure several paths,
        separated by commas, typically on different devices; directories that
        do not exist are ignored.</description>
      </property>

      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://SY-0355:8485;SY-0225:8485;SY-0226:8485/hadoop-journal</value>
        <description>JournalNode configuration, in three parts: (1) qjournal is
        the protocol and must not be changed; (2) the host:port of the three
        machines that run a JournalNode, separated by semicolons; (3) the final
        hadoop-journal is the JournalNode namespace, which can be named freely.
        This is a directory on shared storage between the NameNodes of an HA
        cluster: the active NameNode writes it and the standby reads it, in
        order to keep the namespaces synchronized. It does not need to be
        listed in dfs.namenode.edits.dir, and it should be left empty in a
        non-HA cluster.</description>
      </property>

      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/dongxicheng/hadoop/hdfs/journal/</value>
      </property>
    </configuration>
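The original article breaks off at this point. dfs.journalnode.edits.dir is the local directory in which each JournalNode stores its edits; a complete HA setup would normally also configure dfs.client.failover.proxy.provider.hadoop-test and dfs.ha.fencing.methods, plus yarn-site.xml and the slaves file, none of which appear in the scraped text. For the manual-failover scenario described above, the typical Hadoop 2.2.0 first-start sequence looks roughly like the sketch below; it follows the standard procedure with the hostnames used in this article and is not taken from the original.

    # On each JournalNode host (SY-0355, SY-0225, SY-0226):
    sbin/hadoop-daemon.sh start journalnode

    # On the active NameNode (SY-0217): format HDFS, then start the NameNode.
    bin/hdfs namenode -format
    sbin/hadoop-daemon.sh start namenode

    # On the standby NameNode (SY-0355): copy the formatted metadata, then start.
    bin/hdfs namenode -bootstrapStandby
    sbin/hadoop-daemon.sh start namenode

    # Both NameNodes come up in standby state; manually promote nn1.
    bin/hdfs haadmin -transitionToActive nn1

    # From the active NameNode, start all DataNodes listed in etc/hadoop/slaves.
    sbin/hadoop-daemons.sh start datanode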
