CDH5 Manual Configuration: An Improved Step-by-Step Process

Source: Internet
Author: User
Tags: manual failover, shuffle, zookeeper, ssh
First, pre-installation
Operating system: CentOS 6.5, 64-bit.
Java environment: JDK 1.7.0_45 or later; this guide uses jdk-7u55-linux-x64.tar.gz.
Nodes:
    master01  10.10.2.57   NameNode
    master02  10.10.2.58   NameNode
    slave01   10.10.2.173  DataNode
    slave02   10.10.2.59   DataNode
    slave03   10.10.2.60   DataNode
Note: Hadoop 2.0 and later requires JDK 1.7, so uninstall the JDK that ships with the Linux distribution and reinstall. JDK download: http://www.oracle.com/technetwork/java/javase/downloads/index.html
Software versions: hadoop-2.3.0-cdh5.1.0.tar.gz and zookeeper-3.4.5-cdh5.1.0.tar.gz. Download: http://archive.cloudera.com/cdh5/cdh/5/
Start the installation.

Second, JDK installation
1. Check whether a JDK is already installed:
    rpm -qa | grep jdk
    java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.i686
2. Uninstall the bundled JDK:
    yum -y remove java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.i686
3. Install jdk-7u55-linux-x64.tar.gz: create the directory /usr/java and extract the archive into it:
    tar -zxvf jdk-7u55-linux-x64.tar.gz
    [root@master01 java]# ls
    jdk1.7.0_55

Third, configure environment variables
Run vi /etc/profile and add:
    # /etc/profile
    # System wide environment and startup programs, for login setup
    # Functions and aliases go in /etc/bashrc
    export JAVA_HOME=/usr/java/jdk1.7.0_55
    export JRE_HOME=/usr/java/jdk1.7.0_55/jre
    export CLASSPATH=/usr/java/jdk1.7.0_55/lib
    export PATH=$JAVA_HOME/bin:$PATH
Save the changes and run source /etc/profile to reload the environment variables, then check with java -version:
    [root@master01 java]# java -version
    java version "1.7.0_55"
    Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
The JDK is configured successfully.

Fourth, system configuration
Prepare the five machines and configure their IP addresses. Turn the firewall off permanently:
    chkconfig iptables off
Configure the host name and the hosts file:
    [root@master01 java]# vi /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    10.10.2.57  master01
    10.10.2.58  master02
    10.10.2.173 slave01
    10.10.2.59  slave02
    10.10.2.60  slave03
Set each machine's host name to match its own IP address entry.
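The hosts file and the JDK have to be identical on all five machines. As a minimal sketch of pushing them out from master01 (an assumption of this guide's paths: root access, the JDK tarball in /root; scp and ssh will prompt for each node's password until passwordless SSH is set up in the next step):

    # Copy /etc/hosts and the JDK tarball to the remaining nodes and unpack it there.
    for host in master02 slave01 slave02 slave03; do
        scp /etc/hosts ${host}:/etc/hosts
        scp /root/jdk-7u55-linux-x64.tar.gz ${host}:/root/
        ssh ${host} "mkdir -p /usr/java && tar -zxf /root/jdk-7u55-linux-x64.tar.gz -C /usr/java"
    done

Remember to add the same JAVA_HOME lines to /etc/profile on each node as well.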
Passwordless SSH configuration
Hadoop manages its daemons remotely: the NameNode nodes connect to each DataNode node over SSH (Secure Shell) to start and stop their processes, so SSH must work without a password. We therefore set up passwordless logins from the NameNode nodes to the DataNode nodes, and likewise from the DataNode nodes back to the NameNode nodes.

On every machine, edit vi /etc/ssh/sshd_config and enable:
    RSAAuthentication yes      # enable RSA authentication
    PubkeyAuthentication yes   # enable public/private key pair authentication

On master01 run:
    ssh-keygen -t rsa -P ""
Do not enter a passphrase; press Enter to accept the default location /root/.ssh. Then:
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    [root@master01 .ssh]# ls
    authorized_keys  id_rsa  id_rsa.pub  known_hosts
Perform the same steps on slave01, then append master01's /root/.ssh/id_rsa.pub to slave01's authorized_keys so that slave01 holds master01's public key, and run ssh slave01 to test that master01 can reach slave01 without a password. Then append slave01's id_rsa.pub to master01's authorized_keys and test ssh master01 from slave01.
    [root@master01 ~]# ssh slave01
    Last login: Tue 14:28:15 from master01
    [root@slave01 ~]#
Perform the same operation for the remaining pairs: master01-master02, master01-slave01, master01-slave02, master01-slave03, master02-slave01, master02-slave02, master02-slave03.
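Exchanging keys by hand for every one of these pairs is tedious. A sketch of reaching the same result with the stock ssh-copy-id utility (assuming each node has already generated its own key pair with ssh-keygen -t rsa -P ""):

    # Run this once on master01 and once on master02; each node's password is asked for once.
    for host in master01 master02 slave01 slave02 slave03; do
        ssh-copy-id -i ~/.ssh/id_rsa.pub root@${host}
    done
    # Verification: every line should print the remote host name without a password prompt.
    for host in master01 master02 slave01 slave02 slave03; do
        ssh root@${host} hostname
    done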
  
Fifth, installing Hadoop
Create the directory /usr/local/cloud and, inside it, the folders data (data and log files), hadoop, zookeeper, and tar (the original Hadoop and ZooKeeper archives):
    [root@slave01 cloud]# ls
    data  hadoop  tar  zookeeper

5.1 Configure hadoop-env.sh
Go to /usr/local/cloud/hadoop/etc/hadoop and edit hadoop-env.sh so that Hadoop loads the Java environment at run time:
    export JAVA_HOME=/usr/java/jdk1.7.0_55

5.2 Configure core-site.xml
    <!-- hadoop.tmp.dir: many Hadoop paths depend on it; never delete this directory on the NameNode node, otherwise you will have to reformat -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/usr/local/cloud/data/hadoop/tmp</value>
    </property>
    <!-- The URL of the cluster's NameNode; "zzg" is the logical name of the HA nameservice. Every DataNode in the cluster needs this address before its data can be used -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://zzg</value>
    </property>
    <!-- Addresses and ports of the ZooKeeper quorum; keep an odd number of servers, at least 3 -->
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>master01:2181,slave01:2181,slave02:2181</value>
    </property>

5.3 Configure hdfs-site.xml
    <!-- Where the NameNode stores its data; used only by the NameNode and contains the file system metadata -->
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/usr/local/cloud/data/hadoop/dfs/nn</value>
    </property>
    <!-- Local path where each DataNode stores its blocks; it does not have to be the same on every machine, but keeping it the same is easiest to manage -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/usr/local/cloud/data/hadoop/dfs/dn</value>
    </property>
    <!-- Replication factor; the system default is 3 -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
    <!-- Set dfs.webhdfs.enabled to true, otherwise some operations, such as the WebHDFS LISTSTATUS call, cannot be used -->
    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
    </property>
    <!-- Optional: disabling permission checks avoids some unnecessary hassle -->
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>
    <property>
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>
    <!-- HA configuration -->
    <!-- Logical name of the nameservice -->
    <property>
      <name>dfs.nameservices</name>
      <value>zzg</value>
    </property>
    <!-- Logical names of the NameNodes in the nameservice -->
    <property>
      <name>dfs.ha.namenodes.zzg</name>
      <value>nn1,nn2</value>
    </property>
    <!-- RPC addresses of the NameNodes; RPC, simply put, is the serialized transport used when files are transferred -->
    <property>
      <name>dfs.namenode.rpc-address.zzg.nn1</name>
      <value>master01:9000</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.zzg.nn2</name>
      <value>master02:9000</value>
    </property>
    <!-- HTTP ports of the Hadoop web UI -->
    <property>
      <name>dfs.namenode.http-address.zzg.nn1</name>
      <value>master01:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.zzg.nn2</name>
      <value>master02:50070</value>
    </property>
    <!-- Service RPC addresses for communicating with the NameNodes -->
    <property>
      <name>dfs.namenode.servicerpc-address.zzg.nn1</name>
      <value>master01:53310</value>
    </property>
    <property>
      <name>dfs.namenode.servicerpc-address.zzg.nn2</name>
      <value>master02:53310</value>
    </property>
    <!-- JournalNode cluster that shares the edit log -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://master01:8485;slave01:8485;slave02:8485/zzg</value>
    </property>
    <!-- Local directory where the JournalNodes keep the shared edits -->
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/usr/local/cloud/data/hadoop/ha/journal</value>
    </property>
    <!-- Failover proxy provider class -->
    <property>
      <name>dfs.client.failover.proxy.provider.zzg</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Enable automatic failover -->
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>master01:2181,slave01:2181,slave02:2181</value>
    </property>
    <!-- Use SSH fencing during failover -->
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <!-- Private key used for the SSH fencing connection -->
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/root/.ssh/id_rsa</value>
    </property>
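Before moving on, the values Hadoop actually resolves from core-site.xml and hdfs-site.xml can be sanity-checked with hdfs getconf (paths assume the Hadoop home used above, /usr/local/cloud/hadoop):

    # Each command prints the resolved value of one configuration key.
    /usr/local/cloud/hadoop/bin/hdfs getconf -confKey fs.defaultFS          # expected: hdfs://zzg
    /usr/local/cloud/hadoop/bin/hdfs getconf -confKey dfs.ha.namenodes.zzg  # expected: nn1,nn2
    /usr/local/cloud/hadoop/bin/hdfs getconf -namenodes                     # expected: master01 master02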
5.4 Configure mapred-site.xml
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

5.5 Configure yarn-env.sh
Set the Java environment used by YARN:
    # some Java parameters
    export JAVA_HOME=/usr/java/jdk1.7.0_55
5.6 Configure yarn-site.xml
    <!-- Interval between ResourceManager reconnection attempts -->
    <property>
      <name>yarn.resourcemanager.connect.retry-interval.ms</name>
      <value>2000</value>
    </property>
    <!-- Enable ResourceManager HA; the default is false -->
    <property>
      <name>yarn.resourcemanager.ha.enabled</name>
      <value>true</value>
    </property>
    <!-- Enable automatic failover -->
    <property>
      <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <!-- Logical IDs of the ResourceManagers -->
    <property>
      <name>yarn.resourcemanager.ha.rm-ids</name>
      <value>rm1,rm2</value>
    </property>
    <!-- rm1 runs on master01 and rm2 on master02; in master02's copy of this file set the value to rm2 -->
    <property>
      <name>yarn.resourcemanager.ha.id</name>
      <value>rm1</value>
      <description>If we want to launch more than one RM in a single node, we need this configuration</description>
    </property>
    <!-- Enable recovery of ResourceManager state -->
    <property>
      <name>yarn.resourcemanager.recovery.enabled</name>
      <value>true</value>
    </property>
    <!-- Store the ResourceManager state in ZooKeeper -->
    <property>
      <name>yarn.resourcemanager.zk-state-store.address</name>
      <value>localhost:2181</value>
    </property>
    <property>
      <name>yarn.resourcemanager.store.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
      <name>yarn.resourcemanager.zk-address</name>
      <value>localhost:2181</value>
    </property>
    <property>
      <name>yarn.resourcemanager.cluster-id</name>
      <value>yarn-cluster</value>
    </property>
    <!-- How long the application master waits between scheduler connection attempts -->
    <property>
      <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
      <value>5000</value>
    </property>
    <!-- rm1 configuration -->
    <property>
      <name>yarn.resourcemanager.address.rm1</name>
      <value>master01:23140</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address.rm1</name>
      <value>master01:23130</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address.rm1</name>
      <value>master01:23188</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
      <value>master01:23125</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address.rm1</name>
      <value>master01:23141</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.admin.address.rm1</name>
      <value>master01:23142</value>
    </property>
    <!-- rm2 configuration -->
    <property>
      <name>yarn.resourcemanager.address.rm2</name>
      <value>master02:23140</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address.rm2</name>
      <value>master02:23130</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address.rm2</name>
      <value>master02:23188</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
      <value>master02:23125</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address.rm2</name>
      <value>master02:23141</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.admin.address.rm2</name>
      <value>master02:23142</value>
    </property>
    <!-- NodeManager configuration -->
    <property>
      <description>Address where the localizer IPC is.</description>
      <name>yarn.nodemanager.localizer.address</name>
      <value>0.0.0.0:23344</value>
    </property>
    <!-- NodeManager HTTP access port -->
    <property>
      <description>NM Webapp address.</description>
      <name>yarn.nodemanager.webapp.address</name>
      <value>0.0.0.0:23999</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/usr/local/cloud/data/hadoop/yarn/local</value>
    </property>
    <property>
      <name>yarn.nodemanager.log-dirs</name>
      <value>/usr/local/cloud/data/logs/hadoop</value>
    </property>
    <property>
      <name>mapreduce.shuffle.port</name>
      <value>23080</value>
    </property>
    <!-- Failover proxy provider class -->
    <property>
      <name>yarn.client.failover-proxy-provider</name>
      <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
    </property>

Sixth, configure the ZooKeeper cluster
Under the ZooKeeper directory create the data and logs directories, then configure zoo.cfg:
    dataDir=/usr/local/cloud/zookeeper/data
    dataLogDir=/usr/local/cloud/zookeeper/logs
    # the port at which the clients will connect
    clientPort=2181
    server.1=master01:2888:3888
    server.2=master02:2888:3888
    server.3=slave01:2888:3888
    server.4=slave02:2888:3888
    server.5=slave03:2888:3888
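Every node needs the same Hadoop and ZooKeeper files before anything is started. A sketch of pushing the finished tree from master01 to the other nodes (assumes the passwordless SSH configured earlier; afterwards change yarn.resourcemanager.ha.id to rm2 in master02's yarn-site.xml as noted above):

    # Copy the whole installation and configuration tree to the remaining nodes.
    for host in master02 slave01 slave02 slave03; do
        scp -r /usr/local/cloud ${host}:/usr/local/
    done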
Create a myid file in the data directory on each machine and write into it the number from that machine's server.N entry in zoo.cfg: write 1 on master01, 2 on master02, and so on for the remaining machines.
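A sketch of creating all the myid files in one pass from master01, using the passwordless SSH set up earlier and the server numbering from zoo.cfg:

    # server.1..server.5 in zoo.cfg map to master01, master02, slave01, slave02, slave03.
    i=1
    for host in master01 master02 slave01 slave02 slave03; do
        ssh ${host} "mkdir -p /usr/local/cloud/zookeeper/data && echo ${i} > /usr/local/cloud/zookeeper/data/myid"
        i=$((i + 1))
    done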
  
On each machine, run zkServer.sh start from the bin directory under the ZooKeeper directory, then run zkServer.sh status; if one node reports leader and the others report follower, the cluster is configured correctly. The configuration files are now complete.

Seventh, start the Hadoop cluster in the following order (first start):
(1) Start ZooKeeper on every node: zookeeper/bin/zkServer.sh start
(2) On one NameNode, format ZKFC to create the cluster's namespace in ZooKeeper: hadoop/bin/hdfs zkfc -formatZK
(3) On the nodes configured as JournalNodes (master01, slave01, slave02), run: hadoop/sbin/hadoop-daemon.sh start journalnode
(4) On the primary NameNode node, format HDFS for the zzg nameservice and start the NameNode:
    hadoop/bin/hadoop namenode -format
    hadoop/sbin/hadoop-daemon.sh start namenode
(5) On the standby NameNode node, copy over the primary NameNode's metadata directory and start the NameNode:
    hadoop/bin/hdfs namenode -bootstrapStandby
    hadoop/sbin/hadoop-daemon.sh start namenode
(6) On both NameNode nodes, run: ./sbin/hadoop-daemon.sh start zkfc
(7) On all DataNode nodes, start the DataNode: $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
(8) On the primary NameNode node, start YARN with the start-yarn.sh command.

jps now shows the following processes.
NameNode node:
    [root@master01 ~]# jps
    38972 JournalNode
    38758 NameNode
    39166 DFSZKFailoverController
    37473 QuorumPeerMain
    39778 ResourceManager
    42620 Jps
DataNode node:
    [root@slave01 ~]# jps
    33440 DataNode
    35277 Jps
    32681 QuorumPeerMain
    33568 JournalNode
    34231 NodeManager
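Once everything is up, a quick check of the HA state can be made from the command line (nn1/nn2 and rm1/rm2 are the logical names defined in hdfs-site.xml and yarn-site.xml above):

    # One NameNode should report "active", the other "standby".
    /usr/local/cloud/hadoop/bin/hdfs haadmin -getServiceState nn1
    /usr/local/cloud/hadoop/bin/hdfs haadmin -getServiceState nn2
    # Same check for the ResourceManagers.
    /usr/local/cloud/hadoop/bin/yarn rmadmin -getServiceState rm1
    /usr/local/cloud/hadoop/bin/yarn rmadmin -getServiceState rm2
    # All three DataNodes should be listed as live in the report.
    /usr/local/cloud/hadoop/bin/hdfs dfsadmin -report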
