CDH5 Manual Configuration: An Improved Step-by-Step Process

Source: Internet
Author: User
Tags: manual failover, shuffle, zookeeper, ssh
First, pre-installation
Operating system: CentOS 6.5, 64-bit.
Java environment: JDK 1.7.0_45 or later; this guide uses jdk-7u55-linux-x64.tar.gz.
Nodes:
    master01  10.10.2.57   NameNode
    master02  10.10.2.58   NameNode
    slave01   10.10.2.173  DataNode
    slave02   10.10.2.59   DataNode
    slave03   10.10.2.60   DataNode
Note: Hadoop 2.0 and later requires JDK 1.7, so uninstall the JDK that ships with the Linux distribution and reinstall. JDK download: http://www.oracle.com/technetwork/java/javase/downloads/index.html
Software versions: hadoop-2.3.0-cdh5.1.0.tar.gz and zookeeper-3.4.5-cdh5.1.0.tar.gz. Download: http://archive.cloudera.com/cdh5/cdh/5/
Start the installation.

Second, JDK installation
1. Check whether a JDK is already installed:
    rpm -qa | grep jdk
    java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.i686
2. Uninstall the bundled JDK:
    yum -y remove java-1.6.0-openjdk-1.6.0.0-1.45.1.11.1.el6.i686
3. Install jdk-7u55-linux-x64.tar.gz: create the directory /usr/java and extract the archive into it:
    tar -zxvf jdk-7u55-linux-x64.tar.gz
    [root@master01 java]# ls
    jdk1.7.0_55

Third, configure environment variables
Run vi /etc/profile and add:
    # /etc/profile
    # System wide environment and startup programs, for login setup
    # Functions and aliases go in /etc/bashrc
    export JAVA_HOME=/usr/java/jdk1.7.0_55
    export JRE_HOME=/usr/java/jdk1.7.0_55/jre
    export CLASSPATH=/usr/java/jdk1.7.0_55/lib
    export PATH=$JAVA_HOME/bin:$PATH
Save the changes and run source /etc/profile to reload the environment variables, then check with java -version:
    [root@master01 java]# java -version
    java version "1.7.0_55"
    Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
The JDK is configured successfully.

Fourth, system configuration
Prepare the five machines and configure their IP addresses. Turn the firewall off permanently:
    chkconfig iptables off
Configure the host name and the hosts file:
    [root@master01 java]# vi /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    10.10.2.57  master01
    10.10.2.58  master02
    10.10.2.173 slave01
    10.10.2.59  slave02
    10.10.2.60  slave03
Set each machine's host name to match its own IP address entry.
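The hosts file and the JDK have to be identical on all five machines. As a minimal sketch of pushing them out from master01 (an assumption of this guide's paths: root access, the JDK tarball in /root; scp and ssh will prompt for each node's password until passwordless SSH is set up in the next step):

    # Copy /etc/hosts and the JDK tarball to the remaining nodes and unpack it there.
    for host in master02 slave01 slave02 slave03; do
        scp /etc/hosts ${host}:/etc/hosts
        scp /root/jdk-7u55-linux-x64.tar.gz ${host}:/root/
        ssh ${host} "mkdir -p /usr/java && tar -zxf /root/jdk-7u55-linux-x64.tar.gz -C /usr/java"
    done

Remember to add the same JAVA_HOME lines to /etc/profile on each node as well.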
Passwordless SSH configuration
Hadoop manages its daemons remotely: the NameNode nodes connect to each DataNode node over SSH (Secure Shell) to start and stop their processes, so SSH must work without a password. We therefore set up passwordless logins from the NameNode nodes to the DataNode nodes, and likewise from the DataNode nodes back to the NameNode nodes.

On every machine, edit vi /etc/ssh/sshd_config and enable:
    RSAAuthentication yes      # enable RSA authentication
    PubkeyAuthentication yes   # enable public/private key pair authentication

On master01 run:
    ssh-keygen -t rsa -P ""
Do not enter a passphrase; press Enter to accept the default location /root/.ssh. Then:
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    [root@master01 .ssh]# ls
    authorized_keys  id_rsa  id_rsa.pub  known_hosts
Perform the same steps on slave01, then append master01's /root/.ssh/id_rsa.pub to slave01's authorized_keys so that slave01 holds master01's public key, and run ssh slave01 to test that master01 can reach slave01 without a password. Then append slave01's id_rsa.pub to master01's authorized_keys and test ssh master01 from slave01.
    [root@master01 ~]# ssh slave01
    Last login: Tue 14:28:15 from master01
    [root@slave01 ~]#
Perform the same operation for the remaining pairs: master01-master02, master01-slave01, master01-slave02, master01-slave03, master02-slave01, master02-slave02, master02-slave03.
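Exchanging keys by hand for every one of these pairs is tedious. A sketch of reaching the same result with the stock ssh-copy-id utility (assuming each node has already generated its own key pair with ssh-keygen -t rsa -P ""):

    # Run this once on master01 and once on master02; each node's password is asked for once.
    for host in master01 master02 slave01 slave02 slave03; do
        ssh-copy-id -i ~/.ssh/id_rsa.pub root@${host}
    done
    # Verification: every line should print the remote host name without a password prompt.
    for host in master01 master02 slave01 slave02 slave03; do
        ssh root@${host} hostname
    done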
  
Fifth, installing Hadoop
Create the directory /usr/local/cloud and, inside it, the folders data (data and log files), hadoop, zookeeper, and tar (the original Hadoop and ZooKeeper archives):
    [root@slave01 cloud]# ls
    data  hadoop  tar  zookeeper

5.1 Configure hadoop-env.sh
Go to /usr/local/cloud/hadoop/etc/hadoop and edit hadoop-env.sh so that Hadoop loads the Java environment at run time:
    export JAVA_HOME=/usr/java/jdk1.7.0_55

5.2 Configure core-site.xml
    <!-- hadoop.tmp.dir: many Hadoop paths depend on it; never delete this directory on the NameNode node, otherwise you will have to reformat -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/usr/local/cloud/data/hadoop/tmp</value>
    </property>
    <!-- The URL of the cluster's NameNode; "zzg" is the logical name of the HA nameservice. Every DataNode in the cluster needs this address before its data can be used -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://zzg</value>
    </property>
    <!-- Addresses and ports of the ZooKeeper quorum; keep an odd number of servers, at least 3 -->
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>master01:2181,slave01:2181,slave02:2181</value>
    </property>

5.3 Configure hdfs-site.xml
    <!-- Where the NameNode stores its data; used only by the NameNode and contains the file system metadata -->
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/usr/local/cloud/data/hadoop/dfs/nn</value>
    </property>
    <!-- Local path where each DataNode stores its blocks; it does not have to be the same on every machine, but keeping it the same is easiest to manage -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/usr/local/cloud/data/hadoop/dfs/dn</value>
    </property>
    <!-- Replication factor; the system default is 3 -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>
    <!-- Set dfs.webhdfs.enabled to true, otherwise some operations, such as the WebHDFS LISTSTATUS call, cannot be used -->
    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
    </property>
    <!-- Optional: disabling permission checks avoids some unnecessary hassle -->
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>
    <property>
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>
    <!-- HA configuration -->
    <!-- Logical name of the nameservice -->
    <property>
      <name>dfs.nameservices</name>
      <value>zzg</value>
    </property>
    <!-- Logical names of the NameNodes in the nameservice -->
    <property>
      <name>dfs.ha.namenodes.zzg</name>
      <value>nn1,nn2</value>
    </property>
    <!-- RPC addresses of the NameNodes; RPC, simply put, is the serialized transport used when files are transferred -->
    <property>
      <name>dfs.namenode.rpc-address.zzg.nn1</name>
      <value>master01:9000</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.zzg.nn2</name>
      <value>master02:9000</value>
    </property>
    <!-- HTTP ports of the Hadoop web UI -->
    <property>
      <name>dfs.namenode.http-address.zzg.nn1</name>
      <value>master01:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.zzg.nn2</name>
      <value>master02:50070</value>
    </property>
    <!-- Service RPC addresses for communicating with the NameNodes -->
    <property>
      <name>dfs.namenode.servicerpc-address.zzg.nn1</name>
      <value>master01:53310</value>
    </property>
    <property>
      <name>dfs.namenode.servicerpc-address.zzg.nn2</name>
      <value>master02:53310</value>
    </property>
    <!-- JournalNode cluster that shares the edit log -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://master01:8485;slave01:8485;slave02:8485/zzg</value>
    </property>
    <!-- Local directory where the JournalNodes keep the shared edits -->
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/usr/local/cloud/data/hadoop/ha/journal</value>
    </property>
    <!-- Failover proxy provider class -->
    <property>
      <name>dfs.client.failover.proxy.provider.zzg</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Enable automatic failover -->
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>master01:2181,slave01:2181,slave02:2181</value>
    </property>
    <!-- Use SSH fencing during failover -->
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <!-- Private key used for the SSH fencing connection -->
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/root/.ssh/id_rsa</value>
    </property>
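Before moving on, the values Hadoop actually resolves from core-site.xml and hdfs-site.xml can be sanity-checked with hdfs getconf (paths assume the Hadoop home used above, /usr/local/cloud/hadoop):

    # Each command prints the resolved value of one configuration key.
    /usr/local/cloud/hadoop/bin/hdfs getconf -confKey fs.defaultFS          # expected: hdfs://zzg
    /usr/local/cloud/hadoop/bin/hdfs getconf -confKey dfs.ha.namenodes.zzg  # expected: nn1,nn2
    /usr/local/cloud/hadoop/bin/hdfs getconf -namenodes                     # expected: master01 master02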
5.4 Configure mapred-site.xml
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

5.5 Configure yarn-env.sh
Set the Java environment used by YARN:
    # some Java parameters
    export JAVA_HOME=/usr/java/jdk1.7.0_55
5.6 Configure yarn-site.xml
    <!-- Interval between ResourceManager reconnection attempts -->
    <property>
      <name>yarn.resourcemanager.connect.retry-interval.ms</name>
      <value>2000</value>
    </property>
    <!-- Enable ResourceManager HA; the default is false -->
    <property>
      <name>yarn.resourcemanager.ha.enabled</name>
      <value>true</value>
    </property>
    <!-- Enable automatic failover -->
    <property>
      <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <!-- Logical IDs of the ResourceManagers -->
    <property>
      <name>yarn.resourcemanager.ha.rm-ids</name>
      <value>rm1,rm2</value>
    </property>
    <!-- rm1 runs on master01 and rm2 on master02; in master02's copy of this file set the value to rm2 -->
    <property>
      <name>yarn.resourcemanager.ha.id</name>
      <value>rm1</value>
      <description>If we want to launch more than one RM in a single node, we need this configuration</description>
    </property>
    <!-- Enable recovery of ResourceManager state -->
    <property>
      <name>yarn.resourcemanager.recovery.enabled</name>
      <value>true</value>
    </property>
    <!-- Store the ResourceManager state in ZooKeeper -->
    <property>
      <name>yarn.resourcemanager.zk-state-store.address</name>
      <value>localhost:2181</value>
    </property>
    <property>
      <name>yarn.resourcemanager.store.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
      <name>yarn.resourcemanager.zk-address</name>
      <value>localhost:2181</value>
    </property>
    <property>
      <name>yarn.resourcemanager.cluster-id</name>
      <value>yarn-cluster</value>
    </property>
    <!-- How long the application master waits between scheduler connection attempts -->
    <property>
      <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
      <value>5000</value>
    </property>
    <!-- rm1 configuration -->
    <property>
      <name>yarn.resourcemanager.address.rm1</name>
      <value>master01:23140</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address.rm1</name>
      <value>master01:23130</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address.rm1</name>
      <value>master01:23188</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
      <value>master01:23125</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address.rm1</name>
      <value>master01:23141</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.admin.address.rm1</name>
      <value>master01:23142</value>
    </property>
    <!-- rm2 configuration -->
    <property>
      <name>yarn.resourcemanager.address.rm2</name>
      <value>master02:23140</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address.rm2</name>
      <value>master02:23130</value>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address.rm2</name>
      <value>master02:23188</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
      <value>master02:23125</value>
    </property>
    <property>
      <name>yarn.resourcemanager.admin.address.rm2</name>
      <value>master02:23141</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.admin.address.rm2</name>
      <value>master02:23142</value>
    </property>
    <!-- NodeManager configuration -->
    <property>
      <description>Address where the localizer IPC is.</description>
      <name>yarn.nodemanager.localizer.address</name>
      <value>0.0.0.0:23344</value>
    </property>
    <!-- NodeManager HTTP access port -->
    <property>
      <description>NM Webapp address.</description>
      <name>yarn.nodemanager.webapp.address</name>
      <value>0.0.0.0:23999</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/usr/local/cloud/data/hadoop/yarn/local</value>
    </property>
    <property>
      <name>yarn.nodemanager.log-dirs</name>
      <value>/usr/local/cloud/data/logs/hadoop</value>
    </property>
    <property>
      <name>mapreduce.shuffle.port</name>
      <value>23080</value>
    </property>
    <!-- Failover proxy provider class -->
    <property>
      <name>yarn.client.failover-proxy-provider</name>
      <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
    </property>

Sixth, configure the ZooKeeper cluster
Under the ZooKeeper directory create the data and logs directories, then configure zoo.cfg:
    dataDir=/usr/local/cloud/zookeeper/data
    dataLogDir=/usr/local/cloud/zookeeper/logs
    # the port at which the clients will connect
    clientPort=2181
    server.1=master01:2888:3888
    server.2=master02:2888:3888
    server.3=slave01:2888:3888
    server.4=slave02:2888:3888
    server.5=slave03:2888:3888
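Every node needs the same Hadoop and ZooKeeper files before anything is started. A sketch of pushing the finished tree from master01 to the other nodes (assumes the passwordless SSH configured earlier; afterwards change yarn.resourcemanager.ha.id to rm2 in master02's yarn-site.xml as noted above):

    # Copy the whole installation and configuration tree to the remaining nodes.
    for host in master02 slave01 slave02 slave03; do
        scp -r /usr/local/cloud ${host}:/usr/local/
    done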
Create a myid file in the data directory on each machine and write into it the number from that machine's server.N entry in zoo.cfg: write 1 on master01, 2 on master02, and so on for the remaining machines.
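A sketch of creating all the myid files in one pass from master01, using the passwordless SSH set up earlier and the server numbering from zoo.cfg:

    # server.1..server.5 in zoo.cfg map to master01, master02, slave01, slave02, slave03.
    i=1
    for host in master01 master02 slave01 slave02 slave03; do
        ssh ${host} "mkdir -p /usr/local/cloud/zookeeper/data && echo ${i} > /usr/local/cloud/zookeeper/data/myid"
        i=$((i + 1))
    done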
  
On each machine, run zkServer.sh start from the bin directory under the ZooKeeper directory, then run zkServer.sh status; if one node reports leader and the others report follower, the cluster is configured correctly. The configuration files are now complete.

Seventh, start the Hadoop cluster in the following order (first start):
(1) Start ZooKeeper on every node: zookeeper/bin/zkServer.sh start
(2) On one NameNode, format ZKFC to create the cluster's namespace in ZooKeeper: hadoop/bin/hdfs zkfc -formatZK
(3) On the nodes configured as JournalNodes (master01, slave01, slave02), run: hadoop/sbin/hadoop-daemon.sh start journalnode
(4) On the primary NameNode node, format HDFS for the zzg nameservice and start the NameNode:
    hadoop/bin/hadoop namenode -format
    hadoop/sbin/hadoop-daemon.sh start namenode
(5) On the standby NameNode node, copy over the primary NameNode's metadata directory and start the NameNode:
    hadoop/bin/hdfs namenode -bootstrapStandby
    hadoop/sbin/hadoop-daemon.sh start namenode
(6) On both NameNode nodes, run: ./sbin/hadoop-daemon.sh start zkfc
(7) On all DataNode nodes, start the DataNode: $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
(8) On the primary NameNode node, start YARN with the start-yarn.sh command.

jps now shows the following processes.
NameNode node:
    [root@master01 ~]# jps
    38972 JournalNode
    38758 NameNode
    39166 DFSZKFailoverController
    37473 QuorumPeerMain
    39778 ResourceManager
    42620 Jps
DataNode node:
    [root@slave01 ~]# jps
    33440 DataNode
    35277 Jps
    32681 QuorumPeerMain
    33568 JournalNode
    34231 NodeManager
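Once everything is up, a quick check of the HA state can be made from the command line (nn1/nn2 and rm1/rm2 are the logical names defined in hdfs-site.xml and yarn-site.xml above):

    # One NameNode should report "active", the other "standby".
    /usr/local/cloud/hadoop/bin/hdfs haadmin -getServiceState nn1
    /usr/local/cloud/hadoop/bin/hdfs haadmin -getServiceState nn2
    # Same check for the ResourceManagers.
    /usr/local/cloud/hadoop/bin/yarn rmadmin -getServiceState rm1
    /usr/local/cloud/hadoop/bin/yarn rmadmin -getServiceState rm2
    # All three DataNodes should be listed as live in the report.
    /usr/local/cloud/hadoop/bin/hdfs dfsadmin -report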
