Hadoop + Zookeeper for High Availability of NameNode



Hadoop + ZooKeeper installation and configuration:
 
Add the JAVA_HOME export (export JAVA_HOME=...) to hadoop-env.sh.
Set the hostname of each machine, then configure the mapping between host names and IP addresses in the /etc/hosts file, adding the host name and IP address of the master and of every slave.
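For example, a minimal /etc/hosts mapping might look like this (the IP addresses below are placeholders; substitute your own):
192.168.1.10 master1
192.168.1.11 master2
192.168.1.21 slave1
192.168.1.22 slave2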
 
Configure passwordless ssh:
ssh-keygen -t rsa
This generates two files under ~/.ssh: id_rsa (the private key) and id_rsa.pub (the public key).
cat id_rsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys user@ipaddress:/home/user/.ssh/
Change the permissions of authorized_keys to 600 (chmod 600 ~/.ssh/authorized_keys).
 
// High availability between NameNodes is actually achieved through a JournalNode cluster or NFS: the two NameNodes (one active, one standby) share a common edits directory, and the standby machine continuously synchronizes with the active NameNode. Automatic failover of the NameNode is generally implemented with a ZooKeeper cluster.
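As a sketch, the typical first-time bring-up order for such an HA cluster on Hadoop 2.x is shown below (host and service names follow the configuration later in this article; adjust paths and users to your deployment):
# 1. Start a JournalNode on every journal host
hadoop-daemon.sh start journalnode
# 2. Format and start the first NameNode (nn1)
hdfs namenode -format
hadoop-daemon.sh start namenode
# 3. On the second NameNode (nn2), copy nn1's metadata, then start it
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
# 4. Initialize the HA znode in ZooKeeper and start a failover controller on each NameNode host
hdfs zkfc -formatZK
hadoop-daemon.sh start zkfc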
 
Namenode High Availability Configuration:
In core-site.xml, set fs.defaultFS to hdfs://mycluster.
In hdfs-site.xml, set dfs.nameservices to mycluster.
Set dfs.ha.namenodes.mycluster to nn1,nn2.
Set dfs.namenode.rpc-address.mycluster.nn1 to hostname1:8020.
Set dfs.namenode.rpc-address.mycluster.nn2 to hostname2:8020.
Set dfs.namenode.http-address.mycluster.nn1 to hostname1:50070 // web UI address of the first NameNode
Set dfs.namenode.http-address.mycluster.nn2 to hostname2:50070 // web UI address of the second NameNode
Set dfs.namenode.shared.edits.dir to the shared storage location: a qjournal:// URI listing every JournalNode on port 8485.
Set dfs.client.failover.proxy.provider.mycluster to org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider // the Java class the HDFS client uses to determine which NameNode is active and communicate with it
Set dfs.ha.fencing.methods to sshfence, so ssh is used during a switchover.
// At most one NameNode may be active at any time. This setting makes the failover controller connect to the old active NameNode over ssh and kill its process before the standby takes over.
 
Below are the full configurations for Hadoop + ZooKeeper:
Configure hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value> <!-- each block is replicated 3 times -->
  </property>
  <property>
    <name>heartbeat.recheck.interval</name> <!-- DataNode heartbeat recheck interval -->
    <value>10</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:/mnt/vdc/hadoopstore/hdfs/name</value> <!-- where the NameNode stores HDFS metadata; several directories can be listed to keep redundant copies of the metadata -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:/mnt/vdc/hadoopstore/hdfs/data</value> <!-- where DataNodes store block data; can be spread across different partitions -->
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name> <!-- enable access to HDFS over the web (WebHDFS) -->
    <value>true</value>
  </property>
  <property>
    <name>dfs.nameservices</name> <!-- defines the nameservice, named mycluster -->
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name> <!-- the two NameNodes in the nameservice: nn1 and nn2 -->
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name> <!-- RPC address of the first NameNode, port 8020 -->
    <value>master1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name> <!-- RPC address of the second NameNode, port 8020 -->
    <value>master2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>master1:50070</value> <!-- HTTP (web UI) port of the first NameNode -->
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>master2:50070</value> <!-- HTTP (web UI) port of the second NameNode -->
  </property>
  <!-- shared edits directory: the quorum of JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master1:8485;master2:8485;slave1:8485;slave2:8485;slave3:8485;slave4:8485;slave5:8485;slave6:8485;slave7:8485;slave8:8485;slave9:8485;slave10:8485/mycluster</value>
  </property>
  <!-- Client failover configuration: the class clients use to locate the active NameNode during failover -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value> <!-- use ssh to fence the old active NameNode during a switchover -->
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/kduser/.ssh/id_rsa</value> <!-- location of the ssh private key used for fencing -->
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name> <!-- switch automatically when a fault occurs; this key can also be suffixed with the nameservice (dfs.ha.automatic-failover.enabled.mycluster) -->
    <value>true</value>
  </property>
  <!-- Identify this NameNode as nn1 (set per machine) -->
  <property>
    <name>dfs.ha.namenode.id</name>
    <value>nn1</value>
  </property>
</configuration>
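With both NameNodes running, the HA state can be checked, and a manual failover triggered, with the hdfs haadmin tool (nn1 and nn2 are the NameNode IDs configured above):
hdfs haadmin -getServiceState nn1   # prints "active" or "standby"
hdfs haadmin -getServiceState nn2
hdfs haadmin -failover nn1 nn2      # hand the active role from nn1 to nn2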
 
 
Configure the mapred-site.xml file:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value> <!-- on hadoop 2.x and later the execution framework is yarn -->
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.input.buffer.percent</name> <!-- default is 0.7; lowered here to reduce reducer memory pressure during shuffle -->
    <value>0.1</value>
  </property>
</configuration>
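A quick way to verify that MapReduce jobs really run on YARN is to submit one of the bundled examples; the jar path below assumes a stock Hadoop 2.x distribution layout:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10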
 
 
Configure yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name> <!-- total physical memory available to the NodeManager -->
    <value>10240</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <!-- the address the ResourceManager exposes to clients; clients use it to submit and kill applications -->
    <value>master1:8032</value>
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>95.0</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master1:8031</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master1:8033</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master1:8088</value>
  </property>
</configuration>
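Once YARN is up, the ResourceManager can be queried from the command line to confirm that the NodeManagers have registered:
yarn node -list          # lists registered NodeManagers and their state
yarn application -list   # lists applications currently submitted or running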
 
Configure core-site.xml:
<configuration>
  <property>
    <name>hadoop.native.lib</name>
    <value>true</value>
    <description>Should native hadoop libraries, if present, be used.</description>
    <!-- enable the native hadoop libraries; they are used by default when present -->
  </property>
  <!--
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9000</value> (URL of the NameNode; superseded by fs.defaultFS below)
  </property>
  -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/vdc/hadoopstore/tmp</value> <!-- temporary file directory for HDFS -->
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value> <!-- point HDFS at the mycluster nameservice (the two HA NameNodes) -->
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/mnt/vdc/hadoopstore/journal/data</value> <!-- local directory where each JournalNode stores edits -->
  </property>
  <property>
    <name>ha.zookeeper.quorum</name> <!-- ZooKeeper quorum used by the failover controllers -->
    <value>master1:2181,master2:2181,slave1:2181</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
  </property>
</configuration>
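To confirm that these settings were actually picked up, Hadoop can be asked which values it resolved:
hdfs getconf -confKey fs.defaultFS   # should print hdfs://mycluster
hdfs getconf -namenodes              # should list the two NameNode hosts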
 
On the first startup, format the NameNode: hadoop namenode -format
Use jps to check that the expected daemons are running.
Use hadoop dfsadmin -report to view cluster status.
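For day-to-day operation, a typical start sequence with the stock sbin scripts, and the daemons jps should then show, is sketched below (the exact daemon list depends on which roles run on each host):
start-dfs.sh    # starts NameNodes, DataNodes, JournalNodes, and failover controllers
start-yarn.sh   # starts the ResourceManager and NodeManagers
jps             # expect e.g. NameNode, DataNode, JournalNode, DFSZKFailoverController, ResourceManager, NodeManager, QuorumPeerMain (ZooKeeper)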
 
ZooKeeper command details:
Configure basic environment variables:
export ZOOKEEPER_HOME=/home/zookeeper-3.3.3
export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
ZooKeeper configuration file zoo.cfg:
tickTime=2000 // a heartbeat is sent every two seconds by default
dataDir=/disk1/zookeeper // where the in-memory database snapshots are stored
dataLogDir=/disk2/zookeeper // where the transaction logs are stored
clientPort=2181
initLimit=5 // a follower must connect and sync to the leader within 5 heartbeats (10 s), or the connection attempt fails
syncLimit=2
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
 
Port 2181 is used for client connections, port 2888 for follower connections to the leader, and port 3888 for leader election.
Create a myid file in the dataDir directory on each server, containing 1, 2, or 3 respectively, as shown below.
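For example, assuming the dataDir configured above, the myid files could be created like this (run the matching line on the corresponding server):
echo 1 > /disk1/zookeeper/myid   # on zookeeper1 (server.1)
echo 2 > /disk1/zookeeper/myid   # on zookeeper2 (server.2)
echo 3 > /disk1/zookeeper/myid   # on zookeeper3 (server.3)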
 
zkServer.sh start/stop/status // start, stop, or query the state of a server
zkCli.sh -server ipaddress:2181 // connect to a ZooKeeper server
Use ls / to list child nodes.
get /xxxx // read a node's content
set /xxxx, create /xxxx, delete /xxxx // set, create, or delete a node
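A short zkCli.sh session tying this back to the NameNode HA setup might look like the following (the /hadoop-ha znode only exists after hdfs zkfc -formatZK has been run; the exact znode names are illustrative):
zkCli.sh -server master1:2181
ls /              # e.g. [hadoop-ha, zookeeper]
ls /hadoop-ha     # e.g. [mycluster]
get /hadoop-ha/mycluster/ActiveStandbyElectorLock   # shows which NameNode currently holds the active lock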
In practice, however, ZooKeeper is mostly accessed through its client APIs rather than the shell.
