Dual-machine hot backup scheme for the Hadoop NameNode


Reference: HADOOP_HDFS system dual-machine hot standby scheme.pdf (the scheme below has been tested).

1. Preface

Currently, hadoop-0.20.2 does not provide a hot backup of the NameNode; it only provides a SecondaryNameNode. Although this gives some protection for the NameNode metadata, when the machine hosting the NameNode fails, the SecondaryNameNode cannot take over in real time, and data loss is possible.

We implement high availability (HA) of the NameNode using a DRBD + Heartbeat scheme.

DRBD provides the shared storage and Heartbeat provides heartbeat monitoring. Both servers are equipped with dual network cards, one of which is dedicated to the heartbeat network connection.

2. Basic Configuration

2.1, Hardware environment

VMware virtual machines are used as test machines, three in total. Two of them each have two network cards (one for normal network communication, one for the Heartbeat heartbeat) and a blank partition of the same size (for DRBD). Software environment: Red Hat Linux AS 5, hadoop-0.20.2. The layout is shown below:

Host: Server1 (NameNode)

IP addresses:
eth0: 10.10.140.140
eth1: 10.0.0.201 (the Heartbeat heartbeat uses this IP)
eth0:0: 10.10.140.200 (virtual IP)

Partition: /dev/drbd0 mounted on /home/share

Host: Server2 (DataNode)

IP address:
eth0: 10.10.140.117

Host: Server3 (backup NameNode)

IP addresses:
eth0: 10.10.140.84
eth1: 10.0.0.203 (the Heartbeat heartbeat uses this IP)
eth0:0: 10.10.140.200 (virtual IP)

Partition: /dev/drbd0 mounted on /home/share

2.2, Network Configuration

2.2.1, Modify the hosts files on Server1 and Server3 (identical on both)

vi /etc/hosts

10.10.140.140 Server1

10.10.140.117 Server2

10.10.140.84 Server3

10.10.140.200 SERVERVIP

10.0.0.201 Server1

10.0.0.203 Server3
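To confirm that the dedicated heartbeat link works before configuring Heartbeat itself (a quick extra check, not part of the original steps), ping the peer's heartbeat address over eth1 from each node:

[root@server1 ~]# ping -c 3 -I eth1 10.0.0.203
[root@server3 ~]# ping -c 3 -I eth1 10.0.0.201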

2.2.2, The network configurations for Server1 and Server3 are as follows:

Server1 Network configuration:

[root@server1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=eth0
BOOTPROTO=none
HWADDR=00:0c:29:18:65:f5
ONBOOT=yes
IPADDR=10.10.140.140
NETMASK=255.255.254.0
GATEWAY=10.10.140.1
TYPE=Ethernet

[root@server1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Please read /usr/share/doc/initscripts-*/sysconfig.txt
# for the documentation of these parameters.
GATEWAY=10.0.0.1
TYPE=Ethernet
DEVICE=eth1
HWADDR=00:0c:29:18:65:ff
BOOTPROTO=none
NETMASK=255.255.255.0
IPADDR=10.0.0.201
ONBOOT=yes
USERCTL=no
IPV6INIT=no
PEERDNS=yes

Server3 Network configuration:

[root@server3 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=eth0
BOOTPROTO=none
HWADDR=00:0c:29:d9:6a:53
ONBOOT=yes
IPADDR=10.10.140.84
NETMASK=255.255.254.0
GATEWAY=10.10.140.1
TYPE=Ethernet

[root@server3 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Please read /usr/share/doc/initscripts-*/sysconfig.txt
# for the documentation of these parameters.
GATEWAY=10.0.0.1
TYPE=Ethernet
DEVICE=eth1
HWADDR=00:0c:29:d9:6a:5d
BOOTPROTO=none
NETMASK=255.255.255.0
IPADDR=10.0.0.203
ONBOOT=yes
USERCTL=no
IPV6INIT=no
PEERDNS=yes
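After editing the interface files, it can be useful to restart the network service and confirm that both interfaces came up with the expected addresses (an extra check, not in the original text). Run the same on Server3:

[root@server1 ~]# service network restart
[root@server1 ~]# ifconfig eth0; ifconfig eth1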

2.2.3, Modify the host names

[root@server1 ~]# cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=yes
HOSTNAME=server1

[root@server3 ~]# cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=yes
HOSTNAME=server3

2.2.4, Shut down the firewall

[root@server1 ~]# chkconfig iptables off
[root@server3 ~]# chkconfig iptables off
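If the firewall is currently running, you may also want to stop it immediately rather than waiting for a reboot (a small addition to the original steps):

[root@server1 ~]# service iptables stop
[root@server3 ~]# service iptables stop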

3, DRBD Installation and configuration

3.1, the principle of DRBD

DRBD (Distributed Replicated Block Device) is a block-device replication facility for Linux systems. It synchronizes data between a remote host and the local host in real time, similar to RAID 1, so it can be thought of as network RAID 1. Deploying DRBD on the servers can replace a shared disk array, because the data exists on both the local server and the remote server; when the local server fails, the data on the remote server can be used to continue working. If uninterrupted service is required, seamless takeover can be achieved by combining DRBD with another open-source tool, Heartbeat. (Figure: the working principle of DRBD.)

3.2, installation

Download the installation package with wget http://oss.linbit.com/drbd/8.3/drbd-8.3.0.tar.gz, then execute the following commands:

tar xvzf drbd-8.3.0.tar.gz
cd drbd-8.3.0
cd drbd
make clean all
cd ..
make tools
make install
make install-tools

Verify that the installation is correct:

# insmod drbd/drbd.ko (or # modprobe drbd)
# lsmod | grep drbd
drbd 220056 2

If this output appears, the module is installed correctly. Install DRBD on both Server1 and Server3.

3.3, configuration

3.3.1, Hard disk partitions used by DRBD

The partitions on Server1 and Server3 must be of the same size and must both be blank (unformatted) partitions. You can reserve the partition before installing the system; if the system is already installed, it is recommended to use the GParted tool to repartition.

For instructions on using it, see: http://hi.baidu.com/migicq/blog/item/5e13f1c5c675ccb68226ac38.html

Server1: IP address 10.10.140.140, DRBD partition: /dev/sda4

Server3: IP address 10.10.140.84, DRBD partition: /dev/sda4
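Before continuing, it can be useful to confirm that the blank partition exists with the expected size on both machines (an extra check, not part of the original steps):

[root@server1 ~]# fdisk -l /dev/sda
[root@server3 ~]# fdisk -l /dev/sda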

3.3.2, main configuration files

When DRBD runs, it reads the configuration file /etc/drbd.conf. This file describes the mapping between DRBD devices and hard disk partitions, as well as some DRBD configuration parameters.

[root@server1 ~]# vi /etc/drbd.conf

# Whether to participate in DRBD usage statistics; the default is yes
global {
  usage-count yes;
}
# Maximum network rate used when the primary and standby nodes synchronize (bytes per second)
common {
  syncer { rate 10M; }
}
# A DRBD device (i.e. /dev/drbdX) is called a "resource". It contains the information
# about the primary and standby nodes of the DRBD device.
resource r0 {
  # Use protocol C: a write is considered complete only after the remote host
  # acknowledges the write.
  protocol C;
  net {
    # Authentication algorithm used for communication between the primary and standby machines
    cram-hmac-alg sha1;
    shared-secret "FooFunFactory";
    allow-two-primaries;
  }
  syncer {
    rate 10M;
  }
  # The description of each host starts with "on" followed by the hostname;
  # the configuration for that host follows inside the braces.
  on server1 {
    device /dev/drbd0;
    # The disk partition used is /dev/sda4
    disk /dev/sda4;
    # The address and port on which DRBD listens for communication with the other host
    address 10.10.140.140:7788;
    flexible-meta-disk internal;
  }
  on server3 {
    device /dev/drbd0;
    disk /dev/sda4;
    address 10.10.140.84:7788;
    meta-disk internal;
  }
}

3.3.3, Copy the drbd.conf file to the /etc directory on the standby machine

[root@server1 ~]# scp /etc/drbd.conf root@server3:/etc/
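After copying, you can check on both hosts that the configuration parses cleanly (an extra check, not in the original text); drbdadm dump prints the parsed resource definition:

[root@server1 ~]# drbdadm dump r0
[root@server3 ~]# drbdadm dump r0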

3.4, DRBD start

Before starting, you need to create the corresponding metadata block on the blank partition of both hosts:

First, completely clear the two blank partitions (to remove any existing filesystem data).

Run the following on both hosts (replace /dev/sdbX with your DRBD partition, /dev/sda4 in this setup):

# dd if=/dev/zero of=/dev/sdbX bs=1M count=128

Otherwise, the next step will fail with an error like the following:

.........
Device size would be truncated, which
would corrupt data and result in
'access beyond end of device' errors.
You need to either
* use external meta data (recommended)
* shrink that filesystem
* zero out the device (destroy the filesystem)
Operation refused.
..........

Execute the following on Server1 and Server3 respectively.

3.4.1, Create the metadata: # drbdadm create-md r0

Once this succeeds, you can start the DRBD service (on both Server1 and Server3):

3.4.2 Start DRBD on Server1 and Server3 respectively

[root@server1 ~]# /etc/init.d/drbd start (or: service drbd start)

Starting DRBD resources: [ d(r0) s(r0) n(r0) ].

3.4.3 Set the primary node

Execute the following command on Server1 (the first time only) to make Server1 the primary node; afterwards you can simply use drbdadm primary r0.

# drbdsetup /dev/drbd0 primary -o

3.4.4 View the connection status

On the first start, the data on the two disks is synchronized.
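As a rough guide (the state names are from DRBD 8.3; exact counters will differ), you can watch the initial synchronization with:

[root@server1 ~]# cat /proc/drbd

During the initial sync the primary shows cs:SyncSource and ds:UpToDate/Inconsistent; when synchronization completes, both nodes show cs:Connected and ds:UpToDate/UpToDate.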

3.4.5 Format the blank DRBD device and mount it

This operation is only performed on the primary node.

[root@server1 ~]# mkfs.ext2 /dev/drbd0

mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
655360 inodes, 1309232 blocks
65461 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1342177280
40 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every <n> mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

[root@server1 ~]# mount /dev/drbd0 /home/share

3.4.6 Set DRBD to start automatically at boot

chkconfig drbd on
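You can confirm that the service is registered for automatic start (an extra check, not part of the original steps) with:

[root@server1 ~]# chkconfig --list drbd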

3.5. DRBD Test

3.5.1 Manual primary/standby switchover

First unmount the DRBD device on the primary node:

[root@server1 ~]# umount /dev/drbd0

Demote Server1 to a secondary node:

[root@server1 ~]# drbdadm secondary r0

Query the Server1 status.

Promote Server3 to the primary node:

[root@server3 ~]# drbdadm primary r0

Mount the DRBD device on Server3:

[root@server3 ~]# mount /dev/drbd0 /home/share

View the Server3 status.
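A minimal way to confirm the switchover on Server3 (a sketch based on the commands above):

[root@server3 ~]# cat /proc/drbd
[root@server3 ~]# ls /home/share

The first command should now report ro:Primary/Secondary on Server3, and the data written while Server1 was primary should be visible under /home/share.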

4, heartbeat installation and configuration

4.1 Installation of Heartbeat

Install Heartbeat using yum on Server1 and Server3

[root@server1 ~]# yum install heartbeat

4.2 Heartbeat Configuration

Configure /etc/ha.d/ha.cf

1, use the following command to locate the sample ha.cf configuration file installed by the heartbeat RPM package:

rpm -qd heartbeat | grep ha.cf

2, use the following command to copy the sample configuration file to the appropriate location:

cp /usr/share/doc/packages/heartbeat/ha.cf /etc/ha.d/

3, edit the /etc/ha.d/ha.cf file, and uncomment or add the following:

udpport 694

# Use ucast mode: send heartbeat messages between the primary and standby servers through NIC eth1. Specify the IP address of the peer, i.e. specify 10.0.0.203 on Server1 and 10.0.0.201 on Server3.

ucast eth1 10.0.0.203

4. Also uncomment the three lines keepalive, deadtime, and initdead:

keepalive 2
deadtime 30
initdead 120

The initdead line indicates that, after the first boot, the heartbeat daemon should wait 120 seconds before starting resources on the primary server. The keepalive line indicates how many seconds should elapse between heartbeat messages. The deadtime line indicates how long the standby server waits without receiving heartbeat messages before declaring the primary server dead. Heartbeat may print a warning if you set unsafe values (for example, setting deadtime very close to keepalive is not a safe configuration).

5, add the following two lines to the end of the /etc/ha.d/ha.cf file:

node server1
node server3

These lines give the names of the primary and standby servers (the value returned by the uname -n command).

6, uncomment the following lines to enable the Heartbeat run logs, which are a great help for error analysis:

debugfile /var/log/ha-debug
logfile /var/log/ha-log

Configure /etc/ha.d/authkeys

1, use the following command to locate the sample authkeys file: rpm -qd heartbeat | grep authkeys, then copy it to the appropriate location:

cp /usr/share/doc/packages/heartbeat/authkeys /etc/ha.d/

2, edit the /etc/ha.d/authkeys file and uncomment the following two lines:

auth 1
1 crc

3, ensure that the authkeys file can only be read by root:

chmod 600 /etc/ha.d/authkeys

4.3 Installing heartbeat on the standby server

Copy the configuration file to the standby server

[root@server1 ~]# scp -r /etc/ha.d root@server3:/etc/ha.d

4.4 Start Heartbeat

1 Configure heartbeat to boot automatically on the primary and standby servers

chkconfig heartbeat on

2 Manual start and stop commands

/etc/init.d/heartbeat start

Or

service heartbeat start

/etc/init.d/heartbeat stop

Or

service heartbeat stop

5. Configuration of the main Hadoop configuration files

Tip: before starting Heartbeat, you should first format the NameNode metadata onto the DRBD partition.
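A minimal sketch of that step, assuming the installation path /home/hadoop-0.20.2 used elsewhere in this article and that the DRBD device is mounted on /home/share on the current primary node:

[root@server1 ~]# mount /dev/drbd0 /home/share
[root@server1 ~]# cd /home/hadoop-0.20.2
[root@server1 hadoop-0.20.2]# bin/hadoop namenode -format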

masters

SERVERVIP

slaves

Server2

core-site.xml

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/share/hadoopdata/</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://servervip:9000</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
  <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
</property>
<property>
  <name>fs.checkpoint.edits.dir</name>
  <value>${fs.checkpoint.dir}</value>
  <description>Determines where on the local filesystem the DFS secondary name node should store the temporary edits to merge. If this is a comma-delimited list of directories then the edits are replicated in all of the directories for redundancy. Default value is same as fs.checkpoint.dir</description>
</property>

hdfs-site.xml

<property>
  <name>dfs.name.dir</name>
  <value>${hadoop.tmp.dir}/dfs/name</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>
<property>
  <name>dfs.name.edits.dir</name>
  <value>${dfs.name.dir}</value>
  <description>Determines where on the local filesystem the DFS name node should store the transaction (edits) file. If this is a comma-delimited list of directories then the transaction file is replicated in all of the directories, for redundancy. Default value is same as dfs.name.dir</description>
</property>

mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>servervip:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
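The Hadoop configuration needs to be identical on Server1 and Server3 so that either node can run the NameNode after a failover. A simple way to keep the two in sync (assuming the installation path /home/hadoop-0.20.2 used elsewhere in this article):

[root@server1 ~]# scp -r /home/hadoop-0.20.2/conf root@server3:/home/hadoop-0.20.2/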


6, Configure automatic switchover through haresources

Without Heartbeat, DRBD can only switch the primary/secondary roles manually. We now modify the Heartbeat configuration so that DRBD switches over automatically through Heartbeat.

6.1 Creating a resource script

1, create a new script hadoop-hdfs that starts and stops the HDFS file system. In the same way you can create resource scripts such as hadoop-all and hadoop-jobtracker. Using HDFS as the example:

[root@server1 conf]# cat /etc/ha.d/resource.d/hadoop-hdfs

#!/bin/sh
case "$1" in
start)
    # Start commands go here
    cd /home/hadoop-0.20.2/bin
    msg=`su - root -c "sh /home/hadoop-0.20.2/bin/start-dfs.sh"`
    logger $msg
    ;;
stop)
    # Stop commands go here
    cd /home/hadoop-0.20.2/bin
    msg=`su - root -c "sh /home/hadoop-0.20.2/bin/stop-dfs.sh"`
    logger $msg
    ;;
status)
    # Status commands go here
    ;;
esac
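Heartbeat calls resource scripts with the arguments start, stop, and status, so the script can also be exercised by hand before wiring it into haresources (an extra check, not in the original steps; jps is part of the JDK):

[root@server1 ~]# /etc/ha.d/resource.d/hadoop-hdfs start
[root@server1 ~]# jps

The NameNode process should appear in the jps output on this node.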

2, modify the permissions

[root@server1 conf]# chmod 755 /etc/ha.d/resource.d/hadoop-hdfs

3, copy the script to the backup machine and also modify the permissions

[root@server1 conf]# scp /etc/ha.d/resource.d/hadoop-hdfs server3:/etc/ha.d/resource.d/

6.2 Configuring haresources

[root@server1 conf]# cat /etc/ha.d/haresources

server1 IPaddr::10.10.140.200 drbddisk::r0 Filesystem::/dev/drbd0::/home/share::ext2 hadoop-hdfs

Comments:

server1: the primary server's name

IPaddr::10.10.140.200: the virtual IP (IP alias) used for external services

drbddisk::r0: the drbddisk resource, with parameter r0

Filesystem::/dev/drbd0::/home/share::ext2: the Filesystem resource; mount device /dev/drbd0 on /home/share with filesystem type ext2

hadoop-hdfs: the HDFS file system resource script

7, Joint testing of DRBD, Heartbeat, and Hadoop

7.1 Creating Files and directories

1, DRBD and Heartbeat are running on Server1 (the primary node). Since Heartbeat has started, the virtual address 10.10.140.200 is assigned to the primary node. This can be viewed with a command such as ifconfig.

Use the command cat /proc/drbd to check whether Server1 and Server3 are communicating properly; you should see that Server1 is the primary node and Server3 the secondary.

Check whether the DRBD partition is mounted.

2, check whether Hadoop DFS has started by opening: http://10.10.140.200:50070/dfshealth.jsp

3, upload files to Hadoop

Create a directory and upload a test file:

[root@server1 hadoop-0.20.2]# bin/hadoop dfs -mkdir TestDir

[root@server1 hadoop-0.20.2]# bin/hadoop dfs -copyFromLocal /home/share/temp2 TestDir

To view files:
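An equivalent command-line check of the upload (a small sketch using the TestDir and temp2 names from the step above):

[root@server1 hadoop-0.20.2]# bin/hadoop dfs -ls TestDir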

7.2 Primary Standby switch

1. Stop Heartbeat on Server1

[root@server1 /]# service heartbeat stop

Stopping High-Availability services:

[OK]

2, you can verify that the virtual IP has been switched over to Server3 (see the commands after this list).

3, verify the Hadoop file system on Server3.
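A minimal way to perform both checks on Server3 (a sketch based on the addresses and test data used above):

[root@server3 ~]# ifconfig | grep 10.10.140.200
[root@server3 ~]# cd /home/hadoop-0.20.2
[root@server3 hadoop-0.20.2]# bin/hadoop dfs -ls TestDir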

7.3 Switching the primary and standby machines back

1. Start Heartbeat on Server1

[root@server1 /]# service heartbeat start

Starting High-Availability services:

2012/07/25_15:03:31 INFO: Resource is stopped

[OK]

2, verify that the virtual IP has switched back to Server1.

3, verify the Hadoop file system on Server1.

8. Other issues

8.1 Handling split brain

Split brain means that, under certain conditions, the two DRBD nodes lose their connection and both continue to run as primary. When a DRBD primary node reconnects and exchanges information with the other node, if it finds that the peer is also in the primary state, it immediately disconnects, concludes that split brain has occurred, and records the following in the system log: "Split-Brain detected, dropping connection!" After a split brain occurs, if you check the connection state, at least one node will be in the StandAlone state; the other may also be StandAlone (if both sides detected the split brain at the same time), or it may be in the WFConnection state.

1) After a node reboot, an error message appears in dmesg:

drbd0: Split-Brain detected, dropping connection!
drbd0: self 055f46ea3829909e:899ec0ebd8690afd:fea4014923297fc8:3435cd2baccecfcb
drbd0: peer 7e18f3feea113778:899ec0ebd8690afc:fea4014923297fc8:3435cd2baccecfcb
drbd0: helper command: /sbin/drbdadm split-brain minor-0
drbd0: meta connection shut down by peer.

2) On node 203, view cat /proc/drbd; 203 is running in the StandAlone state (the hosts 202/203 and ost2/ost3 here come from the reference document's environment):

version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root@ost3, 2008-12-30 17:16:32
 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:664

3) On node 202, view cat /proc/drbd; 202 is also running in the StandAlone state:

version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root@ost2, 2008-12-30 17:23:44
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r---
    ns:0 nr:0 dw:4 dr:21 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:68

4) Cause analysis

Because the node reboot left the data inconsistent, and the configuration file does not configure automatic recovery from such errors, the handshake fails and the data cannot be synchronized.

Split Brain has two solutions: manual processing and automatic processing.

Manual processing

1) Stop Heartbeat on 203

Heartbeat locks the resource; it is only released after Heartbeat stops.

/etc/init.d/heartbeat stop

2) Discard this resource's data on the node that is in the Secondary role

Run on ost3:

/sbin/drbdadm -- --discard-my-data connect r0

3) Reconnect from the node that is in the Primary role

Run on ost2:

/sbin/drbdadm disconnect r0
/sbin/drbdadm connect r0

Set ost2 as the primary node:

/sbin/drbdadm primary r0

4) Restart Heartbeat on 203

/etc/init.d/heartbeat start

5) View the 202 status with cat /proc/drbd; it shows Connected and has returned to normal.

version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root@ost2, 2008-12-30 17:23:44
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:768 nr:0 dw:800 dr:905 al:11 bm:10 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

6) View the 203 status with cat /proc/drbd; it also shows Connected and has returned to normal.

version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root@ost3, 2008-12-30 17:16:32
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:768 dw:768 dr:0 al:0 bm:10 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Automatic Processing

Automatic handling policies are configured in /etc/drbd.conf and are applied automatically when data inconsistency occurs. The policies are defined as follows:

1. after-sb-0pri

When both nodes are in the Secondary state, recovery can be handled automatically through the after-sb-0pri policy.

1) disconnect

The default policy: no automatic recovery, simply disconnect.

2) discard-younger-primary

Automatically synchronize from the node that was primary before the split brain occurred.

3) discard-older-primary

Synchronize from the node that became primary during the split brain.

4) discard-least-changes

Synchronize from the node that changed the most blocks during the split brain.

5) discard-node-NODENAME

Automatically synchronize to the named node (discarding its data).

2. after-sb-1pri

When only one of the two nodes is in the Primary state, recovery can be handled automatically through the after-sb-1pri policy.

1) disconnect

The default policy: no automatic recovery, simply disconnect.

2) consensus

Discard the secondary's data, or simply disconnect.

3) discard-secondary

Discard the data on the secondary node.

4) call-pri-lost-after-sb

Follow the after-sb-0pri policy.

3. after-sb-2pri

When both nodes are in the Primary state, recovery can be handled automatically through the after-sb-2pri policy.

1) disconnect

The default policy: no automatic recovery, simply disconnect.

2) violently-as0p

Follow the after-sb-0pri policy.

3) call-pri-lost-after-sb

Follow the after-sb-0pri policy and discard the data on the node that loses.

4. Configuring automatic recovery

Edit /etc/drbd.conf, find the resource r0 section (these options belong in its net section), and configure the policies as follows; the configuration must be identical on all nodes.

# after-sb-0pri disconnect;
after-sb-0pri discard-younger-primary;
# after-sb-1pri disconnect;
after-sb-1pri discard-secondary;
# after-sb-2pri disconnect;
after-sb-2pri call-pri-lost-after-sb;
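After changing the configuration on both nodes, the running resource can pick up the new net options without a full restart (an extra step, not in the original text):

[root@server1 ~]# drbdadm adjust r0
[root@server3 ~]# drbdadm adjust r0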

References:

HADOOP_HDFS system dual-machine hot standby scheme.pdf

DRBD installation and configuration (master-slave mode), detailed steps with illustrations.doc
