Failure analysis and automatic repair of Hadoop cluster hard disks


Zhang, Haohao

Summary:

Hard drives play a vital role in a server, because that is where the data is stored, and as manufacturing technology improves, disk types keep changing. Managing the hard disks is the IaaS department's responsibility, but business operations staff also need to know the relevant technology.

Some companies use LVM to manage their hard drives, which makes it easy to expand capacity; others write data straight to raw disks, which avoids the share of disk I/O speed that LVM costs. Different scenarios call for different management approaches.

For Hadoop cluster nodes running the DataNode service, LVM is not recommended, because there is no need: HDFS exists to handle distributed big data, and a company running Hadoop necessarily has a lot of data, so the basic principle of HDFS is to use however much space the disks provide; when that is not enough, you add machines or drives.

Hard disk failures account for the highest proportion of server hardware failures. Below is a chart from an eBay failure report showing hardware components and their corresponding failure rates:

[Figure: failure rate by hardware component, from the eBay failure report]
The chart clearly shows that hard disks have the highest failure rate, 84%. So for operations work, if you collect the failure cases you meet in day-to-day work and turn them into an automated repair script, it is of great value.

If you look a little further ahead, you might ask: could you build a complete hardware fault detection and repair system? (That would require cooperation from the hardware manufacturers.) I only go as far as disks here, but if these thoughts occur to you, you are already on the road to automated operations.

Below I walk through a typical hard disk failure case, then give the general steps for handling disk failures, and finally attach a link to the automated disk repair script.

Environment:

This server is a slave node in a Hadoop cluster, running the DataNode and NodeManager services. It has 12 data disks plus one system disk; each data disk carries a single partition with an ext4 filesystem, and no LVM.

Fault Detection:

One day our monitoring system raised an alarm: a user's job had failed. This user is very important, so we monitor every one of his jobs, whether it succeeds and how long it runs. Without further ado:

Looking up the job ID job_1431213413583_799324, the failed task ran on the node example.ebay.com, and the corresponding log showed: "Error: java.io.FileNotFoundException ...". Digging further into the log, a block of /path/to/corrupt/file could not be found. I used the hadoop fsck command to see which nodes hold the corresponding block and traced it to corrupted.node.com.
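For reference, the fsck invocation is along these lines; the path is the placeholder used above, and the options make fsck print which datanodes hold each block:

# hadoop fsck /path/to/corrupt/file -files -blocks -locations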

For company security, the hostnames and file names above are fictitious; I trust everyone understands. Log in to the problem machine and first check the disks with "df -h":

# df -h

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       451G   20G  408G   5% /
tmpfs            36G     0   36G   0% /dev/shm
/dev/sdb1       1.9T  1.5T  354G  81% /hadoop/1
/dev/sdc1       1.9T  1.5T  357G  81% /hadoop/2
/dev/sdd1       1.9T  1.5T  351G  81% /hadoop/3
/dev/sde1       1.9T  1.4T  402G  79% /hadoop/4
/dev/sdf1       1.9T  1.5T  371G  80% /hadoop/5
/dev/sdg1       1.9T  1.5T  375G  80% /hadoop/6
/dev/sdh1       1.9T  1.5T  388G  79% /hadoop/7
/dev/sdi1       1.9T  1.5T  383G  80% /hadoop/8
/dev/sdj1       1.9T  1.5T  394G  79% /hadoop/9
/dev/sdl1       1.9T  1.5T  377G  80% /hadoop/11
/dev/sdm1       1.9T  1.5T  386G  79% /hadoop/12

Careful observation shows that /hadoop/10 is missing; its device should be /dev/sdk1. Where did that drive go?

Fault Analysis:

Check with fdisk:

# fdisk -l /dev/sdk

This disk turns out to have a GPT partition table. A quick aside on partition tables: MBR is by far the most common format; GPT is a newer one that is less widely used.

Since all the other disks here use MBR partition tables, this one should be MBR as well.
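The partition table type can also be read directly; on recent parted versions the print output includes a "Partition Table:" line:

# parted /dev/sdk print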

Then look at /var/log/messages, which contains some I/O error messages:

Jul 00:50:00 xxxxxxxxxxxxxx kernel: [8385006.016524] Buffer I/O error on device sdk1, logical block 1415594116

It looks like the drive has developed logical bad blocks.
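To pull just these disk errors out of the log:

# grep "I/O error" /var/log/messages | grep sdk1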

Fault Resolution:

The plan is to wipe all the data on /dev/sdk, then repartition and reformat it.

There is no need to worry about data loss, because HDFS by default keeps three replicas of each block on different nodes.
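On a Hadoop 2.x cluster (which the job ID format above suggests this is), the effective replication factor can be confirmed with the getconf client; on a default configuration this prints 3:

# hdfs getconf -confKey dfs.replication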

- First use parted to replace the GPT partition table with an MBR (msdos) one; mklabel operates on the whole disk and leaves an empty partition table:

# parted /dev/sdk
(parted) mklabel msdos
(parted) quit

- Then create the new partition with fdisk (these are its interactive commands; mklabel already cleared the old partitions, so there is nothing to delete first):

# fdisk /dev/sdk
n    (new partition)
p    (primary)
w    (write the table and exit)
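The same keystrokes can be piped into fdisk, which removes the interaction; a minimal sketch (the "1" picks partition number 1, and the two blank lines accept the default first and last cylinders):

# echo -e "n\np\n1\n\n\nw" | fdisk /dev/sdk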

- Format the partition as ext4, with a label so it can be mounted from /etc/fstab later:

# mkfs.ext4 -L /hadoop10 /dev/sdk1

- Reduce the reserved disk space from the default 5% to 1% (tune2fs works on an existing filesystem, which is why this comes after formatting):

# tune2fs -m 1 /dev/sdk1
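Since /etc/fstab mounts these filesystems by label (shown below), it is worth confirming the label was written:

# e2label /dev/sdk1
/hadoop10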

- View the disk information:

# fdisk -l /dev/sdk

Disk /dev/sdk: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xea6649b8

   Device Boot      Start         End      Blocks   Id  System
/dev/sdk1               1      243201  1953512001   83  Linux

- Everything looks normal. Now check /etc/fstab:

.......
LABEL=/hadoop09 /hadoop/9  ext4 defaults,noatime,nodiratime,noauto 0 2
LABEL=/hadoop10 /hadoop/10 ext4 defaults,noatime,nodiratime,noauto 0 2
........

- Note the "noauto" option: it tells "mount -a" to skip the entry, so the filesystem will not be mounted automatically.

That means "mount -a" is no use here; the partition has to be mounted manually:

# mount -t ext4 -o defaults,noatime,nodiratime LABEL=/hadoop10 /hadoop/10

- Then verify with df:

# df -h

......
/dev/sdk1       1.8T  1.9G  1.8T   1% /hadoop/10

At this point, the disk failure is completely resolved.

The steps required to bring a new hard drive online (no interaction needed, so they can be written as a script):

1 Delete partition 1 on /dev/sda:

# parted --script -- /dev/sda rm 1

2 Create an msdos (MBR) partition table on /dev/sda:

# parted --script /dev/sda mklabel msdos

3 Create partition 1 on /dev/sda, spanning the whole disk:

# parted --script -- /dev/sda mkpart primary 1 -1

4 Format /dev/sda1 with the ext4 filesystem:

# mkfs.ext4 -L $label -N 61050880 -m 1 -O sparse_super /dev/sda1

"-N" indicates the number of inode, this value if not specified, the system will default to set it as small as possible, if the hard disk small files, it may cause the inode is not enough to use the situation. Hdfs/hadoop designed to deal with large files, the default block size is 64MB, is the Linux file system default (4KB) 16,384 times times, but also considering that a hard disk is not all HDFs files, there will be many log files, etc., so in the setup Inode The amount of time is best judged by experience, or the insurance point you can take the following formula to calculate:

inode count = (disk size / 4 KB) × 10%
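As a rough check against the 2 TB disks above, taking the formula as given: 2,000,398,934,016 bytes / 4096 ≈ 488 million 4 KB blocks, and 10% of that is about 48.8 million inodes; the 61,050,880 passed to mkfs.ext4 earlier is the same order of magnitude (roughly one inode per 32 KB of disk).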

"-M 1" means to reserve 1% of the hard disk space, the default is reserved 5%, the reserved space can be used when the hard disk is exhausted, the root user has the opportunity to operate the hard disk;

"-O sparse_super" means saving hard disk space with fewer superblock backup copies.

5 Disable the e2fsck boot-time self-check on /dev/sda1:

# tune2fs -c 0 -i 0 /dev/sda1

"-c 0" means the system never runs e2fsck on the filesystem based on mount count, no matter how many times the drive is mounted; "-i 0" likewise disables the time-interval check.

If a drive goes unchecked for a long time, data loss becomes more likely. But HDFS keeps 3 copies of each block by default, so even if one copy is lost there are still 2, and whenever fewer than 3 copies of a block remain, HDFS re-replicates it to a new server to get back to 3. Data in HDFS is therefore relatively safe, and disk self-checks matter much less.
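Chaining the five steps gives a small non-interactive repair routine; a minimal shell sketch (the device and label are hypothetical placeholders, and there are deliberately no safety checks, so only point it at a disk whose data is already replicated elsewhere):

#!/bin/bash
# Rebuild one Hadoop data disk: repartition, reformat, disable boot-time fsck.
disk=/dev/sdk        # hypothetical device to repair
label=/hadoop10      # hypothetical filesystem label matching /etc/fstab

parted --script -- "$disk" rm 1                  # 1 delete partition 1
parted --script "$disk" mklabel msdos            # 2 write a fresh msdos (MBR) table
parted --script -- "$disk" mkpart primary 1 -1   # 3 one primary partition spanning the disk
mkfs.ext4 -L "$label" -N 61050880 -m 1 -O sparse_super "${disk}1"   # 4 format, 1% reserve
tune2fs -c 0 -i 0 "${disk}1"                     # 5 no mount-count or interval fsck

After it completes, mount the filesystem manually as in the fstab section above.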

Finally I share a Perl script that automates the repair of a hard drive:

https://github.com/zhanghaohao/DiskFormat


