Sinsing Notes on Hadoop: The Definitive Guide, Part 5: HDFS Basic Concepts

Each disk has a block size, which is the smallest unit of data that it can read or write. A file system built on a single disk deals with data in blocks, which are an integral multiple of the disk block size. File system blocks are typically a few kilobytes in size, whereas disk blocks are normally 512 bytes.

HDFS also has the concept of a block, but it is much larger: 64 MB by default. Like a file system on a single disk, HDFS divides files into block-sized chunks that are stored as independent units. Unlike a single-disk file system, however, a file in HDFS that is smaller than one block does not occupy a full block's worth of underlying storage.

HDFS blocks are larger than disk blocks in order to minimize the cost of seeks. If the block is large enough, the time to transfer the data from the disk can be made significantly longer than the time to seek to the start of the block, so the time to transfer a file made up of multiple blocks is governed by the disk transfer rate. As the transfer speed of new generations of disk drives increases, block sizes will be set larger. However, this parameter should not be set too large: a map task in MapReduce normally processes one block at a time, so if there are too few tasks (fewer than the number of nodes in the cluster), jobs will run more slowly than they otherwise could.
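
As an aside not found in the original notes, the following minimal Java sketch shows how a client can query these block sizes through Hadoop's FileSystem API. The path /user/demo/data.txt is hypothetical, and exact method signatures vary slightly between Hadoop releases.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file path used only for illustration.
        Path file = new Path("/user/demo/data.txt");

        // Block size the file system would use for newly written files.
        System.out.println("Default block size: " + fs.getDefaultBlockSize(file));

        // Block size recorded for this existing file when it was written.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Block size of file: " + status.getBlockSize());
    }
}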

The abstraction of blocks in a distributed file system brings many benefits, as follows:

(1) A file can be larger than the capacity of any single disk in the network. There is no requirement that all blocks of a file be stored on the same disk, so they can be placed on any of the disks in the cluster. In fact, though unusual, the blocks of a single file could fill every disk in the entire HDFS cluster.

(2) Making blocks, rather than whole files, the unit of storage greatly simplifies the design of the storage subsystem. Simplicity is a goal of all systems, but it is especially important for distributed systems, in which the failure modes are so varied. Using the block as the unit of control in the storage subsystem simplifies storage management: because blocks are of a fixed size, it is easy to calculate how many of them fit on a given disk. It also eliminates metadata concerns: blocks are just chunks of data to be stored, and file metadata such as permission information does not need to be stored with the blocks, so another system can manage that metadata separately.

Furthermore, blocks fit well with replication for providing fault tolerance and availability. Copying each block to a small number of physically separate machines (three by default) ensures that data is not lost when a block, disk, or machine fails. If a block becomes unavailable, the system reads another copy from elsewhere, and the process is transparent to the user. A block that is lost through corruption or machine failure can be replicated from its other locations to another machine that is functioning normally, bringing the number of replicas back to the normal level.
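
To make the replica placement concrete, here is a hedged Java sketch that lists, for each block of a hypothetical file, the hosts that hold a replica. It assumes the standard Hadoop client API; the path is illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file path used only for illustration.
        Path file = new Path("/user/demo/data.txt");
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per block, listing the hosts holding a replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
    }
}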

We can use the fsck command in HDFS to display block information, for example: hadoop fsck / -files -blocks

An HDFS cluster has two types of nodes operating in a manager-worker pattern: a namenode (the manager) and a number of datanodes (the workers). The namenode manages the file system namespace: it maintains the file system tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk as two files: the namespace image file and the edit log file. The namenode also records which datanodes hold the blocks of each file, but it does not store block locations persistently, because this information is rebuilt from reports sent by the datanodes when the system starts.

A client accesses the file system on behalf of the user by communicating with the namenode and the datanodes. The client presents a file system interface similar to POSIX (Portable Operating System Interface), so user code does not need to know anything about the namenode and datanodes in order to work.
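
For example, the following sketch (closely modeled on the standard FileSystem usage pattern, with a hypothetical path) reads a file and copies it to standard output; note that the code never refers to the namenode or the datanodes directly.

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFileExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path; the client only sees a file system interface.
        InputStream in = null;
        try {
            in = fs.open(new Path("/user/demo/data.txt"));
            // Copy the stream to stdout, 4 KB at a time, without closing stdout.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}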

Datanodes are the working nodes of the file system. They store and retrieve blocks when they are told to (by clients or by the namenode), and they periodically report back to the namenode with a list of the blocks they are storing.

Without the namenode, the file system cannot be used. In fact, if the machine running the namenode service were destroyed, all the files on the file system would be lost, because there would be no way of knowing how to reconstruct the files from the blocks on the datanodes.

Making the namenode resilient to failure is therefore important, and Hadoop provides two mechanisms for this:

(1) The first mechanism is to back up the files that make up the persistent state of the file system metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple file systems. These writes are synchronous and atomic. The usual configuration is to write to the local disk as well as to a remotely mounted Network File System (NFS).
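
As a rough illustration (the directory paths below are hypothetical, and the property is named dfs.name.dir in older Hadoop releases and dfs.namenode.name.dir in newer ones), the relevant entry in hdfs-site.xml takes a comma-separated list of directories, each of which receives a full copy of the metadata:

<property>
  <name>dfs.name.dir</name>
  <!-- The namenode writes its namespace image and edit log to every
       directory listed here; typically one local disk plus one NFS mount. -->
  <value>/data/hdfs/name,/mnt/remote-nfs/hdfs/name</value>
</property>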

(2) Another option is to run a secondary namenode, which, despite its name, cannot act as a namenode. Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. The secondary namenode usually runs on a separate physical machine, because the merge requires plenty of CPU and as much memory as the namenode itself. It keeps a copy of the merged namespace image, which can be used if the namenode fails. However, the state of the secondary namenode always lags behind that of the primary, so some data loss is almost inevitable if the primary fails completely. The usual course of action in that case is to copy the namenode's metadata files that are on NFS to the secondary namenode and run it as the new primary namenode.
