Hadoop Series, Part 7: Distributed File System HDFS (2)

1. Access the HDFS File System

HDFS is a file system that works in user space, with its own independent file system tree. Unlike a traditional file system implemented in kernel space, HDFS cannot be mounted on the directory tree of the current operating system, so the usual file and directory management commands, such as ls and cat, cannot be used on it directly. To access files in HDFS, you must use the HDFS API or the command line tools provided by Hadoop.

1.1 HDFS user interface

(1) the hadoop dfs command line interface;
(2) the hadoop dfsadmin command line interface;
(3) the web interfaces;
(4) the HDFS API.

The first three methods are described in detail later. Regardless of which interface is used to interact with the HDFS file system, the process of reading or writing data is the same. Both operations are described in detail below.
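As a quick illustration of the fourth interface, here is a minimal sketch of writing and then reading a file through the Java FileSystem API. The Name node address and the file path are placeholders, and fs.default.name is the pre-2.x property name; adjust both for your cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsApiDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed Name node address; replace with your cluster's value.
            conf.set("fs.default.name", "hdfs://namenode:8020");
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/tmp/demo.txt");   // hypothetical path
            // Write: the client asks the Name node for blocks, then streams to data nodes.
            FSDataOutputStream out = fs.create(file);
            out.writeUTF("hello HDFS");
            out.close();

            // Read: the Name node returns block locations; data comes from data nodes.
            FSDataInputStream in = fs.open(file);
            System.out.println(in.readUTF());
            in.close();
            fs.close();
        }
    }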

1.2 Save data to the HDFS file system

When a client needs to store a file and write data, it first sends a namespace update request to the Name node. The Name node checks the user's access permissions and whether the file already exists. If there is no problem, the Name node selects appropriate data nodes and assigns idle data blocks to the client program. The client program then sends the data directly to the corresponding data node. After the data is stored, the data node replicates the data block to other nodes according to the Name node's instructions.

[Figure: the HDFS write process (source: http://www.bkjia.com/uploads/allimg/131228/02531KD6-0.jpg)]

(1) Before saving data to an HDFS cluster, the HDFS client must first know the "block size" and the "replication factor" (the number of copies to keep of each block) used by the target file system. Before submitting data to HDFS, the client splits the file to be saved by block size and sends block storage requests to the Name node one by one, asking the Name node for as many idle blocks as the replication factor. Here we assume the replication factor is three;

(2) The Name node must identify at least three data nodes (the same number as the replication factor) with available idle blocks, and it returns the addresses of these three nodes to the client, ordered by their distance from the client;

(3) The client initiates the data storage request only to the nearest data node (assume DN1). As this nearest data node stores the data, it copies the data block to one of the remaining data nodes (assume DN2). After that transfer completes, DN2 is responsible for synchronizing the data block to the last data node (assume DN3). This is also known as the "replication pipeline";

(4) When all three data nodes have stored the block, each notifies the Name node that storage is complete (DONE), and the Name node then notifies the client that storage is complete;

(5) The client stores all remaining data blocks in the same way and, after all blocks are stored, asks the Name node to close the file; the Name node then writes the file's metadata to persistent storage. A minimal write sketch follows below.
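A minimal sketch of such a write through the Java API. The overloaded FileSystem.create call lets the client specify the replication factor and block size from step (1) on a per-file basis; the path and the 64 MB block size are example values only.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // create(path, overwrite, bufferSize, replication, blockSize):
            // ask the Name node for blocks with 3 replicas of 64 MB each.
            FSDataOutputStream out = fs.create(
                    new Path("/tmp/pipeline-demo.dat"),   // hypothetical path
                    true, 4096, (short) 3, 64 * 1024 * 1024L);
            out.write(new byte[]{1, 2, 3});
            // close() corresponds to step (5): flush the last block and
            // ask the Name node to close the file.
            out.close();
            fs.close();
        }
    }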

1.3 Read data from HDFS

HDFS provides a POSIX-like access interface, and data operations are transparent to the client program. When a client program needs to access data in HDFS, it first establishes a connection to the TCP port that the Name node listens on, then initiates a file read request using the Client Protocol. Based on the request, the Name node returns the block IDs of the relevant file and the data nodes that store each block. The client then connects to the port that the corresponding data node listens on and retrieves the required data blocks.

 

[Figure: the HDFS read process (source: http://www.bkjia.com/uploads/allimg/131228/02531GF1-1.jpg)]

(1) The client requests access to a file from the Name node;

(2) The Name node returns two lists to the client: (a) all data blocks contained in the file, and (b) for each data block, the list of data nodes where it is stored;

(3) The client reads each data block from the nearest data node in its location list and then merges the blocks locally; the block locations from step (2) can also be inspected through the API, as sketched below.
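The per-block node list from step (2) can be observed directly: FileSystem.getFileBlockLocations returns, for each block of a file, the hosts that store a replica. A minimal sketch (the path is a placeholder):

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus st = fs.getFileStatus(new Path("/tmp/demo.txt")); // hypothetical path
            // One BlockLocation per block, each listing the hosts holding a replica.
            for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
                System.out.println("offset " + loc.getOffset()
                        + " length " + loc.getLength()
                        + " hosts " + Arrays.toString(loc.getHosts()));
            }
            fs.close();
        }
    }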

 

2. Availability of Name nodes

The process described in the previous section shows that downtime of the Name node makes all data in the HDFS file system unavailable, and that if the namespace image file or edit log on the Name node is damaged, the entire HDFS cannot be rebuilt and all data is lost. Therefore, for data availability, reliability, and related purposes, additional mechanisms must be provided to guard against such failures. Hadoop provides two solutions.

The simplest way is to store multiple copies of the Name node's persistent metadata on different storage devices in real time. Through attribute configuration, a Hadoop Name node can use several different namespace storage devices, and it synchronizes write operations across all of them. When the Name node fails, an available copy of the namespace image and edit log can be loaded on a new physical host to recreate the namespace. However, depending on the log size and cluster size, this reconstruction process may take a long time.
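A minimal hdfs-site.xml sketch of this configuration. The property is dfs.name.dir in Hadoop 0.20/1.x (renamed dfs.namenode.name.dir in later releases), and the directory paths, including the NFS mount, are placeholders:

    <property>
      <name>dfs.name.dir</name>
      <!-- Comma-separated list: the Name node writes the namespace image
           and edit log synchronously to every directory listed. One local
           disk plus a remote NFS mount is a common layout. -->
      <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
    </property>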

Another method is to provide a second Name node (Secondary NameNode). The second Name node does not really assume the Name node role; its main task is to periodically merge the edit log into the namespace image file, to keep the edit log from growing too large. It runs on an independent physical host and needs as much memory as the Name node to perform the merge. In addition, it keeps a copy of the namespace image. However, because of how it works, the second Name node always lags behind the master node, so some data loss is inevitable when the Name node fails.
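The merge interval is configurable. A sketch using the Hadoop 0.20/1.x property names (fs.checkpoint.period and fs.checkpoint.size; later releases rename these), with the usual default values:

    <property>
      <name>fs.checkpoint.period</name>
      <value>3600</value>   <!-- merge the edit log into the image hourly -->
    </property>
    <property>
      <name>fs.checkpoint.size</name>
      <value>67108864</value>   <!-- or sooner, once the edit log reaches 64 MB -->
    </property>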

Although the above two mechanisms can avoid data loss to the greatest extent, they do not provide high availability: the Name node remains a single point of failure, because after it goes down, all data becomes inaccessible and all MapReduce jobs that depend on this HDFS are suspended as well. Even with a backup of the namespace image and edit log, recreating the Name node on a new host and receiving the block reports from every data node takes a long time, which may be unacceptable in some application environments. Therefore, Hadoop 0.23 introduces a high availability mechanism for the Name node: two Name nodes work in a "master-slave" model, and when the master node fails, all of its services are immediately transferred to the slave node. For more information, see the official manual.

To keep the Name node from becoming a system bottleneck in large-scale HDFS clusters, Hadoop 0.23 also introduces the HDFS Federation mechanism. In an HDFS Federation, each Name node manages a namespace volume, consisting of the namespace metadata and a block pool containing all block-related information, and the namespaces on the Name nodes are isolated from one another. Damage to one Name node therefore does not affect the continued service of the other Name nodes. For more information, see the official manual.

3. Fault Tolerance of HDFS

The three most common HDFS fault scenarios are node failure, network failure, and data corruption.

In Hadoop without the Name node HA function, a Name node failure takes the entire file system offline, so it has very serious consequences. For solutions, see section 2, "Availability of Name nodes".

In an HDFS cluster, each data node periodically (every 3 seconds) sends heartbeat information to the Name node to report its health. Correspondingly, if the Name node does not receive heartbeat information from a data node within 10 minutes, it considers that node failed and removes it from the list of available data nodes, whether the failure was caused by the node itself or by network problems.
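The 10-minute figure is not hard-coded; in many Hadoop versions it is derived from two configurable properties (the names vary slightly across releases, so verify them against yours). With the usual defaults of heartbeat.recheck.interval = 300000 ms and dfs.heartbeat.interval = 3 s, the timeout works out as:

    timeout = 2 * heartbeat.recheck.interval + 10 * dfs.heartbeat.interval
            = 2 * 300 s + 10 * 3 s
            = 630 s, i.e. about 10.5 minutes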

Data transmission between the client and a data node is based on the TCP protocol. For each packet the client sends to a data node, the data node returns a response packet. If the client does not receive the response after several retries, it discards that data node and switches to the second data node in the list provided by the Name node, as sketched below.
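A minimal Java sketch of that failover behavior. This is illustrative only, not the actual DFSClient code, and readBlockFrom is a hypothetical helper standing in for the real TCP exchange:

    import java.io.IOException;
    import java.util.List;

    public class ReplicaFailoverSketch {
        // Hypothetical helper that reads one block from one data node,
        // throwing IOException when packets go unacknowledged after retries.
        static byte[] readBlockFrom(String dataNode, long blockId) throws IOException {
            throw new IOException("stub");
        }

        // Try the data nodes in the order the Name node returned them
        // (nearest first), falling back to the next replica on failure.
        static byte[] readBlock(List<String> nodesNearestFirst, long blockId) throws IOException {
            IOException last = null;
            for (String node : nodesNearestFirst) {
                try {
                    return readBlockFrom(node, blockId);
                } catch (IOException e) {
                    last = e;   // discard this data node, try the next one
                }
            }
            if (last == null) last = new IOException("no data nodes supplied");
            throw last;
        }
    }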

Noise during network transmission can corrupt data. To guard against storage errors on the data node, the client sends a checksum along with the data, and the data node stores the checksum together with the data. In an HDFS cluster, each data node periodically reports all of its block information to the Name node, but before reporting each block it verifies the block against the checksum. If verification fails, the data node no longer tells the Name node that it possesses the block, since the block stored on that data node is damaged.
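In HDFS the checksum is a CRC-32 value computed over fixed-size chunks (io.bytes.per.checksum in older releases, 512 bytes by default). A minimal Java sketch of the verify-before-report idea, using java.util.zip.CRC32; it illustrates the mechanism, not HDFS's actual on-disk format:

    import java.util.zip.CRC32;

    public class ChecksumSketch {
        // Compute a CRC-32 over a data chunk, as done when the client writes it.
        static long checksum(byte[] chunk) {
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, chunk.length);
            return crc.getValue();
        }

        public static void main(String[] args) {
            byte[] chunk = "block data".getBytes();
            long stored = checksum(chunk);   // stored alongside the data

            chunk[0] ^= 0x1;                 // simulate corruption on disk

            // Before reporting the block to the Name node, the data node
            // recomputes the checksum; a mismatch means the block is damaged
            // and is omitted from the block report.
            boolean healthy = checksum(chunk) == stored;
            System.out.println(healthy ? "block OK" : "block corrupt, not reported");
        }
    }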

References: Data-Intensive Text Processing with MapReduce; Hadoop in Practice; Hadoop Operations; Hadoop Documentation.

This article is from the "Marco Education" blog. For more information, contact the author!
