Viewing Distributed File System Design Requirements Through HDFS

Distributed file systems are designed to meet requirements for transparency, concurrency control, scalability, fault tolerance, and security. I would like to examine the design and implementation of HDFS from these perspectives, so that we can see more clearly its intended application scenarios and design concepts.

The first is transparency. According to the Reference Model of Open Distributed Processing, there are eight kinds of transparency: access, location, concurrency, replication, failure, migration, performance, and scaling transparency. For a distributed file system, the most important are the following five transparency requirements:

1) Access transparency: users can access local files and remote file resources through the same operations. HDFS largely achieves this: a program written against the HDFS API can read and write local files instead of distributed HDFS without any code changes; only the configuration file needs to be modified (see the sketch after this list). That said, the access transparency HDFS provides is incomplete. Because HDFS is implemented in Java, it cannot hook into the Unix kernel the way NFS or AFS do so that local and remote files are handled in exactly the same way.

2) Location transparency: files are named within a single namespace and can be relocated without their path names changing. An HDFS cluster has a single namenode that manages the file system namespace. File blocks can be redistributed and re-replicated, the number of copies of a block can be increased or decreased, and replicas can be stored across racks; all of this is transparent to the client.

3) Migration transparency is similar to location transparency. Files in HDFS are often replicated or moved because of node failures, node additions, replication-factor changes, or rebalancing, yet clients and client programs do not need to change. The namenode's edits log records these changes.

4) Performance transparency and scaling transparency: HDFS is designed to build distributed file system clusters on large numbers of inexpensive machines, so there is little doubt about its scalability. For performance, refer to the benchmarks published on its homepage.
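As a minimal sketch of points 1 and 2 (assuming the hadoop-client libraries are on the classpath; the namenode address and file path are placeholders), the same FileSystem client code works whether fs.defaultFS points at file:/// or an hdfs:// URI, and the client only ever names files by path, while block placement remains an internal detail that can be inspected but never has to be known:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TransparencyDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder URI: swap "file:///" for e.g. "hdfs://namenode:8020" without
        // touching any of the read/write code below (access transparency).
        String fsUri = args.length > 0 ? args[0] : "file:///";
        FileSystem fs = FileSystem.get(URI.create(fsUri), new Configuration());

        Path path = new Path("/tmp/transparency-demo.txt");   // hypothetical path
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeUTF("same client code, local or distributed");
        }

        // Clients address files only by path name (location transparency); block
        // placement can be inspected, but no client code depends on it.
        FileStatus status = fs.getFileStatus(path);
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset=" + loc.getOffset()
                    + " hosts=" + String.join(",", loc.getHosts()));
        }
    }
}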


The second is concurrency control. One client's reads and writes of a file should not interfere with other clients' reads and writes of the same file. To achieve single-copy semantics close to those of a native file system, a distributed file system needs complex interactions, such as timestamps or callback promises (similar to RPC callbacks from the server to its clients: a callback is either valid or cancelled, and the client checks the status of the callback promise to decide whether the file on the server has been updated). HDFS does none of this. Its mechanism is very simple: only one writer is allowed at any time, and a file is not modified after it has been created and written. Its model is write-once-read-many (one write, multiple reads). This matches its application scenarios: HDFS files usually range from megabytes to terabytes, the data is rarely modified, and the most common access pattern is sequential reading and processing with few random reads, so HDFS is well suited to MapReduce frameworks and web crawlers. The size of HDFS files also means that an HDFS client cannot cache hundreds of commonly used files the way some distributed file systems do.
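A minimal sketch of the write-once-read-many model (assuming a working HDFS client configuration and a hypothetical path; the exact exception thrown for the second create may vary by version, so the example catches the general IOException):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteOnceReadManyDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/tmp/worm-demo.txt");   // hypothetical path
        fs.delete(path, false);                       // make the demo re-runnable

        // Single writer: the file is written once and then closed.
        try (FSDataOutputStream out = fs.create(path, /* overwrite = */ false)) {
            out.writeUTF("written once");
        }

        // A second attempt to create the same file without overwrite is rejected,
        // one visible consequence of the write-once model.
        try {
            fs.create(path, false).close();
        } catch (java.io.IOException expected) {
            System.out.println("second create rejected: " + expected.getMessage());
        }

        // Many readers: any number of clients can open and read the file concurrently.
        try (FSDataInputStream in = fs.open(path)) {
            System.out.println(in.readUTF());
        }
    }
}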


Third, file replication. A file can be represented by multiple copies of its content at different locations. This brings two benefits: the same file can be served from multiple servers, which improves scalability, and fault tolerance improves because if one copy is damaged, the file can still be obtained from another server node. Each block of an HDFS file is replicated for fault tolerance; the number of copies follows the configured replication factor, which defaults to 3. The replica placement policy is also carefully designed: one replica is placed on a node in the local rack, another on a different node in the same rack, and the third on a node in another rack, which minimizes the chance of losing all copies to a single failure. In addition, when reading a file, HDFS prefers to read blocks from nodes in the same rack, or at least in the same data center.
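As a small, hedged illustration (dfs.replication and the file path below are example values only, and the file is assumed to already exist), the replication factor can be set through configuration for new files or adjusted per file through the FileSystem API, after which the namenode schedules extra copies or removes surplus ones:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // dfs.replication controls the default number of copies for newly created files.
        conf.setInt("dfs.replication", 3);
        FileSystem fs = FileSystem.get(conf);

        // The replication factor can also be changed per file after creation.
        Path path = new Path("/tmp/replication-demo.txt");   // hypothetical existing file
        boolean accepted = fs.setReplication(path, (short) 2);
        System.out.println("replication change accepted: " + accepted);
        System.out.println("current factor: " + fs.getFileStatus(path).getReplication());
    }
}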


Fourth, hardware and operating-system heterogeneity. Because HDFS is built on the Java platform, its cross-platform ability is beyond doubt; thanks to Java's encapsulated file I/O, HDFS can run the same client and server programs on different operating systems and computers.


Fifth, fault tolerance. A distributed file system must keep file services usable when a problem occurs on a client or a server. The fault tolerance of HDFS can be considered in two aspects: file system fault tolerance and Hadoop's own fault tolerance. File system fault tolerance is achieved through the following methods:

1) Heartbeat detection is maintained between the namenode and the datanodes. When the namenode stops receiving a datanode's heartbeats, for example because of a network fault, it no longer dispatches new I/O operations to that datanode and treats the data on it as unavailable. The namenode then checks whether any file blocks have fewer replicas than the configured value and, if so, automatically creates new replicas and distributes them to other datanodes.

2) Block integrity checks. HDFS records a checksum for every block of each newly created file. When a file is later retrieved and a block is fetched from a node, the checksum is verified first; if it does not match, the client obtains a replica of that block from another datanode (see the sketch after this list).

3) Cluster load balancing. Node failures or additions can leave data unevenly distributed. When the free space on a datanode exceeds a critical threshold, HDFS automatically migrates data to it from other datanodes.

4) The fsimage and edits log on the namenode are HDFS's core data structures; if they are damaged, HDFS becomes unusable. The namenode can therefore be configured to maintain multiple copies of the fsimage and edits log, and any modification is synchronized to all copies; the namenode always selects the latest consistent fsimage and edits log. The namenode remains a single point of failure in HDFS: if the machine hosting it fails, manual intervention is required.

5) Deleting a file does not immediately remove it from the namenode's namespace; it is moved to the /trash directory and can be recovered at any time until the configured retention period expires.
Beyond file system fault tolerance, Hadoop itself supports upgrade and rollback: if a Hadoop software upgrade introduces bugs or incompatibilities, the cluster can be restored to the previous Hadoop version through rollback.
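As a small illustration of the block-integrity point above (a sketch only, assuming an existing HDFS file passed as an argument), a client can ask the file system for a file's checksum and compare it against a previously recorded value; per-block verification during reads happens automatically inside the client library:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path(args[0]);   // path to an existing HDFS file

        // HDFS computes and stores checksums when data is written; clients verify them
        // transparently on every read and fall back to another replica on a mismatch.
        // getFileChecksum exposes an aggregate checksum, e.g. for comparing two copies.
        FileChecksum checksum = fs.getFileChecksum(path);
        if (checksum != null) {
            System.out.println(checksum.getAlgorithmName() + ": " + checksum);
        } else {
            System.out.println("file system does not expose a checksum for " + path);
        }
    }
}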

The last is security. HDFS security is relatively weak: it implements only simple file permission control similar to that of the UNIX file system. Future versions are expected to add a Kerberos-based authentication system similar to the one used with NFS.
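A brief sketch of the UNIX-style permission model mentioned above (the path, user, and group names are hypothetical, and changing ownership requires sufficient privileges):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/tmp/permission-demo.txt");   // hypothetical file

        // UNIX-style rwx bits for owner / group / others, here rw-r-----.
        fs.setPermission(path,
                new FsPermission(FsAction.READ_WRITE, FsAction.READ, FsAction.NONE));
        // Ownership can be changed as well.
        fs.setOwner(path, "alice", "analysts");              // hypothetical user and group

        System.out.println(fs.getFileStatus(path).getPermission());
    }
}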


Summary: HDFS is not intended to be a general-purpose distributed file system. It is weak in concurrency control, cache consistency, and small-file read/write efficiency. However, it has a clear design goal: supporting large data files (MB to TB) that are mainly read sequentially, aiming at high read throughput and close integration with the MapReduce framework.
