HDFS: Introduction to the Hadoop Distributed File System

One. Overview

The Hadoop Distributed File System, referred to as HDFS, is part of the Apache Hadoop core project. It is a distributed file system designed to run on commodity hardware, meaning relatively inexpensive machines with no special requirements. HDFS is highly fault-tolerant and provides high-throughput data access, which makes it well suited to applications that work with large datasets.

The following figure shows the architecture of HDFS.

As the figure above shows, HDFS uses a master/slave architecture. It consists mainly of the following components:

- NameNode

- DataNode

- Secondary NameNode

1. The NameNode is a central server that manages the file system namespace and regulates client access to files. In the classic architecture described here, an HDFS cluster has a single NameNode, which is the arbiter and repository for all HDFS metadata.

2. DataNodes are the worker nodes of the cluster. Each one manages the storage attached to the node it runs on. HDFS exposes a file system namespace and lets users store data in files. Internally, a file is split into one or more blocks, and these blocks are stored on a set of DataNodes. The NameNode executes namespace operations such as opening, closing, and renaming files and directories, and it also determines the mapping of blocks to DataNodes. The DataNodes serve read and write requests from file system clients. They also create, delete, and replicate blocks on instruction from the NameNode.

3. The Secondary NameNode assists the NameNode. The NameNode records every change to the file system by appending it to a log file (edits) on its local file system. When the NameNode starts, it reads the HDFS state from an image file (fsimage) and then applies the operations recorded in the edits file. It then writes the new HDFS state back to fsimage and begins normal operation with an empty edits file. Because the NameNode merges fsimage and edits only at startup, the edits file can grow very large over time, especially on a large cluster. A large edits file also makes the next NameNode restart slow. The Secondary NameNode periodically merges fsimage and the edits log to keep the edits file below a size limit. Its memory requirements are of the same order as the NameNode's, so the Secondary NameNode usually runs on a different machine.

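The Secondary NameNode's checkpoint can also serve as a cold backup of the NameNode's metadata. It is not a hot standby, however, and cannot take over automatically if the NameNode fails.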
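The checkpoint the Secondary NameNode performs can be sketched as a replay of logged operations onto a copy of the last image. The following is a minimal Python model that rests on several simplifying assumptions. The "image" is only a dictionary from path to replication factor. Each edit is a hypothetical `(txid, op, path, arg)` tuple, and the op names are invented for illustration. The real fsimage and edits formats are binary and far richer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FsImage:
    """Hypothetical, simplified namespace image: path -> replication factor."""
    files: dict[str, int] = field(default_factory=dict)
    last_txid: int = 0  # last transaction already folded into this image


def apply_edit(image: FsImage, txid: int, op: str, path: str, arg) -> None:
    # Replay one logged operation onto the in-memory image.
    if op == "create":
        image.files[path] = arg          # arg = replication factor
    elif op == "delete":
        image.files.pop(path, None)
    elif op == "rename":
        image.files[arg] = image.files.pop(path)  # arg = new path
    elif op == "setrep":
        image.files[path] = arg          # arg = new replication factor
    else:
        raise ValueError(f"unknown op: {op}")
    image.last_txid = txid


def checkpoint(image: FsImage, edits: list[tuple]) -> tuple[FsImage, list[tuple]]:
    """Merge edits into a copy of the image; return (new image, empty edit log)."""
    merged = FsImage(dict(image.files), image.last_txid)
    for txid, op, path, arg in edits:
        if txid > merged.last_txid:      # skip edits already in the image
            apply_edit(merged, txid, op, path, arg)
    return merged, []
```

The merge tracks `last_txid`, so it skips edits that are already folded into the image. Running the same checkpoint twice therefore produces the same result.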

Two. HDFS Compared with Traditional Centralized Databases

The main differences are as follows:

1. HDFS supports massive data storage, and a typical file on HDFS ranges from gigabytes to terabytes in size. With an RDBMS, once the data volume passes a certain point, query processing slows down and requires high-performance hardware.

2. HDFS uses a streaming read pattern, and its performance depends mainly on the data transfer rate. An RDBMS uses B-tree structures, and their performance is limited by seek (addressing) time. For reading large volumes of data, streaming is generally far more efficient than B-tree access.

3. HDFS suits batch processing that analyzes an entire dataset. It fits write-once, read-many applications and does not support transactions. An RDBMS suits point queries and updates. Once the dataset is indexed, the database can deliver fast retrieval and small updates, so it is better for datasets that are continuously updated.
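The seek-versus-transfer argument in point 2 can be made concrete with a back-of-the-envelope calculation. The disk figures below (10 ms per seek, 100 MB/s sustained transfer) and the workload (scanning 100 GB, either sequentially or as 100 KB random reads) are illustrative assumptions, not measurements. The model also ignores the extra seeks a B-tree lookup needs to walk its index, so it flatters the random-access case.

```python
def streaming_read_seconds(total_bytes: float, transfer_rate: float) -> float:
    """One sequential scan: the cost is dominated by the transfer rate."""
    return total_bytes / transfer_rate


def random_read_seconds(total_bytes: float, record_bytes: float,
                        seek_seconds: float, transfer_rate: float) -> float:
    """Many small indexed reads: every record pays one seek before transferring."""
    n_reads = total_bytes / record_bytes
    return n_reads * (seek_seconds + record_bytes / transfer_rate)


# Assumed, illustrative disk figures (not measurements).
SEEK = 0.010     # 10 ms per seek
RATE = 100e6     # 100 MB/s sustained transfer
TOTAL = 100e9    # 100 GB to read
RECORD = 100e3   # 100 KB per random read

stream = streaming_read_seconds(TOTAL, RATE)
rand = random_read_seconds(TOTAL, RECORD, SEEK, RATE)
print(f"streaming: {stream:.0f} s, random: {rand:.0f} s")
```

With these numbers the sequential scan takes about 1,000 s and the random reads take about 11,000 s. Each 100 KB read spends 10 ms seeking but only 1 ms transferring, so seeks dominate. Larger reads shrink that overhead, which is one reason HDFS uses large blocks.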

Three. Key Features of HDFS

1. File system namespace

HDFS supports a traditional hierarchical file organization. A user or application can create directories and store files inside them. The namespace hierarchy is similar to that of most existing file systems: users can create, delete, move, and rename files.

The NameNode maintains the file system namespace and records any change to the namespace or its properties. An application can specify how many replicas of a file HDFS should keep. This number is called the file's replication factor, and the NameNode stores it as well.

For example:
$ bin/hadoop fs -mkdir -p /user/data/input → create a directory on HDFS
$ bin/hadoop fs -put <local file path> /user/data/input → copy a local file to HDFS
$ bin/hadoop fs -cat /user/data/output/* → print the contents of files on HDFS
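To make the NameNode's bookkeeping concrete, the following is a minimal Python sketch. It is not the real NameNode, which is written in Java. It models the mappings described in this section and in the architecture overview:

- the namespace, which maps each path to an ordered list of block IDs and also holds each file's replication factor
- the block map, which maps each block ID to the DataNodes holding a replica

The class and method names (`NameNodeMetadata`, `add_file`, `record_replica`, `rename`, `locate`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NameNodeMetadata:
    """Toy model of NameNode metadata (hypothetical names, not the HDFS API)."""
    files: dict[str, list[str]] = field(default_factory=dict)           # path -> block IDs
    replication: dict[str, int] = field(default_factory=dict)           # path -> replication factor
    block_locations: dict[str, set[str]] = field(default_factory=dict)  # block ID -> DataNodes

    def add_file(self, path: str, block_ids: list[str], replication: int = 3) -> None:
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = list(block_ids)
        self.replication[path] = replication
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set())

    def record_replica(self, block_id: str, datanode: str) -> None:
        # Called when a DataNode reports that it stores this block.
        self.block_locations[block_id].add(datanode)

    def rename(self, src: str, dst: str) -> None:
        # A namespace operation: only metadata changes, no block data moves.
        if dst in self.files:
            raise FileExistsError(dst)
        self.files[dst] = self.files.pop(src)
        self.replication[dst] = self.replication.pop(src)

    def locate(self, path: str) -> list[tuple[str, set[str]]]:
        # What a reader asks first: which DataNodes hold each block of the file.
        return [(b, set(self.block_locations[b])) for b in self.files[path]]
```

`rename` touches only the namespace dictionaries. Renaming a file in HDFS is likewise a metadata operation on the NameNode and moves no block data.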

2. Data replication

HDFS is designed to store very large files reliably across machines in a large cluster. It stores each file as a sequence of blocks, and all blocks except the last one are the same size. Blocks are replicated for fault tolerance. The block size and replication factor can be configured per file. An application can specify the replication factor when the file is created and change it later. Files in HDFS are write-once and have strictly one writer at any time.
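The block layout described above is simple arithmetic. The following sketch computes `(offset, length)` pairs for a file, and the function name is hypothetical. It uses a 64 MB block size, which was the Hadoop 1.x default. Hadoop 2.x and later default to 128 MB.

```python
def split_into_blocks(file_size: int, block_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) for each block; only the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


MB = 1024 * 1024
# A 100 MB file with 64 MB blocks: one full block plus one 36 MB block.
print(split_into_blocks(100 * MB, 64 * MB))
```

The short final block occupies only its actual length on each DataNode's disk, because HDFS does not pad it to a full block. With a replication factor of 3, the 100 MB file consumes roughly 300 MB of raw storage across the cluster.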

The NameNode makes all decisions about block replication. It periodically receives a heartbeat and a block status report (Blockreport) from each DataNode in the cluster. Receiving a heartbeat means the DataNode is working properly. A Blockreport lists all the blocks stored on that DataNode.

☆ Replica Placement Policy

HDFS uses a strategy called rack awareness to improve data reliability, availability, and network bandwidth utilization. Large HDFS instances usually run on a cluster of computers spread across many racks, and communication between two nodes on different racks has to pass through switches. In most cases, the bandwidth between two machines in the same rack is greater than the bandwidth between two machines in different racks. Through rack awareness, the NameNode determines the rack ID of each DataNode.

A simple but unoptimized policy is to place every replica on a different rack. This prevents data loss when an entire rack fails and lets reads use the bandwidth of multiple racks. It also spreads replicas evenly across the cluster, which makes it easy to balance load when components fail. However, this policy raises the cost of writes, because each write must transfer blocks to multiple racks.

For the common case of a replication factor of 3, the default HDFS policy places the replicas as follows:

- the first replica on the writer's node, or on a random node if the writer is outside the cluster
- the second replica on a node in a different rack
- the third replica on a different node in the same rack as the second

Only two racks are involved, so this policy cuts inter-rack write traffic and improves write performance while still surviving the loss of a whole rack. To reduce overall bandwidth consumption and read latency, HDFS tries to serve each read from the replica nearest to the reader. If a replica exists on the same rack as the reader, that replica is preferred.
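The default placement rule for three replicas can be expressed as a short selection function. This Python sketch makes several assumptions:

- a hypothetical `topology` dictionary maps each DataNode to its rack ID
- a seeded `random.Random` is passed in for reproducibility
- where a constraint cannot be met (for example, a single-rack cluster), the sketch simply raises an error, whereas HDFS's real `BlockPlacementPolicyDefault` falls back to other nodes

The sketch also ignores node load and free space, which the real policy considers.

```python
import random
from typing import Optional


def choose_targets(topology: dict, writer: Optional[str], rng: random.Random) -> list:
    """Pick 3 DataNodes following the default rack-aware rule for 3 replicas.

    topology maps DataNode name -> rack ID (hypothetical input format).
    """
    nodes = sorted(topology)
    # Replica 1: the writer's own node if it is a DataNode, otherwise any node.
    first = writer if writer in topology else rng.choice(nodes)
    # Replica 2: a node on a different rack from replica 1.
    remote = [n for n in nodes if topology[n] != topology[first]]
    if not remote:
        raise ValueError("need at least two racks")
    second = rng.choice(remote)
    # Replica 3: a different node on the same rack as replica 2.
    same_rack = [n for n in nodes if topology[n] == topology[second] and n != second]
    if not same_rack:
        raise ValueError("the remote rack needs at least two nodes")
    third = rng.choice(same_rack)
    return [first, second, third]
```

Consider a hypothetical topology in which host1 to host3 sit on rack1 and host4 to host6 sit on rack2. A writer on host1 could receive targets such as host1, host4, host5: two racks are involved, and one replica is local.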

3. Fault tolerance

The main goal of HDFS is to store data reliably even when failures occur. The three common types of failure are NameNode failures, DataNode failures, and network partitions.

☆ DataNode Failures and Network Partitions

Each DataNode periodically sends a heartbeat to the NameNode. A DataNode failure or a network partition can cut some DataNodes off from the NameNode. The NameNode detects this through the missing heartbeats. It marks DataNodes without recent heartbeats as dead and stops sending new IO requests to them. Data stored on a dead DataNode is no longer available to HDFS. The death of a DataNode can push the replication factor of some blocks below their specified value. The NameNode constantly tracks which blocks need replication and starts replicating them as soon as it finds them. Re-replication may be needed in the following situations:

- A DataNode fails.

- A replica is corrupted.

- The replication factor of a file is increased.
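The following Python sketch models how heartbeats and block reports together reveal blocks that need re-replication. The class, its fields, and the timeout value used in the example are hypothetical. Real HDFS derives its dead-node timeout from configuration, and by default the timeout is around ten minutes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReplicationMonitor:
    """Toy NameNode-side view of heartbeats and block reports (hypothetical)."""
    heartbeat_timeout: float                                          # seconds of silence before "dead"
    last_heartbeat: dict[str, float] = field(default_factory=dict)   # node -> time of last heartbeat
    block_reports: dict[str, set[str]] = field(default_factory=dict) # node -> blocks it reported

    def heartbeat(self, node: str, now: float) -> None:
        self.last_heartbeat[node] = now

    def block_report(self, node: str, blocks: set[str]) -> None:
        self.block_reports[node] = set(blocks)

    def live_nodes(self, now: float) -> set[str]:
        # Nodes whose last heartbeat is recent enough; the rest are treated as dead.
        return {n for n, t in self.last_heartbeat.items()
                if now - t <= self.heartbeat_timeout}

    def under_replicated(self, target: dict[str, int], now: float) -> dict[str, int]:
        """Return block -> number of missing replicas, counting live nodes only."""
        live = self.live_nodes(now)
        missing = {}
        for block, wanted in target.items():
            have = sum(1 for n in live if block in self.block_reports.get(n, set()))
            if have < wanted:
                missing[block] = wanted - have
        return missing
```

Two of the situations listed above fall out of `under_replicated`. A node that falls silent drops out of `live_nodes`, and raising a file's replication factor raises `wanted`. In both cases the affected blocks show up as missing replicas.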

A block fetched from a DataNode may also arrive corrupted. Possible causes are faults in the DataNode's storage device, network errors, and software bugs. The HDFS client software implements checksum verification of HDFS file contents. When a client creates a new HDFS file, it computes a checksum for each block of the file. It stores these checksums in a separate hidden file in the same HDFS namespace. When a client retrieves file contents, it verifies that the data received from each DataNode matches the checksums in the corresponding checksum file. If the data does not match, the client can fetch that block from another DataNode that holds a replica.
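The client-side check can be sketched as follows, with two assumptions to flag. First, HDFS checksums fixed-size chunks (512 bytes by default, set by `dfs.bytes-per-checksum`) with CRC32C by default. Python's standard library offers only plain CRC32 (`zlib.crc32`), so this sketch substitutes it. Second, the function names and the `replicas` dictionary are hypothetical.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # chunk size assumed for this sketch


def chunk_checksums(data: bytes, chunk: int = BYTES_PER_CHECKSUM) -> list:
    """CRC32 of every fixed-size chunk (the last chunk may be shorter)."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def read_verified(replicas: dict, expected: list) -> tuple:
    """Try replicas in order and return (node, data) for the first one that verifies."""
    for node, data in replicas.items():
        if chunk_checksums(data) == expected:
            return node, data
        # Mismatch: this replica is corrupt, so fall back to the next DataNode.
    raise IOError("all replicas failed checksum verification")
```

Checksumming small chunks rather than whole blocks lets a reader verify each piece as it arrives. It also lets the reader pinpoint where corruption occurred.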

☆ NameNode Failures

The FsImage and EditLog are the core data structures of HDFS. If these files are corrupted, the entire HDFS instance can become non-functional. For this reason, the NameNode can be configured to maintain multiple copies of the FsImage and EditLog, and any update to either file is applied synchronously to every copy. When the NameNode restarts, it selects the latest consistent FsImage and EditLog to use.
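The following Python sketch models the two rules described above: every edit is written to all healthy copies, and a restart picks the most recent consistent copy. It is a simplified model. Each metadata directory is reduced to a last-transaction number and a consistency flag, and the class, the function names, and the example paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageDir:
    """One configured metadata directory (hypothetical model)."""
    path: str
    last_txid: int    # highest transaction fully written to this copy
    consistent: bool  # False if this copy is damaged or half-written


def log_edit(dirs: list, txid: int) -> None:
    # Synchronously apply the edit to every healthy copy before acknowledging it.
    for d in dirs:
        if d.consistent:
            d.last_txid = txid


def pick_on_restart(dirs: list) -> StorageDir:
    """Choose the most recent copy among those that are still consistent."""
    candidates = [d for d in dirs if d.consistent]
    if not candidates:
        raise RuntimeError("no usable FsImage/EditLog copy")
    return max(candidates, key=lambda d: d.last_txid)
```

Placing one of the copies on a different disk or a remote mount means that a single disk failure still leaves a usable copy. The synchronous updates make every write a little slower, which the HDFS design accepts for the sake of metadata safety.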

Four. HDFS Workflow

The workflow is divided into write operations and read operations. Let's look at the write operation first.

1. Write operation

Suppose a client wants to upload a file, FileA, to HDFS. The replication factor is 3, and the HDFS cluster spans 3 racks.

Detailed step analysis:

1. The client splits FileA into two blocks, Block1 and Block2, using the default block size (64 MB in Hadoop 1.x; 128 MB in Hadoop 2.x and later).
2. The client asks the NameNode for permission to upload FileA. The NameNode checks whether the target file already exists and whether its parent directory exists.
3. The NameNode replies whether the upload may proceed.
4. The client asks the NameNode which DataNodes should receive the first block, Block1.
5. The NameNode returns 3 DataNodes:
Block1: host1, host4, host5
6. The client starts uploading Block1 to host1 in units of packets. The data is first read from disk into a local memory buffer.
7. When host1 receives the first packet, it forwards the packet to host4 while the client sends the second packet to host1.
8. When host4 receives the first packet, it forwards the packet to host5 and receives the second packet from host1.
9. This continues until Block1 has been sent completely, as shown by the red line in the figure.
10. host1, host4, and host5 notify the NameNode, and host1 tells the client that the data has been sent, as shown by the yellow line in the figure.
11. After receiving the message from host1, the client tells the NameNode that the Block1 transfer is complete, as shown by the blue line.
12. Once Block1 is complete, the client asks the NameNode where to upload Block2.
13. The NameNode returns 3 DataNodes:
Block2: host3, host9, host8
14. The client uploads Block2 to host3, host9, and host8, following the same process as for Block1.
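Steps 6 to 9 describe a pipeline: every node forwards a packet downstream while it receives the next packet from upstream. The following Python sketch simulates this in discrete time steps, with packets numbered from 0 and an idealized model in which each hop takes exactly one step. HDFS write packets are 64 KB by default (`dfs.client-write-packet-size`), and the function name is hypothetical.

```python
def pipeline_schedule(num_packets: int, pipeline: list) -> list:
    """Simulate pipelined replication in discrete time steps.

    At step t, the i-th node in the pipeline receives packet t - i (if it exists):
    the client sends packet t to the first node while each node forwards the
    packet it received in the previous step to the next node.
    """
    steps = []
    for t in range(num_packets + len(pipeline) - 1):
        arrivals = []
        for i, node in enumerate(pipeline):
            packet = t - i
            if 0 <= packet < num_packets:
                arrivals.append((node, packet))
        steps.append(arrivals)
    return steps


for t, arrivals in enumerate(pipeline_schedule(3, ["host1", "host4", "host5"])):
    print(t, arrivals)
```

With P packets and N DataNodes, the pipeline finishes in P + N - 1 steps rather than the P × N steps that three separate copies from the client would need. The client also uploads each packet only once, regardless of the replication factor.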

2. Read operation

Detailed step analysis:

1. The client communicates with the NameNode to query the metadata and find the DataNodes that hold the file's blocks.
2. For each block, the client picks a DataNode (nearest first, then at random) and requests a socket stream.
3. The DataNode starts sending data.
4. The client receives the data in units of packets, caches it locally, and then writes it to the destination file.
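The "nearest first, then random" rule in step 2 can be sketched as a sort by network distance with random tie-breaking. This Python sketch borrows the distance convention of Hadoop's network topology for a single data center: 0 for the same node, 2 for the same rack, and 4 for a different rack. The rack map and the function names are hypothetical.

```python
import random


def distance(a: str, b: str, rack_of: dict) -> int:
    """Same node 0, same rack 2, different rack 4 (single data center assumed)."""
    if a == b:
        return 0
    if rack_of.get(a) is not None and rack_of.get(a) == rack_of.get(b):
        return 2
    return 4


def order_replicas(reader: str, replicas: list, rack_of: dict, rng: random.Random) -> list:
    """Closest replicas first; ties are broken randomly to spread the load."""
    shuffled = list(replicas)
    rng.shuffle(shuffled)
    # sorted() is stable, so the random order survives among equal distances.
    return sorted(shuffled, key=lambda n: distance(reader, n, rack_of))
```

A reader outside the cluster has no rack entry, so every replica is at distance 4 and the choice is purely random. Random tie-breaking keeps many readers from all hitting the same DataNode.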

Five. Application Scenarios

☆ HDFS provides high-throughput data access for applications with large datasets. Common scenarios include:

1. Data-intensive parallel computing: very large data volumes with relatively simple computation, such as large-scale web search;

2. Compute-intensive parallel computing: relatively modest data volumes with complex computation, such as 3D modeling and rendering, weather forecasting, and scientific computing;

3. Hybrid data-intensive and compute-intensive parallel computing, such as 3D movie rendering.

☆ HDFS has the following limitations:

1. HDFS is not suited to storing large numbers of small files. The NameNode keeps the file system metadata in memory, so the number of files that can be stored is limited by the NameNode's memory size;

2. HDFS is designed for high throughput, not for low-latency access;

3. HDFS uses streaming reads and supports neither multiple writers on one file (a file can be written by only one client at a time) nor writes at arbitrary offsets (no random writes);

4. HDFS is best suited to write-once, read-many workloads.
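Limitation 1 is easy to quantify with a commonly quoted rule of thumb: the NameNode holds each file, directory, and block in memory as an object of roughly 150 bytes. The constant is an approximation rather than an exact figure. The sketch also ignores directories and assumes non-empty files, and its function name is hypothetical.

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb NameNode heap cost per file or block object


def namenode_heap_bytes(num_files: int, avg_file_size: int, block_size: int) -> int:
    """Rough heap estimate: one object per file plus one object per block."""
    # Ceiling division; assumes non-empty files, so at least one block each.
    blocks_per_file = max(1, -(-avg_file_size // block_size))
    return (num_files + num_files * blocks_per_file) * BYTES_PER_OBJECT


MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

# The same 1 TB of data, stored as small files versus large files (128 MB blocks).
small = namenode_heap_bytes(TB // MB, MB, 128 * MB)
large = namenode_heap_bytes(TB // GB, GB, 128 * MB)
print(small, large)
```

Storing 1 TB as 1 MB files needs about 300 MiB of NameNode heap for metadata. Storing the same terabyte as 1 GB files needs about 1.3 MiB, a difference of more than 200 times for the same amount of data.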

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
