Distributed File System: HDFS


HDFS

The core of Hadoop is HDFS and MapReduce. HDFS was developed based on the design concepts of GFS (the Google File System).

HDFS stands for Hadoop Distributed File System. HDFS is designed for streaming access to large files: it works well for data sets of hundreds of MB, GB, or TB that are written once and read many times. It is not suitable for low-latency data access, large numbers of small files, concurrent writers, or arbitrary file modification.

Advantages:

1) Suitable for storing very large files.

2) Suitable for streaming data access, that is, the "write once, read many times" data processing pattern.

3) Suitable for deployment on inexpensive commodity machines.

Disadvantages:

1) Not suitable for storing a large number of small files, because the number of files is limited by the namenode's memory.

2) Not suitable for real-time data access: high throughput and low latency are at odds, and HDFS chooses the former.

3) Not suitable for workloads that need to modify data frequently.

Data Block:

Each disk has a default data block size, typically 512 bytes, which is the smallest unit for reading and writing on the disk. HDFS also has the concept of a block, but it is much larger: 64 MB by default (128 MB in newer versions). As in a single-disk file system, files on HDFS are divided into block-sized chunks. One difference is that a file smaller than one block does not occupy the space of an entire block.
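To make the block layout concrete, here is a minimal sketch using the Hadoop Java FileSystem API to print a file's block size and the datanodes that hold each block. The path /data/sample.txt is a placeholder, and the sketch assumes the cluster configuration files (core-site.xml, hdfs-site.xml) are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        // Reads cluster settings from the classpath configuration files.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // "/data/sample.txt" is a placeholder path; replace it with a real file.
        Path file = new Path("/data/sample.txt");
        FileStatus status = fs.getFileStatus(file);

        // Block size used for this file (e.g. 64 MB or 128 MB, depending on the cluster).
        System.out.println("Block size: " + status.getBlockSize() + " bytes");

        // Each BlockLocation describes one block and the datanodes holding its replicas.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.println("offset=" + b.getOffset()
                    + " length=" + b.getLength()
                    + " hosts=" + String.join(",", b.getHosts()));
        }
        fs.close();
    }
}
```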

Benefits of the block abstraction in a distributed file system:

1) A file can be larger than the capacity of any single disk in the network. Because a file's blocks do not all have to be stored on the same disk, any disk in the cluster can be used to store parts of the file.

(Isn't that exactly what we want?)

2) Using the abstract block, rather than the whole file, as the storage unit simplifies the design and also makes block replication straightforward.

HDFS uses a master/slave architecture that consists of the following four parts:

1. Client

The client is the application code we write against the HDFS interface.

2. Namenode

An HDFS cluster has a single namenode, which alone stores the metadata for every file in the cluster. The metadata is persisted on the local disk in the fsimage and editlog files. Clients use this metadata to locate the files they want. In addition, the namenode monitors the health of the datanodes; once a datanode is found to have failed, it is removed from the cluster and its blocks are re-replicated to other datanodes.
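As a small illustration of the namenode's role, the hedged sketch below lists directory metadata with the Hadoop FileSystem API; such a listing is answered entirely from the namenode's metadata, without contacting any datanode, because no file data is read. The root path / is used only as an example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListMetadata {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // listStatus is served from the namenode's metadata alone:
        // no datanode is involved because no block data is read.
        for (FileStatus st : fs.listStatus(new Path("/"))) {
            System.out.println((st.isDirectory() ? "dir  " : "file ")
                    + st.getPath()
                    + "  replication=" + st.getReplication()
                    + "  length=" + st.getLen());
        }
        fs.close();
    }
}
```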

3. Secondary namenode

The secondary namenode periodically merges the namenode's fsimage and editlog. Note that it is not a hot standby of the namenode, so the namenode remains a single point of failure. Its main purpose is to take over part of the namenode's work, especially the memory-intensive merge, because memory is a precious resource on the namenode.

4. Datanode

Datanodes are responsible for actually storing the data. When a file is uploaded to an HDFS cluster, it is split into blocks that are distributed across the datanodes. To ensure reliability, each block is written to multiple datanodes (3 by default).
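The replication factor can also be influenced from the client side. Below is a rough sketch, assuming the standard dfs.replication property and the FileSystem.setReplication call; the path /data/sample.txt and the factor of 2 are placeholders for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // dfs.replication controls how many datanodes each block is written to;
        // the usual cluster-wide default is 3.
        conf.setInt("dfs.replication", 2);
        FileSystem fs = FileSystem.get(conf);

        // setReplication changes the factor for an existing file; the namenode
        // then schedules extra copies or deletions in the background.
        // "/data/sample.txt" is a placeholder path.
        fs.setReplication(new Path("/data/sample.txt"), (short) 2);
        fs.close();
    }
}
```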

 

How are files stored?

A file to be stored is divided into multiple blocks that are stored on datanodes. Except for the last block, all blocks of a file are the same size, typically 64 MB or 128 MB. Each block is replicated (3 copies by default), which costs extra space in exchange for reliability.

Replica placement: in most cases the replication factor is 3. The default HDFS placement policy stores one replica on a node in the local rack, one replica on a node in a different rack, and the third replica on another node in that same remote rack.

HDFS communication protocol

All HDFS communication protocols are built on top of TCP/IP. A client connects to the namenode through a configurable port and talks to it over the ClientProtocol; datanodes talk to the namenode over the DatanodeProtocol. In addition, each datanode periodically sends heartbeats and block reports to the namenode to maintain communication. A block report describes the blocks the datanode holds, including the file each block belongs to, the block ID, and the modification time. The namenode's mapping between datanodes and blocks is built from these block reports when the system starts. Both ClientProtocol and DatanodeProtocol are abstracted as remote procedure calls (RPC). By design, the namenode never initiates an RPC itself; it only responds to RPC requests from clients and datanodes.
(When we configure Hadoop, we can see that all access goes through an IP address and a port number.)
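For example, a client can point at the namenode's RPC address explicitly. In the sketch below, hdfs://namenode-host:9000 is a placeholder address; the actual hostname and port come from your cluster's fs.defaultFS setting.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ConnectExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS names the namenode's RPC address; "namenode-host:9000"
        // is a placeholder -- use the address configured for your cluster.
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");

        // All subsequent metadata calls on this FileSystem go to that
        // namenode over the ClientProtocol RPC interface.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:9000"), conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}
```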

The file reading process is as follows (a minimal code sketch follows the list):

  1. The client uses the client library provided by HDFS to send an RPC request to the remote namenode;
  2. The namenode returns part or all of the file's block list as needed; for each block, it also returns the addresses of the datanodes that hold a replica of that block;
  3. The client selects the datanode closest to it to read each block; if the client itself is a datanode holding the block, it reads the data directly from the local machine;
  4. After finishing the current block, the client closes the connection to that datanode and finds the best datanode for the next block;
  5. When all blocks in the current list have been read but the file is not finished, the client library requests the next batch of block locations from the namenode;
  6. Each block is verified against its checksum after it is read. If an error occurs while reading from a datanode, the client notifies the namenode and continues reading from the next datanode that holds a replica of that block.
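The steps above are what happens under the hood of an ordinary read. A minimal sketch with the Hadoop Java API is shown below; the path /data/sample.txt is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // open() asks the namenode for the file's block locations (steps 1-2);
        // the returned stream then reads each block from the nearest datanode
        // and verifies checksums as it goes (steps 3-6).
        // "/data/sample.txt" is a placeholder path.
        try (FSDataInputStream in = fs.open(new Path("/data/sample.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}
```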

The file writing process is as follows (a minimal code sketch follows the list):

  1. The client uses the client library provided by HDFS to send an RPC request to the remote namenode;
  2. The namenode checks whether the file to be created already exists and whether the creator has permission to create it. If the checks pass, it creates a record for the new file; otherwise it throws an exception back to the client;
  3. When the client starts writing the file, the client library splits it into packets, manages them internally in a "data queue", and asks the namenode for new blocks, receiving a list of suitable datanodes to store the replicas. The size of this list depends on the replication setting on the namenode;
  4. Packets are written to all replicas through a pipeline: the client library streams each packet to the first datanode, which stores it and forwards it to the next datanode in the pipeline, and so on until the last datanode. This is the pipelined write;
  5. When the last datanode has stored a packet, an ACK packet is sent back along the pipeline to the client. The client library maintains an "ack queue"; when the ACK returned by the datanodes arrives, the corresponding packet is removed from the ack queue;
  6. If a datanode fails during the transfer, the current pipeline is closed and the failed datanode is removed from it. The remaining data continues to be written through the remaining datanodes, and the namenode assigns a new datanode to restore the configured number of replicas.
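Again, the client library hides all of this pipelining behind an ordinary output stream. A minimal write sketch follows; the path /data/output.txt is a placeholder.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // create() asks the namenode to record the new file (steps 1-2); the
        // stream then splits written data into packets and pushes them through
        // the datanode pipeline described above (steps 3-6).
        // "/data/output.txt" is a placeholder path.
        try (FSDataOutputStream out = fs.create(new Path("/data/output.txt"))) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```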
