HDFS Core Principle


2016-01-11, Du Yishu

HDFS (Hadoop Distributed File System) is a distributed file system.

A file system is the disk space management service provided by the operating system. We only need to specify where to put a file and which path to read it from; we do not need to care how the file is stored on disk.

What happens when a file needs more space than the local disk provides?

One option is to add disks, but there is a limit to how far that can go.

The other is to add machines and provide networked storage through remotely shared directories. This can be seen as the prototype of a distributed file system: different files are placed on different machines, and when space runs out more machines are added, which removes the limit on storage space.

But this approach has several problems:

(1) The load on a single machine may be extremely high

For example, if a file is hot and many users read it frequently, the machine that holds it comes under very heavy access pressure.

(2) Data is not safe

If the machine that holds a file fails, the file becomes inaccessible, so reliability is poor.

(3) Files are difficult to organize

For example, to move some files to a different location, we must check whether the target machine has enough space and keep track of where each file lives. With many machines, this becomes extremely complex.

The HDFS solution

HDFS is an abstraction layer. Underneath, it relies on many independent servers; to the outside, it provides unified file management. To the user it feels like operating a single machine, and the number of servers under HDFS is not visible.

For example, when a user accesses the file /a/b/c.mpg in HDFS, HDFS reads it from the corresponding underlying servers and returns it to the user. The user only deals with HDFS and does not need to care how the file is stored.


Write File Example

For example, a user needs to save the file /a/b/xxx.avi.

HDFS first splits the file into blocks, for example 4 of them, and then places the blocks on separate servers.

This has two advantages: a file that is too large for one server is no longer a problem, and the load of reading the file is not concentrated on a single server.
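The splitting step can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_blocks` and the 3-byte block size are assumptions made for the example, and real HDFS uses a configurable block size that is far larger than this toy value.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# Toy example: a 10-byte "file" with a 3-byte block size gives 4 blocks.
blocks = split_into_blocks(b"0123456789", 3)
print(blocks)  # [b'012', b'345', b'678', b'9']
```

Only the last block may be shorter than the block size, and joining the blocks in order gives back the original file.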

But if one server breaks, the file can no longer be read in full.

HDFS therefore keeps multiple copies of each file block to ensure reliability. With four servers A, B, C and D, the blocks could be placed like this:

Block 1: A B C
Block 2: A B D
Block 3: B C D
Block 4: A C D


This greatly improves the reliability of the file: even if one server breaks, the file can still be read completely.
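The claim can be checked against the placement table from the example. The sketch below is a toy model, not HDFS code; the helper name `readable_blocks` is made up for illustration.

```python
# Replica placement copied from the example: block number -> servers holding it.
placement = {
    1: {"A", "B", "C"},
    2: {"A", "B", "D"},
    3: {"B", "C", "D"},
    4: {"A", "C", "D"},
}


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live server."""
    return sorted(block for block, holders in placement.items() if holders - failed)


all_servers = set().union(*placement.values())
for server in sorted(all_servers):
    # Simulate the failure of exactly one server.
    print(server, "down ->", readable_blocks(placement, {server}))
```

Each of the four lines printed lists blocks 1 to 4, so no single failure loses data. With three copies per block, this particular layout also survives any two servers failing at once; a block is lost only when all three servers that hold it are down.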

It also brings another big benefit: better concurrent access to the file. For example, when multiple users read block 1 of this file, HDFS can choose which server to read the block from according to how busy each server is.
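A minimal sketch of that choice, assuming the load of each server is known as a simple number. The `load` values and the function `pick_replica` are hypothetical; a real cluster weighs other factors as well, such as how close a server is to the client on the network.

```python
def pick_replica(replicas, load):
    """Pick the server with the lowest current load among those holding the block."""
    return min(replicas, key=lambda server: load[server])


# Hypothetical number of reads currently being served by each server.
load = {"A": 12, "B": 3, "C": 7, "D": 0}

# Block 1 lives on A, B and C, so D is not a candidate even though it is idle.
print(pick_replica(["A", "B", "C"], load))  # B
```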

Metadata management

What files are stored in HDFs?

What blocks are the files divided into?

On which server is each block placed?

......

This information is called metadata. It is abstracted into a directory tree that records these complex relationships.

The metadata is managed by a separate module called the NameNode.

The servers that actually hold the file blocks are called DataNodes.

So the process of a user accessing HDFS can be understood as:

User -> HDFS -> NameNode -> DataNode
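The division of labor between the two roles can be illustrated with a small in-memory model. Everything here is an assumption made for the sketch: the class and method names are invented, block ids are built from the path, and the reader always uses the first replica. It is not the HDFS API.

```python
class NameNode:
    """Keeps metadata only: the blocks of each file and where the replicas live."""

    def __init__(self):
        self.files = {}      # path -> ordered list of block ids
        self.locations = {}  # block id -> list of DataNode names

    def register(self, path, block_id, nodes):
        self.files.setdefault(path, []).append(block_id)
        self.locations[block_id] = list(nodes)

    def lookup(self, path):
        return [(block_id, self.locations[block_id]) for block_id in self.files[path]]


class DataNode:
    """Stores the actual block contents."""

    def __init__(self):
        self.blocks = {}  # block id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


def write_file(namenode, datanodes, path, chunks, placement):
    """Store each chunk on its assigned DataNodes and record the metadata."""
    for index, (chunk, nodes) in enumerate(zip(chunks, placement), start=1):
        block_id = f"{path}#{index}"
        for name in nodes:
            datanodes[name].store(block_id, chunk)
        namenode.register(path, block_id, nodes)


def read_file(namenode, datanodes, path):
    """Ask the NameNode where the blocks are, then fetch them in order."""
    parts = []
    for block_id, nodes in namenode.lookup(path):
        parts.append(datanodes[nodes[0]].read(block_id))  # first replica, for simplicity
    return b"".join(parts)


namenode = NameNode()
datanodes = {name: DataNode() for name in "ABCD"}
write_file(
    namenode, datanodes, "/a/b/xxx.avi",
    chunks=[b"aa", b"bb", b"cc", b"dd"],
    placement=["ABC", "ABD", "BCD", "ACD"],
)
print(read_file(namenode, datanodes, "/a/b/xxx.avi"))  # b'aabbccdd'
```

Note that the NameNode in this model never touches file contents. It only answers the question of which blocks make up a path and where they are; the bytes themselves come from the DataNodes.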

HDFS Benefits

(1) Capacity can be scaled linearly

(2) Thanks to the replica mechanism, storage is highly reliable and throughput is increased

(3) With the NameNode, the user accesses a file simply by specifying its path in HDFS
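As a rough illustration of benefit (1), usable capacity grows in proportion to the number of servers, with each block stored several times. The figures below (12 TB of disk per server, 3 copies per block) are assumptions for the example, and the calculation ignores any space reserved for other purposes.

```python
def usable_capacity_tb(servers: int, disk_tb_per_server: float, replication: int) -> float:
    """Raw capacity divided by the number of copies kept of each block."""
    return servers * disk_tb_per_server / replication


for servers in (4, 8, 16):
    # Doubling the number of servers doubles the usable capacity.
    print(servers, "servers ->", usable_capacity_tb(servers, 12, 3), "TB")
```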
