One of the two main cores of Hadoop: HDFS summary


What is HDFS?

HDFS (Hadoop Distributed File System) is a file system that allows files to be shared across multiple hosts on a network, letting multiple users on multiple machines share files and storage space.

Characteristics:

1. Transparency. Files are actually accessed over the network, but from the program's and the user's point of view it looks just like accessing a local disk.

2. Fault tolerance. Even if some nodes in the system go offline, the system as a whole can keep running without losing data.

Applicable scenarios:

HDFS suits write-once, read-many workloads; it does not support concurrent writers, and it is a poor fit for large numbers of small files.
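As a hedged illustration of that write-once/read-many pattern, here is a minimal sketch using the Hadoop FileSystem client API; the NameNode URI hdfs://namenode:9000 and the path /demo/once.txt are assumptions for the example, not values from this article.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteOnceReadMany {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // hdfs://namenode:9000 is an assumed cluster address.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
            Path file = new Path("/demo/once.txt");

            // Write the file once; HDFS has no support for concurrent writers.
            FSDataOutputStream out = fs.create(file);
            out.writeUTF("written once");
            out.close();

            // Read it back as many times as needed.
            FSDataInputStream in = fs.open(file);
            System.out.println(in.readUTF());
            in.close();
            fs.close();
        }
    }

Note how the client code looks like ordinary local file I/O: that is the "transparency" described above.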

The architecture of HDFS

Master-slave structure

Master node, exactly one: the NameNode

Slave nodes, many: the DataNodes

The NameNode is responsible for:

Receiving users' operation requests

Maintaining the directory structure of the file system

Managing the mapping from files to blocks and from blocks to DataNodes (see the sketch just below)
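To make that file-to-block-to-DataNode mapping concrete, here is a small, hedged sketch that asks the NameNode for a file's block locations through the client API; it assumes the cluster address is configured on the classpath and reuses the example path from above.

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockMapping {
        public static void main(String[] args) throws Exception {
            // Assumes the default file system is configured on the classpath.
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path("/demo/once.txt"));
            // The NameNode resolves the file into blocks and, for each block,
            // reports the DataNodes holding its replicas.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.println("offset " + b.getOffset() + ", length " + b.getLength()
                        + ", hosts " + Arrays.toString(b.getHosts()));
            }
            fs.close();
        }
    }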

The DataNode is responsible for:

Storing files

Files are partitioned into blocks, which are stored on disk

To keep the data safe, each file is stored as multiple replicas (see the sketch after this list)
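The replica count is governed by the dfs.replication property (default 3 in Hadoop 1.x) and can also be changed per file. A minimal sketch, with the example path and the per-file value of 5 purely illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setInt("dfs.replication", 3); // copies created for each new block
            FileSystem fs = FileSystem.get(conf);
            // Raise the replica count of one existing file after the fact.
            fs.setReplication(new Path("/demo/once.txt"), (short) 5);
            fs.close();
        }
    }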

NameNode (can be understood as the boss)

The NameNode is the management node of the entire file system. It maintains the file system's directory tree, the metadata of every file and directory, and the list of data blocks belonging to each file. It receives users' operation requests.

Its files include (all three are stored in the underlying Linux file system):

fsimage: the metadata image file, which stores a snapshot of the NameNode's in-memory metadata as of some point in time.

edits: the operation log file.

fstime: records the time of the last checkpoint.

Working characteristics:

1. The NameNode always keeps the metadata in memory, where it serves "read requests".

2. When a "write request" arrives, the NameNode first writes the edit log to disk, appending it to the edits file; only after that write succeeds does it modify the in-memory metadata and return to the client.

3. Hadoop maintains an fsimage file, an image of the metadata held by the NameNode. fsimage is not kept consistent with the in-memory metadata at every moment; instead, it is brought up to date periodically by merging in the edits file. The Secondary NameNode performs this merge of fsimage and edits to refresh the NameNode's metadata image.

DataNode (can be understood as the younger brother)

The DataNode provides the storage service for the actual file data.

The most basic storage unit is the block (file block); the default size is 64 MB.
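In Hadoop 1.x that default comes from the dfs.block.size property (64 MB = 67,108,864 bytes). A hedged sketch of overriding it on the client side, with 128 MB as an arbitrary example value:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class BlockSizeSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Default is 64 MB; override for files created by this client.
            conf.setLong("dfs.block.size", 128L * 1024 * 1024);
            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.getDefaultBlockSize()); // prints 134217728
            fs.close();
        }
    }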

Secondary NameNode (can be understood as the boss's assistant)

A solution in the direction of HA (High Availability); note, however, that it is not a hot standby. By default it is installed on the same node as the NameNode, but that is ... not safe!

(In a production environment, installing it on a separate machine is recommended.)

Execution process:

It downloads the metadata files (fsimage and edits) from the NameNode, merges the two to generate a new fsimage, saves it locally, and pushes it back to the NameNode, replacing the old fsimage.

Workflow:

1. The Secondary NameNode asks the NameNode to roll over to a new edits file.

2. The Secondary NameNode fetches fsimage and edits from the NameNode (via HTTP).

3. The Secondary NameNode loads fsimage into memory and then merges in the edits.

4. The Secondary NameNode sends the new fsimage back to the NameNode.

5. The NameNode replaces the old fsimage with the new one.
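For reference, this checkpoint cycle is triggered by two Hadoop 1.x properties; the sketch below shows them with what I believe are their default values (one hour, or 64 MB of accumulated edits, whichever comes first):

    import org.apache.hadoop.conf.Configuration;

    public class CheckpointSettings {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Merge every hour ...
            conf.setLong("fs.checkpoint.period", 3600);
            // ... or sooner, once the edits file reaches 64 MB.
            conf.setLong("fs.checkpoint.size", 64L * 1024 * 1024);
            System.out.println(conf.get("fs.checkpoint.period"));
        }
    }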

The entire architecture of Hadoop is built on top of RPC.

RPC (Remote Procedure Call) works in client/server mode. It is a protocol by which a program requests a service from a program on a remote computer over the network, without needing to understand the underlying network technologies.

Concrete process:

First, the client process sends a call message carrying the procedure's parameters to the server process, then waits for a reply. On the server side, the process stays dormant until a call message arrives. When one arrives, the server extracts the procedure's parameters, computes the result, sends the reply message, and then waits for the next call. Finally, the client receives the reply, extracts the procedure's result, and resumes execution.

The object the server exposes must be declared through an interface, and that interface must extend VersionedProtocol.

The methods the client can invoke must be declared in that interface.
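Here is a minimal, hedged sketch of that pattern using Hadoop 1.x's RPC classes; EchoProtocol, the port 9527, and the message are all made up for illustration and are not part of Hadoop's own protocols.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.ipc.RPC;
    import org.apache.hadoop.ipc.VersionedProtocol;

    // The server's object is declared through an interface that
    // extends VersionedProtocol, as described above.
    interface EchoProtocol extends VersionedProtocol {
        long versionID = 1L;
        String echo(String message) throws IOException;
    }

    class EchoServer implements EchoProtocol {
        public String echo(String message) { return "echo: " + message; }
        public long getProtocolVersion(String protocol, long clientVersion) {
            return versionID;
        }
    }

    public class RpcDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Server side: publish the implementation object on a port.
            RPC.Server server = RPC.getServer(new EchoServer(), "0.0.0.0", 9527, conf);
            server.start();
            // Client side: obtain a proxy and call the remote method
            // as if it were local.
            EchoProtocol proxy = (EchoProtocol) RPC.waitForProxy(
                    EchoProtocol.class, EchoProtocol.versionID,
                    new InetSocketAddress("localhost", 9527), conf);
            System.out.println(proxy.echo("hello"));
            RPC.stopProxy(proxy);
            server.stop();
        }
    }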

Source: http://m.oschina.net/blog/212102
