One of the two main cores of Hadoop: HDFS summary


What is HDFS?

HDFS (Hadoop Distributed File System) is a file system that allows files to be shared across multiple hosts on a network, letting multiple users on multiple machines share files and storage space.

Characteristics:

1. Transparency. Files are actually accessed over the network, but from the program's and the user's point of view it looks just like accessing a local disk.

2. Fault tolerance. Even if some nodes in the system go offline, the system as a whole can keep running without losing data.

Applicable scenarios:

HDFS suits write-once, read-many workloads; it does not support concurrent writers, and it is a poor fit for large numbers of small files.
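As a hedged illustration of that write-once/read-many pattern, here is a minimal sketch using the Hadoop FileSystem client API; the NameNode URI hdfs://namenode:9000 and the path /demo/once.txt are assumptions for the example, not values from this article.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteOnceReadMany {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // hdfs://namenode:9000 is an assumed cluster address.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
            Path file = new Path("/demo/once.txt");

            // Write the file once; HDFS has no support for concurrent writers.
            FSDataOutputStream out = fs.create(file);
            out.writeUTF("written once");
            out.close();

            // Read it back as many times as needed.
            FSDataInputStream in = fs.open(file);
            System.out.println(in.readUTF());
            in.close();
            fs.close();
        }
    }

Note how the client code looks like ordinary local file I/O: that is the "transparency" described above.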

The architecture of HDFS

Master-slave structure

Master node, exactly one: the NameNode

Slave nodes, many: the DataNodes

The NameNode is responsible for:

Receiving users' operation requests

Maintaining the directory structure of the file system

Managing the mapping from files to blocks and from blocks to DataNodes (see the sketch just below)
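To make that file-to-block-to-DataNode mapping concrete, here is a small, hedged sketch that asks the NameNode for a file's block locations through the client API; it assumes the cluster address is configured on the classpath and reuses the example path from above.

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockMapping {
        public static void main(String[] args) throws Exception {
            // Assumes the default file system is configured on the classpath.
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path("/demo/once.txt"));
            // The NameNode resolves the file into blocks and, for each block,
            // reports the DataNodes holding its replicas.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.println("offset " + b.getOffset() + ", length " + b.getLength()
                        + ", hosts " + Arrays.toString(b.getHosts()));
            }
            fs.close();
        }
    }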

The DataNode is responsible for:

Storing files

Files are partitioned into blocks, which are stored on disk

To keep the data safe, each file is stored as multiple replicas (see the sketch after this list)
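The replica count is governed by the dfs.replication property (default 3 in Hadoop 1.x) and can also be changed per file. A minimal sketch, with the example path and the per-file value of 5 purely illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setInt("dfs.replication", 3); // copies created for each new block
            FileSystem fs = FileSystem.get(conf);
            // Raise the replica count of one existing file after the fact.
            fs.setReplication(new Path("/demo/once.txt"), (short) 5);
            fs.close();
        }
    }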

NameNode (can be understood as the boss)

The NameNode is the management node of the entire file system. It maintains the file system's directory tree, the metadata of every file and directory, and the list of data blocks belonging to each file. It receives users' operation requests.

Its files include (all three are stored in the underlying Linux file system):

fsimage: the metadata image file, which stores a snapshot of the NameNode's in-memory metadata as of some point in time.

edits: the operation log file.

fstime: records the time of the last checkpoint.

Working characteristics:

1. The NameNode always keeps the metadata in memory, where it serves "read requests".

2. When a "write request" arrives, the NameNode first writes the edit log to disk, appending it to the edits file; only after that write succeeds does it modify the in-memory metadata and return to the client.

3. Hadoop maintains an fsimage file, an image of the metadata held by the NameNode. fsimage is not kept consistent with the in-memory metadata at every moment; instead, it is brought up to date periodically by merging in the edits file. The Secondary NameNode performs this merge of fsimage and edits to refresh the NameNode's metadata image.

DataNode (can be understood as the younger brother)

The DataNode provides the storage service for the actual file data.

The most basic storage unit is the block (file block); the default size is 64 MB.
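In Hadoop 1.x that default comes from the dfs.block.size property (64 MB = 67,108,864 bytes). A hedged sketch of overriding it on the client side, with 128 MB as an arbitrary example value:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class BlockSizeSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Default is 64 MB; override for files created by this client.
            conf.setLong("dfs.block.size", 128L * 1024 * 1024);
            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.getDefaultBlockSize()); // prints 134217728
            fs.close();
        }
    }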

Secondary NameNode (can be understood as the boss's assistant)

A solution in the direction of HA (High Availability); note, however, that it is not a hot standby. By default it is installed on the same node as the NameNode, but that is ... not safe!

(In a production environment, installing it on a separate machine is recommended.)

Execution process:

It downloads the metadata files (fsimage and edits) from the NameNode, merges the two to generate a new fsimage, saves it locally, and pushes it back to the NameNode, replacing the old fsimage.

Workflow:

1. The Secondary NameNode asks the NameNode to roll over to a new edits file.

2. The Secondary NameNode fetches fsimage and edits from the NameNode (via HTTP).

3. The Secondary NameNode loads fsimage into memory and then merges in the edits.

4. The Secondary NameNode sends the new fsimage back to the NameNode.

5. The NameNode replaces the old fsimage with the new one.
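For reference, this checkpoint cycle is triggered by two Hadoop 1.x properties; the sketch below shows them with what I believe are their default values (one hour, or 64 MB of accumulated edits, whichever comes first):

    import org.apache.hadoop.conf.Configuration;

    public class CheckpointSettings {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Merge every hour ...
            conf.setLong("fs.checkpoint.period", 3600);
            // ... or sooner, once the edits file reaches 64 MB.
            conf.setLong("fs.checkpoint.size", 64L * 1024 * 1024);
            System.out.println(conf.get("fs.checkpoint.period"));
        }
    }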

The entire architecture of Hadoop is built on top of RPC.

RPC (Remote Procedure Call) works in client/server mode. It is a protocol by which a program requests a service from a program on a remote computer over the network, without needing to understand the underlying network technologies.

Concrete process:

First, the client process sends a call message carrying the procedure's parameters to the server process, then waits for a reply. On the server side, the process stays dormant until a call message arrives. When one arrives, the server extracts the procedure's parameters, computes the result, sends the reply message, and then waits for the next call. Finally, the client receives the reply, extracts the procedure's result, and resumes execution.

The object the server exposes must be declared through an interface, and that interface must extend VersionedProtocol.

The methods the client can invoke must be declared in that interface.
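Here is a minimal, hedged sketch of that pattern using Hadoop 1.x's RPC classes; EchoProtocol, the port 9527, and the message are all made up for illustration and are not part of Hadoop's own protocols.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.ipc.RPC;
    import org.apache.hadoop.ipc.VersionedProtocol;

    // The server's object is declared through an interface that
    // extends VersionedProtocol, as described above.
    interface EchoProtocol extends VersionedProtocol {
        long versionID = 1L;
        String echo(String message) throws IOException;
    }

    class EchoServer implements EchoProtocol {
        public String echo(String message) { return "echo: " + message; }
        public long getProtocolVersion(String protocol, long clientVersion) {
            return versionID;
        }
    }

    public class RpcDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Server side: publish the implementation object on a port.
            RPC.Server server = RPC.getServer(new EchoServer(), "0.0.0.0", 9527, conf);
            server.start();
            // Client side: obtain a proxy and call the remote method
            // as if it were local.
            EchoProtocol proxy = (EchoProtocol) RPC.waitForProxy(
                    EchoProtocol.class, EchoProtocol.versionID,
                    new InetSocketAddress("localhost", 9527), conf);
            System.out.println(proxy.echo("hello"));
            RPC.stopProxy(proxy);
            server.stop();
        }
    }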

Source: http://m.oschina.net/blog/212102
