On the HDFS File System in Hadoop

Source: Internet
Author: User

We will not elaborate on Hadoop's basic concepts, history, or features here. Instead, we focus on understanding and describing its file system.

HDFS (Hadoop Distributed File System) is a distributed file system. It is highly fault-tolerant and designed to be deployed on inexpensive hardware. It provides high-throughput access to application data. HDFS relaxes some POSIX (Portable Operating System Interface) requirements so that data in the file system can be accessed as a stream.

Design objectives of HDFS:

    1. Detect hardware failures and recover from them quickly

    2. Streaming data access

    3. A simplified consistency model

    4. Communication protocols

HDFS Architecture

HDFS uses a master/slave architecture. An HDFS cluster consists of one NameNode and several DataNodes. The NameNode is the master server: it manages the file system namespace and client access to files. The DataNodes manage the stored data. HDFS lets users store data in the form of files. Internally, a file is split into blocks, and these blocks are stored on a set of DataNodes. The NameNode centrally coordinates operations such as creating, deleting, and replicating files. (User data never passes through the NameNode.)
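To make the split between metadata and data more concrete, here is a minimal sketch in Python. It is only a toy model: the class `NameNodeMetadata`, its fields, and the block and node IDs are made up for illustration and are not part of any Hadoop API. It shows the two mappings a NameNode conceptually keeps, while the bytes themselves stay on the DataNodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NameNodeMetadata:
    """Toy model of NameNode metadata (hypothetical, not a Hadoop class)."""
    # File path -> ordered list of block IDs that make up the file.
    namespace: Dict[str, List[str]] = field(default_factory=dict)
    # Block ID -> IDs of the DataNodes that hold a replica of the block.
    block_locations: Dict[str, List[str]] = field(default_factory=dict)

    def locate(self, path: str) -> List[Tuple[str, List[str]]]:
        """Return (block ID, DataNode IDs) pairs for a file, in block order."""
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


meta = NameNodeMetadata()
meta.namespace["/logs/app.log"] = ["blk_1", "blk_2"]
meta.block_locations["blk_1"] = ["dn1", "dn2", "dn3"]
meta.block_locations["blk_2"] = ["dn2", "dn3", "dn4"]

# The metadata only answers "where are the blocks?"; it holds no file contents.
for block_id, datanodes in meta.locate("/logs/app.log"):
    print(block_id, datanodes)
```

Note that nothing in this structure stores file bytes, which mirrors the remark that user data never passes through the NameNode.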

Hadoop and distributed development

What we usually call a distributed system is really a distributed software system, that is, a software system that supports distributed processing. This includes:

Distributed operating system

Distributed programming language and its compilation (interpretation) system

Distributed File System

Distributed Database System

Hadoop sits at the file system layer of a distributed software system. It implements a distributed file system and part of the functionality of a distributed database.

Within Hadoop, HDFS enables efficient storage and management of data on a cloud made up of computer clusters.

Characteristics that HDFS shares with other distributed file systems:

    1. A single namespace for the entire cluster

    2. A data consistency model suited to write-once, read-many access; a client cannot see a file until the file has been created successfully

    3. A file is divided into multiple blocks, each block is assigned to data nodes, and the safety of the data is ensured by replicating blocks according to the configuration.
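The block and replica arithmetic behind the third point can be sketched as follows. The function name `block_layout` is made up for this illustration, and the values of 128 MiB per block and a replication factor of 3 are only example settings, not something stated in this article.

```python
from typing import Tuple

MIB = 1024 * 1024


def block_layout(file_size: int, block_size: int, replication: int) -> Tuple[int, int, int]:
    """Return (number of blocks, size of the last block, total replicas stored)."""
    if file_size < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("sizes must be non-negative and factors positive")
    # Ceiling division: a partly filled last block still counts as a block.
    num_blocks = (file_size + block_size - 1) // block_size
    last_block = file_size - (num_blocks - 1) * block_size if num_blocks else 0
    return num_blocks, last_block, num_blocks * replication


# Example: a 300 MiB file with 128 MiB blocks and 3 copies of each block.
blocks, last, replicas = block_layout(300 * MIB, 128 * MIB, 3)
print(blocks, last // MIB, replicas)
```

With these example numbers the file needs three blocks, the last of which holds only 44 MiB, and the cluster stores nine block replicas in total.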

How HDFS manages data, shown through specific operations (for reference)

(1) File write

    • The client sends a file write request to the NameNode

    • Based on the file size and the block configuration, the NameNode returns to the client information about the DataNodes it manages.

    • The client divides the file into blocks and, using the DataNode address information, writes them in order to the DataNodes.
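The three write steps can be simulated in a few lines. This is a rough in-memory sketch, not how the real client works: the classes `NameNode` and `DataNode`, the method `allocate`, and the round-robin placement are all assumptions made for illustration (real HDFS placement takes racks into account, and data is forwarded between DataNodes in a pipeline).

```python
from typing import Dict, List, Tuple


class DataNode:
    """Toy DataNode: stores block bytes keyed by block ID."""
    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.blocks: Dict[str, bytes] = {}

    def store(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data


class NameNode:
    """Toy NameNode: plans block placement but never sees file contents."""
    def __init__(self, datanode_ids: List[str], block_size: int, replication: int) -> None:
        self.datanode_ids = datanode_ids
        self.block_size = block_size
        self.replication = replication
        self.files: Dict[str, List[Tuple[str, List[str]]]] = {}

    def allocate(self, path: str, file_size: int) -> List[Tuple[str, List[str]]]:
        num_blocks = (file_size + self.block_size - 1) // self.block_size
        plan = []
        for i in range(num_blocks):
            # Simplistic round-robin choice of target DataNodes (an assumption).
            targets = [self.datanode_ids[(i + r) % len(self.datanode_ids)]
                       for r in range(self.replication)]
            plan.append((f"{path}#blk{i}", targets))
        self.files[path] = plan
        return plan


def write_file(namenode: NameNode, datanodes: Dict[str, DataNode], path: str, data: bytes) -> None:
    # Steps 1 and 2: ask the NameNode where the blocks should go.
    plan = namenode.allocate(path, len(data))
    # Step 3: split the data and send each block straight to its DataNodes.
    for index, (block_id, targets) in enumerate(plan):
        start = index * namenode.block_size
        chunk = data[start:start + namenode.block_size]
        for node_id in targets:
            datanodes[node_id].store(block_id, chunk)


datanodes = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
namenode = NameNode(list(datanodes), block_size=4, replication=2)
write_file(namenode, datanodes, "/demo.txt", b"abcdefghij")
for node in datanodes.values():
    print(node.node_id, node.blocks)
```

The tiny block size of 4 bytes is chosen only so that the split is easy to see in the printed result.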

(2) File read

    • The client sends a request to the NameNode to read the file

    • The NameNode returns information about the DataNodes that store the file

    • The client reads the file data
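The read path can be sketched in the same spirit. Everything here is a hypothetical stand-in: `datanodes` and `block_map` are plain dictionaries, and `get_block_locations` and `read_file` are illustrative names, not Hadoop calls. The fallback to another replica when a node is unavailable is an addition for the sake of the example.

```python
from typing import Dict, FrozenSet, List, Tuple

# DataNode ID -> {block ID -> bytes}; stands in for real DataNode storage.
datanodes: Dict[str, Dict[str, bytes]] = {
    "dn1": {"blk_0": b"hello "},
    "dn2": {"blk_0": b"hello ", "blk_1": b"hdfs"},
    "dn3": {"blk_1": b"hdfs"},
}

# NameNode metadata: path -> ordered (block ID, DataNodes holding a replica).
block_map: Dict[str, List[Tuple[str, List[str]]]] = {
    "/demo.txt": [("blk_0", ["dn1", "dn2"]), ("blk_1", ["dn3", "dn2"])],
}


def get_block_locations(path: str) -> List[Tuple[str, List[str]]]:
    """Steps 1 and 2: the NameNode returns block locations, not data."""
    return block_map[path]


def read_file(path: str, unavailable: FrozenSet[str] = frozenset()) -> bytes:
    """Step 3: fetch each block from the first reachable DataNode, in order."""
    parts = []
    for block_id, holders in get_block_locations(path):
        for node_id in holders:
            if node_id not in unavailable:
                parts.append(datanodes[node_id][block_id])
                break
        else:
            raise IOError(f"no reachable replica for {block_id}")
    return b"".join(parts)


print(read_file("/demo.txt"))
print(read_file("/demo.txt", unavailable=frozenset({"dn3"})))
```

Both calls return the same bytes, because the second block is still available on another DataNode.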

(3) File block replication

    • The NameNode finds that the blocks of some files do not meet the minimum number of replicas, or that some DataNodes have failed

    • It notifies DataNodes to replicate the affected blocks

    • The DataNodes start replicating directly with each other.
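A rough sketch of the NameNode's side of this step is shown below. The function `plan_re_replication` and its way of choosing source and target nodes are assumptions for illustration only; the real selection logic is more involved.

```python
from typing import Dict, List, Set, Tuple


def plan_re_replication(
    block_holders: Dict[str, Set[str]],
    live_nodes: Set[str],
    min_replicas: int,
) -> List[Tuple[str, str, str]]:
    """Return (block ID, source DataNode, target DataNode) copy instructions."""
    plan = []
    for block_id in sorted(block_holders):
        live_holders = sorted(block_holders[block_id] & live_nodes)
        if not live_holders:
            continue  # no surviving replica to copy from
        missing = max(min_replicas - len(live_holders), 0)
        # Candidate targets are live nodes that do not hold the block yet.
        candidates = sorted(live_nodes - block_holders[block_id])
        for target in candidates[:missing]:
            # The copy goes DataNode to DataNode; the NameNode only gives the order.
            plan.append((block_id, live_holders[0], target))
    return plan


holders = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn2", "dn3", "dn4"},
}
# Suppose dn2 has failed, so both blocks are down to two live replicas.
orders = plan_re_replication(holders, live_nodes={"dn1", "dn3", "dn4"}, min_replicas=3)
print(orders)
```

Each instruction names a surviving holder as the source and a live node without the block as the target, so the data moves between DataNodes directly.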

System management functions of HDFS

    1. Heartbeat detection

    2. Data replication

    3. Data validation

    4. Single NameNode: in case it fails, task processing information is logged in both the local file system and a remote file system

    5. Pipelined writing of data

    6. Safe Mode
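As an illustration of the first item, heartbeat detection can be sketched like this. The class `HeartbeatMonitor` and the 30-second timeout are assumptions for the example and do not reflect actual HDFS settings. Timestamps are passed in explicitly so that the behaviour is easy to follow.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Toy heartbeat tracker: a node is considered dead after a silent period."""
    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: Dict[str, float] = {}

    def record(self, node_id: str, now: float) -> None:
        """Remember when a DataNode last reported in."""
        self.last_seen[node_id] = now

    def dead_nodes(self, now: float) -> List[str]:
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node_id
            for node_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=30)
monitor.record("dn1", now=0)
monitor.record("dn2", now=0)
monitor.record("dn1", now=25)  # dn1 keeps reporting, dn2 goes silent
print(monitor.dead_nodes(now=40))
```

At time 40 only dn2 has been silent for longer than the timeout, so it is the one reported as dead. Detecting such a node is what would then trigger the block replication described earlier.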

This has been a brief introduction to HDFS. Please forgive any shortcomings; this document is for study and reference only.

This article is from the "Round Circle dot point" blog; reprinting is declined.
