Hadoop under HDFs file system
Here we have the basic concept of Hadoop, historical functions do not do too much elaboration, focusing on his file system to do some understanding and elaboration.
HDFS (Hadoop Distributed File System) is a distributed filesystem. With high fault tolerance (fault-tolerant), it allows him to deploy on inexpensive hardware. He can provide high throughput rates to access the application's data. HDFs relaxes the requirements for portable operating system interfaces. This allows the data in the file system to be accessed in a streaming format.
Design Objectives for HDFs:
Detect and quickly reply to hardware failures
Streaming data access
Simplifying the consistency model
Communication protocols
HDFS Architecture
650) this.width=650; "title=" 12.png "style=" HEIGHT:499PX;WIDTH:694PX; "src=" http://s3.51cto.com/wyfs02/M00/54/87/ Wkiol1sfgutx4inkaahqyhjabsc552.jpg "width=" "height=" 683 "alt=" wkiol1sfgutx4inkaahqyhjabsc552.jpg "/>
The architecture of HDFs employs a master-slave (Master/slave) model, and an HDFS cluster consists of a namenode and several datanode, where Namenode is the primary server that manages the namespace and file operations of the file's decency. ; Datanode manages the stored data. HDFs allows users to store data in the form of files. Internally, the file is partitioned into blocks of data, which are stored in a set of Datanode. The Namenode unified Dispatch class to create, delete, and copy files. (User data will never go through Namenode)
Hadoop and distributed development
What we commonly call distributed systems is distributed software systems, which are distributed processing software systems, including
Distributed operating system
Distributed programming language and its compilation (interpretation) system
Distributed File System
Distributed Database System
Hadoop is a layer in a file system in a distributed software system. It realizes the function of distributed file system and partial distributed database.
In the region, HDFs enables efficient storage and management of data in a cloud of compute clusters.
Similar characteristics of HDFS distributed systems and other systems:
The namespace for the entire cluster
A model that has data consistency and is suitable for writing multiple reads and writes at a time, and the client cannot see the existence of the file until the file is successfully created
The file is divided into multiple ask price blocks, each file is allocated to the data node, and the security of the data is guaranteed based on the configuration of the copied file blocks.
Next, please learn by reference
The management of HDFS data through specific operation
(1) File write
Client requests to Namenode to initiate a file write
Namenode returns the information of the Datanode that the client manages, based on the file size and the configuration of the file block.
The client divides the files into blocks and writes them sequentially to each datanode block according to the Datanode address information.
(2) file read
Client initiates a request to Namenode to read the file
Namenode return datanode information for file storage
Client reads file information
(3) file blocks (block) replication
Namenode found that the block of some files does not meet the minimum number of copies of this requirement or some datanode fail
Notify Datanode to duplicate each block
Datanode began to replicate directly with each other.
The functions of HDFS in System management
Heartbeat detection
Data replication
Data validation
Single Namenode If the failed task processing information is logged in the local file system and the remote file system
pipelined Writing of data
Safe Mode
HDFs is a simple introduction to this if there are deficiencies in the area please forgive, this document is only for learning reference.
This article from "Round Circle dot point" blog, declined reprint!
Talking about the HDFs file system under Hadoop