1. The difference between Hadoop 1.0 and Hadoop 2.0:
[Figure: Hadoop 1.0 ecosystem]
[Figure: Hadoop 2.0 ecosystem]
2. HDFS description: HDFS is an open source clone of Google's GFS. Its architecture consists of the following components:
1) NameNode: manages the HDFS namespace and the block mapping information, configures the replica policy, and handles client read and write requests.
2) Standby NameNode: a hot spare for the active NameNode. It periodically merges the fsimage and the edits log and pushes the result to the active NameNode; when the active NameNode fails, it quickly takes over as the new active NameNode.
3) DataNode: stores the actual data blocks and performs block reads and writes.
4) Client: splits files into blocks, interacts with the NameNode to get file location information, interacts with the DataNodes to read or write data, and manages and accesses HDFS.
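The periodic merge performed by the Standby NameNode can be pictured with a toy model. The sketch below assumes the namespace is a plain dictionary and the edit log is a list of (operation, path, block IDs) tuples; these structures and the function name `merge_checkpoint` are hypothetical and far simpler than the real fsimage and edits files.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified structures: the real fsimage and edit log
# are binary files with many more operation types.
FsImage = Dict[str, List[int]]       # path -> list of block IDs
Edit = Tuple[str, str, List[int]]    # (operation, path, block IDs)


def merge_checkpoint(fsimage: FsImage, edits: List[Edit]) -> FsImage:
    """Replay the edit log on top of the last snapshot and return a new snapshot."""
    # Copy first so the previous snapshot stays untouched.
    merged = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, blocks in edits:
        if op == "create":
            merged[path] = list(blocks)
        elif op == "append":
            merged.setdefault(path, []).extend(blocks)
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


old_image = {"/logs/a.log": [1, 2]}
edit_log = [
    ("create", "/data/b.csv", [3]),
    ("append", "/logs/a.log", [4]),
    ("delete", "/data/b.csv", []),
]
new_image = merge_checkpoint(old_image, edit_log)
print(new_image)  # {'/logs/a.log': [1, 2, 4]}
```

After the merge, the edit log can start empty again, which is why a checkpoint keeps NameNode restarts short.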
Advantages: high fault tolerance; suitable for batch processing and for large-scale data; streaming file access; can be built on inexpensive machines.
Disadvantages: not suited to low-latency data access (for example, millisecond-level reads), because HDFS is designed for high throughput rather than low latency; not suited to large numbers of small files, which consume a large amount of NameNode memory and make seek time exceed read time; no concurrent writes or random modification of files: a file can have only one writer, and only appends are supported.
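A rough calculation shows why small files strain the NameNode. The sketch below assumes about 150 bytes of NameNode heap per file object and per block object; this is a commonly quoted rule of thumb, not an exact figure. The function name `namenode_memory_bytes` is made up for illustration.

```python
import math

BYTES_PER_OBJECT = 150  # assumed rule-of-thumb heap cost per file or block object


def namenode_memory_bytes(file_count: int, file_size: int, block_size: int) -> int:
    """Estimate NameNode heap used by file_count files of file_size bytes each."""
    blocks_per_file = max(1, math.ceil(file_size / block_size))
    objects = file_count * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * BYTES_PER_OBJECT


MB = 1024 * 1024
block = 64 * MB
total = 640 * 1024 * MB  # the same 640 GB of data in both cases

small = namenode_memory_bytes(total // MB, MB, block)        # stored as 1 MB files
large = namenode_memory_bytes(total // block, block, block)  # stored as 64 MB files
print(small, large, small // large)
```

Under these assumptions the same data costs 64 times more NameNode memory when it is stored as 1 MB files instead of 64 MB files.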
3. HDFS data format
A file is split into fixed-size blocks. The default block size is 64 MB (in Hadoop 1.x; Hadoop 2.x defaults to 128 MB) and is configurable. A file smaller than the block size is stored as a single block. The blocks of a file are stored on different nodes, with three replicas per block by default.
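The splitting rule can be sketched in a few lines. The example assumes a 64 MB block size and three replicas, as stated above. The round-robin placement is only a stand-in: real HDFS placement is rack-aware, and the names `split_into_blocks` and `place_replicas` are hypothetical.

```python
from typing import List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 64 * MB) -> List[int]:
    """Return the size in bytes of each block a file of file_size bytes occupies."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full, rest = divmod(file_size, block_size)
    sizes = [block_size] * full
    if rest:
        sizes.append(rest)  # the last block only holds the remaining bytes
    return sizes


def place_replicas(block_count: int, nodes: List[str], replication: int = 3) -> List[List[str]]:
    """Assign each block to `replication` distinct nodes (simplified round-robin)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return [
        [nodes[(i + r) % len(nodes)] for r in range(replication)]
        for i in range(block_count)
    ]


blocks = split_into_blocks(200 * MB)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print([size // MB for size in blocks])  # [64, 64, 64, 8]
print(placement[0])                     # ['dn1', 'dn2', 'dn3']
```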
[Figure: HDFS data write process]
[Figure: HDFS data read process]
4. MapReduce: an open source clone of Google's MapReduce, suitable for offline processing of petabyte-scale data.
[Figure: MapReduce computational framework]
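The map, shuffle and reduce phases can be illustrated with a single-process word count. This is only a sketch of the programming model in plain Python, not the Hadoop API; in a real job the phases run in parallel across the cluster. The function names are chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["hadoop hdfs", "Hadoop yarn hdfs hadoop"])
print(counts)  # {'hadoop': 3, 'hdfs': 2, 'yarn': 1}
```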
5. YARN: a new system in Hadoop 2.0, responsible for cluster resource management and scheduling. It allows multiple computing frameworks to run in one cluster and provides several multi-user schedulers, which makes it suitable for shared cluster environments.
[Figure: YARN architecture]
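The idea of one resource manager serving several frameworks can be pictured with a toy model. The sketch below assumes memory is the only resource and uses a first-fit rule. The class `ToyResourceManager` and its fields are hypothetical and do not reflect the real YARN API or its schedulers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_mb: int


@dataclass
class ToyResourceManager:
    nodes: List[Node]
    allocations: Dict[str, List[str]] = field(default_factory=dict)

    def allocate(self, app: str, memory_mb: int) -> Optional[str]:
        """Grant one container on the first node with enough free memory."""
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                self.allocations.setdefault(app, []).append(node.name)
                return node.name
        return None  # no capacity left; a real scheduler would queue the request


rm = ToyResourceManager([Node("nm1", 4096), Node("nm2", 2048)])
first = rm.allocate("mapreduce-job", 3072)         # fits on nm1
second = rm.allocate("other-framework-job", 2048)  # nm1 is too full, goes to nm2
third = rm.allocate("mapreduce-job", 2048)         # nothing fits any more
print(first, second, third)  # nm1 nm2 None
```

Both applications draw containers from the same pool of nodes, which is the sharing that a central resource manager makes possible.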
6. HDFS shell operations:
The Hadoop shell commands are in Hadoop's bin directory. Running the hdfs command there lists the commands available for the HDFS file system, such as:
dfsadmin: cluster administration options, run with the hadoop dfsadmin command in the bin directory.
dfs: file system operations, run with the hadoop dfs command in the bin directory.
fsck: checks the properties and state of files in the file system.
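The three command groups can also be driven from a script. The sketch below assumes the `hdfs` launcher is on the PATH and a cluster is reachable; when the launcher is missing, the commands are only printed. The helpers `hdfs_command` and `run_if_available` are hypothetical wrappers, and the path `/` is just an example.

```python
import shutil
import subprocess
from typing import List, Optional


def hdfs_command(*args: str) -> List[str]:
    """Build the argument list for an `hdfs` invocation."""
    return ["hdfs", *args]


def run_if_available(cmd: List[str]) -> Optional[str]:
    """Run cmd and return its stdout, or None when the binary is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


report_cmd = hdfs_command("dfsadmin", "-report")           # cluster capacity and DataNode status
list_cmd = hdfs_command("dfs", "-ls", "/")                 # list the root directory
fsck_cmd = hdfs_command("fsck", "/", "-files", "-blocks")  # per-file block report

for cmd in (report_cmd, list_cmd, fsck_cmd):
    output = run_if_available(cmd)
    status = "skipped (hdfs not found)" if output is None else output
    print(" ".join(cmd), "->", status)
```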
Hadoop Learning II: Hadoop infrastructure and shell operations