Hadoop Learning II: Hadoop infrastructure and shell operations

Source: Internet
Author: User

1. Differences between Hadoop 1.0 and Hadoop 2.0:

  

The Hadoop 1.0 ecosystem:

  

The Hadoop 2.0 ecosystem:

  

2. HDFS overview: HDFS is an open-source clone of Google's GFS. The HDFS architecture is as follows:

  

1) NameNode: manages the HDFS namespace and the block mapping information, configures the replica policy, and handles client read and write requests.

2) Standby NameNode: a hot standby for the active NameNode. It periodically merges the fsimage and the edit log (edits) and pushes the result to the active NameNode; when the active NameNode fails, it quickly takes over as the new active NameNode.

3) DataNode: stores the actual data blocks and performs block reads and writes.

4) Client: splits files into blocks, interacts with the NameNode to obtain file location information, interacts with DataNodes to read or write data, and manages and accesses HDFS.
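The division of labor between these roles can be illustrated with a toy in-memory model. This is only a sketch: `ToyNameNode`, the block IDs, the DataNode names, and the file path are hypothetical and are not part of any Hadoop API. It shows that the NameNode keeps metadata only (which blocks make up a file and where their replicas live), while the client uses that answer to contact DataNodes itself.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for a NameNode: metadata only, no file contents."""
    namespace: dict = field(default_factory=dict)        # path -> [block_id, ...]
    block_locations: dict = field(default_factory=dict)  # block_id -> [datanode, ...]

    def add_file(self, path, blocks):
        # blocks is a list of (block_id, [datanode names]) pairs
        self.namespace[path] = [block_id for block_id, _ in blocks]
        for block_id, nodes in blocks:
            self.block_locations[block_id] = list(nodes)

    def locate(self, path):
        # What a client would learn before reading from DataNodes directly
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


nn = ToyNameNode()
nn.add_file("/logs/app.log", [
    ("blk_1", ["dn1", "dn2", "dn3"]),
    ("blk_2", ["dn2", "dn3", "dn4"]),
])
plan = nn.locate("/logs/app.log")
print(plan)
```

In a real cluster the client library performs this lookup through RPC; the model only makes the metadata/data split visible.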

Advantages: high fault tolerance; well suited to batch processing and to large data sets; streaming file access; can be built on inexpensive machines.

Disadvantages: not suited to low-latency data access (for example, at the millisecond level), because low latency is traded for high throughput; handles large numbers of small files poorly, because they consume a large amount of NameNode memory and seek time can exceed read time; no support for concurrent writes or random file modification: a file can have only one writer, and only appends are supported.

3. HDFS data storage format

A file is cut into fixed-size blocks. The default block size is 64 MB (the Hadoop 1.x default; Hadoop 2.x defaults to 128 MB), and the block size is configurable. A file smaller than the block size is stored as a single block of its own. In short, a file is divided into blocks by size, the blocks are stored on different nodes, and each block has three replicas by default.
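The arithmetic behind this can be checked with a small sketch. The helper names `split_into_blocks` and `raw_storage` are made up for illustration; the 64 MB block size and replication factor of 3 are the defaults stated above.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return the sizes (in bytes) of the blocks a file would be cut into."""
    if file_size <= 0:
        return []
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


def raw_storage(file_size, replication=3):
    """Total bytes stored across the cluster once every block is replicated."""
    return file_size * replication


blocks = split_into_blocks(200 * MB)
print([size // MB for size in blocks])    # [64, 64, 64, 8]
print(len(split_into_blocks(10 * MB)))    # 1
print(raw_storage(200 * MB) // MB)        # 600
```

A 200 MB file therefore occupies four blocks, and with three replicas each it consumes 600 MB of raw cluster storage.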

HDFS data write process:

  

HDFS data read process:

  

4. MapReduce: an open-source clone of Google's MapReduce, suited to offline processing of petabyte-scale data.

The MapReduce computational framework:
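The programming model can be illustrated with an in-process word count. This is a sketch of the map, shuffle, and reduce phases in plain Python, not the Hadoop MapReduce API; the function names and the sample input are chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input record
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine all values for one key into a single result
    return key, sum(values)


lines = ["hadoop stores data", "hadoop processes data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a real cluster the map and reduce functions run in parallel on many nodes, and the shuffle moves data across the network; the logic per key stays the same.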

  

5. YARN: a new system introduced in Hadoop 2.0, responsible for cluster resource management and scheduling. It allows multiple computing frameworks to run in one cluster and provides several multi-user schedulers, which makes it suitable for shared cluster environments.

YARN architecture:

  

6. HDFS shell operations:

The Hadoop shell commands are located in the bin directory of the Hadoop installation. Use the hdfs command to list the commands available for the HDFS file system, such as:

  

dfsadmin: the options of the hadoop dfsadmin command in the bin directory are as follows:

    

dfs: the options of the hadoop dfs command in the bin directory are as follows:

  

fsck: the command for checking file properties, used as follows:
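As a rough illustration of the three command families in this section (dfs, dfsadmin, and fsck), the sketch below assembles a few common invocations and prints them. The path `/user/demo` and the file `local.txt` are hypothetical, and the commands use the `hdfs` launcher form (`hdfs dfs`, `hdfs dfsadmin`, `hdfs fsck`). By default nothing is executed; commands run only if `dry_run=False` is passed and an `hdfs` binary is found on the PATH.

```python
import shutil
import subprocess


def hdfs_command(tool, *args):
    """Build the argument list for one hdfs sub-command."""
    return ["hdfs", tool, *args]


# Hypothetical paths; adjust to a directory that exists in your cluster.
COMMANDS = [
    hdfs_command("dfs", "-mkdir", "-p", "/user/demo"),        # create a directory
    hdfs_command("dfs", "-put", "local.txt", "/user/demo/"),  # upload a local file
    hdfs_command("dfs", "-ls", "/user/demo"),                 # list the directory
    hdfs_command("dfs", "-cat", "/user/demo/local.txt"),      # print file contents
    hdfs_command("dfsadmin", "-report"),                      # cluster capacity and DataNode status
    hdfs_command("dfsadmin", "-safemode", "get"),             # query safe mode state
    hdfs_command("fsck", "/user/demo", "-files", "-blocks", "-locations"),
]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False and hdfs exists."""
    have_hdfs = shutil.which("hdfs") is not None
    for argv in commands:
        print(" ".join(argv))
        if not dry_run and have_hdfs:
            subprocess.run(argv, check=False)


run_all(COMMANDS)
```

Running `hdfs dfs`, `hdfs dfsadmin`, or `hdfs fsck` with no arguments prints the full option list for that command.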

  
