Hadoop Learning, Part 2: Application Scenarios, Deployment Principles, and Basic Framework of HDFS

Source: Internet
Author: User

1. Definition and characteristics of HDFS

Disadvantages of using a whole file as the basic storage unit:

Load balancing is difficult: file sizes vary and are controlled by the user, so the storage load is hard to spread evenly across nodes;

Parallel processing is difficult: a file can be processed only with the resources of a single node, so the rest of the cluster cannot be used.
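The two problems above are the usual motivation for cutting files into fixed-size blocks. The following is a minimal sketch of that idea, not HDFS code: the function names, the 128-unit block size, and the round-robin placement are all assumptions chosen for illustration.

```python
BLOCK_SIZE = 128  # assumed block size, in arbitrary units (e.g. MB)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks that a file of file_size is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_whole_files(file_sizes, num_nodes):
    """Baseline: every file is stored whole on one node (round-robin)."""
    load = [0] * num_nodes
    for i, size in enumerate(file_sizes):
        load[i % num_nodes] += size
    return load


def place_blocks(file_sizes, num_nodes, block_size=BLOCK_SIZE):
    """Block-based: every block is placed independently (round-robin)."""
    load = [0] * num_nodes
    i = 0
    for size in file_sizes:
        for block in split_into_blocks(size, block_size):
            load[i % num_nodes] += block
            i += 1
    return load
```

With file sizes `[1000, 10, 10, 10]`, four nodes, and a block size of 100, `place_whole_files` gives loads of `[1000, 10, 10, 10]`, while `place_blocks` gives `[310, 300, 210, 210]`. The large file is also spread over all four nodes, so all of them can work on it in parallel.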

Definition of HDFS: a distributed file system that is easy to scale, runs on a large number of inexpensive machines, provides fault tolerance, and offers a well-performing file storage service to a large number of users;

Advantages: high fault tolerance (data is automatically stored as multiple replicas, and a lost replica is recovered automatically); suitability for batch processing (computation is moved to the data rather than data to the computation, and data locations are exposed to the computing framework); suitability for big data processing; streaming file access; and the ability to run on inexpensive machines.
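The "automatic recovery after a replica is lost" behaviour can be sketched as follows. This is a simplified model, not the actual HDFS algorithm: the replica count of 3, the data structures, and the rule of picking new targets in alphabetical order are assumptions for illustration.

```python
REPLICATION = 3  # assumed target number of replicas per block


def under_replicated(block_map, live_nodes, target=REPLICATION):
    """Return {block_id: missing_count} for blocks with too few live replicas."""
    missing = {}
    for block, nodes in block_map.items():
        alive = [n for n in nodes if n in live_nodes]
        if len(alive) < target:
            missing[block] = target - len(alive)
    return missing


def re_replicate(block_map, live_nodes, target=REPLICATION):
    """Return a new block map in which lost replicas are re-created on live nodes."""
    new_map = {}
    for block, nodes in block_map.items():
        alive = [n for n in nodes if n in live_nodes]
        if not alive:
            # Every replica is gone, so there is nothing left to copy from.
            new_map[block] = []
            continue
        candidates = sorted(n for n in live_nodes if n not in alive)
        needed = max(target - len(alive), 0)
        new_map[block] = alive + candidates[:needed]
    return new_map
```

For example, if block `b1` is stored on `dn1`, `dn2`, and `dn3` and node `dn2` fails, `under_replicated` reports one missing replica for `b1`, and `re_replicate` copies it to another live node so that three replicas exist again.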

Not suitable for: low-latency data access; access to large numbers of small files; concurrent writes; and random modification of files.
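One reason small files are a poor fit is that the master has to track every file and every block, so the same amount of data costs far more metadata when it is stored as many small files. The sketch below only counts entries; the function name and the "one entry per file plus one per block" model are assumptions for illustration, and it makes no claim about actual memory usage.

```python
def metadata_entries(file_sizes, block_size):
    """Count file entries plus block entries that the master would track."""
    # -(-a // b) is ceiling division; an empty file has no blocks.
    blocks = sum(-(-size // block_size) for size in file_sizes)
    return len(file_sizes) + blocks
```

With a block size of 128 units, 1024 files of 1 unit each need 2048 entries, while a single file of 1024 units needs only 9.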

2. HDFS Architecture

NameNode: the master. It manages the HDFS namespace and the block mapping information, configures the replica policy, and handles client read and write requests.

DataNode: the slave. It stores the actual data blocks and performs block reads and writes.

Client: splits files into blocks; interacts with the NameNode to obtain file location information; interacts with DataNodes to read or write data; and provides the commands used to manage and access HDFS.
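The division of labour between the NameNode, the DataNodes, and the client described so far can be illustrated with a small in-memory model. It is a sketch under stated assumptions, not the HDFS API: the class and method names, the block ID format, and the round-robin replica placement are all hypothetical, and real HDFS writes go through a DataNode pipeline that is not modelled here.

```python
class DataNode:
    """Slave: stores block contents and serves block reads and writes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Master: holds the namespace and the block-to-DataNode mapping."""

    def __init__(self, datanode_names, replication=2):
        self.datanode_names = list(datanode_names)
        self.replication = replication  # assumed replica policy
        self.files = {}      # path -> ordered list of block ids
        self.locations = {}  # block id -> list of DataNode names
        self._next = 0

    def allocate_block(self, path):
        """Register a new block for path and choose its target DataNodes."""
        block_id = f"{path}#{len(self.files.setdefault(path, []))}"
        n = len(self.datanode_names)
        targets = [self.datanode_names[(self._next + i) % n]
                   for i in range(self.replication)]
        self._next = (self._next + 1) % n
        self.files[path].append(block_id)
        self.locations[block_id] = targets
        return block_id, targets

    def get_block_locations(self, path):
        """Return [(block_id, [datanode names])] in file order."""
        return [(b, self.locations[b]) for b in self.files[path]]


class Client:
    """Splits files into blocks and talks to the NameNode and DataNodes."""

    def __init__(self, namenode, datanodes, block_size):
        self.namenode = namenode
        self.datanodes = {d.name: d for d in datanodes}
        self.block_size = block_size

    def write(self, path, data):
        for offset in range(0, len(data), self.block_size):
            chunk = data[offset:offset + self.block_size]
            block_id, targets = self.namenode.allocate_block(path)
            for name in targets:
                self.datanodes[name].write_block(block_id, chunk)

    def read(self, path):
        parts = []
        for block_id, names in self.namenode.get_block_locations(path):
            # Read each block from the first listed replica.
            parts.append(self.datanodes[names[0]].read_block(block_id))
        return b"".join(parts)
```

In this model the NameNode never sees file contents: it only answers "which blocks, on which nodes", and the data itself moves directly between the client and the DataNodes.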

Secondary NameNode: not a hot standby for the NameNode. It assists the NameNode and takes over part of its workload: it periodically merges fsimage and edits (the edit log) and pushes the result to the NameNode, and in an emergency it can help recover the NameNode.
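The periodic merge performed by the Secondary NameNode can be pictured as replaying the edit log on top of the last namespace snapshot. The sketch below is a toy model: the dictionary snapshot and the tuple-based edit records are hypothetical formats chosen for illustration, and the real fsimage and edit log use their own on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one edit record to the namespace, in place."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = []
    elif op == "add_block":
        namespace[path].append(edit[2])
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown edit operation: {op}")


def checkpoint(fsimage, edits):
    """Merge a snapshot with its edit log.

    Returns (new_fsimage, new_edits). The inputs are left unchanged, and the
    returned edit log is empty because every edit is now part of the snapshot.
    """
    merged = {path: list(blocks) for path, blocks in fsimage.items()}
    for edit in edits:
        apply_edit(merged, edit)
    return merged, []
```

Merging regularly keeps the edit log short, so that a restarting NameNode has fewer edits to replay before it can serve requests.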

3. HDFS Working principle

4. HDFS combined with other systems
