Hadoop Learning Record (i) HDFS

Source: Internet
Author: User

  • Hadoop was inspired by Google, and was originally designed to address the high cost and slow speed of data processing in traditional databases.
  • Hadoop's two core projects are HDFS (Hadoop Distributed File System) and MapReduce.
  • HDFS is used to store data. Unlike a traditional relational database, it does not enforce strict data integrity constraints, and it is designed to store large files with a streaming data access pattern. When a dataset outgrows the storage capacity of a single physical machine, it must be partitioned and stored across several separate machines. A file system that manages storage across multiple networked computers is called a distributed file system.
  • HDFS applies the concept of a block: files are split into block-sized chunks, and each chunk is stored as an independent unit. A disk block is typically 512 bytes, while an HDFS block defaults to 64 MB (128 MB in Hadoop 2.x and later), far larger than a disk block, to minimize seek overhead. Using blocks as the unit of storage lets a file be larger than any single disk in the cluster and simplifies the design of the storage subsystem. Each block is also replicated on multiple machines, so that when one copy is damaged the data can still be read from another machine.
  • An HDFS cluster has two types of nodes: the NameNode and the DataNodes. The NameNode acts as the manager and maintains the namespace of the entire file system; without it, the file system cannot be used. If the machine running the NameNode is destroyed, all files on the file system are lost, because there is no way to reconstruct them from the blocks on the DataNodes, so precautions are needed. The first is to back up the files that make up the persistent state of the file system metadata, by configuring the NameNode to write that state to multiple file systems. The second is to run a secondary NameNode.
  • To be continued
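
The block and replication ideas in the notes above can be sketched in a few lines of Python. This is only an illustration, not HDFS code: the function names, the DataNode names (dn1 to dn4), the replication factor of 3, and the round-robin placement are all assumptions made for the example. Real HDFS uses its own rack-aware placement policy, and the NameNode tracks where each block lives.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB HDFS block size mentioned in the notes


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file would occupy."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full, rest = divmod(file_size, block_size)
    # The final block only holds the leftover bytes, if there are any.
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement for illustration; not the HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def pick_replica(block, placement, failed_nodes):
    """Return the first healthy node that holds a copy of the block."""
    for node in placement[block]:
        if node not in failed_nodes:
            return node
    raise IOError(f"all replicas of block {block} are unavailable")


if __name__ == "__main__":
    blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    print(len(blocks), "blocks; last block holds", blocks[-1], "bytes")
    print("block 0 is read from", pick_replica(0, placement, {"dn1"}))
```

A 200 MB file occupies three full 64 MB blocks plus one 8 MB block, and a read of block 0 falls back to another DataNode when dn1 is marked as failed.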
