Hadoop: The Definitive Guide, reading notes 2

Source: Internet
Author: User

Chapter 2: The Hadoop Distributed File System

 

The Hadoop Distributed Filesystem (HDFS)

HDFS is designed to store very large files with streaming data access patterns.

The design idea behind HDFS: write once, read many times is the most efficient data access pattern. The time taken to read the entire dataset matters more than the latency of reading the first record.

Currently, writes are always appended to the end of a file. HDFS does not support multiple writers, or modifications at arbitrary positions in a file. Such operations would be relatively inefficient, although they may be supported in the future.

A disk block is typically 512 bytes, whereas the default HDFS block size is 64 MB (128 MB from Hadoop 2 onward).
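To make the block size concrete, here is a minimal sketch of how a file's length maps onto HDFS blocks, assuming the 64 MB default. The function name split_into_blocks is hypothetical and not part of any Hadoop API; it only illustrates the arithmetic.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size quoted in these notes


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file of file_size bytes would be split into.

    Hypothetical helper for illustration only; not a Hadoop API.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


if __name__ == "__main__":
    one_gb = 1024 * 1024 * 1024
    print(len(split_into_blocks(one_gb)))  # 1 GB / 64 MB = 16 blocks
```

Since a map task normally processes one block, the block count is also a rough indication of how many map tasks a job over that file would run.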

HDFS blocks are much larger than disk blocks in order to minimize the cost of seeks: with a large enough block, the time spent seeking to the start of the block is a small fraction of the time spent transferring its data. As disk transfer rates increase, the block size will tend to grow. The block size should not be set too large, however. A map task normally processes one block at a time, so if there are too few blocks there are too few tasks, and the job runs more slowly than it could.
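The trade-off between seek time and transfer time can be worked through numerically. The sketch below assumes a seek time of 10 ms and a transfer rate of 100 MB/s; both figures are illustrative assumptions rather than measurements, and the function names are hypothetical.

```python
def seek_overhead(block_size_mb, seek_time_ms=10.0, transfer_rate_mb_s=100.0):
    """Ratio of seek time to transfer time when reading one block.

    The default seek time and transfer rate are assumed, illustrative values.
    """
    transfer_time_ms = block_size_mb / transfer_rate_mb_s * 1000.0
    return seek_time_ms / transfer_time_ms


def block_size_for_overhead(target_ratio, seek_time_ms=10.0, transfer_rate_mb_s=100.0):
    """Block size in MB at which seek time equals target_ratio of transfer time."""
    transfer_time_ms = seek_time_ms / target_ratio
    return transfer_time_ms / 1000.0 * transfer_rate_mb_s


if __name__ == "__main__":
    # A 64 MB block takes 640 ms to transfer, so a 10 ms seek is about 1.6% of that.
    print(seek_overhead(64))
    # Keeping the seek at 1% of transfer time needs a block of roughly 100 MB.
    print(block_size_for_overhead(0.01))
```

Under these assumptions, doubling the transfer rate doubles the block size needed for the same overhead, which is why the block size tends to grow as drives get faster.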

The fsck command in HDFS can display block information:

% hadoop fsck / -files -blocks

An HDFS cluster has two types of node operating in a master-worker pattern: one namenode (the master) and multiple datanodes (the workers).

The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. This information is stored persistently on local disk in two files: the namespace image and the edit log. The namenode also knows the datanodes on which all the blocks of a given file are located, but it does not store block locations persistently, because this information is rebuilt from the datanodes when the system starts.
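The relationship between the namespace image and the edit log can be pictured as a snapshot plus a list of later changes. The following is a toy sketch of that idea only: the real files are in a binary format with much richer metadata, and the operation names "create" and "delete" and the function replay are hypothetical.

```python
def replay(image, edit_log):
    """Apply edit-log entries, in order, to a copy of the namespace image.

    image:    set of paths present in the last saved snapshot
    edit_log: list of (operation, path) tuples recorded since that snapshot
    """
    namespace = set(image)  # work on a copy so the snapshot is left untouched
    for op, path in edit_log:
        if op == "create":
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


# Hypothetical snapshot and the edits recorded after it was taken.
image = {"/", "/user", "/user/a.txt"}
edit_log = [("create", "/user/b.txt"), ("delete", "/user/a.txt")]

if __name__ == "__main__":
    print(sorted(replay(image, edit_log)))
```

Note that the sketch holds no block locations at all, matching the point that those are not persisted and must come from the datanodes.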

The client accesses the filesystem on behalf of the user by communicating with the namenode and the datanodes. It presents a filesystem interface, so user code does not need to know about either.

Datanodes are the worker nodes of the filesystem. They store and retrieve blocks as needed, and periodically send the namenode a list of the blocks they are storing.
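Because block locations are not stored persistently, the namenode has to assemble them from the lists that datanodes send. Here is a minimal sketch of that aggregation, assuming each report is simply a list of block ids; the datanode names, block ids, and the function build_block_map are all hypothetical.

```python
from collections import defaultdict


def build_block_map(block_reports):
    """Map each block id to the sorted list of datanodes that reported holding it.

    block_reports: dict of datanode name -> list of block ids it stores
    """
    locations = defaultdict(set)
    for datanode, blocks in block_reports.items():
        for block_id in blocks:
            locations[block_id].add(datanode)
    return {block_id: sorted(nodes) for block_id, nodes in locations.items()}


# Hypothetical reports from three datanodes, each block stored on two of them.
reports = {
    "datanode1": ["blk_1", "blk_2"],
    "datanode2": ["blk_2", "blk_3"],
    "datanode3": ["blk_1", "blk_3"],
}
block_map = build_block_map(reports)

if __name__ == "__main__":
    for block_id in sorted(block_map):
        print(block_id, block_map[block_id])
```

If a datanode stops reporting, rebuilding the map from the remaining reports shows which blocks have lost a copy.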

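Namenode fault-tolerance mechanisms:

1. Back up the files that make up the persistent state of the filesystem metadata.
2. Run a secondary namenode, which periodically merges the namespace image with the edit log so that the edit log does not grow too large. It is not a hot standby for the namenode.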

 
