HDFS Concepts in Detail: Blocks

Source: Internet
Author: User

A disk has a block size, which is the minimum amount of data that it can read or write. A file system for a single disk builds on this by handling data in blocks whose size is an integer multiple of the disk block size. File system blocks are typically a few kilobytes in size, while disk blocks are normally 512 bytes. This is generally transparent to file system users, who simply read or write a file of whatever length. However, some tools that perform file system maintenance, such as df and fsck, operate at the file system block level.

HDFS also has the concept of a block, but it is a much larger unit: 64 MB by default. As in a file system for a single disk, files in HDFS are broken into block-sized chunks, which are stored as independent units. Unlike a file system for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block's worth of underlying storage. Unless otherwise noted, the term "block" in this article refers to an HDFS block.
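The way a file is divided into blocks can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not HDFS code: the function name `split_into_blocks` is hypothetical, and the 64 MB default simply follows the figure quoted in this article.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # default block size quoted in this article (an assumption here)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes in bytes of the blocks needed for a file of file_size bytes.

    The last block holds only the remaining bytes, so a file smaller than
    one block does not take up a full block of storage.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


# A 150 MB file needs two full blocks plus one 22 MB block.
print([size // MB for size in split_into_blocks(150 * MB)])  # [64, 64, 22]
# A 1 MB file occupies a single 1 MB block, not 64 MB.
print([size // MB for size in split_into_blocks(1 * MB)])    # [1]
```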

Why is an HDFS block so large?

HDFS blocks are large compared to disk blocks in order to minimize the cost of seeks. If a block is large enough, the time to transfer its data from the disk is significantly longer than the time to seek to the start of the block. The time to transfer a large file made of multiple blocks is therefore governed by the disk transfer rate.

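A quick calculation shows that if the seek time is around 10 ms and the transfer rate is 100 MB/s, then to make the seek time 1% of the transfer time, the block size needs to be around 100 MB. The default is 64 MB, although many HDFS installations use 128 MB blocks. This figure will continue to be revised upward as transfer speeds increase with new generations of disk drives.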
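The estimate can be reproduced with a short Python sketch. It assumes a seek time of about 10 ms, a transfer rate of 100 MB/s, and a target of keeping the seek time at 1% of the transfer time; the function name `block_size_for_overhead` is hypothetical and used only for illustration.

```python
MB = 1_000_000  # decimal megabyte, matching a transfer rate quoted in MB/s


def block_size_for_overhead(seek_time_s, transfer_rate_bytes_per_s, overhead):
    """Block size (bytes) at which seek time is `overhead` times the transfer time."""
    if not 0 < overhead <= 1:
        raise ValueError("overhead must be a fraction in (0, 1]")
    # seek_time = overhead * transfer_time  =>  transfer_time = seek_time / overhead
    transfer_time_s = seek_time_s / overhead
    return transfer_time_s * transfer_rate_bytes_per_s


size = block_size_for_overhead(seek_time_s=0.01,
                               transfer_rate_bytes_per_s=100 * MB,
                               overhead=0.01)
print(size / MB)  # 100.0
```

Under the same assumptions, doubling the transfer rate doubles the block size needed to hold the seek overhead at 1%, which is why the figure tends to grow with faster drives.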

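This argument should not be taken too far, however. Map tasks in MapReduce normally operate on one block at a time, so if there are too few tasks (fewer than the number of nodes in the cluster), jobs will run more slowly than they otherwise could.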

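Having a block abstraction brings several benefits to a distributed file system. The first is the most obvious: a file can be larger than any single disk in the network. The blocks of a file do not need to be stored on the same disk, so they can take advantage of any disk in the cluster. In fact, although it would be unusual, an HDFS cluster could store a single file whose blocks fill all the disks in the cluster.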

A second benefit is that using a block rather than a file as the unit of abstraction simplifies the storage subsystem. Simplicity is something to strive for in all systems, but it is especially important for a distributed system, in which the failure modes are so varied. The storage subsystem deals only with blocks, which simplifies storage management (because blocks are a fixed size, it is easy to calculate how many can be stored on a given disk). It also eliminates metadata concerns: blocks are just chunks of data to be stored, so file metadata such as permissions information does not need to be stored with the blocks, and another system can handle metadata separately.

In addition, blocks fit well with replication, which provides fault tolerance and availability. To guard against corrupted blocks and disk or machine failure, each block is replicated to a small number of physically separate machines (typically three). If a block becomes unavailable, a copy can be read from another location in a way that is transparent to the client. A block that is no longer available because of corruption or machine failure can be copied from its alternative locations to other live machines to bring the replication factor back to the normal level. (See the "Data Integrity" section in Chapter 4 for more on guarding against corrupt data.) Similarly, some applications may choose to set a higher replication factor for the blocks of a popular file to spread the read load across the cluster.
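The idea of restoring the replica count after a machine failure can be sketched as a toy Python model. Everything here is an assumption made for illustration: the function `re_replicate`, the node names, and the simple "first available node" choice are hypothetical, and this is not how HDFS selects targets in practice.

```python
REPLICATION = 3  # typical number of replicas mentioned above


def re_replicate(block_locations, live_nodes, target=REPLICATION):
    """Return a new block -> nodes mapping restored to `target` replicas where possible.

    block_locations: dict mapping a block id to the list of nodes holding a replica.
    live_nodes: set of nodes that are currently functioning.
    """
    ordered_live = sorted(live_nodes)  # deterministic order for this toy model
    restored = {}
    for block, nodes in block_locations.items():
        survivors = [n for n in nodes if n in live_nodes]
        if not survivors:
            # No surviving replica to copy from: the block cannot be recovered.
            restored[block] = []
            continue
        candidates = [n for n in ordered_live if n not in survivors]
        missing = max(0, target - len(survivors))
        restored[block] = survivors + candidates[:missing]
    return restored


locations = {
    "blk_1": ["node1", "node2", "node3"],
    "blk_2": ["node2", "node3", "node4"],
}
live = {"node1", "node3", "node4", "node5"}  # node2 has failed

print(re_replicate(locations, live))
# {'blk_1': ['node1', 'node3', 'node4'], 'blk_2': ['node3', 'node4', 'node1']}
```

Each block that lost its replica on node2 is copied to another live machine, so both blocks are back at three replicas.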

Like its disk file system counterpart, the fsck command in HDFS understands blocks. For example, running the following command lists the blocks that make up each file in the file system:

% hadoop fsck / -files -blocks

