Understanding Hadoop HDFS Quotas and the fs and fsck Tools (HBase)

Hadoop uses HDFS to store HBase's data, and we can check how much space a directory occupies in HDFS with the following commands:

hadoop fsck
hadoop fs -dus
hadoop fs -count -q

These commands may run into permission problems in HDFS; you can work around that by prefixing them with sudo -u hdfs.
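For example, to run all three checks as the hdfs superuser (a minimal sketch; /path/to/directory stands in for your own HBase data directory, e.g. /hbase):

sudo -u hdfs hadoop fsck /path/to/directory
sudo -u hdfs hadoop fs -dus /path/to/directory
sudo -u hdfs hadoop fs -count -q /path/to/directory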

First, let's look at the difference between fsck and fs -dus.

hadoop fsck

hadoop fsck /path/to/directory
 Total size:    16565944775310 B    <===
 Total dirs:    3922
 Total files:   418464
 Total blocks (validated):      502705 (avg. block size 32953610 B)
 Minimally replicated blocks:   502705 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          18
 Number of racks:               1
FSCK ended at Thu Oct 20 20:49:59 CEST 2011 in 7516 milliseconds

The filesystem under path '/path/to/directory' is HEALTHY

hadoop fs -dus

hadoop fs -dus /path/to/directory
hdfs://master:54310/path/to/directory        16565944775310    <===

You can see that 16565944775310 bytes (15.1 TB) is the effective HDFS space used. Both commands report the "normal" file size, i.e. what the files would occupy on a local filesystem, without taking HDFS replication into account. In this case, the directory /path/to/directory stores 15.1 TB of data. Now, fsck tells us that the average block replication across all files under this path is 3.0, which means these files consume three times that size in raw HDFS storage:

3.0 x 16565944775310 bytes (15.1 TB) = 49697834325930 bytes (45.2 TB)

That is how much raw disk space HDFS really consumes for this directory.
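As a quick sanity check, this multiplication can be redone in the shell (a sketch; bash integer arithmetic works here because the average block replication is exactly 3.0):

echo $((3 * 16565944775310))
# prints 49697834325930, i.e. 45.2 TB of raw disk consumption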

hadoop fs -count -q

hadoop fs -count -q /path/to/directory

QUOTA  REMAINING_QUOTA  SPACE_QUOTA     REMAINING_SPACE_QUOTA
none   inf              54975581388800  5277747062870

DIR_COUNT  FILE_COUNT  CONTENT_SIZE    FILE_NAME
3922       418464      16565944775310  hdfs://master:54310/path/to/directory

The output has been wrapped onto two lines here and column labels added, so it is easier to read:

The seventh column, CONTENT_SIZE, 16565944775310 bytes (15.1 TB), is the effective HDFS space used.
The third column, SPACE_QUOTA, 54975581388800 bytes (50 TB), is the raw HDFS disk quota, i.e. the total raw disk space on the cluster that this directory may consume: HDFS can use up to 50 TB here.
The fourth column, REMAINING_SPACE_QUOTA, 5277747062870 bytes (4.8 TB), is the remaining raw HDFS disk quota.
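For reference, the raw-space quota reported in the third column is set and cleared with dfsadmin (a sketch; the 50t value and the path are illustrative, and on Hadoop 1.x-era clusters the command is hadoop dfsadmin rather than hdfs dfsadmin):

# set a raw-space quota of 50 TB on the directory
hdfs dfsadmin -setSpaceQuota 50t /path/to/directory
# remove the quota again
hdfs dfsadmin -clrSpaceQuota /path/to/directory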

As you can see, hadoop fsck and hadoop fs -dus report the effective data size, equal to what the files would occupy on a local filesystem.
The third and fourth columns of hadoop fs -count -q, by contrast, indirectly report the raw disk space actually consumed on the cluster's datanodes,
where every HDFS block is stored as 3 copies (replicas) (the 3 was already reported by hadoop fsck as average block replication = 3.0).
So we can subtract the two quota columns to work out the actual raw disk footprint:

54975581388800 bytes (50 TB) - 5277747062870 bytes (4.8 TB) = 49697834325930 bytes (45.2 TB)
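The same subtraction can be scripted straight from the command output (a sketch, assuming a space quota is actually set and the output is on one line, with SPACE_QUOTA and REMAINING_SPACE_QUOTA as the third and fourth fields):

hadoop fs -count -q /path/to/directory | awk '{ print $3 - $4 }'
# prints 49697834325930, the raw bytes consumed under the quota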

As you can see, Hadoop space quotas always count the raw HDFS disk consumption.
So if you have a 1 TB quota and set replication = 10, you can store only a single 100 GB file;
if the replication factor is 3, you can store a file of about 333 GB.
That is how Hadoop's space quotas are calculated. Note also that Hadoop applies the default replication factor of 3 whenever you do not set one explicitly, so the disk quota always charges for the raw HDFS disk space consumption, replicas included.
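In other words, the amount of "normal" data that fits under a space quota is the quota divided by the replication factor. A quick shell sketch of that division, using a 1 TB quota expressed in decimal bytes (illustrative values only):

echo $((1000000000000 / 3))    # 333333333333 bytes, i.e. ~333 GB at replication 3
echo $((1000000000000 / 10))   # 100000000000 bytes, i.e. 100 GB at replication 10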

Size on local filesystem    hadoop fsck / hadoop fs -dus    hadoop fs -count -q (replication factor = 3)
100 GB                      100 GB                          300 GB

The original article is here: http://www.michael-noll.com/blog/2011/10/20/understanding-hdfs-quotas-and-hadoop-fs-and-fsck-tools/
