Hadoop uses HDFS to store HBase's data, and we can check how much space is used on HDFS with the following commands: hadoop fsck, hadoop fs -dus, and hadoop fs -count -q.
These commands may run into permission problems on HDFS; if so, you can run them with the prefix sudo -u hdfs.
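For example, a minimal sketch (assuming the HDFS superuser account is named hdfs, as it is in most distributions):

sudo -u hdfs hadoop fsck /path/to/directory
sudo -u hdfs hadoop fs -dus /path/to/directory
sudo -u hdfs hadoop fs -count -q /path/to/directory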
First, let's look at the differences between fsck and fs -dus.
hadoop fsck
hadoop fsck /path/to/directory
 Total size:    16565944775310 B    <===
 Total dirs:    3922
 Total files:   418464
 Total blocks (validated):      502705 (avg. block size 32953610 B)
 Minimally replicated blocks:   502705 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          18
 Number of racks:               1
FSCK ended at Thu Oct 20 20:49:59 CET 2011 in 7516 milliseconds

The filesystem under path '/path/to/directory' is HEALTHY
hadoop fs -dus
hadoop fs -dus /path/to/directory
hdfs://master:54310/path/to/directory        16565944775310    <===
As you can see, 16565944775310 bytes (15.1 TB) of HDFS space are occupied, and both commands report the "normal" file size without taking HDFS replication into account. In this case, the directory stores roughly 15.1 TB of data. Now, fsck also tells us that the average block replication for all files under this path is 3.0, which means those files take up three times as much raw HDFS storage:
3.0 x 16565944775310 bytes (15.1 TB) = 49697834325930 bytes (45.2 TB)
That is how much disk space HDFS actually consumes.
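If you prefer not to do that multiplication by hand, here is a minimal sketch that derives the same number from the fsck output; it assumes the output format shown above, which can vary slightly between Hadoop versions:

hadoop fsck /path/to/directory | awk '
  /Total size/                { size = $3 }
  /Average block replication/ { repl = $4 }
  END { printf "%.0f bytes consumed on disk\n", size * repl }'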
hadoop fs -count -q
hadoop fs -count -q /path/to/directory
      QUOTA  REMAINING_QUOTA     SPACE_QUOTA  REMAINING_SPACE_QUOTA
       none              inf  54975581388800          5277747062870
  DIR_COUNT       FILE_COUNT    CONTENT_SIZE  FILE_NAME
       3922           418464  16565944775310  hdfs://master:54310/path/to/directory
The output has been wrapped onto two rows here, with column labels added, so you can see more easily that:
The seventh column, CONTENT_SIZE, 16565944775310 bytes (15.1 TB), is the effective HDFS space usage.
The third column, SPACE_QUOTA, 54975581388800 bytes (50 TB), is the raw HDFS disk quota, i.e. the total raw space this directory may consume on disk: HDFS may use up to 50 TB.
The fourth column, REMAINING_SPACE_QUOTA, 5277747062870 bytes (4.8 TB), is the remaining raw HDFS disk quota.
As you can see, hadoop fsck and hadoop fs -dus report the effective space occupied by the data, which equals its size on a local filesystem.
The third and fourth columns of hadoop fs -count -q, by contrast, indirectly report the disk space actually consumed on the cluster's nodes, since every HDFS block is stored as 3 replicas (3 being the value hadoop fsck reported above as Average block replication: 3.0). So we can do a subtraction to figure out the actual disk footprint:

54975581388800 bytes (50 TB) - 5277747062870 bytes (4.8 TB) = 49697834325930 bytes (45.2 TB)
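The same subtraction as a one-liner sketch over the fs -count -q output (this assumes a space quota has actually been set on the directory, so that columns 3 and 4 hold numbers rather than none and inf):

hadoop fs -count -q /path/to/directory | awk '{ print $3 - $4, "bytes consumed on disk" }'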
As you can see, Hadoop's space quota always counts raw HDFS disk consumption. So if you have a 1 TB quota and set replication to 10, you can store only 100 GB worth of files; if replication is 3, you can store a 333 GB file.
That is how Hadoop's space quota is calculated. Note that Hadoop applies a default replication factor of 3 when you do not set one explicitly, which again means the disk quota always counts the raw HDFS disk space consumed, replicas included.
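To see quotas in action, here is a hypothetical sketch using the standard dfsadmin quota commands; /quota/testdir is a made-up path for illustration:

hadoop fs -mkdir /quota/testdir
# set a space quota of 1 TB (1099511627776 bytes); the quota counts raw bytes, replicas included
hadoop dfsadmin -setSpaceQuota 1099511627776 /quota/testdir
# with replication = 3, writes start failing once roughly 333 GB of file data is stored
hadoop fs -count -q /quota/testdir
# remove the quota again
hadoop dfsadmin -clrSpaceQuota /quota/testdir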
Local filesystem size    hadoop fsck / hadoop fs -dus    hadoop fs -count -q (replication factor = 3)
100 GB                   100 GB                          300 GB
The original article is here: http://www.michael-noll.com/blog/2011/10/20/understanding-hdfs-quotas-and-hadoop-fs-and-fsck-tools/