Hadoop uses HDFS to store HBase's data, and we can check how much space is used on HDFS with the following commands: hadoop fsck, hadoop fs -dus, and hadoop fs -count -q.
These commands may run into permission problems on HDFS; if so, you can run them with the prefix sudo -u hdfs.
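For example, a minimal sketch (assuming the HDFS superuser account is named hdfs, as it is in most distributions):

sudo -u hdfs hadoop fsck /path/to/directory
sudo -u hdfs hadoop fs -dus /path/to/directory
sudo -u hdfs hadoop fs -count -q /path/to/directory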
First, let's look at the differences between fsck and fs -dus.
hadoop fsck
hadoop fsck /path/to/directory
 Total size:    16565944775310 B    <===
 Total dirs:    3922
 Total files:   418464
 Total blocks (validated):      502705 (avg. block size 32953610 B)
 Minimally replicated blocks:   502705 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          18
 Number of racks:               1
FSCK ended at Thu Oct 20 20:49:59 CET 2011 in 7516 milliseconds

The filesystem under path '/path/to/directory' is HEALTHY
hadoop fs -dus
hadoop fs -dus /path/to/directory
hdfs://master:54310/path/to/directory        16565944775310    <===
As you can see, 16565944775310 bytes (15.1 TB) of HDFS space are occupied, and both commands report the "normal" file size without taking HDFS replication into account. In this case, the directory stores roughly 15.1 TB of data. Now, fsck also tells us that the average block replication for all files under this path is 3.0, which means those files take up three times as much raw HDFS storage:
3.0 x 16565944775310 bytes (15.1 TB) = 49697834325930 bytes (45.2 TB)
That is how much disk space HDFS actually consumes.
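If you prefer not to do that multiplication by hand, here is a minimal sketch that derives the same number from the fsck output; it assumes the output format shown above, which can vary slightly between Hadoop versions:

hadoop fsck /path/to/directory | awk '
  /Total size/                { size = $3 }
  /Average block replication/ { repl = $4 }
  END { printf "%.0f bytes consumed on disk\n", size * repl }'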
hadoop fs -count -q
hadoop fs -count -q /path/to/directory
      QUOTA  REMAINING_QUOTA     SPACE_QUOTA  REMAINING_SPACE_QUOTA
       none              inf  54975581388800          5277747062870
  DIR_COUNT       FILE_COUNT    CONTENT_SIZE  FILE_NAME
       3922           418464  16565944775310  hdfs://master:54310/path/to/directory
The output has been wrapped onto two rows here, with column labels added, so you can see more easily that:
The seventh column, CONTENT_SIZE, 16565944775310 bytes (15.1 TB), is the effective HDFS space usage.
The third column, SPACE_QUOTA, 54975581388800 bytes (50 TB), is the raw HDFS disk quota, i.e. the total raw space this directory may consume on disk: HDFS may use up to 50 TB.
The fourth column, REMAINING_SPACE_QUOTA, 5277747062870 bytes (4.8 TB), is the remaining raw HDFS disk quota.
As you can see, hadoop fsck and hadoop fs -dus report the effective space occupied by the data, which equals its size on a local filesystem.
The third and fourth columns of hadoop fs -count -q, by contrast, indirectly report the disk space actually consumed on the cluster's nodes, since every HDFS block is stored as 3 replicas (3 being the value hadoop fsck reported above as Average block replication: 3.0). So we can do a subtraction to figure out the actual disk footprint:

54975581388800 bytes (50 TB) - 5277747062870 bytes (4.8 TB) = 49697834325930 bytes (45.2 TB)
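The same subtraction as a one-liner sketch over the fs -count -q output (this assumes a space quota has actually been set on the directory, so that columns 3 and 4 hold numbers rather than none and inf):

hadoop fs -count -q /path/to/directory | awk '{ print $3 - $4, "bytes consumed on disk" }'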
As you can see, Hadoop's space quota always counts raw HDFS disk consumption. So if you have a 1 TB quota and set replication to 10, you can store only 100 GB worth of files; if replication is 3, you can store a 333 GB file.
That is how Hadoop's space quota is calculated. Note that Hadoop applies a default replication factor of 3 when you do not set one explicitly, which again means the disk quota always counts the raw HDFS disk space consumed, replicas included.
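To see quotas in action, here is a hypothetical sketch using the standard dfsadmin quota commands; /quota/testdir is a made-up path for illustration:

hadoop fs -mkdir /quota/testdir
# set a space quota of 1 TB (1099511627776 bytes); the quota counts raw bytes, replicas included
hadoop dfsadmin -setSpaceQuota 1099511627776 /quota/testdir
# with replication = 3, writes start failing once roughly 333 GB of file data is stored
hadoop fs -count -q /quota/testdir
# remove the quota again
hadoop dfsadmin -clrSpaceQuota /quota/testdir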
Local filesystem size    hadoop fsck / hadoop fs -dus    hadoop fs -count -q (replication factor = 3)
100 GB                   100 GB                          300 GB
The original article is here: http://www.michael-noll.com/blog/2011/10/20/understanding-hdfs-quotas-and-hadoop-fs-and-fsck-tools/