Our project environment produces a large number of small files. Besides the NameNode memory they consume, we were also concerned about how much physical space they occupy, so we tested how small files actually use physical space in HDFS.
Prerequisites: HDFS block size is 64 MB
Replication factor: 3 copies per file
1. Batch-generate small files (each 20 MB)
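The generation step survives only as a screenshot; a minimal sketch of producing 26 zero-filled 20 MB files (the directory and `test_NN` file names are assumptions, not the original script) might look like:

```python
import os

def make_small_files(out_dir, count=26, size_mb=20):
    """Create `count` files of `size_mb` MB each, filled with zero bytes."""
    os.makedirs(out_dir, exist_ok=True)
    for i in range(count):
        path = os.path.join(out_dir, f"test_{i:02d}")  # hypothetical naming scheme
        with open(path, "wb") as f:
            f.write(b"\0" * (size_mb * 1024 * 1024))
    return out_dir

# make_small_files("/tmp/smallfiles")  # 26 files x 20 MB
```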
2. Record DFS space usage before the test
At this point, DFS used space is 50.04 GB.
3. LOAD the file testaa into Hive
4. Review DFS space usage again
5. View the space occupied by the blocks of the file testaa
The total size shows that testaa occupies 20 MB of physical space.
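The exact command in the screenshot is not preserved; `hdfs fsck <path> -files -blocks` typically reports each block with a `len=` field, so the per-file block usage can be summed from that output. A small parser (the sample line below is illustrative; fsck output format varies by Hadoop version):

```python
import re

def total_block_bytes(fsck_output):
    """Sum the len= field of every block line in `hdfs fsck -files -blocks` output."""
    return sum(int(n) for n in re.findall(r"\blen=(\d+)", fsck_output))

# Illustrative fsck excerpt: one 20 MB file stored as a single block
sample = "0. BP-1234-127.0.0.1-1:blk_1073741825_1001 len=20971520 repl=3"
print(total_block_bytes(sample) / 1024 / 1024)  # 20.0
```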
6. LOAD another file, testab
7. View DFS space usage
DFS used space is now 50.16 GB.
8. View the space occupied by the blocks of the file testab
testab likewise occupies 20 MB of physical space.
9. Bulk loading of small files
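The batch-load script itself appears only in the screenshots; a sketch that emits one HiveQL LOAD statement per generated file (the table name `testtable` and the file paths are assumptions) could look like:

```python
def build_load_statements(paths, table):
    """Build one HiveQL LOAD DATA statement per local file."""
    return [
        f"LOAD DATA LOCAL INPATH '{p}' INTO TABLE {table};"
        for p in paths
    ]

# Hypothetical paths matching the 26 generated files
paths = [f"/tmp/smallfiles/test_{i:02d}" for i in range(26)]
stmts = build_load_statements(paths, "testtable")
print(len(stmts))  # 26
print(stmts[0])
```

In practice the statements would be fed to the Hive CLI (e.g. `hive -e "..."`) in a loop.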
10. Execute the Script
11. Review DFS space usage again
From the tests above:
DFS used space before loading: 50.04 GB
DFS used space after loading the 26 20 MB files: 51.58 GB
Calculation:
Per-file DFS space usage:
(51.58 GB - 50.04 GB) * 1024 / 26 = 60.65 MB
Expected physical space per file:
20 MB * 3 replicas = 60 MB
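The arithmetic above can be checked directly, using the measured values:

```python
start_gb, end_gb, files = 50.04, 51.58, 26

# Measured DFS space consumed per file, converted from GB to MB
per_file_mb = (end_gb - start_gb) * 1024 / files
print(round(per_file_mb, 2))  # 60.65

# Expected physical usage: 20 MB of data x 3 replicas
print(20 * 3)  # 60
```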
Conclusion:
A file smaller than the HDFS block size does not occupy a full block of physical space. However, a large number of small files consumes more NameNode memory (which records metadata such as file and block locations), and processing many small files can also incur significant network overhead.
This article is from the "One step, one step" blog; please keep this source: http://snglw.blog.51cto.com/5832405/1643587
HDFS small file physical space occupancy verification