The problem arises when the nodes in a data cluster have storage disks of different sizes: after a period of time, the smaller disks run short of space.
In fact, configuring a disk storage strategy early on would have solved the problem. Some posts online claim that this strategy has no effect, yet it does work in Hadoop 2.0.1, the version that corresponds to CDH 4.6.
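To make the idea of a disk storage strategy concrete, here is a minimal sketch, assuming the strategy decides which disk on a DataNode receives the next replica. The classes `Volume` and `MostFreeSpaceChooser` are hypothetical and are not Hadoop classes; the rule shown (always pick the disk with the most free space) is a simplification for illustration, not Hadoop's own implementation. In Hadoop releases that support it, the real choice is configured through `dfs.datanode.fsdataset.volume.choosing.policy`; whether that is the exact setting meant here is an assumption.

```java
import java.util.List;

// Hypothetical model of one data disk (volume) on a DataNode; not a Hadoop class.
final class Volume {
    final String path;
    final long availableBytes;

    Volume(String path, long availableBytes) {
        this.path = path;
        this.availableBytes = availableBytes;
    }
}

// Toy strategy (an assumption for illustration): place the next replica
// on the volume that currently has the most free space.
final class MostFreeSpaceChooser {
    static Volume choose(List<Volume> volumes, long blockSize) {
        Volume best = null;
        for (Volume v : volumes) {
            // Skip volumes that cannot hold the block at all.
            if (v.availableBytes < blockSize) {
                continue;
            }
            if (best == null || v.availableBytes > best.availableBytes) {
                best = v;
            }
        }
        if (best == null) {
            throw new IllegalStateException("no volume has room for the block");
        }
        return best;
    }
}
```

With a strategy of this kind, a small disk stops receiving new replicas once the larger disks have more free space, which is the behavior needed to keep the small disks from filling up first.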
To find an accurate entry point into the code, refer to the following Hadoop design document.
Reference
Append/Hflush/Read design document for the HDFS file system in Hadoop:
On a DataNode (DN) disk, each DN has three directories: current, tmp, and rbw. current contains finalized replicas, tmp contains temporary replicas, and rbw contains rbw, rwr, and rur replicas. When a replica is first created at the request of a DFS client, it is placed in rbw. When it is first created during block replication or cluster balancing, it is placed in tmp. Once a replica has been finalized, it is moved to current. When a DN restarts, replicas in tmp are deleted, replicas in rbw are loaded in the rwr state, and replicas in current are loaded in the finalized state.
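The placement rules above can be summarized as a small state model. This is a minimal sketch whose enum and method names are hypothetical; it only encodes the rules quoted from the design document and is not how Hadoop's own classes are written.

```java
// Hypothetical names; this models the directory rules described above.
enum ReplicaDir { CURRENT, TMP, RBW }

enum ReplicaOrigin { CLIENT_WRITE, REPLICATION_OR_BALANCE }

enum RestartOutcome { DELETED, LOADED_AS_RWR, LOADED_AS_FINALIZED }

final class ReplicaLifecycle {
    // A replica written by a DFS client starts in rbw; one created by
    // block replication or cluster balancing starts in tmp.
    static ReplicaDir initialDir(ReplicaOrigin origin) {
        return origin == ReplicaOrigin.CLIENT_WRITE ? ReplicaDir.RBW : ReplicaDir.TMP;
    }

    // Finalizing a replica moves it to current, whichever directory it was in.
    static ReplicaDir afterFinalize(ReplicaDir dir) {
        return ReplicaDir.CURRENT;
    }

    // What happens to a replica in each directory when the DataNode restarts.
    static RestartOutcome onRestart(ReplicaDir dir) {
        switch (dir) {
            case TMP:
                return RestartOutcome.DELETED;
            case RBW:
                return RestartOutcome.LOADED_AS_RWR;
            default:
                return RestartOutcome.LOADED_AS_FINALIZED;
        }
    }
}
```

For example, `ReplicaLifecycle.initialDir(ReplicaOrigin.CLIENT_WRITE)` yields `RBW`, and a replica still in `TMP` at restart is reported as `DELETED`.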
We start from the point where files are created in tmp or rbw.
See the Java class BlockPoolSlice.
According to the class description, BlockPoolSlice is the foundation for creating data blocks in the cluster.