Each storage path on a DataNode holds data blocks belonging to different files. HDFS abstracts a node's storage path into the StorageDirectory class.
StorageDirectory files
The StorageDirectory class contains three attributes:
File root; // a local path configured in dfs.data.dir
FileLock lock; // exclusive lock on in_use.lock; synchronizes node operations on the storage directory
StorageDirType dirType; // NameNode or DataNode
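To make the role of the three attributes concrete, here is a minimal sketch of a storage-directory object that takes an exclusive lock on in_use.lock. The class name StorageDirSketch, its DirType enum, and the tryLock()/unlock() methods are assumptions made for illustration and are not the HDFS source; only the attribute roles and the in_use.lock file name come from the text above.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// Hypothetical, simplified stand-in for StorageDirectory (not the HDFS class).
class StorageDirSketch {
    enum DirType { NAME_NODE, DATA_NODE }

    final File root;                   // local storage path
    final DirType dirType;             // which kind of node owns this directory
    private RandomAccessFile lockFile; // open handle on root/in_use.lock
    private FileLock lock;             // exclusive lock, null while not held

    StorageDirSketch(File root, DirType dirType) {
        this.root = root;
        this.dirType = dirType;
    }

    // Subdirectory in which the node keeps its live data.
    File currentDir() {
        return new File(root, "current");
    }

    // Tries to take the exclusive lock; returns false if someone else holds it.
    boolean tryLock() {
        if (lock != null) {
            return true; // already held by this object
        }
        try {
            lockFile = new RandomAccessFile(new File(root, "in_use.lock"), "rws");
            lock = lockFile.getChannel().tryLock(); // null if another process holds it
        } catch (OverlappingFileLockException | IOException e) {
            lock = null; // held elsewhere in this JVM, or the file is unusable
        }
        if (lock == null) {
            closeQuietly();
            return false;
        }
        return true;
    }

    // Releases the lock (if held) and closes the lock file.
    void unlock() {
        try {
            if (lock != null) {
                lock.release();
            }
        } catch (IOException ignored) {
            // nothing useful to do in a sketch
        }
        lock = null;
        closeQuietly();
    }

    private void closeQuietly() {
        try {
            if (lockFile != null) {
                lockFile.close();
            }
        } catch (IOException ignored) {
            // nothing useful to do in a sketch
        }
        lockFile = null;
    }
}
```

With two StorageDirSketch objects pointing at the same root, the second tryLock() returns false until the first object calls unlock(), which is the kind of mutual exclusion the lock attribute is meant to provide.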
The file structure under the root directory was introduced in the previous article. However, the files stored in the DataNode and NameNode directories differ.
Whether the node is a NameNode or a DataNode, StorageDirectory saves the data the node produces in its subdirectory current/. To ensure data consistency, current/ also contains a VERSION file. The content of the VERSION file differs slightly between the storage directory of a NameNode and that of a DataNode.
[Figure: VERSION file in the NameNode storage directory]
[Figure: VERSION file in the DataNode storage directory]
In an HDFS cluster, the namespaceID of every storage path on every DataNode must match the namespaceID of the NameNode; otherwise the DataNode cannot work with that NameNode. The NameNode's namespaceID is generated when the NameNode is formatted, so every time the NameNode is formatted a new namespaceID is generated. The storageID is assigned by the NameNode when a DataNode registers with it for the first time, and it is the same for all storage paths of one DataNode. Of course, the data files stored in the storage paths of the NameNode and the DataNode are also different.
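The namespaceID rule can be sketched as a comparison of two VERSION files. A VERSION file is a plain key=value properties file, so java.util.Properties can read it. The class VersionCheckSketch, its method names, and every value in the two sample strings are invented for illustration; the field names follow releases from the 0.19 era and may differ in other versions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of HDFS.
class VersionCheckSketch {
    // Invented sample content; real files carry cluster-specific values.
    static final String SAMPLE_NAMENODE =
        "namespaceID=1234567890\n"
        + "cTime=0\n"
        + "storageType=NAME_NODE\n"
        + "layoutVersion=-18\n";

    // A DataNode's file additionally records its storageID.
    static final String SAMPLE_DATANODE =
        "namespaceID=1234567890\n"
        + "storageID=DS-EXAMPLE\n"
        + "cTime=0\n"
        + "storageType=DATA_NODE\n"
        + "layoutVersion=-18\n";

    // Parses VERSION-style key=value text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True when both files carry the same, non-missing namespaceID.
    static boolean sameNamespace(Properties nameNode, Properties dataNode) {
        String expected = nameNode.getProperty("namespaceID");
        return expected != null && expected.equals(dataNode.getProperty("namespaceID"));
    }
}
```

Calling sameNamespace(parse(SAMPLE_NAMENODE), parse(SAMPLE_DATANODE)) returns true for the samples above. After the NameNode is reformatted its namespaceID changes, and the same check against a DataNode's old VERSION file would return false.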
StorageDirectory transactions
In addition to saving node data, StorageDirectory provides coarse-grained transactional operations on the stored data, such as backup, recovery, and commit. So how does StorageDirectory implement these transactional operations?
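One way to picture these operations is as sequences of directory renames, where a temporary directory marks an operation in progress. The sketch below is a simplification under that assumption. The class UpgradeOpsSketch and its helpers are hypothetical, the directory names previous, previous.tmp, removed.tmp, and finalized.tmp are recalled from the Hadoop source of that era rather than taken from this article, and the real code also writes a new VERSION file and links block files during an upgrade.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of rename-based backup, rollback, and commit.
class UpgradeOpsSketch {
    // Backup: keep the old state as previous/ and start a fresh current/.
    static void upgrade(File root) {
        File cur = new File(root, "current");
        File prevTmp = new File(root, "previous.tmp");
        rename(cur, prevTmp);                        // 1. set the old state aside
        if (!cur.mkdir()) {                          // 2. new-format data would be written here
            throw fail("mkdir " + cur);
        }
        rename(prevTmp, new File(root, "previous")); // 3. the backup is now complete
    }

    // Recovery: discard current/ and restore the backup.
    static void rollback(File root) {
        File cur = new File(root, "current");
        File removedTmp = new File(root, "removed.tmp");
        rename(cur, removedTmp);                     // 1. move the new state out of the way
        rename(new File(root, "previous"), cur);     // 2. the backup becomes current/ again
        deleteTree(removedTmp);                      // 3. drop the discarded state
    }

    // Commit: the backup is no longer needed.
    static void finalizeUpgrade(File root) {
        File finalizedTmp = new File(root, "finalized.tmp");
        rename(new File(root, "previous"), finalizedTmp);
        deleteTree(finalizedTmp);
    }

    static void rename(File from, File to) {
        if (!from.renameTo(to)) {
            throw fail("rename " + from + " -> " + to);
        }
    }

    // Deletes a file or a whole directory tree; a missing path is a no-op.
    static void deleteTree(File f) {
        File[] children = f.listFiles(); // null for plain files and missing paths
        if (children != null) {
            for (File child : children) {
                deleteTree(child);
            }
        }
        if (f.exists() && !f.delete()) {
            throw fail("delete " + f);
        }
    }

    static UncheckedIOException fail(String what) {
        return new UncheckedIOException(new IOException(what + " failed"));
    }
}
```

Because every step is a single rename or a delete of a temporary directory, a crash at any point leaves at most one temporary directory behind, and its name shows which operation was interrupted.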
State analysis and recovery
If a node suddenly goes down while performing one of the operations above, how does it resume the interrupted operation at the next startup? Before restoring the data it stores, StorageDirectory first analyzes its own state (the analyzeStorage() method) and then performs the recovery operation that corresponds to that state (the doRecover() method). Each state found by the analysis maps to one recovery operation.
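The analyze-then-recover idea can be sketched as follows, assuming that a leftover temporary directory identifies the interrupted operation and that the presence of current/VERSION tells whether the new state was completed. The class RecoverySketch, its State enum, and its helper methods are hypothetical simplifications. They leave out the checkpoint-related states and the consistency checks of the real analyzeStorage(), and the directory names are recalled from the Hadoop source of that era rather than taken from this article.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, simplified model of state analysis and recovery.
class RecoverySketch {
    enum State {
        NOT_FORMATTED, NORMAL,
        COMPLETE_UPGRADE, RECOVER_UPGRADE,
        COMPLETE_ROLLBACK, RECOVER_ROLLBACK,
        COMPLETE_FINALIZE
    }

    // Derives the state from which directories exist under root.
    static State analyze(File root) {
        boolean hasCurrent = new File(root, "current/VERSION").exists();
        if (new File(root, "previous.tmp").exists()) {
            return hasCurrent ? State.COMPLETE_UPGRADE : State.RECOVER_UPGRADE;
        }
        if (new File(root, "removed.tmp").exists()) {
            return hasCurrent ? State.COMPLETE_ROLLBACK : State.RECOVER_ROLLBACK;
        }
        if (new File(root, "finalized.tmp").exists()) {
            return State.COMPLETE_FINALIZE;
        }
        return hasCurrent ? State.NORMAL : State.NOT_FORMATTED;
    }

    // Finishes or undoes the interrupted operation.
    static void recover(File root, State state) {
        File cur = new File(root, "current");
        switch (state) {
            case COMPLETE_UPGRADE:  // new state is complete: publish the backup
                rename(new File(root, "previous.tmp"), new File(root, "previous"));
                break;
            case RECOVER_UPGRADE:   // new state is incomplete: go back to the old one
                deleteTree(cur);
                rename(new File(root, "previous.tmp"), cur);
                break;
            case COMPLETE_ROLLBACK: // backup already restored: drop the leftovers
                deleteTree(new File(root, "removed.tmp"));
                break;
            case RECOVER_ROLLBACK:  // backup not restored yet: undo the rollback
                rename(new File(root, "removed.tmp"), cur);
                break;
            case COMPLETE_FINALIZE: // finish deleting the old backup
                deleteTree(new File(root, "finalized.tmp"));
                break;
            default:                // NORMAL or NOT_FORMATTED: nothing to recover
                break;
        }
    }

    static void rename(File from, File to) {
        if (!from.renameTo(to)) {
            throw new UncheckedIOException(new IOException("rename failed: " + from + " -> " + to));
        }
    }

    // Deletes a file or a whole directory tree; a missing path is a no-op.
    static void deleteTree(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteTree(child);
            }
        }
        if (f.exists() && !f.delete()) {
            throw new UncheckedIOException(new IOException("delete failed: " + f));
        }
    }

    // Helper for experiments: creates an empty file and its parent directories.
    static void touch(File f) {
        f.getParentFile().mkdirs();
        try {
            f.createNewFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, a root that contains previous.tmp/VERSION and an empty current/ is analyzed as RECOVER_UPGRADE. Recovery then removes the incomplete current/ and renames previous.tmp back to current/, after which analyze() reports NORMAL.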
Hadoop upgrade and rollback
When Hadoop is upgraded on an existing cluster, as with any software upgrade, there may be new bugs or incompatible changes that affect existing applications. In any non-trivial HDFS installation, losing data is not an option, let alone rebuilding HDFS from scratch. HDFS therefore allows the administrator to return to the previous Hadoop version and roll the cluster back to its state before the upgrade. More details about HDFS upgrades can be found on the upgrade wiki. HDFS can hold only one such backup at a time. Before upgrading, the administrator needs to remove the existing backup with the bin/hadoop dfsadmin -finalizeUpgrade command. The typical upgrade procedure is as follows:
1. Before upgrading the Hadoop software, check whether a backup already exists. If it does, finalize the previous upgrade to delete that backup. The dfsadmin -upgradeProgress status command tells you whether the cluster needs to be finalized.
2. Stop the cluster and deploy the new version of Hadoop.
3. Start the new version with the -upgrade option (bin/start-dfs.sh -upgrade).
4. In most cases the cluster runs normally. Once the new HDFS is considered to be working well (perhaps after a few days of operation), finalize the upgrade. Note: until the upgrade is finalized, deleting files that existed before the upgrade does not actually free disk space on the DataNodes.
5. If you need to return to the old version:
   a. Stop the cluster and deploy the old version of Hadoop.
   b. Start the cluster with the -rollback option (bin/start-dfs.sh -rollback).
References
http://blog.jeoygin.org/2012/03/hdfs-source-analysis-3-datanode-storage.html
http://caibinbupt.iteye.com/blog/282580
http://caibinbupt.iteye.com/blog/282735
http://caibinbupt.iteye.com/blog/283480
http://hadoop.apache.org/common/docs/r0.19.2/cn/hdfs_user_guide.html#%E5%8D%87%E7%BA%A7%E5%92%8C%E5%9B%9E%E6%BB%9A