[Reading Hadoop source code] [8] DataNode StorageDirectory


Each storage path on a DataNode holds blocks of file data. HDFS abstracts a node's storage path into the StorageDirectory class.

StorageDirectory Fields

The StorageDirectory class contains three fields:

    File root;              // a local path configured in dfs.data.dir
    FileLock lock;          // exclusive lock (in_use.lock) that serializes node operations on the storage directory
    StorageDirType dirType; // NAME_NODE or DATA_NODE
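The in_use.lock file is what makes the exclusive lock work: a second process that tries to claim the same directory fails to acquire the file lock. A minimal sketch of the idea (not Hadoop's actual code; class and method names here are illustrative):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Sketch of guarding a storage directory with an exclusive in_use.lock
// file: tryLock() returns null when another process already holds the
// lock, so two daemons cannot operate on the same directory at once.
public class DirLockSketch {
    public static FileLock tryLockDir(File root) throws Exception {
        File lockFile = new File(root, "in_use.lock");
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
        return raf.getChannel().tryLock(); // null => held by another process
    }

    public static void main(String[] args) throws Exception {
        File root = new File(System.getProperty("java.io.tmpdir"), "storage-demo");
        root.mkdirs();
        FileLock lock = tryLockDir(root);
        System.out.println(lock != null); // first acquisition succeeds
        lock.release();
    }
}
```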

The file structure under the root directory was introduced in the previous article. However, the files stored under the DataNode and NameNode directories differ.

Whether on a NameNode or a DataNode, a StorageDirectory saves the node's output data under its current/ subdirectory. To keep this data consistent, current/ also contains a VERSION file. The contents of this VERSION file differ slightly between NameNode and DataNode storage directories, as shown below.

VERSION file in the NameNode storage directory:
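A representative example from a 0.19-era cluster (the IDs and timestamp are illustrative; actual values will differ):

```
#Tue Mar 10 10:27:35 CST 2009
namespaceID=1461171077
cTime=0
storageType=NAME_NODE
layoutVersion=-18
```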

VERSION file in the DataNode storage directory:
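Again an illustrative example; note the extra storageID line, which the NameNode directory does not have:

```
#Tue Mar 10 10:27:35 CST 2009
namespaceID=1461171077
storageID=DS-1459911164-127.0.0.1-50010-1236652632032
cTime=0
storageType=DATA_NODE
layoutVersion=-18
```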

In an HDFS cluster, the namespaceID of every storage path on every DataNode must match the NameNode's namespaceID; otherwise the DataNode will fail to start. The NameNode's namespaceID is generated when the NameNode is formatted, so every time the NameNode is reformatted, a new namespaceID is created. The storageID, by contrast, is assigned by the NameNode the first time a DataNode registers with it, and identifies that DataNode in the distributed storage; all storage paths on one DataNode share the same storageID. Naturally, the data files kept in NameNode and DataNode storage paths also differ.

 

StorageDirectory Transactions

Besides saving node data, StorageDirectory also provides coarse-grained, transaction-like operations on the stored data, such as backup, rollback, and commit. So how does StorageDirectory implement these transactional operations?
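The core trick is that every operation is built from directory renames, which the filesystem performs atomically; intermediate *.tmp directories mark an operation that is in flight. A simplified sketch of an upgrade under these assumptions (not the actual Hadoop implementation; the directory names current/, previous/, previous.tmp/ follow HDFS conventions, everything else is illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the rename-based "transaction" a StorageDirectory uses for
// an upgrade. A crash at any point leaves a recognizable directory
// layout that the recovery logic can detect and finish or undo.
public class UpgradeSketch {
    public static void upgrade(Path root) throws IOException {
        Path current = root.resolve("current");
        Path prevTmp = root.resolve("previous.tmp");
        // 1. Back up the live data. A crash after this rename leaves
        //    previous.tmp behind, which recovery can detect.
        Files.move(current, prevTmp);
        // 2. Write the new-layout data into a fresh current/.
        Files.createDirectory(current);
        Files.write(current.resolve("VERSION"), "layoutVersion=-19\n".getBytes());
        // 3. Commit: one atomic rename marks the upgrade as complete.
        Files.move(prevTmp, root.resolve("previous"));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("storage");
        Path current = root.resolve("current");
        Files.createDirectory(current);
        Files.write(current.resolve("VERSION"), "layoutVersion=-18\n".getBytes());
        upgrade(root);
        System.out.println(Files.exists(root.resolve("previous")));
        System.out.println(new String(
            Files.readAllBytes(current.resolve("VERSION"))).trim());
    }
}
```

Rollback is the mirror image: rename current/ aside to removed.tmp/, rename previous/ back to current/, then delete the leftover.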

 

Status Analysis and Recovery

If a node suddenly goes down while performing one of the operations above, how does it resume the interrupted operation at the next startup? Before restoring its stored data, a StorageDirectory first analyzes its own state (the analyzeStorage() method), then performs the corresponding recovery (the doRecover() method) based on that state. The analysis process and the matching recovery actions are as follows:
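The analysis above amounts to inspecting which *.tmp marker directories a crash left behind and mapping them to a state. A simplified sketch of that mapping (the real analyzeStorage() in org.apache.hadoop.hdfs.server.common.Storage distinguishes more states; the logic here is an illustrative reduction):

```java
import java.util.Set;

// Sketch of mapping the directory layout found at startup onto a
// recovery state, in the spirit of analyzeStorage()/doRecover().
public class RecoverySketch {
    enum State { NORMAL, COMPLETE_UPGRADE, RECOVER_UPGRADE,
                 COMPLETE_ROLLBACK, RECOVER_ROLLBACK, COMPLETE_FINALIZE }

    // 'dirs' holds the directory names found under the storage root.
    static State analyze(Set<String> dirs) {
        boolean hasCurrent = dirs.contains("current");
        if (dirs.contains("previous.tmp"))
            // Crash during an upgrade: finish it if the new current/
            // already exists, otherwise undo the half-done rename.
            return hasCurrent ? State.COMPLETE_UPGRADE : State.RECOVER_UPGRADE;
        if (dirs.contains("removed.tmp"))
            // Crash during a rollback: complete or undo it analogously.
            return hasCurrent ? State.COMPLETE_ROLLBACK : State.RECOVER_ROLLBACK;
        if (dirs.contains("finalized.tmp"))
            return State.COMPLETE_FINALIZE; // just delete the leftover
        return State.NORMAL;                // nothing to recover
    }

    public static void main(String[] args) {
        System.out.println(analyze(Set.of("current", "previous.tmp")));
        System.out.println(analyze(Set.of("previous.tmp")));
        System.out.println(analyze(Set.of("current")));
    }
}
```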

 

Hadoop Upgrade and Rollback

When Hadoop is upgraded on an existing cluster, as with any software upgrade, new bugs or incompatible changes may affect existing applications. No practical HDFS deployment can tolerate data loss, let alone rebuilding HDFS from scratch. HDFS therefore allows the administrator to return to the previous Hadoop version and roll the cluster back to the state it was in before the upgrade. More details about HDFS upgrades can be found on the upgrade wiki. HDFS keeps at most one such backup at a time, so before an upgrade the administrator must remove the existing backup with the bin/hadoop dfsadmin -finalizeUpgrade command. The general upgrade procedure is as follows:

1. Before upgrading the Hadoop software, check whether a backup exists; if so, finalize the previous upgrade to delete the backup. The dfsadmin -upgradeProgress status command shows whether a cluster still needs to be finalized.

2. Stop the cluster and deploy the new version of Hadoop.

3. Start the new version with the -upgrade option (bin/start-dfs.sh -upgrade).

4. In most cases the cluster runs normally. Once you consider the new HDFS stable (perhaps after a few days of operation), finalize the upgrade. Note: until the upgrade is finalized on a cluster, deleting files that existed before the upgrade does not actually free disk space on the DataNodes.

5. If you need to return to the old version:

  • Stop the cluster and deploy the old version of Hadoop.
  • Start the cluster with the rollback option (bin/start-dfs.sh -rollback).

Reference URLs

http://blog.jeoygin.org/2012/03/hdfs-source-analysis-3-datanode-storage.html

http://caibinbupt.iteye.com/blog/282580

http://caibinbupt.iteye.com/blog/282735

http://caibinbupt.iteye.com/blog/283480

http://hadoop.apache.org/common/docs/r0.19.2/cn/hdfs_user_guide.html#%E5%8D%87%E7%BA%A7%E5%92%8C%E5%9B%9E%E6%BB%9A
