Hadoop problem: The Datanode process is gone

Source: Internet
Author: User

Description of the problem: the DataNode process is missing

I recently configured Hadoop, and after starting it I ran the jps command:

The DataNode process is not listed, yet Hadoop appears to work normally. Strange, isn't it?

After searching Baidu and Google, I came to the following conclusion:

Both before and after starting Hadoop, I had run the following command several times to format the NameNode:

hadoop namenode -format

Formatting alone does not directly cause this problem. It appears when you format, start Hadoop, stop Hadoop, format again, and then start Hadoop once more: this time the DataNode process is missing from the jps output, yet Hadoop still appears to work, just as in my case. The root cause is that the version numbers of the NameNode and the DataNode no longer match. The problem can occur in fully distributed mode as well as in pseudo-distributed mode; it is demonstrated here in pseudo-distributed mode.

Here are the contents of the two files in the normal (working) state.

NameNode Version File information:

namespaceID=51628800
clusterID=CID-97bb16dc-c439-427c-9841-5e6e4667cb65
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1918730739-172.17.241.131-1526803461127
layoutVersion=-63

DataNode Version File information:

storageID=DS-4281731b-7a44-4c86-8844-e1927a4fc966
clusterID=CID-97bb16dc-c439-427c-9841-5e6e4667cb65
cTime=0
datanodeUuid=197c3d68-454b-4287-a5e5-90c01ed9be53
storageType=DATA_NODE
layoutVersion=-56

The "version number" in question is the clusterID value. In the two files above the clusterID is identical, which shows that this NameNode and DataNode belong to the same cluster.
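The comparison can also be done programmatically. The following is only a minimal sketch: parse_version and same_cluster are hypothetical helper names of my own, not part of Hadoop, and the two strings simply repeat the file contents shown above.

```python
def parse_version(text):
    """Parse the key=value lines of a Hadoop VERSION file into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the comment line Hadoop writes at the top.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def same_cluster(namenode_version, datanode_version):
    """Return True when both VERSION texts carry the same clusterID."""
    nn_id = parse_version(namenode_version).get("clusterID")
    dn_id = parse_version(datanode_version).get("clusterID")
    return nn_id is not None and nn_id == dn_id


NAMENODE_VERSION = """\
namespaceID=51628800
clusterID=CID-97bb16dc-c439-427c-9841-5e6e4667cb65
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1918730739-172.17.241.131-1526803461127
layoutVersion=-63
"""

DATANODE_VERSION = """\
storageID=DS-4281731b-7a44-4c86-8844-e1927a4fc966
clusterID=CID-97bb16dc-c439-427c-9841-5e6e4667cb65
cTime=0
datanodeUuid=197c3d68-454b-4287-a5e5-90c01ed9be53
storageType=DATA_NODE
layoutVersion=-56
"""

print(same_cluster(NAMENODE_VERSION, DATANODE_VERSION))  # True
```

On a real installation you would read the two texts from the VERSION files on disk instead of from string literals.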

So where are these two files stored? They sit under the directory specified by the following entry in the Hadoop configuration file core-site.xml.

<property>
  <!-- Specifies the directory in which Hadoop generates its runtime files -->
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop-2.7.1/tmp</value>
</property>

To find the files, I start from this configured path, going first into the tmp directory and then down to the two VERSION files.
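As a rough sketch of the lookup, the snippet below reads hadoop.tmp.dir from a core-site.xml fragment and builds the two VERSION file paths. It assumes that dfs.namenode.name.dir and dfs.datanode.data.dir are left at their defaults (dfs/name and dfs/data under hadoop.tmp.dir); if your hdfs-site.xml overrides them, the files live elsewhere. get_property and version_file_paths are hypothetical helper names.

```python
import posixpath
import xml.etree.ElementTree as ET

# A core-site.xml fragment wrapped in the usual <configuration> root element.
CORE_SITE = """\
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop-2.7.1/tmp</value>
  </property>
</configuration>
"""


def get_property(xml_text, name):
    """Return the value of the named Hadoop property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def version_file_paths(tmp_dir):
    """Build the VERSION paths, ASSUMING the default dfs/name and dfs/data layout."""
    return {
        "namenode": posixpath.join(tmp_dir, "dfs", "name", "current", "VERSION"),
        "datanode": posixpath.join(tmp_dir, "dfs", "data", "current", "VERSION"),
    }


tmp_dir = get_property(CORE_SITE, "hadoop.tmp.dir")
for role, path in version_file_paths(tmp_dir).items():
    print(role, path)
```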

Problem analysis

The following is an analysis of this issue:

When you format for the first time and then start Hadoop, there is no problem, because everything is new. In fact, formatting the NameNode several times before Hadoop is first started is harmless: until Hadoop has started, the DataNode VERSION file does not exist. The DataNode's information, including its VERSION file, is written to the specified directory only after Hadoop starts, and at that point a one-to-one relationship is established between the NameNode and the DataNode.

If you then stop Hadoop and format the NameNode a second time, the NameNode's VERSION information is rewritten, and its content will certainly differ from before. This causes the clusterID mismatch described above. When you start Hadoop again, everything seems to work, but the DataNode process is missing from the jps output, which naturally alarms any programmer.

Solutions

Option 1

First, before formatting, empty the directory where Hadoop stores its data, that is, the tmp directory in my example. You can also delete the directory outright and create a new one.

Then format the NameNode. The NameNode and DataNode information generated afterwards is all new and belongs to the same cluster, so the problem is solved. This is the simplest and most effective method.
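The "delete and recreate" step can be sketched as follows. This is only an illustration run against a throwaway temporary directory; reset_directory is a hypothetical helper name. On a real machine you would stop Hadoop first, point the function at your own hadoop.tmp.dir, and accept that every HDFS block stored there is lost.

```python
import os
import shutil
import tempfile


def reset_directory(path):
    """Delete the directory with everything in it, then recreate it empty."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.makedirs(path)


# Demonstration on a throwaway directory, NOT on a real Hadoop tmp directory.
demo_tmp = os.path.join(tempfile.mkdtemp(), "tmp")
os.makedirs(os.path.join(demo_tmp, "dfs", "data", "current"))

reset_directory(demo_tmp)
print(os.listdir(demo_tmp))  # []
```

After the directory is empty, format the NameNode with the hadoop namenode -format command shown earlier and start Hadoop again.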

Option 2

If the data is still there and you do not want to erase it, this option is for you.

Since the problem is a mismatched version number, we can fix the version number alone. After formatting the NameNode, find its VERSION file and copy the clusterID value from it. Then find the DataNode VERSION file, replace the clusterID in it with the copied value, save, and restart Hadoop. It can then be used normally. The location of these files was described above, so I will not repeat it here.
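The copy-and-replace step can be sketched like this. It works on the text of the two VERSION files; sync_cluster_id is a hypothetical helper name, and the all-zero clusterID is a made-up placeholder standing in for whatever value a re-format actually generates.

```python
import re


def sync_cluster_id(namenode_version, datanode_version):
    """Return the DataNode VERSION text with the NameNode's clusterID copied in."""
    match = re.search(r"^clusterID=(.*)$", namenode_version, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no clusterID line in the NameNode VERSION text")
    cluster_id = match.group(1).strip()

    # A function is used as the replacement so the ID is inserted literally.
    updated, count = re.subn(
        r"^clusterID=.*$",
        lambda _m: "clusterID=" + cluster_id,
        datanode_version,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("expected exactly one clusterID line in the DataNode VERSION text")
    return updated


# NameNode after a re-format: the clusterID below is a made-up placeholder.
NEW_NAMENODE_VERSION = """\
namespaceID=51628800
clusterID=CID-00000000-0000-0000-0000-000000000000
cTime=0
storageType=NAME_NODE
"""

# DataNode still carrying the old clusterID from before the re-format.
OLD_DATANODE_VERSION = """\
storageID=DS-4281731b-7a44-4c86-8844-e1927a4fc966
clusterID=CID-97bb16dc-c439-427c-9841-5e6e4667cb65
cTime=0
storageType=DATA_NODE
"""

print(sync_cluster_id(NEW_NAMENODE_VERSION, OLD_DATANODE_VERSION))
```

In practice you would stop Hadoop, back up the DataNode VERSION file, write the returned text to it, and then restart.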

These are the only two solutions I have come up with for this problem.
