Hadoop problem: The Datanode process is gone

Last Update: 2018-05-23 Source: Internet

Author: User

Description of the problem: the DataNode process is missing

After configuring Hadoop recently, I started it and checked the running processes with the jps command.

The DataNode process did not appear in the list, yet everything seemed to work normally. Strange, isn't it?

After searching on Baidu and Google, I came to the following conclusion.

I had formatted the NameNode several times, both before and after starting Hadoop, using the following command:

hadoop namenode -format

The problem is not caused by formatting as such. It appears when you format the NameNode, start Hadoop, stop Hadoop, format again, and then start Hadoop again: this time the DataNode process is missing from the jps output even though Hadoop still seems to work, exactly as in my case. The root cause is a mismatch between the version information of the NameNode and that of the DataNode. The problem occurs not only in pseudo-distributed mode but also in fully distributed mode; it is demonstrated here in pseudo-distributed mode.

Here are the contents of the two VERSION files when everything is normal.

NameNode VERSION file:

namespaceID=51628800
clusterID=CID-97bb16dc-c439-427c-9841-5e6e4667cb65
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1918730739-172.17.241.131-1526803461127
layoutVersion=-63

DataNode VERSION file:

storageID=DS-4281731b-7a44-4c86-8844-e1927a4fc966
clusterID=CID-97bb16dc-c439-427c-9841-5e6e4667cb65
cTime=0
datanodeUuid=197c3d68-454b-4287-a5e5-90c01ed9be53
storageType=DATA_NODE
layoutVersion=-56

The "inconsistent version number" refers to the value of clusterID. In the files above the two values are identical, which shows that this NameNode and DataNode belong to the same cluster.
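To make the comparison concrete, here is a minimal Python sketch that parses the two VERSION files shown above and checks whether their clusterID values match. It assumes the files are plain key=value lines, possibly preceded by # comment lines such as a timestamp; the helper names parse_version and same_cluster are hypothetical and not part of Hadoop.

```python
def parse_version(text: str) -> dict[str, str]:
    """Parse the contents of a Hadoop VERSION file (key=value lines) into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines such as a timestamp header
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def same_cluster(namenode_text: str, datanode_text: str) -> bool:
    """True if both VERSION files carry the same (non-missing) clusterID."""
    nn_id = parse_version(namenode_text).get("clusterID")
    dn_id = parse_version(datanode_text).get("clusterID")
    return nn_id is not None and nn_id == dn_id


NAMENODE_VERSION = """\
namespaceID=51628800
clusterID=CID-97bb16dc-c439-427c-9841-5e6e4667cb65
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1918730739-172.17.241.131-1526803461127
layoutVersion=-63
"""

DATANODE_VERSION = """\
storageID=DS-4281731b-7a44-4c86-8844-e1927a4fc966
clusterID=CID-97bb16dc-c439-427c-9841-5e6e4667cb65
cTime=0
datanodeUuid=197c3d68-454b-4287-a5e5-90c01ed9be53
storageType=DATA_NODE
layoutVersion=-56
"""

# Only clusterID is compared; layoutVersion legitimately differs between the two.
print(same_cluster(NAMENODE_VERSION, DATANODE_VERSION))  # True
```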

So where are these two files stored? They are under the directory specified by the following entry in the Hadoop configuration file core-site.xml:

<property>
    <!-- Directory where Hadoop stores the files it generates at runtime -->
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop-2.7.1/tmp</value>
</property>

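Starting from this configured path, I first went into the tmp directory and followed it down to the two VERSION files. With the default hdfs-site.xml settings, the full paths are:

/home/hadoop-2.7.1/tmp/dfs/name/current/VERSION (NameNode)
/home/hadoop-2.7.1/tmp/dfs/data/current/VERSION (DataNode)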
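Both steps, reading hadoop.tmp.dir from core-site.xml and deriving the VERSION file locations from it, can be sketched in Python. The helper names get_property and default_version_paths are hypothetical. The path logic assumes dfs.namenode.name.dir and dfs.datanode.data.dir are not overridden in hdfs-site.xml, so they fall back to their defaults under ${hadoop.tmp.dir}/dfs; variable expansion inside values is not handled.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from typing import Dict, Optional


def get_property(core_site_xml: str, name: str) -> Optional[str]:
    """Return the <value> of the <property> called `name`, or None if absent."""
    root = ET.fromstring(core_site_xml)  # comments are discarded by the default parser
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def default_version_paths(hadoop_tmp_dir: str) -> Dict[str, PurePosixPath]:
    """VERSION file locations, assuming the default name/data dirs under hadoop.tmp.dir."""
    dfs = PurePosixPath(hadoop_tmp_dir) / "dfs"
    return {
        "namenode": dfs / "name" / "current" / "VERSION",
        "datanode": dfs / "data" / "current" / "VERSION",
    }


CORE_SITE = """<configuration>
  <property>
    <!-- Directory where Hadoop stores the files it generates at runtime -->
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop-2.7.1/tmp</value>
  </property>
</configuration>"""

tmp_dir = get_property(CORE_SITE, "hadoop.tmp.dir")
assert tmp_dir is not None
for role, path in default_version_paths(tmp_dir).items():
    print(role, path)
```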

Problem analysis

The following is an analysis of this issue:

When you format for the first time and then start Hadoop, there is no problem, because everything is new. You can even format the NameNode many times before the first start, because the DataNode's VERSION file has not been generated yet. That file is created in the configured directory only when Hadoop starts, and at that point the DataNode takes over the NameNode's clusterID, which ties the DataNode to that NameNode.

If you then stop Hadoop and format the NameNode a second time, the NameNode's VERSION information is rewritten with a newly generated clusterID, so it no longer matches what the DataNode recorded. This is the clusterID mismatch mentioned above: when you start Hadoop again, the DataNode finds that its clusterID differs from the NameNode's and does not stay up. Everything else appears to run normally, but the DataNode process is missing from the jps output, which would alarm any programmer.
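The sequence above can be captured in a small toy model. This is not Hadoop code: ToyCluster is a hypothetical class that only tracks the clusterID stored in each VERSION file, and it assumes, as described above, that every format generates a new clusterID, that the DataNode adopts the NameNode's clusterID on its first start, and that it refuses to run on a mismatch afterwards.

```python
import uuid
from typing import Optional


class ToyCluster:
    """Toy model of how clusterID flows between the NameNode and DataNode VERSION files."""

    def __init__(self) -> None:
        self.namenode_cluster_id: Optional[str] = None
        self.datanode_cluster_id: Optional[str] = None

    def format_namenode(self) -> None:
        # Each format writes a fresh clusterID into the NameNode's VERSION file.
        self.namenode_cluster_id = "CID-" + str(uuid.uuid4())

    def start(self) -> bool:
        """Return True if the DataNode process stays up after this start."""
        if self.datanode_cluster_id is None:
            # First start: the DataNode's VERSION file is created with the NameNode's clusterID.
            self.datanode_cluster_id = self.namenode_cluster_id
            return True
        # Later starts: a mismatched clusterID makes the DataNode exit.
        return self.datanode_cluster_id == self.namenode_cluster_id


c = ToyCluster()
c.format_namenode()
c.format_namenode()   # formatting repeatedly before the first start is harmless
print(c.start())      # True: DataNode VERSION created with a matching clusterID
c.format_namenode()   # stop Hadoop, then format again
print(c.start())      # False: clusterIDs differ, so the DataNode is missing from jps
```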

Solutions

Programme I

Before formatting, empty the directory where Hadoop stores its information, that is, the tmp directory in my example. Alternatively, delete the directory and create a new one. Note that this erases everything stored there, including the HDFS data.

Then format the NameNode. The NameNode and DataNode information generated afterwards is all new and belongs to the same cluster, so the problem is solved. This is the simplest and most effective method.
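The wipe-and-recreate step of Programme I can be sketched in Python as follows. This is a minimal illustration, assuming Hadoop has already been stopped; reset_hadoop_tmp_dir is a hypothetical helper, and the demo runs against a throwaway temporary directory rather than the real /home/hadoop-2.7.1/tmp. After it, you would run hadoop namenode -format as before.

```python
import shutil
import tempfile
from pathlib import Path


def reset_hadoop_tmp_dir(hadoop_tmp_dir: str) -> None:
    """Delete hadoop.tmp.dir with everything in it and recreate it empty.

    WARNING: this permanently erases all HDFS metadata and block data stored
    under the directory. Stop Hadoop first; format the NameNode afterwards.
    """
    path = Path(hadoop_tmp_dir).resolve()
    if path == Path(path.anchor):
        raise ValueError("refusing to wipe a filesystem root")
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)


# Demo on a throwaway directory instead of the real hadoop.tmp.dir.
with tempfile.TemporaryDirectory() as scratch:
    tmp = Path(scratch) / "tmp"
    (tmp / "dfs" / "data" / "current").mkdir(parents=True)
    (tmp / "dfs" / "data" / "current" / "VERSION").write_text("clusterID=CID-old\n")
    reset_hadoop_tmp_dir(str(tmp))
    print(list(tmp.iterdir()))  # []
```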

Programme II

If there is data in that directory that you do not want to erase, this option is for you.

Since the problem is a version mismatch, we can fix the version information directly. After formatting the NameNode, open its VERSION file and copy the clusterID value. Then open the DataNode's VERSION file, replace its clusterID with the copied value, save the file, and restart Hadoop; it should then work normally. The paths of both files were shown above, so I will not repeat them here.
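The manual edit in Programme II can be sketched in Python. This is a minimal illustration, assuming both VERSION files are plain key=value text with exactly one clusterID line each; the function names sync_cluster_id and sync_cluster_id_files are hypothetical. Stop Hadoop before editing, and if the DataNode has several data directories, each has its own VERSION file to update.

```python
import re
import shutil
from pathlib import Path


def sync_cluster_id(namenode_version: str, datanode_version: str) -> str:
    """Return the DataNode VERSION text with its clusterID replaced by the
    NameNode's clusterID. All other lines are kept unchanged."""
    match = re.search(r"^clusterID=(.*)$", namenode_version, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no clusterID line in the NameNode VERSION file")
    new_line = "clusterID=" + match.group(1).strip()
    # A function replacement inserts the ID literally (no backslash processing).
    updated, count = re.subn(
        r"^clusterID=.*$", lambda _: new_line, datanode_version, flags=re.MULTILINE
    )
    if count != 1:
        raise ValueError("expected exactly one clusterID line in the DataNode VERSION file")
    return updated


def sync_cluster_id_files(namenode_path: str, datanode_path: str) -> None:
    """Rewrite the DataNode VERSION file in place, keeping a VERSION.bak copy."""
    dn_path = Path(datanode_path)
    shutil.copy2(dn_path, dn_path.with_suffix(".bak"))
    updated = sync_cluster_id(Path(namenode_path).read_text(), dn_path.read_text())
    dn_path.write_text(updated)


# String-level demo with hypothetical IDs: the NameNode was just reformatted.
NN = "namespaceID=1\nclusterID=CID-new\nstorageType=NAME_NODE\n"
DN = "storageID=DS-1\nclusterID=CID-old\nstorageType=DATA_NODE\n"
print(sync_cluster_id(NN, DN))
```

The demo works on strings so it can run anywhere; on a real node you would call sync_cluster_id_files with the two VERSION paths and then start Hadoop again.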

For the problem above, these are the only two solutions I could think of.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.