Hadoop can't shut down the NameNode: solution

Problem Description

The department's Hadoop cluster had been running for a month. Today it needed some adjustments, but it suddenly turned out that Hadoop would not shut down properly.

Hadoop version: 2.6.0

The details are as follows:

[root@master ~]# stop-dfs.sh
Stopping namenodes on [master]
master: no namenode to stop
slave2: no datanode to stop
slave1: no datanode to stop
...
Problem Cause

Running jps shows that the NameNode, DataNode, and the other processes are all still running normally. Baffling.

Since the script reported the error, the script is the place to look, so I started reading hadoop-daemon.sh and found the cause of the problem.

First, locate where the message comes from, in the last few lines of the file:

    if [ -f $pid ]; then
      TARGET_PID=`cat $pid`
      if kill -0 $TARGET_PID > /dev/null 2>&1; then
        echo stopping $command
        kill $TARGET_PID
        sleep $HADOOP_STOP_TIMEOUT
        if kill -0 $TARGET_PID > /dev/null 2>&1; then
          echo "$command did not stop gracefully after $HADOOP_STOP_TIMEOUT seconds: killing with kill -9"
          kill -9 $TARGET_PID
        fi
      else
        echo no $command to stop
      fi
      rm -f $pid
    else
      echo no $command to stop
    fi

There's a lot of code here; let's just look at the part we care about:

    if [ -f $pid ]; then
      # ... many lines omitted ...
    else
      echo no $command to stop
    fi

Obviously, if the PID file does not exist, the script prints: no XXX to stop

So what is this PID file, and why is it missing? Find the declaration of the pid variable, at line 107 of the script file:

pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid  # line 107
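
With the defaults (HADOOP_IDENT_STRING falls back to $USER, here root), this expands to paths like the following; an illustrative sketch, since the exact names depend on your user and on the daemon being started:

# Illustrative expansion, assuming HADOOP_PID_DIR=/tmp and user root:
#   /tmp/hadoop-root-namenode.pid
#   /tmp/hadoop-root-datanode.pid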

Next, look for where the HADOOP_PID_DIR variable is set.

First, a key line in the comment header of the script:

#   HADOOP_PID_DIR   The pid files are stored. /tmp by default.

So HADOOP_PID_DIR holds the directory where the PID files are stored, and the default is /tmp, set by the following code:

If ["$HADOOP _pid_dir" = ""]; Then   //97~99 line
  hadoop_pid_dir=/tmp
fi

So what is this PID file? When Hadoop starts a daemon, it writes the process's PID into a file, so that stop-dfs.sh can later read the PID back and use it to shut the process down.
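
For reference, the start branch of the same script does roughly the following (an abridged sketch of hadoop-daemon.sh in 2.6.x, not the verbatim code):

# Launch the daemon in the background, then record its PID ($!)
# in the $pid file so the stop branch can read it back later.
nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR \
  $command "$@" > "$log" 2>&1 < /dev/null &
echo $! > $pid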

Now the cause of the problem is clear: the hadoop-*.pid files in the /tmp directory are missing.

Solve the Problem

Let's see what is actually in the /tmp directory:

[root@slave1 ~]# ll /tmp/
srwxr-x--- 1 root root    0 Mar 13:39 aegis-<Guid(5A2C30A2-A87D-490A-9281-6765EDAD7CBA)>
drwxr-xr-x 2 root root 4096 Apr 13:55 hsperfdata_root
srwxr-x--- 1 root root    0 Mar 13:39 qtsingleapp-aegisg-46d2-0
srwxrwxrwx 1 root root    0 Mar 13:39 qtsingleapp-aegiss-a5d2-0

Well, we have everything but what we need.

We know that /tmp is a temporary directory and that the system periodically cleans out the files in it. Obviously, /tmp is not a reliable place for the PID files: they went unvisited for so long that they were cleaned up!

Since Hadoop no longer knows which processes need to be shut down, we can only shut them down manually.

First use ps -ef to find the PIDs of the NameNode, DataNode, and other Hadoop processes, then kill them with kill -9.
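
For example, something along these lines (the grep pattern is only an illustration; jps also prints the PIDs directly):

# List the Hadoop daemons and their PIDs (second column)
ps -ef | grep -iE 'namenode|datanode' | grep -v grep

# Force-kill each daemon; <pid> is a placeholder for a PID from above
kill -9 <pid>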

Restart Hadoop and look at the /tmp directory again; there are a few more files:

[root@master ~]# ll /tmp
-rw-r--r-- 1 root root    6 Apr 10 13:39 hadoop-root-namenode.pid
-rw-r--r-- 1 root root    6 Apr 10 13:39 hadoop-root-secondarynamenode.pid
-rw-r--r-- 1 root root    6 Apr 10 13:55 yarn-root-resourcemanager.pid
drwxr-xr-x 4 root root 4096 Apr 10 14:52 jetty_0_0_0_0_50070_hdfs____w2cu08
drwxr-xr-x 4 root root 4096 Apr 10 14:52 jetty_0_0_0_0_50090_secondary____y6aanv
drwxr-xr-x 5 root root 4096 Apr 10 15:02 jetty_master_8088_cluster____i4ls4w

The first three files hold the PIDs; the three jetty_xxx directories are temporary directories for Hadoop's web applications and are not our concern.

Open a PID file to see:

[root@master tmp]# cat hadoop-root-namenode.pid
32169

Quite simple: it holds the PID of the NameNode process, and that PID is read from this file when the NameNode is shut down.
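
As a quick sanity check (an illustrative one-liner; ps -p is a standard option), you can confirm that the stored PID still belongs to a live process:

# Show the process that the stored PID points to
ps -p $(cat /tmp/hadoop-root-namenode.pid)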

And with that, the immediate problem was happily settled.

However, now that we know /tmp is not a safe place for the PID files, leaving it unchanged would just be laziness on my part.

To change the PID file directory, simply add one line to the hadoop-daemon.sh script:

HADOOP_PID_DIR=/root/hadoop/pid  # line 25

Remember to stop Hadoop before making this change; otherwise you won't be able to stop it afterwards, because the stop script will look for the PID files in the new directory. Likewise, yarn-daemon.sh needs the same change:

YARN_PID_DIR=/root/hadoop/pid
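
It also does no harm to create the new directory on every node before restarting (the 2.6.x daemon scripts create a missing PID directory on start, as far as I can tell, but being explicit avoids surprises):

# Create the new PID directory up front
mkdir -p /root/hadoop/pid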

Then run start-dfs.sh and start-yarn.sh to start Hadoop, and check the /root/hadoop/pid directory:

[root@master pid]# ll
-rw-r--r-- 1 root root 5 Apr 10 14:52 hadoop-root-namenode.pid
-rw-r--r-- 1 root root 5 Apr 10 14:52 hadoop-root-secondarynamenode.pid
-rw-r--r-- 1 root root 5 Apr 10 15:02 yarn-root-resourcemanager.pid

Well, no more worrying about the "no XXX to stop" warning.

Cleanup Policy for the /tmp Directory

Besides changing where the PID files are saved, I can think of one other solution: just stop the operating system from deleting the PID files stored in /tmp. OK, let's take a look at how the operating system cleans the /tmp directory.

Before running into this problem, I had never paid attention to the /tmp directory; a quick Baidu search turned up the answer.

Let's take a look at an important command:

tmpwatch

The tmpwatch command deletes temporary files that are no longer needed; you give it an age threshold, which is in hours by default.

Common parameters:

-m or --mtime                      based on the file's modification time
-c or --ctime                      based on the file's inode change time
-M or --dirmtime                   based on the directory's modification time
-x or --exclude=path               exclude the given path
-X or --exclude-pattern=pattern    exclude paths matching the pattern
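
A minimal usage sketch (tmpwatch takes an age threshold followed by one or more directories; a d suffix means days):

# Delete files under /tmp not accessed in the last 10 days
/usr/sbin/tmpwatch 10d /tmp

# The same threshold expressed in hours
/usr/sbin/tmpwatch 240 /tmp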

As a temporary directory, /tmp is cleaned by the system every day by default: a daily cron job runs the script /etc/cron.daily/tmpwatch, which applies the cleanup policy using the tmpwatch command. Let's look at the script's contents:

/etc/cron.daily/tmpwatch

#!/bin/sh
flags=-umc
/usr/sbin/tmpwatch "$flags" -x /tmp/.X11-unix -x /tmp/.XIM-unix \
        -x /tmp/.font-unix -x /tmp/.ICE-unix -x /tmp/.Test-unix \
        -X '/tmp/hsperfdata_*' 10d /tmp
/usr/sbin/tmpwatch "$flags" 30d /var/tmp
for d in /var/{cache/man,catman}/{cat?,X11R6/cat?,local/cat?}; do
    if [ -d "$d" ]; then
        /usr/sbin/tmpwatch "$flags" -f 30d "$d"
    fi
done

The first tmpwatch invocation (the three lines covering /tmp) sets the cleanup policy for the /tmp directory. -x and -X exclude files or directories from the cleanup, and 10d means files that have not been accessed in the last 10 days are deleted (some versions write this as 240, i.e. 240 hours = 10 days).

So anything untouched for 10 days gets deleted. Our Hadoop cluster had been running for dozens of days, so of course the PID files could no longer be found.

Did you notice the -X '/tmp/hsperfdata_*' exclusion in the script? When we first listed the /tmp directory while solving the problem above, there was a directory matching exactly this pattern: hsperfdata_root.

So, to keep the system from deleting the PID files, we just follow the same pattern and add one more exclusion to the tmpwatch script:

-X '/tmp/*.pid'
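
The first tmpwatch invocation in /etc/cron.daily/tmpwatch would then look something like this (a sketch of the edit, not a verbatim distro file):

/usr/sbin/tmpwatch "$flags" -x /tmp/.X11-unix -x /tmp/.XIM-unix \
        -x /tmp/.font-unix -x /tmp/.ICE-unix -x /tmp/.Test-unix \
        -X '/tmp/hsperfdata_*' -X '/tmp/*.pid' 10d /tmp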

However, since /tmp is a temporary directory, important files really shouldn't live there, so I still recommend the first solution.
