Crontab configuration for monitoring Hadoop processes with process_monitor.sh

Source: Internet
Author: User

process_monitor.sh is available at the following link:
https://github.com/eyjian/mooon/blob/master/common_library/shell/process_monitor.sh


---------------------------------------------------------Script Content--------------------------------------------------------
#!/bin/sh
# https://github.com/eyjian/mooon/blob/master/common_library/shell/process_monitor.sh
# Created by Yijian on 2012/7/23
#
# Run log: /tmp/process_monitor.log. Because multiple processes write to it at the same time, it may be incomplete and is for reference only.
# Run this script from crontab, for example (note that it must run in the background, because the script is resident and never exits):
# * * * * * /usr/local/bin/process_monitor.sh /usr/sbin/rinetd /usr/sbin/rinetd > /dev/null 2>&1 &
#
# Process monitoring script: when the specified process does not exist, it runs a restart script to bring the process back up.
# Features:
# 1. The monitoring script can be started repeatedly; it enforces mutual exclusion against itself automatically
# 2. Mutual exclusion is based not only on the script file name but also on its command-line arguments; it applies only when the whole command line is identical
# 3. For a monitored process, you can specify just the process name, or include command-line arguments
# 4. Both the monitoring script and the monitored process are always restricted to processes owned by the current user
#
# If this script works when run manually but not from crontab, check whether commands such as ps work properly under crontab


# In practice, a script running under crontab sometimes cannot find commands such as ls and ps.
# In some environments ls and ps are located in /usr/bin rather than the usual /bin.
export PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin:$PATH
trap "" SIGPIPE # Ignore SIGPIPE


# The script requires exactly two command-line arguments
# Parameter 1: the name of the monitored process (may include command-line arguments)
# Parameter 2: the script that restarts the monitored process
if test $# -ne 2; then
    printf "\033[1;33musage: $0 process_cmdline restart_script\033[m\n"
    printf "\033[1;33mexample: /usr/local/bin/process_monitor.sh \"/usr/sbin/rinetd\" \"/usr/sbin/rinetd\"\033[m\n"
    printf "\033[1;33mplease install process_monitor.sh into crontab by \"* * * * *\"\033[m\n"
    exit 1
fi


process_cmdline="$1" # The name of the process to monitor, or its complete or partial command line
restart_script="$2"  # The script that restarts the process; it must be executable
monitor_interval=2   # Interval between checks, in seconds
start_seconds=5      # How many seconds the monitored process needs to start
cur_user=`whoami`    # The user running this monitoring script


# Take the IP address on the specified network card
#eth =1&&netstat-ie|awk-f ' [:] ' begin{found=0;} {if (Match ($, "eth ' $eth")) found=1 else if (1==found) && match ($, "eth")) Found=0; if ((1==found) && Match ($, "inet addr:") && match ($, "bcast:")) print $} '


# The following code prevents multiple instances of the monitoring script from running
uid=`id -u $cur_user`
self_name=`basename $0`
self_cmdline="$0 $*"
process_name=$(basename `echo "$process_cmdline"|cut -d" " -f1`)
process_match="${process_cmdline#* }" # Keep only the argument part, which is used for matching
process_match=$(echo $process_match) # Strip leading and trailing spaces


# Used for mutual exclusion,
# so that only the first instance started keeps running.
# Instances with different arguments do not affect each other,
# so different targets can be monitored at the same time.
# Because trap cannot catch the KILL signal, mutual exclusion cannot be implemented by creating a file.
active=0


# Log file. Several users may run the script,
# so the log file name includes the user name; otherwise other users might not have permission to write to it.
log_filepath=/tmp/process_monitor-$cur_user.log
# Maximum log file size (10 MB)
log_filesize=10485760


# Log-writing function, takes 1 parameter:
# 1) the log record to write
log()
{
    # Create the log file if it does not exist
    if test ! -f $log_filepath; then
        touch $log_filepath
    fi

    record=$1
    # Get the log file size
    file_size=`ls --time-style=long-iso -l $log_filepath 2>/dev/null|cut -d" " -f5`

    # Handle an oversized log file
    # Each record is prefixed with [$process_cmdline] to tell the monitored targets apart
    if test ! -z "$file_size"; then
        if test $file_size -lt $log_filesize; then
            printf "[$process_cmdline]$record"
            printf "[$process_cmdline]$record" >> $log_filepath
        else
            printf "[$process_cmdline]$record" >> $log_filepath
            mv $log_filepath $log_filepath.bak # Back up

            printf "[$process_cmdline][`date +'%Y-%m-%d %H:%M:%S'`]truncated\n"
            printf "[$process_cmdline][`date +'%Y-%m-%d %H:%M:%S'`]truncated\n" > $log_filepath

            printf "[$process_cmdline]$record"
            printf "[$process_cmdline]$record" >> $log_filepath
        fi
    fi
}


# Check in an infinite loop whether the specified process exists.
# One important reason for looping: crontab runs at most once per minute, which does not meet second-level monitoring requirements.
while true; do
    self_count=`ps -C $self_name h -o euser,args| awk 'BEGIN { num=0; } { if (($1==uid || $1==cur_user) && match($0, self_cmdline)) {++num}} END { printf("%d",num); }' uid=$uid cur_user=$cur_user self_cmdline="$self_cmdline"`
    if test ! -z "$self_count"; then
        if test $self_count -gt 2; then
            log "\033[0;32;31m[`date +'%Y-%m-%d %H:%M:%S'`]$0 is running[$self_count/active:$active], the current user is $cur_user.\033[m\n"
            # In testing, the count is normally 2,
            # but after running for a while it becomes 3, so running the script from crontab is necessary.
            # If the monitoring script is already running, exit instead of running again
            if test $active -eq 0; then
                exit 1
            fi
        fi
    fi

    # Check whether the monitored process exists, and restart it if it does not
    if test -z "$process_match"; then
        process_count=`ps -C $process_name h -o euser,args| awk 'BEGIN { num=0; } { if (($1==uid || $1==cur_user)) {++num}} END { printf("%d",num); }' uid=$uid cur_user=$cur_user`
    else
        process_count=`ps -C $process_name h -o euser,args| awk 'BEGIN { num=0; } { if (($1==uid || $1==cur_user) && match($0, process_match)) {++num}} END { printf("%d",num); }' uid=$uid cur_user=$cur_user process_match="$process_match"`
    fi
    if test ! -z "$process_count"; then
        if test $process_count -lt 1; then
            # Run the restart script, which must be able to bring the specified process back up
            log "\033[0;32;34m[`date +'%Y-%m-%d %H:%M:%S'`]restart \"$process_cmdline\"\033[m\n"
            sh -c "$restart_script" >> $log_filepath 2>&1 # Note: it must be run via "sh -c"
        fi
    fi

    active=1
    # Sleep a little longer, because startup may not be that fast; this avoids starting the process more than once.
    # In some environments sleep does not work: after a normal sleep "$?" is 0, but on failure it becomes "141".
    # This happens because signal 13 was received; use trap "" SIGPIPE to ignore SIGPIPE.
    sleep $start_seconds
done
exit 0
---------------------------------------------------------Script Content--------------------------------------------------------
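
The log function in the script rotates the log once it is no longer smaller than log_filesize, reading the size from GNU ls output. As a rough sketch only, the same size check can be written with wc -c, which does not depend on the ls column layout. The helper names file_size_bytes and needs_rotation are hypothetical and are not part of process_monitor.sh.

```shell
# file_size_bytes: hypothetical helper, prints the size of file $1 in bytes.
file_size_bytes() {
    # wc may pad its output with spaces on some systems, so strip them
    wc -c < "$1" | tr -d ' '
}

# needs_rotation: hypothetical helper, succeeds when file $1 has reached
# the limit $2 (in bytes), i.e. its size is not less than the limit.
needs_rotation() {
    test "$(file_size_bytes "$1")" -ge "$2"
}

log_filesize=10485760   # 10 MB, the same limit as in the script

demo_file=$(mktemp)     # temporary file standing in for the real log
printf 'hello' > "$demo_file"

if needs_rotation "$demo_file" "$log_filesize"; then
    echo "rotate"
else
    echo "keep"
fi

rm -f "$demo_file"
```

With a five-byte file and a 10 MB limit, the check takes the "keep" branch.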




Assume:
1) Java is installed in /data/jdk
2) The monitoring script process_monitor.sh is installed in /usr/local/bin
3) Hadoop is installed in /data/hadoop
4) HBase is installed in /data/hbase
5) ZooKeeper is installed in /data/zookeeper




You can find a process ID with jps, then kill the process with the kill command to watch the monitor bring it back up.
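
A minimal sketch of that test, assuming jps prints one "<pid> <MainClass>" pair per line. The helper pid_of_jps_name is hypothetical, and the kill step is left as a comment so the snippet is safe to paste.

```shell
# pid_of_jps_name: hypothetical helper. Reads jps-style output
# ("<pid> <MainClass>" per line) on stdin and prints the PID of every
# line whose class name equals the first argument.
pid_of_jps_name() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Demonstration on sample text instead of a live jps call
printf '1234 NameNode\n5678 DataNode\n' | pid_of_jps_name NameNode

# On a real node (not run here):
#   pid=$(jps | pid_of_jps_name NameNode)
#   kill "$pid"    # the monitor should restart the NameNode shortly after
```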


process_monitor.sh checks in a loop. The variable monitor_interval is set to 2 seconds, although the script as listed sleeps start_seconds (5 seconds) between checks. When it finds that the process does not exist, it runs the restart script immediately.
If the script runs as root, its log file is /tmp/process_monitor-root.log.
If it runs as the user test, its log file is /tmp/process_monitor-test.log, and so on.
You can follow what process_monitor.sh is doing by running tail -f on its log file.
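
The per-user naming rule can be sketched as follows. The helper monitor_log_path is hypothetical; it only reproduces the /tmp/process_monitor-<user>.log pattern described above.

```shell
# monitor_log_path: hypothetical helper that builds the log file path
# for the user name given as the first argument.
monitor_log_path() {
    printf '/tmp/process_monitor-%s.log\n' "$1"
}

monitor_log_path root
monitor_log_path test

# To follow the log of the current user (not run here):
#   tail -f "$(monitor_log_path "$(whoami)")"
```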




process_monitor.sh takes two parameters.
The first parameter identifies the process to monitor; process_monitor.sh uses the second parameter to restart it.
The first parameter is split into two parts: the part before the first space and the part after it.
The first part is the process name of the monitored object. For a Java program, the process name is java, not the name of the JAR file.
The second part is optional, but it is what distinguishes one monitored object from another, so it is necessary for Java programs, shell scripts, and the like.
process_monitor.sh matches this argument part partially (with the awk match function), so it does not have to be the complete argument list.
You can use the ps aux command to determine the process name and arguments.
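
The split and the partial match can be tried in isolation. The sketch below reuses the cut and parameter-expansion steps from the script; the sample process lines and the helper count_matches are made up for illustration.

```shell
process_cmdline="/data/jdk/bin/java -Dproc_namenode"

# Part before the first space, reduced to its base name: the process name
process_name=$(basename "$(echo "$process_cmdline" | cut -d" " -f1)")
# Part after the first space: the pattern matched against the command line
process_match="${process_cmdline#* }"

# count_matches: hypothetical helper. Reads "user args..." lines on stdin
# and counts those in which the pattern (first argument) matches.
count_matches() {
    awk -v pat="$1" 'BEGIN { n = 0 } match($0, pat) { ++n } END { print n }'
}

# Made-up sample of what "ps -C java h -o euser,args" might print
sample='hdfs /data/jdk/bin/java -Dproc_namenode -Xmx1000m
hdfs /data/jdk/bin/java -Dproc_datanode -Xmx1000m'

matches=$(printf '%s\n' "$sample" | count_matches "$process_match")
echo "name=$process_name match=$process_match count=$matches"
```

Only the NameNode line contains -Dproc_namenode, so the count is 1 even though both processes share the name java.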


The crontab configuration is as follows:
# Monitor the HDFS NameNode
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java -Dproc_namenode" "/data/hadoop/sbin/hadoop-daemon.sh start namenode"
# Monitor the HDFS ZKFC (the program that switches between the active and standby NameNode)
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java -Dproc_zkfc" "/data/hadoop/sbin/hadoop-daemon.sh start zkfc"
# Monitor the HDFS JournalNode
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java -Dproc_journalnode" "/data/hadoop/sbin/hadoop-daemon.sh start journalnode"
# Monitor the HDFS DataNode
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java -Dproc_datanode" "/data/hadoop/sbin/hadoop-daemon.sh start datanode"
# Monitor the HBase Master
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java -Dproc_master" "/data/hbase/bin/hbase-daemon.sh start master"
# Monitor HBase thrift2
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java -Dproc_thrift2" "/data/hbase/bin/hbase-daemon.sh start thrift2 --framed -nonblocking"
# Monitor ZooKeeper
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java -Dzookeeper" "/data/zookeeper/bin/zkServer.sh start"
# Monitor the HBase RegionServer
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java -Dproc_regionserver" "/data/hbase/bin/hbase-daemon.sh start regionserver"
# Monitor the YARN ResourceManager
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java -Dproc_resourcemanager" "/data/hadoop/sbin/yarn-daemon.sh start resourcemanager"
# Monitor the YARN NodeManager
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java -Dproc_nodemanager" "/data/hadoop/sbin/yarn-daemon.sh start nodemanager"
# Monitor HiveServer2
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java HiveServer2" "/data/gongyi/hive/bin/hiveserver2 &"
# Monitor the Hive Metastore
* * * * * /usr/local/bin/process_monitor.sh "/data/jdk/bin/java HiveMetaStore" "/data/gongyi/hive/bin/hive --service metastore &"
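
All of these entries share one shape: five time fields, the monitor path, then the two quoted parameters. As an illustrative sketch, a small function can generate an entry so the quoting stays consistent. The name monitor_cron_line is hypothetical, and the monitor path is the one assumed earlier.

```shell
# monitor_cron_line: hypothetical helper. Prints a crontab entry that runs
# process_monitor.sh every minute with the given process command line ($1)
# and restart command ($2), each wrapped in double quotes.
monitor_cron_line() {
    printf '* * * * * /usr/local/bin/process_monitor.sh "%s" "%s"\n' "$1" "$2"
}

monitor_cron_line "/data/jdk/bin/java -Dproc_namenode" \
                  "/data/hadoop/sbin/hadoop-daemon.sh start namenode"
```

The printed line can be pasted into the file opened by crontab -e.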
