Using shell scripts to monitor Linux systems and process resources

Source: Internet
Author: User


Shell Introduction


Anyone who works with Linux is familiar with the shell. It is the user interface to the system: it accepts the commands a user enters, interprets them, and passes them to the kernel for execution. The shell is therefore a command interpreter, and shell programs do not go through the compile-link-run cycle of compiled programming languages. The shell also has its own programming language, which lets users write programs composed of shell commands. This language offers many features of a general-purpose programming language, such as loops and branching control structures, and programs written in it can do the same work as other applications. There are several kinds of shell; the most common are the Bourne shell (sh), the C shell (csh), and the Korn shell (ksh), each with its own strengths and weaknesses. The default shell on Linux is generally the Bourne Again shell (bash), an extension of the Bourne shell: bash command syntax is a superset of Bourne shell syntax, and bash adds and enhances many features on top of it. This article uses bash as an example to summarize ways of monitoring system and process resources with the shell.





Using the shell to monitor process resources


Checking whether a process exists


When monitoring a process, we generally need its process ID, which uniquely identifies the process. Several users on a server may be running processes with the same name, so the function GetPID below returns the ID of the process with a given name that is owned by a given user (it currently assumes that the user has started only one process with that name). It takes two parameters, the user name and the process name. It uses ps to list process information, grep to filter out the required process, and finally sed and awk to extract the process ID. The function can be adapted to the actual situation, for example to filter out other information.


Listing 1. Monitor the process
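 function GetPID #User #Name
 {
    PsUser=$1
    PsName=$2
    # List the user's processes, drop editors, debuggers and helper scripts,
    # keep the first match, and print its PID (column 1 of the ps output)
    pid=`ps -u "$PsUser" | grep "$PsName" | grep -v grep | grep -v vi | grep -v dbx \
    | grep -v tail | grep -v start | grep -v stop | sed -n 1p | awk '{print $1}'`
    echo $pid
 }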
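
Filtering with grep is a substring match, so a process named CFTestAppHelper would also match CFTestApp. The following is a minimal sketch of an exact-name lookup, not a replacement for Listing 1. The helper names first_pid_by_name and get_pid_exact are hypothetical, and the sketch assumes a ps that accepts -o pid=,comm= to print the PID and command name without a header line.

```shell
# Hypothetical helper: read "PID COMMAND" lines on stdin and print the first
# PID whose command name equals $1 exactly.
first_pid_by_name() {
    awk -v name="$1" '$2 == name { print $1; exit }'
}

# Hypothetical wrapper: first PID of the process named $2 owned by user $1.
# Assumes ps supports -o pid=,comm= (headerless PID and command name).
get_pid_exact() {
    ps -u "$1" -o pid=,comm= | first_pid_by_name "$2"
}
```

A call such as get_pid_exact root CFTestApp would then print the first matching PID, or nothing if no process matches. On Linux the comm field is typically truncated to 15 characters, so longer process names would need a different match.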


Sample Demo:



1) Source program (for example, find the process ID of the process named CFTestApp owned by user root)


  PID=`GetPID root CFTestApp` 
 
    echo $PID


2) Result output


  11426 


3) Results Analysis



The output shows that 11426 is the process ID of the CFTestApp program running under the root user.



4) Command Introduction


1. ps: Displays a snapshot of the processes in the system. Parameters: -u <user ID> lists the processes that belong to the given user; a user name can be given instead. -p <process ID> selects the process with the given ID and lists its status. -o specifies the output format.
2. grep: Finds the lines in a file or in standard input that match a string. Parameter: -v inverts the selection, showing the lines that do not contain the search string.
3. sed: A non-interactive stream editor that edits files or standard input and processes one line at a time. Parameter: -n suppresses the automatic printing of each line, so only lines printed explicitly are output. The p command prints the selected line, so sed -n 1p prints only the first line.
4. awk: A programming language for processing text and data under Linux/UNIX. Data can come from standard input, one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions. It can be used on the command line, but is more often used in scripts. awk scans its input line by line, from the first line to the last, looks for lines that match a given pattern, and performs the specified action on them. If no action is specified, the matching lines are written to standard output (the screen); if no pattern is specified, the action is applied to every line. Parameter: -F fs or --field-separator fs specifies the input field separator; fs is a string or a regular expression, as in -F:.
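
The commands above can be tried on canned text before pointing them at a live system. The sketch below is only an illustration: the sample text stands in for real ps output, and its PIDs and process names are made up.

```shell
# Canned text standing in for real ps output (illustrative values only).
sample='  PID TTY          TIME CMD
11426 ?        00:00:03 CFTestApp
11502 pts/0    00:00:00 vi
11777 ?        00:00:01 CFTestApp'

# grep keeps matching lines, grep -v drops them, sed -n 1p keeps the first
# remaining line, and awk prints its first whitespace-separated field.
first_pid=$(printf '%s\n' "$sample" | grep CFTestApp | grep -v vi | sed -n 1p | awk '{print $1}')
echo "$first_pid"

# awk -F sets the field separator: split "75.3" on "." and keep the integer part.
int_part=$(echo "75.3" | awk -F. '{print $1}')
echo "$int_part"
```

Run as written, the two echo commands print 11426 and 75.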


The process may not have been started at all. The following snippet checks whether a process ID was found and, if the process is not running, prints "The process does not exist.":


     # Check if the process exists
     if [ "-$PID" == "-" ]
     then
     {
         echo "The process does not exist."
     }
     fi
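
A PID that was looked up earlier may no longer be valid by the time it is used. One common way to re-check it is kill -0, which sends no signal and only reports whether the process could be signalled. The sketch below assumes that approach; the helper name is_running is hypothetical. Note that kill -0 also fails when the caller lacks permission to signal the process, so a failure does not always mean the process is gone.

```shell
# Hypothetical helper: succeed if a process with PID $1 exists and can be
# signalled by the current user.
is_running() {
    [ -n "$1" ] && kill -0 "$1" 2>/dev/null
}

# Example: the current shell ($$) is always running.
if is_running "$$"; then
    echo "The process exists."
else
    echo "The process does not exist."
fi
```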


Detecting Process CPU Utilization


When maintaining application services, we often see business interruptions caused by excessive CPU usage. High CPU usage may be caused by traffic overload, an infinite loop, or other anomalies. Monitoring the CPU usage of the business process with a script makes it possible to notify maintenance staff as soon as CPU utilization becomes abnormal, so that they can analyze and locate the problem in time and avoid a business interruption. The following function obtains the CPU utilization of the process with the specified process ID. It takes the process ID as its parameter, uses ps to find the process information, removes the %CPU header line with grep -v, and finally uses awk to extract the integer part of the CPU utilization percentage (on a system with multiple CPUs, the value can exceed 100%).


Listing 2. Real-time monitoring of business process CPUs
 function GetCpu
 {
    # Print the integer part of the %CPU value for process ID $1
    CpuValue=`ps -p $1 -o pcpu | grep -v CPU | awk '{print $1}' | awk -F. '{print $1}'`
    echo $CpuValue
 }
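
If the process has already exited, ps prints no value, GetCpu echoes an empty string, and a later numeric test such as [ $cpu -gt 80 ] fails with a syntax error. The following sketch shows one way to guard against that. The helper names cpu_int and get_cpu_int are hypothetical, and the wrapper assumes a ps that accepts -o pcpu= to suppress the header.

```shell
# Hypothetical helper: reduce a ps %CPU value such as " 75.3" to its integer
# part, printing 0 when the value is empty (for example, the process is gone).
cpu_int() {
    local value
    value=${1//[[:space:]]/}   # strip any whitespace padding
    value=${value%%.*}         # drop the fractional part
    echo "${value:-0}"
}

# Hypothetical wrapper around ps; pcpu= asks ps for the value without a header.
get_cpu_int() {
    cpu_int "$(ps -p "$1" -o pcpu= 2>/dev/null)"
}
```

Treating a missing process as 0% is a design choice made for this sketch; a monitoring script may prefer to report the missing process separately.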


The following function calls GetCpu above to obtain the CPU utilization of the process and then uses a conditional statement to check whether it exceeds the limit. If it exceeds 80% (adjust to the actual situation), an alarm is output; otherwise a normal message is output.


Listing 3. Determine if CPU utilization exceeds the limit
 function CheckCpu
 {
    PID=$1
    cpu=`GetCpu $PID`
    # Report the measured value, then compare it with the 80% limit
    echo "The usage of cpu is $cpu"
    if [ $cpu -gt 80 ]
    then
    {
        echo "The usage of cpu is larger than 80%"
    }
    else
    {
        echo "The usage of cpu is normal"
    }
    fi
 }
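
The CPU, memory, and handle checks in this article all follow the same pattern: compare an integer with a limit and print an alarm or a normal message. A minimal sketch of a shared helper is shown below. The name check_limit, its argument order, and its return codes are assumptions made for illustration and are not part of the original scripts.

```shell
# Hypothetical helper: check_limit LABEL VALUE LIMIT [UNIT]
# Prints a message and returns 0 (normal), 1 (over the limit), or
# 2 (VALUE is not a non-negative integer).
check_limit() {
    local label=$1 value=$2 limit=$3 unit=${4:-}
    if ! [[ $value =~ ^[0-9]+$ ]]; then
        echo "The usage of $label is unknown"
        return 2
    fi
    if (( value > limit )); then
        echo "The usage of $label is larger than $limit$unit"
        return 1
    fi
    echo "The usage of $label is normal"
}
```

For example, check_limit cpu 75 80 % prints the normal message, while check_limit cpu 93 80 % prints the alarm and returns 1, which a calling script can use to trigger a notification.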


Sample Demo:



1) Source program (assuming the process ID of CFTestApp found above is 11426)


 CheckCpu 11426


2) Result output


 The usage of cpu is 75 
    The usage of cpu is normal 


3) Results Analysis



The output shows that the current CPU usage of the CFTestApp program is 75%, which is normal and does not exceed the 80% alarm limit.


Detecting Process Memory Usage


When maintaining application services, we also often encounter processes that crash because of excessive memory usage (for example, a 32-bit program can address at most 4 GB of memory, so allocations fail once that limit is exceeded, and physical memory is limited as well). Excessive memory usage may be caused by memory leaks, message backlogs, and so on. Monitoring the memory usage of the business process with a script makes it possible to send an alarm (for example, by SMS) as soon as memory usage becomes abnormal, so that maintenance staff can deal with it in time. The following function obtains the memory usage of the process with the specified process ID. It takes the process ID as its parameter, uses ps to find the process information, removes the VSZ header line with grep -v, and then divides the value by 1000 to obtain the memory usage in approximate megabytes.


Listing 4. Monitoring of business process memory usage
   function GetMem 
    { 
        MEMUsage=`ps -o vsz -p $1|grep -v VSZ` 
        (( MEMUsage /= 1000)) 
        echo $MEMUsage 
    }
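
The ps man page on Linux documents vsz in KiB (1024-byte units), so dividing by 1000 as in Listing 4 overstates the figure slightly. The sketch below shows the arithmetic with 1024 instead. The helper name kib_to_mib is hypothetical.

```shell
# Hypothetical helper: convert a size in KiB (as reported by ps -o vsz) to
# whole MiB, truncating any remainder.
kib_to_mib() {
    echo $(( $1 / 1024 ))
}

kib_to_mib 253952    # 253952 KiB / 1024 = 248 MiB
kib_to_mib 1677722   # roughly 1.6 GiB, prints 1638
```

For comparison, dividing 253952 by 1000 gives 253, so the two conventions differ by about 2%. That matters little for a coarse alarm limit, but it is worth knowing when comparing the result with other tools.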


The following code calls the GetMem function above to obtain the memory usage of the process and then uses a conditional statement to check whether it exceeds the limit. If it exceeds 1.6 GB (adjust to the actual situation), an alarm is output; otherwise a normal message is output.


Listing 5. Determine if memory usage exceeds the limit
 mem=`GetMem $PID`
 if [ $mem -gt 1600 ]
 then
 {
     echo "The usage of memory is larger than 1.6G"
 }
 else
 {
     echo "The usage of memory is normal"
 }
 fi


Sample Demo:



1) Source program (assuming the process ID of CFTestApp found above is 11426)


 mem=`GetMem 11426` 

    echo "The usage of memory is $mem M"

    if [ $mem -gt 1600 ] 
    then 
    { 
         echo "The usage of memory is larger than 1.6G"
    } 
    else 
    { 
        echo "The usage of memory is normal"
    } 
    fi


2) Result output


 The usage of memory is 248 M 
    The usage of memory is normal 


3) Results Analysis



The output shows that the current memory usage of the CFTestApp program is 248 MB, which is normal and does not exceed the 1.6 GB alarm limit.


Detecting Process Handle Usage


When maintaining application services, we also often encounter business interruptions caused by excessive handle (file descriptor) usage. Every platform limits the number of handles a process may use. On Linux, for example, the ulimit -n command (which reports, for instance, open files (-n) 1024) or the contents of /etc/security/limits.conf show the per-process handle limit. Excessive handle usage may be caused by high load, handle leaks, and so on. Monitoring the handle usage of the business process with a script makes it possible to send an alarm (for example, by SMS) as soon as an exception occurs, so that maintenance staff can deal with it in time. The following function obtains the handle usage of the process with the specified process ID. It takes the process ID as its parameter, uses ls to list the process's handle information, and then counts the handles with wc -l.


   function GetDes 
    { 
        DES=`ls /proc/$1/fd | wc -l` 
        echo $DES 
    }
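
Instead of a fixed threshold, the handle count can be expressed as a percentage of the limit. The sketch below assumes that approach. The helper names count_fds and fd_usage_percent are hypothetical, and the limit of 1024 is only the example value quoted above.

```shell
# Hypothetical helper: count the open descriptors of PID $1 via /proc (Linux only).
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Hypothetical helper: integer percentage of limit $2 used by count $1.
fd_usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

fd_usage_percent 528 1024   # prints 51
```

Keep in mind that ulimit -n reports the limit of the shell that runs it, which is not necessarily the limit of the monitored process.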


The following code calls the GetDes function above to obtain the handle usage of the process and then uses a conditional statement to check whether it exceeds the limit. If it exceeds 900 (adjust to the actual situation), an alarm is output; otherwise a normal message is output.


 des=`GetDes $PID`
 if [ $des -gt 900 ]
 then
 {
     echo "The number of des is larger than 900"
 }
 else
 {
     echo "The number of des is normal"
 }
 fi


Sample Demo:



1) Source program (assuming the process ID of CFTestApp found above is 11426)


  des=`GetDes 11426` 

    echo "The number of des is $des"

    if [ $des -gt 900 ] 
    then 
    { 
         echo "The number of des is larger than 900"
    } 
    else 
    { 
        echo "The number of des is normal"
    } 
    fi


2) Result output


  The number of des is 528 
    The number of des is normal 


3) Results Analysis



The output shows that the CFTestApp program currently uses 528 handles, which is normal and does not exceed the alarm limit of 900.



4) Command Introduction


wc: Counts the bytes, words, and lines in the specified file and displays the results. Parameters: -l counts lines. -c counts bytes. -w counts words.




Using the shell to monitor system resources


Checking whether a TCP or UDP port is listening


Port checks are common in system resource monitoring, especially where network communication is involved. Sometimes the process, CPU, and memory are all in a normal state while a port is in an abnormal state, and the business does not function properly. The following function determines whether the specified port is listening. It takes the port to check as its parameter. It first uses netstat to output port usage information and then uses grep, awk, and wc to count the listening TCP ports; the second statement counts the listening UDP ports in the same way. If both counts are 0, the function outputs 0; otherwise it outputs 1.


Listing 6. Port detection
 function Listening
 {
    # Count TCP sockets listening on port $1
    TCPListeningnum=`netstat -an | grep ":$1 " | \
    awk '$1 == "tcp" && $NF == "LISTEN" {print $0}' | wc -l`
    # Count UDP sockets bound to port $1
    UDPListeningnum=`netstat -an | grep ":$1 " \
    | awk '$1 == "udp" && $NF == "0.0.0.0:*" {print $0}' | wc -l`
    (( Listeningnum = TCPListeningnum + UDPListeningnum ))
    if [ $Listeningnum -eq 0 ]
    then
    {
        echo "0"
    }
    else
    {
        echo "1"
    }
    fi
 }
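
On many Linux systems netstat labels IPv6 listeners as tcp6, which the test $1 == "tcp" in Listing 6 skips. The sketch below counts both by parsing netstat-style lines from standard input, which also makes it easy to test against canned output. The helper name count_tcp_listeners is hypothetical, and the sketch assumes the usual netstat -an column layout with the local address in field 4.

```shell
# Hypothetical helper: read netstat -an style lines on stdin and print how many
# tcp or tcp6 sockets are in LISTEN state on port $1.
count_tcp_listeners() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            # The port is the text after the last ":" of the local address.
            n = split($4, parts, ":")
            if (parts[n] == port) count++
        }
        END { print count + 0 }'
}
```

On a live system it could be fed with netstat -an | count_tcp_listeners 8080.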


Sample Demo:



1) Source program (for example, checking whether port 8080 is listening)


  isListen=`Listening 8080` 
    if [ $isListen -eq 1 ] 
    then 
    { 
        echo "The port is listening"
    } 
    else 
    { 
        echo "The port is not listening"
    } 
    fi


2) Result output


    The port is listening


3) Results Analysis



The output shows that port 8080 on this Linux server is in the listening state.



4) Command Introduction


netstat: Displays statistics related to the IP, TCP, UDP, and ICMP protocols and is typically used to check the network connections of each port on the machine. Parameters: -a shows all sockets, including listening ones. -n shows numeric IP addresses directly instead of resolving names through the domain name server.


The following commands can also be used to check whether a TCP or UDP port is in a normal state ($1 is the port to check):


 tcp: netstat -an | egrep $1 | awk '$6 == "LISTEN" && $1 == "tcp" {print $0}'
 udp: netstat -an | egrep $1 | awk '$1 == "udp" && $5 == "0.0.0.0:*" {print $0}'


Command Introduction


egrep: Finds the specified string in a file. egrep behaves like grep -E, and its syntax and parameters are the same as those of grep. The difference lies in how the pattern is interpreted: egrep uses extended regular expression syntax, whereas grep uses basic regular expression syntax. Extended regular expressions provide a richer syntax than basic ones.


Checking the number of running instances of a process


Sometimes we need to know how many instances of a process are running on the server. The following command counts the running processes with a given name, for example a process named CFTestApp.


 Runnum=`ps -ef | grep -v vi | grep -v tail | grep "[/]CFTestApp" | grep -v grep | wc -l`
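
Counting with grep depends on excluding every unrelated command line that happens to contain the name. As an alternative, the sketch below counts exact command names. The helper names count_by_name and count_running are hypothetical, and the wrapper assumes a ps that accepts -e -o comm= to print one command name per line without a header.

```shell
# Hypothetical helper: read one command name per line on stdin and print how
# many of them equal $1 exactly.
count_by_name() {
    awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

# Hypothetical wrapper: number of running processes whose command name is $1.
# Assumes ps supports -e -o comm= (headerless command names).
count_running() {
    ps -e -o comm= | count_by_name "$1"
}
```

With this sketch, Runnum=$(count_running CFTestApp) would play the role of the command above.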


Detecting System CPU Load


When maintaining a server, we sometimes see business interruptions caused by excessive system CPU load. A server may run several processes, and the CPU usage of each may look normal while the CPU load of the whole system is abnormal. Monitoring the system CPU load with a script makes it possible to send an alarm as soon as the load becomes abnormal, so that maintenance staff can deal with it in time and prevent accidents. The following function measures system CPU usage. It uses vmstat to sample the system's CPU idle value 5 times, averages the samples, and subtracts the average from 100 to obtain the actual CPU usage.


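 function GetSysCPU
 {
    # Average the idle column (field 15) over 5 one-second vmstat samples
    CpuIdle=`vmstat 1 5 | sed -n '3,$p' \
    | awk '{x = x + $15} END {print x/5}' | awk -F. '{print $1}'`
    # CPU usage is 100 minus the average idle percentage
    CpuNum=`echo "100-$CpuIdle" | bc`
    echo $CpuNum
 }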
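
The position of the idle column can differ between vmstat versions, so hard-coding field 15 is fragile. The sketch below locates the column from the header line named id instead and reads vmstat output from standard input. The helper name avg_cpu_busy is hypothetical, and the sketch assumes the header row starts with r, as in common Linux vmstat output.

```shell
# Hypothetical helper: read vmstat output on stdin, average the "id" (idle)
# column over all sample rows, and print 100 minus the truncated average.
avg_cpu_busy() {
    awk '
        # Header row: remember which field is labelled "id".
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "id") col = i; next }
        # Sample rows start with a number.
        col && $1 ~ /^[0-9]+$/ { idle += $col; n++ }
        END { if (n) print 100 - int(idle / n); else print "unknown" }'
}
```

On a live system it could be fed with vmstat 1 5 | avg_cpu_busy. Like GetSysCPU, it includes the first vmstat row, which reports averages since boot rather than a one-second sample.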


Sample Demo:



1) source program


 cpu=`GetSysCPU` 

 echo "The system CPU is $cpu"

 if [ $cpu -gt 90 ] 
 then 
 { 
    echo "The usage of system cpu is larger than 90%"
 } 
 else 
 { 
    echo "The usage of system cpu is normal"
 } 
 fi


2) Result output


The system CPU is 87 
 The usage of system cpu is normal 


3) Results Analysis



The output shows that the current system CPU utilization of this Linux server is 87%, which is normal and does not exceed the 90% alarm limit.



4) Command Introduction


vmstat: Short for virtual memory statistics. It monitors the operating system's virtual memory, processes, and CPU activity.
Parameter: -n displays the header only once during periodic output.


Detecting system disk space


Checking disk space is an important part of system resource monitoring, and during maintenance we often need to check disk usage on the server. Some services continuously write records, logs, or temporary files, so running out of disk space can also cause a business interruption. The following function reports the disk space usage of a given directory. Its input parameter is the directory to check. It uses df to output disk usage information and then grep and awk to extract the percentage of disk space used for that directory.


 function GetDiskSpc
 {
    if [ $# -ne 1 ]
    then
        return 1
    fi

    # Append "$" so that grep matches only the df line ending with the directory
    Folder="$1$"
    DiskSpace=`df -k | grep "$Folder" | awk '{print $5}' | awk -F% '{print $1}'`
    echo $DiskSpace
 }
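
GetDiskSpc only finds a directory that is itself a mount point, because it searches the df output for a line ending with the directory name. As a sketch of an alternative, df can be given the directory directly, in which case it reports the file system that contains it; the POSIX -P option keeps each entry on a single line. The helper names parse_df_capacity and get_disk_usage are hypothetical.

```shell
# Hypothetical helper: read `df -P` output for one path on stdin and print the
# used percentage (field 5 of the second line) without the % sign.
parse_df_capacity() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical wrapper: usage of the file system that holds directory $1.
get_disk_usage() {
    df -P "$1" | parse_df_capacity
}
```

A call such as get_disk_usage /boot/grub would then report the usage of the file system holding that directory, even though the directory is not a mount point.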


Sample Demo:



1) Source program (the test directory is /boot)


 Folder="/boot"

 DiskSpace=`GetDiskSpc $Folder` 

 echo "The system $Folder disk space is $DiskSpace%"

 if [ $DiskSpace -gt 90 ] 
 then 
 { 
    echo "The usage of system disk($Folder) is larger than 90%"
 } 
 else 
 { 
    echo "The usage of system disk($Folder)  is normal"
 } 
 fi


2) Result output


 The system /boot disk space is 14% 
 The usage of system disk(/boot)  is normal 


3) Results Analysis



The output shows that disk usage for the /boot directory on this Linux server is currently 14%, which is normal and does not exceed the 90% alarm limit.



4) Command Introduction


df: Checks the disk space usage of file systems. It shows how much space is used on the disk and how much is left. Parameter: -k displays sizes in kilobytes.




Summary


On Linux, shell scripts are a simple, convenient, and effective way to monitor servers and processes, which is very helpful for system developers and process maintenance staff. Besides monitoring the resources described above and sending alarms, scripts can also monitor process logs and more.


