Monitor Linux system and process resources using Shell scripts

This article describes how to monitor Linux system and process resources with shell scripts: how to check whether a process exists, and how to check the CPU usage, memory usage, and handle usage of a process. During server operation and maintenance (O&M), various resources on the server need to be monitored, such as CPU load, disk usage, and the number of processes, so that the system administrator is notified promptly when an exception occurs. The sections below cover several common monitoring requirements on Linux and how to write the corresponding shell scripts.

Article directory:

1. Use Shell in Linux to check whether the process exists
2. Use Shell in Linux to check the CPU usage of processes
3. Use Shell in Linux to detect memory usage of processes
4. Use Shell in Linux to detect handle usage of processes
5. Use Shell in Linux to check whether a TCP or UDP port is listening
6. Use Shell in Linux to view the number of running processes with a given name
7. Use Shell in Linux to detect the CPU load of the system
8. Use Shell in Linux to check system disk space
9. Summary

Check whether the process exists

When monitoring a process, we generally need its process ID (PID), which uniquely identifies it. However, processes with the same name may run under different users on the same server. The following function, GetPID, returns the ID of the process with a given name under a given user (for now it assumes that the user has started only one process with that name). It takes two parameters: the user name and the process name. It first uses ps to list the user's processes, then uses grep to filter for the required process, and finally uses sed and awk to extract the process ID (the function can be adapted to the actual situation, for example to filter on other information).

Listing 1. Process monitoring

The Code is as follows:


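function GetPID # User # Name
{
    PsUser=$1
    PsName=$2
    # List the user's processes, drop editors, debuggers and helper commands
    # that merely mention the name, then keep the PID of the first match.
    pid=`ps -u $PsUser | grep $PsName | grep -v grep | grep -v vi | grep -v dbx \
        | grep -v tail | grep -v start | grep -v stop | sed -n 1p | awk '{print $1}'`
    echo $pid
}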

Example:

1) source program (for example, find the process ID whose user is root and whose process name is CFTestApp)

The Code is as follows:


PID=`GetPID root CFTestApp`

echo $PID


2) result output

The Code is as follows:


11426
[dyu@xilinuxbldsrv shell]$

3) Result Analysis

As shown in the preceding output, 11426 is the process ID of the CFTestApp program under the root user.

4) command Introduction

1. ps: displays a snapshot of the processes in the system. Parameters: -u <user ID> lists the processes that belong to the given user; the user name may be given instead. -p <process ID> selects the process with the given ID and lists its status. -o specifies the output format.
2. grep: searches its input for lines that match a pattern. Parameter: -v inverts the selection, that is, only the lines that do not contain the search string are displayed.
3. sed: a non-interactive stream editor that edits text from files or standard input and processes one line at a time. Parameter: -n suppresses the automatic printing of each input line, so only lines that are printed explicitly appear in the output. The p command prints the matching line, so sed -n 1p prints only the first line.
4. awk: a programming language for processing text and data on Linux/Unix. The data can come from standard input, one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions. It can be used directly on the command line, but is more often used in scripts. awk scans its input line by line, from the first line to the last, looks for lines that match a given pattern, and performs the specified action on them. If no action is given, the matching lines are printed to standard output (the screen). If no pattern is given, the action is applied to every line. Parameter: -F fs or --field-separator fs specifies the input field separator; fs is a string or a regular expression, such as -F:.
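
The grep -v chain in GetPID excludes helper commands by name, but grep still matches any process whose name merely contains the search string. As a minimal sketch of a stricter filter, the hypothetical helper first_pid below (it is not part of the original scripts) reads ps -u style lines from standard input and prints the PID of the first line whose last column is exactly the given name. It assumes that the command name is the last column, as in the default ps -u output.

```shell
# first_pid NAME
# Hypothetical helper: reads "ps -u USER"-style lines on standard input and
# prints the PID (first column) of the first line whose last column is NAME.
first_pid() {
    awk -v name="$1" '$NF == name { print $1; exit }'
}

# Intended use (assumes ps is available):
#   PID=`ps -u root | first_pid CFTestApp`
```

Because the comparison is an exact string match, a process such as CFTestAppHelper is not mistaken for CFTestApp.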
Sometimes the process may not have been started. The following code checks whether a process ID was found. If the process is not running, it prints "The process does not exist."

The Code is as follows:


# Check whether a process exists
if [ "-$PID" = "-" ]
then
{
    echo "The process does not exist."
}
fi
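
An empty PID only shows that the earlier lookup found nothing. If the PID was obtained some time ago, the process may have exited in the meantime. The following sketch uses a hypothetical helper, process_exists, that asks the kernel directly with kill -0, which delivers no signal and only tests whether the caller could signal the process. One caveat under this assumption: the check also fails for a process that exists but belongs to another user whom the caller may not signal.

```shell
# process_exists PID
# Hypothetical helper: succeeds if PID is non-empty and refers to a process
# that the current user is allowed to signal.
process_exists() {
    # An empty argument means the lookup found no process at all.
    [ -n "$1" ] || return 1
    # kill -0 delivers no signal; it only performs the existence and
    # permission checks.
    kill -0 "$1" 2>/dev/null
}

# Intended use:
#   if process_exists "$PID"; then echo "running"; else echo "The process does not exist."; fi
```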

Detect process CPU usage

When maintaining application services, we often encounter service interruptions caused by high CPU usage. CPU usage can become too high because business traffic overloads the service or because of a fault such as an endless loop. A script can monitor the CPU usage of the service process periodically and notify the maintenance personnel promptly when it becomes abnormal, so that they can analyze and locate the problem in time and avoid a service interruption. The following function obtains the CPU usage of the process with a given ID. It takes one parameter, the process ID. It first uses ps to get the process information, filters out the %CPU header line with grep -v, and finally uses awk to extract the integer part of the CPU usage percentage (on a system with multiple CPUs, the value can exceed 100%).

Listing 2. Monitor the CPU of a business process in real time

The Code is as follows:


function GetCpu
{
    CpuValue=`ps -p $1 -o pcpu | grep -v CPU | awk '{print $1}' | awk -F. '{print $1}'`
    echo $CpuValue
}


The following function calls GetCpu to obtain the CPU usage of the process and then uses a conditional statement to determine whether the usage exceeds the limit. If it exceeds 80% (adjust this to the actual situation), an alarm is printed; otherwise, a message saying that the usage is normal is printed.

Listing 3. Determine whether the CPU usage exceeds the limit

The Code is as follows:


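function CheckCpu
{
    PID=$1
    cpu=`GetCpu $PID`
    # Print the measured value, as shown in the result output of the example
    echo "The usage of cpu is $cpu"
    if [ $cpu -gt 80 ]
    then
    {
        echo "The usage of cpu is larger than 80%"
    }
    else
    {
        echo "The usage of cpu is normal"
    }
    fi
}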

Example:

1) source program (assuming that the process ID of CFTestApp has been found to be 11426)

The Code is as follows:


CheckCpu 11426


2) result output

The Code is as follows:


The usage of cpu is 75
The usage of cpu is normal
[dyu@xilinuxbldsrv shell]$


3) Result Analysis

The preceding output shows that the current CPU usage of the CFTestApp is 75%, which is normal and does not exceed the alarm limit of 80%.
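
CheckCpu passes the value straight to the test command, which reports an error if GetCpu returns nothing, for example because the process exited between the two calls. The following is a minimal sketch of a more defensive comparison. exceeds_limit is a hypothetical helper name, and treating an empty value as "not over the limit" is an assumption that may not suit every monitor.

```shell
# exceeds_limit VALUE LIMIT
# Hypothetical helper: succeeds if the integer part of VALUE is greater
# than LIMIT. An empty VALUE is treated as "not over the limit".
exceeds_limit() {
    value=${1%%.*}              # drop the fractional part: 75.3 -> 75
    [ -n "$value" ] || return 1
    [ "$value" -gt "$2" ]
}

# Intended use:
#   if exceeds_limit "`ps -p 11426 -o pcpu | grep -v CPU`" 80; then
#       echo "The usage of cpu is larger than 80%"
#   fi
```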

Detect process memory usage

When maintaining application services, we also encounter processes that crash because of excessive memory usage, which interrupts the service (for example, a 32-bit program can address at most 4 GB of memory; once that limit is exceeded, memory requests fail, and the available physical memory is limited as well). Excessive memory usage may be caused by a memory leak or by accumulated messages. A script can monitor the memory usage of the business process periodically and send an alarm (for example, a text message) when it becomes abnormal, so that the maintenance personnel can handle the problem in time. The following function obtains the memory usage of the process with a given ID. It takes one parameter, the process ID. It first uses ps to get the process information, filters out the VSZ header line with grep -v, and then divides the value by 1000 to obtain the memory usage in megabytes (ps reports VSZ, the virtual memory size, in kilobytes, so the result is approximate).

Listing 4. Monitor the memory usage of business processes

The Code is as follows:


function GetMem
{
    # The header printed by ps is upper case, so filter on VSZ
    MEMUsage=`ps -o vsz -p $1 | grep -v VSZ`
    (( MEMUsage /= 1000 ))
    echo $MEMUsage
}


The following code obtains the memory usage of the process through the GetMem function above and then uses a conditional statement to determine whether the usage exceeds the limit. If it exceeds 1.6 GB (adjust this to the actual situation), an alarm is printed; otherwise, a message saying that the usage is normal is printed.

Listing 5. Determine whether the memory usage exceeds the limit

The Code is as follows:


mem=`GetMem $PID`
if [ $mem -gt 1600 ]
then
{
    echo "The usage of memory is larger than 1.6G"
}
else
{
    echo "The usage of memory is normal"
}
fi

Example:

1) source program (assuming that the process ID of CFTestApp has been found to be 11426)

The Code is as follows:


mem=`GetMem 11426`

echo "The usage of memory is $mem M"

if [ $mem -gt 1600 ]
then
{
    echo "The usage of memory is larger than 1.6G"
}
else
{
    echo "The usage of memory is normal"
}
fi

2) result output

The Code is as follows:


The usage of memory is 248 M
The usage of memory is normal
[dyu@xilinuxbldsrv shell]$

3) Result Analysis

The preceding output shows that the current memory usage of the CFTestApp is 248 MB, which is normal and does not exceed the alarm limit of 1.6 GB.
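
VSZ counts the whole virtual address space of the process, including memory that was reserved but never touched. When the resident (physical) memory is of more interest, the Linux kernel exposes it as the VmRSS line of /proc/<pid>/status, in kB. The sketch below parses that format from standard input. rss_mb is a hypothetical helper name, and unlike GetMem it divides by 1024 rather than 1000.

```shell
# rss_mb
# Hypothetical helper: reads the text of /proc/<pid>/status on standard
# input and prints the resident set size in whole megabytes.
rss_mb() {
    awk '$1 == "VmRSS:" { printf "%d\n", $2 / 1024 }'
}

# Intended use on Linux:
#   rss_mb < /proc/11426/status
```

Kernel threads have no VmRSS line, in which case the helper prints nothing.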

Detect Process Handle usage

When maintaining application services, service interruptions are often caused by excessive use of handles. Every platform limits the number of handles that a process may use. On Linux, for example, the limit can be obtained with the ulimit -n command (it appears as open files (-n) 1024 in the ulimit -a listing) or from /etc/security/limits. Excessive handle usage may be caused by overload or by a handle leak. A script can monitor the handle usage of the business process periodically and send an alarm (for example, a text message) when it becomes abnormal, so that the maintenance personnel can handle the problem in time. The following function obtains the handle usage of the process with a given ID. It takes one parameter, the process ID. It first uses ls to list the handles of the process and then uses wc -l to count them.

The Code is as follows:


function GetDes
{
    DES=`ls /proc/$1/fd | wc -l`
    echo $DES
}

The following code obtains the handle usage of the process through the GetDes function above and then uses a conditional statement to determine whether the usage exceeds the limit. If it exceeds 900 (adjust this to the actual situation), an alarm is printed; otherwise, a message saying that the usage is normal is printed.

The Code is as follows:


des=`GetDes $PID`
if [ $des -gt 900 ]
then
{
    echo "The number of des is larger than 900"
}
else
{
    echo "The number of des is normal"
}
fi

Example:

1) source program (assuming the CFTestApp process ID is 11426 found above)

The Code is as follows:


des=`GetDes 11426`

echo "The number of des is $des"

if [ $des -gt 900 ]
then
{
    echo "The number of des is larger than 900"
}
else
{
    echo "The number of des is normal"
}
fi

2) result output

The Code is as follows:


The number of des is 528
The number of des is normal
[dyu@xilinuxbldsrv shell]$


3) Result Analysis

From the above output, we can see that the current number of handles in the CFTestApp is 528, which is normal and does not exceed the alarm limit of 900.

4) command Introduction

wc: counts the bytes, words, and lines in the specified files and displays the totals. Parameters: -l counts lines. -c counts bytes. -w counts words.
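
The fixed alarm limit of 900 only makes sense when the per-process limit is 1024. As a sketch of an alternative, the handle count can be expressed as a percentage of the limit. fd_percent is a hypothetical helper; the limit passed to it is assumed to be the one that applies to the monitored process, which is not necessarily what ulimit -n reports in the monitoring shell.

```shell
# fd_percent COUNT LIMIT
# Hypothetical helper: prints COUNT as a whole-number percentage of LIMIT
# (integer division, so the result is rounded down).
fd_percent() {
    count=$1
    limit=$2
    echo $(( count * 100 / limit ))
}

# Intended use on Linux:
#   des=`ls /proc/11426/fd | wc -l`
#   if [ `fd_percent $des 1024` -gt 85 ]; then echo "handle usage is high"; fi
```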

Check whether a TCP or UDP port is listening

Port checks come up often in system resource monitoring, and in network communication the state of a port is often very important. Sometimes the process, CPU, memory, and so on are all normal, but the port is in an abnormal state and the service does not work. The following function determines whether a given port is listening. It takes one parameter, the port to check. The first statement uses netstat to output the port usage information and then uses grep, awk, and wc to count the matching TCP ports in the listening state; the second statement counts the matching UDP ports in the same way. If both counts are 0, the function prints 0; otherwise, it prints 1.

Listing 6. Port Detection

The Code is as follows:


function Listening
{
    # The space after $1 keeps port 80 from also matching 8080
    TCPListeningnum=`netstat -an | grep ":$1 " | \
        awk '$1 == "tcp" && $NF == "LISTEN" {print $0}' | wc -l`
    UDPListeningnum=`netstat -an | grep ":$1 " \
        | awk '$1 == "udp" && $NF == "0.0.0.0:*" {print $0}' | wc -l`
    (( Listeningnum = TCPListeningnum + UDPListeningnum ))
    if [ $Listeningnum -eq 0 ]
    then
    {
        echo "0"
    }
    else
    {
        echo "1"
    }
    fi
}


Example:

1) source program (for example, querying whether the status of port 8080 is listening)


The Code is as follows:


isListen=`Listening 8080`
if [ $isListen -eq 1 ]
then
{
    echo "The port is listening"
}
else
{
    echo "The port is not listening"
}
fi


2) result output

The Code is as follows:


The port is listening
[dyu@xilinuxbldsrv shell]$


3) Result Analysis

From the preceding output, we can see that port 8080 of the Linux server is in the listening status.

4) command Introduction

netstat: displays statistics related to the IP, TCP, UDP, and ICMP protocols. It is generally used to check the network connections of each port on the local machine. Parameters: -a shows all sockets, including listening ones. -n shows numeric addresses and port numbers instead of resolving them to host and service names.
The following commands also check whether a TCP or UDP port is listening.

The Code is as follows:


TCP: netstat -an | egrep $1 | awk '$6 == "LISTEN" && $1 == "tcp" {print $0}'
UDP: netstat -an | egrep $1 | awk '$1 == "udp" && $5 == "0.0.0.0:*" {print $0}'


Command Introduction

egrep: searches files for lines that match a pattern. Running egrep is equivalent to running grep -E, and its syntax and parameters are the same as those of grep. The difference lies in how the pattern is interpreted: egrep uses the extended regular expression syntax, while grep uses the basic regular expression syntax. Extended regular expressions offer a more complete set of operators than basic regular expressions.
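
The grep and egrep filters above match the port number anywhere in the line, so a check for one port can also match a remote port or a longer port number. As a sketch of a stricter test, the hypothetical helper port_listening below compares only the port suffix of the local-address column. It assumes the Linux netstat -an layout, in which the local address is the fourth column, listening TCP sockets end in LISTEN, and unconnected UDP sockets show 0.0.0.0:* as the foreign address.

```shell
# port_listening PORT
# Hypothetical helper: reads "netstat -an" output on standard input and
# prints 1 if PORT is listening (TCP) or bound (UDP), otherwise 0.
port_listening() {
    awk -v port="$1" '
        # Compare only the local address (column 4), anchored at its end.
        $4 !~ (":" port "$") { next }
        $1 ~ /^tcp/ && $NF == "LISTEN"   { found = 1 }
        $1 ~ /^udp/ && $5 == "0.0.0.0:*" { found = 1 }
        END { print (found ? 1 : 0) }
    '
}

# Intended use (assumes netstat is installed):
#   isListen=`netstat -an | port_listening 8080`
```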

View the number of running process names

Sometimes we need to know how many instances of a process are running on the server. The following command counts the running processes with a given name, in this example CFTestApp.

The Code is as follows:


Runnum=`ps -ef | grep -v vi | grep -v tail | grep "[ /]CFTestApp" | grep -v grep | wc -l`
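
Because grep matches substrings, the pipeline above also counts processes whose names merely begin with CFTestApp. The following is a sketch of an exact-name count. count_exact is a hypothetical helper that expects one command name per line, which ps -e -o comm= produces on Linux (the trailing = suppresses the header). Note that Linux truncates this name to 15 characters, so longer names have to be compared in their truncated form.

```shell
# count_exact NAME
# Hypothetical helper: reads one command name per line on standard input
# and prints how many lines are exactly NAME.
count_exact() {
    # -x matches whole lines only, -F treats NAME as a fixed string,
    # -c prints the number of matching lines.
    grep -c -x -F -- "$1"
}

# Intended use (assumes ps is available):
#   Runnum=`ps -e -o comm= | count_exact CFTestApp`
```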

Detect System CPU load

During server maintenance, services may also be interrupted because the CPU load of the whole system is too high. A server may run many processes, and the CPU usage of each single process can look normal while the CPU load of the whole system is abnormal. A script can monitor the system CPU load periodically and send an alarm promptly when it becomes abnormal, so that the maintenance personnel can react in time and prevent an incident. The following function obtains the CPU usage of the system. It uses vmstat to sample the idle value of the system CPU five times, takes the average, and subtracts it from 100 to obtain the CPU usage.

The Code is as follows:


function GetSysCPU
{
    CpuIdle=`vmstat 1 5 | sed -n '3,$p' \
        | awk '{x = x + $15} END {print x/5}' | awk -F. '{print $1}'`
    CpuNum=`echo "100-$CpuIdle" | bc`
    echo $CpuNum
}

Example:

1) source program

The Code is as follows:


cpu=`GetSysCPU`

echo "The system CPU is $cpu"

if [ $cpu -gt 90 ]
then
{
    echo "The usage of system cpu is larger than 90%"
}
else
{
    echo "The usage of system cpu is normal"
}
fi

2) result output

The Code is as follows:


The system CPU is 87
The usage of system cpu is normal
[dyu@xilinuxbldsrv shell]$

3) Result Analysis

The preceding output shows that the CPU usage of the Linux server is 87%, which is normal and does not exceed the alarm limit of 90%.

4) command Introduction

vmstat: Virtual Memory Statistics, which reports on the virtual memory, processes, and CPU activity of the operating system.
Parameter: -n displays the output header only once during periodic output.
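
GetSysCPU reads the idle value from column 15, and the position of that column can differ between vmstat implementations. As a sketch of a layout-independent variant, the hypothetical helper cpu_usage_from_vmstat below finds the id column from the header line. It reads vmstat output from standard input and averages every sample line it sees, so the number of samples does not have to be hard-coded either.

```shell
# cpu_usage_from_vmstat
# Hypothetical helper: reads vmstat output on standard input and prints
# 100 minus the average of the "id" (idle) column, as a whole number.
cpu_usage_from_vmstat() {
    awk '
        # The column header line starts with "r"; remember where "id" is.
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "id") col = i; next }
        # Sample lines start with a number.
        col && $1 ~ /^[0-9]+$/ { sum += $col; n++ }
        END { if (n) printf "%d\n", 100 - sum / n }
    '
}

# Intended use (assumes vmstat is installed):
#   cpu=`vmstat 1 5 | cpu_usage_from_vmstat`
```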

Detect System Disk Space

Checking disk space is an important part of system resource monitoring. During system maintenance, we often need to check the disk space usage of the server, because some services write tickets, logs, or temporary files from time to time, and a service interruption may occur if the disk space is exhausted. The following function obtains the disk space usage of a given directory. Its parameter is the name of the directory to check. It uses df to output the disk space usage of the system and then filters the output with grep and awk to obtain the percentage of disk space used for that directory.

The Code is as follows:


function GetDiskSpc
{
    if [ $# -ne 1 ]
    then
        return 1
    fi

    # The trailing $ anchors the match at the end of the df output line
    Folder="$1$"
    DiskSpace=`df -k | grep $Folder | awk '{print $5}' | awk -F% '{print $1}'`
    echo $DiskSpace
}

Example:

1) source program (check Directory:/boot)


The Code is as follows:


Folder="/boot"

DiskSpace=`GetDiskSpc $Folder`

echo "The system $Folder disk space is $DiskSpace%"

if [ $DiskSpace -gt 90 ]
then
{
    echo "The usage of system disk($Folder) is larger than 90%"
}
else
{
    echo "The usage of system disk($Folder) is normal"
}
fi

2) result output

The Code is as follows:


The system /boot disk space is 14%
The usage of system disk(/boot) is normal
[dyu@xilinuxbldsrv shell]$


3) Result Analysis

From the above output, we can see that 14% of the disk space for the /boot directory on the Linux server is in use, which is normal and does not exceed the alarm limit of 90%.

4) command Introduction

df: reports the disk space usage of file systems. You can use this command to obtain the used and remaining space on a disk. Parameter: -k displays sizes in units of 1 KB.
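
In GetDiskSpc, grep with a trailing $ matches any mount point that ends in the given name, and df may wrap a long device name onto a second line, which shifts the columns. As a sketch of a stricter lookup, the hypothetical helper disk_used_percent below reads df -kP output (the -P option requests the POSIX format with one line per file system) and compares the last column with the mount point exactly. It assumes mount points that contain no spaces.

```shell
# disk_used_percent MOUNTPOINT
# Hypothetical helper: reads "df -kP" output on standard input and prints
# the used percentage (without the % sign) of the file system mounted
# exactly at MOUNTPOINT.
disk_used_percent() {
    awk -v mount="$1" '$NF == mount { sub(/%$/, "", $5); print $5 }'
}

# Intended use:
#   DiskSpace=`df -kP | disk_used_percent /boot`
```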

Summary

On Linux, monitoring with shell scripts is a simple, convenient, and effective way to watch servers and processes, and it is very helpful for system developers and for the personnel who maintain the processes. Besides monitoring the information described above and sending alarms, scripts can also monitor process logs and other information. I hope this article helps you.
