Linux "Health Check" Indicators

Source: Internet
Author: User
Tags get ip


In a world of "may Buddha keep the servers up" and "sacrificing a programmer to the heavens", programmers are at war every day, jumping at every phone call and text message. Discovering server problems in a timely manner is therefore not just an O&M concern. This article summarizes the common server monitoring metrics, in the hope that every developer can run a script and keep themselves safe.

This article is often scraped without a link to the original, and my updates and corrections do not reach those copies, so the original address is noted here:

Obtain Server Information

When multiple machines need to be monitored at the same time, each machine needs to run a monitoring program. We must first obtain the server information to identify the machine. When a problem occurs, we can also assess the severity of the problem.

Get IP

Get Intranet IP Address:

Run the ifconfig command to obtain all network information and remove the local host and ipv6 information.

/sbin/ifconfig | grep inet | grep -v '127.0.0.1' | grep -v inet6 | awk '{print $2}' | sed 's/^addr://'

Note: ifconfig is called by its absolute path because a monitoring script started from crontab does not run with the usual login environment, so PATH may not include /sbin.
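
As a rough sketch of the filter above, the helper below reads ifconfig output on standard input and prints the non-loopback IPv4 addresses. The function name extract_ipv4 is made up for illustration, and it assumes the two common ifconfig layouts ("inet addr:10.0.0.5" in older net-tools, "inet 10.0.0.5" in newer ones); other layouts may need a different pattern.

```shell
# extract_ipv4: hypothetical helper, reads ifconfig output from stdin.
extract_ipv4() {
    # "inet " (with the trailing space) skips inet6 lines.
    awk '/inet / {
        sub(/^addr:/, "", $2)            # older ifconfig prints "addr:" before the address
        if ($2 !~ /^127\./) print $2     # drop loopback addresses
    }'
}

# Typical use from a cron job, with the absolute path:
#   /sbin/ifconfig | extract_ipv4
```

Reading from standard input keeps the parsing separate from the command that produces the data, which makes the filter easy to try against saved output.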

Get Internet IP Address:

The Internet IP address can be obtained by requesting an external website that echoes the caller's address back. Some websites provide this service, a website that I am too lazy to

The command is as follows: curl

Obtain system information

The recommended way to obtain system information is lsb_release -a:

lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.5 (Final)
Release:    6.5
Codename:   Final

The output is rich, and you can extract the fields you need from it.
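
One possible way to pull a single field out of that output is sketched below. The helper name lsb_field is hypothetical, and it assumes the "Key: value" layout shown in the sample above.

```shell
# lsb_field KEY: hypothetical helper, reads "lsb_release -a" output from stdin
# and prints the value of the line that starts with "KEY:".
lsb_field() {
    awk -v key="$1" 'index($0, key ":") == 1 {
        sub(/^[^:]*:[ \t]*/, "")   # drop the key, the colon, and the padding
        print
    }'
}

# Typical use:
#   lsb_release -a 2>/dev/null | lsb_field "Release"
```

lsb_release also has a short mode (for example lsb_release -s -r) that prints a single value directly, which may be simpler when only one field is needed.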

CPU

CPU load is the primary metric we need to monitor. What we usually call system load refers to it: the ratio of the number of processes the CPU handled over a period of time to the maximum number of processes it can handle. The maximum load of a single CPU is therefore 1.0, at which point the CPU can just keep up with all processes. Beyond this limit the system is overloaded, and processes have to wait for other processes to finish. We generally consider a CPU load below 0.6 healthy.

On a terminal, system load is usually viewed with the top command, but top is interactive and its output is complex, which makes it awkward to use in monitoring scripts. We generally use uptime instead and read its load average field, which gives the average load over the last 1, 5, and 15 minutes.

16:03:30 up 130 days, 23:33,  1 user,  load average: 4.62, 4.97, 5.08

Here the average system load is about 5, yet the system is not overloaded and reports no errors. The reason is that the number of CPU cores must be taken into account: the number of processes a multi-core CPU can handle simultaneously is proportional to its core count, so the maximum load is not 1 but N, the number of CPU cores.

Use nproc to check the number of CPU cores in the system. This machine has 16 cores, so its maximum load is 16 and the average load is 5/16 ≈ 0.31, which means the CPU is healthy.
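
The per-core calculation above can be scripted roughly as follows. This is a minimal sketch: the function name load_per_core is invented for illustration, and it assumes the Linux uptime format shown above ("load average: a, b, c") with a period as the decimal separator. It uses the 1-minute figure rather than the rounded value of 5.

```shell
# load_per_core UPTIME_LINE CORES: hypothetical helper that prints the
# 1-minute load average divided by the number of cores.
load_per_core() {
    printf '%s\n' "$1" | awk -v cores="$2" -F'load average: ' '{
        split($2, avg, ", ")              # avg[1], avg[2], avg[3] = 1, 5, 15 minute loads
        printf "%.2f\n", avg[1] / cores   # 1-minute load per core
    }'
}

# Typical use:
#   load_per_core "$(uptime)" "$(nproc)"
```

For the sample line above and 16 cores this prints 0.29, which can then be compared against the 0.6 threshold.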

Memory

Memory is another core metric to monitor. If memory usage is too high, processes can no longer allocate the memory they need to run.

The top command also shows memory usage, but monitoring scripts more commonly use the free command:

free -m
             total       used       free     shared    buffers     cached
Mem:         32108      18262      13846          0        487      11544
-/+ buffers/cache:       6230      25878
Swap:            0          0          0

Look first at the Mem row: of 32108 MB of memory in total, 18262 MB is used and 13846 MB is free, so the memory usage is 18262/32108 * 100% = 56.88%. So what do shared, buffers, and cached mean?

In Linux, memory management follows a lazy principle. After a process finishes, Linux does not immediately clean up the memory that was allocated to it. Instead, that memory is kept as cache, so if the process is started again its data does not need to be reloaded. When free memory runs out, the cache is cleared and reused. The buffers and cached parts of used can therefore be reclaimed at any time and are not counted as occupied. shared is the memory shared between processes; it does count as occupied, but it is rarely significant. For more information, see the reference articles at the end of this article.

The real figures are in the third row (-/+ buffers/cache), which excludes buffers and cache. The real memory usage is 6230/(6230+25878)*100% = 19.4%.
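
Both percentages can be computed from the free output with a short awk filter. The sketch below uses an invented helper name, mem_usage, and assumes the older free layout shown above. Newer versions of free drop the "-/+ buffers/cache" row in favour of an "available" column, in which case only the raw line would be printed.

```shell
# mem_usage: hypothetical helper, reads "free -m" output from stdin and prints
# the raw and the cache-adjusted usage percentages.
mem_usage() {
    awk '
        /^Mem:/           { printf "raw %.2f\n", $3 / $2 * 100 }          # used / total
        /buffers\/cache:/ { printf "real %.2f\n", $3 / ($3 + $4) * 100 }  # used / (used + free), cache excluded
    '
}

# Typical use:
#   free -m | mem_usage
```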

The fourth row, Swap, is disk space that holds memory pages moved out of physical memory. It lets the system keep running when memory is tight, but if physical memory is small, frequent swap reads and writes increase the I/O pressure on the server.

Network

For a Linux machine that serves as a web server, the network is also an important metric. There are many related commands, each with its own strengths. We generally monitor the following states:

Use netstat to view the listening port.

netstat -an | grep LISTEN | grep tcp | grep ':80 '

This checks whether a process is listening on port 80. Matching ':80 ' rather than a bare 80 avoids false hits on ports such as 8080.
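
For use in a script, the same check can be turned into an exit status. The following is only a sketch: port_listening is a hypothetical name, and it assumes the Linux net-tools column layout, where the fourth column is the local address and the last column is the state.

```shell
# port_listening PORT: hypothetical helper, reads "netstat -an" output from
# stdin and succeeds if some TCP socket is listening on exactly PORT.
port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Typical use:
#   netstat -an | port_listening 80 || echo "nothing is listening on port 80"
```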

Use ping to monitor network connections

Use the ping command to check whether the network is reachable. The -c option sets the number of requests, the -w option sets the deadline after which ping gives up (unit: seconds), and the short-circuit behavior of && controls the output:

ping -w 100 -c 1 &>/dev/null && echo "connected"
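
When the failure case matters as well, the short-circuit form can be expanded into an if/else. The sketch below wraps any command, so the wrapper name probe and the status words are illustrative choices rather than anything from the original script.

```shell
# probe CMD...: hypothetical wrapper that runs CMD silently and prints a
# status word depending on its exit status.
probe() {
    if "$@" >/dev/null 2>&1; then
        echo "connected"
    else
        echo "unreachable"
        return 1
    fi
}

# Typical use, with a host variable you supply:
#   probe ping -c 1 -w 1 "$host"
```

Returning a non-zero status on failure lets the caller chain an alert with ||.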

Hard Disk

The hard disk is not a particularly critical metric, but once the disk is full, file writes fail, which also disrupts the normal execution of processes.

Use the df command to view disk usage; the -h option prints sizes in a human-readable format:

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        40G  6.0G   32G  16% /
tmpfs            16G     0   16G   0% /dev/shm
/dev/vdb1       296G   16G  265G   6% /data0

We can use the grep command to find the mount point we are interested in, and then use the awk command to extract the field we need.
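
The lookup and the field extraction can also be done in a single awk step. This is a minimal sketch with an invented helper name, disk_use_percent. It assumes the mount point contains no spaces and that each filesystem is printed on one line, which df -P (the POSIX output format) is meant to guarantee.

```shell
# disk_use_percent MOUNT: hypothetical helper, reads df output from stdin and
# prints the usage percentage (without the % sign) of the given mount point.
disk_use_percent() {
    awk -v mp="$1" '$NF == mp {
        sub(/%$/, "", $5)   # fifth column is Use%, strip the trailing percent sign
        print $5
    }'
}

# Typical use:
#   df -P | disk_use_percent /data0
```

Comparing the last column exactly, rather than grepping for the path, avoids matching /data0 inside a longer mount point.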

In addition, du [-h] /path/to/dir [--max-depth=n] shows the size of a directory; --max-depth=n controls the traversal depth.


Other things to monitor mainly include process error logs, request counts, and whether a process is still running. These can be checked with basic commands such as ps.

More detailed information has to come from the process logs, which can be mined with grep and awk.
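
As one example of mining a log, the sketch below counts the lines that match a pattern. The helper name count_matches is made up, and the pattern and log path in the usage line are placeholders, since the log format depends entirely on the process being monitored.

```shell
# count_matches PATTERN FILE: hypothetical helper that prints how many lines
# of FILE match the awk regular expression PATTERN (0 if none).
count_matches() {
    awk -v pat="$1" '$0 ~ pat { n++ } END { print n + 0 }' "$2"
}

# Typical use (placeholder pattern and path):
#   count_matches 'ERROR' /path/to/app.log
```

awk is used here instead of grep -c because grep exits with a non-zero status when nothing matches, which can abort a script that runs with set -e.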


Finally, the monitoring results need to be collected, using either of the usual "push" or "pull" approaches. We recommend pushing the results to a single machine for aggregation and alerting. Alternatively, you can use rsync to pull the results from each server. The alert channel is configured as needed, such as enterprise messaging, SMS, and email.
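
A push could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the helper name format_report, the tab-separated record layout, and the COLLECTOR_URL placeholder, which does not refer to any real service.

```shell
# format_report HOST METRIC VALUE: hypothetical helper that prints one
# timestamped, tab-separated record for a collector.
format_report() {
    printf '%s\t%s\t%s\t%s\n' "$(date +%s)" "$1" "$2" "$3"
}

# Push example (COLLECTOR_URL is a placeholder you would define yourself):
#   format_report "$(hostname)" load 0.29 | curl -fsS --data-binary @- "$COLLECTOR_URL"
```

Including the host and a timestamp in every record lets the collecting machine tell the servers apart and notice when one of them stops reporting.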

System monitoring is important work that requires continuous attention. May your servers never go down.


References

Understanding Linux system load, Ruan Yifeng

Can cache in linux memory be recycled?
