Use scripts to record system monitoring logs for Nagios (detailed description of vmstat)

Last Update: 2013-12-23 Source: Internet

Author: User


[BKJIA exclusive article] I am a Linux/Unix system engineer, and I use Nagios to monitor the company's intranet development environment and its Internet-facing application environment automatically. Nagios has powerful alerting, but our system team has an additional need: when a system is busy, we want logs left behind for analysis, so we can tell whether the machine is under attack, whether a developer has misconfigured something, or whether operations staff have changed the system configuration. With only a few machines this is manageable, but the company's CDN cluster already has more than one hundred servers and is still growing. So I designed a shell script to supplement Nagios: when a system is busy, it writes a separate log that colleagues on the system team can use to analyze the problem and find its root cause.

The following is a vmstat-based system monitoring script, /root/monitor.sh.

Design idea and functionality of the script:

① The script is designed as a supplement to Nagios monitoring. Nagios monitors server status and raises alarms in real time, but it does not keep a detailed record of the system state and logs at the time of the problem;

② The script has been debugged and run successfully on FreeBSD, and also works on RHEL/CentOS;

③ The run-queue (r) threshold of 4 is based on our common production server, the hpdl316g6 (dual quad-core Intel Xeon E5540 @ 2.53GHz);

The script content is as follows:

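  #!/bin/bash
  # Supplement to Nagios: once a minute, read the run queue (vmstat "r")
  # and, if it exceeds THRESHOLD, append a snapshot of the system to LOG.
  LOG=/root/monitor.txt
  THRESHOLD=4

  while :
  do
      vmr=$(vmstat | tail -1 | awk '{print $1}')
      if [ "${vmr}" -gt "${THRESHOLD}" ]
      then
          {
              date
              vmstat
              # -p shows the owning process on Linux; on FreeBSD, -p expects
              # a protocol name, so use sockstat there instead
              netstat -anp
              ps aux        # BSD-style options, accepted on Linux and FreeBSD
              last
              tail -10 /var/log/messages
          } >> "${LOG}"
      fi
      sleep 60
  done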

Run the script in the background with sh /root/monitor.sh &. Whenever the CPU is busy (the run queue exceeds the threshold), it automatically records the system state for later analysis.

The vmstat details below draw on material by the author known as "South African Spider". If you have any questions, contact the author of this article, Fuqin Zhujiu, at yuhongchun027@163.com.

Monitor memory usage with vmstat

vmstat is short for Virtual Memory Statistics. It monitors the operating system's virtual memory, processes, and CPU activity. It reports statistics for the system as a whole; its drawback is that it cannot analyze an individual process in depth.

The syntax of vmstat is as follows:

vmstat [-V] [-n] [delay [count]]

Here, -V prints version information; -n displays the header only once instead of repeating it during continuous output; delay is the interval in seconds between two reports; count is the number of reports produced at that interval. Run man vmstat for the meaning of each output field.

The Solaris version of vmstat has four optional flags. On machines with a virtual address cache, the -c flag changes the report to show cache flush statistics: the total number of cache flushes since the system was booted, broken down into six cache types (user, context, region, segment, page, and partial page).

The -i flag changes the output to report the number of interrupts. If device names such as d1 and d2 are given, monitoring is performed at the device level and statistics are reported for each given device. (See Chapter 12 for information on enabling device-level monitoring.)

The -S flag changes the standard report to show swapping rather than paging activity. Two fields change: si (swap-ins) and so (swap-outs) replace the re and mf fields.

Note that the interval and count options have no effect with the -i or -s options.

Details of the vmstat output fields

procs: r (processes waiting in the run queue); b (processes blocked waiting for I/O); w (processes that are runnable but have been swapped out)

memory: swap (currently available swap space, KB); free (free memory, KB)

page: re (pages reclaimed); mf (minor faults); pi (KB paged in); po (KB paged out); fr (KB freed); de (anticipated short-term memory shortfall, KB); sr (pages scanned by the clock algorithm)

disk: disk operations per second; s denotes a SCSI disk and the digit that follows is the disk number

faults: in (device interrupts per second); sy (system calls); cs (CPU context switches)

cpu: us (user time); sy (system time); id (idle time)
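Because the column layout differs between systems (the fields above come from a Solaris-style report, while a recent Linux procps vmstat prints r b swpd free buff cache si so bi bo in cs us sy id wa st), a script is safer looking columns up by name than hard-coding their positions. The following is a minimal sketch, assuming vmstat output arrives on standard input; the function name vmstat_field is hypothetical and not part of vmstat or the original script.

```bash
# vmstat_field NAME (hypothetical helper): read vmstat output on stdin and
# print the value of column NAME from the last sample line. The column is
# located by name in the header row (the row whose first word is "r").
# If NAME occurs twice (Solaris has "sy" under faults and cpu), the last
# occurrence, i.e. the cpu column, is used.
vmstat_field() {
    awk -v name="$1" '
        $1 == "r"       { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        $1 ~ /^[0-9]+$/ { last = $0 }          # remember the latest sample line
        END {
            if (!col || last == "") exit 1     # unknown column or no samples
            split(last, f)
            print f[col]
        }'
}
```

For example, vmstat 1 2 | vmstat_field id prints the idle percentage of the second sample. Matching only lines whose first field is a number keeps repeated header lines from being mistaken for samples.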

If r is frequently greater than 4 and id is frequently below 40, the CPU is heavily loaded.

If pi and po stay non-zero for long periods, memory is insufficient.

If the disk columns are frequently non-zero and the b queue is greater than 3, I/O performance is poor.
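These rules of thumb can be checked mechanically. The sketch below is an illustration, not part of the original script: the function name vmstat_diagnose is hypothetical, it assumes the Linux procps layout, where the swap columns si/so are the closest equivalent of pi/po, and it leaves out the disk rule because Linux vmstat has no per-disk columns in its default report.

```bash
# vmstat_diagnose (hypothetical helper): apply the rules of thumb above to
# the last sample of vmstat output read on stdin. Columns are found by name
# in the header row, assuming the Linux layout (si/so stand in for pi/po).
vmstat_diagnose() {
    awk '
        $1 == "r"       { for (i = 1; i <= NF; i++) col[$i] = i; next }
        $1 ~ /^[0-9]+$/ { split($0, v) }       # keep the latest sample
        END {
            r  = v[col["r"]]  + 0;  b  = v[col["b"]]  + 0
            id = v[col["id"]] + 0
            si = v[col["si"]] + 0;  so = v[col["so"]] + 0
            if (r > 4 && id < 40) print "cpu: heavy load (r=" r ", id=" id ")"
            if (si > 0 || so > 0) print "memory: swapping (si=" si ", so=" so ")"
            if (b > 3)            print "io: " b " processes blocked on I/O"
        }'
}
```

A single sample only hints at a problem; the rules speak of values that are "frequently" or "for long periods" out of range, so in practice this would be run over several samples, e.g. vmstat 5 5 | vmstat_diagnose.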

Checking a server with vmstat

vmstat is normally run with two numeric arguments: the first is the sampling interval in seconds, and the second is the number of samples. For example:

  [oracle@brucelau oracle]$ vmstat 1 2
     procs                      memory    swap          io     system         cpu
   r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
   1  0  0      0 271844 186052 255852   0   0     2     6  102    10   0   0 100
   0  0  0      0 271844 186052 255852   0   0     0     0  104    11   0   0 100

(Note: the system is almost idle here. The exact vmstat output varies between operating systems.)
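On Linux, the first report averages the CPU, I/O, and swap columns since the last reboot (the process and memory columns are instantaneous, according to man vmstat), so a run of several samples is more representative if that first line is dropped. The sketch below averages one named column over the remaining samples; vmstat_avg is a hypothetical helper name, and the skip-first rule is an assumption based on the Linux man page.

```bash
# vmstat_avg NAME (hypothetical helper): average column NAME over the
# samples in vmstat output on stdin, skipping the first sample, which on
# Linux reports averages since boot for the CPU/io/swap columns.
vmstat_avg() {
    awk -v name="$1" '
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        col && $1 ~ /^[0-9]+$/ {
            if (++seen == 1) next              # drop the since-boot sample
            sum += $col; n++
        }
        END { if (n) printf "%.1f\n", sum / n; else exit 1 }'
}
```

For example, vmstat 5 13 | vmstat_avg id averages idle time over one minute (twelve 5-second samples after the skipped one).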

The following fields are the most useful for server monitoring:

r (run queue), pi (page-ins), us (user CPU), sy (system CPU), id (idle)

Use vmstat to identify CPU bottlenecks

r (run queue) shows the number of tasks that are running or waiting for CPU. When this value exceeds the number of CPUs, a CPU bottleneck may occur.

Command to get the number of CPUs (on Linux):

cat /proc/cpuinfo | grep processor | wc -l
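The comparison between r and the CPU count can be wrapped in two small functions. This is a sketch with hypothetical names (cpu_count, run_queue_check); cpu_count relies on the Linux-only /proc/cpuinfo method shown above, using grep -c instead of a separate wc -l.

```bash
# cpu_count (hypothetical helper): number of logical CPUs on Linux,
# counted from the "processor" lines in /proc/cpuinfo.
cpu_count() {
    grep -c '^processor' /proc/cpuinfo
}

# run_queue_check R NCPU (hypothetical helper): flag a possible CPU
# bottleneck when the run queue R exceeds the number of CPUs NCPU.
run_queue_check() {
    if [ "$1" -gt "$2" ]; then
        echo "possible CPU bottleneck: r=$1 > $2 CPUs"
    else
        echo "ok: r=$1 <= $2 CPUs"
    fi
}
```

Usage: run_queue_check "$(vmstat | tail -1 | awk '{print $1}')" "$(cpu_count)". Deriving the threshold from cpu_count instead of a fixed 4 keeps the check meaningful across machines with different core counts.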

When r exceeds the number of CPUs, there is a CPU bottleneck. There are several solutions:

1. The simplest is to add more CPUs.

2. Reschedule tasks: run large jobs when the system is not busy, so the load is balanced.

3. Adjust the priority of existing tasks.
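Solutions 2 and 3 usually come down to cron and nice. A minimal sketch, assuming a Linux or BSD system: the wrapper low_priority is a hypothetical name that starts a job with its niceness raised by 10, so it yields the CPU to service processes. A crontab entry such as 0 2 * * * /path/to/job (hypothetical path) would move the job to 02:00, and renice lowers the priority of a process that is already running; its option syntax differs between systems, so check man renice before using it.

```bash
# low_priority CMD [ARGS...] (hypothetical helper): run a command with its
# niceness increased by 10 (capped at 19), i.e. at a lower CPU priority.
low_priority() {
    nice -n 10 "$@"
}
```

For example, low_priority tar czf /tmp/backup.tar.gz /data (hypothetical paths) runs a large archive job without competing equally with the web or database processes.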

Use vmstat to identify CPU usage

First, note that the CPU figures in vmstat are percentages. When us + sy approaches 100, the CPU is working at close to full capacity. A fully used CPU is not in itself a problem, however: UNIX always tries to keep the CPU as busy as possible to maximize task throughput. The only value that reliably identifies a CPU bottleneck is r, the run queue.
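The point that a high us + sy alone does not prove a bottleneck can be expressed as a small classifier. This is an illustration only: the function name cpu_state is hypothetical, as is the 90% cut-off used to label a CPU "busy"; only the r-versus-CPU-count test comes from the text.

```bash
# cpu_state US SY R NCPU (hypothetical helper): a busy CPU (us+sy high) is
# only reported as a bottleneck when the run queue R exceeds NCPU.
# The 90% "busy" cut-off is an arbitrary illustrative value.
cpu_state() {
    local busy=$(( $1 + $2 ))
    if [ "$3" -gt "$4" ]; then
        echo "bottleneck: us+sy=${busy}%, r=$3 > $4 CPUs"
    elif [ "$busy" -ge 90 ]; then
        echo "busy but keeping up: us+sy=${busy}%, r=$3"
    else
        echo "normal: us+sy=${busy}%"
    fi
}
```

With 8 CPUs, cpu_state 70 25 6 8 reports "busy but keeping up", while the same CPU percentages with r=12 report a bottleneck.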

Use vmstat to identify RAM bottlenecks

Database servers have only limited RAM, and memory contention is a common problem with Oracle.

First, check the amount of RAM with the following command (on Linux):

  [root@brucelau root]# free
               total       used       free     shared    buffers     cached
  Mem:       1027348     873312     154036     185736     187496     293964
  -/+ buffers/cache:     391852     635496
  Swap:      2096440          0    2096440
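The -/+ buffers/cache row is derived from the Mem: row: used minus buffers and cached, and free plus buffers and cached. The sketch below recomputes it with awk, assuming the older free layout shown above (newer versions of free print buff/cache and available columns instead); buffers_cache_row is a hypothetical name. For the Mem: row above it prints 391852 635496, matching the second row, since 873312 - 187496 - 293964 = 391852 and 154036 + 187496 + 293964 = 635496.

```bash
# buffers_cache_row (hypothetical helper): from old-style free output on
# stdin (Mem: total used free shared buffers cached, in KB), print the
# "-/+ buffers/cache" used and free values.
buffers_cache_row() {
    awk '$1 == "Mem:" {
        used  = $3 - $6 - $7     # used, excluding buffers and page cache
        avail = $4 + $6 + $7     # free, counting buffers and cache as reclaimable
        print used, avail
    }'
}
```

The second figure is the more honest measure of how much memory applications could still obtain, because buffers and cache are released under memory pressure.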

Of course, other commands such as top also display RAM usage.

When memory demand exceeds the available RAM, the server falls back on virtual memory: segments of RAM are moved to a dedicated area on disk (the swap space), which produces virtual-memory page-outs and page-ins. Page-outs do not indicate a RAM bottleneck, because the virtual memory system routinely pages out memory segments. Page-ins, however, indicate that the server needs more memory: each page-in copies a memory segment from the swap disk back into RAM, which slows the server down.

There are several solutions:

1. Add RAM.

2. Resize the SGA to reduce RAM requirements.

3. Reduce other RAM requirements (for example, reduce the PGA).
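Before choosing among these fixes, it helps to confirm that page-ins really persist rather than appearing in a single sample. A sketch assuming the Linux layout, where the swap-in column si plays the role of pi; sustained_pagein is a hypothetical name, and the first (since-boot) sample is skipped.

```bash
# sustained_pagein (hypothetical helper): count how many samples in vmstat
# output on stdin show page-ins (Linux "si" > 0), ignoring the first sample,
# which averages since boot.
sustained_pagein() {
    awk '
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "si") col = i; next }
        col && $1 ~ /^[0-9]+$/ {
            if (++seen == 1) next
            n++
            if ($col + 0 > 0) hits++
        }
        END { printf "si>0 in %d of %d samples\n", hits, n }'
}
```

For example, vmstat 10 31 | sustained_pagein covers five minutes; page-ins in most of the samples point to a real RAM shortage, while an isolated non-zero value usually does not.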

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
