Introduction
Linux is increasingly used as a server operating system because of its stability (of course, some would point out that Linux is just the OS kernel). But if we use Linux as the underlying operating system, can we be sure that our services stay stable 24x7? Not necessarily. Business functions are implemented by the programs running on the system, so choosing Linux is only the first step toward stable business functions. Most of our work goes into making sure that the business programs themselves do not become the weak link.
When a server has a problem, the external symptom is that a business function is unavailable. The internal cause can be looked at in two ways. From the point of view of programs, it may be a problem in the business process (a bug in the program itself) or human error on the server (a script or command executed improperly). From the point of view of system resources, it may be CPU contention, a memory leak, disk I/O errors, network errors, and so on. With so many possible causes, how do we analyze a problem after it occurs? Are there tools that help us locate it?
Introducing atop
The tool this article introduces, atop, monitors Linux system resources and processes. It records the state of the system at a fixed interval. The collected data covers system resource usage (CPU, memory, disk, and network) and process activity, and it can be saved to a log file on disk. After a server has a problem, we can fetch the corresponding atop log file and analyze it. atop is open source software; both its source code and RPM installation packages are available.
How to use atop
After installing atop, we can see the current state of the system by typing the "atop" command at the command line:
Installation: yum install atop
Meaning of the system resource monitoring fields
atop lists many fields and values. What does each field mean, and which ones should we watch? Every value is relative to the sampling interval. Let us start with the top half of the display, which covers system resources.
ATOP line: This line shows the hostname and the date and time of the sample
PRC line: This line shows overall process activity
- The sys and usr fields indicate the CPU time that processes spent in kernel mode and in user mode, respectively
- The #proc field indicates the total number of processes
- The #zombie field indicates the number of zombie processes
- The #exit field indicates the number of processes that exited during the atop sampling interval
CPU line: This line shows usage of the CPU as a whole (that is, all cores treated as a single CPU resource). A CPU can execute processes, handle interrupts, or sit idle. There are two idle states: idle because active processes are waiting for disk I/O, and completely idle.
- The sys and usr fields indicate the percentage of CPU time spent in kernel mode and in user mode while the CPU is executing processes
- The irq field indicates the percentage of time that the CPU spends handling interrupts
- The idle field indicates the percentage of time that the CPU is completely idle
- The wait field indicates the percentage of time that the CPU is idle because processes are waiting for disk I/O
The values of the fields in the CPU line add up to N x 100%, where N is the number of CPU cores.
cpu line (lowercase): This line shows the usage of a single core. Its fields have the same meaning as in the CPU line, and their values add up to 100%.
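The N x 100% rule can be checked by hand. The following is a minimal sketch, not part of atop: cpu_line_total and expected_total are hypothetical helper names, and the five percentages are made-up values for an assumed 4-core machine. Depending on the version and the environment, atop may show additional CPU fields, which would have to be included in the sum.

```shell
#!/bin/sh
# Hypothetical helper: add up the percentage fields of one atop CPU line.
# Arguments: whole-number percentages, e.g. sys usr irq idle wait.
cpu_line_total() {
    total=0
    for value in "$@"; do
        total=$((total + value))
    done
    echo "$total"
}

# Hypothetical helper: the total we expect for a given number of cores.
expected_total() {
    echo $(( $1 * 100 ))
}

# Illustrative values for a 4-core machine (not taken from a real sample).
cpu_line_total 12 35 1 340 12    # prints 400
expected_total 4                 # prints 400
```

If the two numbers differ noticeably for a real sample, some field was probably left out of the sum.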
CPL line: This line shows the CPU load
- The avg1, avg5, and avg15 fields indicate the average number of processes in the run queue over the last 1, 5, and 15 minutes
- The csw field indicates the number of context switches
- The intr field indicates the number of interrupts
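On Linux, the same three load averages can also be read from /proc/loadavg, which makes a handy cross-check against the CPL line. The sketch below is an illustration only; load_averages is a hypothetical helper name, and the input line is made up.

```shell
#!/bin/sh
# Hypothetical helper: print the 1, 5 and 15 minute load averages
# from a line in /proc/loadavg format read on standard input.
load_averages() {
    # The first three fields of /proc/loadavg are the three averages.
    awk '{ printf "avg1=%s avg5=%s avg15=%s\n", $1, $2, $3 }'
}

# On a live Linux system:
#   load_averages < /proc/loadavg
# Illustrative input (not a real measurement):
echo "0.42 0.36 0.30 1/345 12345" | load_averages
# prints: avg1=0.42 avg5=0.36 avg15=0.30
```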
MEM line: This line shows memory usage
- The tot field indicates the total amount of physical memory
- The free field indicates the amount of free memory
- The cache field indicates the amount of memory used for the page cache
- The buff field indicates the amount of memory used for the buffer cache (file system metadata)
- The slab field indicates the amount of memory occupied by the kernel
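To cross-check the MEM line outside atop, the corresponding counters can be read from /proc/meminfo. The sketch below rests on an assumption: that tot, free, cache, buff, and slab correspond roughly to the MemTotal, MemFree, Cached, Buffers, and Slab entries. meminfo_kb is a hypothetical helper name, and the sample values are made up.

```shell
#!/bin/sh
# Hypothetical helper: print one value (in kB) from input in
# /proc/meminfo format. $1 is the entry name without the colon.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# On a live Linux system:
#   meminfo_kb MemFree < /proc/meminfo
# Illustrative input (not a real measurement):
sample='MemTotal:        8000000 kB
MemFree:         1200000 kB
Buffers:          150000 kB
Cached:          2500000 kB
Slab:             300000 kB'

printf '%s\n' "$sample" | meminfo_kb Cached    # prints 2500000
```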
SWP line: This line shows swap space usage
- The tot field indicates the total amount of swap space
- The free field indicates the amount of free swap space
PAG line: This line shows virtual memory paging activity
- The swin and swout fields indicate the number of memory pages swapped in and out
DSK line: This line shows disk usage. Each disk device has its own DSK line; if the system also has an sdb device, another DSK line is added for it.
- The sda field is the name of the disk device
- The busy field indicates the proportion of time that the disk was busy
- The read and write fields indicate the number of read and write requests
NET lines: Several NET lines show network activity, covering the transport layer (TCP and UDP), the IP layer, and each active network interface
- The xxxi fields indicate the number of packets received by each layer or active network interface
- The xxxo fields indicate the number of packets sent by each layer or active network interface
Process views
To present process information more completely, atop provides several process views.
Default view (generic information)
When we enter the atop interface, the lower part of the screen shows the default view of process information. Pressing the g key returns to the default view from any other view.
Memory view (memory consumption)
The memory view shows how processes use memory. Press the m key to enter it.
The lower half shows, for each process, its virtual memory size (VSIZE) and its resident size (RSIZE), which is the amount of physical memory the process occupies, together with the growth in virtual and physical memory (VGROW, RGROW) during the last sampling interval.
In the sampled example, the PAG line shows that the memory load is high and pages are being swapped. In the process view, the VGROW and RGROW columns show that the memory used by the VirtualBox process grew by a large amount, while the memory used by some other processes shrank (their VGROW or RGROW value is negative), freeing up space for the VirtualBox process.
Command view (command line)
Press the c key to enter the command view, which shows the command line of each process.
Sometimes a "careless" colleague runs a script or command that drives system resource usage abnormally high. With the command view we can easily find the command that caused the problem.
atop logs
The sampled pages for each point in time are combined into an atop log file, which we can view with the "atop -r XXX" command. So how should atop log files be saved?
One reasonable scheme is the following:
- Save one atop log file per day, which records the information for that day
- Name the log files "atop_YYYYMMDD"
- Set a retention period and automatically delete log files older than that
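As a small illustration of the naming scheme, the sketch below builds the path of a daily log file. It is only a sketch: atop_log_path is a hypothetical helper name, and /var/log/atop is an assumed log directory that may differ on your system. The commented commands use atop's -w (write raw log) and -r (read raw log) options.

```shell
#!/bin/sh
# Hypothetical helper: build the path of the daily log file.
# $1 is the log directory, $2 the date in YYYYMMDD form.
atop_log_path() {
    logdir=$1
    day=$2
    echo "$logdir/atop_$day"
}

# Path of today's file; /var/log/atop is an assumed directory.
atop_log_path /var/log/atop "$(date +%Y%m%d)"

# Recording and replaying would then look like this (requires atop):
#   atop -w "$(atop_log_path /var/log/atop "$(date +%Y%m%d)")" 600
#   atop -r "$(atop_log_path /var/log/atop 20240105)"
```

Keeping the date in the file name means the file for any given day can be found without opening it.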
In fact, the atop developers already provide this log-saving scheme: the corresponding atop.daily script can be found in the source directory. In the atop.daily script, we can change the sampling interval by modifying the INTERVAL variable (the default is 10 minutes), and change the number of days that logs are kept by modifying the value in the following command (the default is 28 days):
(sleep 3; find $LOGPATH -name 'atop_*' -mtime +28 -exec rm {} \;) &
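The cleanup step can be tried safely before touching real logs. The sketch below wraps the same find expression in a function and runs it in a throwaway directory. prune_atop_logs is a hypothetical helper name, the file names are made up, and backdating with touch -d assumes GNU touch.

```shell
#!/bin/sh
# Hypothetical helper: delete atop log files in directory $1
# whose modification time is more than $2 days in the past.
prune_atop_logs() {
    find "$1" -name 'atop_*' -mtime +"$2" -exec rm {} \;
}

# Demonstration in a temporary directory, so no real logs are touched.
demo_dir=$(mktemp -d)
touch -d '40 days ago' "$demo_dir/atop_old"   # GNU touch: backdate the file
touch "$demo_dir/atop_new"                    # modified just now
prune_atop_logs "$demo_dir" 28
ls "$demo_dir"                                # only atop_new remains
rm -rf "$demo_dir"
```

Note that find selects files by modification time, not by the date in the file name.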
Finally, we modify the cron file so that the atop.daily script runs at midnight every day:
0 0 * * * root /etc/cron.daily/atop.daily
Summary
This article introduced atop, a tool for monitoring Linux system resources and processes. It explained the meaning of some of the fields that atop displays, described the process views, and finally showed how to save atop log files.
atop adjusts the fields it displays to the size of the terminal window, so you may see a different set of fields when you use it.
Copyright notice: This article is the blogger's original work and may not be reproduced without the blogger's permission.
CentOS Performance Monitoring Series, Part 3: The atop Monitoring Tool in Detail