atop tutorial: a Linux system and process monitoring tool

Source: Internet
Author: User

Introduction

Thanks to its stability, Linux is increasingly used as a server operating system (of course, some people will point out that Linux is only the operating system kernel :). But does running on Linux guarantee that our services are stable? Business functions are implemented by programs running on top of the system, so choosing Linux is only the first step toward stable services. Most of our work goes into making sure the business programs themselves do not become the weak link in stability.

 

When something goes wrong on a server, the external symptom is that business functions are no longer provided normally. The internal cause can be examined from two angles. From the program perspective, it may be a fault in the business program (a bug in the program itself) or a human error on the server (a script or command run improperly). From the system resource perspective, it may be CPU contention, a memory leak, abnormal disk I/O, or a network problem. Once a problem occurs, how do we work through these possible causes, and what tools can help us locate the problem?

 

Atop Introduction

atop is a tool for monitoring Linux system resources and processes. It records the state of the system at a fixed interval. The collected data covers system resource usage (CPU, memory, disk, and network) and the running state of processes, and it can be saved to disk as a log file. When a server runs into a problem, we can retrieve the corresponding atop log file and analyze it. atop is open-source software; its source code and RPM packages are available from the project website (see the reference at the end of this article).

 

Atop usage

After installing atop, run the "atop" command on the command line to view the current state of the system.

 

System resource monitoring field description

The display lists a large number of fields and values. What does each field mean, and how should we read it? Note that every value is relative to the sampling interval. We start with the upper part of the display, which covers system resources.

ATOP line: shows the host name and the date and time of the sample.

PRC line: shows the overall running state of processes.

    1. The sys and usr fields indicate the CPU time consumed by processes in kernel mode and user mode, respectively.
    2. The #proc field indicates the total number of processes.
    3. The #zombie field indicates the number of zombie processes.
    4. The #exit field indicates the number of processes that exited during the atop sampling interval.

CPU line: shows the overall CPU usage, treating all cores of a multi-core CPU as one CPU resource. The CPU can spend its time executing processes or handling interrupts, or it can be idle. Idle time comes in two kinds: the CPU is idle because active processes are waiting for disk I/O, or it is completely idle.

    1. The sys and usr fields indicate the percentage of time the CPU spent in kernel mode and user mode while executing processes.
    2. The irq field indicates the percentage of time the CPU spent handling interrupts.
    3. The idle field indicates the percentage of time the CPU was completely idle.
    4. The wait field indicates the percentage of time the CPU was idle because processes were waiting for disk I/O.

The fields of the CPU line add up to N × 100%, where N is the number of CPU cores.
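The sum rule above can be checked with a little shell arithmetic. The following is only a sketch: the function names and the sample percentages are hypothetical, chosen to illustrate a 4-core machine.

```shell
# Expected total of the CPU line for a machine with $1 cores.
cpu_line_total() {
    echo $(( $1 * 100 ))
}

# Sum the percentage fields of one CPU line (sys usr irq idle wait).
sum_fields() {
    total=0
    for pct in "$@"; do
        total=$(( total + pct ))
    done
    echo "$total"
}

# Hypothetical sample from a 4-core machine: sys=12 usr=48 irq=2 idle=330 wait=8
sum_fields 12 48 2 330 8     # prints 400
cpu_line_total 4             # prints 400
```

On a live system, the core count can be taken from the nproc command, for example cpu_line_total "$(nproc)".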

cpu line (lowercase label): shows the CPU usage of a single core. The fields have the same meaning as in the CPU line, and their values add up to 100%.

CPL line: shows the CPU load.

    1. The avg1, avg5, and avg15 fields indicate the average number of processes in the run queue over the past 1, 5, and 15 minutes.
    2. The csw field indicates the number of context switches.
    3. The intr field indicates the number of interrupts.
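The three load averages are also exposed by the Linux kernel as the first three fields of /proc/loadavg, which makes them easy to cross-check outside atop. A minimal sketch follows; the function name and the sample line are hypothetical.

```shell
# Print the 1-, 5-, and 15-minute load averages from a /proc/loadavg-style line.
load_averages() {
    # The first three whitespace-separated fields are the load averages.
    echo "$1" | awk '{ printf "avg1=%s avg5=%s avg15=%s\n", $1, $2, $3 }'
}

# Hypothetical sample line; on a live system use: load_averages "$(cat /proc/loadavg)"
load_averages "0.20 0.18 0.12 1/80 11206"    # prints avg1=0.20 avg5=0.18 avg15=0.12
```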

MEM line: shows memory usage.

    1. The tot field indicates the total amount of physical memory.
    2. The free field indicates the amount of free memory.
    3. The cache field indicates the amount of memory used for the page cache.
    4. The buff field indicates the amount of memory used for file system metadata buffers.
    5. The slab field indicates the amount of memory used by the kernel (slab allocations).
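To compare these fields with what the kernel itself reports, one option is to pull the similarly named entries out of /proc/meminfo. The sketch below assumes that tot, free, cache, buff, and slab correspond to MemTotal, MemFree, Cached, Buffers, and Slab; that mapping is an assumption for illustration, and the function name and sample values are hypothetical.

```shell
# Extract the /proc/meminfo entries assumed to correspond to the MEM fields.
# Reads meminfo-formatted text on stdin; values are in kB, as in /proc/meminfo.
mem_fields() {
    awk '
        $1 == "MemTotal:" { tot   = $2 }
        $1 == "MemFree:"  { free  = $2 }
        $1 == "Cached:"   { cache = $2 }
        $1 == "Buffers:"  { buff  = $2 }
        $1 == "Slab:"     { slab  = $2 }
        END { printf "tot=%s free=%s cache=%s buff=%s slab=%s\n", tot, free, cache, buff, slab }
    '
}

# Hypothetical sample input; on a live system use: mem_fields < /proc/meminfo
mem_fields <<'EOF'
MemTotal:        8000000 kB
MemFree:         1200000 kB
Buffers:          150000 kB
Cached:          2500000 kB
Slab:             300000 kB
EOF
# prints tot=8000000 free=1200000 cache=2500000 buff=150000 slab=300000
```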

SWP line: shows swap space usage.

    1. The tot field indicates the total size of the swap space.
    2. The free field indicates the amount of free swap space.

PAG line: shows virtual memory paging activity.

The swin and swout fields indicate the number of memory pages swapped in and out.

DSK line: shows disk usage. Each disk device gets its own line; if an sdb device exists, for example, a second DSK line is added.

    1. The sda field is the name of the disk device.
    2. The busy field indicates the percentage of time the disk was busy.
    3. The read and write fields indicate the number of read and write requests.

NET lines: several NET lines show network activity, covering the transport layer (TCP and UDP), the IP layer, and each active network interface.

    1. The xxxi fields indicate the number of packets received by each layer or active network interface.
    2. The xxxo fields indicate the number of packets sent by each layer or active network interface.

 

Process view

To display process information more comprehensively, atop provides multiple process views.

 

Default view (generic information)

When atop starts, the lower half of the display shows process information in the default view. Press g to return to the default view from any other view.

In this example, the find process with PID 3061 used 3.43 seconds of CPU time in kernel mode and 0.96 seconds in user mode before exiting, 4.39 seconds in total. Relative to the 10-minute sampling interval, that is a CPU usage of about 1%. The ST column shows the process status: N means the process was started during the last sampling interval, and E means the process has exited. The EXC column shows the exit code of the process. The "<>" around the process name also tells us that the process has exited.
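The percentage in this example is simply the CPU time divided by the length of the sampling interval. A minimal sketch of that arithmetic, with a hypothetical function name:

```shell
# CPU usage of a process as a percentage of the sampling interval.
# $1 = kernel-mode CPU seconds, $2 = user-mode CPU seconds, $3 = interval in seconds
cpu_usage_pct() {
    awk -v sys="$1" -v usr="$2" -v interval="$3" \
        'BEGIN { printf "%.2f\n", (sys + usr) / interval * 100 }'
}

# Values from the example: 3.43 s kernel + 0.96 s user over a 600 s interval.
cpu_usage_pct 3.43 0.96 600    # prints 0.73, shown as 1% after rounding
```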

 

Memory view (memory consumption)

The memory view shows the memory usage of processes. Press the m key to enter the memory view.

The lower part shows, for each process, its virtual memory size (VSIZE) and resident memory size (RSIZE), along with how much its virtual and physical memory grew during the last sampling interval (VGROW and RGROW). The MEM column shows the share of physical memory used by the process.

From the PAG line we can tell that the system is under heavy memory pressure at this point and that swapping is taking place. The VGROW and RGROW columns in the process view show that the virtualbox process has taken a large amount of memory, while the memory of some other processes has shrunk (negative VGROW or RGROW values) to make room for it.

 

Command view (command line)

Press the c key to enter the command view, which shows the full command line of each process.

Sometimes a careless colleague runs a script or command that causes abnormally high system resource usage. In that case, the atop command view makes it easy to find the command responsible.

 

Atop log

The samples taken at each point in time together make up an atop log file. You can view a log file with the "atop -r XXX" command. How should atop log files be saved?

A practical scheme for saving atop log files is:

    1. Write one atop log file per day, recording that day's information.
    2. Name each log file "atop_yyyymmdd".
    3. Set a retention period and automatically delete log files older than it.
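With this naming scheme, replaying a given day's log comes down to building the file name from the date and passing it to atop -r. The sketch below is a hypothetical wrapper: the /var/log/atop directory and the function name are assumptions, and the -b and -e options are used to limit the replay to a time window in hh:mm form.

```shell
# Assumed log directory; adjust to wherever your atop logs are written.
ATOP_LOGDIR="${ATOP_LOGDIR:-/var/log/atop}"

# Replay the log for a given day (yyyymmdd), optionally limited to a time window.
# $1 = day, $2 = begin time (hh:mm, optional), $3 = end time (hh:mm, optional)
replay_atop_log() {
    day="$1"; begin="$2"; end="$3"
    logfile="$ATOP_LOGDIR/atop_$day"
    if [ ! -f "$logfile" ]; then
        echo "no atop log found: $logfile" >&2
        return 1
    fi
    if [ -n "$begin" ] && [ -n "$end" ]; then
        atop -r "$logfile" -b "$begin" -e "$end"
    else
        atop -r "$logfile"
    fi
}

# Example calls (not run here because they open atop's interactive display):
#   replay_atop_log 20240101
#   replay_atop_log 20240101 14:00 15:00
```

While replaying a log, the t key moves forward to the next sample and the T key moves back to the previous one.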

In fact, the atop developers already provide the log scheme described above: the corresponding atop.daily script can be found in the source code directory. In atop.daily, we can change the sampling interval by modifying the INTERVAL variable (default: 10 minutes), and change the number of days logs are kept (default: 28 days) by modifying the value in the following command:

 
(sleep 3; find $LOGPATH -name 'atop_*' -mtime +28 -exec rm {} \;)&
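To see how the pieces fit together, here is a minimal sketch of a daily logging routine. It is not the contents of atop.daily: the function names, the KEEP_DAYS variable, and the /var/log/atop default are assumptions made for illustration. It relies only on atop -w to write a raw log at a given interval and on find to delete expired files.

```shell
# Hypothetical settings for this sketch (not copied from atop.daily).
LOGPATH="${LOGPATH:-/var/log/atop}"   # assumed log directory
INTERVAL=600                          # sampling interval in seconds (10 minutes)
KEEP_DAYS=28                          # retention period in days

# Build today's log file name following the atop_yyyymmdd convention.
todays_log() {
    echo "$LOGPATH/atop_$(date +%Y%m%d)"
}

# Delete atop logs in directory $1 that were last modified more than $2 days ago.
prune_old_logs() {
    find "$1" -name 'atop_*' -mtime +"$2" -exec rm {} \;
}

# Start background logging for today, then clean up expired logs.
start_daily_logging() {
    atop -w "$(todays_log)" "$INTERVAL" &
    prune_old_logs "$LOGPATH" "$KEEP_DAYS"
}
```

Keeping the retention period in a variable means it only has to be changed in one place, instead of editing the number inside the find command.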

 

Finally, we modify the cron file so that the atop.daily script runs every day at midnight:

 
0 0 * * * root /etc/cron.daily/atop.daily

 

Summary

This article introduced atop, a tool for monitoring Linux system resources and processes, explained the meaning of some of the fields atop records and its process views, and finally described how to save atop log files.

 

atop adjusts the fields it displays to the size of the terminal, so some of the fields you see may differ from those described above. To learn more about the field meanings and the various process views, see the references below.

 

Reference: atoptool.nl

man atop
