Linux performance monitoring: purpose and tools

Source: Internet
Author: User

System optimization is a complex, long-term task. Monitoring, data collection, testing, and evaluation are required before optimization, and the same steps are required again afterwards, so it is a continuous process. An optimization that has been applied and tested once is not settled for good. Nor does advice from a book necessarily suit the system you are running: different systems, hardware, and applications have different optimization priorities, methods, and parameters.

Performance monitoring is an important part of system optimization. Without monitoring, you do not know where the performance bottleneck is, so you cannot know what to optimize or how to optimize it. Finding performance bottlenecks is therefore the purpose of performance monitoring and the key to system optimization. A system is composed of several subsystems, and modifying one subsystem may affect another, or even make the whole system unstable or cause it to crash. For this reason optimization, monitoring, and testing usually go together as a cyclical, long-term process. The following subsystems are commonly monitored:

  • CPU
  • Memory
  • IO
  • Network

These subsystems depend on one another. Understanding their characteristics, monitoring their performance parameters, and spotting possible bottlenecks promptly all help greatly with system optimization.

Application Type

Different systems serve different purposes. To find performance bottlenecks, you need to know which applications the system runs and what their characteristics are. For example, a web server places different demands on a system than a file server does, so it is important to distinguish between application types. Generally, applications fall into two types:

  • IO-bound: these applications usually process large amounts of data. They need plenty of memory and storage and perform frequent IO operations to read and write data, but place little demand on the CPU; most of the time the CPU is waiting on the hard disk. Database servers and file servers are typical examples.
  • CPU-bound: these applications need a lot of CPU time. Examples include high-concurrency web/mail servers, image/video processing, and scientific computing.

Consider two real examples. The first shows vmstat output while a file server copies a large file; the second shows vmstat output while the CPU is doing heavy computation:

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  4    140 1962724 335516 4852308  0    0   388 65024 1442  563  0  2 47 52  0
 0  4    140 1961816 335516 4853868  0    0   768 65536 1434  522  0  1 50 48  0
 0  4    140 1960788 335516 4855300  0    0   768 48640 1412  573  0  1 50 49  0
 0  4    140 1958528 335516 4857280  0    0  1024 65536 1415  521  0  1 41 57  0
 0  5    140 1957488 335516 4858884  0    0   768 81412 1504  609  0  2 50 49  0

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0    140 3625096 334256 3266584  0    0     0    16 1054  470 100 0  0  0  0
 4  0    140 3625220 334264 3266576  0    0     0    12 1037  448 100 0  0  0  0
 4  0    140 3624468 334264 3266580  0    0     0   148 1160  632 100 0  0  0  0
 4  0    140 3624468 334264 3266580  0    0     0     0 1078  527 100 0  0  0  0
 4  0    140 3624712 334264 3266580  0    0     0    80 1053  501 100 0  0  0  0

The most obvious difference between the two examples is the id column, which shows the percentage of time the CPU is idle. During the file copy, id stays at around 50%, with almost all of the remaining time spent in wa (waiting for IO). During heavy computation, id is essentially 0 and us (user time) is at 100%.
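
The comparison above can also be made programmatically. The following is a minimal Python sketch, not part of the original article, that parses vmstat data rows like the ones shown and labels each sample. The names `parse_vmstat_row` and `classify`, and the 20% and 80% thresholds, are assumptions chosen only for illustration; sensible thresholds depend on the system.

```python
# Column order of `vmstat 1` data rows, matching the header shown above.
VMSTAT_COLUMNS = [
    "r", "b", "swpd", "free", "buff", "cache", "si", "so",
    "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st",
]


def parse_vmstat_row(line):
    """Turn one whitespace-separated vmstat data row into a dict of ints."""
    values = [int(field) for field in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError(
            f"expected {len(VMSTAT_COLUMNS)} fields, got {len(values)}"
        )
    return dict(zip(VMSTAT_COLUMNS, values))


def classify(sample, wa_threshold=20, busy_threshold=80):
    """Label a sample; both thresholds are illustrative assumptions."""
    if sample["wa"] >= wa_threshold:
        return "io-bound"
    if sample["us"] + sample["sy"] >= busy_threshold:
        return "cpu-bound"
    return "unclear"


# First data row of each vmstat run shown above.
copy_row = "0  4    140 1962724 335516 4852308  0    0   388 65024 1442  563  0  2 47 52  0"
compute_row = "4  0    140 3625096 334256 3266584  0    0     0    16 1054  470 100 0  0  0  0"

for row in (copy_row, compute_row):
    sample = parse_vmstat_row(row)
    print(sample["id"], sample["wa"], classify(sample))
```

A single sample says little on its own; in practice one would classify many samples over time before drawing a conclusion.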

Baseline

How do we know whether system performance is good or poor? We need a baseline established in advance. If the statistics gathered by performance monitoring fall below this line, we can say the system performs poorly; if they stay at or above it, we can say it performs well. Establishing such a baseline takes some theory, load testing, and years of system administration experience. If you do not have years of experience, there is a simple way to draw the line: base it on your own expectations of the system. What performance do you expect this system to deliver? That expectation is the baseline, and if it is not met, performance is poor. For example, VPSee ran a RAID 0 test last month. The expected result was that RAID 0 would deliver significantly better IO performance than a single hard disk, and the baseline was that RAID 0 IO should at least be better than a single hard disk (the baseline only requires that it be better at all). The test showed RAID 0 performing worse than a single hard disk, which indicates poor performance. At this point we need to ask why, and the answer is often where the performance bottleneck lies. After investigation, the cause turned out to be a hardware defect in a hard disk, which had skewed the test results.
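
An expectation of this kind can be written down as an explicit check. Below is a minimal Python sketch, not from the original article. The names `Baseline` and `meets`, the metric label, and both throughput numbers are hypothetical; they are not measurements from the RAID 0 test described above.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """A value that a metric must exceed (higher is better)."""
    metric: str
    must_exceed: float


def meets(baseline, measured):
    """Return True only if the measured value is strictly better."""
    return measured > baseline.must_exceed


# Hypothetical numbers, for illustration only: the baseline for the
# RAID 0 array is the throughput of a single disk.
single_disk_mb_s = 100.0
raid0_mb_s = 80.0

baseline = Baseline(metric="sequential write MB/s", must_exceed=single_disk_mb_s)
if meets(baseline, raid0_mb_s):
    print("meets baseline")
else:
    print(f"below baseline for {baseline.metric}: ask why")
```

The comparison is strict because the expectation in the text is that RAID 0 is better than, not merely equal to, a single disk.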

Monitoring tools

We only need simple tools to monitor Linux performance. The following are common tools used by VPSee:

Tool       Brief introduction
top        View process activity and system status
vmstat     View system status, hardware, and system information
iostat     View CPU load and hard disk status
sar        Integrated tool for viewing system status
mpstat     View multi-processor status
netstat    View network status
iptraf     Real-time network monitoring
tcpdump    Capture network packets for detailed analysis
tcptrace   Packet analysis tool
netperf    Network bandwidth tool
dstat      Integrated tool that combines information from vmstat, iostat, ifstat, and netstat
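
Before relying on these tools, it can help to check which of them are installed. The following is a minimal Python sketch, not from the original article, that looks up each command from the table on the current PATH using the standard library's `shutil.which`. The helper name `find_tools` is an assumption, and some distributions may package a tool under a different command name, in which case it is reported as missing here.

```python
import shutil

# Command names taken from the table above.
TOOLS = [
    "top", "vmstat", "iostat", "sar", "mpstat", "netstat",
    "iptraf", "tcpdump", "tcptrace", "netperf", "dstat",
]


def find_tools(names):
    """Map each command name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools(TOOLS).items():
    print(f"{name:10s} {path or 'not installed'}")
```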

The rest of this series covers CPU, memory, disk IO, and network monitoring in turn.
