Your Linux server is running slow, so you follow standard procedure and run top. You see the CPU metrics, but what do all of those 2-letter abbreviations mean?
The 3 CPU states
Let's take a step back. There are 3 general states your CPU can be in:
- Idle, which means it has nothing to do.
- Running a user space program, like a command shell, an email server, or a compiler.
- Running the kernel, servicing interrupts or managing resources.
These three meta states can be further subdivided. For example, user space programs can be categorized as those running under their initial priority level or those running with a nice priority. Niceness is a tweak to the priority level of a process so that it runs less frequently. The niceness level ranges from -20 (most favorable scheduling) to 19 (least favorable). By default, processes on Linux are started with a niceness of 0. See our blog post "Restricting process CPU usage using nice, cpulimit, and cgroups" for more information on nice.
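To make the niceness idea concrete, here is a minimal Python sketch that starts a child process with a raised niceness, which is roughly what running a command through nice does. The helper name run_niced is made up for illustration, and the sketch assumes a Linux system with Python 3.

```python
import os
import subprocess
import sys


def run_niced(increment: int) -> int:
    """Start a child Python process whose niceness is raised by `increment`
    and return the niceness value the child reports for itself.

    `run_niced` is a hypothetical helper name used only for this example.
    """
    child_code = "import os; print(os.getpriority(os.PRIO_PROCESS, 0))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        # os.nice() runs in the child just before the new program starts,
        # so only the child's priority is changed, not the parent's.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("parent niceness:", os.getpriority(os.PRIO_PROCESS, 0))
    print("child niceness: ", run_niced(5))
```

An unprivileged user can only raise niceness (lower the priority); moving it below the current value normally requires root. CPU time used by a child started this way is what top reports under ni.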
The 7 CPU statistics explained
There are several different ways to see the various CPU statistics. The most common is probably using the top command.
To start the top command, just type top at the command line.
The output from top is divided into two sections. The first few lines give a summary of the system resources, including a breakdown of the number of tasks, the CPU statistics, and the current memory usage. Beneath these stats is a live list of the currently running processes. This list can be sorted by PID, CPU usage, memory usage, and so on.
The CPU line will look something like this:
%Cpu(s): 24.8 us,  0.5 sy,  0.0 ni, 73.6 id,  0.4 wa,  0.0 hi,  0.2 si,  0.0 st
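If you want to work with these numbers in a script, the CPU line is easy to pick apart. The following is a small sketch, not part of top itself: the function name parse_cpu_line is invented for this example, and it assumes the line uses a decimal point (some locales print a comma instead).

```python
import re


def parse_cpu_line(line: str) -> dict[str, float]:
    """Turn a top CPU summary line into {abbreviation: percentage}.

    Hypothetical helper for illustration; assumes "value name" pairs such
    as "24.8 us" separated by commas, with "." as the decimal separator.
    """
    # Drop the "%Cpu(s):" label so only the value/name pairs remain.
    _, _, fields = line.partition(":")
    pairs = re.findall(r"(\d+(?:\.\d+)?)\s+([a-z]{2})\b", fields)
    return {name: float(value) for value, name in pairs}


sample = "%Cpu(s): 24.8 us,  0.5 sy,  0.0 ni, 73.6 id,  0.4 wa,  0.0 hi,  0.2 si,  0.0 st"
stats = parse_cpu_line(sample)
print(stats["us"], stats["id"])
```

Each two-letter key in the resulting dictionary corresponds to one of the statistics explained in this section.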
24.8 us - This tells us that the processor is spending 24.8% of its time running user space processes. A user space program is any process that doesn't belong to the kernel. Shells, compilers, databases, web servers, and the programs associated with the desktop are all user space processes. If the processor isn't idle, it is quite normal that the majority of the CPU time should be spent running user space processes.
73.6 id - Skipping over a few of the other statistics just for a moment, the id statistic tells us that the processor was idle just over 73% of the time during the last sampling period. The total of the user space percentage (us), the niced percentage (ni), and the idle percentage (id) should be close to 100%, which it is in this case. If the CPU is spending more time in the other states, then something is probably awry; see the Troubleshooting section below.
0.5 sy - This is the amount of time that the CPU spent running the kernel. All the processes and system resources are handled by the Linux kernel. When a user space process needs something from the system, for example when it needs to allocate memory, perform some I/O, or create a child process, then the kernel is running. In fact, the scheduler itself, which determines which process runs next, is part of the kernel. The amount of time spent in the kernel should be as low as possible. In this case, just 0.5% of the time given to the different processes was spent in the kernel. This number can peak much higher, especially when there is a lot of I/O happening.
0.0 ni - As mentioned above, the priority level of a user space process can be tweaked by adjusting its niceness. The ni stat shows how much time the CPU spent running user space processes that have been niced. On a system where no processes have been niced, the number will be 0.
0.4 wa - Input and output operations, like reading or writing to a disk, are slow compared to the speed of a CPU. Although these operations happen very fast compared to everyday human activities, they are still slow when compared to the performance of a CPU. There are times when the processor has initiated a read or write operation and then has to wait for the result, but has nothing else to do. In other words, it is idle while waiting for an I/O operation to complete. The time the CPU spends in this state is shown by the wa statistic.
0.0 hi & 0.2 si - These two statistics show how much time the processor has spent servicing interrupts. hi is for hardware interrupts, and si is for software interrupts. Hardware interrupts are physical interrupts sent to the CPU from various peripherals like disks and network interfaces. Software interrupts come from processes running on the system. A hardware interrupt will actually cause the CPU to stop what it is doing and go handle the interrupt. A software interrupt doesn't occur at the CPU level, but rather at the kernel level.
0.0 st - This last number only applies to virtual machines. When Linux is running as a virtual machine on a hypervisor, the st (short for stolen) statistic shows how long the virtual CPU has spent waiting for the hypervisor to service another virtual CPU running on a different virtual machine. Since in the real world these virtual processors are sharing the same physical processor(s), there will be times when the virtual machine wanted to run but the hypervisor scheduled another virtual machine instead.
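On Linux, the raw counters behind these percentages are exposed on the first line of /proc/stat, in the column order user, nice, system, idle, iowait, irq, softirq, steal. The sketch below shows one way to turn two samples of those counters into percentages for a sampling period. It is an illustration of the idea rather than a copy of how top is implemented, and the names FIELDS, read_cpu_ticks, and cpu_percentages are made up for this example.

```python
import time

# top's abbreviations, listed in the column order used by /proc/stat.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def read_cpu_ticks(path: str = "/proc/stat") -> list[int]:
    """Read the aggregate CPU counters from the first line of /proc/stat."""
    with open(path) as stat_file:
        parts = stat_file.readline().split()  # "cpu user nice system idle ..."
    return [int(value) for value in parts[1:1 + len(FIELDS)]]


def cpu_percentages(before: list[int], after: list[int]) -> dict[str, float]:
    """Share of the sampling period spent in each state, in percent."""
    deltas = [new - old for old, new in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        # No ticks elapsed between the samples; avoid dividing by zero.
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in zip(FIELDS, deltas)}


if __name__ == "__main__":
    first = read_cpu_ticks()
    time.sleep(1)  # the sampling period, chosen arbitrarily here
    second = read_cpu_ticks()
    for name, percent in cpu_percentages(first, second).items():
        print(f"{percent:5.1f} {name}")
```

Because the counters only ever grow, a single reading tells you little; it is the difference between two readings that gives the figures for "the last sampling period".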
Troubleshooting
On a busy server or desktop PC, you can expect the amount of time the CPU spends in idle to be small. However, if a system rarely has any idle time, then it's either a) overloaded (and you need a better one), or b) something is wrong.
Here is a brief look at some of the things that can go wrong and how they affect the CPU utilization.
High user mode - If a system suddenly jumps from having spare CPU cycles to running flat out, then the first thing to check is the amount of time the CPU spends running user space processes. If this is high, then it probably means that a process has gone crazy and is eating up all the CPU time. Using the top command you will be able to see which process is to blame and restart the service or kill the process.
High kernel usage - Sometimes this is acceptable. For example, a program that does lots of console I/O can cause the kernel usage to spike. However, if it remains high for long periods of time, then it could be an indication that something isn't right. A possible cause of such spikes could be a problem with a driver/kernel module.
High niced value - If the amount of time the CPU is spending running processes with a niced priority value jumps, then it means that someone has started some intensive CPU jobs on the system, but they have niced the task.
If the niceness level is greater than zero, then the user has been courteous enough to lower the priority of the process and therefore avoid a CPU overload. There is probably little that needs to be done in this case, other than maybe finding out who started the process and talking about how you can help out!
But if the niceness level is less than 0, then you will need to investigate what is happening and who is responsible, as such a task could easily cripple the responsiveness of the system.
High waiting on I/O - This means that there are some intensive I/O tasks running on the system that don't use up much CPU time. If this number is high for anything other than short bursts, then it means that either the I/O performed by the task is very inefficient, or the data is being transferred to a very slow device, or there is a potential problem with a hard disk that is taking a long time to process reads and writes.
High interrupt processing - This could be an indication of a broken peripheral that is causing lots of hardware interrupts, or of a process that is issuing lots of software interrupts.
Large stolen time - Basically, this means that the host system running the hypervisor is too busy. If possible, check the other virtual machines running on the hypervisor, and/or migrate your virtual machine to another host.
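The checks in this section can be strung together into a rough first-pass triage. The sketch below is only an illustration: the name diagnose, the hint wording, and above all the single 50% threshold are assumptions made for this example, not recommended limits. Sensible values depend on the workload, and short spikes are often harmless.

```python
# Hints paraphrased from the troubleshooting notes, keyed by top's abbreviations.
CHECKS = [
    ("us", "high user time: look for a runaway process"),
    ("sy", "high kernel time: heavy I/O or a driver/kernel module problem"),
    ("ni", "high niced time: someone is running CPU-intensive niced jobs"),
    ("wa", "high I/O wait: inefficient I/O, a slow device, or a failing disk"),
    ("hi", "high hardware interrupt time: possibly a broken peripheral"),
    ("si", "high software interrupt time: a process issuing many interrupts"),
    ("st", "high stolen time: the hypervisor host is too busy"),
]


def diagnose(stats: dict[str, float], threshold: float = 50.0) -> list[tuple[str, str]]:
    """Return (statistic, hint) pairs for every value at or above `threshold`.

    `threshold` is an arbitrary illustrative cut-off in percent, not a
    recommended limit; missing statistics are treated as 0.
    """
    return [(name, hint) for name, hint in CHECKS if stats.get(name, 0.0) >= threshold]


healthy = {"us": 24.8, "sy": 0.5, "ni": 0.0, "id": 73.6,
           "wa": 0.4, "hi": 0.0, "si": 0.2, "st": 0.0}
busy = {"us": 97.0, "sy": 2.0, "id": 1.0}

print(diagnose(healthy))
print(diagnose(busy))
```

A helper like this only tells you where to look first; finding the actual culprit still means inspecting the process list in top.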
TL;DR
Linux keeps statistics on how much time the CPU spends performing different tasks. Most of its time should be spent running user space programs or being idle. However, there are several other execution states, including running the kernel and servicing interrupts. Monitoring these different states can help you keep your system healthy and running smoothly.
Refer:
Understanding Linux CPU Stats