Chapter 13: Linux System Management Skills (Daily Operations and Maintenance)

Last Update:2017-12-13 Source: Internet

Author: User



This chapter is the core of the material, and you are very likely to use it in your future work. Once you have mastered these basics, working as a junior system administrator should not be a problem.

13.1 Monitoring System Status

As an operations engineer or system administrator, how can you troubleshoot a system you do not understand? When something goes wrong, you need to find out what the problem is, where it is, and how to check the system's resource consumption.

13.1.1 Using the w command to view the current system load

The first part of the output is the system time; the current time can also be viewed with the date command.

The second part is how long the system has been running.

The third part is the number of logged-in users. The lines below it show which terminal each user is logged in on: a network login appears as pts/0, pts/1, and so on, while a local console login appears as one of the six terminals tty1 through tty6. The FROM column shows where the user logged in from.

Load average is the key part and the main reason this command is used. It is followed by three numbers: the system's average load over the last 1, 5, and 15 minutes. The first number is the average number of active processes using the CPU per unit of time; the larger the value, the greater the pressure on the server. The value can be a fraction, or as large as 100. A value of 0 means there are no active processes and the server is idle, which is a waste of the machine. What value is ideal? That depends on how many CPUs you have, meaning logical CPUs, not physical CPUs. A server may have several physical CPUs (from Intel or AMD, for example), each providing many logical CPUs. The command to view the CPUs is as follows:

cat /proc/cpuinfo. The field to look at is processor, which is numbered from 0: a highest value of 0 means 1 logical CPU, 1 means 2, and 39 means 40.

Again, this counts logical CPUs, not physical ones. If the highest processor number is 0, there is one CPU, and a first load value of 1 from the w command is ideal: the CPU is neither idle nor under pressure. If the highest processor number is 7, there are 8 logical CPUs, and the system is fine as long as the load reported by w does not exceed 8. In the per-user lines, LOGIN@ is the login time, IDLE is how long the session has been idle, PCPU is the CPU time used, and WHAT is the command currently running; the remaining columns can be ignored.

Another command is uptime, which prints the same information as the first line of w.
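The comparison between the load average and the CPU count can be sketched in Python. This is only an illustration: the function names, the sample line, and the three labels are assumptions made here, and the rule (1-minute load should not exceed the number of logical CPUs) is the rule of thumb stated above.

```python
import re


def parse_load_average(uptime_line):
    """Extract the 1, 5 and 15 minute load averages from a line of uptime/w output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def load_status(load_1min, logical_cpus):
    """Apply the rule of thumb: load above the logical CPU count means pressure."""
    if load_1min > logical_cpus:
        return "overloaded"
    if load_1min == 0:
        return "idle"
    return "ok"


# Hypothetical sample line in the format printed by uptime.
SAMPLE_UPTIME = " 10:15:01 up 12 days,  3:02,  2 users,  load average: 0.52, 1.10, 8.25"

if __name__ == "__main__":
    one, five, fifteen = parse_load_average(SAMPLE_UPTIME)
    print(one, load_status(one, logical_cpus=8))
```

On a real machine the input line could come from running uptime, and the CPU count from /proc/cpuinfo.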

Note: the /proc/cpuinfo file records detailed CPU information. Servers on the market today commonly have two multi-core physical CPUs, which Linux sees as 2*n CPUs (where n is the number of cores in a single physical CPU). If n is 4, the file contains 8 entries and the last one shows processor 7. To count the CPUs on the current system, use the command grep -c 'processor' /proc/cpuinfo. To find the number of physical CPUs, look at the physical id field.
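As a rough illustration of the counting described in this note, the Python sketch below counts processor entries (logical CPUs) and distinct physical id values (physical CPUs) in text formatted like /proc/cpuinfo. The function name and the shortened sample are assumptions for demonstration; a real file has many more fields per entry.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cpus) from /proc/cpuinfo-style text."""
    logical = 0
    physical_ids = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between entries
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_ids.add(value.strip())
    return logical, len(physical_ids)


# Hypothetical, heavily shortened sample: two physical CPUs, two logical CPUs each.
SAMPLE_CPUINFO = """\
processor   : 0
physical id : 0

processor   : 1
physical id : 0

processor   : 2
physical id : 1

processor   : 3
physical id : 1
"""

if __name__ == "__main__":
    print(count_cpus(SAMPLE_CPUINFO))
```

To run it against the real file, pass the contents of /proc/cpuinfo read with open("/proc/cpuinfo").read().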

13.1.2 Using the vmstat command to monitor system status

We have learned to check the load with the w command. If the load is high, for example higher than the number of CPU cores, the CPU is not keeping up. At that point you need to work out why: which processes are running, which tasks are using the CPU, and where the bottleneck in the system is.

vmstat reports on the CPU, memory, swap (virtual memory), disk I/O, system processes, and related items. A common usage is vmstat 1, which prints one status line every second; press Ctrl+C when you have seen enough.

You can also run vmstat 1 5, which prints a status line every second and stops after 5 lines.

We only need to care about the following columns:

procs shows information about processes:

r (short for run) is the number of processes in the run state. With a single CPU, only one process can use the CPU at any moment and the others wait in a queue, but they take turns so that every process gets some CPU time. A process counts toward r whether it is currently on the CPU or waiting in the queue. If this value stays above the number of CPUs for a long time, CPU resources are insufficient.

b (short for block) is the number of processes waiting for a resource. These processes are blocked on something other than the CPU, such as the disk or the network, and are stuck waiting. For example, a process wants to send a packet that would take 1 second on a fast network; on a slow network it may take 10 seconds, and the process can only wait. b counts how many processes are waiting like this.

memory shows information about RAM:

swpd is the amount of memory swapped out to the swap partition, in KB. As covered in the discussion of partitions, when memory runs short the system can temporarily move some in-memory data to swap space. If this number stays constant, all is well; if it keeps changing, memory and swap are constantly exchanging data, which means there is not enough memory.

free, buff, and cache are explained later, in the section on memory.

swap shows swapping activity:

si and so are related to swpd: if swpd changes frequently, they change as well.

si is the amount of data swapped in from the swap partition to memory, in KB; the i stands for in. so is the amount swapped out of memory; the o stands for out.

io shows disk activity:

bi and bo relate to the disk. bi is the amount of data read from block devices (disk reads), in KB. bo is the amount of data written to block devices (disk writes), in KB. Large values mean the disk is being read and written heavily. Disk I/O is very slow, so if a lot of data is being read or written, the b column will inevitably rise as well, because many processes are waiting on the disk.

cpu shows CPU usage:

us is the percentage of CPU time spent at user level. A server does not just run the operating system; it also runs services such as a website or MySQL, and the resources they consume show up in us. The value cannot exceed 100, since it is a percentage. A us value above 50 for a long time indicates insufficient system resources.

sy is the percentage of CPU time spent by the system itself.

id is the percentage of idle time. When there is no I/O wait, us + sy + id = 100; more generally, wa is part of the total as well.

wa (wait) is the percentage of CPU time spent waiting for I/O. A large value means the CPU is often stalled waiting on I/O, which usually points to a disk bottleneck.

The vmstat command helps identify the system's bottleneck, such as insufficient CPU, insufficient memory, or heavy disk I/O.
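The column-by-column reasoning above can be sketched as a small Python check. It parses text in the layout printed by vmstat and returns hints about the likely bottleneck. The function names, the sample output, and the wa threshold of 20 are illustrative assumptions, not values taken from vmstat or from this chapter.

```python
def parse_vmstat(text):
    """Turn vmstat output into a list of {column: int} dictionaries."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    # The column header is the line that starts with the r and b columns.
    header = next(fields for fields in lines if fields[:2] == ["r", "b"])
    rows = [
        fields for fields in lines
        if len(fields) == len(header) and fields[0].isdigit()
    ]
    return [dict(zip(header, map(int, fields))) for fields in rows]


def diagnose(row, logical_cpus, wa_threshold=20):
    """Return a list of suspected bottlenecks for one vmstat sample.

    wa_threshold is an arbitrary illustrative cut-off (an assumption).
    """
    hints = []
    if row["r"] > logical_cpus:
        hints.append("cpu")       # more runnable processes than CPUs
    if row["si"] > 0 or row["so"] > 0:
        hints.append("memory")    # data is moving to or from swap
    if row["b"] > 0 or row["wa"] > wa_threshold:
        hints.append("disk io")   # processes blocked or CPU waiting on I/O
    return hints


# Hypothetical sample in the layout printed by vmstat.
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 5  2   1024 123456   1234 567890    0    0   900  1200  100  200 40 10 15 35  0
"""

if __name__ == "__main__":
    for sample in parse_vmstat(SAMPLE_VMSTAT):
        print(diagnose(sample, logical_cpus=4))
```

A single sample is rarely conclusive; in practice the same check would be applied to several consecutive samples, since the text stresses values that stay high for a long time.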

13.1.3 Using the top command to display the system resources used by processes

top lets you look at specific processes.

The first line is the same as the output of the w command. The next line shows how many tasks there are in total and how many are running or sleeping. A zombie process is a process that has exited but has not been cleaned up by its parent, typically because the parent was terminated unexpectedly and nobody is left to look after the child.

If the fans are spinning hard and the CPU is hot, CPU utilization is high. A us value above 60% for a long time is not good for the CPU.

The part you really need to care about is the process list. By default it is sorted by CPU percentage from high to low. %MEM is the memory percentage, and RES is the physical memory used, in KB.

Press uppercase M to sort by memory usage and see how much memory each process uses. Press uppercase P to switch back to sorting by CPU usage. Press 1 to list the usage of every CPU. Press q to quit top.

Use top -c to show the full command line, including the path, of each process.

Use top -bn1 to print all the information once, non-interactively.

This usage is suitable for scripts.
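As a sketch of how a script might consume batch output, the Python below parses text in the layout printed by top -bn1 and picks the process with the highest %CPU. The function names and the sample text are assumptions made for illustration, and the column set of top can differ between versions and configurations, so the parser relies only on the header row.

```python
def parse_top_batch(text):
    """Return one {column: value} dictionary per process row of top batch output."""
    lines = text.splitlines()
    # The process table starts at the header row whose first column is PID.
    start = next(i for i, line in enumerate(lines) if line.split()[:1] == ["PID"])
    header = lines[start].split()
    processes = []
    for line in lines[start + 1:]:
        # Split at most len(header)-1 times so a COMMAND with spaces stays whole.
        fields = line.split(None, len(header) - 1)
        if len(fields) == len(header):
            processes.append(dict(zip(header, fields)))
    return processes


def busiest_process(text):
    """Return the process row with the highest %CPU value."""
    return max(parse_top_batch(text), key=lambda proc: float(proc["%CPU"]))


# Hypothetical sample in the layout printed by top in batch mode.
SAMPLE_TOP = """\
top - 10:00:01 up 1 day,  2:03,  1 user,  load average: 0.00, 0.01, 0.05
Tasks:  95 total,   1 running,  94 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1883724 total,  1500000 free,   150000 used,   233724 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used.  1560000 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0  128000   6600   4100 S   0.0  0.4   0:01.50 systemd
 1234 mysql     20   0 1123456 204800  10240 S  12.5 10.9   1:23.45 mysqld
"""

if __name__ == "__main__":
    top_proc = busiest_process(SAMPLE_TOP)
    print(top_proc["PID"], top_proc["COMMAND"], top_proc["%CPU"])
```

In a real script the text would come from running top -bn1, for example through subprocess.run with capture_output=True.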

Pay attention to the PID: to kill a process, use its PID with the command kill PID.

USER is the user running the process. PR and NI relate to priority and do not need much attention.

Note: when using top, focus on the per-process resource details in the lower part of the screen.

13.1.4 Monitoring system status with the sar command

sar is a very comprehensive tool for analyzing system state, used here mainly to view network card traffic. It can also report on CPU, memory, and disk, and is known as the Swiss Army knife of Linux because its options are so rich. Unlike other monitoring tools, it can print history, showing system state from 00:00 to the current time of day.

If the command is not installed, install it with yum install -y sysstat.

The first time you run sar after installing it, an error is reported because the sar tool has not yet generated its data file. (You can ignore this if you only need real-time monitoring, which does not query that file.)

If sar is run without arguments, it reads the history files kept by the system.

The history files are generated in the /var/log/sa directory. What is a history file? sar samples the system state every 10 minutes and records a snapshot in a file in this directory.

To use sar, you need to add the appropriate options and arguments. To view network card traffic, use the -n DEV option with an interval and a count, for example sar -n DEV 1 10.

As with vmstat, this prints a report every second and stops after 10 reports.

IFACE is the name of the network card.

rxpck/s and txpck/s are the number of packets received and sent per second. rx is short for receive and tx for transmit.

rxkB/s is the amount of data received per second, in KB.

txkB/s is the amount of data sent per second, in KB. The remaining columns do not need attention; in teacher Amin's many years of experience they have almost always been 0.

If a website is attacked, the attacker sends a large number of packets to it, and your server has to receive them all. If the volume is large enough, the network card cannot cope, the network becomes congested, and your website cannot be reached.

How many packets are normal?

You should also watch whether the network link is saturated. Suppose your company rents a rack with 100 Mbit/s of bandwidth. That is not much: converted to the more familiar unit it is 12.5 MB/s, so a few people downloading at the same time will soon fill it. For rxpck/s, a few thousand packets per second is normal and tens of thousands is not; tens or hundreds of thousands usually mean an attack. rxpck/s only tells you how many packets are arriving; to find out what they are, you need a packet capture tool.
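The arithmetic behind these rules of thumb can be written down in a few lines of Python. This is a sketch under stated assumptions: the function names are made up here, the 10000 packets-per-second cut-off is just the "tens of thousands" boundary from the text turned into a number, and decimal units (1 Mbit = 1000 kbit) are assumed for simplicity.

```python
def mbit_to_mbyte_per_s(link_mbit):
    """Convert a link speed in Mbit/s to MB/s (8 bits per byte)."""
    return link_mbit / 8


def link_utilization(rx_kb_per_s, link_mbit):
    """Fraction of the link used, given rxkB/s and the link speed in Mbit/s."""
    capacity_kb_per_s = link_mbit * 1000 / 8  # decimal units, an assumption
    return rx_kb_per_s / capacity_kb_per_s


def classify_packet_rate(rxpck_per_s, abnormal_from=10000):
    """Label a packet rate; abnormal_from is an illustrative threshold."""
    return "abnormal" if rxpck_per_s >= abnormal_from else "normal"


if __name__ == "__main__":
    print(mbit_to_mbyte_per_s(100))          # 100 Mbit/s expressed in MB/s
    print(link_utilization(6250, 100))       # half of a 100 Mbit/s link
    print(classify_packet_rate(3000), classify_packet_rate(50000))
```

The inputs would normally be the rxkB/s and rxpck/s columns reported by sar for the interface in question.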

Use the following command to view network card traffic for a given day:

sar -n DEV -f /var/log/sa/saNN, where NN is the day of the month

Data in this directory is kept for at most one month.
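Because the history files are named by day of the month, a script can build the file name from a date. The Python sketch below assumes the /var/log/sa directory used in this chapter and the two-digit sa file naming; both function names are hypothetical, and other distributions may keep these files elsewhere.

```python
from datetime import date


def sa_file_for(day, log_dir="/var/log/sa"):
    """Return the path of the sar history file for the given date."""
    return "{}/sa{:02d}".format(log_dir, day.day)


def sar_network_command(day):
    """Build the argument list for viewing that day's network card traffic."""
    return ["sar", "-n", "DEV", "-f", sa_file_for(day)]


if __name__ == "__main__":
    print(" ".join(sar_network_command(date(2017, 12, 5))))
```

Since the data is kept for at most about a month, only the day number is needed; a file for the same day of an earlier month will already have been overwritten.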

sar -q shows the server's load at past points in time.

sar -b shows disk activity.

The following command shows disk reads and writes:

13.1.5 Using the nload command to view network card traffic

The sar command can show network card traffic, but it is not very intuitive; nload is easier to read. nload is not installed by default and can be installed with the following command:

yum install -y epel-release; yum install -y nload

Once it is installed, running nload gives a live display of network card traffic:

The top line shows the network card name and IP address; press the right arrow key to view traffic for other network cards. The output has two parts: Incoming is traffic entering the network card, and Outgoing is traffic leaving it. The row to watch is Curr, whose unit adjusts automatically, which is very convenient. Press q to exit. The bandwidth you buy is usually outbound bandwidth; during an attack, the Incoming value will be very large.

A supplementary note on sar:

The text report in the history directory is not generated until the next day. sar28 can be viewed with cat, but sa28 cannot, because it is a binary file that has to be read with sar -f.

13.1.6 Monitoring I/O performance (two commands for checking disk status)

In daily operations, disk I/O is a very important indicator alongside CPU and memory. Sometimes CPU and memory clearly have headroom but the system load is still high; if vmstat shows a large b or wa column, the disk is the bottleneck.

The iostat command is installed with the sysstat package, the same package that provides sar.

Usage of iostat:

It can also be run as:

The option to focus on is iostat -x, which shows a very important indicator.

%util is a percentage: the share of time the disk was busy handling I/O requests, during which the CPU may have to wait on the disk. The CPU spends part of its time computing and part waiting for disk reads and writes. If this number is above 50%, the disk is very busy and not keeping up. No matter how fast the CPU is, a slow disk remains a major bottleneck. If the disk simply cannot keep up, the only fix is to replace it.
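A script could apply the 50% rule of thumb to iostat -x output along the following lines. This Python sketch locates the %util column by name in the header row, because the set of columns differs between sysstat versions. The function names and the sample text are illustrative assumptions.

```python
def util_by_device(text):
    """Map each device name to its %util value in iostat -x style output."""
    result = {}
    util_col = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            util_col = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device") and "%util" in fields:
            util_col = fields.index("%util")
        elif util_col is not None and len(fields) > util_col:
            result[fields[0]] = float(fields[util_col])
    return result


def busy_devices(text, threshold=50.0):
    """Return the devices whose %util exceeds the threshold, sorted by name."""
    return sorted(dev for dev, util in util_by_device(text).items() if util > threshold)


# Hypothetical sample in the layout printed by iostat -x.
SAMPLE_IOSTAT = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.00    0.00    1.00   30.00    0.00   66.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.10  150.00  220.00  9000.00 12000.00   113.51     4.20   11.30   10.00   12.20   1.96  72.50
sdb               0.00     0.00    0.50    1.20    10.00    20.00    35.29     0.00    1.50    1.00    1.70   0.80   3.10
"""

if __name__ == "__main__":
    print(util_by_device(SAMPLE_IOSTAT))
    print(busy_devices(SAMPLE_IOSTAT))
```

When iostat prints several reports, later values overwrite earlier ones in this sketch, so the result reflects the most recent report.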

If disk I/O is busy and you want to know which process is doing the reading and writing, use the iotop command. If it is not installed, install it with yum install -y iotop.

Like top, it updates dynamically and sorts processes from high to low.

The column to look at is the I/O percentage.

13.1.7 Using the free command to view memory usage

The free command shows the total memory of the current system and how it is being used. Its output on CentOS 7 is more concise than on CentOS 6, but broadly the same.

There are three lines: the first is the header, the second is memory usage, and the third is swap usage.

total: total memory size

used: memory actually in use

free: remaining physical memory (not allocated at all)

shared: shared memory size; this can be ignored

buff/cache: total memory allocated to buffers and cache

A simple way to tell buff (buffer) from cache: the data flows in opposite directions, hence the different names. Both are part of memory, and their job is to bridge the speed gap between the CPU and I/O devices such as disks. Data that the CPU has computed and is about to write to disk is held in the buffer; data that the CPU needs for a calculation, read from disk and held temporarily in memory, is in the cache.

available: how much memory the system can still use, which includes free. To make applications run faster, the system reserves part of memory (buff/cache) in advance. This memory is not really in use although it has been allocated, and when another service needs more memory it can be taken from this reserved part. So, roughly:

available = free + buff/cache

total = used + free + buff/cache

Buffers and cache matter so much that the system reserves part of memory for them in advance.

The key figure in the output of free is available.
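The relationships above can be checked with a short Python sketch that parses text in the CentOS 7 layout of free. The function names, the sample figures, and the 10% warning level are assumptions for illustration; on a real system the available figure is an estimate made by the kernel, so it will usually not equal free + buff/cache exactly.

```python
def parse_free(text):
    """Return the Mem row of free output as a {column: int} dictionary."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return dict(zip(header, map(int, fields[1:])))
    raise ValueError("no Mem: row found")


def is_memory_low(mem, min_available_fraction=0.10):
    """Flag the system when available memory drops below a fraction of total.

    The 10% default is an arbitrary illustrative threshold.
    """
    return mem["available"] / mem["total"] < min_available_fraction


# Hypothetical sample in the layout printed by free on CentOS 7 (values in KB).
SAMPLE_FREE = """\
              total        used        free      shared  buff/cache   available
Mem:        1883724      150000     1500000        8000      233724     1560000
Swap:       2097148           0     2097148
"""

if __name__ == "__main__":
    mem = parse_free(SAMPLE_FREE)
    print(mem["total"] == mem["used"] + mem["free"] + mem["buff/cache"])
    print(is_memory_low(mem))
```

The check is based on available rather than free, following the point made above that available is the figure that matters.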

13.1.8 Using ps to view system processes

ps -elf and ps aux give almost the same result.

In the STAT column, S means sleeping: a command such as vmstat uses very little CPU time, running briefly and then sleeping. + marks a foreground process, that is, a process running on the terminal.

Z marks a zombie process. A few are nothing to worry about; if there are many, try to kill them.

< marks a high-priority process, which the CPU serves first.

N marks a low-priority process, the opposite of <: it is not in a hurry, and it does not matter if it runs later.

l marks a multithreaded process. Threads and processes are different: a process is made up of one or more threads. Memory is not shared between processes, but the threads of a process share that process's memory. For example, when a block of memory is assigned to a process, every thread in that process, however many there are, shares that block; this is the defining characteristic of threads. A multithreaded process is simply a process with more than one thread.
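The state codes can be decoded mechanically: the first character is the process state and the following characters are modifiers. The Python sketch below covers the codes discussed above plus a few others documented in the ps manual page (R, D, T, s); the function name and the wording of the descriptions are this example's own.

```python
# First character of STAT: the process state.
STATES = {
    "R": "running or runnable",
    "S": "sleeping (interruptible)",
    "D": "uninterruptible sleep (usually I/O)",
    "T": "stopped",
    "Z": "zombie",
}

# Remaining characters of STAT: modifiers.
FLAGS = {
    "<": "high priority",
    "N": "low priority",
    "l": "multithreaded",
    "s": "session leader",
    "+": "foreground process",
}


def decode_stat(stat):
    """Translate a ps STAT value such as 'Ssl' into readable descriptions."""
    parts = [STATES.get(stat[:1], "unknown state")]
    parts.extend(FLAGS[char] for char in stat[1:] if char in FLAGS)
    return parts


if __name__ == "__main__":
    for value in ("Ssl", "R+", "Z", "S<"):
        print(value, decode_stat(value))
```

Characters not listed in FLAGS are silently skipped, so the sketch stays usable when ps reports modifiers that are not covered here.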

13.1.9 Viewing network status with the netstat command

Linux is a server operating system, and a server runs many services that communicate with clients. To do that, a service needs a listening port for outside communication. The netstat command shows the state of TCP/IP communication. For example, installing MySQL to provide a database service creates a listening port. By default a machine listens on no ports, which means it cannot accept connections from other machines. If you want others to visit your site, the server has to listen on a port, which is like opening a small hole in the network card: a remote device connects to it, and data passes through the hole into the network card and on to the server, so the two can communicate.

Command for viewing listening ports:

l stands for listen.

sshd appears twice, once for tcp and once for tcp6; tcp6 is IPv6. (Look up TCP and UDP yourself; they are not the focus here.)

master listens on port 25, the port for outgoing mail.

A second command:

This command shows the TCP/IP connection states.

Further reading: the TCP three-way handshake and four-way teardown, which are often asked about in interviews.

When looking at TCP/IP states, focus on the number of ESTABLISHED connections. If it is large, the system is busy. It represents the number of concurrent connections, that is, how many clients are connected to you at the same moment. The output above shows 45 clients currently communicating with the server. These are real, active connections; up to about 1000 is acceptable for a server.
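Counting connections per state is easy to script. The Python sketch below tallies the State column of TCP lines in text laid out like netstat -an output; the function name, the sample addresses, and the sample counts are made up for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_text):
    """Count TCP connections per state in netstat -an style output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows have six columns: Proto, Recv-Q, Send-Q, Local, Foreign, State.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


# Hypothetical sample in the layout printed by netstat -an.
SAMPLE_NETSTAT = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:22             10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:80             10.0.0.7:40000          ESTABLISHED
tcp        0      0 10.0.0.5:80             10.0.0.8:40001          TIME_WAIT
tcp6       0      0 :::22                   :::*                    LISTEN
udp        0      0 0.0.0.0:68              0.0.0.0:*
"""

if __name__ == "__main__":
    states = count_tcp_states(SAMPLE_NETSTAT)
    print(states["ESTABLISHED"], dict(states))
```

The ESTABLISHED count is the concurrent-connection figure discussed above; the other states are kept in the result because they are often useful when diagnosing a busy server.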

The ss -an command can also show TCP/IP states.

Its drawback is that it does not show the process name, which the netstat -lntp command does.

13.1.10 Linux packet capture tools


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
