A full record of how a Linux server fell and was turned into a "broiler" (a compromised zombie host)
http://blog.csdn.net/smstong/article/details/44411993
1. The top command is a common performance-analysis tool on Linux. It shows the resource usage of each process in real time, similar to the Windows Task Manager.
top displays the current processes and other system state dynamically; that is, the display keeps refreshing, and the user can control it with interactive keys. If the command runs in the foreground, it occupies the terminal until the user terminates the program. More precisely, top provides real-time monitoring of the system's processor state: it lists the most CPU-intensive tasks on the system, can sort them by CPU usage, memory usage, or run time, and most of its behaviour can be set through interactive commands or a personal configuration file.
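For instance, a couple of hedged examples (exact option behaviour may vary slightly between top versions) of running top non-interactively so the output can be captured:
top -b -n 1 | head -n 20      # one batch-mode snapshot, keeping the first 20 lines
top -b -n 3 -d 5 > top.log    # three snapshots, 5 seconds apart, saved to a file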
2. The vmstat command is one of the most common Linux/Unix monitoring tools. It reports the state of the server at a chosen interval, including CPU usage, memory usage, virtual-memory swapping, and disk I/O reads and writes. Unlike top, it shows the CPU, memory, and I/O usage of the whole machine rather than of each individual process, so the two tools suit different scenarios.
vmstat is generally used with two numeric arguments: the first is the sampling interval in seconds, and the second is the number of samples.
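For example, the following illustrative invocation uses exactly that two-argument form, sampling every 2 seconds for 5 samples (note that the first line of output is an average since boot):
vmstat 2 5    # 2-second interval, 5 samples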
r is the run queue, i.e. how many processes are actually assigned to a CPU. The server I am testing is currently idle with no programs running. When this value exceeds the number of CPUs, there is a CPU bottleneck. It is also related to the load shown by top: in general a load above 3 is fairly high, above 5 is high, and above 10 is abnormal and the server is in a dangerous state. The load in top is roughly the run queue averaged per second. If the run queue is too large, the CPU is very busy, which usually shows up as high CPU usage.
b is the number of blocked processes; not much to say here, these processes are simply blocked.
swpd is the amount of virtual memory (swap) in use. If it is greater than 0, your machine is short of physical memory; if the cause is not a memory leak in some program, you should upgrade the memory or move the memory-hungry tasks to another machine.
free is the amount of free physical memory. My machine has 8 GB in total, with 3415 MB remaining.
buff is the cache Linux/Unix uses for things such as directory contents and permissions; on my machine it is a bit over 300 MB.
cache is used directly to cache the files we open (buffering their contents); on my machine it is also about 300 MB. (This is the clever part of Linux/Unix: spare physical memory is used as a file and directory cache to improve performance, and when a program needs memory, buff/cache is reclaimed very quickly.)
si is the amount of virtual memory read in (swapped in) from disk per second. If this value is greater than 0, physical memory is insufficient or something is leaking memory, and you should find the offending process. My machine has plenty of memory and everything is fine.
so is the amount of virtual memory written out (swapped out) to disk per second. If this value is greater than 0, same as above.
bi is the number of blocks received per second from block devices, where block devices means all the disks and other block devices on the system; the default block size is 1024 bytes. There is no I/O on this machine, so it stays at 0, but on machines copying large amounts of data (2-3 TB) I have seen it reach 140000/s, i.e. a disk throughput of almost 140 MB per second.
bo is the number of blocks sent to block devices per second; for example, when we write out a file, bo will be greater than 0. bi and bo should generally be close to 0, otherwise I/O is too frequent and needs tuning.
in is the number of CPU interrupts per second, including timer interrupts.
cs is the number of context switches per second. For example, calling a system function causes a context switch, and so do thread switches and process context switches. The smaller this value the better; if it is too large, you should consider lowering the number of threads or processes. For web servers such as Apache and Nginx, when we run performance tests with thousands or even tens of thousands of concurrent connections, we keep lowering the number of web-server processes or threads and re-running the load test until cs drops to a fairly small value; that process or thread count is then a reasonably appropriate setting. System calls are the same: every time a system function is called, our code enters kernel space, causing a context switch, which is very expensive, so frequent system calls should also be avoided. Too many context switches mean the CPU spends most of its time switching contexts and has less time left for real work; the CPU is not being used effectively, which is undesirable.
us is user CPU time. I once ran very frequent encryption and decryption on a server and could see us approaching 100 and the run queue (r) reaching 80 (the machine was under a stress test and performing poorly).
sy is system CPU time. If it is too high, a lot of time is spent in system calls, for example when I/O operations are frequent.
id is idle CPU time. In general, id + us + sy = 100; I usually treat id as the idle CPU percentage, us as the user CPU percentage, and sy as the system CPU percentage.
wa is the CPU time spent waiting for I/O.
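As a rough sketch of how the r column can be compared against the CPU count discussed above (assuming the default vmstat column order, where r is the first field and the first two output lines are headers):
nproc                                          # number of CPUs
vmstat 1 5 | tail -n +3 | awk '{print $1}'     # r (run queue) from each sample; sustained values above nproc suggest a CPU bottleneck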
3. The iostat command is mainly used to monitor the I/O load on system devices. The first report covers the time since system boot, and each subsequent report covers the interval since the previous report. Users can get the statistics they need by specifying the report interval and count.
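A typical invocation (illustrative only; options depend on the installed sysstat version) that produces the extended per-device statistics described below, in kilobytes, every 2 seconds for 3 reports:
iostat -x -k 2 3    # -x extended stats (await, svctm, %util ...), -k values in kB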
tps: the number of transfers per second issued to the device. A "transfer" is an I/O request; multiple logical requests may be merged into a single I/O request, and the size of a transfer is not specified.
kB_read/s: the amount of data read from the device (drive) per second;
kB_wrtn/s: the amount of data written to the device (drive) per second;
kB_read: the total amount of data read;
kB_wrtn: the total amount of data written; these values are in kilobytes.
rrqm/s: how many read requests for this device are merged per second (when a system call needs to read data, the VFS sends the request to the file system; if the FS finds that different read requests target the same block of data, it merges them into one request); wrqm/s: how many write requests for this device are merged per second.
rsec/s: the number of sectors read per second;
wsec/s: the number of sectors written per second.
rkB/s: the amount of data read from the device per second, in kB;
wkB/s: the amount of data written to the device per second, in kB;
avgrq-sz: the average size (in sectors) of the requests issued to the device.
avgqu-sz: the average length of the request queue. No question about it: the shorter the queue, the better.
await: the average time (in milliseconds) each I/O request takes to be handled. This can be understood as the I/O response time; in general, system I/O response time should be below 5 ms, and anything above 10 ms is rather high.
This time includes both queue time and service time, so in general await is greater than svctm. The smaller their difference, the shorter the queue time; conversely, the larger the difference, the longer the queue time, which indicates a problem in the system.
svctm is the average service time (in milliseconds) of each device I/O operation. If svctm is close to await, I/O barely waits and disk performance is good; if await is much higher than svctm, the I/O queue wait is too long and the applications running on the system will slow down.
%util: the fraction of the statistics interval spent handling I/O, i.e. the total I/O time divided by the total interval. For example, with a 1-second interval, if the device spends 0.8 seconds handling I/O and 0.2 seconds idle, then %util = 0.8 / 1 = 80%, so this parameter indicates how busy the device is. Generally, 100% means the device is running close to full capacity (of course, if it is a multi-disk array, disk concurrency means it may not be the bottleneck even at 100% utilization).
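As a rough way of watching %util for a single device (a sketch only: "sda" is a placeholder device name, and it assumes %util is the last column of iostat -x output):
iostat -x 1 3 | awk '/^sda/ {print $1, $NF}'    # print the device name and %util from each report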
4. The netstat command is used to display various kinds of network-related information, such as network connections, routing tables, interface statistics, masquerade connections, multicast memberships, and so on.
Common parameters
-a (all): display all sockets; by default, sockets in the LISTEN state are not shown
-t (tcp): display TCP-related entries only
-u (udp): display UDP-related entries only
-n: do not resolve names; show addresses and ports numerically
-l: list only sockets in the LISTEN (listening) state
-p: display the program name (and PID) that owns each connection
-r: display routing information (the routing table)
-e: display extended information, such as the UID
-s: show statistics per protocol
-c: re-run the netstat output at a fixed interval
Hint: the LISTEN (listening) state can only be seen with -a or -l.
netstat -anlp | grep 8080
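A few more hedged examples that are often useful when checking whether a machine has been turned into a broiler (the port 8080 above is just an example; adjust to your own services):
netstat -ant | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn    # count TCP connections by state
netstat -antp | grep ESTABLISHED                                    # established connections with the owning program/PID
netstat -lntp                                                       # listening TCP ports and the processes behind them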
How to check whether the machine has become a broiler