Building a Linux Monitoring Platform: Memory

Last updated: 2018-03-01. Source: Internet

Author: User


The previous article covered the hard disk. This one covers something even more important: memory. On a phone it is called RAM; on a server it is simply called memory. Memory is byte-addressed: each addressable storage unit holds 8 bits of data, and the CPU fetches instructions and data by memory address. When memory runs out, the kernel's Out-of-Memory (OOM) Killer terminates a process that is using too much memory; details are recorded in the /var/log/messages file. In practice this often happens while building indexes, preheating (warming up) data, running load tests, automated testing, grayscale (canary) releases, and monitoring data collection. Administrators can protect important services from being killed by the OOM Killer.
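As a monitoring-side illustration, the sketch below scans kernel log lines for OOM-killer kills. It assumes the kernel wording "Out of memory: Kill process PID (comm)" used by older kernels, "Killed process" used by newer ones, and the "Memory cgroup out of memory" prefix used for cgroup OOMs; the exact text varies by kernel version, and the function name `find_oom_kills` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed kernel log wording; matches "Kill process" and "Killed process",
# case-insensitively so that "Memory cgroup out of memory" also matches.
OOM_RE = re.compile(r"out of memory: kill(?:ed)? process (\d+) \(([^)]*)\)",
                    re.IGNORECASE)

def find_oom_kills(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, command) for every OOM-killer kill found in the log lines."""
    kills = []
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            kills.append((int(m.group(1)), m.group(2)))
    return kills

if __name__ == "__main__":
    # /var/log/messages is the RHEL/CentOS location; other distributions may
    # use /var/log/syslog, /var/log/kern.log, or only the journal.
    try:
        with open("/var/log/messages", errors="replace") as f:
            for pid, comm in find_oom_kills(f):
                print(f"OOM killer terminated pid={pid} comm={comm}")
    except OSError as exc:
        print(f"cannot read log: {exc}")
```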

Memory is consumed by processes. So what exactly is a process's memory? Each process has its own virtual address space, and that virtual address space is mapped onto physical memory.

If the physical page corresponding to a virtual address is not in physical memory, a page fault is generated; the kernel then actually allocates a physical page and updates the process's page table.

If physical memory is present but exhausted, some pages are evicted to disk (swap) according to the kernel's page-replacement algorithm.

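A process's virtual address space is divided into several regions, such as the code (text) segment, the data segment, the BSS segment, the heap, the memory-mapping area, and the stack.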
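To see these regions for a real process, /proc/&lt;pid&gt;/maps lists one mapping per line: address range, permissions, offset, device, inode, and an optional path or label such as [heap] or [stack]. A minimal sketch, with a hypothetical helper `parse_maps`, that parses that format and prints the heap and stack of the current process:

```python
from typing import Dict, List

def parse_maps(text: str) -> List[Dict]:
    """Parse the text of /proc/<pid>/maps into a list of regions."""
    regions = []
    for line in text.splitlines():
        # address perms offset dev inode [pathname]; the path may contain spaces
        parts = line.split(maxsplit=5)
        if len(parts) < 5:
            continue
        start, end = (int(x, 16) for x in parts[0].split("-"))
        regions.append({
            "start": start,
            "end": end,
            "size_kb": (end - start) // 1024,
            "perms": parts[1],
            "name": parts[5].strip() if len(parts) == 6 else "[anonymous]",
        })
    return regions

if __name__ == "__main__":
    with open("/proc/self/maps") as f:
        for r in parse_maps(f.read()):
            if r["name"] in ("[heap]", "[stack]"):
                print(f"{r['name']:<8} {r['start']:#x}-{r['end']:#x} "
                      f"{r['size_kb']} kB {r['perms']}")
```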

The principle of memory allocation:

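From the operating system's perspective, a process allocates memory in two ways, through two system calls: brk and mmap (shared memory is not considered here). In glibc's malloc, which one is used depends on the request size (the default threshold is 128 KB):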

1. Less than 128 KB: brk pushes _edata, the pointer to the highest address of the data segment (.data), toward higher addresses;

2. 128 KB or larger: mmap finds a free region of virtual memory in the process's virtual address space (between the heap and the stack, in what is called the file-mapping area).

Both methods allocate only virtual memory; no physical memory is allocated yet. When the allocated virtual address space is accessed for the first time, a page fault occurs; the operating system then allocates physical memory and establishes the mapping between virtual memory and physical memory.
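A small Linux-only sketch that makes this lazy allocation visible: it creates an anonymous mapping with Python's `mmap` module (the same mmap system call a large malloc request ends up in) and reads VmRSS from /proc/self/status before mapping, after mapping, and after writing one byte to every page. The helper names and the 32 MB size are arbitrary choices for illustration.

```python
import mmap

def rss_kb() -> int:
    """Resident set size of this process in kB, from /proc/self/status (Linux)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")

def lazy_allocation_demo(size_mb: int = 32):
    """Return RSS (kB) before mapping, after mapping, and after touching every page."""
    size = size_mb * 1024 * 1024
    before = rss_kb()
    # fileno -1 requests an anonymous mapping: only virtual address space is reserved.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    after_map = rss_kb()
    # The first write to each page triggers a page fault, and the kernel
    # then assigns a physical page to it.
    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1
    after_touch = rss_kb()
    buf.close()
    return before, after_map, after_touch

if __name__ == "__main__":
    before, after_map, after_touch = lazy_allocation_demo()
    print(f"RSS before mmap:   {before} kB")
    print(f"RSS after mmap:    {after_map} kB")
    print(f"RSS after writing: {after_touch} kB")
```

RSS barely moves after the mmap call and grows by roughly the mapping size only once the pages are written.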

1) View the sizes of Linux virtual and physical addresses

cat /proc/cpuinfo

Look at the "address sizes" line. On this 64-bit system the virtual address is 48 bits (a 2^48-byte virtual address space) and the physical address is 40 bits.
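The address sizes can also be extracted programmatically. This sketch assumes the x86 /proc/cpuinfo format ("address sizes : 40 bits physical, 48 bits virtual"); other architectures may not print this line, in which case the hypothetical `address_sizes` helper returns None. It also converts bit widths to capacities: 2^48 bytes is 256 TiB and 2^40 bytes is 1 TiB.

```python
import re
from typing import Optional, Tuple

ADDR_RE = re.compile(r"address sizes\s*:\s*(\d+) bits physical,\s*(\d+) bits virtual")

def address_sizes(cpuinfo_text: str) -> Optional[Tuple[int, int]]:
    """Return (physical_bits, virtual_bits) from /proc/cpuinfo text, or None."""
    m = ADDR_RE.search(cpuinfo_text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

def bits_to_tib(bits: int) -> float:
    """Size of a 2**bits-byte address space, in TiB (2**40 bytes)."""
    return 2 ** bits / 2 ** 40

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            sizes = address_sizes(f.read())
    except OSError as exc:
        sizes = None
        print(f"cannot read /proc/cpuinfo: {exc}")
    if sizes is None:
        print("no 'address sizes' line available")
    else:
        phys, virt = sizes
        print(f"physical: {phys} bits ({bits_to_tib(phys):g} TiB), "
              f"virtual: {virt} bits ({bits_to_tib(virt):g} TiB)")
```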

2) View a process's page-fault counts (a page fault occurs when the physical page corresponding to a virtual address is not in physical memory)

ps -o maj_flt,min_flt -p PID

Running this against a data-writing module shows a large number of page faults.
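The counters that `ps -o maj_flt,min_flt` prints come from fields 12 (majflt) and 10 (minflt) of /proc/&lt;pid&gt;/stat. A minimal sketch, with hypothetical helper names, that reads them and shows the minor-fault count rising after fresh memory is touched:

```python
import os

def parse_stat_faults(stat_text: str) -> tuple:
    """Return (majflt, minflt) from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so only the part after the last ')' is split.
    rest = stat_text[stat_text.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3].
    minflt = int(rest[10 - 3])
    majflt = int(rest[12 - 3])
    return majflt, minflt

def fault_counts(pid: str = "self") -> tuple:
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_faults(f.read())

if __name__ == "__main__":
    page = os.sysconf("SC_PAGE_SIZE")
    maj0, min0 = fault_counts()
    buf = bytearray(16 * 1024 * 1024)
    for i in range(0, len(buf), page):  # touch every page once
        buf[i] = 1
    maj1, min1 = fault_counts()
    print(f"majflt +{maj1 - maj0}, minflt +{min1 - min0}")
```

Touching anonymous memory only needs zero-filled pages, so it raises minflt; majflt rises when pages must be read from disk (file-backed pages not in cache, or swapped-out pages).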

What happens after a page fault?

When a process triggers a page fault, it traps into kernel mode and the kernel performs the following actions:
1. Check whether the virtual address being accessed is valid.
2. Find or allocate a physical page.
3. Fill the physical page's contents (read from disk, zero-fill it, or do nothing).
4. Establish the mapping (virtual address to physical address).
5. Re-execute the instruction that caused the page fault.
If step 3 needs to read from disk, the fault is a major fault (majflt); otherwise it is a minor fault (minflt).

3) When a program exits, is its memory released?

A: When a program exits, the system reclaims its memory. The operating system has its own cleanup mechanism, so even memory the program never freed is released once the program stops. Objects that are no longer needed should still be freed by the programmer while the program runs, however: in C, for example, memory that is never freed will inevitably run out in a long-running process. Java programs have the JVM's garbage collector, but they still often run out of memory when usage grows beyond what garbage collection can reclaim.

4) Memory overflow and memory leaks

Memory overflow (out of memory) means that when a program requests memory, there is not enough memory space for it to use. For example, you request space for an integer but then try to store a value that only fits in a long.

A memory leak means that after a program requests memory, it fails to release the memory it requested. The harm of a single leak may be negligible, but leaks accumulate with serious consequences: no matter how much memory there is, sooner or later it will all be used up.

Memory leaks eventually lead to out of memory!

Memory overflow means the memory you ask for exceeds what the system can give you; the system cannot satisfy the request, so it overflows.
A memory leak means you ask the system to allocate memory (new) but do not return it after use (delete), and you can no longer access that memory (perhaps you lost its address), so the system cannot assign it to any other program that needs it. Memory overflow is like a plate that, whatever you do, can hold only 4 pieces of fruit: put 5 on it and one falls on the floor and can no longer be eaten. A stack is another example: pushing onto a full stack causes an overflow, and popping from an empty stack also goes out of bounds, which is called an underflow. In short, when the allocated memory is not large enough to hold the data items, that is called a memory overflow.
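A leak shows up as memory that keeps growing with the amount of work done. The article's examples are C and Java; as a language-neutral sketch, Python's standard `tracemalloc` module can measure how much traced memory is still held after a batch of requests. The `handle_request` function and its module-level `_cache` list are hypothetical and leak on purpose: each call keeps a reference to its buffer, so that memory can never be reclaimed.

```python
import tracemalloc

_cache = []  # hypothetical global that is never cleared: the "leak"

def handle_request(payload_size: int = 10_000) -> int:
    data = bytearray(payload_size)
    _cache.append(data)  # bug: a reference survives after the request ends
    return len(data)

def bytes_retained(n_requests: int) -> int:
    """Traced memory still allocated after running n_requests requests."""
    tracemalloc.start()
    start, _ = tracemalloc.get_traced_memory()
    for _ in range(n_requests):
        handle_request()
    end, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return end - start

if __name__ == "__main__":
    for n in (100, 200, 400):
        print(f"{n} requests -> {bytes_retained(n)} bytes still held")
```

The retained size grows roughly in proportion to the number of requests, which is the signature of a leak; without the `_cache.append` line it would stay near zero.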

5) Memory monitoring items

Calculation method: read the contents of /proc/meminfo, where mem.memfree = MemFree + Buffers + Cached and mem.memused = mem.memtotal - mem.memfree. Refer to the output of the free command and its help documentation for details.

mem.memtotal: total memory size

mem.memused: amount of memory used

mem.memused.percent: percentage of memory used

mem.memfree: amount of free memory

mem.memfree.percent: percentage of memory free

mem.swaptotal: total swap size

mem.swapused: amount of swap used

mem.swapused.percent: percentage of swap used

mem.swapfree: amount of free swap

mem.swapfree.percent: percentage of swap free
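The calculation above can be sketched in a few lines of Python. This assumes the metric names listed here and reports sizes in kB, the unit /proc/meminfo uses; a real collection agent may convert to bytes or MB. The helper names are hypothetical.

```python
from typing import Dict

def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo into {field: value}; values are in kB."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info

def _pct(part: int, whole: int) -> float:
    # Guard against division by zero, e.g. a machine with no swap.
    return round(100.0 * part / whole, 2) if whole else 0.0

def memory_metrics(info: Dict[str, int]) -> Dict[str, float]:
    total = info["MemTotal"]
    free = info["MemFree"] + info["Buffers"] + info["Cached"]
    used = total - free
    swap_total = info["SwapTotal"]
    swap_free = info["SwapFree"]
    swap_used = swap_total - swap_free
    return {
        "mem.memtotal": total,
        "mem.memused": used,
        "mem.memused.percent": _pct(used, total),
        "mem.memfree": free,
        "mem.memfree.percent": _pct(free, total),
        "mem.swaptotal": swap_total,
        "mem.swapused": swap_used,
        "mem.swapused.percent": _pct(swap_used, swap_total),
        "mem.swapfree": swap_free,
        "mem.swapfree.percent": _pct(swap_free, swap_total),
    }

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        for name, value in memory_metrics(parse_meminfo(f.read())).items():
            print(f"{name:<22} {value}")
```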

6) View the current machine's memory information:

# dmidecode | grep -A16 "Memory Device$"

View the size of each memory module:

# dmidecode | grep -A16 "Memory Device$" | grep -i "Size" | grep -v "No"

top -d 1 (refresh every second)

free -m (show values in MB)
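The grep pipeline above can also be written as a small parser, which makes it easy to total the installed capacity. This sketch assumes dmidecode's usual layout (a "Memory Device" header line followed by indented "Size:" lines, with empty slots reported as "No Module Installed"); older versions print sizes in MB and newer ones may print GB, so both are accepted. It runs `dmidecode -t 17` (the Memory Device type), which needs root; `module_sizes_mb` is a hypothetical name.

```python
import re
import subprocess
from typing import List

SIZE_RE = re.compile(r"Size:\s*(\d+)\s*(MB|GB)")

def module_sizes_mb(dmidecode_text: str) -> List[int]:
    """Sizes (MB) of installed modules, one per populated "Memory Device" entry."""
    sizes = []
    for section in re.split(r"\n\s*\n", dmidecode_text):
        lines = [line.strip() for line in section.splitlines()]
        if "Memory Device" not in lines:  # same idea as grep "Memory Device$"
            continue
        for line in lines:
            m = SIZE_RE.fullmatch(line)   # skips "Size: No Module Installed"
            if m:
                value = int(m.group(1))
                sizes.append(value * 1024 if m.group(2) == "GB" else value)
                break
    return sizes

if __name__ == "__main__":
    try:
        out = subprocess.run(["dmidecode", "-t", "17"], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"dmidecode failed (root required?): {exc}")
    else:
        sizes = module_sizes_mb(out)
        print(f"{len(sizes)} modules: {sizes} MB, total {sum(sizes)} MB")
```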

7) Purpose of performance analysis

1. Identify system performance bottlenecks (both hardware and software bottlenecks);

2. Provide options for performance optimization (upgrade the hardware? improve the system architecture?);

3. Achieve a reasonable hardware and software configuration;

4. Achieve the best balance in the use of system resources. (In general, a system runs well when its resources are in balance; overuse of any one resource breaks that balance and leads to very high system load or slow responses. For example, CPU overload makes many processes wait for the CPU, so the system responds slowly; the waiting causes the number of processes to grow, more processes increase memory usage, exhausted memory forces the use of virtual memory, and using virtual memory in turn increases disk I/O and CPU overhead.)

8) Factors that affect performance

1. Memory (when physical memory runs short, swap is used, and using swap costs disk I/O and CPU)

2. Commonly used performance analysis tools: vmstat, top, free, iostat, etc.

9) vmstat in detail

vmstat is a comprehensive performance analysis tool that shows the system's process states, memory usage, virtual memory usage, disk I/O, interrupts, context switches, CPU usage, and more. For Linux performance analysis, if you fully understand the meaning of vmstat's output and can use it flexibly, you have basically mastered the skill of analyzing system performance.

The following is the output of the command vmstat 1 5 (one sample per second, five samples; the first two are shown):

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy  id wa st
 1  0      0 191908 188428 1259328    0    0    10    27   61   92  0  1  98  0  0
 0  0      0 191892 188428 1259360    0    0     0     0  117  226  0  0 100  0  0

The output is interpreted as follows:

1. procs

a. The r column is the number of processes running or waiting for a CPU time slice. If it stays above the number of CPUs for a long time, CPU resources are insufficient and you may consider adding CPUs;

b. The b column is the number of processes blocked waiting for a resource, such as I/O or memory swapping.

2. memory

a. The swpd column is the amount of memory (in KB) swapped out to the swap area. If swpd is nonzero, or even large, but si and so stay at 0 over the long term, there is generally nothing to worry about; system performance is not affected;

b. The free column is the amount of idle physical memory, in KB;

c. The buff column is the amount of memory used as buffers, generally for reads and writes to block devices;

d. The cache column is the amount of memory used as page cache, generally the file system's cache: frequently accessed files are cached here. A large cache value means many files are cached; if bi in the io section is small at the same time, the file system is working efficiently.

3. swap

a. The si column is the amount of memory swapped in per second from the swap area on disk into memory;

b. The so column is the amount of memory swapped out per second from memory to the swap area on disk;

c. Normally si and so are 0. If they stay nonzero for a long time, system memory is insufficient and you should consider adding memory.

4. io

a. The bi column is the amount of data read from block devices (disk reads), in blocks per second; Linux blocks are 1 KB, so this is effectively KB/s;

b. The bo column is the amount of data written to block devices (disk writes), in the same units;

The reference value for bi+bo used here is 1000: if it exceeds 1000 and wa is also high, disk I/O is a system performance bottleneck.

5. system

a. The in column is the number of interrupts per second (including clock interrupts) during the sampling interval;

b. The cs column is the number of context switches per second.

The higher these two values, the more CPU time the kernel consumes.

6. cpu

a. The us column is the percentage of CPU time spent in user processes. A high us value means user processes consume a lot of CPU time; if it stays above 50% for a long time, consider optimizing the program;

b. The sy column is the percentage of CPU time spent in the kernel. A high sy value means the kernel consumes a lot of CPU time; if us+sy exceeds 80%, CPU resources are insufficient;

c. The id column is the percentage of time the CPU is idle;

d. The wa column is the percentage of CPU time spent waiting for I/O. The higher wa, the more serious the I/O wait; above 20%, the I/O wait is severe;

e. The st column is the percentage of time stolen from this virtual machine by the hypervisor (reported since Linux 2.6.11); it is generally not a concern.
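These rules of thumb can be turned into a simple check. The sketch below parses vmstat output in the column layout shown earlier and reports a problem only if it holds in every sample, as a rough stand-in for "over the long term". That interpretation, the reading of "wa is also high" as wa &gt; 20, and all function names are assumptions.

```python
import os
import subprocess
from typing import Dict, List

Row = Dict[str, int]

def parse_vmstat(text: str) -> List[Row]:
    """Turn vmstat output into one dict per sample, keyed by column name."""
    cols: List[str] = []
    rows: List[Row] = []
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["r", "b"]:          # column-name header line
            cols = fields
        elif cols and len(fields) == len(cols):
            try:
                rows.append(dict(zip(cols, map(int, fields))))
            except ValueError:                # not a numeric sample line
                continue
    return rows

# (description, predicate) pairs taken from the interpretation above.
RULES = [
    ("r > number of CPUs: not enough CPU", lambda s, n: s["r"] > n),
    ("si/so nonzero: not enough memory, the system is swapping",
     lambda s, n: s["si"] > 0 or s["so"] > 0),
    ("bi+bo > 1000 and wa > 20: disk I/O bottleneck",
     lambda s, n: s["bi"] + s["bo"] > 1000 and s["wa"] > 20),
    ("us > 50%: user processes use a lot of CPU", lambda s, n: s["us"] > 50),
    ("us+sy > 80%: not enough CPU", lambda s, n: s["us"] + s["sy"] > 80),
    ("wa > 20%: severe I/O wait", lambda s, n: s["wa"] > 20),
]

def sustained_problems(rows: List[Row], ncpu: int) -> List[str]:
    """Descriptions of the rules that hold in every sample."""
    if not rows:
        return []
    return [desc for desc, pred in RULES if all(pred(s, ncpu) for s in rows)]

if __name__ == "__main__":
    try:
        out = subprocess.run(["vmstat", "1", "5"], capture_output=True,
                             text=True).stdout
    except OSError as exc:
        print(f"cannot run vmstat: {exc}")
    else:
        samples = parse_vmstat(out)[1:]  # the first line averages since boot
        for problem in sustained_problems(samples, os.cpu_count() or 1):
            print(problem)
```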

10) Memory monitoring items must include swap

1. First, what is a swap partition and what does it do?

The role of a swap partition (swap space) can be described simply: when the system runs short of physical memory, some of the space in physical memory must be freed for the programs currently running.

The freed space usually comes from programs that have not been active for a long time. Their data is temporarily saved in swap space, and when those programs run again, the saved data is restored from swap into memory.

In this way, the system swaps whenever physical memory runs short. In fact, swap tuning is critical to the performance of Linux servers, especially web servers. By tuning swap, you can sometimes work around a system performance bottleneck and save the cost of a system upgrade.

Allocating too much swap space wastes disk space, while too little swap space causes system errors.

If the system runs out of physical memory, it runs slowly but still runs; if swap space is exhausted, errors occur.

For example, a web server may spawn multiple service processes (or threads) depending on the number of requests. If swap space is exhausted, new service processes cannot start, an "application is out of memory" error usually appears, and in severe cases the service processes deadlock.

Therefore, the allocation of swap space is very important.

Typically, swap space should be greater than or equal to the size of physical memory and no smaller than 64 MB; a common size is 2 to 2.5 times physical memory.

However, different applications call for different configurations: a small desktop system needs only a small swap space, while a large server system needs a swap size that depends on its situation.

In particular, for database servers and web servers, swap requirements grow with traffic; see the server product's documentation for the specific configuration.

In addition, the number of swap partitions has a significant impact on performance. Because swap operations are disk I/O operations, if there are multiple swap areas with the same priority, swap space is allocated across them in round-robin fashion, which greatly balances the I/O load and speeds up swapping.

If there is only one swap area, all swap operations keep that area very busy, leaving the system waiting most of the time and running inefficiently. A performance monitoring tool will show that the CPU is not very busy, yet the system is slow. This means the bottleneck is I/O, and a faster CPU cannot solve the problem.

Having seen all this, consider that on forums some people say their machines have so much memory that they don't need a swap partition. But if a job needs 10 machines and, by using swap partitions properly, 8 machines can handle it, why not do that?

Now let's talk about optimizing swap partitions:

1. First, try to use partitions rather than files, and remember that unless

2. If the swap space is too small, you can add another swap partition yourself.

3. Pay special attention to the partitions with smaller partition numbers.

4. Swap partitions can be spread across different devices so that they are used round-robin.

5. If there is more than one swap partition, you can also assign priorities so that the better-performing partition is used first.

Note how this is written in the configuration file /etc/fstab (the higher the number, the higher the priority; you can also use swapon -p to set it):

/dev/hda1 swap swap defaults,pri=10 0 0

/dev/hda5 swap swap defaults,pri=5 0 0

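6. An important parameter:

sysctl -a | grep vm.swa

Among the Linux kernel tuning parameters there are a few special values, and vm.swappiness is one of them: it is not a specific percentage but a tendency. The closer it is to 0, the more the kernel prefers to reclaim page cache rather than swap out process memory; the closer to 100, the more readily it swaps. It only expresses a trend. The default is 60; for database servers a low value is generally recommended so that database memory is not swapped out.

7. Two values that generally do not need tuning:

vm.swap_token_timeout = 300 (a time interval, in seconds; this parameter exists only on older kernels)

vm.page-cluster = 3 (the number of consecutive pages swapped in one operation is 2^3, i.e. 2^3 * 4 KB = 32 KB)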
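The swap layout and the parameters above can be checked from /proc. The sketch below parses /proc/swaps (columns Filename, Type, Size, Used, Priority) and lists areas from highest to lowest priority, reads vm.swappiness and vm.page-cluster from /proc/sys/vm, and computes the page-cluster size as 2^page-cluster pages, assuming a 4 KB page. The helper names are hypothetical.

```python
import os
from typing import Dict, List

def parse_swaps(text: str) -> List[Dict]:
    """Parse /proc/swaps; return swap areas sorted by priority, highest first."""
    areas = []
    for line in text.splitlines()[1:]:      # skip the column-header line
        parts = line.split()
        if len(parts) >= 5:
            areas.append({"name": parts[0], "type": parts[1],
                          "size_kb": int(parts[2]), "used_kb": int(parts[3]),
                          "priority": int(parts[4])})
    return sorted(areas, key=lambda a: a["priority"], reverse=True)

def read_vm(name: str, root: str = "/proc/sys/vm") -> int:
    """Read an integer parameter such as swappiness or page-cluster."""
    with open(os.path.join(root, name)) as f:
        return int(f.read().split()[0])

def page_cluster_bytes(page_cluster: int, page_size: int = 4096) -> int:
    """Bytes in 2**page_cluster consecutive pages."""
    return (2 ** page_cluster) * page_size

if __name__ == "__main__":
    try:
        with open("/proc/swaps") as f:
            for a in parse_swaps(f.read()):
                print(f"{a['name']:<20} prio={a['priority']:>3} used={a['used_kb']} kB")
        print("vm.swappiness =", read_vm("swappiness"))
        pc = read_vm("page-cluster")
        print(f"vm.page-cluster = {pc} -> {page_cluster_bytes(pc) // 1024} KB")
    except OSError as exc:
        print(f"cannot read /proc: {exc}")
```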

Now let's talk about memory optimization:

Memory allocation on Linux is managed mainly through kernel parameters:

1. Tunable parameters related to memory capacity

The following parameters are located in the /proc/sys/vm/ directory of the proc file system.

overcommit_memory: specifies the policy for deciding whether to accept large memory requests. This parameter has three possible values:

* 0: the default. The kernel performs heuristic overcommit handling, estimating the amount of available memory and rejecting requests that are obviously too large. Because memory is allocated by heuristic rather than precise calculation, this setting can sometimes overload the memory available in the system. Obviously excessive requests fail with an error.

* 1: the kernel performs no overcommit handling. This setting increases the risk of overloading memory, but it can also improve performance for memory-intensive tasks. Memory is allocated to applications as they need it, and overcommitting is allowed.

* 2: the kernel rejects requests for memory equal to or larger than the total available swap plus the percentage of physical RAM specified by overcommit_ratio. This setting is best if you want to reduce the risk of overcommitting memory. With the default ratio, the limit is swap + RAM * 50%.

Note: this setting is recommended only for systems whose swap area is larger than their physical memory.

overcommit_ratio: specifies the percentage of physical RAM considered when overcommit_memory is set to 2. The default is 50.
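With overcommit_memory = 2, the limit described above can be computed directly. This sketch uses the simplified formula CommitLimit = swap + RAM * overcommit_ratio / 100; the kernel also subtracts huge-page memory from RAM and supports overcommit_kbytes as an alternative to the ratio, both of which the sketch ignores. On a running system the kernel's own value appears as CommitLimit in /proc/meminfo. The helper names are hypothetical.

```python
def commit_limit_kb(mem_total_kb: int, swap_total_kb: int,
                    overcommit_ratio: int = 50) -> int:
    """Simplified commit limit (kB) used when overcommit_memory = 2."""
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100

def meminfo_kb(field: str, path: str = "/proc/meminfo") -> int:
    with open(path) as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

if __name__ == "__main__":
    with open("/proc/sys/vm/overcommit_ratio") as f:
        ratio = int(f.read())
    estimate = commit_limit_kb(meminfo_kb("MemTotal"), meminfo_kb("SwapTotal"), ratio)
    print(f"estimated CommitLimit: {estimate} kB")
    print(f"kernel CommitLimit:    {meminfo_kb('CommitLimit')} kB")
```

For example, 8 GB of RAM and 2 GB of swap with the default ratio give 2 GB + 8 GB * 50% = 6 GB, less than the RAM itself, which illustrates why this mode is recommended only when swap is larger than RAM.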

2. Out-of-memory kill tunable parameters

Out of memory (OOM) refers to the state in which all available memory, including swap space, has been allocated. If /proc/sys/vm/panic_on_oom is set to 1, this state makes the system panic and stop working as expected. With panic_on_oom set to 0 (the default), the kernel instead invokes the oom_killer function when an OOM occurs. The oom_killer can usually kill the offending process and let the system keep working normally.

The following parameter can be set per process to give you more control over which processes the oom_killer kills. It is located in the /proc/PID/ directory of the proc file system, where PID is the process ID. oom_adj defines a value from -16 to 15 that helps determine a process's oom_score; the higher the oom_score, the more likely the process is to be killed by the oom_killer. Setting oom_adj to -17 disables the oom_killer for that process. (On newer kernels oom_adj is deprecated in favor of oom_score_adj, which ranges from -1000 to 1000, with -1000 disabling OOM killing for the process.)

Note: any process spawned by an adjusted process inherits that process's OOM adjustment. For example, if the sshd process is protected from the oom_killer, all processes started from SSH sessions are protected too. This can reduce the oom_killer's ability to rescue the system when an OOM occurs.
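To see which processes the oom_killer would pick first, the per-process values can be read from /proc. This sketch reads oom_score and the newer oom_score_adj file rather than the deprecated oom_adj; the function name `oom_candidates` is hypothetical. Protecting a service works through the same files (for example writing -1000 to /proc/PID/oom_score_adj), and lowering the value requires privileges (CAP_SYS_RESOURCE).

```python
import os
from typing import List, Tuple

def oom_candidates() -> List[Tuple[int, int, int, str]]:
    """(oom_score, oom_score_adj, pid, comm) for every process, highest score first."""
    result = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/oom_score") as f:
                score = int(f.read())
            with open(f"/proc/{pid}/oom_score_adj") as f:
                adj = int(f.read())
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
        except (OSError, ValueError):
            continue  # the process exited or its files are not readable
        result.append((score, adj, int(pid), comm))
    result.sort(reverse=True)
    return result

if __name__ == "__main__":
    for score, adj, pid, comm in oom_candidates()[:10]:
        print(f"score={score:>5} adj={adj:>5} pid={pid:>7} {comm}")
```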

Reference:

http://blog.51cto.com/asinego/1905622

This article is an English version of an article originally published in Chinese on aliyun.com.