Analysis of performance problems caused by frequent memory allocation and release

Last Update:2018-12-03 Source: Internet

Author: User


From: http://blog.csdn.net/sniperhuangwei/article/details/5422016

Symptom
1. During the stress test, the performance of the tested process was not ideal:
System-mode CPU usage of the process was about 20%, user-mode CPU usage was about 10%, and the system was about 70% idle.
2. Running ps -o majflt,minflt -C program showed that majflt grew by 0 per second while minflt grew by more than 10000 per second.

Preliminary analysis
majflt stands for major fault and minflt stands for minor fault.
These two values are the number of page faults that have occurred since the process started.
When a process triggers a page fault, it traps into kernel mode and the kernel performs the following steps:
1. Check whether the virtual address being accessed is valid;
2. Find or allocate a physical page;
3. Fill in the contents of the physical page (read from disk, set it to zero, or do nothing);
4. Establish the mapping from the virtual address to the physical address;
5. Re-execute the instruction that caused the page fault.
If step 3 has to read from disk, the page fault is a major fault (majflt); otherwise it is a minor fault (minflt).
The minflt rate of this process is very high, more than 10000 per second, so it is reasonable to suspect that it is closely related to the kernel-mode CPU consumption of the process.

Analyze code
Checking the code shows that each request uses malloc to allocate 2 MB of memory and frees it when the request ends. The log shows that the allocation statement itself takes about 10 us, only a small part of the average processing time of a request. The cause is found!
Although the allocation statement accounts for little of the time spent on a request, it seriously affects performance. To see why, we first need to understand how memory allocation works.

Principle of Memory Allocation
From the operating system's point of view, a process can obtain memory through two system calls: brk and mmap. brk pushes _edata, the pointer to the highest address of the data segment (.data), toward higher addresses. mmap finds a free region in the virtual address space of the process (usually between the heap and the stack). Both calls allocate virtual memory, not physical memory. The first time the allocated virtual address range is accessed, a page fault occurs, and the operating system then allocates physical memory and establishes the mapping between the virtual memory and the physical memory.

The standard C library provides the malloc and free functions to allocate and release memory. Underneath, these two functions are implemented with the brk, mmap, and munmap system calls.
The following example illustrates the principle of memory allocation:

1. The initial layout of the (virtual) memory space of a process at startup is shown in Figure 1. Files mapped with mmap (such as libc-2.2.93.so and other data files) sit between the heap and the stack; for simplicity, they are omitted here. The _edata pointer (defined in glibc) points to the highest address of the data segment.

2. After the process calls A = malloc(30K), the memory space looks like Figure 2: malloc calls the brk system call and pushes the _edata pointer 30K toward higher addresses, which completes the virtual memory allocation. You may ask: is the memory really allocated just by adding 30K to _edata? In fact, _edata + 30K only completes the allocation of virtual addresses; the memory of A still has no physical pages behind it. When the process reads or writes the memory of A for the first time, a page fault occurs, and only then does the kernel allocate the corresponding physical pages. In other words, if A is allocated with malloc and never accessed, the physical pages for A are never allocated.

3. After the process calls B = malloc(40K), the memory space looks like Figure 3.

4. After the process calls C = malloc(200K), the memory space looks like Figure 4. By default, if the requested size is larger than 128K (adjustable with the M_MMAP_THRESHOLD option), malloc does not push the _edata pointer; instead it uses the mmap system call to allocate a block of virtual memory between the heap and the stack. The main reason is that memory allocated with brk can only be returned to the system after the memory at higher addresses has been released (for example, A cannot be released before B is), whereas memory allocated with mmap can be released independently. There are other trade-offs as well; if you are interested, read the malloc code in glibc.

5. After the process calls D = malloc(100K), the memory space looks like Figure 5.
6. After the process calls free(C), the virtual memory and the physical memory corresponding to C are released together.

7. After the process calls free(B), the result is shown in Figure 7. Neither the virtual memory nor the physical memory of B is released, because there is only one _edata pointer: if it were pushed back, what would happen to the memory of D? The memory of B can be reused, though. If another 40K request arrives at this point, malloc will probably return the memory of B.
8. After the process calls free(D), the result is shown in Figure 8. B and D are joined into a single 140K block of free memory (40K + 100K).
9. By default, when the free memory at the highest addresses of the heap exceeds 128K (adjustable with the M_TRIM_THRESHOLD option), a trim operation is performed to shrink the heap. When free runs in the previous step, malloc finds that the free memory at the top of the heap exceeds 128K, so it trims the heap, as shown in Figure 9.

The truth revealed
With the principle of memory allocation explained, the reason for the high kernel-mode CPU consumption of the tested module is clear: each request calls malloc for a 2 MB block. By default, malloc uses mmap for an allocation of this size and calls munmap to release it when the request ends. Assuming that each request touches six physical pages, each request causes six page faults. Under a load of 2000 requests per second, that is more than 10000 page faults per second. These page faults do not need to read from disk, so they are minor faults (minflt). Page faults are handled in kernel mode, so the process consumes a lot of kernel-mode CPU. The page faults are spread over the whole processing of a request, which is why the allocation statement itself takes so little time (10 us) compared with the processing time of the whole request.

Solution
Change dynamic allocation to static allocation, or have each thread allocate the memory with malloc at startup and keep it in its thread data. However, because of the particular nature of this module, neither static allocation nor allocation at startup is feasible. In addition, the default stack size limit on Linux is only around 10 MB (commonly 8 MB, see ulimit -s), so allocating several MB on the stack is risky.
Prevent malloc from calling mmap to allocate memory, and disable heap trimming.
When the process starts, add the following two calls (declared in <malloc.h>):
mallopt(M_MMAP_MAX, 0); // prevent malloc from calling mmap to allocate memory
mallopt(M_TRIM_THRESHOLD, -1); // disable heap trimming
Effect: after adding these two lines, ps shows that once the load is stable, the increments of both majflt and minflt are 0. The system-mode CPU usage of the process drops from 20% to 10%.

Summary
You can run ps -o majflt,minflt -C program to view the majflt and minflt values of a process. Both are cumulative counts since the process started. When stress testing programs with high performance requirements, it is worth paying special attention to these two values.
If a process uses mmap to map large data files into its virtual address space, we need to pay close attention to majflt, because compared with minflt the performance damage caused by majflt is far worse: a random disk read takes several milliseconds. minflt, by contrast, only affects performance when it occurs in large numbers.

