Why multi-threaded programs on Linux consume so much virtual memory [repost]

Source: Internet
Author: User
Tags: valgrind, server, memory

Transferred from: http://blog.csdn.net/chen19870707/article/details/43202679

Rights statement: This is an original article by the blog author; please do not reproduce it without the author's permission.

Directory

    1. Explore
    2. An epiphany
    3. Getting to the bottom of it
    4. Accidental discovery
    • Author: echo Chen (Chenbin)
    • Email: [email protected]
    • Blog: blog.csdn.net/chen19870707
    • Date: Jan. 27th, 2015

Recently our game went into live operation and we were optimizing server memory when I found a very strange problem. Our authentication server (AuthServer) is responsible for talking to the third-party channel SDKs (login and recharge). Because it uses curl in blocking mode, it opens 128 threads. The strange thing is that every time it starts, its virtual memory is 2.3 GB, and then every processed message adds another 64 MB, until it reaches 4.4 GB and stops growing. Since we use pre-allocation, the threads do not allocate large blocks of memory themselves, so where does this memory come from? It was baffling.

1. Explore

First of all I wanted to rule out a memory leak; it cannot be a coincidence that a leak is exactly 64 MB every time. To prove my point, I first used valgrind:

   valgrind --leak-check=full --track-fds=yes --log-file=./authserver.vlog ./authserver &

Then I started the test and ran it until the memory stopped growing, and sure enough valgrind showed no memory leaks. I repeated the trial many times with the same result.

After several rounds of valgrind, I began to wonder whether the program was using mmap or similar calls directly, so I used strace to trace system calls such as mmap and brk:

   strace -f -e "brk,mmap,munmap" -p $(pidof authserver)

The results are as follows:

   [pid 19343] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f53c8ca9000
   [pid 19343] munmap(0x7f53c8ca9000, 53833728) = 0
   [pid 19343] munmap(0x7f53d0000000, 13275136) = 0
   [pid 19343] mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f53d04a8000
   Process 19495 attached

Checking the trace file, I did not find many mmap calls allocating large amounts of memory, and the growth caused by brk calls was small as well. So that direction felt like a dead end. I then suspected that the file cache was taking up virtual memory, commented out all the log reading and writing code, and the virtual memory still kept growing, which ruled out that possibility too.

2. An epiphany

Later I reduced the number of threads and kept testing, and during one test I accidentally found a very strange phenomenon: if a process creates a thread and allocates as little as 1 KB of memory inside that thread, the whole process's virtual memory immediately grows by 64 MB; subsequent allocations do not increase it further. The test code is as follows:

   #include <iostream>
   #include <stdio.h>
   #include <stdlib.h>
   #include <unistd.h>
   #include <pthread.h>

   using namespace std;

   volatile bool start = 0;

   // Waits until main() sets the flag, then performs a single 1 KB allocation.
   void* thread_run(void*)
   {
       while (1)
       {
           if (start)
           {
               cout << "thread malloc" << endl;
               char *buf = new char[1024];
               start = 0;
           }
           sleep(1);
       }
       return 0;
   }

   int main()
   {
       pthread_t th;

       // Two getchar() calls: check the baseline memory before the thread exists.
       getchar();
       getchar();
       pthread_create(&th, 0, thread_run, 0);

       // Every subsequent keystroke triggers one 1 KB allocation in the thread.
       while (getchar())
       {
           start = 1;
       }

       return 0;
   }

The running results: at the beginning the process uses 14 MB of virtual memory. Entering 0 creates the child thread and the process memory reaches 23 MB; this increase of roughly 10 MB is the thread stack (the thread stack size can be viewed and set with ulimit -s). Then entering 1 makes the program allocate 1 KB, and the whole process gains 64 MB of virtual memory; entering 2 and 3 allocates another 1 KB each time, and the memory no longer changes.
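Instead of watching the process from the outside with top or pmap, the jump can also be observed from inside the test program. Below is a minimal sketch, assuming Linux's /proc/self/status interface; the helper is my addition and is not part of the original test code. Calling it immediately before and after the 1 KB allocation in thread_run() prints the VmSize jump directly.

   #include <fstream>
   #include <iostream>
   #include <string>

   // Print the current VmSize line from /proc/self/status (Linux-specific).
   static void print_vmsize(const char *tag)
   {
       std::ifstream status("/proc/self/status");
       std::string line;
       while (std::getline(status, line))
       {
           if (line.compare(0, 7, "VmSize:") == 0)
           {
               std::cout << tag << ": " << line << std::endl;
               break;
           }
       }
   }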

This result made me ecstatic, because I had previously studied Google's tcmalloc, where each thread has its own buffer to avoid contention in multi-threaded memory allocation, and I guessed that the new version of glibc had learned the same trick. So I ran pmap $(pidof main) to look at the memory layout, as follows:

Note the 65404 KB line: all indications are that this block plus the 132 KB line above it make up the added 64 MB (65404 + 132 = 65536 KB = 64 MB). If you increase the number of threads, a new 65404 KB memory block appears for each additional thread.

3. Getting to the bottom of it

After some googling and reading of the code, I finally found that glibc's malloc was the mischief-maker here. glibc 2.11 and later behave this way; see the official Red Hat documentation: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/6.0_release_notes/compiler.html

Red Hat Enterprise Linux 6 features version 2.11 of glibc, providing many features and enhancements, including ... An enhanced dynamic memory allocation (malloc) behaviour enabling higher scalability across many sockets and cores. This is achieved by assigning threads their own memory pools and by avoiding locking in some situations. The amount of additional memory used for the memory pools (if any) can be controlled using the environment variables MALLOC_ARENA_TEST and MALLOC_ARENA_MAX. MALLOC_ARENA_TEST specifies that a test for the number of cores is performed once the number of memory pools reaches this value. MALLOC_ARENA_MAX sets the maximum number of memory pools used, regardless of the number of cores.

The developer, Ulrich Drepper, has a much deeper explanation on his blog: http://udrepper.livejournal.com/20948.html

Before, malloc tried to emulate a per-core memory pool. Every time contention for an existing memory pool was detected a new pool was created. Threads stayed with the last used pool if possible... This never worked 100% because a thread can be descheduled while executing a malloc call. When some other thread tries the memory pool used in the call it would detect contention. A second problem is that if multiple threads on multiple cores/sockets happily use malloc without contention, memory from the same pool is used by different cores/on different sockets. This can lead to false sharing and definitely additional cross traffic because of the meta information updates. There are more potential problems not worth going into here in detail.

The changes which are in glibc now create per-thread memory pools. This can eliminate false sharing in most cases. The meta data is usually accessed only in one thread (which hopefully doesn't get migrated off its assigned core). To prevent the memory handling from blowing up the address space use too much, the number of memory pools is capped. By default we create up to two memory pools per core on 32-bit machines and up to eight memory pools per core on 64-bit machines. The code delays testing for the number of cores (which is not cheap, we have to read /proc/stat) until there are already two or eight memory pools allocated, respectively.

While these changes might increase the number of memory pools which are created (and thus increase the address space they use) the number can be controlled. Because using the old mechanism there could be a new pool created whenever there are collisions, the total number could in theory be higher. Unlikely but true, so the new mechanism is more predictable.

... Memory use is not that much of a premium anymore and most of the memory pools don't actually require memory until they are used, only address space ... We have done internally some measurements of the effects of the new implementation and they can be quite dramatic.

New versions of glibc present in RHEL6 include a new arena allocator design. In several clusters we've seen this new allocator cause huge amounts of virtual memory to be used, since when multiple threads perform allocations, they each get their own memory arena. On a 64-bit system, these arenas are 64M mappings, and the maximum number of arenas is 8 times the number of cores. We've observed a DN process using 14GB of vmem for only 300M of resident set. This causes all kinds of nasty issues for obvious reasons.

Setting MALLOC_ARENA_MAX to a low number will restrict the number of memory arenas and bound the virtual memory, with no noticeable downside in performance - we've been recommending MALLOC_ARENA_MAX=4. We should set this in hadoop-env.sh to avoid this issue as RHEL6 becomes more and more common.

To sum up: to improve memory-allocation performance, glibc uses memory pools called arenas. Under the default 64-bit configuration each arena is 64 MB, and a process can have at most cores * 8 arenas. Assuming your machine has 4 cores, you can have up to 4 * 8 = 32 arenas, i.e. up to 2048 MB of virtual memory. Of course, you can also change the number of arenas by setting an environment variable, for example export MALLOC_ARENA_MAX=1.
Hadoop recommends setting this value to 4. Since the machine is multi-core and arenas were introduced precisely to reduce contention in multi-threaded memory allocation, setting it to the number of CPU cores is also a good choice. After setting this value, it is best to stress test your program to see whether changing the number of arenas affects its performance.
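To make the arithmetic concrete, here is a small illustration of my own (not from the original post) that computes this default upper bound for the machine it runs on:

   #include <unistd.h>
   #include <iostream>

   int main()
   {
       // Default 64-bit glibc policy described above: up to cores * 8 arenas,
       // each reserving 64 MB of address space.
       long cores = sysconf(_SC_NPROCESSORS_ONLN);
       std::cout << cores << " cores -> up to " << cores * 8 * 64
                 << " MB of arena address space" << std::endl;
       return 0;
   }

On the 4-core machine from the example above this prints 2048 MB, matching the 4 * 8 * 64 MB calculation.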

If you want to set this in program code instead, you can call mallopt(M_ARENA_MAX, xxx). Because our AuthServer uses pre-allocation and does not allocate memory inside each thread, this optimization is not needed, so we switch it off at initialization with mallopt(M_ARENA_MAX, 1); setting the value to 0 means the arena count is chosen automatically based on the number of CPUs.
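For reference, a minimal sketch of that initialization call, assuming glibc (mallopt() and M_ARENA_MAX are declared in <malloc.h>):

   #include <malloc.h>   // glibc: mallopt(), M_ARENA_MAX

   int main()
   {
       // Cap glibc at a single arena; do this early, before worker threads
       // are created, so no extra arenas have been set up yet.
       mallopt(M_ARENA_MAX, 1);

       // ... pre-allocate buffers and start the server threads as usual ...
       return 0;
   }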

4. Accidental discovery

Recalling that tcmalloc allocates small objects from the thread's own memory pool while large blocks still come from the central allocation area, I wondered how glibc is designed, so I changed each allocation in the program above from 1 KB to 1 MB. As expected, after the first 64 MB is added, every further allocation still increases memory by 1 MB. It seems the new version of glibc has thoroughly borrowed tcmalloc's ideas.
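For reference, the experiment only changes the allocation size inside thread_run(); the comment about glibc's mmap threshold is my own explanation of the observed behaviour, not something stated in the original post:

   // Inside thread_run(): request 1 MB per keystroke instead of 1 KB.
   // Requests this large are typically served by a separate mmap (glibc's
   // default M_MMAP_THRESHOLD is 128 KB), so each one adds about 1 MB of
   // virtual memory instead of being carved out of the 64 MB arena.
   char *buf = new char[1024 * 1024];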

After several busy days the problem was finally solved, and I was in a good mood. Today's problem taught me that as a server programmer, if you do not understand the compiler and the operating system kernel, you are simply not qualified; I need to strengthen my learning in this area.

-

Echo chen:blog.csdn.net/chen19870707

