Analyzing an excessively large Linux memory footprint

Source: Internet
Author: User
Tags: data structures, memory usage, reserved memory

1. Use free -g to view overall memory usage:


2. View the total memory consumed by processes:

ps aux | awk '{sum+=$6} END {print sum/1024}'

(Field 6 is RSS in kB, so the result is in MB.)

It came to about 17 GB in total.
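The awk one-liner above simply sums the RSS column. The snippet below runs the same summation over a small made-up ps sample (the user names, PIDs, and RSS values are illustrative) so the arithmetic is easy to verify; on a live box, pipe real ps output in instead.

```shell
# Sum the RSS column (field 6, in kB) and report MB.
# NR>1 skips the header line; on a real system use:
#   ps aux | awk 'NR>1 {sum+=$6} END {print sum/1024}'
awk 'NR>1 {sum+=$6} END {print sum/1024}' <<'EOF'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 19356 1024 ? Ss 10:00 0:01 /sbin/init
storm 31984 2.3 4.0 912000 2048 ? Sl 10:05 1:12 java -server
EOF
```

With the sample RSS values 1024 kB and 2048 kB, this prints 3 (MB).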


3. View memory allocation:

cat /proc/meminfo

A large share was attributed to Slab. So what is slab?

Slab is a memory allocation mechanism in the Linux kernel. It targets objects that are frequently allocated and released, such as process descriptors. These objects are generally small; if they were allocated and freed directly through the buddy system, the result would be heavy internal fragmentation and slow processing. The slab allocator instead manages memory on a per-object basis: objects of the same type form a class (process descriptors, for example, are one class). When an object is requested, the slab allocator hands out a unit of that size from a slab list; when it is released, the object is returned to the list rather than handed back to the buddy system, which avoids the internal fragmentation. The slab allocator also does not discard released objects: it keeps them cached in memory, so that when the same type of object is requested later it can be reused directly without being re-initialized.

Specifically:

When memory is allocated with the buddy algorithm, at least one page is allocated at a time. But what should happen when the requested size is only tens or hundreds of bytes? How can small memory areas be carved out of a single page, and how can the internal fragmentation caused by such small allocations be avoided?

Linux 2.0's solution was to maintain 13 free-area lists for sizes from 32 bytes to 131,056 bytes. Starting with Linux 2.2, the memory-management developers adopted an allocation scheme called slab, which had been developed as early as 1994 for Sun Microsystems' Solaris 2.4 operating system. The slab scheme was proposed mainly for the following reasons:

1. How kernel memory is allocated depends on the type of data being stored. For example, when a page is assigned to a user-space process, the kernel calls the get_free_page() function and fills the page with zeros. Assigning pages to the kernel's own data structures is not so simple: the memory holding a data structure must be initialized, and reclaimed when it is no longer in use. Slab therefore introduces the concept of an object: an object is a memory area holding a data structure, together with its methods, a constructor and a destructor. The constructor initializes the memory area in which the data structure resides, and the destructor reclaims it. For ease of understanding, you can also think of an object as simply a kernel data structure. To avoid repeatedly initializing the same object, the slab scheme does not discard released objects but keeps them in memory; when the same type of object is requested later, it can be taken from memory without re-initialization. This is the basic idea behind introducing slab in Solaris.

In fact, Linux improved on this slab scheme: it does not require initialization or cleanup when handling these memory areas. For efficiency, Linux does not invoke an object's constructor or destructor but simply sets both function pointers to NULL. The main purpose of introducing slab in Linux is to reduce the number of calls to the buddy algorithm.

2. The kernel tends to use the same memory areas over and over. For example, whenever the kernel creates a new process, it allocates memory areas for the process-related data structures (task_struct, open file objects, and so on); when the process ends, those areas are reclaimed. Because processes are created and destroyed very frequently, earlier versions of Linux spent a lot of time repeatedly allocating and reclaiming these memory areas. Starting with Linux 2.2, frequently used pages are kept in a cache and reused.

3. Memory areas can be classified by how often they are used. For areas expected to be used frequently, a set of dedicated buffers of a specific size can be created, avoiding internal fragmentation. For rarely used areas, a set of general-purpose buffers (such as the power-of-two sizes used in Linux 2.0) can be created; even though this pattern produces some fragmentation, it has little impact on overall system performance.

4. The hardware cache provides another reason to minimize calls to the buddy algorithm: every call to the buddy algorithm "dirties" the hardware cache, which increases the average number of memory accesses.

The slab scheme places objects into buffers (although the English word "cache" is used, it refers to regions of main memory, not the hardware cache). Because how a buffer is organized and managed strongly affects the hardware cache hit rate, a slab buffer is not made up of individual objects directly but of a series of "chunks" (slabs); each slab contains several objects of the same type, each of which is either allocated or free, as shown in Figure 6.12. Broadly speaking, objects come in two kinds: large objects and small objects. A small object is one of which several fit in a single page. For example, an inode structure occupies a bit over 300 bytes, so one page can hold more than 8 inode structures; the inode structure is therefore a small object. In the Linux kernel, objects smaller than 512 bytes are called small objects. In effect, a buffer is a region of main memory divided into multiple blocks; each block is a slab, each slab consists of one or more pages, and what a slab stores is objects.
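A quick way to see how much memory slab is holding is to read the relevant lines of /proc/meminfo. The snippet below greps a captured sample (the numbers are illustrative, not from the machine in this article); on a live system, read the real file instead.

```shell
# Extract the slab totals from /proc/meminfo (values are in kB).
# On a live system run:
#   grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo
grep -E '^(Slab|SReclaimable|SUnreclaim)' <<'EOF'
MemTotal:       32877480 kB
MemFree:         1230572 kB
Slab:           15242060 kB
SReclaimable:   15152128 kB
SUnreclaim:        89932 kB
EOF
```

Slab is the total; SReclaimable and SUnreclaim split it into reclaimable and unreclaimable parts (discussed further below).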

2. View slab cache information with the slabtop command

slabtop showed that the dentry cache accounted for a very large share of memory on this Linux system. So what is the dentry cache?

First, recall that an inode corresponds to a concrete object on the physical disk, while a dentry is an in-memory entity whose d_inode member points to the corresponding inode. You can therefore view a dentry as a link to an index node (inode) in the Linux file system; the index node can be either a file or a directory. The dentry cache is the directory entry cache: Linux maintains it to speed up the handling of directory entry objects, and it records the mapping from directory entries to inodes.
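To put a rough number on what slabtop shows, you can multiply the object count by the object size from the dentry line of /proc/slabinfo (field 3 is num_objs, field 4 is objsize in bytes). The line below is a made-up sample with hypothetical counts; on a live system use sudo grep '^dentry' /proc/slabinfo.

```shell
# Estimate the dentry cache footprint: num_objs * objsize, reported in MB.
echo 'dentry 78000000 78000000 192 21 1 : tunables 0 0 0 : slabdata 3714286 3714286 0' |
awk '{printf "%.0f MB\n", $3 * $4 / 1048576}'
```

For this sample (78 million objects of 192 bytes each) that is about 14282 MB, i.e. roughly 14 GB held by dentries alone.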

3. Analysis combined with the Storm service

This server is a node in a Storm cluster, so the Storm worker processes were the first suspects. Running strace on a Storm worker process showed very frequent stat system calls, always on freshly generated file names:

[@storm-yd8325 ~]# strace -fp 31984 -e trace=stat

Observing further, the Storm worker process frequently creates, opens, closes, and deletes heartbeat files in a local directory, with a new file name every second:

[@storm-yd8325 ~]# sudo strace -fp 31984 -e trace=open,stat,close,unlink

Summary: the Storm process's frequent file I/O operations caused dentry_cache to occupy too much of the system's memory.
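The access pattern that strace revealed can be reproduced in miniature. Each loop iteration below touches a brand-new path, and every unique path ever looked up leaves a dentry behind in the cache (after the unlink, a negative one). This is only an illustrative sketch of the workload, not Storm's actual heartbeat code.

```shell
# Create, stat, and unlink a fresh file name each second-like tick,
# the way the Storm worker cycles its heartbeat files.
dir=$(mktemp -d)
for i in $(seq 1 100); do
    f="$dir/heartbeat-$i"
    touch "$f"              # create + open/close
    stat "$f" > /dev/null   # the frequent stat calls seen in strace
    rm "$f"                 # unlink; the dentry lingers in the cache
done
rmdir "$dir"
echo "cycled 100 heartbeat files"
```

At one new file name per second, a worker generates tens of thousands of distinct dentries per day, which is how the cache balloons.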

4. The system's automatic slab cache reclamation

In the slab cache, objects are divided into SReclaimable (reclaimable) and SUnreclaim (unreclaimable), and most objects in the system are reclaimable. The kernel has a parameter that automatically triggers reclamation once system memory usage reaches a certain point. The kernel parameter:

vm.min_free_kbytes = 836787

1) It represents the minimum amount of free memory the system keeps in reserve.

At system initialization a default value is calculated from the memory size, using the rule:

min_free_kbytes = sqrt(lowmem_kbytes * 16) = 4 * sqrt(lowmem_kbytes)

(Note: lowmem_kbytes can be thought of as the system memory size.)

In addition, the calculated value is clamped: the minimum is 128 KB and the maximum is 64 MB.

As you can see, min_free_kbytes does not grow linearly with system memory: as memory grows, there is no need to set aside proportionally more, only enough to cover emergencies.
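Under that rule, the default for a few common memory sizes can be computed directly. The sketch below applies the 4 * sqrt(lowmem_kbytes) formula with the 128 kB / 64 MB clamp described above.

```shell
# Default min_free_kbytes for 1 GB, 8 GB, and 32 GB of lowmem,
# per min_free_kbytes = 4 * sqrt(lowmem_kbytes), clamped to [128, 65536].
for lowmem_kbytes in 1048576 8388608 33554432; do
    awk -v k="$lowmem_kbytes" 'BEGIN {
        v = 4 * sqrt(k)
        if (v < 128)   v = 128
        if (v > 65536) v = 65536
        printf "%9d kB lowmem -> min_free_kbytes %.0f\n", k, v
    }'
done
```

For 1 GB of lowmem this gives 4096 kB; note that even 32 GB yields only about 23 MB, illustrating the square-root growth.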

2) The main purpose of min_free_kbytes is to compute the three watermarks that govern memory reclamation: watermark[min/low/high].

(1) watermark[high] > watermark[low] > watermark[min], with one set per zone.

(2) When the system's free memory drops below watermark[low], the kernel thread kswapd begins reclaiming memory (there is one kswapd per NUMA node) and stops once the zone's free memory reaches watermark[high]. If upper-level applications request memory so fast that free memory falls to watermark[min], the kernel performs direct reclaim: memory is reclaimed directly in the context of the requesting process, and the freed pages are then used to satisfy its request. The application is therefore effectively blocked, which adds response latency and may trigger the system OOM killer. The memory below watermark[min] is reserved by the system for special uses and is not handed out to ordinary user-space applications.

(3) How the three watermarks are calculated:

watermark[min] = min_free_kbytes converted into pages; call this min_free_pages. (Because each zone has its own set of watermark parameters, the actual value is computed per zone, in proportion to that zone's share of total memory, giving a per-zone min_free_pages.)

watermark[low]  = watermark[min] * 5/4
watermark[high] = watermark[min] * 3/2

So the buffer between adjacent lines is high - low = low - min = per_zone_min_free_pages * 1/4. And since min_free_kbytes = 4 * sqrt(lowmem_kbytes), the buffer amount likewise grows with the square root of the memory size.
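Applying those ratios to the vm.min_free_kbytes value from above (836787 kB, assuming a 4 kB page size, and ignoring the per-zone proportioning for simplicity) gives:

```shell
# Derive the three watermarks (in pages) from min_free_kbytes.
awk 'BEGIN {
    min_free_kbytes = 836787
    min  = min_free_kbytes / 4        # kB -> 4 kB pages
    low  = min * 5 / 4
    high = min * 3 / 2
    printf "min=%.0f low=%.0f high=%.0f buffer=%.0f pages\n", min, low, high, min / 4
}'
```

The buffer of roughly 52k pages (about 204 MB) is how much kswapd reclaims between waking at low and stopping at high.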

(4) The watermarks of each zone can be viewed in /proc/zoneinfo.

For example:

Node 0, zone      DMA
  pages free     3960
        min      ...
        low      ...
        high     97

3) The impact of the min_free_kbytes value

The larger min_free_kbytes is, the higher the watermark lines, and the buffer between the three lines grows correspondingly. This means kswapd starts earlier and reclaims more memory (stopping only at watermark[high]), so the system holds back more free memory and, to some extent, reduces the amount available to applications. In the extreme, if min_free_kbytes is set close to the total memory size, so little memory is left for applications that OOM kills may occur frequently.

If min_free_kbytes is set too small, the system reserves too little memory. During reclamation, kswapd itself needs small amounts of memory, and the PF_MEMALLOC flag allows it to dip into the reserve; likewise, a process selected and killed by the OOM killer may need to allocate memory while exiting and is allowed to use the reserve. In both cases, letting them use reserved memory prevents the system from deadlocking.

5. The final conclusion

In the end, the slab cache occupancy turned out to be normal: when free memory reaches the system's minimum free-memory threshold, the kswapd process is automatically triggered to reclaim it. This is expected behavior.

Note: in testing, when min_free_kbytes was adjusted to a value larger than the system's current free memory, the kswapd process did wake from sleep into the running state and began reclaiming memory.

Based on the Storm service we run, we made the following adjustment:

vm.vfs_cache_pressure = 200

This parameter controls the kernel's tendency to reclaim the memory used for the directory and inode caches. At the default value of 100, the kernel keeps the dentry and inode caches at a reasonable proportion relative to the page cache and swap cache. Lowering the value below 100 makes the kernel prefer to retain the dentry and inode caches; raising it above 100 makes the kernel prefer to reclaim them.
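To apply the adjustment, the standard sysctl workflow can be used (shown here as a sketch; whether to persist it in /etc/sysctl.conf depends on your setup):

```shell
# Apply at runtime (takes effect immediately, lost on reboot):
sudo sysctl -w vm.vfs_cache_pressure=200

# Persist across reboots:
echo 'vm.vfs_cache_pressure = 200' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Verify the current value:
cat /proc/sys/vm/vfs_cache_pressure
```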

