Cgroup: Linux Memory Resource Management


Hi, this is Zorro. I post articles on my Weibo from time to time; if you are interested, feel free to follow me there.

In addition, my other contact information:

Email: [email protected]

QQ: 30007147


Before we talk about the memory limits of cgroup, we first need to cover some background:

Linux Memory Management Fundamentals: the free Command

From any point of view, Linux memory management is a mess. We could also describe it as a pile, a heap, or a basketful of things, but no word fits it as perfectly as "a lump". Before I understood it, I would not have believed that "subtle" and "nauseating" could describe the same thing at the same time, yet to me they do. I mention this only as a cushion, so we understand that what we are about to discuss is not a tidy, systematic body of knowledge, and learning it really is troublesome. Even before writing this article I pondered for a long time over where to start, and in the end I could not come up with anything less mundane than, helplessly, starting from the free command:

    [[email protected] ~]# free
                 total       used       free     shared    buffers     cached
    Mem:     131904480    6681612  125222868          0     478428    4965180
    -/+ buffers/cache:    1238004  130666476
    Swap:      2088956          0    2088956

Almost everyone who has ever used Linux has run this command, yet the more common a command is, the fewer people (proportionally, I mean) seem to truly understand it. In general, understanding of its output can be divided into the following stages:

    1. Whoa, so much memory is used, more than 6 GB! But I'm not running anything! Why is that? Linux sure eats memory.
    2. Hmm, to my professional eye, only a bit more than 1 GB is really used, and plenty of memory remains available. Buffers/cache take up quite a lot, which means some processes in the system have been reading and writing files, but that doesn't matter; that memory is only borrowed while it is idle.
    3. That's just what free displays, okay? What? You ask me whether this much memory is enough? Of course I don't know! How do I know how your programs are written?

If your understanding is at the first stage, please go and read up on Linux buffers/cache. If you are at the second stage, well, you are already an old hand, but a reminder: when God closes a door for you, he may also set a dog on you. Yes, Linux's strategy is that memory is there to be used, not to be looked at. But using it is not free of cost; what the cost is, you should be able to work out from your understanding of buffers/cache. In general I agree with the third attitude: the output of free alone usually cannot determine anything of value, and we need to combine the business scenario with other output to judge the problem at hand. Of course, someone who answers this way may give the first impression of being a layman, or may really be one.
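
To make the second-stage reading concrete, here is a minimal sketch of the arithmetic behind the "-/+ buffers/cache" row, using the numbers from the output above (this assumes an older procps whose free output has the same columns):

    # "used" minus buffers and cached = memory really held by applications:
    #   6681612 - 478428 - 4965180 = 1238004     (the "-/+" used column)
    # "free" plus buffers and cached = memory effectively available:
    #   125222868 + 478428 + 4965180 = 130666476 (the "-/+" free column)
    free | awk '/^Mem:/ {print "app used:", $3-$6-$7, "available:", $4+$6+$7}'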

In any case, the free command does give us some useful information: the total amount of memory, how much is free, how much is used by buffers/cache, how much swap is used; with other options it can show more, which I will not list here. This raises the next questions: what is a buffer? What is the cache? What is swap? Which leads us directly to another command:

    [[email protected] ~]# cat /proc/meminfo
    MemTotal:       131904480 kB
    MemFree:        125226660 kB
    Buffers:           478504 kB
    Cached:           4966796 kB
    SwapCached:             0 kB
    Active:           1774428 kB
    Inactive:         3770380 kB
    Active(anon):      116500 kB
    Inactive(anon):      3404 kB
    Active(file):     1657928 kB
    Inactive(file):   3766976 kB
    Unevictable:            0 kB
    Mlocked:                0 kB
    SwapTotal:        2088956 kB
    SwapFree:         2088956 kB
    Dirty:                336 kB
    Writeback:              0 kB
    AnonPages:          99504 kB
    Mapped:             20760 kB
    Shmem:              20604 kB
    Slab:              301292 kB
    SReclaimable:      229852 kB
    SUnreclaim:         71440 kB
    KernelStack:         3272 kB
    PageTables:          3320 kB
    NFS_Unstable:           0 kB
    Bounce:                 0 kB
    WritebackTmp:           0 kB
    CommitLimit:     68041196 kB
    Committed_AS:      352412 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:       493196 kB
    VmallocChunk:   34291062284 kB
    HardwareCorrupted:      0 kB
    AnonHugePages:      49152 kB
    HugePages_Total:        0
    HugePages_Free:         0
    HugePages_Rsvd:         0
    HugePages_Surp:         0
    Hugepagesize:        2048 kB
    DirectMap4k:       194816 kB
    DirectMap2M:      3872768 kB
    DirectMap1G:    132120576 kB

What do all of these fields mean?

In fact, the answer to this question is also the answer to another question: how does Linux use memory? It is necessary to understand this, because only when we know how Linux uses memory can we understand how memory can be limited, and what problems may follow once a limit is imposed. Along the way, we will clarify the meaning of several common concepts.

Memory is a relatively limited resource, so when the kernel considers how to manage it, it mainly starts from these two questions:

    1. What to do when memory is sufficient?
    2. What if I don't have enough memory?

When memory is sufficient, the core idea is to maximize resource utilization and thereby improve the overall responsiveness and throughput of the system. In this role, memory serves as a big buffer between the CPU and I/O. To implement this function, the kernel designed the following caching systems:

Buffers/cached

Buffer and cache are two terms that are thrown around everywhere in computing and can mean different things in different contexts. In the context of memory management we need to be precise: the buffer here is the Linux buffer cache, and the cache here is the Linux page cache. Historically, one (buffer) was used as a write cache for I/O devices, while the other (cache) was used as a read cache for I/O devices, where "I/O device" mainly meant block device files and ordinary files on a file system. Today, their meanings are different. In the current kernel, the page cache is, as the name says, a cache for memory pages: plainly put, any memory that is managed in units of pages can use the page cache as its cache. Of course, not all memory is managed in pages; some is managed in units of blocks, and that memory uses the buffer cache when caching is needed. (From this point of view, wouldn't it be better to rename the buffer cache the block cache?) Not all blocks have a fixed length, however: the block size on a system is determined mainly by the block device in use, whereas the page size is 4 KB on x86, whether 32-bit or 64-bit.
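
As a rough illustration of which pool grows when, here is a minimal sketch; /dev/sda and the file path are placeholders, and the exact accounting can vary by kernel version:

    # Reading a raw block device typically populates the buffer cache;
    # watch the "buffers" column of free grow:
    dd if=/dev/sda of=/dev/null bs=1M count=100
    free

    # Reading a regular file populates the page cache;
    # watch the "cached" column grow instead:
    dd if=/path/to/a/large/file of=/dev/null bs=1M count=100
    free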

Having understood the difference between these two caching systems, it is essential to understand what each of them can actually do.

What is page cache

The page cache is used primarily as a cache for file data on a file system, especially when a process performs read/write operations on files. If you think about it: a system call that can map files into memory, mmap, should naturally use the page cache as well, shouldn't it? And if you think about it further: will malloc use the page cache?

These questions are left for you to consider on your own; this document will not give a standard answer.

In the current implementation, the page cache is also used to cache other file types, including block device files, so in practice the page cache handles most of the caching of block devices as well.

What is buffer cache

The buffer cache is designed to be used by subsystems that cache data in units of blocks when the system reads from and writes to block devices. However, since the page cache now also handles caching of block device file reads and writes, the buffer cache today is responsible for relatively few tasks. It is used mainly when operations are performed directly on blocks, for example when we format a file system.

In general, the two caching systems work together. For example, when we write to a file, the contents of a page in the page cache are changed, and the buffer cache can be used to mark which buffers within that page were modified. That way, when the kernel later writes back the dirty data (writeback), it does not have to write the whole page back; it only writes back the modified parts.

People with experience of large-scale systems know that a cache is like a balm: wherever there is a speed-mismatch bottleneck, you can smear some on. But one of its costs is maintaining data consistency. The memory caches are no exception: the kernel must keep them consistent, and the overall efficiency of the caching system decreases as dirty data is produced faster or in larger amounts, because writing dirty data back also consumes I/O. A related effect shows up in this situation: free reports a large amount of memory used by buffers/cache, yet that memory is treated as reclaimable. In the general case it is true that if a process requests memory at this point, the kernel will hand the memory occupied by buffers/cache over to it. However, the cost is that before allocating this portion of memory, the kernel first writes back the dirty data on it, ensuring data consistency before it is emptied and given to the process. If your process suddenly requests a lot of memory at a moment when your workload is constantly producing a lot of dirty data (logs, say) and the system has not written it back in time, then the system will be very slow to allocate memory to the process, and system I/O will be very high. So, at such a moment, do you still think buffers/cache can be counted as free memory?

Study questions: When does Linux write dirty data back to external devices? How can this process be intervened in manually?
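
As a hint toward the study questions, here is a minimal sketch of observing and nudging writeback by hand; the sysctl names are the standard vm.* knobs, and the listing is not exhaustive:

    # How much dirty data is waiting, and how much is being written back:
    grep -E 'Dirty|Writeback' /proc/meminfo

    # The main knobs controlling when the kernel starts writeback
    # (percentages of memory, and ages in centiseconds):
    sysctl vm.dirty_background_ratio vm.dirty_ratio \
           vm.dirty_expire_centisecs vm.dirty_writeback_centisecs

    # Force all dirty data to be written back right now:
    sync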

This again shows that, given the complexity of memory management, we must evaluate the numbers reported by system monitoring commands together with the state of the applications on the system; that is the correct way to evaluate. Otherwise, you can easily arrive at conclusions like "the Linux system sucks!", when perhaps it is actually the application you run on the system that sucks and causes the problem.

Next: what to do when memory is not enough?

We have actually already analyzed one state of memory shortage: the state in which buffers/cache have eaten almost all of the memory. Given how Linux uses memory, that state does not count as insufficient memory, although it does drive I/O up. But let us go further: suppose the system has already reclaimed enough buffers/cache to give out, and processes are still clamoring for memory. What then?

At this point the kernel resorts to a series of measures to keep processes running as normally as possible.

Note that what I am describing here is an abnormal state! I stress this because many people treat memory being completely full as a normal state. They believe that when their business processes push memory usage right up to the pressure boundary, the system is still obliged to keep those processes running normally! That is clearly barking up the wrong tree. Let me also emphasize one more point: the system provides the mechanisms and means of memory management, but whether memory is used well is mainly the business processes' affair; the responsibility must not be turned upside down.

Who should SWAP?

The first mechanism is swap. Swap is an exchange technique: when memory is insufficient, we can designate a disk, a partition, or a file as swap space, move some temporarily unused memory data out to it, and thereby free memory resources for processes with urgent needs.

What data might be swapped out? Conceptually, if a piece of in-memory data is accessed frequently, it should not be swapped out to an external device, because swapping such data out would severely degrade system responsiveness. So memory management distinguishes memory into active and inactive; on top of that, the user-space memory mappings a process uses are either file mappings (file) or anonymous mappings (anon), which gives us Active(anon), Inactive(anon), Active(file), and Inactive(file). What? You ask what file mappings and anonymous mappings are? Well, put simply, anonymous mappings are mainly memory a process requests via malloc or via mmap with MAP_ANONYMOUS, while file mappings are files on a file system mapped with mmap; such files include both ordinary files and files on a temporary file system (tmpfs). This means that System V IPC and POSIX IPC (IPC being the inter-process communication mechanisms, chiefly shared memory, semaphore arrays, and message queues) also appear in user-space memory as file mappings. Both kinds of mapped memory are counted in a process's RSS, but they also show up in the cached memory count; and in the corresponding cgroup statistics, shared memory usage and the file cache are both counted into the cgroup's total cache usage. Those statistics can be viewed like this:

    [[email protected] ~]# cat /cgroup/memory/memory.stat
    cache 94429184
    rss 102973440
    rss_huge 50331648
    mapped_file 21512192
    swap 0
    pgpgin 656572990
    pgpgout 663474908
    pgfault 2871515381
    pgmajfault 1187
    inactive_anon 3497984
    active_anon 120524800
    inactive_file 39059456
    active_file 34484224
    unevictable 0
    hierarchical_memory_limit 9223372036854775807
    hierarchical_memsw_limit 9223372036854775807
    total_cache 94429184
    total_rss 102969344
    total_rss_huge 50331648
    total_mapped_file 21520384
    total_swap 0
    total_pgpgin 656572990
    total_pgpgout 663474908
    total_pgfault 2871515388
    total_pgmajfault 1187
    total_inactive_anon 3497984
    total_active_anon 120524800
    total_inactive_file 39059456
    total_active_file 34484224
    total_unevictable 0

Good: after talking for half a day we have finally touched a file related to cgroup memory limits. The reason I have rambled on so much is that we must first understand how Linux manages memory before we can go further into planning the use of cgroup memory limits; otherwise, the same terms can carry all sorts of ambiguity. For example, when we look at the cache usage figure in a cgroup, how should we actually interpret it? Would you really treat it all as free space?

We have wandered a little; what does all this have to do with swap? Back to the question: what memory content should be swapped out? The file cache certainly does not need to be: since it is a cache, its contents are files on disk (well, by now you know it is not only files), so there is no need to swap it out; just write back the dirty data and drop it, which is the cache-reclaim mechanism we just described. But we should also know that not all space marked as cache can be written back to disk (yes, shared memory is like that). The memory eligible for swapping out, then, mainly consists of the Inactive(anon) portion. Note in particular that the kernel also counts shared memory into Inactive(anon) (yes, shared memory can be swapped, too). Also, memory locked with mlock is never swapped; that is the only effect of putting an mlock lock on memory. The accounting we have just discussed may change as the Linux kernel version changes, but it has held for a long time, and we can take it as given.
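
A minimal sketch of eyeballing these pools on a live system, using the field names from the /proc/meminfo output shown earlier:

    # Anonymous memory, the main candidate pool for swapping out:
    grep -E 'Active\(anon\)|Inactive\(anon\)' /proc/meminfo

    # Shared memory (tmpfs, SysV/POSIX shm): counted as cache, yet swappable:
    grep Shmem /proc/meminfo

    # Memory pinned with mlock(), which is never swapped:
    grep -E 'Mlocked|Unevictable' /proc/meminfo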

Now we basically understand what swap does. Since swapping copies data between memory and an external device, adding a cache for it is quite natural; this cache is the swap cache (SwapCached). In the memory.stat file, swap-cached anonymous pages are recorded together with anon pages in rss, but shared memory is not included. In addition, hugepages are never swapped. Naturally, how much of the current swap space is used, and how much remains, can also be found in the relevant statistics.

Some terms in these concepts may still be unfamiliar, such as RSS or hugepages. Please look them up yourself to fill in that knowledge. And to make sure you really understand what RSS is, consider this: what is the difference between the VSZ and RSS shown by the ps aux command and the PSS shown in cat /proc/PID/smaps, as three metrics of a process's memory usage?
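
As a starting point for that comparison, a minimal sketch; PID 1 is used purely as an example:

    # VSZ (virtual size) and RSS (resident set) as reported by ps, in KB:
    ps -o pid,vsz,rss,comm -p 1

    # PSS (proportional set size) charges each shared page divided by the
    # number of processes sharing it; sum it across the mappings in smaps:
    awk '/^Pss:/ {sum += $2} END {print "PSS:", sum, "kB"}' /proc/1/smaps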

When to SWAP?

Having figured out what should be swapped, the next question is when to swap. It looks simple: when memory is exhausted and the cache has nothing left to reclaim, swapping should be triggered. Reality is not that simple; in fact, the system may swap even when memory pressure is not that severe, but that case is outside the scope of today's discussion.

Study questions: Besides swapping when memory is exhausted, when else will the system swap? How do I adjust the kernel's swapping behavior? How do I see which swap spaces the current system has, and of which types? What is swap priority (weight), and what is it good for?
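
A few hedged pointers toward these questions; the commands are standard, and the values shown are whatever your system has:

    # The kernel's tendency to swap anonymous memory rather than
    # reclaim cache (0..100):
    sysctl vm.swappiness

    # Which swap spaces exist, their types (partition/file) and priorities:
    cat /proc/swaps

    # Activate a swap file with an explicit priority; higher priority is
    # used first, and equal priorities are striped across devices:
    swapon -p 10 /path/to/swapfile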

In fact, most of the time, when swapping happens matters less than what happens afterwards. Most memory shortages are temporary rather than permanent, for example a sudden burst of traffic or some other unexpected situation; such cases are characterized by their short duration, and the swap mechanism, as a temporary relief measure, protects business processes well. Without swap, the usual result of memory exhaustion is that the OOM killer is triggered, which kills the process with the highest score at that moment. In more serious cases, memory shortage can also leave processes deadlocked in the D state; this typically happens when several processes request memory at the same time. The OOM killer mechanism may then even fail, because the high-score process it wants to eliminate is likely one of the processes requesting memory, and since that process is stuck in the D state fighting over memory, the kill may have no effect on it at that point.

But swap does not always protect well. If memory demand is long-term and large, the swapped-out data ends up residing on the external device for a long time, and the cost of the process accessing that memory rises dramatically. When processes use their swapped-out memory very frequently, the whole system falls into an I/O-busy state, process response times degrade severely, and the entire system can be ground to a halt. For a system administrator, this is completely unacceptable: the first priority after a failure is to restore service quickly, yet the I/O-busy state that frequent swapping causes usually leaves no way to recover other than a power-cycle, so this situation must be avoided as far as possible. At this point, if necessary, we can even consider doing without swap: an OOM kill on memory overuse, or even processes stuck in the D state, may be preferable to swap grinding the system to death. If your environment calls for it, consider turning swap off.
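
A minimal sketch of turning swap off, assuming the usual fstab setup:

    # Disable all active swap spaces now (this swaps everything back in,
    # so make sure there is enough free memory first):
    swapoff -a

    # To keep swap off across reboots, also comment out the swap line(s)
    # in /etc/fstab, e.g.:
    #   #/dev/mapper/vg-swap  swap  swap  defaults  0 0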

What happens when a process requests memory?

So far we have briefly described buffers/cache and swap from the macroscopic point of view of the whole system. Now let us zoom in and look at what mechanisms are triggered, and in what order, when a process requests memory. The flow described in this article is based on the Linux 3.10 kernel; the basic flow in Linux 4.1 has changed little. If you want to be sure what happens on your own system, please read the relevant kernel code yourself.

A process may request memory in many ways, most commonly via malloc and mmap. But this does not matter much to us, because neither malloc nor mmap, nor any other way of requesting memory, really makes the kernel allocate actual physical memory to the process. What actually triggers the allocation of physical memory is the page fault.

The fault here is what we see as total_pgfault in memory.stat. Page faults generally come in two kinds: one is called a major fault, the other a minor fault. The main difference between these two is whether serving the memory the process requested causes disk I/O: if it does, it is a major fault; if not, a minor fault. In other words, if a major fault occurs, it basically means the data has previously been swapped out to swap space (or must otherwise be read from disk).
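
A minimal sketch of watching the fault counters from user space; PID 1 is again only an example:

    # Per-process minor and major fault counts:
    ps -o pid,min_flt,maj_flt,comm -p 1

    # System-wide fault counters:
    grep -E '^pgfault|^pgmajfault' /proc/vmstat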

The handling of a page fault can be organized into roughly the following path:

First, the kernel checks whether the virtual address being accessed is legitimate; if it is, it continues to find or allocate a physical page, as follows:

    1. Check whether the virtual address where the fault occurred is present in the page table. If it is not present and the mapping is anonymous, allocate zero-filled anonymous-mapping memory; the address may also map some virtual file system, such as shared memory, in which case the relevant memory area is mapped in, or a COW (copy-on-write) fault requests new memory. If it is a file mapping, there are two possibilities: either the mapped area is in the page cache, in which case the relevant page cache area is referenced directly (or COW allocates new memory to hold the file contents to be mapped); or the page cache entry does not exist, which means the area has been swapped out to swap space and must go through swap-in handling.
    2. If the page table already contains the mapping, check whether the access is a write. If not, the page is simply reused; if it is a write, a COW write-copy fault occurs. The COW here is not exactly the same as the one above; in the kernel, it is implemented mainly by the do_wp_page method.

If new memory needs to be requested, it is requested through alloc_page_vma, and the core of this function is __alloc_pages_nodemask: the implementation of the Linux kernel's famous memory management subsystem, the buddy system.

The allocation process first checks whether the free page lists have pages available, implemented by get_page_from_freelist. We will not concern ourselves with the normal case, where of course everything is OK. What matters more is the exception handling: if nothing is available in the free lists, allocation enters the __alloc_pages_slowpath method. Its main logic goes roughly like this:

    1. Wake up the kswapd process to swap out whatever memory can be swapped out, so that the system has memory available again.
    2. Check again whether there is free memory. If so, great; if not, go on to the next step:
    3. Try to reclaim page cache; while reclaiming, the requesting process is set to the D state. If memory still cannot be obtained:
    4. Start the OOM killer to kill some processes and free memory. If that does not work either:
    5. Go back to step 1 and try again!

Of course, the above logic runs only when certain conditions are met, but those are generally the default state of the system; for example, the OOM killer mechanism must be enabled. There are also many branches in this logic unrelated to this article, such as checking memory watermarks, handling high-priority memory allocations, and of course the handling of NUMA node state, none of which I have listed. In addition, it is not only the cache-reclaim step above that puts the process into the D state; other parts of the logic do so as well. This is why, under memory shortage, the OOM killer sometimes does not take effect: the process that is supposed to be killed may happen to be stuck in the D state inside this very logic.
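
A minimal sketch of watching this machinery from user space; the /proc/vmstat counter names vary slightly between kernel versions, so the grep pattern is deliberately loose:

    # Reclaim done by kswapd vs. direct reclaim inside the slow path:
    grep -E 'pgscan_kswapd|pgscan_direct|pgsteal' /proc/vmstat

    # OOM killer activity lands in the kernel log:
    dmesg | grep -i 'out of memory'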

That is roughly what may happen during a memory request. Of course, our real focus this time is the subject of this article, cgroup memory limits: when we impose a limit, what we care about even more is what happens when memory exceeds it. The handling of the boundary conditions is our topic, so I have not gone into the details of the normal allocation path, nor into how malloc in user space uses brk or mmap to request memory; after all, those belong to a program's normal state, and perhaps a later article on memory optimization can explain that part.

Now we should get to the point:

Configuring Cgroup Memory Limits

Before limiting memory, we had better figure out what happens if memory exceeds the limit, how we will handle it, and whether the business can accept that state. That is why we talked through so much basic "nonsense" before discussing how to limit; the limiting itself is in fact simple. Our system environment is the same as in the previous article on CPU isolation: the cgconfig and cgred services are used for cgroup configuration management, and again we create a zorro user and limit the memory of the processes this user spawns. I will not repeat the basic configuration method; if you do not know it, please refer to that document.

Once the environment is configured, we can check the relevant files. Per the cgconfig.conf configuration, the memory-limit hierarchy is placed under the /cgroup/memory directory. If you have configured it the same way I have, the contents of that directory should look like this:

    [[email protected] ~]# ls /cgroup/memory/
    cgroup.clone_children           memory.kmem.tcp.max_usage_in_bytes  memory.pressure_level
    cgroup.event_control            memory.kmem.tcp.usage_in_bytes      memory.soft_limit_in_bytes
    cgroup.procs                    memory.kmem.usage_in_bytes          memory.stat
    cgroup.sane_behavior            memory.limit_in_bytes               memory.swappiness
    jerry                           memory.max_usage_in_bytes           memory.usage_in_bytes
    memory.failcnt                  memory.meminfo                      memory.use_hierarchy
    memory.force_empty              memory.memsw.failcnt                notify_on_release
    memory.kmem.failcnt             memory.memsw.limit_in_bytes         release_agent
    memory.kmem.limit_in_bytes      memory.memsw.max_usage_in_bytes     shrek
    memory.kmem.max_usage_in_bytes  memory.memsw.usage_in_bytes         tasks
    memory.kmem.slabinfo            memory.move_charge_at_immigrate     zorro
    memory.kmem.tcp.failcnt         memory.numa_stat
    memory.kmem.tcp.limit_in_bytes  memory.oom_control

Among them, zorro, jerry, and shrek are directories, conceptually the same tree structure as in the CPU isolation article. The related configuration file contents:

    [[email protected] ~]# cat /etc/cgconfig.conf
    mount {
        cpu     = /cgroup/cpu;
        cpuset  = /cgroup/cpuset;
        cpuacct = /cgroup/cpuacct;
        memory  = /cgroup/memory;
        devices = /cgroup/devices;
        freezer = /cgroup/freezer;
        net_cls = /cgroup/net_cls;
        blkio   = /cgroup/blkio;
    }

    group zorro {
        cpu {
            cpu.shares = 6000;
    #       cpu.cfs_quota_us = "600000";
        }

        cpuset {
    #       cpuset.cpus = "0-7,12-19";
    #       cpuset.mems = "0-1";
        }

        memory {
        }
    }

The configuration adds an empty memory {} section to the zorro group for now; we will fill in real settings in a moment.

    [[email protected] ~]# cat /etc/cgrules.conf
    zorro       cpu,cpuset,cpuacct,memory   zorro
    jerry       cpu,cpuset,cpuacct,memory   jerry
    shrek       cpu,cpuset,cpuacct,memory   shrek

Remember to restart the services after you have finished modifying the files:

    [[email protected] ~]# service cgconfig restart
    [[email protected] ~]# service cgred restart

Let's continue and see which configuration parameters exist for memory:

    [[email protected] ~]# ls /cgroup/memory/zorro/
    cgroup.clone_children           memory.kmem.tcp.max_usage_in_bytes  memory.numa_stat
    cgroup.event_control            memory.kmem.tcp.usage_in_bytes      memory.oom_control
    cgroup.procs                    memory.kmem.usage_in_bytes          memory.pressure_level
    memory.failcnt                  memory.limit_in_bytes               memory.soft_limit_in_bytes
    memory.force_empty              memory.max_usage_in_bytes           memory.stat
    memory.kmem.failcnt             memory.meminfo                      memory.swappiness
    memory.kmem.limit_in_bytes      memory.memsw.failcnt                memory.usage_in_bytes
    memory.kmem.max_usage_in_bytes  memory.memsw.limit_in_bytes         memory.use_hierarchy
    memory.kmem.slabinfo            memory.memsw.max_usage_in_bytes     notify_on_release
    memory.kmem.tcp.failcnt         memory.memsw.usage_in_bytes         tasks
    memory.kmem.tcp.limit_in_bytes  memory.move_charge_at_immigrate

We already know the memory.stat file. Its contents cannot be modified; it simply reports the current cgroup's memory usage statistics. The common fields and their meanings have already been covered, so I will not repeat them here.

Cgroup Memory Limit

memory.memsw.limit_in_bytes: the limit on total memory + swap usage.

memory.limit_in_bytes: the limit on memory usage alone.

The meaning of these two is clear. If you decide to disable swap within your cgroup, set the two files to the same value. As for why, I trust everyone can work that out.
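
A minimal sketch of setting the limits, using the zorro group from our example environment; the 1G size is purely illustrative, and note that memory.limit_in_bytes must be set before memory.memsw.limit_in_bytes can be lowered to match it:

    # Limit the group to 1 GB of memory:
    echo 1G > /cgroup/memory/zorro/memory.limit_in_bytes

    # Set memory+swap to the same value, so the group cannot spill
    # into swap once it hits the memory limit:
    echo 1G > /cgroup/memory/zorro/memory.memsw.limit_in_bytes

    # Verify; the values are reported in bytes:
    cat /cgroup/memory/zorro/memory.limit_in_bytes \
        /cgroup/memory/zorro/memory.memsw.limit_in_bytes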

OOM Control

memory.oom_control: OOM behavior control after a memory overrun.
There are two values in this file:

oom_kill_disable 0

The default value 0 means the OOM killer is enabled: exceeding the memory limit triggers killing of processes. If it is set to 1, the OOM killer is disabled: exceeding the limit does not make the kernel kill processes; instead the kernel suspends them (hang/sleep), in fact setting each process to the D state and putting it into a queue called oom_waitqueue. The processes can still be killed at this point. If you want them to resume running, you can choose among several methods:

    1. Enlarge the memory limit, so that the processes have memory left to request.
    2. Kill some processes, so that memory becomes available within the group.
    3. Move some processes to another cgroup, so that memory becomes available within this cgroup.
    4. Delete some tmpfs files, which are memory-backed files, such as shared memory or other memory-occupying files.

To put it plainly, a process suspended on the oom_waitqueue can continue running only when more memory becomes available within the cgroup.
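
A minimal sketch of switching the OOM killer off for a group and watching its state, again using the zorro group from the example environment:

    # Disable the OOM killer for this group:
    echo 1 > /cgroup/memory/zorro/memory.oom_control

    # Read back both flags; under_oom flips to 1 while the group is over
    # its limit and its processes are suspended in oom_waitqueue:
    cat /cgroup/memory/zorro/memory.oom_control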

under_oom 0

This value is read-only: it reports whether the current cgroup is in an OOM state, showing 1 if it is. By setting and monitoring the two values in this file, we manage a cgroup's behavior after a memory overrun. In the default setup, if you use swap, the most common symptom after a cgroup hits its memory limit is rising I/O; if the business cannot accept that, what we usually do is disable swap, after which a cgroup memory OOM will trigger process kills. If we use containers such as LXC or Docker, this may kill the entire container. And, of course, the kill often fails to take effect because some process is stuck in the D state, leaving the whole Docker or LXC container simply unkillable; the reasons were made clear earlier. What should we do in this dilemma? A good approach is to disable the OOM killer and let processes hang when memory is overrun; this state is at least relatively controllable. We can then check the value of under_oom to see whether the container is in an overrun state, and decide, based on the characteristics of the business, how to handle it. My recommended approach is to kill some processes or restart the whole container, because, as one can imagine, a service hosted on container technology should be fault-tolerant in its overall software architecture, the typical scenario being a web service. Containers are characterized by a short life cycle; in such a scenario, killing a few processes or a few containers should have little impact on the overall stability of the service, and containers start very quickly; in fact, we should expect container startup speed to be comparable to process startup speed. Will your business become unstable because a few processes die? If not, rest easy: at worst, they will soon be started again. If your business is not like that, then work out a follow-up strategy based on your own situation.

With memory limits in place, memory overruns occur more frequently than on physical machines, because the limited amount is generally smaller than the actual memory. Therefore, services built on memory-limited container technology should think more carefully about their own memory usage, in particular how the application handles exceptions after a memory overrun, so as to keep the impact on the service low. System-level and application-level measures must work together to achieve the best memory isolation effect.

Memory Resource Accounting

memory.memsw.usage_in_bytes: the current memory + swap usage of the cgroup.

memory.usage_in_bytes: the current memory usage of the cgroup.

memory.max_usage_in_bytes: the maximum memory usage recorded for the cgroup.

memory.memsw.max_usage_in_bytes: the maximum memory + swap usage recorded for the cgroup.
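
A minimal sketch of reading and resetting these counters, with paths from the example environment; writing 0 to reset the peak and failure counters is standard cgroup v1 behavior:

    # Current and peak usage, in bytes:
    cat /cgroup/memory/zorro/memory.usage_in_bytes
    cat /cgroup/memory/zorro/memory.max_usage_in_bytes

    # Reset the recorded peak to start a fresh measurement window:
    echo 0 > /cgroup/memory/zorro/memory.max_usage_in_bytes

    # How many times allocations have hit the limit:
    cat /cgroup/memory/zorro/memory.failcnt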

Finally

That is about all there is to Linux cgroup memory limits. When we limit memory, compared with a physical machine, the application simply has less usable memory, so the business will find memory resources tight relatively more often. And compared with virtual machines (KVM, Xen), multiple cgroups share one kernel. From the perspective of memory limits, we can derive several characteristics of "container" technology relative to virtual machines and physical machines:

    1. Memory is tighter, and application memory leaks can cause relatively more serious problems.
    2. The container life cycle is shorter: if a physical machine's running time is measured in years, and a virtual machine's in months, then a container's should match the lifetime of a process, measured at most in days. Applications running in containers must therefore tolerate being restarted frequently.
    3. When multiple cgroups (containers) run at the same time, we can no longer plan overall resource usage with the mental model we used for physical machines or virtual machines. We need a more detailed understanding: what counts as cache, what counts as swap, what counts as shared memory, and into which resource counters each of them is accounted. In environments that do not share one kernel, these resources each serve a single business independently, so even a fuzzy understanding causes no real ambiguity; in cgroups, however, we must understand these details thoroughly in order to anticipate the situations we will meet and plan different handling strategies.

Perhaps we can derive more insights from all this. Shall we think it over together?
