Threat from a Killer Kernel Change: Swappiness

Source: Internet
Author: User


We have come under attack, though not from hackers. A change introduced in Linux kernel 3.5-rc1, and backported by Red Hat, altered how the kernel handles swappiness = 0. This is a real threat: one of our customers was hit when the OOM killer took down their MySQL master database server. This "tiny" kernel change stops the system from swapping when it should, which leads the OOM killer straight to the MySQL process. That was puzzling at first, because the system still had gigabytes of memory, much of it not in active use, plus gigabytes of free swap, so the OOM killer should never have been invoked.

We first suspected NUMA (we have written about NUMA before), but in that case we would expect to see excessive swapping caused by memory pressure on a single node. We installed numactl and configured mysqld_safe to run MySQL in NUMA interleave mode, but the server still crashed.
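
One way to test the NUMA theory is to compare free memory across nodes: if one node is nearly exhausted while another still has plenty free, node-local pressure could explain swapping. The sketch below parses the per-node MemFree lines from /sys/devices/system/node/node*/meminfo. The helper names (read_node_meminfo, free_kb_per_node, looks_unbalanced) and the 4x imbalance threshold are illustrative assumptions, not a standard diagnostic.

```python
import glob
import re
from typing import Dict, Iterable, List

# Matches lines such as "Node 0 MemFree:   1234567 kB" in the per-node meminfo files.
_MEMFREE = re.compile(r"^Node\s+(\d+)\s+MemFree:\s+(\d+)\s+kB", re.MULTILINE)


def read_node_meminfo() -> List[str]:
    """Return the text of every /sys/devices/system/node/node*/meminfo file."""
    texts = []
    for path in sorted(glob.glob("/sys/devices/system/node/node*/meminfo")):
        with open(path) as f:
            texts.append(f.read())
    return texts


def free_kb_per_node(meminfo_texts: Iterable[str]) -> Dict[int, int]:
    """Map NUMA node id -> MemFree in kB."""
    free: Dict[int, int] = {}
    for text in meminfo_texts:
        for node, kb in _MEMFREE.findall(text):
            free[int(node)] = int(kb)
    return free


def looks_unbalanced(free: Dict[int, int], ratio: float = 4.0) -> bool:
    """Hypothetical heuristic (assumed threshold): flag the system when the
    node with the most free memory has at least `ratio` times as much as the
    node with the least."""
    if len(free) < 2:
        return False  # a single node cannot be unbalanced
    lowest, highest = min(free.values()), max(free.values())
    return highest >= ratio * max(lowest, 1)
```

On a live Linux system, looks_unbalanced(free_kb_per_node(read_node_meminfo())) gives a quick answer; `numactl --hardware` shows the same per-node free memory by hand.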

The affected server was running the RHEL/CentOS 6.4 kernel 2.6.32-358, released in February 2013. This kernel and all later ones include the backport, so any system upgraded to 6.4 or later is exposed. We expect this to cause many problems in the field.
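
To check whether a given machine falls into the affected range, its kernel release string (as printed by `uname -r` or returned by Python's platform.release()) can be compared against the versions named above. The following is a minimal heuristic sketch, assuming that mainline kernels 3.5 and later, and RHEL/CentOS 6 kernels (release strings containing ".el6") from 2.6.32-358 onward, carry the change. The function name swappiness_zero_is_strict is hypothetical, and backports in other distributions are not detected.

```python
import re


def swappiness_zero_is_strict(release: str) -> bool:
    """Heuristic: does swappiness=0 on this kernel get the 3.5-rc1 behavior?

    Assumptions (not an official check):
      * mainline kernels 3.5 and later carry the change;
      * RHEL/CentOS 6 kernels (".el6") from 2.6.32-358 (RHEL 6.4) onward
        carry the backport;
      * anything else is reported as unaffected, which may be wrong for
        other distributions that backported the change.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    if (major, minor) >= (3, 5):
        return True
    patch = int(m.group(3) or 0)
    build = int(m.group(4) or 0)  # e.g. 358 in "2.6.32-358.el6.x86_64"
    if (major, minor, patch) == (2, 6, 32) and ".el6" in release:
        return build >= 358
    return False
```

For example, swappiness_zero_is_strict(platform.release()) (after `import platform`) reports on the running machine.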

This is frustrating, because Red Hat should not change behavior like this in a backport, or at all within a release lifecycle such as RHEL 6. The whole point of such a release is that this does not happen: system behavior stays consistent for 5 to 10 years. When a change this significant appears in the middle of a product's lifecycle, the fallout is serious: forced upgrades, configuration changes, updates to default installations, and changes to monitoring and audits. Recent Debian/Ubuntu systems are likely affected as well, since their newer kernels include this change, possibly via the same backport.

Swappiness is often misunderstood by engineers. It takes a value from 0 to 100 and tells the kernel which matters more when memory has to be reclaimed: the page cache (file cache) or application memory. The default is 60, which makes the kernel fairly willing to swap out application memory in order to keep page cache, and that is a poor setting for most servers. On a server, application memory is almost always more valuable than file cache, so we have always set swappiness to 0, telling the kernel to drop file cache before swapping out any application memory.

With the new kernel behavior, however, swappiness = 0 means the kernel swaps out application memory only as a last resort, when free memory and file cache are nearly exhausted. The result is far less swapping and a much higher chance that the OOM killer fires under memory pressure, which is clearly not what we want.

Is there a quick fix? Fortunately, a very simple one: set swappiness to 1. This protects application memory almost as strongly as 0 does, but does not trigger the new kernel behavior. In this case, 1 is better than 0.
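
The check is easy to script. Below is a minimal sketch that reads the current value from /proc/sys/vm/swappiness and applies the recommendation above for database servers (use 1: not 0, and not the default 60). The function names read_swappiness and advise are hypothetical. The setting itself is normally changed with `sysctl -w vm.swappiness=1` and made persistent with a `vm.swappiness = 1` line in /etc/sysctl.conf.

```python
from pathlib import Path

# Standard procfs location of the vm.swappiness setting on Linux.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def read_swappiness(path: Path = SWAPPINESS_PATH) -> int:
    """Return the current vm.swappiness value as an integer."""
    return int(path.read_text().strip())


def advise(current: int) -> str:
    """Advice for a database server, following the reasoning above.

    0  -> move to 1: almost the same preference for keeping application
          memory in RAM, without the 'swap only as a last resort' handling
          that 0 receives on 3.5+ and backported kernels.
    >1 -> lower to 1 (e.g. the default 60 favours page cache too much).
    1  -> already the recommended value.
    """
    if current < 0:
        raise ValueError(f"swappiness cannot be negative: {current}")
    if current == 1:
        return "ok"
    return "set vm.swappiness=1"
```

On a Linux host, advise(read_swappiness()) returns the suggested action for that machine.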

As always, we will keep watching for problems like this on behalf of our customers, update our default installation configuration, and review each upgrade for changes that could affect their systems.