009_ Shut down Linux THP

Source: Internet
Author: User

Background:
A large business system at Company A had been reporting that its database server kept "going down" (this description is not accurate; I explain why later). The customer and the operations staff had finally had enough, and the project manager called to ask whether I could help diagnose the problem. I happened to be going on site the next day to discuss the test requirements of another system, so I promised to take a look then.
------------------------------------

Troubleshooting process:
The next day, while I was on site discussing requirements, the operations staff suddenly said the system had started to stall again.
So I connected to the server and first took a quick look at resource usage with top. The CPUs were almost fully loaded, but user-mode CPU usage was low: most of the CPU time was being spent in kernel mode.

I began to wonder whether one of the database service's underlying system calls was misbehaving, perhaps stuck in an infinite loop.
So I immediately took a quick look with perf top.

The largest shares belonged to a spin lock and to compaction_alloc. Memory defragmentation?
Judging from this, some memory operation was probably making many threads wait in a critical section.
To find out exactly which operation was responsible, I sampled kernel call stacks across the whole system:
perf record -a -g -F 1000 sleep 60
Here -a samples all CPUs, -g records the call graph (call stacks), and -F 1000 together with sleep 60 takes 1000 samples per second for one minute.
After sampling, I opened the data with perf report -g and examined the call stacks.

The call stacks made it clear that the spin lock contention came from defragmentation (compaction) of memory pages, and that the defragmentation was triggered by huge page allocation.
This reminded me of the Linux THP (Transparent Huge Pages) feature, which was added in kernel 2.6.38.
THP makes the use of huge pages transparent to the user: no huge page configuration is needed,
and the kernel automatically treats 512 contiguous ordinary pages as one huge page (512 × 4 KB = 2 MB on x86_64).
As the call stacks showed, building those physically contiguous regions requires the kernel to defragment memory,
so the spin locks we saw were taken while pages were being moved during defragmentation, and the root cause was THP.
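The THP and compaction activity described above can also be observed on a live system through two standard kernel counters: `AnonHugePages` in /proc/meminfo (anonymous memory currently backed by transparent huge pages) and `compact_stall` in /proc/vmstat (incremented each time a process stalls to run memory compaction). The helper below is a minimal sketch; the function name `read_counter` is hypothetical, and either field may be absent on kernels built without THP or compaction support.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a "key value" style file,
# e.g. /proc/meminfo ("AnonHugePages:   2048 kB") or /proc/vmstat
# ("compact_stall 12"). Returns non-zero if KEY is not present.
read_counter() {
    awk -v k="$2" '
        {
            name = $1
            sub(/:$/, "", name)      # /proc/meminfo keys end with ":"
            if (name == k) { print $2; found = 1; exit }
        }
        END { if (!found) exit 1 }
    ' "$1"
}

# Example use on a live system:
#   read_counter /proc/meminfo AnonHugePages   # THP-backed anonymous memory, in kB
#   read_counter /proc/vmstat  compact_stall   # stalls spent waiting for compaction
```

Sampling `compact_stall` twice a few seconds apart shows whether compaction stalls are still accumulating.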
Once the cause is known, the fix is easy: just turn THP off.
To do so, edit /etc/rc.local:
vi /etc/rc.local
and add the following commands at the end of the file:
if test -f /sys/kernel/mm/redhat_transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/redhat_transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
fi

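Save the file and reboot.
PS: The path differs slightly between Linux versions (for example, mainline kernels use /sys/kernel/mm/transparent_hugepage/ instead of redhat_transparent_hugepage), so check which one your system has.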
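Because the directory name differs between versions, the rc.local lines can be written to try both the mainline path (/sys/kernel/mm/transparent_hugepage) and the Red Hat path. A minimal sketch follows; the function name `disable_thp` and its optional root-directory argument are assumptions added so it can be exercised on a scratch directory. On a real system it writes to sysfs and must run as root.

```shell
#!/bin/sh
# Sketch: set THP "enabled" and "defrag" to "never" under whichever directory
# this kernel exposes. The optional argument (default /sys/kernel/mm) is an
# assumption for testing; writing the real sysfs files requires root.
disable_thp() {
    mm_root=${1:-/sys/kernel/mm}
    rc=1                                   # stays 1 if no control file exists
    for dir in "$mm_root/transparent_hugepage" \
               "$mm_root/redhat_transparent_hugepage"; do
        for knob in enabled defrag; do
            if [ -f "$dir/$knob" ]; then
                echo never > "$dir/$knob" || return 1
                echo "set $dir/$knob to never"
                rc=0
            fi
        done
    done
    return "$rc"
}
```

In /etc/rc.local the function would be defined and then called with no argument, as plain `disable_thp`.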

After the reboot, check the setting:
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
If the output is:
always madvise [never]
then THP has been turned off.
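The active mode is the value shown in square brackets, so a script can check it instead of reading the file by eye. A small sketch; the function name `thp_mode` is hypothetical:

```shell
#!/bin/sh
# Print the active mode from a THP "enabled" or "defrag" file: the value
# shown in square brackets, e.g. "always madvise [never]" -> "never".
thp_mode() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Example check (use the transparent_hugepage directory on kernels without
# the redhat_ prefix):
#   [ "$(thp_mode /sys/kernel/mm/redhat_transparent_hugepage/enabled)" = never ] && echo "THP is off"
```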

In fact, this does not completely solve the problem. As mentioned above,
THP was introduced to spare administrators the work of configuring huge pages by hand; now that we have turned THP off,
the best practice is to configure huge pages explicitly for the shared memory that our database service needs.
After all, with tens or even hundreds of GB of memory, maintaining TLB entries for ordinary 4 KB pages carries significant overhead.
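To put rough numbers on the TLB argument: mapping 64 GiB (an illustrative size, not a figure from this incident) takes 16,777,216 pages of 4 KiB but only 32,768 pages of 2 MiB, 512 times fewer mappings for the TLB to cover. The same rounded-up division is a reasonable first estimate of how many 2 MiB huge pages to reserve (for example through the vm.nr_hugepages sysctl) for a given shared-memory size, assuming the default huge page size is 2 MiB. The helper name `pages_needed` is hypothetical.

```shell
#!/bin/sh
# Number of pages needed to map MEM_KB kilobytes with pages of PAGE_KB
# kilobytes, rounded up so a partial page still counts as one page.
pages_needed() {
    mem_kb=$1
    page_kb=$2
    echo $(( (mem_kb + page_kb - 1) / page_kb ))
}

total_kb=$((64 * 1024 * 1024))                          # 64 GiB, illustrative
echo "4 KiB pages: $(pages_needed "$total_kb" 4)"       # 16777216
echo "2 MiB pages: $(pages_needed "$total_kb" 2048)"    # 32768
```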
The huge page configuration itself differs between databases, and even between versions of the same database, so the procedure is not the same everywhere. Most importantly, this post is already a bit too long.
So I will not go into it here; if I have time, I will cover it in a separate post.
-----------------------------------------------

Result:
After these two steps, I kept watching the system for several days, and as expected there were no more so-called "downtime" events.
I put "down" in quotation marks because it refers to the project manager's original report that the server kept going down. That description was itself wrong; in tomorrow's post I plan to explain in detail how to describe a problem correctly.

What to do: How to turn transparent hugepages off

Reference: 43270517
