"Cloud and Ink" performance optimization: A reasonable allocation of large memory pages (hugepage) in a Linux environment


Original article, 2016-09-12, by Xiong June

"Cloud and Ink" performance optimization: A reasonable allocation of large memory pages (hugepage) in a Linux environment

Xiong June (Old Bear)

General Manager, Cloud and Ink (West Region)

Oracle ACE, ACOUG Core Member


PC servers have come a long way in performance. 64-bit CPUs appeared in ordinary home PCs years ago, let alone in higher-end PC servers; with both Intel and AMD pushing x86 processing power forward, and with advances in manufacturing processes, the amount of memory a PC server can hold keeps growing, and tens of gigabytes of RAM are now commonplace. This hardware progress has made PC servers ever more capable and ever faster. On the stability side, a PC server running Linux can likewise deliver the stability and reliability that important business systems require. As for cost, to quote a user from an industry software vendor: "If you don't replace the minicomputer with a PC server, where is our margin?" A PC server is far cheaper than a minicomputer of equal processing power, in the initial purchase as well as in power consumption and maintenance over its lifetime. Driven by these two factors, performance and cost, more and more databases now run on PC servers. Some of the clients I serve even virtualize high-end PC servers into multiple machines and run an Oracle database on each virtual machine, and these databases carry important production systems.


Without a doubt, the most suitable operating system for running an Oracle database on a PC server is Linux. Being very close to UNIX, it matches UNIX in stability, reliability, and performance. However, compared with AIX, HP-UX, and other operating systems, Linux has an obvious weakness in its memory paging mechanism. The weakness is especially visible for Oracle databases with a large SGA, where it can significantly hurt performance and even cause the database to stop responding altogether. This article walks through a case that exposes the weakness, and uses Linux large memory pages to solve the problem.

Introducing the case


A customer's system ran into a serious performance problem. When it struck, the system was essentially unusable: every business operation in the application stopped responding. The database was Oracle 10.2.0.4, running on RHEL 5.2 (Red Hat Enterprise Linux Server release 5 (Tikanga)); the server had four quad-core Xeon processors (Intel(R) Xeon(R) CPU E7430 @ 2.13GHz), i.e. 16 logical CPUs, and 32GB of memory. During a failure the database server's CPU stayed at 100% for a long time. Even after all of the application's WebLogic servers had been shut down, the database server's CPU utilization remained at 100% for several minutes and then fell off gradually, taking about 20 minutes to drop to the normal idle level; with every application shut down, only a very low CPU utilization is the normal state. According to the system's database administrators, this had happened several times; even after a database restart the failure would recur within a day or two. The system had seen no major recent changes.


After receiving the fault report, I found that even connecting to the database server over SSH was very slow, taking almost a minute. A quick look at the server's performance showed that I/O was very low, there was still spare memory, at least 1GB or more, and there was no page in/page out. The most striking symptom was CPU utilization: it stayed pinned at 100%, with the SYS portion above 95%, and the operating system run queue stayed above 200. The server's memory usage was as follows:



Judging from the symptoms, the high SYS CPU was an important clue for analyzing the problem.
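For reference, these operating-system figures come from standard tools; as a minimal sketch (commands of this sort, not literal output from the case):

top                  # overall CPU split (%us vs %sy) and load average
vmstat 5             # 'r' = run queue length, 'si'/'so' = swap in/out
cat /proc/meminfo    # detailed memory usage, including PageTables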

Having taken the quickest possible look at performance at the operating system level, I connected to the database via SQL*Plus to look at the performance information inside the database. (Note: the SQL text, server names, database names, and other identifying details below have been sanitized.)

... (part of the content omitted to save space) ...




Checking the wait events:


Neither the activity in the database nor the wait events showed any obvious anomaly. It is worth noting that when a database server's CPU utilization sits at 100% for a long time, or physical memory is exhausted and heavy swapping sets in, the performance symptoms inside the database must be diagnosed carefully: an increase in some class of wait event may simply be the consequence of the CPU or memory shortage, rather than the database activity being the root cause of the high CPU or the memory exhaustion.

From the data above, the number of active sessions was not particularly large, under 50; even adding the background processes, that is still a long way from the run queue of more than 200 reported by the operating system. The non-idle wait events in the database fell into three main classes: I/O-related waits such as db file sequential read, database-link-related waits such as SQL*Net more data from dblink, and latch-related wait events. Of the three, normally only latch waits drive CPU utilization up.


Comparing AWR reports showed no noticeable difference in database activity between the failure period and normal periods. In the system statistics, however, the difference was large:


The data above come from two one-hour AWR reports: one (1st) covering the period containing the failure, and one (2nd) covering a normal period. For fault analysis, especially when the failure itself is relatively short, a one-hour AWR report cannot accurately reflect performance during the failure window. But the first job in troubleshooting is to establish a direction from whatever data is available. As noted earlier, the SYS portion of CPU utilization was a very important clue, while the performance data inside the database differed little, so the analysis could start from the CPU.

Analyzing CPU usage in the operating system

So what do the two kinds of utilization, SYS and USER, represent in an operating system? Or rather, what is the difference between them?


In short, the SYS portion of CPU utilization is the CPU consumed by the operating system kernel, that is, by code running in kernel mode; the most common source is system calls. The USER portion is the CPU used by the application's own code, that is, by code running in user mode. For example, when Oracle executes SQL and reads data from disk into the db buffer cache, it issues a read call; that read call runs mostly in the operating system kernel, including the device driver code, so the CPU it consumes is counted as SYS. When Oracle then parses the data read from disk, only Oracle's own code is running, so the CPU consumed is counted as USER.
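A minimal way to see the split for yourself (a sketch, not taken from the case):

# dd here spends almost all of its time in read()/write() system calls,
# so the 'sys' component of the reported time dominates 'user'
time dd if=/dev/zero of=/dev/null bs=4k count=500000

# watch the split system-wide: 'us' = user-mode CPU, 'sy' = kernel-mode CPU
vmstat 5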


So which operations or system calls mainly produce SYS CPU:

1. I/O operations, such as reading and writing files, accessing peripherals, and transmitting data over the network. These generally do not consume much CPU, because most of the time is spent in the I/O device itself. For example, when reading a file from disk, most of the time passes inside the disk device, and the CPU time is only a small fraction of the I/O response time. Only very heavy concurrent I/O is likely to push SYS CPU up.

2. Memory management, such as processes requesting memory from the operating system, the operating system maintaining the pool of available memory, and paging to swap space. Much as with Oracle, the larger the memory, the more frequent the memory management operations, and the higher the CPU consumption.

3. Process scheduling. This part of the CPU reflects the run queue: the longer the run queue, the more processes need to be scheduled, and the heavier the kernel's burden.

4. Others, including inter-process communication, semaphore handling, certain activity inside device drivers, and so on.


Given the performance data from the failure, memory management and process scheduling looked like the likely sources of the high SYS CPU. But a run queue above 200 was more likely the consequence of high CPU utilization than its cause, and the number of active sessions in the database was not particularly high. So the next focus was whether memory management inside the system was driving the high CPU utilization.

Going back to the /proc/meminfo data collected at the start of the investigation, one important figure stands out:

PageTables: 4749076 kB


This shows that the page tables occupied 4637MB of memory. PageTables means exactly what it says: page tables. Simply put, they are the tables the operating system kernel uses to maintain the mapping between a process's linear virtual addresses and actual physical memory addresses.


Modern computers manage and allocate physical memory in pages (page frames); on the x86 processor architecture the page size is 4KB. The address space a process can access is called its virtual address space, and its size depends on the processor's word size: on a 32-bit x86 processor, a process can address 4GB. Every process running on the operating system has its own virtual (linear) address space, which is likewise managed in pages, typically 4KB. When a process accesses memory, the operating system and hardware cooperate to translate the process's virtual address into a physical address. Two different processes using the same virtual linear address may be pointing at the same physical memory, as with shared memory, or at different physical memory, as with process-private memory.


[Figure: the correspondence between process virtual addresses and physical memory]


Suppose there are two processes A and B, each holding a memory pointer to address 0x12345 (0x denotes hexadecimal); for instance, one process forked or cloned the other, so the two processes hold pointers to the same memory address. When each process accesses the memory behind address 0x12345, the operating system translates that address into a physical address, say 0x23456 for process A and 0x34567 for process B, and the two do not interfere with each other. So when does this physical address come into being? For process-private memory, which is the common case, it happens when the process requests a memory allocation from the operating system. The operating system assigns free physical memory to the process in pages, generates a virtual linear address for the process, establishes the mapping between that virtual address and the physical memory address, and returns the virtual address to the process as the result.


The page table is the data structure the operating system uses to maintain the mapping between a process's virtual addresses and physical memory. [Figure: a relatively simple page table layout]




The following briefly describes how the operating system translates a process's virtual addresses into physical addresses on a 32-bit system with a 4KB page size.

1. The page directory is the data structure used to index the page tables. Each directory entry occupies 32 bits, i.e. 4 bytes, and stores the location of one page table. The page directory occupies exactly one 4KB page of memory and can hold 1024 entries, which means it can locate up to 1024 page tables.

2. A page table entry (PTE) is 4 bytes in size and stores the start address of one physical memory page. Each page table likewise occupies 4KB of memory and can hold 1024 physical page start addresses. Because physical pages are aligned on 4KB boundaries, only 20 of the 32 bits are needed for the address; the other 12 bits serve other purposes, such as flagging whether the page is read-only or writable.

3. 1024 page tables, each holding 1024 physical page start addresses, give 1M addresses in total; with each address pointing to a 4KB physical page, the full 4GB is covered.

4. When the operating system and hardware map a virtual address to a physical address, bits 31-22 of the virtual address index one of the up to 1024 page tables via the page directory; bits 21-12 index one of the 1024 entries within that page table, yielding the start address of a physical memory page; and bits 11-0, twelve bits in all, serve as the offset within the 4KB page. The physical page start address plus the offset is the physical memory address the process needs to access (a worked example follows this list).
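To make the bit-splitting concrete, here is a minimal shell sketch using a hypothetical 32-bit address (not an address from the case):

vaddr=$(( 0x12345678 ))                  # hypothetical 32-bit virtual address
dir_index=$(( (vaddr >> 22) & 0x3FF ))   # bits 31-22: page directory index
pt_index=$((  (vaddr >> 12) & 0x3FF ))   # bits 21-12: page table index
offset=$((     vaddr        & 0xFFF ))   # bits 11-0: offset inside the 4KB page
printf 'directory entry %d, page table entry %d, offset 0x%x\n' \
    "$dir_index" "$pt_index" "$offset"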


Now consider how much space these two data structures, the page directory and the page tables, occupy. The page directory is fixed at 4KB. And the page tables? Since there can be up to 1024 page tables at 4KB each, they can consume up to 4MB of memory. In practice, a process on 32-bit Linux usually has far smaller page tables: a process almost never uses the full 4GB address space, and 1GB of the virtual address space is assigned to the kernel anyway. Moreover, Linux does not build such a large page table for a process all at once; it establishes the mappings only as the process allocates and accesses memory.


This describes only paging in the simplest case. In reality the page directory plus page tables can have up to four levels; with PAE enabled on 32-bit systems, or on 64-bit systems, the structure is more complex than the above, but the last level, the page table itself, is the same. On a 64-bit system, page table entries grow from 32 bits to 64 bits. How much does that matter? If a process touches 1GB of physical memory, i.e. 262,144 4KB pages, its page tables need 262144*4/1024/1024 = 1MB on a 32-bit system, and twice that, 2MB, on a 64-bit system. Now apply that to an Oracle database on Linux. In this case the database's SGA was 12GB; if an Oracle server process touches all of the SGA, its page tables reach 24MB, an astonishing number. This ignores the PGA, since the average PGA per process was under 2MB, negligible against the SGA. The AWR report showed around 300 sessions, so the page tables for 300 connections would reach 7200MB, except that not every process touches all of the SGA's memory. And /proc/meminfo indeed showed PageTables at 4637MB; page table space that large is precisely the result of 300 sessions against a 12GB SGA.
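That arithmetic, as a quick sketch:

# per-process page table cost for a process touching a 12GB SGA
# (64-bit system: 4KB pages, 8-byte page table entries)
pages=$(( 12 * 1024 * 1024 / 4 ))       # 3,145,728 4KB pages in 12GB
pte_mb=$(( pages * 8 / 1024 / 1024 ))   # 24 MB of page table entries
echo "${pte_mb} MB per process, $(( pte_mb * 300 )) MB for 300 sessions"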


Page tables are of course not the only memory management structure in the system; there are other structures for managing memory as well. Oversized management structures like these unquestionably add greatly to the kernel's burden and to CPU consumption, and a change in load or in memory demand, such as many processes requesting large amounts of memory at once, can spike the CPU for a stretch and trigger exactly this kind of problem.


Using large memory pages to solve the problem

There was no hard proof, and gathering enough evidence to prove that the oversized page tables caused the problem would have meant tolerating another system outage of half an hour or more. But for the moment this was the biggest suspect, so the decision was to first tune the system's memory usage with large memory pages.


In older Linux versions, large memory pages are referred to as large pages, while current mainstream Linux versions call them huge pages. The discussion below uses huge pages to explain their advantages and how to use them.

What are the benefits of using large memory pages:

1. They shrink the page table. Each huge page maps 2MB of contiguous physical memory, so a 12GB SGA needs only 48KB of page table entries, far less than the original 24MB (see the arithmetic just after this list).

2. Huge page memory is locked in physical memory and can never be swapped out to the swap area, which avoids the performance impact of swapping.

3. With far fewer page table entries, the hit ratio of the TLB (the CPU's cache of page mappings) rises significantly.

4. Page tables for huge pages can be shared between processes, shrinking the page tables further. This point exposes the weakness in Linux's paging mechanism: other operating systems, such as AIX, share a single page table for a shared memory segment, avoiding the Linux problem altogether. For example, one system the author maintains usually has more than 5000 connections with an SGA of around 60GB; under Linux's normal paging, most of the system's memory would be consumed by page tables.
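The 48KB figure in point 1, checked quickly:

# page table cost for a 12GB SGA mapped with 2MB huge pages
hugepages=$(( 12 * 1024 / 2 ))   # 6144 huge pages
echo $(( hugepages * 8 ))        # 49152 bytes = 48KB of entries (8 bytes each)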


So how do you enable large memory pages (huge pages) for Oracle? Here are the steps. Since the SGA of the database in this case was later increased to 18GB, the examples below use 18GB:

1. Check /proc/meminfo to confirm that the system supports huge pages:
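On a kernel with hugepage support, a check along these lines shows the relevant fields (a sketch; the values are system-specific):

grep -i huge /proc/meminfo
# expected fields: HugePages_Total, HugePages_Free,
#                  HugePages_Rsvd, Hugepagesize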



HugePages_Total is the number of huge pages configured in the system. HugePages_Free is the number of huge pages that have never been accessed; this one is easy to misunderstand, as explained below. HugePages_Rsvd is the number of pages that have been allocated but not yet used. Hugepagesize is the huge page size, here 2MB; note that under some kernel configurations it may be 4MB.


For example, suppose HugePages_Total amounts to 11GB, sga_max_size is 10GB, and sga_target is 8GB. After the database starts, it reserves hugepage memory according to sga_max_size, here 10GB, so the genuinely free hugepage memory is 11-10=1GB. But sga_target is only 8GB, so 2GB of the reservation will never be touched; HugePages_Free therefore shows 2+1=3GB, and HugePages_Rsvd shows 2GB. Only 1GB is truly free for use by other instances.

2. Plan the number of huge pages to configure. So far, huge pages can only be used for certain kinds of memory, such as shared memory segments. Once physical memory is set aside as huge pages, it cannot be used for anything else, such as process-private memory, so do not configure too much memory as huge pages. We typically use huge pages only for the Oracle database's SGA, so the number of huge pages is:

HugePages_Total = ceil(sga_max_size / Hugepagesize) + N

For example, with sga_max_size set to 18GB, the number of pages can be ceil(18*1024/2)+2 = 9218. N is added because the hugepage pool needs to be slightly larger than sga_max_size, typically by 1-2 pages; looking at shared memory segment sizes with the ipcs -m command shows that a segment is in fact slightly larger than sga_max_size. If the server hosts more than one Oracle instance, account for that extra portion for each instance, i.e. use a larger N. Also note that an Oracle database either uses huge pages for the whole SGA or does not use them at all, so an unsuitable HugePages_Total simply wastes memory.

Besides computing from sga_max_size, a more accurate HugePages_Total can be computed from the shared memory segment sizes obtained with ipcs -m:

HugePages_Total = sum(ceil(shared_segment_size / Hugepagesize))
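A sketch of that second formula as a script (assuming the Linux ipcs output layout, where the fifth column is the segment size in bytes):

hps_kb=$(awk '/^Hugepagesize/ {print $2}' /proc/meminfo)
ipcs -m | awk -v hps=$(( hps_kb * 1024 )) '
    $5 ~ /^[0-9]+$/ { total += int(($5 + hps - 1) / hps) }   # ceil per segment
    END { print total, "huge pages" }'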


3. Modify the /etc/sysctl.conf file, adding the following line:

vm.nr_hugepages=9218

Then run sysctl -p to make the setting take effect.

The value of vm.nr_hugepages is the number of huge pages calculated in step 2. Afterwards, check /proc/meminfo: if HugePages_Total is lower than the value you set, there was not enough contiguous physical memory for the huge pages, and the server needs to be rebooted.
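Applying and verifying, sketched (run as root):

sysctl -p                             # allocate the hugepage pool
grep HugePages_Total /proc/meminfo    # should report 9218; if lower,
                                      # contiguous memory ran short: reboot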


4. Add the following lines to the /etc/security/limits.conf file:

oracle soft memlock 18878464

oracle hard memlock 18878464

This sets the amount of memory, in KB, that the oracle user may lock.
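The value is sized to cover the whole hugepage pool; as a quick check of the arithmetic:

# 9218 huge pages * 2048 KB per page
echo $(( 9218 * 2048 ))    # 18878464 KB, the value used above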

Then reconnect to the database server as the oracle user and run ulimit -a; you should see:

max locked memory       (kbytes, -l) 18878464

Alternatively, memlock can be configured as unlimited here.


5. If the database manages the SGA manually, switch to automatic (ASMM) management by setting sga_target to a value greater than 0. On 11g, because huge pages can only be used for shared memory and cannot be used for the PGA, AMM cannot be enabled, that is, memory_target must not be set greater than 0; the SGA and PGA must be managed separately, with the SGA under automatic management.


6. Finally, start the database and check /proc/meminfo to see whether HugePages_Free has decreased. If it has, hugepage memory is in use. However, on the failed database server there was no hugepage information in /proc/meminfo at all, and sysctl -a showed no vm.nr_hugepages parameter. The reason was that the running Linux kernel had not been compiled with the hugepage feature; a different kernel was needed to enable huge pages.


Looking at /boot/grub/grub.conf:


we found that the kernel in use carried the word "xen". We edited the file, changing default=0 to default=2 (equivalently, the first two xen kernel entries could be commented out with "#"), restarted the database server, and found that the new kernel supported huge pages.
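The relevant change, sketched (grub entries are numbered from 0; in this case entries 0 and 1 were xen kernels):

# before
default=0
# after: boot the third grub entry, a non-xen kernel with hugepage support
default=2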


After the database switched to large memory pages, the performance problem described in this article never recurred, even after the SGA was enlarged. Watching the /proc/meminfo data, PageTables stayed under 120MB, a reduction of some 4500MB from before. CPU utilization was also observed to be lower than before huge pages were in use, and the system ran quite stably; at the very least, no bugs attributable to huge pages appeared.


Testing shows that for OLTP systems, enabling huge pages for an Oracle database running on Linux can improve throughput and response time, by up to roughly 10% or more.


Summary

This article used a case to introduce the role of large memory pages in the Linux operating system and how to set the relevant parameters to enable them. To close, I recommend that whenever you run an Oracle database on Linux, you enable large memory pages, to avoid the performance problem seen in this case or simply to improve system performance further. It is fair to say that huge pages are one of the few features that improve performance at no extra cost. Also gratifying is that newer Linux kernels provide Transparent Huge Pages, so that applications running on Linux can use large memory pages more broadly and easily, rather than only for shared memory. Let us wait and see what changes this feature brings.

-- The End --


Original address: http://mp.weixin.qq.com/s?__biz=MjM5MDAxOTk2MQ==&mid=2650271630&idx=1&sn=cff27dce68652932016c43c9ba477832&chksm=Be487798893ffe8e7ecbf93bda6f2d180d8425d0e12a3091d339cd890e749de2aa51d9f96e65&scene=1&srcid=0912gud6gpoi9usucqczrxow#rd


"Cloud and Ink" performance optimization: A reasonable allocation of large memory pages (hugepage) in a Linux environment
