Linux Memory Architecture, from the Linux Performance and Tuning Guidelines (translation)


This document is a translation of section 1.2 of the IBM Redbook Linux Performance and Tuning Guidelines.
Original address: http://www.redbooks.ibm.com/redpapers/pdfs/redp4285.pdf
Original authors: Eduardo Ciliendo, Takechika Kunimasa, Byron Braswell


The translation is as follows:

1.2 Linux Memory Architecture

To execute a process, the Linux kernel allocates a portion of memory to the requesting process. The process uses that memory as its workspace to carry out the requested work, much the way you use a desk to lay out the papers, documents, and memos needed for your own work. The difference is that the kernel must allocate space far more dynamically: the number of running processes can reach into the thousands, while the amount of memory is limited. Therefore, the Linux kernel must handle memory efficiently. In this section, we look at the memory structure of Linux, its address layout, and how Linux manages memory space efficiently.


1.2.1 Physical and virtual memory

Today we face the choice between 32-bit and 64-bit systems. For enterprise customers, one of the most important differences is whether virtual memory addresses can exceed 4GB. From a performance perspective, it is important to understand how the Linux kernel maps physical memory into virtual memory on 32-bit and 64-bit systems.


From Figure 1-10 you can see the obvious differences in how the Linux kernel handles memory on 32-bit and 64-bit systems. The details of how physical memory is mapped into virtual memory are beyond the scope of this article, which focuses on selected aspects of the Linux memory architecture.


On a 32-bit architecture such as IA-32, the Linux kernel can directly address only the first gigabyte of physical memory (896MB once the reserved region is taken into account). Memory above this so-called ZONE_NORMAL must be mapped into the lower 1GB. The mapping is completely transparent to applications, but allocating memory pages in ZONE_HIGHMEM causes a small performance degradation.
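The zone layout described above can be sketched with a small classification function. This is an illustration only: the 16MB and 896MB boundaries used here are the conventional IA-32 defaults, not values queried from a running kernel.

```python
# Illustrative sketch: classify a physical address into the conventional
# IA-32 memory zones. The boundaries (16 MB and 896 MB) are the classic
# defaults; real kernels may configure them differently.

ZONE_DMA_LIMIT = 16 * 1024 * 1024        # first 16 MB: ZONE_DMA
ZONE_NORMAL_LIMIT = 896 * 1024 * 1024    # up to 896 MB: ZONE_NORMAL

def zone_of(phys_addr: int) -> str:
    """Return the zone name for a physical address on 32-bit IA-32."""
    if phys_addr < ZONE_DMA_LIMIT:
        return "ZONE_DMA"
    if phys_addr < ZONE_NORMAL_LIMIT:
        return "ZONE_NORMAL"
    return "ZONE_HIGHMEM"  # must be mapped in; slightly slower to use

print(zone_of(8 * 1024 * 1024))          # ZONE_DMA
print(zone_of(512 * 1024 * 1024))        # ZONE_NORMAL
print(zone_of(2 * 1024 * 1024 * 1024))   # ZONE_HIGHMEM
```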


On the other hand, on a 64-bit architecture such as x86-64 (also known as x64), ZONE_NORMAL extends all the way to 64GB, or to 128GB on IA-64 systems. As you can see, the overhead of mapping memory pages from ZONE_HIGHMEM into ZONE_NORMAL is eliminated on a 64-bit architecture.


Figure 1-10 Linux kernel memory layout for 32-bit and 64-bit systems


Virtual memory address layout
Figure 1-11 shows the Linux virtual address layout for 32-bit and 64-bit architectures.


On a 32-bit architecture, the maximum address space a single process can access is 4GB. This is a limitation imposed by 32-bit virtual addresses. In the standard implementation, the virtual address space is divided into 3GB of user space and 1GB of kernel space. There are also variants, such as the 4G/4G addressing layout implementation.
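The standard split can be pictured numerically. As an illustrative sketch, this uses 0xC0000000 as the user/kernel boundary, which is the conventional value for the 3GB/1GB layout on IA-32 (a detail not stated in the text above):

```python
# Illustrative sketch of the standard 32-bit 3 GB / 1 GB split.
# 0xC0000000 is the conventional user/kernel boundary on IA-32.

USER_KERNEL_SPLIT = 0xC0000000     # 3 GB
ADDRESS_SPACE_TOP = 0x100000000    # 4 GB (32-bit limit)

user_space = USER_KERNEL_SPLIT
kernel_space = ADDRESS_SPACE_TOP - USER_KERNEL_SPLIT

print(user_space // 2**30, "GB user space")      # 3 GB user space
print(kernel_space // 2**30, "GB kernel space")  # 1 GB kernel space
```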


On the other hand, on 64-bit architectures such as x86-64 and IA-64, no such limitation exists. Each individual process can benefit from a vast address space.


Figure 1-11 Virtual memory address layout for 32-bit and 64-bit architectures


1.2.2 Virtual Memory Management
The physical memory architecture of the operating system is usually invisible to applications and users, because the operating system maps any physical memory into virtual memory. If we want to understand the tuning possibilities in the Linux operating system, we have to understand how Linux handles virtual memory. As explained in 1.2.1, "Physical and virtual memory," applications do not allocate physical memory directly: they request a memory map of a certain size from the Linux kernel and receive a mapping in virtual memory. As Figure 1-12 shows, virtual memory is not necessarily mapped to physical memory. If your application requests a large amount of memory, some of it may be mapped to the swap file on disk.


Figure 1-12 also shows that applications usually do not write directly to disk; instead, they write to cache or buffers. When the pdflush kernel thread is idle, or when the file size exceeds the cache buffer size, pdflush flushes the cached/buffered data out to disk. See "Emptying dirty buffers."
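The buffering behavior described above can be sketched as a toy write-back cache: writes land in memory first and reach "disk" only when the buffered data exceeds a threshold. All names and the threshold here are invented for illustration; they are not real kernel interfaces.

```python
# Toy sketch of write-back caching: writes accumulate in a buffer and
# are flushed to "disk" only when the buffered data exceeds a threshold,
# loosely mimicking how pdflush empties dirty buffers. All names and the
# threshold are illustrative, not kernel parameters.

class WriteBackCache:
    def __init__(self, flush_threshold: int):
        self.flush_threshold = flush_threshold
        self.dirty = []   # buffered (not yet on disk) writes
        self.disk = []    # data that has reached "disk"

    def write(self, data: bytes) -> None:
        self.dirty.append(data)  # fast path: memory only, no disk I/O
        if sum(len(d) for d in self.dirty) > self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        self.disk.extend(self.dirty)  # slow path: actually hit disk
        self.dirty.clear()

cache = WriteBackCache(flush_threshold=8)
cache.write(b"abcd")     # below threshold: stays dirty
print(len(cache.disk))   # 0
cache.write(b"efghij")   # total exceeds threshold: flushed to disk
print(len(cache.dirty))  # 0
```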


Figure 1-12 Linux virtual memory management


Closely connected to the way the Linux kernel handles writes to the physical disk is the way it manages the disk cache. While other operating systems allocate only a portion of memory as disk cache, Linux handles memory resources more efficiently: the default virtual memory management configuration allocates all available free memory as disk cache. Hence, on Linux systems with large amounts of memory, it is common to see only 20MB of free memory.


In the same spirit, Linux also manages swap space very efficiently. Swap space being in use does not indicate a memory bottleneck; rather, it proves how efficiently Linux manages system resources. See "Page frame recycling" for more details.


Allocation of page frames
A page is a contiguous block of linear physical memory (a page frame) or of virtual memory. The Linux kernel manages memory in pages; the size of a page is usually 4KB. When a process requests a certain number of pages, if enough free pages are available, the Linux kernel assigns them to the process immediately. Otherwise, pages must be taken from some other process or from the page cache. The kernel knows how many memory pages are available and where they are located.
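Because the kernel hands out memory in whole pages, any request is effectively rounded up to a multiple of the page size. A minimal sketch, assuming the usual 4KB page size mentioned above:

```python
# Minimal sketch: the kernel allocates in whole pages, so any request
# is rounded up to a multiple of the page size (assumed 4 KB here;
# the actual size is architecture-dependent).

PAGE_SIZE = 4096

def pages_needed(nbytes: int) -> int:
    """Number of 4 KB pages required to satisfy a request of nbytes."""
    return (nbytes + PAGE_SIZE - 1) // PAGE_SIZE

print(pages_needed(1))          # 1
print(pages_needed(4096))       # 1
print(pages_needed(4097))       # 2
print(pages_needed(1_000_000))  # 245
```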


Buddy system
The Linux kernel manages free pages through a mechanism known as the buddy system. The buddy system maintains the free pages and tries to satisfy allocation requests while keeping the memory area as contiguous as possible. Allocating scattered small pages without care leads to memory fragmentation, which makes it difficult to satisfy a request for a large contiguous region of pages, and in turn results in inefficient memory use and performance degradation.
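The split-and-coalesce behavior of the buddy system can be sketched in a few dozen lines. This is an illustrative toy, not kernel code: free blocks are kept per order (block size = 2**order pages), an allocation splits a larger block in half as needed, and a free coalesces a block with its "buddy" whenever both halves are free, which is what keeps memory contiguous.

```python
# Minimal buddy-allocator sketch (illustrative only, not kernel code).
# free_lists[k] holds the start offsets of free blocks of 2**k pages.

class BuddyAllocator:
    def __init__(self, max_order: int):
        self.max_order = max_order
        self.free_lists = {k: set() for k in range(max_order + 1)}
        self.free_lists[max_order].add(0)   # start with one big free block

    def alloc(self, order: int) -> int:
        # find the smallest free block that is big enough
        k = order
        while k <= self.max_order and not self.free_lists[k]:
            k += 1
        if k > self.max_order:
            raise MemoryError("no contiguous block of that size")
        block = self.free_lists[k].pop()
        while k > order:                     # split down to the size asked
            k -= 1
            self.free_lists[k].add(block + 2**k)  # upper half stays free
        return block

    def free(self, block: int, order: int) -> None:
        # coalesce with the buddy block as long as the buddy is also free
        while order < self.max_order:
            buddy = block ^ (1 << order)     # buddy differs in one bit
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            block = min(block, buddy)
            order += 1
        self.free_lists[order].add(block)

b = BuddyAllocator(max_order=4)   # 16 pages total
a1 = b.alloc(0)                   # one page
a2 = b.alloc(1)                   # two contiguous pages
b.free(a1, 0)
b.free(a2, 1)
print(b.free_lists[4])            # {0} -- fully coalesced again
```

Note how freeing both allocations coalesces everything back into a single 16-page block, which is exactly the fragmentation-avoidance property described above.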


Figure 1-13 illustrates how the buddy system allocates pages.


Figure 1-13 Buddy system


Page recycling is activated when an attempt to allocate a page fails. See "Page frame recycling."


You can find information about the buddy system in /proc/buddyinfo. See "Memory used in a zone."


Page Frame Recycling
When a process requests a certain number of pages and none are available, the Linux kernel tries to satisfy the new request by releasing certain pages (pages that were used before but are no longer in use, yet are still marked active based on certain principles) and assigning that memory to the new process. This process is called page frame recycling. The kswapd kernel thread and the try_to_free_pages() kernel function are responsible for page recycling.


The kswapd thread is typically in an interruptible sleep state; it is called by the buddy system when the free pages in a zone fall below a threshold. It tries to find candidate pages among the active pages based on a least-recently-used (LRU) algorithm: the least recently used pages are released first. An active list and an inactive list are used to maintain the candidate pages. kswapd scans the active list, checks the recent usage of each page, and moves pages that have not been used recently to the inactive list. You can use the vmstat -a command to see how much memory is considered active and how much is inactive.
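The two-list page aging just described can be sketched as a toy model: pages sit on an active list with a "referenced" bit, and a scan (like kswapd's) demotes unreferenced pages to the inactive list, from which they become reclaim candidates. All names here are illustrative, not kernel code.

```python
# Toy sketch of two-list page aging: a scan clears each page's
# referenced bit, and pages still unreferenced on the next scan are
# demoted to the inactive list (reclaim candidates). Illustrative only.

from collections import OrderedDict

class PageLists:
    def __init__(self):
        self.active = OrderedDict()  # page -> referenced bit
        self.inactive = []

    def touch(self, page: str) -> None:
        self.active[page] = True     # page was used: mark referenced

    def scan(self) -> None:
        """One kswapd-style pass: demote pages not used since last scan."""
        for page in list(self.active):
            if self.active[page]:
                self.active[page] = False   # give it a second chance
            else:
                del self.active[page]
                self.inactive.append(page)  # candidate for reclaim

lists = PageLists()
for p in ("a", "b", "c"):
    lists.touch(p)
lists.scan()          # first pass clears the referenced bits
lists.touch("a")      # only "a" gets used again
lists.scan()          # "b" and "c" drop to the inactive list
print(lists.inactive) # ['b', 'c']
```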


kswapd also follows another principle. Pages are used mainly for two purposes: the page cache and the process address space. The page cache holds pages mapped to files on disk; pages belonging to a process address space (known as anonymous memory, because they are not mapped to any file and have no name) are used for heap and stack. See 1.1.8, "Process memory segments." When kswapd reclaims pages, it prefers to shrink the page cache rather than page out (or swap out) pages owned by processes.


Page out and swap out: the terms "page out" and "swap out" are often confused. "Page out" means moving individual pages (parts of an address space) to the swap area, while "swap out" means moving an entire address space to the swap area. They are, however, sometimes used interchangeably.


What proportion of reclaimed pages comes from the page cache versus from process address spaces depends on the usage scenario, and it affects performance. You can control this behavior through /proc/sys/vm/swappiness.


Swap (swap area)
As mentioned earlier, when page recycling occurs, candidate pages on the inactive list that belong to a process address space are paged out. Swapping activity by itself is not a problem. While in other operating systems swap is merely a guarantee against over-allocation of main memory, Linux uses swap space far more efficiently. As Figure 1-12 shows, virtual memory is composed of physical memory plus disk or swap partitions. In the Linux virtual memory implementation, if a memory page has been allocated but has not been used for some time, Linux moves that page into swap space.


You can often see daemons, such as getty, that are started when the system boots but are rarely used afterward. It is more efficient to free the precious main memory their pages occupy and move those pages to the swap area. This is how Linux manages swap, so there is no need to panic when you find the swap area 50% used. The fact that swap space is being used does not indicate a memory bottleneck; rather, it proves how efficiently Linux manages system resources.

