Anatomy of KSM: memory de-duplication in the Linux kernel

Summary: Linux® has seen several innovations as a hypervisor, and one interesting change in the 2.6.32 kernel is KSM (Kernel Samepage Merging), which lets the hypervisor increase the number of concurrent virtual machines by merging identical memory pages. This article explores the concepts behind KSM (such as storage de-duplication), its implementation, and how to manage it.

Server virtualization

Virtualization technology dates back to the 1960s and was popularized by the IBM® System/360® mainframe. Fifty years later, virtualization has evolved to make it possible to share a single server among multiple operating systems and applications. This particular use, known as server virtualization, is transforming the data center, because a single physical machine can host typically 10 or more virtual machines (VMs), as shown in Figure 1. This virtualization makes the infrastructure more dynamic and more power-efficient, and therefore more economical.

Figure 1. Server Consolidation through Virtualization

the pages are all the same. This is useful when operating system code, application code, and constant data are identical across VMs. When pages are identical, they can be merged, freeing memory for use by other applications. Figure 2 illustrates memory sharing and shows the benefit of additional free memory when VMs share pages with identical content.

Figure 2. Memory sharing across VMs

Feature naming
The feature described in this article is very new, so its name has gone through some changes. You will find this Linux kernel feature referred to as either Kernel Shared Memory or Kernel Samepage Merging.

You will soon see that although memory sharing in Linux is advantageous in virtualized environments (KSM was originally designed for the Kernel-based Virtual Machine, KVM), it is also useful in non-virtualized environments. In fact, KSM is useful even in embedded Linux systems, which demonstrates the flexibility of the approach. Below, we explore this method of memory sharing in Linux and how to use it to increase a server's memory density, thereby increasing its capacity to host additional applications or VMs.

Support from other technologies

A recent advance in storage technology called de-duplication is a precursor to memory sharing in Linux and other hypervisors. De-duplication reduces stored data by removing redundant data (at the block level or in larger units, such as files). Common pieces of data are merged (in a copy-on-write [CoW] manner), freeing space for other uses. With this approach, storage costs less, because less storage is ultimately required. Given the current rate of data growth, this capability is very important.

KSM operation

KSM exists as a daemon in the kernel (called ksmd) that periodically scans pages, identifies duplicate pages, and merges the duplicates to free pages for other uses. KSM does this transparently to the user. For example, duplicate pages are merged (and then marked read-only), but if one of the page's users changes the page for some reason, that user receives its own copy (in CoW fashion). The full implementation of the KSM kernel module can be found in the kernel source at ./mm/ksm.c.

KSM relies on higher-level applications to indicate which memory regions are candidates for merging. Although KSM could simply scan all anonymous pages in the system, doing so would waste CPU and memory (considering the space required to manage the merging process). Therefore, an application registers the virtual regions that are likely to contain duplicate pages.

The KSM application programming interface (API) is implemented through the madvise system call (see Listing 1) and a new advice parameter, MADV_MERGEABLE, which indicates that the defined region is mergeable. A region can be removed from the mergeable state with the MADV_UNMERGEABLE parameter, which immediately unmerges any merged pages in that region. Note that removing a region through madvise can fail with an EAGAIN error, because the operation can run out of memory while unmerging, which could lead to even greater trouble (out-of-memory conditions).

Listing 1. The madvise system call

#include <sys/mman.h>

int madvise (void *start, size_t length, int advice);
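As a sketch of how an application might use the call in Listing 1, the code below maps an anonymous region, fills it with identical pages, registers it with MADV_MERGEABLE, and can later withdraw it with MADV_UNMERGEABLE. It assumes Linux with a C library that exposes these two constants (glibc does when _GNU_SOURCE is defined); the helper names ksm_register, ksm_unregister, and map_identical_pages are hypothetical. On a kernel built without KSM, madvise fails with EINVAL, which the helpers pass back as a negative value instead of treating it as fatal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: mark [addr, addr + len) as mergeable for KSM.
 * addr must be page-aligned. Returns 0 on success or -errno on failure
 * (EINVAL usually means the kernel was built without KSM). */
static int ksm_register(void *addr, size_t len)
{
    return madvise(addr, len, MADV_MERGEABLE) == 0 ? 0 : -errno;
}

/* Hypothetical helper: withdraw the region again. Any merged pages are
 * unmerged immediately, so this can fail with EAGAIN if memory is short. */
static int ksm_unregister(void *addr, size_t len)
{
    return madvise(addr, len, MADV_UNMERGEABLE) == 0 ? 0 : -errno;
}

/* Map npages anonymous pages, fill them with the same byte so that every
 * page is identical (ideal merge candidates), and register them with KSM.
 * Returns the mapping (NULL if mmap fails); *rc receives the madvise result. */
static void *map_identical_pages(size_t npages, int *rc)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0 && rc != NULL);
    size_t len = npages * (size_t)pagesz;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memset(p, 0x5a, len);          /* same contents in every page */
    *rc = ksm_register(p, len);
    return p;
}
```

In practice the mapping stays alive while ksmd scans it (merging only happens while /sys/kernel/mm/ksm/run is 1); ksm_unregister and munmap release it afterwards.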

Once a region is marked as mergeable, KSM adds it to its list of working memory. When KSM is enabled, it searches for identical pages, keeps one page in write-protected CoW fashion, and frees the other for other uses.

The approach KSM uses differs from the one used in storage de-duplication. In traditional de-duplication, objects are hashed, and the hash value is used for an initial similarity check. When the hashes match, the next step is an actual comparison of the objects (in this case, a memory comparison) to determine definitively whether they are identical. KSM used this approach in its first implementation, but later adopted a more straightforward approach that simplifies it.
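For contrast, the traditional hash-then-compare step can be sketched for fixed-size blocks as follows. This is not KSM code: the 4096-byte block size, the choice of 32-bit FNV-1a as the hash, and the names block_hash and find_duplicate are assumptions for illustration. A hash match is only a hint, since different blocks can share a hash value, so memcmp makes the final decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096  /* assumed block size */

/* 32-bit FNV-1a hash over one block (an illustrative choice of hash). */
static uint32_t block_hash(const unsigned char *blk)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        h ^= blk[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the index of an earlier block identical to blocks[idx], or -1.
 * hashes[i] must already hold block_hash(blocks[i]) for every i < idx. */
static long find_duplicate(const unsigned char *const blocks[],
                           const uint32_t hashes[], size_t idx)
{
    assert(blocks != NULL && hashes != NULL);
    uint32_t h = block_hash(blocks[idx]);
    for (size_t i = 0; i < idx; i++) {
        if (hashes[i] != h)
            continue;                      /* cheap filter: hashes differ */
        if (memcmp(blocks[i], blocks[idx], BLOCK_SIZE) == 0)
            return (long)i;                /* confirmed by full comparison */
    }
    return -1;
}
```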

In the current KSM, pages are managed by two red-black trees, one of which is temporary. The first tree, called the unstable tree, stores new pages that are not yet known to be stable. In other words, pages that are candidates for merging (unchanged for some period of time) are stored in the unstable tree. Pages in the unstable tree are not write-protected. The second tree, called the stable tree, stores pages that have been found to be stable and have been merged by KSM. To determine whether a page is volatile, KSM uses a simple 32-bit checksum. When a page is scanned, its checksum is computed and stored with the page. On a subsequent scan, if the newly computed checksum differs from the previously stored one, the page is changing and is therefore not a good candidate for merging.
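The checksum test for volatile pages might look like the sketch below. The kernel's actual checksum function is not reproduced here; an Adler-32-style sum stands in for it, and the struct page_info bookkeeping and page_is_stable are hypothetical names. A page counts as a merge candidate only when its checksum is unchanged since the previous scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096  /* assumed page size */

/* Hypothetical per-page bookkeeping: the checksum seen on the last scan. */
struct page_info {
    const unsigned char *data;
    uint32_t old_checksum;
    bool has_checksum;   /* false until the page has been scanned once */
};

/* Stand-in 32-bit checksum (Adler-32 style), not the kernel's function. */
static uint32_t page_checksum(const unsigned char *p)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        a = (a + p[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Called once per scan. Returns true only if the page is unchanged since
 * the previous scan, i.e. it is a merge candidate. The new checksum is
 * always stored for the next comparison. */
static bool page_is_stable(struct page_info *pi)
{
    assert(pi != NULL && pi->data != NULL);
    uint32_t sum = page_checksum(pi->data);
    bool stable = pi->has_checksum && sum == pi->old_checksum;
    pi->old_checksum = sum;
    pi->has_checksum = true;
    return stable;
}
```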

When KSM processes a single page, the first step is to check whether the page can be found in the stable tree. Searching the tree is interesting because each page is treated as one very large number (its contents). A memcmp (memory compare) operation is performed between the candidate page and the page at the current node. If memcmp returns 0, the pages are identical and a match has been found. If memcmp returns a negative value, the candidate page is less than the current node's page; if it returns a positive value, the candidate page is greater. Although comparing 4KB pages seems like a heavyweight operation, in most cases memcmp returns early as soon as it finds a difference. See Figure 3 for a visual representation of this process.
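The search in Figure 3 can be sketched with an ordinary binary search tree keyed by page contents. To keep it short, this sketch assumes a plain unbalanced tree rather than the kernel's red-black tree, a 4096-byte page, and hypothetical names (struct page_node, tree_search, tree_insert). Only the sign of memcmp's result is used to choose a branch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096  /* assumed page size */

/* Hypothetical tree node, ordered by page contents via memcmp. */
struct page_node {
    const unsigned char *page;
    struct page_node *left, *right;
};

/* Walk the tree treating each page as one large number. Returns the node
 * whose page is identical to cand, or NULL if there is none. */
static struct page_node *tree_search(struct page_node *root,
                                     const unsigned char *cand)
{
    while (root != NULL) {
        int ret = memcmp(cand, root->page, PAGE_BYTES);
        if (ret == 0)
            return root;                            /* identical page */
        root = (ret < 0) ? root->left : root->right; /* sign only */
    }
    return NULL;
}

/* Link node (with NULL children) at its ordered position. If an identical
 * page is already present, that existing node is returned instead. */
static struct page_node *tree_insert(struct page_node **link,
                                     struct page_node *node)
{
    assert(node != NULL && node->left == NULL && node->right == NULL);
    while (*link != NULL) {
        int ret = memcmp(node->page, (*link)->page, PAGE_BYTES);
        if (ret == 0)
            return *link;
        link = (ret < 0) ? &(*link)->left : &(*link)->right;
    }
    *link = node;
    return node;
}
```

A balanced tree, as the kernel uses, keeps the number of memcmp calls per lookup logarithmic even when pages arrive in sorted order; the unbalanced version here does not.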

Figure 3. Search process for pages in a tree

If the candidate page is found in the stable tree, the pages are merged and the candidate page is freed. This code is in ksm.c/stable_tree_search() (called from ksm.c/cmp_and_merge_page()). Otherwise, if no match is found, the search moves on to the unstable tree (see ksm.c/unstable_tree_search()).

When searching the unstable tree, the first step is to recompute the page's checksum. If the value differs from the stored checksum, the page is excluded from further searching during this scan (because it has changed and is not worth tracking). If the checksum has not changed, the unstable tree is searched for the candidate page. Processing of the unstable tree differs from that of the stable tree in a few ways. First, if the search code does not find the page in the unstable tree, it adds a new node for the page to the unstable tree. However, if the page is found in the unstable tree, the pages are merged and the node is migrated to the stable tree.
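The decision sequence described above (stable tree first, then the checksum test, then the unstable tree) can be sketched as follows. This is a simplification, not the code of ksm.c/cmp_and_merge_page(): linear arrays stand in for the two red-black trees, "merging" is only reported rather than performed, and the caller is assumed to pass in the checksum computed for the page in the current scan. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096  /* assumed page size */
#define MAX_PAGES 64     /* assumed capacity of each "tree" */

/* Hypothetical outcome of processing one page. */
enum merge_result { MERGED_STABLE, MERGED_UNSTABLE, INSERTED_UNSTABLE, SKIPPED_VOLATILE };

/* Simple arrays stand in for the stable and unstable red-black trees. */
struct page_set { const unsigned char *pages[MAX_PAGES]; size_t n; };

static const unsigned char *set_find(const struct page_set *s, const unsigned char *p)
{
    for (size_t i = 0; i < s->n; i++)
        if (memcmp(s->pages[i], p, PAGE_BYTES) == 0)
            return s->pages[i];
    return NULL;
}

static void set_add(struct page_set *s, const unsigned char *p)
{
    assert(s->n < MAX_PAGES);
    s->pages[s->n++] = p;
}

static void set_remove(struct page_set *s, const unsigned char *p)
{
    for (size_t i = 0; i < s->n; i++)
        if (s->pages[i] == p) {
            s->pages[i] = s->pages[--s->n];
            return;
        }
}

/* Process one page. *old_sum holds its checksum from the previous scan;
 * sum is the checksum computed for it in this scan. */
static enum merge_result process_page(struct page_set *stable, struct page_set *unstable,
                                      const unsigned char *page, uint32_t *old_sum, uint32_t sum)
{
    if (set_find(stable, page) != NULL)
        return MERGED_STABLE;              /* 1. match in stable tree: merge */
    if (sum != *old_sum) {
        *old_sum = sum;                    /* 2. page changed: skip this scan */
        return SKIPPED_VOLATILE;
    }
    const unsigned char *match = set_find(unstable, page);
    if (match != NULL) {                   /* 3. match in unstable tree: merge */
        set_remove(unstable, match);       /*    and migrate it to stable tree */
        set_add(stable, match);
        return MERGED_UNSTABLE;
    }
    set_add(unstable, page);               /* 4. no match: track the page */
    return INSERTED_UNSTABLE;
}
```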

When a scan completes (performed by ksm.c/ksm_do_scan()), the stable tree is kept, but the unstable tree is discarded and rebuilt on the next scan. This greatly simplifies matters, because the organization of the unstable tree can change as its pages change (recall that pages in the unstable tree are not write-protected). Because all pages in the stable tree are write-protected, an attempt to write to one of them generates a page fault, allowing the CoW process to unmerge the page for the writer (see ksm.c/break_cow()). Orphaned pages in the stable tree are removed later (unless two or more users of the page still exist, indicating that the page is still shared).
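The copy-on-write break can be illustrated with a reference-counted page. This sketch is not ksm.c/break_cow(): in the kernel, the write fault and the copy are handled by the memory-management code, whereas here a hypothetical cow_write() performs the copy explicitly when a shared page is written. The struct shared_page type and the 4096-byte page size are also assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096  /* assumed page size */

/* Hypothetical merged page: one write-protected copy shared by refs users. */
struct shared_page {
    unsigned char data[PAGE_BYTES];
    int refs;
};

/* Write val at off on behalf of the user holding *slot. If the page is
 * shared, that user first gets a private copy (the CoW break) and the
 * shared page loses one reference. Returns 0, or -1 if allocation fails. */
static int cow_write(struct shared_page **slot, size_t off, unsigned char val)
{
    struct shared_page *pg = *slot;
    assert(pg != NULL && off < PAGE_BYTES && pg->refs >= 1);
    if (pg->refs > 1) {
        struct shared_page *copy = malloc(sizeof *copy);
        if (copy == NULL)
            return -1;
        memcpy(copy->data, pg->data, PAGE_BYTES);
        copy->refs = 1;
        pg->refs--;          /* writer no longer shares the merged page */
        *slot = copy;
        pg = copy;
    }
    pg->data[off] = val;     /* sole owner: write in place */
    return 0;
}
```

Once the reference count drops to one, the remaining user's writes go straight to the page without copying.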

As mentioned earlier, KSM manages pages with red-black trees to support fast lookups. Linux provides the red-black tree as a reusable data structure that is used extensively in the kernel; the Completely Fair Scheduler (CFS), for example, uses one to store tasks in time order. You can find this red-black tree implementation in ./lib/rbtree.c.

KSM Configuration and Monitoring

KSM is managed and monitored through sysfs (rooted at /sys/kernel/mm/ksm). In this sysfs subdirectory you will find several files, some for control and others for monitoring.

The first file, run, enables and disables KSM page merging. KSM is disabled by default (0), but you can enable the KSM daemon by writing a 1 to this file (for example, echo 1 > /sys/kernel/mm/ksm/run). Writing a 0 stops the daemon but keeps the current set of merged pages. Writing a 2 stops KSM if it is running (1) and requests that all merged pages be unmerged.

While KSM is running, it can be tuned through three parameters (files in sysfs). The sleep_millisecs file defines how many milliseconds ksmd sleeps before performing another page scan. The max_kernel_pages file defines the maximum number of pages that ksmd can use (the default is 25% of available memory, but writing a 0 removes the limit). Finally, the pages_to_scan file defines how many pages are scanned in a given pass. Any user can read these files, but root privileges are required to modify them.
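Because the control files are ordinary text files, a program can drive them with standard file I/O. In the sketch below, only the directory and file names come from the text; the helpers ksm_write_param and ksm_set_run are hypothetical. The directory is a parameter so the helpers can be exercised against a scratch directory; writing to the real /sys/kernel/mm/ksm requires root.

```c
#include <assert.h>
#include <stdio.h>

#define KSM_SYSFS_DIR "/sys/kernel/mm/ksm"  /* location given in the text */

/* Hypothetical helper: write an integer to a KSM control file such as
 * "run", "sleep_millisecs" or "pages_to_scan" under dir.
 * Returns 0 on success, -1 on any failure. */
static int ksm_write_param(const char *dir, const char *name, long value)
{
    assert(dir != NULL && name != NULL);
    char path[256];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                    /* path would be truncated */
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;                       /* sysfs may reject the value on close */
    return ok ? 0 : -1;
}

/* run accepts 0 (stop, keep merged pages), 1 (run ksmd) and
 * 2 (stop and unmerge all merged pages); anything else is refused. */
static int ksm_set_run(const char *dir, int mode)
{
    if (mode < 0 || mode > 2)
        return -1;
    return ksm_write_param(dir, "run", mode);
}
```

Run as root, ksm_set_run(KSM_SYSFS_DIR, 1) has the same effect as echo 1 > /sys/kernel/mm/ksm/run, and ksm_write_param(KSM_SYSFS_DIR, "sleep_millisecs", 20) would set the pause between scans to 20 ms.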

Five read-only monitoring files are also exported through sysfs; they indicate ksmd's activity and effectiveness. The full_scans file indicates how many full scans of the mergeable areas have been performed. The remaining four files report KSM's page-level statistics:

pages_shared: The number of unswappable kernel pages that KSM is using.
pages_sharing: An indication of the memory savings.
pages_unshared: The number of unique pages that are repeatedly checked for merging.
pages_volatile: The number of pages that change too often to be placed in a tree.
The KSM author defines a high pages_sharing/pages_shared ratio as an indication of efficient page sharing (conversely, a low ratio indicates wasted effort).
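The monitoring counters can be read the same way. This sketch assumes each counter file holds a single unsigned decimal number; the helper names ksm_read_stat and ksm_sharing_ratio are hypothetical. The ratio function computes the pages_sharing/pages_shared figure that the KSM author uses as the measure of sharing efficiency.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one counter (e.g. "pages_shared") from dir,
 * normally /sys/kernel/mm/ksm. Returns 0 and stores the value, or -1. */
static int ksm_read_stat(const char *dir, const char *name, unsigned long *out)
{
    assert(dir != NULL && name != NULL && out != NULL);
    char path[256];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%lu", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* pages_sharing / pages_shared: a higher value means that, on average,
 * each KSM page is shared by more users. Returns 0.0 if nothing is shared. */
static double ksm_sharing_ratio(unsigned long pages_sharing,
                                unsigned long pages_shared)
{
    return pages_shared == 0 ? 0.0
                             : (double)pages_sharing / (double)pages_shared;
}
```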

Conclusion

Linux is not the only hypervisor that uses page sharing to improve memory efficiency, but it is unique in implementing it as an operating system feature. VMware's ESX Server hypervisor calls this feature Transparent Page Sharing (TPS), while Xen calls it Memory CoW. Whatever the name and implementation, the feature provides better memory utilization, allowing the operating system (here, the KVM hypervisor) to overcommit memory and support more applications or VMs. You can find KSM, along with many other interesting features, in the 2.6.32 Linux kernel.

