Implementation of CpuMemSets in Linux

I. Preface

The Non-Uniform Memory Access (NUMA) architecture is the main branch of the Distributed Shared Memory (DSM) architecture. It combines distributed memory technology with single system image (SSI) technology, achieving a compromise between the programmability of SMP systems and the scalability of MPP systems, and has become one of the mainstream architectures for today's high-performance servers. Well-known server manufacturers have successively launched high-performance servers based on the NUMA architecture, such as HP's Superdome, SGI's Altix 3000 and Origin 3000, IBM's x440, NEC's TX7, and AMD's Opteron.

As NUMA-architecture high-performance servers have gained adoption, system software has been extensively optimized for the features of this distributed shared memory architecture in the scheduler, memory management, and user-level interfaces. For example, the SGI Origin 3000 ccNUMA system has been widely used in many fields and is a very successful system. To optimize its performance, SGI's IRIX operating system implements CpuMemSets on it: by binding applications to processors and memory, it exploits the advantage of NUMA's fast local memory access. The Linux community has also implemented CpuMemSets in its NUMA project, and it has been deployed on SGI's Altix 3000 server.

In this paper, we take SGI ProPack v2.2 as the research object and analyze the implementation of CpuMemSets in Linux 2.4.20. CpuMemSets is an open source project from SGI. It consists of four parts: patches for the Linux 2.4 kernel, a user library, Python modules, and the runon command. Together these implement partitioning of processors and memory blocks, control the distribution of system resources (processors and memory blocks) to the kernel, tasks, and virtual memory areas, and provide support for dplace, runon, and other NUMA tools that optimize NUMA performance on Linux.
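CpuMemSets itself exposed its bindings through a C library and the runon command, which are not part of modern Python or mainline Linux APIs. As a minimal sketch of the processor-binding half of the idea, the following uses `os.sched_setaffinity` (a standard wrapper over the Linux `sched_setaffinity(2)` system call, which mainline Linux later adopted for the same purpose); the memory-binding half corresponds to the `mbind(2)`/`set_mempolicy(2)` calls, which have no stdlib wrapper. This is an illustration of the concept, not the CpuMemSets API itself.

```python
import os

def pin_to_cpus(cpus):
    """Restrict the calling process to the given set of CPU ids.

    Analogous in spirit to what the CpuMemSets user library and the
    runon command did for tasks: the scheduler will only place this
    process on the listed processors from now on.
    """
    os.sched_setaffinity(0, cpus)       # pid 0 means "the calling process"
    return os.sched_getaffinity(0)      # read back the effective CPU mask

if __name__ == "__main__":
    # Bind the current process to CPU 0 only.
    print(pin_to_cpus({0}))
```

Restricting memory placement alongside the CPU mask is what distinguished CpuMemSets from plain processor affinity; on modern Linux the equivalent pairing is `taskset`/`sched_setaffinity` plus `numactl`/`mbind`.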

II. Related Work

Partitioning technology was first deployed on mainframes and is now widely used in the server field. It supports running multiple instances of one operating system, or instances of several different operating systems, on a single server. Its main features are machine independence, reliable fault barriers, and single-point management. With partitioning, workloads that would otherwise be spread across multiple servers can run simultaneously on one server in a single location, which is preferable to distributing multiple servers across an organization to support different operating systems, and thus enables effective server consolidation. A server that supports partitioning can, for example, act as an application server running Windows for the marketing department while simultaneously running Linux for the engineering department; a development group can test another operating system in a separate partition while most users run Windows 2000 Advanced Server; or all nodes can run in a single operating system environment. Partitioning implementations differ mainly in the fault-isolation method (hardware or software), the granularity of partitioned resources, the flexibility and virtualization of partition resources, and support for dynamic partition reconfiguration. Typical examples include IBM's LPAR and DLPAR (AIX 5L 5.1), HP's nPartitions and vPartitions (HP-UX 11i), Sun's Dynamic Domains (Solaris 8), and Compaq's AlphaServer partitions (Tru64 Unix 5.1). However, partitioning conflicts with the single-system-image advantage of a NUMA system.

From the user's perspective, a NUMA system provides transparent access to local and remote main memory. From the performance perspective, however, because memory modules are physically distributed across different nodes, memory access latency is non-uniform, which strongly affects system performance: the latency of accessing a remote node's memory is generally one to two orders of magnitude higher than local access latency. Page migration and page replication are among the main methods for dynamically optimizing data locality. In essence they are prediction techniques: based on collected access information, the system predicts future accesses to a page and decides whether to migrate or replicate it. Appropriate page replication and migration policies can reduce cache capacity and conflict misses, compensate for the gap between remote and local access latency, and optimize NUMA system performance. However, most existing page migration and replication policies rely heavily on the architecture and on special hardware support, resulting in high overhead and poor portability.
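The prediction step described above is often realized with per-page access counters. The following is a minimal sketch of one common counter-based policy from the literature (not the specific SGI implementation): count each node's accesses to a page and migrate the page once some remote node outpaces the home node by a threshold. The class name and the `MIGRATE_THRESHOLD` tuning knob are hypothetical.

```python
from collections import defaultdict

MIGRATE_THRESHOLD = 8   # hypothetical tuning knob: remote-access lead that triggers migration

class PageCounters:
    """Counter-based page migration policy (a common textbook scheme)."""

    def __init__(self, home_node):
        self.home = home_node
        self.counts = defaultdict(int)   # per-node access counts for this page

    def record_access(self, node):
        """Record one access from `node`; return True if the page migrates."""
        self.counts[node] += 1
        # Find the most frequent remote accessor, if any.
        remote = max((n for n in self.counts if n != self.home),
                     key=lambda n: self.counts[n], default=None)
        if remote is not None and \
           self.counts[remote] - self.counts[self.home] >= MIGRATE_THRESHOLD:
            self.home = remote     # "migrate" the page to the hot remote node
            self.counts.clear()    # restart the observation window
            return True
        return False
```

The hardware dependence mentioned in the text enters here: without special support, obtaining the per-node access counts that `record_access` consumes is itself expensive, which is why such policies often carry high overhead.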

In a NUMA multiprocessor system, a task can run on any processor, but in various situations its execution is interrupted. If an interrupted task is resumed on a different processor, it loses the data it had accumulated in its original processor's cache. Accessing cached data takes only a few nanoseconds, while accessing main memory takes roughly 50 nanoseconds; the processor therefore runs at main-memory speed until the task has run long enough for its working set to refill the new processor's cache. To mitigate this, the system can use a cache-affinity scheduling policy on each node: record the processor on which a task last executed, and when resuming an interrupted task, try to resume it on that same processor. However, because applications differ in their characteristics and working sets change dynamically, processor-affinity scheduling has limited effect.
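The cost of losing cache affinity can be made concrete with the standard average-access-time formula, using the article's rough figures (a few nanoseconds for cache, about 50 ns for main memory); the specific hit rates below are illustrative assumptions, not measurements.

```python
def avg_access_ns(hit_rate, t_cache=2.0, t_mem=50.0):
    """Average memory access time: hit_rate * t_cache + miss_rate * t_mem.

    t_cache and t_mem are the article's rough latencies in nanoseconds.
    """
    return hit_rate * t_cache + (1.0 - hit_rate) * t_mem

warm = avg_access_ns(0.95)   # task resumed on its previous processor (cache still warm)
cold = avg_access_ns(0.10)   # task migrated to another processor (cache must refill)
# warm is a few ns per access; cold is close to raw main-memory latency,
# which is why affinity scheduling tries to keep a task on its last processor.
```

As the task runs on the new processor, the hit rate climbs back toward the warm value, which is the "refill" period the text describes.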

Users are both the operators of the system and the reviewers of its performance; they understand their applications' requirements and evaluation criteria best. In a large NUMA system, users often want to dedicate a subset of the processors and memory to particular applications. CpuMemSets gives users this flexible control (the system's processors and memory can be divided into possibly overlapping sets), lets multiple processes continue to see the system as a single system image, and guarantees, without rebooting, that certain processor and memory resources are allocated to a specified application at specified times. It is thus a useful complement to partitioning, page migration, and affinity scheduling policies.