[Repost] An introduction to cgroups for Linux resource management

Source: Internet
Author: User

Original: http://tech.meituan.com/cgroups.html

Introduction

Cgroups is a mechanism provided by the Linux kernel for limiting the resources used by a single process or a group of processes, with fine-grained control over resources such as CPU and memory. Docker, the increasingly popular lightweight container technology, uses the resource-limiting capabilities of cgroups to control CPU, memory, and other resources.

Developers can also use this fine-grained control directly to limit the resource usage of one process or a group of processes. For example, on an eight-core server that hosts both front-end web services and back-end compute modules, you can use cgroups to restrict the web server to six of the cores and leave the remaining two to the back-end compute modules.
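As a rough sketch of the eight-core example, the shell function below confines a process to a range of cores through the cpuset subsystem by writing to its control files. The function name pin_to_cores, the group name web, and the mount point /cgroup/cpuset are assumptions for illustration, and the sketch assumes a cgroup v1 cpuset hierarchy on a machine where memory node 0 exists.

```shell
# Sketch: confine one process to a set of CPU cores via the cpuset subsystem.
# $1 = mount point of the cpuset hierarchy (assumed, e.g. /cgroup/cpuset)
# $2 = group name, $3 = core list such as 0-5, $4 = PID to confine
pin_to_cores() {
    mkdir -p "$1/$2" || return 1
    echo "$3" > "$1/$2/cpuset.cpus" || return 1
    # cpuset also needs a memory node before tasks can join (node 0 assumed)
    echo 0 > "$1/$2/cpuset.mems" || return 1
    echo "$4" > "$1/$2/tasks"
}
```

On a real system this would be run as root, for example pin_to_cores /cgroup/cpuset web 0-5 "$WEB_PID", and the back-end modules could be confined to cores 6-7 in the same way.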

This article describes the principles and usage of cgroups in four parts:

    1. Concepts and principles of cgroups
    2. Concepts and principles of the cgroups file system
    3. How to use cgroups
    4. Examples of cgroups in practice

Concepts and principles

Cgroups subsystems

Cgroups is short for control groups. Cgroups defines a subsystem for each resource that can be controlled. The typical subsystems are:

    1. The cpu subsystem limits the CPU usage of processes.
    2. The cpuacct subsystem reports the CPU usage of the processes in a cgroup.
    3. The cpuset subsystem assigns specific CPU nodes or memory nodes to the processes in a cgroup.
    4. The memory subsystem limits the memory usage of processes.
    5. The blkio subsystem limits the block device I/O of processes.
    6. The devices subsystem controls which devices the processes in a cgroup can access.
    7. The net_cls subsystem tags the network packets of processes in a cgroup so that the tc (traffic control) module can then act on those packets.
    8. The freezer subsystem suspends or resumes the processes in a cgroup.
    9. The ns subsystem lets processes in different cgroups use different namespaces.

Each subsystem works together with other kernel modules to enforce its limits. For example, CPU limits are enforced by the process scheduler according to the configuration of the cpu subsystem, memory limits are enforced by the memory management module according to the configuration of the memory subsystem, and control of network packets relies on the traffic control module. This article does not discuss how the kernel uses each subsystem to enforce resource limits. Instead, it focuses on how the kernel organizes cgroups configuration to restrict resources, how the kernel associates that configuration with processes, and how the kernel exposes cgroups functionality to user space through the cgroups file system.

Cgroups hierarchy (Hierarchy)

The kernel uses the cgroup struct to represent the resource limits that one control group places on one or several cgroups subsystems. Cgroup structs are organized into trees, and each tree of cgroup structs is called a cgroups hierarchy. A cgroups hierarchy can have one or several cgroups subsystems attached to it, and it can limit only the resources of its attached subsystems. Each cgroups subsystem can be attached to only one cgroups hierarchy.

For example, consider two cgroups hierarchies. Each hierarchy is a tree, and each node of the tree is a cgroup struct (such as CPU_CGRP or MEMORY_CGRP). The first hierarchy has the cpu and cpuacct subsystems attached, so the cgroup structs in it can limit CPU resources and account for the CPU usage of processes. The second hierarchy has the memory subsystem attached, so the cgroup structs in it can limit memory resources.

Within a cgroups hierarchy, each node (cgroup struct) can be given a different limit or weight for the resource. For example, processes in the CGRP1 group can use 60% of the CPU time slices, while processes in the CGRP2 group can use 20%.
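One way such percentages can arise is through relative weights. The sketch below assumes the weights are expressed as cpu.shares values on sibling nodes and that every group is busy, since shares only matter when CPU time is contended. The share values 600 and 200, the third sibling, and the function name cpu_share_percent are hypothetical.

```shell
# Sketch: percentage of CPU time a group gets from its cpu.shares weight
# when all sibling groups are competing for the CPU.
# $1 = the group's own shares; remaining arguments = shares of its siblings
cpu_share_percent() {
    own=$1
    total=0
    for s in "$@"; do               # "$@" includes the group's own shares
        total=$((total + s))
    done
    echo $((own * 100 / total))
}

cpu_share_percent 600 200 200       # a group weighted 600 against two of 200
cpu_share_percent 200 600 200       # one of the groups weighted 200
```

With these assumed weights, the first call corresponds to the 60% group and the second to the 20% group.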

Cgroups and processes

The previous sections explained that the kernel uses cgroups subsystems to limit system resources, and that a subsystem must be attached to a cgroups hierarchy before it can control the resources of processes. This section focuses on how the kernel links processes to cgroups hierarchies.

After you create a node (cgroup struct) in a cgroups hierarchy, you can add processes to that node's task list. All processes in a node's task list are subject to the resource limits of that node. A process can also be added to nodes in different cgroups hierarchies, because different hierarchies can be responsible for different system resources. Processes and cgroup structs therefore have a many-to-many relationship.

The diagram in the original article depicts the overall relationship between processes and cgroups. The P at the bottom represents a process. The descriptor of each process holds a pointer to an auxiliary data structure, css_set (cgroups subsystem set). A process that points to a css_set is added to that css_set's list of processes. A process can belong to only one css_set, one css_set can contain multiple processes, and processes that belong to the same css_set are subject to the same resource limits.

The "M×N linkage" in the diagram indicates that css_set is associated many-to-many with cgroups nodes through an auxiliary data structure. However, the cgroups implementation does not allow a css_set to be associated with multiple nodes in the same cgroups hierarchy at the same time, because cgroups does not allow multiple limit configurations for the same resource.

When a css_set is associated with nodes in multiple cgroups hierarchies, it means that multiple resources need to be controlled for the processes in that css_set. When a cgroups node is associated with more than one css_set, it means that the processes in all of those css_sets are subject to the same limit on the same resource.
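From user space, the rule that a process sits on exactly one node per hierarchy can be observed in /proc/<pid>/cgroup, where each line has the form hierarchy-ID:subsystem-list:cgroup-path. The helper below is a small sketch that looks up the node for one subsystem. The function name cgroup_of is made up for illustration.

```shell
# Sketch: print the cgroup node a process occupies for a given subsystem.
# $1 = subsystem name (e.g. cpu)
# $2 = file in /proc/<pid>/cgroup format (defaults to /proc/self/cgroup)
cgroup_of() {
    awk -F: -v want="$1" '
        {
            n = split($2, subs, ",")    # one hierarchy may carry several subsystems
            for (i = 1; i <= n; i++)
                if (subs[i] == want) {
                    # everything after the second colon is the node path
                    print substr($0, length($1) + length($2) + 3)
                    exit
                }
        }
    ' "${2:-/proc/self/cgroup}"
}
```

For a process whose file contains the line 4:cpu,cpuacct:/test, both cgroup_of cpu and cgroup_of cpuacct print /test, because the two subsystems share one hierarchy and therefore one node.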

Cgroups File System

Linux implements cgroups configuration, and the association between processes and cgroups nodes, with a variety of kernel data structures. How, then, does Linux let user-space processes use cgroups? The Linux kernel has a powerful module called VFS (Virtual File System). VFS hides the details of specific file systems and provides a unified file system API to user-space processes. Cgroups also exposes its functionality to user space through VFS, and the interface between cgroups and VFS is called the cgroups file system. The following sections first cover the basics of VFS and then the implementation of the cgroups file system.

VFS

VFS is a kernel abstraction layer that hides the implementation details of specific file systems and provides a unified API to user-space processes. VFS defines a generic file system design. Any specific file system that implements the interfaces VFS defines can register with VFS, after which the kernel can read and write that file system. This resembles the relationship between an abstract class and its subclasses in object-oriented design: the abstract class defines the external interface, and the subclasses provide the concrete implementation. In fact, VFS itself is a set of object-oriented interfaces implemented in C.

Generic file model

The VFS generic file model contains the following four kinds of metadata structures:

    1. The superblock object holds information about a registered file system, such as basic disk file systems like ext2 and ext3, the socket file system used to read and write sockets, and the cgroups file system used to read and write cgroups configuration.

    2. The index node object (inode object) holds information about a specific file. For an ordinary disk file system, the inode usually stores information such as the file's storage blocks on disk. For the socket file system, the inode stores the properties of the socket. For a special file system such as cgroups, the inode stores properties related to the cgroup node. An important part of the inode is a struct called inode_operations, which defines how operations such as creating and deleting files are implemented in a specific file system.

    3. The file object represents a file opened by a process and is stored in the file descriptor table of the process. An important part of the file object is a struct called file_operations, which describes the read and write implementations of a specific file system. When a process performs a read or write on a file descriptor, it is actually calling the methods defined in file_operations. For an ordinary disk file system, file_operations defines ordinary block device reads and writes. For the socket file system, it defines socket operations such as send and recv. For a special file system such as cgroups, it defines the operations that manipulate cgroup structs.

    4. The directory entry object (dentry object). When the kernel looks up a file by path, it creates a directory entry object for each component of the path. A directory entry object leads to the corresponding inode object, and directory entries are generally cached, which speeds up kernel lookups.

Implementation of the Cgroups file system

A file system built on VFS must implement the objects defined by the VFS generic file model, along with some of the functions those objects define. The cgroups file system is no exception. The following looks at how cgroups defines these objects.

First, the structure that defines the cgroups file system type:

static struct file_system_type cgroup_fs_type = {
        .name = "cgroup",
        .mount = cgroup_mount,
        .kill_sb = cgroup_kill_sb,
};

The two functions defined here, cgroup_mount and cgroup_kill_sb, mount and unmount a cgroups file system, respectively. Each time a cgroups subsystem is mounted at a mount point, cgroup_mount is called. It creates a cgroups_root (the root of a cgroups hierarchy) and wraps it in a superblock object.
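A file_system_type like this one is what a file system registers with VFS. One way to see the result from user space is /proc/filesystems, which lists the file system types the running kernel knows about. On a kernel with cgroups enabled it includes a cgroup entry that matches the .name field above. A minimal sketch follows, with the helper name fs_is_registered chosen for illustration.

```shell
# Sketch: succeed (exit status 0) if a file system type is registered.
# $1 = type name (e.g. cgroup)
# $2 = file in /proc/filesystems format (defaults to /proc/filesystems)
fs_is_registered() {
    awk -v want="$1" '
        $NF == want { found = 1 }   # the type name is the last field on each line
        END { exit !found }
    ' "${2:-/proc/filesystems}"
}
```

A typical use would be: fs_is_registered cgroup && echo "cgroup file system available".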

Next, the operations defined for the cgroups superblock object:

static const struct super_operations cgroup_ops = {
        .statfs = simple_statfs,
        .drop_inode = generic_delete_inode,
        .show_options = cgroup_show_options,
        .remount_fs = cgroup_remount,
};

Only some of the functions are implemented here, because a particular file system may support only a subset of the operations defined in super_operations. The same is true of file operations: a file on a block device certainly supports seek operations such as fseek to find a position, but a special file system such as the socket file system or cgroups may not support such an operation.

Next, a brief look at the implementation functions that the cgroups file system defines for the inode object and the file object:

static const struct inode_operations cgroup_dir_inode_operations = {
        .lookup = cgroup_lookup,
        .mkdir = cgroup_mkdir,
        .rmdir = cgroup_rmdir,
        .rename = cgroup_rename,
};

static const struct file_operations cgroup_file_operations = {
        .read = cgroup_file_read,
        .write = cgroup_file_write,
        .llseek = generic_file_llseek,
        .open = cgroup_file_open,
        .release = cgroup_file_release,
};

This article does not go into the implementations of these functions. From the code, though, it can be inferred that cgroups implements the VFS generic file model and hides the details of maintaining the cgroups hierarchy inside these implementation functions of the cgroups file system.

Seen from the other side, VFS translates a user's operations on the cgroups file system into maintenance of the cgroups hierarchy. In this way, the kernel exposes the functionality of cgroups to user-space processes.

How to use cgroups

Mounting the cgroups file system

On Linux, you can mount the cgroups file system with the mount command, in the form mount -t cgroup -o subsystems name /cgroup/name, where subsystems is the list of cgroups subsystems to mount and /cgroup/name is the mount point. As mentioned above, this command also creates a cgroups hierarchy in the kernel.

For example, to mount the four subsystems cpuset, cpu, cpuacct, and memory at the /cgroup/cpu_and_mem directory, you can use mount -t cgroup -o cpuset,cpu,cpuacct,memory cpu_and_mem /cgroup/cpu_and_mem

On CentOS, after you install the cgroups tools with yum install libcgroup, the mount points of the cgroups subsystems are generated automatically in the /etc/cgconfig.conf file:

mount {
    cpuset    = /cgroup/cpuset;
    cpu       = /cgroup/cpu;
    cpuacct   = /cgroup/cpuacct;
    memory    = /cgroup/memory;
    devices   = /cgroup/devices;
    freezer   = /cgroup/freezer;
    net_cls   = /cgroup/net_cls;
    blkio     = /cgroup/blkio;
}

Each entry above is equivalent to a mount command. For example, the cpuset entry expands to mount -t cgroup -o cpuset cpuset /cgroup/cpuset. With this configuration, the subsystems are mounted at their mount points automatically when the system starts.
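The mapping from configuration entries to mount commands can be sketched with a small filter. This is only an illustration of the equivalence described above, not how libcgroup itself processes the file. The function name cgconfig_to_mounts is made up, and the sketch assumes one entry per line inside the mount block.

```shell
# Sketch: turn the entries of a cgconfig.conf "mount { ... }" section
# (read from stdin, one entry per line) into the equivalent mount commands.
# The commands are only printed, not executed.
cgconfig_to_mounts() {
    awk '
        /^[ \t]*mount[ \t]*[{]/ { in_mount = 1; next }
        in_mount && /[}]/       { in_mount = 0; next }
        in_mount && /=/ {
            gsub(/[ \t;]/, "")      # drop whitespace and the trailing semicolon
            split($0, kv, "=")      # kv[1] = subsystem, kv[2] = mount point
            printf "mount -t cgroup -o %s %s %s\n", kv[1], kv[1], kv[2]
        }
    '
}
```

Running cgconfig_to_mounts < /etc/cgconfig.conf would print one mount line per subsystem, which can be compared against the expanded command shown above.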

Child nodes and processes

After mounting a cgroups subsystem at a mount point, you can create a node in the cgroups hierarchy either by creating a directory under the mount point or by using the cgcreate command. For example, the command cgcreate -t sankuai:sankuai -g cpu:test creates a node named test under the cpu subsystem. The result is as follows:

# ls
cgroup.event_control  cgroup.procs  cpu.cfs_period_us  cpu.cfs_quota_us  cpu.rt_period_us  cpu.rt_runtime_us  cpu.shares  cpu.stat  lxc  notify_on_release  release_agent  tasks  test

You can then configure the resources to restrict by writing values to the files under test. Each subsystem has its own set of configuration parameters, so refer to the cgroups manual for the detailed settings. You can also set the parameters of a cgroups subsystem with the cgset command, in the form cgset -r parameter=value path_to_cgroup.

To delete a cgroups node, use the cgdelete command. For example, cgdelete -r cpu:test deletes the test node created above.

There are several ways to add a process to a cgroups child node. You can write the PID directly to the tasks file under the child node. You can add processes with cgclassify, in the form cgclassify -g subsystems:path_to_cgroup pidlist. Or you can start a process directly in a cgroup with cgexec, in the form cgexec -g subsystems:path_to_cgroup command arguments.
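The first and last of these methods can be combined into a small sketch: a wrapper that writes its own PID to the tasks file and then replaces itself with the target command, so the command starts life inside the node. This is a rough stand-in for the effect of cgexec, not its actual implementation, and the function name run_in_cgroup is hypothetical.

```shell
# Sketch: start a command inside a cgroup node by joining the node first.
# $1 = directory of the node (e.g. /cgroup/cpu/test); rest = command to run
run_in_cgroup() {
    cg_dir=$1
    shift
    # The inner shell records its own PID in tasks, then exec keeps that PID
    # while replacing the shell with the command.
    sh -c 'echo "$$" > "$0/tasks" && exec "$@"' "$cg_dir" "$@"
}
```

For example, run_in_cgroup /cgroup/cpu/test sleep 60 would, when run with sufficient privileges on a mounted hierarchy, leave the sleep process listed in /cgroup/cpu/test/tasks.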

Examples in practice

Most readers have probably not read the Docker source code, but from this article we can infer that, to isolate and control resources between containers, Docker creates a fairly complex set of cgroups nodes and configuration files. The PIDs of the processes in one container are then added to the same set of cgroups child nodes, so that those processes are subject to the same resource limits.

Technical articles published by major Internet companies also show that many of their cloud platforms are built on cgroups. The approach is the same: group the processes, then add the whole process group to the same set of cgroups nodes so that it is subject to the same resource limits.

I work in the advertising group, and part of my job is to generate "product information" for partner advertising sites, which use it to place ads on their own sites. Because malicious crawlers sometimes scrape this product information, we also generate a separate "small" data set for lower-priority users to download, which keeps out most of the malicious crawlers. This small data set does not need to be updated promptly, and generating product information is a resource-intensive task, so we limit the CPU usage of this task to 50%.

First, create a halfapi child node under the cpu subsystem: cgcreate -t abc:abc -g cpu:halfapi.

Then write the limit to the configuration file: echo 50000 > /cgroup/cpu/halfapi/cpu.cfs_quota_us. The default value of cpu.cfs_period_us is 100000, so writing 50000 to cpu.cfs_quota_us means the task can use only 50% of the CPU run time.

Finally, start the task in this cgroup: cgexec -g "cpu:/halfapi" php halfapi.php half >/dev/null 2>&1
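The 50% figure is the ratio of the quota to the scheduling period. As a small sketch of that arithmetic, assuming a period of 100000 microseconds and treating a negative quota as "no limit" (the function name cfs_cpu_percent is made up for illustration):

```shell
# Sketch: CPU percentage allowed by a CFS quota/period pair.
# $1 = value of cpu.cfs_quota_us, $2 = value of cpu.cfs_period_us
cfs_cpu_percent() {
    if [ "$1" -lt 0 ]; then
        echo unlimited              # a negative quota means no bandwidth limit
    else
        echo $(( $1 * 100 / $2 ))
    fi
}

cfs_cpu_percent 50000 100000        # the halfapi setting
```

A quota larger than the period is also meaningful on a multi-core machine: cfs_cpu_percent 200000 100000 yields 200, meaning up to two full cores.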

Before cgroups was introduced into the kernel, the only ways to limit the CPU usage of a process as described above were to adjust its priority with the nice command or to cap its CPU usage with the cpulimit command. The drawback of these commands is that they cannot limit the resource usage of a process group, so they cannot provide the resource limits that lightweight containers such as Docker, or other cloud platforms, require.

Similarly, before cgroups it was almost impossible to limit the physical memory usage of one process or a group of processes. With cgroups, you can easily limit the physical memory usage of a group of services on a system. Cgroups also provides fine-grained control over network packets, device access, and I/O resources that was not possible before.

Conclusion

This article first introduced the implementation of cgroups in the kernel, then explained how cgroups exposes its functionality to user space through VFS, then described how to use cgroups, and finally looked at some examples of cgroups in practice that further demonstrate its powerful fine-grained control.

I hope this article helps readers understand what cgroups can accomplish and, when using cgroups, have a general idea of how the kernel implements this functionality.
