[Reprint] An Introduction to Linux Cgroup Technology

Source: Internet
Author: User

Original: http://coolshell.cn/articles/17049.html

A well-known article by the blogger known as Left Ear Mouse. It is a very accessible introduction to Linux cgroup technology, and after reading it you should understand how cgroups work.

Earlier, we introduced Linux namespaces. Namespaces mainly solve the problem of environment isolation, which is only the most basic step in virtualization; we also need to isolate the use of computer resources. In other words, although a namespace can jail me in a particular environment, the processes in it can still use CPU, memory, disk, and other resources however they like. So we want to limit or control the resources those processes use. This is why Linux cgroups came about.

Linux cgroup, short for Linux control group, is a Linux kernel feature that limits, controls, and isolates the resources (such as CPU, memory, and disk I/O) used by a group of processes. The project was started in 2006 by Google engineers (mainly Paul Menage and Rohit Seth) under the name "process containers". In 2007, because the term "container" had too many meanings in the Linux kernel, it was renamed cgroup to avoid confusion, and it was merged into kernel version 2.6.24. Its development has continued from there.

Linux cgroup lets you allocate resources, such as CPU time, system memory, network bandwidth, or combinations of these, among user-defined groups of tasks (processes) running on the system. You can monitor the cgroups you configure, deny cgroups access to certain resources, and even reconfigure your cgroups dynamically on a running system.

The following features are mainly provided:

    • Resource limitation: limit resource usage, for example an upper bound on memory usage, including the file system cache.
    • Prioritization: control priority, for example CPU utilization and disk I/O throughput.
    • Accounting: auditing or statistics, mainly for billing purposes.
    • Control: suspend processes and resume them.

Using cgroups, system administrators gain finer-grained control over allocating, prioritizing, denying, managing, and monitoring system resources. Hardware resources can be divided among tasks and users more effectively, which improves overall efficiency.

In practice, system administrators typically use Cgroup to do the following (a bit like assigning resources to a virtual machine):

    • Isolate a set of processes (for example, all nginx processes) and limit the resources they consume, for example by binding them to specific CPU cores.
    • Allocate enough memory for this set of processes to use
    • Assign the appropriate network bandwidth and disk storage limits to this set of processes
    • Restrict access to certain devices (by setting a whitelist of devices)

So how does cgroup do this? Let's get a feel for it first.

First of all, Linux cgroup is exposed as a file system that you can mount. On my Ubuntu 14.04 machine, the following command shows that the cgroup file systems are already mounted:

    $ mount -t cgroup
    cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
    cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
    cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
    cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
    cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices)
    cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer)
    cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio)
    cgroup on /sys/fs/cgroup/net_prio type cgroup (rw,net_prio)
    cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,net_cls)
    cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,relatime,perf_event)
    cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,relatime,hugetlb)

Or use the lssubsys command:

    $ lssubsys -m
    cpuset /sys/fs/cgroup/cpuset
    cpu /sys/fs/cgroup/cpu
    cpuacct /sys/fs/cgroup/cpuacct
    memory /sys/fs/cgroup/memory
    devices /sys/fs/cgroup/devices
    freezer /sys/fs/cgroup/freezer
    blkio /sys/fs/cgroup/blkio
    net_cls /sys/fs/cgroup/net_cls
    net_prio /sys/fs/cgroup/net_prio
    perf_event /sys/fs/cgroup/perf_event
    hugetlb /sys/fs/cgroup/hugetlb

As we can see, there is a cgroup directory under /sys/fs, and it contains many subdirectories such as cpu, cpuset, memory, and blkio. These are the cgroup subsystems, and each one does a different job.

If you don't see the above list, you can mount it yourself, here's an example:

    mkdir cgroup
    mount -t tmpfs cgroup_root ./cgroup
    mkdir cgroup/cpuset
    mount -t cgroup -ocpuset cpuset ./cgroup/cpuset/
    mkdir cgroup/cpu
    mount -t cgroup -ocpu cpu ./cgroup/cpu/
    mkdir cgroup/memory
    mount -t cgroup -omemory memory ./cgroup/memory/

Once the mounts succeed, you will see a number of files in these directories. For example, the cpu and cpuset subsystems look like this:

    $ ls /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuset/
    /sys/fs/cgroup/cpu:
    cgroup.clone_children  cgroup.sane_behavior  cpu.shares         release_agent
    cgroup.event_control   cpu.cfs_period_us     cpu.stat           tasks
    cgroup.procs           cpu.cfs_quota_us      notify_on_release  user

    /sys/fs/cgroup/cpuset/:
    cgroup.clone_children  cpuset.mem_hardwall             cpuset.sched_load_balance
    cgroup.event_control   cpuset.memory_migrate           cpuset.sched_relax_domain_level
    cgroup.procs           cpuset.memory_pressure          notify_on_release
    cgroup.sane_behavior   cpuset.memory_pressure_enabled  release_agent
    cpuset.cpu_exclusive   cpuset.memory_spread_page       tasks
    cpuset.cpus            cpuset.memory_spread_slab       user
    cpuset.mem_exclusive   cpuset.mems

You can go into any subdirectory of /sys/fs/cgroup and mkdir a directory there. As soon as the subdirectory is created, you will find that it is already populated with files.

    /sys/fs/cgroup/cpu$ sudo mkdir haoel
    [sudo] password for hchen:
    /sys/fs/cgroup/cpu$ ls ./haoel
    cgroup.clone_children  cgroup.procs       cpu.cfs_quota_us  cpu.stat           tasks
    cgroup.event_control   cpu.cfs_period_us  cpu.shares        notify_on_release

OK, let's take a look at a few examples.

CPU limit

Suppose we have a very CPU-hungry program called deadloop, whose source code is as follows:

    int main(void)
    {
        int i = 0;
        for (;;) i++;
        return 0;
    }

After running it with sudo, CPU usage unsurprisingly goes up to 100% (below is the output of the top command):

      PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
     3529 root   0    4196    736    656 R 99.6  0.1   0:23.13 deadloop

Recall that we created a group named haoel under /sys/fs/cgroup/cpu earlier. Let's start by setting the CPU usage limit for this group:

    # cat /sys/fs/cgroup/cpu/haoel/cpu.cfs_quota_us
    -1
    # echo 20000 > /sys/fs/cgroup/cpu/haoel/cpu.cfs_quota_us

We see that the PID of this process is 3529, and we add this process to this cgroup:

    # echo 3529 >> /sys/fs/cgroup/cpu/haoel/tasks

Then you will see in top that the CPU utilization immediately drops to 20%. (The 20000 we set above means 20%: the quota is measured in microseconds against cpu.cfs_period_us, which defaults to 100000.)

      PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
     3529 root   0    4196    736    656 R 19.9  0.1   8:06.11 deadloop

The following code is a multithreaded example:

    #define _GNU_SOURCE         /* See feature_test_macros(7) */

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    const int NUM_THREADS = 5;

    void *thread_main(void *threadid)
    {
        /* Add this thread to the cgroups (syscall(SYS_gettid) returns the thread's kernel TID) */
        char cmd[128];
        snprintf(cmd, sizeof cmd, "echo %ld >> /sys/fs/cgroup/cpu/haoel/tasks", syscall(SYS_gettid));
        system(cmd);
        snprintf(cmd, sizeof cmd, "echo %ld >> /sys/fs/cgroup/cpuset/haoel/tasks", syscall(SYS_gettid));
        system(cmd);

        long tid;
        tid = (long)threadid;
        printf("Hello World! It's me, thread #%ld, pid #%ld!\n", tid, syscall(SYS_gettid));

        int a = 0;
        while (1) {
            a++;
        }
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[])
    {
        /* Start from the default so the value is never read uninitialized */
        int num_threads = NUM_THREADS;
        if (argc > 1) {
            num_threads = atoi(argv[1]);
        }
        if (num_threads <= 0 || num_threads >= 100) {
            num_threads = NUM_THREADS;
        }

        /* Set CPU utilization to 50% (the mode must be octal: 0755, not 755) */
        mkdir("/sys/fs/cgroup/cpu/haoel", 0755);
        system("echo 50000 > /sys/fs/cgroup/cpu/haoel/cpu.cfs_quota_us");

        mkdir("/sys/fs/cgroup/cpuset/haoel", 0755);
        /* Limit the threads to CPU cores #2 and #3 */
        system("echo \"2,3\" > /sys/fs/cgroup/cpuset/haoel/cpuset.cpus");
        /* A cpuset also needs cpuset.mems before tasks can join it;
           memory node 0 is an assumption, not part of the original code */
        system("echo 0 > /sys/fs/cgroup/cpuset/haoel/cpuset.mems");

        pthread_t *threads = (pthread_t *)malloc(sizeof(pthread_t) * num_threads);
        int rc;
        long t;
        for (t = 0; t < num_threads; t++) {
            printf("In main: creating thread %ld\n", t);
            rc = pthread_create(&threads[t], NULL, thread_main, (void *)t);
            if (rc) {
                printf("ERROR; return code from pthread_create() is %d\n", rc);
                exit(-1);
            }
        }

        /* Free the handle array first: nothing after pthread_exit() would run */
        free(threads);
        /* Last thing that main() should do */
        pthread_exit(NULL);
    }

Memory usage limits

Let's look at an example of limiting memory (the code below loops forever, allocating 512 bytes of memory once per second):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        int size = 0;
        int chunk_size = 512;
        void *p = NULL;

        while (1) {
            /* malloc takes only the size; the memory is deliberately never freed */
            if ((p = malloc(chunk_size)) == NULL) {
                printf("out of memory!!\n");
                break;
            }
            /* Touch the memory so that it is really allocated */
            memset(p, 1, chunk_size);
            size += chunk_size;
            printf("[%d] - memory is allocated [%8d] bytes \n", getpid(), size);
            sleep(1);
        }
        return 0;
    }

Then, from another terminal, we put this process into a memory cgroup with a small memory limit.

You will see that the process above is killed because it runs out of memory.

Disk I/O limits

Now let's look at disk I/O. We simulate the load with the following command, which reads data from /dev/sda1 and writes it to /dev/null:

sudo dd if=/dev/sda1 of=/dev/null

With the iotop command we can see that the I/O rate is about 55 MB/s (inside a virtual machine):

      TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
     8128 be/4 root       55.74 M/s    0.00 B/s  0.00% 85.65% dd if=/de~=/dev/null...

Next, we create a blkio (block device I/O) cgroup:

    mkdir /sys/fs/cgroup/blkio/haoel

Then we limit the read rate to 1 MB/s and put the PID of the dd command above into the group. (Note: 8:0 is the major:minor device number of the disk /dev/sda, which you can look up with ls -l /dev/sda; the partition /dev/sda1 itself would typically show 8, 1.)

    # echo '8:0 1048576' > /sys/fs/cgroup/blkio/haoel/blkio.throttle.read_bps_device
    # echo 8128 > /sys/fs/cgroup/blkio/haoel/tasks

With the iotop command, you can immediately see that the read rate is limited to about 1 MB/s:

      TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
     8128 be/4 root      973.20 K/s    0.00 B/s  0.00% 94.41% dd if=/de~=/dev/null...

Cgroup subsystems

Now that we have a feel for how it works, let's look at which subsystems the control group provides:

    • blkio: sets limits on input/output for block devices, such as physical devices (disks, solid-state drives, USB, and so on).
    • cpu: uses the scheduler to give cgroup tasks access to the CPU.
    • cpuacct: automatically generates reports on the CPU used by tasks in a cgroup.
    • cpuset: assigns individual CPUs (on a multicore system) and memory nodes to the tasks in a cgroup.
    • devices: allows or denies access to devices by the tasks in a cgroup.
    • freezer: suspends or resumes the tasks in a cgroup.
    • memory: sets limits on the memory used by tasks in a cgroup, and automatically generates reports on memory resource usage.
    • net_cls: tags network packets with a class identifier (classid) that allows the Linux traffic controller (tc) to identify packets originating from a particular cgroup.
    • net_prio: sets the priority of network traffic.
    • hugetlb: limits usage of HugeTLB, the huge page file system.


Note that on Ubuntu 14.04 you may not see the net_cls and net_prio cgroups, and you need to mount them manually:

    $ sudo modprobe cls_cgroup
    $ sudo mkdir /sys/fs/cgroup/net_cls
    $ sudo mount -t cgroup -o net_cls none /sys/fs/cgroup/net_cls

    $ sudo modprobe netprio_cgroup
    $ sudo mkdir /sys/fs/cgroup/net_prio
    $ sudo mount -t cgroup -o net_prio none /sys/fs/cgroup/net_prio

For details on the parameters of each subsystem, and for more documentation on Linux cgroup, see the following documents:

    • The official Linux kernel documentation
    • The official Red Hat documentation

Cgroup terminology

Cgroup has the following terms:

    • Task: a process in the system.
    • Control group: a set of processes grouped by some criterion, such as professor and student in the official documentation, or WWW and System. Resource control in cgroups is implemented per control group. A process can join a control group, and resource limits are defined on the group, like the haoel group used in the examples above. Put simply, a cgroup appears as a directory containing a set of configurable files.
    • Hierarchy: control groups can be organized hierarchically, that is, as a tree of control groups (a directory structure). Child nodes in the tree inherit the attributes of their parent. Put simply, a hierarchy is the cgroup directory tree on one or more subsystems.
    • Subsystem: a resource controller; for example, the cpu subsystem is the controller that allocates CPU time. A subsystem must be attached to a hierarchy to take effect, and once it is attached, all control groups in that hierarchy are controlled by it. There are many cgroup subsystems, and the number keeps growing.

The next generation of cgroup

Above, we covered some common ways to use cgroup and the related terminology. On the whole this design works reasonably well; apart from a somewhat awkward user experience, it basically meets everyday needs.

However, a developer named Tejun Heo was quite unhappy with it. He criticized cgroup in the Linux community, which set off a lot of discussion among the kernel developers.

In Tejun Heo's view, the cgroup design is quite bad. His example goes roughly like this: with multiple hierarchies there are many ways to classify processes. For example, we can divide them by user into professor and student, and at the same time by application, such as WWW and NFS. When a process belongs to both professor and WWW, the hierarchies become orthogonal, and process management turns chaotic. Another case: if hierarchy A is bound to cpu, hierarchy B to memory, and hierarchy C to cpuset, and some processes need A and B, some need A and C, and some need all three, management becomes quite difficult.

Hierarchies are cumbersome to operate, and the more of them there are, the harder they are to operate and manage. That approach is easy to implement, but using it involves a lot of complexity. Think of classifying the books in a library: there can be many different classification schemes, and the relationship between categories and books is many-to-many.

So, after kernel 3.16, a new design called the unified hierarchy was introduced. It comes with a feature named __DEVEL__sane_behavior (the name clearly implies that it is still in the experimental development stage). It attaches all subsystems to the root hierarchy; only leaf nodes can contain tasks, and non-leaf nodes are used only for resource control.

Let's mount it and take a look:

    $ sudo mount -t cgroup -o __DEVEL__sane_behavior cgroup ./cgroup

    $ ls ./cgroup
    cgroup.controllers  cgroup.procs  cgroup.sane_behavior  cgroup.subtree_control

    $ cat ./cgroup/cgroup.controllers
    cpuset cpu cpuacct memory devices freezer net_cls blkio perf_event net_prio hugetlb

We can see four files here. If you mkdir a subdirectory, the same four files appear in it. The parent's cgroup.subtree_control controls the children's cgroup.controllers.

For example, suppose we have the following directory structure, where b stands for blkio and m for memory, and A is the root, which contains all subsystems.

    # A(b,m) - B(b,m) - C (b)
    #               \ - D (b) - E

    # In the commands below, + means enable and - means disable

    # Enable blkio on B
    # echo +blkio > A/cgroup.subtree_control

    # Enable blkio on C and D
    # echo +blkio > A/B/cgroup.subtree_control

    # Enable memory on B
    # echo +memory > A/cgroup.subtree_control

In the above structure,

    • A cgroup controls only its direct children; the setting is not passed on to grandchildren. Therefore C and D have no memory limit, and E has neither blkio nor memory limits. The cgroup.controllers file at each level is read-only, and its content reflects what is in the parent's cgroup.subtree_control.
    • Any directory that has cgroup.subtree_control configured cannot have processes bound to it, except for the root node. So A, C, D, and E can hold processes, but B cannot.

We can see that this approach cleanly separates two things: the grouping of processes, and resource control over those groups (previously the two were completely mixed together). It also adds some restrictions on directory inheritance, which avoids some ambiguous situations.

Of course, this is still evolving. These cgroup issues are currently being worked on by Tejun Heo and Huawei's Li Zefan. In short, this is a system administration matter and the change will affect a lot of things, but once the plan is settled, the old cgroup approach will be gone for good.

References
    • Linux Kernel Cgroup Documents
    • Red Hat Resource Management Guide
    • Fixing control groups
    • The unified control group hierarchy in 3.16
    • Cgroup v2 (PDF)

(End of full text)
