Docker Application Container Basics: A Linux Cgroup Learning Tutorial

Source: Internet
Author: User

Although namespaces can jail a process into a specific environment, the CPU, memory, disk, and other resources that the process uses are still unrestricted. So we want to limit or control the resources a process uses. That is why Linux cgroups came about.
The Linux cgroup, full name Linux Control Group, is a Linux kernel feature to limit, control, and isolate the resources (such as CPU, memory, disk I/O, and so on) of a group of processes. The project was first launched by Google engineers in 2006 (mainly Paul Menage and Rohit Seth), under the earliest name "process containers". In 2007, because the term "container" was already overloaded in the Linux kernel, it was renamed cgroup to avoid confusion and merged into kernel version 2.6.24. Since then, other developers have continued its development.
Linux cgroups allow you to allocate resources, such as CPU time, system memory, network bandwidth, or combinations of these resources, among user-defined groups of tasks (processes) running on your system. You can monitor the cgroups you configure, deny a cgroup access to certain resources, and even reconfigure your cgroups dynamically on a running system.

Cgroups mainly provide the following functions:

Resource limiting: restrict resource usage, such as an upper bound on memory use, including the file system cache.
Prioritization: control priority, such as CPU share and disk I/O throughput.
Accounting: audit or gather statistics, mainly for billing purposes.
Control: suspend and resume the execution of processes.
With cgroups, system administrators gain fine-grained control over allocating, prioritizing, denying, managing, and monitoring system resources, and can better divide hardware resources among tasks and users, improving overall efficiency.

In practice, system administrators typically use cgroups to do the following (a bit like allocating resources to virtual machines):
Isolate a set of processes (for example, all nginx processes) and limit the resources they consume, such as binding them to specific CPU cores.
Allocate enough memory for this group of processes to use.
Allocate appropriate network bandwidth and disk I/O limits for this group of processes.
Restrict access to certain devices (by setting a device whitelist).
So how do cgroups do this? Let's build some intuition first.

First of all, Linux exposes cgroups as a file system that you can mount. On my Ubuntu 14.04 machine, you can see that cgroups have already been mounted for you by entering the following command.

hchen@ubuntu:~$ mount -t cgroup
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio)
cgroup on /sys/fs/cgroup/net_prio type cgroup (rw,net_prio)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,relatime,hugetlb)

or use the lssubsys command:

$ lssubsys -m

We can see that under /sys/fs there is a cgroup directory containing many subdirectories, such as cpu, cpuset, memory, blkio, and so on. These are the cgroup subsystems, each in charge of a different kind of resource.

If you don't see the directories above, you can mount them yourself; here's an example:

mkdir cgroup
mount -t tmpfs cgroup_root ./cgroup
mkdir cgroup/cpuset
mount -t cgroup -ocpuset cpuset ./cgroup/cpuset/
mkdir cgroup/cpu
mount -t cgroup -ocpu cpu ./cgroup/cpu/
mkdir cgroup/memory
mount -t cgroup -omemory memory ./cgroup/memory/

Once the mount succeeds, you will see that these directories already contain files, such as the cpu and cpuset subsystems shown below:

hchen@ubuntu:~$ ls /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuset/
cgroup.clone_children  cgroup.sane_behavior  cpu.shares         release_agent
cgroup.event_control   cpu.cfs_period_us     cpu.stat           tasks
cgroup.procs           cpu.cfs_quota_us      notify_on_release  user

cgroup.clone_children  cpuset.mem_hardwall             cpuset.sched_load_balance
cgroup.event_control   cpuset.memory_migrate           cpuset.sched_relax_domain_level
cgroup.procs           cpuset.memory_pressure          notify_on_release
cgroup.sane_behavior   cpuset.memory_pressure_enabled  release_agent
cpuset.cpu_exclusive   cpuset.memory_spread_page       tasks
cpuset.cpus            cpuset.memory_spread_slab       user
cpuset.mem_exclusive   cpuset.mems

You can go into a subdirectory of /sys/fs/cgroup and make a directory there; you'll find that as soon as you create a subdirectory, it is automatically populated with these files.

hchen@ubuntu:/sys/fs/cgroup/cpu$ sudo mkdir haoel
[sudo] password for hchen:
hchen@ubuntu:/sys/fs/cgroup/cpu$ ls ./haoel
cgroup.clone_children  cgroup.procs       cpu.cfs_quota_us  cpu.stat           tasks
cgroup.event_control   cpu.cfs_period_us  cpu.shares        notify_on_release

OK, let's take a look at a few examples.
CPU limit

Suppose we have a very CPU-hungry program called deadloop, whose source code is as follows:

int main(void)
{
    int i = 0;
    for (;;) i++;
    return 0;
}
When executed with sudo, the CPU is, unsurprisingly, driven to 100% (below is the output of the top command):

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
 3529 root      20   0    4196    736    656 R  99.6  0.1  0:23.13 deadloop

Then we create a haoel group under /sys/fs/cgroup/cpu, and set the CPU utilization limit for this group first:

hchen@ubuntu:~# cat /sys/fs/cgroup/cpu/haoel/cpu.cfs_quota_us
-1
root@ubuntu:~# echo 20000 > /sys/fs/cgroup/cpu/haoel/cpu.cfs_quota_us

We saw that the PID of this process is 3529; now let's add it to this cgroup:

# echo 3529 >> /sys/fs/cgroup/cpu/haoel/tasks

Then you will see in top that the process's CPU utilization drops to 20%. (The 20000 we set earlier means 20%.)

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
 3529 root      20   0    4196    736    656 R  19.9  0.1  8:06.11 deadloop
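As an aside on the arithmetic (our own sketch, not part of the original session): cpu.cfs_quota_us is interpreted relative to cpu.cfs_period_us, which defaults to 100000 microseconds (100 ms), so the allowed CPU share is quota divided by period:

```shell
period=100000   # cpu.cfs_period_us default: 100 ms
quota=20000     # the value we echoed into cpu.cfs_quota_us
echo $(( quota * 100 / period ))   # prints 20, the CPU percentage allowed
```

Writing -1 back into cpu.cfs_quota_us removes the limit again.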

The following code is a multithreaded version of the example, in which each thread adds itself to the cgroup:

#define _GNU_SOURCE         /* See feature_test_macros(7) */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/syscall.h>

const int NUM_THREADS = 5;

void *thread_main(void *threadid)
{
    /* Add ourselves to the cgroup (syscall(SYS_gettid) gets the thread's system tid) */
    char cmd[128];
    sprintf(cmd, "echo %ld >> /sys/fs/cgroup/cpu/haoel/tasks", syscall(SYS_gettid));
    system(cmd);
    sprintf(cmd, "echo %ld >> /sys/fs/cgroup/cpuset/haoel/tasks", syscall(SYS_gettid));
    system(cmd);

    long tid = (long)threadid;
    printf("Hello World! It's me, thread #%ld, tid #%ld!\n", tid, syscall(SYS_gettid));

    int a = 0;
    while (1) {
        a++;    /* burn CPU forever */
    }
    pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
    int num_threads = NUM_THREADS;
    if (argc > 1) {
        num_threads = atoi(argv[1]);
    }
    if (num_threads <= 0 || num_threads >= 100) {
        num_threads = NUM_THREADS;
    }

    /* Set the group's CPU utilization limit to 50% */
    mkdir("/sys/fs/cgroup/cpu/haoel", 0755);
    system("echo 50000 > /sys/fs/cgroup/cpu/haoel/cpu.cfs_quota_us");

    mkdir("/sys/fs/cgroup/cpuset/haoel", 0755);
    /* Limit the CPUs to core #2 and core #3 only */
    system("echo \"2,3\" > /sys/fs/cgroup/cpuset/haoel/cpuset.cpus");

    pthread_t *threads = (pthread_t *)malloc(sizeof(pthread_t) * num_threads);
    int rc;
    long t;
    for (t = 0; t < num_threads; t++) {
        printf("In main: creating thread %ld\n", t);
        rc = pthread_create(&threads[t], NULL, thread_main, (void *)t);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    /* Last thing main() should do: let the worker threads keep running */
    free(threads);
    pthread_exit(NULL);
}
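A quick sanity check on what to expect from this example, as a rough sketch (assuming the default 100 ms period and that the scheduler spreads time evenly): the 50000 quota is shared by the whole group, so the group gets 50% of one CPU in total, which works out to roughly 10% per busy thread when five threads run:

```shell
quota=50000; period=100000; nthreads=5
group_pct=$(( quota * 100 / period ))
echo "$group_pct"                   # prints 50: total CPU % for the whole group
echo $(( group_pct / nthreads ))    # prints 10: rough % per busy thread
```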
Memory Usage Restrictions

Let's look at an example of limiting memory (the following code is an infinite loop that keeps allocating memory, 512 bytes at a time, once per second):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int size = 0;
    int chunk_size = 512;
    void *p = NULL;

    while (1) {
        if ((p = malloc(chunk_size)) == NULL) {
            printf("out of memory!!\n");
            break;
        }
        memset(p, 1, chunk_size);
        size += chunk_size;
        printf("[%d] - memory is allocated [%8d] bytes\n", getpid(), size);
        sleep(1);
    }
    return 0;
}

Then, in another shell, we do the following:

# create the memory cgroup
$ mkdir /sys/fs/cgroup/memory/haoel
$ echo 64k > /sys/fs/cgroup/memory/haoel/memory.limit_in_bytes

# add the PID of the process above to this cgroup
$ echo [PID] > /sys/fs/cgroup/memory/haoel/tasks

You will see that the above process is soon killed by the kernel for exceeding the memory limit.
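Incidentally, shelling out with echo (as the thread example above does) is not the only way; the same step can be done with plain file I/O from C. Below is a minimal sketch; the helper name add_task_to_cgroup is ours, not a kernel API, and it assumes a cgroup v1 directory that already exists and is writable:

```c
#include <stdio.h>
#include <sys/types.h>

/* Write a pid (or a thread's tid) into <cgroup_dir>/tasks.
 * Returns 0 on success, -1 on error. Example cgroup_dir:
 * "/sys/fs/cgroup/memory/haoel" (writing usually requires root). */
int add_task_to_cgroup(const char *cgroup_dir, pid_t pid)
{
    char path[256];
    FILE *f;

    if (snprintf(path, sizeof(path), "%s/tasks", cgroup_dir) >= (int)sizeof(path))
        return -1;          /* path too long for our buffer */
    f = fopen(path, "w");
    if (f == NULL)
        return -1;          /* cgroup missing or no permission */
    fprintf(f, "%ld\n", (long)pid);
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, a process can move itself into the memory cgroup with add_task_to_cgroup("/sys/fs/cgroup/memory/haoel", getpid()).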
Disk I/O restrictions

Now let's look at hard disk I/O. Our simulated workload is the following command (it reads data from /dev/sda1 and writes it to /dev/null):

sudo dd if=/dev/sda1 of=/dev/null

With the iotop command we can see that the read speed is about 55 MB/s (inside a virtual machine):

  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 8128  be/4  root    55.74 M/s    0.00 B/s   0.00 %  85.65 %  dd if=/de~=/dev/null...

Then we first create a blkio (block device I/O) cgroup:

$ sudo mkdir /sys/fs/cgroup/blkio/haoel

and limit read I/O to 1 MB/s, putting the PID of the dd command above into it (note: 8:0 is the device number, which you can obtain with ls -l /dev/sda1):

root@ubuntu:~# echo '8:0 1048576' > /sys/fs/cgroup/blkio/haoel/blkio.throttle.read_bps_device
root@ubuntu:~# echo 8128 > /sys/fs/cgroup/blkio/haoel/tasks

Watching iotop, you will soon see the read speed drop to about 1 MB/s:

  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 8128  be/4  root   973.20 K/s    0.00 B/s   0.00 %  94.41 %  dd if=/de~=/dev/null...
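A note on where these numbers come from (our own sketch, not from the original session): 1048576 is simply 1 MiB expressed in bytes per second, and 8:0 is the device's major:minor pair. Besides ls -l, the stat command can print the pair directly (%t and %T print the major and minor numbers in hex); /dev/null below is only an illustration, since /dev/sda1 may not exist on your machine:

```shell
echo $(( 1024 * 1024 ))       # prints 1048576, i.e. 1 MiB/s for the throttle
stat -c '%t:%T' /dev/null     # prints 1:3, the null device's major:minor
```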
Cgroup subsystems

Well, with the intuition above, let's look at what subsystems control groups have:

blkio: this subsystem sets input/output limits for block devices, such as physical drives (disk, SSD, USB, and so on).
cpu: this subsystem uses the scheduler to control cgroup tasks' access to the CPU.
cpuacct: this subsystem automatically generates reports on CPU usage by the tasks in a cgroup.
cpuset: this subsystem assigns individual CPUs (on multicore systems) and memory nodes to the tasks in a cgroup.
devices: this subsystem allows or denies the tasks in a cgroup access to devices.
freezer: this subsystem suspends or resumes the tasks in a cgroup.
memory: this subsystem sets memory limits for the tasks in a cgroup and automatically generates memory usage reports.
net_cls: this subsystem tags network packets with a class identifier (classid) so that the Linux traffic controller (tc) can identify packets generated by a particular cgroup.
net_prio: this subsystem provides a way to set the priority of network traffic generated by a cgroup.
hugetlb: this subsystem limits the use of huge pages (hugetlbfs, the large-page file system).

Note that in Ubuntu 14.04 you may not see the net_cls and net_prio cgroups; you need to mount them manually:

$ sudo modprobe cls_cgroup
$ sudo mkdir /sys/fs/cgroup/net_cls
$ sudo mount -t cgroup -o net_cls none /sys/fs/cgroup/net_cls

$ sudo modprobe netprio_cgroup
$ sudo mkdir /sys/fs/cgroup/net_prio
$ sudo mount -t cgroup -o net_prio none /sys/fs/cgroup/net_prio

For details of each subsystem's parameters and for more Linux cgroup documentation, see the following:

The official Linux kernel documentation
Red Hat's official documentation

Cgroup terminology

Cgroup has the following terms:

Task: a process in the system.
Control group: a group of processes partitioned by some criterion, such as professor and student in the official documents, or www and system; it represents a group of processes. Resource control in cgroups is implemented with the control group as the unit: a process can be added to a control group, and resource limits are defined on that group, like the haoel group I used in the examples above. Simply put, a cgroup presents itself as a directory containing a series of configurable files.
Hierarchy: control groups can be organized hierarchically, as a tree of control groups (a directory structure). Child nodes in the tree inherit the attributes of their parent node. Simply put, a hierarchy is a cgroups directory tree attached to one or more subsystems.
Subsystem: a subsystem is a resource controller; for example, the cpu subsystem is the controller that governs CPU time allocation. A subsystem must be attached to a hierarchy to take effect, and once attached, all control groups in that hierarchy are controlled by it. There are many cgroup subsystems, and more keep being added.

The next-generation cgroup

Above, we saw some common cgroup operations and related terminology. Generally speaking, this design is mostly fine: apart from a somewhat poor user experience when operating it, it basically meets our needs.

However, one developer named Tejun Heo was very unhappy with it. He complained about cgroups in the Linux kernel community, which triggered all sorts of discussion among kernel developers.

In Tejun Heo's view, the cgroup design is quite bad. He gave some examples, to the effect that if there are multiple hierarchies, that is, multiple ways to classify processes (for example, dividing by user into professor and student, and also by application type into www and nfs), then a process that is both professor and www falls into multiple orthogonal hierarchies, making process management confusing. Another case: if hierarchy A binds cpu, hierarchy B binds memory, and hierarchy C binds cpuset, and some processes need A and B, some need A and C, and some need all three, management becomes very difficult.

Operating across hierarchies is also cumbersome, and as the hierarchies multiply they become harder to operate and manage. Although this design was straightforward to implement, it introduces a lot of complexity in use. You can think of it as a library classification problem: there can be many different classification schemes, and the relationship between categories and books is many-to-many.

So, after kernel 3.16, a new design called the unified hierarchy was introduced. It added a feature called __DEVEL__sane_behavior (the name clearly implies that it was still in the development and testing phase) that mounts all subsystems on a single root hierarchy, where only leaf nodes may contain tasks and non-leaf nodes do only resource control.

Let's mount it and take a look:

$ sudo mount -t cgroup -o __DEVEL__sane_behavior cgroup ./cgroup

$ ls ./cgroup
cgroup.controllers  cgroup.procs  cgroup.sane_behavior  cgroup.subtree_control

$ cat ./cgroup/cgroup.controllers
cpuset cpu cpuacct memory devices freezer net_cls blkio perf_event net_prio hugetlb

We can see there are four files, and if you mkdir a subdirectory here, the subdirectory will contain the same four files. A parent's cgroup.subtree_control controls its children's cgroup.controllers.

For example: suppose we have the directory structure below, where b stands for blkio and m stands for memory; A is the root and includes all subsystems.

# A(b,m) - B(b,m) - C(b)
#                \ - D(b) - E

# In the commands below, + means enable, - means disable.

# Enable blkio on B:
echo +blkio > A/cgroup.subtree_control

# Enable blkio on C and D:
echo +blkio > A/B/cgroup.subtree_control

# Enable memory on B:
echo +memory > A/cgroup.subtree_control

In the structure above:

A cgroup can only control its direct children; control cannot skip a level. Therefore C and D have no memory limit, and E has neither a blkio nor a memory limit. Also, each level's cgroup.controllers file is read-only; its content is determined by the parent's cgroup.subtree_control.
Any directory that has subtree_control configured cannot have processes attached to it, except for the root node. So A, C, D, and E can hold processes, but B cannot.

We can see that this design cleanly separates two things: the grouping of processes, and the resource control over those groups (previously the two were completely mixed up). The added restrictions on directory inheritance avoid a number of ambiguous situations.

Of course, things are still evolving; these cgroup issues are being worked on by Tejun Heo and Huawei's Li Zefan. In short, this is a system-management problem; the change will affect many things, but once the plan is settled, the old cgroup way will be gone forever.
