Linux cgroups Introduction
The previous section covered namespaces, the technology Linux containers use to give each process its own isolated view of the system. Isolation alone does not answer a second question: how does Docker limit how much of the host each container can use, so that containers don't fight each other for resources? That is the job of Linux cgroups.
Concept
Linux cgroups (control groups) let you limit, control, and account for the resources used by a group of processes and by any children they create later. Covered resources include CPU, memory, storage, and network. With cgroups it is easy to cap a process's resource usage and to monitor its statistics in real time.
Cgroups are built from three components:
Cgroup
A cgroup is the mechanism for managing processes as a group. Each cgroup contains a set of processes. You can configure parameters for the various Linux subsystems on a cgroup, which ties that set of processes to a set of subsystem parameters.
Subsystem
A subsystem is a resource-control module. The common subsystems are:
- blkio: sets access control for input and output on block devices (such as hard disks).
- cpu: sets the CPU scheduling policy for processes in the cgroup.
- cpuacct: reports the CPU usage of processes in the cgroup.
- cpuset: on multicore machines, sets which CPUs and which memory the processes in the cgroup may use. The memory setting only applies on NUMA architectures.
- devices: controls which devices the processes in the cgroup may access.
- freezer: suspends and resumes the processes in the cgroup.
- memory: controls the memory footprint of processes in the cgroup.
- net_cls: classifies the network packets generated by processes in the cgroup. Linux tc (traffic control) can then use the classid to tell which cgroup a packet came from and apply rate limiting or monitoring to it.
- net_prio: sets the priority of network traffic generated by processes in the cgroup.
- ns: a special subsystem. When a process in the cgroup forks a new process in a new namespace (NEWNS), this subsystem creates a new cgroup that contains the processes in that namespace.
A subsystem is attached to a cgroup, and the cgroup defines the corresponding limits. The processes in that cgroup are then constrained and controlled accordingly. Subsystems were merged into the kernel gradually over many releases.

To see which subsystems the current kernel supports, install the cgroup command-line tools (apt-get install cgroup-bin) and run lssubsys -a:
~ lssubsys -a
cpuset
cpu,cpuacct
blkio
memory
devices
freezer
net_cls,net_prio
perf_event
hugetlb
pids
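If the cgroup tools aren't installed, the kernel also reports its subsystems in /proc/cgroups. The Go sketch below parses that file. It assumes the usual v1 layout: a header line starting with "#", then one whitespace-separated row per subsystem with the fields subsys_name, hierarchy, num_cgroups and enabled. The helper name parseProcCgroups is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Subsystem is one row of /proc/cgroups.
type Subsystem struct {
	Name      string
	Hierarchy string // ID of the hierarchy it is attached to ("0" if none)
	Enabled   bool
}

// parseProcCgroups parses text in /proc/cgroups format (hypothetical helper).
func parseProcCgroups(data string) []Subsystem {
	var subs []Subsystem
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip the "#subsys_name ..." header
		}
		f := strings.Fields(line)
		if len(f) < 4 {
			continue // malformed row
		}
		subs = append(subs, Subsystem{Name: f[0], Hierarchy: f[1], Enabled: f[3] == "1"})
	}
	return subs
}

func main() {
	data, err := os.ReadFile("/proc/cgroups")
	if err != nil {
		fmt.Println("cannot read /proc/cgroups:", err)
		return
	}
	for _, s := range parseProcCgroups(string(data)) {
		fmt.Printf("%-12s hierarchy=%s enabled=%v\n", s.Name, s.Hierarchy, s.Enabled)
	}
}
```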
Hierarchy
A hierarchy strings a set of cgroups together into a tree, and each such tree is one hierarchy. This tree structure lets cgroups inherit from one another.

For example, suppose cgroup1 limits the CPU usage of a group of scheduled-task processes. One of those processes periodically dumps logs and also needs its disk I/O limited. To avoid affecting the other processes, you can create cgroup2 as a child of cgroup1 and set the disk I/O limit there. cgroup2 inherits cgroup1's CPU limit and adds the disk I/O limit, and the other processes in cgroup1 are unaffected.
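The cgroup1/cgroup2 example can be made concrete with a toy in-memory model in Go. This is not how the kernel stores cgroups. In the model, a cgroup's effective limits are its own limits plus those of its ancestors. When several levels limit the same resource, the tighter value wins. The model assumes the cpu and blkio subsystems are attached to the same hierarchy. The type names and the numeric limit values are made up for illustration.

```go
package main

import "fmt"

// Cgroup is a node in a toy hierarchy (illustration only, not kernel code).
type Cgroup struct {
	Name   string
	Parent *Cgroup
	Limits map[string]int64 // resource name -> limit set directly on this node
}

// EffectiveLimits walks from this node up to the root. A limit set on any
// ancestor also applies here. If several levels limit the same resource,
// the smallest (tightest) value wins.
func (c *Cgroup) EffectiveLimits() map[string]int64 {
	eff := map[string]int64{}
	for n := c; n != nil; n = n.Parent {
		for res, v := range n.Limits {
			if cur, ok := eff[res]; !ok || v < cur {
				eff[res] = v
			}
		}
	}
	return eff
}

func main() {
	root := &Cgroup{Name: "/"}
	// cgroup1 limits CPU for the scheduled tasks (illustrative value).
	cgroup1 := &Cgroup{Name: "cgroup1", Parent: root, Limits: map[string]int64{"cpu": 50000}}
	// cgroup2 adds a disk I/O limit for the log-dumping process only (illustrative value).
	cgroup2 := &Cgroup{Name: "cgroup2", Parent: cgroup1, Limits: map[string]int64{"blkio": 1 << 20}}

	fmt.Println(cgroup1.EffectiveLimits()) // map[cpu:50000]
	fmt.Println(cgroup2.EffectiveLimits()) // map[blkio:1048576 cpu:50000]
}
```

Adding a limit to cgroup2 never changes what cgroup1 sees. This matches the point of the example: the I/O limit affects only the log-dumping process.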
Relationships among the three components:
From the descriptions above, cgroups clearly work through the cooperation of these three components. So how are they related?
- When the system creates a new hierarchy, all processes in the system are added to the hierarchy's root cgroup. This root cgroup is created by default, and every cgroup created later in the hierarchy is a descendant of it.
- A subsystem can be attached to only one hierarchy.
- A hierarchy can have multiple subsystems attached.
- A process can be a member of several cgroups, but those cgroups must be in different hierarchies.
- When a process forks a child, the child starts out in the same cgroup as its parent. It can later be moved to another cgroup as needed.
Don't worry if these rules aren't clear yet. The relationships will become clearer as we use cgroups in practice.
Kernel interface:
With all these cgroup concepts described, how do we actually ask the kernel to configure cgroups? As we saw, the cgroups in a hierarchy form a tree. To make configuration intuitive, the kernel exposes cgroups through a virtual file system whose directory tree mirrors the cgroup tree. The following configuration example shows how to work with cgroups.
First, create and mount a hierarchy (a cgroup tree):
~ mkdir cgroup-test # create the mount point for the hierarchy
~ sudo mount -t cgroup -o none,name=cgroup-test cgroup-test ./cgroup-test # mount a hierarchy
~ ls ./cgroup-test # after mounting, the system has generated some default files in this directory
cgroup.clone_children cgroup.procs cgroup.sane_behavior notify_on_release release_agent tasks
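The same mount can also be done from Go. syscall.Mount wraps mount(2), and its data argument "none,name=cgroup-test" carries the same options as -o above: a named hierarchy with no subsystem attached. The sketch below assumes a cgroup v1 kernel, root privileges, and an existing target directory. The helper name mountNamedHierarchy is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mountNamedHierarchy mounts a cgroup v1 hierarchy that has no subsystem
// attached ("none") and the given name. It is equivalent to:
//
//	mount -t cgroup -o none,name=<name> <name> <target>
func mountNamedHierarchy(target, name string) error {
	return syscall.Mount(name, target, "cgroup", 0, "none,name="+name)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: mount-hierarchy <existing-target-dir>")
		return
	}
	if err := mountNamedHierarchy(os.Args[1], "cgroup-test"); err != nil {
		fmt.Println("mount failed:", err) // e.g. EPERM when not run as root
		return
	}
	fmt.Println("mounted hierarchy cgroup-test at", os.Args[1])
}
```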
The files generated in the mounted directory are the configuration items of this hierarchy's root cgroup. They mean the following:
- cgroup.clone_children: read by the cpuset subsystem. If its value is 1 (the default is 0), a newly created child cgroup inherits the parent cgroup's cpuset configuration.
- cgroup.procs: lists the thread group IDs (that is, the process IDs) of the processes in this cgroup node. We are at the root node, so it currently contains every process in the system.
- notify_on_release and release_agent: used together. notify_on_release controls whether release_agent runs when the last process in the cgroup exits. release_agent holds the path of the program to run. The pair is commonly used to clean up cgroups that are no longer in use.
- tasks: also lists the processes in the cgroup, but by thread ID. Writing an ID to the tasks file adds it to the cgroup. For a multithreaded process this moves only the thread with that ID, whereas writing the PID to cgroup.procs moves the whole process.
Next, create two child cgroups under the root cgroup of the hierarchy we just created:
cgroup-test sudo mkdir cgroup-1 # create child cgroup "cgroup-1"
cgroup-test sudo mkdir cgroup-2 # create child cgroup "cgroup-2"
cgroup-test tree
.
|-- cgroup-1
|   |-- cgroup.clone_children
|   |-- cgroup.procs
|   |-- notify_on_release
|   `-- tasks
|-- cgroup-2
|   |-- cgroup.clone_children
|   |-- cgroup.procs
|   |-- notify_on_release
|   `-- tasks
|-- cgroup.clone_children
|-- cgroup.procs
|-- cgroup.sane_behavior
|-- notify_on_release
|-- release_agent
`-- tasks
When you create a directory inside a cgroup directory, the kernel marks it as a child cgroup of that cgroup and fills it with configuration files. The child inherits the attributes of its parent.
Adding and moving processes in a cgroup:
Within one hierarchy, a process can belong to only one cgroup node. By default, every process in the system starts in the root node. To move a process to another cgroup node, write its process ID to that node's tasks file.
cgroup-1 echo $$
7475
cgroup-1 sudo sh -c "echo $$ >> tasks" # move the current terminal's process into cgroup-1
cgroup-1 cat /proc/7475/cgroup
13:name=cgroup-test:/cgroup-1
11:perf_event:/
10:cpu,cpuacct:/user.slice
9:freezer:/
8:blkio:/user.slice
7:devices:/user.slice
6:cpuset:/
5:hugetlb:/
4:pids:/user.slice/user-1000.slice
3:memory:/user.slice
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1000.slice/session-19.scope
The first line shows that our process 7475 has been added to cgroup-test:/cgroup-1.
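Each line of /proc/<pid>/cgroup has the form hierarchy-ID:subsystem-list:cgroup-path, with one line per hierarchy the process belongs to. The Go sketch below looks up a process's cgroup path in one hierarchy. It identifies the hierarchy by an entry of its subsystem list, such as memory or name=cgroup-test. The helper name cgroupPathFor is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// cgroupPathFor returns the process's cgroup path in the hierarchy whose
// subsystem list contains key (e.g. "memory" or "name=cgroup-test").
// data is the content of /proc/<pid>/cgroup.
func cgroupPathFor(data, key string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		// Split into at most 3 parts so a ':' inside the path is preserved.
		parts := strings.SplitN(sc.Text(), ":", 3)
		if len(parts) != 3 {
			continue
		}
		for _, sub := range strings.Split(parts[1], ",") {
			if sub == key {
				return parts[2], true
			}
		}
	}
	return "", false
}

func main() {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Println(err)
		return
	}
	if p, ok := cgroupPathFor(string(data), "memory"); ok {
		fmt.Println("memory cgroup:", p)
	} else {
		fmt.Println("no memory hierarchy found")
	}
}
```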
Limiting the resources of processes in a cgroup with a subsystem
The hierarchy we created has no subsystem attached, so its cgroups cannot limit the resource usage of their processes. However, this system has already mounted a default hierarchy for each subsystem, such as the memory hierarchy:
~ mount | grep memory
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory,nsroot=/)
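Programs usually locate this mount point by scanning /proc/mounts rather than hard-coding /sys/fs/cgroup/memory. The Go sketch below assumes the standard /proc/mounts line layout: device, mount point, fs type, options, dump, pass. It looks for a cgroup (v1) entry whose options include the requested subsystem. The helper name findCgroupMount is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// findCgroupMount scans text in /proc/mounts format and returns the mount
// point of the cgroup v1 hierarchy that has the given subsystem attached.
// Mount points containing spaces are octal-escaped in /proc/mounts; this
// sketch does not unescape them.
func findCgroupMount(mounts, subsystem string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 || f[2] != "cgroup" {
			continue // not a cgroup v1 mount
		}
		for _, opt := range strings.Split(f[3], ",") {
			if opt == subsystem {
				return f[1], true
			}
		}
	}
	return "", false
}

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Println(err)
		return
	}
	if p, ok := findCgroupMount(string(data), "memory"); ok {
		fmt.Println("memory hierarchy mounted at", p)
	} else {
		fmt.Println("no cgroup v1 memory hierarchy mounted")
	}
}
```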
The mount output shows that the hierarchy with the memory subsystem attached is mounted at /sys/fs/cgroup/memory. Next, we limit a process's memory usage by creating a cgroup in this hierarchy:
memory stress --vm-bytes 200m --vm-keep -m 1 # first, start a stress process that uses 200m of memory, with no limit
memory sudo mkdir test-limit-memory && cd test-limit-memory # create a cgroup named test-limit-memory
test-limit-memory sudo sh -c "echo 100m > memory.limit_in_bytes" # set the cgroup's maximum memory usage to 100m
test-limit-memory sudo sh -c "echo $$ > tasks" # move the current shell process into this cgroup
test-limit-memory stress --vm-bytes 200m --vm-keep -m 1 # run the 200m stress process again
The results as seen in top: first the unlimited run, then the run inside the cgroup.
PID  PPID TIME+   %CPU %MEM PR NI S VIRT   RES    UID  COMMAND
8336 8335 0:08.23 99.0 10.0 20 0  R 212284 205060 1000 stress
8335 7475 0:00.00 0.0  0.0  20 0  S 7480   876    1000 stress

PID  PPID TIME+   %CPU %MEM PR NI S VIRT   RES    UID  COMMAND
8310 8309 0:01.17 7.6  5.0  20 0  R 212284 102056 1000 stress
8309 7475 0:00.00 0.0  0.0  20 0  S 7480   796    1000 stress
Through the cgroup, we capped the stress process's memory footprint at 100m. Its resident memory (RES) drops from about 200 MB to about 100 MB.
How Docker uses cgroups:
Docker limits and monitors container resources through cgroups. A real container shows how Docker configures them:
~ # docker run -m sets a memory limit
~ sudo docker run -itd -m 128m ubuntu
957459145e9092618837cf94a1cb356e206f2f0da560b40cb31035e442d3df11
~ # docker creates a cgroup for each container in the system's hierarchies
~ cd /sys/fs/cgroup/memory/docker/957459145e9092618837cf94a1cb356e206f2f0da560b40cb31035e442d3df11
957459145e9092618837cf94a1cb356e206f2f0da560b40cb31035e442d3df11 # check the cgroup's memory limit
957459145e9092618837cf94a1cb356e206f2f0da560b40cb31035e442d3df11 cat memory.limit_in_bytes
134217728
957459145e9092618837cf94a1cb356e206f2f0da560b40cb31035e442d3df11 # check how much memory the processes in the cgroup are using
957459145e9092618837cf94a1cb356e206f2f0da560b40cb31035e442d3df11 cat memory.usage_in_bytes
430080
So Docker creates a cgroup for each container and uses it to limit and monitor that container's resources.
Using Go to limit a container's resources through cgroups
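Below, we add a cgroup limit to the container from the previous section. The following demo limits the container's memory. It writes the PID to cgroup.procs rather than tasks because the re-executed child is a multithreaded Go process, and cgroup.procs moves all of its threads: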
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strconv"
	"syscall"
)

// Mount point of the hierarchy the system created with the memory subsystem attached.
const cgroupMemoryHierarchyMount = "/sys/fs/cgroup/memory"

// must prints the error and exits if err is non-nil.
func must(err error) {
	if err != nil {
		fmt.Println("ERROR", err)
		os.Exit(1)
	}
}

func main() {
	if os.Args[0] == "/proc/self/exe" {
		// Container process: this binary re-executed inside the new namespaces.
		fmt.Printf("current pid %d\n", syscall.Getpid())
		cmd := exec.Command("sh", "-c", "stress --vm-bytes 200m --vm-keep -m 1")
		cmd.Stdin = os.Stdin
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		must(cmd.Run())
		os.Exit(0) // do not fall through and start another container
	}

	cmd := exec.Command("/proc/self/exe")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	must(cmd.Start())
	// PID of the forked process as seen from the outer (host) PID namespace.
	fmt.Printf("%v\n", cmd.Process.Pid)

	// Create a cgroup in the hierarchy that has the memory subsystem attached.
	cgroupPath := filepath.Join(cgroupMemoryHierarchyMount, "testmemorylimit")
	if err := os.Mkdir(cgroupPath, 0755); err != nil && !os.IsExist(err) {
		must(err)
	}
	// Set the 100m limit first, then add the container process to the cgroup.
	must(os.WriteFile(filepath.Join(cgroupPath, "memory.limit_in_bytes"), []byte("100m"), 0644))
	must(os.WriteFile(filepath.Join(cgroupPath, "cgroup.procs"), []byte(strconv.Itoa(cmd.Process.Pid)), 0644))
	cmd.Wait()
}
By configuring the cgroup virtual file system, we limited the memory footprint of the stress process in the container to 100m, as top shows:
PID   USER PR NI VIRT   RES    SHR S %CPU %MEM TIME+   COMMAND
10861 root 20 0  212284 102464 212 R 6.2  5.0  0:01.13 stress
Docker's Linux Cgroups