Docker's Linux Cgroups

Source: Internet
Author: User

Linux cgroups Introduction

The previous section covered namespaces, the technology used to build Linux containers by isolating each process in its own space. But how does Docker limit the size of each space and make sure the spaces do not compete with one another for resources? That is the job of Linux cgroups.

Concept

Linux cgroups (control groups) provide the ability to limit, control, and account for the resources used by a set of processes and their future child processes, including CPU, memory, storage, and network. With cgroups, it is easy to restrict the resource usage of a process and to monitor it and collect statistics on it in real time.

cgroups has three components:
  • Cgroup
    A cgroup is a mechanism for managing processes in groups. A cgroup contains a set of processes, and the parameters of various Linux subsystems can be configured on it, which associates that set of processes with a set of subsystem parameters.
  • Subsystem
    A subsystem is a module that controls a particular resource. The subsystems typically include:

      • blkio: sets access control for input and output on block devices (such as hard disks)
      • cpu: sets the CPU scheduling policy for the processes in the cgroup
      • cpuacct: reports the CPU usage of the processes in the cgroup
      • cpuset: on multicore machines, sets the CPUs and memory that the processes in the cgroup may use (the memory setting applies only on NUMA architectures)
      • devices: controls access to devices by the processes in the cgroup
      • freezer: suspends and resumes the processes in the cgroup
      • memory: controls the memory usage of the processes in the cgroup
      • net_cls: classifies the network packets generated by the processes in the cgroup so that Linux tc (traffic control) can identify the packets of a given cgroup by class ID and then rate-limit or monitor them
      • net_prio: sets the priority of the network traffic generated by the processes in the cgroup
      • ns: a special subsystem. When a process in a cgroup forks a new process in a new namespace (NEWNS), it creates a new cgroup that contains the processes in the new namespace.

    Each subsystem is attached to a cgroup on which the corresponding limits are defined, and the processes in that cgroup are constrained accordingly. Subsystems have been merged into the kernel gradually, so how do you see which ones the current kernel supports? Install the cgroup command-line tools ( apt-get install cgroup-bin ) and run lssubsys to list the subsystems that the kernel supports.

    $ lssubsys -a
    cpuset
    cpu,cpuacct
    blkio
    memory
    devices
    freezer
    net_cls,net_prio
    perf_event
    hugetlb
    pids
  • Hierarchy
    A hierarchy strings a group of cgroups together into a tree. One such tree is one hierarchy, and the tree structure is what lets cgroups inherit from one another. For example, suppose a set of scheduled-task processes has its CPU usage limited through cgroup1, and one of them, a periodic log-dump process, also needs its disk I/O limited. To avoid affecting the other processes, create cgroup2 as a child of cgroup1 and limit disk I/O there. cgroup2 inherits the CPU limit from cgroup1 and adds the disk I/O limit, without affecting the other processes in cgroup1.

Relationships among the three components:

From the descriptions above, it is clear that cgroups works through the cooperation of these three components. So how do they relate to one another?

    • When the system creates a new hierarchy, all processes in the system are added to the root cgroup node of that hierarchy. The root cgroup node is created by default, and every cgroup created later in the hierarchy is a descendant of it.
    • A subsystem can be attached to only one hierarchy.
    • One hierarchy can have multiple subsystems attached.
    • A process can be a member of more than one cgroup, but those cgroups must be in different hierarchies.
    • When a process forks a child process, the child starts in the same cgroup as its parent and can be moved to another cgroup as needed.
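
The membership rules above can be made concrete with a toy in-memory model. This is only an illustrative sketch, not how the kernel stores cgroups: the `hierarchy` type and its `mkdir`, `move`, and `fork` methods are hypothetical names. Storing one cgroup path per PID is what enforces "one cgroup per process per hierarchy".

```go
package main

import "fmt"

// hierarchy is a toy model of one cgroup tree.
type hierarchy struct {
	parent map[string]string // cgroup path -> parent path
	member map[int]string    // pid -> the single cgroup it belongs to
}

// newHierarchy creates a tree that has only the root cgroup "/" and puts
// every given PID in it, as happens when a new hierarchy is created.
func newHierarchy(pids ...int) *hierarchy {
	h := &hierarchy{
		parent: map[string]string{"/": ""},
		member: map[int]string{},
	}
	for _, p := range pids {
		h.member[p] = "/"
	}
	return h
}

// mkdir adds a child cgroup under an existing parent cgroup.
func (h *hierarchy) mkdir(parent, path string) error {
	if _, ok := h.parent[parent]; !ok {
		return fmt.Errorf("parent cgroup %q does not exist", parent)
	}
	h.parent[path] = parent
	return nil
}

// move reassigns a PID to another cgroup of the same hierarchy.
// The old membership is overwritten, never duplicated.
func (h *hierarchy) move(pid int, path string) error {
	if _, ok := h.parent[path]; !ok {
		return fmt.Errorf("cgroup %q does not exist", path)
	}
	h.member[pid] = path
	return nil
}

// fork places the child in the cgroup its parent is in at fork time.
func (h *hierarchy) fork(parentPID, childPID int) {
	h.member[childPID] = h.member[parentPID]
}

func main() {
	h := newHierarchy(1, 7475)
	if err := h.mkdir("/", "/cgroup-1"); err != nil {
		fmt.Println(err)
		return
	}
	if err := h.move(7475, "/cgroup-1"); err != nil {
		fmt.Println(err)
		return
	}
	h.fork(7475, 8335)
	fmt.Println(h.member[1], h.member[7475], h.member[8335])
	// prints: / /cgroup-1 /cgroup-1
}
```

A process that belongs to cgroups in several hierarchies would simply appear once in each of several such `hierarchy` values.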

It is fine if these rules are not fully clear yet. The relationships will become clearer as we use cgroups in practice.

Kernel interface:

With the structure of cgroups described, how do we actually ask the kernel to configure them? As noted above, a hierarchy is a tree. To keep configuration intuitive, the kernel exposes cgroups through a virtual tree-shaped file system in which the directory hierarchy mirrors the cgroup tree. The following example walks through how to operate on cgroups.

  • First, we will create and mount a hierarchy (Cgroup tree):

     $ mkdir cgroup-test    # create a mount point for the hierarchy
     $ sudo mount -t cgroup -o none,name=cgroup-test cgroup-test ./cgroup-test    # mount a hierarchy
     $ ls ./cgroup-test     # after mounting, the system generates some default files in this directory
     cgroup.clone_children  cgroup.procs  cgroup.sane_behavior  notify_on_release  release_agent  tasks

    These files are the configuration items of the root cgroup node in this hierarchy. Their meanings are:

    • cgroup.clone_children is read by the cpuset subsystem. If its value is 1 (the default is 0), a child cgroup inherits the cpuset configuration of its parent cgroup.
    • cgroup.procs lists the IDs of the processes (thread group IDs) in the cgroup at the current node of the tree. We are at the root node, so this file currently contains the IDs of all processes in the system.
    • notify_on_release and release_agent are used together. notify_on_release indicates whether release_agent is run when the last process in the cgroup exits, and release_agent is the path of the program to run. They are typically used to clean up cgroups that are no longer in use once their processes have exited.
    • tasks also lists the members of the cgroup, by task (thread) ID. Writing a process ID to the tasks file adds that process to the cgroup.
  • Then we create two child cgroups under the root cgroup of the hierarchy we just created:

     $ cd cgroup-test
     $ sudo mkdir cgroup-1    # create child cgroup "cgroup-1"
     $ sudo mkdir cgroup-2    # create child cgroup "cgroup-2"
     $ tree
     .
     |-- cgroup-1
     |   |-- cgroup.clone_children
     |   |-- cgroup.procs
     |   |-- notify_on_release
     |   `-- tasks
     |-- cgroup-2
     |   |-- cgroup.clone_children
     |   |-- cgroup.procs
     |   |-- notify_on_release
     |   `-- tasks
     |-- cgroup.clone_children
     |-- cgroup.procs
     |-- cgroup.sane_behavior
     |-- notify_on_release
     |-- release_agent
     `-- tasks

    As you can see, when a directory is created inside a cgroup directory, the kernel treats it as a child cgroup of that cgroup, and it inherits the properties of the parent cgroup.

  • Adding and moving processes in a cgroup:
    Within one hierarchy, a process can exist in only one cgroup node. All processes in the system start at the root node by default, and a process is moved to another cgroup node simply by writing its process ID to the tasks file of the destination node.

     $ cd cgroup-1
     $ echo $$
     7475
     $ sudo sh -c "echo $$ >> tasks"    # move the process of the current terminal into cgroup-1
     $ cat /proc/7475/cgroup
     13:name=cgroup-test:/cgroup-1
     11:perf_event:/
     10:cpu,cpuacct:/user.slice
     9:freezer:/
     8:blkio:/user.slice
     7:devices:/user.slice
     6:cpuset:/
     5:hugetlb:/
     4:pids:/user.slice/user-1000.slice
     3:memory:/user.slice
     2:net_cls,net_prio:/
     1:name=systemd:/user.slice/user-1000.slice/session-19.scope

    You can see that the current process, 7475, has been added to cgroup-test:/cgroup-1.

  • Limiting the resources of processes in a cgroup through a subsystem
    The hierarchy we created above is not associated with any subsystem, so its cgroups cannot limit the resource usage of processes. In fact, the system already creates a default hierarchy for each subsystem. For example, the memory hierarchy:

      $ mount | grep memory
      cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory,nsroot=/)

    As you can see, the hierarchy with the memory subsystem attached is mounted at /sys/fs/cgroup/memory. Next, we limit the memory used by a process by creating a cgroup in this hierarchy:

     $ cd /sys/fs/cgroup/memory
     $ stress --vm-bytes 200m --vm-keep -m 1    # first, start a memory-consuming stress process without any limit
     $ sudo mkdir test-limit-memory && cd test-limit-memory    # create a cgroup
     $ sudo sh -c "echo 100m > memory.limit_in_bytes"    # set the maximum memory usage of the cgroup to 100m
     $ sudo sh -c "echo $$ > tasks"    # move the current process into this cgroup
     $ stress --vm-bytes 200m --vm-keep -m 1    # run the 200m stress process again

    The results are as follows (monitored with top). The first table is the unrestricted run, and the second is the run inside the cgroup:

     PID  PPID     TIME+ %CPU %MEM  PR  NI S    VIRT    RES   UID COMMAND
    8336  8335   0:08.23 99.0 10.0  20   0 R  212284 205060  1000 stress
    8335  7475   0:00.00  0.0  0.0  20   0 S    7480    876  1000 stress

     PID  PPID     TIME+ %CPU %MEM  PR  NI S    VIRT    RES   UID COMMAND
    8310  8309   0:01.17  7.6  5.0  20   0 R  212284 102056  1000 stress
    8309  7475   0:00.00  0.0  0.0  20   0 S    7480    796  1000 stress

    As you can see, through the cgroup we limited the maximum memory footprint of the stress process to 100m: its resident memory (RES) drops from about 200 MB to about 100 MB.

    How Docker uses cgroups:

    Docker implements resource limits and monitoring for containers through cgroups. The following example with a real container shows how Docker configures cgroups:

     # docker run -m sets a memory limit
     $ sudo docker run -itd -m 128m ubuntu
     957459145e9092618837cf94a1cb356e206f2f0da560b40cb31035e442d3df11
     # Docker creates a cgroup for each container in the system hierarchy
     $ cd /sys/fs/cgroup/memory/docker/957459145e9092618837cf94a1cb356e206f2f0da560b40cb31035e442d3df11
     # view the memory limit of the cgroup
     $ cat memory.limit_in_bytes
     134217728
     # view the memory used by the processes in the cgroup
     $ cat memory.usage_in_bytes
     430080

    As you can see, Docker creates a cgroup for each container and configures resource limits and resource monitoring through that cgroup.
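
The value 134217728 read back from memory.limit_in_bytes is simply 128m expressed in bytes (128 × 1024 × 1024). The conversion can be sketched in Go as follows, assuming binary multiples for the k, m, and g suffixes. `parseMemSize` is a helper written for this illustration and is not Docker's own parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemSize converts values such as "128m" or "1g" into bytes,
// treating k, m, and g as 1024, 1024^2, and 1024^3 respectively.
// A value without a suffix is taken as a plain byte count.
func parseMemSize(s string) (int64, error) {
	s = strings.ToLower(strings.TrimSpace(s))
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	mult := int64(1)
	switch s[len(s)-1] {
	case 'k':
		mult = 1 << 10
	case 'm':
		mult = 1 << 20
	case 'g':
		mult = 1 << 30
	}
	if mult != 1 {
		s = s[:len(s)-1] // drop the suffix before parsing the number
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size: %w", err)
	}
	return n * mult, nil
}

func main() {
	for _, v := range []string{"100m", "128m"} {
		n, err := parseMemSize(v)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s = %d bytes\n", v, n)
	}
	// prints:
	// 100m = 104857600 bytes
	// 128m = 134217728 bytes
}
```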

Using Go to limit container resources through cgroups

Next, we add a cgroup limit to the container from the previous section. The following demo limits the memory of the container:

    package main

    import (
        "fmt"
        "io/ioutil"
        "os"
        "os/exec"
        "path"
        "strconv"
        "syscall"
    )

    // Mount point of the hierarchy that has the memory subsystem attached.
    const cgroupMemoryHierarchyMount = "/sys/fs/cgroup/memory"

    func main() {
        if os.Args[0] == "/proc/self/exe" {
            // Container process
            fmt.Printf("current pid %d", syscall.Getpid())
            fmt.Println()
            cmd := exec.Command("sh", "-c", `stress --vm-bytes 200m --vm-keep -m 1`)
            cmd.SysProcAttr = &syscall.SysProcAttr{}
            cmd.Stdin = os.Stdin
            cmd.Stdout = os.Stdout
            cmd.Stderr = os.Stderr
            if err := cmd.Run(); err != nil {
                fmt.Println(err)
                os.Exit(1)
            }
            // Stop here so the container process does not fall through
            // and re-execute itself once the command finishes.
            return
        }

        cmd := exec.Command("/proc/self/exe")
        cmd.SysProcAttr = &syscall.SysProcAttr{
            Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
        }
        cmd.Stdin = os.Stdin
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr

        if err := cmd.Start(); err != nil {
            fmt.Println("ERROR", err)
            os.Exit(1)
        } else {
            // Get the PID of the forked process as seen from the outer namespace
            fmt.Printf("%v", cmd.Process.Pid)
            // Create a cgroup in the hierarchy that the system mounts by default for the memory subsystem
            os.Mkdir(path.Join(cgroupMemoryHierarchyMount, "testmemorylimit"), 0755)
            // Add the container process to this cgroup
            ioutil.WriteFile(path.Join(cgroupMemoryHierarchyMount, "testmemorylimit", "tasks"), []byte(strconv.Itoa(cmd.Process.Pid)), 0644)
            // Limit the memory usage of the processes in this cgroup
            ioutil.WriteFile(path.Join(cgroupMemoryHierarchyMount, "testmemorylimit", "memory.limit_in_bytes"), []byte("100m"), 0644)
        }
        cmd.Process.Wait()
    }

By configuring the cgroups virtual file system, we have limited the memory footprint of the stress process in the container to 100m.

      PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
    10861 root      20   0  212284 102464    212 R  6.2  5.0   0:01.13 stress
