DockOne Technology Sharing (28): An Analysis of the OCI Standard and How runc Works

[Editor's note] This talk first interprets the OCI standard. It then explains, at the source-code level, how runc operates containers and how it uses a given configuration file to run a container. Finally, it introduces the configuration and principle of live migration.

Over the past two years, with the development of the Internet and of container technology, almost all IT vendors and cloud service providers have started to adopt container-based solutions, and container-related organizations have mushroomed. To keep containers portable, establishing standards for the container format and the runtime has therefore become particularly important.

As a result, the Linux Foundation established the OCI (Open Container Initiative) organization in June 2015 to develop open industry standards around container formats and runtimes. The organization is supported by a range of cloud computing vendors, including Google, Microsoft, Amazon, and Huawei.

1. What is the container format standard?

The guiding principle of the container format standard is that it is not tied to higher-level constructs such as a specific client or orchestration stack, and is not bound to a particular vendor or project; that is, it is not limited to a particular operating system, hardware, CPU architecture, or public cloud.

The standard is currently maintained and developed by the maintainers of the libcontainer and appc projects, and its specification documents are maintained as a project on GitHub.

1.1 Purpose of container standardization

The goals of a standardized container fall into the following five points.
    1. Standardized operations: standard container operations include creating, starting, and stopping containers with standard container tools, copying containers and creating container snapshots with standard file system tools, and downloading and uploading with standard network tools.
    2. Content-agnostic: the standard container operations produce the same effect regardless of what the container holds. Containers are uploaded and started in the same way, whether they hold a PHP application or a MySQL database service.
    3. Infrastructure-agnostic: whether it is a personal laptop, AWS S3, OpenStack, or any other infrastructure, all of them should support every container operation.
    4. Designed for automation: a unified container standard makes operations content-agnostic and platform-independent, and one of its fundamental purposes is to allow container operations to be automated across the whole platform.
    5. Industrial-grade delivery: one goal of a container standard is to make industrial-grade software delivery a reality.


1.2 Standard container package (bundle) and configuration

A standard container package should contain at least three parts:

config.json: the basic configuration file, containing host-independent, application-specific information such as security permissions, environment variables, and arguments. Specifically:
    1. Container format version
    2. rootfs path and whether it is read-only
    3. The file mount points and the corresponding mount directories inside the container (this configuration must be consistent with the configuration in runtime.json)
    4. Initial process configuration, including whether a terminal is attached, the working directory for the executable, environment variables, the executable and its arguments, UID, GID, additional GIDs to be added, hostname, and the low-level operating system and CPU architecture information.


runtime.json: the runtime configuration file, containing host-specific runtime information such as memory limits, access permissions for local devices, and mount points. In addition, the runtime configuration file provides a "hooks" feature that lets you run custom scripts before and after the container runs. A hook's configuration contains the path of the script to execute, its arguments, environment variables, and so on.

rootfs/: the root file system directory, containing the environment the container needs in order to run, such as /bin, /var, /lib, /dev, /usr, and the corresponding files. The rootfs directory must sit at the top level of the container directory, alongside the config.json file that holds the configuration.

1.3 Container runtime and life cycle

The container standard format also requires the container to persist its runtime state to disk so that other external tools can use and interpret it. The runtime state is stored as JSON. It is recommended to keep the runtime state JSON file on a temporary file system so that it is removed automatically when the system restarts.

On operating systems based on the Linux kernel, this information should be stored uniformly under the /run/opencontainer/containers directory, in a folder named after the container ID (/run/opencontainer/containers/<containerID>/state.json), which holds the container's state information and is updated in real time. With this default location for container state, external applications can easily find all the containers running on the system.

The state.json file needs to contain the following information:
    • Version information: the version number of the OCI standard in use.
    • Container ID: typically a hash value or an easy-to-read string. The container ID is recorded in state.json so that the runtime hooks mentioned earlier can locate the container simply by loading state.json. A hook can then check for state.json and, if it finds the file missing, treat the container as stopped and run the corresponding predefined script.
    • PID: the process ID, on the host, of the first process running in the container.
    • Container file directory: the directory that holds the container's rootfs and the corresponding configuration. External programs can locate the container file directory on the host simply by reading state.json.
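
The four items above map naturally onto a small struct. The following is a minimal sketch in Go of writing and reading such a state file. The Go field names, the JSON keys, and the sample values are assumptions for illustration and are not the exact schema of the specification.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// State mirrors the four items listed above. Field names and JSON keys are
// assumptions for illustration, not the exact schema of the specification.
type State struct {
	Version    string `json:"version"`    // version of the standard
	ID         string `json:"id"`         // container ID
	Pid        int    `json:"pid"`        // host PID of the container's first process
	BundlePath string `json:"bundlePath"` // directory holding rootfs and configuration
}

// encodeState renders the state as indented JSON, ready to be written to disk.
func encodeState(s State) ([]byte, error) {
	return json.MarshalIndent(s, "", "  ")
}

// decodeState is what an external tool would do after reading state.json.
func decodeState(data []byte) (State, error) {
	var s State
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	// All values below are placeholders.
	data, err := encodeState(State{Version: "0.1.0", ID: "busybox-demo", Pid: 4242, BundlePath: "/containers/busybox"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(data))

	back, err := decodeState(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("init process pid:", back.Pid)
}
```
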

A standard container life cycle should contain three basic phases.
    • Container creation: create everything the container needs, including the file system, namespaces, cgroups, user permissions, and so on.
    • Starting the container process: run the container process. The executable to run is defined in the args entry of config.json.
    • Container stop: a container is really a process that an external program can shut down (kill), so the container standard should include capturing the stop signal and reclaiming resources accordingly, to avoid orphan processes.


1.4 Implementations based on the Open Container Format (OCF) standard

As the points above show, the Open Container specification is very loose in its format requirements: it restricts neither the implementation technology nor the framework. Several implementations based on OCF already exist, and I believe more and more projects will appear soon.

Container runtime: opencontainers/runc, the runc project discussed in this article, is the reference implementation for those that follow.

Virtual machine runtime: hyperhq/runv implements the Open Container specification on top of hypervisor technology.

Testing: huawei-openlab/oct is a test framework based on the Open Container specification.

2. How runc works and how it is implemented

2.1 From libcontainer to runc

runc evolved from Docker's libcontainer project. In essence, runc is libcontainer wrapped in a lightweight client.

Essentially, a container is an execution environment that shares a kernel with the host system but is isolated from the resources of other processes on the system. Docker "isolates" such an execution environment by calling the libcontainer package to manage and allocate namespaces, cgroups, capabilities, and file systems. runc likewise calls the libcontainer package, but strips away advanced Docker features such as images and volumes, and implements OCF-compliant container management in the simplest possible way.

On the whole, functions and features have not changed much from the libcontainer project to the runc project so far. The specific changes are as follows.
    1. The original nsinit was removed and moved to the top level, and the command was renamed runc. It is still implemented with cli.go, which makes it easy to follow at a glance.
    2. In accordance with the Open Container standard, the single configuration file that used to mix all the information was split into two files, config.json and runtime.json.
    3. In accordance with the Open Container standard, a hook feature was added that runs scripts before and after the container runs.
    4. Compared with the commands of the nsinit era, a runc kill command was added, which sends a SIGKILL signal to the init process of the container with the specified ID.


Overall, the features that runc aims to include are:
    1. Support for all Linux namespaces, including user namespaces. User namespaces are not yet included.
    2. Support for all security-related features available on Linux, including SELinux, AppArmor, seccomp, cgroups, capability drop, pivot_root, UID/GID dropping, and more. Support for these features is now complete.
    3. Support for container live migration, implemented with CRIU. The feature has been implemented, but problems can still occur in use.
    4. Support for running containers on the Windows 10 platform, developed by Microsoft engineers. Currently only the Linux platform is supported.
    5. Support for the ARM, Power, and SPARC hardware architectures, to be backed by ARM, Intel, Qualcomm, IBM, and the whole hardware vendor ecosystem.
    6. Planned support for cutting-edge hardware features such as DPDK, SR-IOV, TPM, secure enclave, and more.
    7. High-performance tuning for production environments, contributed by Google engineers based on their experience deploying containers in production.
    8. Existence as a formal, real, and comprehensive concrete standard.


2.2 How does runc start a container?

The Open Container standard defines two configuration files and a dependency package for a container, and these are what runc uses to start a container. First we follow the official steps.

runc needs a rootfs at run time. The simplest way, if Docker is already installed locally, is to download a basic image with

    docker pull busybox

and then export the container image's rootfs as an archive named busybox.tar:

    docker export $(docker create busybox) > busybox.tar

Then unpack it into a rootfs directory:

    mkdir rootfs
    tar -C rootfs -xf busybox.tar

At this point we have a rootfs directory that meets the OCF standard. Note that Docker is used here only as a convenient way to obtain the rootfs directory; runc itself does not depend on Docker.

Next you need config.json and runtime.json. Running runc spec generates a standard config.json and runtime.json; you can of course also write your own by following the format.

If you have not installed runc, follow the steps below to install it. For now, runc supports only the Linux platform.

    # create a 'github.com/opencontainers' directory in your GOPATH/src
    cd github.com/opencontainers
    git clone https://github.com/opencontainers/runc
    cd runc
    make
    sudo make install

Finally, run runc start and you have started a container.

2.3 How runc start works

As mentioned above, runc is libcontainer wrapped in a thin CLI layer. cli is a Go package for quickly developing command-line applications; it handles subcommand definitions, flag definitions, help text, and so on. cli is itself an open source project hosted on GitHub, at github.com/codegangsta/cli.

The following walks through the execution flow of runc start from the source code. The original talk summarized the whole flow in a figure.

2.3.1. Everything starts with the main() function

The program first executes the main() function in main.go, where it uses the cli package to declare runc's subcommands, parameters, version number, and help information. The program then invokes the handler that matches the subcommand the user entered. For start, this is the startContainer() function in start.go.

2.3.2. Creating the logical container and the logical process

The so-called logical container and logical process are not a container and a process that are actually running, but structures defined in libcontainer. The logical container holds configuration information such as namespaces, cgroups, devices, and mount points. The logical process holds the command to be run in the container, together with its arguments and environment variables.

For runc, a single container definition is enough, and different containers differ only in the contents of the instance (attributes and parameters). libcontainer, however, has to deal with the underlying platform, and different platforms need completely heterogeneous "logical container objects" (for example, Linux containers and Windows containers). This explains why the factory pattern is used here: in the future libcontainer can support more types of containers on more platforms without changing the calling interface.

The following explains how the logical container and the logical process are created.

In the startContainer() function, the program first loads the *.json files into a config struct that libcontainer can use. With config as the parameter, it then calls libcontainer.New() to generate the factory used to produce containers. Next it calls factory.Create(config), which generates a logical container holding the config. It then calls newProcess(config), which fills a process struct with the information in config about the command to run in the container; this is the logical process. Finally, container.Start(process) launches the logical container.
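
The call sequence just described (factory, then container, then process, then Start) can be sketched in Go as follows. All types here are simplified, hypothetical stand-ins written for illustration. They are not libcontainer's real interfaces, whose signatures differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config stands in for the struct loaded from the *.json files.
type Config struct {
	Hostname string
	Args     []string
	Env      []string
}

// Process is the "logical process": the command, its arguments and environment.
type Process struct {
	Args []string
	Env  []string
}

// Container and Factory are the interfaces callers program against.
type Container interface {
	Start(p *Process) error
}

type Factory interface {
	Create(cfg *Config) (Container, error)
}

// linuxContainer is one platform-specific "logical container".
type linuxContainer struct{ cfg *Config }

func (c *linuxContainer) Start(p *Process) error {
	if len(p.Args) == 0 {
		return errors.New("process has no command to run")
	}
	fmt.Printf("container (hostname %q): would start %v\n", c.cfg.Hostname, p.Args)
	return nil
}

type linuxFactory struct{}

func (linuxFactory) Create(cfg *Config) (Container, error) {
	if cfg == nil {
		return nil, errors.New("nil config")
	}
	return &linuxContainer{cfg: cfg}, nil
}

// NewFactory hides the platform choice; another platform would return a
// different Factory without changing any caller.
func NewFactory() Factory { return linuxFactory{} }

func newProcess(cfg *Config) *Process {
	return &Process{Args: cfg.Args, Env: cfg.Env}
}

// startContainer follows the sequence described in the text above.
func startContainer(cfg *Config) error {
	container, err := NewFactory().Create(cfg)
	if err != nil {
		return err
	}
	return container.Start(newProcess(cfg))
}

func main() {
	cfg := &Config{Hostname: "demo", Args: []string{"sh"}, Env: []string{"PATH=/bin"}}
	if err := startContainer(cfg); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The caller only ever sees Factory and Container, which is the property the text attributes to the factory pattern.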

2.3.3. Starting the logical container

runc calls Start(). The Start() function lives in libcontainer/container_linux.go, and its main task is to call newParentProcess() to generate a parentProcess instance (a struct) and the pipe that runc uses to communicate with the init process in the container.

Besides the pipe for future communication with the process in the container and various basic configuration, the parentProcess instance has one very important field: cmd.

The cmd field is a struct defined in the os/exec package. The os/exec package is mainly used to create a new process and run a specified command in it. Developers can import os/exec in their project and then fill in the Cmd struct: the path and name of the program to run, the arguments the program needs, environment variables, various operating-system-specific attributes, and extra file descriptors.

In runc, the program sets the application path field of cmd to /proc/self/exe (that is, the application itself, runc). The argument field, Args, is set to init, indicating that the container is to be initialized. The SysProcAttr field is filled with the attributes, such as the namespaces, that runc needs to enable.

The program then calls parentProcess.cmd.Start() to start the init process in the physical container. Next, the process ID of the init process in the physical container is added to the cgroup control group to apply resource control to the processes in the container. The configuration parameters are then passed to the init process through the pipe. Finally, runc waits on the pipe until the init process completes all initialization based on this configuration, or exits with an error.

2.3.4. Configuring and creating the physical container

The init process in the container first calls the StartInitialization() function, which receives the configuration parameters from the parent process through the pipe. It then configures the container as follows:
    1. If the user specified one, join the init process to the specified namespace.
    2. Set the session ID of the process.
    3. Initialize the network devices.
    4. Mount the file system under the specified directory and switch the root directory to the newly mounted file system. Set the hostname and load the profile information.
    5. Finally, use the exec system call to execute the program that the user specified to run in the container.


3. The configuration and principle of live migration

3.1 Introduction to live migration

Live migration means performing a checkpoint operation on a container to obtain a series of files, which can then be used on the local host or on another host to restore the container. runc currently uses CRIU as its live-migration tool to implement the container's checkpoint and restore functions. The original talk outlined the process in a figure.

3.2 A brief introduction to how runc live migration works

The main work of live migration in runc is done by calling CRIU (Checkpoint and Restore in Userspace). CRIU is responsible for freezing a process and storing it on disk as a series of files, and for using those files to restore the frozen process.

runc invokes CRIU in swrk mode. This mode combines CRIU's other two modes, CLI and RPC: it lets the user run CRIU on demand like a command-line tool, while accepting requests made through remote calls.

runc completes live migration mainly through the following two steps.

    1. Create a container struct from the container's state.json or from the configuration files *.json.

    2. Call CRIU in swrk mode. runc first collects and organizes the information about the container to be checkpointed or restored, and fills it into the struct that will be sent to CRIU in swrk mode. The main contents of the struct are as follows:
      req := &criurpc.CriuReq{
          Type: &t,       // checkpoint or restore
          Opts: &rpcOpts, // CRIU-related parameters
      }

      The field t specifies whether the request is a checkpoint or a restore operation, and the field rpcOpts holds the options specified by the user and the parameters CRIU needs in order to run.


runc then calls syscall.Socketpair() to create a communication channel between runc (criuClient) and CRIU (criuServer). Next it uses Go's os/exec package to start CRIU in swrk mode, and sends the request to criuServer through criuClient. Finally, the execution result is received through criuClient.
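
The socket-pair step can be sketched on its own. The following minimal sketch is Linux-only. It creates a connected pair with syscall.Socketpair, sends one request on the client end, and reads it back on the server end within the same process. The criuRequest type and its JSON encoding are hypothetical stand-ins for the real CRIU request message, and no CRIU process is started.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"syscall"
)

// criuRequest is a hypothetical stand-in for the request sent to CRIU.
type criuRequest struct {
	Type string            `json:"type"` // "checkpoint" or "restore"
	Opts map[string]string `json:"opts"` // options for the operation
}

// newSocketPair returns the two connected ends of a Unix socket pair.
func newSocketPair() (client, server *os.File, err error) {
	fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_SEQPACKET|syscall.SOCK_CLOEXEC, 0)
	if err != nil {
		return nil, nil, err
	}
	return os.NewFile(uintptr(fds[0]), "criu-client"), os.NewFile(uintptr(fds[1]), "criu-server"), nil
}

// roundTrip sends req on the client end and decodes what arrives on the
// server end. A real runtime would pass the server end to the CRIU process.
func roundTrip(req criuRequest) (criuRequest, error) {
	var got criuRequest
	client, server, err := newSocketPair()
	if err != nil {
		return got, err
	}
	defer client.Close()
	defer server.Close()

	data, err := json.Marshal(req)
	if err != nil {
		return got, err
	}
	if _, err := client.Write(data); err != nil {
		return got, err
	}
	// SOCK_SEQPACKET preserves message boundaries, so one Read returns the
	// whole request as long as the buffer is large enough.
	buf := make([]byte, 4096)
	n, err := server.Read(buf)
	if err != nil {
		return got, err
	}
	err = json.Unmarshal(buf[:n], &got)
	return got, err
}

func main() {
	// The option name and path are placeholders.
	got, err := roundTrip(criuRequest{Type: "checkpoint", Opts: map[string]string{"images-dir": "/tmp/checkpoint"}})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("server side received a %s request with %d option(s)\n", got.Type, len(got.Opts))
}
```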

3.3 Configuring and using runc live migration in the current release

Because the current version of CRIU is not yet complete and does not fully support a small subset of runc's features, the configuration file needs some modifications for live migration. The specific changes and their reasons are as follows:
    • Because CRIU does not support seccomp, the seccomp-related content in config.json must be emptied.
    • Because CRIU does not support external terminals, the value of terminal in config.json must be set to false.
    • Because CRIU requires the file system mounted by runc to be readable, the read/write attribute of the file system in config.json must be set to readable.


The original talk showed part of the configuration in a figure.

After correctly installing CRIU and its dependencies and making the above modifications to config.json, you can use runc's built-in commands to live-migrate containers.

Q&A

Q: I have heard little about container live migration and am curious about it. How is the file system that the container depends on handled during live migration? Is it copied directly to the target machine, or can a shared network file system solve this? And are there any special requirements for the target machine?

A: The file systems it depends on must be consistent. It is best to provision the file system on the target machine in advance. Some containers involve devices that cannot be migrated, so those containers cannot be migrated. The target machine's OS must have the same bit width.

Q: What is the difference between runc and Docker, and what are its advantages?

A: runc's advantages are that it is lightweight and standardized, which Docker is not. Conversely, runc does not have Docker's huge ecosystem.

Q: Can the dump files be persisted, or are they only for a one-time live migration?

A: They are stored on the hard disk and can be used multiple times.

Q: Does CRIU live migration save network configuration information, and roughly how long is the live-migration delay?

A: Anything that is set through parameters is saved. The live-migration delay depends on your supporting infrastructure and software.

Q: I don't know much about live migration. Does it mainly migrate the configuration and rootfs? Can it migrate the current state of the container, such as an ongoing computation?

A: Yes. The target machine must have a corresponding rootfs. State and everything else can be migrated.

Q: Does that mean I can use the CRIU tool to freeze a container, similar to a snapshot of a virtual machine, and recover or migrate it at some point in the future?

A: Yes, that's it.
===========================
The above content is compiled from the group sharing session held on the evening of October 27, 2015. The speaker, Kao Sanglin, is a graduate student at the SEL lab of Zhejiang University and currently works on research and development in the cloud platform team. The team has in-depth research and secondary development experience with PaaS, Docker, big data, and mainstream open source cloud computing technologies, and is now contributing technical articles to the community in the hope of helping readers. DockOne organizes a technology sharing session every week. Interested readers are welcome to add LIYINGJIESX to join the group, and you can leave us a message with topics you would like to hear about.
