Docker Foundation Technology: Linux Namespaces in Detail

Source: Internet
Author: User
Tags: message queue

Docker is a case of "old wine in new bottles": it relies on the Linux kernel technologies chroot, namespace and cgroup. This article looks at namespaces first.

Like virtual machine technology, Docker implements resource isolation, but it does so at the operating system level: a container is essentially a process on the host (a container process), so resource isolation is primarily the isolation of process resources. The core technology behind this isolation is the Linux namespace. The idea is consistent with namespaces in many programming languages (for example, C++ namespaces).

Isolation means that multiple lightweight "kernels" (container processes) can be abstracted on one host. They make full use of the host's resources: a container process can use whatever the host provides, yet its view of those resources is isolated from the host, and resource use is likewise isolated between different container processes. The same operation performed in different containers therefore does not interfere, which is what provides security.

To support this, Linux namespaces implement six kinds of resource isolation, which together cover the basic operational features of a small operating system: host name, user permissions, file system, network, process IDs and inter-process communication.

These six kinds of isolation correspond to six flags (CLONE_NEWUTS, CLONE_NEWUSER, CLONE_NEWNS, CLONE_NEWNET, CLONE_NEWPID and CLONE_NEWIPC). Each takes effect when the flag is passed to the clone() function:

int clone(int (*child_func)(void *), void *child_stack, int flags, void *arg);

The clone() function should be familiar: it is a more general form of fork(). By calling clone() with the flags for the resources to be isolated, you can create a container (isolating exactly what we choose).

A container process can itself call clone() to create another container process, which gives nested containers.

To see which namespaces the current process belongs to, look at the files under /proc/[pid]/ns (note: this is only available on kernel 3.8 and later).

Each namespace entry there carries a number that uniquely identifies the namespace; if the entries of two processes point to the same number, the processes are in the same namespace. Note that there is also a cgroup entry. This namespace is supported only from kernel 4.6 onward, and Docker's support for it is still limited, so we leave it aside for now.

The following simple code demonstrates namespace isolation step by step, to give an intuitive impression.

UTS namespace

The UTS namespace isolates the host name and domain name, so each container has its own hostname and domain name and can be treated as an independent node on the network. Changing the hostname inside a container has no effect on the host.

First, let's look at the overall code skeleton:

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static char container_stack[STACK_SIZE];
char* const container_args[] = {
   "/bin/bash",
   NULL
};

// Main function of the program run by the container process
int container_main(void *args)
{
   printf("Inside the container process!\n");
   execv(container_args[0], container_args); // run /bin/bash
   return 1;
}

int main(int argc, char *argv[])
{
   printf("Program start\n");
   // clone the container process
   int container_pid = clone(container_main, container_stack + STACK_SIZE, SIGCHLD, NULL);
   // wait for the container process to exit
   waitpid(container_pid, NULL, 0);
   return 0;
}

The skeleton calls clone() to create the child process and defines the function the child executes. The second parameter of clone() points to the top of the stack the child runs on (hence container_stack + STACK_SIZE, because the stack grows downward), and the third parameter is the key: its flags determine which namespaces are created.

For the UTS namespace, pass in CLONE_NEWUTS, as follows:

int container_pid = clone(container_main, container_stack + STACK_SIZE, SIGCHLD | CLONE_NEWUTS, NULL);

To see the difference between the host name inside and outside the container, we add the following call to the child process function:

sethostname("container", 9);

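Running the final program (creating a namespace generally requires root privileges) shows the effect: the shell inside the container reports the host name container, while the host name outside the container is unchanged.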

IPC namespace

The IPC namespace isolates inter-process communication, covering common mechanisms such as semaphores, message queues and shared memory. To use IPC, a process must obtain a globally unique identifier, the IPC identifier, so IPC resource isolation is mostly achieved by isolating these identifiers.

Similarly, the code modification only needs to add the parameter CLONE_NEWIPC, as follows:

int container_pid = clone(container_main, container_stack + STACK_SIZE, SIGCHLD | CLONE_NEWUTS | CLONE_NEWIPC, NULL);

To see the change, first create a message queue on the host (for example with ipcmk -Q, then list it with ipcs -q).

Then run the program and list the IPC objects inside the container: the queue created on the host does not appear, so IPC isolation is in effect.

PID namespace

The PID namespace isolates process IDs. Again, add the CLONE_NEWPID flag to clone(), for example:

int container_pid = clone(container_main, container_stack + STACK_SIZE, SIGCHLD | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID, NULL);

The effect is that echo $$, which prints the PID of the shell, now outputs a different value: inside the container the shell is PID 1.

However, commands such as ps and top show no change: they still list the host's processes.

The specific reasons, and the remaining namespaces (mount namespace, network namespace and user namespace), are covered in the follow-up on my public account, where the reading experience is also better.

PS: Readers interested in cloud computing can follow my public account, Aclouddeveloper, which focuses on cloud computing and regularly shares practical material.
