Basic Docker technology: Linux Namespace (I)


Guide: Docker is one of the most popular technologies right now. Many people think Docker is a new technology, but apart from being written in Go it is not really new: it is old wine in a new bottle. Docker builds on a number of established techniques, and I will introduce them over several articles. I hope that after reading them you can build a knock-off version of Docker yourself. Let's start with Linux Namespace.

Introduction

Linux Namespace is a kernel-level environment isolation mechanism provided by Linux. You may remember the chroot system call from Unix long ago, which confines a user to a specific directory by changing the root directory. chroot provides a simple form of isolation: the file system inside the chroot cannot access anything outside it. Building on this idea, Linux Namespace provides isolation for UTS, IPC, mount, PID, network, and user resources.

For example, we all know that the "super parent" process in Linux has PID 1. So, as with chroot, if we can jail a user's process space into one branch of the process tree, and make the processes underneath see a super parent process with PID 1, we achieve resource isolation (processes in different PID namespaces cannot see each other).

Linux Namespace has the following types: Mount, UTS, IPC, PID, Network, and User. Each is enabled by its own CLONE_NEW* flag.
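As a minimal sketch of how these types map onto flags, the table below pairs each namespace with the CLONE_NEW* constant that glibc's <sched.h> exposes when _GNU_SOURCE is defined. The struct, the table, and the ns_flag_for() helper are hypothetical names introduced only for illustration; they are not part of any API.

```c
#define _GNU_SOURCE
#include <sched.h>   /* CLONE_NEW* flags */
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table: namespace name -> clone() flag. */
struct ns_entry {
    const char *name;
    int flag;
};

static const struct ns_entry ns_table[] = {
    { "mount", CLONE_NEWNS   },
    { "uts",   CLONE_NEWUTS  },
    { "ipc",   CLONE_NEWIPC  },
    { "pid",   CLONE_NEWPID  },
    { "net",   CLONE_NEWNET  },
    { "user",  CLONE_NEWUSER },
};

/* Return the flag for a namespace name, or 0 if the name is unknown. */
int ns_flag_for(const char *name)
{
    for (size_t i = 0; i < sizeof(ns_table) / sizeof(ns_table[0]); i++) {
        if (strcmp(ns_table[i].name, name) == 0)
            return ns_table[i].flag;
    }
    return 0;
}
```

Several flags can be OR-ed together, for example ns_flag_for("uts") | ns_flag_for("ipc"), which is the same form the clone() calls later in this article use.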

Three system calls are mainly involved:

clone(): the system call behind thread creation; it creates a new process, and passing the CLONE_NEW* flags isolates it.
unshare(): detaches a process from a namespace.
setns(): adds a process to a namespace.
unshare() and setns() are both fairly simple; you can read their man pages yourself, so I will not cover them here.
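For readers who want a quick feel for unshare() anyway, here is a small sketch under stated assumptions. The helper name set_hostname_in_new_uts() is made up for this example. It forks a child, detaches only that child from the UTS namespace, and changes the hostname there, so the caller's hostname should stay untouched. It needs root (or CAP_SYS_ADMIN); otherwise unshare() fails and the helper reports 1.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, move it into a new UTS namespace with
 * unshare(), and change the hostname there.
 * Returns 0 if the child saw the new hostname, 1 if unshare() or
 * sethostname() failed (for example, when not run as root),
 * and -1 if fork() or waitpid() failed.
 */
int set_hostname_in_new_uts(const char *name)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char buf[65] = {0};

        /* Detach only this child from the parent's UTS namespace. */
        if (unshare(CLONE_NEWUTS) != 0)
            _exit(1);
        if (sethostname(name, strlen(name)) != 0)
            _exit(1);
        if (gethostname(buf, sizeof(buf) - 1) != 0)
            _exit(1);
        _exit(strcmp(buf, name) == 0 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    /* Treat a child killed by a signal as a plain failure. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
```

Calling gethostname() in the parent before and after set_hostname_in_new_uts("demo") should give the same result either way, which is the isolation the UTS section below demonstrates with clone().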

Let's look at some examples (the test programs below are best run on Linux kernel 3.8 or later; I use Ubuntu 14.04).

clone() system call

First, let's look at the simplest example of the clone() system call (the later programs are all modified from this one):

    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <stdio.h>
    #include <sched.h>
    #include <signal.h>
    #include <unistd.h>

    /* Define a stack for clone(). The stack size is 1 MB. */
    #define STACK_SIZE (1024 * 1024)
    static char container_stack[STACK_SIZE];

    char *const container_args[] = {
        "/bin/bash",
        NULL
    };

    int container_main(void *arg)
    {
        printf("Container - inside the container!\n");
        /* Execute a shell directly so we can check whether the resources
           in this process space are isolated */
        execv(container_args[0], container_args);
        printf("Something's wrong!\n");
        return 1;
    }

    int main()
    {
        printf("Parent - start a container!\n");
        /* Call clone() with a function and a stack (the end of the buffer
           is passed because the stack grows downward) */
        int container_pid = clone(container_main, container_stack + STACK_SIZE, SIGCHLD, NULL);
        /* Wait until the child process ends */
        waitpid(container_pid, NULL, 0);
        printf("Parent - container stopped!\n");
        return 0;
    }

As you can see, this works much like pthread. In this program, however, the process spaces of the parent and child are not isolated at all: anything the parent process can access, the child process can access too.

Next, let's use a few examples to see what Linux Namespace does.
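The program above ignores the return value of clone(). As a hedged sketch of a slightly more defensive variant, the helper below (clone_and_wait() and child_main() are names invented here) allocates the stack on the heap, checks for errors, and returns the child's exit status. It relies on the documented behaviour that the integer returned by the child function becomes the child's exit status. With extra_flags set to 0 it needs no special privileges.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/* Child entry point: its return value becomes the child's exit status. */
static int child_main(void *arg)
{
    return *(int *)arg;
}

/*
 * Hypothetical helper: run child_main() via clone() with the given extra
 * flags and return the child's exit status, or -1 on any error.
 */
int clone_and_wait(int extra_flags, int exit_code)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on common architectures such as x86,
       so clone() is given the address just past the end of the buffer. */
    pid_t pid = clone(child_main, stack + CHILD_STACK_SIZE,
                      extra_flags | SIGCHLD, &exit_code);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, clone_and_wait(0, 7) should return 7, while clone_and_wait(CLONE_NEWUTS, 0) is expected to return -1 when run without root because clone() itself fails.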

UTS Namespace

In the following code, I have omitted the header files and data structure definitions shown above and kept only the most important part.

    int container_main(void *arg)
    {
        printf("Container - inside the container!\n");
        sethostname("container", 9); /* set hostname; 9 is the length of "container" */
        execv(container_args[0], container_args);
        printf("Something's wrong!\n");
        return 1;
    }

    int main()
    {
        printf("Parent - start a container!\n");
        /* CLONE_NEWUTS enables UTS namespace isolation */
        int container_pid = clone(container_main, container_stack + STACK_SIZE,
                CLONE_NEWUTS | SIGCHLD, NULL);
        waitpid(container_pid, NULL, 0);
        printf("Parent - container stopped!\n");
        return 0;
    }

Run the program above and you will see that the child process's hostname has changed to container.

    hchen@ubuntu:~$ sudo ./uts
    Parent - start a container!
    Container - inside the container!
    root@container:~# hostname
    container
    root@container:~# uname -n
    container

IPC Namespace

IPC stands for Inter-Process Communication, the set of mechanisms processes use to communicate on Unix/Linux, including shared memory, semaphores, and message queues. To isolate a container, we therefore also need to isolate IPC, so that only processes in the same namespace can communicate with each other. If you are familiar with how IPC works, you know that each IPC object has a global ID, which means the namespace has to isolate these IDs so that processes in other namespaces cannot see them.

To enable IPC isolation, you only need to add the CLONE_NEWIPC parameter when calling clone.

    int container_pid = clone(container_main, container_stack + STACK_SIZE,
            CLONE_NEWUTS | CLONE_NEWIPC | SIGCHLD, NULL);

First, create an IPC queue (as shown below, the global queue ID is 0):

    hchen@ubuntu:~$ ipcmk -Q
    Message queue id: 0

    hchen@ubuntu:~$ ipcs -q
    ------ Message Queues --------
    key        msqid      owner      perms      used-bytes   messages
    0xd0d56eb2 0          hchen      644        0            0
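The same kind of queue can be created from C. This is only a sketch: create_queue() and remove_queue() are hypothetical wrappers around the System V calls msgget() and msgctl(). One difference to note is that the sketch uses IPC_PRIVATE, so ipcs -q would list the queue with a key of 0 rather than a generated key like the one in the output above.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

/* Create a new System V message queue; returns its ID, or -1 on error. */
int create_queue(void)
{
    /* IPC_PRIVATE always creates a fresh queue; 0644 matches the
       permissions shown in the perms column of ipcs. */
    return msgget(IPC_PRIVATE, IPC_CREAT | 0644);
}

/* Remove a queue by ID; returns 0 on success, or -1 on error. */
int remove_queue(int msqid)
{
    return msgctl(msqid, IPC_RMID, NULL);
}
```

The ID returned by create_queue() is the identifier that an IPC namespace hides from processes in other namespaces.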

If we run the program without CLONE_NEWIPC, the globally created IPC queue is still visible in the child process.

    hchen@ubuntu:~$ sudo ./uts
    Parent - start a container!
    Container - inside the container!
    root@container:~# ipcs -q
    ------ Message Queues --------
    key        msqid      owner      perms      used-bytes   messages
    0xd0d56eb2 0          hchen      644        0            0

However, if we run the program with CLONE_NEWIPC added, we will see the following results:

    root@ubuntu:~$ sudo ./ipc
    Parent - start a container!
    Container - inside the container!
    root@container:~/linux_namespace# ipcs -q
    ------ Message Queues --------
    key        msqid      owner      perms      used-bytes   messages

We can see that the IPC has been isolated.

PID Namespace

We will continue to modify the above program:

    int container_main(void *arg)
    {
        /* Print the child's PID; the output shows that it is 1 */
        printf("Container [%5d] - inside the container!\n", getpid());
        sethostname("container", 9); /* 9 is the length of "container" */
        execv(container_args[0], container_args);
        printf("Something's wrong!\n");
        return 1;
    }

    int main()
    {
        printf("Parent [%5d] - start a container!\n", getpid());
        /* Enable the PID namespace with CLONE_NEWPID */
        int container_pid = clone(container_main, container_stack + STACK_SIZE,
                CLONE_NEWUTS | CLONE_NEWPID | SIGCHLD, NULL);
        waitpid(container_pid, NULL, 0);
        printf("Parent - container stopped!\n");
        return 0;
    }

The result is as follows (we can see that the PID of the child process is 1):

    hchen@ubuntu:~$ sudo ./pid
    Parent [ 3474] - start a container!
    Container [    1] - inside the container!
    root@container:~# echo $$
    1

You may ask what use PID 1 is. In traditional UNIX systems, the process with PID 1 is init, which has a very special status. As the ancestor of all processes, it has many privileges (such as masking signals), and it also monitors the state of all processes. If a child process is orphaned (its parent did not wait for it), init takes over, reclaims its resources, and ends it. Therefore, to isolate a process space, we first need a process with PID 1, and, as with chroot, it is best for the child process to appear as PID 1 inside the container.

However, if you run ps, top, and similar commands in the child's shell, you can still see all processes, which means the isolation is incomplete. This is because commands like ps and top read the /proc file system, and /proc is the same in both the parent and the child process, so these commands show the same things.

Therefore, we also need to isolate the file system.
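To make the /proc dependency concrete, here is a rough sketch of the kind of scan such tools perform: it counts the entries in a proc-style directory whose names consist only of digits, that is, the per-process directories. The function name count_visible_pids() is invented for this example, and real tools read much more than the directory names.

```c
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>

/*
 * Count the entries in a proc-style directory whose names are all digits.
 * Returns -1 if the directory cannot be opened.
 */
int count_visible_pids(const char *procdir)
{
    DIR *dir = opendir(procdir);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        const char *p = entry->d_name;
        int numeric = (*p != '\0');

        /* A single non-digit character disqualifies the entry. */
        for (; *p != '\0'; p++) {
            if (!isdigit((unsigned char)*p)) {
                numeric = 0;
                break;
            }
        }
        if (numeric)
            count++;
    }
    closedir(dir);
    return count;
}
```

As long as the child still sees the parent's /proc, count_visible_pids("/proc") reports every process on the host, even though the child is inside its own PID namespace.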

Mount Namespace

In the following routine, we enable the mount namespace and remount the /proc file system in the child process.

    int container_main(void *arg)
    {
        printf("Container [%5d] - inside the container!\n", getpid());
        sethostname("container", 9); /* 9 is the length of "container" */
        /* Remount the proc file system at /proc */
        system("mount -t proc proc /proc");
        execv(container_args[0], container_args);
        printf("Something's wrong!\n");
        return 1;
    }

    int main()
    {
        printf("Parent [%5d] - start a container!\n", getpid());
        /* Enable the mount namespace by adding the CLONE_NEWNS flag */
        int container_pid = clone(container_main, container_stack + STACK_SIZE,
                CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
        waitpid(container_pid, NULL, 0);
        printf("Parent - container stopped!\n");
        return 0;
    }

The running result is as follows:

    hchen@ubuntu:~$ sudo ./pid.mnt
    Parent [ 3502] - start a container!
    Container [    1] - inside the container!
    root@container:~# ps -elf
    F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
    4 S root         1     0  0  80   0 -  6917 wait   19:55 pts/2    00:00:00 /bin/bash
    0 R root        14     1  0  80   0 -  5671 -      19:56 pts/2    00:00:00 ps -elf

We can see that there are only two processes, and that the process with PID 1 is our /bin/bash. We can also see that the /proc directory is much cleaner:

    root@container:~# ls /proc
    1          dma          key-users   net            sysvipc
    16         driver       kmsg        pagetypeinfo   timer_list
    acpi       execdomains  kpagecount  partitions     timer_stats
    asound     fb           kpageflags  sched_debug    tty
    buddyinfo  filesystems  loadavg     schedstat      uptime
    bus        fs           locks       scsi           version
    cgroups    interrupts   mdstat      self           version_signature
    cmdline    iomem        meminfo     slabinfo       vmallocinfo
    consoles   ioports      misc        softirqs       vmstat
    cpuinfo    irq          modules     stat           zoneinfo
    crypto     kallsyms     mounts      swaps
    devices    kcore        mpt         sys
    diskstats  keys         mtrr        sysrq-trigger

We can also see that the top command in the sub-process only shows two processes.

A note here: after a mount namespace is created with CLONE_NEWNS, the child process receives a copy of the parent's file system structure. All mount operations in the child's new namespace affect only its own view of the file system and have no effect outside. In this way, fairly strict isolation can be achieved.
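One caveat is worth a sketch. This is an assumption about the host rather than something the example above depends on: on systems where the root mount uses shared propagation, a mount made in the new namespace may still propagate back to the parent. A common precaution is to mark all mounts private first. The helper below, whose name mount_proc_in_new_ns() is made up for this example, does this in a forked child so the caller is not affected, and uses mount(2) directly instead of system().

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: in a forked child, create a new mount namespace,
 * mark every mount private, then mount a proc instance on /proc.
 * Returns 0 on success, 1 if unshare() failed (for example, not root),
 * 2 if a mount() call failed, and -1 if fork() or waitpid() failed.
 */
int mount_proc_in_new_ns(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (unshare(CLONE_NEWNS) != 0)
            _exit(1);
        /* Stop mount events from propagating back to the parent namespace. */
        if (mount("none", "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
            _exit(2);
        if (mount("proc", "/proc", "proc", 0, NULL) != 0)
            _exit(2);
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 2;
}
```

The mounts disappear together with the namespace when the child exits, so the helper leaves nothing behind on the host.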

You may ask, do we still need to mount some other file systems like this? Yes.

Docker's Mount Namespace

Next I will show you a knock-off "image" that imitates Docker's mount namespace.

First, we need a rootfs; that is, we need to copy the commands we want in our image into a rootfs directory. We build the following directory layout, imitating Linux:

    hchen@ubuntu:~/rootfs$ ls
    bin  dev  etc  home  lib  lib64  mnt  opt  proc  root  run  sbin  sys  tmp  usr  var

Then we copy the commands we need into the rootfs/bin directory (the sh command must be copied in, otherwise we cannot chroot):

    hchen@ubuntu:~/rootfs$ ls ./bin ./usr/bin
    ./bin:
    bash   chown  gzip      less  mount       netstat  rm     tabs  tee      top       tty
    cat    cp     hostname  ln    mountpoint  ping     sed    tac   test     touch     umount
    chgrp  echo   ip        ls    mv          ps       sh     tail  timeout  tr        uname
    chmod  grep   kill      more  nc          pwd      sleep  tar   toe      truncate  which

    ./usr/bin:
    awk  env  groups  head  id  mesg  sort  strace  tail  top  uniq  vi  wc  xargs

Note: you can use the ldd command to find the .so files these commands depend on, and copy them into the corresponding directories:

    hchen@ubuntu:~/rootfs/bin$ ldd bash
        linux-vdso.so.1 =>  (0x00007fffd33fc000)
        libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f4bd42c2000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4bd40be000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4bd3cf8000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f4bd4504000)

Below are some of the .so files in my rootfs:

    hchen@ubuntu:~/rootfs$ ls ./lib64 ./lib/x86_64-linux-gnu/
    ./lib64:
    ld-linux-x86-64.so.2

    ./lib/x86_64-linux-gnu/:
    libacl.so.1      libmemusage.so         libnss_files-2.19.so    libpython3.4m.so.1
    libacl.so.1.1.0  libmount.so.1          libnss_files.so.2       libpython3.4m.so.1.0
    libattr.so.1     libmount.so.1.1.0      libnss_hesiod-2.19.so   libresolv-2.19.so
    libblkid.so.1    libm.so.6              libnss_hesiod.so.2      libresolv.so.2
    libc-2.19.so     libncurses.so.5        libnss_nis-2.19.so      libselinux.so.1
    libcap.a         libncurses.so.5.9      libnss_nisplus-2.19.so  libtinfo.so.5
    libcap.so        libncursesw.so.5       libnss_nisplus.so.2     libtinfo.so.5.9
    libcap.so.2      libncursesw.so.5.9     libnss_nis.so.2         libutil-2.19.so
    libcap.so.2.24   libnsl-2.19.so         libpcre.so.3            libutil.so.1
    libc.so.6        libnsl.so.1            libprocps.so.3          libuuid.so.1
    libdl-2.19.so    libnss_compat-2.19.so  libpthread-2.19.so      libz.so.1
    libdl.so.2       libnss_compat.so.2     libpthread.so.0
    libgpm.so.2      libnss_dns-2.19.so     libpython2.7.so.1
    libm-2.19.so     libnss_dns.so.2        libpython2.7.so.1.0

Configuration files that these commands depend on:

    hchen@ubuntu:~/rootfs$ ls ./etc
    bash.bashrc  group  hostname  hosts  ld.so.cache  nsswitch.conf  passwd  profile
    resolv.conf  shadow

You may now object that some configuration should be set when the container starts rather than hard-coded in the image, for example /etc/hosts, /etc/hostname, and the DNS file /etc/resolv.conf. Fine. Outside rootfs, we create another directory, conf, and put these files there.

    hchen@ubuntu:~$ ls ./conf
    hostname     hosts     resolv.conf

In this way, the parent process can set up the configuration files the container needs dynamically and then mount them into the container, which makes the configuration of the container image much more flexible.

Well, we finally get to our program.

    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/mount.h>
    #include <stdio.h>
    #include <sched.h>
    #include <signal.h>
    #include <unistd.h>

    #define STACK_SIZE (1024 * 1024)
    static char container_stack[STACK_SIZE];

    char *const container_args[] = {
        "/bin/bash",
        "-l",
        NULL
    };

    int container_main(void *arg)
    {
        printf("Container [%5d] - inside the container!\n", getpid());

        // set hostname (9 is the length of "container")
        sethostname("container", 9);

        // remount "/proc" to make sure "top" and "ps" show the container's information
        if (mount("proc", "rootfs/proc", "proc", 0, NULL) != 0) {
            perror("proc");
        }
        if (mount("sysfs", "rootfs/sys", "sysfs", 0, NULL) != 0) {
            perror("sys");
        }
        if (mount("none", "rootfs/tmp", "tmpfs", 0, NULL) != 0) {
            perror("tmp");
        }
        if (mount("udev", "rootfs/dev", "devtmpfs", 0, NULL) != 0) {
            perror("dev");
        }
        if (mount("devpts", "rootfs/dev/pts", "devpts", 0, NULL) != 0) {
            perror("dev/pts");
        }
        if (mount("shm", "rootfs/dev/shm", "tmpfs", 0, NULL) != 0) {
            perror("dev/shm");
        }
        if (mount("tmpfs", "rootfs/run", "tmpfs", 0, NULL) != 0) {
            perror("run");
        }

        /*
         * Imitate Docker by mounting configuration files from outside
         * into the container. Look in the /var/lib/docker/containers/
         * directory and you will see Docker's versions of these files.
         */
        if (mount("conf/hosts", "rootfs/etc/hosts", "none", MS_BIND, NULL) != 0 ||
            mount("conf/hostname", "rootfs/etc/hostname", "none", MS_BIND, NULL) != 0 ||
            mount("conf/resolv.conf", "rootfs/etc/resolv.conf", "none", MS_BIND, NULL) != 0) {
            perror("conf");
        }

        /* Imitate what the -v, --volume=[] option of docker run does */
        if (mount("/tmp/t1", "rootfs/mnt", "none", MS_BIND, NULL) != 0) {
            perror("mnt");
        }

        /* chroot to isolate the directory */
        if (chdir("./rootfs") != 0 || chroot("./") != 0) {
            perror("chdir/chroot");
        }

        execv(container_args[0], container_args);
        perror("exec");
        printf("Something's wrong!\n");
        return 1;
    }

    int main()
    {
        printf("Parent [%5d] - start a container!\n", getpid());
        int container_pid = clone(container_main, container_stack + STACK_SIZE,
                CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
        waitpid(container_pid, NULL, 0);
        printf("Parent - container stopped!\n");
        return 0;
    }
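The chdir/chroot step at the end of container_main() can be tried on its own. The sketch below is only an illustration: try_chroot() is a hypothetical helper, and it uses the other common ordering, chroot() followed by chdir("/"), which ends in the same place. It runs in a forked child so the caller's root directory is never changed, and it reports 1 when run without root because chroot() is then refused.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: in a forked child, chroot() into new_root and
 * check that the working directory is then "/".
 * Returns 0 on success, 1 if chroot() or chdir() failed (for example,
 * when not run as root), and -1 if fork() or waitpid() failed.
 */
int try_chroot(const char *new_root)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char cwd[64];

        /* chroot() does not move the working directory, so follow it
           with chdir("/") to land inside the new root. */
        if (chroot(new_root) != 0 || chdir("/") != 0)
            _exit(1);
        if (getcwd(cwd, sizeof(cwd)) == NULL)
            _exit(1);
        _exit(strcmp(cwd, "/") == 0 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
```

With the layout used in this article, the call would be try_chroot("./rootfs") from the directory that contains rootfs.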

When you run the program above with sudo, you will see the following mount information and a so-called "image":

    hchen@ubuntu:~$ sudo ./mount
    Parent [ 4517] - start a container!
    Container [    1] - inside the container!
    root@container:/# mount
    proc on /proc type proc (rw,relatime)
    sysfs on /sys type sysfs (rw,relatime)
    none on /tmp type tmpfs (rw,relatime)
    udev on /dev type devtmpfs (rw,relatime,size=493976k,nr_inodes=123494,mode=755)
    devpts on /dev/pts type devpts (rw,relatime,mode=600,ptmxmode=000)
    tmpfs on /run type tmpfs (rw,relatime)
    /dev/disk/by-uuid/18086e3b-d805-4515-9e91-7efb2fe5c0e2 on /etc/hosts type ext4 (rw,relatime,errors=remount-ro,data=ordered)
    /dev/disk/by-uuid/18086e3b-d805-4515-9e91-7efb2fe5c0e2 on /etc/hostname type ext4 (rw,relatime,errors=remount-ro,data=ordered)
    /dev/disk/by-uuid/18086e3b-d805-4515-9e91-7efb2fe5c0e2 on /etc/resolv.conf type ext4 (rw,relatime,errors=remount-ro,data=ordered)

    root@container:/# ls /bin /usr/bin
    /bin:
    bash   chmod  echo  hostname  less  more    mv   ping  rm   sleep  tail  test     top    truncate  uname
    cat    chown  grep  ip        ln    mount   nc   ps    sed  tabs   tar   timeout  touch  tty       which
    chgrp  cp     gzip  kill      ls    mountpoint  netstat  pwd   sh   tac    tee   toe      tr     umount

    /usr/bin:
    awk  env  groups  head  id  mesg  sort  strace  tail  top  uniq  vi  wc  xargs

For creating a chroot directory, there is a tool called DebootstrapChroot; see its documentation (in English).

You can explore the rest on your own. I trust your imagination. :)

That's all for today. In Basic Docker technology: Linux Namespace (II), I will introduce User Namespace, Network Namespace, and other aspects of namespaces.

From: http://os.51cto.com/art/201609/517640.htm

Address: http://www.linuxprobe.com/docker-linux-namespace-1.html

