Preface
Docker is the hottest technology right now. Many people think Docker is something new, but apart from being written in Go, a relatively new programming language, it really is not new at all: it is old wine in a new bottle. Docker and the things built around it use a lot of cool technologies, and I will introduce them in a few articles. I hope that by following along you will be able to build a knock-off version of Docker yourself. Let's start with Linux namespaces.
Introduction
Linux namespaces are a kernel-level environment isolation mechanism provided by Linux. You may remember that Unix long ago had a system call named chroot, which places a user inside a specific directory by changing the root directory. chroot provides a simple form of isolation: the file system inside the chroot cannot access anything outside it. Building on this idea, Linux namespaces provide isolation for UTS, IPC, mount, PID, network, user, and so on.
For example, we all know that under Linux the PID of the top-level ancestor process is 1. So, just as chroot does for the file system, if we can confine a user's process space to one branch of the process tree, so that the processes in that branch see their top-level parent as PID 1, we achieve resource isolation: processes in different PID namespaces cannot see each other.
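To make these namespaces visible on a running system: on Linux 3.8 and later each process exposes its namespaces as symbolic links under /proc/<pid>/ns, and two processes share a namespace when the corresponding links name the same inode. The sketch below reads those links for the current process; the helper names ns_link and print_namespaces are made up for this example.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read /proc/self/ns/<type> into buf, e.g. "uts:[4026531838]".
 * Returns 0 on success, -1 on error. */
int ns_link(const char *type, char *buf, size_t len)
{
    char path[64];
    ssize_t n;

    if (len == 0)
        return -1;
    snprintf(path, sizeof path, "/proc/self/ns/%s", type);
    n = readlink(path, buf, len - 1);   /* readlink() does not NUL-terminate */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* Hypothetical helper: print the identity of each namespace type covered in this article. */
void print_namespaces(void)
{
    static const char *const types[] = { "mnt", "uts", "ipc", "pid", "net", "user" };
    char buf[64];

    for (size_t i = 0; i < sizeof types / sizeof types[0]; i++) {
        if (ns_link(types[i], buf, sizeof buf) == 0)
            printf("%-4s %s\n", types[i], buf);
        else
            perror(types[i]);
    }
}
```

Running `ls -l /proc/self/ns` in a shell shows the same information.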
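Linux namespaces come in the following types, each selected by its own clone() flag:
Mount namespace: CLONE_NEWNS
UTS namespace: CLONE_NEWUTS
IPC namespace: CLONE_NEWIPC
PID namespace: CLONE_NEWPID
Network namespace: CLONE_NEWNET
User namespace: CLONE_NEWUSER
The namespace API consists mainly of three system calls: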
clone(): the system call used to implement threads; it creates a new process, and passing the flags above places that process in new namespaces.
unshare(): moves the calling process out of its current namespace(s) into new ones.
setns(): joins the calling process to an existing namespace.
unshare() and setns() are fairly simple; you can read their man pages yourself, so I won't cover them here.
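Although the article skips them, here is a small sketch of how setns() is usually driven: open the namespace file under /proc/<pid>/ns and pass the descriptor to setns(). The helper names open_ns and join_ns are hypothetical. Joining a namespace normally requires CAP_SYS_ADMIN, so only the open step works without root.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open /proc/<pid>/ns/<type>; returns the fd or -1. */
int open_ns(pid_t pid, const char *type)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/ns/%s", (int)pid, type);
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Hypothetical helper: move the calling thread into the <type> namespace of <pid>.
 * nstype is the matching CLONE_NEW* flag, or 0 to accept any type.
 * Returns 0 on success, -1 on error with errno set. Usually needs root. */
int join_ns(pid_t pid, const char *type, int nstype)
{
    int fd = open_ns(pid, type);
    if (fd < 0)
        return -1;
    int rc = setns(fd, nstype);
    int err = errno;          /* keep setns()'s errno across close() */
    close(fd);
    errno = err;
    return rc;
}
```

For example, calling join_ns(target_pid, "uts", CLONE_NEWUTS) and then execv() on a shell would give you a shell in the target's UTS namespace, which is roughly what the nsenter(1) tool does.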
Let's look at some examples. The test programs below are best run on Linux kernel 3.8 or later; I used Ubuntu 14.04.
clone() System Call
First, let's look at the simplest possible clone() example (the later programs are all modifications of this one):
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

/* Define a stack for clone(); the stack size is 1 MB */
#define STACK_SIZE (1024 * 1024)
static char container_stack[STACK_SIZE];

char* const container_args[] = {
    "/bin/bash",
    NULL
};

int container_main(void* arg)
{
    printf("Container - inside the container!\n");
    /* Execute a shell directly so we can check whether the resources in this process space are isolated */
    execv(container_args[0], container_args);
    printf("Something's wrong!\n");
    return 1;
}

int main()
{
    printf("Parent - start a container!\n");
    /* Call clone(), passing the function to run and its stack space.
     * We pass the end of the buffer because the stack grows downward. */
    int container_pid = clone(container_main, container_stack + STACK_SIZE, SIGCHLD, NULL);
    if (container_pid == -1) {
        perror("clone");
        return 1;
    }
    /* Wait for the child process to finish */
    waitpid(container_pid, NULL, 0);
    printf("Parent - container stopped!\n");
    return 0;
}
As you can see, this is used in much the same way as pthread. However, in the program above the parent and child process spaces are no different: whatever the parent can access, the child can access too.
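To check that claim directly, the sketch below clones a child with only SIGCHLD (no CLONE_NEW* flags) and has it compare its own /proc/self/ns/uts link with the one the parent read. The function names are illustrative. Because no namespace is created, it runs without root.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)
static char child_stack[CHILD_STACK_SIZE];

/* Read this process's UTS namespace link, e.g. "uts:[4026531838]". */
static int read_uts_ns(char *buf, size_t len)
{
    ssize_t n = readlink("/proc/self/ns/uts", buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* Runs in the child: exit 0 if same namespace as the parent, 1 if not, 2 on error. */
static int child_main(void *arg)
{
    char mine[64];
    if (read_uts_ns(mine, sizeof mine) != 0)
        return 2;
    return strcmp(mine, (const char *)arg) == 0 ? 0 : 1;
}

/* Hypothetical helper: 1 if a plain clone() child shares our UTS namespace,
 * 0 if it does not, -1 on error. */
int child_shares_uts_ns(void)
{
    char parent_ns[64];
    int status;

    if (read_uts_ns(parent_ns, sizeof parent_ns) != 0)
        return -1;
    /* Without CLONE_VM the child gets a copy of our memory, so parent_ns is valid there. */
    pid_t pid = clone(child_main, child_stack + CHILD_STACK_SIZE, SIGCHLD, parent_ns);
    if (pid == -1)
        return -1;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

Adding CLONE_NEWUTS to the flags (and running as root) should make the function return 0, since the child then lives in a new UTS namespace.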
Below, let's look at a few examples to see what Linux namespaces actually do.
UTS Namespace
In the following code I have omitted the header files and data structure definitions shown above and kept only the important parts.
int container_main(void* arg)
{
    printf("Container - inside the container!\n");
    sethostname("container", 10); /* set the hostname */
    execv(container_args[0], container_args);
    printf("Something's wrong!\n");
    return 1;
}

int main()
{
    printf("Parent - start a container!\n");
    /* enable CLONE_NEWUTS namespace isolation */
    int container_pid = clone(container_main, container_stack + STACK_SIZE,
            CLONE_NEWUTS | SIGCHLD, NULL);
    waitpid(container_pid, NULL, 0);
    printf("Parent - container stopped!\n");
    return 0;
}
Run the program above (root privileges are required) and you will see that the hostname in the child process has changed to container.
~$ sudo ./uts
Parent - start a container!
Container - inside the container!
root@container:~# hostname
container
root@container:~# uname -n
container
IPC Namespace
IPC stands for Inter-Process Communication, the family of mechanisms Unix/Linux processes use to talk to each other, including shared memory, semaphores, and message queues. To isolate processes we also need to isolate IPC, so that only processes in the same namespace can communicate with each other. If you know how IPC works, you know that IPC objects are identified by global IDs. This means the namespace must isolate those IDs so that processes in other namespaces cannot see them.
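To make the "global ID" point concrete: a System V message queue is identified by a msqid that any process in the same IPC namespace can use. The sketch below creates a queue, lets a forked child look it up by ID alone, then removes it. The helper names are made up; note that IPC_PRIVATE only means "always create a new queue", not "hidden from other processes".

```c
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a new System V message queue; returns its msqid or -1. */
int create_queue(void)
{
    return msgget(IPC_PRIVATE, IPC_CREAT | 0644);
}

/* Hypothetical helper: 1 if a forked child (same IPC namespace) can stat the queue
 * knowing only its ID, 0 if it cannot, -1 on error. */
int child_can_see_queue(int msqid)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct msqid_ds ds;
        _exit(msgctl(msqid, IPC_STAT, &ds) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}

/* Hypothetical helper: delete the queue; returns 0 on success. */
int remove_queue(int msqid)
{
    return msgctl(msqid, IPC_RMID, NULL);
}
```

In a child created with CLONE_NEWIPC, the same msgctl() lookup should fail, which matches the empty ipcs output shown later in this section.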
To enable IPC isolation, we only need to add the CLONE_NEWIPC flag when calling clone().
int container_pid = clone(container_main, container_stack + STACK_SIZE,
        CLONE_NEWUTS | CLONE_NEWIPC | SIGCHLD, NULL);
First, we create an IPC message queue (as shown below, its global queue ID is 0):
~$ ipcmk -Q
Message queue id: 0
~$ ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xd0d56eb2 0          hchen      644        0            0
If we run the program without CLONE_NEWIPC, we can see that this global IPC queue is still visible in the child process.
~$ sudo ./uts
Parent - start a container!
Container - inside the container!
root@container:~# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xd0d56eb2 0          hchen      644        0            0
However, if we run the program with CLONE_NEWIPC, we get the following result:
~$ sudo ./ipc
Parent - start a container!
Container - inside the container!
root@container:~/linux_namespace# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
As you can see, IPC is now isolated.
PID Namespace
We continue to modify the above program:
int container_main(void* arg)
{
    /* Print the child's PID; the output shows that it is 1 */
    printf("Container [%5d] - inside the container!\n", getpid());
    sethostname("container", 10);
    execv(container_args[0], container_args);
    printf("Something's wrong!\n");
    return 1;
}

int main()
{
    printf("Parent [%5d] - start a container!\n", getpid());
    /* enable the PID namespace with CLONE_NEWPID */
    int container_pid = clone(container_main, container_stack + STACK_SIZE,
            CLONE_NEWUTS | CLONE_NEWPID | SIGCHLD, NULL);
    waitpid(container_pid, NULL, 0);
    printf("Parent - container stopped!\n");
    return 0;
}
The output is as follows (the child's PID is 1):
~$ sudo ./pid
Parent [ 3474] - start a container!
Container [    1] - inside the container!
root@container:~# echo $$
1
Why does PID 1 matter? In a traditional UNIX system the process with PID 1 is init, and its position is very special. As the ancestor of all processes it has special privileges (for example, it is shielded from signals it has not installed a handler for), and it also looks after every process: if a child process is orphaned because its parent exits without waiting for it, init adopts the child and reaps it when it exits, releasing its resources. Therefore, to isolate the process space we first need to create a process with PID 1, and the best approach is, as with chroot, to make the child's PID appear as 1 inside the container.
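The reaping duty described above is what any process running as PID 1 inside a container has to take over. A minimal sketch, assuming a hypothetical helper name reap_children: it blocks in waitpid(-1, ...) until the caller has no children left and returns how many it collected.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for every child of the calling process (including
 * orphans re-parented to it when it is PID 1 of a PID namespace) and return
 * how many were reaped, or -1 on an unexpected error. */
int reap_children(void)
{
    int count = 0;
    int status;

    for (;;) {
        pid_t pid = waitpid(-1, &status, 0);
        if (pid > 0) {
            count++;
            continue;
        }
        if (errno == EINTR)
            continue;                 /* interrupted by a signal: keep waiting */
        return errno == ECHILD ? count : -1;   /* ECHILD: no children left */
    }
}
```

An init process that must keep running would instead call waitpid() with WNOHANG whenever SIGCHLD arrives, rather than blocking until every child is gone.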
However, if we run commands such as ps or top in the child's shell, we can still see all the processes, which shows that the isolation is incomplete. This is because commands like ps and top read the /proc file system, and since /proc is the same in the parent and the child, these commands show the same thing.
Therefore, we also need to isolate the file system.
Mount Namespace
In the following program we enable the mount namespace and remount the /proc file system inside the child process.
#include <stdlib.h> /* for system() */

int container_main(void* arg)
{
    printf("Container [%5d] - inside the container!\n", getpid());
    sethostname("container", 10);
    /* On systemd-based hosts "/" is a shared mount; make all mounts private
     * first so the /proc mount below does not propagate back to the host. */
    system("mount --make-rprivate /");
    /* remount the proc file system on /proc */
    system("mount -t proc proc /proc");
    execv(container_args[0], container_args);
    printf("Something's wrong!\n");
    return 1;
}

int main()
{
    printf("Parent [%5d] - start a container!\n", getpid());
    /* enable the mount namespace by adding the CLONE_NEWNS flag */
    int container_pid = clone(container_main, container_stack + STACK_SIZE,
            CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
    waitpid(container_pid, NULL, 0);
    printf("Parent - container stopped!\n");
    return 0;
}
The output is as follows:
~$ sudo ./pid.mnt
Parent [ 3502] - start a container!
Container [    1] - inside the container!
root@container:~# ps -elf
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S root 1 0 0 0- 6917 wait 19:55 pts/2 00:00:00 /bin/bash
0 R root 1 0 0- 5671 - 19:56 pts/2 00:00:00 ps -elf
Above, we see only two processes, and the process with PID 1 is our /bin/bash. The /proc directory is also much cleaner:
root@container:~# ls /proc
1          dma          key-users   net            sysvipc
16         driver       kmsg        pagetypeinfo   timer_list
acpi       execdomains  kpagecount  partitions     timer_stats
asound     fb           kpageflags  sched_debug    tty
buddyinfo  filesystems  loadavg     schedstat      uptime
bus        fs           locks       scsi           version
cgroups    interrupts   mdstat      self           version_signature
cmdline    iomem        meminfo     slabinfo       vmallocinfo
consoles   ioports      misc        softirqs       vmstat
cpuinfo    irq          modules     stat           zoneinfo
crypto     kallsyms     mounts      swaps
devices    kcore        mpt         sys
diskstats  keys         mtrr        sysrq-trigger
Likewise, running top in the child process shows only two processes.
A bit more detail here. When a mount namespace is created with CLONE_NEWNS, the child receives a copy of the parent's mount table. Mount operations the child performs in the new namespace then affect only its own view of the file system and have no effect on the outside world, provided the copied mounts use private propagation (where "/" is a shared mount, as on systemd-based systems, mount events still propagate between the namespaces). This gives much stricter isolation.
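Whether that holds depends on mount propagation. On systemd-based hosts "/" is mounted as shared, and a shared mount copied into a new mount namespace (owned by the same user namespace) stays in the same peer group, so mounts made in the child can show up on the host. The sketch below, using the hypothetical name root_is_shared, looks for a "shared:" tag on the "/" entry of a mountinfo file. If it is set, calling mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) in the child before any other mount is the usual remedy.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if the mount at "/" in the given mountinfo file has
 * shared propagation, 0 if not, -1 if the file can't be read or lacks a "/" entry.
 * Line format: ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTIONS [OPTIONAL...] - FSTYPE SOURCE SUPEROPTS */
int root_is_shared(const char *mountinfo_path)
{
    FILE *f = fopen(mountinfo_path, "r");
    char line[4096];
    char mnt_point[1024];
    int result = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "%*s %*s %*s %*s %1023s", mnt_point) != 1)
            continue;
        if (strcmp(mnt_point, "/") != 0)
            continue;
        /* Optional fields end at the " - " separator; only search before it. */
        char *sep = strstr(line, " - ");
        if (sep != NULL)
            *sep = '\0';
        result = strstr(line, " shared:") != NULL;   /* last "/" entry wins */
    }
    fclose(f);
    return result;
}
```

On a live system you would pass "/proc/self/mountinfo".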
You might ask: are there other file systems we need to mount? Yes.
Docker's Mount Namespace
Below I will demonstrate a knock-off "image" that imitates Docker's mount namespace.
First, we need a rootfs: a directory into which we copy the commands we want in our image. Mimicking the Linux layout, we create the following directories:
~/rootfs$ ls
bin  dev  etc  home  lib  lib64  mnt  opt  proc  root  run  sbin  sys  tmp  usr  var
Then we copy the commands we need into rootfs/bin (sh must be copied in, otherwise we cannot chroot):
~/rootfs$ ls ./bin ./usr/bin
./bin:
bash   chown  gzip      less  mount       netstat  rm     tabs  tee      top       tty
cat    cp     hostname  ln    mountpoint  ping     sed    tac   test     touch     umount
chgrp  echo   ip        ls    mv          ps       sh     tail  timeout  tr        uname
chmod  grep   kill      more  nc          pwd      sleep  tar   toe      truncate  which

./usr/bin:
awk  env  groups  head  id  mesg  sort  strace  tail  top  uniq  vi  wc  xargs
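These preparation steps can be scripted. A sketch, assuming the hypothetical helper names make_rootfs_skeleton and rootfs_has_sh: the first creates the directory layout listed above under a given root, the second checks that <root>/bin/sh exists and is executable before we try to chroot into it. Copying the binaries themselves is left to cp.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Directory layout from the ls output above ("usr" must come before "usr/bin"). */
static const char *const rootfs_dirs[] = {
    "bin", "dev", "etc", "home", "lib", "lib64", "mnt", "opt", "proc",
    "root", "run", "sbin", "sys", "tmp", "usr", "usr/bin", "var",
};

/* Hypothetical helper: create <root> and <root>/<dir> for every entry above.
 * Directories that already exist are fine. Returns 0 on success, -1 on failure. */
int make_rootfs_skeleton(const char *root)
{
    char path[4096];

    if (mkdir(root, 0755) != 0 && errno != EEXIST)
        return -1;
    for (size_t i = 0; i < sizeof rootfs_dirs / sizeof rootfs_dirs[0]; i++) {
        snprintf(path, sizeof path, "%s/%s", root, rootfs_dirs[i]);
        if (mkdir(path, 0755) != 0 && errno != EEXIST)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: 1 if <root>/bin/sh is executable, 0 otherwise. */
int rootfs_has_sh(const char *root)
{
    char path[4096];
    snprintf(path, sizeof path, "%s/bin/sh", root);
    return access(path, X_OK) == 0;
}
```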
Note: you can use the ldd command to find the shared libraries these commands depend on and copy them into the corresponding directories:
~/rootfs/bin$ ldd bash
    linux-vdso.so.1 =>  (0x00007fffd33fc000)
    libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f4bd42c2000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4bd40be000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4bd3cf8000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f4bd4504000)
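The copy step after ldd mostly amounts to picking the library path out of each line. A small parser sketch (ldd_line_path is a hypothetical name) handling the three shapes shown above: "name => path (address)", "path (address)", and linux-vdso.so.1, which the kernel provides and which has no file to copy.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: extract the on-disk library path from one line of ldd output.
 * Returns 0 and fills out on success, -1 if the line names no file
 * (e.g. "linux-vdso.so.1 =>  (0x...)" or "... => not found"). */
int ldd_line_path(const char *line, char *out, size_t out_len)
{
    const char *p = strstr(line, "=>");
    size_t n = 0;

    p = (p != NULL) ? p + 2 : line;   /* path follows "=>", or starts the line */
    while (isspace((unsigned char)*p))
        p++;
    if (*p != '/')
        return -1;
    while (p[n] != '\0' && !isspace((unsigned char)p[n]))
        n++;
    if (n + 1 > out_len)
        return -1;                     /* output buffer too small */
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

Each returned path would then be copied to the same relative location under rootfs, for example /lib64/ld-linux-x86-64.so.2 into rootfs/lib64/.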
Here are some of the shared libraries in my rootfs:
~/rootfs$ ls ./lib64 ./lib/x86_64-linux-gnu/
./lib64:
ld-linux-x86-64.so.2

./lib/x86_64-linux-gnu/:
libacl.so.1       libmemusage.so         libnss_files-2.19.so    libpython3.4m.so.1
libacl.so.1.1.0   libmount.so.1          libnss_files.so.2       libpython3.4m.so.1.0
libattr.so.1      libmount.so.1.1.0      libnss_hesiod-2.19.so   libresolv-2.19.so
libblkid.so.1     libm.so.6              libnss_hesiod.so.2      libresolv.so.2
libc-2.19.so      libncurses.so.5        libnss_nis-2.19.so      libselinux.so.1
libcap.a          libncurses.so.5.9      libnss_nisplus-2.19.so  libtinfo.so.5
libcap.so         libncursesw.so.5       libnss_nisplus.so.2     libtinfo.so.5.9
libcap.so.2       libncursesw.so.5.9     libnss_nis.so.2         libutil-2.19.so
libcap.so.2.24    libnsl-2.19.so         libpcre.so.3            libutil.so.1
libc.so.6         libnsl.so.1            libprocps.so.3          libuuid.so.1
libdl-2.19.so     libnss_compat-2.19.so  libpthread-2.19.so      libz.so.1
libdl.so.2        libnss_compat.so.2     libpthread.so.0
libgpm.so.2       libnss_dns-2.19.so     libpython2.7.so.1
libm-2.19.so      libnss_dns.so.2        libpython2.7.so.1.0
We also include some configuration files that these commands depend on:
~/rootfs$ ls ./etc
bash.bashrc  group  hostname  hosts  ld.so.cache  nsswitch.conf  passwd  profile  resolv.conf  shadow
You might object: some of these configuration files should be set when the container starts rather than hard-coded in the image, for example /etc/hosts, /etc/hostname, and the DNS file /etc/resolv.conf. Good point. So, outside rootfs, we create a conf directory and put those files in it.
~$ ls ./conf
hostname  hosts  resolv.conf
This way, the parent process can dynamically set the contents of these files the container needs and then mount them into the container, which makes the configuration of the container's image more flexible.
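A sketch of that parent-side step, assuming a hypothetical write_container_conf() helper and placeholder file contents (the exact values are up to you): it regenerates conf/hostname, conf/hosts, and conf/resolv.conf before the container starts, so each container can get its own values.

```c
#include <stdio.h>
#include <string.h>

/* Write <dir>/<name> with the given contents. Returns 0 or -1. */
static int write_file(const char *dir, const char *name, const char *contents)
{
    char path[4096];
    FILE *f;

    snprintf(path, sizeof path, "%s/%s", dir, name);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(contents, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: generate per-container config files in conf_dir.
 * hostname and dns_server are chosen by the caller when the container starts. */
int write_container_conf(const char *conf_dir, const char *hostname, const char *dns_server)
{
    char buf[512];

    /* A newline inside a value would corrupt the generated files. */
    if (strchr(hostname, '\n') != NULL || strchr(dns_server, '\n') != NULL)
        return -1;

    snprintf(buf, sizeof buf, "%s\n", hostname);
    if (write_file(conf_dir, "hostname", buf) != 0)
        return -1;

    snprintf(buf, sizeof buf, "127.0.0.1\tlocalhost\n127.0.1.1\t%s\n", hostname);
    if (write_file(conf_dir, "hosts", buf) != 0)
        return -1;

    snprintf(buf, sizeof buf, "nameserver %s\n", dns_server);
    return write_file(conf_dir, "resolv.conf", buf);
}
```

The parent would call this before clone(); the child then bind-mounts the files over rootfs/etc.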
Well, we finally get to our program.
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/mount.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static char container_stack[STACK_SIZE];
char* const container_args[] = {
    "/bin/bash",
    "-l",
    NULL
};

int container_main(void* arg)
{
    printf("Container [%5d] - inside the container!\n", getpid());

    /* set the hostname */
    sethostname("container", 10);

    /* On systemd-based hosts "/" is a shared mount; make every mount private
     * first so the mounts below do not propagate back to the host. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0) {
        perror("make-rprivate");
    }

    /* remount /proc so that top and ps show the container's information */
    if (mount("proc", "rootfs/proc", "proc", 0, NULL) != 0) {
        perror("proc");
    }
    if (mount("sysfs", "rootfs/sys", "sysfs", 0, NULL) != 0) {
        perror("sys");
    }
    if (mount("none", "rootfs/tmp", "tmpfs", 0, NULL) != 0) {
        perror("tmp");
    }
    if (mount("udev", "rootfs/dev", "devtmpfs", 0, NULL) != 0) {
        perror("dev");
    }
    if (mount("devpts", "rootfs/dev/pts", "devpts", 0, NULL) != 0) {
        perror("dev/pts");
    }
    if (mount("shm", "rootfs/dev/shm", "tmpfs", 0, NULL) != 0) {
        perror("dev/shm");
    }
    if (mount("tmpfs", "rootfs/run", "tmpfs", 0, NULL) != 0) {
        perror("run");
    }
    /*
     * Imitate how Docker mounts configuration files into the container
     * from outside. Look in the container's directory under
     * /var/lib/docker/containers/ and you will find these files.
     */
    if (mount("conf/hosts", "rootfs/etc/hosts", "none", MS_BIND, NULL) != 0 ||
        mount("conf/hostname", "rootfs/etc/hostname", "none", MS_BIND, NULL) != 0 ||
        mount("conf/resolv.conf", "rootfs/etc/resolv.conf", "none", MS_BIND, NULL) != 0) {
        perror("conf");
    }
    /* imitate what the -v, --volume=[] option of docker run does */
    if (mount("/tmp/t1", "rootfs/mnt", "none", MS_BIND, NULL) != 0) {
        perror("mnt");
    }

    /* isolate the directory with chroot */
    if (chdir("./rootfs") != 0 || chroot("./") != 0) {
        perror("chdir/chroot");
    }

    execv(container_args[0], container_args);
    perror("exec");
    printf("Something's wrong!\n");
    return 1;
}

int main()
{
    printf("Parent [%5d] - start a container!\n", getpid());
    int container_pid = clone(container_main, container_stack + STACK_SIZE,
            CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
    if (container_pid == -1) {
        perror("clone");
        return 1;
    }
    waitpid(container_pid, NULL, 0);
    printf("Parent - container stopped!\n");
    return 0;
}
Run the program above with sudo and you will see the following mount information and a so-called "image":
~$ sudo ./mount
Parent [ 4517] - start a container!
Container [    1] - inside the container!
root@container:/# mount
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
none on /tmp type tmpfs (rw,relatime)
udev on /dev type devtmpfs (rw,relatime,size=493976k,nr_inodes=123494,mode=755)
devpts on /dev/pts type devpts (rw,relatime,mode=600,ptmxmode=000)
tmpfs on /run type tmpfs (rw,relatime)
/dev/disk/by-uuid/18086e3b-d805-4515-9e91-7efb2fe5c0e2 on /etc/hosts type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/disk/by-uuid/18086e3b-d805-4515-9e91-7efb2fe5c0e2 on /etc/hostname type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/disk/by-uuid/18086e3b-d805-4515-9e91-7efb2fe5c0e2 on /etc/resolv.conf type ext4 (rw,relatime,errors=remount-ro,data=ordered)
root@container:/# ls /bin /usr/bin
/bin:
bash   chmod  echo  hostname  less  more        mv       ping  rm   sleep  tail  test     top    truncate  uname
cat    chown  grep  ip        ln    mount       nc       ps    sed  tabs   tar   timeout  touch  tty       which
chgrp  cp     gzip  kill      ls    mountpoint  netstat  pwd   sh   tac    tee   toe      tr     umount

/usr/bin:
awk  env  groups  head  id  mesg  sort  strace  tail  top  uniq  vi  wc  xargs
As for how to build a chroot directory, there is a tool called debootstrap; see the DebootstrapChroot page on the Ubuntu community help wiki (in English).
The rest is up to you to play with; I trust your imagination. :)
That's all for today. In Docker Basic Technology: Linux Namespace (Part 2), I will introduce the user namespace, the network namespace, and the other namespaces.
Originally published at: http://os.51cto.com/art/201609/517640.htm
Docker Basic Technology: Linux Namespace (Part 1)