"Do-It-Yourself Writing Docker" Book Excerpt One: Linux Namespace

Introduction to Linux Namespace

We often hear that Docker is a virtualization tool built on Linux Namespace and cgroups, but many people are unsure what Linux Namespace is and how Docker uses it. Let's begin by introducing Linux Namespace and how it is used in containers.

Concept

Linux Namespace is a kernel feature that isolates a range of system resources, such as PIDs (process IDs), user IDs, and the network. Many people will think of the chroot command here: just as chroot makes a given directory appear as the root directory (isolating the file system view), a namespace isolates a process with respect to certain resources, including the process tree, network interfaces, and mount points.

For example, a company sells its computing resources to the outside world. It has one powerful server, and each customer buys a Tomcat instance on it to run their own applications. A careless or mischievous customer might get into someone else's Tomcat instance and modify or shut down its resources, so customers end up interfering with one another. You might say that we can restrict permissions so that each user can only access their own Tomcat, but some operations require system-level privileges such as root. We cannot grant real root privileges to every user, nor can we give every user a separate physical host, and this is where Linux Namespace comes in handy. With namespaces we can isolate at the UID level: a user whose UID is n on the host can be given a virtualized namespace in which that user has root permissions, while on the real physical machine the user is still just the user with UID n. This solves the problem of isolation between users. Of course, this is only one simple use of namespaces.

Besides user namespaces, PIDs can also be virtualized. Namespaces give processes different views of the system: from the user's point of view, each namespace looks like a separate Linux machine with its own init process (PID 1), with the PIDs of other processes increasing from there. Child namespaces A and B each have an init process with PID 1. Processes in a child namespace are mapped to processes in the parent namespace, so the parent namespace knows the running state of every child namespace, while the child namespaces are isolated from one another. In the diagram, process 3 has PID 3 in the parent namespace but PID 1 inside the child namespace. That is, a user looking from child namespace A sees process 3 as the init process and believes it to be that namespace's own initial process, but from the point of view of the whole host it is just process 3, virtualized into its own space.
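
On kernels 4.1 and later (newer than the 3.13 kernel used in this article), this mapping can be observed directly: the NSpid line of /proc/<pid>/status lists the PID of a process in each nested PID namespace, outermost first. The following is a minimal sketch under that assumption. The helper name parseNSpid is hypothetical, and the sample text is made up to match the example of process 3, not captured from a real system.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNSpid extracts the PIDs from the "NSpid:" line of a
// /proc/<pid>/status dump. The values are ordered from the outermost
// PID namespace to the innermost one. It returns nil if the line is
// missing or malformed.
func parseNSpid(status string) []int {
	for _, line := range strings.Split(status, "\n") {
		if !strings.HasPrefix(line, "NSpid:") {
			continue
		}
		var pids []int
		for _, field := range strings.Fields(strings.TrimPrefix(line, "NSpid:")) {
			pid, err := strconv.Atoi(field)
			if err != nil {
				return nil
			}
			pids = append(pids, pid)
		}
		return pids
	}
	return nil
}

func main() {
	// Made-up sample: PID 3 in the parent namespace, PID 1 in the child.
	sample := "Name:\tsh\nPid:\t3\nNSpid:\t3\t1\n"
	fmt.Println(parseNSpid(sample)) // prints [3 1]
}
```

On a suitable kernel, the same function could be fed the contents of /proc/self/status read from inside a container.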

At the time of writing, Linux implements six different types of namespace.

Namespace type       System call parameter   Kernel version
Mount namespaces     CLONE_NEWNS             2.4.19
UTS namespaces       CLONE_NEWUTS            2.6.19
IPC namespaces       CLONE_NEWIPC            2.6.19
PID namespaces       CLONE_NEWPID            2.6.24
Network namespaces   CLONE_NEWNET            2.6.29
User namespaces      CLONE_NEWUSER           3.8
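
These parameters are bit flags, so several namespaces can be requested at once by OR-ing them together. The sketch below shows one way to build such a combined value in Go. The short names ("uts", "ipc", and so on) and the helper combineFlags are inventions of this example, not part of any API.

```go
package main

import (
	"fmt"
	"syscall"
)

// cloneFlagByName maps short namespace names (chosen for this sketch)
// to the clone() flags listed in the table.
var cloneFlagByName = map[string]uintptr{
	"mnt":  syscall.CLONE_NEWNS,
	"uts":  syscall.CLONE_NEWUTS,
	"ipc":  syscall.CLONE_NEWIPC,
	"pid":  syscall.CLONE_NEWPID,
	"net":  syscall.CLONE_NEWNET,
	"user": syscall.CLONE_NEWUSER,
}

// combineFlags ORs together the flags for the requested namespaces and
// reports an error for a name it does not know.
func combineFlags(names ...string) (uintptr, error) {
	var flags uintptr
	for _, name := range names {
		flag, ok := cloneFlagByName[name]
		if !ok {
			return 0, fmt.Errorf("unknown namespace %q", name)
		}
		flags |= flag
	}
	return flags, nil
}

func main() {
	flags, err := combineFlags("uts", "ipc", "pid")
	if err != nil {
		fmt.Println(err)
		return
	}
	// The result can be assigned to syscall.SysProcAttr.Cloneflags.
	fmt.Printf("Cloneflags = %#x\n", flags)
}
```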

The Namespace API primarily uses three system calls:

clone() - Creates a new process. The flags passed to the call determine which types of namespace are created, and the new process's children are placed in those namespaces as well.
unshare() - Moves the process out of a namespace.
setns() - Joins the process to an existing namespace.

UTS Namespace

A UTS namespace mainly isolates two system identifiers, nodename and domainname. Each UTS namespace is allowed to have its own hostname.

Below we use Go to build a UTS namespace example. System calls such as these are arguably best described in C, but the goal of this book is to implement Docker, and since Docker is developed in Go, we use Go throughout. First, look at the code, which is very simple:

package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
)

func main() {
    // Run sh as the child process.
    cmd := exec.Command("sh")
    // Ask clone() to place the child in a new UTS namespace.
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS,
    }
    // Connect the child to our terminal so the shell is interactive.
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run(); err != nil {
        log.Fatal(err)
    }
}

To explain the code: exec.Command("sh") specifies the command that the new process will run; here we use sh. Next we set the system call parameters: as mentioned earlier, the CLONE_NEWUTS flag creates a UTS namespace. Go wraps the call to clone() for us, and running the program drops us into an sh environment.

We run this program on Ubuntu 14.04 with kernel 3.13.0-65-generic and Go 1.7.3. After executing go run main.go, we use pstree -pl inside the interactive shell to look at the relationships between the processes on the system:

|-sshd(19820)---bash(19839)---go(19901)-+-main(19912)-+-sh(19915)---pstree(19916)

Then we print the current PID:

# echo $$
19915

Verify that the parent and child processes are not in the same UTS namespace:

# readlink /proc/19912/ns/uts
uts:[4026531838]
# readlink /proc/19915/ns/uts
uts:[4026532193]

You can see that they really are in different UTS namespaces. Because a UTS namespace isolates the hostname, changing the hostname in this environment should not affect the host outside.
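
The same comparison can be scripted. The sketch below parses link targets of the form printed by readlink and compares the inode numbers. The helpers parseNsLink and sameNamespace are hypothetical names for this example, and the two literal values are the ones printed in this article.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNsLink splits a namespace link target such as "uts:[4026531838]"
// into the namespace type and its inode number.
func parseNsLink(link string) (string, uint64, error) {
	open := strings.Index(link, ":[")
	if open < 0 || !strings.HasSuffix(link, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", link)
	}
	inode, err := strconv.ParseUint(link[open+2:len(link)-1], 10, 64)
	if err != nil {
		return "", 0, err
	}
	return link[:open], inode, nil
}

// sameNamespace reports whether two link targets name the same namespace.
func sameNamespace(a, b string) bool {
	kindA, inodeA, errA := parseNsLink(a)
	kindB, inodeB, errB := parseNsLink(b)
	return errA == nil && errB == nil && kindA == kindB && inodeA == inodeB
}

func main() {
	// The two values printed by readlink in the article.
	fmt.Println(sameNamespace("uts:[4026531838]", "uts:[4026532193]")) // prints false

	// On a Linux host, the link for the current process can be read like this.
	if link, err := os.Readlink("/proc/self/ns/uts"); err == nil {
		fmt.Println(link)
	}
}
```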

To test this, run the following in the sh environment: change the hostname to bird, then print it.

# hostname -b bird
# hostname
bird

Now open another shell on the host and run hostname to see the effect:

root@iz254rt8xf1z:~# hostname
iz254rt8xf1z

You can see that the hostname outside has not been affected by the change inside, which demonstrates the role of the UTS namespace.

IPC Namespace

An IPC namespace isolates System V IPC and POSIX message queues. Each IPC namespace has its own System V IPC objects and POSIX message queues.
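
One way to see which System V message queues belong to the current IPC namespace from Go is to read /proc/sysvipc/msg, which contains a header line followed by one line per queue. The sketch below only counts those lines; countQueues is a hypothetical helper, and POSIX message queues are not covered by this file.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// countQueues counts the entries in text formatted like /proc/sysvipc/msg:
// one header line followed by one line per System V message queue.
func countQueues(table string) int {
	lines := strings.Split(strings.TrimSpace(table), "\n")
	count := 0
	for _, line := range lines[1:] { // skip the header line
		if strings.TrimSpace(line) != "" {
			count++
		}
	}
	return count
}

func main() {
	data, err := os.ReadFile("/proc/sysvipc/msg")
	if err != nil {
		fmt.Println("cannot read /proc/sysvipc/msg:", err)
		return
	}
	fmt.Println("System V message queues visible here:", countQueues(string(data)))
}
```

Run on the host and again inside a new IPC namespace, the two counts would be expected to differ whenever the host has queues of its own.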

We make a small change to the UTS example code.

package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
)

func main() {
    cmd := exec.Command("sh")
    // CLONE_NEWIPC additionally places the child in a new IPC namespace.
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC,
    }
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run(); err != nil {
        log.Fatal(err)
    }
}

As you can see, we only added syscall.CLONE_NEWIPC to indicate that we also want an IPC namespace. Next we need two shells to demonstrate the isolation.

First, open a shell on the host and view the existing IPC message queues:

root@iz254rt8xf1z:~# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

Next, create a message queue:

root@iz254rt8xf1z:~# ipcmk -q
Message queue id: 0

Then check again:

root@iz254rt8xf1z:~# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x5e8f3f1e 0          root       644        0            0

Now we can see one queue. Next, we run our program in another shell:

root@iz254rt8xf1z:~/gocode/src/book# go run main.go
# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

In the newly created namespace we do not see the message queue that was created on the host, which shows that our IPC namespace was created successfully and IPC has been isolated.

PID Namespace

A PID namespace isolates process IDs. The same process can have different PIDs in different PID namespaces. This is easy to see with Docker: inside a container, ps -ef often shows that the process running in the foreground has PID 1, while outside the container ps -ef shows the same process with a different PID. This is what the PID namespace does.

Building on the previous code, we add syscall.CLONE_NEWPID:

package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
)

func main() {
    cmd := exec.Command("sh")
    // CLONE_NEWPID additionally places the child in a new PID namespace.
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID,
    }
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run(); err != nil {
        log.Fatal(err)
    }
}

We need two shells. First, on the host, we look at the process tree to find the real PID of our process:

root@iz254rt8xf1z:~# pstree -pl
 |-sshd(894)-+-sshd(9455)---bash(9475)---bash(19619)
 |           |-sshd(19715)---bash(19734)
 |           |-sshd(19853)---bash(19872)---go(20179)-+-main(20190)-+-sh(20193)
 |           |                                       |             |-{main}(20191)
 |           |                                       |             `-{main}(20192)
 |           |                                       |-{go}(20180)
 |           |                                       |-{go}(20181)
 |           |                                       |-{go}(20182)
 |           |                                       `-{go}(20186)
 |           `-sshd(20124)---bash(20144)---pstree(20196)

As you can see, our Go main program runs with PID 20190, and the sh it started has PID 20193. Now let's look at the other shell, in which we run our code:

root@iz254rt8xf1z:~/gocode/src/book# go run main.go
# echo $$
1

We printed the PID as seen inside the current namespace and found it to be 1. In other words, the sh process started by main (PID 20193 on the host) is mapped to PID 1 inside the namespace. We cannot use ps to check this here, because ps and top read /proc, which still shows the host's view; this is explained in the Mount Namespace section below.

Mount Namespace

A mount namespace isolates the view of mount points that each process sees. Processes in different mount namespaces see different file system hierarchies. Calling mount() and umount() inside a mount namespace affects only the file system view of that namespace and has no effect on the global file system.

This may remind you of chroot(), which also turns a subdirectory into the root node. Mount namespaces can do this too, and in a more flexible and secure way.

The mount namespace was the first namespace type implemented in Linux, so its system call parameter is NEWNS (short for "new namespace"). At the time, it seems nobody anticipated that many more namespace types would join the Linux family.

We make a small change to the code above and add the NEWNS flag.

package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
)

func main() {
    cmd := exec.Command("sh")
    // CLONE_NEWNS additionally places the child in a new mount namespace.
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
    }
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run(); err != nil {
        log.Fatal(err)
    }
}

First, after running the code, we look at the contents of /proc. proc is a file system that provides a mechanism for the kernel and kernel modules to send information to processes.

# ls /proc
1 19872 739 865 bus filesystems kpagecount pagetypeinfo sysvipc
145 2 348 866 cgroups fs kpageflags partitions timer_list
10 0 1472 869 cmdline interrupts latency_stats sched_debug timer_stats
11 1475 20124 353 894 consoles iomem loadavg schedstat tty
1174 15 20129 6 776 9 cpuinfo ioports locks scsi uptime
1192 154 20144 28 37 937 crypto ipmi mdstat self version
12 155 20215 29 38 5 607 7 945 devices irq meminfo slabinfo version_signature
1255 16 20226 3 39 50 61 8 9460 diskstats kallsyms misc softirqs vmallocinfo
1277 17 20229 30 391 51 62 827 967 dma kcore modules stat vmstat
1296 20231 836 driver mounts swaps xen
7 860 acpi execdomains keys mtrr sys zoneinfo
1309 19853 733 862 buddyinfo fb kmsg net sysrq-trigger

Because /proc here is still the host's, the listing is cluttered. Next we mount /proc inside our own namespace.

# mount -t proc proc /proc
# ls /proc
1          consoles   execdomains  ipmi       kpagecount     misc          sched_debug  swaps          uptime
5          cpuinfo    fb           irq        kpageflags     modules       schedstat    sys            version
acpi       crypto     filesystems  kallsyms   latency_stats  mounts        scsi         sysrq-trigger  version_signature
buddyinfo  devices    fs           kcore      loadavg        mtrr          self         sysvipc        vmallocinfo
bus        diskstats  interrupts   key-users  locks          net           slabinfo     timer_list     vmstat
cgroups    dma        iomem        keys       mdstat         pagetypeinfo  softirqs     timer_stats    xen
cmdline    driver     ioports      kmsg       meminfo        partitions    stat         tty            zoneinfo

As you can see, there are suddenly far fewer entries, and ps can now be used to view the processes in the namespace.
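
The mount step can also be performed from Go rather than from the shell. The sketch below wraps syscall.Mount in a hypothetical helper, mountProc. The extra flags (MS_NOEXEC, MS_NOSUID, MS_NODEV) are an assumption of this sketch for mild hardening; the shell command in this article passes no options. To avoid changing mounts by accident, the program mounts only when started with the argument "mount", which is also a convention made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// procMountFlags returns optional hardening flags for the proc mount.
// They are an assumption of this sketch; the shell command in the
// article passes no extra options.
func procMountFlags() uintptr {
	return syscall.MS_NOEXEC | syscall.MS_NOSUID | syscall.MS_NODEV
}

// mountProc mounts a proc file system on target, similar to
// "mount -t proc proc <target>" plus the flags above.
func mountProc(target string) error {
	return syscall.Mount("proc", target, "proc", procMountFlags(), "")
}

func main() {
	// Only mount when explicitly asked to, so that running the sketch
	// by accident does not change the mounts of the current namespace.
	if len(os.Args) > 1 && os.Args[1] == "mount" {
		if err := mountProc("/proc"); err != nil {
			fmt.Fprintln(os.Stderr, "mount failed:", err)
			os.Exit(1)
		}
		return
	}
	fmt.Printf("would mount proc on /proc with flags %#x\n", procMountFlags())
}
```

The call needs sufficient privileges and is meant to be run inside the new mount namespace, not on the host.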

# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 20:15 pts/4    00:00:00 sh
root         6     1  0 20:19 pts/4    00:00:00 ps -ef

As you can see, in the current namespace our sh process is PID 1. This shows that mounts in our mount namespace are isolated from the outside, and the mount operation did not affect the host. Docker volumes also make use of this feature.

User Namespace

A user namespace mainly isolates user IDs and group IDs. That is, the user ID and group ID of a process can differ inside and outside the user namespace. A common use is to create a user namespace as a non-root user on the host and then map that user to root inside the namespace. The process then has root permissions inside the user namespace but not outside it. Starting with Linux kernel 3.8, non-root processes can also create user namespaces, and such a process can be mapped to root inside the namespace and have root permissions there.

Let's continue with an example.

package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
)

func main() {
    cmd := exec.Command("sh")
    // CLONE_NEWUSER additionally places the child in a new user namespace.
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS |
            syscall.CLONE_NEWUSER,
    }
    // Run the child with UID 1 and GID 1.
    cmd.SysProcAttr.Credential = &syscall.Credential{Uid: uint32(1), Gid: uint32(1)}
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run(); err != nil {
        log.Fatal(err)
    }
    os.Exit(-1)
}

We added syscall.CLONE_NEWUSER to the previous code. We run the program as root; before running it, we check the current user and group on the host:

root@iz254rt8xf1z:~/gocode/src/book# id
uid=0(root) gid=0(root) groups=0(root)

We are the root user. Now we run the program:

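root@iz254rt8xf1z:~/gocode/src/book# go run main.go
$ id
uid=65534(nobody) gid=65534(nogroup) groups=65534(nogroup)

The UID shown inside the namespace differs from the one on the host, so the user namespace has taken effect.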
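
The shell reports nobody (65534) because the example sets no UID or GID mapping, so its IDs are unmapped inside the new user namespace. The mapping to root described at the start of this section can be sketched with the UidMappings and GidMappings fields of syscall.SysProcAttr. This is a minimal sketch, not the book's code: rootMapping is a hypothetical helper, and the program runs id rather than an interactive shell.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// rootMapping maps a single host ID to ID 0 inside the user namespace.
func rootMapping(hostID int) []syscall.SysProcIDMap {
	return []syscall.SysProcIDMap{{ContainerID: 0, HostID: hostID, Size: 1}}
}

func main() {
	cmd := exec.Command("id")
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags:  syscall.CLONE_NEWUSER,
		UidMappings: rootMapping(os.Getuid()),
		GidMappings: rootMapping(os.Getgid()),
	}
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	// This can fail on systems where creating user namespaces is restricted.
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "run failed:", err)
	}
}
```

With such a mapping in place, id is expected to report UID 0 inside the namespace while the process keeps its original UID on the host.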

Network Namespace

A network namespace isolates the network stack: network devices, IP addresses, ports, and so on. It allows each container to have its own (virtual) network devices, and applications in a container can bind to ports of their own without conflicting with ports in other namespaces. Once a bridge has been set up on the host, containers can communicate with one another easily, and the same port can be used inside every container.
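
To inspect from Go which network devices the current process can see, the standard net package can list them. The sketch below prints each interface with its state; describe is a hypothetical helper. In a freshly created network namespace one would expect to find only the loopback device, in the down state, which is why a plain ifconfig (which lists only interfaces that are up) shows nothing there.

```go
package main

import (
	"fmt"
	"net"
)

// describe formats one interface as "name (up)" or "name (down)".
func describe(name string, flags net.Flags) string {
	state := "down"
	if flags&net.FlagUp != 0 {
		state = "up"
	}
	return fmt.Sprintf("%s (%s)", name, state)
}

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		fmt.Println("listing interfaces failed:", err)
		return
	}
	// Print every interface visible in the current network namespace.
	for _, iface := range ifaces {
		fmt.Println(describe(iface.Name, iface.Flags))
	}
}
```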

Again we extend the user namespace example slightly, adding the syscall.CLONE_NEWNET flag.

package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
)

func main() {
    cmd := exec.Command("sh")
    // CLONE_NEWNET additionally places the child in a new network namespace.
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS |
            syscall.CLONE_NEWUSER | syscall.CLONE_NEWNET,
    }
    cmd.SysProcAttr.Credential = &syscall.Credential{Uid: uint32(1), Gid: uint32(1)}
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run(); err != nil {
        log.Fatal(err)
    }
    os.Exit(-1)
}

First, we check the network devices on the host:

root@iz254rt8xf1z:~/gocode/src/book# ifconfig
docker0   Link encap:Ethernet  HWaddr 02:42:d7:5d:c3:b9
          inet addr:192.168.0.1  Bcast:0.0.0.0  Mask:255.255.240.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 00:16:3e:00:38:cc
          inet addr:10.170.174.187  Mask:255.255.248.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5605 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1819 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:... (7.1 MB)  TX bytes:159780 (159.7 KB)

eth1      Link encap:Ethernet  HWaddr 00:16:3e:00:6d:4d
          inet addr:101.200.126.205  Bcast:101.200.127.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15433 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6888 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13287762 (13.2 MB)  TX bytes:1787482 (1.7 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

The host has network devices such as lo, eth0, and eth1. Next we run the program and look inside the network namespace:

root@iz254rt8xf1z:~/gocode/src/book# go run main.go
$ ifconfig
$

There are no network devices in the namespace, which demonstrates the network isolation between the network namespace and the host.

Summary

In this section we introduced Linux Namespace. We briefly covered all six types of namespace and used Go to build a demo of each, to give an intuitive understanding. We will use this knowledge in later chapters, which contain more complex examples of applying these namespaces.

