Introduction to Linux Namespaces
We often hear that Docker is a virtualization tool built on Linux namespaces and cgroups. But what is a Linux namespace, and how does Docker use it? Many people are unclear on this, so let's begin by introducing Linux namespaces and how they are used in containers.
Concept
Linux Namespace is a kernel feature that isolates a range of system resources, such as PIDs (process IDs), user IDs, and the network. Many people will think of the chroot command here: just as chroot makes a given directory appear as the root (isolating the file system view), namespaces isolate processes with respect to particular resources, including process trees, network interfaces, mount points, and so on.
For example, suppose a company sells its computing resources to the outside world. The company has one high-performance server, and each customer buys a Tomcat instance on it to run their own applications. A mischievous or careless customer might get into someone else's Tomcat instance and modify or shut down its resources, so customers would interfere with each other. You might say we can restrict permissions so that each user can only access the Tomcat instance under their own name, but some operations require system-level privileges such as root. We cannot grant root privileges to every user, nor can we give every user a separate physical host, and this is where Linux namespaces come in handy. With namespaces we can isolate at the UID level: for a user whose UID is n, we can create a namespace in which that user has root privileges, while on the real physical machine they are still just the user with UID n. This solves the problem of isolation between users. Of course, this is only one of the simpler things namespaces can do.
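The UID-level isolation described above rests on a mapping between IDs inside the namespace and IDs on the host. As a minimal sketch, the lookup can be modeled as follows. The idMap type and the toHostID function are hypothetical names used only for illustration; their three fields mirror the columns of the kernel's /proc/[pid]/uid_map file (first ID inside, first ID outside, range length), and the host UID 1000 is an assumed example value.

```go
package main

import "fmt"

// idMap is a hypothetical type for this sketch. Its fields mirror the three
// columns of /proc/[pid]/uid_map: first ID inside the namespace, first ID on
// the host, and the number of consecutive IDs covered.
type idMap struct {
	InsideID int
	HostID   int
	Size     int
}

// toHostID translates an ID seen inside the namespace into the host ID it
// stands for. ok is false when no range covers the ID.
func toHostID(maps []idMap, inside int) (host int, ok bool) {
	for _, m := range maps {
		if inside >= m.InsideID && inside < m.InsideID+m.Size {
			return m.HostID + (inside - m.InsideID), true
		}
	}
	return 0, false
}

func main() {
	// Assumed example: root (UID 0) inside the namespace is host UID 1000.
	maps := []idMap{{InsideID: 0, HostID: 1000, Size: 1}}

	host, ok := toHostID(maps, 0)
	fmt.Println(host, ok) // prints "1000 true"

	_, ok = toHostID(maps, 1)
	fmt.Println(ok) // prints "false": UID 1 has no mapping
}
```

With a single-entry mapping like this, the process is root only inside its namespace; every check made by the host uses UID 1000.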
In addition to user namespaces, PIDs can also be virtualized. Namespaces establish different views of the system: from the user's point of view, each namespace looks like a separate Linux machine with its own init process (PID 1), with the PIDs of the other processes counting up from there. Child namespaces A and B each have an init process with PID 1. Processes in a child namespace are mapped to processes in the parent namespace; the parent namespace knows the running state of every child namespace, while the child namespaces are isolated from one another. As the diagram shows, process 3 has PID 3 in the parent namespace but PID 1 inside the child namespace. That is, a user looking from child namespace A sees process 3 as the init process and believes it to be that namespace's own initialization process, but from the point of view of the whole host it is just process 3, placed in a virtualized space.
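On newer kernels the two views of one process can be read directly. The sketch below assumes Linux 4.1 or later (newer than the kernel used in this article), where /proc/[pid]/status contains an NSpid line that lists the process's PID in each nested PID namespace, starting with the namespace of the procfs mount being read. The helper name nsPIDs is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// nsPIDs extracts the PIDs from the "NSpid:" line of a /proc/[pid]/status
// document. The first value is the PID as seen from the procfs mount being
// read; the last is the PID in the process's own (innermost) namespace.
func nsPIDs(status string) ([]int, error) {
	for _, line := range strings.Split(status, "\n") {
		if !strings.HasPrefix(line, "NSpid:") {
			continue
		}
		var pids []int
		for _, f := range strings.Fields(strings.TrimPrefix(line, "NSpid:")) {
			n, err := strconv.Atoi(f)
			if err != nil {
				return nil, err
			}
			pids = append(pids, n)
		}
		return pids, nil
	}
	return nil, fmt.Errorf("no NSpid line found")
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	pids, err := nsPIDs(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("PID at each namespace level:", pids)
}
```

For the process in the example above, read from the parent namespace, the line would contain the two values 3 and 1.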
Currently, Linux implements six different types of namespace altogether.
Namespace type | System call parameter | Kernel version
Mount namespaces | CLONE_NEWNS | 2.4.19
UTS namespaces | CLONE_NEWUTS | 2.6.19
IPC namespaces | CLONE_NEWIPC | 2.6.19
PID namespaces | CLONE_NEWPID | 2.6.24
Network namespaces | CLONE_NEWNET | 2.6.29
User namespaces | CLONE_NEWUSER | 3.8
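Several of these flags can be requested at once by OR-ing them together. The following sketch expresses the table as a Go map and combines entries by name. It assumes a Linux build, since the CLONE_NEW* constants in Go's syscall package exist only there; the short names ("uts", "ipc", and so on) and the combine helper are labels invented for this illustration.

```go
package main

import (
	"fmt"
	"syscall"
)

// cloneFlags maps a short namespace name to the clone flag from the table.
// The short names are labels chosen for this sketch.
var cloneFlags = map[string]uintptr{
	"mount": syscall.CLONE_NEWNS,
	"uts":   syscall.CLONE_NEWUTS,
	"ipc":   syscall.CLONE_NEWIPC,
	"pid":   syscall.CLONE_NEWPID,
	"net":   syscall.CLONE_NEWNET,
	"user":  syscall.CLONE_NEWUSER,
}

// combine ORs together the flags for the named namespace types.
func combine(names ...string) (uintptr, error) {
	var flags uintptr
	for _, n := range names {
		f, ok := cloneFlags[n]
		if !ok {
			return 0, fmt.Errorf("unknown namespace type %q", n)
		}
		flags |= f
	}
	return flags, nil
}

func main() {
	flags, err := combine("uts", "ipc", "pid")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("Cloneflags = %#x\n", flags)
}
```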
The namespace API mainly uses three system calls:
clone() - creates a new process. The flags passed to the call determine which types of namespace are created, and the child processes of that process are also placed in those namespaces.
unshare() - moves the calling process out of a namespace.
setns() - joins the calling process to an existing namespace.
UTS Namespace
The UTS namespace mainly isolates two system identifiers, nodename and domainname. With UTS namespaces, each namespace can have its own hostname.
Below we use Go to write a UTS namespace example. For system calls like these, C is really the most natural language to demonstrate with, but the goal of this book is to implement Docker, and Docker is developed in Go, so we use Go throughout. First look at the code, which is very simple:
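package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// sh is the command the new process will run.
	cmd := exec.Command("sh")
	// CLONE_NEWUTS asks the kernel to place the child in a new UTS namespace.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS,
	}
	// Connect the child to the current terminal.
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}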
To explain the code: exec.Command("sh") specifies the command that the new process will run; we use sh by default. Next we set the system call parameters. As mentioned earlier, the CLONE_NEWUTS flag creates a UTS namespace. Go wraps the call to clone() for us, so when this code runs we land in an sh environment.
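The examples in the following sections differ from this one only in the value of Cloneflags, so the shared setup could be factored into a small helper. This is a sketch rather than the article's own code: newShell is a hypothetical name, and main only logs the error instead of exiting, because creating most namespace types fails without root privileges.

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

// newShell prepares (but does not start) an sh process that will be cloned
// into the new namespaces selected by flags and wired to the current
// terminal. newShell is a hypothetical helper, not part of any library.
func newShell(flags uintptr) *exec.Cmd {
	cmd := exec.Command("sh")
	cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: flags}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	// Same behavior as the UTS example; add more flags with | as needed.
	cmd := newShell(syscall.CLONE_NEWUTS)
	if err := cmd.Run(); err != nil {
		log.Printf("sh did not run cleanly (root is normally required): %v", err)
	}
}
```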
We ran the UTS example on Ubuntu 14.04 (kernel 3.13.0-65-generic, Go 1.7.3) with go run main.go. In the resulting interactive shell, we use pstree -pl to look at the relationships between the processes in the system:
|-sshd(19820)---bash(19839)---go(19901)-+-main(19912)-+-sh(19915)---pstree(19916)
Then we print the PID of the current shell:
# echo $$
19915
Now check whether our parent and child processes are in the same UTS namespace:
# readlink /proc/19912/ns/uts
uts:[4026531838]
# readlink /proc/19915/ns/uts
uts:[4026532193]
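The same comparison can be made from Go by reading the symbolic links under /proc/[pid]/ns. The sketch below parses a link target such as the ones shown above and compares the links of two processes; parseNSLink and sameNamespace are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNSLink splits a namespace link target such as "uts:[4026531838]"
// into its type and numeric identifier.
func parseNSLink(target string) (kind string, id uint64, err error) {
	i := strings.Index(target, ":[")
	if i < 0 || !strings.HasSuffix(target, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	id, err = strconv.ParseUint(target[i+2:len(target)-1], 10, 64)
	if err != nil {
		return "", 0, err
	}
	return target[:i], id, nil
}

// sameNamespace reports whether two processes share the namespace of the
// given kind, using the file names found under /proc/[pid]/ns (for example
// "uts", "ipc", "pid", "mnt", "net" or "user").
func sameNamespace(pidA, pidB int, kind string) (bool, error) {
	a, err := os.Readlink(fmt.Sprintf("/proc/%d/ns/%s", pidA, kind))
	if err != nil {
		return false, err
	}
	b, err := os.Readlink(fmt.Sprintf("/proc/%d/ns/%s", pidB, kind))
	if err != nil {
		return false, err
	}
	return a == b, nil
}

func main() {
	target, err := os.Readlink("/proc/self/ns/uts")
	if err != nil {
		fmt.Println("cannot read namespace link:", err)
		return
	}
	kind, id, err := parseNSLink(target)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("this process is in %s namespace %d\n", kind, id)
}
```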
The two identifiers differ, so the processes really are in different UTS namespaces. Because the UTS namespace isolates the hostname, changing the hostname in this environment should not affect the host outside. Let's try that experiment.
In this sh environment, change the hostname to bird and print it:
# hostname -b bird
# hostname
bird
Then start another shell on the host and run hostname to see the effect:
root@iz254rt8xf1z:~# hostname
iz254rt8xf1z
The hostname outside was not affected by the change made inside, which demonstrates what the UTS namespace does.
IPC Namespace
The IPC namespace isolates System V IPC and POSIX message queues. Each IPC namespace has its own System V IPC objects and its own POSIX message queues.
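One way to check from Go which System V message queues the current IPC namespace contains is to read /proc/sysvipc/msg. This is a sketch under an assumption about that file's layout: one header line followed by one row per queue, with the key and the queue ID (msqid) as the first two columns. The function name msgQueueIDs is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// msgQueueIDs returns the msqid column from the contents of
// /proc/sysvipc/msg. It assumes the first line is a header and that msqid
// is the second whitespace-separated field of every following row.
func msgQueueIDs(contents string) ([]int, error) {
	var ids []int
	lines := strings.Split(strings.TrimSpace(contents), "\n")
	for i, line := range lines {
		fields := strings.Fields(line)
		if i == 0 || len(fields) < 2 {
			continue // skip the header and blank lines
		}
		id, err := strconv.Atoi(fields[1])
		if err != nil {
			return nil, fmt.Errorf("line %d: %v", i+1, err)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	data, err := os.ReadFile("/proc/sysvipc/msg")
	if err != nil {
		fmt.Println("cannot read message queue list:", err)
		return
	}
	ids, err := msgQueueIDs(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%d message queue(s) visible: %v\n", len(ids), ids)
}
```

Run on the host and inside a new IPC namespace, the two counts should differ whenever a queue exists in only one of them.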
We make a small change to the previous version of the code:
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("sh")
	// Request a new IPC namespace in addition to the UTS namespace.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC,
	}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
As you can see, we only added syscall.CLONE_NEWIPC, which says that we also want a new IPC namespace. We need two shells to demonstrate the isolation.
First, open a shell on the host and view the existing IPC message queues:
root@iz254rt8xf1z:~# ipcs -q
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
Next, create a message queue:
root@iz254rt8xf1z:~# ipcmk -Q
Message queue id: 0
Then check again:
root@iz254rt8xf1z:~# ipcs -q
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x5e8f3f1e 0          root       644        0            0
The host now shows one queue. Next, run our program in another shell:
root@iz254rt8xf1z:~/gocode/src/book# go run main.go
# ipcs -q
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
Inside the newly created namespace we cannot see the message queue that was created on the host, which shows that our IPC namespace was created successfully and that IPC is isolated.
PID Namespace
The PID namespace isolates process IDs. The same process can have different PIDs in different PID namespaces. This is easy to picture with Docker: inside a container, ps -ef usually shows the foreground process with PID 1, while outside the container ps -ef shows the same process with a different PID. That is the work of the PID namespace.
Building on the previous code, we add syscall.CLONE_NEWPID:
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("sh")
	// Request a new PID namespace as well.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID,
	}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
We need two shells. First, look at the process tree on the host to find the real PIDs of our processes:
root@iz254rt8xf1z:~# pstree -pl
|-sshd(894)-+-sshd(9455)---bash(9475)---bash(19619)
|           |-sshd(19715)---bash(19734)
|           |-sshd(19853)---bash(19872)---go(20179)-+-main(20190)-+-sh(20193)
|           |                                       |             |-{main}(20191)
|           |                                       |             `-{main}(20192)
|           |                                       |-{go}(20180)
|           |                                       |-{go}(20181)
|           |                                       |-{go}(20182)
|           |                                       `-{go}(20186)
|           `-sshd(20124)---bash(20144)---pstree(20196)
As you can see, our Go program (main) runs with PID 20190, and the sh it started has PID 20193. Now switch to the other shell, where our code is running:
root@iz254rt8xf1z:~/gocode/src/book# go run main.go
# echo $$
1
We printed the PID of the current shell inside the namespace and found it to be 1. In other words, the sh process that the host sees as PID 20193, under main with PID 20190, is mapped to PID 1 inside the namespace. We cannot use ps to check this yet, because ps and top read the contents of /proc; this is explained in the Mount Namespace section below.
Mount Namespace
The mount namespace isolates the view of mount points that each process sees. Processes in different mount namespaces see different file system hierarchies. Calling mount() and umount() inside a mount namespace only affects the file system within that namespace and has no effect on the global file system.
This may remind you of chroot(), which also turns a subdirectory into the root. Mount namespaces can do that as well, but in a more flexible and secure way.
The mount namespace was the first namespace type implemented in Linux, which is why its system call parameter is simply NEWNS (short for "new namespace"). Apparently nobody realized at the time that many more types of namespace would later join the Linux family.
We make a small change to the code above and add the CLONE_NEWNS flag:
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("sh")
	// Request a new mount namespace as well.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
First, after running the code, look at the contents of /proc. proc is a file system that provides a mechanism for the kernel and kernel modules to send information to processes.
# ls /proc
1 19872 739 865 bus filesystems kpagecount pagetypeinfo sysvipc
145 2 348 866 cgroups fs kpageflags partitions timer_list
... 1472 869 cmdline interrupts latency_stats sched_debug timer_stats
11 1475 20124 353 894 consoles iomem loadavg schedstat tty
1174 15 ... 776 9 cpuinfo ioports locks scsi uptime
1192 154 20144 28 37 937 crypto ipmi mdstat self version
12 155 20215 29 38 5 607 7 945 devices irq meminfo slabinfo version_signature
1255 16 20226 3 39 50 61 8 9460 diskstats kallsyms misc softirqs vmallocinfo
1277 17 20229 30 391 51 62 827 967 dma kcore modules stat vmstat
1296 20231 ... 836 driver mounts swaps xen
... 860 acpi execdomains keys mtrr sys zoneinfo
1309 19853 733 862 buddyinfo fb kmsg net sysrq-trigger
Because this /proc is still the host's, the listing is cluttered. Next we mount /proc inside our own namespace:
# mount -t proc proc /proc
# ls /proc
1          consoles   execdomains  ipmi       kpagecount     misc          sched_debug  swaps          uptime
5          cpuinfo    fb           irq        kpageflags     modules       schedstat    sys            version
acpi       crypto     filesystems  kallsyms   latency_stats  mounts        scsi         sysrq-trigger  version_signature
buddyinfo  devices    fs           kcore      loadavg        mtrr          self         sysvipc        vmallocinfo
bus        diskstats  interrupts   key-users  locks          net           slabinfo     timer_list     vmstat
cgroups    dma        iomem        keys       mdstat         pagetypeinfo  softirqs     timer_stats    xen
cmdline    driver     ioports      kmsg       meminfo        partitions    stat         tty            zoneinfo
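The mount command used above has a direct counterpart in Go's syscall package. The following is a minimal sketch, not the article's code: mountProc is a hypothetical helper, and the mount flags (no executables, no set-user-ID, no device files) are an assumption chosen for the sketch. It is meant to run inside a new mount namespace; on hosts where / is a shared mount, a mount made in the new namespace may still propagate back to the host, which this sketch does not address.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// procMountFlags are conservative options for a proc mount. The choice of
// flags is an assumption of this sketch.
const procMountFlags = syscall.MS_NOEXEC | syscall.MS_NOSUID | syscall.MS_NODEV

// mountProc mounts a fresh proc file system on target, similar to
// "mount -t proc proc <target>" with the options above.
func mountProc(target string) error {
	return syscall.Mount("proc", target, "proc", uintptr(procMountFlags), "")
}

func main() {
	// Only mount when a target is given explicitly, so that running the
	// sketch by accident cannot change any mounts.
	if len(os.Args) != 2 {
		fmt.Println("usage: mountproc <target>  (run inside a new mount namespace)")
		return
	}
	if err := mountProc(os.Args[1]); err != nil {
		fmt.Println("mount failed:", err)
		os.Exit(1)
	}
}
```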
After the mount, /proc shows far fewer entries, because it now describes only the processes in our own PID namespace. Now we can use ps to view the processes in the system:
# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 20:15 pts/4    00:00:00 sh
root         6     1  0 20:19 pts/4    00:00:00 ps -ef
As you can see, in the current namespace our sh process is PID 1. This shows that mounts in our mount namespace are isolated from the outside, and that mount operations here do not affect the host. Docker volumes also make use of this feature.
User Namespace
The user namespace mainly isolates user IDs and group IDs. That is, a process's user ID and group ID can be different inside and outside a user namespace. A common use is for a non-root user on the host to create a user namespace and be mapped to root inside it. The process then has root privileges inside the user namespace but not outside it. Starting with Linux kernel 3.8, non-root processes can also create user namespaces, and such a process can be mapped to root inside the namespace and have root privileges there.
Let's continue with an example.
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("sh")
	// Request a new user namespace as well.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS |
			syscall.CLONE_NEWUSER,
	}
	// Run the child with UID 1 and GID 1.
	cmd.SysProcAttr.Credential = &syscall.Credential{Uid: uint32(1), Gid: uint32(1)}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
	os.Exit(-1)
}
We added syscall.CLONE_NEWUSER to the previous code. We run the program as root; before running it, check the current user and group on the host:
root@iz254rt8xf1z:~/gocode/src/book# id
uid=0(root) gid=0(root) groups=0(root)
We are the root user. Now run the program:
root@iz254rt8xf1z:~/gocode/src/book# go run main.go
$ id
uid=65534(nobody) gid=65534(nogroup) groups=65534(nogroup)
The IDs inside differ from those on the host, which shows that the user namespace has taken effect.
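The nobody and nogroup IDs appear because the example sets no ID mapping for the new user namespace, and IDs without a mapping are reported as the overflow ID 65534. Go's syscall.SysProcAttr has UidMappings and GidMappings fields for supplying one. The sketch below maps root inside the namespace to the user who runs the program; rootMappedAttr is a hypothetical helper, and whether the call succeeds for a non-root user depends on the system's settings for unprivileged user namespaces.

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

// rootMappedAttr builds clone attributes that create a user namespace in
// which UID 0 and GID 0 stand for the given host IDs.
func rootMappedAttr(hostUID, hostGID int) *syscall.SysProcAttr {
	return &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUSER,
		UidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: hostUID, Size: 1},
		},
		GidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: hostGID, Size: 1},
		},
	}
}

func main() {
	// Run id instead of an interactive shell so the sketch terminates.
	cmd := exec.Command("id")
	cmd.SysProcAttr = rootMappedAttr(os.Getuid(), os.Getgid())
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		// Some systems disable unprivileged user namespaces.
		log.Printf("could not run id in a user namespace: %v", err)
	}
}
```

With the mapping in place, id inside the namespace should report root rather than nobody, while the host still sees the original user.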
Network Namespace
The network namespace isolates network devices, IP addresses, ports, and the rest of the network stack. It lets each container have its own independent (virtual) network devices, and applications inside a container can bind to their own ports without conflicting with ports in other namespaces. Once a bridge is set up on the host, containers can easily communicate with each other, and the same port can be used in every container.
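To check from Go which network devices the current network namespace contains, one option is to read /proc/net/dev. The sketch below assumes the file's usual layout (two header lines, then one "name: counters" line per device); deviceNames is a hypothetical helper. Unlike ifconfig without the -a option, which prints only interfaces that are up, this file lists every device, so a freshly created namespace will typically still show a loopback device that has not been brought up.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// deviceNames extracts interface names from the contents of /proc/net/dev.
// It assumes two header lines followed by one "name: counters..." line per
// device.
func deviceNames(contents string) []string {
	var names []string
	for i, line := range strings.Split(contents, "\n") {
		if i < 2 {
			continue // header lines
		}
		colon := strings.Index(line, ":")
		if colon < 0 {
			continue // blank or unexpected line
		}
		names = append(names, strings.TrimSpace(line[:colon]))
	}
	return names
}

func main() {
	data, err := os.ReadFile("/proc/net/dev")
	if err != nil {
		fmt.Println("cannot read device list:", err)
		return
	}
	fmt.Println("devices in this network namespace:", deviceNames(string(data)))
}
```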
Again, we add a little to the previous code, this time the syscall.CLONE_NEWNET flag:
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("sh")
	// Request a new network namespace as well.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS |
			syscall.CLONE_NEWUSER | syscall.CLONE_NEWNET,
	}
	// Run the child with UID 1 and GID 1.
	cmd.SysProcAttr.Credential = &syscall.Credential{Uid: uint32(1), Gid: uint32(1)}
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
	os.Exit(-1)
}
First, check the network devices on the host:
root@iz254rt8xf1z:~/gocode/src/book# ifconfig
docker0   Link encap:Ethernet  HWaddr 02:42:d7:5d:c3:b9
          inet addr:192.168.0.1  Bcast:0.0.0.0  Mask:255.255.240.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
eth0      Link encap:Ethernet  HWaddr 00:16:3e:00:38:cc
          inet addr:10.170.174.187  Mask:255.255.248.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5605 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1819 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:... (7.1 MB)  TX bytes:159780 (159.7 KB)
eth1      Link encap:Ethernet  HWaddr 00:16:3e:00:6d:4d
          inet addr:101.200.126.205  Bcast:101.200.127.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15433 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6888 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13287762 (13.2 MB)  TX bytes:1787482 (1.7 MB)
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
The host has lo, eth0, eth1, and other network devices. Now run the program and look inside the network namespace:
root@iz254rt8xf1z:~/gocode/src/book# go run main.go
$ ifconfig
$
No network devices are listed inside the namespace, which shows that the network namespace is isolated from the host.
Summary
In this section we introduced Linux namespaces. There are six types in total; we described each one briefly and gave a small demo in Go to build an intuitive understanding. We will use this knowledge in later chapters, where more complex examples of these namespaces await.
Related book recommendation: "Write Docker Yourself"