Use an independent PID namespace to prevent mistaken killing of processes

Source: Internet
Author: User
A piece of buggy code

First, let's look at a piece of buggy code:

    #!/bin/bash
    slice=100;
    slppid=1;
    pidfile=/var/run/vpnrulematch.pid

    # Stop the previous sleep
    kill_prev()
    {
        pid=$1;
        /bin/kill -0 $pid;
        exist=$?
        ppid=$(/bin/cat /proc/$pid/status | /usr/bin/awk '/PPid/{print $2}');
        if [ "$exist" = 0 ] && [ "$ppid" = $$ ]; then
            /bin/kill $pid;
        fi
    }

    echo $$ >$pidfile

    # Loop, handling the sleep
    while true; do
        nostate=0;
        /bin/sleep $slice &
        slppid=$!;
        wait...
    done

The code above is meant to stop the previous sleep and start a new one whenever a signal arrives. Look at how elaborate kill_prev is. The elaboration is there to avoid killing the wrong process: kill is issued only if the process with that PID still exists and is a child of this script. At first glance this looks strict and correct, but note the gap between the if test and the kill. If the sleep finishes within that gap, and a process newly started elsewhere in the system takes over the PID the sleep just released, that new process is immediately killed by mistake. Linux hands out PIDs in increasing order precisely to make this unlikely, but the counter is still bounded by the maximum allowed PID and eventually wraps around. If the maximum PID were 10, this would happen easily.
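
To see how quickly a PID can come back when the PID space is small, here is a toy model in C. It is only a sketch, not the kernel's allocator: the names toy_pidmap, toy_alloc_pid and toy_free_pid and the limit TOY_PID_MAX = 10 are made up for illustration, and the only behaviour modelled is "hand out the next free number after the last one, wrapping around at the maximum".

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PID_MAX 10  /* hypothetical tiny limit, as in the text above */

struct toy_pidmap {
    bool used[TOY_PID_MAX + 1];  /* index 0 is unused; PIDs are 1..TOY_PID_MAX */
    int last;                    /* most recently allocated PID, 0 at start */
};

/* Return the next free PID after `last`, wrapping around at TOY_PID_MAX.
 * Returns -1 when every PID is in use. */
int toy_alloc_pid(struct toy_pidmap *m)
{
    for (int step = 1; step <= TOY_PID_MAX; step++) {
        int pid = (m->last + step - 1) % TOY_PID_MAX + 1;
        if (!m->used[pid]) {
            m->used[pid] = true;
            m->last = pid;
            return pid;
        }
    }
    return -1;
}

/* Release a PID so that a later allocation may reuse it. */
void toy_free_pid(struct toy_pidmap *m, int pid)
{
    assert(pid >= 1 && pid <= TOY_PID_MAX);
    m->used[pid] = false;
}
```

Starting from a zero-initialised map, the first allocation returns 1 (think of it as the sleep). If that process exits and nine more processes start, they get PIDs 2 to 10, and the very next allocation wraps around and returns 1 again. A kill aimed at the old sleep at that moment would hit the newcomer.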
What can be done? The answer is to isolate the PIDs of the script and its child processes from the PIDs of every other process in the system. Can Linux do that? Yes, with namespaces.
About namespaces

A namespace is really just an addressing space (which hardly needs saying). When a courier looks up your home address, that address is one name; all home addresses, whether in use or merely possible, together form a namespace. A namespace generally serves a single purpose, and different namespaces cannot interact with each other directly.
One thing can have a name in several namespaces. For example, 盖乌斯·尤利乌斯·恺撒 and Gaius Julius Caesar refer to the same person, but the two names live in different namespaces. Find someone in Italy and say 盖乌斯·尤利乌斯·恺撒 to him, and he may have no idea who you mean. In other words, you cannot address across namespaces. If a child born in China is named Gaius Julius Caesar, he has nothing to do with 盖乌斯·尤利乌斯·恺撒: names that happen to correspond in different namespaces are unrelated. But someone who knows ancient Rome and speaks both Chinese and Italian can immediately connect 盖乌斯·尤利乌斯·恺撒 with Gaius Julius Caesar, and might even name his son Gaius Julius Caesar on purpose. In other words, interaction across namespaces is possible at a higher level.
Linux PID namespace structure and implementation

The Linux 2.6 kernel introduced namespaces and later added a namespace for PIDs, probably to support virtualization better. In essence, one process can belong to several namespaces at once. Linux organizes PID namespaces as a tree: a child namespace is visible to its parent namespace, but the parent namespace is not visible to the child.

The implementation introduces a struct pid that is linked from task_struct. Everything needed for PID namespaces lives in this struct:

    struct pid
    {
        atomic_t count;
        unsigned int level;
        /* lists of tasks that use this pid */
        struct hlist_head tasks[PIDTYPE_MAX];
        struct rcu_head rcu;
        struct upid numbers[1];
    };

The upid array at the end holds the value of this PID in each of the namespaces it belongs to:

    struct upid {
        /* Try to keep pid_chain in the same cacheline as nr for find_vpid */
        int nr;
        struct pid_namespace *ns;
        struct hlist_node pid_chain;
    };

The upid struct holds the PID value itself and a reference to a namespace. A pid_namespace contains many things related to process management, such as its own PID allocation bitmap, a reference to its process 1, and its proc mount point, plus fields that organize the tree: parent points to the parent namespace and level gives the depth of the current namespace.

The role of process 1 needs some explanation. In UNIX, process 1 is very important, and processes that enter a new namespace are cut off from process 1 of the parent namespace. A new process 1 is therefore needed inside the new namespace. In the Linux implementation, the process created by clone with CLONE_NEWPID takes on that role, and its PID inside the new namespace really is 1.
Let's take a look at the implementation of alloc_pid:

    for (i = ns->level; i >= 0; i--) {
        /* tmp is the namespace currently being visited;
           use its own bitmap to allocate the PID value */
        nr = alloc_pidmap(tmp);
        if (nr < 0)
            goto out_free;
        pid->numbers[i].nr = nr;
        pid->numbers[i].ns = tmp;
        tmp = tmp->parent;
    }

As you can see, the loop always walks all the way up to the default (root) PID namespace, and every PID namespace it passes on the way assigns the new process a PID value. A process in a separate PID namespace therefore has several PID values, one per namespace.
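
The walk up the namespace tree can be imitated in user space. The following is a simplified sketch, not kernel code: the nsim_ names and NSIM_MAX_LEVEL are invented for illustration, each namespace uses a plain counter where the kernel uses a bitmap, and nothing is ever freed. It only shows that a single allocation yields one number per namespace level.

```c
#include <assert.h>
#include <stddef.h>

#define NSIM_MAX_LEVEL 8  /* hypothetical nesting limit for this sketch */

struct nsim_ns {
    struct nsim_ns *parent;  /* NULL for the root namespace */
    unsigned int level;      /* 0 for the root namespace */
    int last_pid;            /* plain counter standing in for the bitmap */
};

struct nsim_upid {
    int nr;
    struct nsim_ns *ns;
};

struct nsim_pid {
    unsigned int level;      /* level of the namespace the process lives in */
    struct nsim_upid numbers[NSIM_MAX_LEVEL + 1];
};

/* Initialise a namespace as a child of `parent` (NULL makes it the root). */
void nsim_ns_init(struct nsim_ns *ns, struct nsim_ns *parent)
{
    ns->parent = parent;
    ns->level = parent ? parent->level + 1 : 0;
    ns->last_pid = 0;
    assert(ns->level <= NSIM_MAX_LEVEL);
}

/* Allocate one number in `ns` and in every ancestor up to the root. */
void nsim_alloc_pid(struct nsim_pid *pid, struct nsim_ns *ns)
{
    struct nsim_ns *tmp = ns;

    pid->level = ns->level;
    for (int i = (int)ns->level; i >= 0; i--) {
        pid->numbers[i].nr = ++tmp->last_pid;
        pid->numbers[i].ns = tmp;
        tmp = tmp->parent;
    }
}

/* PID of the process as seen from `ns`, or 0 if it is not visible there. */
int nsim_pid_nr_ns(const struct nsim_pid *pid, const struct nsim_ns *ns)
{
    if (ns->level <= pid->level && pid->numbers[ns->level].ns == ns)
        return pid->numbers[ns->level].nr;
    return 0;
}
```

With a root namespace and one child namespace, a process created in the root after initialisation gets PID 1 there and is invisible from the child. The first process created in the child is PID 1 inside the child and PID 2 as seen from the root, matching the rule that the parent sees the child's processes but not the other way round.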
An experiment

Finally, it's time to try it out. First, compile and run the following program (creating a PID namespace requires root):

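    #define _GNU_SOURCE            /* needed for clone() and CLONE_* */
    #include <alloca.h>
    #include <sched.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <signal.h>
    #include <errno.h>
    #include <sys/wait.h>

    /* Runs as process 1 of the new PID namespace */
    int new_ns(void *nul)
    {
        execl("/bin/bash", "/bin/bash", NULL);
        return 1;                  /* only reached if execl fails */
    }

    int main(int argc, char **argv)
    {
        int res;
        pid_t newid;
        long ssize = 16 * sysconf(_SC_PAGESIZE);
        /* the stack grows downward, so pass the top of the block */
        char *stack = (char *)alloca(ssize) + ssize;

        /* SIGCHLD lets the plain waitpid below see the child */
        newid = clone(new_ns, stack, CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
        if (newid < 0)
            return 1;
        /* wait for the shell in the new namespace to exit */
        waitpid(newid, &res, 0);
        return 0;
    }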

The code is very simple. Does running it put you in a new PID namespace? Try it. After running it, ps -e still shows init as process 1. Why? Did something go wrong? The cause turns out to be procfs. The ps command gets its data by parsing the contents of procfs, and the PID directories under the procfs root reflect the PID namespace that was current when procfs was mounted, which is set up in the get_sb callback of procfs. So all you need to do is mount proc again:

    mount -t proc proc /proc

Alternatively, you can add the following call to the new_ns function:

 mount("proc", "/proc", "proc", 0, "");
Correct code

How should the buggy code from the beginning be changed? Linux has a command called unshare, but it did not seem able to unshare the PID namespace (newer util-linux versions do offer unshare --pid --fork --mount-proc), so you have to write your own launcher. That is not much work: take the program above and, in new_ns, make the exec target a parameter instead of hard-coding bash, then pass the original script as an argument when you run it. There is one more problem, however. Once the script runs in the new PID namespace, the following line is wrong:

echo $$ >$pidfile

It is wrong because, inside the new namespace, the script's PID is obviously 1, which is not its PID in the caller's PID namespace. The point of writing the pidfile is to let other processes find the script and send it a signal, but the value written is the PID in the new namespace, so echo $$ >$pidfile is useless to those other processes. Once inside the new namespace, the script has no way of finding out its own PID in the parent namespace. Other processes therefore have to look it up with something like ps -ef, which works because the script, although it lives in the new namespace, still has a PID in the parent namespace.
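
One possible way around this, offered here as an assumption rather than as the author's final code, is to let the launcher write the pidfile, because clone() returns the child's PID as seen from the caller's namespace. The sketch below adapts the experiment program so that it runs an arbitrary command. The names run_in_new_pidns, child_main and CHILD_STACK_SIZE are hypothetical, the call needs root (CAP_SYS_ADMIN), and error handling is kept to a minimum.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)  /* hypothetical, generous stack size */

/* Runs as process 1 of the new PID namespace. */
static int child_main(void *arg)
{
    char *const *argv = arg;

    /* Keep our mounts from propagating back to the parent, then remount
     * /proc so that ps reflects the new namespace. Failures are ignored
     * here; a real launcher should report them. */
    mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL);
    mount("proc", "/proc", "proc", 0, NULL);

    execvp(argv[0], argv);
    _exit(127);  /* only reached if exec failed */
}

/* Run argv in a new PID and mount namespace. If pidfile is not NULL, the
 * child's PID in the caller's namespace is written there. Returns the
 * command's exit status, or -1 with errno set. */
int run_in_new_pidns(char *const argv[], const char *pidfile)
{
    assert(argv != NULL && argv[0] != NULL);

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward, so clone() gets the top of the block.
     * SIGCHLD makes the child waitable with a plain waitpid(). */
    pid_t pid = clone(child_main, stack + CHILD_STACK_SIZE,
                      CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, (void *)argv);
    if (pid < 0) {
        int saved = errno;
        free(stack);
        errno = saved;
        return -1;
    }

    /* pid is the child's PID in OUR namespace; inside its own it is 1. */
    if (pidfile != NULL) {
        FILE *f = fopen(pidfile, "w");
        if (f != NULL) {
            fprintf(f, "%d\n", (int)pid);
            fclose(f);
        }
    }

    int status = 0;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w < 0 && errno == EINTR);

    int saved = errno;
    free(stack);
    errno = saved;
    if (w < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
}
```

Calling run_in_new_pidns with an argv that names the script and with /var/run/vpnrulematch.pid as the pidfile would start the script as process 1 of a fresh namespace while recording a PID that outside processes can signal. The script's own echo $$ >$pidfile line should then be dropped.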
I don't know why Linux provides no system call along the lines of get_all_pid. Is it because such a call would be insecure and would defeat the purpose of isolating namespaces?
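
A partial answer exists on newer kernels: since Linux 4.1, the NSpid line of /proc/<pid>/status lists a process's PID in every namespace from the reader's own down to the innermost one, outermost first. Read from the parent namespace, it shows both values. The helper names parse_nspid_line and read_nspid below are hypothetical, and the code is only a sketch of how such a lookup might be done.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Parse one line of the form "NSpid:\t<pid>\t<pid>...". Stores up to `max`
 * PIDs in `out`, outermost namespace first, and returns how many were
 * stored. Returns 0 if the line is not an NSpid line. */
size_t parse_nspid_line(const char *line, pid_t *out, size_t max)
{
    static const char key[] = "NSpid:";
    size_t n = 0;

    assert(line != NULL && out != NULL);
    if (strncmp(line, key, sizeof key - 1) != 0)
        return 0;

    const char *p = line + sizeof key - 1;
    while (n < max) {
        char *end;
        long v = strtol(p, &end, 10);  /* skips leading tabs and spaces */
        if (end == p)
            break;                     /* no more numbers on the line */
        out[n++] = (pid_t)v;
        p = end;
    }
    return n;
}

/* Look up the NSpid list of `pid` through /proc. Returns 0 if the file
 * cannot be read or the kernel does not provide the field. */
size_t read_nspid(pid_t pid, pid_t *out, size_t max)
{
    char path[64];
    char line[256];
    size_t n = 0;

    snprintf(path, sizeof path, "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    while (n == 0 && fgets(line, sizeof line, f) != NULL)
        n = parse_nspid_line(line, out, max);
    fclose(f);
    return n;
}
```

For a script running as process 1 of a child namespace, a process in the parent namespace that calls read_nspid on the script's outer PID would get two entries: the outer PID first, then 1. The script itself, reading a /proc mounted inside its own namespace, sees only the inner value, so this does not let it discover its outer PID.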
