Module Introduction
When new hires start working on production systems, they generally run into the supervise tool. The mentor usually explains that it monitors a process and, when the process dies, supervise starts it again. That way, when the process has a problem, for example it crashes and dumps core, supervise immediately brings it back up so that service is restored quickly.
New hires generally remember this but never really understand how supervise does it, or what information we can get from supervise.
In fact, supervise is one of the tools in the open-source toolset daemontools (http://cr.yp.to/daemontools.html). The company extracted it from daemontools on its own and modified the source code to suit its needs; this work was done by our colleague spring.
How supervise works
The mechanism is actually very simple. When supervise starts, it forks a child process, and the child calls execvp to replace itself with the module's executable, so the child becomes the module. supervise keeps running and uses the waitpid or wait3 system call in non-blocking mode to monitor the child. It also reads commands from the svcontrol pipe file and performs different actions depending on the command.
If the child exits for any reason, supervise learns of it through waitpid or wait3 and starts the module again. This means that if the module cannot start at all, supervise falls into an endless loop, restarting the module over and over.
That, in short, is the principle mentors explain to new hires.
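The loop described above can be sketched in Python, whose os module exposes the same fork, execvp and waitpid calls. This is only an illustration under assumptions, not the company's supervise source: the function name supervise, the max_starts, restart_delay and poll_interval parameters, and the one-second default delay are all hypothetical, and svcontrol handling is left as a comment.

```python
import os
import sys
import time


def supervise(argv, max_starts=None, restart_delay=1.0, poll_interval=0.1):
    """Keep argv running: fork, execvp in the child, poll with WNOHANG, restart.

    max_starts is a hypothetical knob so the sketch can stop; the tool
    described in the text loops forever. Returns the raw wait statuses
    of the children it reaped.
    """
    if not argv:
        raise ValueError("argv must name a program to run")
    statuses = []
    starts = 0
    while max_starts is None or starts < max_starts:
        pid = os.fork()
        if pid == 0:
            # Child: replace this process image with the module's executable.
            try:
                os.execvp(argv[0], argv)
            except OSError:
                pass
            os._exit(127)  # reached only if execvp failed
        starts += 1
        # Parent: check on the child without blocking.
        while True:
            done, status = os.waitpid(pid, os.WNOHANG)
            if done == pid:
                statuses.append(status)
                break
            # A real supervisor would read commands from svcontrol here.
            time.sleep(poll_interval)
        # The child is gone: pause before restarting so a module that can
        # never start does not spin the CPU in the restart loop.
        if max_starts is None or starts < max_starts:
            time.sleep(restart_delay)
    return statuses


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python supervise_sketch.py ./bin/module [args...]
    supervise(sys.argv[1:])
```

Polling with WNOHANG rather than blocking in waitpid is what leaves the loop free to service control commands between checks; the sleep before each restart is why a module stuck in a crash loop costs almost nothing.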
Files used by supervise, and what we can learn from them
supervise requires a status directory under the current directory. Inside status there is one directory per module started through supervise, named after the module, and that directory holds three files: lock, status and svcontrol. Their roles, and the information we can get from them, are as follows:
- lock
  - Role: supervise's file lock. The lock controls concurrency, preventing several supervise processes from being started on the same status directory and causing confusion.
  - Available information: the /sbin/fuser command shows which processes are using the lock file. If fuser returns 0, supervise is running, and fuser prints the PID of the supervise process.
- status
  - Role: supervise uses this file to record some state. It can be thought of as char status[20]: status[16] holds the low byte of the PID of the child process started by supervise, and status[17]-status[19] hold that PID shifted right by 8, 16 and 24 bits respectively. I am not sure why supervise does this; in effect the PID is stored as a 4-byte integer, least significant byte first, written byte by byte. status[0]-status[11] are unused, status[12]-status[14] are flag bits that are generally of no use, and status[15] is always 0.
  - Available information: the file can be read directly with od. The most useful command is od -An -j16 -N2 -tu2 status, which prints the PID of the child process supervise is responsible for (it skips 16 bytes and reads the next 2 as an unsigned 16-bit integer, so it assumes a little-endian machine and a PID below 65536).
- svcontrol
  - Role: can be thought of as a control interface. supervise reads commands from this pipe and controls the child process accordingly. Controlling the child really means sending it a signal, so functionally this is no different in essence from signaling supervise's child with the kill command, but going through the control interface is far more precise.
  - Available information: write a command directly into the file, for example echo 'd' > svcontrol, to have supervise act on the child process. The most commonly used commands are:
    - d: stop the child process and do not restart it
    - u: start the child process (the opposite of d)
    - k: send a KILL signal; since supervise restarts the child immediately after it is killed, this effectively acts as a restart
    - x: tell supervise to exit
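A small Python sketch of decoding the child PID from the status file, assuming exactly the layout described above (a 20-byte record with the PID stored least significant byte first at offset 16). The function names and the example path status/transmit/status are hypothetical. Unlike od -N2 -tu2, which reads only two bytes, it reads all four PID bytes and does not depend on the host's byte order.

```python
import struct

PID_OFFSET = 16   # status[16]..status[19], per the layout described in the text
RECORD_SIZE = 20  # char status[20]


def encode_pid(pid):
    """Write a PID the way the text describes: low byte first, then >>8, >>16, >>24."""
    return bytes([pid & 0xFF, (pid >> 8) & 0xFF, (pid >> 16) & 0xFF, (pid >> 24) & 0xFF])


def pid_from_status(record):
    """Decode the child PID from a raw status record (bytes 16..19, little-endian)."""
    if len(record) < RECORD_SIZE:
        raise ValueError("status record is shorter than 20 bytes")
    (pid,) = struct.unpack_from("<I", record, PID_OFFSET)
    return pid


def read_child_pid(status_path):
    """Read a status file, e.g. status/transmit/status (hypothetical), and return the child PID."""
    with open(status_path, "rb") as f:
        return pid_from_status(f.read(RECORD_SIZE))
```

A control script could call read_child_pid on the module's status file instead of parsing ps output.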
As for what the information in these three files is good for, I think the main use is in improving modules' control scripts.
Existing start/stop scripts usually find the process PID with grep and then send a signal to that PID. This gathers information from outside the module, so its accuracy is only approximate and it can kill the wrong process. If instead the module's information is read from the three files in its status directory, and the module is controlled through the svcontrol interface, the information comes from inside, from the module's actual running state, which guarantees accuracy.
Some of Space's modules have already had their start/stop scripts changed this way, but the change has not been rolled out across the board; demand is weak, so no rollout is scheduled.
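As a sketch of what a status-directory based stop script could do instead of grep plus kill, the Python helper below writes one command letter to svcontrol. The function name and the example path status/<module>/svcontrol are hypothetical; the accepted letters are the d, u, k and x commands listed earlier. Opening the FIFO with O_NONBLOCK is a design choice: when no supervise process has the pipe open for reading, the open fails with ENXIO instead of blocking forever, which is what a plain echo 'd' > svcontrol would do.

```python
import errno
import os

VALID_COMMANDS = {"d", "u", "k", "x"}


def send_command(svcontrol_path, cmd):
    """Write a one-letter command to supervise's svcontrol FIFO.

    Returns False if no supervise process is reading the FIFO,
    instead of blocking until one appears.
    """
    if cmd not in VALID_COMMANDS:
        raise ValueError(f"unknown command {cmd!r}")
    try:
        fd = os.open(svcontrol_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as e:
        if e.errno == errno.ENXIO:  # FIFO exists but nobody has it open for reading
            return False
        raise
    try:
        os.write(fd, cmd.encode("ascii"))
    finally:
        os.close(fd)
    return True
```

A stop script would then call send_command("status/<module>/svcontrol", "d") and treat False as "supervise is not running".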
Problems running transmit and memcached under supervise
OP (operations) colleagues found that when transmit is started with supervise and the processes are then inspected with pstree space (or pstree with another work account), the process tree looks irregular, as follows:
supervise.trans---supervise.trans
transmit---2*[transmit]
The reason is as follows. supervise watches its child through waitpid or wait3, but transmit's internal logic turns itself into a daemon: it detaches from its parent supervise and becomes a child of init. When waitpid or wait3 then returns something supervise does not expect, supervise concludes that the child exited abnormally and enters the start stage, launching another transmit. The newly started transmit cannot come up because its port is already occupied, so it exits; supervise again sees an abnormal child exit, and so falls into an endless loop of restarting transmit. That is why pstree shows the format above: at this point transmit and supervise.transmit are no longer parent and child.
It should be added that when supervise restarts the child in a loop, it sleeps for a while before each restart, so the loop consumes almost no machine resources.
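The failure mode can be reproduced without transmit. The Python sketch below stands in for a self-daemonizing module (the function names and the two-second lifetime are hypothetical): the child started by the supervisor forks a grandchild, the grandchild calls setsid and keeps running, and the child exits right away. The supervisor's waitpid therefore reports a normal exit even though the service is still alive and no longer its child, which is the state that sends supervise into its restart loop.

```python
import os
import time


def start_self_daemonizing_child(service_seconds=2.0):
    """Mimic a module that daemonizes itself (like transmit, or memcached -d).

    Returns (child_pid, daemon_pid). The daemon PID is passed back over a
    pipe only for the demonstration; a supervisor would never learn it.
    """
    r, w = os.pipe()
    child = os.fork()
    if child == 0:
        os.close(r)
        grandchild = os.fork()
        if grandchild == 0:
            os.setsid()                   # detach into its own session, as a daemon does
            os.close(w)
            time.sleep(service_seconds)   # stand-in for "serving requests"
            os._exit(0)
        os.write(w, str(grandchild).encode())
        os.close(w)
        os._exit(0)                       # the process the supervisor started exits at once
    os.close(w)
    daemon_pid = int(os.read(r, 32).decode())
    os.close(r)
    return child, daemon_pid


def is_alive(pid):
    """True if a process with this PID exists (signal 0 only checks, sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


if __name__ == "__main__":
    child, daemon_pid = start_self_daemonizing_child()
    _, status = os.waitpid(child, 0)      # returns almost immediately
    print("child exit status:", os.WEXITSTATUS(status))
    print("daemon", daemon_pid, "still alive:", is_alive(daemon_pid))
```

Run under a supervise-style loop, the next restart would start a second copy of the service next to the detached one; for transmit that second copy fails on the already-bound port.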
memcached shows the same problem if you use the -d option, which makes memcached turn itself into a daemon, so Space now generally runs memcached without -d. In theory, any process that daemonizes itself will have this problem when started by supervise.
This depends largely on RD (the developers). My personal advice is that RD should not make their programs daemonize themselves; there is no point in doing so.
[Reprint] In-depth supervise