1. Description of the problem
Server SA is multithreaded. Its main thread calls fork and then exec to start the worker process SB.
In practice, the main thread of SA forked a child process, but the child never went on to exec, as the ps output shows:
# ps ajxf | grep r2server
14022 28342 28341 14022 pts/2 28341 S+ 0 0:00 | \_ grep r2server
1 28046 28037 3823 ? -1 Sl 0 31:25 ./r2server ./conf/r2server.conf
28046 28075 28037 3823 ? -1 S 0 0:00 \_ ./r2server ./conf/r2server.conf
2. Locating the problem
2.1 Inspect the current stacks of the two processes with pstack.
# pstack 28075
#0 0x00007f40f24bf264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f40f24ba508 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f40f24ba3d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000043b407 in r2::log::logfactory::log_printf (char const*, char const*, int, char const*, int, char const*, ...) ()
The child process is blocked waiting for a lock.
Googling "pthread_mutex_lock owner" turned up https://en.wikibooks.org/wiki/Linux_Applications_Debugging_Techniques/Deadlocks.
Following the method described there, find the owner of the lock:
(gdb) f 3
(gdb) info reg
rax 0xfffffffffffffe00 -512
rbx 0x7517a0 7673760
rcx 0xffffffffffffffff -1
rdx 0x7f40d9ff86bf 139916512036543
rsi 0x80 128
rdi 0x753fb0 7684016
rbp 0x753fb0 0x753fb0
rsp 0x7f40d9ff85c0 0x7f40d9ff85c0
r8 0x753fb0 7684016
(gdb) p *(pthread_mutex_t*) 0x753fb0
$ = {__data = {__lock = 2, __count = 0, __owner = 28049,
This shows that the current thread is waiting for a lock whose recorded owner is thread 28049.
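The __owner value that gdb prints is the kernel thread ID of the thread that locked the mutex. A minimal sketch of that relationship follows, assuming Linux with glibc and a default (non-robust, non-elided) mutex. `__data.__owner` is a glibc-internal field rather than a public API, so it is useful for debugging only, and `owner_field_tracks_holder` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the caller: the same number that pstack and gdb
 * show for a thread. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Locks a default mutex and checks whether glibc recorded the calling
 * thread as its owner. Returns 1 if __owner equals our TID while the
 * mutex is held, 0 otherwise. Assumption: glibc on Linux; the field is
 * an implementation detail and may differ on other C libraries. */
int owner_field_tracks_holder(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int owner_while_locked;

    pthread_mutex_lock(&m);
    owner_while_locked = m.__data.__owner;
    pthread_mutex_unlock(&m);

    return owner_while_locked == current_tid();
}
```

This is why comparing __owner against the thread IDs of the parent process identifies the thread that held the lock.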
# pstack 28049
Thread 1 (Process 28049):
#0 0x00007f40f1a65ef3 in epoll_wait () from /lib64/libc.so.6
Thread 28049 is idle in epoll_wait, which shows that in the parent process it has already released the lock.
So the cause is a multithreading + fork bug. At the moment the main thread of process 28046 called fork, thread 28049 was holding lock A (it was writing a log entry), and the fork created child process 28075 with lock A still marked as held.
The child process runs the code that precedes exec, reaches a log_printf call, and tries to acquire lock A. In the child's copy of memory, lock A is locked and the thread that owns it does not exist, so the child deadlocks and never reaches exec.
In the parent, thread 28049 finishes logging, releases lock A, and carries on normally. Processes 28046 and 28075 have separate address spaces with separate page tables, so the release triggers copy-on-write and changes only the parent's copy of lock A; the child's copy stays locked.
Root cause: with multiple threads, another thread may hold a lock at the moment of fork. fork copies the whole address space, including that lock in its locked state, but only the calling thread. If the child requests the lock before exec, it deadlocks, because the thread that would release it does not exist in the child.
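The sequence described above can be reproduced in a few lines of C. This is a minimal sketch, not the r2server code: `lock_a` stands in for the logger's lock, `holder_thread` plays the role of thread 28049, and `child_finds_lock_held` is a hypothetical helper name. The child probes with pthread_mutex_trylock instead of pthread_mutex_lock so that the sketch terminates instead of hanging, and it probes only after the parent has released its own copy of the lock.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the logger's internal lock ("lock A" in the text). */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static sem_t holder_ready;        /* posted once the holder owns lock_a */
static sem_t holder_may_release;  /* posted when the holder may unlock */

/* Plays the role of thread 28049: holds lock_a while the main thread forks. */
static void *holder_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a);
    sem_post(&holder_ready);
    sem_wait(&holder_may_release);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

/* Returns 1 if the child still sees lock_a as held after the parent has
 * released it, 0 if the child could take it, and -1 on setup errors. */
int child_finds_lock_held(void)
{
    pthread_t holder;
    int go_pipe[2];
    int status = 0;
    int wrote;
    char byte = 'g';
    pid_t pid;

    if (pipe(go_pipe) != 0)
        return -1;
    sem_init(&holder_ready, 0, 0);
    sem_init(&holder_may_release, 0, 0);
    if (pthread_create(&holder, NULL, holder_thread, NULL) != 0) {
        close(go_pipe[0]);
        close(go_pipe[1]);
        return -1;
    }
    sem_wait(&holder_ready);      /* lock_a is now held by the holder */

    pid = fork();                 /* the child gets a copy of the locked mutex */
    if (pid == 0) {
        close(go_pipe[1]);
        /* Wait until the parent reports that its copy has been unlocked. */
        if (read(go_pipe[0], &byte, 1) != 1)
            _exit(2);
        /* pthread_mutex_lock() would block forever here, so probe instead. */
        _exit(pthread_mutex_trylock(&lock_a) == EBUSY ? 1 : 0);
    }

    sem_post(&holder_may_release);
    pthread_join(holder, NULL);   /* the parent's lock_a is unlocked now */
    wrote = (pid > 0) && write(go_pipe[1], &byte, 1) == 1;
    close(go_pipe[0]);
    close(go_pipe[1]);
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    if (!wrote || !WIFEXITED(status) || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}
```

Calling child_finds_lock_held() from main is expected to return 1: the unlock in the parent touches only the parent's page, so the child's copy of the lock never becomes free.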
3. Solution
1. Use either multithreading or fork, not both.
2. If you do combine multithreading and fork, call only async-signal-safe functions between fork and exec (see man 7 signal, or man 7 signal-safety on newer systems), because the child's process state is not safe at that point.
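The second option can be sketched as follows. This is an illustration under stated assumptions, not the fix applied to r2server: `spawn_worker` is a hypothetical name, and it waits for the worker only so that the result is easy to check. Between fork and exec the child calls nothing but execv, write and _exit, all of which are on the async-signal-safe list; in particular it does not log, allocate memory, or use stdio.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks and execs `path` with `argv`, calling only async-signal-safe
 * functions in the child. Returns the worker's exit status, 127 if exec
 * failed, or -1 if fork/waitpid failed or the worker died from a signal. */
int spawn_worker(const char *path, char *const argv[])
{
    static const char msg[] = "spawn_worker: exec failed\n";
    int status = 0;
    pid_t pid;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: no log_printf, no malloc, no printf. Any lock those take
         * may have been copied in a locked state by fork. */
        execv(path, argv);

        /* Reached only if exec failed. write() and _exit() are safe here;
         * exit() is not, because it runs atexit handlers and flushes stdio. */
        ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)ignored;
        _exit(127);
    }

    /* Parent: normal multithreaded code again, so logging is fine here. */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Any diagnostic logging about the spawn belongs in the parent, before fork or after it returns, never in the child before exec.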
References:
http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them
https://en.wikibooks.org/wiki/Linux_Applications_Debugging_Techniques/Deadlocks