localtime deadlock when a multi-threaded process forks a child

We recently tested our modified Redis and found that the child process forked for an RDB save hangs. The stack obtained by attaching gdb is as follows:

(gdb) bt
#0  0x0000003f6d4f805e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x0000003f6d49dcad in _L_lock_2164 () from /lib64/libc.so.6
#2  0x0000003f6d49da67 in __tz_convert () from /lib64/libc.so.6
#3  0x0000000000421004 in redisLogRaw (level=2, msg=0x7fff9f412b50 "[INFQ_INFO]: [infq.c:1483] infq persistent dump, suffix: 405665, start_index: 35637626, ele_count: 4") at redis.c:332
#4  0x0000000000421256 in redisLog (level=2, fmt=0x4eedcf "[INFQ_INFO]: %s") at redis.c:363
#5  0x000000000043926b in infq_info_log (msg=0x7fff9f413090 "[infq.c:1483] infq persistent dump, suffix: 405665, start_index: 35637626, ele_count: 4") at object.c:816
#6  0x00000000004b5677 in infq_log (level=1, file=0x501465 "infq.c", lineno=1483, fmt=0x5024e0 "infq persistent dump, suffix: %d, start_index: %lld, ele_count: %d") at logging.c:81
#7  0x00000000004b37f8 in dump_push_queue (infq=0x17d07f0) at infq.c:1480
#8  0x00000000004b1566 in infq_dump (infq=0x17d07f0, buf=0x7fff9f413650 "", buf_size=1024, data_size=0x7fff9f413a5c) at infq.c:720
#9  0x00000000004440e6 in rdbSaveObject (rdb=0x7fff9f413c50, o=0x7f8c0b4d0470) at rdb.c:600
#10 0x000000000044429c in rdbSaveKeyValuePair (rdb=0x7fff9f413c50, key=0x7fff9f413b90, val=0x7f8c0b4d0470, expiretime=-1, now=1434687031023) at rdb.c:642
#11 0x0000000000444471 in rdbSaveRio (rdb=0x7fff9f413c50, error=0x7fff9f413c4c) at rdb.c:686
#12 0x0000000000444704 in rdbSave (filename=0x7f8c0b410040 "dump.rdb") at rdb.c:750
#13 0x00000000004449cd in rdbSaveBackground (filename=0x7f8c0b410040 "dump.rdb") at rdb.c:831
#14 0x0000000000422b0e in serverCron (eventLoop=0x7f8c0b45a150, id=0, clientData=0x0) at redis.c:1240
#15 0x000000000041d47e in processTimeEvents (eventLoop=0x7f8c0b45a150) at ae.c:311
#16 0x000000000041d7c0 in aeProcessEvents (eventLoop=0x7f8c0b45a150, flags=3) at ae.c:423
#17 0x000000000041d8de in aeMain (eventLoop=0x7f8c0b45a150) at ae.c:455
#18 0x0000000000429ae3 in main (argc=2, argv=0x7fff9f414168) at redis.c:3843

The child is blocked inside redisLog while printing a log line: redisLogRaw calls localtime to format the timestamp. Looking at the glibc source, glibc-2.9/time/localtime.c:

/* Return the `struct tm' representation of *T in local time,
   using *TP to store the result.  */
struct tm *
__localtime_r (t, tp)
     const time_t *t;
     struct tm *tp;
{
  return __tz_convert (t, 1, tp);
}
weak_alias (__localtime_r, localtime_r)

/* Return the `struct tm' representation of *T in local time.  */
struct tm *
localtime (t)
     const time_t *t;
{
  return __tz_convert (t, 1, &_tmbuf);
}
libc_hidden_def (localtime)

Both localtime and localtime_r call __tz_convert to do the actual work. That function lives in glibc-2.9/time/tzset.c:

/* This locks all the state variables in tzfile.c and this file.  */
__libc_lock_define_initialized (static, tzset_lock)

/* Return the `struct tm' representation of *TIMER in the local timezone.
   Use local time if USE_LOCALTIME is nonzero, UTC otherwise.  */
struct tm *
__tz_convert (const time_t *timer, int use_localtime, struct tm *tp)
{
  long int leap_correction;
  int leap_extra_secs;

  if (timer == NULL)
    {
      __set_errno (EINVAL);
      return NULL;
    }

  /* take the lock */
  __libc_lock_lock (tzset_lock);

  /* ... conversion logic elided ... */

  /* release the lock */
  __libc_lock_unlock (tzset_lock);

  return tp;
}

This function is protected by tzset_lock, a static global lock. Because of that lock, localtime_r is thread-safe; localtime additionally writes into a global buffer, so it is not thread-safe. Neither function is async-signal-safe, however, so calling them from a signal handler risks deadlock: if the program is inside localtime_r (the lock already taken) when a signal arrives, and the handler also calls localtime_r, the handler blocks forever waiting for a lock its own thread holds.
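
As an illustration (not from the original article), here is a minimal sketch, assuming a Linux/glibc system and a hypothetical SIGALRM handler, of how a signal handler that calls localtime_r can self-deadlock on tzset_lock:

/* Hypothetical repro sketch: localtime_r is not async-signal-safe.
 * If SIGALRM fires while the main loop is inside localtime_r (holding
 * glibc's internal tzset_lock), the handler's own localtime_r call
 * blocks on the same lock and the process hangs. */
#include <signal.h>
#include <time.h>
#include <unistd.h>

static void on_alarm(int sig)
{
    (void)sig;
    time_t now = time(NULL);
    struct tm tm;
    localtime_r(&now, &tm);     /* may block forever if the lock is already held */
}

int main(void)
{
    signal(SIGALRM, on_alarm);
    alarm(1);
    for (;;) {
        time_t now = time(NULL);
        struct tm tm;
        localtime_r(&now, &tm); /* the signal may interrupt us right here, inside the lock */
    }
}

Whether the hang actually occurs depends on timing: the signal has to land while the lock is held.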

Why does this localtime deadlock not occur in native Redis?

Because native Redis does not call localtime from multiple threads. The thread that forks the child is the same one that calls localtime, so at the moment of fork any localtime call has already completed and the lock has been released.

Our modified Redis, however, is multi-threaded, and every thread may call redisLog to print logs, so at the moment of fork some thread may be inside a localtime call (lock taken, not yet released). The child process gets the parent's memory via copy-on-write, so it inherits the lock in its locked state, but the thread that would release it does not exist in the child; the child therefore blocks forever.
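
The following is a minimal, hypothetical repro sketch of this fork-time deadlock (not the article's code), assuming Linux/glibc; compile with -pthread. With enough iterations, fork() eventually lands while the logging thread holds tzset_lock, and the child's own localtime() call never returns:

/* Hypothetical repro sketch: one thread hammers localtime() (repeatedly
 * taking glibc's tzset_lock) while the main thread forks.  If fork()
 * happens while the lock is held, the child inherits a locked tzset_lock
 * with no thread left to release it, so its localtime() call hangs. */
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void *log_thread(void *arg)
{
    (void)arg;
    for (;;) {
        time_t now = time(NULL);
        localtime(&now);          /* takes and releases tzset_lock over and over */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, log_thread, NULL);

    for (int i = 0; i < 1000; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: with bad timing the lock was copied in the locked
             * state, and this call never returns. */
            time_t now = time(NULL);
            localtime(&now);
            _exit(0);
        }
        waitpid(pid, NULL, 0);    /* hangs once a child deadlocks */
        fprintf(stderr, "fork %d ok\n", i);
    }
    return 0;
}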

What is the solution?

If we controlled the lock ourselves, we could use the library function pthread_atfork to bring the lock into a consistent state around fork, so the child is created with it released.

     #include <pthread.h>

     int pthread_atfork(void (*prepare)(void), void (*parent)(void),
                        void (*child)(void));
The prepare handler runs before fork; parent and child run after fork returns, in the parent and child process respectively. A typical pattern is to acquire all relevant locks in prepare and release them again in both parent and child, so each process resumes with the locks in a known state, as in the sketch below.
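
A minimal sketch of the pthread_atfork pattern for a lock we do control (the mutex name here is hypothetical, not from the original code):

/* Hypothetical sketch: take our own lock in prepare() so fork() happens at
 * a consistent point, then release it in both the parent and the child. */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

static void prepare(void) { pthread_mutex_lock(&log_lock); }   /* runs before fork */
static void parent(void)  { pthread_mutex_unlock(&log_lock); } /* runs after fork, in the parent */
static void child(void)   { pthread_mutex_unlock(&log_lock); } /* runs after fork, in the child */

int main(void)
{
    pthread_atfork(prepare, parent, child);
    /* ... spawn threads that lock/unlock log_lock, then fork() safely ... */
    return 0;
}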

The above method does not work here, because there is no way to operate on the lock localtime uses inside glibc. We therefore adopted a compromise: the serverCron timer in Redis calls localtime and caches the result in a global variable, and the multi-threaded component that prints logs only reads the cached value, so no worker thread ever calls localtime itself. Since serverCron runs at most every 10 ms in our configuration, the cached time is never far off, which is entirely acceptable for log timestamps.
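
A simplified sketch of this cached-time approach (the names update_cached_time and log_line are hypothetical, not the actual Redis/infq code):

/* Hypothetical sketch of the workaround: a main-thread timer (serverCron in
 * Redis) refreshes a cached timestamp string, and worker threads that print
 * logs read the cache instead of calling localtime() themselves, so no
 * logging thread ever touches tzset_lock. */
#include <stdio.h>
#include <time.h>

static char cached_time[64];      /* written only by the main thread */

/* Called periodically from the main-thread timer (e.g. serverCron). */
static void update_cached_time(void)
{
    time_t now = time(NULL);
    struct tm tm;
    localtime_r(&now, &tm);       /* only the main thread takes the lock */
    strftime(cached_time, sizeof(cached_time), "%d %b %H:%M:%S", &tm);
}

/* Called from any thread; never calls localtime(). */
static void log_line(const char *msg)
{
    fprintf(stderr, "[%s] %s\n", cached_time, msg);
}

int main(void)
{
    update_cached_time();
    log_line("infq persistent dump finished");
    return 0;
}

In a real server the read of the cached string would also need protection against torn reads (for example double buffering or an atomically swapped pointer); the sketch omits that for brevity.
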
In conclusion, functions that take internal global locks, such as localtime, malloc, and free, are not async-signal-safe. Likewise, when such functions are called from multiple threads, forking a child process can deadlock the child. To avoid this, ensure no such lock can be held at the moment of fork, either by avoiding multi-threaded calls to these functions or by guarding the critical sections with locks you control.

