Inter-process shared memory: deadlock caused by a process exiting abnormally

Source: Internet
Author: User
Tags: fpm, mutex, php, sleep, usleep

Resolving data sharing between the internal processes of Nginx, php-fpm, and similar multi-process servers

Concept Description:

1. MINIT: PHP extension initialization hook, called once when the whole module starts.

2. RINIT: PHP extension initialization hook, called once per request (a minimal skeleton follows this list).

3. Clustermap (cm): provides service location and cluster mapping. It collects node state via heartbeats and active probing, manages heterogeneous clusters uniformly, and replaces hardware load-balancing equipment.

4. cmsubproxy: the Clustermap subscriber client agent. It communicates with the server periodically, fetches the latest cluster information, and updates the machine list it maintains internally.
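
For orientation, a minimal sketch of where MINIT and RINIT sit in a PHP extension; the extension name "cmsub" and the trimmed-down module entry are illustrative, not the real extension:

/* Minimal PHP extension skeleton, reduced to the two hooks discussed here.
 * "cmsub" is a hypothetical extension name. */
#include "php.h"

PHP_MINIT_FUNCTION(cmsub)
{
    /* Called once when the module starts (in the master, before workers fork). */
    return SUCCESS;
}

PHP_RINIT_FUNCTION(cmsub)
{
    /* Called once per request, inside the worker process serving it. */
    return SUCCESS;
}

zend_module_entry cmsub_module_entry = {
    STANDARD_MODULE_HEADER,
    "cmsub",
    NULL,                 /* function table */
    PHP_MINIT(cmsub),
    NULL,                 /* MSHUTDOWN */
    PHP_RINIT(cmsub),
    NULL,                 /* RSHUTDOWN */
    NULL,                 /* MINFO */
    "0.1",
    STANDARD_MODULE_PROPERTIES
};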




Problem Description





Nginx and php-cgi both use multiple processes to serve high concurrency. If such a service wants to offer a common internal function module, someone has to write an extension or module for it. A recent example is the Clustermap subscriber client, a PHP extension: when a request arrives, the extension talks to cmserver to fetch the latest machine list. But fetching the machine list on every request is expensive.





In Apache's module mode the implementation is simple. Apache starts a parent process A, which calls MINIT, and after that call completes it forks the other httpd child processes B. A and B have a parent-child relationship, so A can update the cluster information periodically and push it to the children through a pipe; when each request comes in, a child reads the pipe message (the machine list) to perform service location. The php-fpm model is slightly different: the php-fpm process manager starts a process A that calls MINIT, then forks an fpm master process B, and B starts several php-cgi worker processes C. Once startup completes, the starting process A exits, and the workers call RINIT on each request. The parent-child pipe between A and C is therefore never established, the pipe data cannot be consumed, and writes block once the pipe fills up. This problem is actually quite common, and patching the PHP source to work around it is not necessarily a good solution.
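
A minimal sketch of the Apache-style pattern described above: a parent periodically writes updates into a pipe that its forked children read. All names and the single write/read are illustrative; a real httpd worker multiplexes this with request I/O.

// Sketch: parent pushes updates to a forked child over a pipe.
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main()
{
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid == 0) {                       // child: the "worker"
        close(fds[1]);
        char buf[128];
        ssize_t n = read(fds[0], buf, sizeof(buf) - 1);  // per-request read
        if (n > 0) { buf[n] = '\0'; printf("worker got: %s\n", buf); }
        return 0;
    }

    close(fds[0]);                        // parent: the "updater"
    const char* machines = "10.0.0.1,10.0.0.2";
    write(fds[1], machines, strlen(machines));
    close(fds[1]);
    return 0;
}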





Problem Analysis





To sum up the above: we have a number of service worker processes, and on each request a process must first do service location, i.e. fetch the latest machine list (a network round-trip), before forwarding the request to other services. Taking fpm-php as the example, let us work through the options.


Scenario one: on every request, RINIT first fetches the latest machine list from the server





1... 2... 3... 10. Ten seconds have passed.


If you still have not spotted the problem, you have no business running something as fast as nginx. Obviously, every request pays a network round-trip in RINIT to fetch the latest machine list from the server, which greatly increases the response time of the whole request.


At this point someone says the problem is easy to fix: just update less often, say once every 10 or 100 requests, or once every 1s or 10s. That barely affects performance yet still keeps the list updated; it is a trade-off between performance and update frequency. That line of thinking leads to scenario two.





Scenario two: on every request, RINIT first fetches the latest machine list from the server, along with an expiration time; subsequent requests within the expiration time skip the server round-trip
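
A minimal sketch of that TTL idea. The names here are hypothetical: fetch_from_cmserver and the cache layout are assumptions, not the extension's real API.

// Sketch: per-process TTL cache of the machine list.
#include <string>
#include <vector>
#include <ctime>

struct MachineCache {
    std::vector<std::string> machines;
    time_t expires_at = 0;              // absolute expiry, in seconds
};

static MachineCache g_cache;            // one copy per worker process

// Stand-in for the network call; pretends the server granted a 10s TTL.
std::vector<std::string> fetch_from_cmserver(int* ttl_sec)
{
    *ttl_sec = 10;
    return {"10.0.0.1:8080", "10.0.0.2:8080"};
}

const std::vector<std::string>& machine_list_on_rinit()
{
    time_t now = time(NULL);
    if (now >= g_cache.expires_at) {    // expired: pay the round-trip
        int ttl = 0;
        g_cache.machines = fetch_from_cmserver(&ttl);
        g_cache.expires_at = now + ttl;
    }
    return g_cache.machines;            // fresh enough: no network I/O
}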





Scenario two does solve the problem, but strictly speaking only part of it: it treats the symptom, not the cause.


If you want better performance, each process has to update less often and lose accuracy; if you want higher accuracy, each process has to update more often. For search and advertising, which are highly concurrent and latency-sensitive, this trade-off is too unfriendly. Worst of all, every worker process has to do its own update even though each one fetches exactly the same information. This is not to say the nginx model is bad; it has its merits. In nginx each worker is an independent process, which keeps the programming simple, avoids locking, isolates the workers from each other, and reduces risk.





Scenario three: use shared memory. Start a dedicated update process that keeps the cluster node information current and writes it into shared memory; RINIT then reads the shared memory on each request to get the latest machine list





Scenario three exploits the fact that every worker process wants exactly the same machine list: the data is shared across processes through shared memory, so the workers get the latest machine list quickly and without any network overhead.
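
A minimal sketch of what such a shared block might look like. The field names echo the test code below, but the exact layout is an assumption:

// Sketch: layout of the shared block; one updater writes, many workers read.
#include <pthread.h>
#include <stdint.h>

struct SharedUpdateData {
    pthread_rwlock_t _lock;      // must be process-shared (see below)
    int64_t          _version;   // bumped on every successful update
    int64_t          _updatetime;
    uint32_t         _machine_count;
    char             _machines[1];   // machine list region follows the header
};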





Resolution Method





At present, Clustermap adopts scenario three, solving the problem through shared memory.





cmsubproxy is a dedicated update process. Every 500ms it sends a request to cmserver for the latest machine list; after receiving the response, cmsubproxy updates the machine list it maintains internally and, once the update succeeds, writes it into shared memory.


When a request arrives, the php-fpm worker first reads the machine list from shared memory, then forwards the request to one of the available machines in the list. Several machine-selection strategies are possible (round robin, random, weighted, consistent hashing, and so on).
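
For instance, a round-robin pick over a snapshot of the list might look like this sketch (names are illustrative; the snapshot is assumed to have been copied out of shared memory already):

// Sketch: round-robin selection over a snapshot of the machine list.
#include <string>
#include <vector>
#include <atomic>
#include <cstdint>

static std::atomic<uint64_t> g_rr_counter{0};   // per-process counter

std::string pick_round_robin(const std::vector<std::string>& machines)
{
    if (machines.empty()) return "";
    uint64_t n = g_rr_counter.fetch_add(1, std::memory_order_relaxed);
    return machines[n % machines.size()];
}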





The shared memory is opened with mmap. Note that updates and reads must be protected by a read-write lock, and the lock itself must live in the shared memory. Regarding a lock shared by multiple processes:





pthread_rwlockattr_t attr;
pthread_rwlockattr_init(&attr);
pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);





Description of pthread_rwlockattr_setpshared in the manual:





int pthread_rwlockattr_setpshared(pthread_rwlockattr_t *attr, int pshared);





DESCRIPTION


The pthread_rwlockattr_setpshared() function sets the process-shared attribute of attr to the value referenced by pshared. pshared may be one of two values:





PTHREAD_PROCESS_SHARED: Any thread of any process that has access to the memory where the read/write lock resides can manipulate the lock.





The data cmsubproxy writes into shared memory is both full and incremental: the incremental data is appended after the full snapshot. The details are not covered here.
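
A plausible layout for that scheme, purely as an assumption to make the idea concrete; the post does not specify the actual format:

// Sketch: full snapshot followed by appended incremental records.
#include <stdint.h>

struct SnapshotHeader {
    int64_t  version;        // version of the full snapshot
    uint32_t full_size;      // bytes of the full machine list that follows
    uint32_t incr_count;     // number of incremental records appended
};
// Memory order: [SnapshotHeader][full list: full_size bytes]
//               [incr record 0][incr record 1]...
// A reader applies the full list first, then replays the increments.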














Resolving the deadlock in inter-process shared memory caused by a process exiting abnormally





Now that the problem has been confirmed to be caused by a process exiting abnormally after acquiring the read lock, I wrote a test program to reproduce it.





(!2293)-> cat test/read_shared.cpp





// The original #include lines were stripped by the HTML; these are the
// headers the code needs. The project header for cm_sub::cmmapfile is an
// assumed name, and identifier casing is reconstructed from garbled text.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string>
#include "cmmapfile.h"   // assumed project header for cm_sub::cmmapfile

sharedupdatedata* _sharedupdatedata = NULL;
cm_sub::cmmapfile* _mmapfile = NULL;

int32_t initsharedmemread(const std::string& mmap_file_path)
{
    _mmapfile = new (std::nothrow) cm_sub::cmmapfile();
    if (_mmapfile == NULL || !_mmapfile->open(mmap_file_path.c_str(), FILE_OPEN_WRITE))
    {
        return -1;
    }
    _sharedupdatedata = (sharedupdatedata*) _mmapfile->offset2addr(0);
    return 0;
}

int main(int argc, char** argv)
{
    if (initsharedmemread(argv[1]) != 0) return -1;

    int cnt = 100;
    while (cnt > 0)
    {
        pthread_rwlock_rdlock(&(_sharedupdatedata->_lock));
        fprintf(stdout, "version = %ld, readers = %u\n",
                _sharedupdatedata->_version, _sharedupdatedata->_lock.__data.__nr_readers);
        if (cnt == 90)   // garbled as "cnt = 190" in the original; exit while
        {                // still holding the read lock to simulate a crash
            exit(0);
        }
        sleep(1);
        pthread_rwlock_unlock(&(_sharedupdatedata->_lock));
        --cnt;
        usleep(100*1000);
    }
    delete _mmapfile;
}





(!2293)-> cat test/write_shared.cpp





// Headers reconstructed as above; the autil time utility belongs to the
// project's libraries and its exact header path is assumed.
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include "cmmapfile.h"   // assumed project header

sharedupdatedata* _sharedupdatedata = NULL;
cm_sub::cmmapfile* _mmapfile = NULL;

int32_t initsharedmemwrite(const char* mmap_file_path)
{
    _mmapfile = new (std::nothrow) cm_sub::cmmapfile();
    if (_mmapfile == NULL || !_mmapfile->open(mmap_file_path, FILE_OPEN_WRITE, 1024))
    {
        return -1;
    }
    _sharedupdatedata = (sharedupdatedata*) _mmapfile->offset2addr(0);
    madvise(_sharedupdatedata, 1024, MADV_SEQUENTIAL);

    pthread_rwlockattr_t attr;
    memset(&attr, 0x0, sizeof(pthread_rwlockattr_t));
    if (pthread_rwlockattr_init(&attr) != 0 || pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0)
    {
        return -1;
    }
    pthread_rwlock_init(&(_sharedupdatedata->_lock), &attr);
    _sharedupdatedata->_updatetime = autil::TimeUtility::currentTime();
    _sharedupdatedata->_version = 0;
    return 0;
}

int main()
{
    if (initsharedmemwrite("data.mmap") != 0) return -1;

    int cnt = 200;
    while (cnt > 0)
    {
        pthread_rwlock_wrlock(&(_sharedupdatedata->_lock));
        ++_sharedupdatedata->_version;
        fprintf(stdout, "version = %ld, readers = %u\n",
                _sharedupdatedata->_version, _sharedupdatedata->_lock.__data.__nr_readers);
        sleep(1);
        pthread_rwlock_unlock(&(_sharedupdatedata->_lock));
        --cnt;
        usleep(100*1000);
    }
    delete _mmapfile;
}





Whether it is the read process or the write process, if it dies after acquiring the lock and before releasing it, this problem occurs.




How to solve





With the problem reproduced, let us think about how to solve it properly. Searching the internet turns up no good solution for read-write locks; you can only handle it in your own logic. What I can think of is a timeout mechanism: the write process sets a timeout, and if it still cannot acquire the lock by then, it assumes a deadlock and decrements the reader count by one. This is a brute-force fix; if anyone knows a better solution, please point me to it.
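
A sketch of that brute-force timeout idea, assuming the older glibc rwlock layout used throughout this post; poking __data.__nr_readers is glibc-internal and decidedly not portable:

// Sketch: writer times out, then forcibly releases a reader slot believed
// to be held by a dead process. Brute force, glibc internals only.
#include <pthread.h>
#include <errno.h>
#include <time.h>

int wrlock_with_repair(pthread_rwlock_t* lock, int timeout_sec)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);   // timedwrlock uses CLOCK_REALTIME
    ts.tv_sec += timeout_sec;

    int ret = pthread_rwlock_timedwrlock(lock, &ts);
    if (ret == ETIMEDOUT && lock->__data.__nr_readers > 0)
    {
        --lock->__data.__nr_readers;      // assume one reader died holding it
        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_sec += timeout_sec;
        ret = pthread_rwlock_timedwrlock(lock, &ts);  // retry once
    }
    return ret;
}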





On read-write locks versus mutexes: a read-write lock suits read-mostly workloads, especially when readers hold the lock for a long time. My scenario is read-mostly with very short hold times for both readers and writers, so the performance of a mutex and a read-write lock should not differ much. In fact a read-write lock uses a mutex internally anyway, held only briefly: it locks the mutex-protected area, checks whether someone is writing, and then releases it.


Note that this read-write lock is writer-preferring: while a writer holds the lock, or is queued waiting to write, new read locks are not granted and must wait.





OK, let us see whether a mutex solves our problem. A mutex has an attribute that makes it a robust lock.





Setting the lock to a robust lock: pthread_mutexattr_setrobust_np





The robustness attribute defines the behavior when the owner of a mutex dies. The value of robustness could be either PTHREAD_MUTEX_ROBUST_NP or PTHREAD_MUTEX_STALLED_NP, which are defined by the header <pthread.h>. The default value of the robustness attribute is PTHREAD_MUTEX_STALLED_NP.





When the owner of a mutex with the PTHREAD_MUTEX_STALLED_NP robustness attribute dies, all future calls to pthread_mutex_lock(3C) for this mutex will be blocked from progress in an unspecified manner.





Repairing an inconsistent robust lock: pthread_mutex_consistent_np





A consistent mutex becomes inconsistent and is unlocked if its owner dies while holding it, or if the process containing the owner of the mutex unmaps the memory containing the mutex or performs one of the exec(2) functions. A subsequent owner of the mutex will acquire the mutex with pthread_mutex_lock(3C), which will return EOWNERDEAD to indicate that the acquired mutex is inconsistent.





The pthread_mutex_consistent_np() function should be called while holding the mutex acquired by a previous call to pthread_mutex_lock() that returned EOWNERDEAD.





Since the critical section protected by the mutex could have been left in an inconsistent state by the dead owner, the caller should make the mutex consistent only if it is able to make the critical section protected by the mutex consistent.





In simple terms: when pthread_mutex_lock returns EOWNERDEAD, the lock's previous owner died while holding it and the caller now owns the lock (the owner field is set to the caller). Calling pthread_mutex_consistent_np, which internally verifies that the mutex is a robust lock whose owner died, then marks the mutex consistent again so it can keep being used. Very simple.
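
A minimal self-contained demo of that recovery path, using an anonymous shared mapping instead of the project's mmap wrapper. This sketch uses the standardized pthread_mutexattr_setrobust/pthread_mutex_consistent names; older glibc only offers the _np variants shown above.

// Sketch: child dies while holding a robust, process-shared mutex;
// the parent gets EOWNERDEAD and repairs the lock.
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>

int main()
{
    pthread_mutex_t* m = (pthread_mutex_t*) mmap(NULL, sizeof(pthread_mutex_t),
        PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);

    if (fork() == 0) {                     // child
        pthread_mutex_lock(m);
        _exit(1);                          // die while holding the lock
    }
    wait(NULL);

    int ret = pthread_mutex_lock(m);       // parent
    if (ret == EOWNERDEAD) {
        fprintf(stdout, "owner died; repairing\n");
        pthread_mutex_consistent(m);       // mark the mutex usable again
    }
    pthread_mutex_unlock(m);
    return 0;
}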





Lock recovery is solved, but with data shared between processes you also have to worry about data correctness, i.e. integrity. The situation differs from threads sharing within one process: if one thread exits abnormally, the whole process exits with it, whereas processes are independent of each other. If the write process exits abnormally while writing the shared data, the written data is incomplete and the read process will read incomplete data. Data integrity is actually easy to solve: keep a completeness flag in shared memory. Lock the shared region, write the data, and mark it complete only after the write finishes; the read process checks the completeness flag when reading.
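
A sketch of that write protocol, under the assumed layout from earlier; the flag values and struct are illustrative:

// Sketch: completeness flag guarding partially written shared data.
#include <string.h>
#include <stdint.h>

enum { DATA_INCOMPLETE = 0, DATA_COMPLETE = 1 };

struct SharedBlock {
    uint32_t complete;      // completeness flag
    uint32_t size;
    char     payload[1024];
};

void write_update(SharedBlock* blk, const char* data, uint32_t n)
{
    // Caller holds the (robust) write lock.
    blk->complete = DATA_INCOMPLETE;    // invalidate before touching payload
    memcpy(blk->payload, data, n);
    blk->size = n;
    blk->complete = DATA_COMPLETE;      // publish only after a full write
}

bool read_update(const SharedBlock* blk, char* out, uint32_t cap)
{
    // Caller holds the read lock; skip data a dead writer left half-written.
    if (blk->complete != DATA_COMPLETE || blk->size > cap) return false;
    memcpy(out, blk->payload, blk->size);
    return true;
}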





Test code:





(!2295)-> cat test/read_shared_mutex.cpp





// Headers reconstructed as before; the project header name is assumed.
#include <pthread.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <string>
#include "cmmapfile.h"   // assumed project header

sharedupdatedata* _sharedupdatedata = NULL;
cm_sub::cmmapfile* _mmapfile = NULL;

int32_t initsharedmemread(const std::string& mmap_file_path)
{
    _mmapfile = new (std::nothrow) cm_sub::cmmapfile();
    if (_mmapfile == NULL || !_mmapfile->open(mmap_file_path.c_str(), FILE_OPEN_WRITE))
    {
        return -1;
    }
    _sharedupdatedata = (sharedupdatedata*) _mmapfile->offset2addr(0);
    return 0;
}

int main(int argc, char** argv)
{
    if (argc != 2) return -1;
    if (initsharedmemread(argv[1]) != 0) return -1;

    int cnt = 10000;
    int ret = 0;
    while (cnt > 0)
    {
        ret = pthread_mutex_lock(&(_sharedupdatedata->_lock));
        if (ret == EOWNERDEAD)   // previous owner died while holding the lock
        {
            fprintf(stdout, "%s: version = %ld, lock = %d, %u, %d\n",
                    strerror(ret),
                    _sharedupdatedata->_version,
                    _sharedupdatedata->_lock.__data.__lock,
                    _sharedupdatedata->_lock.__data.__count,
                    _sharedupdatedata->_lock.__data.__owner);
            ret = pthread_mutex_consistent_np(&(_sharedupdatedata->_lock));
            if (ret != 0)
            {
                fprintf(stderr, "%s\n", strerror(ret));
                pthread_mutex_unlock(&(_sharedupdatedata->_lock));
                continue;
            }
        }
        fprintf(stdout, "version = %ld, lock = %d, %u, %d\n",
                _sharedupdatedata->_version,
                _sharedupdatedata->_lock.__data.__lock,
                _sharedupdatedata->_lock.__data.__count,
                _sharedupdatedata->_lock.__data.__owner);
        sleep(5);
        pthread_mutex_unlock(&(_sharedupdatedata->_lock));
        usleep(500*1000);
        --cnt;
    }
    fprintf(stdout, "go on\n");
    delete _mmapfile;
}





(!2295)-> cat test/write_shared_mutex.cpp





// Headers reconstructed as before; the project header name is assumed.
#include <pthread.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include "cmmapfile.h"   // assumed project header

sharedupdatedata* _sharedupdatedata = NULL;
cm_sub::cmmapfile* _mmapfile = NULL;

int32_t initsharedmemwrite(const char* mmap_file_path)
{
    _mmapfile = new (std::nothrow) cm_sub::cmmapfile();
    if (_mmapfile == NULL || !_mmapfile->open(mmap_file_path, FILE_OPEN_WRITE, 1024))
    {
        return -1;
    }
    _sharedupdatedata = (sharedupdatedata*) _mmapfile->offset2addr(0);
    madvise(_sharedupdatedata, 1024, MADV_SEQUENTIAL);

    pthread_mutexattr_t attr;
    memset(&attr, 0x0, sizeof(pthread_mutexattr_t));
    if (pthread_mutexattr_init(&attr) != 0 || pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0)
    {
        return -1;
    }
    if (pthread_mutexattr_setrobust_np(&attr, PTHREAD_MUTEX_ROBUST_NP) != 0)
    {
        return -1;
    }
    pthread_mutex_init(&(_sharedupdatedata->_lock), &attr);
    _sharedupdatedata->_version = 0;
    return 0;
}

int main()
{
    if (initsharedmemwrite("data.mmap") != 0) return -1;

    int cnt = 200;
    int ret = 0;
    while (cnt > 0)
    {
        ret = pthread_mutex_lock(&(_sharedupdatedata->_lock));
        if (ret == EOWNERDEAD)   // previous owner died while holding the lock
        {
            fprintf(stdout, "%s: version = %ld, lock = %d, %u, %d\n",
                    strerror(ret),
                    _sharedupdatedata->_version,
                    _sharedupdatedata->_lock.__data.__lock,
                    _sharedupdatedata->_lock.__data.__count,
                    _sharedupdatedata->_lock.__data.__owner);
            ret = pthread_mutex_consistent_np(&(_sharedupdatedata->_lock));
            if (ret != 0)
            {
                fprintf(stderr, "%s\n", strerror(ret));
                pthread_mutex_unlock(&(_sharedupdatedata->_lock));
                continue;
            }
        }
        ++_sharedupdatedata->_version;
        fprintf(stdout, "version = %ld, lock = %d, %u, %d\n", _sharedupdatedata->_version,
                _sharedupdatedata->_lock.__data.__lock,
                _sharedupdatedata->_lock.__data.__count,
                _sharedupdatedata->_lock.__data.__owner);
        usleep(1000*1000);
        pthread_mutex_unlock(&(_sharedupdatedata->_lock));
        --cnt;
        usleep(500*1000);
    }

    delete _mmapfile;
}





BTW: we all know that locks have overhead, not only the waiting caused by mutual exclusion but also the system call into the kernel, which is expensive. There is a mutual-exclusion primitive called the futex (fast userspace mutex), which Linux has supported since version 2.5.7. A futex is a hybrid user-space/kernel synchronization mechanism with better performance: when there is no lock contention, the decision is made entirely in user space and no system call is needed.
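
A minimal sketch of that fast-path idea: an atomic compare-and-swap in user space, falling back to the futex syscall only under contention. This is a simplified version of the classic three-state futex mutex, shown here as an in-process lock with error handling omitted; a process-shared version would place the lock word in shared memory.

// Sketch: contended path uses the futex syscall, uncontended path does not.
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <atomic>

static std::atomic<int> g_lock{0};   // 0 = free, 1 = locked, 2 = contended

static void futex_wait(std::atomic<int>* addr, int val) {
    syscall(SYS_futex, addr, FUTEX_WAIT, val, nullptr, nullptr, 0);
}
static void futex_wake(std::atomic<int>* addr) {
    syscall(SYS_futex, addr, FUTEX_WAKE, 1, nullptr, nullptr, 0);
}

void lock() {
    int c = 0;
    if (g_lock.compare_exchange_strong(c, 1)) return;  // fast path: 0 -> 1
    if (c != 2) c = g_lock.exchange(2);                // announce contention
    while (c != 0) {                                   // 0 means we acquired it
        futex_wait(&g_lock, 2);                        // sleep while value is 2
        c = g_lock.exchange(2);
    }
}

void unlock() {
    if (g_lock.exchange(0) != 1)                       // 2: waiters may exist
        futex_wake(&g_lock);
}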





Of course, any lock has a cost that cannot be avoided entirely. Double buffering, deferred-release lists, and reference counting can, to a certain extent, replace the use of locks.
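
For example, a double-buffering sketch: the updater writes into the inactive buffer and atomically flips an index, so readers never take a lock. Shown with in-process types for brevity (a shared-memory version would need fixed-size arrays), and it ignores the hazard of a reader straddling two consecutive flips:

// Sketch: lock-free reads via double buffering and an atomic index flip.
#include <atomic>
#include <string>
#include <vector>

struct DoubleBuffer {
    std::vector<std::string> buf[2];     // two copies of the machine list
    std::atomic<int> active{0};          // index readers should use
};

void publish(DoubleBuffer& db, const std::vector<std::string>& latest)
{
    int inactive = 1 - db.active.load(std::memory_order_acquire);
    db.buf[inactive] = latest;                             // write off to the side
    db.active.store(inactive, std::memory_order_release);  // flip atomically
}

const std::vector<std::string>& snapshot(const DoubleBuffer& db)
{
    return db.buf[db.active.load(std::memory_order_acquire)];
}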
