This article introduces how to share variables between processes in Python multiprocessing programming, an important piece of advanced Python knowledge.
1. The problem:
Someone in a chat group pasted the following code and asked why the list printed at the end is empty:
from multiprocessing import Process, Manager
import os

manager = Manager()
vip_list = []
#vip_list = manager.list()

def testFunc(cc):
    vip_list.append(cc)
    print 'process id:', os.getpid()

if __name__ == '__main__':
    threads = []
    for ll in range(10):
        t = Process(target=testFunc, args=(ll,))
        t.daemon = True
        threads.append(t)

    for i in range(len(threads)):
        threads[i].start()

    for j in range(len(threads)):
        threads[j].join()

    print "------------------------"
    print 'process id:', os.getpid()
    print vip_list
If you understand Python's threading model and the GIL, and the difference between threads and processes, the question is not hard to answer: each child Process gets its own copy of vip_list when it is forked, so testFunc appends to the child's copy while the parent's list stays empty. If that is not obvious yet, just run the code and you will see the problem.
python aa.py
process id: 632
process id: 635
process id: 637
process id: 633
process id: 636
process id: 634
process id: 639
process id: 638
process id: 641
process id: 640
------------------------
process id: 619
[]
Now swap the two assignments, using the commented-out vip_list = manager.list() instead of the plain list, and you will see the following results:
process id: 32074
process id: 32073
process id: 32072
process id: 32078
process id: 32076
process id: 32071
process id: 32077
process id: 32079
process id: 32075
process id: 32080
------------------------
process id: 32066
[3, 2, 1, 7, 5, 0, 6, 8, 4, 9]
2. Python multi-process variable sharing methods:
(1) Shared memory:
Data can be stored in a shared memory map using Value or Array, as in the following code from the official documentation:
http://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print num.value
    print arr[:]
Result:
3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
(2) Server process:
A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.
A manager returned by Manager() supports the types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array.
For the code, see the example at the beginning.
http://docs.python.org/2/library/multiprocessing.html#managers
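Besides manager.list(), any of the supported types works the same way. As a minimal sketch (not from the original article, and written in Python 3 syntax unlike the listings above), here is the same pattern with a manager.dict(); the worker function and key/value scheme are illustrative:

```python
# Sketch: sharing a dict through a Manager server process.
# Each child mutates the dict through a proxy; the manager's server
# process holds the real object, so the parent sees every write.
from multiprocessing import Process, Manager

def worker(d, key, value):
    # The proxy forwards this assignment to the manager's server process.
    d[key] = value

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.dict()
        procs = [Process(target=worker, args=(shared, i, i * i))
                 for i in range(5)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # All five writes are visible in the parent.
        print(dict(shared))
```

Note that every access goes through the proxy to the server process, so this is more flexible but slower than the shared-memory Value/Array approach above.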
3. Multi-process problems: data synchronization
Look at this simple counter:
from multiprocessing import Process, Manager
import os

manager = Manager()
sum = manager.Value('tmp', 0)

def testFunc(cc):
    sum.value += cc

if __name__ == '__main__':
    threads = []
    for ll in range(100):
        t = Process(target=testFunc, args=(1,))
        t.daemon = True
        threads.append(t)

    for i in range(len(threads)):
        threads[i].start()

    for j in range(len(threads)):
        threads[j].join()

    print "------------------------"
    print 'process id:', os.getpid()
    print sum.value
Result:
------------------------
process id: 17378
97
Maybe you will ask: why 97 instead of 100? In fact, this problem already existed in the multi-threading era, and in the multi-process era the answer is the same: use a Lock!
from multiprocessing import Process, Manager, Lock
import os

lock = Lock()
manager = Manager()
sum = manager.Value('tmp', 0)

def testFunc(cc, lock):
    with lock:
        sum.value += cc

if __name__ == '__main__':
    threads = []
    for ll in range(100):
        t = Process(target=testFunc, args=(1, lock))
        t.daemon = True
        threads.append(t)

    for i in range(len(threads)):
        threads[i].start()

    for j in range(len(threads)):
        threads[j].join()

    print "------------------------"
    print 'process id:', os.getpid()
    print sum.value
How does this code perform? Run it yourself, or increase the loop count, and see...
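As an aside, the Manager round trip is not the only option here: a plain multiprocessing.Value lives in shared memory and already carries its own lock, accessible via get_lock(). A sketch of the same counter in that style (Python 3 syntax; the function name is illustrative, not from the article):

```python
# Sketch: the same counter with a shared-memory multiprocessing.Value.
# Its built-in lock (counter.get_lock()) makes the read-modify-write atomic.
from multiprocessing import Process, Value

def add_one(counter):
    with counter.get_lock():       # serialize += across processes
        counter.value += 1

if __name__ == '__main__':
    counter = Value('i', 0)        # 'i' = C int, initialized to 0
    procs = [Process(target=add_one, args=(counter,)) for _ in range(100)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)           # 100 every time, thanks to the lock
```

Because the value sits in shared memory rather than behind a server-process proxy, each increment avoids the IPC round trip that the Manager version pays.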
4. Final suggestions:
Note that sharing data between processes is usually not the best choice, precisely because of all these synchronization issues; an approach based on actors exchanging messages is usually seen as a better one. The Python documentation says the same: "As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes. However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so."
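The message-passing approach the documentation recommends can be sketched with multiprocessing.Queue (Python 3 syntax; the function name and the squaring workload are illustrative, not from the article):

```python
# Sketch: avoid shared state entirely by passing messages through a Queue.
# Each worker sends its result as a message; only the parent aggregates.
from multiprocessing import Process, Queue

def square_worker(q, n):
    q.put(n * n)               # send a message instead of mutating shared data

if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=square_worker, args=(q, i)) for i in range(10)]
    for p in procs:
        p.start()
    # Drain the queue before joining: a child blocked on a full pipe
    # will not exit until its items are consumed.
    results = [q.get() for _ in procs]
    for p in procs:
        p.join()
    print(sum(results))        # 0 + 1 + 4 + ... + 81 = 285
```

No locks are needed: each result is owned by exactly one process at a time, so there is nothing to race on.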
5. References:
http://stackoverflow.com/questions/14124588/python-multiprocessing-shared-memory
http://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing/
http://docs.python.org/2/library/multiprocessing.html#multiprocessing.sharedctypes.synchronized