Deep understanding of Python multi-process programming
1. Background of Python multi-process programming
The biggest benefit of multiprocessing in Python is that it makes full use of multi-core CPU resources. Unlike Python threads, which are limited by the GIL and can effectively run on only one CPU core at a time, processes run truly in parallel. Essentially every scenario where multithreading can be used is also a scenario where multiprocessing can be used.
Multi-process programming is similar to multithreaded programming. The threading package provides a Thread class with methods to create a thread, start it, and wait for it; multiprocessing provides a Process class with essentially the same interface. In multithreading, in-memory data such as lists can be shared directly between threads; in multiprocessing, memory is not shared between processes, so a dedicated data structure is needed to exchange data. In multithreading, locks are often necessary to keep shared data correct; in multiprocessing, locks are rarely needed, because processes do not share memory and exchange data only through such dedicated structures. The sketch below illustrates the memory-sharing difference; the rest of this article covers the main topics of multi-process programming.
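A minimal sketch of that difference (not from the original article; append_item and data are illustrative names): a child process works on a copy of the parent's list, while a thread mutates it in place.

#!/usr/bin/env python
# Sketch: a list mutated in a child process is invisible to the parent,
# while a thread mutates the parent's list directly.
from multiprocessing import Process
from threading import Thread

def append_item(data):
    data.append('x')

if __name__ == '__main__':
    data = []
    p = Process(target=append_item,args=(data,))
    p.start()
    p.join()
    print 'after process:',data    # [] - the child worked on a copy

    t = Thread(target=append_item,args=(data,))
    t.start()
    t.join()
    print 'after thread :',data    # ['x'] - threads share memory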
2. The Process class in multiprocessing
The methods of the Process class are similar to those of the Thread class, and the interfaces are basically the same. For details, refer to the following code:
#!/usr/bin/env python
from multiprocessing import Process
import os
import time

def func(name):
    print 'start a process'
    time.sleep(3)
    print 'the process parent id :',os.getppid()
    print 'the process id is :',os.getpid()

if __name__ =='__main__':
    processes = []
    for i in range(2):
        p = Process(target=func,args=(i,))
        processes.append(p)
    for i in processes:
        i.start()
    print 'start all process'
    for i in processes:
        i.join()    #pass
    print 'all sub process is done!'
In the above example we can see that the multi-process API is the same as the multi-threaded one: the processes are created, started with start, and joined to wait for them to end.
The executed function prints the parent process id and the process id, so the relationship between parent and child processes can be seen. On Linux, processes are mainly created by fork, and both the parent and the child process ids can be queried; in multithreading, by contrast, there is no comparable thread id to look up. The execution result is as follows:
start all process
start a process
start a process
the process parent id : 8036
the process parent id : 8036
the process id is : 8037
the process id is : 8038
all sub process is done!
When inspecting the process ids at the operating-system level, pstree gives the clearest picture:
├─sshd(1508)─┬─sshd(2259)───bash(2261)───python(7520)─┬─python(7521)
│            │                                        ├─python(7522)
│            │                                        ├─python(7523)
│            │                                        ├─python(7524)
│            │                                        ├─python(7525)
│            │                                        ├─python(7526)
│            │                                        ├─python(7527)
│            │                                        ├─python(7528)
│            │                                        ├─python(7529)
│            │                                        ├─python(7530)
│            │                                        ├─python(7531)
│            │                                        └─python(7532)
When running this, we can see that without the join statement the main process does not wait for the sub-processes to end; it simply continues running, and the sub-processes finish executing on their own afterwards.
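A quick way to observe this (a sketch, with an illustrative worker function):

#!/usr/bin/env python
# Sketch: without join, the main process reaches its last line first;
# the child process finishes afterwards.
from multiprocessing import Process
import time

def worker():
    time.sleep(1)
    print 'child done'

if __name__ == '__main__':
    p = Process(target=worker)
    p.start()
    print 'main done'    # printed before 'child done'
    # p.join()           # uncomment to make the main process wait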
How can we get the return value of a sub-process? A first attempt is the following code:
#!/usr/bin/env python
import multiprocessing

class MyProcess(multiprocessing.Process):
    def __init__(self,name,func,args):
        super(MyProcess,self).__init__()
        self.name = name
        self.func = func
        self.args = args
        self.res = ''

    def run(self):
        # res is set in the child process's own memory only
        self.res = self.func(*self.args)
        print self.name
        print self.res
        return (self.res,'kel')

def func(name):
    print 'start process...'
    return name.upper()

if __name__ == '__main__':
    processes = []
    result = []
    for i in range(3):
        p = MyProcess('process',func,('kel',))
        processes.append(p)
    for i in processes:
        i.start()
    for i in processes:
        i.join()
    for i in processes:
        result.append(i.res)    # always '' in the parent process
    for i in result:
        print i
The idea was to collect the return value of each sub-process in result and read it in the main process. However, nothing comes back: a sub-process does not share memory with its parent, so storing the data in an attribute (or a list) is clearly not feasible, and inter-process interaction must rely on a special data structure. The code above therefore only executes the function and cannot obtain its return value. If the same code is changed to use threads, however, the return value can be obtained.
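For comparison, here is a sketch of the thread version, which does work because the thread writes self.res into memory shared with the main thread:

#!/usr/bin/env python
# Sketch: the same pattern with a Thread works, since threads share
# the parent's memory.
import threading

class MyThread(threading.Thread):
    def __init__(self,name,func,args):
        super(MyThread,self).__init__()
        self.name = name
        self.func = func
        self.args = args
        self.res = ''

    def run(self):
        self.res = self.func(*self.args)

def func(name):
    return name.upper()

if __name__ == '__main__':
    t = MyThread('thread',func,('kel',))
    t.start()
    t.join()
    print t.res    # prints KEL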
3. Inter-process interaction with Queue
A Queue can be used for inter-process interaction. It has the same structure as the Queue used with threads, but multiple processes must use the Queue from multiprocessing. The code is as follows:
#!/usr/bin/env python
import multiprocessing

class MyProcess(multiprocessing.Process):
    def __init__(self,name,func,args):
        super(MyProcess,self).__init__()
        self.name = name
        self.func = func
        self.args = args
        self.res = ''

    def run(self):
        self.res = self.func(*self.args)

def func(name,q):
    print 'start process...'
    q.put(name.upper())    # send the result back through the queue

if __name__ == '__main__':
    processes = []
    q = multiprocessing.Queue()
    for i in range(3):
        p = MyProcess('process',func,('kel',q))
        processes.append(p)
    for i in processes:
        i.start()
    for i in processes:
        i.join()
    while q.qsize() > 0:   # note: qsize() is not implemented on some platforms
        print q.get()
This is in fact just an improvement on the previous example. Nothing else changes; the key point is that a Queue is used to hold the data, so the processes can exchange data through it.
Under the hood, using a Queue is much like using a socket: one side still sends the data and the other side receives it with recv.
During data interaction the parent process usually interacts with all the child processes, and the child processes basically do not interact with each other. When they do, for example when each process takes data out of the same Queue, locking has to be considered, otherwise the data may get confused. A sketch of several workers consuming from one Queue follows.
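In this sketch (worker and the sentinel scheme are illustrative, not from the original), two workers consume tasks from a shared Queue, with one None sentinel per worker to signal the end:

#!/usr/bin/env python
# Sketch: two workers consume tasks from one shared Queue.
# A None sentinel per worker tells it to stop.
import multiprocessing

def worker(q,out):
    while True:
        task = q.get()          # get() itself is process-safe
        if task is None:
            break
        out.put(task.upper())

if __name__ == '__main__':
    q = multiprocessing.Queue()
    out = multiprocessing.Queue()
    for task in ['kel','smile','hello']:
        q.put(task)
    workers = [multiprocessing.Process(target=worker,args=(q,out))
               for _ in range(2)]
    for _ in workers:
        q.put(None)             # one stop sentinel per worker
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    while not out.empty():
        print out.get()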
4. Inter-process interaction with Pipe
Pipe can also be used for data interaction between processes. The code is as follows:
#!/usr/bin/env python
import multiprocessing

class MyProcess(multiprocessing.Process):
    def __init__(self,name,func,args):
        super(MyProcess,self).__init__()
        self.name = name
        self.func = func
        self.args = args
        self.res = ''

    def run(self):
        self.res = self.func(*self.args)

def func(name,conn):
    print 'start process...'
    conn.send(name.upper())    # send the result through the pipe

if __name__ == '__main__':
    processes = []
    parent_conn,child_conn = multiprocessing.Pipe()
    for i in range(3):
        p = MyProcess('process',func,('kel',child_conn))
        processes.append(p)
    for i in processes:
        i.start()
    for i in processes:
        i.join()
    for i in processes:
        print parent_conn.recv()
In the above code, the two connection objects returned by Pipe are used to send and receive data: parent_conn is used in the parent process and child_conn in the child processes. The sub-processes send data, and the parent process receives it with recv.
Pipe works best when the number of sends and receives is known exactly in advance; if that cannot be guaranteed, for example because an exception may occur, Pipe is probably not usable.
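One possible workaround (a sketch, not from the original) is to poll the connection with a timeout instead of calling recv blindly:

#!/usr/bin/env python
# Sketch: when the number of messages is unknown, poll() with a
# timeout avoids blocking forever in recv().
import multiprocessing

def func(conn):
    conn.send('HELLO')
    conn.close()

if __name__ == '__main__':
    parent_conn,child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=func,args=(child_conn,))
    p.start()
    while parent_conn.poll(1):   # wait up to 1 second for more data
        print parent_conn.recv()
    p.join()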
5. Process pool
In fact, the most convenient way to use multiple processes is through a pool; multithreading has no equivalent pool.
When using a pool, you can limit the number of processes running at any one time: the remaining tasks wait in a queue, and only the configured number of processes runs. By default, the number of processes equals the number of CPUs, as given by multiprocessing.cpu_count().
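A minimal sketch of checking the default (func is an illustrative helper that just reports which worker handled each task):

#!/usr/bin/env python
# Sketch: Pool() with no argument creates cpu_count() worker processes,
# so at most that many tasks run at the same time.
import multiprocessing
import os

def func(i):
    return os.getpid()      # report which worker handled the task

if __name__ == '__main__':
    n = multiprocessing.cpu_count()
    print 'cpu_count:',n
    p = multiprocessing.Pool()                 # equivalent to Pool(n)
    pids = p.map(func,range(n * 4))
    print 'distinct workers:',len(set(pids))   # at most n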
Pool provides two convenient methods, map and imap. After execution completes, you can get the result returned by each process. The drawback is that the executed function can take only one parameter at most; otherwise, the parameters have to be packed into a single argument. The code is as follows:
#!/usr/bin/env python
import multiprocessing

def func(name):
    print 'start process'
    return name.upper()

if __name__ == '__main__':
    p = multiprocessing.Pool(5)
    print p.map(func,['kel','smile'])
    for i in p.imap(func,['kel','smile']):
        print i
map directly returns a list holding the results of the function calls, while imap returns an iterator over the results. If multiple parameters are required, they have to be packed into a single argument and unpacked with *args inside the function, as sketched below.
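A sketch of the packing trick (func2 and wrapper are illustrative names): each item in the input list is a tuple, and a top-level wrapper unpacks it.

#!/usr/bin/env python
# Sketch: map only passes one argument, so pack several into a tuple
# and unpack them inside a wrapper function.
import multiprocessing

def func2(name,suffix):
    return name.upper() + suffix

def wrapper(args):
    return func2(*args)     # unpack the packed tuple

if __name__ == '__main__':
    p = multiprocessing.Pool(2)
    print p.map(wrapper,[('kel','!'),('smile','?')])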
When using apply_async, by contrast, you can pass multiple parameters directly, as shown below:
#!/usr/bin/env python
import multiprocessing
import time

def func(name):
    print 'start process'
    time.sleep(2)
    return name.upper()

if __name__ == '__main__':
    results = []
    p = multiprocessing.Pool(5)
    for i in range(7):
        res = p.apply_async(func,args=('kel',))
        results.append(res)
    for i in results:
        print i.get(2.1)
When collecting the results, be careful to append the result handles to a list first and call get on them afterwards. Calling get immediately after each apply_async blocks until that result is ready, which turns the multi-process program into a serial one; that is why a list is used to store the handles. A timeout can also be set when fetching the data, i.e. get(timeout=5).
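When the timeout expires, get raises multiprocessing.TimeoutError; a sketch of handling it (slow is an illustrative function):

#!/usr/bin/env python
# Sketch: get(timeout) raises multiprocessing.TimeoutError when the
# result is not ready in time.
import multiprocessing
import time

def slow(name):
    time.sleep(5)
    return name.upper()

if __name__ == '__main__':
    p = multiprocessing.Pool(1)
    res = p.apply_async(slow,args=('kel',))
    try:
        print res.get(timeout=1)
    except multiprocessing.TimeoutError:
        print 'result not ready within 1 second'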
Summary:
In multi-process programming, pay attention to interaction between processes: after a function executes, its result can be obtained through special data structures such as Queue, Pipe, or others. When using a pool, the results can be obtained directly: map and imap return a list and an iterable of results respectively, while the results of apply_async need to be collected into a list and then fetched one by one with get.
This concludes this deep dive into Python multi-process programming. I hope it gives you a useful reference.