Deep understanding of Python multi-process programming

1. Python multi-process programming background

The greatest benefit of multiprocessing in Python is that it makes full use of multi-core CPU resources. Unlike Python threads, which are constrained by the GIL and can effectively run on only one CPU at a time, multiple processes run truly in parallel. Multiprocessing is suitable for almost all situations: basically, wherever you can use multiple threads, you can use multiple processes.

Multi-process programming is in fact very similar to multi-threaded programming. The threading package provides a Thread class with methods to create and start a thread; likewise, the multiprocessing package provides a Process class that can be used in almost the same way. In multithreading, in-memory data such as a list can be shared directly; in multiprocessing, memory is not shared, so a special data structure is needed to pass information between processes. In multithreading, shared data must be protected by locks to guarantee correctness; in multiprocessing, locks are rarely needed, precisely because processes do not share memory and all data exchange goes through dedicated structures. The main topics are covered in the sections below.

2. The multiprocessing Process class

The Process class has methods similar to the Thread class; the interface is basically the same, as the following code shows:

#!/usr/bin/env python
from multiprocessing import Process
import os
import time

def func(name):
    print 'start a process'
    time.sleep(3)
    print 'the process parent id :', os.getppid()
    print 'the process id is :', os.getpid()

if __name__ == '__main__':
    processes = []
    for i in range(2):
        p = Process(target=func, args=(i,))
        processes.append(p)
    for i in processes:
        i.start()
    print 'start all process'
    for i in processes:
        i.join()
    print 'all sub process is done!'

As the example shows, the multiprocessing API mirrors the threading API: create the process, start it running, then join to wait for the process to end.

The function being executed prints the parent process ID and the process ID, so the ID numbers of the parent and child processes can be compared. On Linux, processes are mostly forked, and the IDs of the parent and the child can be queried as soon as the process is created, whereas in multithreading the IDs of individual threads cannot be observed this way. Running the code gives the following results:

start all process
start a process
start a process
the process parent id : 8036
the process parent id : 8036
the process id is : 8037
the process id is : 8038
all sub process is done!

To inspect the IDs at the operating-system level, pstree gives the clearest picture:

├─sshd(1508)─┬─sshd(2259)───bash(2261)───python(7520)─┬─python(7521)
│            │                                        ├─python(7522)
│            │                                        ├─python(7523)
│            │                                        ├─python(7524)
│            │                                        ├─python(7525)
│            │                                        ├─python(7526)
│            │                                        ├─python(7527)
│            │                                        ├─python(7528)
│            │                                        ├─python(7529)
│            │                                        ├─python(7530)
│            │                                        ├─python(7531)
│            │                                        └─python(7532)

When running the code you can see that without the join statement, the main process does not wait for the child processes to end; it continues executing its own statements and only then waits for the children to finish.
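To make the effect visible, here is a minimal sketch; the worker function and the 2-second sleep are illustrative assumptions, not part of the example above:

#!/usr/bin/env python
from multiprocessing import Process
import time

def worker():
    time.sleep(2)
    print 'child done'

if __name__ == '__main__':
    p = Process(target=worker)
    p.start()
    # No join yet: this line prints immediately, about 2 seconds
    # before 'child done' appears.
    print 'main process continues immediately'
    p.join()  # now the main process blocks until the child finishes
    print 'main process observed the child finish'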

Next question: how do you get the return value of a process? A first attempt:

#!/usr/bin/env python
import multiprocessing

class MyProcess(multiprocessing.Process):
    def __init__(self, name, func, args):
        super(MyProcess, self).__init__()
        self.name = name
        self.func = func
        self.args = args
        self.res = ''

    def run(self):
        self.res = self.func(*self.args)
        print self.name
        print self.res
        return (self.res, 'kel')   # the return value of run() is discarded

def func(name):
    print 'start process...'
    return name.upper()

if __name__ == '__main__':
    processes = []
    result = []
    for i in range(3):
        p = MyProcess('process', func, ('kel',))
        processes.append(p)
    for i in processes:
        i.start()
    for i in processes:
        i.join()
    for i in processes:
        result.append(i.res)
    for i in result:
        print i

This attempts to collect the return value of each child process in the main process through the result list, but nothing comes back. On reflection, the reason is clear: processes do not share memory, so using a list (or an instance attribute) to hold the data cannot work; interaction between processes must rely on special data structures. The code above therefore only executes the function and cannot obtain its return value. Change the same code to use threads, however, and the return value can be obtained.
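For comparison, a sketch of the same pattern with threading.Thread (the names and structure simply mirror the process version above); because threads share memory, the attribute set in run() is visible after join():

#!/usr/bin/env python
import threading

class MyThread(threading.Thread):
    def __init__(self, name, func, args):
        super(MyThread, self).__init__()
        self.name = name
        self.func = func
        self.args = args
        self.res = ''

    def run(self):
        # Threads share the process memory, so this assignment
        # is visible from the main thread.
        self.res = self.func(*self.args)

def func(name):
    return name.upper()

if __name__ == '__main__':
    threads = [MyThread('thread', func, ('kel',)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for t in threads:
        print t.res   # prints 'KEL' three times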

3. Exchanging data between processes with a Queue

To exchange data between processes, the first thought is the Queue structure used in multithreading; with multiple processes, however, you must use the Queue from the multiprocessing module. The code is as follows:

#!/usr/bin/env python
import multiprocessing

class MyProcess(multiprocessing.Process):
    def __init__(self, name, func, args):
        super(MyProcess, self).__init__()
        self.name = name
        self.func = func
        self.args = args
        self.res = ''

    def run(self):
        self.res = self.func(*self.args)

def func(name, q):
    print 'start process...'
    q.put(name.upper())

if __name__ == '__main__':
    processes = []
    q = multiprocessing.Queue()
    for i in range(3):
        p = MyProcess('process', func, ('kel', q))
        processes.append(p)
    for i in processes:
        i.start()
    for i in processes:
        i.join()
    while q.qsize() > 0:
        print q.get()

This is simply an improvement of the previous example; no other machinery is involved. The key point is using a Queue to hold the data, which achieves the goal of exchanging data between processes.

Using a queue feels a little like using a socket: one side sends the data and the other side receives it.

In this kind of data exchange it is the parent process interacting with all the child processes; the child processes basically do not interact with each other. They can, though: for example, each child process could fetch data from the same queue. In that case locking should be considered, otherwise the data may end up confused.
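As an illustrative sketch of children sharing one queue (the worker function and the None sentinel convention are assumptions for this example): each worker repeatedly takes a task from a common queue until it sees a sentinel. Note that multiprocessing.Queue's get is itself process-safe; the explicit Lock below only keeps the printed output from interleaving.

#!/usr/bin/env python
import multiprocessing

def worker(q, lock):
    while True:
        task = q.get()
        if task is None:    # sentinel: no more work for this worker
            break
        with lock:          # serialize the prints only
            print 'got task:', task

if __name__ == '__main__':
    q = multiprocessing.Queue()
    lock = multiprocessing.Lock()
    for n in range(10):
        q.put(n)
    workers = [multiprocessing.Process(target=worker, args=(q, lock))
               for i in range(3)]
    for w in workers:
        q.put(None)         # one sentinel per worker
    for w in workers:
        w.start()
    for w in workers:
        w.join()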

4. Exchanging data between processes with a Pipe

A Pipe can also be used to exchange data between processes, with the following code:

#!/usr/bin/env python
import multiprocessing

class MyProcess(multiprocessing.Process):
    def __init__(self, name, func, args):
        super(MyProcess, self).__init__()
        self.name = name
        self.func = func
        self.args = args
        self.res = ''

    def run(self):
        self.res = self.func(*self.args)

def func(name, conn):
    print 'start process...'
    conn.send(name.upper())

if __name__ == '__main__':
    processes = []
    parent_conn, child_conn = multiprocessing.Pipe()
    for i in range(3):
        p = MyProcess('process', func, ('kel', child_conn))
        processes.append(p)
    for i in processes:
        i.start()
    for i in processes:
        i.join()
    for i in processes:
        print parent_conn.recv()

The code mainly uses the two connection objects returned by Pipe, which behave like a pair of connected sockets, to transmit and receive data. The parent process uses parent_conn and the child processes use child_conn: a child sends data with the send method, and the parent receives it with the recv method.

The catch is that you have to know exactly how many times data will be sent and received; if an exception occurs the counts no longer match, and the pipe can no longer be used reliably.
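One way to loosen that requirement, sketched here under the assumption that a None end-marker is acceptable, is to send an explicit sentinel and guard recv with poll(timeout), so the parent cannot block forever if a child dies early:

#!/usr/bin/env python
import multiprocessing

def func(name, conn):
    conn.send(name.upper())
    conn.send(None)             # end marker: nothing more will be sent

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=func, args=('kel', child_conn))
    p.start()
    while parent_conn.poll(3):  # wait up to 3 seconds for data
        msg = parent_conn.recv()
        if msg is None:         # end marker seen: stop receiving
            break
        print msg
    p.join()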

5. Process Pool

In practice, the most convenient way to work with multiple processes is the Pool, which has no counterpart in the threading module.

A Pool limits the number of processes running at any time: the remaining tasks wait in a queue, and only the configured number of processes run at once. By default, the pool size is the number of CPUs, i.e. the result of multiprocessing.cpu_count().
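For example, as a trivial sketch, constructing a Pool with no argument sizes it from cpu_count():

#!/usr/bin/env python
import multiprocessing

if __name__ == '__main__':
    print multiprocessing.cpu_count()  # e.g. 4 on a quad-core machine
    p = multiprocessing.Pool()         # pool size defaults to cpu_count()
    p.close()                          # no more tasks will be submitted
    p.join()                           # wait for the workers to exit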

A Pool offers two methods here, map and imap. Both are extremely convenient: after execution you can obtain the return value of each process. The drawback is that the mapped function receives only a single argument; if the function needs more, the arguments have to be combined into one parameter, as shown in the following code:

#!/usr/bin/env python
import multiprocessing

def func(name):
    print 'start process'
    return name.upper()

if __name__ == '__main__':
    p = multiprocessing.Pool(5)
    print p.map(func, ['kel', 'smile'])
    for i in p.imap(func, ['kel', 'smile']):
        print i

With map, the return value is simply a list containing the results of the function calls; with imap, the return value is an iterator over the results. If more than one parameter is needed, the arguments presumably have to be packed together and unpacked inside the function with *args.
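A common workaround, sketched below with a hypothetical add function, is to pack the arguments into a tuple and unpack them inside a small wrapper:

#!/usr/bin/env python
import multiprocessing

def add(a, b):
    return a + b

# The wrapper must live at module level so it can be pickled
# and sent to the worker processes.
def add_wrapper(args):
    return add(*args)          # unpack the packed tuple

if __name__ == '__main__':
    p = multiprocessing.Pool(2)
    pairs = [(1, 2), (3, 4), (5, 6)]
    print p.map(add_wrapper, pairs)   # [3, 7, 11]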

When using apply_async, multiple parameters can be passed directly, as follows:

#!/usr/bin/env python
import multiprocessing
import time

def func(name):
    print 'start process'
    time.sleep(2)
    return name.upper()

if __name__ == '__main__':
    results = []
    p = multiprocessing.Pool(5)
    for i in range(7):
        res = p.apply_async(func, args=('kel',))
        results.append(res)
    for i in results:
        print i.get(2.1)

When collecting the results, note that the AsyncResult handles are first appended to a list. Otherwise, calling get immediately on each result would block until that result is ready, turning the multi-process program into a serial one. So: store the handles in a list, then fetch each result. A timeout can also be set when fetching the data, e.g. get(timeout=5).
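To see why the list matters, a sketch that times both patterns (the 2-second sleep and the pool size of 5 come from the example above):

#!/usr/bin/env python
import multiprocessing
import time

def func(name):
    time.sleep(2)
    return name.upper()

if __name__ == '__main__':
    p = multiprocessing.Pool(5)

    start = time.time()
    for i in range(7):
        # get() right away blocks on each task: ~7 * 2s in total
        print p.apply_async(func, args=('kel',)).get()
    print 'immediate get:', time.time() - start

    start = time.time()
    results = [p.apply_async(func, args=('kel',)) for i in range(7)]
    for r in results:
        # tasks overlap: 7 tasks on 5 workers finish in ~2 waves, ~4s
        print r.get()
    print 'deferred get:', time.time() - start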

Summary:

In multi-process programming, pay attention to the interaction between processes: after a function has executed, how do you get its result? You can use a special data structure such as a Queue or a Pipe, or something else. With a Pool, the results can be obtained directly: map and imap return a list and an iterator respectively, while apply_async results need to be collected in a list and then fetched one by one with get.

