Details about Python processes and process pools (the multiprocessing library)

Source: Internet
Author: User


Environment: Windows 7 + Python 2.7

I had always wanted to learn about multiprocessing and multithreading, but I had only read a few basic introductions and could not see how to apply them. Some time ago I came across a crawler project on GitHub that used both multiple processes and multiple threads, and I looked up the related concepts on Baidu while reading it. I have written down some of those concepts and a few applications here as a record.

First, let's talk about what a process is. A process is one execution of a program on a computer: running a program starts a process. Processes are divided into system processes and user processes. Processes that carry out the functions of the operating system are system processes (they are the operating system itself in its running state), and all processes that you start are user processes. A process is the unit to which the operating system allocates resources.

You can see this in the User Name column of Task Manager: SYSTEM marks system processes, your own account (for example Administrator) marks user processes, NETWORK SERVICE marks network services, and LOCAL SERVICE marks local services. For more details about processes, see an encyclopedia; I will save the effort here, or this would never end.

1. Simple use of multiprocessing

The multiprocessing module has many features, and I haven't learned most of them yet. Here I only cover what I know so far.

Creating a process: Process(target=function to run, name=custom process name, args=(arguments,))

Method:

  1. is_alive(): returns whether the process is still running.
  2. join([timeout]): blocks until the child process ends, then continues with the next step. timeout is the maximum number of seconds to wait; since a process can sometimes block, a timeout keeps the caller from waiting forever.
  3. run(): the method that runs in the child process. The default run() calls target with args; if no target was given when the Process object was created, the default run() does nothing, so run() has to be overridden to give the process any work.
  4. start(): starts the process, and the new process then calls run(). This is the difference from calling run() yourself, which executes it in the current process.
  5. terminate(): forcibly stops the process. Stopping a process cleanly is not that simple; the psutil package seems to offer more for this, and I will write it down if I get the chance to learn more.
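To make join(timeout), is_alive() and terminate() concrete, here is a minimal sketch (my own, not from the crawler project) that waits for a child with a timeout and kills it if it is still running. sleeper() and run_with_timeout() are hypothetical names. It uses print() via `from __future__ import print_function`, so it should run on Python 3 as well as on the Python 2.7 used in this article. Note that, according to the Python documentation, terminate() does not run exit handlers or finally clauses in the child, and processes started by the child are not terminated with it.

```python
from __future__ import print_function
import time
from multiprocessing import Process

def sleeper(seconds):
    time.sleep(seconds)        # stands in for work that may hang

def run_with_timeout(seconds, timeout):
    p = Process(target=sleeper, args=(seconds,))
    p.start()
    p.join(timeout)            # wait at most `timeout` seconds
    timed_out = p.is_alive()   # still alive: join() gave up because of the timeout
    if timed_out:
        p.terminate()          # force it to stop
        p.join()               # wait until it is really gone
    return timed_out, p.exitcode

if __name__ == '__main__':
    print(run_with_timeout(0.2, 5))   # finishes in time
    print(run_with_timeout(10, 0.5))  # is terminated after about 0.5 s
```

On Unix the terminated child's exitcode is negative (-15 for SIGTERM), which matches the -N convention of the exitcode attribute.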

In short, a Process object does not run until start() is called.

Attribute:

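  1. authkey: the process's authentication key, a byte string. The documentation describes it as the key used to authenticate connections made with multiprocessing.connection (Listener/Client) and by managers; a new Process inherits its parent's key. I have not found a practical example of it yet.
  2. daemon: a daemonic process is terminated automatically when its parent process exits, and it is not allowed to create child processes of its own. The flag must be set before start().
  3. exitcode: None while the process has not yet ended; afterwards, its exit code. A negative value -N means the process was killed by signal N.
  4. name: the process name, which you can choose yourself.
  5. pid: the process ID; each running process has a unique PID (it is None before the process is started).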
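The attributes can be inspected with a small sketch like the following, assuming hypothetical worker functions ok() and fail(). It shows a custom versus a default name, pid and exitcode being None before start(), exit codes 0 and 3, and that a new Process carries its parent's authkey.

```python
from __future__ import print_function
import os
import sys
from multiprocessing import Process, current_process

def ok():
    pass                 # returns normally, so exitcode will be 0

def fail():
    sys.exit(3)          # explicit exit status, so exitcode will be 3

def show_attributes():
    p = Process(target=ok, name='my worker')   # name is free-form
    q = Process(target=fail)                   # no name given: a default such as 'Process-2'
    p.daemon = False                           # daemon must be set before start()
    print('before start:', p.name, p.pid, p.exitcode)   # pid and exitcode are still None
    print('same authkey as parent:', p.authkey == current_process().authkey)
    p.start()
    q.start()
    print('started:', p.name, p.pid, q.name, q.pid)
    p.join()
    q.join()
    print('exit codes:', p.exitcode, q.exitcode)
    return p, q

if __name__ == '__main__':
    show_attributes()
```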

1. Process(), start(), join()

```python
# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p1 = Process(target=fun1, args=(4,))
    p2 = Process(target=fun2, args=(6,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    b = time.time()
    print 'finish', b - a
```

There are two processes in total, p1 and p2. args=(4,) is the argument passed to fun1; it must be a tuple, so a single argument needs the trailing comma, and two or more arguments are written args=(arg1, arg2, ...). start() starts each process, and join() makes the main process wait until p1 and p2 have finished before executing the next step. In the output below, fun2 and fun1 start at the same time; once both have finished (fun1 sleeps for 4 seconds while fun2 sleeps for 6 seconds, in parallel), the statement print 'finish', b - a is executed, so the total is about 6 seconds rather than 10.

```
this is fun2 Mon Jun 05 13:48:04 2017
this is fun1 Mon Jun 05 13:48:04 2017
fun1 finish Mon Jun 05 13:48:08 2017
fun2 finish Mon Jun 05 13:48:10 2017
finish 6.20300006866

Process finished with exit code 0
```
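Since args is easy to get wrong, here is a small sketch with a multi-argument worker and the common single-argument mistake. The function names are hypothetical, and a multiprocessing.Queue (not covered in this article) is used only to bring results back, because a child process cannot return a value to the parent directly.

```python
from __future__ import print_function
import time
from multiprocessing import Process, Queue

def add(a, b, out):
    out.put(a + b)              # several parameters: a, b and the result queue

def square(x, out):
    out.put(x * x)

def run_workers():
    q = Queue()
    p1 = Process(target=add, args=(3, 4, q))   # args=(arg1, arg2, ...)
    p2 = Process(target=square, args=(5, q))
    p1.start()
    p2.start()
    results = sorted([q.get(), q.get()])       # arrival order is not guaranteed
    p1.join()                                  # join after reading, so no child is left waiting on the queue
    p2.join()
    return results

def single_arg_without_comma_fails():
    try:
        Process(target=time.sleep, args=(4))   # (4) is the int 4; the correct form is args=(4,)
    except TypeError:
        return True
    return False

if __name__ == '__main__':
    print(run_workers())
    print(single_arg_without_comma_fails())
```

The mistake is caught immediately because Process() converts args with tuple(), and tuple(4) raises TypeError.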

Next, let's see what changes when join() is called right after each start().

```python
# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p1 = Process(target=fun1, args=(4,))
    p2 = Process(target=fun2, args=(6,))
    p1.start()
    p1.join()
    p2.start()
    p2.join()
    b = time.time()
    print 'finish', b - a
```

Result:

```
this is fun1 Mon Jun 05 14:19:28 2017
fun1 finish Mon Jun 05 14:19:32 2017
this is fun2 Mon Jun 05 14:19:32 2017
fun2 finish Mon Jun 05 14:19:38 2017
finish 10.1229999065

Process finished with exit code 0
```

This time fun1 runs first, then fun2, and then 'finish' is printed: p1 runs to completion before p2 is even started, so the total is about 4 + 6 = 10 seconds. That is the effect of calling join() right after start(). Now let's comment out p2.join() and see what happens.

```python
# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p1 = Process(target=fun1, args=(4,))
    p2 = Process(target=fun2, args=(6,))
    p1.start()
    p2.start()
    p1.join()
    # p2.join()
    b = time.time()
    print 'finish', b - a
```

Result:

```
this is fun1 Mon Jun 05 14:23:57 2017
this is fun2 Mon Jun 05 14:23:58 2017
fun1 finish Mon Jun 05 14:24:01 2017
finish 4.05900001526
fun2 finish Mon Jun 05 14:24:04 2017

Process finished with exit code 0
```

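This time fun1 completes first (because p1 uses join(), the main program waits for p1 to finish before moving on); the main process then continues and prints 'finish' without waiting for p2, and fun2 finishes last. The program as a whole still does not exit until fun2 is done, because multiprocessing waits for non-daemonic child processes when the main process exits.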
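The effect of where join() is placed can be measured with a sketch like this one, which uses time.sleep itself as the target and scales the 4 s and 6 s above down to 0.4 s and 0.6 s. timed_run() and its join_each flag are hypothetical names.

```python
from __future__ import print_function
import time
from multiprocessing import Process

def timed_run(durations, join_each=False):
    start = time.time()
    procs = []
    for d in durations:
        p = Process(target=time.sleep, args=(d,))
        p.start()
        if join_each:
            p.join()       # wait here, so the next process starts only after this one ends
        procs.append(p)
    for p in procs:
        p.join()           # returns at once for a process that has already ended
    return time.time() - start

if __name__ == '__main__':
    print('start all, then join: %.2f s' % timed_run([0.4, 0.6]))
    print('join after each start: %.2f s' % timed_run([0.4, 0.6], join_each=True))
```

The first figure should come out close to the longest duration and the second close to the sum of the durations, mirroring the 6 s and 10 s measured above.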

2. name, daemon, is_alive()

```python
# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p1 = Process(name='fun1 process', target=fun1, args=(4,))
    p2 = Process(name='fun2 process', target=fun2, args=(6,))
    p1.daemon = True
    p2.daemon = True
    p1.start()
    p2.start()
    p1.join()
    print p1, p2
    print 'process 1:', p1.is_alive(), 'process 2:', p2.is_alive()
    # p2.join()
    b = time.time()
    print 'finish', b - a
```

Result:

```
this is fun2 Mon Jun 05 14:43:49 2017
this is fun1 Mon Jun 05 14:43:49 2017
fun1 finish Mon Jun 05 14:43:53 2017
<Process(fun1 process, stopped daemon)> <Process(fun2 process, started daemon)>
process 1: False process 2: True
finish 4.06500005722

Process finished with exit code 0
```

The name parameter simply gives the process a name. When the statement print 'process 1:', p1.is_alive(), 'process 2:', p2.is_alive() runs, p1 has already finished (False) while p2 is still running (True). Since p2 is not joined, the main process carries on immediately, and because p2 is a daemon (daemon = True), it is terminated automatically when the main process exits. So the program ends after about 4 seconds and 'fun2 finish' is never printed.
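A scaled-down sketch of the same situation, assuming a hypothetical helper snapshot(): after joining only the short job, the long daemonic job is still alive, and it is killed when the main process exits instead of keeping the program running.

```python
from __future__ import print_function
import time
from multiprocessing import Process

def snapshot(short_t=0.2, long_t=5.0):
    p1 = Process(name='short job', target=time.sleep, args=(short_t,))
    p2 = Process(name='long job', target=time.sleep, args=(long_t,))
    p1.daemon = True      # must be set before start()
    p2.daemon = True
    p1.start()
    p2.start()
    p1.join()             # wait for the short job only
    return p1, p2

if __name__ == '__main__':
    p1, p2 = snapshot()
    print(p1.name, p1.is_alive(), '|', p2.name, p2.is_alive())
    # No p2.join(): when the main process exits, multiprocessing terminates
    # the daemonic p2, so the script does not wait the full 5 s.
```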

3. run()

If no target is given to Process, start() still calls run(), but the default run() has nothing to do:

```python
# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p = Process()
    p.start()
    p.join()
    b = time.time()
    print 'finish', b - a
```

Result:

```
finish 0.0840001106262
```

The output shows that process p did nothing. To give it work, we can assign a function to p.run:

With a function that takes no arguments:

```python
# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1():
    print 'this is fun1', time.ctime()
    time.sleep(2)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p = Process()
    p.run = fun1
    p.start()
    p.join()
    b = time.time()
    print 'finish', b - a
```

Result:

```
this is fun1 Mon Jun 05 16:34:41 2017
fun1 finish Mon Jun 05 16:34:43 2017
finish 2.11500000954

Process finished with exit code 0
```

With a function that takes an argument:

```python
# -*- coding:utf-8 -*-
from multiprocessing import Process
import time

def fun1(t):
    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    p = Process()
    p.run = fun1(2)
    p.start()
    p.join()
    b = time.time()
    print 'finish', b - a
```

Result:

```
this is fun1 Mon Jun 05 16:36:27 2017
fun1 finish Mon Jun 05 16:36:29 2017
Process Process-1:
Traceback (most recent call last):
  File "E:\Anaconda2\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
TypeError: 'NoneType' object is not callable
finish 2.0529999733

Process finished with exit code 0
```

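This version raises an exception. The reason is that p.run = fun1(2) does not assign the function: it calls fun1(2) immediately in the main process (which is why 'this is fun1' and 'fun1 finish' are printed before the child even starts) and assigns its return value, None, to p.run. When the child process starts, it calls self.run(), that is None(), which raises TypeError: 'NoneType' object is not callable. To run a function with arguments, pass them through target and args instead, e.g. Process(target=fun1, args=(2,)).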
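Besides target and args, the other documented way to give a process its work is to subclass Process and override run(). The sketch below (the class name SleepWorker is hypothetical) stores the argument in __init__ and also shows the difference between start() and run(): calling run() directly executes it in the current process, while start() executes it in a new one.

```python
from __future__ import print_function
import os
import time
from multiprocessing import Process, Queue

class SleepWorker(Process):
    def __init__(self, t, out):
        super(SleepWorker, self).__init__()   # always initialise the base Process
        self.t = t                            # arguments live on the instance
        self.out = out

    def run(self):
        # start() calls this in the new process; calling run() directly
        # would execute it in the current process instead.
        time.sleep(self.t)
        self.out.put((os.getpid(), 'slept %s' % self.t))

if __name__ == '__main__':
    q = Queue()
    p = SleepWorker(0.5, q)
    p.start()
    print(q.get())
    p.join()
```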

2. Process pool

Using Process directly is convenient when you need a few or a dozen processes, but with hundreds of them it becomes clumsy. For this, multiprocessing provides the Pool class, the process pool discussed here. You submit a large number of tasks to the pool and set a maximum number of processes; only that many run at a time, and when a worker process finishes one task, it picks up the next.

Pool(processes=num): sets the number of worker processes. When a worker finishes a task, it takes the next one.

apply_async(function, (args,)): non-blocking; the arguments must be a tuple. It returns immediately with an AsyncResult.

apply(function, (args,)): blocking; it waits for the task and returns its result.

close(): closes the pool so that no more tasks can be submitted.

terminate(): stops the worker processes immediately, without finishing outstanding tasks.

join(): waits for the worker processes to exit, like Process.join(); it must be called after close() or terminate().
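The following sketch exercises these pool methods, assuming a hypothetical slow_square() task. Unlike the examples in this article, it keeps the AsyncResult objects returned by apply_async() and calls get() on them to collect return values; get() also re-raises any exception raised in the worker, which would otherwise go unnoticed.

```python
from __future__ import print_function
import time
from multiprocessing import Pool

def slow_square(x):
    time.sleep(0.2)                # pretend this is real work
    return x * x

def run_pool():
    pool = Pool(processes=3)       # at most three worker processes
    # apply_async() returns an AsyncResult at once, without waiting
    pending = [pool.apply_async(slow_square, (i,)) for i in range(5)]
    # apply() blocks until this one task is done and returns its value
    blocking = pool.apply(slow_square, (10,))
    pool.close()                   # no more tasks can be submitted
    pool.join()                    # wait for the workers; only allowed after close()/terminate()
    return [r.get() for r in pending], blocking

def error_is_reraised():
    pool = Pool(processes=1)
    result = pool.apply_async(int, ('not a number',))   # int('not a number') fails in the worker
    pool.close()
    pool.join()
    try:
        result.get()               # the worker's ValueError is raised again here
    except ValueError:
        return True
    return False

if __name__ == '__main__':
    print(run_pool())
    print(error_is_reraised())
```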

1. One function in a process pool

```python
# -*- coding:utf-8 -*-
from multiprocessing import Pool
import time

def fun1(t):
    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    pool = Pool(processes=3)  # at most three processes run at the same time
    for i in range(3, 8):     # sleep times of 3 to 7 seconds, as in the output below
        pool.apply_async(fun1, (i,))
    pool.close()
    pool.join()
    b = time.time()
    print 'finish', b - a
```

Result:

```
this is fun1 Mon Jun 05 15:15:38 2017
this is fun1 Mon Jun 05 15:15:38 2017
this is fun1 Mon Jun 05 15:15:38 2017
fun1 finish Mon Jun 05 15:15:41 2017
this is fun1 Mon Jun 05 15:15:41 2017
fun1 finish Mon Jun 05 15:15:42 2017
this is fun1 Mon Jun 05 15:15:42 2017
fun1 finish Mon Jun 05 15:15:43 2017
fun1 finish Mon Jun 05 15:15:47 2017
fun1 finish Mon Jun 05 15:15:49 2017
finish 11.1370000839

Process finished with exit code 0
```

The output shows that at most three processes run at once. At 15:15:38 three tasks start together; when the first one finishes (the one that sleeps 3 seconds), a new task starts, and so on until every task in the pool is done. Only then does the main process execute b = time.time() and print 'finish', b - a. This example used the non-blocking apply_async(); for comparison, here is the blocking apply():

```python
# -*- coding:utf-8 -*-
from multiprocessing import Pool
import time

def fun1(t):
    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    pool = Pool(processes=3)  # at most three processes run at the same time
    for i in range(3, 8):     # sleep times of 3 to 7 seconds, as in the output below
        pool.apply(fun1, (i,))
    pool.close()
    pool.join()
    b = time.time()
    print 'finish', b - a
```

Result:

```
this is fun1 Mon Jun 05 15:59:26 2017
fun1 finish Mon Jun 05 15:59:29 2017
this is fun1 Mon Jun 05 15:59:29 2017
fun1 finish Mon Jun 05 15:59:33 2017
this is fun1 Mon Jun 05 15:59:33 2017
fun1 finish Mon Jun 05 15:59:38 2017
this is fun1 Mon Jun 05 15:59:38 2017
fun1 finish Mon Jun 05 15:59:44 2017
this is fun1 Mon Jun 05 15:59:44 2017
fun1 finish Mon Jun 05 15:59:51 2017
finish 25.1610000134

Process finished with exit code 0
```

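As you can see, with apply() each call blocks until its task has ended before the next task starts, so the tasks run one at a time (about 25 seconds in total, versus 11 with apply_async()). That is why the non-blocking apply_async() is normally used.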
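The timing difference between apply() and apply_async() can be reproduced on a smaller scale with this sketch (nap() and timed() are hypothetical helpers): on a three-process pool, three 0.4 s tasks should take roughly 1.2 s with apply() but only about 0.4 s with apply_async().

```python
from __future__ import print_function
import time
from multiprocessing import Pool

def nap(t):
    time.sleep(t)
    return t

def timed(use_async, durations):
    pool = Pool(processes=3)
    start = time.time()
    if use_async:
        pending = [pool.apply_async(nap, (t,)) for t in durations]  # all submitted at once
        values = [r.get() for r in pending]                         # then wait for every result
    else:
        values = [pool.apply(nap, (t,)) for t in durations]         # each call waits for its task
    elapsed = time.time() - start
    pool.close()
    pool.join()
    return values, elapsed

if __name__ == '__main__':
    print('apply_async: %.2f s' % timed(True, [0.4, 0.4, 0.4])[1])
    print('apply:       %.2f s' % timed(False, [0.4, 0.4, 0.4])[1])
```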

2. Multiple functions in one process pool

The example above ran a single function in the pool. To run several functions in the same pool, we can simply add another for loop; here is the code:

```python
# -*- coding:utf-8 -*-
from multiprocessing import Pool
import time

def fun1(t):
    print 'this is fun1', time.ctime()
    time.sleep(t)
    print 'fun1 finish', time.ctime()

def fun2(t):
    print 'this is fun2', time.ctime()
    time.sleep(t)
    print 'fun2 finish', time.ctime()

if __name__ == '__main__':
    a = time.time()
    pool = Pool(processes=3)  # at most three processes run at the same time
    for fun in [fun1, fun2]:
        for i in range(3, 8): # sleep times of 3 to 7 seconds, as in the output below
            pool.apply_async(fun, (i,))
    pool.close()
    pool.join()
    b = time.time()
    print 'finish', b - a
```

Result:

```
this is fun1 Mon Jun 05 16:04:38 2017
this is fun1 Mon Jun 05 16:04:38 2017
this is fun1 Mon Jun 05 16:04:38 2017
fun1 finish Mon Jun 05 16:04:41 2017
this is fun1 Mon Jun 05 16:04:41 2017
fun1 finish Mon Jun 05 16:04:42 2017
this is fun1 Mon Jun 05 16:04:42 2017
fun1 finish Mon Jun 05 16:04:43 2017
this is fun2 Mon Jun 05 16:04:43 2017
fun2 finish Mon Jun 05 16:04:46 2017
this is fun2 Mon Jun 05 16:04:46 2017
fun1 finish Mon Jun 05 16:04:47 2017
this is fun2 Mon Jun 05 16:04:47 2017
fun1 finish Mon Jun 05 16:04:49 2017
this is fun2 Mon Jun 05 16:04:49 2017
fun2 finish Mon Jun 05 16:04:50 2017
this is fun2 Mon Jun 05 16:04:50 2017
fun2 finish Mon Jun 05 16:04:52 2017
fun2 finish Mon Jun 05 16:04:55 2017
fun2 finish Mon Jun 05 16:04:57 2017
finish 19.1670000553

Process finished with exit code 0
```

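As you can see, the fun2 tasks start only after all fun1 tasks have been handed out: the first fun2 task begins at 16:04:43, as soon as a worker becomes free, and from then on fun1 and fun2 tasks run side by side. At no point are more than three tasks running.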

In addition, if the function takes no arguments, simply call pool.apply_async(function) without an argument tuple.
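A scaled-down sketch of submitting several functions, plus one task without arguments, to the same pool. Here fun1 and fun2 return values instead of printing, so the results can be collected in submission order; ping() is a hypothetical no-argument task.

```python
from __future__ import print_function
import time
from multiprocessing import Pool

def fun1(i):
    time.sleep(i / 10.0)
    return ('fun1', i)

def fun2(i):
    time.sleep(i / 10.0)
    return ('fun2', i)

def ping():
    return 'pong'            # takes no arguments at all

def run_mixed():
    pool = Pool(processes=3)                  # one pool shared by every task
    pending = []
    for fun in [fun1, fun2]:
        for i in range(1, 4):
            pending.append(pool.apply_async(fun, (i,)))
    pending.append(pool.apply_async(ping))    # no argument tuple needed
    pool.close()
    pool.join()
    return [r.get() for r in pending]         # results in submission order

if __name__ == '__main__':
    print(run_mixed())
```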

When I was learning this, I ran a program without if __name__ == '__main__': and the result was wrong. To use multiprocessing on Windows, the code that creates processes must be placed under if __name__ == '__main__': in the .py file you run; only then does the module work properly on Windows. This is not required on Linux and other Unix systems, where Python 2.7 creates processes with fork. One explanation I found: Windows has no fork, so each new process starts a fresh Python interpreter that reads and executes your .py file again as a module. The script therefore has to check whether it is running as __main__, so that the child does not start processes of its own. That is:

```python
if __name__ == '__main__':
    # do something
    pass
```

I am not entirely clear on this yet and hope to understand it better later.
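A minimal layout that follows this rule might look like the sketch below; report_pid() and main() are hypothetical names, and the comments mark which parts would run again in a child that re-imports the file.

```python
from __future__ import print_function
import os
from multiprocessing import Process, Queue

# Module level: keep this to imports and definitions. With the spawn start
# method (the only one on Windows), each child re-imports this file, so
# anything placed here runs again in every child.

def report_pid(out):
    out.put(os.getpid())

def main():
    q = Queue()
    p = Process(target=report_pid, args=(q,))
    p.start()
    child_pid = q.get()      # read before join() so the child is not left waiting on the queue
    p.join()
    return child_pid

if __name__ == '__main__':
    # True only in the process you launched; in a child that re-imports this
    # file, __name__ is different (for example '__mp_main__' in Python 3),
    # so the child does not start processes of its own.
    print('parent', os.getpid(), 'child', main())
```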

While learning this, I also ran into Queue and the threading module, which are often used together with processes. I will write about them if I have time. I hope this is helpful for your learning.
