Preface
Overview of Processes and Threads:
Many students have heard that modern operating systems, such as Mac OS X, Unix, Linux, and Windows, are all "multitasking" operating systems.
What does "multitasking" mean? Simply put, the operating system can run multiple tasks at the same time. For example, you may be browsing the Internet, listening to MP3s, and writing in Word all at once: that is multitasking, with at least 3 tasks running at the same time. Many more tasks are quietly running in the background; they just don't show up on the desktop.
Multicore CPUs are now very common, but even the single-core CPUs of the past could multitask. Since a CPU executes code sequentially, how does a single-core CPU perform multitasking?
The answer is that the operating system lets the tasks take turns: task 1 runs for 0.01 seconds, then the system switches to task 2, which runs for 0.01 seconds, then to task 3 for 0.01 seconds, and so on, over and over. Strictly speaking the tasks execute alternately, but because the CPU is so fast, it feels as if all of them are executing at the same time.
True parallel multitasking is only possible on a multicore CPU. But because the number of tasks is usually far greater than the number of cores, the operating system still schedules many tasks onto each core in turn.
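The turn-taking idea can be imitated in a few lines of Python. The following is only a toy sketch, not how a real operating system scheduler works: each "task" is a generator that pauses after one unit of work, and round_robin (a name made up for this illustration) gives every unfinished task one turn at a time.

```python
from collections import deque

def task(name, steps):
    """A toy task: does `steps` units of work, pausing after each one."""
    for i in range(steps):
        yield '%s step %d' % (name, i)

def round_robin(tasks):
    """Run the tasks alternately, one step per turn, until all are finished."""
    log = []
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            log.append(next(current))  # let the task run for one "time slice"
            queue.append(current)      # not finished: back to the end of the line
        except StopIteration:
            pass                       # finished: drop it from the queue
    return log

if __name__ == '__main__':
    for line in round_robin([task('A', 2), task('B', 3), task('C', 1)]):
        print(line)
```

The printed lines interleave the three tasks, which is the same effect a single core produces by switching between tasks very quickly.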
To the operating system, a task is a process. Opening a browser starts a browser process, opening Notepad starts a Notepad process, opening two Notepad windows starts two Notepad processes, and opening Word starts a Word process.
Some processes do more than one thing at a time. Word, for example, can handle typing, spell checking, and printing simultaneously. To do several things at once within a process, it needs to run several "subtasks" at the same time, and we call these subtasks within a process threads.
Because every process has to do at least one thing, a process has at least one thread. Of course, a complex process such as Word can have multiple threads that execute simultaneously. Multithreading works the same way as multiple processes: the operating system switches rapidly between the threads, letting each run briefly in turn, so they appear to run at the same time. Truly executing multiple threads at the same time requires a multicore CPU.
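As a small illustration of one process containing several threads, the sketch below uses the standard threading module. The run_workers function and the thread names are invented for this example; the point is only that every thread reports the same process ID.

```python
import os
import threading

def run_workers(names):
    """Start one thread per name and collect (thread name, process ID) pairs."""
    seen = []
    lock = threading.Lock()

    def worker():
        # All threads live inside the same process, so os.getpid() is identical.
        with lock:
            seen.append((threading.current_thread().name, os.getpid()))

    threads = [threading.Thread(target=worker, name=n) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen

if __name__ == '__main__':
    for name, pid in run_workers(['typing', 'spell-check', 'printing']):
        print('thread %s runs in process %s' % (name, pid))
```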
All of the Python programs we wrote earlier are single-task processes, that is, they have only one thread. What if we want to perform multiple tasks at the same time?
There are two kinds of solutions:
One is to start multiple processes. Each process has only one thread, but together the processes can perform multiple tasks.
The other is to start one process and start multiple threads within it, so that together the threads can perform multiple tasks.
There is also a third way: start multiple processes, each of which starts multiple threads, so that even more tasks run simultaneously. This model is more complex and is rarely used in practice.
To summarize, there are 3 ways to implement multitasking: multi-process mode, multi-threaded mode, and multi-process plus multi-threaded mode.
Tasks that run at the same time are usually not independent of each other; they need to communicate and coordinate. Sometimes task 1 must pause and wait for task 2 to finish before it can continue, and sometimes task 3 and task 4 must not run at the same time. So multi-process and multi-threaded programs are far more complex than the single-process, single-threaded programs we wrote earlier.
Because the complexity is high and debugging is difficult, we avoid writing multitasking programs unless we have to. However, there are many situations where multitasking is unavoidable. Consider watching a movie on your computer: one thread must play the video while another plays the audio. A single-threaded implementation could only play the video and then the audio, or the audio and then the video, which is obviously unacceptable.
Python supports both multiprocessing and multithreading, and we'll discuss how to write both kinds of multitasking programs.
Process
A First Look:
To implement multiprocessing in a Python program, we first need some operating system background. Unix/Linux operating systems provide a fork() system call, which is very special. An ordinary function is called once and returns once, but fork() is called once and returns twice, because the operating system automatically copies the current process (called the parent process) into a new one (called the child process), and then returns in both the parent and the child. In the child process fork() always returns 0, while in the parent process it returns the ID of the child. The reason is that a parent can fork many children, so the parent has to record the ID of each child, while a child only needs to call getppid() to get the ID of its parent. Python's os module wraps common system calls, including fork, so you can easily create child processes in a Python program:
import os

print('Process (%s) start...' % os.getpid())
# Only works on Unix/Linux/Mac:
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))

# Process (44587) start...
# I (44587) just created a child process (44588).
# I am child process (44588) and my parent is 44587.
The above code cannot run on Windows because Windows has no fork call. Because the Mac system is based on a BSD (Unix) kernel, it runs without problems on a Mac, so a Mac is a good choice for learning Python. With the fork call, a process can copy itself into a child process to handle a new task whenever one arrives. The common Apache server works this way: the parent process listens on the port, and whenever there is a new HTTP request it forks a child process to handle that request.
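The "fork a child for each new task" pattern can be sketched as follows. This is only an illustration under stated assumptions: it needs a Unix-like system, the handle and serve names are invented, and the "requests" are plain integers rather than HTTP requests. It is not how Apache is actually implemented.

```python
import os

def handle(request):
    """Pretend to process a request; the result must fit in an exit code (0-255)."""
    return (request * 2) % 256

def serve(requests):
    """Fork one child per request and collect each child's exit code."""
    children = {}
    for request in requests:
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then exit at once so it never runs parent code.
            os._exit(handle(request))
        children[pid] = request         # Parent: remember which child got which request

    results = {}
    for pid, request in children.items():
        _, status = os.waitpid(pid, 0)  # wait for this particular child to exit
        results[request] = os.WEXITSTATUS(status)
    return results

if __name__ == '__main__':
    print(serve([1, 2, 3]))
```

Passing the result back through the exit code keeps the sketch short; real programs would normally use a pipe, a socket, or the multiprocessing module instead.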
Multiprocessing module:
If you are going to write a multi-process service program, Unix/Linux is undoubtedly the right choice. Since Windows has no fork call, is it impossible to write multi-process programs in Python on Windows? Because Python is cross-platform, it naturally provides cross-platform multi-process support as well: the multiprocessing module is the cross-platform multi-process module. It provides a Process class to represent a process object, and the following example demonstrates starting a child process and waiting for it to end:
from multiprocessing import Process
import os
import time

# Code to be executed by the child process
def run_proc(name):
    time.sleep(1)
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))  # Why does args need the trailing comma?
    p.start()  # Start the child process; without this call it never runs
    p.join()   # Wait for child p to finish before continuing; without it the main program runs on to the end while the child keeps executing unaffected
    print('Child process end.')

# Parent process 8428.
# Run child process test (9392)...
# Child process end.
When a Process is instantiated it executes self._args = tuple(args). Without the trailing comma, ('test') is just a string, so the resulting self._args is a tuple of single letters. When two or more arguments are passed, the trailing comma is not needed, as shown below:
# Excerpt from the Process constructor in the multiprocessing source (not runnable on its own)
def __init__(self, group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None):
    assert group is None, 'group argument must be None for now'
    count = next(_process_counter)
    self._identity = _current_process._identity + (count,)
    self._config = _current_process._config.copy()
    self._parent_pid = os.getpid()
    self._popen = None
    self._target = target
    self._args = tuple(args)

# What tuple() does with and without the comma
a = ('ers')
b = tuple(a)
print(b)   # ('e', 'r', 's')

a1 = ('ers', 'gte')
b1 = tuple(a1)
print(b1)  # ('ers', 'gte')
Process Code
Pool (process pool):
If you want to start a large number of child processes, you can create the child processes in bulk using the process pool:
from multiprocessing import Pool, cpu_count
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

def bar(arg):
    print('-->exec done:', arg, os.getpid())

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(cpu_count())  # Use the number of CPU cores; on a multi-core CPU, multiple processes run truly in parallel
    for i in range(5):
        # p.apply_async(func=long_time_task, args=(i,), callback=bar)  # callback: after func finishes, callback is executed in the main process
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()  # Wait until the pool finishes executing; otherwise the pool is shut down as soon as the main process finishes
    print('All subprocesses done.')

# Parent process 4492.
# Waiting for all subprocesses done...
# Run task 0 (3108)...
# Run task 1 (7936)...
# Run task 2 (11236)...
# Run task 3 (8284)...
# Task 2 runs 0.86 seconds.
# Run task 4 (11236)...
# Task 0 runs 1.34 seconds.
# Task 1 runs 1.49 seconds.
# Task 3 runs 2.62 seconds.
# Task 4 runs 1.90 seconds.
# All subprocesses done.
Key point: after a process in the pool has finished executing, it is shut down and destroyed automatically and no longer consumes memory. Likewise, a child process created without a pool is also destroyed automatically when it finishes. The specific test is as follows:
from multiprocessing import Pool, cpu_count
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

def count_process():
    import psutil
    pids = psutil.pids()
    process_name = []
    for pid in pids:
        p = psutil.Process(pid)
        process_name.append(p.name())  # get the process name
        # process_name.append(p.num_threads())  # get the number of threads in the process
    # print(process_name)
    print(len(process_name))

if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(4)
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    count_process()  # number of processes while the pool is running (includes other applications on the system)
    p.close()
    p.join()
    count_process()  # number of processes after the pool has closed
    print('All subprocesses done.')

# Parent process 8860.
# Waiting for all subprocesses done...
# Run task 0 (2156)...
# Run task 1 (1992)...
# Run task 2 (10680)...
# Run task 3 (11216)...
# 109    (at start)
# Task 2 runs 0.93 seconds.
# Run task 4 (10680)...
# Task 1 runs 1.71 seconds.
# Task 3 runs 2.01 seconds.
# Task 0 runs 2.31 seconds.
# Task 4 runs 2.79 seconds.
# 105    (at end)
# All subprocesses done.
Pool processes are destroyed after they finish executing
Code interpretation:
Calling join() on a Pool object waits for all child processes to finish. You must call close() before calling join(), and once close() has been called you can no longer add new Process tasks.
Note the output: tasks 0, 1, 2, and 3 are executed immediately, while task 4 waits for an earlier task to complete before it executes. This is because the default size of Pool on my computer is 4, so at most 4 processes execute at a time. This is a deliberate design limit of Pool, not a limit of the operating system. If you change it to:
p = Pool(5)
You can run 5 processes at a time.
Because the default size of Pool is the number of CPU cores, if you happen to have an 8-core CPU, you have to submit at least 9 child processes to see the waiting effect described above.
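One way to observe the pool size limit is to record which worker process handled each task. The sketch below is an illustration only; whoami and worker_pids are hypothetical helper names, and the short sleep is there simply to keep workers busy so the tasks spread across them.

```python
import os
import time
from multiprocessing import Pool

def whoami(_):
    """Return the ID of the worker process that ran this task."""
    time.sleep(0.1)  # keep the worker busy briefly so tasks spread across workers
    return os.getpid()

def worker_pids(pool_size, n_tasks):
    """Run n_tasks in a pool of pool_size workers; return the set of worker PIDs."""
    with Pool(pool_size) as pool:
        return set(pool.map(whoami, range(n_tasks)))

if __name__ == '__main__':
    pids = worker_pids(pool_size=2, n_tasks=6)
    print('6 tasks were handled by %d worker processes' % len(pids))
```

However many tasks are submitted, the number of distinct worker PIDs never exceeds the pool size, because the pool reuses its workers instead of creating a process per task.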
Inter-process communication:
Processes certainly need to communicate with each other, and the operating system provides many mechanisms for inter-process communication. Python's multiprocessing module wraps the underlying mechanisms and provides Queue, Pipes, and several other ways to exchange data.
Taking Queue as an example, we create two child processes in the parent process: one writes data to the Queue, and the other reads data from the Queue:
from multiprocessing import Process, Queue
import os, time, random

# Code executed by the process that writes data:
def write(q):
    print('Process to write: %s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the process that reads data:
def read(q):
    print('Process to read: %s' % os.getpid())
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)

if __name__ == '__main__':
    # The parent process creates the Queue and passes it to each child process:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start child process pw, which writes:
    pw.start()
    # Start child process pr, which reads:
    pr.start()
    # Wait for pw to end:
    pw.join()
    # pr runs an infinite loop, so we cannot wait for it to end; it can only be terminated by force:
    pr.terminate()  # force the child process to stop

# Process to write: 9472
# Put A to queue...
# Process to read: 3948
# Get A from queue.
# Put B to queue...
# Get B from queue.
# Put C to queue...
# Get C from queue.
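Pipes are mentioned above but not shown. A minimal sketch of two processes talking over a multiprocessing.Pipe might look like this; the child and ask_child function names are invented for the example.

```python
from multiprocessing import Pipe, Process

def child(conn):
    """Receive one message, answer it, and close this end of the pipe."""
    message = conn.recv()
    conn.send(message.upper())
    conn.close()

def ask_child(message):
    """Send a message to a child process over a Pipe and return its reply."""
    parent_conn, child_conn = Pipe()  # two connected endpoints
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send(message)
    reply = parent_conn.recv()
    p.join()
    return reply

if __name__ == '__main__':
    print(ask_child('hello'))
```

A Pipe connects exactly two endpoints, whereas a Queue can be shared by many producers and consumers.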
Under Unix/Linux, the multiprocessing module wraps the fork() call, so we don't need to care about the details of fork(). Since Windows has no fork call, multiprocessing has to "emulate" the effect of fork: all Python objects of the parent process must be serialized with pickle and then passed to the child process. So if a multiprocessing call fails on Windows, first consider whether pickling failed.
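A quick way to check for a pickling problem is to try pickling the object yourself before handing it to multiprocessing. The helper below is a sketch; can_pickle is a made-up name, not part of any library.

```python
import pickle

def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

def square(x):
    return x * x

if __name__ == '__main__':
    print(can_pickle(square))             # module-level function: pickled by reference
    print(can_pickle(lambda x: x * x))    # lambda: cannot be looked up by name
    print(can_pickle(('test', 1, 2.0)))   # plain data
```

Functions are pickled by name rather than by value, which is why a target defined at module level works while a lambda or a nested function does not.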
Sharing data between processes:
Sometimes we need not only to pass data between processes but also to share data among several processes, that is, to use the same global variable. For example, why is the list printed by the following program empty?
from multiprocessing import Process, Manager
import os

# manager = Manager()
vip_list = []
# vip_list = manager.list()

def testFunc(cc):
    vip_list.append(cc)
    print('process id:', os.getpid())

if __name__ == '__main__':
    threads = []
    for ll in range(10):  # 10 processes, matching the 10 IDs in the output below
        t = Process(target=testFunc, args=(ll,))
        t.daemon = True
        threads.append(t)
    for i in range(len(threads)):
        threads[i].start()
    for j in range(len(threads)):
        threads[j].join()
    print("------------------------")
    print('process id:', os.getpid())
    print(vip_list)

# process id: 9436
# process id: 11120
# process id: 10636
# process id: 1380
# process id: 10976
# process id: 10708
# process id: 2524
# process id: 9392
# process id: 10060
# process id: 8516
# ------------------------
# process id: 9836
# []
If you understand Python's multithreading model and the GIL, and then the principles of multithreading and multiprocessing, the question above is not hard to answer. If you don't, that's fine: run the code above and you will see what the problem is. The reason is that memory is independent between processes, so each child process appends to its own copy of the list.
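To see the difference in a small, self-contained form, the sketch below (an illustration with made-up function names) appends to a plain list and to a Manager().list() from child processes. The plain list in the parent stays empty because each child modifies only its own copy, while the managed list is updated through a proxy.

```python
from multiprocessing import Manager, Process

def append_item(target, item):
    """Runs in a child process: append item to the list-like object it was given."""
    target.append(item)

def collect(target, n=3):
    """Start n children that each append one number to target, then wait for them."""
    procs = [Process(target=append_item, args=(target, i)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == '__main__':
    plain = []
    collect(plain)
    print(plain)               # the parent's own list is untouched

    with Manager() as manager:
        shared = manager.list()
        collect(shared)
        print(sorted(shared))  # every child's append is visible here
```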
As mentioned above, in concurrent programming, it is best to avoid sharing state as much as possible. This is especially true when working with multiple processes. However, if you do need to use some shared data, there are two ways to deal with it.
① Shared Memory:
Data can be stored in a shared memory map using Value or Array. For example:
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()
    print(num.value)
    print(arr[:])

# 3.1415927
# [0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
The 'd' and 'i' arguments used when creating num and arr are typecodes of the kind used by the array module: 'd' indicates a double-precision floating-point number and 'i' indicates a signed integer. These shared objects will be process- and thread-safe. For more flexibility in using shared memory, you can use the multiprocessing.sharedctypes module, which supports the creation of arbitrary ctypes objects allocated from shared memory.
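One caveat is worth illustrating: a read-modify-write such as counter.value += 1 is two separate operations, so the Python documentation recommends holding the lock returned by get_lock() around it. A sketch, with function names chosen for this example:

```python
from multiprocessing import Process, Value

def add_many(counter, times):
    """Increment the shared counter `times` times, holding its lock each time."""
    for _ in range(times):
        with counter.get_lock():  # make the read-modify-write atomic
            counter.value += 1

def count_with_processes(n_procs=4, times=1000):
    """Let several processes increment one shared integer and return the total."""
    counter = Value('i', 0)       # 'i' = signed integer stored in shared memory
    procs = [Process(target=add_many, args=(counter, times)) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == '__main__':
    print(count_with_processes())
```

With the lock held, the final value always equals n_procs * times; without it, some increments could be lost when two processes update the value at the same moment.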
② Server process:
Manager() returns a manager object that controls a server process, which holds Python objects and allows other processes to manipulate them using proxies. A manager returned by Manager() supports the types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value, and Array. For example:
from multiprocessing import Process, Manager

def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    manager = Manager()
    d = manager.dict()
    l = manager.list(range(10))
    p = Process(target=f, args=(d, l))
    p.start()
    p.join()
    print(d)
    print(l)

# {0.25: None, 1: '1', '2': 2}
# [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
Server process managers are more flexible than shared memory objects because they can be made to support arbitrary object types. In addition, a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory.
Summary
Under Unix/Linux, you can use the fork() call to implement multiple processes.
To implement multiprocessing across platforms, you can use the multiprocessing module.
Inter-process communication is achieved through Queue (among multiple processes), Pipes (between two processes), and so on.
A small supplementary point: a parent process starts a child process, and that child starts a child of its own. If the child process is killed, will its own child be killed too?
import time
from multiprocessing import Process
import os

def count_process():
    import psutil
    pids = psutil.pids()
    print(len(pids))

def test3():
    count_process()
    for i in range(10):  # loop count inferred from the output below
        print("test3 %s" % os.getpid())
        time.sleep(0.5)

def test1():
    print("test1 %s" % os.getpid())
    p2 = Process(target=test3, name="protest2")
    p2.start()
    p2.join()

if __name__ == '__main__':
    count_process()
    p1 = Process(target=test1, name="protest1")
    p1.start()
    time.sleep(2)
    p1.terminate()
    time.sleep(2)
    count_process()
    for i in range(10):
        print(i)
        time.sleep(1)

# 86
# test1 9500
# 88
# test3 3964
# test3 3964
# test3 3964
# test3 3964
# test3 3964
# test3 3964
# test3 3964
# test3 3964
# 87
# 0
# test3 3964
# test3 3964
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
The mental journey of child process