In-depth analysis of multi-process, multi-threading, and co-routines in Python

Source: Internet
Author: User
History of processes and Threads

We all know that computers are made up of hardware and software. The CPU is the core of the hardware and carries out all of the computer's work, while the operating system is the software running on top of the hardware: it is the computer's manager, responsible for allocating resources and scheduling tasks. A program is software that runs on the system and provides some function, such as a browser or a music player. Each run of a program accomplishes something, such as a browser opening a web page. To keep each running program independent, the operating system uses a dedicated data structure, the process control block (PCB), to manage and control its execution. A process is a dynamic execution of a program over a data set, and generally consists of three parts: the program, the data set, and the process control block. The program describes what the process is to accomplish and how; the data set is the resources the program needs during execution; and the process control block records the external characteristics of the process and describes its execution state, so that the process can be controlled and managed. The PCB is the only sign of a process's existence.

In early operating systems the computer had a single core, the process was the smallest unit of program execution, and task scheduling used preemptive time-slice round-robin to dispatch processes. Each process has its own separate block of memory, which guarantees the isolation of memory address spaces between processes. As computer technology developed, the drawbacks of processes became apparent: first, creating, destroying, and switching processes is relatively expensive; second, with the rise of symmetric multiprocessing (SMP, a computer architecture in which a group of processors share the memory subsystem and bus), the overhead of multi-process parallelism was too large to exploit multiple execution units efficiently. The concept of the thread was introduced at this point. A thread, also called a lightweight process, is a basic CPU execution unit and the smallest unit of program execution, consisting of a thread ID, a program counter, a register set, and a stack. Introducing threads reduced the overhead of concurrent programs and improved the concurrency of the operating system. Threads own no system resources of their own, only what is essential at run time, but they can share the other resources owned by the process with the other threads of the same process.

The relationship between a process and a thread

Threads belong to a process and run within the process's address space. Threads created by the same process share the same memory space, and when the process exits, its threads are forcibly terminated and cleaned up. A thread can share all the resources owned by the process with the other threads of that process, but it owns essentially no system resources itself, only the bits of state that are essential at run time (a program counter, a set of registers, and a stack).
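As a small illustration of this shared memory space, here is a minimal sketch (names are illustrative): several threads write into the same Python list object belonging to their process, and the main thread sees every write.

```python
import threading

# one slot per thread; the list lives in the process's shared memory space
shared = [0] * 4

def worker(idx):
    # each thread writes directly into the same list object
    shared[idx] = idx * 10

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared)  # [0, 10, 20, 30] -- every thread's write landed in the shared list
```

Because each thread writes to its own slot, the result is deterministic even without a lock; shared *mutation* of the same slot would need the locks discussed below.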

Python threads

The threading module provides thread-related operations; a thread is the smallest unit of work in an application.

1. Threading Module

The threading module is built on top of the _thread module. The _thread module handles and controls threads in a low-level, primitive way, while the threading module wraps it a second time to provide a more convenient API for working with threads.

import threading
import time

def worker(num):
    """Thread worker function"""
    time.sleep(1)
    print("The num is %d" % num)
    return

for i in range(20):
    t = threading.Thread(target=worker, args=(i,), name="t.%d" % i)
    t.start()

The code above creates 20 "foreground" threads and hands control to the CPU, which schedules them according to its algorithm and executes their instructions in time slices.

Thread Method Description

t.start(): activates the thread

t.getName(): gets the name of the thread

t.setName(): sets the name of the thread

t.name: gets or sets the name of the thread

t.is_alive(): determines whether the thread is active

t.isAlive(): determines whether the thread is active (old alias of is_alive())

t.setDaemon(): sets whether the thread is a daemon (background) thread or a foreground thread (default False). It takes a Boolean value and must be called before start(). If it is a background thread, the background thread runs while the main thread runs, and once the main thread finishes, the background thread stops whether it has finished or not; if it is a foreground thread, the main thread waits for the foreground thread to finish executing before the program stops.

t.isDaemon(): determines whether the thread is a daemon thread

t.ident: gets the identifier of the thread, a nonzero integer that is only valid after start() has been called; before that it is None.

t.join(): blocks the calling thread until this thread finishes. Calling it immediately after each start() serializes the threads and defeats the purpose of multithreading.

t.run(): the run method of the thread object, executed automatically once the thread is scheduled by the CPU
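To tie several of these methods together, here is a minimal sketch (thread names and sleep times are illustrative) showing name, is_alive(), the daemon flag, and join():

```python
import threading
import time

def background():
    # loops forever; only a daemon thread may be abandoned at interpreter exit
    while True:
        time.sleep(0.1)

def foreground():
    time.sleep(0.2)
    print("foreground done")

d = threading.Thread(target=background, name="bg")
d.daemon = True  # equivalent to d.setDaemon(True); must be set before start()
f = threading.Thread(target=foreground, name="fg")

d.start()
f.start()
print(d.name, d.is_alive(), d.daemon)  # bg True True
f.join()  # block until the foreground thread finishes
# the daemon thread is still running here, but it will not keep the process alive
```

When the main thread reaches the end, the program exits and the daemon thread is discarded, while the explicit join() guaranteed the foreground thread ran to completion first.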

2. Thread locks: threading.Lock and threading.RLock

Because threads are scheduled randomly, a thread may execute only n instructions before the CPU switches to another thread, which can cause the following problem. To guarantee the consistency of the data, the concept of a lock was introduced.

Example: suppose all the elements of list A are 0. If one thread prints the elements of the list from front to back while another thread modifies the elements from back to front to 1, the output will show some elements as 0 and some as 1 — inconsistent data. The presence of a lock solves this problem.

import threading
import time

globals_num = 0
lock = threading.RLock()

def func():
    lock.acquire()  # acquire the lock
    global globals_num
    globals_num += 1
    time.sleep(1)
    print(globals_num)
    lock.release()  # release the lock

for i in range(10):
    t = threading.Thread(target=func)
    t.start()

3. The difference between threading.Lock and threading.RLock

RLock may be acquired multiple times by the same thread, but Lock does not allow this. If you use RLock, acquire and release must appear in pairs: if acquire is called n times, release must also be called n times before the occupied lock is truly released.

import threading

lock = threading.Lock()  # Lock object
lock.acquire()
lock.acquire()  # deadlock: the same thread blocks on the lock it already holds
lock.release()
lock.release()

import threading

rLock = threading.RLock()  # RLock object
rLock.acquire()
rLock.acquire()  # within the same thread, the program does not block
rLock.release()
rLock.release()

4. threading.Event

Python thread events are used by the main thread to control the execution of other threads. An event provides three methods: set, wait, and clear.

Event handling mechanism: a global "flag" is maintained. If the flag is False, then event.wait() blocks when the program reaches it; if the flag is True, event.wait() no longer blocks.

clear: sets the flag to False
set: sets the flag to True
event.isSet(): determines whether the flag is True.

import threading

def do(event):
    print('start')
    event.wait()
    print('execute')

event_obj = threading.Event()
for i in range(10):
    t = threading.Thread(target=do, args=(event_obj,))
    t.start()

event_obj.clear()
inp = input('input: ')
if inp == 'true':
    event_obj.set()

When the threads run, they block while the flag is False and stop blocking once the flag is True.

5. threading.Condition

A condition variable is always associated with some kind of lock; one can be passed in, or one is created by default (passing one in is useful when several condition variables must share the same lock). The lock is part of the Condition object, so there is no need to track it separately.

Condition variables obey the context management protocol: a with statement acquires the associated lock for the duration of the enclosed block. The acquire() and release() methods also call the corresponding methods of the underlying lock.

The other methods must be called with the lock held. The wait() method releases the lock and then blocks until another thread wakes it with notify() or notify_all(). Once awakened, wait() reacquires the lock and returns.

The Condition class implements condition variables. A condition variable allows one or more threads to wait until they are notified by another thread. If the lock argument is given and not None, it must be a Lock or RLock object, and it is used as the underlying lock; otherwise a new RLock object is created as the underlying lock.

wait(timeout=None): waits for a notification, or until the optional timeout expires. If the calling thread does not hold the lock when wait() is called, a RuntimeError is raised. After wait() releases the lock, it blocks until it is awakened by a notify() or notify_all() call for the same condition variable in another thread, or until the optional timeout expires.

If any threads are waiting, the notify() method wakes up one thread waiting on the condition variable; notify_all() wakes up all threads waiting on the condition variable.

Note: notify() and notify_all() do not release the lock. The awakened threads therefore do not return from their wait() calls immediately, but only once the thread that called notify() or notify_all() finally relinquishes ownership of the lock.

In the typical design style, a condition variable is used together with a lock to synchronize access to some shared state: threads that want a particular state call wait() repeatedly until they see it, while threads that modify the state call notify() or notify_all() when they change it in a way that could be the desired state for one of the waiters. Example: the producer-consumer model.

import threading
import time

def consumer(cond):
    with cond:
        print("consumer before wait")
        cond.wait()
        print("consumer after wait")

def producer(cond):
    with cond:
        print("producer before notifyAll")
        cond.notifyAll()
        print("producer after notifyAll")

condition = threading.Condition()
c1 = threading.Thread(name="c1", target=consumer, args=(condition,))
c2 = threading.Thread(name="c2", target=consumer, args=(condition,))
p = threading.Thread(name="p", target=producer, args=(condition,))

c1.start()
time.sleep(2)
c2.start()
time.sleep(2)
p.start()

6. Queue Module

A queue is a first-in, first-out data structure, and the queues in the queue module are thread-safe.

For example, suppose we go to McDonald's for dinner. There is a cook in the kitchen, and the front counter is responsible for passing the food from the kitchen to the customers, who collect their food there. The front counter here is the equivalent of our queue: it forms a pipeline through which the cook's food reaches the customer — a so-called one-way queue.

This model is also called the producer-consumer model.

import queue

q = queue.Queue(maxsize=0)  # constructs a FIFO (first-in, first-out) queue; maxsize specifies the queue length, and 0 means the length is unlimited
q.join()  # blocks until all items in the queue have been fetched and processed, then continues with other operations
q.qsize()  # returns the size of the queue (unreliable)
q.empty()  # returns True if the queue is empty, otherwise False (unreliable)
q.full()  # returns True if the queue is full, otherwise False (unreliable)
q.put(item, block=True, timeout=None)  # puts item at the tail of the queue; item is required. block defaults to True, meaning that when the queue is full the call waits for a slot to become available; when block is False the call does not block, and queue.Full is raised if the queue is full at that moment. The optional timeout sets how long to block: if no slot for item becomes available in that time, queue.Full is raised.

q.get(block=True, timeout=None)  # removes and returns the item at the head of the queue. block defaults to True, meaning that if the queue is empty the call blocks; when block is False it does not block, and queue.Empty is raised if the queue is empty at that moment. The optional timeout sets how long to block: if no item arrives in that time, queue.Empty is raised.

q.put_nowait(item)  # equivalent to put(item, block=False)
q.get_nowait()  # equivalent to get(block=False)
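A quick sketch of these methods in action (item values are illustrative): filling a bounded queue, triggering queue.Full with put_nowait(), and draining in FIFO order.

```python
import queue

q = queue.Queue(maxsize=2)  # FIFO queue holding at most two items
q.put("a")
q.put("b")
print(q.full())  # True

try:
    q.put_nowait("c")  # queue is full, so this raises immediately instead of blocking
except queue.Full:
    print("queue.Full raised")

print(q.get())         # 'a' comes out first: first in, first out
print(q.get_nowait())  # 'b'
print(q.empty())       # True
```

Using the _nowait variants in examples avoids the blocking behavior that put() and get() default to.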

The code is as follows:

#!/usr/bin/env python
import queue
import threading

message = queue.Queue(10)

def producer(i):
    while True:
        message.put(i)

def consumer(i):
    while True:
        msg = message.get()

for i in range(10):
    t = threading.Thread(target=producer, args=(i,))
    t.start()

for i in range(10):
    t = threading.Thread(target=consumer, args=(i,))
    t.start()

Then let's build a thread pool ourselves:

Method One

# simply put the Thread class into a queue
import threading
import time
import queue

class ThreadingPool():
    def __init__(self, max_num=10):
        self.queue = queue.Queue(max_num)
        for i in range(max_num):
            self.queue.put(threading.Thread)

    def getThreading(self):
        return self.queue.get()

    def addThreading(self):
        self.queue.put(threading.Thread)

def func(p, i):
    time.sleep(1)
    print(i)
    p.addThreading()

if __name__ == "__main__":
    p = ThreadingPool()
    for i in range(20):
        thread = p.getThreading()
        t = thread(target=func, args=(p, i))
        t.start()

Method Two

# keep adding tasks to the queue without limit
import queue
import threading
import contextlib
import time

StopEvent = object()

class ThreadPool(object):
    def __init__(self, max_num):
        self.q = queue.Queue()
        self.max_num = max_num
        self.terminal = False
        self.generate_list = []
        self.free_list = []

    def run(self, func, args, callback=None):
        """
        Submit a task to the thread pool.
        :param func: the task function
        :param args: the arguments the task function needs
        :param callback: callback invoked after the task fails or succeeds; it takes two
            arguments: 1. the task's execution status; 2. the task's return value
            (default None, i.e. no callback is executed)
        :return: None
        """
        if len(self.free_list) == 0 and len(self.generate_list) < self.max_num:
            self.generate_thread()
        w = (func, args, callback,)
        self.q.put(w)

    def generate_thread(self):
        """Create a worker thread."""
        t = threading.Thread(target=self.call)
        t.start()

    def call(self):
        """Repeatedly fetch a task from the queue and execute it."""
        current_thread = threading.currentThread()
        self.generate_list.append(current_thread)

        event = self.q.get()  # fetch a task
        while event != StopEvent:  # check whether it is the stop sentinel
            func, arguments, callback = event  # unpack the tuple: function, arguments, callback
            try:
                result = func(*arguments)  # execute the function
                status = True
            except Exception as e:  # function execution failed
                status = False
                result = e
            if callback is not None:
                try:
                    callback(status, result)
                except Exception as e:
                    pass
            with self.work_state():
                event = self.q.get()
        else:
            self.generate_list.remove(current_thread)

    def close(self):
        """Close the pool gracefully by sending one stop sentinel per worker thread.
        :return: None
        """
        for i in range(len(self.generate_list)):
            self.q.put(StopEvent)

    def terminate(self):
        """Close the worker threads abruptly.
        :return: None
        """
        self.terminal = True
        while self.generate_list:
            self.q.put(StopEvent)
        self.q.empty()

    @contextlib.contextmanager
    def work_state(self):
        self.free_list.append(threading.currentThread())
        try:
            yield
        finally:
            self.free_list.remove(threading.currentThread())

def work(i):
    print(i)
    return i + 1  # the return value is passed to the callback

def callback(status, ret):
    print(ret)

pool = ThreadPool(10)
for item in range(50):
    pool.run(func=work, args=(item,), callback=callback)
pool.terminate()
# pool.close()

Python process

multiprocessing is Python's multi-process management package, similar to threading.Thread.

1. Multiprocessing Module

By using subprocesses instead of threads, thereby sidestepping the GIL, the multiprocessing module lets programmers take full advantage of all the CPUs on a given machine. In multiprocessing, you create a Process object and then call its start() method:

from multiprocessing import Process

def func(name):
    print('hello', name)

if __name__ == "__main__":
    p = Process(target=func, args=('zhangyanlin',))
    p.start()
    p.join()  # wait for the process to finish

When doing concurrent design, it is best to avoid sharing data as much as possible, especially between processes. If you really need to share data, multiprocessing provides two ways.

(1) multiprocessing.Value and multiprocessing.Array

Data can be stored in a shared memory map using Value or Array, as follows:

from multiprocessing import Array, Value, Process

def func(a, b):
    a.value = 3.1415927
    for i in range(len(b)):
        b[i] = -b[i]

if __name__ == "__main__":
    num = Value('d', 0.0)
    arr = Array('i', range(10))
    p = Process(target=func, args=(num, arr))
    p.start()
    p.join()
    print(num.value)
    print(arr[:])

Output:
3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]

When num and arr are created, the 'd' and 'i' arguments are typecodes of the kind used by the array module: 'd' indicates a double-precision float and 'i' indicates a signed integer. These shared objects are process- and thread-safe.

Possible typecodes for the 'i' argument in Array('i', range(10)):

'c': ctypes.c_char   'u': ctypes.c_wchar  'b': ctypes.c_byte   'B': ctypes.c_ubyte
'h': ctypes.c_short  'H': ctypes.c_ushort 'i': ctypes.c_int    'I': ctypes.c_uint
'l': ctypes.c_long   'L': ctypes.c_ulong  'f': ctypes.c_float  'd': ctypes.c_double

(2) multiprocessing.Manager

The manager object returned by Manager() supports the types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array.

from multiprocessing import Process, Manager

def f(d, l):
    d["name"] = "zhangyanlin"
    d["age"] = 18  # example value
    d["job"] = "pythoner"
    l.reverse()

if __name__ == "__main__":
    with Manager() as man:
        d = man.dict()
        l = man.list(range(10))
        p = Process(target=f, args=(d, l))
        p.start()
        p.join()
        print(d)
        print(l)

Output: {'name': 'zhangyanlin', 'age': 18, 'job': 'pythoner'} [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

The server-process manager is more flexible than shared memory because it can support arbitrary object types. A single manager can also be shared by processes on different computers over a network, but it is slower than shared memory.

2. Process pools (using a pool of workers)

The Pool class represents a pool of worker processes; it offers several different ways to offload tasks to the worker processes.

A process pool internally maintains a sequence of processes. When used, a process is fetched from the pool; if no process in the pool sequence is available, the program waits until one becomes available.

We can create a process pool with the Pool class and submit tasks to it. Example:

# apply
from multiprocessing import Pool
import time

def f1(i):
    time.sleep(0.5)
    print(i)
    return i + 100

if __name__ == "__main__":
    pool = Pool(5)
    for i in range(1, 31):
        pool.apply(func=f1, args=(i,))

# apply_async
def f1(i):
    time.sleep(0.5)
    print(i)
    return i + 100

def f2(arg):
    print(arg)

if __name__ == "__main__":
    pool = Pool(5)
    for i in range(1, 31):
        pool.apply_async(func=f1, args=(i,), callback=f2)
    pool.close()
    pool.join()

A Pool object controls the pool of worker processes to which work can be submitted; it supports asynchronous results with timeouts and callbacks, and has a parallel map implementation.

processes: the number of worker processes to use; if processes is None, the number returned by os.cpu_count() is used.
initializer: if initializer is not None, each worker process calls initializer(*initargs) when it starts.
maxtasksperchild: the number of tasks a worker process can complete before it exits and is replaced with a fresh worker process, allowing unused resources to be freed. maxtasksperchild defaults to None, meaning worker processes live as long as the pool.

context: can be used to specify the context in which the worker processes are started. A pool is usually created with multiprocessing.Pool() or the Pool() method of a context object; both methods set the context appropriately.

Note: the methods of a Pool object may only be called by the process that created the pool.

New in version 3.2: maxtasksperchild
New in version 3.4: context
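A small sketch of maxtasksperchild (the process and task counts are illustrative): with maxtasksperchild=1 every worker exits after a single task and is replaced, so each task runs in a process with a different pid.

```python
import os
from multiprocessing import Pool

def whoami(i):
    # each task reports the pid of the worker process that ran it
    return os.getpid()

if __name__ == "__main__":
    # each worker is retired after completing one task and replaced by a fresh process
    with Pool(processes=2, maxtasksperchild=1) as pool:
        pids = pool.map(whoami, range(4), chunksize=1)
    print(len(set(pids)))  # 4 -- every task ran in a different worker process
```

With the default maxtasksperchild=None, the same two worker pids would appear repeatedly instead.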

Method of the process pool

Apply (func[, args[, Kwds]): The Func function is called with the ARG and Kwds parameters, and the result is blocked until it returns, for this reason, Apply_async () is more suitable for concurrent execution, and the Func function is only run by one process in the pool.

apply_async(func[, args[, kwds[, callback[, error_callback]]]]): a variant of apply() that returns a result object. If callback is specified, it should be a callable accepting a single argument; when the result becomes ready, callback is applied to it, unless the call failed, in which case error_callback is applied instead. Callbacks should complete immediately, since otherwise the thread handling the results will be blocked.

close(): prevents any more tasks from being submitted to the pool; the worker processes exit once all pending tasks have completed.

terminate(): stops the worker processes immediately, whether or not their tasks are complete. When the pool object is garbage collected, terminate() is called immediately.

join(): waits for the worker processes to exit. close() or terminate() must be called before join(). This is because a terminated process needs to be waited on by its parent (join is equivalent to wait); otherwise the process becomes a zombie.

map(func, iterable[, chunksize])
map_async(func, iterable[, chunksize[, callback[, error_callback]]])
imap(func, iterable[, chunksize])
imap_unordered(func, iterable[, chunksize])
starmap(func, iterable[, chunksize])
starmap_async(func, iterable[, chunksize[, callback[, error_callback]]])
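A short sketch of the map family (the functions and inputs are illustrative): map() preserves input order, imap_unordered() yields results as they arrive, and starmap() unpacks each argument tuple.

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        ordered = pool.map(square, range(6))                       # order preserved
        unordered = sorted(pool.imap_unordered(square, range(6)))  # arrival order varies, so sort
        pairs = pool.starmap(pow, [(2, 3), (3, 2)])                # unpacks each tuple as arguments
    print(ordered)  # [0, 1, 4, 9, 16, 25]
    print(pairs)    # [8, 9]
```

The _async variants return result objects immediately instead of blocking, in the same way apply_async() relates to apply().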

Coroutines

Operations on threads and processes are requests the program makes through system interfaces, with the system as the ultimate executor; operations on coroutines are carried out by the programmer within the program itself.

The point of coroutines: in a multi-threaded application, the CPU switches between threads by time slicing, and each switch costs time (saving state, then restoring it to continue). With coroutines, only one thread is used, and the execution order of the code blocks is specified within that one thread.

Application scenario: when a program contains a large number of operations that do not need the CPU (IO), it is a good fit for coroutines.

The event loop is the control point of coroutine execution; if you want to run coroutines, you need to use it.

The event loop provides the following features:

• Register, execute, and cancel deferred calls (asynchronous functions)
• Create client and server protocols (tools) for communication
• Create sub-processes and protocols (tools) to communicate with other programs
• Send function calls into the thread pool

Example of a coroutine:

import asyncio

async def cor1():
    print("COR1 start")
    await cor2()
    print("COR1 end")

async def cor2():
    print("COR2")

loop = asyncio.get_event_loop()
loop.run_until_complete(cor1())
loop.close()

The last three lines are the focus.

asyncio.get_event_loop(): gets asyncio's default event loop
run_until_complete(): blocks until all the asynchronous functions have finished executing
close(): closes the event loop.
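To show the event loop actually interleaving coroutines, here is a minimal sketch (the names and delays are illustrative) using asyncio.gather(); asyncio.run() is shorthand for the get_event_loop / run_until_complete / close sequence described above.

```python
import asyncio

async def fetch(name, delay):
    # simulated non-blocking IO: the loop runs other coroutines while this one waits
    await asyncio.sleep(delay)
    return name

async def main():
    # both coroutines wait concurrently, so the total time is ~0.2s rather than 0.3s
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1))

results = asyncio.run(main())
print(results)  # ['a', 'b'] -- gather returns results in argument order
```

Even though "b" finishes first, gather() preserves the order in which the awaitables were passed.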

1. greenlet

import greenlet

def fun1():
    print("fun1 step 1")
    gr2.switch()
    print("fun1 step 3")
    gr2.switch()

def fun2():
    print("fun2 step 2")
    gr1.switch()
    print("fun2 step 4")

gr1 = greenlet.greenlet(fun1)
gr2 = greenlet.greenlet(fun2)
gr1.switch()

2. gevent

gevent is a third-party module; download and install the package first:

pip3 install --upgrade pip
pip3 install gevent

import gevent

def fun1():
    print("www.baidu.com")  # step 1
    gevent.sleep(0)
    print("end the baidu.com")  # step 3

def fun2():
    print("www.zhihu.com")  # step 2
    gevent.sleep(0)
    print("end the zhihu.com")  # step 4

gevent.joinall([
    gevent.spawn(fun1),
    gevent.spawn(fun2),
])

Switching automatically when an IO operation is encountered:

from gevent import monkey
monkey.patch_all()  # patch the standard library so blocking IO yields to other greenlets

import gevent
import requests

def func(url):
    print("get: %s" % url)
    gevent.sleep(0)
    date = requests.get(url)
    ret = date.text
    print(url, len(ret))

gevent.joinall([
    gevent.spawn(func, 'https://www.python.org/'),
    gevent.spawn(func, 'https://www.yahoo.com/'),
    gevent.spawn(func, 'https://github.com/'),
])

That concludes this in-depth look at multi-processing, multi-threading, and coroutines in Python. I hope it has been of some help; if you have any questions, please leave me a message and I will reply promptly. Thank you very much for supporting the Scripting House website!
