Detailed introduction to Python processes, threads, and coroutines

Source: Internet
Author: User
History of processes and threads

We all know that computers are composed of hardware and software. The CPU is the core of the hardware and carries out all of the computer's work. The operating system is software that runs on the hardware and manages the computer: it is responsible for resource management, allocation, and task scheduling. A program is software with a specific function that runs on the operating system, such as a browser or a music player. Each execution of a program accomplishes some work, for example a browser opening a web page. To guarantee the independence of each execution, a dedicated data structure is needed to manage and control the executing program: the process control block. A process is the dynamic execution of a program over a data set. A process generally consists of three parts: the program, the data set, and the process control block. The program we write describes which functions the process performs and how; the data set is the resources the program needs while executing; and the process control block records the external characteristics of the process and describes how its execution changes over time. The system uses it to control and manage processes, and it is the sole sign by which the system knows that a process exists.

In early operating systems the computer had a single core, the process was the smallest unit of program execution, and task scheduling used preemptive time-slice round-robin. Each process had its own independent memory, which guaranteed isolation of the memory address spaces of different processes. As computer technology developed, the drawbacks of processes became apparent. First, the overhead of creating, destroying, and switching processes is relatively high. Second, with the appearance of symmetric multiprocessing (SMP), in which a group of processors (multiple CPUs) in one computer share the memory subsystem and bus structure, multiple units of execution were needed to run in parallel, and the parallel overhead of multiple processes was too large. This led to the concept of the thread. A thread, also called a lightweight process, is the basic unit of CPU execution and the smallest unit in the flow of program execution; it consists of a thread ID, a program counter, a register set, and a stack. Introducing threads reduced the overhead of concurrent program execution and improved the concurrency of the operating system. A thread owns no system resources of its own, only what is essential for running, but it can share all of the other resources owned by its process with the process's other threads.

Relationship between processes and threads

A thread belongs to a process and runs within the process's address space. Threads created by the same process share the same memory space, and when a process exits, all threads it created are forcibly terminated and cleaned up. A thread can share all of the process's resources with the other threads of the same process, but it owns essentially no system resources itself, only the small amount of state it needs while running (such as a program counter, a set of registers, and a stack).

Python thread

The threading module provides thread-related operations; a thread is the smallest unit of work within an application.

1. threading module

The threading module is built on top of the _thread module. The _thread module handles and controls threads in a low-level, primitive way, while the threading module wraps it in a second layer and provides a more convenient API for working with threads.

import threading
import time

def worker(num):
    """
    thread worker function
    :return:
    """
    time.sleep(1)
    print("The num is %d" % num)
    return

for i in range(20):
    t = threading.Thread(target=worker, args=(i,), name="t.%d" % i)
    t.start()

The code above creates 20 "foreground" threads and then hands control to the CPU, which schedules them and executes their instructions piecewise according to its scheduling algorithm.

Thread method description

t.start(): activates (starts) the thread.

t.getName(): gets the thread name.

t.setName(): sets the thread name.

t.name: gets or sets the thread name.

t.is_alive(): returns whether the thread is alive.

t.isAlive(): older alias of is_alive().

t.setDaemon(): sets the thread as a daemon (background) thread or a foreground thread with a Boolean value (default: False); it must be called before start(). If the thread is a daemon thread, it runs while the main thread executes, and once the main thread finishes, the daemon thread is stopped whether or not it has completed its work. If it is a foreground thread, it also runs while the main thread executes, but the program only exits after the foreground thread has finished.

t.isDaemon(): returns whether the thread is a daemon thread.

t.ident: gets the thread identifier, a non-zero integer. This attribute is only valid after the start() method has been called; otherwise it returns None.

t.join(): blocks the calling thread until this thread finishes, then execution continues. Calling join() immediately after each start() makes the threads run one by one and defeats the purpose of multithreading; see the short example after this list.

t.run(): the thread object's run method, executed automatically once the thread is scheduled by the CPU.
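A minimal sketch of how setDaemon() and join() interact, building on the worker() function above (the sleep duration and argument values are arbitrary choices for illustration):

import threading
import time

def worker(num):
    time.sleep(1)
    print("The num is %d" % num)

# Daemon thread: it is killed as soon as the main thread exits,
# so without a join() its print may never run.
d = threading.Thread(target=worker, args=(99,))
d.setDaemon(True)   # must be set before start()
d.start()

# Foreground (non-daemon) thread: the program waits for it to finish.
f = threading.Thread(target=worker, args=(1,), name="t.1")
f.start()
f.join()            # block here until f has finished
print("main thread done")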

2. threading.RLock and threading.Lock

Since threads are scheduled randomly, a thread may execute only a few instructions before the CPU switches to another thread. Without synchronization this can corrupt shared data, so the concept of a lock is introduced to guarantee data accuracy. For example, the following problem may occur:

Suppose every element of list A is 0. While one thread prints all the elements of the list from front to back, another thread modifies the elements from back to front, setting them to 1. The output then contains some 0s and some 1s, which is data inconsistency. Locks solve this problem.

import threading
import time

globals_num = 0

lock = threading.RLock()

def Func():
    lock.acquire()   # obtain the lock
    global globals_num
    globals_num += 1
    time.sleep(1)
    print(globals_num)
    lock.release()   # release the lock

for i in range(10):
    t = threading.Thread(target=Func)
    t.start()
3. differences between threading.RLock and threading.Lock

RLock allows acquire to be called multiple times from the same thread, while Lock does not. When using RLock, acquire and release must appear in pairs: if acquire has been called n times, release must also be called n times before the lock is actually freed.

import threading

lock = threading.Lock()      # Lock object
lock.acquire()
lock.acquire()               # the second acquire produces a deadlock
lock.release()
lock.release()

import threading

rLock = threading.RLock()    # RLock object
rLock.acquire()
rLock.acquire()              # within the same thread, the program is not blocked
rLock.release()
rLock.release()
4. threading.Event

Python thread events are used by the main thread to control the execution of other threads. An event mainly provides three methods: set, wait, and clear.

Event processing mechanism: a global flag is maintained. If the flag is False, event.wait() blocks when the program calls it; if the flag is True, event.wait() no longer blocks.

clear(): sets the flag to False.

set(): sets the flag to True.

event.isSet(): returns whether the flag is True.

import threading

def do(event):
    print('start')
    event.wait()
    print('execute')

event_obj = threading.Event()
for i in range(10):
    t = threading.Thread(target=do, args=(event_obj,))
    t.start()

event_obj.clear()
inp = input('input:')
if inp == 'true':
    event_obj.set()

When the threads run, if the flag is False they block at event.wait(); once the flag is set to True they are no longer blocked.

5. threading.Condition

A condition variable is always associated with some kind of lock, which can either be passed in or created by default. Passing one in is useful when several condition variables must share the same lock. The lock is part of the condition object, so there is no need to track it separately.

A condition variable obeys the context management protocol: using it in a with statement acquires the associated lock for the duration of the block. Its acquire() and release() methods call the corresponding methods of the associated lock.

The other methods must be called while holding the lock. The wait() method releases the lock and then blocks until another thread wakes it with notify() or notify_all(). Once awakened, wait() re-acquires the lock and returns.

The Condition class implements a condition variable. A condition variable allows one or more threads to wait until they are notified by another thread. If the lock argument is given and is not None, it must be a Lock or RLock object and is used as the underlying lock; otherwise a new RLock object is created and used as the underlying lock.

wait(timeout=None): waits for a notification, or until the optional timeout elapses. If the calling thread has not acquired the lock when wait() is called, a RuntimeError is raised. wait() releases the lock and then blocks until it is awakened by a notify() or notify_all() call for the same condition variable in another thread. A timeout value may also be given to wait.

If there are waiting threads, the notify() method wakes up one thread waiting on the condition variable. notify_all() wakes up all threads waiting on the condition variable.

Note: notify() and notify_all() do not release the lock, so the awakened threads do not return from their wait() call immediately; they only do so once the thread that called notify() or notify_all() finally gives up ownership of the lock.

In the typical design style, a condition variable is used together with a lock to synchronize access to some shared state: a thread repeatedly calls wait() until it sees the state it wants, while threads that modify the state call notify() or notify_all() when they change it. This way, waiting threads obtain the desired state as soon as possible. Example: the producer-consumer model.

import threading
import time

def consumer(cond):
    with cond:
        print("consumer before wait")
        cond.wait()
        print("consumer after wait")

def producer(cond):
    with cond:
        print("producer before notifyAll")
        cond.notifyAll()
        print("producer after notifyAll")

condition = threading.Condition()
c1 = threading.Thread(name="c1", target=consumer, args=(condition,))
c2 = threading.Thread(name="c2", target=consumer, args=(condition,))

p = threading.Thread(name="p", target=producer, args=(condition,))

c1.start()
time.sleep(2)
c2.start()
time.sleep(2)
p.start()
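The example above wakes the consumers unconditionally. The predicate-guarded pattern described earlier looks roughly like the following sketch, where the shared items list and the producer details are hypothetical additions for illustration:

import threading

items = []                      # hypothetical shared state
cond = threading.Condition()

def consumer():
    with cond:
        while not items:        # re-check the predicate after every wakeup
            cond.wait()
        print("consumed", items.pop())

def producer():
    with cond:
        items.append("data")
        cond.notify()           # wake one waiting consumer

threading.Thread(target=consumer).start()
threading.Thread(target=producer).start()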
6. queue module

The queue module implements queues, which are thread-safe.

For example, think of eating at McDonald's. The restaurant has a kitchen, and the counter is responsible for handing the food prepared in the kitchen to the customers, who pick up their finished meals at the counter. The counter is equivalent to our queue: it forms a pipeline in which the cook prepares a meal and it is passed to the customer through the counter. This is a so-called one-way queue.

This model is also called the producer-consumer model.

import queue

q = queue.Queue(maxsize=0)  # construct a FIFO (first-in, first-out) queue; maxsize=0 means the queue length is unlimited

q.join()      # block until all items in the queue have been gotten and marked done with task_done()
q.qsize()     # return the approximate size of the queue (not reliable)
q.empty()     # return True if the queue is empty, otherwise False (not reliable)
q.full()      # return True if the queue is full, otherwise False (not reliable)

q.put(item, block=True, timeout=None)
# Put item at the end of the queue; item is required. block defaults to True,
# meaning that when the queue is full the call waits until a slot becomes free;
# if block is False the call does not wait and a full queue raises queue.Full.
# The optional timeout sets how long to block; if no slot becomes available
# within that time, queue.Full is raised.

q.get(block=True, timeout=None)
# Remove and return the item at the head of the queue. block defaults to True,
# meaning that an empty queue makes the call wait; if block is False the call
# does not wait and an empty queue raises queue.Empty. The optional timeout sets
# how long to block; if the queue is still empty after that time, queue.Empty is raised.

q.put_nowait(item)  # equivalent to put(item, block=False)
q.get_nowait()      # equivalent to get(block=False)
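A minimal sketch of how q.join() and task_done() work together; the item values and the single worker thread are illustrative assumptions, not part of the original article:

import queue
import threading

q = queue.Queue()

def worker():
    while True:
        item = q.get()          # blocks until an item is available
        print("processing", item)
        q.task_done()           # tell the queue this item is finished

threading.Thread(target=worker, daemon=True).start()

for i in range(5):
    q.put(i)

q.join()                        # blocks until task_done() was called for every put item
print("all items processed")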

The code is as follows:

#!/usr/bin/env python
import queue
import threading

message = queue.Queue(10)

def producer(i):
    while True:
        message.put(i)

def consumer(i):
    while True:
        msg = message.get()

for i in range(12):
    t = threading.Thread(target=producer, args=(i,))
    t.start()

for i in range(10):
    t = threading.Thread(target=consumer, args=(i,))
    t.start()

Building a thread pool yourself:

# Build a thread pool yourself
import threading
import time
import queue

class Threadingpool():
    def __init__(self, max_num=10):
        self.queue = queue.Queue(max_num)
        for i in range(max_num):
            self.queue.put(threading.Thread)

    def getthreading(self):
        return self.queue.get()

    def addthreading(self):
        self.queue.put(threading.Thread)

def func(p, i):
    time.sleep(1)
    print(i)
    p.addthreading()

if __name__ == "__main__":
    p = Threadingpool()
    for i in range(20):
        thread = p.getthreading()
        t = thread(target=func, args=(p, i))
        t.start()
# Thread pool that accepts submitted tasks
import queue
import threading
import contextlib
import time

StopEvent = object()

class ThreadPool(object):

    def __init__(self, max_num):
        self.q = queue.Queue()
        self.max_num = max_num
        self.terminal = False
        self.generate_list = []
        self.free_list = []

    def run(self, func, args, callback=None):
        """
        Submit a task to the thread pool.
        :param func: the task function
        :param args: the arguments the task function needs
        :param callback: callback executed after the task fails or succeeds; it takes
                         two parameters: 1. the execution status of the task function,
                         2. the return value of the task function (default None, i.e. no callback)
        :return: True if the thread pool has already been terminated, otherwise None
        """
        if len(self.free_list) == 0 and len(self.generate_list) < self.max_num:
            self.generate_thread()
        w = (func, args, callback,)
        self.q.put(w)

    def generate_thread(self):
        """Create a worker thread."""
        t = threading.Thread(target=self.call)
        t.start()

    def call(self):
        """Loop: fetch a task and execute it."""
        current_thread = threading.currentThread()
        self.generate_list.append(current_thread)

        event = self.q.get()                    # fetch a task
        while event != StopEvent:               # stop when the sentinel object is received
            func, arguments, callback = event   # unpack the tuple: function, arguments, callback
            try:
                result = func(*arguments)       # execute the function
                status = True
            except Exception as e:              # the function failed
                status = False
                result = e
            if callback is not None:
                try:
                    callback(status, result)
                except Exception as e:
                    pass

            with self.work_state():
                event = self.q.get()
        else:
            self.generate_list.remove(current_thread)

    def close(self):
        """Shut down gracefully by sending one sentinel per worker thread."""
        for i in range(len(self.generate_list)):
            self.q.put(StopEvent)

    def terminate(self):
        """Shut the thread pool down immediately."""
        self.terminal = True
        while self.generate_list:
            self.q.put(StopEvent)
        self.q.empty()

    @contextlib.contextmanager
    def work_state(self):
        self.free_list.append(threading.currentThread())
        try:
            yield
        finally:
            self.free_list.remove(threading.currentThread())

def work(i):
    print(i)
    return i + 1             # returned to the callback function

def callback(status, result):
    print(status, result)

pool = ThreadPool(10)
for item in range(50):
    pool.run(func=work, args=(item,), callback=callback)

pool.terminate()
# pool.close()
Python process

multiprocessing is Python's multi-process management package. Its API is similar to threading.Thread.

1. multiprocessing module

It uses subprocesses instead of threads to sidestep the GIL, and because of this the multiprocessing module lets programmers take full advantage of all the CPUs on a given machine. In multiprocessing, a process is spawned by creating a Process object and then calling its start() method.

from multiprocessing import Process

def func(name):
    print('hello', name)

if __name__ == "__main__":
    p = Process(target=func, args=('hangyanlin',))
    p.start()
    p.join()   # wait for the process to complete

With concurrent designs it is best to avoid sharing data as much as possible, especially between processes. If you really do need to share data, multiprocessing provides two ways to do it.

(1) multiprocessing.Value and multiprocessing.Array

Data can be stored in a shared memory map using Value or Array, as shown below:

from multiprocessing import Array, Value, Process

def func(a, b):
    a.value = 3.1415927
    for i in range(len(b)):
        b[i] = -b[i]

if __name__ == "__main__":
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=func, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])

Output:

3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]

When num and arr are created, the 'd' and 'i' arguments are typecodes of the kind used by the array module: 'd' indicates a double-precision float and 'i' indicates a signed integer. These shared objects are process- and thread-safe.

The typecodes accepted by Array (as in Array('i', range(10))) are:

'c': ctypes.c_char     'u': ctypes.c_wchar
'b': ctypes.c_byte     'B': ctypes.c_ubyte
'h': ctypes.c_short    'H': ctypes.c_ushort
'i': ctypes.c_int      'I': ctypes.c_uint
'l': ctypes.c_long     'L': ctypes.c_ulong
'f': ctypes.c_float    'd': ctypes.c_double

(2) multiprocessing.Manager

The manager object returned by Manager() supports the types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array.

from multiprocessing import Process, Manager

def f(d, l):
    d["name"] = "zhangyanlin"
    d["age"] = 18
    d["Job"] = "pythoner"
    l.reverse()

if __name__ == "__main__":
    with Manager() as man:
        d = man.dict()
        l = man.list(range(10))

        p = Process(target=f, args=(d, l))
        p.start()
        p.join()

        print(d)
        print(l)


Output:

{'name': 'zhangyanlin', 'age': 18, 'Job': 'pythoner'}
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

A server process manager is more flexible than shared memory because it can support arbitrary object types. In addition, a single manager can be shared by processes on different computers over a network, although this is slower than shared memory.
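A minimal sketch of sharing a manager over the network using multiprocessing.managers.BaseManager; the host, port, authkey, and the get_queue name are illustrative assumptions:

# server side
from multiprocessing.managers import BaseManager
import queue

shared_q = queue.Queue()

class QueueManager(BaseManager):
    pass

QueueManager.register('get_queue', callable=lambda: shared_q)

if __name__ == "__main__":
    m = QueueManager(address=('', 50000), authkey=b'secret')   # assumed port and authkey
    s = m.get_server()
    s.serve_forever()

# client side (run in another process or on another machine)
# QueueManager.register('get_queue')
# m = QueueManager(address=('server-host', 50000), authkey=b'secret')
# m.connect()
# q = m.get_queue()
# q.put('hello over the network')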

2. process pool (Using a pool of workers)

The Pool class represents a pool of worker processes. It has methods that allow tasks to be offloaded to the worker processes in several different ways.

A sequence of processes is maintained inside the pool. When a task arrives, a process is taken from the pool; if no process in the pool is available, the program waits until one becomes free.

We can use the Pool class to create a process Pool and expand the submitted tasks to the process Pool. Example:

# apply
from multiprocessing import Pool
import time

def f1(i):
    time.sleep(0.5)
    print(i)
    return i + 100

if __name__ == "__main__":
    pool = Pool(5)
    for i in range(1, 31):
        pool.apply(func=f1, args=(i,))

# apply_async
def f1(i):
    time.sleep(0.5)
    print(i)
    return i + 100

def f2(arg):
    print(arg)

if __name__ == "__main__":
    pool = Pool(5)
    for i in range(1, 31):
        pool.apply_async(func=f1, args=(i,), callback=f2)
    pool.close()
    pool.join()

A Pool object controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation.

processes: the number of worker processes to use. If processes is None, the number returned by os.cpu_count() is used.

initializer: if initializer is not None, each worker process calls initializer(*initargs) when it starts.

maxtasksperchild: the number of tasks a worker process can complete before it exits and is replaced by a fresh worker process, so that unused resources are freed. The default maxtasksperchild is None, which means a worker process lives as long as the pool does.

context: specifies the context used for starting the worker processes. Usually a pool is created with multiprocessing.Pool() or with the Pool() method of a context object; in both cases the context is set appropriately. A short construction sketch appears after the version notes below.

Note: the method of the Pool object can only be called by the process that creates the pool.

New in version 3.2: maxtasksperchild

New in version 3.4: context
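A minimal sketch of constructing a Pool with these parameters; the initializer function, the worker function, and the parameter values are illustrative assumptions:

from multiprocessing import Pool
import os

def init_worker():
    # assumed initializer: runs once in each worker process when it starts
    print("worker %d started" % os.getpid())

def square(x):
    return x * x

if __name__ == "__main__":
    pool = Pool(processes=4,             # number of worker processes
                initializer=init_worker,
                initargs=(),
                maxtasksperchild=10)     # each worker is replaced after 10 tasks
    print(pool.map(square, range(20)))
    pool.close()
    pool.join()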

Method of process pool

apply(func[, args[, kwds]]): calls func with the arguments args and keyword arguments kwds and blocks until the result is ready. Because it blocks, apply_async() is better suited to performing work in parallel. Additionally, func is only executed in one of the workers of the pool.

apply_async(func[, args[, kwds[, callback[, error_callback]]]]): a variant of apply() that returns a result object. If callback is specified, it should accept a single argument; when the result becomes ready, callback is applied to it, unless the call failed, in which case error_callback is applied instead. Callbacks should complete immediately, otherwise the thread that handles results is blocked.

close(): prevents any more tasks from being submitted to the pool; once all submitted tasks have completed, the worker processes exit.

terminate(): stops the worker processes immediately, whether or not their tasks are complete. When the pool object is garbage collected, terminate() is called automatically.

join(): waits for the worker processes to exit. close() or terminate() must be called before join(). This is because terminated processes need to be waited on by the parent process (join is equivalent to wait); otherwise they become zombie processes.

map(func, iterable[, chunksize]): a parallel equivalent of the built-in map(); it blocks until the result is ready.

map_async(func, iterable[, chunksize[, callback[, error_callback]]]): a variant of map() that returns a result object instead of blocking.

imap(func, iterable[, chunksize]): a lazier version of map() that returns the results as an iterator.

imap_unordered(func, iterable[, chunksize]): like imap(), but the ordering of the results is arbitrary.

starmap(func, iterable[, chunksize]): like map(), except that the elements of iterable are unpacked as arguments to func.

starmap_async(func, iterable[, chunksize[, callback[, error_callback]]]): a combination of starmap() and map_async() that returns a result object. A usage sketch of the map-style methods follows this list.
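A minimal sketch of the map-style methods; the worker functions and input values are illustrative assumptions:

from multiprocessing import Pool

def square(x):
    return x * x

def add(a, b):
    return a + b

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(square, range(10)))          # blocks until all results are ready

        r = pool.map_async(square, range(10))       # returns an AsyncResult immediately
        print(r.get())                               # blocks here instead

        for v in pool.imap_unordered(square, range(10)):
            print(v)                                 # results arrive in arbitrary order

        print(pool.starmap(add, [(1, 2), (3, 4)]))   # unpacks each tuple as arguments: [3, 7]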

Python coroutine

Thread and process operations are triggered by the program through system interfaces, and the executor is the operating system; coroutine operations are controlled by the programmer.

The point of coroutines: in a multi-threaded application the CPU switches between threads by time slices, and switching threads costs time (the state must be saved and restored on the next switch). With coroutines only one thread is used, and the programmer specifies the order in which blocks of code execute within that thread.

Application scenario: coroutines are suitable when a program performs a large number of operations that do not need the CPU (I/O).

The event loop is the control point of coroutine execution; to run coroutines you need an event loop.

Event loop provides the following features:

Register, execute, and cancel delayed calls (asynchronous functions)

Create the transports and protocols used for communication between a client and a server

Create subprocesses and the transports and protocols used for communicating with other programs

Delegate function calls to a thread pool

Coroutine example:

import asyncio

async def cor1():
    print("COR1 start")
    await cor2()
    print("COR1 end")

async def cor2():
    print("COR2")

loop = asyncio.get_event_loop()
loop.run_until_complete(cor1())
loop.close()

The last three lines are important.

asyncio.get_event_loop(): gets asyncio's default event loop.

run_until_complete(): blocks until the given coroutine (and everything it awaits) has completed.

close(): closes the event loop. A sketch of running several coroutines concurrently on the loop follows.
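A minimal sketch of running several coroutines concurrently on one event loop with asyncio.gather; the task names and sleep times are illustrative assumptions:

import asyncio

async def task(name, delay):
    print(name, "start")
    await asyncio.sleep(delay)   # yields control back to the event loop
    print(name, "end")

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(
    task("a", 1),
    task("b", 1),
))
loop.close()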

1. greenlet
import greenlet

def fun1():
    print("12")
    gr2.switch()
    print("56")
    gr2.switch()

def fun2():
    print("34")
    gr1.switch()
    print("78")

gr1 = greenlet.greenlet(fun1)
gr2 = greenlet.greenlet(fun2)
gr1.switch()
2. gevent

gevent is a third-party module, so the package must be installed first:

pip3 install --upgrade pip
pip3 install gevent
import gevent

def fun1():
    print("www.baidu.com")       # step 1
    gevent.sleep(0)
    print("end the baidu.com")   # step 3

def fun2():
    print("www.zhihu.com")       # step 2
    gevent.sleep(0)
    print("end th zhihu.com")    # step 4

gevent.joinall([
    gevent.spawn(fun1),
    gevent.spawn(fun2),
])

Automatic switch upon IO operation:

import gevent
import requests

def func(url):
    print("get: %s" % url)
    gevent.sleep(0)
    date = requests.get(url)
    ret = date.text
    print(url, len(ret))

gevent.joinall([
    gevent.spawn(func, 'https://www.pythontab.com/'),
    gevent.spawn(func, 'https://www.yahoo.com/'),
    gevent.spawn(func, 'https://github.com/'),
])
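Note that as written, the example above only yields at gevent.sleep(0). For gevent to switch automatically on the blocking socket I/O performed by requests, the standard library usually has to be monkey-patched first. A minimal sketch, with the same illustrative URLs:

from gevent import monkey
monkey.patch_all()    # patch the standard library so socket I/O cooperates with gevent

import gevent
import requests

def func(url):
    print("get: %s" % url)
    ret = requests.get(url).text   # the blocking HTTP call now yields to other greenlets
    print(url, len(ret))

gevent.joinall([
    gevent.spawn(func, 'https://www.pythontab.com/'),
    gevent.spawn(func, 'https://www.yahoo.com/'),
])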

The above is a detailed introduction to Python processes, threads, and coroutines.
