Python Learning Record: Multiprocessing and Multithreading

1. Processes and Threads

Process

Narrow definition: a process is an instance of a computer program that is being executed.
Broad definition: a process is one run of a program, with a certain independent function, over a particular data set. It is the basic unit of dynamic execution in an operating system. In a traditional operating system, the process is both the basic unit of resource allocation and the basic unit of execution.

Thread

A thread is the smallest unit of execution that the operating system can schedule. It is contained in a process and is the unit that actually runs within the process. A thread is a single sequential flow of control in a process. A process can run multiple threads concurrently, each performing a different task.

Thread and process comparison

The differences between a thread and a process:
1) Address space and other resources (such as open files): processes are independent of one another, whereas threads of the same process share these resources. Threads within one process are not visible to other processes.
2) Communication: processes communicate through inter-process communication (IPC), whereas threads can communicate by directly reading and writing the process's data segment (such as global variables). This requires synchronization and mutual exclusion to keep the data consistent.
3) Creation: creating a new thread is cheap, whereas creating a new process requires cloning the parent process.
4) Scheduling and switching: a thread can control and operate on other threads in the same process, but a process can only operate on its child processes. Thread context switches are much faster than process context switches.
5) In a multi-threaded OS, the process is not an executable entity.

Note:
Threads and processes cannot be compared in terms of speed, because threads are contained in processes.
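
As a small illustration of point 1, the sketch below starts a few threads that all append to one module-level list and record the process ID they run in. The names `shared` and `worker` are made up for this example. Every thread should report the same process ID, because they all live inside the same process and share its memory.

```python
import os
import threading

shared = []  # lives in the process's memory, visible to all of its threads


def worker(tag):
    # Each thread records its tag and the ID of the process it runs in.
    shared.append((tag, os.getpid()))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pids = {pid for _, pid in shared}
print("entries written by threads:", len(shared))
print("distinct process IDs:", len(pids))
```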

2. Threading Module

There are two ways to create and run a thread, as follows:

Direct call

    import threading
    import time


    def sayhi(num):  # function that each thread will run
        print("running on number:%s" % num)
        time.sleep(3)


    if __name__ == '__main__':
        t1 = threading.Thread(target=sayhi, args=(1,))  # create a thread instance
        t2 = threading.Thread(target=sayhi, args=(2,))  # create another thread instance
        t1.start()  # start the thread
        t2.start()  # start the other thread
        print(t1.name)  # get the thread name (getName() is deprecated since Python 3.10)
        print(t2.name)

Call by inheritance

    import threading
    import time


    class MyThread(threading.Thread):
        def __init__(self, num):
            threading.Thread.__init__(self)
            self.num = num

        def run(self):  # function that each thread will run
            print("running on number:%s" % self.num)
            time.sleep(3)


    if __name__ == '__main__':
        t1 = MyThread(1)
        t2 = MyThread(2)
        t1.start()
        t2.start()

Commonly used Thread methods:

    • start: marks the thread as ready; it then waits for CPU scheduling
    • setName: sets the thread's name
    • getName: returns the thread's name
    • setDaemon: sets the thread as a background (daemon) thread or a foreground thread (the default)
      If it is a background thread, it runs while the main thread runs, and once the main thread finishes, the background thread is stopped whether or not it has completed.
      If it is a foreground thread, it runs while the main thread runs, and once the main thread finishes, the program waits for the foreground thread to complete before it exits.
    • join: blocks the calling thread until the joined thread finishes (or an optional timeout expires). Calling join on each thread right after starting it runs the threads one by one, which makes multithreading pointless.
    • run: the thread object's run method is executed automatically once the thread is scheduled by the CPU

Since Python 3.10, setName, getName and setDaemon are deprecated in favor of the name and daemon attributes.

2.1 Join & Daemon

    import time
    import threading


    def run(n):
        print('[%s]------running----\n' % n)
        time.sleep(2)
        print('--done--')


    def main():
        for i in range(5):
            t = threading.Thread(target=run, args=[i, ])
            t.start()
            t.join(1)
            print('starting thread', t.name)


    m = threading.Thread(target=main, args=[])
    # Make m a daemon thread of the program's main thread. When the main thread
    # exits, m exits too, and the child threads started by m exit with it,
    # whether or not they have finished their work.
    m.daemon = True
    m.start()
    m.join(timeout=2)
    print("---main thread done----")

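
The effect of a join timeout can be checked without waiting for program exit. The following is a minimal sketch (the names `slow_task` and `slow-worker` are invented for illustration): `join(timeout=...)` returns after the timeout even if the thread is still running, while a plain `join()` blocks until the thread has finished.

```python
import threading
import time


def slow_task():
    time.sleep(0.5)  # stand-in for real work


t = threading.Thread(target=slow_task, name="slow-worker", daemon=True)
t.start()

t.join(timeout=0.1)            # wait at most 0.1 s, then carry on
still_running = t.is_alive()   # the task needs 0.5 s, so it is still alive

t.join()                       # no timeout: block until the thread finishes
finished = not t.is_alive()

print(t.name, "daemon:", t.daemon)
print("alive after join(0.1):", still_running)
print("finished after join():", finished)
```

Because the thread is a daemon, it would not keep the program alive on its own. The final `join()` is what guarantees here that it runs to completion.
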
2.2 Thread Lock (Mutex)

A process can start multiple threads, and all of them share the memory space of the parent process, so every thread can access the same data. Threads are scheduled unpredictably, and a thread may execute only a few instructions before the CPU switches to another one. When several threads modify the same piece of data at the same time, dirty data can result. A thread lock solves this: it allows only one thread at a time to perform the operation.

    import time
    import threading


    def addNum():
        global num  # every thread accesses this global variable
        print('--get num:', num)
        time.sleep(1)
        num -= 1  # decrement the shared variable by 1


    num = 100  # the shared variable
    thread_list = []
    for i in range(100):
        t = threading.Thread(target=addNum)
        t.start()
        thread_list.append(t)

    for t in thread_list:  # wait for all threads to finish
        t.join()

    print('final num:', num)

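On Python 2.7 and later this example usually still prints the correct result, because the interpreter rarely switches threads in the middle of such a short operation, so the problem is hard to reproduce. The operation num -= 1 is nevertheless not atomic, and the result is not guaranteed. An explicit lock makes it safe:

    import time
    import threading


    def addNum():
        global num  # every thread accesses this global variable
        print('--get num:', num)
        time.sleep(1)
        with lock:  # acquire the lock before modifying the data, release it afterwards
            num -= 1  # decrement the shared variable by 1


    num = 100  # the shared variable
    thread_list = []
    lock = threading.Lock()  # create a global lock
    for i in range(100):
        t = threading.Thread(target=addNum)
        t.start()
        thread_list.append(t)

    for t in thread_list:  # wait for all threads to finish
        t.join()

    print('final num:', num)
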
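
To make the dirty-data problem easier to observe, the sketch below splits the decrement into a separate read and write with a short sleep in between. The sleep is an artificial device added for this illustration, and the helper `run` and its parameters are made-up names. Without a lock, threads are likely to overwrite each other's updates. With a lock, the read and write happen as one unit.

```python
import contextlib
import threading
import time


def run(n_threads, lock=None):
    """Decrement a counter n_threads times, optionally guarded by a lock."""
    state = {"num": n_threads}
    # Without a lock, fall back to a do-nothing context manager.
    guard = lock if lock is not None else contextlib.nullcontext()

    def decrement():
        with guard:
            current = state["num"]      # read
            time.sleep(0.001)           # give other threads a chance to run
            state["num"] = current - 1  # write back

    threads = [threading.Thread(target=decrement) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["num"]


print("without lock:", run(20))                 # often greater than 0
print("with lock:", run(20, threading.Lock()))  # always 0
```
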
2.3 Semaphore

A mutex allows only one thread at a time to change the data, while a semaphore allows a fixed number of threads to do so at the same time. For example, a restroom with 3 stalls lets at most 3 people in at once; anyone else has to wait until someone comes out.

    import threading
    import time


    def run(n):
        semaphore.acquire()
        time.sleep(1)
        print("run the thread: %s" % n)
        semaphore.release()


    if __name__ == '__main__':
        semaphore = threading.BoundedSemaphore(5)  # allow at most 5 threads to run at the same time
        for i in range(20):
            t = threading.Thread(target=run, args=(i,))
            t.start()

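
One way to check that the semaphore really caps concurrency is to count how many threads are inside the guarded region at once. This is a sketch with invented names (`LIMIT`, `active`, `peak`); a separate lock protects the counters themselves.

```python
import threading
import time

LIMIT = 3
semaphore = threading.BoundedSemaphore(LIMIT)
counter_lock = threading.Lock()
active = 0  # threads currently inside the guarded region
peak = 0    # highest value of `active` seen so far


def worker():
    global active, peak
    with semaphore:  # at most LIMIT threads get past this line at once
        with counter_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # stand-in for real work
        with counter_lock:
            active -= 1


threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", peak, "(limit %d)" % LIMIT)
```
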
2.4 Event

A Python thread event lets the main thread control the execution of other threads. An event mainly provides three methods: set, wait and clear.

Event handling mechanism: a flag is defined globally. If the flag is False, a call to event.wait blocks. If the flag is True, event.wait no longer blocks.

clear: sets the flag to False
set: sets the flag to True

    import threading


    def do(event):
        print('start')
        event.wait()
        print('execute')


    event_obj = threading.Event()
    for i in range(10):
        t = threading.Thread(target=do, args=(event_obj,))
        t.start()

    event_obj.clear()
    inp = input('input:')
    if inp == 'true':
        event_obj.set()

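
The same mechanism can be exercised without keyboard input. In this sketch (the names `log` and `released_early` are chosen for illustration) no worker gets past `wait()` until the main thread calls `set()`.

```python
import threading

event = threading.Event()
log = []
log_lock = threading.Lock()


def worker(i):
    event.wait()  # blocks while the flag is False
    with log_lock:
        log.append(i)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()

released_early = len(log)  # flag is still False, so no worker has passed wait()
event.set()                # flag becomes True: every waiting worker is released

for t in threads:
    t.join()

print("released before set():", released_early)
print("released after set():", sorted(log))
```
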
2.5 Condition

A condition makes threads wait; n threads are released only when a condition is met.

Using wait and notify:

    import threading


    def run(n):
        con.acquire()
        con.wait()
        print("run the thread: %s" % n)
        con.release()


    if __name__ == '__main__':
        con = threading.Condition()
        for i in range(10):
            t = threading.Thread(target=run, args=(i,))
            t.start()

        while True:
            inp = input('>>>')
            if inp == 'q':
                break
            con.acquire()
            con.notify(int(inp))  # wake up that many waiting threads
            con.release()

Using wait_for with a condition function:

    import threading


    def condition_func():
        ret = False
        inp = input('>>>')
        if inp == '1':
            ret = True
        return ret


    def run(n):
        con.acquire()
        con.wait_for(condition_func)
        print("run the thread: %s" % n)
        con.release()


    if __name__ == '__main__':
        con = threading.Condition()
        for i in range(10):
            t = threading.Thread(target=run, args=(i,))
            t.start()

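
A common non-interactive use of a condition is handing items from a producer to waiting consumers. The sketch below is one possible arrangement, not the only one; `items`, `received`, `producer` and `consumer` are names made up for this example. Each consumer waits until the shared list is non-empty, and the producer notifies one waiter per item.

```python
import threading

items = []     # shared buffer, protected by the condition's lock
received = []  # what the consumers took out
con = threading.Condition()


def consumer():
    with con:
        # wait_for releases the lock while waiting and re-checks the
        # predicate every time the thread is woken up.
        con.wait_for(lambda: len(items) > 0)
        received.append(items.pop(0))


def producer(value):
    with con:
        items.append(value)
        con.notify()  # wake one waiting consumer, if any


consumers = [threading.Thread(target=consumer) for _ in range(3)]
for t in consumers:
    t.start()

for value in ("a", "b", "c"):
    producer(value)

for t in consumers:
    t.join()

print(sorted(received))
```
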
2.6 Timer

A timer runs an action after n seconds.

    from threading import Timer


    def hello():
        print("hello, world")


    t = Timer(1, hello)
    t.start()  # after 1 second, "hello, world" will be printed

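
A timer that has not fired yet can be stopped with `cancel()`. As a small sketch (the list `fired` and the labels are illustrative), two timers are started and one is cancelled before its delay elapses, so only the other one runs its action.

```python
import threading

fired = []


def record(label):
    fired.append(label)


kept = threading.Timer(0.1, record, args=("kept",))
cancelled = threading.Timer(0.1, record, args=("cancelled",))
kept.start()
cancelled.start()
cancelled.cancel()  # only effective while the timer is still waiting

# Timer is a Thread subclass, so join() waits for it to finish.
kept.join()
cancelled.join()

print(fired)
```
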
3. Processes

    from multiprocessing import Process
    import time


    def foo(i):
        time.sleep(1)
        print('say hi', i)


    if __name__ == '__main__':  # required on platforms that start child processes by spawning
        for i in range(10):
            p = Process(target=foo, args=(i,))
            p.start()

Note:
Because each process must hold its own copy of the data, creating a process has a very large overhead.

3.1 Process data sharing

Each process holds its own copy of the data, and data is not shared by default.

    from multiprocessing import Process

    li = []


    def foo(i):
        li.append(i)
        print('say hi', li)


    if __name__ == '__main__':
        for i in range(10):
            p = Process(target=foo, args=(i,))
            p.start()

        print('ending', li)

The output looks like the following (the order may differ on each run):

    say hi [2]
    say hi [3]
    say hi [5]
    say hi [0]
    say hi [1]
    say hi [4]
    say hi [6]
    say hi [7]
    say hi [8]
    ending []
    say hi [9]

To share data between processes:

Method one: use Array

    from multiprocessing import Process, Array


    def Foo(i, temp):
        temp[i] = 100 + i
        for item in temp:
            print(i, '----->', item)


    if __name__ == '__main__':
        temp = Array('i', [11, 22, 33, 44])  # shared array of C ints
        for i in range(2):
            # pass the shared array explicitly so this also works when child processes are spawned
            p = Process(target=Foo, args=(i, temp))
            p.start()
            p.join()

Method two: share data with a Manager dict

    from multiprocessing import Process, Manager


    def Foo(i, dic):
        dic[i] = 100 + i
        print(dic.values())


    if __name__ == '__main__':
        manage = Manager()
        dic = manage.dict()
        for i in range(2):
            p = Process(target=Foo, args=(i, dic))
            p.start()
            p.join()

3.2 Process Pool

The process pool maintains a sequence of processes internally. When a task is submitted, a process is taken from the pool. If no process in the pool is available, the program waits until one becomes available.

The process pool provides two methods for submitting a task:

    • apply: runs the task and blocks until its result is ready
    • apply_async: submits the task and returns immediately with a result object

    from multiprocessing import Pool
    import time


    def Foo(i):
        time.sleep(2)
        return i + 100


    def Bar(arg):
        print(arg)


    if __name__ == '__main__':
        pool = Pool(5)
        print(pool.apply(Foo, (1,)))
        print(pool.apply_async(func=Foo, args=(1,)).get())

        for i in range(10):
            pool.apply_async(func=Foo, args=(i,), callback=Bar)

        print('end')
        pool.close()
        pool.join()  # wait for the pool's processes to finish before closing; if this line is commented out, the program exits immediately

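
The difference between the two methods shows up in timing. The following sketch (the worker `cube` and the 0.2 s delay are placeholders for real work) submits the same four tasks first with `apply`, which runs them one after another, and then with `apply_async`, which lets the pool run them in parallel. It is meant to be run as a script.

```python
import time
from multiprocessing import Pool


def cube(x):
    time.sleep(0.2)  # stand-in for real work
    return x ** 3


def main():
    with Pool(4) as pool:
        start = time.perf_counter()
        # apply blocks on every call, so the tasks run one at a time.
        blocking = [pool.apply(cube, (i,)) for i in range(4)]
        t_apply = time.perf_counter() - start

        start = time.perf_counter()
        # apply_async returns at once; results are collected afterwards.
        pending = [pool.apply_async(cube, (i,)) for i in range(4)]
        non_blocking = [r.get() for r in pending]
        t_async = time.perf_counter() - start

    print(blocking, "apply: %.2fs" % t_apply)
    print(non_blocking, "apply_async: %.2fs" % t_async)


if __name__ == "__main__":
    main()
```

Both lists contain the same results. Only the elapsed time differs, and the exact numbers depend on the machine.
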
4. Coroutines

Threads and processes are created by the program through system interfaces, and it is the operating system that ultimately schedules and runs them. Coroutines, by contrast, are scheduled by the programmer.

Why coroutines exist: in a multithreaded application, the CPU switches between threads by time slicing, and every thread switch costs time (the state must be saved so the thread can continue later). A coroutine-based program uses only one thread and specifies, within that thread, the order in which blocks of code execute.

Application scenario: coroutines are a good fit when the program performs a large number of operations that do not need the CPU (I/O).
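
The idea of one thread deciding the execution order itself can be sketched with plain generators. This is only an illustration of cooperative switching, not how greenlet or gevent are implemented; `task` and `run_round_robin` are made-up names. Each task runs until it reaches `yield`, then the scheduler picks the next one.

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append("%s step %d" % (name, i))
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run until the next yield
            queue.append(current)  # not finished: back of the queue
        except StopIteration:
            pass                   # finished: drop it


log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
# ['A step 0', 'B step 0', 'A step 1', 'B step 1', 'B step 2']
```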

4.1 Greenlet

    from greenlet import greenlet


    def test1():
        print(12)
        gr2.switch()
        print(34)
        gr2.switch()


    def test2():
        print(56)
        gr1.switch()
        print(78)


    gr1 = greenlet(test1)
    gr2 = greenlet(test2)
    gr1.switch()

Result:

    12
    56
    34
    78

4.2 gevent

Python provides basic support for coroutines through yield, but the support is not complete. The third-party gevent library gives Python more complete coroutine support.

gevent is a third-party library that implements coroutines on top of greenlet. The basic idea is:

When a greenlet encounters an I/O operation, such as accessing the network, it automatically switches to another greenlet, and switches back at an appropriate time once the I/O operation has completed. Because I/O operations are very time-consuming and often leave the program waiting, having gevent switch coroutines automatically means there is always a greenlet running instead of waiting for I/O.

    import gevent


    def foo():
        print('Running in foo')
        gevent.sleep(0)
        print('Explicit context switch to foo again')


    def bar():
        print('Explicit context to bar')
        gevent.sleep(0)
        print('Implicit context switch back to bar')


    gevent.joinall([
        gevent.spawn(foo),
        gevent.spawn(bar),
    ])

Result:

    Running in foo
    Explicit context to bar
    Explicit context switch to foo again
    Implicit context switch back to bar

Automatic switching when an I/O operation is encountered:

    from gevent import monkey
    monkey.patch_all()
    import gevent
    import urllib.request


    def f(url):
        print('GET: %s' % url)
        resp = urllib.request.urlopen(url)
        data = resp.read()
        print('%d bytes received from %s.' % (len(data), url))


    gevent.joinall([
        gevent.spawn(f, 'https://www.python.org/'),
        gevent.spawn(f, 'https://www.baidu.com/'),
        gevent.spawn(f, 'https://www.so.com/'),
    ])
