Notes
1. You cannot open processes or spawn threads without limit, so the most common approach is to use a process pool or a thread pool. Callback functions matter a great deal here: a callback is really a programming idea in its own right — whichever task finishes first triggers its callback.
2. As soon as you use concurrency you run into locking problems, but you should not have to take locks by hand everywhere. Instead we use a queue, which also solves the locking problem automatically.
A concept that extends from the queue is also very important, and this way of thinking will keep coming up when writing programs later: the producer-consumer problem (a minimal sketch follows below).
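As a minimal sketch of the producer-consumer idea (this example is not in the original post; queue.Queue does its own locking, so no explicit lock is needed):

import threading
import queue
import time

q = queue.Queue()          # thread-safe: put()/get() handle the locking for us

def producer():
    for i in range(5):
        q.put(i)           # produce a value
        print('produced %s' % i)
        time.sleep(0.1)
    q.put(None)            # sentinel: tell the consumer to stop

def consumer():
    while True:
        item = q.get()     # blocks until a value is available
        if item is None:
            break
        print('consumed %s' % item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()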
I. The Python standard module concurrent.futures
What you need to know about the concurrent.futures module
1. The concurrent.futures module provides a higher-level interface for executing calls asynchronously; it is used to create parallel tasks.
2. The module is very convenient to use, and its interface is packaged very simply.
3. It can implement both process pools and thread pools.
4. Import the process pool and thread pool with: from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor. You can also import Executor, but you should not use it directly: it is an abstract class, whose purpose is to standardize its subclasses (they must implement certain methods), and an abstract class cannot be instantiated.
5. p = ProcessPoolExecutor(max_workers) creates a process pool; if you do not pass max_workers, the default is the number of CPUs. p = ThreadPoolExecutor(max_workers) creates a thread pool; if you do not pass max_workers, the default is the number of CPUs * 5.
6. With a multiprocessing pool, an asynchronous result is an object and you have to call its .get() method to obtain the value; with the concurrent.futures module you call obj.result() instead. p.submit(task, i) is the equivalent of the apply_async() asynchronous call, and p.shutdown(), whose wait parameter defaults to True, is the equivalent of close() plus join() (a minimal sketch follows below).
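A minimal sketch of that interface (not from the original post; it uses the with-statement form, which calls shutdown(wait=True) automatically):

from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n ** 2

if __name__ == '__main__':
    # the with-block calls p.shutdown(wait=True) for us on exit
    with ProcessPoolExecutor(max_workers=4) as p:
        future = p.submit(square, 3)   # like apply_async(): returns a Future immediately
        print(future.result())         # like .get(): blocks until the result is ready, prints 9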
II. Thread pools and process pools
Thread pool: controls a certain number of threads within a single process.
Process pools and thread pools based on the concurrent.futures module (their synchronous and asynchronous execution work the same way).
# Process pool based on the concurrent.futures module
# 1. Synchronous execution --------------
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os, time, random

def task(n):
    print('[%s] is running' % os.getpid())
    time.sleep(random.randint(1, 3))  # I/O-intensive work generally uses threads; long computations use processes
    return n ** 2

if __name__ == '__main__':
    start = time.time()
    p = ProcessPoolExecutor()
    for i in range(10):  # 10 tasks here; with hundreds of tasks you cannot open unlimited processes,
                         # you have to control their number, which is what the pool is for
        obj = p.submit(task, i).result()  # calling .result() right away makes this the synchronous apply() style
    p.shutdown()  # equivalent to close() plus join()
    print('=' * 30)
    print(time.time() - start)  # 17.36499309539795

# 2. Asynchronous execution -----------
# from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
# import os, time, random
#
# def task(n):
#     print('[%s] is running' % os.getpid())
#     time.sleep(random.randint(1, 3))  # I/O-intensive work generally uses threads; long computations use processes
#     return n ** 2
#
# if __name__ == '__main__':
#     start = time.time()
#     p = ProcessPoolExecutor()
#     l = []
#     for i in range(10):  # 10 tasks here; with hundreds of tasks you cannot open unlimited processes,
#                          # you have to control their number, which is what the pool is for
#         obj = p.submit(task, i)  # equivalent to the apply_async() asynchronous call
#         l.append(obj)
#     p.shutdown()  # equivalent to close() plus join()
#     print('=' * 30)
#     print([obj.result() for obj in l])
#     print(time.time() - start)  # 5.362306594848633
# Thread pool based on the concurrent.futures module
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from threading import current_thread
import os, time, random

def task(n):
    # every pid printed is the same, because the threads all share one process
    print('%s:%s is running' % (current_thread().getName(), os.getpid()))
    time.sleep(random.randint(1, 3))  # I/O-intensive work generally uses threads
    return n ** 2

if __name__ == '__main__':
    start = time.time()
    p = ThreadPoolExecutor()  # thread pool; if no value is given, the default is cpu count * 5
    l = []
    for i in range(10):  # 10 tasks; for I/O-bound work like this the thread pool is more efficient
        obj = p.submit(task, i)  # equivalent to the apply_async() asynchronous call
        l.append(obj)
    p.shutdown()  # its wait parameter defaults to True (equivalent to close() plus join())
    print('=' * 30)
    print([obj.result() for obj in l])
    print(time.time() - start)  # 3.001171827316284
Applying a pool with a callback (download a web page and parse it)
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import requests
import time, os

def get_page(url):
    print('<%s> is getting [%s]' % (os.getpid(), url))
    response = requests.get(url)
    if response.status_code == 200:  # status 200: download succeeded
        return {'url': url, 'text': response.text}

def parse_page(res):
    res = res.result()  # the callback receives a Future object, so first take the returned result out of it
    print('<%s> is parsing [%s]' % (os.getpid(), res['url']))
    with open('db.txt', 'a') as f:
        parse_res = 'url:%s size:%s\n' % (res['url'], len(res['text']))
        f.write(parse_res)

if __name__ == '__main__':
    # p = ThreadPoolExecutor()
    p = ProcessPoolExecutor()
    l = [
        'http://www.baidu.com',
        'http://www.baidu.com',
        'http://www.baidu.com',
        'http://www.baidu.com',
    ]
    for url in l:
        p.submit(get_page, url).add_done_callback(parse_page)
        # whichever task finishes first runs its callback
        # the callback is a programming idea in itself; it works with process pools as well as thread pools
    p.shutdown()  # equivalent to close() plus join()
    print('master', os.getpid())
Application of the map function
# map function example
obj = map(lambda x: x ** 2, range(10))
print(list(obj))  # result: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Compare this with the submit-based process pool / thread pool code above and you can see how powerful the map function is.
# Applying the map function
# p.submit(task, i) in a loop and the map function are similar in principle; we can use map instead and write less code.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os, time, random

def task(n):
    print('[%s] is running' % os.getpid())
    time.sleep(random.randint(1, 3))  # I/O-intensive work generally uses threads; long computations use processes
    return n ** 2

if __name__ == '__main__':
    p = ProcessPoolExecutor()
    obj = p.map(task, range(10))
    p.shutdown()  # equivalent to close() plus join()
    print('=' * 30)
    print(obj)        # map returns an iterator
    print(list(obj))
III. Introduction to coroutines
Coroutine: concurrency implemented within a single thread (to improve efficiency). A coroutine is essentially a way of fully using a single thread's time: especially in I/O-intensive tasks, whenever a task hits I/O we switch to another task, and handle the returned result when the function comes back.
Before talking about coroutines, let's first go over the knowledge points that coroutines build on.
The key to switching is saving state (so that execution can continue from where it stopped).
return: can only happen once; it marks the end of the function.
yield: as soon as a function contains yield, calling it returns a generator, and a generator is essentially an iterator; an iterator is driven with the next() method.
1. yield as a statement: yield 1
2. yield as an expression: x = yield
send() can pass the result of one function to another function, which is how switching between programs inside a single thread is achieved. Before send() can deliver a value, the generator must first be advanced to a yield with next() (or send(None)).
Yield review
# yield can pause a function and save its state --------------
def f1():
    print('first')
    yield 1
    print('second')
    yield 2
    print('third')
    yield 3

# print(f1())  # with yield inside, calling f1() returns a generator
g = f1()
print(next(g))  # runs until it meets a yield, returns that value, and saves the current state
print(next(g))  # returns a value when it meets the next yield
print(next(g))  # returns a value when it meets the next yield
# yield used as an expression --------------------
import time

def wrapper(func):
    def inner(*args, **kwargs):
        ret = func(*args, **kwargs)
        next(ret)  # prime the generator (equivalent to g.send(None))
        return ret
    return inner

@wrapper
def consumer():
    while True:
        x = yield
        print(x)

def producter(target):
    '''the producer makes values'''
    # next(g)  # equivalent to g.send(None); the decorator has already primed the generator
    for i in range(10):
        time.sleep(0.5)
        target.send(i)  # to use send, the generator must already be paused at a yield

producter(consumer())
Introduction
The topic of this section is implementing concurrency on a single thread, that is, with only one main thread (and therefore, clearly, only one usable CPU).
For this we first need to review the essence of concurrency: switching + saving state.
While the CPU is running a task, it switches away to run other tasks in two situations (the switching is controlled by the operating system):
one is that the task blocks, the other is that the task has been computing for too long.
The second case does not improve efficiency; it only lets the CPU be shared fairly so that every task appears to be running. If the programs are pure computation, this kind of switching actually lowers efficiency. We can verify this with yield: yield itself is a way of saving a task's running state within a single thread, so let's review it briefly:
yield can save state; this state saving is very similar to the operating system saving a thread's state, but yield is controlled at the code level and is much more lightweight.
send can pass the result of one function to another, which is how switching between programs inside a single thread is achieved.
On its own, simply switching back and forth does not help and can even reduce efficiency.
# Serial execution
import time

def consumer(res):
    '''task 1: receive data, process data'''
    pass

def producer():
    '''task 2: produce data'''
    res = []
    for i in range(10000000):
        res.append(i)
    return res

start = time.time()
# serial execution
res = producer()
consumer(res)
stop = time.time()
print(stop - start)  # 1.5536692142486572
# Switching between tasks based on yield
import time

def wrapper(func):
    def inner(*args, **kwargs):
        ret = func(*args, **kwargs)
        next(ret)  # prime the generator
        return ret
    return inner

@wrapper
def consumer():
    while True:
        x = yield
        print(x)

def producter(target):
    '''the producer makes values'''
    # next(g)  # equivalent to g.send(None); the decorator has already primed the generator
    for i in range(10):
        time.sleep(0.5)
        target.send(i)  # to use send, the generator must already be paused at a yield

producter(consumer())
In a single thread our programs will inevitably perform I/O operations. But if we can control multiple tasks within our own program (i.e. at the user-program level, not the operating-system level) so that we switch to another task whenever one of them hits I/O, then our single thread stays in the ready state as much as possible, i.e. always ready to be picked up by the CPU. In effect we hide our own I/O as much as possible at the user-program level, so to the operating system this guy (our thread) appears to be computing all the time, with very little I/O.
The essence of a coroutine is that, within a single thread, the user program itself controls one task and switches to another task as soon as it would block on I/O, in order to improve efficiency.
So we need to find a solution that can meet the following conditions:
1. It can control switching between multiple tasks and saves a task's state before switching (so that when the task runs again it can continue from where it was paused).
2. As a supplement to point 1: it can detect I/O operations and switches only when an I/O operation occurs.
IV. Greenlet
The greenlet module is no better than yield as far as efficiency goes: it only does simple switching, which by itself has nothing to do with efficiency.
It is just a bit more convenient than yield when switching, but it still does not solve the efficiency problem.
greenlet lets you switch back and forth between multiple tasks.
# greenlet example
from greenlet import greenlet
import time

def eat(name):
    print('%s eat 1' % name)
    time.sleep(2)  # greenlet does not switch when it meets I/O; for that you need gevent
    g2.switch('egon')
    print('%s eat 2' % name)
    g2.switch()

def play(name):
    print('%s play 1' % name)
    g1.switch()
    print('%s play 2' % name)

g1 = greenlet(eat)
g2 = greenlet(play)
g1.switch('egon')  # the argument only needs to be passed on the first switch; later switches do not need it
So the approach above is still not good enough. This is where gevent comes in, i.e. real coroutines: it solves the problem of implementing concurrency in a single thread and actually improves efficiency.
V. Introduction to gevent
gevent is a third-party library that makes it easy to write concurrent synchronous or asynchronous programs. The main pattern used in gevent is the greenlet,
a lightweight coroutine that plugs into Python as a C extension module. All greenlets run inside the operating-system process of the main program, but they are scheduled cooperatively.
# Usage
g1 = gevent.spawn(func, 1, 2, 3, x=4, y=5)  # creates a coroutine object g1; spawn's first argument is the function name,
                                            # e.g. eat, followed by any number of positional or keyword arguments,
                                            # all of which are passed to that function
g2 = gevent.spawn(func2)
g1.join()   # wait for g1 to finish
g2.join()   # wait for g2 to finish
# or combine the two steps above into one: gevent.joinall([g1, g2])
g1.value    # get the return value of func
For example
# Some gevent methods (important)
from gevent import monkey; monkey.patch_all()
import gevent
import time

def eat(name):
    print('%s eat 1' % name)
    time.sleep(2)  # we use the waiting time to simulate I/O blocking
    '''
    Inside the gevent module you would use gevent.sleep(2) to express the wait, but we are used to
    writing time.sleep(). Can we still use time.sleep()? Yes, but then you must import
    from gevent import monkey; monkey.patch_all()
    If you do not import it and call time.sleep() directly, you will not get single-threaded concurrency.
    '''
    # gevent.sleep(2)
    print('%s eat 2' % name)
    return 'eat'

def play(name):
    print('%s play 1' % name)
    time.sleep(3)
    # gevent.sleep(3)
    print('%s play 2' % name)
    return 'play'  # when there is a return value, gevent also provides a way to get the result

start = time.time()
g1 = gevent.spawn(eat, 'egon')   # run the task
g2 = gevent.spawn(play, 'egon')  # the arguments to g1 and g2 can be different
# g1.join()  # wait for g1
# g2.join()  # wait for g2
# the two lines above can also be written as:
gevent.joinall([g1, g2])
print('master', time.time() - start)  # 3.001171588897705
print(g1.value)
print(g2.value)
It should be stated that:
gevent.sleep(2) simulates an I/O block that gevent can recognize,
while time.sleep(2) and other blocking calls are not recognized by gevent directly; you need the following line of code to patch them so that gevent can recognize them.
from gevent import monkey; monkey.patch_all() must come before the things it patches, such as the time and socket modules.
Or simply remember: to use gevent, put from gevent import monkey; monkey.patch_all() at the very beginning of the file.
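A small demonstration of what the patch changes (not from the original post): with the first line in place the two sleeps overlap and the program takes about 2 seconds; comment it out and they run one after the other for about 4 seconds.

from gevent import monkey; monkey.patch_all()  # keep this at the very top, before the modules it patches
import gevent
import time

def wait(n):
    time.sleep(n)  # after patch_all() this behaves like gevent.sleep(n) and yields to other greenlets

start = time.time()
gevent.joinall([gevent.spawn(wait, 2), gevent.spawn(wait, 2)])
print('total: %.1f s' % (time.time() - start))  # ~2.0 with the patch, ~4.0 without it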
VI. Gevent: from synchronous to asynchronous
from gevent import spawn, joinall, monkey; monkey.patch_all()
import time

def task(pid):
    """Some non-deterministic task"""
    time.sleep(0.5)
    print('Task %s done' % pid)

def synchronous():
    for i in range(10):
        task(i)

def asynchronous():
    g_l = [spawn(task, i) for i in range(10)]
    joinall(g_l)

if __name__ == '__main__':
    print('Synchronous:')
    synchronous()
    print('Asynchronous:')
    asynchronous()

# The important part of the program above is wrapping the task function into gevent.spawn greenlets.
# The list of initialized greenlets is stored in g_l, which is passed to the gevent.joinall function;
# joinall blocks the current flow of execution and runs all the given greenlets. Execution only
# continues after all the greenlets have finished.
VII. Application Examples of gevent
# Coroutine application: a crawler
from gevent import monkey; monkey.patch_all()  # apply the patch
import gevent
import requests
import time

def get_page(url):
    print('GET: %s' % url)
    response = requests.get(url)
    if response.status_code == 200:  # status 200: download succeeded
        print('%d bytes received from: %s' % (len(response.text), url))

start = time.time()
gevent.joinall([
    gevent.spawn(get_page, 'http://www.baidu.com'),
    gevent.spawn(get_page, 'https://www.yahoo.com/'),
    gevent.spawn(get_page, 'https://github.com/'),
])
stop = time.time()
print('run time is %s' % (stop - start))
# Coroutine crawler with a callback function added
from gevent import joinall, spawn, monkey; monkey.patch_all()
import requests
from threading import current_thread

def parse_page(res):
    print('%s parse %s' % (current_thread().getName(), len(res)))

def get_page(url, callback=parse_page):
    print('%s GET %s' % (current_thread().getName(), url))
    response = requests.get(url)
    if response.status_code == 200:
        callback(response.text)

if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.taobao.com',
        'https://www.openstack.org',
    ]
    tasks = []
    for url in urls:
        tasks.append(spawn(get_page, url))
    joinall(tasks)
VIII. Gevent application example two
A socket server can also use coroutines to implement concurrency.
# The server uses coroutines
#!usr/bin/env python
# -*- coding:utf-8 -*-
from gevent import monkey; monkey.patch_all()
import gevent
from socket import *

print('start running...')

def talk(conn, addr):
    while True:
        data = conn.recv(1024)
        print('%s:%s %s' % (addr[0], addr[1], data))
        conn.send(data.upper())
    conn.close()

def server(ip, duankou):  # duankou means "port"
    server = socket(AF_INET, SOCK_STREAM)
    server.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    server.bind((ip, duankou))
    server.listen(5)
    while True:
        conn, addr = server.accept()    # wait for a connection
        gevent.spawn(talk, conn, addr)  # run it asynchronously; equivalent to the two lines used when
                                        # opening a process: p = Process(target=talk, args=(conn, addr)); p.start()
    server.close()

if __name__ == '__main__':
    server('127.0.0.1', 8081)
# The client opens 100 processes
#!usr/bin/env python
# -*- coding:utf-8 -*-
from multiprocessing import Process
from gevent import monkey; monkey.patch_all()
from socket import *

def client(ip, duankou):  # duankou means "port"
    client = socket(AF_INET, SOCK_STREAM)
    client.connect((ip, duankou))
    while True:
        client.send('hello'.encode('utf-8'))
        data = client.recv(1024)
        print(data.decode('utf-8'))

if __name__ == '__main__':
    for i in range(100):
        p = Process(target=client, args=(('127.0.0.1', 8081)))
        p.start()