Single-threaded concurrency
This section is about implementing concurrency on a single thread, that is, with only one main thread (and so, clearly, only one usable CPU). To do that we first need to recall the essence of concurrency: switching plus saving state.
While the CPU is running a task, it will be switched away to run other tasks in two situations (the switch is forced by the operating system): either the task has blocked, or the task has been computing for too long or a higher-priority program has replaced it.
The second kind of switch does not improve efficiency. It only shares the CPU fairly among tasks, so that all of them appear to run "simultaneously"; if the tasks are pure computation, this switching actually reduces efficiency. We can verify this with yield, which is itself a way to save a task's running state within a single thread. A brief review:
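#1 yield can save state. The way yield saves state is much like the way the operating system saves thread state,
but yield is controlled at the code level and is more lightweight.
#2 send can pass the result of one function to another, which makes it possible to switch between programs within a single thread.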
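The two points above can be sketched in a few lines. This is only an illustration; the generator name running_total is made up for the example.

```python
def running_total():
    """Generator that keeps its local state between resumptions."""
    total = 0
    while True:
        # Execution pauses here; `total` survives until the next send().
        value = yield total
        total += value


gen = running_total()
first = next(gen)        # prime the generator: run up to the first yield
after_5 = gen.send(5)    # resume, hand 5 to the generator, pause again
after_7 = gen.send(7)    # the earlier total is still there
print(first, after_5, after_7)
```

Each send() resumes the generator exactly where it paused, which is the "save state, then switch" behaviour the rest of this section relies on.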
Switching alone reduces efficiency

# Serial execution
import time

def consumer(res):
    """Task 1: receive data, process data"""
    pass

def producer():
    """Task 2: produce data"""
    res = []
    for i in range(10000000):
        res.append(i)
    return res

start = time.time()
# Serial execution
res = producer()
consumer(res)  # writing this as consumer(producer()) would reduce execution efficiency
stop = time.time()
print(stop - start)  # 1.5536692142486572
# Concurrent execution based on yield
import time

def consumer():
    """Task 1: receive data, process data"""
    while True:
        x = yield

def producer():
    """Task 2: produce data"""
    g = consumer()
    next(g)  # initialize the generator so that it can accept send()
    for i in range(10000000):
        g.send(i)

start = time.time()
# yield saves state, so the two tasks switch back and forth directly, which gives the effect of concurrency
# PS: add a print to each task and you will clearly see the two tasks print alternately, i.e. they run concurrently
producer()
stop = time.time()
print(stop - start)  # 2.0272178649902344
The first kind of switch is different. If task 1 encounters I/O and we switch to task 2, the time task 1 spends blocked can be used to complete task 2's computation. This is where the gain in efficiency comes from.
yield cannot switch on I/O
import time

def consumer():
    """Task 1: receive data, process data"""
    while True:
        x = yield

def producer():
    """Task 2: produce data"""
    g = consumer()
    next(g)
    for i in range(10000000):
        g.send(i)
        time.sleep(2)

start = time.time()
producer()  # runs concurrently, but when producer hits I/O it blocks and does not switch to another task in this thread
stop = time.time()
print(stop - start)
In a single thread, I/O operations are unavoidable. But if we can control the tasks in our own program (at the user-program level rather than the operating-system level) so that when one task blocks on I/O the thread switches to another task that has computation to do, the thread stays in the ready state as much as possible, i.e. it can be run by the CPU at any time. In effect we hide our I/O at the user-program level as far as possible, so the operating system sees a thread that appears to be computing all the time with little I/O, and gives it more CPU time.
The essence of a coroutine is that, within a single thread, the user program itself switches to another task whenever one task blocks on I/O, in order to improve efficiency. To implement this, we need a solution that satisfies both of the following conditions:
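#1. It can control switching between multiple tasks, saving a task's state before the switch so that, when the task runs again,
it continues from the position where it paused.
#2. As a complement to 1: it can detect I/O operations, and switches only when an I/O operation is encountered.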
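A toy sketch of a scheduler that meets both conditions is shown here, under loud assumptions: the names task and run are hypothetical, I/O is only simulated by having a task yield the number of seconds it would be blocked, and time is a virtual clock rather than real waiting. It does not detect real I/O; it only shows the shape of "save state, switch on I/O, resume later".

```python
import heapq


def task(name, io_delay, log):
    """Hypothetical task: some work, one simulated I/O wait, more work."""
    log.append(f"{name} step 1")
    yield io_delay          # "I/O": tell the scheduler how long we are blocked
    log.append(f"{name} step 2")


def run(tasks):
    """Run generator tasks on one thread using a virtual clock."""
    clock = 0.0
    # Heap entries: (wake-up time, tie-breaker, generator)
    ready = [(0.0, order, t) for order, t in enumerate(tasks)]
    heapq.heapify(ready)
    order = len(ready)
    while ready:
        wake_at, _, current = heapq.heappop(ready)
        clock = max(clock, wake_at)      # jump ahead if every task is blocked
        try:
            delay = next(current)        # resume from the saved position
        except StopIteration:
            continue                     # task finished
        heapq.heappush(ready, (clock + delay, order, current))
        order += 1
    return clock


log = []
elapsed = run([task("eat", 2, log), task("play", 1, log)])
print(log)
print(elapsed)
```

In this sketch the two waits overlap, so the virtual elapsed time is the longest wait (2) rather than the sum (3).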
Introduction to coroutines
Coroutine: concurrency within a single thread, also known as a micro-thread or fiber. In one sentence: a coroutine is a lightweight user-space thread, i.e. one that is scheduled by the user program itself.
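It should be emphasized that:
#1. Python threads are kernel-level, i.e. scheduled by the operating system (for example, a thread that hits
I/O or runs for too long is forced to give up the CPU so that another thread can run)
#2. When coroutines run inside a single thread, a switch on I/O is controlled at the application level (not by the operating system),
which improves efficiency (!!! switching for non-I/O operations has nothing to do with efficiency)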
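Compared with thread switching controlled by the operating system, coroutine switching controlled by the user within a single thread
has the following advantages:
#1. Switching between coroutines has lower overhead. It is a program-level switch that the operating system is completely unaware of, so it is more lightweight
#2. Concurrency can be achieved within a single thread, making maximum use of the CPU
The disadvantages are as follows:
#1. Coroutines are inherently single-threaded and cannot use multiple cores. A program can, however, start multiple processes,
start multiple threads in each process, and start coroutines in each thread
#2. Coroutines live inside a single thread, so once one coroutine blocks, the whole thread blocks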
Summary of the characteristics of coroutines:
Concurrency must be implemented within a single thread
No lock is required to modify shared data
The user program itself keeps the context stacks of the multiple control flows
Additionally: when a coroutine encounters an I/O operation, it automatically switches to another coroutine (neither yield nor greenlet can detect I/O; this is what the gevent module, built on a select-style mechanism, is used for)
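The "no lock required" point can be illustrated with plain generators standing in for coroutines. This is a sketch under assumptions: worker and the round-robin loop are hypothetical, and the guarantee holds only because each read-modify-write finishes before the task reaches its next switch point.

```python
counter = {"value": 0}


def worker(times):
    for _ in range(times):
        current = counter["value"]      # read
        counter["value"] = current + 1  # write: no switch can occur in between
        yield                           # the only place this task gives up control


# Simple round-robin over the tasks until all of them have finished.
tasks = [worker(1000), worker(1000)]
while tasks:
    for t in list(tasks):
        try:
            next(t)
        except StopIteration:
            tasks.remove(t)

print(counter["value"])
```

If an update were split across a switch point (read, yield, then write), another task could interleave, and some form of coordination would be needed again.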
Greenlet
Suppose we have 20 tasks within a single thread. Switching between them with yield generators is cumbersome (each generator must be initialized once before send can be called on it, and so on), whereas the greenlet module makes switching directly among these 20 tasks very simple.
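The generator bookkeeping mentioned above looks roughly like this. It is a sketch only: make_task and the shared log list are hypothetical names chosen for the illustration.

```python
def make_task(task_id, log):
    """Build one hypothetical generator-based task that records what it receives."""
    def body():
        while True:
            item = yield              # pause until someone sends a value
            log.append((task_id, item))
    return body()


log = []
tasks = [make_task(i, log) for i in range(20)]

# Every generator has to be advanced to its first yield before send() works.
for t in tasks:
    next(t)

# Skipping that step fails: a just-started generator rejects a non-None send().
unprimed = make_task(99, log)
try:
    unprimed.send("data")
    unprimed_failed = False
except TypeError:
    unprimed_failed = True

# Two rounds of manual switching across all 20 tasks.
for round_no in range(2):
    for t in tasks:
        t.send(round_no)

print(len(log), log[0], log[-1], unprimed_failed)
```

The caller has to create, initialize, and drive every generator by hand, which is the overhead that motivates a dedicated switching primitive.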
# Install
pip3 install greenlet
from greenlet import greenlet

def eat(name):
    print('%s eat 1' % name)
    g2.switch('egon')
    print('%s eat 2' % name)
    g2.switch()

def play(name):
    print('%s play 1' % name)
    g1.switch()
    print('%s play 2' % name)

g1 = greenlet(eat)
g2 = greenlet(play)
g1.switch('zhangsan')  # arguments can be passed on the first switch; later switches do not need them
Switching alone (when there is no I/O and no repeated allocation of memory) actually slows the program down

# Sequential execution
import time

def f1():
    res = 1
    for i in range(100000000):
        res += i

def f2():
    res = 1
    for i in range(100000000):
        res *= i

start = time.time()
f1()
f2()
stop = time.time()
print('run time is %s' % (stop - start))  # 10.985628366470337
# Switching
from greenlet import greenlet
import time

def f1():
    res = 1
    for i in range(100000000):
        res += i
        g2.switch()

def f2():
    res = 1
    for i in range(100000000):
        res *= i
        g1.switch()

start = time.time()
g1 = greenlet(f1)
g2 = greenlet(f2)
g1.switch()
stop = time.time()
print('run time is %s' % (stop - start))  # 52.763017892837524
greenlet only provides a more convenient way to switch than generators. If a task hits I/O after being switched to, it still blocks in place; the problem of automatically switching on I/O to improve efficiency remains unsolved.
The code for these 20 tasks in a single thread usually contains both computation and blocking operations. When task 1 blocks, we could use that time to run task 2, and so on. Only then does efficiency improve, and this is what the gevent module is for.
Introduction to gevent
# Install
pip3 install gevent
gevent is a third-party library that makes concurrent synchronous or asynchronous programming easy. The main pattern used in gevent is the Greenlet, a lightweight coroutine provided to Python as a C extension module. Greenlets all run inside the operating-system process of the main program, but they are scheduled cooperatively.
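# Usage
# Create a coroutine object g1. The first argument to spawn is the function (e.g. eat); it can be followed by
# any number of positional or keyword arguments, all of which are passed to that function.
g1 = gevent.spawn(func, 1, 2, 3, x=4, y=5)
g2 = gevent.spawn(func2)
g1.join()  # wait for g1 to finish
g2.join()  # wait for g2 to finish
# or combine the two steps above into one: gevent.joinall([g1, g2])
g1.value  # the return value of func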
Automatically switching tasks when I/O blocking is encountered
import gevent

def eat(name):
    print('%s eat 1' % name)
    gevent.sleep(2)
    print('%s eat 2' % name)

def play(name):
    print('%s play 1' % name)
    gevent.sleep(1)
    print('%s play 2' % name)

g1 = gevent.spawn(eat, 'zhangsan')
g2 = gevent.spawn(play, name='zhangsan')
g1.join()
g2.join()
# or gevent.joinall([g1, g2])
print('main')
In the example above, gevent.sleep(2) simulates an I/O block that gevent can recognize.
gevent cannot directly recognize time.sleep(2) or other blocking calls; the following line applies a patch so that it can:
from gevent import monkey; monkey.patch_all() must be placed before the modules being patched, such as time and socket, are imported.
Or simply remember: to use gevent, put from gevent import monkey; monkey.patch_all() at the very top of the file.
from gevent import monkey; monkey.patch_all()
import gevent
import time

def eat():
    print('eat food 1')
    time.sleep(2)
    print('eat food 2')

def play():
    print('play 1')
    time.sleep(1)
    print('play 2')

g1 = gevent.spawn(eat)
g2 = gevent.spawn(play)
gevent.joinall([g1, g2])
print('main')
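Why the patch has to come first can be illustrated without gevent at all. The sketch below is a simplified model of monkey patching in general, not gevent's actual implementation: fakeio, wait, and this local patch_all are hypothetical stand-ins for a blocking module, its blocking function, and the patching step.

```python
import types

# Hypothetical stand-in for a blocking standard-library module.
fakeio = types.ModuleType("fakeio")
fakeio.wait = lambda: "blocking wait"

# A name bound BEFORE patching keeps pointing at the old function,
# much like `from time import sleep` executed before the patch.
early_wait = fakeio.wait


def patch_all():
    """Replace the blocking function on the module object, in place."""
    fakeio.wait = lambda: "cooperative wait"


patch_all()

# A lookup through the module, or a name bound AFTER patching, sees the new one.
late_wait = fakeio.wait

print(early_wait())
print(fakeio.wait())
print(late_wait())
```

Any reference captured before the patch still calls the blocking version, which is one reason the simple rule of patching at the very top of the file is a safe habit.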
gevent: synchronous vs. asynchronous
from gevent import spawn, joinall, monkey; monkey.patch_all()
import time

def task(pid):
    """
    Some non-deterministic task
    """
    time.sleep(0.5)
    print('Task %s done' % pid)

def synchronous():
    for i in range(10):
        task(i)

def asynchronous():
    g_l = [spawn(task, i) for i in range(10)]
    joinall(g_l)

if __name__ == '__main__':
    print('Synchronous:')
    synchronous()
    print('Asynchronous:')
    asynchronous()
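# The important part of the program above is gevent.spawn, which wraps the task function in a Greenlet.
# The list of initialized greenlets is stored in g_l and passed to gevent.joinall,
# which blocks the current flow and runs all the given greenlets. Execution continues only after every greenlet has finished.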
A gevent application example
Coroutine application: web crawler
from gevent import monkey; monkey.patch_all()
import gevent
import requests
import time

def get_page(url):
    print('GET: %s' % url)
    response = requests.get(url)
    if response.status_code == 200:
        print('%d bytes received from %s' % (len(response.text), url))

start_time = time.time()
gevent.joinall([
    gevent.spawn(get_page, 'https://www.python.org/'),
    gevent.spawn(get_page, 'https://www.yahoo.com/'),
    gevent.spawn(get_page, 'https://github.com/'),
])
stop_time = time.time()
print('run time is %s' % (stop_time - start_time))
Python concurrent programming: coroutines