A while ago I worked on a project that was not difficult in itself, but its data came from a third-party data interface provider. That means serving a page takes at least two HTTP requests: first the front end's request to our backend, then the backend's request to the data interface. Sometimes the backend has to call several interfaces for a single front-end request, and doing those calls serially in the traditional way gives a poor user experience and wastes computing resources. Having recently studied processes and threads, I spent some time testing and comparing several feasible approaches. At first I tested with real network IO, but the results were heavily affected by the network environment, so to be fair I replaced the network IO with sleep(0.02). The approaches and test results follow.
Basic scenario: Serial execution
```python
import time

def wget(flag):
    time.sleep(0.02)  # simulate network IO
    print(flag)

count = 100  # make 100 requests
start = time.time()  # start time
for i in range(count):
    wget(i)
end = time.time()  # end time
cost = end - start  # elapsed time
print('cost:' + str(cost))
```
It takes a little over 2s (100 × 0.02s). There is no doubt that this is the least efficient approach; it serves only as a baseline.
Improved Scenario: Multithreading
Timing the multithreaded version is less obvious than the serial one: if each thread's join() is called right after its start(), the main thread blocks on every thread in turn and the test degenerates into serial execution; if join() is never called at all, the timing finishes before the threads do and the result is meaningless. So I used a crude method: each thread prints the current timestamp when it finishes, and the last timestamp on the console minus the timestamp at which the threads began executing is the elapsed time.
```python
import time
import threading

mutex = threading.Lock()  # lock so the print output is not interleaved

def wget():
    time.sleep(0.02)  # simulate network IO
    mutex.acquire()  # acquire the lock
    print('endtime:' + str(time.time()))  # finish time of this thread
    mutex.release()  # release the lock

count = 100  # make 100 requests
start = time.time()  # start time
print('starttime:' + str(start))
for i in range(count):
    t = threading.Thread(target=wget)
    t.start()
```
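A cleaner way to time this, avoiding the timestamp trick, is to start all the threads first and only then join them: the joins then overlap with the threads' work instead of serializing it. A minimal sketch of that approach:

```python
import time
import threading

def wget():
    time.sleep(0.02)  # simulate network IO

count = 100  # make 100 requests
threads = [threading.Thread(target=wget) for _ in range(count)]

start = time.time()
for t in threads:
    t.start()  # start every thread first...
for t in threads:
    t.join()   # ...then wait; the joins overlap with the running threads
cost = time.time() - start
print('cost:' + str(cost))  # roughly 0.02s plus thread-creation overhead
```

Joining inside the start loop would block on each thread in turn and reproduce serial execution, which is exactly the pitfall described above.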
The final result is about 0.08s, obviously much better than serial execution. But the program's efficiency does not grow linearly with the number of threads, because in extreme cases the overhead of creating and switching threads eats up the gains. This approach should be used with caution when the number of threads is not bounded.
Improvement Scenario: Thread pool
To address the shortcoming of unbounded thread creation described above, a thread pool is used here.
```python
import time
import threading
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

mutex = threading.Lock()  # lock so the print output is not interleaved

def wget():
    time.sleep(0.02)  # simulate network IO
    mutex.acquire()  # acquire the lock
    print(threading.current_thread())
    mutex.release()  # release the lock

size = 40    # thread pool size
count = 100  # make 100 requests
start = time.time()  # start time
pool = ThreadPoolExecutor(max_workers=size)  # thread pool object
tasks = [pool.submit(wget) for i in range(count)]
wait(tasks, return_when=ALL_COMPLETED)  # wait for all tasks to finish
end = time.time()  # end time
print('spend:' + str(end - start))
```
Testing showed that the elapsed time was minimal, about 0.08s, when the pool size was set to 40, on par with the previous approach. But as the number of tasks keeps growing, the thread pool's stability shows its value, since the number of worker threads stays bounded.
Improved scenario: Coroutines (asyncio)
```python
import asyncio
import time

async def wget(flag):  # the async keyword marks this as a coroutine
    await asyncio.sleep(0.02)  # simulate network IO; await is like yield from
    print(flag)

count = 100  # make 100 requests
start = time.time()  # start time
loop = asyncio.get_event_loop()  # event loop object
tasks = [wget(i) for i in range(count)]
loop.run_until_complete(asyncio.wait(tasks))  # wait for all coroutines
loop.close()
end = time.time()  # end time
print('spend:' + str(end - start))
```
The result of this program was quite a shock to me: without any multithreading at all, everything runs on a single thread, and it takes 0.06s, the least of all the approaches. The more exciting advantage is that the cost of creating a coroutine is completely negligible compared to a thread, which means far more concurrent tasks can be handled with coroutines than with threads.
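On Python 3.7 and later, the same test is usually written with `asyncio.run` and `asyncio.gather` instead of managing the event loop by hand. A sketch of the equivalent code:

```python
import asyncio
import time

async def wget(flag):
    await asyncio.sleep(0.02)  # simulate network IO
    return flag

async def main(count):
    # gather schedules all coroutines concurrently and preserves order
    return await asyncio.gather(*(wget(i) for i in range(count)))

start = time.time()
results = asyncio.run(main(100))  # creates and closes the loop for us
spend = time.time() - start
print('spend:' + str(spend))
```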
Summary
Clearly, coroutines come out ahead in these tests. If I find the time in the future, I will also test multiprocessing and a combination of multiprocessing with coroutines.
Playing with Python (7): Comparing Python multiprocessing and multithreading