GIL
In Python, multi-threading performs very poorly for a historical reason: the GIL (Global Interpreter Lock). The GIL means that a Python process can use at most one CPU core at any moment, and its scheduling is simple and crude: with multiple threads, each thread runs for a short interval t, is then forcibly suspended, and another thread takes over; this repeats until all threads are finished.
This also makes it impossible to exploit "locality" in the computer system effectively: the frequent thread switching is unfriendly to the CPU cache and wastes resources.
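A quick way to see the effect is a minimal sketch (not from the original article; exact numbers depend on your machine): run a pure-CPU countdown twice serially, then in two threads, and compare wall-clock time. On CPython the threaded version is typically no faster, and often slower, because the GIL serializes bytecode execution:

import threading
import time

def countdown(n):
    # pure CPU work; only one thread can execute Python bytecode at a time
    while n > 0:
        n -= 1

N = 10 ** 7

start = time.time()
countdown(N)
countdown(N)
serial = time.time() - start

start = time.time()
t1 = threading.Thread(target=countdown, args=(N,))
t2 = threading.Thread(target=countdown, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.time() - start

print("serial: %.2fs, threaded: %.2fs" % (serial, threaded))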
The CPython developers reportedly did build a GIL-free interpreter at one point, but its performance was worse than the GIL version, so it was abandoned. Later, the official recommendation became "multi-process instead of multi-thread", and Python 3 ships packages such as concurrent.futures, so our programs can be both simple and performant.
Multi-process / multi-thread + Queue
In general, the rule of thumb for writing concurrent programs in Python is: use multiple processes for compute-intensive tasks, and multiple processes or multiple threads for IO-intensive tasks. In addition, sharing resources between threads forces you through a series of troublesome steps such as synchronization locks, and the resulting code is not intuitive. A better idea is the multi-process / multi-thread + Queue approach, which avoids that troublesome and inefficient locking entirely, as the short sketch below illustrates.
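A minimal, self-contained illustration (not from the original article) of why a Queue removes the need for explicit locks: Queue is internally synchronized, so worker threads hand data to each other without the caller ever touching a Lock.

import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

q = queue.Queue()
done = object()  # sentinel telling workers to stop

def worker():
    while True:
        item = q.get()   # Queue does its own locking internally
        if item is done:
            break
        # process item here; workers communicate only through the queue,
        # so there is no shared mutable state and no explicit Lock
        q.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for item in ['a', 'b', 'c']:
    q.put(item)
for _ in threads:
    q.put(done)
for t in threads:
    t.join()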
Below, the Queue + multi-process approach is applied in Python 2 to an IO-intensive task.
Suppose you need to download and parse the content of many web pages. A single process is too slow, so multi-processing or multi-threading is called for.
We initialize a tasks queue that holds a series of destination URLs, then start one worker process per CPU core; each worker pulls a task from tasks and executes it, putting the output into a results queue. Finally the main process waits for the tasks queue to drain and parses everything in results.
Here is the main logic:
# -*- coding: utf-8 -*-
# IO-intensive task: download multiple web pages simultaneously
# using Queue + multiprocessing; because the task is IO-bound,
# the threading module would also work
import multiprocessing

def main():
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    cpu_count = multiprocessing.cpu_count()  # number of workers == number of CPU cores
    # The main process creates the workers right away, but since the
    # blocking queue tasks starts out empty, they all block at first.
    create_process(tasks, results, cpu_count)
    add_tasks(tasks)       # start adding tasks to the tasks queue
    parse(tasks, results)  # the main process waits for the workers, then parses the results

def create_process(tasks, results, cpu_count):
    for _ in range(cpu_count):
        p = multiprocessing.Process(target=_worker, args=(tasks, results))  # one worker per core
        p.daemon = True  # daemon workers exit together with the main process
        p.start()

def _worker(tasks, results):
    while True:  # not an endless hang: daemon=True lets this loop die with the main process
        try:
            task = tasks.get()  # blocks while tasks is empty
            result = _download(task)
            results.put(result)  # note: download exceptions are not handled here
        finally:
            tasks.task_done()

def add_tasks(tasks):
    for url in get_urls():  # get_urls() returns a list of URLs
        tasks.put(url)

def parse(tasks, results):
    try:
        tasks.join()
    except KeyboardInterrupt as err:
        print "Tasks has been stopped!"
        print err
    while not results.empty():
        _parse(results)

if __name__ == '__main__':
    main()
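The example references _download(), get_urls(), and _parse(), which the original leaves out. A minimal sketch of what they might look like (hypothetical implementations, using urllib2 since the example targets Python 2):

import urllib2

def _download(url):
    # hypothetical: fetch one page and return its body
    return urllib2.urlopen(url, timeout=10).read()

def get_urls():
    # hypothetical: in practice the URL list would come from a crawl
    # frontier, a database, or a configuration file
    return ['http://example.com/page/%d' % i for i in range(20)]

def _parse(results):
    # hypothetical: pull one downloaded page off the results queue and process it
    page = results.get()
    print len(page)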
Using the concurrent.futures package in Python 3
In Python 3 you can use the concurrent.futures package to write multi-threaded / multi-process code more comfortably. It feels quite similar to Java's concurrent framework.
For example, here is a simple code sample:
import concurrent.futures

def handler():
    futures = set()
    # cpu_count, tasks and get_task() are assumed to be defined elsewhere;
    # each task yielded by get_task() must be a callable
    with concurrent.futures.ProcessPoolExecutor(max_workers=cpu_count) as executor:
        for task in get_task(tasks):
            future = executor.submit(task)
            futures.add(future)
    return futures

def wait_for(futures):
    result = None
    try:
        for future in concurrent.futures.as_completed(futures):
            err = future.exception()
            if err is None:
                result = future.result()
            else:
                raise err
    except KeyboardInterrupt as e:
        for future in futures:
            future.cancel()
        print("Task has been canceled!")
        print(e)
    return result
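For the IO-bound download scenario from earlier, an arguably simpler variant uses ThreadPoolExecutor together with executor.map. This is a sketch under the assumption that plain HTTP GETs are all we need; the example.com URLs are placeholders:

import concurrent.futures
import urllib.request

URLS = ['http://example.com/a', 'http://example.com/b']  # placeholder URLs

def download(url):
    # IO-bound work releases the GIL while waiting on the network,
    # so a thread pool is usually a good fit here
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()

def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # executor.map preserves input order and re-raises worker exceptions
        for url, page in zip(URLS, executor.map(download, URLS)):
            print(url, len(page))

if __name__ == '__main__':
    main()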
Summary
If a large Python project were written in this style, the efficiency would still be too low; many existing Python frameworks are more efficient in practice.
But for your own small programs, writing them this way works nicely. :)