How to Write Concurrent Programs in Python
GIL
In Python, for historical reasons (the GIL), multithreading performs disappointingly. The GIL (Global Interpreter Lock) lets only one thread execute Python bytecode at any moment, so a multithreaded Python program effectively uses only one CPU core, and its scheduling is simple and crude: each thread runs for a short time slice, is then forcibly suspended, and another thread runs, until all threads are finished.
This makes it impossible to exploit the "locality" of the computer system: frequent thread switching is unfriendly to the CPU cache and wastes resources.
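To see the effect concretely, here is a minimal sketch (the countdown workload and the worker count of four are illustrative, not from this article) comparing a CPU-bound task run on four threads versus four processes; under the GIL, the threaded version gains nothing:

# Illustrative sketch: CPU-bound work on threads vs. processes.
# The workload (count_down) and N are arbitrary stand-ins.
import time
import threading
import multiprocessing

N = 10 ** 7

def count_down(n):
    # pure-Python CPU-bound loop; threads cannot run this in parallel
    while n > 0:
        n -= 1

def run_with(worker_cls):
    # worker_cls is threading.Thread or multiprocessing.Process;
    # both accept the same target/args interface
    workers = [worker_cls(target=count_down, args=(N,)) for _ in range(4)]
    start = time.time()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.time() - start

if __name__ == '__main__':
    print('4 threads:   %.2fs' % run_with(threading.Thread))
    print('4 processes: %.2fs' % run_with(multiprocessing.Process))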
There have been attempts at a GIL-free Python interpreter, but none has performed as well as the GIL-based one. The official advice later became "use multiprocessing instead of multithreading", and Python 3 added the concurrent.futures package, which lets us write concurrent programs that are both simple and performant.
Multi-process/multi-thread + Queue
In general, the rule of thumb for concurrent Python is: use multiple processes for CPU-intensive tasks, and multiple processes or multiple threads for IO-intensive tasks. Also, because sharing resources requires synchronization locks and other troublesome machinery, the code quickly becomes unintuitive; a better idea is the multi-process/multi-thread + Queue approach, which avoids the hassle and inefficiency of explicit locking (see the sketch right after this paragraph).
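As a minimal sketch of the thread + Queue idea (Python 3 syntax; the URLs and the fake "work" are placeholders), note how the queue itself does all the synchronization, so no explicit locks appear:

# Illustrative sketch: worker threads coordinated only through queues.
import queue
import threading

def worker(tasks, results):
    while True:
        url = tasks.get()              # blocks until a task is available
        results.put((url, len(url)))   # stand-in for real download/parse work
        tasks.task_done()

tasks = queue.Queue()
results = queue.Queue()
for _ in range(4):
    t = threading.Thread(target=worker, args=(tasks, results))
    t.daemon = True                    # workers exit when the main thread does
    t.start()

for url in ['http://example.com/%d' % i for i in range(10)]:
    tasks.put(url)
tasks.join()                           # wait until every task is marked done

while not results.empty():
    print(results.get())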
In Python 2, Queue + multiprocessing can handle an IO-intensive task like this: suppose we need to download and parse many web pages. A single process is far too slow, so multiple processes/threads are the obvious move.
We initialize a tasks queue and fill it with a series of dest_urls, then start one worker process per CPU core; each worker fetches a task from tasks and executes it, putting the result into a results queue. Finally, the main process parses everything in results and closes both queues.
The main logic is shown below.
# -*- coding: utf-8 -*-
# IO-intensive task: multiple processes download multiple web pages at once.
# Uses Queue + multiprocessing; because the task is IO-intensive, the
# threading module would also work here.
import multiprocessing

def main():
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    cpu_count = multiprocessing.cpu_count()  # number of processes = number of CPU cores
    # The main process creates the workers right away, but tasks is still
    # empty, so all the sub-processes block on the queue.
    create_process(tasks, results, cpu_count)
    add_tasks(tasks)        # start adding tasks to the queue
    parse(tasks, results)   # the main process waits for the workers to finish

def create_process(tasks, results, cpu_count):
    for _ in range(cpu_count):
        p = multiprocessing.Process(target=_worker, args=(tasks, results))
        p.daemon = True     # all workers end when the main process ends
        p.start()

def _worker(tasks, results):
    while True:             # daemon=True above means this infinite loop is harmless
        try:
            task = tasks.get()      # blocks if tasks is empty
            result = _download(task)
            results.put(result)     # note: some exceptions are not handled here
        finally:
            tasks.task_done()

def add_tasks(tasks):
    for url in get_urls():  # get_urls() returns a list of urls
        tasks.put(url)

def parse(tasks, results):
    try:
        tasks.join()
    except KeyboardInterrupt as err:
        print "Tasks has been stopped!"
        print err
    while not results.empty():
        _parse(results)

if __name__ == '__main__':
    main()
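The example leaves _download, get_urls, and _parse undefined. The stubs below are hypothetical stand-ins (still Python 2, using urllib2) just to make the sketch runnable; the original article does not specify them:

# Hypothetical helpers for the example above; the URLs are placeholders.
import urllib2

def get_urls():
    # return the list of pages to download
    return ['http://example.com/page/%d' % i for i in range(20)]

def _download(url):
    # fetch one page; real code would add timeouts and error handling
    return url, urllib2.urlopen(url).read()

def _parse(results):
    # pull one result off the queue and process it
    url, html = results.get()
    print url, len(html)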
Using the concurrent.futures package in Python 3
In Python 3, the concurrent.futures package lets you write much simpler and easier-to-use multi-threaded/multi-process code. It feels similar to Java's java.util.concurrent framework (perhaps borrowed from it?).
For example, consider the following simple code:
def handler():
    futures = set()
    with concurrent.futures.ProcessPoolExecutor(max_workers=cpu_count) as executor:
        for task in get_task(tasks):
            future = executor.submit(task)
            futures.add(future)

def wait_for(futures):
    try:
        for future in concurrent.futures.as_completed(futures):
            err = future.exception()
            if not err:
                result = future.result()
            else:
                raise err
    except KeyboardInterrupt as e:
        for future in futures:
            future.cancel()
        print("Task has been canceled!")
        print(e)
    return result
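The snippet above leaves cpu_count, tasks, and get_task undefined. A self-contained variant of the same pattern might look like the sketch below (the fetch function and the URLs are illustrative assumptions, not from the original article):

# A minimal, runnable sketch of the concurrent.futures pattern (Python 3).
import concurrent.futures
import multiprocessing
import urllib.request

def fetch(url):
    # download one page and return its size; module-level so it is picklable
    with urllib.request.urlopen(url, timeout=10) as resp:
        return url, len(resp.read())

def main():
    urls = ['http://example.com/%d' % i for i in range(8)]
    # IO-bound work, so ThreadPoolExecutor would also do; keep
    # ProcessPoolExecutor for CPU-bound tasks.
    with concurrent.futures.ProcessPoolExecutor(
            max_workers=multiprocessing.cpu_count()) as executor:
        futures = {executor.submit(fetch, url) for url in urls}
        for future in concurrent.futures.as_completed(futures):
            if future.exception() is None:
                print(future.result())
            else:
                print('failed:', future.exception())

if __name__ == '__main__':
    main()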
Summary
Of course, if large Python projects were written entirely in this hand-rolled style, efficiency would suffer; Python has many existing frameworks that handle concurrency more effectively. Still, it is good enough for writing some "cool" programs of your own. :)