Which is faster, Python multi-process or multi-thread? (Detailed explanation)
Test environment: Python 3.6
Modules: threading and multiprocessing
Hardware: quad-core CPU + Samsung 250G-850-SSD
Ever since I started doing multi-process and multi-thread programming, I have never been clear on which one is faster. Many people on the Internet say that Python multi-process is faster because of the GIL (Global Interpreter Lock). But when I wrote code and timed it, multithreading came out faster, so what is going on? I have recently been working on word segmentation, and the original code was too slow, so I wanted to explore effective ways to speed it up (the code and results are at the end of this article).
Here is a diagram of the program results showing which is faster, threads or processes.
Some definitions
Parallelism means that two or more events happen at the same instant. Concurrency means that two or more events happen within the same time interval.
A thread is the smallest unit of execution that the operating system can schedule. It is contained in a process and is the actual unit of work within that process. A process is a running instance of a program.
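To make the thread/process relationship a little more concrete, here is a small sketch. The names `record` and `run_threads` are made up for this illustration. It only shows that threads started by a program report the same process ID as the main thread and can write to the same list.

```python
import os
import threading


def record(tag, seen, lock):
    # Each thread stores its tag, its thread id, and the id of the process it lives in.
    with lock:
        seen.append((tag, threading.get_ident(), os.getpid()))


def run_threads(count=3):
    seen = []                 # one list, shared by all threads of this process
    lock = threading.Lock()   # protects appends to the shared list
    threads = [threading.Thread(target=record, args=(i, seen, lock))
               for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print("main pid:", os.getpid())
    for tag, thread_id, pid in run_threads():
        print("thread", tag, "ident", thread_id, "pid", pid)
```

Every row should carry the same pid as the main program, because the threads all live inside one process.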
How it works
In Python, each thread has to acquire the GIL, execute code, and then release the GIL. Because only one thread can hold the GIL at any moment, multiple threads cannot execute Python code at the same instant. Multithreading is therefore concurrent rather than parallel: multiple events take place within the same time interval.
Each process, however, has its own GIL, so processes can run in parallel. In theory, then, multi-process makes better use of the resources of a multi-core CPU.
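A rough way to see the effect of the GIL is to time the same pure-Python loop run several times in a row and then spread over several threads. The sketch below assumes a standard CPython build with the GIL; `count_down`, `time_sequential`, `time_threaded` and the workload size are names and values I picked for illustration.

```python
import sys
import threading
import time


def count_down(n):
    # Pure-Python loop; a thread needs the GIL for every iteration.
    while n > 0:
        n -= 1


def time_sequential(n, repeats):
    # Run the loop `repeats` times, one after another, in the main thread.
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, repeats):
    # Run the same amount of work, but spread over `repeats` threads.
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2000000  # assumed workload size, chosen only so the run takes a moment
    print("GIL switch interval (s):", sys.getswitchinterval())
    print("sequential:", time_sequential(n, 4))
    print("threaded:  ", time_threaded(n, 4))
```

With the GIL in place, the threaded run usually does not beat the sequential one for this kind of loop. The exact numbers depend on the machine and the Python version.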
Practical problems
Python multithreading shows up often in online tutorials, for example web crawler tutorials and port scanning tutorials.
Take port scanning as an example. If you also implement the following script with multiple processes and compare the two, you will find that the multithreaded version is faster. Doesn't that contradict our analysis?
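import sys
import threading
from socket import *

# Target host: localhost by default, or the first command-line argument
host = "127.0.0.1" if len(sys.argv) == 1 else sys.argv[1]
portList = [i for i in range(1, 1000)]
scanList = []
lock = threading.Lock()  # keeps output from different threads from interleaving
print('Please wait... From ', host)

def scanPort(port):
    # Create the socket before the try block so tcp always exists in finally
    tcp = socket(AF_INET, SOCK_STREAM)
    try:
        tcp.connect((host, port))
    except OSError:
        pass  # connection failed: the port is closed or unreachable
    else:
        with lock:
            print('[+]port', port, 'open')
    finally:
        tcp.close()

# One thread per port: create them all, start them all, then wait for them all
for p in portList:
    t = threading.Thread(target=scanPort, args=(p,))
    scanList.append(t)
for t in scanList:
    t.start()
for t in scanList:
    t.join()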
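For the comparison, a multi-process version of the scanner could look roughly like the sketch below. This is my own guess at how one might write it, not the original script: the names `scan_port` and `scan_with_processes`, the 0.5 second timeout and the pool size of 4 are all assumptions.

```python
import multiprocessing
import socket


def scan_port(target):
    """Return (port, is_open) for one (host, port) pair."""
    host, port = target
    try:
        # Try to open a TCP connection; the 0.5 s timeout is an assumed value.
        with socket.create_connection((host, port), timeout=0.5):
            return port, True
    except OSError:
        # Refused, timed out, or unreachable: treat the port as not open.
        return port, False


def scan_with_processes(host, ports, workers=4):
    # Each worker process takes (host, port) pairs from the pool's task queue.
    targets = [(host, p) for p in ports]
    with multiprocessing.Pool(workers) as pool:
        results = pool.map(scan_port, targets)
    return [port for port, is_open in results if is_open]


if __name__ == "__main__":
    print("open ports:", scan_with_processes("127.0.0.1", range(1, 1000)))
```

Timing this against the threaded script on the same host is one way to check the claim for yourself. Port scanning spends most of its time waiting on the network, so the extra cost of starting processes is not paid back by parallel computation.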
Which is faster?
Because of Python's lock (the GIL), threads spend resources competing for the lock and switching between one another. So let's make a bold guess:
For CPU-intensive tasks, multi-process is faster or gives better results. For IO-intensive tasks, multi-thread can effectively improve efficiency.
Let's take a look at the following code:
import time
import threading
import multiprocessing

max_process = 4
max_thread = max_process

def fun(n, n2):
    # cpu-intensive: nested loops doing only arithmetic
    for i in range(0, n):
        for j in range(0, int(n * n2)):
            t = i * j

def thread_main(n2):
    thread_list = []
    for i in range(0, max_thread):
        t = threading.Thread(target=fun, args=(50, n2))
        thread_list.append(t)
    start = time.time()
    print('[+] multi-thread start')
    for t in thread_list:
        t.start()
    for t in thread_list:
        t.join()
    print('[-] multi-thread use', time.time() - start, 's')

def process_main(n2):
    p = multiprocessing.Pool(max_process)
    for i in range(0, max_process):
        p.apply_async(func=fun, args=(50, n2))
    # Note: the timer starts after the pool is created and the tasks are submitted
    start = time.time()
    print('[+] multi-process start')
    p.close()  # close the process pool: no more tasks will be submitted
    p.join()   # wait for all sub-processes to finish
    print('[-] multi-process use', time.time() - start, 's')

if __name__ == '__main__':
    print("[++] When n = 50, n2 = 0.1:")
    thread_main(0.1)
    process_main(0.1)
    print("[++] When n = 50, n2 = 1:")
    thread_main(1)
    process_main(1)
    print("[++] When n = 50, n2 = 10:")
    thread_main(10)
    process_main(10)
The result is as follows:
As you can see, the more CPU work there is (more loop iterations), the bigger the gap between multi-thread and multi-process becomes. This confirms our conjecture.
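The benchmark above only covers the CPU-intensive half of the guess. For the IO-intensive half, a minimal sketch could use `time.sleep` as a stand-in for a blocking network or disk call (an assumption made so the example needs no real IO). The names `fake_io`, `run_sequential` and `run_threaded` are made up for illustration.

```python
import threading
import time


def fake_io(delay):
    # time.sleep stands in for a blocking IO call; the GIL is released while it waits.
    time.sleep(delay)


def run_sequential(delay, count):
    # Do the waits one after another in the main thread.
    start = time.perf_counter()
    for _ in range(count):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(delay, count):
    # Do the same waits in `count` threads so they can overlap.
    threads = [threading.Thread(target=fake_io, args=(delay,))
               for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential:", run_sequential(0.2, 4), "s")
    print("threaded:  ", run_threaded(0.2, 4), "s")
```

Because a waiting thread gives up the GIL, the waits overlap, and the threaded run should finish in roughly the time of one wait instead of four.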
CPU-intensive and IO-intensive
1. CPU-intensive code (various loops, counting, and so on)
2. IO-intensive code (file processing, web crawlers, and so on)
How to tell:
1. Check the CPU usage and the hard disk I/O read/write speed.
2. Mostly computation -> CPU-intensive; mostly waiting (such as a web crawler) -> IO-intensive
3. Search Baidu
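The first two checks can also be approximated from inside Python by comparing the CPU time a task uses with the wall-clock time it takes. This is only a rough sketch under my own assumptions: `cpu_ratio`, `busy_loop` and `waiting_task` are hypothetical helpers, and `time.process_time` counts CPU time for the whole process, so it is only meaningful when nothing else is running in it.

```python
import time


def cpu_ratio(task, *args):
    """Return CPU time divided by wall-clock time for one call of task."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def busy_loop(n):
    # CPU-intensive stand-in: nothing but arithmetic.
    total = 0
    for i in range(n):
        total += i * i
    return total


def waiting_task(seconds):
    # IO-intensive stand-in: the process just waits.
    time.sleep(seconds)


if __name__ == "__main__":
    print("busy loop ratio:", cpu_ratio(busy_loop, 2000000))
    print("waiting ratio:  ", cpu_ratio(waiting_task, 0.2))
```

A ratio close to 1 suggests the task is mostly computing (CPU-intensive), while a ratio close to 0 suggests it is mostly waiting (IO-intensive).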