Python multithreading vs. multiprocessing: which is faster?

Source: Internet
Author: User
Tags: cpu usage

Test environment
    • Python 3.6
    • The threading and multiprocessing modules
    • Quad-core CPU + Samsung 850 SSD (250 GB)

Ever since I started programming with multiple processes and multiple threads, I have not been sure which is faster. Many people online say that in Python multiple processes are faster because of the GIL (Global Interpreter Lock). But when I wrote and timed my own code, multithreading came out faster, so what is going on? I have recently been doing word segmentation work; the original code was too slow and I wanted to speed it up, so I set out to find a method that actually works (the code is at the end of the article).

First, a chart of the program's results, showing whether threads or processes are faster.

Some definitions

Parallelism means that two or more events occur at the same instant. Concurrency means that two or more events occur within the same time interval.

A thread is the smallest unit of execution that the operating system can schedule. It is contained in a process and is the unit that actually runs inside the process. A process is a running instance of a program.

Implementation process

Inside Python, a thread must first acquire the GIL, then execute code, and finally release the GIL. Because of the GIL, multiple threads cannot execute Python code at the same instant. Multithreading is therefore really a concurrent implementation: multiple events occur within the same time interval, but not simultaneously.
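The effect described above can be observed with a small timing sketch. It assumes a standard CPython build with the GIL enabled; the names `burn` and `timed_threads` and the workload size are illustrative and do not come from the original code. On such a build, two threads doing twice the total work would be expected to take roughly twice as long as one thread, rather than the same time.

```python
import threading
import time

def burn(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL while it computes."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed_threads(num_threads, n):
    """Run burn(n) in num_threads threads and return the elapsed wall time in seconds."""
    threads = [threading.Thread(target=burn, args=(n,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 2_000_000  # arbitrary workload size, chosen only for illustration
    print("1 thread :", timed_threads(1, n), "s")
    print("2 threads:", timed_threads(2, n), "s")
```

The exact numbers depend on the machine and the interpreter version, so no expected output is given here.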

Each process, however, has its own independent GIL, so processes can run in parallel. On a multi-core CPU, using multiple processes should therefore in theory make more effective use of the hardware.
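As a rough illustration that pool workers are separate processes (each with its own interpreter, and so its own GIL), the sketch below collects the process IDs seen by tasks in a `multiprocessing.Pool`. The helper names `worker_pid` and `collect_worker_pids` are hypothetical and not part of the original article.

```python
import multiprocessing
import os

def worker_pid(_):
    """Return the PID of whichever process executes this task."""
    return os.getpid()

def collect_worker_pids(num_workers=2, num_tasks=8):
    """Run num_tasks trivial tasks in a pool and return the set of worker PIDs seen."""
    with multiprocessing.Pool(num_workers) as pool:
        return set(pool.map(worker_pid, range(num_tasks)))

if __name__ == "__main__":
    pids = collect_worker_pids()
    print("parent PID :", os.getpid())
    print("worker PIDs:", sorted(pids))
```

The worker PIDs differ from the parent PID, which is what allows the operating system to schedule the workers on different cores at the same instant.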

Real problems

Python multithreading shows up often in tutorials on the web, such as web crawler tutorials and port scanning tutorials.

Take port scanning as an example: if you reimplement the following multithreaded script with multiple processes, you will find that the multithreaded version is faster. Doesn't that contradict our analysis?

    import sys
    import threading
    from socket import socket, AF_INET, SOCK_STREAM

    host = "127.0.0.1" if len(sys.argv) == 1 else sys.argv[1]
    portList = list(range(1, 1000))
    scanList = []
    lock = threading.Lock()  # keeps output from different threads from interleaving
    print('Please wait... From', host)

    def scanPort(port):
        tcp = socket(AF_INET, SOCK_STREAM)
        try:
            tcp.connect((host, port))
        except OSError:
            pass  # port is closed or unreachable
        else:
            with lock:
                print('[+] port', port, 'open')
        finally:
            tcp.close()

    # Create one thread per port, start them all, then wait for all to finish.
    for p in portList:
        t = threading.Thread(target=scanPort, args=(p,))
        scanList.append(t)
    for t in scanList:
        t.start()
    for t in scanList:
        t.join()

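For comparison, here is one possible multi-process variant of the scanner. This is a sketch, not the author's code: the function names `port_is_open` and `scan`, the pool size of 4, and the 0.5 second connection timeout are all assumptions made for illustration.

```python
import socket
import sys
from multiprocessing import Pool

def port_is_open(target):
    """Try to connect to (host, port); return (port, True) on success, else (port, False)."""
    host, port = target
    try:
        # The 0.5 s timeout is an assumed value, not taken from the original script.
        with socket.create_connection((host, port), timeout=0.5):
            return port, True
    except OSError:
        return port, False

def scan(host, ports, num_processes=4):
    """Scan the given ports with a pool of worker processes and return the open ones."""
    targets = [(host, p) for p in ports]
    with Pool(num_processes) as pool:
        results = pool.map(port_is_open, targets)
    return [port for port, is_open in results if is_open]

if __name__ == "__main__":
    host = "127.0.0.1" if len(sys.argv) == 1 else sys.argv[1]
    for port in scan(host, range(1, 1000)):
        print("[+] port", port, "open")
```

Timing this against the threaded script is a direct way to check the claim above on your own machine. Each connection attempt spends most of its time waiting on the network, so extra processes mainly add startup and communication overhead.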
Who's faster?

Because of the GIL, Python threads compete for the lock and keep switching between one another, and all of that consumes resources. So, let's make a bold guess:

For CPU-intensive tasks, multiple processes are faster, while for IO-intensive tasks, multithreading can effectively improve efficiency.
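The IO-intensive half of this guess can be illustrated with a small sketch in which `time.sleep` stands in for a blocking IO call (a thread releases the GIL while it sleeps or waits on IO). The names `fake_io`, `run_sequential`, and `run_threaded` and the 0.1 second delay are assumptions made for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(task_id, delay=0.1):
    """Stand-in for a blocking IO call; the GIL is released while sleeping."""
    time.sleep(delay)
    return task_id

def run_sequential(num_tasks):
    """Run the tasks one after another; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_io(i) for i in range(num_tasks)]
    return results, time.perf_counter() - start

def run_threaded(num_tasks):
    """Run the tasks in one thread each; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        results = list(pool.map(fake_io, range(num_tasks)))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    print("sequential:", run_sequential(4)[1], "s")
    print("threaded  :", run_threaded(4)[1], "s")
```

Because the waits overlap, the threaded run should finish in roughly the time of a single wait, while the sequential run takes about the sum of all waits.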


Let's take a look at the following code:

    import time
    import threading
    import multiprocessing

    max_process = 4
    max_thread = max_process

    def fun(n, n2):
        # CPU-intensive work
        for i in range(0, n):
            for j in range(0, int(n * n * n * n2)):
                t = i * j

    def thread_main(n2):
        thread_list = []
        for i in range(0, max_thread):
            t = threading.Thread(target=fun, args=(50, n2))
            thread_list.append(t)
        start = time.time()
        print('[+] much thread start')
        for i in thread_list:
            i.start()
        for i in thread_list:
            i.join()
        print('[-] much thread use', time.time() - start, 's')

    def process_main(n2):
        p = multiprocessing.Pool(max_process)
        for i in range(0, max_process):
            p.apply_async(func=fun, args=(50, n2))
        start = time.time()
        print('[+] much process start')
        p.close()  # close the process pool
        p.join()   # wait for all child processes to finish
        print('[-] much process use', time.time() - start, 's')

    if __name__ == '__main__':
        print("[++]when n=50,n2=0.1:")
        thread_main(0.1)
        process_main(0.1)
        print("[++]when n=50,n2=1:")
        thread_main(1)
        process_main(1)
        print("[++]when n=50,n2=10:")
        thread_main(10)
        process_main(10)


The results are as follows:

As you can see, the gap between threads and processes grows as CPU usage rises (that is, as the code loops more). This confirms our guess.

CPU-intensive and IO-intensive
    1. CPU-intensive code (all kinds of loops, counting, and so on)
    2. IO-intensive code (file processing, web crawlers, and so on)

How to tell which one you have:

    1. Look directly at CPU utilization and at disk IO read/write speed
    2. Mostly computation -> CPU-intensive; mostly waiting (such as a web crawler) -> IO-intensive
    3. Search online (for example, on Baidu) for more details
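One rough way to apply the first two checks in code is to compare CPU time with wall-clock time for a call. The helper `cpu_fraction` below is a hypothetical sketch, not an established tool; `compute` and `wait` are stand-ins for real workloads. A ratio near 1 suggests the call is CPU-intensive, and a ratio near 0 suggests it spends most of its time waiting.

```python
import time

def cpu_fraction(func, *args):
    """Return (CPU time) / (wall time) for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # process CPU time; excludes time spent sleeping
    func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0

def compute(n):
    """Stand-in for CPU-intensive work."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def wait(seconds):
    """Stand-in for IO-intensive work: the process just waits."""
    time.sleep(seconds)

if __name__ == "__main__":
    print("compute:", cpu_fraction(compute, 1_000_000))
    print("wait   :", cpu_fraction(wait, 0.2))
```

Note that `time.process_time` counts CPU time for the whole process, so the measurement is only meaningful when nothing else is running in other threads of the same process.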