Why Python Multithreading Can't Take Advantage of Multi-Core CPUs


The GIL and Python threads are entangled

What is the GIL, and what effect does it have on our Python programs? Let's start with a question: what is the CPU usage of the following Python program?

    # do not imitate at work -- danger :)
    def dead_loop():
        while True:
            pass

    dead_loop()

What's the answer: 100% CPU? Only on a single core -- and an antique CPU without hyper-threading at that. On my dual-core machine, this dead loop saturates only one core, i.e. it consumes only 50% of the CPU. So how do you make it occupy 100% of the CPU on a dual-core machine? The obvious answer is to use two threads -- after all, threads are exactly the mechanism for concurrently sharing CPU computing resources. Unfortunately, the answer is right, but things are not that simple. The following program runs a dead-loop thread in addition to the main thread:

    import threading

    def dead_loop():
        while True:
            pass

    # start a new dead-loop thread
    t = threading.Thread(target=dead_loop)
    t.start()

    # the main thread also enters a dead loop
    dead_loop()
    t.join()

By rights this should occupy two cores' worth of CPU, but when it actually runs nothing changes: usage is still only 50%. Why? Are Python threads not native operating-system threads? Open the system monitor and you'll see that this Python process, sitting at 50%, really is running two threads. So why can't these two dead loops use both cores? The culprit behind the scenes is the GIL.
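The effect is easy to measure. Below is a minimal benchmark sketch (the loop count and function names are my own choices): on stock CPython, running a CPU-bound loop in two threads is no faster than running it twice in one thread, and is often slightly slower due to lock contention.

```python
import threading
import time

def count(n):
    # pure CPU-bound work: no I/O, so the GIL is never released voluntarily
    while n > 0:
        n -= 1

N = 2_000_000

# run the work twice, sequentially, in a single thread
start = time.perf_counter()
count(N)
count(N)
sequential = time.perf_counter() - start

# run the same total work in two threads "in parallel"
start = time.perf_counter()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s  two threads: {threaded:.2f}s")
```

Exact numbers depend on your machine, but on a GIL-bound interpreter the two-thread time will not beat the sequential time.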

The Myth of the GIL: Pain and Happiness

GIL is short for Global Interpreter Lock -- a lock over the whole interpreter. In CPython, the mainstream implementation of Python, the GIL is a genuine global lock: before the interpreter executes any Python code it must acquire the lock, releasing it only when the code blocks on an I/O operation. If the program is purely computational and does no I/O, the interpreter releases the lock every 100 ticks to give other threads a chance to run (this number can be adjusted via sys.setcheckinterval()). So although CPython's thread library directly wraps the operating system's native threads, the CPython process as a whole runs only one thread at any given moment -- the one holding the GIL -- while the other threads wait for it to be released. This explains the result of our experiment above: despite two dead-loop threads and two physical CPU cores, the GIL forces the two threads to simply take turns, and total CPU usage even sits slightly below 50% because of the switching overhead.
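A historical note on that knob: sys.setcheckinterval belongs to Python 2. Python 3.2 replaced the tick counter with a time-based switch interval, and the old function was removed entirely in Python 3.9. A quick sketch of the modern equivalent:

```python
import sys

# Python 2 exposed sys.setcheckinterval (a bytecode-tick count);
# Python 3.2+ uses sys.setswitchinterval (a duration in seconds).
print(sys.getswitchinterval())   # the default is 0.005, i.e. 5 ms
sys.setswitchinterval(0.001)     # ask the interpreter to consider switching more often
print(sys.getswitchinterval())
```

A shorter interval makes thread switching more responsive at the cost of more switching overhead; it still does nothing to let two CPU-bound threads run in parallel.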

So it seems Python just isn't up to it: the GIL directly prevents CPython from using multiple physical cores to accelerate computation. Why was it designed this way? I'd guess it's largely a matter of history. In the 1990s, multi-core CPUs were still in the realm of science fiction; when Guido van Rossum created Python, he could hardly have imagined his language would someday run on CPUs with many cores, and in that era a single global lock was the simplest, most economical way to make multithreading safe. Simple and sufficient means it was an appropriate design (appropriate, rather than good or bad). The only surprise is how fast hardware evolved: the Moore's-law dividend for the software industry ran out sooner than expected. In just twenty years, programmers could no longer expect old software to run faster simply by upgrading the CPU. In the multi-core era, the free lunch of programming was over: a program that doesn't use concurrency to squeeze performance out of every core risks being left behind. That's true of software, and of languages too. So what was Python's response?

Python's response was simple: keep the status quo. The GIL is still there in the latest Python 3. The reasons for not removing it boil down to the following:

    • To master the ultimate skill, one must first go under the knife:

      CPython's GIL protects the interpreter's global and environment state. Remove it, and many finer-grained locks are needed to protect the interpreter's numerous global states -- or lock-free algorithms must be used instead. Either way, getting multithreading safety right becomes far harder than with a single GIL. And the thing being changed is a twenty-year-old CPython code tree, to say nothing of the countless third-party extensions that depend on the GIL. For the Python community, this would amount to self-mutilation followed by starting over.

    • Even after going under the knife, success is not guaranteed:

      Someone once built an experimental CPython that removed the GIL and added finer-grained locks. Real-world testing showed that this version suffered a substantial performance drop for single-threaded programs, beating the GIL version only once the number of physical CPUs exceeded a certain threshold. No wonder: a single thread needs no locks at all, and even with locking, one coarse-grained GIL is much cheaper than managing a large number of fine-grained locks. And most Python programs today are single-threaded. Moreover, in terms of demand, nobody chooses Python for its raw computational performance; even with multiple cores, its speed will never rival C/C++. Spending enormous effort to remove the GIL only to slow down most programs would defeat the purpose.

    • Is a language as excellent as Python really giving up on the multi-core era just because the change is hard and of dubious value? In fact, the most important reason for not changing is this: Python can succeed without going under the knife at all!

Other Martial Arts

Besides cutting out the GIL, there are other ways to let Python thrive in the multi-core era. Let's return to the first question of this article: how can that dead-loop Python script occupy 100% of the CPU on a dual-core machine? The simplest answer: run two copies of the dead-loop program! That is, use two Python processes, each occupying one CPU core. Indeed, multiple processes are a perfectly good way to use multiple CPUs. The catch is that processes have independent memory address spaces, which makes communicating between them much more cumbersome than between threads. For this reason Python 2.6 introduced multiprocessing, a multi-process standard library that brings multi-process programming close to the convenience of multithreading, greatly easing the embarrassment of the GIL's inability to use multiple cores.
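As a minimal sketch of the idea (the function name and workload are my own), multiprocessing lets CPU-bound work run in separate worker processes, each with its own interpreter and therefore its own GIL:

```python
from multiprocessing import Pool

def burn(n):
    # CPU-bound work; each worker process has its own interpreter and its own GIL
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # two worker processes can genuinely run on two cores at once
    with Pool(processes=2) as pool:
        results = pool.map(burn, [100_000, 100_000])
    print(results)
```

The Pool API looks much like mapping a function over a list, but each task is dispatched to a separate OS process, so the GIL of one worker never blocks another.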

That's just one way. If you don't want such a heavyweight multi-process solution, there is a more radical option: abandon Python for C/C++. Of course, you needn't really go that far -- you can write just the performance-critical parts in C/C++ as a Python extension and keep the rest in Python, rendering unto Python what is Python's and unto C what is C's. Computation-intensive code is generally written in C and integrated into Python scripts as an extension (the NumPy module is a typical example). Inside an extension you can create native C threads that don't hold the GIL, making full use of the CPU's computing resources. Writing a Python extension, however, is always somewhat involved. Fortunately, Python offers another mechanism for interoperating with C modules: ctypes.

Using ctypes to Bypass the GIL

Unlike a Python extension, ctypes lets Python call any exported function of a C dynamic library directly; all you have to write is a little Python code using ctypes. Best of all, ctypes releases the GIL before calling into the C function. So with ctypes and a C dynamic library, Python can take full advantage of the physical cores' computing power. Let's verify this in practice, this time writing the dead-loop function in C:

    extern "C" {
        void dead_loop()
        {
            while (1)
                ;
        }
    }

Compile the C code above into a dynamic library, libdead_loop.so (dead_loop.dll on Windows), then use ctypes to load the library in Python and call dead_loop from the main thread and from a new thread:

    from ctypes import *
    from threading import Thread

    lib = cdll.LoadLibrary("libdead_loop.so")
    t = Thread(target=lib.dead_loop)
    t.start()
    lib.dead_loop()

Look at the system monitor again: the Python interpreter process has two running threads, and both cores of the dual-core CPU are maxed out -- ctypes really delivers! Recall, though, that ctypes releases the GIL only before invoking the C function; the interpreter still acquires the GIL whenever it executes any Python code. So if you pass Python code to the C function as a callback, the GIL jumps back in as soon as the Python callback runs. For example:

    extern "C" {
        typedef void (*Callback)();

        void call(Callback callback)
        {
            callback();
        }
    }

And on the Python side:

    from ctypes import *
    from threading import Thread

    def dead_loop():
        while True:
            pass

    lib = cdll.LoadLibrary("libcall.so")
    CALLBACK = CFUNCTYPE(None)
    callback = CALLBACK(dead_loop)

    t = Thread(target=lib.call, args=(callback,))
    t.start()
    lib.call(callback)

Note the difference from the previous example: this time the dead loop happens in Python code (the dead_loop function), and the C code is only responsible for invoking the callback. Run it, and you'll find CPU usage is back to about 50% -- the GIL is at work again.

In fact, the example above also suggests another application of ctypes: writing automated test cases in Python. Through ctypes you can call a C module's interface directly and black-box test it -- even multithreaded-safety tests of the C interface are within ctypes' reach.
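As a minimal sketch of that idea, you don't even need to build your own library -- ctypes can call the platform's standard C library directly (POSIX shown here; on Windows you would load msvcrt instead):

```python
import ctypes
import ctypes.util

# find and load the platform's standard C library
# (on POSIX, ctypes.CDLL(None) would also expose its symbols)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# call an exported C function directly -- no extension module needed
print(libc.abs(-42))  # -> 42
```

The same loading-and-calling pattern works against any shared library you want to exercise from Python test code.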

Conclusion

Although CPython's thread library wraps the operating system's native threads, the existence of the GIL means it cannot exploit the computing power of multiple CPU cores. Fortunately, Python now has the Yi Jin Jing (multiprocessing), the Star-Absorbing Skill (the C extension mechanism), and the Nine Swords of Dugu (ctypes) -- enough to meet the challenge of the multi-core era. Whether or not the GIL ever gets cut out no longer matters so much, does it?

