An In-Depth Analysis of Python's GIL and Multithreaded Performance


What is the GIL?

The first thing to be clear about is that the GIL is not a feature of the Python language; it is a concept introduced by CPython, the reference implementation of the Python interpreter. An analogy: C++ is a language (syntax) standard that different compilers turn into executable code, with well-known examples such as GCC, Intel C++, and Visual C++.

Likewise, the same piece of Python code can be executed by different Python runtimes such as CPython, PyPy, and Psyco, and some of them, like Jython, have no GIL at all. However, CPython is the default Python runtime in most environments, so in many people's minds CPython is Python, and it is taken for granted that the GIL is a flaw of the Python language itself. So let's be clear: the GIL is not a Python feature, and Python can exist entirely without it.

Why does the GIL exist?

Due to physical constraints, the race among CPU manufacturers shifted from higher core frequencies to more cores. To use multi-core processors efficiently, multithreaded programming emerged, and with it the difficulties of data consistency and state synchronization between threads. Even the caches inside the CPU are no exception: to keep multiple caches coherent, vendors spent a great deal of effort, and inevitably paid some performance cost.

Python, of course, could not escape this. To take advantage of multiple cores, Python began to support multithreading, and the simplest way to keep data integrity and state synchronized between threads is to lock. Hence the GIL, one super lock. As more and more library developers accepted this setting, they began to rely heavily on it (that is, on the assumption that Python's built-in objects are thread-safe by default, with no need for extra memory locks or synchronization in their implementations).

Slowly, this arrangement was recognized as painful and inefficient. But by the time people tried to split up or remove the GIL, the large body of library code that depended heavily on it made that very hard to do.

To make an analogy: a "small project" like splitting MySQL's single buffer pool mutex into smaller locks took nearly five years across the major releases from 5.5 through 5.6 to 5.7, and it is still ongoing. If MySQL, a product backed by a company and a dedicated development team, has such a hard time, what can we expect of Python's highly community-driven team of core developers and contributors?

So, simply put, the existence of the GIL is largely a historical accident. If it had to be done over, the problem of multithreading would still need to be faced, but at least it could be handled more elegantly than with today's GIL.

The impact of the GIL

From the introduction above and the official definition, the GIL is undoubtedly a global exclusive lock. There is no doubt that a global lock has no small impact on the efficiency of multithreading; it can almost reduce Python to a single-threaded program.

A reader might say: as long as the global lock is released diligently, efficiency is not so bad. As long as the GIL is released during time-consuming IO operations, throughput can still improve; at worst, performance should be no lower than single-threaded. That is true in theory, but in practice? Python is worse than you think.
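The "release the GIL during IO" part is real and easy to observe. A minimal sketch, using time.sleep as a stand-in for blocking IO (blocking calls such as socket reads release the GIL the same way):

import time
from threading import Thread

def fake_io():
    # time.sleep releases the GIL, like any blocking IO call,
    # so the two "IO" threads overlap almost perfectly
    time.sleep(1)

start = time.time()
threads = [Thread(target=fake_io) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("Elapsed: {:.2f}s".format(time.time() - start))  # ~1s, not 2s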

Here is a comparison of Python's efficiency under multithreading versus a single thread. The test method is simple: a counter function that loops 100 million times, executed twice, once sequentially and once in two concurrent threads, comparing total execution time. The test environment is a dual-core Mac Pro. Note: to reduce the impact of thread-creation overhead on the results, the single-threaded code also uses threads, simply running them one after another to simulate a single thread.

Sequential execution in a single thread (single_thread.py)

The code is as follows:

#!/usr/bin/python
from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()
        t.join()  # join right away, so the two threads run one after another
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == "__main__":
    main()

Two threads executing concurrently (multi_thread.py)

The code is as follows:

#!/usr/bin/python
from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    thread_array = {}
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()
        thread_array[tid] = t  # don't join yet: let both threads run concurrently
    for i in range(2):
        thread_array[i].join()
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == "__main__":
    main()

[Figure: test results of single_thread.py vs multi_thread.py]

You can see that in the multithreaded case Python is 45% slower than with a single thread. According to the analysis above, even with the GIL serializing everything, multithreading should be about as efficient as single-threaded execution. How could the result be this bad?

Let's analyze the reasons through the implementation principles of the GIL.

Defects in the current GIL design

Scheduling based on opcode counts

According to the Python community, the operating system's own thread scheduling is very mature and stable, and there is no need to reimplement it. So a Python thread is simply a C-level pthread, scheduled by the operating system's scheduling algorithm (e.g. CFS on Linux). To let each thread get an even share of CPU time, Python counts the number of opcodes the current thread has executed and forces it to release the GIL once a threshold is reached. This in turn gives the operating system a chance to schedule another thread (the actual context switch, of course, is decided by the OS).

Pseudocode:

while True:
    acquire GIL
    for i in 1000:
        do something
    release GIL
    /* give the operating system a chance to do thread scheduling */

This pattern causes no problem with a single CPU core: any thread that is woken up can successfully acquire the GIL (thread scheduling only happens because the GIL was released). But when the CPU has multiple cores, trouble begins. As the pseudocode shows, there is almost no gap between "release GIL" and "acquire GIL", so by the time a thread on another core wakes up, the main thread has usually re-acquired the GIL already. The awakened thread can only waste CPU time watching the other thread run merrily with the GIL, until it reaches the switch point, gets descheduled, wakes up again, and waits again, in a vicious circle.

PS: Of course this implementation is primitive and ugly, and each Python version gradually improves the interaction between the GIL and thread scheduling, for example trying to hold the GIL across a thread context switch, or releasing the GIL while waiting on IO. But what does not change is that the GIL makes the already expensive operation of operating-system thread scheduling even more extravagant.

Extended reading: the GIL's impact

To understand the GIL's performance impact on multithreading intuitively, a test result chart is borrowed here (see the figure below). It shows two CPU-intensive threads executing on a dual-core CPU. The green portions are time when a thread is running useful computation; the red portions are time when a thread has been scheduled and woken up but cannot acquire the GIL, and therefore cannot do any useful work and just waits.

As the figure shows, the GIL prevents multithreaded programs from ever fully exploiting the concurrency of a multi-core CPU.

Can Python's IO-intensive threads at least benefit from multithreading? Look at the test result below. The colors mean the same as in the figure above; the white portions are time the IO thread spends waiting. When the IO thread receives a packet and triggers a switch, it still cannot obtain the GIL because of the CPU-intensive thread, and so it ends up waiting in an endless loop.


[Figure: GIL IO performance]

The simple conclusion: Python multithreading on a multi-core CPU only helps IO-intensive workloads; as soon as there is at least one CPU-intensive thread, multithreaded efficiency drops drastically because of the GIL.
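To make the mixed-workload case concrete, here is a minimal sketch (the loop sizes are arbitrary; on CPython 2's tick-based GIL the inflation is dramatic, while on CPython 3.2+ it is milder but still measurable):

import time
from threading import Thread

def cpu_task():
    # CPU-bound: holds the GIL except at forced switch points
    i = 0
    for _ in range(50000000):
        i += 1

def io_task():
    # each sleep releases the GIL, but on waking the thread must win
    # the GIL back from the CPU-bound thread before it can continue
    start = time.time()
    for _ in range(10):
        time.sleep(0.01)
    print("io_task wall time: {:.3f}s (ideal: ~0.1s)".format(time.time() - start))

t_cpu = Thread(target=cpu_task)
t_io = Thread(target=io_task)
t_cpu.start()
t_io.start()
t_cpu.join()
t_io.join()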

How to avoid being affected by the GIL

After saying all this, a post that never mentions solutions would be mere popular science, and a useless one. The GIL is so lame; is there any way around it? Let's look at the ready-made options.

Replacing threads with multiprocessing

The multiprocessing library exists largely to compensate for the thread module's inefficiency under the GIL. It fully replicates the interfaces that threading provides, which makes migration easy; the only difference is that it uses multiple processes rather than multiple threads. Each process has its own independent GIL, so processes do not fight over one GIL.
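Because the interface mirrors threading's, the earlier multithreaded test can be ported almost mechanically. A minimal sketch (on a dual-core machine this should finish in roughly half the sequential time, since each process owns its own GIL):

#!/usr/bin/python
from multiprocessing import Process
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    start_time = time.time()
    # same structure as multi_thread.py, with Process swapped in for Thread
    procs = [Process(target=my_counter) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == "__main__":
    main()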

Of course, multiprocessing is not a panacea. Its introduction increases the difficulty of data communication and synchronization in the program. Take our counter as an example: if we want multiple threads to accumulate into the same variable, with threading we declare a global variable, wrap the critical lines in a threading.Lock context, and we are done (as sketched below). With multiprocessing, processes cannot see each other's data, so the main process must declare a Queue and put/get values through it, or use shared memory. This extra implementation cost makes coding, already painful in multithreaded programs, even more painful. Readers interested in the specific difficulties can read further on the topic.
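A minimal sketch of both versions of the counter (the names and counts here are illustrative):

from threading import Thread, Lock
from multiprocessing import Process, Queue

counter = 0
counter_lock = Lock()

def thread_count(n):
    # threads share the global directly; the lock makes the
    # read-modify-write safe
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1

def process_count(n, q):
    # processes cannot see each other's globals, so each one counts
    # locally and ships its partial result back through the queue
    local = 0
    for _ in range(n):
        local += 1
    q.put(local)

if __name__ == "__main__":
    threads = [Thread(target=thread_count, args=(100000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("thread total:", counter)

    q = Queue()
    procs = [Process(target=process_count, args=(100000, q)) for _ in range(2)]
    for p in procs:
        p.start()
    total = sum(q.get() for _ in range(2))  # drain the queue before joining
    for p in procs:
        p.join()
    print("process total:", total)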

Using a different interpreter

As mentioned earlier, since the GIL is only a product of CPython, are other interpreters better off? Yes: interpreters like Jython and IronPython have no need for a GIL, thanks to the nature of the runtimes they are built on. However, because they are implemented on Java/C#, they also lose the chance to take advantage of the community's many C extension modules, so these interpreters have always been relatively niche. After all, between functionality and performance, everyone chooses the former in the early days; done is better than perfect.

So is it hopeless?

Of course not. The Python community is working very hard to continuously improve the GIL, and even to remove it, and every release makes progress. Interested readers can read further in the related slides.

Another improvement, the reworked GIL (the "new GIL" that landed in CPython 3.2):

- Change the switching granularity from counting opcodes to time slices (see the snippet below)
- Prevent the thread that last released the GIL from being scheduled again immediately
- Add thread priorities (a high-priority thread can force other threads to release the GIL they hold)
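The first item is visible at runtime in CPython 3.2+; the time slice can be inspected and adjusted:

import sys

# CPython 3.2+ hands the GIL off based on a time slice instead of an
# opcode count; the default slice is 5 milliseconds
print(sys.getswitchinterval())   # 0.005
sys.setswitchinterval(0.01)      # longer slices: fewer GIL hand-offs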

Summary

Python's GIL is in fact a trade-off between functionality and performance. It was quite reasonable for its time, and there are objective factors that make it hard to change. From this analysis, we can draw a few simple conclusions:

- Because of the GIL, only IO-bound scenarios gain better performance from multithreading
- For programs that need parallel computing performance, consider implementing the core parts as a C module, or simply implementing them in another language
- The GIL will continue to exist for a long time, but it will keep being improved
