Python multithreaded exploration

Source: Internet
Author: User
Tags: mutex

As we saw earlier, the main reason for the poor efficiency of Python multithreading is the GIL, the Global Interpreter Lock. Below we take a detailed look at what the GIL is and how to avoid its impact, so as to improve the execution efficiency of Python multithreading.
What is the GIL?
The first thing to make clear is that the GIL is not a feature of the Python language; it is a concept introduced by one particular implementation of the Python interpreter, CPython. An analogy: C++ is a language (syntax) standard that can be compiled into executable code by different compilers, such as GCC, Intel C++, or Visual C++. Python is the same: the same piece of code can be executed by different Python runtimes such as CPython, PyPy, or Psyco. An implementation like Jython has no GIL at all. However, because CPython is the default Python runtime in most environments, many people equate CPython with Python and take the GIL for granted as a flaw of the Python language. So let us be clear here: the GIL is not a Python feature, and Python does not depend on the GIL at all.

So what is the GIL in the CPython implementation? GIL stands for Global Interpreter Lock. To avoid being misled, let's look at the official explanation:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

Doesn't that sound bad? A mutex that prevents multiple threads from executing bytecode concurrently looks, at first glance, like a bug-grade global lock! Don't worry, we will analyze it step by step below.

Why does the GIL exist?
Due to physical constraints, the race between CPU vendors for ever-higher core frequencies has been replaced by a race for more cores. To make better use of multi-core processors, multithreaded programming appeared, and with it came the difficulties of data consistency and state synchronization between threads. Even the caches inside the CPU are no exception: to keep multiple caches in sync, vendors put in a great deal of effort, which inevitably brought some performance loss.

Python, of course, could not escape this. To take advantage of multiple cores, Python began to support multithreading. The simplest way to preserve data integrity and state synchronization between threads is to lock everything, and so the GIL, this one giant lock, was born. As more and more library developers accepted this design, they began to rely heavily on it (that is, on the assumption that Python's built-in objects are thread-safe by default, so their implementations need no additional locking or synchronization).
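The locking idea described above can be sketched at the application level with threading.Lock. This is a minimal illustrative counter, not CPython's internal locking:

```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment(n):
    """Increment the shared counter n times, taking the lock for each update."""
    global counter
    for _ in range(n):
        with lock:  # only one thread may mutate `counter` at a time
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000 -- the lock makes the increments race-free
```

Without the lock, the read-modify-write of `counter += 1` can interleave between threads and lose updates; the GIL makes the same kind of guarantee, but for the interpreter's own internal state.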

Gradually this design turned out to be painful and inefficient. But when people tried to split up or remove the GIL, it proved very hard to untangle the mass of library code that depends heavily on it. How hard? By analogy: a "small project" like MySQL took nearly five years, from version 5.5 through 5.6 to 5.7 and beyond, to split the single large buffer pool mutex into smaller locks, and the work continues. If it is that hard for MySQL, a product backed by a company with a dedicated development team, what about a highly community-driven team of core developers and contributors like Python's?

So the GIL's continued existence is largely a matter of history. If the design were done over today, the problems of multithreading would still have to be faced, but they would at least be handled more elegantly than with the current GIL.

What is the effect of the GIL?
From the introduction above and the official definition, the GIL is undoubtedly a global exclusive lock. There is no doubt that a global lock hurts the efficiency of multithreading; it almost reduces Python to a single-threaded program. The reader may say: as long as the lock is released diligently, efficiency need not be bad. As long as the GIL is released during time-consuming IO operations, multithreading can still improve throughput, or at least be no worse than a single thread. In theory, yes. In practice? Python is worse than you think.
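The "release during IO" claim can be sketched directly: blocking calls such as time.sleep release the GIL, so pure waits overlap across threads. Here time.sleep stands in for real blocking IO:

```python
import threading
import time

def fake_io(seconds):
    # time.sleep releases the GIL while blocking, just as a socket read would
    time.sleep(seconds)

start = time.time()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# The four 0.2s waits overlap, so total time is close to 0.2s, not 0.8s
print(f"elapsed: {elapsed:.2f}s")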

Below we compare the efficiency of Python in multi-threaded and single-threaded mode. The test method is simple: a counter function that loops 100 million times is executed twice, once sequentially in a single thread and once in two concurrent threads, and we compare the total execution time. The test environment is a dual-core Mac Pro. Note: to reduce the impact of the threading library's own overhead on the result, the single-threaded code also uses Thread objects; it simply runs the two calls one after the other to simulate a single thread.

Sequential execution in a single thread (single_thread.py):

#!/usr/bin/python

from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()
        t.join()  # wait for each thread to finish before starting the next
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()
Concurrent execution of two threads (multi_thread.py):

#!/usr/bin/python

from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    thread_array = {}
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()
        thread_array[tid] = t  # start both threads before joining either
    for i in range(2):
        thread_array[i].join()
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()
In this test, the single-threaded version took 11.5 seconds, while the multi-threaded version took 16.2 seconds.

Surprisingly, with two threads Python is 45% slower than a single thread. According to the earlier analysis, even with the GIL's global lock, serialized multithreading should perform about the same as a single thread. How could the result be this bad?

Let's analyze the reason through the GIL's implementation principle.

Flaws in the current GIL design

Scheduling based on opcode counting

In the view of the Python community, operating-system thread scheduling is already very mature and stable, so there is no need to reimplement it. A Python thread is therefore just a C-level pthread, scheduled by the operating system's own scheduling algorithm (for example, CFS on Linux). To give every thread a roughly equal share of CPU time, Python counts the number of bytecode instructions the current thread has executed and forces it to release the GIL once a threshold is reached. That, in turn, gives the operating system a chance to perform thread scheduling (although whether a context switch actually happens is up to the operating system).

In pseudocode:

while True:
    acquire GIL
    for i in 1000:
        do something
    release GIL
    /* give the operating system a chance to do thread scheduling */
This model causes no problem with only one CPU core. Any thread that wakes up can successfully acquire the GIL (a thread switch only happens when the GIL is released). But when the CPU has multiple cores, the problem appears. As the pseudocode shows, there is almost no gap between "release GIL" and "acquire GIL", so by the time a thread on another core wakes up, the main thread has usually re-acquired the GIL already. The awakened thread can only burn CPU time in vain, watching the other thread run off happily with the GIL, until it reaches its switch point, goes back to the wait-for-scheduling state, is woken again, and waits again, around and around in a vicious circle.

PS: This implementation is of course primitive and ugly, and each Python version gradually improves the interaction between the GIL and thread scheduling, for example trying to keep the GIL in the same thread across a context switch, or releasing the GIL while waiting on IO. But what cannot change is that the existence of the GIL makes the already expensive operation of OS thread scheduling even more extravagant.
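One concrete improvement: since Python 3.2 the GIL is handed off on a time slice rather than an opcode count, and the slice is exposed through the sys module. A small sketch:

```python
import sys

# Since Python 3.2 the GIL is released on a time slice, not every N opcodes.
default = sys.getswitchinterval()
print(default)  # 0.005 (5 ms) on a stock interpreter

# A longer interval means fewer GIL handoffs (less overhead, worse latency);
# a shorter one means snappier switching at the cost of more contention.
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())

sys.setswitchinterval(default)  # restore the default
```

Tuning this value rarely helps CPU-bound code, but it shows how the "count 1000 opcodes" logic in the pseudocode above was replaced by a wall-clock deadline.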

To understand the GIL's performance impact on multithreading intuitively, a test-result diagram is borrowed here (see the figure). It shows the execution of two threads on a dual-core CPU; both are CPU-intensive threads. The green parts indicate that a thread is running and performing useful computation; the red parts indicate that a thread has been scheduled and woken up but cannot acquire the GIL, and therefore waits without doing any useful work.

As the diagram shows, the existence of the GIL prevents multithreading from properly exploiting the concurrency of a multi-core CPU.

Can Python's IO-intensive threads benefit from multithreading? Look at the test result below; the colors have the same meaning as before, and the white parts indicate that the IO thread is waiting. When a received packet wakes the IO thread and triggers a switch, the thread often still cannot acquire the GIL because a CPU-intensive thread holds it, and so it ends up waiting in an endless loop.

The simple summary: Python multithreading on a multi-core CPU only helps IO-intensive workloads; as soon as at least one CPU-intensive thread is present, multithreaded efficiency drops sharply because of the GIL.

How to avoid being affected by the GIL
After saying all this, a post that offers no solutions is just trivia. The GIL is this rotten; is there any way around it? Let's look at some ready-made solutions.

* Replace threading with multiprocessing

The multiprocessing library exists largely to compensate for the inefficiency the GIL imposes on the threading library. It faithfully replicates the interface that threading provides, which makes migration easy. The key difference is that it uses multiple processes rather than multiple threads: each process has its own independent GIL, so processes never fight over a single GIL.

Of course, multiprocessing is not a panacea. Its introduction makes data communication and synchronization in the program harder to implement. Take the counter as an example: if we want several workers to accumulate into the same variable, with threading we declare a global variable, wrap three lines in a threading.Lock context, and we are done. With multiprocessing, processes cannot see each other's data; results can only be collected through a Queue (with put/get) declared in the main process, or through shared memory. This extra implementation cost makes writing an already painful multithreaded program even more painful. Interested readers can read further on the specific difficulties.
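A minimal sketch of the shared-memory variant mentioned above, using multiprocessing.Value plus its built-in lock (the function names here are illustrative, not from the original article):

```python
import multiprocessing

def worker(shared, n):
    # Processes do not share ordinary globals; `shared` lives in shared memory.
    # Each increment still needs the lock: the += is not atomic on its own.
    for _ in range(n):
        with shared.get_lock():
            shared.value += 1

def run_counter(n_procs=2, n_increments=10_000):
    counter = multiprocessing.Value("i", 0)  # a C int in shared memory
    procs = [multiprocessing.Process(target=worker, args=(counter, n_increments))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == "__main__":
    print(run_counter())  # each process runs with its own GIL
```

Compare this with the three-line threading.Lock version: the logic is the same, but the shared state must be declared explicitly and passed to every process, which is exactly the extra cost the paragraph above describes.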

* Use other interpreters

As mentioned earlier, since the GIL is only a product of CPython, are other interpreters better off? Yes: interpreters like Jython and IronPython, thanks to the nature of their implementation languages, need no GIL. However, by using Java/C# for the interpreter implementation, they also lose the ability to take advantage of the community's many useful C extension modules, so these interpreters have always remained relatively niche. After all, most people choose functionality and performance first; done is better than perfect.

So is it hopeless?

Of course not. The Python community is working hard to continuously improve the GIL, and has even tried to remove it, making real progress with each iteration. Interested readers can read further in the slides on one such improvement, the reworked GIL:
- change the switching granularity from an opcode count to a time slice;
- avoid having the thread that just released the GIL be scheduled again immediately;
- add thread priorities (high-priority threads can force other threads to release the GIL they hold).

Summary
Python's GIL is in fact a tradeoff between functionality and performance; it is not only reasonable in its historical context, but is also hard to change for objective reasons. From this analysis we can draw some simple conclusions:
- Because the GIL exists, multithreading only yields better performance in IO-bound scenarios.
- Programs that need parallel computing performance can implement their core parts as a C module, or simply in another language.
- The GIL will continue to exist for a long time to come, but it will keep being improved.
