Talk about Python threads & GIL

Source: Internet
Author: User
Tags: mutex

Let's talk a little bit about Python threads.

Refer to this article https://www.zhihu.com/question/23474039/answer/24695447

In short: as probably one of the few interpreted languages with real thread support (Perl's threads are effectively unused, and the usual PHP builds have no threading), Python's multithreading is a compromise -- at any given moment, only one thread is executing Python bytecode in the interpreter. Ruby also has thread support, and at least Ruby MRI has a GIL as well.

The first thing to know about the GIL is its full name: Global Interpreter Lock. It is covered in detail below.

If your code is CPU-intensive, multiple threads will most likely execute serially anyway. In that case multithreading is of little use, and efficiency may even be worse than a single thread because of context-switch overhead.

However: if your code is IO-intensive, multithreading can significantly improve efficiency -- for example, in crawlers (I don't understand why examples of Python concurrency are always crawlers... but it's the only example that comes to mind), where most of the time is spent waiting for a socket to return data. At that point the underlying C code has already released the GIL, so another thread can keep executing while the first thread waits on IO.

Conversely, CPU-intensive code shouldn't be written in Python in the first place -- the efficiency just isn't there. If you really need concurrency in CPU-intensive code, use the multiprocessing library. It implements a thread-like API on top of multiple processes, with partial variable sharing implemented via pickle.

One more thing: if you don't know whether your code is CPU-intensive or IO-intensive, here is a trick:

The multiprocessing module has a dummy submodule, which implements the multiprocessing API on top of threads. Say you are using a multiprocessing pool, i.e. process-based concurrency:

from multiprocessing import Pool

If you change that line to the following, the same code becomes a thread-based implementation of the concurrency:

from multiprocessing.dummy import Pool

Run your code both ways, and use whichever is faster.
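The swap described above can be sketched like this (the square function and the pool size are made up for the example; only the import line differs between the two variants):

```python
# Process-based pool (original):
# from multiprocessing import Pool
# Thread-based pool -- identical API, only the import line changes:
from multiprocessing.dummy import Pool

def square(x):
    return x * x

with Pool(4) as pool:                      # 4 worker threads
    results = pool.map(square, range(10))  # order of results is preserved
print(results)
```

Because the API is identical, timing each variant on your real workload is just a matter of toggling the comment on the import.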
UPDATE: I just found concurrent.futures, which includes ThreadPoolExecutor and ProcessPoolExecutor and may be simpler than multiprocessing.
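A minimal sketch of the concurrent.futures interface mentioned in the update (the square function is invented for the example):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

# Thread pool -- suits IO-bound work under the GIL.
with ThreadPoolExecutor(max_workers=4) as ex:
    thread_results = list(ex.map(square, range(10)))
print(thread_results)

# Swapping ThreadPoolExecutor for ProcessPoolExecutor gives CPU-bound
# parallelism with the same code (run behind an `if __name__ == "__main__":`
# guard when using processes as a script).
```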

I haven't actually benchmarked multiprocessing.Pool against concurrent.futures yet; I'll experiment when I get the chance -- I'm not writing much Python these days.

OK, now let's look back at the GIL. Reference: HTTP://WWW.TUICOOL.COM/ARTICLES/7ZIRA2R

The first thing to be clear about is that the GIL is not a feature of the Python language; it is a concept introduced by one implementation of the Python interpreter, CPython. An analogy: C++ is a language (syntax) standard that can be compiled into executable code by different compilers, such as GCC, Intel C++, and Visual C++. Python is the same: the same piece of code can be run by different execution environments such as CPython, PyPy, and Psyco, and some of them, like Jython, have no GIL at all. However, because CPython is the default Python execution environment almost everywhere, in many people's minds CPython *is* Python, and they take it for granted that the GIL is a flaw of the Python language. So let's be clear: the GIL is not a Python feature, and Python does not need to depend on the GIL at all.

So what is the GIL in the CPython implementation? GIL stands for Global Interpreter Lock. To avoid being misled, let's look at the official explanation:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

A mutex that prevents multiple threads from executing bytecode concurrently -- at first glance this looks like an obviously broken global lock! Don't worry, we will analyze it step by step below.

To take advantage of multicore CPUs, Python began to support multithreading. The simplest way to keep data consistent and state synchronized between multiple threads is to lock everything -- hence the GIL, one super lock. As more and more library developers accepted this arrangement, they began to rely heavily on it (that is, assuming Python's built-in objects are thread-safe by default, with no need for additional locks or synchronization when implementing them).

Slowly, people found this arrangement painful and inefficient. But by the time they tried to split up or remove the GIL, it was hard to do: too many library developers depended heavily on it. How hard? An analogy: even a "small project" like MySQL took from version 5.5 through 5.6 to 5.7 -- nearly five years across major versions -- just to split the buffer pool mutex, one big lock, into smaller locks, and the work is still ongoing. If MySQL, a product backed by a company with a fixed development team, had that much trouble, what about Python, whose core developers and code contributors are a highly community-driven team? So the GIL's continued existence is largely a matter of history. If the design were started over today, the multithreading problem would still have to be faced, but at least it would be solved more elegantly than with the current GIL.
The impact of the GIL

Judging from the introduction above and the official definition, the GIL is undoubtedly a global exclusive lock. There is no doubt that a global lock hurts multithreading efficiency -- it almost makes Python equivalent to a single-threaded program. A reader might say: as long as the GIL is released diligently, efficiency shouldn't be bad. Release it during time-consuming IO operations and multithreading can still improve performance, or at least do no worse than a single thread. That's the theory. In practice? Python is worse than you think.

Below we compare the efficiency of Python in multi-threaded and single-threaded.

The test method is simple: a counter function loops 100 million times and is executed twice -- once sequentially in a single thread, once in two concurrent threads -- and the total execution times are compared. The test environment is a dual-core Mac Pro. Note: to reduce the impact of the thread library's own overhead on the results, the single-threaded code also uses the Thread class; it just runs the two counters one after the other to simulate a single thread.
Sequential execution in a single thread (single_thread.py)
#!/usr/bin/python
from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()
        t.join()  # wait for each thread to finish before starting the next
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()
Simultaneous execution of two concurrent threads (multi_thread.py)
#!/usr/bin/python
from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    thread_array = {}
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()
        thread_array[tid] = t
    for i in range(2):
        thread_array[i].join()  # both threads run concurrently
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()

The test results are shown in the figure (not reproduced in this copy).

You can see that Python is 45% slower in the multi-threaded case than in the single-threaded one. By the earlier analysis, even with the GIL global lock, serialized multithreading should match single-threaded efficiency. How could the result be this bad?

The reason can be found by analyzing how the GIL is implemented.

Design flaws of the current GIL: scheduling based on opcode counts
According to the Python community's thinking, the operating system's own thread scheduling is mature and stable, and there is no need to reimplement it. So a Python thread is just a C-language pthread, dispatched by the operating system's scheduling algorithm (for example, CFS on Linux).
To give each thread a fair share of CPU time, Python counts the number of microcode instructions (that is, opcodes) the current thread has executed, and forces it to release the GIL once a certain threshold is reached.
At that point the operating system's thread scheduling may also be triggered (although whether a context switch actually happens is decided by the operating system). In pseudocode:

while True:
    acquire GIL
    for i in 1000:
        do something
    release GIL
    /* give the operating system a chance to do thread scheduling */

The CFS mentioned above is the Linux process scheduling algorithm, see: http://www.cnblogs.com/charlesblc/p/6135887.html

This model has no problem when there is only one CPU core: any thread, once woken, can successfully acquire the GIL (because thread scheduling is only triggered when the GIL is released). But when the CPU has multiple cores, problems arise. As the pseudocode shows, there is almost no gap between "release GIL" and "acquire GIL", so by the time a thread on another core has been woken up, the main thread has usually re-acquired the GIL already. The woken thread can only waste CPU time watching the other thread run off happily with the GIL, then reach its switch point, go back to waiting to be scheduled, be woken again, wait again -- round and round in this cycle.

PS: Of course this implementation is primitive and ugly, and each Python version gradually improves the interaction between the GIL and thread scheduling -- for example, trying to hold the GIL across a thread context switch, or releasing the GIL while waiting on IO. What cannot be changed is that the existence of the GIL makes the already expensive operation of operating system thread scheduling even more extravagant.

Extended reading: the GIL's influence

To understand intuitively the GIL's performance impact on multithreading, a test-result chart is borrowed here (figure not reproduced in this copy). It shows two threads executing on a dual-core CPU, both running CPU-intensive operations. Green sections indicate the thread is running and performing a useful calculation; red sections indicate the thread has been scheduled awake but cannot acquire the GIL, and so waits without doing any useful work.

As the diagram shows, the existence of the GIL prevents multithreading from making good use of a multi-core CPU's concurrent processing capability.

Can Python's IO-intensive threads at least benefit from multithreading? Look at the test results below (the colors mean the same as before; the white sections indicate the IO thread is waiting). When the IO thread receives a packet and an interrupt triggers a switch, it still cannot acquire the GIL because of the CPU-intensive thread, and it ends up waiting in an endless loop.

The simple summary: Python multithreading on a multicore CPU only has a positive effect on IO-intensive computation; as soon as at least one CPU-intensive thread is present, the GIL can sharply degrade multithreaded efficiency.
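The IO-bound half of that summary can be sketched with a toy benchmark (fake_io and the 0.2-second sleep are invented stand-ins for a blocking socket read; like real IO, time.sleep releases the GIL while waiting):

```python
import time
from threading import Thread

def fake_io():
    # Stands in for a blocking socket read; like real IO in C code,
    # time.sleep releases the GIL while waiting.
    time.sleep(0.2)

start = time.time()
threads = [Thread(target=fake_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print("4 overlapping IO waits took {:.2f}s".format(elapsed))  # ~0.2s, not 0.8s
```

Four waits overlap into roughly one, which is exactly the benefit that evaporates once a CPU-bound thread joins the mix.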

How to avoid being affected by the GIL

Having said all this -- if we don't offer solutions, this is just a popular-science post: interesting, but useless. The GIL is this rotten; is there any way around it? Let's look at some ready-made solutions.

Replace threads with multiprocessing

The multiprocessing library appeared largely to compensate for the thread library's inefficiency under the GIL. It faithfully replicates the interface provided by the thread library for easy migration. The only difference is that it uses multiple processes instead of multiple threads. Each process has its own independent GIL, so processes do not fight over one.

Of course multiprocessing is not a panacea. Its introduction makes data communication and synchronization harder to implement. Take the counter example: if we want multiple workers to accumulate into the same variable, with threads we declare a global variable, wrap three lines in a threading.Lock context, and we're done. With multiprocessing, processes cannot see each other's data, so we must declare a Queue in the main process and use put/get, or use shared memory. This extra implementation cost makes writing an already painful multithreaded program even more painful. Readers interested in the specific difficulties can expand their reading with this article.

Use other interpreters

It was mentioned earlier that the GIL is only a product of CPython -- so do other interpreters do better? Yes: interpreters like Jython and IronPython (C#), thanks to the nature of their implementation platforms, do not need the GIL's help. However, by using Java/C# for the interpreter implementation, they also lose the chance to take advantage of the community's many useful C-language modules, so these interpreters have always been relatively niche. After all, in the early days, everyone chooses features and performance first -- done is better than perfect.

So is it hopeless?

Of course not: the Python community is working very hard to continually improve the GIL, and even to remove it, and each iteration makes real progress. Interested readers can read further in this slide deck.

One such improvement: reworking the GIL

- Change the switching granularity from opcode counting to time-slice counting

- Prevent the thread that just released the GIL from being immediately scheduled again

- Add thread priorities (a high-priority thread can force other threads to release the GIL they hold)
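The first item on that list shipped in CPython 3.2 ("new GIL"): switching is now driven by a time interval, which can be inspected and tuned through the standard sys module (the 0.01 value below is an arbitrary example):

```python
import sys

default = sys.getswitchinterval()   # 0.005 seconds by default in CPython
print(default)

sys.setswitchinterval(0.01)         # coarser slices: fewer GIL handoffs
print(sys.getswitchinterval())

sys.setswitchinterval(default)      # restore the default
```

Raising the interval reduces GIL-handoff overhead for CPU-bound threads at the cost of responsiveness; lowering it does the opposite.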

Summary

Python's GIL is really the product of a tradeoff between functionality and performance: it is reasonable in its context, and there are hard-to-change objective factors behind it. From this analysis we can draw some simple conclusions:

- Because of the GIL, only IO-bound scenarios get better performance from multithreading

- Programs that need parallel computing performance can consider moving the core parts into a C module, or simply implementing them in another language

- The GIL will continue to exist for quite a while, but it will keep being improved

(End)
