A quick introduction to the Python GIL

Source: Internet
Author: User


When I was new to Python, I kept hearing about the GIL, usually in the context of Python's inability to run multiple threads efficiently. Wanting to understand not just that the GIL exists but why, I collected various materials and spent several hours over the course of a week digging into it. This article aims to help readers understand the GIL more thoroughly and objectively.

What is GIL?

The first thing to note is that the GIL is not a feature of the Python language; it is a concept introduced by the most common Python interpreter implementation, CPython. An analogy: C++ is a language (syntax) standard, but it can be compiled into executable code by different compilers, such as GCC, Intel C++, and Visual C++. Likewise, the same piece of Python code can be executed by different Python runtimes, such as CPython, PyPy, Psyco, and Jython. Jython, for example, has no GIL. However, because CPython is the default Python runtime in most environments, "CPython" and "Python" are often conflated, and the GIL is taken for granted as a defect of the Python language itself. So let's be clear: the GIL is an implementation detail of CPython, and the Python language itself does not require it.

So what is the GIL in the CPython implementation? GIL stands for Global Interpreter Lock. To avoid any misunderstanding, let's look at the official explanation:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

Sounds terrible, doesn't it? A mutex that prevents multiple threads from executing Python bytecode concurrently looks, at first glance, like a bug-like global lock! Don't panic; let's analyze it step by step.

Why GIL?

Due to physical limitations, the clock-frequency race among CPU manufacturers has been replaced by a race for more cores. To make better use of multi-core processors, multi-threaded programming emerged, and with it the difficulties of data consistency and state synchronization between threads. Even the caches inside the CPU are no exception: to solve the problem of keeping data synchronized across multiple caches, every vendor has spent considerable effort, inevitably at some performance cost.

Python, of course, could not escape this. To take advantage of multiple cores, Python began to support multithreading. The simplest way to preserve data integrity and state synchronization between threads is locking. Hence the GIL, one super-coarse lock. And as more and more library developers accepted this arrangement, they began to rely heavily on it (that is, they assumed Python's internal objects are thread-safe by default, and did not add extra memory locks or synchronization when implementing their own code).
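The locking idea described above can be illustrated with a minimal, hypothetical sketch (not code from the benchmark later in this article): two threads incrementing a shared counter. Even under the GIL, the read-modify-write in `counter += 1` spans several bytecode instructions and can interleave, so an explicit `threading.Lock` is still needed for correctness.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:  # serialize the read-modify-write on the shared counter
            counter += 1

threads = [threading.Thread(target=increment, args=(100000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000: no updates are lost
```

Without the `with lock:` line, the final value could be anything up to 200000, depending on how the threads interleave.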

Gradually, people realized that this implementation, while convenient, is inefficient. But when they tried to split up or remove the GIL, they found that a huge amount of library code depended heavily on it, making removal extremely difficult. How difficult? An analogy: in a project like MySQL, splitting the single big Buffer Pool Mutex into smaller locks took nearly five years, across the 5.5, 5.6, and 5.7 major versions, and the work continues. If MySQL, a product backed by a company with a dedicated development team, had such a hard time, what about Python, whose core developers and contributors form a highly community-driven team?

Therefore, the existence of the GIL is largely a matter of historical reasons. If the design were redone today, the multithreading problem would still have to be addressed, but at least it could be solved more elegantly than with the current GIL.

Influence of GIL

From the introduction above and the official definition, the GIL is undoubtedly a global exclusive lock. There is no doubt that a global lock significantly hurts the efficiency of multithreading; it almost reduces Python to a single-threaded program.
A reader might object: as long as the global lock is released during time-consuming I/O operations, multithreading can still improve overall throughput, and should at least be no less efficient than a single thread. True in theory. But in practice? Python is worse than you think.

Next, let's compare the efficiency of Python in multi-threaded and single-threaded modes. The test method is simple: a counter function that loops 100 million times, executed twice. In one version the two runs execute sequentially; in the other they run in two parallel threads. Finally, we compare the total execution times. The test environment is a dual-core MacBook Pro. Note: to reduce the impact of the thread library's own overhead on the results, the single-threaded version also uses the Thread class, simply running the two executions one after the other to simulate a single thread.

Single-threaded sequential execution (single_thread.py)

```python
#!/usr/bin/python
from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()
        t.join()  # wait for each run to finish before starting the next
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()
```

Two threads executed concurrently (multi_thread.py)

```python
#!/usr/bin/python
from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    thread_array = {}
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()
        thread_array[tid] = t
    for i in range(2):
        thread_array[i].join()  # both threads run concurrently; wait for both
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()
```

The test results are as follows (the timing figure from the original post is not reproduced here).

As you can see, Python is 45% slower with two threads than with a single thread. According to the earlier analysis, even with the GIL serializing execution, multithreading should be about as efficient as a single thread. How can the result be this bad?

Let's analyze the causes by looking at how the GIL is implemented.

Current GIL design defects

Bytecode-count-based scheduling

Following the Python community's philosophy, the operating system's own thread scheduling is already mature and stable, so there is no need to reimplement it. A Python thread is therefore simply a pthread in C, scheduled by the operating system's scheduling algorithm (for example, Linux uses CFS). To give each thread a roughly equal share of CPU time, the interpreter counts the number of bytecode instructions executed and forces the current thread to release the GIL once a threshold is reached. This in turn gives the operating system a chance to perform thread scheduling (whether a context switch actually happens is, of course, up to the operating system).

Pseudocode

```
while True:
    acquire GIL
    for i in 1000:
        do something
    release GIL  /* give the operating system a chance to do thread scheduling */
```

This mode works fine with a single CPU core: any thread that is woken up can acquire the GIL successfully (thread scheduling is triggered only when the GIL is released, so the lock is free at that moment). Problems arise when the CPU has multiple cores. As the pseudocode shows, there is almost no gap between "release GIL" and "acquire GIL", so by the time a thread on another core is woken up, the main thread has, in most cases, already reacquired the GIL. The awakened thread can only waste CPU time in vain, watching the other thread run happily with the GIL, until the switch point is reached, after which it goes back to waiting to be scheduled and woken up again, in a vicious cycle.

PS: of course, this implementation is primitive and ugly, and each Python version has gradually improved the interaction between the GIL and thread scheduling, for example trying to keep the GIL across a thread context switch, or releasing the GIL while waiting for I/O. What cannot be changed is that the GIL makes the already expensive operation of operating-system thread scheduling even more costly.

Visualizing the effect of the GIL

To see the performance impact of the GIL on multithreading more intuitively, consider a test result diagram (the figure is not reproduced here). It shows the execution of two threads on a dual-core CPU, both of which are CPU-intensive computing threads. The green parts indicate that the thread is running and performing useful computation; the red parts indicate that the thread has been scheduled awake but cannot acquire the GIL, so the time passes without any effective calculation.

As the figure shows, the GIL prevents the multi-core CPU from delivering any real concurrency.

Can I/O-intensive Python threads at least benefit from multithreading? A second test result (same color scheme, with white indicating that the I/O thread is waiting) answers this. It shows that when the I/O thread receives a data packet and triggers a context switch, it still cannot acquire the GIL because a CPU-intensive thread holds it, so the I/O thread keeps looping in the wait state.

A simple summary: on a multi-core CPU, Python multithreading has a positive effect only for I/O-intensive workloads; as soon as at least one CPU-intensive thread exists, the GIL drags multithreaded efficiency down dramatically.
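The I/O-bound case can be checked with a small sketch. Here `time.sleep` stands in for real blocking I/O (a simplifying assumption; both release the GIL while blocked), so two "I/O" threads overlap and the total time is roughly that of one:

```python
import time
from threading import Thread

def io_task():
    time.sleep(0.5)  # the GIL is released while blocked, as with real I/O

start = time.time()
threads = [Thread(target=io_task) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print(elapsed < 0.9)  # True: the two sleeps overlap instead of adding up
```

Run the same two tasks sequentially and the total would be about 1 second; with threads it stays near 0.5 seconds, because neither thread needs the GIL while it blocks.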

How to avoid the impact of the GIL

After saying so much, if we offered no solutions this would be nothing but a complaint post. So is there any way to work around the GIL, given how bad it is? Let's look at the available options.

Replacing threads with multiprocessing

The multiprocessing library appeared largely to make up for the thread library's GIL-induced inefficiency. It replicates the interface provided by the threading module almost completely, making migration easy; the key difference is that it uses multiple processes instead of multiple threads. Each process has its own independent GIL, so there is no GIL contention between processes.

Of course, multiprocessing is not a silver bullet. Introducing it makes data communication and synchronization in the program harder to implement. Take the counter as an example. If we want multiple threads to accumulate into the same variable, we simply declare a global variable and wrap updates in a threading.Lock context. Processes, however, cannot see each other's data, so with multiprocessing we can only declare a Queue in the main process and use put/get, or use shared memory. This extra implementation cost makes writing what is already a painful multithreaded program even more painful. Readers interested in the specific difficulties can read further on the topic.
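As a minimal sketch of the Queue-based approach just described (the loop is shortened to 100,000 iterations for brevity; the name `my_counter` simply mirrors the earlier benchmark):

```python
from multiprocessing import Process, Queue

def my_counter(q):
    # each process counts in its own address space; no globals are shared
    i = 0
    for _ in range(100000):
        i += 1
    q.put(i)  # results must travel through an explicit channel

def main():
    q = Queue()
    procs = [Process(target=my_counter, args=(q,)) for _ in range(2)]
    for p in procs:
        p.start()
    total = sum(q.get() for _ in procs)  # drain the queue before joining
    for p in procs:
        p.join()
    print(total)  # 200000
    return total

if __name__ == '__main__':
    main()
```

Note the ordering: results are collected from the Queue before `join()`, because joining a process that still has undelivered Queue items can block.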

Use another interpreter

As mentioned earlier, since the GIL is a product of CPython only, are other interpreters better off? Yes: thanks to the languages they are built on, interpreters such as Jython and IronPython do not need a GIL. However, because they are implemented in Java and C#, they lose the ability to use the community's many useful C extension modules, so these interpreters have always remained relatively niche. After all, in the early stages most people choose the pragmatic option: "done is better than perfect".

So is there no hope?

Of course there is. The Python community is constantly working to improve the GIL, and even to remove it, and has made real progress in each minor version. Interested readers can look up the relevant slides on the topic.

Another improvement is the reworking of the GIL ("Reworking the GIL"), which:

-Changes the switch granularity from an opcode-based count to a time-slice-based count
-Avoids immediately rescheduling the thread that last released the GIL
-Adds thread priorities (a high-priority thread can force other threads to release the GIL they hold)
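The time-slice behavior from the first point is visible in current CPython (3.2 and later) through `sys.getswitchinterval` and `sys.setswitchinterval`:

```python
import sys

# Since Python 3.2 the GIL handoff is time-based: a thread releases the
# GIL after holding it for the switch interval (0.005 s, i.e. 5 ms, by default).
print(sys.getswitchinterval())

# The interval is tunable: a longer interval means fewer GIL handoffs
# (less overhead) but worse responsiveness for other threads.
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())  # now ~0.01
```

This replaces the Python 2 era `sys.setcheckinterval`, which counted bytecode instructions rather than time.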

Summary

The Python GIL is really the product of a trade-off between functionality and performance. Its existence is quite rational, and there are objective factors that make it hard to change. From this analysis, we can draw a few simple conclusions:

·Because of the GIL, only I/O-bound scenarios get better performance from multithreading.
·For programs that need high parallel computing performance, consider converting the core parts into a C module, or simply using another language.
·The GIL will continue to exist for a long time, but it will keep being improved.

That's all this article has to say about the Python GIL. I hope it is helpful to you. If you are interested, you can continue reading other related topics on this site. If anything is lacking, please leave a comment. Thank you for your support!
