How well does Python multithreading perform under the GIL? An in-depth explanation of the GIL

Source: Internet
Author: User
When I first started with Python, I kept hearing the term GIL, and I found that it was almost always equated with Python's inability to do multithreading efficiently. Wanting to understand not just what the GIL is but also why it exists, I collected a range of material and spent several hours over a week studying it in depth. This article aims to help readers understand the GIL better and more objectively.

What is GIL?

The first thing to be clear about is that the GIL is not a feature of Python. It is a concept introduced by one implementation of the Python interpreter, CPython. C++ is a useful comparison: it is a language (syntax) standard, and different compilers, such as GCC, Intel C++, and Visual C++, can compile the same code into executables. The same is true for Python: the same code can run on different Python implementations, such as CPython, PyPy, and Jython (formerly JPython), and some of them, Jython for example, have no GIL. However, CPython is the default Python implementation in most environments, so many people treat CPython as Python itself and take it for granted that the GIL is a defect of the Python language. So let us be clear from the start: the GIL is not a feature of Python, and Python does not depend on the GIL at all.

So what is the GIL in the CPython implementation? GIL stands for Global Interpreter Lock. To avoid misleading anyone, let's look at the official explanation:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

Looks pretty bad, doesn't it? A mutex that prevents multiple threads from executing bytecode concurrently: at first glance, a global lock that looks like a bug! Don't worry, we will analyze it step by step.
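To make "Python bytecodes" in the definition above concrete, here is a minimal sketch using the standard dis module. The helper names increment and opnames are made up for this illustration, and the exact instruction names differ between CPython versions, so no output is shown.

```python
import dis


def increment(i):
    # One line of Python source ...
    i = i + 1
    return i


def opnames(func):
    """Return the names of the bytecode instructions CPython compiled for func."""
    return [instr.opname for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    # ... becomes several bytecode instructions. The GIL guarantees that only
    # one thread at a time is executing instructions like these.
    for name in opnames(increment):
        print(name)
```

Because a thread can be switched out between two of these instructions, an update such as i = i + 1 on shared data still needs its own lock, even with the GIL in place.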

Why GIL?

Because of physical limits, the CPU manufacturers' race for higher clock frequencies has given way to a race for more cores. Multithreaded programming arose to make better use of multi-core processors, and with it came the difficulty of keeping data consistent and state synchronized between threads. Even the caches inside the CPU are no exception: to keep data synchronized across multiple caches, vendors have put in a great deal of effort, and some performance loss is unavoidable.

Python could not escape this either. To take advantage of multiple cores, Python began to support multithreading. The simplest way to guarantee data integrity and state synchronization between threads is, naturally, a lock. Hence the GIL, one very big lock. As more and more library developers accepted this design, they came to rely heavily on it (that is, Python's internal objects are thread-safe by default, so there is no need to consider extra memory locks or synchronization when implementing them).

Gradually, this design turned out to be painful and inefficient. But when people tried to split up or remove the GIL, they found that a large amount of library code already depended on it heavily, which makes removal very difficult. How difficult? As an analogy, take a "small project" like MySQL: splitting the single big Buffer Pool mutex into finer-grained locks took nearly five years and spanned the major versions 5.5, 5.6, and 5.7, and the work is still going on. If it is this hard for MySQL, a product backed by a company with a dedicated development team, how much harder is it for Python, whose core developers and contributors are a loosely organized community?

So the GIL exists mostly for historical reasons. If Python were started over from scratch, multithreading would still have to be dealt with, but the solution would at least be more elegant than the current GIL.

Influence of GIL

From the introduction and the official definition above, the GIL is without question a global exclusive lock, and a global lock undoubtedly hurts multithreaded efficiency significantly. It makes Python behave almost like a single-threaded program. Readers may object that a global lock need not cost much as long as it is released diligently: if the GIL is released during time-consuming IO operations, threads can still improve throughput, or at least be no slower than a single thread. That is the theory. And in practice? Python is worse than you think.

Next, we compare Python's efficiency with multiple threads and with a single thread. The test is simple: a counter function that loops 100 million times. In one run it is executed twice in sequence; in the other it is executed by two threads at the same time. We then compare the total execution time. The test machine is a dual-core Mac Pro. Note: to keep the overhead of the threading library from skewing the results, the single-threaded code also uses threads; it simply runs them one after the other to simulate a single thread.

Single-threaded sequential execution (single_thread.py)

    #!/usr/bin/python
    from threading import Thread
    import time


    def my_counter():
        # CPU-bound busy loop: 100 million increments.
        i = 0
        for _ in range(100000000):
            i = i + 1
        return True


    def main():
        start_time = time.time()
        for tid in range(2):
            t = Thread(target=my_counter)
            t.start()
            # Join immediately, so the second thread only starts
            # after the first one has finished.
            t.join()
        end_time = time.time()
        print("Total time: {}".format(end_time - start_time))


    if __name__ == '__main__':
        main()

Two threads executed concurrently (multi_thread.py)

    #!/usr/bin/python
    from threading import Thread
    import time


    def my_counter():
        # CPU-bound busy loop: 100 million increments.
        i = 0
        for _ in range(100000000):
            i = i + 1
        return True


    def main():
        thread_array = []
        start_time = time.time()
        # Start both threads first, so that they run at the same time.
        for tid in range(2):
            t = Thread(target=my_counter)
            t.start()
            thread_array.append(t)
        # Then wait for both of them to finish.
        for t in thread_array:
            t.join()
        end_time = time.time()
        print("Total time: {}".format(end_time - start_time))


    if __name__ == '__main__':
        main()

The test results are as follows: [figure: test results]

To summarize briefly: on multi-core CPUs, Python multithreading only helps with IO-bound work. As soon as at least one CPU-bound thread is present, multithreaded efficiency drops sharply because of the GIL.
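The IO-bound half of that summary can be checked with a small sketch. It assumes that time.sleep is an acceptable stand-in for a blocking IO call (it releases the GIL while waiting); the function names fake_io, run_sequential, and run_threaded are made up for this illustration.

```python
import time
from threading import Thread


def fake_io(seconds):
    # time.sleep releases the GIL while waiting, much like a blocking
    # read from a socket or a file would.
    time.sleep(seconds)


def run_sequential(tasks, seconds):
    """Run the waits one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(tasks):
        fake_io(seconds)
    return time.perf_counter() - start


def run_threaded(tasks, seconds):
    """Run each wait in its own thread; return elapsed seconds."""
    start = time.perf_counter()
    threads = [Thread(target=fake_io, args=(seconds,)) for _ in range(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: {:.2f}s".format(run_sequential(4, 0.2)))
    print("threaded:   {:.2f}s".format(run_threaded(4, 0.2)))
```

The threaded run should finish in roughly the time of one wait, because the waiting threads are not holding the GIL. Replacing fake_io with the CPU-bound my_counter from the test above removes that advantage.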

How to avoid the impact of GIL

Having said all this, an article that offered no solutions would be no more than a popular-science post. If the GIL is this bad, is there any way around it? Let's look at the ready-made options.

Replacing Thread with multiprocessing

The multiprocessing library exists largely to make up for the inefficiency that the GIL imposes on the threading library. It mirrors the interface provided by threading, which makes migration easy. The only difference is that it uses multiple processes instead of multiple threads. Each process has its own independent GIL, so processes do not compete for the GIL.
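As a rough sketch of that migration, the counter test can be rewritten with multiprocessing.Process in place of threading.Thread. The function name run_in_processes and the smaller loop count are choices made for this illustration, not part of the original test.

```python
import time
from multiprocessing import Process


def my_counter(n):
    # Same CPU-bound loop as in the threading test, with the count as a parameter.
    i = 0
    for _ in range(n):
        i = i + 1
    return i


def run_in_processes(workers, n):
    """Run my_counter(n) in `workers` separate processes; return elapsed seconds."""
    start = time.perf_counter()
    procs = [Process(target=my_counter, args=(n,)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # The guard matters: with the "spawn" start method, child processes
    # re-import this module and must not start workers of their own.
    print("Total time: {:.2f}s".format(run_in_processes(2, 10_000_000)))
```

Only the import and the class name change compared with the threaded version. Note that the return value of my_counter is discarded here: getting results back from a child process requires explicit communication.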

Of course, multiprocessing is not a panacea. It makes data communication and synchronization between workers harder to implement. Take the counter as an example. If we want multiple threads to increment the same variable, with threading we declare a global variable and wrap the update in a threading.Lock context. With multiprocessing, processes cannot see each other's data, so we have to declare a Queue in the main process and use put and get, or use shared memory. This extra implementation cost makes multithreaded programming, which is already painful, even more so. Readers interested in the specific difficulties can read this article for more detail.
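The difference can be sketched side by side. This is only an illustration under a few assumptions: the thread version keeps its counter in a local dict rather than a true global, the process version uses the shared-memory route (multiprocessing.Value) rather than a Queue, and all function names are made up.

```python
import threading
import multiprocessing


def threaded_total(workers, n):
    """Threads share memory: a plain variable plus a threading.Lock is enough."""
    state = {"count": 0}
    lock = threading.Lock()

    def add_many():
        for _ in range(n):
            with lock:  # the GIL alone does not make += atomic
                state["count"] += 1

    threads = [threading.Thread(target=add_many) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


def add_many_shared(counter, n):
    # counter is a multiprocessing.Value that lives in shared memory.
    for _ in range(n):
        with counter.get_lock():
            counter.value += 1


def process_total(workers, n):
    """Processes do not share variables, so the counter is passed in explicitly."""
    counter = multiprocessing.Value("i", 0)
    procs = [
        multiprocessing.Process(target=add_many_shared, args=(counter, n))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print("threads:  ", threaded_total(2, 10_000))
    print("processes:", process_total(2, 10_000))
```

In the process version, the shared object has to be created up front and handed to every worker, which is the extra implementation cost described above.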

Use another interpreter

As mentioned earlier, the GIL is only a product of CPython, so are other interpreters better off? Yes: thanks to the characteristics of their implementation languages, interpreters such as Jython and IronPython do not need a GIL. However, because they are implemented in Java and C#, they also lose access to the many useful C extension modules in the community, so these interpreters have always been niche. After all, when choosing between functionality and performance early on, most people pick functionality: done is better than perfect.

So is there no hope?

Of course not. The Python community keeps working to improve the GIL, and has even tried to remove it, and each minor release has brought progress. Interested readers can read this slide deck for more detail. Another improvement is the "Reworking the GIL" proposal, which:
- changes the switching granularity from an opcode count to a time slice
- prevents the thread that most recently released the GIL from being scheduled again immediately
- adds thread priorities (a high-priority thread can force other threads to release the GIL they hold)
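In CPython 3.2 and later, the time-slice based switching is visible from Python code through sys.getswitchinterval and sys.setswitchinterval. The sketch below wraps them in a small context manager; the name switch_interval is made up for this illustration, and whether changing the interval helps a given workload is something to measure rather than assume.

```python
import sys
from contextlib import contextmanager


@contextmanager
def switch_interval(seconds):
    """Temporarily change the interval after which a thread switch is requested."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield
    finally:
        # Always restore the previous value, even if the body raises.
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    print("default interval:", sys.getswitchinterval())
    with switch_interval(0.001):
        print("inside block:    ", sys.getswitchinterval())
    print("restored:        ", sys.getswitchinterval())
```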

Summary

The Python GIL is really the product of a trade-off between functionality and performance. It made sense when it was introduced, and there are objective factors that make it hard to change. From the analysis above, we can draw the following brief conclusions:
- Because of the GIL, multithreading only performs well in IO-bound scenarios.
- If you want a program with high parallel computing performance, consider rewriting the core part as a C module, or simply implement it in another language.
- The GIL will continue to exist for a long time, but it will keep being improved.
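The standard library already contains C modules that follow the "core part in C" approach. For example, hashlib releases the GIL while hashing large buffers, so threads that call it can overlap. The sketch below uses it as a stand-in for a hand-written C module; the function names digest and digest_all_threaded are made up, and any actual speedup depends on the machine and the Python build.

```python
import hashlib
from threading import Thread


def digest(data):
    # hashlib is implemented in C and releases the GIL while hashing
    # large buffers, so several of these calls can overlap.
    return hashlib.sha256(data).hexdigest()


def digest_all_threaded(chunks):
    """Hash every chunk in its own thread; results keep the input order."""
    results = [None] * len(chunks)

    def work(index):
        results[index] = digest(chunks[index])

    threads = [Thread(target=work, args=(i,)) for i in range(len(chunks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Four 16 MiB buffers, each filled with a different byte value.
    chunks = [bytes([i]) * (16 * 1024 * 1024) for i in range(4)]
    for hexdigest in digest_all_threaded(chunks):
        print(hexdigest)
```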

Reference

- Python's hardest problem
- Official documents about GIL
- Revisiting thread priorities and the new GIL
