GIL and Python Threads
Source: http://zhuoqiang.me/python-thread-gil-and-ctypes.html
What is the GIL? What impact does it have on our Python programs? Let's start with a question: run the following Python program. What will the CPU usage be?
# Do not imitate this at work; it is dangerous:
def dead_loop():
    while True:
        pass

dead_loop()
What's the answer? 100% CPU usage? That would only be true on a single core, and an antique CPU without hyper-threading at that. On my dual-core CPU, this endless loop keeps just one core busy, that is, it consumes only 50% of the CPU. So how can we make it occupy 100% of the CPU on a dual-core machine? The answer seems obvious: use two threads. After all, threads are how we share CPU computing resources concurrently. Unfortunately, the answer is correct, but things are not that simple. The following program starts a second dead-loop thread alongside the main thread.
import threading

def dead_loop():
    while True:
        pass

# Start a new dead-loop thread
t = threading.Thread(target=dead_loop)
t.start()

# The main thread also enters the dead loop
dead_loop()

t.join()
This should be able to occupy both cores, but when you run it the picture is unchanged: still no more than 50% of the CPU. Why? Aren't Python threads native operating-system threads? Open the system monitor and you will indeed see two threads inside the Python process sitting at 50%. So why can't these two dead-loop threads fully occupy the dual-core CPU? The GIL is the culprit behind the scenes.
The Myth of the GIL: Pain and Pleasure
GIL stands for Global Interpreter Lock. In CPython, the mainstream implementation of Python, the GIL is a genuine global thread lock: the interpreter must acquire it before interpreting and executing any Python code.
The lock is released during I/O operations. For a purely computational program with no I/O, the interpreter releases the lock every 100 interpreter ticks to give other threads a chance to run (this number can be adjusted with sys.setcheckinterval()). So although CPython's thread library directly wraps the operating system's native threads, at any given moment only one thread in a CPython process holds the GIL and executes; the other threads wait for the GIL to be released. This explains the experimental result above: even with two dead-loop threads and two physical CPU cores, the GIL forces the two threads to merely take turns, and the total CPU usage stays just below 50%.
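For reference, a minimal sketch of inspecting and adjusting that interval (this is the old CPython 2.x API; Python 3.2 and later replaced it with the time-based sys.setswitchinterval()):

import sys

# How many interpreter ticks a thread may run before the interpreter
# considers switching to another thread (default is 100).
print(sys.getcheckinterval())

# Let each thread run longer between switches.
sys.setcheckinterval(1000)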
At first glance this makes Python look weak: because of the GIL, CPython cannot use multiple physical cores to speed up computation. So why was it designed this way? I suspect it is simply a legacy issue. Multi-core CPUs were still science fiction in the 1990s. When Guido van Rossum created Python, he could hardly have imagined that his language would one day run on CPUs with more than a thousand cores; a single global lock to handle multi-thread safety was the simplest and most economical design of that era. Simple and sufficient for the needs of the time, which makes it an appropriate design (a design can only be appropriate or not, never good or bad in the abstract). The surprise is how fast hardware has developed since then: the free bonus Moore's Law handed to the software industry is running out. In less than twenty years, programmers can no longer expect that upgrading the CPU alone will make old software run faster. In the multi-core era the free lunch is over: a program that cannot concurrently squeeze computing performance out of every core will be left behind. The same is true for languages. So what is Python's response?
Python's answer is simple: change nothing. The GIL still exists in the latest Python 3. The reasons for not removing it are roughly these.
First, removing the GIL means major surgery on the interpreter:
CPython's GIL is intended to protect the interpreter's global state and environment variables. If the GIL were removed, many fine-grained locks would be needed to protect that global state, or lock-free algorithms would have to be used. Either way, making the interpreter thread-safe becomes far harder than with a single GIL. On top of that, the patient undergoing surgery is a CPython code base that is twenty years old, plus countless third-party extensions that also depend on the GIL. For the Python community, this would be almost like starting over.
Second, even such drastic surgery might not pay off:
Someone once performed this surgery on CPython as an experiment, removing the GIL and adding finer-grained locks. Actual tests showed that single-threaded performance dropped significantly; only beyond a certain number of physical CPUs did the GIL-free version outperform the version with the GIL. This is not surprising: a single thread needs no locking at all, and managing one coarse-grained lock (the GIL) is certainly much cheaper than managing many fine-grained locks. Most Python programs today are single-threaded. Moreover, people do not choose Python for raw computing performance in the first place; even with multiple cores it could not rival C/C++. So a great deal of effort spent removing the GIL would only end up slowing most programs down.
So has a fine language like Python simply abandoned the multi-core era because the change is hard and of limited benefit? In fact, the most important reason for leaving the GIL alone is this: you can succeed without the surgery at all!
Other Ways Out
Leaving the GIL aside, is there another way to let Python thrive in the multi-core era? Go back to the question at the start of this article: how do we make that endless-loop Python script occupy 100% of the CPU on a dual-core machine? The simplest answer: run two copies of the endless-loop program! That is, let two Python processes each occupy one CPU core. Indeed, multiple processes are also a good way to use multiple CPUs. The catch is that processes have separate address spaces, so communication and coordination between them are much more troublesome than between threads. To help with this, Python provides the multiprocessing standard library, which brings multi-process programming down to roughly the same level of difficulty as multithreading and greatly eases the embarrassment the GIL causes on multi-core machines.
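As an illustration only, here is a minimal sketch of the earlier dead-loop experiment rewritten with multiprocessing; each process is a separate interpreter with its own GIL, so on a dual-core machine both cores reach full load:

from multiprocessing import Process

def dead_loop():
    while True:
        pass

if __name__ == '__main__':
    # A child process running its own dead loop
    p = Process(target=dead_loop)
    p.start()

    # The parent process runs the second dead loop
    dead_loop()

    p.join()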
But multiprocessing is only one approach. If you do not want such a heavyweight solution, there is a more radical one: abandon Python and use C/C++ instead. Of course, you do not really have to go that far. You only need to write the performance-critical part in C/C++ as a Python extension and keep the rest in Python, rendering unto Python what is Python's and unto C what is C's. Computation-intensive code is typically written in C and integrated into Python scripts as extensions (the NumPy module is one example). Inside an extension you can create native threads in C without holding the GIL, and so make full use of every CPU core. Writing Python extensions, however, is always somewhat involved. The good news is that Python has another mechanism for talking to C modules: ctypes.
Bypassing the GIL with ctypes
Unlike a Python extension, ctypes lets Python call exported functions of any C dynamic library directly; all you have to write is Python code. Best of all, ctypes releases the GIL before calling into the C function. So by combining ctypes with a C dynamic library, Python can exploit the computing power of every physical core. Let's verify this. This time we write the endless loop in C:
extern"C"{ void DeadLoop() { while (true); }}
Compile the C code above into a dynamic library libdead_loop.so (on Windows it would be dead_loop.dll), for example with something like g++ -shared -fPIC dead_loop.cpp -o libdead_loop.so.
Then, in Python, load the dynamic library with ctypes and call DeadLoop:
from ctypes import *
from threading import Thread

lib = cdll.LoadLibrary("libdead_loop.so")

t = Thread(target=lib.DeadLoop)
t.start()

lib.DeadLoop()
Now look at the system monitor: there are two threads running inside the Python interpreter process, and the dual-core CPU is fully loaded. ctypes really delivers! Note, though, that ctypes releases the GIL only for the duration of the C call; whenever the interpreter executes any Python code, it holds the GIL again. So if you pass Python code as a callback to the C function, the GIL comes back into play as soon as the Python callback runs. For example:
extern"C"{ typedef void Callback(); void Call(Callback* callback) { callback(); }}
from ctypes import *
from threading import Thread

def dead_loop():
    while True:
        pass

lib = cdll.LoadLibrary("libcall.so")
Callback = CFUNCTYPE(None)
callback = Callback(dead_loop)

t = Thread(target=lib.Call, args=(callback,))
t.start()

lib.Call(callback)
Note the difference from the previous example: here the endless loop lives in Python code (dead_loop); the C function Call merely invokes that callback. Run this and you will find that CPU usage is again stuck below 50%. The GIL is back at work.
The examples above also hint at another use of ctypes: writing automated test cases in Python that call a C module's interface directly through ctypes, giving black-box coverage of the module, including tests of the thread safety of its C interface.
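As a rough sketch of that idea (the library name libmodule.so and the function Process are hypothetical stand-ins for whatever C interface is under test):

from ctypes import cdll, c_int
from threading import Thread

lib = cdll.LoadLibrary("libmodule.so")   # hypothetical C module under test
lib.Process.argtypes = [c_int]
lib.Process.restype = c_int

def worker(n):
    # ctypes drops the GIL around each C call, so these threads
    # exercise the C interface truly concurrently.
    for i in range(n):
        lib.Process(i)

threads = [Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()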
Conclusion
Although CPython's thread library wraps the operating system's native threads, the GIL prevents multithreaded Python code from using the computing power of multiple CPU cores. Fortunately, with multiprocessing, the C extension mechanism, and ctypes to borrow strength from outside the interpreter, Python is well equipped for the multi-core era, and whether or not the GIL is ever removed no longer matters so much, does it?