Python multithreading, multiprocessing, and the GIL


Multithreading: creating a thread with the threading module by passing in a function

This is the most basic approach: call the constructor of the Thread class in threading, specify the parameter target=func, and then call the start() method on the returned Thread instance to begin running the thread, which executes the function func. If func requires arguments, pass them via the args=(...) parameter of the Thread constructor. The sample code is as follows:

import threading

# function executed by the thread
def counter(n):
    cnt = 0
    for i in range(n):
        for j in range(i):
            cnt += j
    print(cnt)

if __name__ == '__main__':
    # initialize a Thread object, passing in counter and its argument 1000
    th = threading.Thread(target=counter, args=(1000,))
    # start the thread
    th.start()
    # the main thread blocks until the child thread finishes
    th.join()
Passing in a callable object

Many Python objects are what we call callable: any object that can be invoked through the function-call operator "()" (see Chapter 14 of Core Python Programming). Class instances can also be callable; when an instance is called, its built-in __call__ method is invoked automatically. So this way of creating a thread is to hand the thread an object whose __call__ method has been overridden. The sample code is as follows:

import threading

# a callable class
class Callable(object):
    def __init__(self, func, args):
        self.func = func
        self.args = args

    def __call__(self):
        self.func(*self.args)

# function executed by the thread
def counter(n):
    cnt = 0
    for i in range(n):
        for j in range(i):
            cnt += j
    print(cnt)

if __name__ == '__main__':
    # initialize a Thread object, passing in a Callable instance
    # built from counter and its argument 1000
    th = threading.Thread(target=Callable(counter, (1000,)))
    # start the thread
    th.start()
    # the main thread blocks until the child thread finishes
    th.join()

The key line of this example is the call inside __call__ (written as self.func(*self.args) in Python 3; the Python 2 original used the now-removed built-in apply(self.func, self.args)): it invokes the function object with the arguments that were passed in at initialization time.

Inheriting from the Thread class

This approach implements custom thread behavior by inheriting from the Thread class and overriding its run method, as shown in the following sample code:

import threading

# function executed by the thread
def counter():
    cnt = 0
    for i in range(10000):
        for j in range(i):
            cnt += j

class SubThread(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self, name=name)

    def run(self):
        i = 0
        while i < 4:
            print(self.name, 'counting...')
            counter()
            print(self.name, 'finish')
            i += 1

if __name__ == '__main__':
    th = SubThread('Thread-1')
    th.start()
    th.join()
    print('all done')
Creating a process with the multiprocessing module by passing in a function
  • A process is created in exactly the same way as a thread, except that threading.Thread is replaced with multiprocessing.Process. The multiprocessing module tries to keep its method names consistent with the threading module, so the sample code in the thread section above can be used as a reference. Only the first way, passing in a function, is shown here:
import multiprocessing
import time

def run():
    i = 0
    while i < 10000:
        print('running')
        time.sleep(2)
        i += 1

if __name__ == '__main__':
    p = multiprocessing.Process(target=run)
    p.start()
    # p.join()
    print(p.pid)
    print('master gone')
Create a process pool
  • The module also allows you to create a set of processes up front and then assign tasks to them. See the manual for details; this part has not been studied in depth here, so only the basic pattern is shown:
pool = multiprocessing.Pool(processes=4)
pool.apply_async(func, args)
Benefits of using processes
  • Fully parallel, with no GIL limitation, so a multi-CPU, multi-core environment can be used to full advantage; processes can also receive Linux signals, which, as we will see later, is very useful.
GIL
  • Python multithreading has a nasty limitation, the Global Interpreter Lock (GIL), which means that only one thread can use the interpreter at any one time. Running multiple threads on a single CPU means everyone takes turns, which is called "concurrency", not "parallelism".
  • The explanation in the manual is that the lock ensures the correctness of the object model. The trouble with it is that if a compute-intensive thread occupies the CPU, the other threads have to wait; imagine several of your threads behaving like that, and multithreading effectively degenerates into serial execution. Of course, the module is not useless.
  • The manual also says that for IO-intensive tasks, a thread releases the interpreter during IO, giving other threads a chance to use it. So whether to use this module requires considering the type of task at hand.
What is the GIL?
  • The first thing to be clear about is that the GIL is not a feature of the Python language; it is a concept introduced by the reference implementation of the Python interpreter, CPython.
  • Just as C++ is a language (syntax) standard that can be compiled into executable code by different compilers (well-known ones include GCC, Intel C++, and Visual C++), the same is true for Python: the same code can be executed by different Python execution environments, such as CPython, PyPy, and Psyco. An implementation like Jython has no GIL at all.
  • However, because CPython is the default Python execution environment in most places, many people equate CPython with Python and take for granted that the GIL is a flaw of the Python language.
  • So let us be clear here: the GIL is not a feature of Python, and Python does not depend on the GIL at all.
  • So what is the GIL in the CPython implementation? GIL stands for Global Interpreter Lock. To avoid being misled, let's look at the official explanation:
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
  • Okay, does that look bad? A mutex that prevents multiple threads from executing bytecode concurrently: at first glance it looks like a bug-grade global lock! Don't worry, we will analyze it slowly below.
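Note that "one thread at a time" holds only per bytecode instruction. Even a simple x += 1 compiles to several instructions, and the interpreter may hand the GIL to another thread between any two of them, which is why user code still needs its own locks. A small sketch (exact opcode names vary by Python version):

```python
import dis

def increment(x):
    x += 1
    return x

# the disassembly shows separate load / add / store instructions;
# a thread switch can happen between any pair of them
dis.dis(increment)
```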
Why is there a GIL?
  • Due to physical constraints, the competition among CPU vendors on core frequency has been replaced by competition on core count. To use multi-core processors more efficiently, multithreaded programming appeared, and with it came the difficulties of data consistency and state synchronization between threads. Even the caches inside the CPU are no exception: to solve data synchronization between multiple caches effectively, vendors spent a great deal of effort, which inevitably brought some performance loss.
  • Python, of course, could not escape this. To take advantage of multiple cores, Python began to support multithreading. The simplest way to keep data integrity and state synchronized between multiple threads is to lock everything, hence this super-lock, the GIL. As more and more code-base developers accepted this setting, they began to rely heavily on it (that is, on the assumption that Python's built-in objects are thread-safe by default, with no need to think about extra memory locks and synchronization when implementing them).
  • Slowly, this implementation was found to be painful and inefficient. But when people tried to split up or remove the GIL, they found that a large number of library developers depended on it heavily, which made removal very hard. How hard? To make an analogy, a "small project" like MySQL took nearly five years, spanning the major versions from 5.5 through 5.6 to 5.7, just to split the big lock of the buffer pool mutex into smaller locks, and the work continues. MySQL, backed by a company with a fixed development team, found it this hard; what about a team of core developers and code contributors as community-driven as Python's?
  • So, simply put, the GIL's continued existence is largely a matter of history. If it were done over again, the problems of multithreading would still have to be faced, but at least the solution could be more elegant than the current GIL.
The impact of the GIL
  • From the introduction above and the official definition, the GIL is undoubtedly a global exclusive lock. There is no doubt that a global lock affects the efficiency of multithreading; it can even make Python almost equivalent to a single-threaded program. The reader will then say: as long as the global lock is released diligently, efficiency is not bad. As long as it is released during time-consuming IO operations, performance can still improve, or at least be no worse than single-threaded. In theory, yes. In practice? Python is worse than you think.
  • Below we compare the efficiency of Python in multithreaded and single-threaded mode. The test method is simple: a counter function that loops 100 million times is executed twice, once sequentially in a single thread and once by two concurrent threads, and the total execution times are compared. The test environment is a dual-core Mac Pro. Note: to reduce the impact of the thread library's own overhead on the results, the single-threaded code also uses threads, just executed sequentially twice, simulating a single thread.
Sequential single-threaded execution (single_thread.py)
#!/usr/bin/python
from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()
        t.join()  # wait for each thread before starting the next
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()
Simultaneous execution of two concurrent threads (multi_thread.py)
#!/usr/bin/python
from threading import Thread
import time

def my_counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    thread_array = {}
    start_time = time.time()
    for tid in range(2):
        t = Thread(target=my_counter)
        t.start()
        thread_array[tid] = t
    for i in range(2):
        thread_array[i].join()
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()
  • You can see that in the multithreaded case Python is 45% slower than single-threaded. According to the earlier analysis, even with the GIL global lock, serialized multithreading should be as efficient as single-threading. How could the result be this bad?
  • Let's analyze the reason through the GIL's implementation principle.
A design flaw of the current GIL: scheduling based on opcode counts
  • The Python community's view is that the operating system's own thread scheduling is already very mature and stable, and there is no need to reinvent it. So a Python thread is a C-language pthread, scheduled by the operating system's scheduling algorithm (for example, CFS on Linux). To let each thread use CPU time evenly, Python counts the number of bytecode instructions currently executed and forces the GIL to be released once a threshold is reached. That in turn gives the operating system's thread scheduler a chance to run (although whether a context switch actually happens is decided by the operating system).
  • Pseudo code
while True:
    acquire GIL
    for i in 1000:
        do something
    release GIL  /* give the operating system a chance to do thread scheduling */
  • This mode has no problem when there is only one CPU core. Any thread that is woken up can successfully acquire the GIL (because thread scheduling only happens when the GIL is released). But when the CPU has multiple cores, the problem appears. From the pseudocode you can see that there is almost no gap between release GIL and acquire GIL, so when a thread on another core is woken up, most of the time the main thread has already acquired the GIL again. The awakened thread can only waste CPU time watching the other thread happily run with the GIL, until it reaches the switching threshold, enters the to-be-scheduled state, gets woken up again, waits again, and so on in a vicious cycle.
  • PS: Of course this implementation is primitive and ugly, and each version of Python gradually improves the interaction between the GIL and thread scheduling, for example trying to keep the GIL across a thread context switch, or releasing the GIL while waiting on IO, and so on. But what cannot be changed is that the existence of the GIL makes the already expensive operation of operating-system thread scheduling even more extravagant.
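For reference, since Python 3.2 the opcode-count threshold described above has been replaced by a time slice: a waiting thread requests the GIL after a configurable interval (5 ms by default), which can be inspected and tuned:

```python
import sys

# current GIL switch interval in seconds (0.005, i.e. 5 ms, by default)
print(sys.getswitchinterval())

# a longer interval means fewer GIL handoffs (lower overhead but worse
# thread-switch latency); a shorter one means the reverse
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())
```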
  • To intuitively understand the GIL's performance impact on multithreading, a test result diagram is borrowed here. It shows the execution of two CPU-intensive threads on a dual-core CPU. The green parts indicate that a thread is running and performing useful computation; the red parts are the time a thread spends awakened by the scheduler but unable to acquire the GIL, and therefore unable to do any useful work.
  • The simple summary is that Python's multithreading on multi-core CPUs only helps IO-intensive computation; when at least one CPU-intensive thread is present, multithreading efficiency drops sharply because of the GIL.
How to avoid being affected by the GIL
  • After saying all this, if we offered no solution, this would be just a popular-science post, and a useless one at that. The GIL is this rotten; is there any way around it? Let's look at some ready-made solutions.
Replacing threading with multiprocessing
  • The multiprocessing library appeared largely to compensate for the inefficiency that the GIL causes in the threading library. It fully replicates the interface that threading provides, for easy migration. The only difference is that it uses multiple processes rather than multiple threads. Each process has its own independent GIL, so processes never fight over a GIL.
  • Of course, multiprocessing is not a panacea. Its introduction increases the difficulty of data communication and synchronization in the program. Take the counter as an example: if we want multiple threads to accumulate the same variable, with threading we declare a global variable, wrap three lines in a threading.Lock context, and we are done. With multiprocessing, processes cannot see each other's data, so the only options are to declare a Queue in the main process and communicate through put/get, or to use shared memory. This extra implementation cost makes coding a multithreaded program, already painful, even more painful. Readers interested in the specific difficulties can read further on the topic.
Using other interpreters
  • As mentioned earlier, since the GIL is only a product of CPython, are other interpreters better off? Yes: interpreters like Jython and IronPython do not need the GIL's help, because their underlying runtimes manage memory for them. However, by using Java/C# for the interpreter implementation, they also lose the opportunity to take advantage of the community's many useful C-language modules, so these interpreters have always been relatively niche. After all, for functionality and performance everyone chooses the established option first; done is better than perfect.
So it's hopeless?
  • Of course, the Python community is also working very hard to keep improving the GIL, and even to try to remove it, and each iteration has made real progress. Interested readers can look further into these efforts, which include: reworking the GIL to change the switching granularity from opcode counting to time slices; preventing the thread that most recently released the GIL from being immediately scheduled again; and new thread-priority features (high-priority threads can force other threads to release the GIL they hold).
Summary
  • Python's GIL is actually a tradeoff between functionality and performance; it is particularly reasonable in its historical context, and there are hard objective factors that make it difficult to change. From this analysis, we can draw a few simple conclusions:
    - Because the GIL exists, only IO-bound scenarios get better performance from multithreading.
    - If a program needs parallel computing performance, consider moving the core parts into a C module, or simply implementing them in another language.
    - The GIL will continue to exist for quite a long time, but it will keep being improved.

