One. Introduction
Definition: In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.) Conclusion: in the CPython interpreter, among the multiple threads opened within one process, only one thread can run at any given time, so threads cannot take advantage of multiple cores.
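As a quick aside (my addition, not from the original text): CPython exposes the interval after which the running thread is asked to release the GIL, so you can inspect this mechanism directly:

```python
import sys

# CPython periodically asks the running thread to drop the GIL so that
# another thread can be scheduled; this is that interval, in seconds
# (0.005 by default, adjustable with sys.setswitchinterval).
print(sys.getswitchinterval())
```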
The first thing to be clear about is that the GIL is not a feature of the Python language; it is a concept introduced by one implementation of the Python interpreter, CPython. An analogy: C++ is a set of language (syntax) standards that can be compiled into executable code by different compilers, and well-known compilers include GCC, Intel C++, and Visual C++. Likewise, the same piece of Python code can be executed by different Python execution environments such as CPython, PyPy, and Psyco; Jython, for example, has no GIL.
However, because CPython is the default Python execution environment almost everywhere, many people equate CPython with Python and take it for granted that the GIL is a defect of the Python language.
So let's be clear here: the GIL is not a Python feature, and Python can exist entirely without the GIL.
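To drive the point home, here is a small sketch (my addition, not from the original post) that reports which interpreter implementation is running your code:

```python
import platform

# The GIL belongs to the implementation, not the language: CPython and
# PyPy have one, while Jython does not. Check what is running this code:
print(platform.python_implementation())
```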
This article thoroughly analyzes the GIL's effect on Python multithreading. The following slides are highly recommended reading: http://www.dabeaz.com/python/UnderstandingGIL.pdf (recommended by teacher Egon).
Two. Introduction to the GIL
The GIL is in essence a mutex, and like any mutex it turns concurrent execution into serial execution, so that at any one time the shared data can be modified by only one task, thereby guaranteeing data safety.
One thing is certain: to protect different pieces of data, you should use different locks.
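A minimal sketch of that rule (my addition; the names are made up): two independent counters get two independent locks, so a thread updating one never waits on a thread updating the other:

```python
from threading import Thread, Lock

# Two independent pieces of shared data, each guarded by its own lock.
counter_a, counter_b = 0, 0
lock_a, lock_b = Lock(), Lock()

def bump_a():
    global counter_a
    for _ in range(100000):
        with lock_a:          # serializes access to counter_a only
            counter_a += 1

def bump_b():
    global counter_b
    for _ in range(100000):
        with lock_b:          # serializes access to counter_b only
            counter_b += 1

threads = [Thread(target=bump_a), Thread(target=bump_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter_a, counter_b)  # 100000 100000
```

If both counters shared a single lock, the two threads would needlessly serialize against each other.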
To understand the GIL, first be clear on one point: every time you run a python program, a separate process is created. For example, `python test.py`, `python aaa.py`, and `python bbb.py` produce three different Python processes.
" " #验证python test.py only produces a process #test.py content import os,timeprint (Os.getpid ()) Time.sleep (+) " " # under Windows tasklist | findstr python # PS aux |grep python under Linux
Within one python process there is not only the main thread of test.py (plus any other threads the program opens), but also interpreter-level threads such as the one the interpreter enables for garbage collection. In short, all of these threads run inside this single process.
#1 All data is shared, and code, as a kind of data, is shared by all threads (all of test.py's code plus all of the CPython interpreter's code). For example, test.py defines a function work; all threads in the process can access work's code, so we can open three threads whose target points to that code, which means each of them can execute it.

#2 For a thread to run its task, the task's code must be passed as a parameter to the interpreter's code for execution. In other words, before any thread can run its own task, it must first gain access to the interpreter's code.
In summary: if multiple threads all have target=work, the execution flow is that the threads first gain access to the interpreter's code, i.e. they acquire permission to execute, and then hand their target code to the interpreter's code to run.
The interpreter's code is shared by all threads, so the garbage-collection thread can also access and execute it. This leads to a problem: for the same piece of data, say the value 100, thread 1 might be executing x=100 at the very moment the garbage collector is reclaiming 100. There is no clever way to solve this problem other than locking, and that is what the GIL does: it ensures that the Python interpreter executes the code of only one task at a time.
Three. The GIL and locks
The GIL protects data at the interpreter level; to protect your own application data you must add locks of your own, for example a threading.Lock.
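For illustration (a sketch of my own, with made-up names): the GIL does not make a compound operation like "read, then write back" atomic, because the interpreter can switch threads between bytecodes; your own lock is what serializes the whole update:

```python
from threading import Thread, Lock
import time

n = 100
lock = Lock()

def task():
    global n
    with lock:               # remove this lock and most decrements are lost
        temp = n
        time.sleep(0.01)     # forces a thread switch mid-update
        n = temp - 1

threads = [Thread(target=task) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(n)  # 90 with the lock; without it, typically 99
```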
Four. The GIL and multithreading
Because of the GIL, at any given moment only one thread within a process is executing.
Hearing this, some students immediately object: processes can take advantage of multiple cores but carry a large overhead, while Python's threads have a small overhead but cannot take advantage of multiple cores. Does that mean Python is useless and PHP is the most awesome language?
Don't worry, I'm not finished yet.
To answer this question, we need to agree on a few points:
#1. Is the CPU being used for computation, or for I/O?

#2. Multiple CPUs mean multiple cores can compute in parallel, so multicore improves compute performance.
An analogy: a worker is the CPU; computation is the worker working; I/O blocking is the process of supplying the worker with raw materials. If the worker runs out of raw materials, the worker must stop working until materials arrive.

If most of your factory's tasks consist of preparing raw materials (I/O-intensive), then hiring more workers is of little use; it may be no better than one worker who does other jobs while waiting for materials.

Conversely, if your factory has a full supply of raw materials, then the more workers, the more efficient.
Conclusion: for computation, the more CPUs the better; but for I/O, more CPUs are useless.
Of course, running any program benefits somewhat from more CPUs (however small the improvement, there is always some), because a program is almost never purely computational or purely I/O. So we can only speak of a program being relatively compute-intensive or relatively I/O-intensive, and from there analyze whether Python multithreading is actually useful.
# Analysis: we have four tasks to process, and we want to process them concurrently. The options are:
Scheme 1: open four processes
Scheme 2: one process, open four threads

# Single-core scenario:
If the four tasks are compute-intensive, there is no multicore to parallelize anyway, and scheme 1 only adds the cost of creating processes: scheme 2 wins.
If the four tasks are I/O-intensive, scheme 1's process-creation cost is still large, and switching between processes is no faster than switching between threads: scheme 2 wins.

# Multi-core scenario:
If the four tasks are compute-intensive, multicore means parallel computation; but in Python only one thread per process executes at a time, so threads cannot use multiple cores: scheme 1 wins.
If the four tasks are I/O-intensive, more cores cannot eliminate the I/O waits: scheme 2 wins.

# Conclusion: computers today are basically all multicore. For compute-intensive tasks, Python multithreading brings little performance improvement, and can even be slower than serial execution (because of the extra switching); for I/O-intensive tasks, however, it brings a significant improvement in efficiency.
Five. Multithreading performance test
```python
from multiprocessing import Process
from threading import Thread
import os, time

def work():
    res = 0
    for i in range(100000000):
        res *= i

if __name__ == '__main__':
    l = []
    print(os.cpu_count())  # this machine has 4 cores
    start = time.time()
    for i in range(4):
        p = Process(target=work)  # takes about 5s
        # p = Thread(target=work)  # takes about 18s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print('run time is %s' % (stop - start))
```
Compute-intensive: multiprocessing is more efficient.
```python
from multiprocessing import Process
from threading import Thread
import os, time

def work():
    time.sleep(2)
    print('===>')

if __name__ == '__main__':
    l = []
    print(os.cpu_count())  # this machine has 4 cores
    start = time.time()
    for i in range(400):
        # p = Process(target=work)  # takes over 12s, mostly spent creating processes
        p = Thread(target=work)  # takes about 2s
        l.append(p)
        p.start()
    for p in l:
        p.join()
    stop = time.time()
    print('run time is %s' % (stop - start))
```
I/O-intensive: multithreading is more efficient.
Applications:
Multithreading is suited to I/O-intensive tasks, such as socket programming, crawlers, and web services.
Multiprocessing is suited to compute-intensive tasks, such as financial analysis.
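As a closing sketch (my addition, with made-up task names), the standard library's concurrent.futures makes this choice explicit: a ThreadPoolExecutor overlaps I/O waits, while a ProcessPoolExecutor would be the swap-in for compute-intensive work:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(url):
    # hypothetical I/O-bound task: the sleep stands in for a network call
    time.sleep(0.1)
    return 'response from %s' % url

urls = ['site-%d' % i for i in range(20)]

start = time.time()
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.time() - start

# 20 tasks of 0.1s each finish in roughly 0.1s because the waits overlap;
# for compute-intensive work, use ProcessPoolExecutor instead.
print(len(results), round(elapsed, 2))
```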
Some content from Egon's blog.