Contents
- One Introduction
- Two GIL introduction
- Three GIL and Lock
- Four GIL and multithreading
- Five Multithreading performance testing
One Introduction
Definition: "In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)"
Conclusion: in the CPython interpreter, of all the threads started within one process, only one can execute at any given moment, so threads cannot take advantage of multiple cores.
The first thing to be clear about is that the GIL is not a feature of the Python language. It is a concept introduced by one implementation of the Python interpreter (CPython). Compare C++: it is a language (syntax) standard, and the same code can be compiled to an executable by different compilers, such as GCC, Intel C++, and Visual C++. Python is the same: one piece of code can be run by different Python execution environments such as CPython, PyPy, and Jython. Jython, for example, has no GIL. However, CPython is the default Python execution environment almost everywhere, so many people equate CPython with Python and take it for granted that the GIL is a defect of the Python language. So let's be clear here: the GIL is not a Python language feature, and Python does not depend on the GIL at all.
The following slides analyze the GIL's effect on Python multithreading in depth and are highly recommended: http://www.dabeaz.com/python/UnderstandingGIL.pdf
Two GIL introduction
The GIL is in essence a mutex. Like every mutex, it turns concurrent execution into serial execution, so that shared data can be modified by only one task at a time, which keeps the data safe.
One thing is certain: different data needs a different lock to protect it.
To understand the GIL, first establish one point: every time you run a Python program, a separate process is created. For example, python test.py, python aaa.py, and python bbb.py produce 3 different Python processes.
It is easy to verify that python test.py produces only a single process.
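One way to check the single-process claim is to have every thread report its process ID. The sketch below is only an illustration: the helper name collect_pids and the thread count are made up for this example and are not from the original test.py.

```python
import os
import threading


def collect_pids(num_threads=3):
    """Start num_threads threads and record the process ID each one sees."""
    pids = []

    def work():
        # Every thread asks the operating system which process it belongs to.
        pids.append(os.getpid())

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Return the main thread's PID together with the PIDs seen by the workers.
    return os.getpid(), pids


if __name__ == "__main__":
    main_pid, thread_pids = collect_pids()
    print("main thread pid:   ", main_pid)
    print("worker thread pids:", thread_pids)
```

All the printed PIDs should be identical, because threads live inside the process that started them.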
Within a Python process there is not only the main thread of test.py and any threads it starts, but also interpreter-level work such as garbage collection, which this article treats as a separate interpreter thread. In short, all of these threads run inside this one process.
#1 All data is shared, and code is itself data shared by all threads (all the code of test.py and all the code of the CPython interpreter). For example, test.py defines a function work; every thread in the process can access the code of work, so we can start three threads whose target all point to it, which means each of them can execute it.
#2 To run its task, a thread must pass the task's code as an argument to the interpreter's code. In other words, before any thread can run its own task, it first has to get access to the interpreter's code.
To sum up:
If multiple threads all have target=work, execution proceeds as follows:
The threads first access the interpreter's code, that is, obtain permission to execute, and then hand the target's code to the interpreter's code to run.
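The sharing described above can be sketched as follows. The names shared_results and run_three_threads are assumptions made for this illustration; only the function name work and the idea of three threads with target=work come from the text.

```python
import threading

# Module-level data: visible to every thread in the process.
shared_results = []


def work(tag):
    # All three threads execute this same function object.
    shared_results.append((tag, threading.current_thread().name))


def run_three_threads():
    shared_results.clear()
    threads = [
        threading.Thread(target=work, args=(i,), name=f"worker-{i}")
        for i in range(3)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Sort because the order in which threads ran is not guaranteed.
    return sorted(shared_results)


if __name__ == "__main__":
    for tag, thread_name in run_three_threads():
        print(tag, thread_name)
```

Each thread writes into the same list and runs the same work code, which is the sense in which both data and code are shared within the process.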
The interpreter's code is shared by all threads, so the garbage-collection thread can also reach the interpreter's code and run. This leads to a problem: for the same piece of data, 100, thread 1 may be executing x = 100 while garbage collection is reclaiming that same 100. There is no clever way around this. The solution is a lock, namely the GIL, which guarantees that the Python interpreter executes the code of only one task at a time.
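The interpreter-level data at stake includes each object's reference count, which CPython uses to decide when an object's memory can be reclaimed. The sketch below uses sys.getrefcount to make that bookkeeping visible. The helper name refcount_delta is made up for this illustration, and because absolute counts differ between CPython versions, only the difference is reported.

```python
import sys


def refcount_delta():
    """Return how much an object's reference count grows when one more
    container holds a reference to it."""
    obj = object()
    before = sys.getrefcount(obj)
    holders = [obj]  # the list now holds one additional reference to obj
    after = sys.getrefcount(obj)
    del holders      # dropping the list releases that reference again
    return after - before


if __name__ == "__main__":
    print("extra references created by the list:", refcount_delta())
```

Every assignment, argument pass, or container insert updates counters like this one. If two threads updated them at the same moment without protection, an object could be freed while still in use, which is the kind of conflict the GIL rules out.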
Three GIL and Lock
The GIL protects interpreter-level data. To protect your own data, you have to add a lock yourself.
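A minimal sketch of protecting user-level data with your own lock, assuming a simple shared counter (the function name run_locked_counter and its parameters are hypothetical, chosen for this example):

```python
import threading


def run_locked_counter(num_threads=8, increments=10_000):
    """Increment a shared counter from several threads, guarded by a Lock."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            # counter += 1 is a read-modify-write sequence. The GIL does not
            # make it atomic, so the user-level lock is what keeps it safe.
            with lock:
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_locked_counter())
```

With the lock in place the result is always num_threads * increments. Without it, a thread can be switched out between reading and writing the counter, and updates may be lost.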
Four GIL and multithreading
Because of the GIL, at any given moment only one thread in a process is executing.
On hearing this, some students immediately object: processes can use multiple cores but are expensive to create, while Python threads are cheap but cannot use multiple cores. So Python is useless, and PHP is the most awesome language?
Not so fast, I haven't finished yet.
To answer this question, we need to agree on a few points:
#1. Is the CPU used for computation, or for I/O?
#2. Multiple CPUs mean that multiple cores can compute in parallel, so what multicore improves is compute performance.
#3. Once a CPU hits blocking I/O, it still has to wait, so extra cores are of no use for I/O operations.
Think of a worker as a CPU. Computation is the worker doing the job, and blocking I/O is the delivery of the raw materials the worker needs. If the raw materials run out while the worker is working, the worker has to stop until more arrive.
If most of what your factory does is wait for raw materials (I/O-intensive), then having more workers does not help much. You may as well have a single worker who does other jobs while waiting for the materials.
Conversely, if your factory always has all its raw materials on hand, then the more workers it has, the more efficient it is.
Conclusion:
For computation, the more CPUs the better, but for I/O, more CPUs do not help.
Of course, for any real program, adding CPUs will improve performance to some degree (however small the gain), because almost no program is purely computation or purely I/O. We can only say, relatively, whether a program is compute-intensive or I/O-intensive, and on that basis analyze whether Python multithreading is actually useful.
# Analysis:
We have four tasks to process, and we want them to run concurrently. The options are:
Option one: start four processes
Option two: start four threads within one process

# Single-core scenario, analysis results:
  If the four tasks are compute-intensive, there are no extra cores to compute in parallel, so option one only adds the cost of creating processes. Option two wins.
  If the four tasks are I/O-intensive, option one pays a high process-creation cost, and switching between processes is much slower than switching between threads. Option two wins.

# Multi-core scenario, analysis results:
  If the four tasks are compute-intensive, multiple cores mean parallel computation, but in Python only one thread per process executes at any moment, so threads cannot use the extra cores. Option one wins.
  If the four tasks are I/O-intensive, no number of cores can solve the I/O wait. Option two wins.

# Conclusion: computers today are basically all multicore. For compute-intensive tasks, Python multithreading brings little performance improvement and may even be slower than serial execution (which avoids a lot of switching). For I/O-intensive tasks, however, it improves efficiency significantly.
Five Multithreading performance testing
    from multiprocessing import Process
    from threading import Thread
    import os, time

    def work():
        # Pure computation, no I/O
        res = 0
        for i in range(100000000):
            res *= i

    if __name__ == '__main__':
        l = []
        print(os.cpu_count())  # this machine has 4 cores
        start = time.time()
        for i in range(4):
            # Enable exactly one of the next two lines to compare the two approaches
            p = Process(target=work)   # takes about 5s
            # p = Thread(target=work)  # takes about 18s
            l.append(p)
            p.start()
        for p in l:
            p.join()
        stop = time.time()
        print('run time is %s' % (stop - start))

Compute-intensive: multiprocessing is more efficient
    from multiprocessing import Process
    from threading import Thread
    import threading
    import os, time

    def work():
        # Simulated blocking I/O
        time.sleep(2)
        print('===>')

    if __name__ == '__main__':
        l = []
        print(os.cpu_count())  # this machine has 4 cores
        start = time.time()
        for i in range(400):
            # Enable exactly one of the next two lines to compare the two approaches
            # p = Process(target=work)  # takes over 12s, most of it spent creating processes
            p = Thread(target=work)     # takes about 2s
            l.append(p)
            p.start()
        for p in l:
            p.join()
        stop = time.time()
        print('run time is %s' % (stop - start))

I/O-intensive: multithreading is more efficient
Applications:
Multithreading suits I/O-intensive work, such as sockets, crawlers, and web applications
Multiprocessing suits compute-intensive work, such as financial analysis
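As a rough sketch of the I/O-intensive case, the standard library's concurrent.futures.ThreadPoolExecutor can run many waiting tasks at once. Here fake_io_task is a hypothetical stand-in for a socket read or an HTTP request, and the task count and delay are arbitrary values chosen for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(task_id, delay=0.2):
    """Hypothetical stand-in for a blocking I/O call such as a network request."""
    time.sleep(delay)  # a sleeping thread releases the GIL, so others can run
    return task_id


def run_io_tasks(num_tasks=8, delay=0.2):
    """Run num_tasks blocking tasks in a thread pool and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        # map returns results in the order of the inputs.
        results = list(pool.map(lambda i: fake_io_task(i, delay), range(num_tasks)))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_io_tasks()
    print("results:", results)
    print("elapsed: %.2fs" % elapsed)
```

Because the tasks spend their time waiting rather than computing, the batch should finish in roughly the time of one task instead of the sum of all of them. For compute-intensive work, concurrent.futures.ProcessPoolExecutor offers the same map interface, but the function passed to it must be picklable, so a module-level function would be needed in place of the lambda.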