Thread vs. Global Interpreter Lock (GIL)


I. Introduction to Threading

1. What is a thread

Each process has its own address space and, by default, one thread of control. If a process is likened to a workshop in a factory, then a thread is an assembly line inside that workshop.

A process simply bundles resources together (a process is a resource unit, or a collection of resources); a thread is the unit that actually executes on the CPU.

Multithreading (multiple threads of control) means that several threads of control exist within one process, and all of them share that process's address space (its resources).

Creating a process is much more expensive than creating a thread: creating a process is like building a new workshop, while creating a thread is like adding an assembly line to an existing one.

2. The difference between threads and processes

1. Threads share the address space of the process that created them; processes have their own address spaces.
2. Threads have direct access to the data segment of their process; a process gets its own copy of the parent process's data segment.
3. Threads can communicate directly with the other threads of their process; processes must use interprocess communication to communicate with sibling processes.
4. New threads are easy to create; new processes require duplication of the parent process.
5. Threads can exercise considerable control over the other threads of the same process; a process can only exercise control over its child processes.
6. Changes to the main thread (cancellation, priority change, etc.) may affect the behavior of the other threads of the process; changes to a parent process do not affect its child processes.

3. The advantages of multithreading

Multithreading is analogous to multiprocessing: it means opening multiple threads within one process.

1) Multiple threads share the address space (resources) of their process.

2) Threads are more lightweight than processes: they are easier to create and to destroy. In many operating systems, creating a thread is 10-100 times faster than creating a process, which matters when a large number of threads need to be created and torn down dynamically and quickly.

3) If multiple threads are all CPU-bound, there is no performance gain. But if there is both heavy computation and a lot of I/O, multiple threads allow these activities to overlap, which speeds up the program.

4) On a multi-CPU system, you can open multiple threads to make the most of the cores, at a much smaller cost than opening multiple processes. (This point does not apply to Python.)

II. Python's Concurrent Programming: Multithreading

1. Introduction to the threading module

The multiprocessing module deliberately imitates the interface of the threading module, so the two are very similar at the usage level; the basics are therefore not repeated in detail here.

If you are not yet familiar with the multiprocessing module, you can review the earlier post on the basics of processes and concurrency:

30. Basic theory of processes and concurrency (the multiprocessing module): http://www.cnblogs.com/liluning/p/7419677.html

Official documentation: https://docs.python.org/3/library/threading.html?highlight=threading# (worth reading if your English is up to it)

2. Two ways to start a thread (identical to starting a process)

Both ways mirror the ways we started a process earlier, so this also serves as a brief review.

1) Way one:

from threading import Thread
# from multiprocessing import Process
import os

def talk():
    print('%s is running' % os.getpid())

if __name__ == '__main__':
    t = Thread(target=talk)
    # t = Process(target=talk)
    t.start()
    print('master', os.getpid())

2) Way two:

# Start a thread
from threading import Thread
import os

class MyThread(Thread):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def run(self):
        print('pid:%s name:[%s] is running' % (os.getpid(), self.name))

if __name__ == '__main__':
    t = MyThread('lln')
    t.start()
    print('main t', os.getpid())


# Start a process
from multiprocessing import Process
import os

class MyProcess(Process):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def run(self):
        print('pid:%s name:[%s] is running' % (os.getpid(), self.name))

if __name__ == '__main__':
    t = MyProcess('lll')
    t.start()
    print('main p', os.getpid())

3. The difference between opening multiple threads under one process and opening multiple child processes under one process

1) Speed comparison (watch whether 'hello' or 'main thread/main process' is printed first):

from threading import Thread
from multiprocessing import Process
import os

def work():
    print('hello')

if __name__ == '__main__':
    # Start a thread under the main process
    t = Thread(target=work)
    t.start()
    print('main thread/main process')

    # Start a child process under the main process
    t = Process(target=work)
    t.start()
    print('main thread/main process')

2) PID difference (threads share the PID of the main process, while each child process has a PID different from the main process):

from threading import Thread
from multiprocessing import Process
import os

def work():
    print('my pid: ', os.getpid())

if __name__ == '__main__':
    # part1: open multiple threads under the main process; each thread has the same PID as the main process
    t1 = Thread(target=work)
    t2 = Thread(target=work)
    t1.start()
    t2.start()
    print('main thread/main process pid: ', os.getpid())

    # part2: open multiple child processes; each process has a different PID
    p1 = Process(target=work)
    p2 = Process(target=work)
    p1.start()
    p2.start()
    print('main thread/main process pid: ', os.getpid())

3) Whether data is shared (threads share data with the main process; a child process only gets a copy of the main process's data, not the same data):

from threading import Thread
from multiprocessing import Process

def work():
    global n
    n = 99

n = 1  # main-process data

if __name__ == '__main__':
    # p = Process(target=work)
    # p.start()
    # p.join()
    # print('main', n)
    # The child process p does change its global n to 99, but only in its own address
    # space; looking at the parent process, n is still 1.

    t = Thread(target=work)
    t.start()
    t.join()
    print('main', n)  # the result is 99, because threads in the same process share the process's data

4. Practice

1) Three tasks: one receives user input, one formats the input to uppercase, and one writes the formatted result to a file.

from threading import Thread

msg = []
msg_fort = []

def inp():
    while True:
        msg_l = input('>>: ')
        if not msg_l:
            continue
        msg.append(msg_l)

def fort():
    while True:
        if msg:
            res = msg.pop()
            msg_fort.append(res.upper())

def save():
    with open('db.txt', 'a') as f:
        while True:
            if msg_fort:
                f.write('%s\n' % msg_fort.pop())
                f.flush()  # force the buffered data out instead of waiting for the buffer to fill

if __name__ == '__main__':
    p1 = Thread(target=inp)
    p2 = Thread(target=fort)
    p3 = Thread(target=save)
    p1.start()
    p2.start()
    p3.start()

2) The server/client example from the previous post, implemented with multithreading (you can read the previous post first if it is unfamiliar).

Server side and client code (a hedged sketch follows below).
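The original server and client code is not reproduced here, so the following is only a rough sketch of what a thread-per-connection echo server in this style might look like; the address 127.0.0.1:8080, the 1024-byte read size, and the uppercase echo are illustrative assumptions, not details from the earlier post.

# server.py - a minimal thread-per-connection echo server (illustrative sketch only)
from threading import Thread
import socket

def handle(conn):
    # Serve one client: echo every message back in uppercase until it disconnects
    while True:
        data = conn.recv(1024)
        if not data:
            break
        conn.sendall(data.upper())
    conn.close()

def serve():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(('127.0.0.1', 8080))   # assumed address and port
    s.listen(5)
    while True:
        conn, addr = s.accept()
        Thread(target=handle, args=(conn,)).start()  # one thread per client

if __name__ == '__main__':
    serve()

Any ordinary blocking socket client (connect, send a line, print the reply) works against such a server; because each connection gets its own thread, several clients can be served at the same time.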

5. Other methods of the threading module

Methods on a Thread instance:
# isAlive(): returns whether the thread is alive.
# getName(): returns the thread's name.
# setName(): sets the thread's name.

Some functions provided by the threading module:
# threading.currentThread(): returns the current Thread object.
# threading.enumerate(): returns a list of the threads currently alive. "Alive" means started and not yet terminated; threads that have not started or have already finished are excluded.
# threading.activeCount(): returns the number of threads currently alive; the result equals len(threading.enumerate()).

The main thread waits for other threads (join):

from threading import Thread, currentThread, activeCount
import os, time, threading

def talk():
    time.sleep(2)
    print('%s is running' % currentThread().getName())

if __name__ == '__main__':
    t = Thread(target=talk)
    t.start()
    t.join()   # the main thread waits here until t has finished
    print('master')
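The code that exercised the other helpers is not shown above, so here is a small sketch of currentThread(), enumerate(), and activeCount() in use; the thread name 'worker-1' and the 2-second sleep are arbitrary illustration values.

import threading
import time

def talk():
    time.sleep(2)
    print('%s is running' % threading.currentThread().getName())

if __name__ == '__main__':
    t = threading.Thread(target=talk, name='worker-1')  # the name is arbitrary
    t.start()
    print(threading.enumerate())    # MainThread and worker-1 are both alive here
    print(threading.activeCount())  # 2, the same as len(threading.enumerate())
    t.join()
    print(threading.activeCount())  # 1: only MainThread is left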

6. Daemon Thread

1) The difference between a daemon process and a daemon thread

For the main process, "finished running" means that the main process's own code has finished running.

For the main thread, "finished running" means that all non-daemon threads in the main thread's process have finished running; only then is the main thread considered finished.

2) Detailed description

The main process finishes once its code has finished running (daemon processes are reclaimed at that point), but it then waits for all non-daemon child processes to finish and reclaims their resources (otherwise zombie processes would be produced) before it ends.

The main thread finishes only after all other non-daemon threads have finished (daemon threads are reclaimed at that point). Because the end of the main thread means the end of the process, and the process's resources are reclaimed as a whole, the process must ensure that all non-daemon threads have finished before it ends.

Daemon thread example and a deliberately confusing example (a hedged sketch of the first follows below).
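Since the example code is not reproduced above, here is a minimal sketch of daemon-thread behaviour; the sleep lengths and messages are made up. The non-daemon thread keeps the process alive until it finishes, while the daemon thread is killed as soon as the main thread and all non-daemon threads are done.

from threading import Thread
import time

def daemon_task():
    time.sleep(3)
    print('daemon task done')    # never printed: the process exits before 3 seconds pass

def normal_task():
    time.sleep(1)
    print('normal task done')    # printed: the process waits for non-daemon threads

if __name__ == '__main__':
    d = Thread(target=daemon_task)
    d.daemon = True              # must be set before start()
    n = Thread(target=normal_task)
    d.start()
    n.start()
    print('main thread done')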

III. Python's GIL (Global Interpreter Lock)

1. Definition:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

Conclusion: in the CPython interpreter, of the multiple threads opened under one process, only one can execute at any given moment, so multithreading cannot take advantage of multiple cores.

Attention:

The first thing to be clear about is that the GIL is not a feature of the Python language; it is a concept introduced by one implementation of the Python interpreter, CPython. An analogy: C++ is a language (syntax) standard, but it can be compiled into executable code by different compilers, such as GCC, Intel C++, or Visual C++. Likewise, the same Python code can be run by different Python execution environments, such as CPython, PyPy, or Psyco, and an implementation like Jython has no GIL at all. However, because CPython is the default Python execution environment in most settings, many people equate CPython with Python and take it for granted that the GIL is a defect of the Python language. So, to be clear: the GIL is not a Python feature, and Python can exist entirely without it.

If you are confident in your English, have a look at http://www.dabeaz.com/python/UnderstandingGIL.pdf (it thoroughly analyzes the GIL's effect on Python multithreading).

2. Introduction to the GIL

The GIL is essentially a mutex, and like every mutex it turns concurrent execution into serial execution: it ensures that, at any one time, shared data can be modified by only one task, and thereby keeps that data safe.

One thing is certain: to protect different pieces of data, you should use different locks.

To understand the GIL, first be clear on one point: every time you run a python program, a separate process is created. For example, running python test.py, python aaa.py, and python bbb.py produces three distinct Python processes.

"' #验证python test.py will only produce a process #test.py content import os,timeprint (Os.getpid ()) Time.sleep (+)" Python3 test.py # Under Windows tasklist |findstr python# under Linux PS aux |grep python Verify that Python test.py will only produce a single process

Within one python process there is not only the main thread of test.py (and any other threads that the main thread starts), but also interpreter-level threads started by the interpreter itself, such as the garbage-collection thread. In short, all of these threads run inside this one process, so without a doubt:

#1 All the data is shared, and the code, being data itself, is also shared by all threads (all of test.py's code and all of the CPython interpreter's code).
#2 Every thread's task must pass the task's code, as an argument, to the interpreter's code for execution. In other words, for any thread to run its own task, it must first get access to the interpreter's code.

Putting these together:

If multiple threads are given target=work, the execution flow is: the threads first compete for access to the interpreter's code, i.e. for execution permission (the GIL), and then hand their target code to the interpreter's code to execute.

The GIL protects data at the interpreter level; it does not protect the user's own data.

To protect your own data, you need to add your own locks, as in the sketch below.
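A minimal sketch of that point, assuming a made-up shared counter n, a 100-thread decrement task, and a short sleep to force a thread switch: the GIL does not make a read-modify-write on user data atomic, so the user data needs its own Lock.

from threading import Thread, Lock
import time

n = 100
lock = Lock()

def work():
    global n
    with lock:            # comment this out (and dedent) to watch updates get lost
        temp = n
        time.sleep(0.1)   # simulate the thread being switched out mid-update
        n = temp - 1

if __name__ == '__main__':
    threads = [Thread(target=work) for _ in range(100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(n)  # 0 with the lock; without it, almost every decrement is lost and n ends near 99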

3. The GIL and multithreading

With the GIL present, at any given moment only one thread in a given process is executing.

Hearing this, you may well wonder: processes can take advantage of multiple cores but are expensive to create, while Python threads are cheap but cannot use multiple cores. Does that make Python multithreading useless?

To answer this question, we need to agree on a few points:

Conclusion:

For computation, the more CPUs the better; but for I/O, more CPUs are of no use.

Of course, running any program gets at least somewhat faster as CPUs are added, because hardly any program is purely computational or purely I/O. So, to decide whether Python multithreading is useful, we can only look at whether a given program is compute-intensive or I/O-intensive.

#Analysis: we have four tasks to process, and to get a concurrency effect the options are:
#  Scheme one: open four processes
#  Scheme two: one process, open four threads
#
#Single-core case:
#  If the four tasks are compute-intensive, there are no extra cores for parallel computation; scheme one only adds the cost of creating processes, so scheme two wins.
#  If the four tasks are I/O-intensive, scheme one's process-creation cost is large, and processes switch no faster than threads do, so scheme two wins.
#
#Multi-core case:
#  If the four tasks are compute-intensive, multiple cores allow parallel computation, but in Python only one thread per process executes at a time, so multithreading cannot use the cores and scheme one wins.
#  If the four tasks are I/O-intensive, more cores cannot solve the I/O problem, so scheme two wins.
#
#Conclusion: today's computers are mostly multi-core. For compute-intensive tasks, Python multithreading brings little efficiency gain and may even be worse than plain serial execution (which avoids the switching); but for I/O-intensive tasks the efficiency improvement is significant.

4. Performance test

Compute-intensive: multiprocessing is more efficient. I/O-intensive: multithreading is more efficient. (A hedged sketch of such a test follows below.)
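A rough sketch of such a test; the worker count of 4, the 10-million-iteration loop, and the 2-second sleep are arbitrary choices rather than figures from the original measurements.

from multiprocessing import Process
from threading import Thread
import time

def compute():
    res = 0
    for i in range(10_000_000):   # compute-intensive: pure CPU work
        res *= i

def io():
    time.sleep(2)                  # I/O-intensive: simulated blocking wait

def bench(task, worker_cls, n=4):
    start = time.time()
    workers = [worker_cls(target=task) for _ in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print('%s x%d with %s: %.2fs' % (task.__name__, n, worker_cls.__name__, time.time() - start))

if __name__ == '__main__':
    bench(compute, Process)   # typically fastest for computation on a multi-core machine
    bench(compute, Thread)    # typically slower: the GIL serializes the computation
    bench(io, Process)        # works, but pays the process-creation cost
    bench(io, Thread)         # about as fast, with far cheaper workers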

Summary:

Multithreading suits I/O-intensive work, such as sockets, crawlers, and web services.

Multiprocessing suits compute-intensive work, such as financial analysis.

