Try multi-thread programming in Python


This article describes an attempt at multi-threaded programming in Python. Because of the GIL, multithreading has always been a hot topic in the Python development community.

Multitasking can be accomplished with multiple processes, or with multiple threads inside a single process.

As mentioned earlier, a process is made up of one or more threads; every process has at least one thread.

Since threads are execution units supported directly by the operating system, high-level languages usually have built-in multithreading support, and Python is no exception. Moreover, Python threads are real POSIX threads, not simulated threads.

The Python standard library provides two modules: thread and threading. thread is a low-level module; threading is a higher-level module that wraps thread. In most cases we only need the higher-level threading module.

To start a thread, pass a function to the Thread constructor to create a Thread instance, then call start() to begin execution:


import time, threading

# Code executed by the new thread:
def loop():
    print 'thread %s is running...' % threading.current_thread().name
    n = 0
    while n < 5:
        n = n + 1
        print 'thread %s >>> %s' % (threading.current_thread().name, n)
        time.sleep(1)
    print 'thread %s ended.' % threading.current_thread().name

print 'thread %s is running...' % threading.current_thread().name
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print 'thread %s ended.' % threading.current_thread().name

The execution result is as follows:


thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.

By default, every process starts one thread, which we call the main thread; the main thread can in turn start new threads. Python's threading module has a current_thread() function that always returns the instance of the current thread. The main thread instance is named MainThread, while sub-threads are named when they are created; here we named the sub-thread LoopThread. The name is only used for display when printing and has no other meaning; if you don't name a thread, Python automatically names it Thread-1, Thread-2, and so on.
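
As a minimal sketch (an addition, not from the original article; work is a hypothetical helper), the automatic naming can be observed by starting a few unnamed threads:

import threading

def work():
    # Unnamed threads get automatic names such as Thread-1, Thread-2, ...
    print 'running in %s' % threading.current_thread().name

for i in range(3):
    threading.Thread(target=work).start()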

Lock

The biggest difference between multithreading and multiprocessing is that in multiprocessing each process gets its own copy of every variable, so the processes do not affect each other, whereas in multithreading all variables are shared by all threads, so any variable can be modified by any thread. The biggest danger of sharing data between threads is therefore that several threads may change the same variable at the same time and corrupt its contents.

Let's see how multiple threads operating on the same variable at the same time can corrupt its contents:


import time, threading

# Assume this is your bank balance:
balance = 0

def change_it(n):
    # Deposit first, then withdraw; the result should be 0:
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(100000):
        change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print balance

We defined a shared variable balance with an initial value of 0 and started two threads that each deposit and then withdraw, so in theory the result should be 0. However, because thread scheduling is decided by the operating system, when t1 and t2 run alternately, the result may not be 0 as long as the loop count is large enough.

The reason is that a single statement in a high-level language corresponds to several CPU instructions; even a simple assignment such as:


balance = balance + n

is executed in two steps:

compute balance + n and store the result in a temporary variable;

assign the value of the temporary variable to balance.

In other words, it can be viewed as:


x = balance + n
balance = x

Since x is a local variable, each thread has its own x. When the code executes normally:

Initial value: balance = 0


t1: x1 = balance + 5  # x1 = 0 + 5 = 5
t1: balance = x1      # balance = 5
t1: x1 = balance - 5  # x1 = 5 - 5 = 0
t1: balance = x1      # balance = 0

t2: x2 = balance + 8  # x2 = 0 + 8 = 8
t2: balance = x2      # balance = 8
t2: x2 = balance - 8  # x2 = 8 - 8 = 0
t2: balance = x2      # balance = 0

Result: balance = 0.

However, t1 and t2 run alternately. If the operating system executes t1 and t2 in the following sequence:

Initial value: balance = 0


t1: x1 = balance + 5  # x1 = 0 + 5 = 5

t2: x2 = balance + 8  # x2 = 0 + 8 = 8
t2: balance = x2      # balance = 8

t1: balance = x1      # balance = 5
t1: x1 = balance - 5  # x1 = 5 - 5 = 0
t1: balance = x1      # balance = 0

t2: x2 = balance - 8  # x2 = 0 - 8 = -8
t2: balance = x2      # balance = -8

Result: balance = -8

The root cause is that modifying balance requires multiple statements, and a thread may be switched out while executing them, allowing several threads to corrupt the contents of the same object.

When two threads operate on the same data at the same time, balance can end up wrong. You certainly don't want your bank balance to inexplicably become negative, so we must ensure that while one thread is modifying balance, no other thread can modify it at the same time.

To guarantee that balance is computed correctly, we need to put a lock around change_it(). When a thread starts executing change_it(), we say it has acquired the lock, so no other thread can execute change_it() at the same time; the others must wait until the lock is released and they acquire it themselves before they can make changes. Since there is only one lock, no matter how many threads there are, at most one thread holds the lock at any moment, so there is no conflicting modification. A lock is created with threading.Lock():


balance = 0
lock = threading.Lock()

def run_thread(n):
    for i in range(100000):
        # Acquire the lock first:
        lock.acquire()
        try:
            # Now it is safe to modify the balance:
            change_it(n)
        finally:
            # Release the lock when done:
            lock.release()

When multiple threads execute lock.acquire() at the same time, only one thread succeeds in acquiring the lock and continues to run; the others keep waiting until they obtain the lock.

A thread that has acquired the lock must release it when it is done with it; otherwise the threads waiting for the lock will wait forever and become dead threads. That is why we use try...finally to make sure the lock is always released.
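
As a side note that is not in the original article, threading.Lock also works as a context manager, so the acquire/release pair above can be written more compactly with a with statement; a minimal sketch of the same run_thread():

def run_thread(n):
    for i in range(100000):
        # The with statement acquires the lock on entry and releases it
        # on exit, even if change_it() raises an exception.
        with lock:
            change_it(n)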

The advantage of a lock is that it guarantees that a critical section of code can only be executed from start to finish by one thread at a time. There are, of course, disadvantages. First, a lock prevents multiple threads from running concurrently: code that holds the lock effectively runs in single-threaded mode, which greatly reduces efficiency. Second, because there can be multiple locks, when different threads each hold a different lock and try to acquire the lock held by the other, a deadlock may occur: all the threads hang, nothing can run or finish, and the only way out is for the operating system to terminate them forcibly.
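
To make the deadlock scenario concrete, here is a minimal sketch (an addition, not from the original article) in which two threads acquire two hypothetical locks, lock_a and lock_b, in opposite order; if each grabs its first lock before the other releases anything, both block forever:

import threading, time

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker_1():
    with lock_a:
        time.sleep(0.1)   # give worker_2 time to grab lock_b
        with lock_b:      # blocks forever if worker_2 already holds lock_b
            pass

def worker_2():
    with lock_b:
        time.sleep(0.1)   # give worker_1 time to grab lock_a
        with lock_a:      # blocks forever if worker_1 already holds lock_a
            pass

# Running this will normally hang: the two threads end up waiting on each other.
t1 = threading.Thread(target=worker_1)
t2 = threading.Thread(target=worker_2)
t1.start()
t2.start()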

Multi-core CPU

If you are unlucky enough to own a multi-core CPU, you are probably thinking that multiple cores should be able to execute multiple threads at the same time.

What would happen if we wrote an infinite loop?

Open Activity Monitor on Mac OS X or Task Manager on Windows and watch the CPU usage of a process.

You can see that a single infinite-loop thread occupies 100% of one CPU core.

With two infinite-loop threads on a multi-core CPU, you would expect to see 200% CPU usage, that is, two cores fully occupied.

To fully load all N cores of an N-core CPU, you would have to start N infinite-loop threads.

Let's try writing such an infinite loop in Python:


import threading, multiprocessing

def loop():
    x = 0
    while True:
        x = x ^ 1

for i in range(multiprocessing.cpu_count()):
    t = threading.Thread(target=loop)
    t.start()

Starting as many threads as there are CPU cores, on a 4-core CPU you can observe only about 160% CPU usage, that is, less than two cores in use.

Even with 100 threads started, the usage is still only about 170%, still less than two cores.

However, if you rewrite the same infinite loop in C, C++, or Java, all cores can be fully occupied: a 4-core machine runs at 400% and an 8-core machine at 800%. Why can't Python do this?

Although Python threads are real threads, the interpreter executes code under the GIL (Global Interpreter Lock). Before any Python thread executes, it must first acquire the GIL; then, after every 100 bytecode instructions, the interpreter automatically releases the GIL so that other threads get a chance to run. This global lock effectively serializes the execution of all threads' code, so in Python multiple threads can only run alternately: even 100 threads on a 100-core CPU can use only one core.
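
As a small aside that is not in the original article, in CPython 2 the "100 bytecodes" figure is the interpreter's check interval, which can be inspected (and tuned) through the sys module; a minimal sketch:

import sys

# CPython 2 re-checks for a thread switch (releasing the GIL) after every
# sys.getcheckinterval() bytecode instructions; the default is 100.
print sys.getcheckinterval()
sys.setcheckinterval(1000)  # switch threads less often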

The GIL is a historical legacy of the Python interpreter's design. The interpreter we normally use is the official implementation, CPython; to really use multiple cores, we would have to use an interpreter that has no GIL.

Therefore, you can use multiple threads in Python, but don't expect them to make effective use of multiple cores. If multi-core parallelism with threads is really required, it has to be implemented in a C extension, which gives up Python's ease of use.

Don't worry too much, though. Although Python cannot use multiple cores with multiple threads, it can use them with multiple processes: each Python process is a separate interpreter with its own independent GIL, so they do not interfere with each other.
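
As a minimal sketch (an addition, not from the original article), the same infinite loop can be spread across all cores with the multiprocessing module; each worker process runs its own interpreter with its own GIL, so every core can be fully occupied:

import multiprocessing

def loop():
    x = 0
    while True:
        x = x ^ 1

if __name__ == '__main__':
    # Start one process per CPU core; each process can saturate one core.
    for i in range(multiprocessing.cpu_count()):
        p = multiprocessing.Process(target=loop)
        p.start()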

Summary

The multi-threaded programming model is complex and prone to conflicts; locks must be used for isolation, and care must be taken to avoid deadlocks.

Because the Python interpreter is designed with the global GIL lock, multiple threads cannot use multiple cores. Multi-threaded parallelism remains a beautiful dream in Python.
