Multithreaded Programming in Python

Last Update: 2017-01-13 Source: Internet

Author: User



The threading module has been available since Python 1.5.2. It builds on the lower-level thread module (renamed _thread in Python 3) and makes it much easier to work with multiple threads and run several operations at the same time.

Note: Multithreading in Python is best suited to I/O-bound work, such as downloading resources from the Internet or reading files and directories on a local disk. For CPU-intensive work, use the multiprocessing module instead. The reason is that CPython has a global interpreter lock (GIL), which allows only one thread to execute Python bytecode at a time. As a result, splitting CPU-bound tasks across several threads brings no speedup and can even make them run slower. So we will focus on what threads do best: I/O!
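
To make the I/O point concrete, here is a minimal sketch that uses time.sleep as a stand-in for a blocking network or disk call. That substitution is an assumption (real I/O times vary), and the helper names fake_io and run_threaded are hypothetical. Because a sleeping thread, like a thread blocked on real I/O, releases the GIL, the waits overlap:

```python
import threading
import time

def fake_io(delay):
    """Hypothetical stand-in for a blocking I/O call such as a download."""
    # time.sleep releases the GIL while waiting, as blocking I/O does
    time.sleep(delay)

def run_threaded(count, delay):
    """Run `count` simulated I/O calls in separate threads; return elapsed seconds."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    # Run one after another, five 0.2 s calls would take about 1 s;
    # with threads the waits overlap, so this is typically close to 0.2 s.
    print('elapsed: {:.2f}s'.format(run_threaded(5, 0.2)))
```

Swapping fake_io for a CPU-bound loop would not show the same overlap, which is exactly the GIL limitation described above.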

Introduction to threads

Threads let you run a long-running piece of code as if it were a separate program. This is a bit like calling a subprocess, except that what you call is a function or a class rather than a separate program. In my opinion, examples are the most helpful way to explain this, so here is a simple one:

import threading

def doubler(number):
    """
    A function that can be used by a thread
    """
    print(threading.current_thread().name + '\n')
    print(number * 2)
    print()

if __name__ == '__main__':
    for i in range(5):
        my_thread = threading.Thread(target=doubler, args=(i,))
        my_thread.start()

Here we import the threading module and create a regular function called doubler. It takes a value and prints that value doubled. It also prints the name of the thread that called it, followed by a blank line. In the last part of the code, we create five threads and start each one in turn. When we instantiate each thread, we pass the doubler function as the target argument and pass doubler's argument through args. The args parameter looks a bit strange: it must be a sequence, but doubler takes only one argument, so we put a comma at the end, as in (i,), to create a one-element tuple.
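
The trailing-comma rule is easy to check directly. The sketch below is illustrative only: describe and received are hypothetical names, and it also shows Thread's separate kwargs parameter for keyword arguments:

```python
import threading

received = []

def describe(number, label='value'):
    # Hypothetical target used only to show how arguments reach it
    received.append('{}: {}'.format(label, number))

# Parentheses alone do not make a tuple; the comma does
print(type((5)), type((5,)))   # <class 'int'> <class 'tuple'>

# args must be a sequence, so a single argument needs the trailing comma;
# keyword arguments go in the separate kwargs dict
t = threading.Thread(target=describe, args=(5,), kwargs={'label': 'input'})
t.start()
t.join()
print(received)  # ['input: 5']
```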

Note that if you want to wait for a thread to finish, you need to call its join() method.
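
As a minimal sketch of join() (the names run_all and results are hypothetical, not from the original), the main thread below waits for every worker to finish before it reads what they produced. Each thread writes to its own dictionary key, so no lock is needed for this particular pattern:

```python
import threading

def doubler(number, results):
    # Each thread stores its answer under its own key
    results[number] = number * 2

def run_all(count):
    results = {}
    threads = [threading.Thread(target=doubler, args=(i, results))
               for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # blocks until this particular thread has finished
    # Safe to read now: every thread has completed
    return results

if __name__ == '__main__':
    print(sorted(run_all(5).items()))  # [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
```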

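When you run the code above, you should get output similar to the following. The order can vary from run to run because the operating system schedules the threads, and Python 3.10 and later include the target function in default thread names, as in Thread-1 (doubler).

Thread-1

0

Thread-2

2

Thread-3

4

Thread-4

6

Thread-5

8
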
Of course, you normally don't want threads printing to standard output: when several threads write at once, the result can be a jumbled mess. Use Python's logging module instead. It is thread-safe and performs well. Let's modify the example above to use logging and to give each thread a name:

import logging
import threading

def get_logger():
    logger = logging.getLogger("threading_example")
    logger.setLevel(logging.DEBUG)

    fh = logging.FileHandler("threading.log")
    fmt = '%(asctime)s - %(threadName)s - %(levelname)s - %(message)s'
    formatter = logging.Formatter(fmt)
    fh.setFormatter(formatter)

    logger.addHandler(fh)
    return logger

def doubler(number, logger):
    """
    A function that can be used by a thread
    """
    logger.debug('doubler function executing')
    result = number * 2
    logger.debug('doubler function ended with: {}'.format(
        result))

if __name__ == '__main__':
    logger = get_logger()
    thread_names = ['Mike', 'George', 'Wanda', 'Dingbat', 'Nina']
    for i in range(5):
        my_thread = threading.Thread(
            target=doubler, name=thread_names[i], args=(i, logger))
        my_thread.start()

The biggest change is the new get_logger function. It creates a logger set to the DEBUG level that writes to a file called threading.log in the current working directory (the directory you run the script from, which is not necessarily where the script itself lives). It also sets the format of each log line: timestamp, thread name, log level, and message.

In the doubler function, we replaced the print calls with logging calls. Notice that when creating each thread, we pass the logger object to doubler. That is because get_logger adds a new FileHandler every time it runs. logging.getLogger always returns the same logger object for a given name, so if every thread called get_logger itself, that one logger would collect several handlers and each message would be written to the log several times.
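
Here is a small sketch of that duplication, under the assumption that writing to an in-memory io.StringIO buffer is an acceptable stand-in for the log file. The helpers get_logger_naive and get_logger_guarded are hypothetical; the guarded one shows one common fix, attaching a handler only if the logger has none yet:

```python
import io
import logging

naive_buf = io.StringIO()
guarded_buf = io.StringIO()

def get_logger_naive():
    # Hypothetical helper: attaches a new handler on every call
    logger = logging.getLogger("naive_example")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler(naive_buf))
    return logger

def get_logger_guarded():
    # Hypothetical helper: configures the shared logger only once
    logger = logging.getLogger("guarded_example")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:
        logger.addHandler(logging.StreamHandler(guarded_buf))
    return logger

for _ in range(3):  # e.g. three threads each calling the helper themselves
    naive = get_logger_naive()
    guarded = get_logger_guarded()

naive.debug('hello')
guarded.debug('hello')
print(naive is logging.getLogger("naive_example"))  # True: same object each time
print(naive_buf.getvalue().count('hello'))          # 3: one copy per handler
print(guarded_buf.getvalue().count('hello'))        # 1
```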

Finally, we create a list of names and use the name keyword argument to give each thread a specific name. Running the code produces a log file with content similar to the following:

20:39:50,055 - Mike - DEBUG - doubler function executing
20:39:50,055 - Mike - DEBUG - doubler function ended with: 0
20:39:50,055 - George - DEBUG - doubler function executing
20:39:50,056 - George - DEBUG - doubler function ended with: 2
20:39:50,056 - Wanda - DEBUG - doubler function executing
20:39:50,056 - Wanda - DEBUG - doubler function ended with: 4
20:39:50,056 - Dingbat - DEBUG - doubler function executing
20:39:50,057 - Dingbat - DEBUG - doubler function ended with: 6
20:39:50,057 - Nina - DEBUG - doubler function executing
20:39:50,057 - Nina - DEBUG - doubler function ended with: 8

The output is self-explanatory, so let's move on. Another way to use threads is to subclass threading.Thread. In the next example, we create a subclass of threading.Thread instead of instantiating the Thread class directly.

The updated code is as follows:

import logging
import threading

class MyThread(threading.Thread):

    def __init__(self, number, logger):
        super().__init__()
        self.number = number
        self.logger = logger

    def run(self):
        """
        Run the thread
        """
        self.logger.debug('calling doubler')
        doubler(self.number, self.logger)

def doubler(number, logger):
    """
    A function that can be used by a thread
    """
    logger.debug('doubler function executing')
    result = number * 2
    logger.debug('doubler function ended with: {}'.format(result))

def get_logger():
    logger = logging.getLogger("threading_example")
    logger.setLevel(logging.DEBUG)

    fh = logging.FileHandler("threading_class.log")
    fmt = '%(asctime)s - %(threadName)s - %(levelname)s - %(message)s'
    formatter = logging.Formatter(fmt)
    fh.setFormatter(formatter)

    logger.addHandler(fh)
    return logger

if __name__ == '__main__':
    logger = get_logger()
    thread_names = ['Mike', 'George', 'Wanda', 'Dingbat', 'Nina']
    for i in range(5):
        thread = MyThread(i, logger)
        # setName() also works, but it is deprecated since Python 3.10
        thread.name = thread_names[i]
        thread.start()

In this example, we simply create a subclass of threading.Thread. As before, we pass in the number to double and the logger object. This time, however, the thread name is set differently: we set it on the thread object after creating it, by assigning its name attribute (older code calls the setName() method, which has been deprecated since Python 3.10). We still call start() to launch the thread, but notice that we did not have to define start() in the subclass: the inherited start() arranges for run() to be called in the new thread. Our run() method logs a message and then calls doubler to do the work. Apart from the extra "calling doubler" lines, the log output is almost the same as before. Run the script to see what you get.
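
The distinction between start() and run() is worth seeing once. In this sketch, RunRecorder is a hypothetical subclass that records the name of the thread that actually executed its run() method:

```python
import threading

class RunRecorder(threading.Thread):
    """Hypothetical Thread subclass that records which thread executed run()."""

    def __init__(self):
        super().__init__(name='worker')
        self.ran_in = None

    def run(self):
        # current_thread() returns the Thread object executing this code
        self.ran_in = threading.current_thread().name

started = RunRecorder()
started.start()   # start() launches a new thread, and that thread calls run()
started.join()

direct = RunRecorder()
direct.run()      # calling run() yourself just executes it in the calling thread

print(started.ran_in, direct.ran_in)  # worker MainThread (when run as a script)
```

This is why subclasses override run() but callers invoke start(): calling run() directly gives you no concurrency at all.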

Thread locks and synchronization

When you have multiple threads, you need to think about how to keep them from conflicting with one another, for example when several threads access the same resource at the same time. If you don't think these problems through and plan solutions in advance, they will bite you at the worst possible moment.

One solution is a lock. Python's threading module provides a Lock object that can be held by at most one thread at a time. When a thread tries to acquire a lock that another thread already holds, it blocks until that lock is released. Let's look at a classic example of code that needs a lock but doesn't have one:

import threading

total = 0

def update_total(amount):
    """
    Updates the total by the given amount
    """
    global total
    total += amount
    print(total)

if __name__ == '__main__':
    for i in range(10):
        my_thread = threading.Thread(
            target=update_total, args=(5,))
        my_thread.start()

This example becomes more interesting if you add time.sleep calls with varying delays. Either way, the problem is that total += amount is not a single atomic step: a thread reads the current total, adds to it, and writes the result back. One thread may have called update_total and read the old value but not yet written the new one when another thread calls it and reads the same old value. Depending on the order in which these operations run, two calls may increase the total only once.
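
Here is a minimal sketch of that lost update, assuming an artificial time.sleep between the read and the write to make the race show up reliably (update_total_unsafe is a hypothetical variant, not the original function):

```python
import threading
import time

total = 0

def update_total_unsafe(amount):
    """Hypothetical variant of update_total that makes the race easy to see."""
    global total
    current = total            # 1. read the shared value
    time.sleep(0.05)           # 2. pause; other threads read the same old value
    total = current + amount   # 3. write back a result based on a stale read

threads = [threading.Thread(target=update_total_unsafe, args=(5,))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)  # 50 would be correct, but this is typically much lower
```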

Let's add a lock to this function. There are two ways to do this. The first is to use try/finally to make sure the lock is always released:

import threading

total = 0
lock = threading.Lock()

def update_total(amount):
    """
    Updates the total by the given amount
    """
    global total
    lock.acquire()
    try:
        total += amount
    finally:
        lock.release()
    print(total)

if __name__ == '__main__':
    for i in range(10):
        my_thread = threading.Thread(
            target=update_total, args=(5,))
        my_thread.start()

Here we acquire the lock before doing anything else, then update total, release the lock, and print the current value of total. We can avoid the boilerplate of try/finally by using Python's with statement instead:

import threading

total = 0
lock = threading.Lock()

def update_total(amount):
    """
    Updates the total by the given amount
    """
    global total
    with lock:
        total += amount
    print(total)

if __name__ == '__main__':
    for i in range(10):
        my_thread = threading.Thread(
            target=update_total, args=(5,))
        my_thread.start()

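As you can see, we no longer need try/finally: the lock itself acts as a context manager, so the with statement acquires it on entry and releases it on exit, even if an exception is raised inside the block.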
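
To check that the with statement really serializes the read-modify-write, here is a sketch that deliberately slows the update down with an artificial time.sleep (an assumption added to widen the race window). update_total_safe is a hypothetical name. The last lines confirm the lock is released even when the block raises:

```python
import threading
import time

total = 0
lock = threading.Lock()

def update_total_safe(amount):
    """Hypothetical slowed-down update guarded by a lock."""
    global total
    with lock:                    # only one thread at a time runs this block
        current = total
        time.sleep(0.01)          # artificial delay between read and write
        total = current + amount

threads = [threading.Thread(target=update_total_safe, args=(5,))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)  # 50

# The with statement releases the lock even if the block raises
try:
    with lock:
        raise RuntimeError('boom')
except RuntimeError:
    pass
print(lock.locked())  # False
```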

Of course, you will also run into cases where several functions share a lock and are called from multiple threads. When you write concurrent code for the first time, it may look like this:

import threading

total = 0
lock = threading.Lock()

def do_something():
    lock.acquire()

    try:
        print('lock acquired in the do_something function')
    finally:
        lock.release()
        print('lock released in the do_something function')

    return "Done doing something"

def do_something_else():
    lock.acquire()

    try:
        print('lock acquired in the do_something_else function')
    finally:
        lock.release()
        print('lock released in the do_something_else function')

    return "Finished something else"

if __name__ == '__main__':
    result_one = do_something()
    result_two = do_something_else()

This code works fine as written, but suppose multiple threads call both functions. While one thread is between the two calls, another thread may modify the shared data, so the final result is wrong. Worse, you may not even notice right away that it is wrong. What can you do about it? Let's try to find out.

Usually, the first thing that comes to mind is to also hold the lock around the code that calls the two functions. Let's modify the example accordingly:

import threading

total = 0
lock = threading.Lock()

def do_something():
    with lock:
        print('lock acquired in the do_something function')
    print('lock released in the do_something function')

    return "Done doing something"

def do_something_else():
    with lock:
        print('lock acquired in the do_something_else function')
    print('lock released in the do_something_else function')

    return "Finished something else"

def main():
    # Deliberately hangs: main() holds the lock, then
    # do_something() tries to acquire the same lock again
    with lock:
        result_one = do_something()
        result_two = do_something_else()

    print(result_one)
    print(result_two)

if __name__ == '__main__':
    main()

If you actually run this code, you will find that it simply hangs. The reason is that main() acquires the lock first. When it then calls do_something, that function tries to acquire the same lock, finds that it is already held, and blocks until it is released. That never happens, because the thread holding the lock is the very thread that is waiting for it. This is a deadlock.

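The real solution is a re-entrant lock, which the threading module provides as RLock. An RLock can be acquired again by the thread that already holds it; it counts the nested acquisitions and is only released once the owning thread has released it as many times as it acquired it. Replace lock = threading.Lock() with lock = threading.RLock() and run the code again. Now it works.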
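
To see the difference between the two lock types without hanging your terminal, here is a small sketch. nested_acquire is a hypothetical helper: it takes a lock, then tries to take it again from the same thread, passing a timeout to acquire() so the plain Lock case gives up instead of blocking forever:

```python
import threading

def nested_acquire(lock):
    """Hypothetical helper: report whether the same thread can re-acquire `lock`."""
    with lock:
        # The timeout turns a would-be deadlock into a False return value
        got_it = lock.acquire(timeout=0.5)
        if got_it:
            lock.release()  # balance the nested acquire
        return got_it

print(nested_acquire(threading.Lock()))   # False: a Lock cannot be re-acquired by its owner
print(nested_acquire(threading.RLock()))  # True: an RLock counts nested acquisitions
```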

If you want to run the code above in threads, replace the direct call to main() with the following:

if __name__ == '__main__':
    for i in range(10):
        my_thread = threading.Thread(
            target=main)
        my_thread.start()

Each thread runs main(), which calls the other two functions in turn, so in the end you get 10 sets of results.

Timer

The threading module has a handy Timer class that you can use to run an action after a specified amount of time. A Timer is actually a subclass of Thread, so it runs in its own thread and is started the same way, by calling its start() method. You can stop it by calling its cancel() method, but note that this only works while the timer is still waiting, that is, before its action has started to run.
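
Here is a minimal sketch of both behaviors; on_timeout and fired are hypothetical names. The first timer fires after 0.1 seconds, while the second is cancelled during its 10-second wait, so its action never runs:

```python
import threading

fired = []

def on_timeout():
    # Hypothetical callback; it runs in the Timer's own thread
    fired.append('timeout')

t1 = threading.Timer(0.1, on_timeout)
t1.start()
t1.join()          # a Timer is a Thread, so join() waits for it to finish

t2 = threading.Timer(10, on_timeout)
t2.start()
t2.cancel()        # cancelled while still waiting, so on_timeout never runs for t2
t2.join()          # returns promptly because the wait was cut short

print(fired)  # ['timeout']
```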

I once ran into a special situation: I needed to communicate with a subprocess I had started, but I needed the communication to time out. There are many ways to handle this problem, but my favorite is to use the threading module's Timer class.

The following example uses the ping command. On Linux, ping keeps running until you kill it, so a Timer is very handy there:

import subprocess

from threading import Timer

kill = lambda process: process.kill()
cmd = ['ping', 'www.google.com']
ping = subprocess.Popen(
    cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

my_timer = Timer(5, kill, [ping])

try:
    my_timer.start()
    stdout, stderr = ping.communicate()
finally:
    my_timer.cancel()

# communicate() returns bytes, so decode before printing
print(stdout.decode())

Here we define a lambda, kill, that kills whatever process is passed to it. We then start the ping command and create a Timer object. The first argument is the number of seconds to wait, the second is the function to call, and the third is the list of arguments to pass to that function. In this case the function is our lambda, and we pass it a one-element list containing the ping process. If you run this code, it runs for about 5 seconds and then prints the output of ping.
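
The same pattern can be wrapped in a reusable function. In this sketch, run_with_timeout is a hypothetical helper, and a Python child process that sleeps stands in for ping so the example behaves the same on any platform (an assumption, not the original command). Since proc.kill takes no arguments, it can be passed to Timer directly without a lambda:

```python
import subprocess
import sys
from threading import Timer

def run_with_timeout(cmd, seconds):
    """Hypothetical helper: run cmd, killing it if it outlives `seconds`."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    timer = Timer(seconds, proc.kill)
    try:
        timer.start()
        stdout, _stderr = proc.communicate()
    finally:
        timer.cancel()  # harmless if the timer already fired
    return proc.returncode, stdout

if __name__ == '__main__':
    # Stand-in for ping: a child process that would otherwise sleep for 30 seconds
    slow = [sys.executable, '-c', 'import time; time.sleep(30)']
    code, out = run_with_timeout(slow, 1)
    print(code)  # non-zero, because the process was killed
```

On Python 3.3 and later, Popen.communicate() also accepts a timeout argument, which raises subprocess.TimeoutExpired instead of killing the process for you.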

Other thread components

The threading module offers other tools as well. For example, you can create a Semaphore, one of the oldest synchronization primitives in computer science. A semaphore manages an internal counter: each call to acquire() decrements it, and each call to release() increments it. By design, the counter can never go below zero, so if acquire() is called while the counter is zero, it blocks the calling thread until another thread calls release().

Note: a semaphore is usually initialized with a value greater than zero, for example semaphore = threading.Semaphore(2).
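
A typical use is capping how many threads touch a resource at once. In this sketch, six hypothetical workers share a Semaphore(2), and a separate lock protects the bookkeeping counters (active and max_active, both illustrative names), which record how many workers held a slot simultaneously:

```python
import threading
import time

semaphore = threading.Semaphore(2)  # counter starts at 2: two free "slots"
counter_lock = threading.Lock()     # protects the bookkeeping counters below
active = 0                          # workers currently holding a slot
max_active = 0                      # highest value `active` ever reached

def limited_worker():
    global active, max_active
    with semaphore:                 # acquire() on entry (blocks at 0), release() on exit
        with counter_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.05)            # simulate work while holding a slot
        with counter_lock:
            active -= 1

threads = [threading.Thread(target=limited_worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_active)  # never more than 2
```
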
Another very useful synchronization tool is the Event. It lets threads communicate by signaling: one thread calls the event's set() method, and any threads blocked in its wait() method wake up. We will use an Event in the next section.

Finally, Python 3.2 added the Barrier object, a synchronization primitive for a fixed number of threads that need to wait for one another. To get past the barrier, each thread calls its wait() method, which blocks until all of the threads have called it. At that point, all of them are released at the same time.
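
Here is a minimal sketch with three parties. The names worker, arrived, and passed are hypothetical. Each thread records its arrival before calling wait(). So by the time any thread gets past the barrier, all three have arrived:

```python
import threading

barrier = threading.Barrier(3)   # three threads must call wait() before any continue
record_lock = threading.Lock()
arrived = []
passed = []

def worker(name):
    with record_lock:
        arrived.append(name)
    barrier.wait()               # blocks until all 3 threads have called wait()
    # Every thread has appended to `arrived` before anyone gets here
    with record_lock:
        passed.append(len(arrived))

threads = [threading.Thread(target=worker, args=(n,)) for n in ('a', 'b', 'c')]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(passed)  # [3, 3, 3]
```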

Thread communication

Sometimes you want threads to communicate with each other. As mentioned earlier, you can use an Event for this, but the more common approach is to use a Queue. Our example uses both. Let's see what it looks like:

import threading

from queue import Queue

def creator(data, q):
    """
    Generate data for consumption and wait for the consumer to finish processing
    """
    print('Creating data and putting it on the queue')
    for item in data:
        evt = threading.Event()
        q.put((item, evt))

        print('Waiting for data to be doubled')
        evt.wait()

def my_consumer(q):
    """
    Consume and process the data

    All it does here is double the input.
    """
    while True:
        data, evt = q.get()
        print('data found to be processed: {}'.format(data))
        processed = data * 2
        print(processed)
        evt.set()
        q.task_done()

if __name__ == '__main__':
    q = Queue()
    data = [5, 10, 13, -1]
    thread_one = threading.Thread(target=creator, args=(data, q))
    # daemon=True so the endless consumer loop doesn't keep the program alive
    thread_two = threading.Thread(target=my_consumer, args=(q,), daemon=True)
    thread_one.start()
    thread_two.start()

    thread_one.join()  # make sure everything has been put on the queue
    q.join()

Let's take a look. First, we have a creator function (also called a producer) that creates the data we want to work on. A second function, my_consumer, processes that data. The creator function uses the Queue's put() method to add data to the queue, while the consumer repeatedly calls get(), which blocks until an item is available, and then processes that item. The Queue object handles all the acquiring and releasing of locks internally, so we don't have to worry about it.

In this example, we create a list of data and then two threads, one acting as the producer and the other as the consumer. We pass the same Queue object to both threads, and it hides the locking details. The queue carries data from the first thread to the second. Each time the first thread puts an item on the queue, it sends an Event along with it and then blocks, waiting for that Event to be set. On the consumer side, the second thread processes the data and then calls the Event's set() method to tell the first thread that processing is finished and it can continue producing.

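The last line of code calls the Queue object's join() method, which blocks until every item that was put on the queue has been retrieved and marked as done with task_done(). It does not wait for the threads themselves to end. Once every item the producer put on the queue has been processed, q.join() returns.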
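
A common refinement, not part of the original example, is to let the consumer's loop end by sending a sentinel value, so the consumer thread can exit on its own instead of looping forever. The sketch below assumes None never appears as real data; producer, consumer, and SENTINEL are hypothetical names:

```python
import threading
from queue import Queue

SENTINEL = None  # hypothetical "no more data" marker

def producer(data, q):
    for item in data:
        q.put(item)
    q.put(SENTINEL)              # tell the consumer to stop

def consumer(q, results):
    while True:
        item = q.get()
        try:
            if item is SENTINEL:
                return           # leave the loop so the thread ends normally
            results.append(item * 2)
        finally:
            q.task_done()        # one task_done() per get(), sentinel included

q = Queue()
results = []
producer_thread = threading.Thread(target=producer, args=([5, 10, 13, -1], q))
consumer_thread = threading.Thread(target=consumer, args=(q, results))
producer_thread.start()
consumer_thread.start()

producer_thread.join()           # everything, including the sentinel, is queued
q.join()                         # every queued item has been marked done
consumer_thread.join()           # the consumer exits by itself; no daemon flag needed
print(results)  # [10, 20, 26, -2]
```

With a single consumer, items come out in the order they were put in, which is why the results keep the input order.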

Conclusion

This article covered many aspects of threads, including:

Thread basics
How locks work
What events are and how to use them
How to use a timer
Inter-thread communication with queues and events
Now you know how to use threads and what they are good at. I hope you find them useful in your own code.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
