Introduction to Python Multithreading

Source: Internet
Author: User

Import threading
Import datetime

Class threadclass (threading. Thread ):
Def run (Self ):
Now = datetime. datetime. Now ()
Print "% s says hello World at time: % s" % (self. getname (), now)
For I in range (2 ):
T = threadclass ()

T. Start ()

Python thread0724.py
Thread-1 says hello World at time: 10:22:25. 393383
Thread-2 says hello World at time: 10:22:25. 393675

Introduction

For python, there is no lack of concurrency options. Its standard library includes support for threads, processes, and asynchronous I/O. In many cases, Python simplifies the use of various concurrency methods by creating high-level modules such as Asynchronous, thread, and sub-process. In addition to the standard library, there are also some third-party solutions, such as twisted, stackless, and process modules. This article focuses on the use of Python threads and uses some practical examples. Although many good online resources detail thread APIs, this article attempts to provide some practical examples to illustrate some common thread usage modes.

Global interpreter lock indicates that the python interpreter is NOT thread-safe. The current thread must hold a global lock for secure access to Python objects. Because only one thread can obtain the python object/c api, the interpreter regularly releases and reacquires the lock every time it passes through 100 bytecode instructions. The interpreter can check the thread switching frequency through
sys.setcheckinterval()Function.

In addition, the lock will be released and re-acquired based on potential blocking I/O operations. For more information, see
Gil and threading stateAndThreading the global interpreter lock.

It should be noted that due to Gil, CPU-limited applications will not benefit from the use of threads. We recommend that you use processes or create processes and threads in a hybrid manner when using python.

First, it is very important to clarify the differences between processes and threads. The difference between threads and processes is that they share status, memory, and resources. For a thread, this simple difference is both its advantage and Its disadvantage. On the one hand, threads are lightweight and easy to communicate with each other, but on the other hand, they also bring about various problems including deadlocks, contention conditions and high complexity. Fortunately, because of Gil and the queue module, it is much less complex to use the Python language than to use other languages.

Use Python threads

To continue learning the content in this article, I assume that you have installed Python 2.5 or later, because many examples in this article will use the new features of the Python language, these features only appear after python2.5. To start using a python thread, we will start with a simple "Hello World" Example:

Hello_threads_example

                                import threading        import datetime                class ThreadClass(threading.Thread):          def run(self):            now = datetime.datetime.now()            print "%s says Hello World at time: %s" %             (self.getName(), now)                for i in range(2):          t = ThreadClass()          t.start()      

If you run this example, you will get the following output:

      # python hello_threads.py       Thread-1 says Hello World at time: 2008-05-13 13:22:50.252069      Thread-2 says Hello World at time: 2008-05-13 13:22:50.252576      

Observe the output result carefully. You can see that the Hello world statement is output from both threads, with a date stamp. If you analyze the actual code, you will find that there are two import statements. One statement imports the date and time module, and the other statement imports the thread module. Class
ThreadClassInherited fromthreading.ThreadBecause of this, you need to define a run method to execute the code you want to run in this thread. The only thing to note in this run method is that,self.getName()Is a method used to determine the thread name.

The last three lines of code actually call this class and start the thread. If you pay attention to it, you will find that the actual startup thread ist.start(). Inheritance is taken into account when designing the thread module, and the thread module is actually built on the basis of the underlying thread module. In most cases
threading.ThreadInheritance is a best practice because it creates a regular API for Thread Programming.

Use thread queue

As mentioned above, when multiple threads need to share data or resources, the usage of threads may become complicated. The thread module provides many synchronization primitives, including semaphores, conditional variables, events, and locks. When these options exist, the best practice is to focus on using queues. In comparison, queues are easier to process, and thread programming is more secure, because they can effectively transmit all access to resources from a single thread, it also supports a clearer and more readable design mode.

In the next example, you will first create a program that is executed in serial or in sequence to obtain the URL of the website and display the first 1024 bytes of the page. The following is a typical example. First, let's use
urllib2Module to obtain these pages (one page at a time), and timing the code running time:

URL Acquisition sequence

                        import urllib2        import time                hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",        "http://ibm.com", "http://apple.com"]                start = time.time()        #grabs urls of hosts and prints first 1024 bytes of page        for host in hosts:          url = urllib2.urlopen(host)          print url.read(1024)                print "Elapsed Time: %s" % (time.time() - start)      

When running the above example, you will get a large number of output results in the standard output. However, you will get the following content:

        Elapsed Time: 2.40353488922          

Let's analyze this code carefully. You have imported only two modules. First,urllib2The module reduces the complexity of work and obtains web pages. Then
time.time()You create a start time value, call the function again, and subtract the start value to determine how long it took to execute the program. Finally, let's analyze the execution speed of the program. Although the result of "2.5 seconds" is not too bad, if you need to retrieve hundreds of web pages, follow the average value, it takes about 50 seconds. Study how to create a thread version that can increase the execution speed:

URL obtaining thread-based

                          #!/usr/bin/env python          import Queue          import threading          import urllib2          import time                    hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",          "http://ibm.com", "http://apple.com"]                    queue = Queue.Queue()                    class ThreadUrl(threading.Thread):          """Threaded Url Grab"""            def __init__(self, queue):              threading.Thread.__init__(self)              self.queue = queue                      def run(self):              while True:                #grabs host from queue                host = self.queue.get()                            #grabs urls of hosts and prints first 1024 bytes of page                url = urllib2.urlopen(host)                print url.read(1024)                            #signals to queue job is done                self.queue.task_done()                    start = time.time()          def main():                      #spawn a pool of threads, and pass them queue instance             for i in range(5):              t = ThreadUrl(queue)              t.setDaemon(True)              t.start()                         #populate queue with data                 for host in hosts:                queue.put(host)                      #wait on the queue until everything has been processed                queue.join()                    main()          print "Elapsed Time: %s" % (time.time() - start)      

For this example, more code needs to be explained, but it is not much complicated than the first thread example because the queue module is used. This mode is common and recommended when using threads in Python. The procedure is described as follows:

  1. CreateQueue.Queue()And then fill it with data.
  2. Pass the instance of the filled data to the Thread class, which is inheritedthreading.Thread.
  3. Generate a daemon thread pool.
  4. Each time a project is retrieved from the queue, the data in the thread and the run method are used to execute the corresponding work.
  5. Usequeue.task_done()The function sends a signal to the completed queue of the task.
  6. The Join Operation on the queue actually means that the main program is exited when the queue is empty.

When using this mode, note that by setting the daemon thread to true, the main thread or program can exit only when the daemon thread is active. This method creates a simple way to control the program flow, because before exiting, you can perform the join operation on the queue or wait until the queue is empty. The queue module documentation details the actual processing process. See references:

join()
Keep blocked until all projects in the queue are processed. When a project is added to the queue, the total number of unfinished tasks increases. When the user thread calls task_done () to retrieve the project and complete all the work, the total number of unfinished tasks is reduced. When the total number of incomplete tasks is reduced to zero,join()The blocking status ends.

Back to Top

Use multiple queues

Because the mode described above is very effective, you can connect to the additional thread pool and queue for expansion, which is quite simple. In the preceding example, you only output the starting part of the web page. In the next example, the complete web page obtained by each thread is returned and the result is placed in another queue. Then, set another thread pool added to the second queue, and then perform corresponding processing on the web page. The work in this example includes parsing Web pages using a third-party Python module named beautiful Soup. To use this module, you only need two lines of code to extract the title tag of each accessed page and print it out.

Multi-queue Data Mining websites

                import Queueimport threadingimport urllib2import timefrom BeautifulSoup import BeautifulSouphosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",        "http://ibm.com", "http://apple.com"]queue = Queue.Queue()out_queue = Queue.Queue()class ThreadUrl(threading.Thread):    """Threaded Url Grab"""    def __init__(self, queue, out_queue):        threading.Thread.__init__(self)        self.queue = queue        self.out_queue = out_queue    def run(self):        while True:            #grabs host from queue            host = self.queue.get()            #grabs urls of hosts and then grabs chunk of webpage            url = urllib2.urlopen(host)            chunk = url.read()            #place chunk into out queue            self.out_queue.put(chunk)            #signals to queue job is done            self.queue.task_done()class DatamineThread(threading.Thread):    """Threaded Url Grab"""    def __init__(self, out_queue):        threading.Thread.__init__(self)        self.out_queue = out_queue    def run(self):        while True:            #grabs host from queue            chunk = self.out_queue.get()            #parse the chunk            soup = BeautifulSoup(chunk)            print soup.findAll(['title'])            #signals to queue job is done            self.out_queue.task_done()start = time.time()def main():    #spawn a pool of threads, and pass them queue instance    for i in range(5):        t = ThreadUrl(queue, out_queue)        t.setDaemon(True)        t.start()    #populate queue with data    for host in hosts:        queue.put(host)    for i in range(5):        dt = DatamineThread(out_queue)        dt.setDaemon(True)        dt.start()    #wait on the queue until everything has been processed    queue.join()    out_queue.join()main()print "Elapsed Time: %s" % (time.time() - start)

If you run this version of the script, you will get the following output:

  # python url_fetch_threaded_part2.py   [<title>Google</title>]  [<title>Yahoo!</title>]  [<title>Apple</title>]  [<title>IBM United States</title>]  [<title>Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more</title>]  Elapsed Time: 3.75387597084

When analyzing this code, you can see that we have added another queue instance and passed the queue to the first thread pool class.ThreadURL. Next, for another thread pool class
DatamineThread, Almost copies the same structure. In the run method of this class, obtain the web page and text block from each thread in the queue, and then use beautiful Soup to process the text block. In this example, use beautiful Soup to extract the title tag of each page and print it out. This example can be easily applied to some more valuable application scenarios, because you have mastered the core content of basic search engines or data mining tools. One idea is to use beautiful Soup to extract links from each page and navigate by them.

Back to Top

Summary

This article studies Python threads and describes best practices for using queues to reduce complexity, reduce minor errors, and improve code readability. Although this basic mode is relatively simple, it can be used to solve various problems by connecting the queue and thread pool. In the last section, you begin to study how to create a more complex processing pipeline that can be used as a model for future projects. The references section provides many excellent references on regular concurrency and threads.

Finally, it is important to note that the thread cannot solve all the problems. In many cases, it may be more appropriate to use the process. In particular, when you only need to create many sub-processes and listen for the response, the standard library sub-process module may be easier to use. For more official instructions, see references.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.