An introductory tutorial on network programming using threads in Python


Introduction

Python has no shortage of concurrency options: the standard library includes support for threads, processes, and asynchronous I/O. In many cases Python simplifies the use of these concurrency mechanisms by providing high-level modules for asynchronous I/O, threading, and subprocesses. Beyond the standard library, there are also third-party solutions such as Twisted, Stackless, and the processing module. This article focuses on Python threads and illustrates them with a few practical examples. Although there are many good online resources detailing the threading API, this article attempts to provide practical examples that illustrate some common patterns of thread usage.

The existence of the Global Interpreter Lock (GIL) reflects the fact that the Python interpreter is not thread-safe: the currently running thread must hold the global lock before it can safely access Python objects. Because only one thread at a time can hold the lock and use the Python/C API, the interpreter releases and reacquires the lock at regular intervals, by default every 100 bytecode instructions. The frequency with which the interpreter checks for a thread switch can be controlled with the sys.setcheckinterval() function.
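As a quick illustration (this sketch is not from the original article, and applies only to Python 2, where these functions exist), the check interval can be read and adjusted like this:

    import sys

    print sys.getcheckinterval()    # default is 100 bytecode instructions
    sys.setcheckinterval(1000)      # check for a thread switch less frequently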

In addition, the lock is released and reacquired around potentially blocking I/O operations. For more detailed information, see "The GIL and Thread State" and "Threading the Global Interpreter Lock" in the Resources section.

It should be explained that, because of the GIL, CPU-bound applications will not benefit from the use of threads. When using Python, it is recommended that such applications use processes instead, or a mix of processes and threads.
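As a hedged sketch of that recommendation (not from the original article; it assumes the multiprocessing module, which entered the standard library in Python 2.6), a CPU-bound job can be spread across processes instead of threads:

    from multiprocessing import Pool

    def count_down(n):
        # a purely CPU-bound busy loop; threads would not speed this up because of the GIL
        while n > 0:
            n -= 1
        return n

    if __name__ == '__main__':
        pool = Pool(processes=4)                   # one worker process per core, for example
        print pool.map(count_down, [10 ** 7] * 4)  # each job runs in its own process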

It is important to first understand the difference between a process and a thread. Threads differ from processes in that they share state, memory, and resources. For threads, this simple distinction is both an advantage and a drawback. On the one hand, threads are lightweight and easy to communicate with; on the other hand, they bring a whole set of problems, including deadlocks, race conditions, and sheer complexity. Fortunately, because of the GIL and the Queue module, threaded programming in Python is much less complex than in many other languages.
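To make the race-condition problem concrete, here is a minimal sketch (not part of the original article) of two threads updating a shared counter; the threading.Lock is what keeps the updates from interleaving and being lost:

    import threading

    counter = 0
    lock = threading.Lock()

    def increment():
        global counter
        for i in range(100000):
            lock.acquire()
            try:
                counter += 1      # without the lock, concurrent updates can be lost
            finally:
                lock.release()

    threads = [threading.Thread(target=increment) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print counter                 # 200000 with the lock; often less without it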
Using Python threads

To follow along with this article, I assume that you have Python 2.5 or later installed, because many of the examples use newer features of the language that only appeared in Python 2.5. To start working with Python threads, we begin with a simple Hello World example:
Hello_threads_example

    import datetime
    import threading

    class ThreadClass(threading.Thread):
        def run(self):
            now = datetime.datetime.now()
            print "%s says Hello World at time: %s" % (self.getName(), now)

    for i in range(2):
        t = ThreadClass()
        t.start()

If you run this example, you will get the following output:

   # python hello_threads.py
   Thread-1 says Hello World at time: 2008-05-13 13:22:50.252069
   Thread-2 says Hello World at time: 2008-05-13 13:22:50.252576

Looking carefully at the output, you can see the Hello World statement printed from two threads, each with a timestamp. If you analyze the actual code, you will find that it contains two import statements: one imports the datetime module and the other imports the threading module. The class ThreadClass inherits from threading.Thread, and because of that it needs to define a run method that executes the code you want to run in that thread. The only thing to note in the run method is that self.getName() is a method that returns the name of the thread.

The last three lines of code actually call the class and start the threads. If you look closely, you will see that the thread is actually started by t.start(). The threading module was designed with inheritance in mind, and it is in fact built on top of the lower-level thread module. For most cases, inheriting from threading.Thread is a best practice because it creates a consistent API for threaded programming.
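For completeness, here is a small sketch (not part of the original article) of the other style the threading module also supports: instead of subclassing threading.Thread, you pass a plain function as the target argument:

    import threading
    import datetime

    def say_hello():
        now = datetime.datetime.now()
        print "%s says Hello World at time: %s" % (
            threading.currentThread().getName(), now)

    for i in range(2):
        t = threading.Thread(target=say_hello)
        t.start()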
Using thread queues

As mentioned earlier, the use of threads gets complicated when multiple threads need to share data or resources. The threading module provides a number of synchronization primitives, including semaphores, condition variables, events, and locks. While these options exist, the best practice is to focus on using queues instead. Queues are much easier to deal with and make threaded programming considerably safer, because they effectively funnel all access to a resource through a single thread and allow for a cleaner, more readable design.

In the next example, you first create a serial, or sequential, program that fetches the URL of each Web site and displays the first 1024 bytes of the page. This is a typical task that threads can often complete faster. First, let's use the urllib2 module to grab the pages one at a time and time how long the code takes to run:
URL fetch, sequential

    import urllib2
    import time

    hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
             "http://ibm.com", "http://apple.com"]

    start = time.time()
    # grabs URLs of hosts and prints first 1024 bytes of page
    for host in hosts:
        url = urllib2.urlopen(host)
        print url.read(1024)

    print "Elapsed Time: %s" % (time.time() - start)

When you run this example, you will see a large amount of output on standard output, but at the end you will see the following:

    Elapsed Time: 2.40353488922

Let's take a closer look at this code. Only two modules are imported. First, the urllib2 module does the heavy lifting of fetching the Web pages. Then, by calling time.time(), you record a start time value; calling it again at the end and subtracting the start value tells you how long the program took to execute. Finally, looking at the execution speed, a result of about 2.4 seconds is not too bad, but if you needed to retrieve 100 Web pages, it would take roughly 50 seconds at this average rate. Let's explore how a threaded version can improve the execution speed:
URL fetch, threaded

    #!/usr/bin/env python
    import Queue
    import threading
    import urllib2
    import time

    hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
             "http://ibm.com", "http://apple.com"]

    queue = Queue.Queue()

    class ThreadUrl(threading.Thread):
        """Threaded Url Grab"""
        def __init__(self, queue):
            threading.Thread.__init__(self)
            self.queue = queue

        def run(self):
            while True:
                # grabs host from queue
                host = self.queue.get()

                # grabs URLs of hosts and prints first 1024 bytes of page
                url = urllib2.urlopen(host)
                print url.read(1024)

                # signals to queue that job is done
                self.queue.task_done()

    start = time.time()

    def main():
        # spawn a pool of threads, and pass them the queue instance
        for i in range(5):
            t = ThreadUrl(queue)
            t.setDaemon(True)
            t.start()

        # populate queue with data
        for host in hosts:
            queue.put(host)

        # wait on the queue until everything has been processed
        queue.join()

    main()
    print "Elapsed Time: %s" % (time.time() - start)

This example requires a bit more explanation, but thanks to the Queue module it is not much more complicated than the first threading example. This pattern is a very common and recommended way to use threads in Python. The steps are as follows:

    • Create an instance of Queue.Queue() and then populate it with data.
    • Pass the populated queue instance to the thread class, which you created by inheriting from threading.Thread.
    • Spawn a pool of daemon threads.
    • Pull one item out of the queue at a time and process it inside the thread, using the data and the run method.
    • When the work is done, send a signal to the queue with queue.task_done() so that the completed task is accounted for.
    • Join on the queue, which really means waiting until the queue is empty before exiting the main program.

One thing to be aware of when using this pattern is that setting the threads as daemon threads (setDaemon(True)) allows the main thread, or program, to exit even if worker threads are still alive. This creates a simple way to control the flow of the program, because before exiting you can join on the queue, that is, wait until the queue has been fully processed. The Queue module documentation describes the actual process (see Resources):

join()
Blocks until all items in the queue have been retrieved and processed. The count of unfinished tasks goes up whenever an item is added to the queue, and goes down whenever a consumer thread calls task_done() to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, join() unblocks.
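To see that bookkeeping in isolation, here is a minimal single-threaded sketch (not from the original article) of the unfinished-task count described above:

    import Queue

    q = Queue.Queue()
    for item in range(3):
        q.put(item)          # unfinished-task count is now 3

    while not q.empty():
        item = q.get()
        # ... process the item here ...
        q.task_done()        # each call decrements the unfinished-task count

    q.join()                 # returns immediately because the count is back to zero
    print "all tasks done"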

Using multiple queues

Because the pattern described above is so effective, it is fairly straightforward to extend it by chaining additional thread pools and queues. In the example above, you only printed the beginning of each Web page. The next example instead has each thread return the full Web page it fetched and place it in a second queue. Another thread pool, attached to that second queue, then performs further processing on each page. The work in this example is parsing each Web page with a third-party Python module called Beautiful Soup. With this module, only two lines of code are needed to extract the title tag of each page you visit and print it out.
Multiple-queue data mining of websites

    import Queue
    import threading
    import urllib2
    import time
    from BeautifulSoup import BeautifulSoup

    hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
             "http://ibm.com", "http://apple.com"]

    queue = Queue.Queue()
    out_queue = Queue.Queue()

    class ThreadUrl(threading.Thread):
        """Threaded Url Grab"""
        def __init__(self, queue, out_queue):
            threading.Thread.__init__(self)
            self.queue = queue
            self.out_queue = out_queue

        def run(self):
            while True:
                # grabs host from queue
                host = self.queue.get()

                # grabs URLs of hosts and then grabs chunk of webpage
                url = urllib2.urlopen(host)
                chunk = url.read()

                # place chunk into out queue
                self.out_queue.put(chunk)

                # signals to queue that job is done
                self.queue.task_done()

    class DatamineThread(threading.Thread):
        """Threaded Url Grab"""
        def __init__(self, out_queue):
            threading.Thread.__init__(self)
            self.out_queue = out_queue

        def run(self):
            while True:
                # grabs chunk from out queue
                chunk = self.out_queue.get()

                # parse the chunk
                soup = BeautifulSoup(chunk)
                print soup.findAll(['title'])

                # signals to queue that job is done
                self.out_queue.task_done()

    start = time.time()

    def main():
        # spawn a pool of threads, and pass them the queue instance
        for i in range(5):
            t = ThreadUrl(queue, out_queue)
            t.setDaemon(True)
            t.start()

        # populate queue with data
        for host in hosts:
            queue.put(host)

        for i in range(5):
            dt = DatamineThread(out_queue)
            dt.setDaemon(True)
            dt.start()

        # wait on the queues until everything has been processed
        queue.join()
        out_queue.join()

    main()
    print "Elapsed Time: %s" % (time.time() - start)

If you run this version of the script, you will get the following output:

    # python url_fetch_threaded_part2.py
    [<title>google</title>]
    [<title>yahoo!</title>]
    [<title>apple</title>]
    [<title>IBM United States</title>]
    [<title>amazon.com: Online Shopping for Electronics, Apparel,
     Computers, Books, DVDs & More</title>]
    Elapsed Time: 3.75387597084

Analyzing this code, you can see that we added another queue instance and passed that queue to the first thread pool class, ThreadUrl. Next, an almost identical structure is replicated for the second thread pool class, DatamineThread. In that class's run method, each thread gets a Web page chunk from the out queue and processes it with Beautiful Soup. In this case, Beautiful Soup is used to extract the title tag of each page and print it out. This example could easily be pushed toward more valuable scenarios, because you now have the core of a basic search engine or data-mining tool. One idea is to use Beautiful Soup to extract the links from each page and then follow them.
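As a hedged sketch of that crawler idea (not part of the original article), Beautiful Soup can pull the href attribute from every anchor tag in a fetched chunk; the resulting links could then be fed back into the URL queue:

    from BeautifulSoup import BeautifulSoup

    def extract_links(chunk):
        """Return the href of every anchor tag found in an HTML chunk."""
        soup = BeautifulSoup(chunk)
        links = []
        for anchor in soup.findAll('a'):
            href = anchor.get('href')
            if href:
                links.append(href)
        return links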

Summary

This article explored Python threads and demonstrated the best practice of using queues to reduce complexity, avoid subtle errors, and improve code readability. Although this basic pattern is fairly simple, it can be used to solve a wide variety of problems by chaining queues and thread pools together. In the final section, you began to look at how to create more complex processing pipelines that can serve as a model for future projects. The Resources section provides excellent references on general concurrency and threading.

Finally, it is important to point out that threads do not solve every problem, and for many situations processes may be more appropriate. In particular, the standard library subprocess module may be easier to use when you simply need to spawn many child processes and listen for their responses. Refer to the Resources section for the official documentation.
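A minimal sketch of that subprocess approach (not from the original article; it assumes a Unix-like system where a ping command is available) might look like this:

    import subprocess

    hosts = ["yahoo.com", "google.com", "amazon.com"]

    # launch all the child processes first, then collect their output
    procs = [subprocess.Popen(["ping", "-c", "1", host], stdout=subprocess.PIPE)
             for host in hosts]
    for proc in procs:
        out, err = proc.communicate()   # waits for the child and reads its stdout
        print out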
