Introduction
Python offers no shortage of concurrency options: its standard library includes support for threads, processes, and asynchronous I/O. In many cases, Python simplifies the use of these approaches through high-level modules such as asyncore, threading, and subprocess. Beyond the standard library, there are third-party solutions such as Twisted, Stackless, and the processing module. This article focuses on threading in Python. Although many good online resources document the threading API, this article instead works through practical examples that illustrate common thread usage patterns.
The Global Interpreter Lock (GIL) exists because the Python interpreter is not fully thread-safe. The current thread must hold a global lock in order to access Python objects safely. Because only one thread at a time can hold the lock and use Python objects and the C API, the interpreter releases and reacquires the lock regularly, by default every 100 bytecode instructions. The frequency at which the interpreter checks for a thread switch is controlled by the sys.setcheckinterval() function. In addition, the lock is released and reacquired around potentially blocking I/O operations. For more detail, see "GIL and Threading State" and "Threading the Global Interpreter Lock" in the Resources section. Note that, because of the GIL, CPU-bound applications will not benefit from threads. In Python, the recommendation for such workloads is to use processes, or a mix of processes and threads.
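The sys.setcheckinterval() function belongs to the Python 2 era of this article. On Python 3.2 and later, the instruction-count check was replaced by a time-based switch interval, and sys.setcheckinterval() was eventually removed. As a small sketch, assuming a Python 3 interpreter, the corresponding setting can be inspected and adjusted like this:

```python
import sys

# Python 3 assumption: GIL hand-off is governed by a time interval
# (in seconds) rather than by a count of bytecode instructions.
original_interval = sys.getswitchinterval()
print("current switch interval: %s seconds" % original_interval)

# Ask the interpreter to consider switching threads less often.
sys.setswitchinterval(0.01)
changed_interval = sys.getswitchinterval()
print("new switch interval: %s seconds" % changed_interval)

# Restore the previous value so the rest of the program is unaffected.
sys.setswitchinterval(original_interval)
```

Changing this value only tunes how often threads take turns; it does not let CPU-bound threads run in parallel.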
It is important to understand the difference between a process and a thread first. Threads differ from processes in that threads share state, memory, and resources. This simple difference is both a strength and a weakness of threads. On the one hand, threads are lightweight and communicate with each other easily; on the other hand, they bring a range of problems, including deadlocks, race conditions, and sheer complexity. Fortunately, thanks to the GIL and the Queue module, threading in Python is much less complex to implement than in many other languages.
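To make the shared-state point concrete, here is a minimal sketch, assuming Python 3, in which several threads update one object. The SharedCounter class and add_many function are names invented for this illustration. The lock is what keeps the read-modify-write step from interleaving between threads:

```python
import threading


class SharedCounter(object):
    """Hypothetical helper: a counter that several threads may update."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        # Hold the lock so the read-modify-write below is not interleaved.
        with self.lock:
            self.value += 1


def add_many(counter, times):
    for _ in range(times):
        counter.increment()


counter = SharedCounter()
workers = [threading.Thread(target=add_many, args=(counter, 10000))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print("final count: %d" % counter.value)
```

Every thread sees the same counter object without any copying or message passing, which is the convenience of threads. The lock is the price of that convenience.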
Using a Python thread
To follow along, I assume you have Python 2.5 or later installed, because many of the examples use language features that first appeared in Python 2.5. To get started with threads in Python, we begin with a simple "Hello World" example:
#!/usr/bin/env python
# coding=utf-8
import threading
import datetime


class ThreadClass(threading.Thread):
    def run(self):
        now = datetime.datetime.now()
        print "%s says Hello World at time: %s" % (self.getName(), now)


for i in range(2):
    t = ThreadClass()
    t.start()
Results:
Thread-1 says Hello World at time: 2012-06-20 14:43:26.981173
Thread-2 says Hello World at time: 2012-06-20 14:43:26.981375
Looking closely at the output, you can see that the Hello World statement is printed from two threads, each with a timestamp. If you read the code, you will find two import statements: one imports the datetime module and the other imports the threading module. The class ThreadClass inherits from threading.Thread, and because of this you need to define a run method that contains the code you want to execute in the thread. The only point of note in this run method is that self.getName() returns the name of the thread.
The last three lines of code actually instantiate the class and start the threads. Note that t.start() is what actually starts each thread. The threading module was designed with inheritance in mind, and it is built on top of the lower-level thread module. In most cases, inheriting from threading.Thread is a best practice, because it creates a common API for threaded programming.
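For readers on Python 3, the same example might look like the sketch below. It is an adaptation, not the article's listing: print becomes a function, the name attribute replaces getName() (which newer Python 3 releases deprecate), and a messages list plus explicit join() calls are added here so the output can be inspected after both threads finish.

```python
import datetime
import threading

messages = []  # added for this sketch: collects each thread's greeting


class ThreadClass(threading.Thread):
    def run(self):
        now = datetime.datetime.now()
        # self.name holds the thread's name, e.g. "Thread-1"
        messages.append("%s says Hello World at time: %s" % (self.name, now))


threads = [ThreadClass() for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both threads before reading the results

for line in messages:
    print(line)
```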
Using queues with threads
As mentioned earlier, threading can become complicated when multiple threads need to share data or resources. The threading module provides a number of synchronization primitives, including semaphores, condition variables, events, and locks. Although these options exist, the best practice is to concentrate on using queues instead. Queues are much easier to work with and make threaded programming considerably safer, because they effectively funnel all access to a resource through a single thread and support cleaner, more readable design patterns, as in the "URL fetch threading" example:
#!/usr/bin/env python
# coding=utf-8
import urllib2
import time
import Queue
import threading

hosts = ["http://yahoo.com", "http://baidu.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()


class ThreadUrl(threading.Thread):
    """Threaded URL grab"""

    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # grab a host from the queue
            host = self.queue.get()

            # fetch the URL and print the first 1024 bytes of the page
            url = urllib2.urlopen(host)
            print url.read(1024)

            # signal to the queue that the job is done
            self.queue.task_done()


start = time.time()


def main():
    # spawn a pool of threads, and pass them the queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

    # populate the queue with data
    for host in hosts:
        queue.put(host)

    # wait on the queue until everything has been processed
    queue.join()


main()
print "Elapsed Time: %s" % (time.time() - start)
There is more code to explain in this example, but thanks to the Queue module it is not much more complicated than the first threading example. This pattern is a common and recommended way to use threads in Python. The steps are as follows:
1. Create an instance of Queue.Queue() and populate it with data.
2. Pass that populated instance to the thread class, which is created by inheriting from threading.Thread.
3. Spawn a pool of daemon threads.
4. Pull one item off the queue at a time, and use that data inside the thread, in the run method, to do the work.
5. After the work is done, call queue.task_done() to signal to the queue that the task is complete.
6. Join on the queue, which in effect means waiting until the queue is empty before exiting the main program.
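The six steps above can be sketched without any network access. The version below assumes Python 3, where the module is named queue rather than Queue. The job list, the Worker class, and the fake_fetch function are stand-ins invented for this illustration, not part of the article's listing.

```python
import queue
import threading

# Hypothetical job list; the article's example uses real URLs instead.
jobs = ["alpha", "beta", "gamma", "delta", "epsilon"]

work_queue = queue.Queue()          # step 1: create the queue
results = []
results_lock = threading.Lock()


def fake_fetch(name):
    """Stand-in for fetching a page; performs no network I/O."""
    return name.upper()


class Worker(threading.Thread):
    def __init__(self, work_queue):   # step 2: hand the queue to the thread
        threading.Thread.__init__(self)
        self.work_queue = work_queue

    def run(self):
        while True:
            item = self.work_queue.get()      # step 4: take one item
            try:
                data = fake_fetch(item)
                with results_lock:
                    results.append(data)
            finally:
                self.work_queue.task_done()   # step 5: report completion


for _ in range(3):                    # step 3: spawn a pool of daemon threads
    worker = Worker(work_queue)
    worker.daemon = True
    worker.start()

for job in jobs:                      # step 1, continued: populate the queue
    work_queue.put(job)

work_queue.join()                     # step 6: block until every item is done
print(sorted(results))
```

Calling task_done() in a finally clause is a design choice of this sketch: if a job raises an exception, the queue's count still goes down, so join() is not left waiting forever.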
One thing to note about this pattern: setting the threads' daemon flag to True allows the main thread, or the program, to exit when only daemon threads are still alive. This gives a simple way to control the flow of the program, because you can join on the queue, that is, wait until the queue is empty, before exiting. The exact behavior is best described in the documentation for the Queue module (see Resources):
join()
Blocks until all items in the queue have been retrieved and processed. The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls task_done() to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, join() unblocks.
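The bookkeeping behind join() can be observed directly. The following is a minimal sketch, assuming Python 3, with a single consumer thread so that the output order is predictable. The consumer function and the doubling of each item are arbitrary choices for the illustration.

```python
import queue
import threading

q = queue.Queue()
processed = []


def consumer():
    while True:
        item = q.get()
        processed.append(item * 2)
        q.task_done()  # one call per item retrieved with get()


t = threading.Thread(target=consumer)
t.daemon = True
t.start()

for n in range(3):
    q.put(n)           # each put() adds one unfinished task

q.join()               # returns once task_done() has been called three times
print(processed)

# Calling task_done() more often than items were put is reported as an error.
try:
    q.task_done()
    extra_call_rejected = False
except ValueError:
    extra_call_rejected = True
print("extra task_done() rejected: %s" % extra_call_rejected)
```

The ValueError at the end shows that the queue really does keep a count: each put() must be matched by exactly one task_done().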
Using multiple queues
Because the pattern described above is so effective, extending it by chaining additional thread pools and queues is fairly straightforward. In the example above, you only printed the beginning of each web page. The next example instead returns the full web page fetched by each thread and places the result in a second queue. Another thread pool, attached to the second queue, then processes the web pages. The work done in this example is parsing each web page with a third-party Python module called Beautiful Soup. With this module, just a couple of lines of code extract the title tag from each page visited and print it, as in the "Multi-queue data mining websites" example:
#!/usr/bin/env python
# coding: utf-8
import Queue
import threading
import urllib2
import time
from BeautifulSoup import BeautifulSoup

hosts = ["http://yahoo.com", "http://baidu.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()
out_queue = Queue.Queue()


class ThreadUrl(threading.Thread):
    """Threaded URL grab"""

    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        while True:
            # grab a host from the queue
            host = self.queue.get()

            # fetch the URL and read the whole page
            url = urllib2.urlopen(host)
            chunk = url.read()

            # place the chunk into the out queue
            self.out_queue.put(chunk)

            # signal to the queue that the job is done
            self.queue.task_done()


class DatamineThread(threading.Thread):
    """Threaded page parser"""

    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        while True:
            # grab a chunk from the out queue
            chunk = self.out_queue.get()

            # parse the chunk and print the title tag
            soup = BeautifulSoup(chunk)
            print soup.findAll(['title'])

            # signal to the queue that the job is done
            self.out_queue.task_done()


start = time.time()


def main():
    # spawn a pool of threads, and pass them the queue instances
    for i in range(5):
        t = ThreadUrl(queue, out_queue)
        t.setDaemon(True)
        t.start()

    # populate the queue with data
    for host in hosts:
        queue.put(host)

    # spawn a second pool of threads that consume the out queue
    for i in range(5):
        dt = DatamineThread(out_queue)
        dt.setDaemon(True)
        dt.start()

    # wait on the queues until everything has been processed
    queue.join()
    out_queue.join()


main()
print "Elapsed Time: %s" % (time.time() - start)
If you analyze this code, you can see that we added another queue instance and passed it to the first thread pool class, ThreadUrl. Next, almost exactly the same structure is copied for the second thread pool class, DatamineThread. In this class's run method, each thread takes a web page, a chunk of text, off the second queue and processes it with Beautiful Soup. Here, Beautiful Soup is used to extract the title tag of each page and print it. This example generalizes easily to something more useful, because you now have the core of a basic search engine or data mining tool. One idea is to use Beautiful Soup to extract the links from each page and then follow them.
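The two-queue structure can also be tried without network access or third-party packages. The sketch below assumes Python 3 and swaps in two things that are not in the article: a hypothetical in-memory PAGES dictionary in place of real sites, and the standard library's html.parser in place of Beautiful Soup.

```python
import queue
import threading
from html.parser import HTMLParser

# Hypothetical in-memory "sites" so the sketch runs without network access.
PAGES = {
    "http://example.invalid/a": "<html><head><title>Page A</title></head></html>",
    "http://example.invalid/b": "<html><head><title>Page B</title></head></html>",
    "http://example.invalid/c": "<html><head><title>Page C</title></head></html>",
}

url_queue = queue.Queue()
page_queue = queue.Queue()
titles = []
titles_lock = threading.Lock()


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def fetch_worker():
    """Stage 1: look up the page body and pass it to the second queue."""
    while True:
        url = url_queue.get()
        page_queue.put(PAGES[url])
        url_queue.task_done()


def parse_worker():
    """Stage 2: extract the title from each page body."""
    while True:
        page = page_queue.get()
        parser = TitleParser()
        parser.feed(page)
        with titles_lock:
            titles.append(parser.title)
        page_queue.task_done()


for target in (fetch_worker, parse_worker):
    for _ in range(2):
        t = threading.Thread(target=target)
        t.daemon = True
        t.start()

for url in PAGES:
    url_queue.put(url)

url_queue.join()   # every page body has been handed to page_queue
page_queue.join()  # every page body has been parsed
print(sorted(titles))
```

The order of the two join() calls matters: the first stage puts each page on the second queue before calling task_done(), so once url_queue.join() returns, page_queue already holds or has processed every page.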
Summary
This article explored threads in Python and showed the best practice of using queues to reduce complexity, avoid subtle errors, and improve code readability. Although the basic pattern is relatively simple, it can solve a wide range of problems by connecting queues to thread pools. The final section began to explore how to build a more complex processing pipeline that can serve as a model for future projects. The Resources section lists a number of excellent references on concurrency in general and on threading.
Finally, it is worth pointing out that threads are not the answer to every problem, and in many cases processes may be a better fit. In particular, the standard library subprocess module can be much simpler to work with when you only need to start a number of child processes and listen for their responses. See the Resources section for the official documentation.
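As a rough illustration of that last point, the sketch below, assuming Python 3, starts a few child processes and collects their responses with subprocess alone. The children are just Python one-liners chosen for the example.

```python
import subprocess
import sys

# Start several child Python processes; each one prints a single line.
children = []
for n in range(3):
    child = subprocess.Popen(
        [sys.executable, "-c", "print(%d * %d)" % (n, n)],
        stdout=subprocess.PIPE,
    )
    children.append(child)

# Collect each child's exit status and output.
responses = []
for child in children:
    out, _ = child.communicate()
    responses.append((child.returncode, out.decode().strip()))

print(responses)
```

All three children run concurrently as soon as Popen returns, and because each is a separate process, the GIL does not serialize their work.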
Article Address: http://www.ibm.com/developerworks/cn/aix/library/au-threadingpython/