To keep a program from stalling on blocking operations such as I/O, we need to design it concurrently.
This article builds some simple but efficient threading usage patterns by combining threads with queues, making threaded programming in Python straightforward.
1. Using Threads
First, a simple thread example with no blocking at all:
import threading
import datetime

class MyThread(threading.Thread):
    def run(self):
        now = datetime.datetime.now()
        print("{} says Hello World at time: {}".format(self.getName(), now))

for i in range(2):
    t = MyThread()
    t.start()
Code Interpretation
1. Both threads output a Hello World statement with a timestamp.
2. There are two import statements: one imports the datetime module, the other imports the threading module.
3. The class MyThread inherits from threading.Thread; because of this, you define a run method that contains the code you want executed in that thread.
4. In the run method, the self.getName() method is used to get the name of the current thread.
5. The last three lines instantiate the class and start the thread. Note that the thread does not actually begin running until t.start() is called.
2. Using a Thread Queue
As mentioned earlier, threaded code becomes complex when multiple threads need to share data or resources. The threading module provides a number of synchronization primitives, including semaphores, condition variables, events, and locks. Although these options exist, the best practice is usually to focus on queues instead. Queues are easier to handle and make threaded programming safer, because they effectively funnel all access to a resource through a single thread and support a clearer, more readable design pattern.
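As a minimal illustration of the point above, a single queue can hand work to a worker thread with no explicit lock at all. This is a sketch with illustrative names (worker, work_queue, the None sentinel are all choices made here, not part of the article's examples):

```python
import queue
import threading

results = []
work_queue = queue.Queue()

def worker():
    # each item is consumed by exactly one thread, so no lock is needed
    while True:
        item = work_queue.get()
        if item is None:          # sentinel value: tell the worker to stop
            work_queue.task_done()
            break
        results.append(item * 2)
        work_queue.task_done()

t = threading.Thread(target=worker)
t.start()

for n in range(5):
    work_queue.put(n)
work_queue.put(None)              # enqueue the stop sentinel
work_queue.join()                 # block until every item is processed
t.join()
print(results)                    # [0, 2, 4, 6, 8]
```

Because the queue serializes access, the main thread never touches results until the worker has finished with it.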
In the next example, we fetch a set of URLs and display the first 200 bytes of each page.
First, the serial version, which fetches each page in turn:
from urllib import request
import time

hosts = ["http://yahoo.com", "http://amazon.com", "http://ibm.com", "http://apple.com"]

start = time.time()
# grab the URL of each host and print the first 200 bytes of the page
for host in hosts:
    url = request.urlopen(host)
    print(url.read(200))

print("Elapsed Time: %s" % (time.time() - start))
Code Interpretation
1. The urllib module hides the complexity of fetching a web page. Two calls to time.time() bracket the work so the program can report its running time.
2. This program takes about 13.7 seconds to run: not great, not terrible.
3. However, if you needed to retrieve hundreds of web pages at this average rate, the total time would approach 1,000 seconds. What if you need to retrieve even more pages?
The following is the threaded version:
import queue
import threading
from urllib import request
import time

hosts = ["http://yahoo.com", "http://amazon.com", "http://ibm.com", "http://apple.com"]

in_queue = queue.Queue()

class ThreadUrl(threading.Thread):
    """threaded URL grab"""
    def __init__(self, in_queue):
        threading.Thread.__init__(self)
        self.in_queue = in_queue

    def run(self):
        while True:
            # grab a host from the queue
            host = self.in_queue.get()
            # grab the URL of the host and print the first 200 bytes of the page
            url = request.urlopen(host)
            print(url.read(200))
            # signal to the queue that the job is done
            self.in_queue.task_done()

start = time.time()

def main():
    # spawn a pool of threads, and pass them the queue instance
    for i in range(4):
        t = ThreadUrl(in_queue)
        t.setDaemon(True)
        t.start()
    # populate the queue with data
    for host in hosts:
        in_queue.put(host)
    # wait on the queue until everything has been processed
    in_queue.join()

main()
print("Elapsed Time: %s" % (time.time() - start))
Code Interpretation
1. Compared to the first threading example, this is not much more complicated, even though it now uses the queue module.
2. Create a queue.Queue() instance and populate it with data.
3. Pass the populated queue to the thread class, which is created by inheriting from threading.Thread.
4. Generate a pool of daemon threads.
5. Each thread removes one item from the queue at a time and uses that data inside its run method to perform the work.
6. After the work is complete, call the queue.task_done() method to signal to the queue that the task has finished.
7. Performing a join operation on the queue means waiting until every queued task has been marked done before letting the main program exit.
One thing to note when using this pattern: by setting daemon to True, you allow the main thread, and hence the program, to exit even though the worker threads are blocked forever in their while True loops. This creates a clean way to control flow: before exiting, perform a join operation on the queue so that all queued work is finished first.
join() remains blocked until all items in the queue have been processed. Each time an item is added to the queue, the count of unfinished tasks increases. Each time a consumer thread calls task_done() to indicate that an item was retrieved and all work on it is complete, the count decreases. When the count of unfinished tasks drops to zero, join() unblocks.
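The bookkeeping described above can be seen directly with a queue and no threads at all. This sketch peeks at queue.Queue's internal unfinished_tasks counter, which is an implementation detail rather than documented API, purely to make the counting visible:

```python
import queue

q = queue.Queue()
q.put("a")
q.put("b")
print(q.unfinished_tasks)   # 2: two items queued, none finished

q.get()
q.task_done()               # one item retrieved and marked finished
print(q.unfinished_tasks)   # 1

q.get()
q.task_done()
q.join()                    # returns immediately: the counter is back at zero
print(q.unfinished_tasks)   # 0
```

Note that get() alone does not decrement the counter; only task_done() does, which is why a worker must call it after finishing each item.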
3. Using Multiple Queues
The next example uses two queues. Threads in the first pool fetch the full web page for each host and place the result in the second queue. A second thread pool, attached to the second queue, then processes each page.
It extracts the title tag of each page visited and prints it out.
import queue
import threading
from urllib import request
import time
from bs4 import BeautifulSoup

hosts = ["http://yahoo.com", "http://amazon.com", "http://ibm.com", "http://apple.com"]

in_queue = queue.Queue()
out_queue = queue.Queue()

class ThreadUrl(threading.Thread):
    """threaded URL grab"""
    def __init__(self, in_queue, out_queue):
        threading.Thread.__init__(self)
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        while True:
            # grab a host from the queue
            host = self.in_queue.get()
            # grab the URL of the host and read the whole page
            url = request.urlopen(host)
            chunk = url.read()
            # place the chunk into the out queue
            self.out_queue.put(chunk)
            # signal to the queue that the job is done
            self.in_queue.task_done()

class DatamineThread(threading.Thread):
    """threaded HTML parse"""
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        while True:
            # grab a chunk from the queue
            chunk = self.out_queue.get()
            # parse the chunk
            soup = BeautifulSoup(chunk, "html.parser")
            print(soup.find_all("title"))
            # signal to the queue that the job is done
            self.out_queue.task_done()

start = time.time()

def main():
    # spawn a pool of fetch threads, and pass them both queue instances
    for i in range(4):
        t = ThreadUrl(in_queue, out_queue)
        t.setDaemon(True)
        t.start()
    # populate the queue with data
    for host in hosts:
        in_queue.put(host)
    # spawn a pool of parse threads
    for i in range(4):
        dt = DatamineThread(out_queue)
        dt.setDaemon(True)
        dt.start()
    # wait on both queues until everything has been processed
    in_queue.join()
    out_queue.join()

main()
print("Elapsed Time: %s" % (time.time() - start))
Code Interpretation
1. We add a second queue instance and pass both queues to the first thread pool class, ThreadUrl.
2. For the second thread pool class, DatamineThread, almost exactly the same structure is duplicated.
3. In this class's run method, each thread pulls a chunk of page text from the queue and hands it to Beautiful Soup for processing.
4. Beautiful Soup is used to extract the title tag of each page, which is then printed out.
5. This example is easy to generalize to more valuable scenarios, because it contains the core of a basic search engine or data-mining tool.
6. One idea is to use Beautiful Soup to extract the links from each page and then follow them to navigate further.
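A hedged sketch of that last idea, using only the standard library's html.parser so it runs without Beautiful Soup installed (the HTML string and the LinkCollector class below are stand-ins invented for illustration, not part of the article's examples):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# stand-in for a page chunk pulled from out_queue
page = ('<html><body><a href="http://example.com/a">A</a>'
        '<a href="http://example.com/b">B</a></body></html>')

collector = LinkCollector()
collector.feed(page)
# these URLs could be fed back into in_queue to crawl further
print(collector.links)   # ['http://example.com/a', 'http://example.com/b']
```

Feeding the collected links back into in_queue turns the two-queue pipeline into a simple crawler, though a real one would also need to deduplicate URLs and bound its depth.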