Tags: Python, queue, thread
I learned Python during my internship and have been writing it for less than two months. I recently wrote a script of about 1000 lines, but its multithreading efficiency was too low. If anything in this article is wrong, please correct me.
These notes are based on this article: http://www.ibm.com/developerworks/cn/aix/library/au-threadingpython?ca=drs-tp3008 — they are my takeaways after reading it. The original contains much more material, so I recommend reading the original.
Introduction: by combining threads and queues, you can build some simple and effective patterns for solving concurrent processing problems.
The effect is like queueing for meals in the school canteen: the serving lady's ladle never rests.
1. Simple multi-thread processing:
import threading
import datetime

class ThreadClass(threading.Thread):
    def run(self):
        # What the thread executes after start() is called
        now = datetime.datetime.now()
        print "%s says Hello World at time: %s" % (self.getName(), now)

for i in range(2):
    t = ThreadClass()
    t.start()
(In the original article, the print statement was accidentally split across two lines by a pasted line break, which is a syntax error; it is joined into a single statement above.)
We define a class that inherits from threading.Thread and override its run() method, which is what the thread executes once it is started; we then start two threads and let them run to completion.
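The article's example uses Python 2 syntax. For readers on Python 3, the same subclass-and-override pattern looks like this (a minimal sketch; the class name `HelloThread` is my own, not from the article):

```python
import datetime
import threading

class HelloThread(threading.Thread):
    # Overriding run() defines what the thread does once start() is called.
    def run(self):
        now = datetime.datetime.now()
        print("%s says Hello World at time: %s" % (self.name, now))

threads = [HelloThread() for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both threads to finish before the program exits
```

Note that in Python 3 the thread's name is read via the `name` attribute rather than `getName()`, and joining the threads makes the "run to completion" step explicit.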
2. Processing a large number of URLs, as an example. This time I include the entire example; the original code was a bit messy, so I tidied it up a little:
Note: to process a queue of URLs, two classes inheriting from threading.Thread are defined, each overriding __init__() and run().
ThreadURL: takes a URL from queue, builds a BeautifulSoup object from it, and saves the result to out_queue.
ThreadPrint: processes the results that ThreadURL saved to out_queue; the principle is the same as ThreadURL.
There is also a helper function, GetBSFromURL(), which builds a BeautifulSoup object from a URL.
Process:
Define two queues: queue holds the tasks to be processed, and out_queue holds the results, which are in turn processed by another queue-driven thread.
Start five ThreadURL threads: initialize each one, mark it as a daemon, and start it. When a thread finishes an item, it calls the queue's task_done() to signal completion and then takes the next task, so the URLs in the queue are processed one after another.
Fill the queue with the list of tasks to be processed.
Start one or more ThreadPrint threads to process the results, in the same way as above.
queue.join()
out_queue.join()
Wait for both queues to drain; once all tasks are processed, the program exits.
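The steps above can be sketched in Python 3, where the `Queue` module is renamed `queue`. To keep the sketch self-contained and offline, the URL-fetching step is replaced by a placeholder transform (uppercasing the string); everything else mirrors the described pipeline:

```python
import queue
import threading

task_queue = queue.Queue()   # tasks waiting to be processed
out_queue = queue.Queue()    # results waiting to be consumed
results = []

def worker():
    # First stage: take a task, process it, hand the result to out_queue.
    while True:
        item = task_queue.get()
        out_queue.put(item.upper())  # placeholder for the real work (fetching a URL)
        task_queue.task_done()       # signal this task is fully handled

def consumer():
    # Second stage: consume results from out_queue.
    while True:
        result = out_queue.get()
        results.append(result)
        out_queue.task_done()

for _ in range(5):
    threading.Thread(target=worker, daemon=True).start()
threading.Thread(target=consumer, daemon=True).start()

for host in ["google.com", "baidu.com", "amazon.cn"]:
    task_queue.put(host)

task_queue.join()  # blocks until every task has been marked task_done()
out_queue.join()   # blocks until every result has been consumed
print(sorted(results))  # → ['AMAZON.CN', 'BAIDU.COM', 'GOOGLE.COM']
```

Because both stages call task_done() for every get(), the two join() calls are enough to know all work is finished, and the daemon flag lets the program exit even though the worker loops never return.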
# -*- coding:utf-8 -*-
'''
Created on Oct 8, 2011

@author: Lannik Cooper
'''
import urllib2
from BeautifulSoup import BeautifulSoup
import time
import threading
import Queue

def GetBSFromURL(url):
    # Fetch the URL and build a BeautifulSoup object, retrying on failure
    while True:
        try:
            f = urllib2.urlopen(url)
            soup = BeautifulSoup(f)
            f.close()
            return soup
        except Exception:
            time.sleep(3)  # back off briefly, then retry

class ThreadURL(threading.Thread):
    """Take a URL from queue, parse it, and put the soup on out_queue."""
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        while True:
            host = self.queue.get()
            soup = GetBSFromURL(host)
            self.out_queue.put(soup)
            self.queue.task_done()  # signal that this URL is fully handled

class ThreadPrint(threading.Thread):
    """Take a parsed page from out_queue and print its title."""
    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        while True:
            soup = self.out_queue.get()
            print soup.title.text
            self.out_queue.task_done()

def main():
    hosts = ["http://google.com.hk", "http://t.qq.com", "http://amazon.cn",
             "http://www.baidu.com", "http://t.sina.com"]
    start = time.time()
    queue = Queue.Queue()      # URLs waiting to be fetched
    out_queue = Queue.Queue()  # parsed pages waiting to be printed

    for i in range(5):
        t = ThreadURL(queue, out_queue)
        t.setDaemon(True)  # daemon threads will not block program exit
        t.start()

    for host in hosts:
        queue.put(host)

    t = ThreadPrint(out_queue)
    t.setDaemon(True)
    t.start()

    queue.join()      # wait until every URL has been fetched
    out_queue.join()  # wait until every result has been printed
    print "Elapsed Time: %s" % (time.time() - start)

main()
In this way, threads are combined with queues for multithreaded processing.
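GetBSFromURL() above retries forever with a bare except, which can hang on a permanently broken URL. A slightly safer variant bounds the retries; here is a hedged Python 3 sketch (the function name `fetch_with_retry` and the injected `fetch` callable are my own, so the sketch stays off the network):

```python
import time

def fetch_with_retry(fetch, url, retries=3, delay=0.01):
    """Call fetch(url), retrying up to `retries` times on any exception.

    `fetch` is any callable that returns the page or raises on failure;
    in the article it would wrap urllib2.urlopen plus BeautifulSoup.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception as e:   # narrower exception types are better in real code
            last_error = e
            time.sleep(delay)    # back off briefly before the next attempt
    raise last_error             # give up instead of looping forever

# Usage: a fake fetcher that fails twice, then succeeds on the third call.
calls = {"n": 0}
def flaky(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("temporary failure")
    return "<title>%s</title>" % url

print(fetch_with_retry(flaky, "example.com"))  # → <title>example.com</title>
```

Bounding the retries keeps one dead URL from stalling a worker thread forever, which matters here because a stalled worker silently shrinks the thread pool.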
Open questions:
Because my reading time has been short, I have not yet read the articles referenced by the original, so several things are still unclear to me; for example, the exact role of Thread's setDaemon() is not obvious.
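As far as I understand it, setDaemon(True) (written t.daemon = True in modern Python) marks a thread so the interpreter may exit without waiting for it. That is why the workers above can sit in infinite while True loops: queue.join() guarantees the work is done, and the daemon flag lets the program exit afterwards. A small sketch of that interaction (names are my own):

```python
import queue
import threading

q = queue.Queue()
done = []

def worker():
    while True:          # infinite loop: this thread never returns
        item = q.get()
        done.append(item * 2)
        q.task_done()

t = threading.Thread(target=worker)
t.daemon = True          # equivalent of setDaemon(True): don't block interpreter exit
t.start()

for i in range(3):
    q.put(i)
q.join()                 # all items are processed here, though the thread never exits
print(sorted(done))      # → [0, 2, 4]
```

Without the daemon flag, the never-returning worker would keep the program alive forever after main finished; with it, the thread is simply abandoned at exit.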
I will write a follow-up article once I have dug into this further.
Campus recruitment season is in full swing right now, so time is very tight; I need to make good use of GTD to manage it.
To be continued...